# 02. Use cellranger to analyze 10X scRNA-seq data

* This tutorial assumes you have already demultiplexed your run with <code>cellranger mkfastq</code> or <code>cellranger-arc mkfastq</code> and obtained the <code>fastq</code> files for your library. <br />
* <code>cellranger count</code> automatically performs alignment, feature counting, and quality control on your <code>fastq</code> reads. <br />
* The output is a folder containing the aligned BAM file, the count matrix, and an HTML report, which you can use for different kinds of downstream analysis. <br />

* In this example, our current directory is 
<code>/projects/ps-renlab2/y2xie/projects/77.LC/43.scHiC_species_mixing</code>.
The fastq files are stored in the folder <code>01.rawdata</code>:
![](images/fastq_dir.png)

### To analyze a library such as <code>LC552</code>, we run <code>cellranger count</code>:

In [None]:
/projects/ps-renlab/y2xie/packages/cellranger-6.1.2/cellranger count --id=LC552 \
--project=AAAWY3CHV --transcriptome=/projects/ps-renlab/y2xie/projects/genome_ref/GRCh38_and_mm10 \
--fastqs=01.rawdata/AAAWY3CHV --sample=LC552 --include-introns --chemistry=ARC-v1

#### In the command above, 
<ol>
      <li>We give the full path to the <code>cellranger</code> executable in case it is not installed on your system path. This tells the shell where to find <code>cellranger</code>.</li>
      <li><code>--id</code> is your output folder name. I use the library name for convenience.</li>
      <li><code>--project</code> is the name of the project folder from which to pick up the <code>fastq</code> files. By default, <code>mkfastq</code> uses part of the BCL run folder name as the project name. Since our BCL run folder is <code>230316_VH00454_139_AAAWY3CHV</code>, the project name is set to <code>AAAWY3CHV</code>. <b>This argument is optional.</b></li>
      <li><code>--transcriptome</code> is the reference to map against. Since this is a HeLa-mESC mixed library, we use the mixed human and mouse genome reference.</li>
      <li><code>--fastqs</code> is the path to your <code>fastq</code>.</li>
      <li><code>--sample</code> is your library name. This should be the same as the name you used for demultiplexing.</li>
     <li><code>--include-introns</code> is set because we are using single nuclei instead of whole cells. Many nuclear transcripts are unspliced, so a high fraction of reads comes from introns.</li>
     <li><code>--chemistry=ARC-v1</code> is set because we are using the 10X Multiome kit and need to tell <code>cellranger</code> that this is not a standard Gene Expression kit.</li>
</ol>
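
Before launching a long job, it can help to print the fully assembled command and check each argument against the list above. The following is a minimal dry-run sketch, not part of <code>cellranger</code> itself: the variable names, the <code>DRY_RUN</code> switch, and the <code>run</code> helper are all illustrative assumptions, while the values are the ones used in this example.

```shell
#!/bin/bash
# Dry-run sketch: print the cellranger count command instead of running it.
# All variable names and the run() helper are illustrative, not part of cellranger.

CELLRANGER=/projects/ps-renlab/y2xie/packages/cellranger-6.1.2/cellranger
LIBRARY=LC552        # reused for --id (output folder) and --sample (library name)
PROJECT=AAAWY3CHV    # project folder written by mkfastq
REFERENCE=/projects/ps-renlab/y2xie/projects/genome_ref/GRCh38_and_mm10
FASTQ_DIR=01.rawdata/$PROJECT

DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to execute for real

# run CMD ARGS...: echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

run "$CELLRANGER" count \
    --id="$LIBRARY" \
    --project="$PROJECT" \
    --transcriptome="$REFERENCE" \
    --fastqs="$FASTQ_DIR" \
    --sample="$LIBRARY" \
    --include-introns \
    --chemistry=ARC-v1
```

With the default <code>DRY_RUN=1</code> the script only prints the command on one line, so you can review it and then rerun with <code>DRY_RUN=0</code>.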

### You can also create a script named <code>cellranger_count.sh</code> to submit the job to TSCC:
<ol>
      <li>First, on the command line, type <code>vi cellranger_count.sh</code>.</li>
      <li>In the editor window that opens, press <code>i</code> to enter insert mode so that you can edit.</li>
      <li>Type in or copy and paste:</li>
</ol>

In [None]:
#!/bin/bash
#PBS -q hotel
#PBS -N cellranger_RNA
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00

cd /projects/ps-renlab2/y2xie/projects/77.LC/43.scHiC_species_mixing
/projects/ps-renlab/y2xie/packages/cellranger-6.1.2/cellranger count --id=LC552 --project=AAAWY3CHV \
--transcriptome=/projects/ps-renlab/y2xie/projects/genome_ref/GRCh38_and_mm10 \
--fastqs=01.rawdata/AAAWY3CHV --sample=LC552 --include-introns --chemistry=ARC-v1

<ol>
    <li>Then press <code>Esc</code> to leave insert mode.</li>
    <li>Finally, type <code>:wq!</code> and press Enter to save the file and exit.</li>
</ol>
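
If you prefer not to use <code>vi</code>, the same file can be written in one step from the command line. This is a sketch of one possible alternative using a shell here-document; the script body is copied unchanged from the cell above, and the final syntax check with <code>bash -n</code> is an optional extra that only parses the file without running it.

```shell
# Write the job script in one step. The quoted 'EOF' keeps the shell from
# expanding or altering anything inside the script body.
cat > cellranger_count.sh <<'EOF'
#!/bin/bash
#PBS -q hotel
#PBS -N cellranger_RNA
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00

cd /projects/ps-renlab2/y2xie/projects/77.LC/43.scHiC_species_mixing
/projects/ps-renlab/y2xie/packages/cellranger-6.1.2/cellranger count --id=LC552 --project=AAAWY3CHV \
--transcriptome=/projects/ps-renlab/y2xie/projects/genome_ref/GRCh38_and_mm10 \
--fastqs=01.rawdata/AAAWY3CHV --sample=LC552 --include-introns --chemistry=ARC-v1
EOF

# Parse the script for syntax errors without executing it.
bash -n cellranger_count.sh && echo "cellranger_count.sh looks OK"
```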

#### The script above differs in one way from running the command interactively:
<ol>
    <li>We first <code>cd</code> into the working directory, because <code>cellranger count</code> writes its output to the current directory by default. When a script is submitted as a job, it starts in your <code>home</code> directory, not in the directory you submitted it from.</li>
</ol>
</ol>
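
Instead of hard-coding the path, PBS/Torque schedulers export <code>PBS_O_WORKDIR</code>, which holds the directory from which <code>qsub</code> was called. The sketch below assumes the scheduler on your cluster sets this variable; the <code>workdir</code> name and the fallback to the current directory are illustrative choices so the snippet also works outside a job.

```shell
# PBS_O_WORKDIR is set by the scheduler to the directory where qsub was called.
# Outside a PBS job the variable is unset, so fall back to the current directory.
workdir="${PBS_O_WORKDIR:-$PWD}"

# Stop early if the directory cannot be entered, rather than writing output elsewhere.
cd "$workdir" || { echo "Cannot cd to $workdir" >&2; exit 1; }
echo "Running in: $(pwd)"
```

With this in place of the fixed <code>cd</code> line, the script writes its output wherever you run <code>qsub</code> from, so remember to submit it from the working directory.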

### Submit the script using: <code>qsub cellranger_count.sh</code>
* Once the job is done, you should find the output directory in the working directory, which here is <code>/projects/ps-renlab2/y2xie/projects/77.LC/43.scHiC_species_mixing</code>. The output directory should have a structure like this:
![LC552](images/cellranger-count_output.png)

* What we care about is the <code>outs</code> directory. Check it:
![LC552/outs](images/cellranger-count_output2.png)

#### In the output above, 
<ol>
      <li><code>web_summary.html</code> contains the library QC results. We should check this to make sure everything looks good first!</li>
      <li><code>filtered_feature_bc_matrix</code> and <code>filtered_feature_bc_matrix.h5</code>
are the cellranger-filtered cell-by-gene matrix in two different formats. The cells in this directory and file are those that pass the filter.</li>
      <li><code>raw_feature_bc_matrix</code> and <code>raw_feature_bc_matrix.h5</code>
are the raw cell-by-gene matrix in two different formats. All cells, whether they pass the filter or not, are in this directory and file.</li>
    <li><code>possorted_genome_bam.bam</code> is the aligned BAM file, which carries per-read information such as the cell barcode and UMI tags.</li>
      </ol>
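
A quick way to confirm that a run produced the files listed above is to test for each one. This is a minimal sketch: <code>check_outs</code> is a hypothetical helper name, not a <code>cellranger</code> command, and it only checks that the entries exist, not that their contents are valid.

```shell
# check_outs DIR: report which of the expected cellranger count outputs exist in DIR.
# Returns 0 only when every expected entry is present.
check_outs() {
    outs_dir=$1
    missing=0
    for f in web_summary.html \
             filtered_feature_bc_matrix filtered_feature_bc_matrix.h5 \
             raw_feature_bc_matrix raw_feature_bc_matrix.h5 \
             possorted_genome_bam.bam; do
        if [ -e "$outs_dir/$f" ]; then
            echo "found    $f"
        else
            echo "MISSING  $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example for the library in this tutorial (skipped if the folder does not exist).
if [ -d LC552/outs ]; then
    check_outs LC552/outs || echo "Some outputs are missing; check the job log."
fi
```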

* That's it! We will then use the cell-by-gene matrix for different kinds of analysis, which are not covered here.