# DeRR for 10X scTCR-Seq

In this tutorial, we use sample SRX16249628 from the PRJNA858872 dataset as an example to introduce the usage of DeRR.

We assume that the data have already been processed with CellRanger and that the output is located in `../2.Result/PRJNA858872/SRX16249628`.

The first step is to use the `SplitVDJbam.py` script to extract the FASTQ files for each cell from the CellRanger output and to create a `Mainfest.tsv` file that serves as input for DeRR.

```Shell
python SplitVDJbam.py ../2.Result/PRJNA858872/SRX16249628/all_contig.bam \
    --list ../2.Result/PRJNA858872/SRX16249628/cell_barcodes.json \
    --out ../2.Result/PRJNA858872/SRX16249628/fastq_split \
    --file Mainfest.tsv
```

Once we have the `Mainfest.tsv` file, we can run DeRR on it.

> Before running it, make sure that `bwa` and `samtools` are accessible in the current environment. If they cannot be invoked simply as `bwa` and `samtools`, modify the corresponding entries in the `config.json` file.
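
The check described in the note can be scripted. The following is a minimal sketch that only tests whether the two executables are found on `PATH`; the helper name `missing_tools` is hypothetical and is not part of DeRR.

```python
import shutil

def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# Hypothetical pre-flight check before launching DeRR.
missing = missing_tools(["bwa", "samtools"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
    print("Adjust the corresponding entries in config.json.")
else:
    print("bwa and samtools are both accessible.")
```

If a tool is reported as missing, its location still has to be set in `config.json` as described in the note.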

```Shell
python DeRR.py --inf Mainfest.tsv --out SRX16249628_TCRprofiling.tsv --threads 8
```

In this run, we used 8 threads and the default parameters. The adjustable parameters of DeRR are listed in the table below. The defaults suit most situations, but in specific cases (such as poor data quality), users can improve the accuracy of the results by increasing T, reducing M, and decreasing N, at the cost of fewer cells being reported as expressing dual TCRs.

| Parameter name | Explanation | Default value |
|----------------|-------------|---------------|
| k | k-mer length in Network Flow | 25 |
| c | Extended length in Network Flow | 5 |
| T | Threshold for filtering out low-frequency TCRs | 90% |
| M | Threshold for filtering out erroneous TCRs based on quantity. TCRs with frequency below 1/M will be removed | 100 |
| N | Threshold for filtering out erroneous TCRs based on sequence similarity. TCRs with edit distance less than L/N will be removed (L is sequence length) | 6 |
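To make the M and N rows concrete, the short sketch below computes the two cutoffs they imply. It is plain arithmetic on the definitions in the table, not DeRR's implementation; the function name `filter_thresholds` and the example length of 45 nucleotides are assumptions chosen for illustration.

```python
def filter_thresholds(M=100, N=6, seq_length=45):
    """Return (minimum frequency, minimum edit distance) implied by M and N.

    Assumes, as in the table, that TCRs with frequency below 1/M or with
    edit distance below L/N are removed, where L is the sequence length.
    """
    min_frequency = 1 / M
    min_edit_distance = seq_length / N
    return min_frequency, min_edit_distance

# Default values from the table, for a 45-nt sequence.
default_freq, default_dist = filter_thresholds()
print(default_freq, default_dist)

# Smaller M and N raise both cutoffs, i.e. the filtering becomes stricter.
strict_freq, strict_dist = filter_thresholds(M=50, N=5)
print(strict_freq, strict_dist)
```

With the defaults, a 45-nt sequence gives a frequency cutoff of 1% and an edit-distance cutoff of 7.5. Lowering M to 50 and N to 5 raises these to 2% and 9, which is why reducing M and N removes more candidate TCRs.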

The output of DeRR is a TSV file, which can be easily read and analyzed.

```python
import pandas as pd

# Read the DeRR output written by the command above
result = pd.read_csv("SRX16249628_TCRprofiling.tsv", sep="\t")
result.head()
```

| Unnamed: 0 | v_call | j_call | junction_aa | junction | duplicate_count | locus | cell_id | productive | sequence_id | sequence | rev_comp | d_call | sequence_alignment | germline_alignment | v_cigar | d_cigar | j_cigar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TRAV9-2*02 | TRAJ54*01 | CALSGEIQGAQKLVF | TGTGCTCTGAGTGGAGAAATTCAGGGAGCCCAGAAGCTGGTATTT | 168 | TRA | AAACCTGAGCGGCTTC-1 | True | | | | | | | | | |
| 1 | TRBV10-3*02 | TRBJ2-3*01 | CAIRATDFSTDTQYF | TGTGCCATCAGAGCGACAGACTTTAGCACAGATACGCAGTATTTT | 129 | TRB | AAACCTGAGCGGCTTC-1 | True | | | | | | | | | |
| 2 | TRBV10-3*02 | TRBJ2-3*01 | CAIADLTDTQYF | TGTGCCATCGCGGACCTGACAGATACGCAGTATTTT | 97 | TRB | AAACCTGAGCGGCTTC-1 | True | | | | | | | | | |
| 3 | TRAV38-2/DV8*01 | TRAJ43*01 | CAYRSADNDMRF | TGTGCTTATAGGAGCGCGGACAATGACATGCGCTTT | 450 | TRA | AAACCTGAGCTCCTCT-1 | True | | | | | | | | | |
| 4 | TRAV9-2*01 | TRAJ57*01 | CALSDPPRGGGSEKLVF | TGTGCTCTGAGTGACCCTCCCCGAGGGGGCGGATCTGAAAAGCTGG... | 13 | TRA | AAACCTGAGCTCCTCT-1 | True | | | | | | | | | |


Then, using the `cell_id` column, we can obtain the TCR expression profile of each cell and select the cells that express dual TCRs.
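As a rough sketch of what a per-cell profile can look like, the example below groups the five rows shown above by `cell_id` and `locus` and joins the `junction_aa` values of each chain, most abundant first. The variable names `head_rows` and `profile` are illustrative; on the real data, `result` would take the place of `head_rows`.

```python
import pandas as pd

# The five rows displayed by result.head(), reduced to the columns used here.
head_rows = pd.DataFrame({
    "cell_id": ["AAACCTGAGCGGCTTC-1"] * 3 + ["AAACCTGAGCTCCTCT-1"] * 2,
    "locus": ["TRA", "TRB", "TRB", "TRA", "TRA"],
    "junction_aa": ["CALSGEIQGAQKLVF", "CAIRATDFSTDTQYF", "CAIADLTDTQYF",
                    "CAYRSADNDMRF", "CALSDPPRGGGSEKLVF"],
    "duplicate_count": [168, 129, 97, 450, 13],
})

# One row per cell, one column per chain; CDR3s ordered by read support.
profile = (
    head_rows.sort_values("duplicate_count", ascending=False)
    .groupby(["cell_id", "locus"])["junction_aa"]
    .agg(";".join)
    .unstack()
)
print(profile)
```

Because only the first five rows are used here, the TRB entry of the second cell is empty in this toy profile; that says nothing about the full table.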

Note that DeRR analyzes the TRA and TRB chains separately when it determines the TCR status of a cell. For each chain of a cell there are therefore three possible outcomes: single, dual, or multi. Here, we take cells with dual TRA chains as an example.

```python
# Collect the barcodes of cells with exactly two TRA chains
dual_TRA = []
for barcode, cell_tcr in result[result.locus == "TRA"].groupby("cell_id"):
    if cell_tcr.shape[0] == 2:
        dual_TRA.append(barcode)

print(len(dual_TRA))  # prints 621 for this sample
```
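The same counting idea can be applied to both chains at once. The sketch below labels every chain of every cell as single, dual, or multi from the number of rows it has in the table. It runs on a toy table with made-up barcodes, and the `none` label for a chain with no rows is an addition for illustration, not a DeRR category.

```python
import pandas as pd

# Toy table with made-up barcodes; replace `toy` with `result` on real data.
toy = pd.DataFrame({
    "cell_id": ["cell_A", "cell_A", "cell_A",
                "cell_B", "cell_B", "cell_B", "cell_B",
                "cell_C"],
    "locus": ["TRA", "TRB", "TRB",
              "TRA", "TRA", "TRA", "TRB",
              "TRB"],
})

def label(n_chains):
    """Map a chain count to a status label."""
    if n_chains == 0:
        return "none"
    if n_chains == 1:
        return "single"
    if n_chains == 2:
        return "dual"
    return "multi"

# Count rows per cell and chain, then translate the counts into labels.
counts = toy.groupby(["cell_id", "locus"]).size().unstack(fill_value=0)
status = counts.apply(lambda col: col.map(label))
print(status)

# Barcodes of cells whose TRB chain is dual.
dual_TRB = status.index[status["TRB"] == "dual"].tolist()
print(dual_TRB)
```

Selecting on `status["TRA"] == "dual"` instead reproduces the dual-TRA filter of the loop above.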
