# Immune repertoire annotation

This tutorial covers some basic aspects of Immune Repertoire Sequencing (RepSeq) data analysis focused on T-cell receptor (TCR) repertoires:
- Repertoire diversity analysis
- Segment usage analysis
- Repertoire overlap analysis
- Annotation of antigen-specific TCR sequences


Table filling rules:
* Column names should match those on the previous slide
* Sample ID should be one of s1 to s16
* Two distinct donor IDs should be used; the naming doesn't matter
* Subset should be either CD4 or CD8
* Phenotype should be either memory or naive
* CMV status should be either CMV+ or CMV-
* Unknown/ambiguous fields should be left blank
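The rules above can be checked mechanically. The following is a minimal sketch in base R, assuming the column names used in the table later in this README and assuming the two allowed CMV values are spelled `CMV+` and `CMV-`; `validate_annotation` is a hypothetical helper, not part of the tutorial code.

```R
# Hypothetical helper: checks an annotation table against the filling rules.
# Column names are assumed to be sample, donor, subset, phenotype, CMVstatus.
validate_annotation <- function(tab) {
  # A field passes if it is blank (NA or "") or one of the allowed values.
  blank_or <- function(x, allowed) is.na(x) | x == "" | x %in% allowed
  donors <- unique(tab$donor[!is.na(tab$donor) & tab$donor != ""])
  c(
    sample_ids = all(tab$sample %in% paste0("s", 1:16)),
    two_donors = length(donors) == 2,
    subset     = all(blank_or(tab$subset, c("CD4", "CD8"))),
    phenotype  = all(blank_or(tab$phenotype, c("memory", "naive"))),
    cmv_status = all(blank_or(tab$CMVstatus, c("CMV+", "CMV-")))
  )
}

# Made-up two-row table, only to show the call.
example <- data.frame(
  sample    = c("s1", "s4"),
  donor     = c("d1", "d2"),
  subset    = c("", "CD8"),
  phenotype = c("memory", "naive"),
  CMVstatus = c("CMV+", "CMV-"),
  stringsAsFactors = FALSE
)
validate_annotation(example)
```

The function returns one named logical per rule, so a failing rule is easy to spot before submitting the table.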


## Prerequisites

- R packages
```R
# "parallel" ships with base R, so it does not need to be installed from CRAN
install.packages(c("data.table","dplyr","reshape2","ggplot2","NMF","scales","forcats","stringr"))
```

- Clone the [repo](https://github.com/mikeraiko/repseq-annotation-tutorial.git)

- Open `tutorial.Rmd`

## Filling out the table

First, we detect replicates using this plot: samples that are replicates of each other cluster together in pairs.

![dendro_heatmap](img/dendrohm.png)
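As a rough illustration of how replicate pairs fall out of clustering, here is a minimal base R sketch. The overlap values and sample names are made up, and the tutorial may use a different overlap metric and linkage method; this only shows the general idea of turning pairwise similarity into a tree and cutting it into pairs.

```R
# Toy pairwise overlap between four hypothetical samples (1 = identical).
overlap <- matrix(
  c(1.00, 0.80, 0.10, 0.05,
    0.80, 1.00, 0.07, 0.12,
    0.10, 0.07, 1.00, 0.75,
    0.05, 0.12, 0.75, 1.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("s", 1:4))
)

# Convert similarity to distance and cluster hierarchically.
tree <- hclust(as.dist(1 - overlap), method = "average")

# Cutting into n/2 groups recovers the replicate pairs in this toy case.
pairs <- cutree(tree, k = nrow(overlap) / 2)
split(names(pairs), pairs)
```

With real data the pairs are not guaranteed to be this clean, so the cut should be checked against the heatmap rather than trusted blindly.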

Based on the diversity index we can guess whether a sample is memory or naive. The assignment below assumes that memory cells have higher TCR diversity.

![diversity](img/diversity.png)
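For reference, a diversity index can be computed from clonotype counts in a few lines. The sketch below uses the Shannon index as one common choice; the index behind the plot above may be a different one, and both count vectors are invented for illustration.

```R
# Shannon diversity of a clonotype count vector (natural log).
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

even_repertoire   <- rep(10, 100)        # 100 clonotypes of equal size
skewed_repertoire <- c(901, rep(1, 99))  # one expanded clone dominates

shannon(even_repertoire)    # equals log(100), the maximum for 100 clonotypes
shannon(skewed_repertoire)  # lower, because a single clone dominates
```

Diversity estimates depend on sequencing depth, so samples are usually compared at a similar number of reads.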

Using the dendrogram with the heatmap of gene expression we can differentiate CD4 and CD8 cells. The samples split into two halves: one of them is CD4 and the other is CD8, but the plot alone does not tell us which is which.

![heatmap_genes](img/heatmap_genes.png)
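Splitting samples into two halves from a usage matrix can be sketched as follows. The matrix, the sample names and the segment names are hypothetical, and Euclidean distance with complete linkage (the `hclust` default) is an assumption, not necessarily what the tutorial uses.

```R
# Toy usage matrix: rows are hypothetical samples, columns are hypothetical
# V segments, values are usage fractions that sum to 1 per sample.
usage <- rbind(
  a = c(0.50, 0.30, 0.20),
  b = c(0.48, 0.32, 0.20),
  c = c(0.20, 0.30, 0.50),
  d = c(0.22, 0.28, 0.50)
)
colnames(usage) <- c("V1", "V2", "V3")

# Distance between usage profiles, then cut the tree into two groups.
halves <- cutree(hclust(dist(usage)), k = 2)
halves
```

The group labels 1 and 2 are arbitrary; deciding which group is CD4 and which is CD8 still needs outside evidence.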

A quick search suggests that CD8 cells are the primary responders during influenza infection, so samples s3 and s7, which have the most TCRs specific for influenza A, may mark the CD8 half.

EBV specificity also separates the clades on the dendrogram almost perfectly, though there is no obvious reason for this.

![specificity](img/antigens.png)
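Counting antigen-specific TCRs per sample boils down to joining clonotypes with a reference of annotated sequences. The sketch below does this in base R with exact CDR3 matching; the sample names, CDR3 strings and antigen labels are all made up, and real annotation typically also takes V/J segments and near matches into account.

```R
# Hypothetical clonotype table and hypothetical reference of annotated CDR3s.
clonotypes <- data.frame(
  sample = c("x1", "x1", "x1", "x2", "x2"),
  cdr3   = c("CASSAAAF", "CASSGGGF", "CASSLLLF", "CASSAAAF", "CASSTTTF"),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  cdr3    = c("CASSAAAF", "CASSGGGF", "CASSTTTF"),
  antigen = c("InfluenzaA", "InfluenzaA", "EBV"),
  stringsAsFactors = FALSE
)

# Inner join on exact CDR3 match, then count hits per sample and antigen.
annotated <- merge(clonotypes, reference, by = "cdr3")
counts <- table(annotated$sample, annotated$antigen)
counts
```

Normalizing these counts by the total number of clonotypes per sample makes samples of different sizes comparable.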

And so we fill in the table as follows:

| sample | donor | subset | phenotype | CMVstatus |
|---|---|---|---|---|
| s1 | d1 |  | memory | CMV+ |
| s2 | d1 |  | memory | CMV+ |
| s3 | d1 |  | naive | CMV+ |
| s4 | d2 |  | naive | CMV- |
| s5 | d1 |  | memory | CMV+ |
| s6 | d2 |  | naive | CMV- |
| s7 | d1 |  | naive | CMV+ |
| s8 | d2 |  | memory | CMV- |
| s9 | d1 |  | memory | CMV+ |
| s10 | d2 |  | memory | CMV- |
| s11 | d1 |  | memory | CMV+ |
| s12 | d1 |  | memory | CMV+ |
| s13 | d1 |  | memory | CMV+ |
| s14 | d1 |  | memory | CMV+ |
| s15 | d2 |  | memory | CMV- |
| s16 | d2 |  | memory | CMV- |


The only field that doesn't fit is the CMV status of s1: it shows no CMV on the antigen specificity plot, but we can assume ~~(π equals 5 so that further results would add up)~~ that the CMV-specific fraction is just too small to be shown.