# Exploring Cell Types in UNS Data

The cells annotated as `UNS` by Cano-Gamez et al. are T cells that were neither stimulated by cytokines nor TCR-activated. To simplify the analysis, these cells are analyzed separately.

**Loading the necessary libraries**

In [None]:
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# settings can be adapted individually
sc.settings.verbosity = 3            
sc.logging.print_header()             
sc.settings.set_figure_params(dpi = 100, format = 'png')

**Load preprocessed scRNA-seq data** 
<br>
*See notebook "Data preprocessing" for this analysis part*

In [None]:
canogamez = sc.read_h5ad("result_files/canogamez_preprocessing.h5ad") # change to your data path 

In [None]:
# path for storing the annotated UNS subset
results_file = 'result_files/canogamez_UNS.h5ad' # change to your data path
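If the folder in `results_file` does not exist, writing the file at the end of this notebook may fail. The following is a minimal sketch that creates the parent folder beforehand; the helper `ensure_parent_dir` is hypothetical and not part of scanpy or anndata.

```python
import os


def ensure_parent_dir(path):
    """Create the parent folder of `path` if needed and return `path` unchanged.

    Hypothetical helper: call it before writing an .h5ad file."""
    parent = os.path.dirname(path)
    if parent:  # an empty string means the current working directory
        os.makedirs(parent, exist_ok=True)
    return path


# usage with the path defined above (adapt to your own layout):
# results_file = ensure_parent_dir(results_file)
```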

## Dimension Reduction 

### PCA

In [None]:
sc.tl.pca(canogamez, svd_solver = 'arpack')

**Separate the data into UNS and stimulated cells**

In [None]:
# select all UNS cells 
resting = canogamez.obs['cytokine.condition'] == 'UNS'

In [None]:
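# create an AnnData object consisting only of UNS cells
# .copy() turns the view into an independent object, so that later steps
# (neighbors, UMAP, clustering) can store their results without view warnings
canogamez_uns = canogamez[resting, :].copy()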

**Plot PCA**

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize = (20,4), gridspec_kw = {'wspace':1})
ax1_dict = sc.pl.pca(canogamez_uns, color = 'cell.type', ax = ax1, show = False, annotate_var_explained = True)
ax2_dict = sc.pl.pca(canogamez_uns, color = 'cytokine.condition', ax = ax2, 
                     show = False,annotate_var_explained = True)
ax3_dict = sc.pl.pca(canogamez_uns, color = 'donor.id', ax = ax3, show = False, annotate_var_explained = True)

### UMAP

In [None]:
sc.pp.neighbors(canogamez_uns, n_neighbors = 10, n_pcs = 40)
sc.tl.umap(canogamez_uns)

In [None]:
sc.pl.umap(canogamez_uns, color = ['cytokine.condition', 'cell.type'])

## Louvain Clustering

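The Louvain algorithm was chosen for clustering because it was also used by Cano-Gamez et al.

**! Caution: The Louvain algorithm is stochastic. Even with a fixed `random_state`, the cluster numbering and composition may change when the code is rerun (e.g., with different package versions or upstream results), so check the annotation below against your own clusters !**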

In [None]:
sc.tl.louvain(canogamez_uns, key_added = "louvain_1.0", random_state = 1)

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'cell.type']) 
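Because rerunning can renumber clusters (see the caution above), two runs are best compared by their partitions rather than by their labels. A minimal sketch, assuming two label vectors for the same cells; `same_partition` and `relabel_map` are hypothetical helpers built on `pandas.crosstab`:

```python
import numpy as np
import pandas as pd


def same_partition(labels_a, labels_b):
    """Return True if two clusterings group the cells identically,
    ignoring how the clusters are numbered."""
    table = pd.crosstab(np.asarray(labels_a), np.asarray(labels_b))
    nonzero = table > 0
    # identical partitions: every cluster of one run overlaps exactly
    # one cluster of the other run, in both directions
    return bool((nonzero.sum(axis=1) == 1).all() and
                (nonzero.sum(axis=0) == 1).all())


def relabel_map(labels_a, labels_b):
    """Map each cluster of run B to the run-A cluster it overlaps most."""
    table = pd.crosstab(np.asarray(labels_a), np.asarray(labels_b))
    return table.idxmax(axis=0).to_dict()


# hypothetical usage: rerun Louvain with another seed and compare
# sc.tl.louvain(canogamez_uns, key_added='louvain_rerun', random_state=2)
# same_partition(canogamez_uns.obs['louvain_1.0'], canogamez_uns.obs['louvain_rerun'])
```

If the partitions differ only in numbering, `relabel_map` indicates how to translate the `cluster_annotation` dictionary used further below.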

### Explore the composition of the clusters

In [None]:
from natsort import natsorted


def count_pie(anndata, clustering, category):

    """Count the cells of each `category` level within every cluster of
       `clustering`, plot one pie chart per cluster and return the counts."""

    # count cells per cluster (rows) and category (columns);
    # categories absent from a cluster get a count of 0
    df_cell_count = pd.crosstab(anndata.obs[str(clustering)].astype(str),
                                anndata.obs[str(category)])

    # sort clusters naturally ('2' before '10') and label them 'cluster <n>'
    df_cell_count = df_cell_count.loc[natsorted(df_cell_count.index)]
    df_cell_count.index = ['cluster ' + name for name in df_cell_count.index]

    # one pie chart per cluster, arranged in a grid with four columns;
    # squeeze=False keeps `axes` two-dimensional even for a single row
    amount_plots = len(df_cell_count)
    amount_cols = 4
    amount_rows = int(np.ceil(amount_plots / amount_cols))
    fig, axes = plt.subplots(nrows=amount_rows, ncols=amount_cols,
                             figsize=(15, 4 * amount_rows), squeeze=False)

    for index, (cluster_name, counts) in enumerate(df_cell_count.iterrows()):
        current_ax = axes[index // amount_cols, index % amount_cols]
        current_ax.set_title(cluster_name)
        current_ax.pie(counts, labels=list(counts.index),
                       autopct='%1.1f%%', startangle=90)
        current_ax.axis('equal')

    # hide the unused grid cells
    for empty_ax in axes.flat[amount_plots:]:
        empty_ax.axis('off')

    fig.tight_layout()
    plt.show()
    return df_cell_count

In [None]:
count_pie(canogamez_uns, 'louvain_1.0', 'cell.type')

Most clusters can be clearly assigned to one particular cell type. Only cluster 5 appears to consist of a mixture of memory and naive cells.
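The visual impression from the pie charts can be turned into a number per cluster: the fraction of its most frequent cell type. A minimal sketch; the helper `cluster_purity` and the threshold of 0.8 for calling a cluster "mixed" are assumptions for illustration, not values from Cano-Gamez et al.

```python
import numpy as np
import pandas as pd


def cluster_purity(clusters, cell_types, threshold=0.8):
    """Return, per cluster, the most frequent cell type, its fraction,
    and whether the cluster counts as mixed (fraction below `threshold`)."""
    # row-normalized contingency table: fraction of each cell type per cluster
    fractions = pd.crosstab(np.asarray(clusters), np.asarray(cell_types),
                            normalize='index')
    summary = pd.DataFrame({
        'majority_type': fractions.idxmax(axis=1),
        'fraction': fractions.max(axis=1),
    })
    summary['mixed'] = summary['fraction'] < threshold
    return summary


# usage: cluster_purity(canogamez_uns.obs['louvain_1.0'], canogamez_uns.obs['cell.type'])
```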

### Rank genes 

In [None]:
sc.tl.rank_genes_groups(canogamez_uns, groupby = 'louvain_1.0', method = 'wilcoxon', use_raw=True)

In [None]:
sc.tl.dendrogram(canogamez_uns, groupby = 'louvain_1.0')

In [None]:
sc.pl.rank_genes_groups_matrixplot(canogamez_uns, n_genes = 3,cmap = 'bwr',
                                   standard_scale = "var", values_to_plot = 'scores')

The top-ranked genes are mostly ribosomal protein genes and give no clear indication of cell types.
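One way to look past the ribosomal genes is to drop them from the ranking before picking the top genes per cluster. In recent scanpy versions, `sc.get.rank_genes_groups_df(canogamez_uns, group=None)` returns the ranking as a long data frame with `group`, `names` and `scores` columns; the sketch below assumes that layout. Treating genes starting with `RPL`/`RPS` as ribosomal is a common heuristic, and `top_non_ribosomal` is a hypothetical helper.

```python
import pandas as pd


def top_non_ribosomal(ranked, n_genes=3, prefixes=('RPL', 'RPS')):
    """Keep the `n_genes` best-scoring genes per group after removing
    genes whose names start with one of `prefixes` (ribosomal proteins)."""
    is_ribo = ranked['names'].map(lambda gene: str(gene).startswith(prefixes))
    filtered = ranked[~is_ribo.astype(bool)]
    # highest scores first, then the first rows of every group
    return (filtered.sort_values('scores', ascending=False)
                    .groupby('group', observed=True)
                    .head(n_genes))


# usage (assumes rank_genes_groups was run as above):
# ranked = sc.get.rank_genes_groups_df(canogamez_uns, group=None)
# top_non_ribosomal(ranked, n_genes=3)
```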

## Cluster Annotation

**Annotate naive cells**

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'cell.type'], legend_loc = 'on data') 

- Clusters 0 and 2 appear to represent the naive cells:
    - similar gene expression
    - about 80% naive cells according to `cell.type`
- Cluster 7 consists mostly of memory cells according to the `cell.type` annotation, but its expression is similar to that of clusters 0 and 2 -> assign to Tn
- The other clusters have to be assigned to nTreg, TCM, TEM and TEMRA, as was done in the paper
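The "similar expression" argument for clusters 0, 2 and 7 can be made explicit by correlating the clusters' mean expression profiles, roughly in the spirit of the correlation-based `sc.tl.dendrogram` computed above (which works on the PCA representation by default). A minimal sketch on a cells x genes data frame; `cluster_correlation` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def cluster_correlation(expression, clusters):
    """Pearson correlation between the mean expression profiles of clusters.

    expression: cells x genes DataFrame; clusters: one label per cell."""
    means = expression.groupby(np.asarray(clusters)).mean()  # clusters x genes
    return means.T.corr()  # clusters x clusters


# usage (dense copy of .X; consider restricting to highly variable genes):
# cluster_correlation(canogamez_uns.to_df(), canogamez_uns.obs['louvain_1.0'])
```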

### Used marker genes

The literature marker genes are taken from the following guide: <https://www.biocompare.com/Editorial-Articles/569888-A-Guide-to-T-Cell-Markers/>

| T cell type / differentiation state               | Marker genes mentioned in Cano-Gamez et al. |
|:-------------------------------------------------:|:-------------------------------------------:|
| central memory T cells (Tcm)                      | PASK                                        |
| effector memory T cells (Tem)                     | IL7R, KLRB1, TNFSF13B                       |
| terminally differentiated effector cells (TEMRA)  | CCL4, GZMA                                  |
| natural T regulatory cells (nTreg)                | FOXP3, CTLA4                                |

| T cell type / differentiation state   | Marker genes literature   |
|---------------------------------------|---------------------------|
| T naive (Tn)                          | CCR7                      |
| central memory T cells (Tcm)          | FAS, IL2RB, PRDM1         |
| effector memory T cells (Tem)         | CXCR3, ITGAL, CCR5, TBX21 |
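The two tables can be stored as a marker dictionary and used to score each cluster. The sketch below is a simplified stand-in, assuming a cells x genes data frame: it averages per-gene z-scores of each marker set and picks the best-scoring set per cluster. It is not scanpy's `sc.tl.score_genes` (which computes per-cell scores against a reference gene set) and not the method used in the paper; `MARKERS` and `best_marker_set` are hypothetical names.

```python
import numpy as np
import pandas as pd

# marker genes from the two tables above (paper and literature combined)
MARKERS = {
    'TN': ['CCR7'],
    'TCM': ['PASK', 'FAS', 'IL2RB', 'PRDM1'],
    'TEM': ['IL7R', 'KLRB1', 'TNFSF13B', 'CXCR3', 'ITGAL', 'CCR5', 'TBX21'],
    'TEMRA': ['CCL4', 'GZMA'],
    'nTreg': ['FOXP3', 'CTLA4'],
}


def best_marker_set(expression, clusters, markers=MARKERS):
    """Score every cluster by the mean z-scored expression of each marker set
    and return the scores plus the best-scoring set per cluster."""
    # z-score each gene across cells; constant genes get a divisor of 1
    z = (expression - expression.mean()) / expression.std(ddof=0).replace(0, 1)
    # per-cell score for every marker set with at least one gene in the data
    scores = pd.DataFrame({
        name: z[[g for g in genes if g in z.columns]].mean(axis=1)
        for name, genes in markers.items()
        if any(g in z.columns for g in genes)
    })
    per_cluster = scores.groupby(np.asarray(clusters)).mean()
    return per_cluster, per_cluster.idxmax(axis=1)
```

A best-scoring set is only a hint; the UMAP plots below remain the basis of the annotation.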

### Map marker genes on clusters

TEMRA

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'GZMA', 'CCL4']) 

-> both markers are clearly expressed in cluster 5

nTreg

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'FOXP3', 'CTLA4'])

-> marker genes expressed in cluster 6

Tn

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'CCR7'])

-> CCR7 is mostly expressed in clusters 0, 2 and 7

Tcm 

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'FAS', 'IL2RB', 'PRDM1', 'PASK'])

-> Tcm markers are expressed in clusters 1 and 3

TEM (literature markers)

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'CXCR3', 'ITGAL', 'CCR5','TBX21'])

TEM (markers from Cano-Gamez et al.)

In [None]:
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'IL7R', 'KLRB1', 'TNFSF13B'])

-> Tem cells can be assigned to cluster 4
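The assignments above are read off UMAP plots. A numeric check is the fraction of cells per cluster that express each marker, which is the quantity scanpy shows as dot size in `sc.pl.dotplot(canogamez_uns, var_names, groupby='louvain_1.0')`. A minimal sketch on a cells x genes data frame; `fraction_expressing` and the cut-off `min_expr=0.0` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def fraction_expressing(expression, clusters, genes, min_expr=0.0):
    """Fraction of cells per cluster whose expression of each gene exceeds
    `min_expr`; genes missing from `expression` are skipped."""
    present = [g for g in genes if g in expression.columns]
    expressed = expression[present] > min_expr  # boolean cells x genes
    return expressed.groupby(np.asarray(clusters)).mean()


# usage (assuming all listed genes are present in .raw):
# genes = ['GZMA', 'CCL4', 'FOXP3', 'CTLA4', 'CCR7']
# expr = sc.get.obs_df(canogamez_uns, keys=genes, use_raw=True)
# fraction_expressing(expr, canogamez_uns.obs['louvain_1.0'], genes)
```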

### Add annotation to AnnData 

Create annotation

In [None]:
# adjust to individually identified clusters 
cluster_annotation = {
     '0': 'TN',
     '1': 'TCM',
     '2': 'TN',
     '3': 'TCM',
     '4': 'TEM',
     '5': 'TEMRA',
     '6': 'nTreg',
     '7': 'TN'
}

Add annotation to data

In [None]:
canogamez_uns.obs['cell type'] = canogamez_uns.obs['louvain_1.0'].map(cluster_annotation).astype('category')
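`Series.map` with a dictionary returns NaN for every key it does not find, so a cluster missing from `cluster_annotation` (e.g., after renumbering, or when Louvain finds more than eight clusters) silently ends up without a label. A minimal check, assuming string cluster labels as produced by `sc.tl.louvain`; `unmapped_clusters` is a hypothetical helper.

```python
import pandas as pd


def unmapped_clusters(cluster_labels, annotation):
    """Return the sorted cluster labels that have no entry in `annotation`;
    these cells would receive NaN after Series.map."""
    observed = set(pd.Series(cluster_labels).astype(str))
    return sorted(observed - set(annotation))


# usage:
# missing = unmapped_clusters(canogamez_uns.obs['louvain_1.0'], cluster_annotation)
# assert not missing, f"clusters without annotation: {missing}"
```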

Plotting

In [None]:
sc.pl.umap(canogamez_uns, color = 'cell type', legend_loc = 'on data', title = 'Annotated UNS cells',
           frameon = False, legend_fontsize = 10)

The annotation is similar to the one by Cano-Gamez et al.; see <https://cytokines.cellgeni.sanger.ac.uk/resting>

### Save data

In [None]:
canogamez_uns.write(results_file)