# Exploring cell-types in stimulated data

The stimulated part of the Cano-Gamez et al. data set contains T cells that were activated with anti-CD3/anti-CD28 beads. Cytokines were added to drive differentiation into specific subtypes.

| Annotation `cytokine.condition` | TCR activation (anti-CD3/anti-CD28 beads) |                      Added cytokines                      |
|:-------------------------------:|:-----------------------------------------:|:---------------------------------------------------------:|
|               UNS               |                     No                    |                            None                           |
|               Th0               |                    Yes                    |                            None                           |
|               Th2               |                    Yes                    |                     IL-4, anti-IFN-γ                      |
|               Th17              |                    Yes                    | IL-6, IL-23, IL-1β, TGF-β, anti-IL-4, anti-IFN-γ          |
|              iTreg              |                    Yes                    |                        TGF-β, IL-2                        |

**Loading the necessary libraries**

In [None]:
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# settings can be adapted individually
sc.settings.verbosity = 3            
sc.logging.print_header()             
sc.settings.set_figure_params(dpi = 100, format = 'png')

**Load preprocessed scRNA-seq data** 
<br>
*See notebook "Data preprocessing" for this analysis part*

In [None]:
canogamez = sc.read_h5ad("result_files/canogamez_preprocessing.h5ad") # change to your data path 

In [None]:
# path of the output file for the annotated stimulated cells
results_file = '/canogamez_stim.h5ad' # change to your data path 

**Separate the data into stimulated and unstimulated (UNS) cells**

In [None]:
# select all UNS cells 
resting = canogamez.obs['cytokine.condition'] == 'UNS'
# invert to get stimulated cells 
stimulated = np.invert(resting)

In [None]:
# create AnnData consisting only of stimulated cells
# .copy() returns an independent object instead of a view on canogamez
canogamez_act = canogamez[stimulated,:].copy()
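
The selection above relies on a boolean mask over the cells. A minimal sketch of the same idea, with a toy pandas data frame standing in for `canogamez.obs` (the cell names and values are invented for illustration):

```python
import pandas as pd

# toy stand-in for canogamez.obs; the values are made up for illustration
obs = pd.DataFrame(
    {"cytokine.condition": ["UNS", "Th0", "Th2", "UNS", "iTreg"]},
    index=["cell1", "cell2", "cell3", "cell4", "cell5"],
)

# True for resting cells, False for all others
resting = obs["cytokine.condition"] == "UNS"
# `~` inverts a boolean Series, equivalent to np.invert(resting)
stimulated = ~resting

# the mask keeps only the rows (cells) where it is True
obs_act = obs[stimulated]
print(obs_act["cytokine.condition"].tolist())
```

Indexing an AnnData object with such a mask works the same way along the cell axis.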

## Dimension Reduction

### PCA

In [None]:
sc.tl.pca(canogamez_act, svd_solver = 'arpack')

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize = (20,4), gridspec_kw = {'wspace':1})
ax1_dict = sc.pl.pca(canogamez_act, color = 'cell.type', ax = ax1, show = False)
ax2_dict = sc.pl.pca(canogamez_act, color = 'cytokine.condition', ax = ax2, show = False)
ax3_dict = sc.pl.pca(canogamez_act, color = 'donor.id', ax = ax3, show = False)

In [None]:
sc.pl.pca(canogamez_act, color = 'cytokine.condition', projection = '3d',annotate_var_explained = True)

In [None]:
sc.pl.pca(canogamez_act, color = 'cytokine.condition', components = ['1,2','3,4','5,6','7,8'], ncols = 2)

In [None]:
sc.pl.pca_variance_ratio(canogamez_act, log = True)
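
The variance ratio plot is commonly used to decide how many principal components to pass on to the neighborhood graph. As a rough sketch of what the plotted values represent, the following computes the explained variance ratio from a singular value decomposition of a centered matrix. The matrix `X`, its dimensions and the 90 % threshold are invented for illustration and are not taken from the data set or from Scanpy.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy matrix: 200 "cells" x 30 "genes"; random values, for illustration only
X = rng.normal(size=(200, 30))
# give the first three genes a much larger spread so they dominate the variance
X[:, :3] *= 10

# PCA via SVD of the column-centered matrix
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# variance explained by each component, as a fraction of the total variance
variance = singular_values**2 / (X.shape[0] - 1)
variance_ratio = variance / variance.sum()
cumulative = np.cumsum(variance_ratio)

# smallest number of components that explains at least 90 % of the variance
n_pcs_90 = int(np.searchsorted(cumulative, 0.9) + 1)
print(variance_ratio[:5].round(3), n_pcs_90)
```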

### UMAP

In [None]:
sc.pp.neighbors(canogamez_act, n_neighbors = 10, n_pcs = 40)
sc.tl.umap(canogamez_act)

In [None]:
sc.pl.umap(canogamez_act, color = ['cytokine.condition','cell.type', 'donor.id'], wspace = 0.5)

## Louvain Clustering

The Louvain algorithm was chosen for clustering because it was also used by Cano-Gamez et al.

**Caution: rerunning the code may change the cluster composition because the algorithm is stochastic.**

In [None]:
sc.tl.louvain(canogamez_act, key_added = "louvain_1.0", random_state=1)
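
To see whether a rerun actually changed the cluster composition, the labels of two runs can be cross-tabulated. The sketch below uses invented label vectors (`run_a`, `run_b`, `run_c`) in place of real Louvain results, and `same_partition` is a hypothetical helper. If every row and every column of the table contains exactly one non-zero entry, the two runs define the same partition and differ only in how the clusters are numbered.

```python
import pandas as pd

# invented cluster labels for the same six cells from two hypothetical runs
run_a = pd.Series(["0", "0", "1", "1", "2", "2"], name="run_a")
run_b = pd.Series(["1", "1", "0", "0", "2", "2"], name="run_b")
# a third hypothetical run in which one cell switched clusters
run_c = pd.Series(["1", "1", "0", "2", "2", "2"], name="run_c")


def same_partition(labels_1, labels_2):
    """Return True if both labelings group the cells identically,
    ignoring the cluster names."""
    table = pd.crosstab(labels_1, labels_2)
    nonzero = table > 0
    one_per_column = (nonzero.sum(axis=0) == 1).all()
    one_per_row = (nonzero.sum(axis=1) == 1).all()
    return bool(one_per_column and one_per_row)


print(same_partition(run_a, run_b))  # only the numbering differs
print(same_partition(run_a, run_c))  # the composition differs
```

If the numbering changes between runs, the keys of the annotation dictionary further down have to be adjusted accordingly.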

In [None]:
sc.pl.umap(canogamez_act, color=['louvain_1.0', 'cytokine.condition','cell.type', 'donor.id'], wspace=0.5)

### Explore composition of the clusters

In [None]:
def count_pie(anndata, clustering, category):
    """Count the cells of each `category` level within the clusters of
    `clustering`, plot the counts as one pie chart per cluster and
    return the counts as a data frame."""
    from natsort import natsorted

    # data frame with the cluster and the category of every cell
    clusters_df = anndata.obs[[clustering, category]]

    # empty data frame for the counts, clusters in natural order (0, 1, 2, ..., 10, ...)
    row_names = natsorted(np.unique(anndata.obs[clustering]))
    col_names = list(anndata.obs[category].cat.categories)
    df_cell_count = pd.DataFrame(0, columns=col_names,
                                 index=['cluster ' + name for name in row_names])

    # fill the data frame with the counts of the given category
    for (cluster, level), count in clusters_df.value_counts().items():
        if count > 0:
            df_cell_count.at['cluster ' + cluster, level] = count

    # plot one pie chart per cluster on a grid with four columns
    amount_plots = len(df_cell_count)
    amount_cols = 4
    amount_rows = int(np.ceil(amount_plots / amount_cols))
    # squeeze=False keeps `axes` two-dimensional even for a single row
    fig, axes = plt.subplots(nrows=amount_rows, ncols=amount_cols,
                             figsize=(15, 15), squeeze=False)

    for current_ax, (name, current_data) in zip(axes.ravel(), df_cell_count.iterrows()):
        current_ax.set_title(name)
        current_ax.pie(list(current_data), labels=list(current_data.index),
                       autopct='%1.1f%%', startangle=90)
        current_ax.axis('equal')

    # hide the grid positions that are not used
    for unused_ax in axes.ravel()[amount_plots:]:
        unused_ax.axis('off')

    fig.tight_layout()
    plt.show()

    return df_cell_count

In [None]:
count_pie(canogamez_act, 'louvain_1.0', 'cell.type')

In [None]:
count_pie(canogamez_act, 'louvain_1.0', 'cytokine.condition')
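
The count table can also be obtained, and converted to per-cluster fractions, with `pandas.crosstab`. A minimal sketch on an invented `obs` table (the column names follow the notebook, the values are made up):

```python
import pandas as pd

# toy stand-in for canogamez_act.obs; the values are made up
obs = pd.DataFrame({
    "louvain_1.0": ["0", "0", "0", "1", "1", "2"],
    "cytokine.condition": ["Th0", "Th0", "Th2", "Th17", "iTreg", "Th2"],
})

# absolute number of cells per cluster (rows) and condition (columns)
counts = pd.crosstab(obs["louvain_1.0"], obs["cytokine.condition"])

# fraction of each condition within a cluster; every row sums to 1
fractions = pd.crosstab(obs["louvain_1.0"], obs["cytokine.condition"],
                        normalize="index")
print(counts)
print(fractions.round(2))
```

The fractions correspond to the percentages shown in the pie charts.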

### Rank genes

In [None]:
sc.tl.rank_genes_groups(canogamez_act, groupby = 'louvain_1.0', method = 'wilcoxon', use_raw = True)

In [None]:
sc.pl.rank_genes_groups_matrixplot(canogamez_act, n_genes = 3, cmap = 'bwr', standard_scale = 'var', 
                                   values_to_plot = 'scores')
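
The `wilcoxon` method ranks genes with a rank-sum test that compares the cells of one cluster against the remaining cells. The sketch below computes such a statistic as a z-score for a single invented gene. It uses the normal approximation without tie correction and is meant to illustrate the idea, not to reproduce Scanpy's implementation; `rank_sum_z` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# invented expression values of one gene: 50 cells in the cluster, 150 outside
in_cluster = rng.normal(loc=2.0, scale=1.0, size=50)
rest = rng.normal(loc=0.0, scale=1.0, size=150)


def rank_sum_z(group, reference):
    """z-score of the Wilcoxon rank-sum statistic (normal approximation,
    no tie correction)."""
    n1, n2 = len(group), len(reference)
    # rank all values jointly; tied values receive their average rank
    ranks = pd.Series(np.concatenate([group, reference])).rank()
    rank_sum = ranks.iloc[:n1].sum()
    # mean and standard deviation of the rank sum under the null hypothesis
    expected = n1 * (n1 + n2 + 1) / 2
    std = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - expected) / std


z = rank_sum_z(in_cluster, rest)
print(round(z, 2))  # positive: the gene is higher in the cluster
```

A large positive score marks a gene that is higher in the cluster than in the rest, a large negative score the opposite.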

Use the identified genes to annotate the `IFN high`, `HSP high` and `Mitotic` clusters.

### Cluster Annotation 

**Used marker genes**

|       T cell type /   differentiation state       |      Marker genes  mentioned in Cano-Gamez et al.      |   
|:-------------------------------------------------:|:------------------------------------------------------:|
| Tn/Th2                                            | GATA3, MAOA, LIMA1, MRPS26                             |   
| Tn/Th17                                           | TNFRSF8, PALLD, RORA                                   |   
| Tn/iTreg                                          | FOXP3, LMCD1, LGALS3, CCL5                             |   
| Tn/Th17/iTreg                                     | IL2, DUSP2, TNF                                        |   
| Tm/Th17/iTreg                                     | CCL5, LGALS3, TNFRSF8, BACH2, BATF3, AHR, IL17F, CTLA4 |   
| central memory T cells  (Tcm)                     | PASK                                                   |   
| effector memory T cells  (Tem)                    | IL7R, KLRB1, TNFSF13B                                  |   
| terminally differentiated effector cells  (TEMRA) | PRF1, CCL4, GZMA, GZMH                                 |   
| natural T regulatory cells  (nTreg)               | FOXP3, CTLA4                                           |   

| T cell type / differentiation state   | Marker genes from the literature |
|---------------------------------------|---------------------------|
| T naive (Tn)                          | CCR7                      |   
| central memory T cells (Tcm)          | FAS, IL2RB, PRDM1         |   
| effector memory T cells (Tem)         | CXCR3, ITGAL, CCR5, TBX21 |   

**Map marker genes on clusters**
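
`sc.pl.umap` fails if a requested gene is not among the variable names of the object, for example because it was filtered out during preprocessing. A small sketch of a check that can be run beforehand: `var_names` is an invented list standing in for the gene names of `canogamez_act`, and `split_markers` is a hypothetical helper.

```python
# invented gene list standing in for the gene names of canogamez_act
var_names = ["GATA3", "MAOA", "LIMA1", "FOXP3", "CCL5", "IL2"]

# two rows of the marker table above
markers = {
    "Tn/Th2": ["GATA3", "MAOA", "LIMA1", "MRPS26"],
    "Tn/iTreg": ["FOXP3", "LMCD1", "LGALS3", "CCL5"],
}


def split_markers(genes, available):
    """Hypothetical helper: split `genes` into those present in
    `available` and those missing, keeping the original order."""
    available = set(available)
    present = [g for g in genes if g in available]
    missing = [g for g in genes if g not in available]
    return present, missing


for cell_type, genes in markers.items():
    present, missing = split_markers(genes, var_names)
    print(cell_type, "present:", present, "missing:", missing)
```

Only the `present` genes would then be passed to the `color` argument.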

Tn/Th2

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'GATA3', 'MAOA', 'LIMA1', 'MRPS26' ], legend_loc = 'on data', 
          wspace = 0.5)

Tn/Th17

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'TNFRSF8', 'PALLD', 'RORA'], legend_loc = 'on data', 
          wspace = 0.5)

Tn/iTreg

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'FOXP3', 'LMCD1', 'LGALS3', 'CCL5'], legend_loc = 'on data', 
          wspace = 0.5)

Tn/Th17/iTreg

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0','IL2', 'DUSP2', 'TNF'], legend_loc = 'on data', wspace = 0.5)

Tm/Th17/iTreg

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0','CCL5', 'LGALS3', 'TNFRSF8', 'BACH2', 'BATF3', 'AHR', 
                                 'IL17F', 'CTLA4'], legend_loc = 'on data', wspace = 0.5)

Tcm

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0','FAS', 'IL2RB', 'PRDM1', 'PASK'], legend_loc = 'on data', 
           wspace = 0.5)

Tem

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'CXCR3', 'ITGAL', 'CCR5', 'TBX21', 'IL7R', 'KLRB1', 'TNFSF13B'], 
           legend_loc = 'on data', wspace = 0.5)

TEMRA

In [None]:
sc.pl.umap(canogamez_act, color = ['louvain_1.0','PRF1', 'CCL4', 'GZMA', 'GZMH'], legend_loc = 'on data',
           wspace = 0.5)

nTreg

In [None]:
sc.pl.umap(canogamez_act, color=['louvain_1.0','FOXP3', 'CTLA4'], legend_loc='on data', 
          wspace=0.5) 

**Add annotation to AnnData**

Create annotation

In [None]:
# adjust to individually identified clusters 
cluster_annotation = {
    '8':'Th0/Tn',
    '3':'Th0/Tcm1',
    '16':'Th0/Tcm2',
    '9':'Th0/Tem',
    '5':'Th0/Temra',
    '15':'Th17/iTreg/Tn',
    '13':'Th17/iTreg/Tcm1',
    '7':'Th17/iTreg/Tcm2',
    '0':'Th17/iTreg/Tem',
    '10':'Th17/iTreg/Temra',
    '6':'iTreg/Tn',
    '1':'Th17/iTreg/Tn',
    '4':'Th2/Tn',
    '12':'nTreg',
    '11':'IFN high',
    '2':'HSP high',
    '14':'Mitotic'
}

Add annotation to data

In [None]:
canogamez_act.obs['cell type'] = canogamez_act.obs['louvain_1.0'].map(cluster_annotation).astype('category')
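
`Series.map` with a dictionary returns `NaN` for every cluster that has no entry in the dictionary, so an incomplete `cluster_annotation` silently leaves cells unlabeled. A minimal sketch of a check, using an invented label vector and a deliberately incomplete dictionary:

```python
import pandas as pd

# invented cluster labels standing in for canogamez_act.obs['louvain_1.0']
clusters = pd.Series(["0", "1", "2", "1", "0"])

# deliberately incomplete annotation: cluster '2' is missing
annotation = {"0": "Th17/iTreg/Tem", "1": "Th17/iTreg/Tn"}

cell_type = clusters.map(annotation)

# clusters without an entry in the dictionary end up as NaN
unmapped = sorted(set(clusters[cell_type.isna()]))
print(unmapped)
```

An empty list indicates that every cluster received a label.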

Plotting

In [None]:
sc.pl.umap(canogamez_act, color='cell type', legend_loc='on data',
           frameon=False, legend_fontsize=5)

### Save data

In [None]:
canogamez_act.write(results_file)