# Single-cell RNA-seq analysis using Python

Adapted from:  
Single-cell best practices  
https://www.sc-best-practices.org/preamble.html

## Dimensionality reduction  
Needs conda env `sc_py_training`. 

In this exercise you will perform downstream analysis, from dimensionality reduction to annotation.
The data you will use come from the study below.

Single-cell Expression Atlas link:
https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-6945/results/tsne

Link to the paper: https://europepmc.org/article/MED/30443254

(Open these links now, because you might need them later.)

In [None]:
# Import the scanpy module and set up the parameters for the general settings
import scanpy as sc

sc.settings.verbosity = 0  
sc.settings.set_figure_params(
    dpi=80,
    facecolor="white",
    frameon=False,
)

In [None]:
# Load the data output from the previous exercise (yesterday's session)


In [None]:
adata = sc.read("INSERT_PATH")

In [None]:
# Q1. How many genes (features) and how many cells are in the AnnData object?
# A1.

In [None]:
adata

In [None]:
# Q2. Which layer are you going to use for dimensionality reduction/PCA?
# A2.

In [None]:
adata.X = adata.layers["INSERT_LAYER"]

#### 5.1 PCA

In [None]:
# Before applying PCA, we need to flag the "highly_deviant" genes as "highly_variable"; otherwise PCA will not be restricted to the selected features

In [None]:
# Pass the correct adata.var column name to each side of the assignment
adata.var["INSERT_VAR_COLUMN_NAME"] = adata.var["INSERT_VAR_COLUMN_NAME_2"]

Now you are almost ready to run the PCA. Go to the scanpy documentation and read through the parameters:
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.pca.html

Next, choose a value for `svd_solver` based on the earlier practical session. Are there other parameters that might be relevant?

In [None]:
sc.pp.pca(adata, svd_solver="INSERT_PARAMETER", use_highly_variable="INSERT BOOLEAN")
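
To make concrete what restricting PCA to a subset of genes means, here is a minimal NumPy sketch on random toy data. It is not scanpy's implementation: the matrix, the boolean mask `highly_variable`, and the helper `pca_on_subset` are all assumptions made for illustration.

```python
import numpy as np


def pca_on_subset(X, gene_mask, n_comps=2):
    """Toy PCA via SVD, using only the columns (genes) selected by gene_mask."""
    X_sub = X[:, gene_mask]                  # keep the selected genes only
    X_centered = X_sub - X_sub.mean(axis=0)  # centre each gene at zero
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]    # cell coordinates in PC space
    variance_ratio = (S**2 / np.sum(S**2))[:n_comps]
    return scores, variance_ratio


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                # hypothetical: 50 cells x 20 genes
highly_variable = np.zeros(20, dtype=bool)
highly_variable[:8] = True                   # hypothetical: first 8 genes flagged

scores, variance_ratio = pca_on_subset(X, highly_variable, n_comps=2)
print(scores.shape)
print(variance_ratio)
```

Only the flagged genes contribute to the components, while every cell still receives a coordinate on each component.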

In [None]:
# Visualise the PCA. Try different adata.obs columns (other than "total_counts") as input for the color parameter,
# e.g. sc.pl.pca_scatter(adata, color="total_counts")

In [None]:
sc.pl.pca_scatter(adata, color=["COLUMN_NAME","COLUMN_NAME","COLUMN_NAME"])

#### 5.2 t-SNE

Continue to the t-SNE calculation. Again, go to the corresponding page of the scanpy documentation, read through the parameters,
and select the correct value for `use_rep`.

In [None]:
sc.tl.tsne(adata, use_rep="INSERT_adata.obsm")

In [None]:
sc.pl.tsne(adata, color=["COLUMN_NAME","COLUMN_NAME","COLUMN_NAME"])

#### 5.3 UMAP

In [None]:
sc.pp.neighbors(adata)
sc.tl.umap(adata)
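
UMAP is computed from a neighborhood graph, which is why `sc.pp.neighbors` runs first. The brute-force sketch below shows the underlying idea of a k-nearest-neighbour lookup on toy coordinates. The helper `knn_indices` and the coordinates are hypothetical, and scanpy's actual implementation differs.

```python
import numpy as np


def knn_indices(coords, k):
    """Return, for every cell, the indices of its k nearest neighbours (Euclidean)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))   # full pairwise distance matrix
    np.fill_diagonal(dist, np.inf)           # a cell is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


# Hypothetical PCA coordinates: two well-separated groups of three cells each
coords = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
neighbours = knn_indices(coords, k=2)
print(neighbours)
```

Each cell's neighbours come from its own group, which is the structure that the UMAP layout then tries to preserve.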

In [None]:
sc.pl.umap(adata, color=["COLUMN_NAME","COLUMN_NAME","COLUMN_NAME"])

#### 5.4 Inspecting quality control metrics 

In [None]:
# Can you plot the percentage of mitochondrial counts and the predicted doublets on the UMAP plot? What do you see?

In [None]:
sc.pl.umap(
    adata,
    color=["COLUMN_NAME","COLUMN_NAME","COLUMN_NAME"],
)
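
As a reminder of what the mitochondrial percentage measures, here is a small sketch on a made-up count matrix. The gene names, the counts, and the use of the mouse-style `mt-` prefix to identify mitochondrial genes are assumptions for illustration; check how the column was computed in your own object.

```python
import numpy as np
import pandas as pd

# Hypothetical raw counts: 3 cells x 4 genes
genes = pd.Index(["mt-Co1", "mt-Nd1", "Cd4", "Cd8a"])
counts = np.array([
    [5, 5, 40, 50],    # 10 of 100 counts are mitochondrial
    [30, 20, 25, 25],  # 50 of 100 counts are mitochondrial
    [0, 0, 10, 10],    # 0 of 20 counts are mitochondrial
])

is_mt = genes.str.startswith("mt-")          # assumption: mouse-style "mt-" prefix
pct_counts_mt = 100 * counts[:, is_mt].sum(axis=1) / counts.sum(axis=1)
print(pct_counts_mt)
```

Cells with a very high mitochondrial percentage are often treated as candidates for low-quality cells, so it is worth checking whether they group together on the UMAP.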

In [None]:
adata.write("INSERT_PATH")

Do you observe cells on the UMAP plot that should be removed?



### 6. Clustering

In [None]:
import scanpy as sc

sc.settings.verbosity = 0
sc.settings.set_figure_params(dpi=80, facecolor="white", frameon=False)

In [None]:
adata

In [None]:
# Select a layer of the AnnData object that is appropriate for clustering (we want to use the scran normalisation)

In [None]:
# Move X to another layer
adata.layers["counts_norm"] = adata.X

# Use the scran_normalization layer as the new main data layer, X
adata.X = adata.layers["INSERT_LAYER"]
adata

In [None]:
# Calculate the UMAP; choose a number of PCs for the calculation of the neighborhood graph

In [None]:
sc.pp.neighbors(adata, n_pcs=INSERT_NUMBER)
sc.tl.umap(adata)

In [None]:
sc.tl.leiden(adata, key_added="leiden_res0_25", resolution=INSERT_NUMBER)
sc.tl.leiden(adata, key_added="leiden_res0_5", resolution=INSERT_NUMBER)
sc.tl.leiden(adata, key_added="leiden_res1", resolution=INSERT_NUMBER)
adata
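
The three `key_added` columns can also be compared directly. One possible approach is a contingency table of two label columns, which shows how coarse clusters split at a higher resolution. The sketch below uses a hypothetical `obs` table with made-up labels standing in for `adata.obs`.

```python
import pandas as pd

# Hypothetical cluster labels for six cells at two resolutions
obs = pd.DataFrame({
    "leiden_res0_25": ["0", "0", "0", "1", "1", "1"],
    "leiden_res0_5":  ["0", "0", "2", "1", "1", "1"],
})

# Rows: coarse clusters; columns: fine clusters; values: number of cells
table = pd.crosstab(obs["leiden_res0_25"], obs["leiden_res0_5"])
print(table)
```

In this toy table, coarse cluster "0" splits into fine clusters "0" and "2", while cluster "1" stays intact.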

In [None]:
sc.pl.umap(
    adata,
    color=["leiden_res0_25", "leiden_res0_5", "leiden_res1"],
    legend_loc="on data",
)

Hopefully you now see a nice UMAP plot and you have completed 2/3 of the exercise. Now it's time for the annotation.

In [None]:
youmadeit = chr(0x1F603)
print(youmadeit)

In [None]:
adata

### Annotation

For this part you will first need to find the marker genes for each cluster and then compare them to the known markers that you can find in the paper (link at the beginning of this notebook).

In [None]:
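# Remove genes that do not have a gene name (mini tutorial on data cleaning)
# .copy() turns the filtered view into a standalone object, so adata.var can be edited safely below
adata = adata[:, adata.var["gene_symbols"].notna()].copy()
adata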

In [None]:
# Print the gene symbols stored in adata.var
adata.var["gene_symbols"]

In [None]:
# Make the gene symbols the index of adata.var, keeping the Ensembl IDs in a separate column
adata.var["ensembl_ids"] = adata.var.index
adata.var.index = adata.var["gene_symbols"]

In [None]:
# Convert the gene names to strings and then make them unique
adata.var_names = adata.var_names.astype(str)
adata.var_names_make_unique()
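
Gene symbols are not guaranteed to be unique, which is why the names are de-duplicated after the index swap. The sketch below is a simplified stand-in that suffixes repeated names with a counter. The helper `make_unique` and the symbols are hypothetical, and the real anndata method handles more edge cases (for example, a suffixed name colliding with an existing one).

```python
import pandas as pd


def make_unique(names, join="-"):
    """Simplified sketch: suffix repeated names with -1, -2, ... (first occurrence kept)."""
    seen = {}
    result = []
    for name in names:
        if name in seen:
            seen[name] += 1
            result.append(f"{name}{join}{seen[name]}")
        else:
            seen[name] = 0
            result.append(name)
    return pd.Index(result)


# Hypothetical gene symbols containing a duplicate
symbols = pd.Index(["Cd4", "Cd8a", "Cd4", "Ccr7"])
unique_symbols = make_unique(symbols)
print(list(unique_symbols))
```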

In [None]:
# Now find the differentially expressed genes for each cluster

In [None]:
sc.tl.rank_genes_groups(
    adata, groupby="leiden_res0_5", method="wilcoxon", key_added="dea_leiden_0_5"
)
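
With `method="wilcoxon"`, each cluster is compared against the remaining cells using a rank-based test. The NumPy sketch below illustrates the idea for a single gene with a normal-approximation z-score. It assumes no tied values and is not scanpy's implementation; `rank_sum_z`, the labels, and the expression values are made up for illustration.

```python
import numpy as np


def rank_sum_z(in_group, rest):
    """Normal-approximation z-score of the rank-sum statistic (assumes no ties)."""
    values = np.concatenate([in_group, rest])
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(1, len(values) + 1)  # rank 1 = smallest
    n1, n2 = len(in_group), len(rest)
    rank_sum = ranks[:n1].sum()
    expected = n1 * (n1 + n2 + 1) / 2
    std = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - expected) / std


rng = np.random.default_rng(0)
labels = np.array(["0"] * 30 + ["1"] * 30)   # hypothetical cluster labels
expr = rng.normal(size=60)                   # hypothetical expression of one gene
expr[labels == "1"] += 2.0                   # the gene is higher in cluster "1"

for cluster in ["0", "1"]:
    z = rank_sum_z(expr[labels == cluster], expr[labels != cluster])
    print(cluster, round(float(z), 2))
```

A large positive score marks the gene as a candidate marker for that cluster, and a negative score means it is lower there than in the rest.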

In [None]:
sc.pl.rank_genes_groups_dotplot(
    adata, groupby="leiden_res0_5", n_genes=10, key="dea_leiden_0_5"
)

In [None]:
# Hopefully you can now see the dotplot with the top 10 marker genes per cluster.
# Can you compare it with the known marker genes from the paper?
# Which cluster number corresponds to which differentiation stage?
# Annotate the cell types in the UMAP using the marker genes identified in your analysis, the paper, and Google or other marker gene resources.

In [None]:
# Plot the genes shown in the paper
sc.pl.umap(
    adata,
    color=["leiden_res0_5","genotype","INSERT GENE","INSERT GENE","INSERT GENE","INSERT GENE","INSERT GENE","INSERT GENE",
           "INSERT GENE"],
    legend_loc="on data",
)

In [None]:
cl_annotation = {
    "8": "INSERT CELL TYPE NAME",
    "1": "INSERT CELL TYPE NAME",
    "2": "INSERT CELL TYPE NAME",
    "3": "INSERT CELL TYPE NAME",
}

In [None]:
adata.obs["manual_celltype_annotation"] = adata.obs.leiden_res0_5.map(cl_annotation)
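
Note how `.map` treats clusters that are missing from the dictionary. In the small pandas sketch below, with made-up labels and cell type names, an unmapped cluster becomes `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical cluster labels and a deliberately incomplete annotation dictionary
clusters = pd.Series(["0", "1", "2", "1", "0"])
toy_annotation = {"0": "type A", "1": "type B"}  # cluster "2" is not annotated

annotated = clusters.map(toy_annotation)
print(annotated.tolist())
print(annotated.isna().sum())
```

If some cells show no annotation on the UMAP, check that every cluster label has an entry in `cl_annotation`.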

In [None]:
sc.pl.umap(
    adata,
    color=["leiden_res0_5","manual_celltype_annotation","Il2ra","Cd8b1","Cd8a","Cd4","Ccr7","Itm2a","Hba-a1"],
    legend_loc="on data",
)

Try to annotate as many clusters as you can. For further reading: what other types of analysis would be relevant here?
- Trajectory analysis: https://www.sc-best-practices.org/trajectories/pseudotemporal.html

🎉 🎉 🎉 🎉 🎉 🎉 Enjoy your lunch break ! 