# Practical 1: integrating single-cell and spatial analyses in Python
In this practical, you will successively analyse single-cell (10x Genomics Chromium) and spatial transcriptomics (10x Genomics Visium) data from a pleural mesothelioma tumor.

We will focus on a single donor, patient 3B, an 80+ year-old male without asbestos exposure, for whom one single-cell and one spatial sequencing experiment were performed.

The analyses will be performed using scanpy. See https://scanpy.readthedocs.io/en/stable/tutorials/index.html for documentation and tutorials, as well as the excellent single-cell best-practices workflow (https://www.sc-best-practices.org/cellular_structure/clustering.html) for help with the relevant functions.

## Part I: single-cell analyses

The data were already processed (see the scripts in files/scripts/practical1/2025/): cells and features were quality-controlled and filtered, ambient RNA was corrected, and doublets were flagged.

In the first part of this practical, you will visualise, cluster and annotate the dataset.

### Loading modules

We will mostly use the scanpy module, imported with the alias sc. Scanpy has submodules for preprocessing (pp, accessed as sc.pp), for its main tool functions (tl, accessed as sc.tl) and for plotting (pl, accessed as sc.pl). See the scanpy documentation (https://scanpy.readthedocs.io/) for a full list of the functions in each submodule.

In [1]:
import os
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt



### Loading the data
The processed data is in the /data/Training-MG/files/data/Practical1/scRNAseq/2025/processed folder.

Q1: Load the data using the scanpy read function (sc.read).

Q2: Print the anndata object. How many cells are there? How many features?

Q3: What are the different layers present? What do they correspond to?

Q4: Plot the distribution of expression of the gene BAP1 for each of the layers (layer option; see https://scanpy-tutorials.readthedocs.io/en/latest/plotting/core.html for the plotting options of the sc.pl submodule). Which layer would you favour for downstream analysis, and why?
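To help with Q3 and Q4, here is a minimal NumPy sketch of library-size normalisation followed by a log transform, the idea behind calling sc.pp.normalize_total(adata, target_sum=1e4) and then sc.pp.log1p. It works on a small dense toy matrix and is only an illustration; it does not tell you which layers the processed object actually contains.

```python
import numpy as np


def log_normalise(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    Illustrative sketch of the idea behind sc.pp.normalize_total + sc.pp.log1p
    on a dense cells x genes matrix. Assumes no cell has zero total counts.
    """
    totals = counts.sum(axis=1, keepdims=True)  # library size of each cell
    scaled = counts / totals * target_sum       # remove sequencing-depth differences
    return np.log1p(scaled)                     # compress the long right tail


# Two toy cells with the same relative profile but a 10-fold depth difference.
counts = np.array([[10.0, 0.0, 90.0],
                   [100.0, 0.0, 900.0]])
norm = log_normalise(counts)
print(np.allclose(norm[0], norm[1]))  # True: the depth difference is gone
```

With raw counts, the second cell would look ten times more "expressed" for every gene simply because it was sequenced more deeply.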

### Clustering and visualisation

Q5: Compute the k-nearest-neighbour graph (function neighbors from the sc.pp submodule) and run Leiden clustering (function leiden from the sc.tl submodule).
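The neighbour graph of Q5 can be illustrated with a minimal, exact k-nearest-neighbour sketch in NumPy. It assumes a small dense matrix of PCA-like coordinates; sc.pp.neighbors itself may rely on approximate neighbour search for larger datasets and weights the edges (connectivities), so this is a simplification. Leiden is not reimplemented here: it partitions such a graph into densely connected communities, and the resolution parameter of sc.tl.leiden controls how fine the resulting clusters are.

```python
import numpy as np


def knn_adjacency(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix (exact, Euclidean).

    x: cells x dimensions (e.g. PCA coordinates). Each cell is linked to its
    k closest other cells, then the graph is made undirected (i ~ j if either
    cell picked the other), as graph-clustering methods expect.
    """
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)           # pairwise squared distances
    np.fill_diagonal(d2, np.inf)            # a cell is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest cells
    n = x.shape[0]
    adj = np.zeros((n, n), dtype=int)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = 1
    return np.maximum(adj, adj.T)           # symmetrise


# Two well-separated toy groups of 3 cells in a 2-D "PCA" space.
pcs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
adj = knn_adjacency(pcs, k=2)
print(adj[:3, 3:].sum())  # no edges between the two groups
```

Because no edges cross between the two groups, any community-detection method working on this graph would separate them.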

Q6: Compute a UMAP embedding (function umap from the sc.tl submodule) and plot the results (function umap from the sc.pl submodule).

### Identifying tumor cells

Q7: Perform copy-number calling using the infercnvpy module.
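infercnvpy follows the logic of inferCNV: genes are ordered by genomic position, expression is centred on a set of putatively normal reference cells, and a running mean along each chromosome turns noisy gene-level differences into broad gains (positive values) and losses (negative values). The sketch below shows only this core idea on toy data; the function name and window size are illustrative assumptions. The actual infercnvpy function additionally needs gene positions (chromosome, start, end) in adata.var and a reference given through its reference_key/reference_cat arguments, and applies further processing.

```python
import numpy as np
import pandas as pd


def cnv_profile(expr, chrom, ref_mask, window=3):
    """Toy inferCNV-style profile (illustrative sketch, not infercnvpy's code).

    expr: cells x genes log-normalised matrix, columns sorted by genomic position.
    chrom: chromosome label of each gene (same order as the columns of expr).
    ref_mask: boolean per cell, True for putatively normal reference cells.
    Returns cells x genes smoothed, reference-centred values (>0 gain, <0 loss).
    """
    # Express each gene relative to its mean in the reference (normal) cells.
    centred = expr - expr[ref_mask].mean(axis=0)
    out = np.empty_like(centred, dtype=float)
    for c in pd.unique(chrom):
        idx = np.flatnonzero(chrom == c)  # genes of this chromosome only
        # Running mean along the chromosome; rolling works down rows, hence .T.
        block = pd.DataFrame(centred[:, idx].T)
        smoothed = block.rolling(window, center=True, min_periods=1).mean()
        out[:, idx] = smoothed.to_numpy().T
    return out


chrom = np.array(["1", "1", "1", "2", "2", "2"])
normal = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
gained = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]  # higher expression along chromosome 2
expr = np.array([normal, normal, gained, gained])
ref_mask = np.array([True, True, False, False])
prof = cnv_profile(expr, chrom, ref_mask)
print(prof.round(2))
```

Smoothing within, never across, chromosomes keeps a gain on one chromosome from bleeding into its neighbour in the gene ordering.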

Q8: Visualise the CNV profile of the tumor. Do you find any of the known alterations driving pleural mesothelioma (PM)? See e.g. https://www.nature.com/articles/s41588-023-01321-1#Fig3, panel b, for genes frequently amplified or deleted, and https://www.genecards.org/ to find their genomic locations.
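Once you have a per-gene CNV score averaged over the tumor cells, a simple way to check specific genes for Q8 is to threshold it. The sketch below is hypothetical: the function name, the 0.1 cut-off and the gene names (GENE_A to GENE_E) are placeholders, and the scores are toy values, not results from patient 3B. Replace the gene list with the genes you picked from the referenced figure.

```python
import pandas as pd


def call_alterations(mean_score: pd.Series, genes, threshold: float = 0.1) -> pd.Series:
    """Label genes as gain / loss / neutral from a per-gene mean CNV score.

    mean_score: index = gene symbols, values = CNV score averaged over tumor cells.
    threshold: hypothetical cut-off chosen for illustration only.
    """
    s = mean_score.reindex(genes)  # genes absent from the data become NaN
    calls = pd.Series("neutral", index=s.index)
    calls[s > threshold] = "gain"
    calls[s < -threshold] = "loss"
    calls[s.isna()] = "not measured"
    return calls


# Toy scores with placeholder gene names, not real results.
scores = pd.Series({"GENE_A": -0.3, "GENE_B": -0.15, "GENE_C": 0.02, "GENE_D": 0.2})
calls = call_alterations(scores, ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"])
print(calls.to_dict())
```

A fixed threshold is crude; looking at the heatmap of the full profile remains the main way to judge whether an alteration is real.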

### Annotating cells

For the first part of the cell annotation, we will use the celltypist module. See https://www.celltypist.org/ for documentation and tutorials.

In [None]:
import celltypist
from celltypist import models

Q9: Download the immune single-cell reference model "Immune_All_High.pkl" using the download_models function, then load it (models.Model.load) and print the object. Which cell types are in the reference?

Q10: Annotate the cells using the CellTypist classifier trained on immune cells (you can use the "coarser" model).
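CellTypist models are regularised logistic-regression classifiers. Assuming a one-vs-rest linear model with per-gene scaling learned on the reference, predicting a cell's label reduces to the computation below: one score per class, the highest score wins, and a sigmoid turns each score into a per-class probability. This is a hypothetical re-implementation for intuition only (the function name, toy weights and two-gene "markers" are made up), not CellTypist's own code.

```python
import numpy as np


def predict_linear_ovr(x, mean, sd, coef, intercept, classes):
    """One-vs-rest linear prediction in the spirit of CellTypist (sketch).

    x: cells x genes log-normalised expression, genes in the model's order.
    mean, sd: per-gene scaling assumed to come from the reference data.
    coef: classes x genes weights; intercept: one bias per class.
    """
    z = (x - mean) / sd                       # scale like the training data
    decision = z @ coef.T + intercept         # one score per cell and class
    prob = 1.0 / (1.0 + np.exp(-decision))    # per-class sigmoid (rows need not sum to 1)
    labels = np.asarray(classes)[decision.argmax(axis=1)]
    return labels, prob


classes = ["T cells", "Macrophages"]
coef = np.array([[2.0, -2.0],    # toy weights: class 0 favours gene 0
                 [-2.0, 2.0]])   # class 1 favours gene 1
x = np.array([[3.0, 0.0], [0.0, 3.0]])
labels, prob = predict_linear_ovr(x, mean=np.array([1.5, 1.5]), sd=np.array([1.0, 1.0]),
                                  coef=coef, intercept=np.zeros(2), classes=classes)
print(labels.tolist(), prob.round(3))
```

Because the model can only choose among the classes it was trained on, every cell receives one of the reference labels, whatever its true identity.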

Q11: Check the quality of the annotation. Are there cell types with dubious annotations, and if so, why do you think they are hard to annotate?
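For Q11, one way to spot dubious annotations is to compare per-cell predictions with the cluster-level majority vote and to look at confidence scores. The pandas sketch assumes the columns 'predicted_labels', 'majority_voting' and 'conf_score' that CellTypist adds to adata.obs when annotate is run with majority_voting=True and the result is converted with to_adata(); the toy table stands in for adata.obs, and the function name is hypothetical. Labels with a low median confidence or low agreement deserve a closer look.

```python
import pandas as pd


def annotation_qc(obs: pd.DataFrame) -> pd.DataFrame:
    """Per-label summary to flag doubtful CellTypist annotations.

    Assumes CellTypist-style columns 'predicted_labels', 'majority_voting' and
    'conf_score'. Labels are compared as strings because the two columns may be
    categoricals with different category sets.
    """
    agree = obs["predicted_labels"].astype(str) == obs["majority_voting"].astype(str)
    summary = (
        obs.assign(agree=agree)
        .groupby("majority_voting", observed=True)
        .agg(n_cells=("conf_score", "count"),
             median_conf=("conf_score", "median"),
             frac_agree=("agree", "mean"))
    )
    return summary.sort_values("median_conf")  # least confident labels first


# Toy stand-in for adata.obs.
obs = pd.DataFrame({
    "predicted_labels": ["T cells", "T cells", "B cells",
                         "Macrophages", "Monocytes", "Macrophages"],
    "majority_voting": ["T cells", "T cells", "T cells",
                        "Macrophages", "Macrophages", "Macrophages"],
    "conf_score": [0.95, 0.90, 0.40, 0.85, 0.30, 0.80],
})
summary = annotation_qc(obs)
print(summary)
```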

Q12: Plot the proportion of each cell type. Does it match your expectations?
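Cell-type proportions for Q12 take one pandas call once the labels are stored in a column of adata.obs. The sketch uses toy labels and a hypothetical helper name; on the real object you would pass the column holding your final annotation (for instance majority_voting, if you used majority voting).

```python
import pandas as pd


def celltype_proportions(labels: pd.Series) -> pd.Series:
    """Fraction of cells per annotated cell type, largest first."""
    return labels.value_counts(normalize=True)  # sorted in descending order


# Toy labels standing in for an annotation column of adata.obs.
labels = pd.Series(["T cells"] * 5 + ["Macrophages"] * 3 + ["B cells"] * 2)
props = celltype_proportions(labels)
print(props.round(2).to_dict())
```

Calling props.plot.bar() on the result gives a quick bar chart.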