# Spatial mapping of cell types across the hypothalamus (0/3) - preprocessing

DY  @ Webb lab 

Adapted from https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_estimating_signatures.html

202108

## Outline

The [**cell2location** workflow](https://github.com/vitkl/cell2location_paper/blob/master/paper/FigS1_cell2location_workflow.pdf) consists of three sections:

**Data preprocessing (0/3): this notebook**


I. Estimating reference expression signatures of cell types (1/3)

II. Spatially mapping cell types (2/3)

III. Results and downstream analysis (3/3)



In [1]:
! which python3
! module list

/users/dyu20/anaconda/cellpymc/bin/python3
java/8u111        python/2.7.12     anaconda/3-5.2.0  cudnn/7.6.5       
matlab/R2017b     intel/2017.0      cuda/10.2         


In [2]:
import sys
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import os
import gc

# this line forces theano to use the GPU and should go before importing cell2location
os.environ["THEANO_FLAGS"] = 'device=cuda,floatX=float32,force_device=True'
# if using the CPU uncomment this:
#os.environ["THEANO_FLAGS"] = 'device=cpu,floatX=float32,openmp=True,force_device=True'

import cell2location

import matplotlib as mpl
from matplotlib import rcParams
import matplotlib.pyplot as plt
import seaborn as sns

# silence scanpy that prints a lot of warnings
import warnings
warnings.filterwarnings('ignore')

Using cuDNN version 7605 on context None
Mapped name None to device cuda: TITAN RTX (0000:3D:00.0)


# Preprocess snRNAseq metadata from the Seurat object <a name="1.-Preprocess-single-cell-reference-data"></a>

The Seurat object annotates 11 major clusters. Here we replace the generic neuronal cluster label with each neuron's specific subcluster annotation, so that the "group" column carries the neuronal subcluster labels while the other major cluster labels stay unchanged.

In [3]:
# Set paths to data and results used throughout the notebook:
sp_data_folder = './data/mouse_brain_visium_wo_cloupe_data/'
results_folder = './results/mouse_brain_snrna/'

Now we load the snRNAseq hypothalamus reference. The h5ad file was converted from the Seurat object using the R script below, and allcell_meta.csv and neuron_meta.csv were created from the Seurat object's `@meta.data` slot:
```R
library(Seurat)
library(SeuratData)
library(SeuratDisk)

SaveH5Seurat(hypo.by.celltype_v3.2, filename = "hypo.by.celltype_v3.2.h5Seurat")
Convert("hypo.by.celltype_v3.2.h5Seurat", dest = "h5ad")

metadata <- as.data.frame(hypo.by.celltype_v3.2@meta.data)
write.csv(metadata, file = "allcell_meta.csv")

metadata <- as.data.frame(hypo.neurons.final@meta.data)
write.csv(metadata, file = "neurons_meta.csv")


cluster_letters <- as.data.frame(neuro.integrated.20210802@active.ident)
names(cluster_letters) <- 'tree.ident'
hypo_neuron <- AddMetaData(
  object = neuro.integrated.20210802,
  metadata = cluster_letters,
  col.name = 'tree.ident')

metadata <- as.data.frame(hypo_neuron@meta.data)
write.csv(metadata, file = "neuron_meta.csv")
```

In [4]:
## snRNAseq reference (raw counts)
adata_snrna_raw = sc.read(sp_data_folder +'hypo_all_RNA.h5ad')
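
Since the reference is expected to hold raw counts, a quick sanity check on the matrix can be useful before going further. The following is only a minimal sketch: `looks_like_raw_counts` is a hypothetical helper, not part of scanpy or cell2location, and the toy arrays stand in for `adata_snrna_raw.X`.

```python
import numpy as np

def looks_like_raw_counts(X):
    """Return True if every stored value is a non-negative integer."""
    # Sparse matrices expose their stored (non-zero) values as .data;
    # dense input is converted with np.asarray.
    values = X.data if hasattr(X, "tocsr") else np.asarray(X)
    values = np.asarray(values, dtype=float)
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))

# Toy stand-ins for a count matrix and a log-normalised matrix
toy_counts = np.array([[0, 3, 1], [2, 0, 7]])
toy_normalised = np.log1p(toy_counts)

print(looks_like_raw_counts(toy_counts))
print(looks_like_raw_counts(toy_normalised))
```

On the real object this would be called as `looks_like_raw_counts(adata_snrna_raw.X)`.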

In [6]:
# load the metadata from 11 major clusters
allmeta_data = pd.read_csv(sp_data_folder +'allcell_meta.csv')
allmeta_data = allmeta_data.set_index('Unnamed: 0')
meta_ori = allmeta_data.copy()
# assign the major cluster annotation to the new "major_group" column
allmeta_data["major_group"] = allmeta_data["group"]

Unnamed: 0,orig.ident,nCount_RNA,nFeature_RNA,stim,percent.mt,integrated_snn_res.1.5,seurat_clusters,group,major_group
AAACCTGAGACTAGAT-1_1,Aged_1,5452,2747,Aged,0.752018,46,46,Neuron,Neuron
AAACCTGAGGCCCGTT-1_1,Aged_1,2100,1163,Aged,0.666667,32,32,Neuron,Neuron
AAACCTGAGTAAGTAC-1_1,Aged_1,1265,821,Aged,4.505929,3,3,Oligodendrocyte,Oligodendrocyte
AAACCTGAGTGAACGC-1_1,Aged_1,1810,1219,Aged,4.309392,11,11,OPC,OPC
AAACCTGAGTGGCACA-1_1,Aged_1,759,554,Aged,7.905138,0,0,Oligodendrocyte,Oligodendrocyte


In [7]:
# read the metadata from the neuronal subcluster information
nmeta_data = pd.read_csv(sp_data_folder +'neuron_meta.csv')
nmeta_data = nmeta_data.set_index("Unnamed: 0")
nmeta_data = nmeta_data[['tree.ident']]
nmeta_data = nmeta_data.rename(columns={'tree.ident': 'group'})

Unnamed: 0,group
AAACCTGAGACTAGAT-1_1,Avp/Oxt
AAACCTGAGGCCCGTT-1_1,Nrg1/Nnat
AAACCTGCAGCATGAG-1_1,Meis2/Pou6f2
AAACCTGGTCCGTGAC-1_1,Npas3/Gm32647
AAACCTGTCGATGAGG-1_1,Chrm3/Schip1


In [17]:
# update the "group" column in the allmeta_data with the neuronal subcluster information
allmeta_data.update(nmeta_data)
allmeta_data.head()

Unnamed: 0,orig.ident,nCount_RNA,nFeature_RNA,stim,percent.mt,integrated_snn_res.1.5,seurat_clusters,group,major_group
AAACCTGAGACTAGAT-1_1,Aged_1,5452,2747,Aged,0.752018,46,46,Avp/Oxt,Neuron
AAACCTGAGGCCCGTT-1_1,Aged_1,2100,1163,Aged,0.666667,32,32,Nrg1/Nnat,Neuron
AAACCTGAGTAAGTAC-1_1,Aged_1,1265,821,Aged,4.505929,3,3,Oligodendrocyte,Oligodendrocyte
AAACCTGAGTGAACGC-1_1,Aged_1,1810,1219,Aged,4.309392,11,11,OPC,OPC
AAACCTGAGTGGCACA-1_1,Aged_1,759,554,Aged,7.905138,0,0,Oligodendrocyte,Oligodendrocyte


Some neurons have no subcluster assignment. Because `update` only overwrites entries for which `nmeta_data` has a non-missing value at the same index, these cells keep the label "Neuron" in the "group" column.
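
The behaviour of `DataFrame.update` on cells without a subcluster label can be illustrated on a toy example. This is a minimal sketch: the barcodes `cell_a`, `cell_b`, `cell_c` and the frames `all_meta` and `neuron_meta` are made up and only mimic `allmeta_data` and `nmeta_data`.

```python
import pandas as pd

# Toy stand-in for allmeta_data: two neurons and one OPC
all_meta = pd.DataFrame(
    {"group": ["Neuron", "Neuron", "OPC"]},
    index=["cell_a", "cell_b", "cell_c"],
)
all_meta["major_group"] = all_meta["group"]

# Toy stand-in for nmeta_data: only cell_a has a subcluster label
neuron_meta = pd.DataFrame({"group": ["Avp/Oxt"]}, index=["cell_a"])

# update() aligns on the index and modifies all_meta in place,
# overwriting only where neuron_meta has a non-missing value
all_meta.update(neuron_meta)

# Neurons that kept the generic label have no subcluster assignment
n_unassigned = (
    (all_meta["major_group"] == "Neuron") & (all_meta["group"] == "Neuron")
).sum()

print(all_meta)
print(n_unassigned)
```

The same comparison of "group" against "major_group" on `allmeta_data` would give the number of neurons left without a subcluster label.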

In [None]:
allmeta_data.to_csv('all_meta_20210816.csv')

In [18]:
# replace adata_snrna_raw.obs with the updated metadata DataFrame
adata_snrna_raw.obs = allmeta_data
adata_snrna_raw.obs

Unnamed: 0,orig.ident,nCount_RNA,nFeature_RNA,stim,percent.mt,integrated_snn_res.1.5,seurat_clusters,group,major_group
AAACCTGAGACTAGAT-1_1,Aged_1,5452,2747,Aged,0.752018,46,46,Avp/Oxt,Neuron
AAACCTGAGGCCCGTT-1_1,Aged_1,2100,1163,Aged,0.666667,32,32,Nrg1/Nnat,Neuron
AAACCTGAGTAAGTAC-1_1,Aged_1,1265,821,Aged,4.505929,3,3,Oligodendrocyte,Oligodendrocyte
AAACCTGAGTGAACGC-1_1,Aged_1,1810,1219,Aged,4.309392,11,11,OPC,OPC
AAACCTGAGTGGCACA-1_1,Aged_1,759,554,Aged,7.905138,0,0,Oligodendrocyte,Oligodendrocyte
...,...,...,...,...,...,...,...,...,...
TTTGTTGGTAGGCAAC-1_8,Young_4,18262,5312,Young,0.520206,10,10,Rnf220/Ntng1,Neuron
TTTGTTGGTGCGCTCA-1_8,Young_4,2539,1677,Young,0.236314,21,21,Sst/Npy,Neuron
TTTGTTGTCACAGTGT-1_8,Young_4,9571,3730,Young,0.031345,1,1,Trh/Gpc5,Neuron
TTTGTTGTCATCCTGC-1_8,Young_4,21802,5948,Young,0.068801,4,4,Gal/Nts,Neuron
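
Assigning a DataFrame to `.obs` relies on its rows being in the same order as the cells in the AnnData object; as far as I know, the assignment itself does not reorder rows to match `obs_names`. A minimal sketch of aligning the metadata first, assuming toy barcodes and a hypothetical `obs_names` index in place of `adata_snrna_raw.obs_names`:

```python
import pandas as pd

# Hypothetical cell order stored in the AnnData object (adata.obs_names)
obs_names = pd.Index(["cell_b", "cell_a", "cell_c"])

# Toy metadata whose rows are in a different order
meta = pd.DataFrame(
    {"group": ["Avp/Oxt", "Neuron", "OPC"]},
    index=["cell_a", "cell_b", "cell_c"],
)

# Fail early if any cell lacks a metadata row
missing = obs_names.difference(meta.index)
if len(missing) > 0:
    raise ValueError(f"{len(missing)} cells have no metadata row")

# Reorder the metadata rows to follow the cell order
aligned = meta.reindex(obs_names)
print(aligned)
```

With the real data, the equivalent would be `allmeta_data.reindex(adata_snrna_raw.obs_names)` before the assignment to `adata_snrna_raw.obs`.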
