### Notebook to convert `cellbender`-processed h5 files to AnnData for project `23-0092`

- **Developed by:** Carlos Talavera-López Ph.D
- **Würzburg Institute for Systems Immunology & Julius-Maximilian-Universität Würzburg**
- v230830

### Import required modules

In [1]:
import anndata
import numpy as np
import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt

### Set up working environment

In [2]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi = 180, color_map = 'magma_r', dpi_save = 300, vector_friendly = True, format = 'svg')

-----
anndata     0.9.2
scanpy      1.9.4
-----
PIL                 10.0.0
asttokens           NA
backcall            0.2.0
comm                0.1.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.7.post1
decorator           5.1.1
executing           1.2.0
h5py                3.9.0
importlib_resources NA
ipykernel           6.25.1
ipywidgets          8.1.0
jedi                0.19.0
joblib              1.3.2
kiwisolver          1.4.5
llvmlite            0.40.1
matplotlib          3.7.2
mpl_toolkits        NA
natsort             8.4.0
numba               0.57.1
numpy               1.24.4
packaging           23.1
pandas              2.0.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.10.0
prompt_toolkit      3.0.39
psutil              5.9.5
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9

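### Define helper to check for raw counts

In [3]:
def X_is_raw(adata):
    # True when the per-gene (column) sums of adata.X are whole numbers, as expected for raw counts
    return np.array_equal(adata.X.sum(axis = 0).astype(int), adata.X.sum(axis = 0))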
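
The helper above compares column sums only, so a matrix of fractional values whose columns happen to add up to whole numbers would also pass. A stricter, element-wise variant could look like the sketch below. The name `is_integer_valued` and the toy matrices are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from scipy import sparse

def is_integer_valued(X):
    """Return True if every stored entry of X is a whole number."""
    # For sparse matrices only the explicitly stored values need checking;
    # implicit zeros are whole numbers by definition.
    values = X.data if sparse.issparse(X) else np.asarray(X)
    return bool(np.all(np.mod(values, 1) == 0))

# Toy count matrix: all entries are whole numbers.
counts = sparse.csr_matrix(np.array([[0, 2, 1], [3, 0, 0]], dtype=np.float32))

# Toy fractional matrix whose column sums are still whole numbers (0.5 + 0.5 = 1).
fractional = sparse.csr_matrix(np.array([[0.5, 0.0], [0.5, 2.0]], dtype=np.float32))

print(is_integer_valued(counts))      # True
print(is_integer_valued(fractional))  # False
```

The element-wise check touches every stored value, so on a large matrix it costs more than the column-sum comparison; which trade-off is appropriate depends on how much the upstream processing is trusted.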

### Read in samples

In [4]:
sample_metadata = pd.read_csv('../data/Sample_Genotype.csv', sep = ',', index_col = 0)
sample_metadata.columns = ['genotype']
sample_metadata.head()

Sample,genotype
A9_2,WT
A10_2,WT
A11_2,Mdx
A12_2,Mdx
B1_2,MdxSCID
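
The sample names in this table double as directory names and file-name prefixes when the h5 files are read. A minimal sketch for building those paths and listing any that are missing before loading is given here. The helper names `h5_path` and `missing_files` are hypothetical; the file suffix is copied from the read step of this notebook, and only the five samples displayed above are listed.

```python
from pathlib import Path

# Suffix copied from the notebook's read step; helper names are illustrative.
H5_SUFFIX = '_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5'

def h5_path(data_dir, sample):
    """Build <data_dir>/<sample>/<sample><suffix> for one sample."""
    return Path(data_dir) / sample / (sample + H5_SUFFIX)

def missing_files(data_dir, samples):
    """Return the expected h5 paths that do not exist on disk."""
    expected = (h5_path(data_dir, sample) for sample in samples)
    return [path for path in expected if not path.is_file()]

samples = ['A9_2', 'A10_2', 'A11_2', 'A12_2', 'B1_2']

print(h5_path('../data', 'A9_2').as_posix())
print(missing_files('../data', samples))  # empty list when every file is present
```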


In [5]:
path = '../data/'
filenames = sample_metadata.index
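# Read one cellbender-filtered h5 file per sample listed in the metadata table
adatas = [sc.read_10x_h5(path + filename + '/' + filename + '_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5') for filename in filenames]
# Annotate every nucleus with its sample name and the sample-level metadata
for i in range(len(adatas)):
    adatas[i].obs['sample'] = sample_metadata.index[i]
    for col in sample_metadata.columns:
        adatas[i].obs[col] = sample_metadata[col].iloc[i]  # explicit positional lookup
# Merge all samples into one object; batch labels are taken from the sample names
adata = adatas[0].concatenate(adatas[1:], batch_categories = sample_metadata.index)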
adata.shape

reading ../data/A9_2/A9_2_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5
 (0:00:00)
reading ../data/A10_2/A10_2_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5
 (0:00:00)
reading ../data/A11_2/A11_2_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5
 (0:00:00)
reading ../data/A12_2/A12_2_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5
 (0:00:00)
reading ../data/B1_2/B1_2_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5
 (0:00:00)
reading ../data/B2_2/B2_2_mm_nuclei-23-0092_CB_ctl230829.raw_filtered.h5
 (0:00:00)

See the tutorial for concat at: https://anndata.readthedocs.io/en/latest/concatenation.html

(188253, 32285)

In [6]:
adata.obs['sample'] = adata.obs['sample'].astype('category')
adata.obs['sample'].cat.categories

Index(['A10_2', 'A11_2', 'A12_2', 'A9_2', 'B1_2', 'B2_2'], dtype='object')

In [7]:
adata.obs['genotype'] = adata.obs['genotype'].astype('category')
adata.obs['genotype'].cat.categories

Index(['Mdx', 'MdxSCID', 'WT'], dtype='object')
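
As the two outputs above show, `astype('category')` sorts categories lexicographically, so `A10_2` precedes `A9_2` and `Mdx` precedes `WT`. If a specific order is wanted, for instance in plot legends, the categories can be set explicitly. The sketch below uses a small stand-in table rather than the real `adata.obs`, and the chosen orders (samples as listed in the metadata, wild type first) are assumptions.

```python
import pandas as pd

# Stand-in for adata.obs, using the five samples shown in the metadata table.
obs = pd.DataFrame({
    'sample': ['A9_2', 'A10_2', 'A11_2', 'A12_2', 'B1_2'],
    'genotype': ['WT', 'WT', 'Mdx', 'Mdx', 'MdxSCID'],
})

# Default behaviour: categories are sorted as strings.
default_order = list(obs['sample'].astype('category').cat.categories)
print(default_order)

# Explicit orders (assumed preferences, not taken from the notebook).
sample_order = ['A9_2', 'A10_2', 'A11_2', 'A12_2', 'B1_2']
genotype_order = ['WT', 'Mdx', 'MdxSCID']

obs['sample'] = pd.Categorical(obs['sample'], categories=sample_order)
obs['genotype'] = pd.Categorical(obs['genotype'], categories=genotype_order)

print(list(obs['sample'].cat.categories))
print(list(obs['genotype'].cat.categories))
```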

In [8]:
X_is_raw(adata)

True

### Save merged object

In [9]:
adata.write('../data/heart_mm_nuclei-23-0092_CB_ctl230830.raw.h5ad')