### Notebook to format the _Litvinukova et al 2020_ LV datasets.
- **Developed by**: Carlos Talavera-López Ph.D
- **Institute of Computational Biology - Computational Health Centre - Helmholtz Munich**
- v230114

### Import required modules

In [1]:
import anndata
import numpy as np
import pandas as pd
import scanpy as sc

### Set up working environment

In [2]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi = 160, color_map = 'RdPu', dpi_save = 300, vector_friendly = True, format = 'svg', fontsize = 8)

-----
anndata     0.8.0
scanpy      1.9.1
-----
PIL                 9.2.0
asttokens           NA
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
cffi                1.15.1
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.4
decorator           5.1.1
entrypoints         0.4
executing           1.2.0
google              NA
h5py                3.7.0
hypergeom_ufunc     NA
igraph              0.10.2
ipykernel           6.17.1
jedi                0.18.2
joblib              1.2.0
kiwisolver          1.4.4
leidenalg           0.9.0
llvmlite            0.39.1
matplotlib          3.6.2
mpl_toolkits        NA
natsort             8.2.0
nbinom_ufunc        NA
ncf_ufunc           NA
numba               0.56.4
numpy               1.23.5
packaging           21.3
pandas              1.5.2
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        2.5.4
prompt_toolkit 

### Read in raw heart object

In [3]:
lv_raw = sc.read_h5ad('/home/cartalop/data/carlos/single_cell/heart/regions/RA/HHH_RA_ctl230101.raw.h5ad')
lv_raw

AnnData object with n_obs × n_vars = 120887 × 15744
    obs: 'domain_label', 'cell_states', 'region', 'proc'

In [4]:
# Keep only cells whose `proc` label is 'Litvinukova2020'; the result is a view of lv_raw
lv_ref = lv_raw[lv_raw.obs['proc'].isin(['Litvinukova2020'])]
lv_ref

View of AnnData object with n_obs × n_vars = 38989 × 15744
    obs: 'domain_label', 'cell_states', 'region', 'proc'

### Clean up object

- Keep only the `domain_label` and `cell_states` columns in `lv_ref.obs`
- Remove unnecessary fields in `.var`, `.obsm` and `.varm`, if present (the object above carries only `obs` annotations)

In [5]:
lv_ref = lv_ref.copy()  # turn the view into an independent object before modifying it
lv_ref.obs = lv_ref.obs[['domain_label', 'cell_states']].copy()
lv_ref

AnnData object with n_obs × n_vars = 38989 × 15744
    obs: 'domain_label', 'cell_states'

### Export object

In [6]:
lv_ref.write('/home/cartalop/data/carlos/single_cell/heart/regions/RA/HHH_RA_Litvinukova20_ctl230114.raw.h5ad')