# CheckAtlas examples: Evaluate and compare different versions of your atlas

In this example, we show how to run checkatlas on a folder that contains several versions of the same atlas. This reproduces the kind of folder you end up with while an atlas is still being analyzed. We use the Fetal Scanpy atlas downloaded in example 1 as a reference and create three versions from it: one with a subsample of the cells and a recomputed UMAP (which changes the <i>obsm</i> content), one without raw data, and one without cell type annotations.

## Download datasets

If not already done, download the Fetal dataset.

Fetal<br>
From: 
<a href=https://cellxgene.cziscience.com/collections/13d1c580-4b17-4b2e-85c4-75b36917413f>Single cell derived mRNA signals across human kidney tumors - Young et al. (2021) Nat Commun</a>

Note: the download link in the cell below is a presigned URL issued on 2022-06-01 with a 7-day validity (`X-Amz-Expires=604800`). If it no longer works, get a fresh download link from the collection page above.

In [None]:
%%bash
mkdir data1
cd data1/
curl -o Fetal.h5ad "https://corpora-data-prod.s3.amazonaws.com/a51c6ece-5731-4128-8c1e-5060e80c69e4/local.h5ad?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIATLYQ5N5X2OOOPPN6%2F20220601%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220601T093718Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEEUaCXVzLXdlc3QtMiJIMEYCIQDa%2F6DcbFSKCAKSZOahE4BDQFPDlrdqP840e1amyvNkewIhAJiNjHqlfjZOfnbk38aCkxRTXMYy7HtENvd8EdwMipXDKusDCD4QARoMMjMxNDI2ODQ2NTc1IgwjMpK93hi3mahqnTwqyAN3k8W2x%2FHWeEEdTeLEIU5Xzf%2BO5IShslHQatUWt%2FOAqUdDklzDnXZc8SrP0h0c%2F%2BfXOpqdWdXH5xJs2ZJBC9WGdi26LPlJWc293IrZx71kYIGMdRuQDaTB9g3osr%2B9I9Um48t5g%2F4aTHeHuBxjAh4a6ELJcGVio%2B2%2BHap5an9EgHAHbXsWGxwlj%2FNgxLSfPJF6ycnzPeT6Yn%2FK%2BK9EiqgHmi95luxg4seHcun5ZZnEO3Tnc3v2xJ%2BfIghpmikYQvWLcUi34ppOVnAPMa4d%2FXkgtuQUe0Sg0xLhzulsxyBDgp4yHqtkyvC89lq98cFjGB5qsnMwSrT2RiGPjFSAEVhPPwCky7GQZ3MGAEC50A8UXDcnprLObV%2BSDpfjg9OTDl4WRCQTiEEFAsLDlwOnCRyO0bSnItksbanr0wmSkCmUsJ6Ju%2FNxtui1VQZFx5dhfKU8xWOv%2B5nBCxED2%2Bv6LmalNzKqtlDI5Jkx3qok1%2FsUTBKP9eEDiM3aoSm4b95kIariV7pusfAttyJwu5oEM5YkR1XYNum7bJYNxEh64zRRM9b7zSOYlB3T%2BNl41QlPY6Ssc183lm10PbwkteRGlJAeBconJmwJS4Aw%2BOXblAY6pAHF4Ih5mLvCi0RKHu8VNXo9YhaQV%2FKDwBOXbrJ3ejikSLCANYGiKFhFT2ch5CZ8PKJDJsyYujWQ%2BjzBNc0K2JDtj4dOJdRFSspbgx%2BKg1Dl6sY3KJrCX%2Fn4YCe59vQehAGEH8GMYEpJemZShh%2B0V7m8Uc8jyV7fL81GQ%2BCs7zUDSNzjNNFte1uxPmrnt5Rth5tdrZW66IHm6wToY3sM1sixVBIfyA%3D%3D&X-Amz-Signature=fa37236f3b672df8f9fbfe2e6168836a7d2ef887d338c70ac7a6e3e7a4d4f234"

Create different versions to mimic the folder of an atlas that is currently being analyzed.

In [1]:
%%bash
mkdir data2/
mkdir data2/V1
mkdir data2/V2
mkdir data2/V3
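
The same layout can also be created from Python, which makes the step safe to re-run (plain `mkdir` fails when the folder already exists). This is a minimal sketch assuming the `data2/V1` to `data2/V3` layout used in this example; the helper name `make_version_dirs` is ours and not part of checkatlas.

```python
from pathlib import Path


def make_version_dirs(root, versions=("V1", "V2", "V3")):
    """Create one sub-folder per atlas version under `root` and return the paths."""
    paths = []
    for version in versions:
        path = Path(root) / version
        # parents=True creates `root` if needed; exist_ok=True makes re-runs harmless.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `make_version_dirs("data2")` gives the same three folders as the bash cell above.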

In [15]:
import scanpy as sc
adata = sc.read_h5ad('data1/Fetal.h5ad')
adata

AnnData object with n_obs × n_vars = 27203 × 32828
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_clusters', 'Experiment', 'Stage', 'Selection', 'compartment', 'age', 'CnT', 'DTLH', 'UBCD', 'SSBpr', 'End', 'MSC', 'RVCSB', 'SSBpod', 'SSBm.d', 'ICa', 'ErPrT', 'ICb', 'NPC', 'Pod', 'annotCell', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'tissue_ontology_term_id', 'sex_ontology_term_id', 'organism_ontology_term_id', 'is_primary_data', 'assay_ontology_term_id', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'ethnicity', 'development_stage'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_biotype', 'feature_is_filtered', 'feature_name', 'feature_reference'
    uns: 'X_normalization', 'cell_type_ontology_term_id_colors', 'schema_version', 'title'
    obsm: 'X_pca', 'X_umap'

In [36]:
# V1 keeps 60% of the cells, with PCA, neighbors and UMAP recomputed
adata_v1 = sc.pp.subsample(adata, fraction=0.6, copy=True)
sc.tl.pca(adata_v1, svd_solver='arpack')
sc.pp.neighbors(adata_v1, n_pcs = 30, n_neighbors = 20)
sc.tl.umap(adata_v1)
adata_v1.write('data2/V1/Fetal_v1.h5ad')
adata_v1

AnnData object with n_obs × n_vars = 16321 × 32828
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_clusters', 'Experiment', 'Stage', 'Selection', 'compartment', 'age', 'CnT', 'DTLH', 'UBCD', 'SSBpr', 'End', 'MSC', 'RVCSB', 'SSBpod', 'SSBm.d', 'ICa', 'ErPrT', 'ICb', 'NPC', 'Pod', 'annotCell', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'tissue_ontology_term_id', 'sex_ontology_term_id', 'organism_ontology_term_id', 'is_primary_data', 'assay_ontology_term_id', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'ethnicity', 'development_stage'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_biotype', 'feature_is_filtered', 'feature_name', 'feature_reference'
    uns: 'X_normalization', 'cell_type_ontology_term_id_colors', 'schema_version', 'title', 'pca', 'neighbors', 'umap'
    obsm: 'X_pca', 'X_um
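
The V1 cell count follows from the fraction: 0.6 × 27203 = 16321.8, and the output above shows 16321 cells, which is consistent with truncation to an integer. The sketch below reproduces that arithmetic and the idea of drawing cells without replacement using only the standard library. It is an illustration, not scanpy's implementation, and `subsample_indices` is a hypothetical helper.

```python
import random


def subsample_indices(n_obs, fraction, seed=0):
    """Pick int(fraction * n_obs) distinct cell indices, returned in sorted order."""
    n_keep = int(fraction * n_obs)  # truncation, e.g. 16321.8 -> 16321
    rng = random.Random(seed)       # fixed seed so the selection is reproducible
    return sorted(rng.sample(range(n_obs), n_keep))


kept = subsample_indices(27203, 0.6)
print(len(kept))  # 16321
```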

In [33]:
# V2 is Fetal without raw data
adata_v2 = adata.copy()
adata_v2.raw = None
adata_v2.write('data2/V2/Fetal_v2.h5ad')
adata_v2

AnnData object with n_obs × n_vars = 27203 × 32828
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_clusters', 'Experiment', 'Stage', 'Selection', 'compartment', 'age', 'CnT', 'DTLH', 'UBCD', 'SSBpr', 'End', 'MSC', 'RVCSB', 'SSBpod', 'SSBm.d', 'ICa', 'ErPrT', 'ICb', 'NPC', 'Pod', 'annotCell', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'tissue_ontology_term_id', 'sex_ontology_term_id', 'organism_ontology_term_id', 'is_primary_data', 'assay_ontology_term_id', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'ethnicity', 'development_stage'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_biotype', 'feature_is_filtered', 'feature_name', 'feature_reference'
    uns: 'X_normalization', 'cell_type_ontology_term_id_colors', 'schema_version', 'title'
    obsm: 'X_pca', 'X_umap'

In [37]:
# V3 is Fetal without cell_type annotation
adata_v3 = adata.copy()
del adata_v3.obs['author_cell_type']
del adata_v3.obs['cell_type']
del adata_v3.obs['cell_type_ontology_term_id']
adata_v3.write('data2/V3/Fetal_v3.h5ad')
adata_v3

AnnData object with n_obs × n_vars = 27203 × 32828
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_clusters', 'Experiment', 'Stage', 'Selection', 'compartment', 'age', 'CnT', 'DTLH', 'UBCD', 'SSBpr', 'End', 'MSC', 'RVCSB', 'SSBpod', 'SSBm.d', 'ICa', 'ErPrT', 'ICb', 'NPC', 'Pod', 'annotCell', 'disease_ontology_term_id', 'ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'tissue_ontology_term_id', 'sex_ontology_term_id', 'organism_ontology_term_id', 'is_primary_data', 'assay_ontology_term_id', 'assay', 'disease', 'organism', 'sex', 'tissue', 'ethnicity', 'development_stage'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_biotype', 'feature_is_filtered', 'feature_name', 'feature_reference'
    uns: 'X_normalization', 'cell_type_ontology_term_id_colors', 'schema_version', 'title'
    obsm: 'X_pca', 'X_umap'

## Run checkatlas

If checkatlas is installed in your environment, you only need to run this cell. It produces all the metric tables and figures needed for the report.

In [38]:
%%bash
python -m checkatlas data2/

Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.


Checking your single-cell atlases in data2/!
Searching Seurat and Scanpy files
Found 3
Check if checkatlas folders exist
Fetal_v1
Clean scanpy:data2/V1/Fetal_v1.h5ad
Fetal_v2
Clean scanpy:data2/V2/Fetal_v2.h5ad
Fetal_v3
Clean scanpy:data2/V3/Fetal_v3.h5ad
--- Load Fetal_v1 in data2/V1/
Run summary
Calc QC
Calc Silhouette for Fetal_v1 author_cell_type
Calc Davies Bouldin for Fetal_v1 author_cell_type
Calc Silhouette for Fetal_v1 cell_type
Calc Davies Bouldin for Fetal_v1 cell_type
Calc Silhouette for Fetal_v1 cell_type_ontology_term_id
Calc Davies Bouldin for Fetal_v1 cell_type_ontology_term_id
Calc Silhouette for Fetal_v1 seurat_clusters
Calc Davies Bouldin for Fetal_v1 seurat_clusters
NOT WORKING YET - Calc Rand Index for Fetal_v1 cell_type
NOT WORKING YET - Calc Rand Index for Fetal_v1 cell_type_ontology_term_id
NOT WORKING YET - Calc Rand Index for Fetal_v1 seurat_clusters
NOT WORKING YET - Calc Kruskal Stress for Fetal_v1 X_pca
NOT WORKING YET - Calc Kruskal Stress for Fetal_v1 X_u
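
The `Found 3` line in the log corresponds to the three `.h5ad` files written earlier. To confirm what is in the folder before launching a run, a recursive search is enough. This is a sketch of the idea only, not checkatlas's own search code, and `find_h5ad` is a hypothetical helper.

```python
from pathlib import Path


def find_h5ad(root):
    """Return all .h5ad files found below `root`, sorted by path."""
    return sorted(Path(root).rglob("*.h5ad"))
```

With the folders from this example, `find_h5ad("data2")` should include `data2/V1/Fetal_v1.h5ad`, `data2/V2/Fetal_v2.h5ad` and `data2/V3/Fetal_v3.h5ad`.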

## Run MultiQC

Once checkatlas has been run, all tables and figures can be found in the checkatlas_files folder. MultiQC retrieves these files and creates the HTML summary files.
WARNING: Install and run MultiQC only from https://github.com/becavin-lab/MultiQC/tree/checkatlas. Otherwise, checkatlas files will not be taken into account.

In [39]:
%%bash
multiqc -f --cl-config "ignore_images: false" -c multiqc_config.yaml -n "CheckAtlas_example_2" -o "CheckAtlas_example_2" data2/

Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

  CheckAtlas-version/// MultiQC 🔍 | v1.13.dev0

|           multiqc | Search path : /Users/christophebecavin/Documents/checkatlas/examples/data2
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 28/28  
|        checkatlas | Found 3 reports
|           multiqc | Compressing plot data
|           multiqc | Report      : CheckAtlas_example_2/CheckAtlas_example_2.html
|           multiqc | Data        : CheckAtlas_example_2/CheckAtlas_example_2_data
|           multiqc | MultiQC complete


If MultiQC ran without error, an HTML report has been created in CheckAtlas_example_2/CheckAtlas_example_2.html<br>
<big>Open it and check your atlases!</big>
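
Before embedding the report, it can help to confirm that the file was actually written. This is a small sketch assuming the `-n` and `-o` values from the MultiQC command above; `report_path` is a hypothetical helper.

```python
from pathlib import Path


def report_path(name, outdir=None):
    """Path of the HTML report for `multiqc -n NAME -o OUTDIR` (OUTDIR defaults to NAME)."""
    return Path(outdir or name) / f"{name}.html"


report = report_path("CheckAtlas_example_2")
if report.is_file():
    print(f"Report ready: {report}")
else:
    print(f"No report at {report}; check the MultiQC log for errors")
```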

In [40]:
from IPython.display import IFrame
IFrame(src="CheckAtlas_example_2/CheckAtlas_example_2.html", width='100%', height='500px')