# CheckAtlas examples: Evaluate and compare different atlases

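In this example, we show how to run checkatlas on a folder containing three different data types: Seurat, Scanpy, and Cellranger. The Scanpy and Seurat files come from the PBMC 3K sample; the Cellranger file, despite its `pbmc_3k_cellranger` folder name, is downloaded from the 10x Genomics `5k_pbmc_v3` sample.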

## Download datasets

The Cellranger file is downloaded directly from the 10x Genomics website.

In [1]:
cd /data/analysis/data_becavin/checkatlas_test/tuto

/data/analysis/data_becavin/checkatlas_test/tuto


In [4]:
%%bash
# -p creates the parent folders as needed, so a single call is enough
mkdir -p data3/pbmc_3k_cellranger/outs
cd data3/pbmc_3k_cellranger/outs
curl -o filtered_feature_bc_matrix.h5 "https://cf.10xgenomics.com/samples/cell-exp/3.0.2/5k_pbmc_v3/5k_pbmc_v3_filtered_feature_bc_matrix.h5"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.2M  100 17.2M    0     0  12.1M      0  0:00:01  0:00:01 --:--:-- 12.1M


The Scanpy version is downloaded from the cellxgene GitHub repository.

In [5]:
%%bash
cd data3/
curl --location -o pbmc_3k_scanpy.h5ad "https://github.com/chanzuckerberg/cellxgene/raw/main/example-dataset/pbmc3k.h5ad"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 23.5M  100 23.5M    0     0  13.9M      0  0:00:01  0:00:01 --:--:-- 84.5M


The Seurat version is downloaded from the Satija lab's Dropbox.

In [6]:
%%bash
cd data3/
curl --location -o pbmc_3k_seurat.rds "https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=1"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   123    0   123    0     0    375      0 --:--:-- --:--:-- --:--:--   375
100   342  100   342    0     0    361      0 --:--:-- --:--:-- --:--:--     0
100  274M  100  274M    0     0  17.2M      0  0:00:15  0:00:15 --:--:-- 17.6M
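
Before running checkatlas, it can help to confirm that the three downloads landed where the cells above put them. The following is a minimal sketch, not part of checkatlas: the helper name `missing_atlas_files` is hypothetical, and the relative paths are simply the ones used in the `curl` commands above.

```python
from pathlib import Path

# Relative paths written by the download cells above (inside data3/).
EXPECTED_FILES = [
    "pbmc_3k_cellranger/outs/filtered_feature_bc_matrix.h5",
    "pbmc_3k_scanpy.h5ad",
    "pbmc_3k_seurat.rds",
]


def missing_atlas_files(root):
    """Return the expected files that are absent or empty under root.

    Hypothetical helper for this tutorial; it only inspects the file system.
    """
    root = Path(root)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        # A zero-byte file usually means the download failed.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_atlas_files("data3")
    print("Missing files:", missing if missing else "none")
```

An empty list means all three atlas files are present and non-empty.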


## Run checkatlas

If checkatlas is installed in your environment, you only need to run this cell. It produces all the metric tables and figures. The `--NOMULTIQC` flag is set here because MultiQC is run separately in the next section.

In [3]:
%%bash
python -m checkatlas --debug --NOMULTIQC data3/

|--- DEBUG    Program arguments: Namespace(path='data3/', config='', multiqc='CheckAtlas_MultiQC', resume=False, thread=1, debug=True, NOADATA=False, NOQC=False, NOREDUCTION=False, NOMETRIC=False, NOMULTIQC=True, qc_display=['violin_plot', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt'], obs_cluster=['cell_type', 'CellType', 'celltype', 'ann_finest_level', 'cellranger_graphclust', 'seurat_clusters', 'louvain', 'leiden', 'orig.ident'], metric_cluster=['silhouette', 'davies_bouldin'], metric_annot=['rand_index'], metric_dimred=['kruskal_stress'])
|--- DEBUG    Check checkatlas folders in:data3/
|--- INFO     Searching Seurat, Cellranger and Scanpy files
|--- DEBUG    Include Atlas: pbmc_3k_seurat from data3/pbmc_3k_seurat.rds
|--- DEBUG    Include Atlas: pbmc_3k_scanpy from data3/pbmc_3k_scanpy.h5ad
|--- DEBUG    Include Atlas: pbmc_3k_cellranger from data3/pbmc_3k_cellranger/outs/filtered_feature_bc_matrix.h5
|--- INFO     Found 1 potential scanpy files with .h5ad extension
|--- I


    an issue that caused a segfault when used with rpy2:
    https://github.com/rstudio/reticulate/pull/1188
    Make sure that you use a version of that package that includes
    the fix.


|--- INFO     Load pbmc_3k_seurat in data3/
|--- INFO     Run checkatlas pipeline for pbmc_3k_seurat Seurat atlas
|--- DEBUG    Create Summary table for pbmc_3k_seurat
|--- DEBUG    Create Adata table for pbmc_3k_seurat
|--- DEBUG    Create QC violin plot for pbmc_3k_seurat
|--- DEBUG    Create QC tables for pbmc_3k_seurat
|--- DEBUG    Create UMAP figure for pbmc_3k_seurat
|--- DEBUG    Add obs_key seurat_clusters with cat [1] "0" "1" "2" "3" "4" "5" "6" "7" "8"

|--- DEBUG    Add obs_key seurat_clusters with cat [1] "0" "1" "2" "3" "4" "5" "6" "7" "8"

|--- DEBUG    Calc clustering metrics for pbmc_3k_seurat
|--- DEBUG    Calc silhouette for pbmc_3k_seurat with obs seurat_clusters and obsm umap
|--- DEBUG    Calc davies_bouldin for pbmc_3k_seurat with obs seurat_clusters and obsm umap
|--- DEBUG    Add obs_key seurat_clusters with cat [1] "0" "1" "2" "3" "4" "5" "6" "7" "8"

|--- DEBUG    Calc annotation metrics for pbmc_3k_seurat
|--- DEBUG    Add obsm [1] "pca"  "umap"

|--- DEBUG 

    

## Run MultiQC

Once checkatlas has run, all tables and figures can be found in the checkatlas_files folder. MultiQC retrieves these files and creates the HTML summary report.
WARNING: Install and run only the MultiQC version from https://github.com/becavin-lab/MultiQC/tree/checkatlas. Otherwise, checkatlas files will not be taken into account.
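
To see what checkatlas wrote before handing it to MultiQC, the output folder can be summarised by file type. This is a rough sketch under one assumption: that the checkatlas_files folder sits inside the searched directory, i.e. at `data3/checkatlas_files`. The helper name `count_by_extension` is hypothetical and not part of checkatlas; adjust the path if your layout differs.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder):
    """Count the files found recursively under folder, grouped by suffix."""
    folder = Path(folder)
    if not folder.is_dir():
        # Nothing to count if checkatlas has not been run yet.
        return Counter()
    return Counter(p.suffix.lower() for p in folder.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumed location of the checkatlas outputs (see the note above).
    counts = count_by_extension("data3/checkatlas_files")
    for suffix, n in sorted(counts.items()):
        print(f"{suffix or '(no extension)'}: {n}")
```

An empty result suggests that checkatlas did not run to completion or that the folder is located elsewhere.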

In [5]:
%%bash
multiqc -f --cl-config "ignore_images: false" -n "CheckAtlas_example_3" -o "CheckAtlas_example_3" data3/


  [34m/[0m[32m/[0m[31m/[0m ]8;id=273237;https://multiqc.info\[1mMultiQC[0m]8;;\ 🔍 [2m| v1.15.dev0[0m

[34m|           multiqc[0m | Search path : /data/analysis/data_becavin/checkatlas_test/tuto/data3
[2K[34m|[0m         [34msearching[0m | [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [35m100%[0m [32m21/21[0m  21[0m  
[?25h[34m|        checkatlas[0m | Found 3 summary tables
[34m|        checkatlas[0m | Found 3 adata tables
[34m|        checkatlas[0m | Found 3 QC violin plots
[34m|        checkatlas[0m | Found 3 QC counts tables
[34m|        checkatlas[0m | Found 3 QC genes tables
[34m|        checkatlas[0m | Found 3 QC mito tables
[34m|        checkatlas[0m | Found 1 UMAP figures
[34m|        checkatlas[0m | Found 1 t-SNE figures
[34m|        checkatlas[0m | Found 2 metric cluster tables
[34m|           multiqc[0m | Compressing plot data
[34m|           multiqc[0m | [33mDeleting    : CheckAtlas_example_3/CheckAtlas_example_3.html   

If MultiQC ran without error, an HTML report has been created in CheckAtlas_example_3/CheckAtlas_example_3.html<br>
<big>Open it and check your atlases!</big>

In [15]:
from IPython.display import IFrame

IFrame(
    src="CheckAtlas_example_3/CheckAtlas_example_3.html",
    width="100%",
    height="500px",
)
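
If the frame above renders empty, the report is probably not where the notebook expects it. The sketch below rebuilds the report path from the `-n` and `-o` values passed to `multiqc` and checks that the file exists; the helper name `multiqc_report_path` is hypothetical and only mirrors the naming seen in the MultiQC log above.

```python
from pathlib import Path


def multiqc_report_path(name, outdir):
    """Path of the HTML report written by `multiqc -n name -o outdir`."""
    return Path(outdir) / f"{name}.html"


if __name__ == "__main__":
    # Same values as in the multiqc cell of this notebook.
    report = multiqc_report_path("CheckAtlas_example_3", "CheckAtlas_example_3")
    if report.is_file():
        print(f"Report found: {report} ({report.stat().st_size} bytes)")
    else:
        print(f"Report not found: {report}")
```

The path is relative, so run this from the same working directory as the `multiqc` cell.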