In [1]:
import numpy as np
import pandas as pd

import anndata as ad
import scanpy as sc
import scanorama

Load the preprocessed AnnData objects from their respective H5AD files.  
- `cao_adata`: Loaded from `data/cao_hvg_600.h5ad`.  
- `dis_adata`: Loaded from `data/dis_hvg.h5ad`.  
- `ian_adata`: Loaded from `data/ian_hvg.h5ad`.

In [2]:
cao_adata = ad.read_h5ad('data/cao_hvg_600.h5ad')
dis_adata = ad.read_h5ad('data/dis_hvg.h5ad')
ian_adata = ad.read_h5ad('data/ian_hvg.h5ad')

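This code creates a list, `adata_list`, containing the three AnnData objects `cao_adata`, `dis_adata`, and `ian_adata`. The order of the list matters, because every later step processes the datasets in this order.

These objects likely represent single-cell datasets from different sources or conditions, and each one is treated as a separate batch during integration.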

In [9]:
adata_list = [cao_adata, dis_adata, ian_adata]
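Before integrating, it can help to check how many genes the datasets have in common, since the later concatenation keeps only genes present in every object. The sketch below uses hypothetical gene names in place of each object's `var_names`; the dictionary keys and gene labels are illustrative assumptions, not values from the real files.

```python
import pandas as pd

# Hypothetical gene names standing in for each dataset's `var_names`.
var_names = {
    "cao": pd.Index(["GeneA", "GeneB", "GeneC", "GeneD"]),
    "dis": pd.Index(["GeneB", "GeneC", "GeneD", "GeneE"]),
    "ian": pd.Index(["GeneC", "GeneD", "GeneF"]),
}

# Intersect the gene sets one dataset at a time.
shared = None
for genes in var_names.values():
    shared = genes if shared is None else shared.intersection(genes)

# Report how much of each dataset survives the intersection.
for name, genes in var_names.items():
    print(f"{name}: {len(genes)} genes, {len(shared)} shared across all datasets")
```

With the real objects, the dictionary values would be `cao_adata.var_names`, `dis_adata.var_names`, and `ian_adata.var_names`.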


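This code runs Scanorama's integration on the AnnData objects in `adata_list`. `scanorama.integrate_scanpy` modifies the objects in place: it stores a low-dimensional integrated embedding for each dataset in `.obsm['X_scanorama']` and leaves the expression matrices in `.X` unchanged. (Batch-corrected expression values would require `scanorama.correct_scanpy` instead.)

The embedding aligns the datasets to reduce batch effects while aiming to preserve biological variability.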

In [None]:
scanorama.integrate_scanpy(adata_list)

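This code concatenates the AnnData objects in `adata_list` into a single AnnData object, `int_adata`, using `sc.concat`. The `index_unique='_'` parameter makes observation names unique across datasets by appending an underscore and the dataset's key to every name; because no `keys` are passed, the key is the dataset's position in the list (`_0`, `_1`, `_2`). By default the join over genes is an inner join, so only genes present in all three datasets are kept.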

In [None]:
int_adata = sc.concat(adata_list, index_unique='_')

This code extracts the Scanorama embedding (`X_scanorama`) from each AnnData object in `adata_list` and stores the arrays in the list `scanorama_int`. It then concatenates them row-wise into a single array and assigns it to the `"Scanorama"` slot in `int_adata.obsm`, making the integrated representation accessible for downstream analysis. Because `int_adata` was built from the same list in the same order, the rows of the concatenated array line up with the cells in `int_adata`.

In [None]:
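# Collect the integrated embedding from each dataset.
# The loop variable is named `adata` to avoid shadowing the `ad` module alias.
scanorama_int = [adata.obsm['X_scanorama'] for adata in adata_list]

# Stack the per-dataset embeddings row-wise into one matrix.
int_adata.obsm["Scanorama"] = np.concatenate(scanorama_int)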
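An `.obsm` entry must have one row per cell, in the same order as the cells, so it is worth checking the shape of the stacked array before assigning it. A minimal sketch with NumPy only, using made-up arrays in place of the real per-dataset embeddings:

```python
import numpy as np

# Hypothetical embeddings: three datasets with 3, 2 and 5 cells, 4 dimensions each.
embeddings = [np.zeros((3, 4)), np.ones((2, 4)), np.full((5, 4), 2.0)]

# np.concatenate stacks along axis 0 by default, preserving list order.
stacked = np.concatenate(embeddings)

# The row count must equal the total number of cells across datasets.
n_obs_expected = sum(e.shape[0] for e in embeddings)
assert stacked.shape == (n_obs_expected, 4)

# Rows keep their dataset order: the first 3 rows come from the first dataset, and so on.
assert np.all(stacked[:3] == 0.0)
assert np.all(stacked[3:5] == 1.0)
assert np.all(stacked[5:] == 2.0)
```

With the real objects, the equivalent check would compare `np.concatenate(scanorama_int).shape[0]` against `int_adata.n_obs`.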

This code saves the integrated AnnData object int_adata to an H5AD file at the specified path. The file "scanorama_integrated_hvg.h5ad" will store the integrated data, allowing for future loading and analysis without needing to re-run the integration process.

In [None]:
# Write the AnnData object to an H5AD file
int_adata.write_h5ad("D:/newgenes/data/to_integrate_full/scanorama_integrated_hvg.h5ad")