icat

Identifying Cell-states Across Treatments

ICAT is a tool developed to better identify cell states in scRNAseq experiments that include perturbations, such as gene knock-outs, or other sources of biological heterogeneity.

The method first identifies a set of control-defined cell states by performing unsupervised clustering on control cells. These cell states are then fed into a sparse gene weighting algorithm, Neighborhood Component Feature Selection (NCFS), which assigns high weights to the most predictive genes while removing variance from non-explanatory genes. We then transform the data matrix using this weight vector and perform semi-supervised clustering, such that the originally identified control labels remain constant but cells from experimental conditions are free to cluster with any other cells regardless of treatment status.
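To make the weighting step concrete, here is a minimal sketch of the idea, not ICAT's implementation. It assumes a non-negative weight per gene has already been learned (the weights and the toy expression values below are made up) and that the weights are applied by scaling each gene column. Genes with a weight of zero no longer contribute to distances between cells, so a noisy gene stops pulling cells of the same state apart.

```python
import math


def apply_gene_weights(cells, weights):
    """Scale every gene (column) of each cell (row) by that gene's weight."""
    return [[value * w for value, w in zip(cell, weights)] for cell in cells]


def nearest(matrix, i):
    """Index of the row closest to row i by Euclidean distance."""
    others = [j for j in range(len(matrix)) if j != i]
    return min(others, key=lambda j: math.dist(matrix[i], matrix[j]))


# Toy matrix: 3 cells x 3 genes (hypothetical values).
# Gene 0 separates the states; gene 2 is noise unrelated to state.
cells = [
    [5.0, 1.0, 9.0],  # state A
    [5.2, 1.1, 0.5],  # state A, but very different in the noisy gene
    [0.3, 1.0, 4.0],  # state B
]
weights = [2.0, 0.0, 0.0]  # made-up weights: only gene 0 is informative

weighted = apply_gene_weights(cells, weights)

print("nearest to cell 0 before weighting:", nearest(cells, 0))
print("nearest to cell 0 after weighting: ", nearest(weighted, 0))
```

In the raw space the noisy gene dominates and cell 0 sits closest to a cell from the other state; after weighting, its nearest neighbor is the cell that shares its state.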

Installation

ICAT can be installed on Linux machines using pip with the following command:

pip install icat-sc
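To confirm the install worked, a quick check along these lines may help. It is a sketch that uses only the Python standard library and assumes the distribution is named icat-sc and the importable module is icat, as in the pip command above and the import statements later in this README. The helper name icat_status is hypothetical.

```python
from importlib import metadata, util


def icat_status():
    """Return (module_importable, installed_version_or_None)."""
    importable = util.find_spec("icat") is not None
    try:
        version = metadata.version("icat-sc")
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


importable, version = icat_status()
print("icat importable:", importable)
print("icat-sc version:", version)
```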

Paper

To learn more about the algorithm and how it compares to other methods, see the original paper in Bioinformatics.

How to use

ICAT makes heavy use of the excellent scanpy library along with the associated AnnData data structure.

The following example walks through running ICAT on a simulated dataset. The final clustering is stored in the sslouvain column of the returned AnnData object.

    from icat import simulate
    from icat import models
    import scanpy as sc
    import numpy as np
    data_model = simulate.SingleCellDataset(
        populations=2,
        genes=1000,
        dispersion=np.random.choice([1, 2, 3], 1000)
    )
    controls = data_model.simulate()
    controls.obs['treatment'] = 'control'
    perturbed = simulate.perturb(controls)
    perturbed.obs['treatment'] = 'perturbed'
    adata = controls.concatenate([perturbed])
    sc.pp.log1p(adata)

visualizing dataset

    # specify model parameters -- see documentation for more information
    model = models.icat(
        ctrl_value="control",
        ncfs_kws={'reg': 1, 'sigma': 3},
        neighbor_kws={'n_neighbors': 15}, 
        cluster_kws={'resolution': 0.75},
    )
    # cluster cells by providing treatment information
    out = model.cluster(adata, adata.obs['treatment'])
    print(out.obs['sslouvain'].unique())
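One quick way to inspect the result is to count how many cells from each treatment land in each cluster. The sketch below uses hard-coded toy labels in place of out.obs['sslouvain'] and out.obs['treatment'], and the helper name treatment_by_cluster is hypothetical. A cluster that contains no control cells may point to a state that appears only under perturbation.

```python
from collections import Counter


def treatment_by_cluster(clusters, treatments):
    """Count cells per (cluster, treatment) pair as a nested dict."""
    table = {}
    for (cluster, treatment), n in Counter(zip(clusters, treatments)).items():
        table.setdefault(cluster, {})[treatment] = n
    return table


# Toy labels standing in for the two columns of `out.obs`.
clusters = ["0", "0", "1", "1", "2", "2"]
treatments = ["control", "perturbed", "control", "perturbed", "perturbed", "perturbed"]

table = treatment_by_cluster(clusters, treatments)
for cluster, counts in sorted(table.items()):
    print(cluster, counts)

# Clusters made up entirely of non-control cells.
no_control = [c for c, counts in table.items() if "control" not in counts]
print("clusters without control cells:", no_control)
```

With real output, the same function can be called as treatment_by_cluster(out.obs['sslouvain'], out.obs['treatment']); pandas.crosstab on those two columns produces an equivalent table.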

visualizing results

While ICAT does not automatically compute UMAP, t-SNE, or other reduced-dimension visualizations during clustering, the upweighted count matrix (stored in adata.obsm["X_icat"]) can be passed to these algorithms. For UMAP, the returned adata object already has neighbors computed in this upweighted space, so calculating a new UMAP is simple:

sc.tl.umap(out)
sc.pl.umap(out, color=['sslouvain', 'Population'])
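For t-SNE, which does not use the precomputed neighbor graph, the upweighted matrix has to be named explicitly. The sketch below relies on the use_rep argument of scanpy's sc.tl.tsne; the wrapper name tsne_on_icat is hypothetical, and scanpy is imported inside the function so that the missing-key check runs first.

```python
def tsne_on_icat(adata, rep_key="X_icat", **tsne_kws):
    """Run t-SNE on the ICAT-weighted matrix stored in `adata.obsm`."""
    if rep_key not in adata.obsm:
        raise KeyError(
            f"'{rep_key}' not found in adata.obsm; "
            "pass the AnnData object returned by model.cluster()."
        )
    import scanpy as sc  # imported lazily; only needed once the key exists

    # `use_rep` tells scanpy which representation in `.obsm` to embed.
    sc.tl.tsne(adata, use_rep=rep_key, **tsne_kws)
    return adata
```

Assuming out is the object returned by model.cluster above, tsne_on_icat(out) stores the embedding in out.obsm["X_tsne"], and sc.pl.tsne(out, color=['sslouvain', 'Population']) plots it.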

Hyper Parameter Optimization

For working with your own data, we recommend finding appropriate Louvain and NCFS hyper parameters prior to running the complete ICAT workflow. All hyper parameters used in the original pre-print can be found as supplemental tables.

We have also provided grid search functions to find the "best" n_neighbors and resolution parameters for the Louvain and semi-supervised Louvain clustering steps, as well as a function to find the "best" NCFS kernel width (sigma) and regularization (reg) parameters.

from icat import optimize
sc.pp.pca(controls)

# find the "best" `n_neighbors` and `resolution` parameters for clustering control cells by
# optimizing the Calinski-Harabasz Index over a grid of `n` and `r` values
louvain_n, louvain_r = optimize.optimize_louvain(
    controls,
    min_neighbors=3,
    max_neighbors=50,
    neighbor_step=2,
    min_res=0.3,
    max_res=1.2,
    res_step=0.02,
)

# cluster control cells with "best" values
sc.pp.neighbors(controls, n_neighbors=louvain_n)
sc.tl.louvain(controls, resolution=louvain_r)

# find "best" `sigma` and `reg` NCFS values by measuring the MCC in a
# weighted KNN over k-fold cross validation
sigma, reg = optimize.optimize_ncfs(
    controls,
    controls.obs.louvain,
    n_neighbors=5,
    n_splits=3,
    sigma_vals=[0.5, 1, 1.5, 2, 2.5, 3],
    reg_vals=[0.25, 0.5, 1, 1.5, 2, 2.5, 3],
)
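For intuition about what the Louvain search is scoring, the sketch below computes the Calinski-Harabasz Index in plain Python and picks the highest-scoring candidate. It is an illustration, not ICAT's code: the real search clusters the control cells at each grid point, whereas the candidate labelings, the parameter pairs used as dictionary keys, and the toy points here are all made up.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    n, dims = len(points), len(points[0])
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    def centroid(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = within = 0.0
    for rows in groups.values():
        center = centroid(rows)
        between += len(rows) * sq_dist(center, overall)
        within += sum(sq_dist(row, center) for row in rows)
    if within == 0:
        return float("inf")  # every point sits exactly on its cluster centroid
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated toy groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical (n_neighbors, resolution) settings mapped to the labels they produced.
candidates = {
    (15, 0.5): [0, 0, 0, 1, 1, 1],  # recovers the two groups
    (3, 1.2): [0, 1, 0, 1, 0, 1],   # mixes the two groups
}

scores = {params: calinski_harabasz(points, labels) for params, labels in candidates.items()}
best = max(scores, key=scores.get)
print("best parameters:", best)
```

Higher values mean tighter, better-separated clusters, so the labeling that recovers the two groups wins.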

By default, ICAT uses the same n_neighbors and resolution parameters for semi-supervised clustering as for control clustering. In practice, this leads to good results (see paper). However, users who would like to optimize these parameters separately can use the function below:

# optimize louvain parameters for semi-supervised clustering in NCFS
# space -- include complete dataset now 
adata.obs.loc[controls.obs.index, "control_clusters"] = controls.obs.louvain
sslouvain_n, sslouvain_r = optimize.optimize_sslouvain(
    adata,
    adata.obs.control_clusters,
    reg,
    sigma,
    max_cells=750,
    min_neighbors=3,
    max_neighbors=50,
    neighbor_step=2,
    min_res=0.3,
    max_res=1.2,
    res_step=0.02,
)
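Both Louvain searches take min, max, and step arguments, so it can be useful to estimate how many parameter combinations a run will evaluate before starting it. The sketch below enumerates such a grid with the values used above. Whether ICAT includes the upper bounds is an assumption here, and the helper names int_grid and float_grid are hypothetical.

```python
from itertools import product


def int_grid(start, stop, step):
    """Integers from start to stop (inclusive, by assumption) in increments of step."""
    return list(range(start, stop + 1, step))


def float_grid(start, stop, step):
    """Evenly spaced floats, built by index to avoid accumulating rounding error."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


neighbors = int_grid(3, 50, 2)            # 3, 5, ..., 49
resolutions = float_grid(0.3, 1.2, 0.02)  # 0.3, 0.32, ..., 1.2
grid = list(product(neighbors, resolutions))

print(len(neighbors), "neighbor values x", len(resolutions), "resolutions =", len(grid), "combinations")
```

Under these assumptions the settings above give over a thousand combinations, each requiring a clustering run, so widening the steps is a simple way to shorten the search.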
