In [None]:
from SAM import SAM
sam = SAM()
sam.load_data('../example_data/GSE74596_data.csv.gz')
sam.preprocess_data()
sam.load_annotations('../example_data/GSE74596_ann.csv')
sam.run()

## Clustering cells

After running SAM, we can cluster cells using one of four different methods:

In [None]:
sam.louvain_clustering(res = 1, method = 'modularity')
sam.kmeans_clustering(4)
sam.density_clustering()
sam.hdbknn_clustering()

`density_clustering` applies DBSCAN to the UMAP projection.

`hdbknn_clustering` applies a slightly modified version of HDBSCAN to the PC matrix.
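
To illustrate what density-based clustering on a 2D projection does, here is a minimal, dependency-free DBSCAN sketch. It is not SAM's implementation: the `dbscan` function, the `eps` and `min_samples` values, and the coordinates (a made-up stand-in for a UMAP projection) are all assumptions chosen for illustration.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Toy DBSCAN: returns one cluster label per point, with -1 marking noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Hypothetical 2D coordinates standing in for a UMAP projection:
# two tight groups of cells and one isolated cell.
points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
          (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1),
          (10, 0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated cell receives the label -1 because too few points lie within `eps` of it. How SAM labels such cells is not covered here.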

The clustering results can be read either from the AnnData object `sam.adata` (in `sam.adata.obs`) or from the dictionary `sam.output_vars`:

In [None]:
"""
sam.adata.obs['louvain_clusters']
sam.adata.obs['kmeans_clusters']
sam.adata.obs['density_clusters']
sam.adata.obs['hdbknn_clusters']

sam.output_vars['louvain_clusters']
sam.output_vars['kmeans_clusters']
sam.output_vars['density_clusters']
sam.output_vars['hdbknn_clusters']
"""

To visualize the results:

In [None]:
sam.scatter( c = sam.output_vars['louvain_clusters'] )
sam.scatter( c = sam.output_vars['kmeans_clusters'] )
sam.scatter( c = sam.output_vars['density_clusters'] )
sam.scatter( c = sam.output_vars['hdbknn_clusters'] )

## Marker gene identification

SAM has two built-in functions to identify marker genes. One uses a random forest classifier to determine the importance of each gene in defining each cluster, whereas the other uses a fold-change approach to rank genes based on their enrichment in each cluster.

In [None]:
# Random forest classifier, using the Louvain cluster labels:
sam.identify_marker_genes_rf(labels = 'louvain_clusters');
# Fold-change (ratio) approach, using the same labels:
sam.identify_marker_genes_ratio(labels = 'louvain_clusters');

markers = sam.output_vars['marker_genes_rf']
#markers = sam.output_vars['marker_genes_ratio']

The result is an array with dimensions (n_clusters x n_genes), in which each row ranks the genes for one cluster.

For example, `markers[3,0]` is the top-ranked marker gene for cluster 3 and `markers[0,5]` is the sixth-ranked marker gene for cluster 0 (indices are zero-based) using the random forest classifier approach.
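
To make the fold-change idea and the (n_clusters x n_genes) layout concrete, here is a small sketch in plain Python. It is not the code behind `identify_marker_genes_ratio`: the function `rank_markers_by_ratio`, the `pseudocount` parameter, and the toy expression values are assumptions made for illustration, and SAM's actual scoring may differ.

```python
def rank_markers_by_ratio(expr, labels, pseudocount=1.0):
    """Rank genes per cluster by (mean inside cluster) / (mean outside cluster).

    expr   : list of cells, each a list of per-gene expression values
    labels : cluster label for each cell (at least two distinct clusters)
    Returns one list of gene indices per cluster, best marker first.
    """
    n_genes = len(expr[0])
    ranked = []
    for c in sorted(set(labels)):
        inside = [row for row, lab in zip(expr, labels) if lab == c]
        outside = [row for row, lab in zip(expr, labels) if lab != c]
        ratios = []
        for g in range(n_genes):
            mean_in = sum(row[g] for row in inside) / len(inside)
            mean_out = sum(row[g] for row in outside) / len(outside)
            # The pseudocount avoids division by zero for unexpressed genes.
            ratios.append((mean_in + pseudocount) / (mean_out + pseudocount))
        ranked.append(sorted(range(n_genes), key=lambda g: ratios[g], reverse=True))
    return ranked


# Toy data: 4 cells x 3 genes, two clusters (all values are made up).
gene_names = ["geneA", "geneB", "geneC"]
expr = [
    [9, 1, 2],
    [8, 0, 2],
    [1, 7, 2],
    [0, 9, 2],
]
cell_labels = [0, 0, 1, 1]

toy_markers = [[gene_names[g] for g in row]
               for row in rank_markers_by_ratio(expr, cell_labels)]
print(toy_markers[1][0])  # geneB  (top-ranked marker for cluster 1)
print(toy_markers[0][2])  # geneB  (third-ranked, i.e. index 2, for cluster 0)
```

Because this sketch uses nested lists, it indexes with `toy_markers[i][j]` rather than the `markers[i,j]` form used on the array returned by SAM.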

Visualizing the top 3 markers for the first two clusters:

In [None]:
for i in range(2):
    for j in range(3):
        sam.show_gene_expression(markers[i,j])

## Identifying correlated groups of genes

SAM also has a built-in function to identify correlated groups of genes: `sam.corr_bin_genes`. This function is automatically called at the end of `sam.run`. We can plot the highest-ranked genes from each bin using `sam.plot_correlated_groups`.

In [None]:
#sam.corr_bin_genes()
sam.plot_correlated_groups()

This function can also be used to identify genes that are correlated with a particular gene of interest by using the `input_gene` argument.

In [None]:
genes = sam.corr_bin_genes(input_gene=markers[0,0])

for i in range(3):
    sam.show_gene_expression(genes[i])
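
As a rough picture of what "genes correlated with a gene of interest" means, the sketch below ranks genes by their Pearson correlation with a query gene. This is an assumption-laden illustration, not how `corr_bin_genes` is implemented: the helper names, the choice of plain Pearson correlation on raw values, and the toy expression profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def genes_correlated_with(expr_by_gene, query):
    """Return all other genes, most positively correlated with `query` first."""
    target = expr_by_gene[query]
    others = [g for g in expr_by_gene if g != query]
    return sorted(others, key=lambda g: pearson(target, expr_by_gene[g]), reverse=True)


# Toy expression profiles across five cells (values are made up).
expr_by_gene = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 11],   # rises with geneA
    "geneC": [5, 4, 3, 2, 1],    # falls as geneA rises
    "geneD": [3, 1, 4, 1, 5],    # only loosely related
}
correlated = genes_correlated_with(expr_by_gene, "geneA")
print(correlated)  # ['geneB', 'geneD', 'geneC']
```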