In [1]:
%matplotlib inline


performing filtering using besca
================================

This example demonstrates the entire process of filtering out cells and genes of subpar quality
before proceeding with the analysis.




In [2]:
import besca as bc
import scanpy as sc  # scanpy.api is deprecated and has been removed from recent scanpy releases
import matplotlib.pyplot as plt

#load example dataset
adata = bc.datasets.pbmc3k_raw()

#set standard filtering parameters
min_genes = 600
min_cells = 2
min_UMI = 600
max_UMI = 6500
max_mito = 0.05
max_genes = 1900

visualization of thresholds
---------------------------

First, each chosen threshold is plotted against the distribution of the corresponding quality metric to check that the cutoff is suitable for this dataset.



In [3]:
#visualize filtering thresholds
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6))= plt.subplots(ncols=3, nrows=2)
fig.set_figwidth(15)
fig.set_figheight(8)
fig.tight_layout(pad=4.5)

bc.pl.kp_genes(adata, min_genes=min_genes, ax = ax1)
bc.pl.kp_cells(adata, min_cells=min_cells, ax = ax2)
bc.pl.kp_counts(adata, min_counts=min_UMI, ax = ax3)
bc.pl.max_counts(adata, max_counts=max_UMI, ax = ax4)
bc.pl.max_mito(adata, max_mito=max_mito, annotation_type='SYMBOL', species='human', ax = ax5)
bc.pl.max_genes(adata, max_genes=max_genes, ax = ax6)

adding percent mitochondrial genes to dataframe for species human
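
The violin plots and filters below rely on three per-cell metrics stored in `adata.obs`: `n_counts`, `n_genes` and `percent_mito`. As a rough illustration of what these columns mean, the following sketch computes them with NumPy on a toy count matrix. It is not besca's implementation; the matrix, the gene symbols, and the assumption that human mitochondrial genes are recognized by an `MT-` symbol prefix are chosen here purely for illustration.

```python
import numpy as np

# Toy count matrix (hypothetical data): 4 cells (rows) x 5 genes (columns).
counts = np.array([
    [3, 0, 1, 0, 6],
    [0, 0, 0, 0, 0],
    [5, 2, 0, 1, 2],
    [1, 1, 1, 1, 0],
])
symbols = np.array(["MT-CO1", "CD3E", "MT-ND1", "MS4A1", "LYZ"])

# Total UMI counts per cell.
n_counts = counts.sum(axis=1)

# Number of genes with at least one count per cell.
n_genes = (counts > 0).sum(axis=1)

# Assumption: mitochondrial genes are those whose symbol starts with "MT-".
is_mito = np.char.startswith(symbols, "MT-")
mito_counts = counts[:, is_mito].sum(axis=1)

# Fraction of counts from mitochondrial genes; empty cells get 0 instead of NaN.
percent_mito = np.divide(
    mito_counts, n_counts,
    out=np.zeros(counts.shape[0]),
    where=n_counts > 0,
)

print(n_counts, n_genes, percent_mito)
```

Note that `percent_mito` is a fraction between 0 and 1 in this sketch, which matches how `max_mito = 0.05` is reported as 5.0 percent in the filtering log.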


application of filtering thresholds
-----------------------------------

The data is then filtered using the chosen thresholds. The quality metrics are plotted before and after filtering so that the effect of the filter can be compared.



In [4]:
#visualize data before filtering
sc.pl.violin(adata, ['n_counts', 'n_genes', 'percent_mito'], multi_panel=True, jitter = 0.4)

print('The AnnData object currently contains:', str(adata.shape[0]), 'cells and', str(adata.shape[1]), 'genes')
print(adata)

#perform filtering
adata = bc.pp.filter(adata, max_counts=max_UMI, max_genes=max_genes, max_mito=max_mito, min_genes=min_genes, min_counts=min_UMI, min_cells=min_cells)

#visualize data after filtering
sc.pl.violin(adata, ['n_counts', 'n_genes', 'percent_mito'], multi_panel=True, jitter = 0.4)

print('The AnnData object now contains:', str(adata.shape[0]), 'cells and', str(adata.shape[1]), 'genes')
print(adata)

The AnnData object currently contains: 737280 cells and 32738 genes
AnnData object with n_obs × n_vars = 737280 × 32738 
    obs: 'CELL', 'CONDITION', 'experiment', 'donor', 'n_counts', 'n_genes', 'percent_mito'
    var: 'ENSEMBL', 'SYMBOL'
started with  737280  total cells and  32738  total genes
removed 15 cells that expressed more than 1900 genes
removed 734965 cells that did not express at least 600  genes
removed 4 cells that had more than 6500  counts
removed 0 cells that did not have at least 600 counts
removed 17843 genes that were not expressed in at least 2 cells
removed  17  cells that expressed  5.0 percent mitochondrial genes or more
finished with 2279  total cells and 14895 total genes
The AnnData object now contains: 2279 cells and 14895 genes
AnnData object with n_obs × n_vars = 2279 × 14895 
    obs: 'CELL', 'CONDITION', 'experiment', 'donor', 'n_counts', 'n_genes', 'percent_mito'
    var: 'ENSEMBL', 'SYMBOL', 'n_cells'
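
To make the threshold semantics in the log above concrete, here is a minimal NumPy sketch of the same kind of filter expressed as boolean masks. It is not besca's implementation: the function name `filter_counts`, the toy data and the toy thresholds are hypothetical. The comparisons follow the wording of the log (more than `max_genes` or `max_counts` is removed, less than `min_genes` or `min_counts` is removed, `max_mito` or more is removed). The sketch applies all cell masks at once and then filters genes on the remaining cells, whereas the log shows besca filtering genes before the mitochondrial step, so counts on real data could differ slightly.

```python
import numpy as np

def filter_counts(counts, is_mito, min_genes, max_genes,
                  min_counts, max_counts, min_cells, max_mito):
    """Hypothetical helper: return filtered matrix plus cell and gene masks."""
    n_counts = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito_counts = counts[:, is_mito].sum(axis=1)
    percent_mito = np.divide(
        mito_counts, n_counts,
        out=np.zeros(counts.shape[0]),
        where=n_counts > 0,
    )

    # Keep cells that lie inside every threshold.
    keep_cells = (
        (n_genes >= min_genes) & (n_genes <= max_genes)
        & (n_counts >= min_counts) & (n_counts <= max_counts)
        & (percent_mito < max_mito)
    )

    # Keep genes detected in at least min_cells of the remaining cells.
    n_cells = (counts[keep_cells] > 0).sum(axis=0)
    keep_genes = n_cells >= min_cells

    return counts[keep_cells][:, keep_genes], keep_cells, keep_genes

# Toy data (hypothetical): 5 cells x 5 genes, genes 0 and 2 are mitochondrial.
counts = np.array([
    [3, 0, 1, 0, 6],   # high mitochondrial fraction
    [0, 0, 0, 0, 0],   # empty barcode
    [0, 2, 0, 1, 7],
    [0, 1, 0, 1, 2],
    [1, 4, 1, 5, 9],   # too many genes and counts
])
is_mito = np.array([True, False, True, False, False])

filtered, keep_cells, keep_genes = filter_counts(
    counts, is_mito,
    min_genes=2, max_genes=4, min_counts=3, max_counts=15,
    min_cells=2, max_mito=0.05,
)
print(filtered.shape)
```

In the run above, the per-step removals account for the reported totals: 737280 - 15 - 734965 - 4 - 0 - 17 = 2279 cells and 32738 - 17843 = 14895 genes. Almost all removed cells fall below the `min_genes` threshold.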
