# 10X Genomics - test dataset "3000 PBMCs From a Healthy Individual"

You can find this dataset and many other single-cell datasets here:

https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/pbmc3k

## 1 Import Python packages/libraries to use in analysis

In [None]:
import scanpy as sc
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import colors
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
from gprofiler import GProfiler
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=170, color_map='viridis')  # low dpi (dots per inch) yields small inline figures
sc.logging.print_header()
results_file = './PBMCs3000.h5ad'
results_file_denoised = './PBMCs3000_deno.h5ad'

Load the data

In [None]:
adata = sc.read_10x_mtx(
    './PBMC/hg19/',            # the directory with the `.mtx` file
    var_names='gene_symbols',  # use gene symbols for the variable names (variables-axis index)
    cache=True)                # write a cache file for faster subsequent reading

Make sure that the gene names are unique

In [None]:
adata.var_names_make_unique()  # this is unnecessary if using 'gene_ids'
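
To see why this step matters: several Ensembl gene IDs can share one gene symbol, so indexing by symbol can produce duplicate names. The following is a simplified, pure-Python illustration of the renaming idea (first occurrence kept, later duplicates given a numeric suffix). The helper `make_names_unique` is hypothetical and is not the library's implementation; in particular it does not handle the case where a suffixed name collides with an existing name.

```python
from collections import Counter

def make_names_unique(names, join="-"):
    """Keep the first occurrence of each name; suffix later repeats with -1, -2, ..."""
    seen = Counter()
    unique = []
    for name in names:
        if seen[name] == 0:
            unique.append(name)                          # first occurrence stays unchanged
        else:
            unique.append(f"{name}{join}{seen[name]}")   # later repeats get a counter suffix
        seen[name] += 1
    return unique

# Toy gene symbols, invented for illustration.
renamed = make_names_unique(["GENE_A", "GENE_B", "GENE_A", "GENE_A"])
print(renamed)
```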

Take a look at the data

In [None]:
adata

## 2 Pre-processing 

Below, try to run your own pre-processing on the data. Inspect the mitochondrial content (`percent_mito`), the number of genes detected in each cell, and the total number of counts (UMIs) in each cell. Then choose thresholds and decide which cells and genes to filter out. The filtering commands are provided below, but you need to decide which value to enter wherever you see a "?". Note that `percent_mito` is not created when the data is loaded: it must be computed and stored in `adata.obs` before the mitochondrial filter can run.
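
As a minimal sketch of what the three QC metrics mean, the toy example below computes them with NumPy on a small made-up count matrix. It assumes that mitochondrial genes are the ones whose symbol starts with `MT-` (the usual convention for human gene symbols) and that `percent_mito` is stored as a fraction between 0 and 1; both are assumptions, so adjust them if your convention differs.

```python
import numpy as np

# Toy count matrix (cells x genes); all values are invented for illustration.
gene_names = np.array(["MT-CO1", "MT-ND1", "CD3E", "LYZ", "NKG7"])
counts = np.array([
    [ 5,  5, 20, 10, 10],   # cell 0
    [40, 10,  0, 50,  0],   # cell 1
    [ 0,  0,  3,  0,  1],   # cell 2
])

# Assumption: mitochondrial genes are those whose symbol starts with "MT-".
is_mito = np.char.startswith(gene_names, "MT-")

n_counts = counts.sum(axis=1)                               # total counts per cell
n_genes = (counts > 0).sum(axis=1)                          # genes with at least one count per cell
percent_mito = counts[:, is_mito].sum(axis=1) / n_counts    # mitochondrial fraction per cell

print(n_counts, n_genes, percent_mito)
```

On the real object the same quantities would be computed from `adata.X` (a sparse matrix, so the row sums need flattening to one value per cell) and stored as columns of `adata.obs`, for example under the name `percent_mito` that the filtering cell below expects.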

In [None]:
# Filter cells according to identified QC thresholds:
print('Total number of cells: {:d}'.format(adata.n_obs))

sc.pp.filter_cells(adata, min_counts = ?)
print('Number of cells after min count filter: {:d}'.format(adata.n_obs))

sc.pp.filter_cells(adata, max_counts = ?)
print('Number of cells after max count filter: {:d}'.format(adata.n_obs))

# .copy() turns the filtered view into an independent AnnData object
adata = adata[adata.obs['percent_mito'] < ?].copy()
print('Number of cells after MT filter: {:d}'.format(adata.n_obs))

sc.pp.filter_cells(adata, min_genes = ?)
print('Number of cells after gene filter: {:d}'.format(adata.n_obs))
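
To make the effect of applying the filters one after another concrete, here is a toy sketch using boolean masks on invented per-cell metrics. The thresholds are hypothetical and were picked only so that each filter removes exactly one toy cell; they are not suggestions for the "?" values above.

```python
import numpy as np

# Toy per-cell QC metrics; all numbers are invented for illustration.
n_counts = np.array([200, 1500, 2500, 40000, 3000])
n_genes = np.array([150, 700, 900, 5000, 100])
percent_mito = np.array([0.02, 0.03, 0.30, 0.04, 0.01])

# Hypothetical thresholds, chosen only so that each filter removes one toy cell.
filters = [
    ("min count", n_counts >= 500),
    ("max count", n_counts <= 20000),
    ("MT", percent_mito < 0.2),
    ("gene", n_genes >= 200),
]

keep = np.ones(n_counts.shape[0], dtype=bool)
remaining = []
for label, mask in filters:
    keep &= mask                      # a cell survives only if it passes every filter so far
    remaining.append(int(keep.sum()))
    print(f"Number of cells after {label} filter: {remaining[-1]}")
```

Printing the remaining cell count after every step, as the exercise cell does, shows which threshold is responsible for most of the loss.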

In [None]:
# Filter genes:
print('Total number of genes: {:d}'.format(adata.n_vars))

# Min ? cells - filters out ? count genes
sc.pp.filter_genes(adata, min_cells=?)
print('Number of genes after cell filter: {:d}'.format(adata.n_vars))
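
As a rough illustration of what a `min_cells` threshold does, the sketch below counts, for each gene in a toy matrix, the number of cells with at least one count and keeps the genes that reach the threshold. The matrix and the value of `min_cells` are invented for the example and are not a recommendation for this dataset.

```python
import numpy as np

# Toy count matrix (cells x genes); values are invented for illustration.
counts = np.array([
    [1, 0, 0, 2],
    [3, 0, 0, 0],
    [0, 0, 4, 1],
])

min_cells = 2  # illustrative value only

cells_per_gene = (counts > 0).sum(axis=0)   # number of cells in which each gene is detected
keep = cells_per_gene >= min_cells          # genes detected in at least `min_cells` cells
filtered = counts[:, keep]

print(cells_per_gene, keep, filtered.shape)
```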