# Example analysis notebook
### Author: Bill Flynn (bill.flynn@jax.org)

Updates:
- 2018-09-28: creation

This notebook is a template for analysis notebooks using the [`scanpy`](https://github.com/theislab/scanpy) package and an extension to its API that I've written, tentatively called [`scanpy_recipes`](https://github.com/TheJacksonLaboratory/scanpy_recipes).

The functions that `scanpy_recipes` adds to the `scanpy` API are intended to simplify scRNA-seq analysis.  The package is also meant to integrate seamlessly with the normal `scanpy` API, so it only extends functionality.

This package is meant to be used with Jupyter notebooks, in particular with cell tags (`View -> Cell Toolbar... -> Tags`), which are used to hide certain cells when generating a final analysis report.

In [1]:
from scanpy_recipes.api import sc


The first major change to the analysis pipeline is to specify data about the samples you will analyze upfront.  To standardize this process, I've written a small parser and template.

Below, we initialize the parser and print the template.  We then copy the template into a new cell, fill it in, and read it back into the parser, which returns a `config` object.  This object is essentially a dict of dicts that holds the template information.  We can then pass this `config` object to other functions, which query it for information when needed.  Ultimately, most of this information gets passed on to and stored inside the `AnnData` objects that we use to store the scRNA-seq data.

Eventually, much of this metadata will be generated automatically by our pipelines, but while that work is still coming up to speed, we define it by hand below.
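
To make the "dict of dicts" idea concrete, here is a minimal sketch that parses a template-style string with Python's standard `configparser`.  It is only an illustration: that `AnalysisConfig` works in a similar way is an assumption, and `parse_config`, the sample name `SAMPLE_A`, and the analyst name are hypothetical.

```python
import configparser

# Hypothetical, shortened config in the same INI-like layout as the template.
example_config = """
[names]
analyst_name = Jane Doe

[sample_names]
SAMPLE_A =

[genomes]
SAMPLE_A = GRCh38
"""


def parse_config(text):
    """Parse INI-style text into a plain dict of dicts (section -> key -> value)."""
    parser = configparser.ConfigParser()
    # configparser lowercases keys by default; keep sample IDs case-sensitive.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


parsed = parse_config(example_config)
print(parsed["genomes"]["SAMPLE_A"])  # prints GRCh38
```

Keys with nothing after the `=`, such as the entries under `[sample_names]`, come back as empty strings in this sketch.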

In [2]:
ac = sc.AnalysisConfig()
ac.print_template()

config_string = """
[names]
customer_name = EXAMPLE_TEXT
analyst_name = EXAMPLE_TEXT
analysis_name = EXAMPLE_TEXT

[sample_names]
EXAMPLE_SAMPLE1 =
EXAMPLE_SAMPLE2 =
EXAMPLE_SAMPLE3 =

[genomes]
EXAMPLE_SAMPLE1 = EXAMPLE_GENOME1
EXAMPLE_SAMPLE2 = EXAMPLE_GENOME2
EXAMPLE_SAMPLE3 = EXAMPLE_GENOME3

[species]
hg19 = hsapiens
GRCh38 = hsapiens
mm10 = mmusculus

[input_dirs]
EXAMPLE_SAMPLE1 = EXAMPLE_DIR1
EXAMPLE_SAMPLE2 = EXAMPLE_DIR2
EXAMPLE_SAMPLE3 = EXAMPLE_DIR3

[output_dirs]
EXAMPLE_SAMPLE1 = EXAMPLE_DIR1
EXAMPLE_SAMPLE2 = EXAMPLE_DIR2
EXAMPLE_SAMPLE3 = EXAMPLE_DIR3

"""
The following sections are required:
['names', 'sample_names', 'genomes', 'species', 'input_dirs', 'output_dirs']
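
The required-section list printed above suggests a simple check that can be run on a filled-in template before analysis.  The following is a rough sketch using the standard `configparser`; `missing_sections` is a hypothetical helper, not part of `scanpy_recipes`, and how `AnalysisConfig` actually validates its input is not shown here.

```python
import configparser

# Section names taken from the printed template output.
REQUIRED_SECTIONS = [
    "names", "sample_names", "genomes", "species", "input_dirs", "output_dirs",
]


def missing_sections(text, required=REQUIRED_SECTIONS):
    """Return the required section names that do not appear in the config text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys case-sensitive
    parser.read_string(text)
    present = set(parser.sections())
    return [name for name in required if name not in present]


# Hypothetical, deliberately incomplete config.
incomplete = """
[names]
analyst_name = Jane Doe

[genomes]
SAMPLE_A = GRCh38
"""

print(missing_sections(incomplete))
# prints ['sample_names', 'species', 'input_dirs', 'output_dirs']
```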


In [3]:
config_string = """
[names]
customer_name = Anonymous person
analyst_name = Bill Flynn
analysis_name = Test-analysis

[sample_names]
PR18016 =
DM18007 =
DM18008 =

[genomes]
PR18016 = GRCh38
DM18007 = mm10
DM18008 = mm10

[species]
hg19 = hsapiens
GRCh38 = hsapiens
mm10 = mmusculus

[input_dirs]
PR18016 = /projects/flynnb/singlecell/IBC/PR18016/
DM18007 = DM18007/
DM18008 = DM18008/

[output_dirs]
PR18016 = /projects/flynnb/singlecell/IBC/PR18016_outputs2/
DM18007 = DM18007_outputs/
DM18008 = DM18008_outputs/
"""
config = ac.read(config_string)

In [4]:
test = sc.load_10x_data("PR18016", config)

Variable names are not unique. To make them unique, call `.var_names_make_unique`.


In [11]:
test.uns

{'analysis_version': 1,
 'analyst': 'Bill Flynn',
 'customer_name': 'Anonymous person',
 'date_created': '2018-09-28T18-40-45',
 'empty_genes': 10672,
 'genome': 'GRCh38',
 'input_dir': '/projects/flynnb/singlecell/IBC/PR18016',
 'input_file': '/projects/flynnb/singlecell/IBC/PR18016/filtered_gene_bc_matrices_h5.h5',
 'output_dir': '/projects/flynnb/singlecell/IBC/PR18016_outputs2',
 'raw_cells': 2145,
 'raw_genes': 33694,
 'sampleid': 'PR18016',
 'species': 'hsapiens'}

In [6]:
sc.qc.gen_qc(test)

In [7]:
test.uns

OrderedDict([('sampleid', 'PR18016'),
             ('genome', 'GRCh38'),
             ('species', 'hsapiens'),
             ('analyst', 'Bill Flynn'),
             ('customer_name', 'Anonymous person'),
             ('analysis_version', 1),
             ('date_created', '2018-09-28T18-40-45'),
             ('input_file',
              '/projects/flynnb/singlecell/IBC/PR18016/filtered_gene_bc_matrices_h5.h5'),
             ('input_dir', '/projects/flynnb/singlecell/IBC/PR18016'),
             ('output_dir',
              '/projects/flynnb/singlecell/IBC/PR18016_outputs2'),
             ('raw_cells', 2145),
             ('raw_genes', 33694),
             ('empty_genes', 10672)])

In [8]:
sc.qc.run_qc(test)

(2145, 33694) (2145, 19727)


In [9]:
test.uns

{'analysis_version': 1,
 'analyst': 'Bill Flynn',
 'customer_name': 'Anonymous person',
 'date_created': '2018-09-28T18-40-45',
 'empty_genes': 10672,
 'genome': 'GRCh38',
 'input_dir': '/projects/flynnb/singlecell/IBC/PR18016',
 'input_file': '/projects/flynnb/singlecell/IBC/PR18016/filtered_gene_bc_matrices_h5.h5',
 'output_dir': '/projects/flynnb/singlecell/IBC/PR18016_outputs2',
 'raw_cells': 2145,
 'raw_genes': 33694,
 'sampleid': 'PR18016',
 'species': 'hsapiens'}
