/
examples.rst
154 lines (117 loc) · 5.24 KB
/
examples.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
Examples
******************************
Analysis example
==============================
The following is an example of how to use ``ngs_toolkit`` in a ATAC-seq project.
While straightforward, it still allows considerable customization due to the modularity of the toolkit and the parametrization of most functions (this example uses default values everywhere nonetheless).
.. note::
``ngs_toolkit`` from version 0.25.0 on uses the `PEP 2.0 specification <http://pep.databio.org/en/2.0.0/specification/>`_. If you have a PEP made in an earlier version, you must update it in order to use ``ngs_toolkit``>=0.25.0.
We have the following `PEP project <http://pep.databio.org/>`_ config YAML file:
.. code-block:: yaml
pep_version: "2.0.0"
name: example_project
description: example_project
username: user
email: user@email.com
sample_table: annotation.csv
subsample_table:
comparison_table: comparison_table.csv
submission_subdir: submission
results_subdir: data
output_dir: example_project
pipeline_interfaces: /home/user/workspace/open_pipelines/pipeline_interface.yaml
sample_attributes:
- sample_name
- genotype
- replicate
group_attributes:
- genotype
- replicate
sample_modifiers:
imply:
- if:
organism: 'human'
then:
genome: 'hg38'
derive:
attributes: [data_source]
sources:
local: data/{sample_name}.bam
bsf: /scratch/lab_bsf/samples/{flowcell}/{flowcell}_{lane}_samples/{flowcell}_{lane}#{BSF_name}.bam
The following sample annotation CSV file, 'annotation.csv':
.. csv-table:: Annotation table for example
:header: "sample_name", "protocol", "genotype", "replicate", "organism", flowcell, lane
"ATAC-seq_KOA_r1", "ATAC-seq", KO_A", "1", "human", "C0AXX", "1"
"ATAC-seq_KOA_r2", "ATAC-seq", KO_A", "2", "human", "C0AXX", "1"
"ATAC-seq_KOB_r1", "ATAC-seq", KO_B", "1", "human", "C0AXX", "1"
"ATAC-seq_KOB_r2", "ATAC-seq", KO_B", "2", "human", "C0AXX", "1"
"ATAC-seq_WT_r1", "ATAC-seq", WT", "1", "human", "C0AXX", "1"
"ATAC-seq_WT_r2", "ATAC-seq", WT", "2", "human", "C0AXX", "1"
And the following comparison table, 'comparison_table.csv':
.. csv-table:: Comparison table for example
:header: "comparison_name", "comparison_side", "sample_name", "sample_group"
:widths: 30, 30, 30, 30
"KOA_vs_WT", "1", "ATAC-seq_KOA_r1", "KO_A"
"KOA_vs_WT", "1", "ATAC-seq_KOA_r2", "KO_A"
"KOA_vs_WT", "0", "ATAC-seq_WT_r1", "WT"
"KOA_vs_WT", "0", "ATAC-seq_WT_r2", "WT"
"KOB_vs_WT", "1", "ATAC-seq_KOB_r1", "KO_B"
"KOB_vs_WT", "1", "ATAC-seq_KOB_r2", "KO_B"
"KOB_vs_WT", "0", "ATAC-seq_WT_r1", "WT"
"KOB_vs_WT", "0", "ATAC-seq_WT_r2", "WT"
ATAC-seq analysis example
-------------------------------
.. code-block:: python
import os
from ngs_toolkit.atacseq import ATACSeqAnalysis
# Start project and analysis objects
analysis = ATACSeqAnalysis(from_pep="project_config.yaml")
# Generate consensus peak set and annotate it
## get consensus peak set from all samples
analysis.get_consensus_sites()
## annotate peak set with genomic context
analysis.get_peak_genomic_location()
## annotate peak set with chromatin context
analysis.get_peak_chromatin_state(
os.path.join(
analysis.data_dir,
"external",
"E032_15_coreMarks_mnemonics.bed"))
## annotate peak set with genes
analysis.get_peak_gene_annotation()
# Use accessibility quantitatively
## get coverage values for each peak in each sample of ATAC-seq
analysis.measure_coverage()
# Normalize accessibility (quantile normalization + GC correction, requires cqn R library)
analysis.normalize(method="cqn")
# Annotate normalized accessibility with sample and region info
# # annotate dataframe with peak metadata
analysis.annotate_features()
# # annotate dataframe with sample metadata
analysis.accessibility = analysis.annotate_samples()
# UNSUPERVISED ANALYSIS
# # plot pairwise sample correlations,
# # perform dimensionality reduction (MDS, PCA)
# # and plot samples in this spaces, annotated with their attributes
analysis.unsupervised_analysis()
# SUPERVISED ANALYSIS
# # differential analysis with DESeq2
analysis.differential_analysis()
# # plot scatter, volcano, MA, heatmaps on the differential regions
# # by groups and with individual samples, with normalized values
# # and scalled values (Z-score).
analysis.plot_differential(
alpha=0.05,
corrected_p_value=True,
fold_change=1)
# # perform enrichment analysis on differnetial region sets
# # using LOLA, MEME-AME, HOMER and Enrichr
analysis.differential_enrichment(
directional=True,
max_diff=1000,
sort_var="pvalue")
# # for each type of enrichment results,
# # plot bar and scatter plots of odds ratio vs p-value,
# # heatmaps of enrichment across terms for each comparison
# # and comparison correlation in enrichment terms
analysis.plot_differential_enrichment()