Back to main doc.


Config options


general

Config keyword Description
organism_defaults Name of a file in SeA-SnaP/defaults/ where default values for an organism are defined
organism: (overwrite organism_defaults)
|---name name of the organism, e.g. "human"
|---genus name of the genus, e.g. "Homo Sapiens"
|---taxon taxon number, e.g. 9606
|---files:
...|---genome path to a genome file (.fa); required
...|---transcriptome path to a transcriptome file; if empty (""), it is generated automatically
...|---gtf path to a gtf file with genome annotation; required
...|---bed path to a bed file; only required for infer_experiment
...|---seqc_gtf path to a gtf file; only required for qc
|---star_index path to a folder with indices for STAR; auto-generated if empty
|---salmon_index path to a folder with indices for Salmon; auto-generated if empty
|---R:
...|---annotations annotation string for R's AnnotationDbi, e.g. "org.Hs.eg.db"
pipeline_param: (general pipeline settings)
|---out_path_pattern path pattern for output files. Wildcards can be used inside braces {...}.
Available wildcards: {step}, {extension} and
---mapping: {sample}, {mate}, {batch}, {flowcell}, {lane}, {library}
---DE: {contrast}, {mapping}
default mapping: mapping/{step}/{sample}.{mate}/out/{step}.{sample}.{mate}.{extension}
default DE: DE/{contrast}/{step}/out/{step}.{contrast}.{extension}
|---log_path_pattern path pattern for log files. Wildcards can be used inside braces {...}.
Available wildcards: {step}, {extension} and
---mapping: {sample}, {mate}, {batch}, {flowcell}, {lane}, {library}
---DE: {contrast}, {mapping}
default mapping: mapping/{step}/{sample}.{mate}/report/{step}.{sample}.{mate}.{extension}
default DE: DE/{contrast}/{step}/report/{step}.{contrast}.{extension}
|---in_path_pattern path pattern for input files. Wildcards can be used inside braces {...}.
Available wildcards mapping: {sample}, {mate}, {batch}, {flowcell}, {lane}, {library}
Available wildcards DE: same as out_path_pattern for mapping
default mapping: ../input/{sample}/{sample}.{mate}
default DE: mapping/{step}/{sample}.{mate}/out/{step}.{sample}.{mate}.{extension}
|---report_snippets directory containing Rmd snippets; default: SeA-SnaP/report/
|---input_choice: (set choices for the choose_input() path handler method)
...|---mapping For DE: list of rules to use as an input for DESeq2; first entry used if no wildcard {mapping} was set in the out_path_pattern
Options: "import_gene_counts" for input from STAR's gene counts, "import_featurecounts" for input from featureCounts, and "import_sf" for input from Salmon sf files
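
As an illustration, the general keys above might be combined in a mapping config roughly as follows. This is only a sketch: it assumes the config is written in YAML and that keys nest the way the tree above indicates. All file paths are hypothetical placeholders, and the path patterns are the mapping defaults listed in the table.

```yaml
# Illustrative sketch only; every path below is a hypothetical placeholder.
organism:
  name: "human"
  genus: "Homo Sapiens"
  taxon: 9606
  files:
    genome: "/path/to/genome.fa"          # required
    transcriptome: ""                     # empty: generated automatically
    gtf: "/path/to/annotation.gtf"        # required
    bed: "/path/to/annotation.bed"        # only needed for infer_experiment
  star_index: ""                          # empty: auto-generated
  salmon_index: ""                        # empty: auto-generated
  R:
    annotations: "org.Hs.eg.db"

pipeline_param:
  # default mapping patterns, copied from the table above
  out_path_pattern: "mapping/{step}/{sample}.{mate}/out/{step}.{sample}.{mate}.{extension}"
  log_path_pattern: "mapping/{step}/{sample}.{mate}/report/{step}.{sample}.{mate}.{extension}"
  in_path_pattern: "../input/{sample}/{sample}.{mate}"
```

With this out_path_pattern, hypothetical wildcard values step "star", sample "S1", mate "R1" and extension "bam" would expand to mapping/star/S1.R1/out/star.S1.R1.bam.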

mapping pipeline

Config keyword Description
pipeline_param: (general pipeline settings)
|---mapping_results list of mapping algorithms to run. Options: "salmon-transcript_counts" for Salmon, "star-gene_counts" for STAR
|---QC_results list of QC steps to perform. Options: "fastqc", "dupradar", "infer_experiment"
rule_options: (set parameters for rules)
|---star:
...|---cmd_opt a string with additional command line options for STAR
...|---trim trim fastq files? Options: "yes" or "no"
|---star_index:
...|---cmd_opt a string with additional command line options for STAR index generation; e.g. set "--sjdbOverhang <read length - 1>"
|---salmon:
...|---cmd_opt a string with additional command line options for Salmon
...|---trim trim fastq files? Options: "yes" or "no"
|---salmon_index:
...|---cmd_opt a string with additional command line options for Salmon index generation
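
A mapping-pipeline config using these keys could look roughly like the sketch below. It assumes YAML syntax and the nesting shown in the tree above. The read length of 100 bp is a made-up example value, and "yes" is quoted exactly as in the table because the table does not say whether a string or a YAML boolean is expected.

```yaml
# Illustrative sketch only; values are examples, not recommendations.
pipeline_param:
  mapping_results: ["star-gene_counts", "salmon-transcript_counts"]
  QC_results: ["fastqc", "dupradar", "infer_experiment"]

rule_options:
  star:
    cmd_opt: ""                       # no additional STAR options
    trim: "yes"                       # written as in the table above
  star_index:
    # assumes 100 bp reads, so read length - 1 = 99
    cmd_opt: "--sjdbOverhang 99"
  salmon:
    cmd_opt: ""                       # no additional Salmon options
    trim: "yes"
  salmon_index:
    cmd_opt: ""
```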

DE pipeline

Config keyword Description
experiment: (settings about the experiment)
|---covariate_file:
...|---star covariate file to use for input from STAR; default: covariate_file.txt
...|---salmon covariate file to use for input from Salmon; default: covariate_file.txt
|---design_formula design formula to be used by DESeq2; default: "~ group"
|---columns: (define level order for specific columns; default: empty)
...|--- <column name> list with the level names of the column in the desired order (the first level is the reference by default)
filters: (set filters for input data)
|---low_counts exclude genes with counts lower than this value; default: 0
|---min_counts include only features with at least min_counts counts in at least min_count_n samples; default: not set
|---min_count_n (see above)
|---experiment_blacklist exclude certain entries of the covariate file from analysis
given as a dictionary of the form: {<column name>: [<entry>, ...]}
e.g. exclude samples: {"group": ["sample1", "sample2"]}
default: {}
|---experiment_whitelist only allow certain entries of the covariate file for analysis
given as a dictionary of the form: {<column name>: [<entry>, ...]}
contrasts: (define which contrasts to produce and how)
|---contrast_list: List of contrast definitions (see following); default: empty
...|---(list entry)
......|---title name of contrast, e.g. "nonclassical vs classical"
......|---ratio
.........|---column column in covariate file, e.g. "condition"
.........|---numerator numerator of the contrast (a level of column), e.g. "nonclassical"
.........|---denominator denominator of the contrast (a level of column), e.g. "classical"
......|---coef alternative to ratio; the coefficient of DESeq2 results, e.g. "condition_classical_vs_nonclassical"
......|---vector alternative to ratio; a list with entries corresponding to columns in the design matrix, defining the linear combination, e.g. [1,1,0,-1,0]
......|---goseq whether to run GO and KEGG enrichment analysis with goseq; "true" or "false"
......|---cluster_profiler
.........|---run whether to run cluster_profiler; "true" or "false"
.........|---MSigDb: test for MSigDb annotation:
............|---categories: list of MSigDb categories to test; e.g. ["H","C1","C2"]; if categories is not set, tests all categories
............|---type whether to run gene set enrichment analysis or overrepresentation analysis; options: "gsea" or "ora"; default: "gsea"
.........|---GO: test for GO annotation:
............|---ontologies: list of GO ontologies to test; e.g. ["MF","BP","CC"]; if ontologies is not set, tests all three
............|---type whether to run gene set enrichment analysis or overrepresentation analysis; options: "gsea" or "ora"; default: "gsea"
............|---pval: p-value cutoff to use; default 0.05
............|---qval: q-value cutoff to use, only for ORA; default 0.2
.........|---KEGG: test for KEGG pathway annotation:
............|---type whether to run gene set enrichment analysis or overrepresentation analysis; options: "gsea" or "ora"; default: "gsea"
............|---kegg_organism_code kegg code for the organism; e.g. "mmu" for mouse or "hsa" for human
............|---pval: p-value cutoff to use; default 0.05
............|---qval: q-value cutoff to use, only for ORA; default 0.2
.........|---KEGG_modules: test for KEGG module annotation:
............|---type whether to run gene set enrichment analysis or overrepresentation analysis; options: "gsea" or "ora"; default: "gsea"
............|---kegg_organism_code kegg code for the organism; e.g. "mmu" for mouse or "hsa" for human
............|---pval: p-value cutoff to use; default 0.05
............|---qval: q-value cutoff to use, only for ORA; default 0.2
......|---... any key from defaults (overwrite them for this contrast)
|---defaults:
...|---max_p_adj FDR cutoff 'alpha' for DESeq2's results function; default: 0.1
...|---ranking_by rank results by (column in results table): log2FoldChange for log2 fold change (the effect size estimate), pvalue for the p-value, padj for the multiple testing corrected p-value
...|---ranking_order R expression for ordering (with ranking_by) with 'x' as input, e.g. "-abs(x)"
...|---results_parameters:
......|---lfcThreshold test for a log fold change higher than this value; default: 0
......|---altHypothesis alternative hypothesis of the test
Options: greater, less, greaterAbs, lessAbs (see the documentation of DESeq2's results function)
default: greaterAbs
......|---independentFiltering perform independent filtering; "yes" or "no"
...|---lfcShrink_parameters:
......|---type algorithm to use for log fold change shrinkage; Options: none, apeglm, ashr, normal
default: "none"
...|---ORA:
......|---fdr_threshold FDR threshold to determine which results to use for over-representation analysis; default: 0.1
report: (define which snippets to include in the report)
|---report_snippets List of report snippets (Rmd files) in the report/ directory. Snippets will be appended in the order defined in this list (see section Adding Rmd Snippets)
|---defaults:
...|---... default lists of report snippets (Rmd files) added to the report in blocks; a dict with block name as key and snippet list as value
|---snippet_parameters: parameters used in report snippets:
...|---Normalisation_QC: parameters used in Normalisation_QC sub-folder:
......|---n_most_varying analyse the n most varying genes
......|---annotation_columns columns of the covariate table used to annotate PCA and clustering
...|---contrast: parameters used in contrast sub-folder:
......|---filter_results: parameters to filter DESeq2 results table:
.........|---qval q-value threshold; only rows with q-value < qval are displayed; default: 0.1
......|---filter_goseq: parameters to filter goseq results table:
.........|---qval q-value threshold; only rows with q-value < qval are displayed; default: 0.1
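
To show how the DE keys fit together, here is a sketch of a config with a single contrast. It assumes YAML syntax and the nesting shown in the tree above. The covariate column "condition", its levels, and the filter values are hypothetical, and "true"/"false" are quoted as in the table because the table does not say whether strings or YAML booleans are expected.

```yaml
# Illustrative sketch only; column names, levels and thresholds are hypothetical.
experiment:
  covariate_file:
    star: "covariate_file.txt"
  design_formula: "~ condition"
  columns:
    condition: ["classical", "nonclassical"]   # first level is the reference

filters:
  min_counts: 10        # keep features with at least 10 counts ...
  min_count_n: 3        # ... in at least 3 samples
  experiment_blacklist: {}

contrasts:
  contrast_list:
    - title: "nonclassical vs classical"
      ratio:
        column: "condition"
        numerator: "nonclassical"
        denominator: "classical"
      goseq: "false"
      cluster_profiler:
        run: "true"
        GO:
          ontologies: ["BP"]
          type: "ora"
          pval: 0.05
          qval: 0.2
      max_p_adj: 0.05   # overwrites defaults.max_p_adj for this contrast only
  defaults:
    max_p_adj: 0.1
    ranking_by: "log2FoldChange"
    ranking_order: "-abs(x)"
    results_parameters:
      lfcThreshold: 0
      altHypothesis: "greaterAbs"
    lfcShrink_parameters:
      type: "none"
```

In this sketch the contrast uses ratio; according to the table, coef or vector could be given instead of ratio.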

Back to main doc.