Output file descriptions
This text file contains the overall gene enrichment results determined by MutEnricher.
Columns:
- Gene: Gene name from GTF.
- coordinates: Genomic coordinates of gene, from first to last annotated exon.
- num_nonsilent: Total non-silent mutations in gene across samples.
- num_bg: Total silent mutations identified within gene coordinates in samples.
- full_length: Total gene length in basepairs (corresponding to the coordinates in column 2).
- coding_length: Total length of gene coding domains (e.g. sum of CDS regions in GTF).
- bg_type: String indicating method used to estimate gene's background rate; one of global, local, or clustered_regions.
- bg_prob: Gene background mutation rate used in negative binomial tests.
- gene_pval: Raw p-value of negative binomial test for gene.
- FDR_BH: Benjamini-Hochberg FDR-corrected p-value for gene.
- num_samples: Number of samples possessing a non-silent somatic mutation in gene.
- nonsilent_position_counts: Semi-colon-separated list of genomic positions containing non-silent mutations along with counts; in format [position]_[count].
- nonsilent_mutation_counts: Semi-colon-separated list of genomic positions with base alterations and counts; in format [position]_[reference base(s)]_[alternate base(s)]_[count].
- samples: Semi-colon-separated list of sample IDs containing non-silent somatic mutations in gene.
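The two count columns described above pack several values into one field, so a small parser can be handy when post-processing results. The following is a minimal sketch, not part of MutEnricher: the function names are hypothetical, the example values are invented, and it assumes the underscore-delimited formats given in the column descriptions (positions are kept as strings because their exact form is not specified here).

```python
def parse_position_counts(field):
    """Parse '[position]_[count];...' into a list of (position, count) tuples."""
    if not field or field == "NA":
        return []
    pairs = []
    for item in field.split(";"):
        # Split from the right so a position containing '_' stays intact.
        position, count = item.strip().rsplit("_", 1)
        pairs.append((position.strip(), int(count)))
    return pairs


def parse_mutation_counts(field):
    """Parse '[position]_[ref]_[alt]_[count];...' into (position, ref, alt, count) tuples."""
    if not field or field == "NA":
        return []
    records = []
    for item in field.split(";"):
        position, ref, alt, count = [part.strip() for part in item.strip().rsplit("_", 3)]
        records.append((position, ref, alt, int(count)))
    return records


# Invented example values, for illustration only.
positions = parse_position_counts("25398284_3;25398285_1")
mutations = parse_mutation_counts("25398284_C_T_2;25398284_C_A_1")
```

Splitting from the right is a deliberate choice here: the count is always the last element, whatever the position string looks like.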
This text file contains the results of the hotspot enrichment procedure.
Columns:
- Gene: Gene name from GTF.
- hotspot: Genomic coordinates of tested hotspot.
- num_mutations: Number of non-silent somatic mutations considered in hotspot test.
- hotspot_length: Length of hotspot window.
- effective_length: Length of hotspot window adjusted for cohort size (i.e. hotspot length times number of samples).
- bg_type: String indicating method used to estimate gene's background rate; one of global, local, or clustered_regions.
- bg_prob: Background mutation rate used in negative binomial test for hotspot.
- pval: Raw p-value of negative binomial test for hotspot.
- FDR_BH: Benjamini-Hochberg FDR-corrected p-value for hotspot.
- num_samples: Number of samples possessing a non-silent somatic mutation in hotspot window.
- position_counts: Semi-colon-separated list of genomic positions in hotspot containing non-silent mutations, including counts; in format [position]_[count].
- mutation_counts: Semi-colon-separated list of genomic positions in hotspot, including base alterations and counts; in format [position]_[reference base(s)]_[alternate base(s)]_[count].
- samples: Semi-colon-separated list of sample IDs containing non-silent somatic mutations in gene.
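For readers who want to see how an FDR_BH value relates to the raw pval column, the standard Benjamini-Hochberg adjustment can be sketched as below. This is a generic textbook implementation under the assumption that all tested p-values are corrected together; it is not taken from the MutEnricher code, and the function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values, for illustration only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is the raw p-value scaled by n/rank, capped so that adjusted values never decrease as the raw p-values increase.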
This text file contains combined significance results for the overall gene region (1 above) and candidate hotspots (if found, 2 above) using Fisher's method.
Columns:
- Gene: Gene name from GTF.
- coordinates: Genomic coordinates of gene, from first to last annotated exon.
- num_nonsilent: Total non-silent mutations in gene across samples.
- num_bg: Total silent mutations identified within gene coordinates in samples.
- full_length: Total gene length in basepairs (corresponding to the coordinates in column 2).
- coding_length: Total length of gene coding domains (e.g. sum of CDS regions in GTF).
- bg_type: String indicating method used to estimate gene's background rate; one of global, local, or clustered_regions.
- bg_prob: Gene background mutation rate used in negative binomial tests.
- gene_pval: Raw p-value of negative binomial test for gene.
- hotspot_pvals: Semi-colon-separated list of p-values associated with identified gene hotspots (NA if no hotspots found).
- Fisher_pval: Fisher combined p-value of the gene_pval (column 9) and hotspot_pvals (column 10) values.
- Fisher_FDR: Benjamini-Hochberg FDR-corrected Fisher p-value.
- num_samples: Number of samples possessing a non-silent somatic mutation in gene.
- nonsilent_position_counts: Semi-colon-separated list of genomic positions containing non-silent mutations along with counts; in format [position]_[count].
- nonsilent_mutation_counts: Semi-colon-separated list of genomic positions with base alterations and counts; in format [position]_[reference base(s)]_[alternate base(s)]_[count].
- samples: Semi-colon-separated list of sample IDs containing non-silent somatic mutations in gene.
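As a rough illustration of what the Fisher_pval column represents, Fisher's method sums -2·ln(p) over k p-values and compares the result to a chi-square distribution with 2k degrees of freedom. The sketch below uses the closed-form survival function that exists for even degrees of freedom, so it needs only the standard library. It is a generic version of the method: the function name and the small floor applied to p-values are assumptions, and how MutEnricher treats multiple hotspot p-values for one gene is not described here.

```python
import math


def fisher_combined_pvalue(pvals, floor=1e-300):
    """Combine p-values with Fisher's method (chi-square with 2k degrees of freedom)."""
    k = len(pvals)
    if k == 0:
        raise ValueError("at least one p-value is required")
    # Floor the p-values so log() is defined when a p-value is reported as 0.
    statistic = -2.0 * sum(math.log(max(p, floor)) for p in pvals)
    half = statistic / 2.0
    # Survival function for 2k d.o.f.: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Invented p-values, for illustration only.
combined = fisher_combined_pvalue([0.01, 0.04])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.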
This is a Python pickle file containing the mutation data and calculations used in the enrichment analysis. It holds a Python list of Gene objects (instances of the Gene class defined in the coding analysis code). To inspect this information, load the file in Python with:
# In Python 2
import sys, cPickle
sys.path.insert(0, '/MutEnricher/install/path/')  # Make MutEnricher install path available
sys.path.insert(0, '/MutEnricher/install/path/math_funcs/')  # Add path to math functions as well
with open('/path/to/output/example_gene_data.pkl', 'rb') as fh:
    genes = cPickle.load(fh)  # Load gene data pickle file
# In Python 3
import sys
import pickle  # Uses the C implementation automatically when available
sys.path.insert(0, '/MutEnricher/install/path/')  # Make MutEnricher install path available
sys.path.insert(0, '/MutEnricher/install/path/math_funcs/')  # Add path to math functions as well
with open('/path/to/output/example_gene_data.pkl', 'rb') as fh:
    genes = pickle.load(fh)  # Load gene data pickle file
To find the information for a particular gene, search the list by name:
gene_of_interest = 'KRAS'
index = None
for g in genes:
    if g.name == gene_of_interest:
        index = g.index
        break
kras = genes[index]
The above code extracts the Gene object for the gene KRAS. The user can now observe internal information associated with this gene.
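When several genes need to be looked up, building a name-keyed dictionary once avoids repeating the loop. The sketch below is an illustration only: StandInGene is a hypothetical placeholder used so the example runs without a real pickle file, and the only attribute it borrows from the text above is name. With real data, the list loaded from the pickle file would be passed in instead.

```python
class StandInGene(object):
    """Hypothetical placeholder for a loaded Gene object; only `name` is assumed."""

    def __init__(self, name):
        self.name = name


def index_by_name(items):
    """Map each object's name to the object (a later duplicate name overwrites an earlier one)."""
    return {item.name: item for item in items}


# Stand-in list; with real output this would be the unpickled `genes` list.
genes = [StandInGene("TP53"), StandInGene("KRAS")]
by_name = index_by_name(genes)

kras = by_name.get("KRAS")       # Returns None if the gene is absent
attribute_names = sorted(vars(kras))  # Lists the attributes stored on the object
```

Using `.get()` returns None for a missing gene rather than raising an error, and `vars()` is a generic way to see which attributes an object carries without knowing the class definition in advance.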
Text file containing run information, including MutEnricher version, input files, optional parameter values, and notes about the number of genes/hotspots tested.
This text file contains the combined enrichment results for the overall region (from the negative binomial enrichment procedure) and the weighted average proximity (WAP) clustering procedure. P-values are combined with Fisher's method.
Columns:
- Region: Genomic coordinates of region (from input BED file).
- region_name: Name assigned to region (from 4th column of BED file if present; assigned internally otherwise).
- num_mutations: Total number of somatic mutations in region across samples.
- length: Length of region in basepairs.
- effective_length: Length of region multiplied by number of samples.
- bg_type: String indicating method used to estimate region's background rate; one of global, local, or clustered_regions.
- bg_prob: Region background mutation rate used in negative binomial tests.
- region_pval: Raw p-value from negative binomial test of region.
- WAP: Statistic from weighted average proximity procedure performed on region.
- WAP_pval: Permutation p-value of WAP procedure.
- Fisher_pval: Fisher combined p-value of the region_pval (column 8) and WAP_pval (column 10) values.
- FDR_BH: Benjamini-Hochberg FDR-corrected Fisher p-value.
- num_samples: Number of samples possessing a somatic mutation in region.
- position_counts: Semi-colon-separated list of genomic positions containing somatic mutations along with counts; in format [position]_[count].
- mutation_counts: Semi-colon-separated list of genomic positions with base alterations and counts; in format [position]_[reference base(s)]_[alternate base(s)]_[count].
- samples: Semi-colon-separated list of sample IDs containing somatic mutations in region.
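A common follow-up step is to pull out the rows that pass an FDR cutoff. The sketch below reads a results table and filters on the FDR_BH column. It rests on assumptions that are not stated above: that the file is tab-delimited and that its first row is a header using the column names listed. The function name, the 0.1 threshold, and the in-memory rows (shortened to four columns) are invented for illustration; with real output, an opened results file would be passed instead.

```python
import csv
import io


def significant_rows(handle, fdr_column="FDR_BH", threshold=0.1):
    """Return rows (as dicts) whose FDR value is at or below the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        try:
            fdr = float(row[fdr_column])
        except (KeyError, TypeError, ValueError):
            # Skip rows with a missing or non-numeric value such as 'NA'.
            continue
        if fdr <= threshold:
            hits.append(row)
    return hits


# Invented rows standing in for a real results file (assumed tab-delimited).
example = io.StringIO(
    "Region\tregion_name\tFisher_pval\tFDR_BH\n"
    "chr5:1295773-1296014\tregion_A\t1e-6\t0.001\n"
    "chr1:100-200\tregion_B\t0.4\t0.8\n"
    "chr2:300-400\tregion_C\tNA\tNA\n"
)
hits = significant_rows(example)
hit_names = [row["region_name"] for row in hits]
```

Because the column name is a parameter, the same function could be pointed at a Fisher_FDR column in the combined files, assuming they share the same layout.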
This text file contains the results of the hotspot enrichment procedure using negative binomial tests.
Columns:
- Hotspot: Genomic coordinates of hotspot.
- region: Genomic coordinates of full region associated with hotspot.
- region_name: Name assigned to region (from 4th column of BED file if present; assigned internally otherwise).
- num_mutations: Total number of somatic mutations in region across samples.
- hotspot_length: Length of hotspot window.
- effective_length: Length of hotspot window adjusted for cohort size (i.e. hotspot length times number of samples).
- bg_type: String indicating method used to estimate region's background rate; one of global, local, or clustered_regions.
- bg_prob: Hotspot background mutation rate used in negative binomial tests.
- pval: Raw p-value of negative binomial test for hotspot.
- FDR_BH: Benjamini-Hochberg FDR-corrected p-value for hotspot.
- num_samples: Number of samples possessing a somatic mutation in hotspot.
- position_counts: Semi-colon-separated list of genomic positions containing somatic mutations along with counts; in format [position]_[count].
- mutation_counts: Semi-colon-separated list of genomic positions with base alterations and counts; in format [position]_[reference base(s)]_[alternate base(s)]_[count].
- samples: Semi-colon-separated list of sample IDs containing somatic mutations in hotspot.
This text file contains combined significance results for the overall region (1 above) and candidate hotspots (if found, 2 above) using Fisher's method.
Columns:
- Region: Genomic coordinates of region (from input BED file).
- region_name: Name assigned to region (from 4th column of BED file if present; assigned internally otherwise).
- num_mutations: Total number of somatic mutations in region across samples.
- length: Length of region in basepairs.
- effective_length: Length of region multiplied by number of samples.
- bg_type: String indicating method used to estimate region's background rate; one of global, local, or clustered_regions.
- bg_prob: Region background mutation rate used in negative binomial tests.
- region_pval: Raw p-value from negative binomial test of region.
- WAP: Statistic from weighted average proximity procedure performed on region.
- WAP_pval: Permutation p-value of WAP procedure.
- hotspot_pvals: Semi-colon-separated list of p-values associated with identified hotspots (NA if no hotspots found).
- Fisher_pval: Fisher combined p-value of the region_pval (column 8), WAP_pval (column 10), and hotspot_pvals (column 11) values.
- Fisher_FDR: Benjamini-Hochberg FDR-corrected Fisher p-value.
- num_samples: Number of samples possessing a somatic mutation in region.
- position_counts: Semi-colon-separated list of genomic positions containing somatic mutations along with counts; in format [position]_[count].
- mutation_counts: Semi-colon-separated list of genomic positions with base alterations and counts; in format [position]_[reference base(s)]_[alternate base(s)]_[count].
- samples: Semi-colon-separated list of sample IDs containing somatic mutations in region.
This is a Python pickle file containing the mutation data and calculations used in the enrichment analysis. It holds a Python list of Region objects (instances of the Region class defined in the non-coding analysis code). To inspect this information, load the file in Python with:
# In Python 2
import sys, cPickle
sys.path.insert(0, '/MutEnricher/install/path/')  # Make MutEnricher install path available
sys.path.insert(0, '/MutEnricher/install/path/math_funcs/')  # Add path to math functions as well
with open('/path/to/output/example_region_data.pkl', 'rb') as fh:
    regions = cPickle.load(fh)  # Load region data pickle file
# In Python 3
import sys
import pickle  # Uses the C implementation automatically when available
sys.path.insert(0, '/MutEnricher/install/path/')  # Make MutEnricher install path available
sys.path.insert(0, '/MutEnricher/install/path/math_funcs/')  # Add path to math functions as well
with open('/path/to/output/example_region_data.pkl', 'rb') as fh:
    regions = pickle.load(fh)  # Load region data pickle file
To find the information for a particular region, search the list by name:
region_of_interest = 'chr5:1295773-1296014'
index = None
for r in regions:
    if r.name == region_of_interest:
        index = r.index
        break
reg = regions[index]
The above code extracts the Region object for the defined region. The user can now observe internal information associated with this non-coding region.
Text file containing run information, including MutEnricher version, input files, optional parameter values, and notes about the number of regions/hotspots tested.