Rbbt-Workflows/sequence

Functionalities regarding genomic sequence analysis.

Finds genomic features, such as exons, that overlap genomic positions; reconstructs offsets into transcripts; and computes the amino-acid changes caused by variants. It also finds mutations in exon junctions and genes with high mutation frequencies.

Tasks

Positions, ranges, and mutations are specified using a common format. Each is identified by several fields separated by the character :. Positions are represented as chromosome:position, e.g. 12:4234412. Mutations have an additional field for the mutant allele, chromosome:position:allele, e.g. 12:4234412:T (the reference allele is redundant, as it is determined by the chromosome and position). Indels are represented as in the following examples: +A or +ATC for one- and three-base insertions respectively, and - or --- for one- and three-base deletions respectively. Chromosome ranges are specified as chromosome:start:end, as in 12:4234412:4244412.
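As an illustration of the format described above, here is a minimal Python sketch that parses the three kinds of entries. It is not part of the workflow itself; the names Position, Mutation, Range, parse_position, parse_mutation, parse_range and describe_allele are hypothetical helpers introduced only for this example.

```python
from typing import NamedTuple


class Position(NamedTuple):
    chromosome: str
    position: int


class Mutation(NamedTuple):
    chromosome: str
    position: int
    allele: str


class Range(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_position(text: str) -> Position:
    """Parse chromosome:position, e.g. 12:4234412."""
    chromosome, position = text.split(":")
    return Position(chromosome, int(position))


def parse_mutation(text: str) -> Mutation:
    """Parse chromosome:position:allele, e.g. 12:4234412:T."""
    chromosome, position, allele = text.split(":")
    return Mutation(chromosome, int(position), allele)


def parse_range(text: str) -> Range:
    """Parse chromosome:start:end, e.g. 12:4234412:4244412."""
    chromosome, start, end = text.split(":")
    return Range(chromosome, int(start), int(end))


def describe_allele(allele: str) -> str:
    """Classify the allele field using the indel notation above."""
    if allele.startswith("+"):
        # "+ATC" means the three bases ATC are inserted.
        return f"insertion of {len(allele) - 1} base(s)"
    if allele and set(allele) == {"-"}:
        # "---" means three bases are deleted.
        return f"deletion of {len(allele)} base(s)"
    return "substitution"


print(parse_mutation("12:4234412:T"))
print(describe_allele("+ATC"))
```

Note that mutations and ranges both have three fields, so the caller has to know which kind of entry a task expects; the sketch keeps separate parsers for that reason.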

It supports multiple organisms. The format of the organism input is the organism short code (Hsa for Homo sapiens, or Mmu for Mus musculus) optionally followed by the date of the build. For example, Hsa/jan2013 for a recent build or Hsa/may2009 for the hg18 build.
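The organism code can be split into its two parts with a few lines of Python. This is only a sketch of the format described above; parse_organism is a hypothetical helper, not a function of the workflow.

```python
from typing import Optional, Tuple


def parse_organism(code: str) -> Tuple[str, Optional[str]]:
    """Split an organism code into (short code, build date or None)."""
    short_code, _, build = code.partition("/")
    # An absent build date is reported as None rather than an empty string.
    return short_code, (build or None)


print(parse_organism("Hsa/jan2013"))
print(parse_organism("Mmu"))
```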

The watson input specifies whether the variants are described in reference to the watson (forward) strand or in reference to the strand that holds the overlapping gene. Using the wrong convention may make some mutations coincide with the reference. The is_watson method can take a guess by checking this criterion.
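The criterion above can be sketched as follows: count how many mutant alleles coincide with the reference under each convention, and prefer the convention with fewer coincidences. This is a simplified illustration, not the workflow's actual implementation. The function guess_is_watson and its inputs (a dictionary of watson-strand reference bases and a set of positions whose overlapping gene lies on the reverse strand) are assumptions made for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def guess_is_watson(mutations, watson_reference, reverse_strand_positions):
    """Guess whether single-base mutations are given on the watson strand.

    mutations: list of (chromosome, position, allele) tuples.
    watson_reference: dict mapping (chromosome, position) to the watson base.
    reverse_strand_positions: set of (chromosome, position) keys whose
        overlapping gene is on the reverse strand (toy input).
    """
    watson_clashes = 0
    gene_clashes = 0
    for chromosome, position, allele in mutations:
        key = (chromosome, position)
        watson_base = watson_reference[key]
        # On the reverse strand the gene-strand base is the complement.
        if key in reverse_strand_positions:
            gene_base = COMPLEMENT[watson_base]
        else:
            gene_base = watson_base
        if allele == watson_base:
            watson_clashes += 1
        if allele == gene_base:
            gene_clashes += 1
    # A mutation equal to the reference is suspicious, so the convention
    # with fewer such clashes is taken as the more plausible one.
    return watson_clashes <= gene_clashes


reference = {("1", 100): "A", ("1", 200): "C"}
reverse = {("1", 200)}
print(guess_is_watson([("1", 100, "G"), ("1", 200, "G")], reference, reverse))
```

Ties are resolved in favour of the watson strand here, which is an arbitrary choice of this sketch.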

Specifying the vcf parameter makes the workflow interpret the input as a VCF file and run the genomic_mutations task to extract the mutations from it.

The main tasks are mutated_isoforms_fast, splicing_mutations, and affected_genes.

reference

Report the reference base at the provided positions

add_reference

Add reference to mutations as (ref>mut)

gene_strand_reference

Report the reference base at the provided positions on the gene coding strand

The gene coding strand is determined by checking for overlapping transcripts at that position. In case of overlap the forward or watson strand is used.
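A rough sketch of the idea in Python, under one possible reading of the rule above: the base is complemented only when every overlapping transcript lies on the reverse strand, and the watson base is kept when transcripts on both strands overlap or when none do. That reading, the function gene_strand_base, and the encoding of strands as 1 and -1 are assumptions of this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gene_strand_base(watson_base, transcript_strands):
    """Return the reference base as read on the gene coding strand.

    watson_base: reference base on the watson (forward) strand.
    transcript_strands: strands (1 or -1) of the transcripts that
        overlap the position; may be empty.
    """
    strands = set(transcript_strands)
    if strands == {-1}:
        # Only reverse-strand transcripts overlap: report the complement.
        return watson_base.translate(COMPLEMENT)
    # Forward-only, mixed, or no overlap: keep the watson base.
    return watson_base


print(gene_strand_base("A", [-1, -1]))
print(gene_strand_base("A", [1, -1]))
```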

is_watson

Guess whether the mutations provided are given on the watson strand or the gene strand

genes

Report genes overlapping positions

exons

Report exons overlapping positions

transcripts

Report transcripts overlapping positions

exon_junctions

Report exon junctions overlapping positions

genes_at_ranges

Report genes overlapping ranges
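The overlap test behind a task like this can be sketched in a few lines of Python. The gene table, the gene names, and the function genes_at_range are made up for the example, and coordinates are assumed to be inclusive on both ends; the workflow itself presumably uses indexed annotation data rather than a linear scan.

```python
def genes_at_range(genes, chromosome, start, end):
    """Return the names of genes overlapping chromosome:start:end.

    genes: dict mapping gene name to (chromosome, start, end),
        with inclusive coordinates (toy annotation).
    """
    return [
        name
        for name, (g_chromosome, g_start, g_end) in genes.items()
        # Two inclusive intervals overlap when each starts
        # no later than the other ends.
        if g_chromosome == chromosome and g_start <= end and start <= g_end
    ]


toy_genes = {
    "GENE_A": ("12", 4200000, 4240000),
    "GENE_B": ("12", 4250000, 4300000),
    "GENE_C": ("7", 4200000, 4300000),
}
print(genes_at_range(toy_genes, "12", 4234412, 4244412))
```

A single position can be handled by the same function by passing it as both start and end.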

type

Report the type of base change: transition, transversion, indel, unknown or none at all
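The classification can be illustrated with a short Python sketch. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes and transversions switch between the two classes; those definitions are standard. How the task decides on unknown is not stated here, so the rule used below (any base outside A, C, G, T) is an assumption, as is the function name base_change_type.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def base_change_type(reference, allele):
    """Classify a change from the reference base to the mutant allele."""
    if allele.startswith("+") or (allele and set(allele) == {"-"}):
        # "+ATC" style insertions and "---" style deletions.
        return "indel"
    if reference not in BASES or allele not in BASES:
        return "unknown"
    if reference == allele:
        return "none"
    same_class = {reference, allele} <= PURINES or {reference, allele} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("A", "C"), ("A", "A"), ("A", "+TC"), ("A", "N")]:
    print(ref, alt, base_change_type(ref, alt))
```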

transcript_offsets

Computes the offset of each genomic mutation inside the coding sequence of the transcripts that overlap it.
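To show the kind of arithmetic involved, the following Python sketch maps a genomic position to a 0-based offset in the spliced transcript. It is a simplification: it counts from the first exon base rather than from the start of the coding sequence, so the 5' UTR length would still have to be subtracted to obtain a coding offset. The function transcript_offset, the inclusive exon coordinates, and the strand encoding (1 or -1) are assumptions of the example.

```python
def transcript_offset(exons, strand, position):
    """Return the 0-based offset of a genomic position in a transcript.

    exons: list of (start, end) genomic coordinates, inclusive,
        given in transcription order.
    strand: 1 for the watson strand, -1 for the reverse strand.
    Returns None when the position falls outside every exon.
    """
    offset = 0
    for start, end in exons:
        if start <= position <= end:
            # On the reverse strand the transcript runs from end to start.
            within = position - start if strand == 1 else end - position
            return offset + within
        offset += end - start + 1
    return None


forward_exons = [(100, 109), (200, 209)]
print(transcript_offset(forward_exons, 1, 205))

reverse_exons = [(200, 209), (100, 109)]
print(transcript_offset(reverse_exons, -1, 109))
```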

mutated_isoforms

Computes the consequence of genomic mutations in terms of amino-acid changes in protein isoforms
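Once the offset of a mutation inside a coding sequence is known, the amino-acid change follows from translating the affected codon before and after the substitution. The Python sketch below does this for single-base substitutions using the standard genetic code. The function amino_acid_change and the output notation (reference residue, 1-based residue number, mutant residue) are chosen for the example and are not necessarily what the task reports; the mutant base is assumed to be given on the coding strand.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_change(cds, offset, mutant_base):
    """Describe the effect of a substitution at a 0-based CDS offset."""
    frame = offset % 3
    codon_start = offset - frame
    codon = cds[codon_start:codon_start + 3]
    mutated = codon[:frame] + mutant_base + codon[frame + 1:]
    residue_number = codon_start // 3 + 1
    return f"{CODON_TABLE[codon]}{residue_number}{CODON_TABLE[mutated]}"


cds = "ATGAAATGA"  # Met, Lys, stop
print(amino_acid_change(cds, 5, "C"))  # AAA -> AAC
print(amino_acid_change(cds, 5, "G"))  # AAA -> AAG, synonymous
print(amino_acid_change(cds, 3, "T"))  # AAA -> TAA, premature stop
```

Indels, which shift the reading frame, need separate handling and are left out of this sketch.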

All isoforms of a gene are reported unless principal is selected, in which case only consequences in principal isoforms are reported (as defined by Appris).

splicing_mutations

Find mutations that may affect the splicing of protein coding transcripts

mutated_isoforms_fast

One-step implementation of the mutated_isoforms task

All isoforms of a gene are reported unless principal is selected, in which case only consequences in principal isoforms are reported (as defined by Appris).

affected_genes

Finds genes affected by genomic mutations, either by amino-acid changes in their protein products or by changes in splicing sequences

binomial_significance

For a list of mutations, find genes that suffer a higher rate of mutation than expected. Considers only relevant mutations

Uses a binomial distribution with a global probability of mutation estimated from the data. Considers only exon bases.
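The test can be sketched in Python as follows. The sketch assumes that the global per-base probability is the total number of mutations divided by the total number of exon bases, and that each gene's p-value is the probability of seeing at least its observed number of mutations among its exon bases. Both are plausible readings of the description above rather than a statement of what the task does exactly, and the names binomial_tail and gene_significance are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def gene_significance(mutation_counts, exon_bases):
    """Compute a p-value per gene from mutation counts and exon sizes.

    mutation_counts: dict mapping gene to its number of mutations.
    exon_bases: dict mapping gene to its number of exon bases.
    """
    total_mutations = sum(mutation_counts.values())
    total_bases = sum(exon_bases.values())
    # Global per-base mutation probability estimated from the data.
    p = total_mutations / total_bases
    return {
        gene: binomial_tail(mutation_counts.get(gene, 0), size, p)
        for gene, size in exon_bases.items()
    }


counts = {"GENE_A": 8, "GENE_B": 1, "GENE_C": 1}
sizes = {"GENE_A": 1000, "GENE_B": 1000, "GENE_C": 1000}
for gene, p_value in gene_significance(counts, sizes).items():
    print(gene, round(p_value, 4))
```

Subtracting the lower tail from one loses precision for very small p-values; a library routine such as the survival function in scipy.stats would be a better fit for real data, and a correction for multiple testing would normally follow.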

binomial_significance_syn

For a list of mutations, find genes that suffer a higher rate of mutation than expected. Considers also synonymous mutations

Uses a binomial distribution with a global probability of mutation estimated from the data. Considers only exon bases.

expanded_vcf

Expands the INFO and FORMAT/Sample fields of VCF files into a standard TSV format
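As a rough picture of what such an expansion involves, the Python sketch below turns one VCF data line into a flat dictionary with one entry per INFO key and per sample/FORMAT key. The column naming (INFO:key and sample:key) and the function expand_vcf_line are invented for the example and may differ from the columns the task actually produces.

```python
def expand_vcf_line(line, sample_names):
    """Flatten one tab-separated VCF data line into a dict of columns."""
    fields = line.rstrip("\n").split("\t")
    names = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]
    row = dict(zip(names, fields[:7]))

    # INFO is a semicolon-separated list of key=value pairs or bare flags.
    for entry in fields[7].split(";"):
        if entry in ("", "."):
            continue
        key, separator, value = entry.partition("=")
        row[f"INFO:{key}"] = value if separator else "true"

    # FORMAT names the colon-separated values of each sample column.
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        for sample, column in zip(sample_names, fields[9:]):
            for key, value in zip(format_keys, column.split(":")):
                row[f"{sample}:{key}"] = value
    return row


line = "1\t100\t.\tA\tG\t50\tPASS\tDP=10;DB\tGT:DP\t0/1:7"
print(expand_vcf_line(line, ["S1"]))
```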

genomic_mutations

Extract genomic mutations from a VCF file that meet a quality criterion
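A simplified Python sketch of this kind of extraction is shown below. It assumes that the quality criterion is a minimum value of the QUAL column, which is only one possible criterion, and it handles single-base substitutions only, since the position convention for indels is not described here. The function vcf_to_mutations and the default threshold are hypothetical.

```python
BASES = set("ACGT")


def vcf_to_mutations(lines, min_quality=20.0):
    """Convert VCF data lines into chromosome:position:allele strings."""
    mutations = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref, alts, qual = line.rstrip("\n").split("\t")[:6]
        if qual == "." or float(qual) < min_quality:
            continue  # missing or insufficient quality
        for alt in alts.split(","):
            # Keep single-base substitutions; indels are skipped here.
            if ref in BASES and alt in BASES:
                mutations.append(f"{chrom}:{pos}:{alt}")
    return mutations


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tC\tT\t5\tPASS\t.",
    "2\t300\t.\tG\tA,C\t99\tPASS\t.",
    "2\t400\t.\tA\tATC\t99\tPASS\t.",
]
print(vcf_to_mutations(vcf_lines))
```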

About

Utilities to investigate the consequence of genomic mutations
