Skip to content

cfbuenabadn/sc_splicing_tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 

Repository files navigation

Miscellaneous tools for alternative splicing

GTF2psix.py

Requirements

Requires gtfparse and tqdm, in addition to common Anaconda modules NumPy, Pandas, Argparse and gzip.

Description

This script creates a Psix-compatible annotation of cassette exons and constitutive introns directly from a GTF file. This annotation consists of a table specifying the location (chromosome, start and end) of splice junctions. Splice junctions are annotated as supporting the inclusion of a cassette exon (_I1 and _I2), supporting its exclusion (_SE), or constitutive (_CI). You can download ready-to-use mouse (mm10) and human (hg38) annotations here.

The annotation is a table file with the following format:

name intron event gene
ENSG00000157881_NMD_1_I1 1:2514467-2515257:- ENSG00000157881_NMD_1 ENSG00000157881
ENSG00000157881_NMD_1_I2 1:2515402-2515561:- ENSG00000157881_NMD_1 ENSG00000157881
ENSG00000157881_NMD_1_SE 1:2514467-2515561:- ENSG00000157881_NMD_1 ENSG00000157881
ENSG00000157881_ProteinCoding_1_I1 1:2521316-2521717:- ENSG00000157881_ProteinCoding_1 ENSG00000157881
ENSG00000157881_ProteinCoding_1_I2 1:2521801-2526463:- ENSG00000157881_ProteinCoding_1 ENSG00000157881
ENSG00000157881_ProteinCoding_1_SE 1:2521316-2526463:- ENSG00000157881_ProteinCoding_1 ENSG00000157881
... ... ... ...

Each row corresponds to a splice junction or intron. The first column is a name assigned to the splice junction, that is based on the name of the gene that contains the removed intron, and whether it supports the inclusion of a cassette exon (_I1 and _I2), supports its exclusion (_SE), or it is a constitutive intron (_CI). The second column are the intron coordinates in the genome. The third column has the name of the splicing element: for example, ENSG00000157881_ProteinCoding_1 is a protein coding cassette exon, and it is formed by three introns: ENSG00000157881_ProteinCoding_1_I1, ENSG00000157881_ProteinCoding_1_I2 and ENSG00000157881_ProteinCoding_1_SE. The fourth column contains the gene name.

Runing GTF2psix

To create an annotation from a GTF file, download GTF2psix.py and run as follows

python GTF2psix.py --gtf annotation.gtf -o psix_annotation

Optional arguments

--gene_name specifies the tag for gene names to use in the GTF file. For example --gene_name gene_name will use the gene_name tag from the 9th column of the GTF file. The default is gene_id.

--gene_type_tag. Some GTF files have different tags for the gene types. E.g., gene_type or gene_biotype. Specify the tag with this option. Default: gene_type.

--transcript_type_tag. Some GTF files have different tags for the transcript types. E.g., transcript_type or transcript_biotype. Specify the tag with this option. Default: transcript_type.

Use the argument --gene_type to limit the annotation to a specific type of genes. E.g., for an annotation of protein coding genes only, use --gene_type protein_coding.

You can remove some chromosomes from the annotation using the argument --exclude_chromosome. E.g., --exclude_chromosome chrM,chrY will exclude chrM and chrY from the annotation.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages