Skip to content

ed-lau/jcast

Repository files navigation

Junction Centric Alternative Splicing Translator v.0.3.5

JCAST (Junction Centric Alternative Splicing Translator) takes in alternative splicing events and returns custom protein sequence databases for isoform analysis.

See https://ed-lau.github.io/jcast/ for Documentation and Usage.

Installation

Requirements

Install Python 3.7+ and pip. See instructions on Python website for specific instructions for your operating system.

JCAST can be installed from PyPI via pip. We recommend using a virtual environment.

$ pip install jcast

Running

Launch JCAST as a module (Usage/Help):

$ python -m jcast

Alternatively:

$ jcast

Example command:

$ python -m jcast  data/encode_human_pancreas/ data/gtf/Homo_sapiens.GRCh38.89.gtf data/gtf/Homo_sapiens.GRCh38.89.gtf data/genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa -o encode_human_pancreas -q 0 1 -r 1 -m -c

To test that the installation can load test data files in tests/data (sample rMATS file and human chr 15 genome files)

$ pip install tox
$ tox

To run JCAST using the test files and print the results to Desktop

$ python -m jcast {j}/tests/data/rmats {j}/tests/data/genome/Homo_sapiens.GRCh38.89.chromosome.15.gtf  {j}/tests/data/genome/Homo_sapiens.GRCh38.dna.chromosome.15.fa.gz -o ~/Desktop

All Arguments

python -m jcast -h
usage: __main__.py [-h] [-o OUT] [-r READ] [-m] [-c] [-q q_lo q_hi] [--g_or_ln G_OR_LN] rmats_folder gtf_file genome

jcast retrieves transcript splice junctionsand translates them into amino acid sequences

positional arguments:
  rmats_folder          path to folder storing rMATS output
  gtf_file              path to Ensembl gtf file
  genome                path to genome file

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     name of the output files [default: psq_out]
  -r READ, --read READ  the lowest skipped junction read count for a junction to be translated [default: 1]
  -m, --model           models junction read count cutoff using a Gaussian mixture model [default: False]
  -c, --canonical       write out canonical protein sequence even if transcriptslices are untranslatable [default: False]
  -q q_lo q_hi, --qvalue q_lo q_hi
                        take junctions with rMATS fdr within this threshold [default: 0 1]
  --g_or_ln G_OR_LN     Switch on distribution to use for low end of histogram, 0 for Gamma, anything else for LogNorm

Dependencies

JCAST has been tested in Python 3.7, 3.8, 3.9 and uses the following packages:

biopython>=1.78
gtfparse>=1.2.1
pandas>=1.3.0
requests>=2.24.0
tqdm>=4.61.2
scikit-learn>=1.0
matplotlib==3.4.2
scipy>=1.7.0

Known Issues

  • rMATS output with rows containing NA as gene name can fail.
  • Upstream analyses should be performed using an unmasked genome. Currently JCAST cannot handle masked nucleotides (N).

Additional Information

Additional details on troubleshooting and result interpretation can be found in our publication in STAR Protocols.

Contributing

Please contact us if you wish to contribute, and submit pull requests to us.

Contributors

  • Edward Lau, PhD - Code/design - ed-lau
  • Maggie Lam, PhD - Code/design - Maggie-Lam
  • Robert Wes Ludwig, BSc - Modeling - WesLudwig

License

This project is licensed under the MIT License - see the LICENSE.md file for details