Be explicit about inputs/outputs #78

Open
olgabot opened this issue Apr 7, 2017 · 0 comments
Description

Initially, outrigger was made to be convenient at the expense of being modular. The simplicity of the three commands below relies directly on the folder structure and file names, which is not modular, so @alaindomissy doesn't like it :)

outrigger index --sj-out-tab *SJ.out.tab \
    --gtf /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
outrigger validate --genome mm10 \
    --fasta /projects/ps-yeolab/genomes/mm10/GRCm38.primary_assembly.genome.fa
outrigger psi

Proposed changes

Split outrigger index into three parts

The first "step" is really three steps:

  1. Count junction reads and output this as a file
  2. Detect exons and output this as a file
  3. Search for alternative exons

Each of these could be separated out because maybe someone has already counted junction reads using a different program and they just want to detect exons! I personally have run into a problem where I wanted to just count junction reads and nothing else and realized I couldn't.

Finally, each of these steps would explicitly take input and output files as arguments, rather than inferring them from the folder structure (which could lead to surprising bugs).
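A rough sketch of what the split could look like with argparse subcommands. The subcommand names (`count-junctions`, `detect-exons`, `find-events`) and the `--reads`, `--exons`, `--output`, and `--output-dir` flags are made up for illustration; only `--sj-out-tab` and `--gtf` come from the current commands.

```python
import argparse


def build_parser():
    """Hypothetical three-way split of `outrigger index` (names are illustrative)."""
    parser = argparse.ArgumentParser(prog="outrigger")
    subparsers = parser.add_subparsers(dest="command")

    # Step 1: count junction reads and write them to an explicit file
    junctions = subparsers.add_parser("count-junctions")
    junctions.add_argument("--sj-out-tab", nargs="+", required=True)
    junctions.add_argument("--output", required=True)

    # Step 2: detect exons from an explicit junction reads file
    exons = subparsers.add_parser("detect-exons")
    exons.add_argument("--reads", required=True)
    exons.add_argument("--gtf", required=True)
    exons.add_argument("--output", required=True)

    # Step 3: search for alternative exons using the outputs of steps 1 and 2
    events = subparsers.add_parser("find-events")
    events.add_argument("--reads", required=True)
    events.add_argument("--exons", required=True)
    events.add_argument("--output-dir", required=True)

    return parser
```

With something like this, a user who already counted junction reads with another program could start at step 2 and pass their own reads file.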

Explicitly define .gtf file differently from a gffutils Feature database

Importantly, step 2 of outrigger index is where the .gtf annotation file gets used. But if a file with the same name plus a .db suffix (.gtf.db) exists, then that file is presumed to be the gffutils database, which is bad. It'd be better to have mutually exclusive arguments, --gtf or --db, so the user states which kind of file they are passing.
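A minimal sketch of the mutually exclusive arguments, using argparse's `add_mutually_exclusive_group`. The `--db` flag is the one proposed above and does not exist yet; the function name is hypothetical.

```python
import argparse


def build_annotation_parser():
    """Sketch: accept exactly one of --gtf or --db (--db is the proposed flag)."""
    parser = argparse.ArgumentParser(prog="outrigger index")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--gtf", help="GTF annotation file")
    group.add_argument("--db", help="existing gffutils database file")
    return parser
```

Passing both flags, or neither, makes argparse exit with a usage error, so a stray .gtf.db file sitting next to the .gtf would never be picked up silently.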

Explicitly define splice types in outrigger validate

Right now, outrigger validate looks for the structure outrigger_output/index/$SPLICE_TYPE/events.csv, where $SPLICE_TYPE has to be one of two splice types (se or mxe) that the user never sees or specifies... also bad. Better would be:

outrigger validate --se outrigger_output/index/se/events.csv --mxe outrigger_output/index/mxe/events.csv
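As a sketch of how the explicit flags could be handled, the function below collects only the event files the user actually passed, keyed by splice type. The function name is hypothetical, and the existing `--genome` and `--fasta` arguments are left out to keep the example short.

```python
import argparse


def parse_validate_args(argv):
    """Sketch: map each explicitly given splice type to its events file."""
    parser = argparse.ArgumentParser(prog="outrigger validate")
    parser.add_argument("--se", help="events.csv for skipped exon events")
    parser.add_argument("--mxe", help="events.csv for mutually exclusive exon events")
    args = parser.parse_args(argv)

    # Keep only the splice types that were passed; nothing is inferred from folders
    events = {
        name: path
        for name, path in (("se", args.se), ("mxe", args.mxe))
        if path is not None
    }
    if not events:
        parser.error("provide at least one of --se or --mxe")
    return events
```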

Explicitly define splice types in outrigger psi

Similar to outrigger validate, the command outrigger psi alone, with no arguments, looks for the files outrigger_output/index/$SPLICE_TYPE/events.csv and outrigger_output/junctions/reads.csv.

Better would be:

outrigger psi --se outrigger_output/index/se/events.csv \
    --mxe outrigger_output/index/mxe/events.csv \
    --reads outrigger_output/junctions/reads.csv
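Once the inputs are explicit, they can also be checked up front instead of failing partway through. A small sketch, assuming the flags proposed above; `find_missing_inputs` is a hypothetical helper, not part of outrigger.

```python
import os


def find_missing_inputs(inputs, is_file=os.path.isfile):
    """Return the flag names whose path is not an existing file.

    `inputs` maps a flag name (e.g. "--reads") to the path the user gave.
    `is_file` can be swapped out, which makes the check easy to test.
    """
    return [flag for flag, path in inputs.items() if not is_file(path)]


# Example usage with the paths from the proposed command:
# missing = find_missing_inputs({
#     "--se": "outrigger_output/index/se/events.csv",
#     "--mxe": "outrigger_output/index/mxe/events.csv",
#     "--reads": "outrigger_output/junctions/reads.csv",
# })
# if missing:
#     raise SystemExit("Missing input files for: " + ", ".join(missing))
```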