Improve Treeshop's ability to recognize R1/R2 naming conventions #18

klearned opened this issue Jun 1, 2018 · 1 comment
klearned commented Jun 1, 2018

Background:
The Makefile currently contains two regex lines to recognize which primary files are R1 and which are R2:

R1 = $(shell find samples -iregex ".+1[^0-9]*$$" | head -1)
R2 = $(shell find samples -iregex ".+2[^0-9]*$$" | head -1)

However, these regexes no longer match the naming convention of the files we have been receiving lately, so we have to edit the Makefile by hand and replace them with the following:

R1 = $(shell find samples -iregex ".+R1[^0-9]+.+" | head -1)
R2 = $(shell find samples -iregex ".+R2[^0-9]+.+" | head -1)
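
To make the mismatch concrete, here is a minimal Python sketch that applies both pairs of patterns to two naming styles. The file names are hypothetical (the issue does not say which names failed), and Python's `re` is not the engine `find` uses, although these patterns rely only on constructs that behave the same in both as far as I know. `find -iregex` matches the whole path case-insensitively, which is mimicked here with `fullmatch` and `re.IGNORECASE`.

```python
import re

# Python translations of the Makefile patterns ("$$" in make is a literal "$").
OLD_R1 = re.compile(r".+1[^0-9]*$", re.IGNORECASE)
OLD_R2 = re.compile(r".+2[^0-9]*$", re.IGNORECASE)
NEW_R1 = re.compile(r".+R1[^0-9]+.+", re.IGNORECASE)
NEW_R2 = re.compile(r".+R2[^0-9]+.+", re.IGNORECASE)

PATTERNS = {"old R1": OLD_R1, "old R2": OLD_R2, "new R1": NEW_R1, "new R2": NEW_R2}

# Hypothetical file names, chosen only to illustrate two common styles.
SIMPLE = ["samples/s_1.fastq.gz", "samples/s_2.fastq.gz"]
SUFFIXED = ["samples/s_R1_001.fastq.gz", "samples/s_R2_001.fastq.gz"]


def matches(pattern, path):
    """Return True if the pattern matches the entire path, as find -iregex would require."""
    return pattern.fullmatch(path) is not None


if __name__ == "__main__":
    for path in SIMPLE + SUFFIXED:
        hits = [name for name, pattern in PATTERNS.items() if matches(pattern, path)]
        print(path, "->", ", ".join(hits) or "no match")
```

With these assumed names, the old R2 pattern misses the `_R2_001` file because digits follow the `2`, and the old R1 pattern accepts both suffixed files because of the trailing `1` in `001`. The hand-edited patterns handle the suffixed names but no longer match the simple `_1`/`_2` style, which is why neither pair works as a universal default.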

Solution suggested by Ellen:
The fabfile should use a more sophisticated detection mechanism than a regex and then pass the result to the Makefile.
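
One possible shape for such a mechanism, offered only as a sketch and not as what Treeshop does: instead of guessing where the read number sits in a name, compare the names with each other and accept two files as a pair when they differ in exactly one character, `1` in one and `2` in the other. The function name `find_read_pair` is hypothetical and does not exist in the fabfile.

```python
from itertools import permutations


def find_read_pair(names):
    """Return (r1, r2) for the single pair of names that differ only by one '1' vs '2'.

    Raises ValueError when there is no such pair or more than one, so the
    caller can fall back to asking the user instead of guessing.
    """
    pairs = []
    for first, second in permutations(names, 2):
        if len(first) != len(second):
            continue
        # Positions at which the two names disagree.
        diffs = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
        if len(diffs) == 1 and first[diffs[0]] == "1" and second[diffs[0]] == "2":
            pairs.append((first, second))
    if len(pairs) != 1:
        raise ValueError("expected exactly one R1/R2 pair, found %d" % len(pairs))
    return pairs[0]


if __name__ == "__main__":
    # Hypothetical file names in two different styles.
    print(find_read_pair(["s_R2_001.fastq.gz", "s_R1_001.fastq.gz"]))
    print(find_read_pair(["s_2.fq.gz", "s_1.fq.gz", "README"]))
```

Because the comparison is between files and not against a fixed pattern, the same code handles both styles in the example. It deliberately refuses to choose when several pairs are present in one directory.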

Contributor

rcurrie commented Jun 1, 2018

Suggest we keep the default Makefile regex and then override it from the fabfile. We've been down this rabbit hole many times and there is no one-size-fits-all solution, so the Makefile should work well with the common case (likely R1/R2), with fab trying to disambiguate.
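
A sketch of how such an override could look, assuming the fabfile builds the `make` invocation as a string. In GNU make, a variable given on the command line takes precedence over an ordinary assignment such as `R1 = $(shell ...)` in the Makefile, so the defaults stay in place whenever fab passes nothing. The helper name `build_make_command` is hypothetical, and how the real fabfile calls make is not shown in this issue.

```python
import shlex


def build_make_command(r1=None, r2=None):
    """Build a make command line, overriding R1/R2 only when values are supplied."""
    parts = ["make"]
    if r1:
        parts.append("R1=" + r1)
    if r2:
        parts.append("R2=" + r2)
    # Quote each part so unusual file names survive the shell.
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    # Hypothetical paths; with no arguments the Makefile's own regex applies.
    print(build_make_command())
    print(build_make_command("samples/s_R1_001.fastq.gz", "samples/s_R2_001.fastq.gz"))
```

Keeping the override optional matches the suggestion above: the Makefile handles the common case on its own, and fab steps in only when it has detected something more specific.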

e-t-k added a commit that referenced this issue Nov 15, 2022
* add ercc thops#466 - reference section

* add troubleshooting for 'Needed to prompt...

* update treeshop.md with new pipelines

* fix makefile expression message

* improve R1/R2 detection in Makefile (#18)

* ERCC - expression step (untested)
add erccexpression step to Makefile (tested, works)
and fabfile (untested).

currently output files that are not ideal are:
- kallisto file
- rsem_genes.hugo.results
(see issue)

* ERCC - qc step (untested)

Added qc step to Makefile (currently running)
and fabfile (fully untested)

* ercc fabfile bugfix

It's stringly typed in the process( signature!
Convert it to an actual bool. Hilarious.

* ERCC - remaining steps (UNTESTED)

added ERCC option for pizzly, fusion, jfkm, variants
mostly just changes the output dir, a few of them
that drop files in primary / derived need to change the bam names too

totally untested, not even executed.

* single whitespace typo

* removed grep -v -- not working.

The previous version is broken because I forgot the pipe character.
I tried putting it in, so that the last line is
| grep -v "DEBUG toil"

but it does not successfully filter the lines.

I'm not sure whether the pipe is running inside or outside the docker container,
and I'm not sure whether docker is sending output to stdout or stderr.
So for now I have removed it entirely.

(So no, there is no committed version with the pipe in it.
I tried running it without committing; it did run but didn't filter the lines.)

* bugfixes in fabfile.py for ERCC

still in progress, not tested.

> can't hardlink some bams because they are owned by root.
but can move them because ubuntu owns the parent dir. so just move them to a name with
ERCC in them, download, and move back instead.

> fixed longstanding typo "Unable find any fastqs or bams...

* makefile bugfixes for ERCC (untested)

- add the --logInfo flag to expression_ercc docker to hopefully
get rid of debug output for real

- fix qc_ercc - wasn't properly giving it the path to the reference file

* ercc - bugfix

removed a wayward do_ercc (should be ercc) that caused pizzly to crash
and 1 more thing.
mostly works.

* Fix fusion potential hang

(this change applies to both standard and ERCC-transcript runs)

Fix a situation where fusion would hang indefinitely if it didn't generate proper output and instead left behind a _STARtmp folder with a named pipe inside it: fab would try to download the pipe and would never report that it was done.

With this version, if it doesn't find any fusion output files at all, it will accept that and continue with variants and jfkm before moving to the next sample.

This is the version of the fabfile I am testing right now.

* tested on ERCC path but not non-ERCC path

- ERCC - run expression and QC only - skip pizzly, fusions, jfkm, variants.
(However the ERCC toggles are still within those steps if we change and want to run them.)

Non-ERCC Change:
If fusion fails, the pipeline will continue onward and make a note at the end

* Create ercc.md

ERCC-aware pipeline: add documentation (thops#466)

* Update treeshop.md

add notes about acceptable fastq names

* Update README.md

separated out make from git clone to hopefully clarify that it's not mandatory

* suppress toil debug output from expression