expired ftp links #3

mfazel · 2022-09-20T20:52:10Z

Hi Mario,

I was trying to use MARIO today after cloning the repo and realized it fails to download an experiment after several attempts.
Looking at the code, I noticed it uses wget to download the SRA file, and it has been a while since NIH moved their data to other locations such as Amazon. Can you update the code to use the new locations? I was also wondering why wget is used instead of calling fastq-dump SRR_ID directly in the first place.

Thanks
Mehdi

ernstki · 2022-09-23T11:48:43Z

Hi, Mehdi. Thanks for bringing this to our attention. I'll have a look.

mfazel · 2022-09-23T17:36:39Z

Thanks Kevin, I appreciate it.

Meanwhile I have a couple more questions.
I managed to bypass the download step by manually downloading the SRA file using fastq-dump.
The alignment and peak calling succeed, but at the last step I get an error whose cause I could not figure out just by looking at parts of the pipeline, and I'm afraid I cannot spend a lot of time decoding your code ;-)
1- As I understand it, -n tells the pipeline to annotate the results.
a) How do I prevent annotation? I assume -n means annotate, so omitting it should mean I don't want annotations. Why, then, does it complain about not finding the annotation file, so that I had to comment it out in the config?
b) What should these annotation files look like? Is it possible to include them in the repo, or at least post the head of each so users know what the content should be? My guess was that a GTF converted to BED should work, at least for one of them.
I did not find much explanation of these in the GitHub README.

2- I followed the steps in the GitHub README, but they did not work completely. According to the README section "Find ADBs from BAM files":
MARIO -dA outputs/SRR.bam -O outputs -C CONFIG.txt -G variants.xls
ERROR: Missing options -c or -B
Adding "-c" got me to the next step; however, I don't know how to skip peak calling, since I already have the peaks.

3- Is it possible for the pipeline to throw more meaningful errors? Here is what I got in the next step, but I'm not sure what I should provide to make it work. Is it a dependency issue or a missing input file?
MARIO -cdA outputs/SRR.bam -O outputs -C CONFIG.txt -G variants.xls
Can't locate object method "split_gen_by_chr" via package "0" (perhaps you forgot to load "0"?) at MARIO line 510.

Thanks again,
Mehdi

ernstki · 2022-09-23T18:16:17Z

I don't have any deep insights about MARIO's internals, because I'm just maintaining code that was written by a former lab member. But I can look into those things. If the README doesn't match reality, that's a problem we need to fix.

Given the number of options, and the numerous combinations of those options possible, I think the only viable solution going forward is to have an automated test suite that tries to hit all the common use cases. However, we're not using the MARIO pipeline much anymore internally, so I can't promise if or when we'll be able to dedicate the resources to do that.

Either way, I'll try to test the two particular invocations you tried locally on my end and see what the result is. I may need to open separate issues for each of those, just to keep this issue on track with fixing the FTP link problem. Hope that makes sense.
