expired ftp links #3

Open
mfazel opened this issue Sep 20, 2022 · 3 comments


mfazel commented Sep 20, 2022

Hi Mario,

I cloned the repo today to try MARIO and found that it fails to download an experiment, even after several attempts.
Looking at the code, I noticed that it uses wget to download the SRA file, and it has been a while since NIH moved its data to other locations such as Amazon. Could you update the code to use the new locations? I was also wondering why wget is used instead of calling fastq-dump SRR_ID directly in the first place.
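
For reference, the direct route asked about here could look roughly like the sketch below. It assumes SRA Toolkit's `prefetch` and `fastq-dump` are installed and on the PATH. The helper names `sra_commands` and `fetch_fastq`, the accession, and the output directory are hypothetical, and this is not how MARIO currently downloads data.

```python
import re
import subprocess

# Run accessions look like SRR/ERR/DRR followed by digits.
ACCESSION_RE = re.compile(r"^[SED]RR\d+$")


def sra_commands(accession, outdir):
    """Return the argv lists for fetching one run with SRA Toolkit."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    return [
        # Let the toolkit locate the data instead of hard-coding a URL.
        ["prefetch", accession],
        # Convert to FASTQ; --split-files writes paired reads to separate files.
        ["fastq-dump", "--split-files", "--outdir", outdir, accession],
    ]


def fetch_fastq(accession, outdir, dry_run=True):
    """Print the commands (dry run) or execute them; return them either way."""
    commands = sra_commands(accession, outdir)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if a tool exits nonzero.
            subprocess.run(argv, check=True)
    return commands
```

Calling `fetch_fastq("SRR0000001", "outputs")` with the placeholder accession only prints the two commands; pass `dry_run=False` to actually run them.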

Thanks
Mehdi

Contributor

ernstki commented Sep 23, 2022

Hi, Mehdi. Thanks for bringing this to our attention. I'll have a look.

Author

mfazel commented Sep 23, 2022

Thanks Kevin, I appreciate it.

Meanwhile I have a couple more questions.
I managed to bypass the download step by downloading the SRA file manually with fastq-dump.
Alignment and peak calling succeed, but at the last step I get an error whose cause I could not figure out just by looking at parts of the pipeline, and I'm afraid I can't spend a lot of time decoding your code ;-)
1- As I understand it, -n tells the pipeline to annotate the results.
a) How do I prevent annotation? I assume -n means annotate, so leaving it out should mean I don't want annotations. Why, then, does it complain about not finding the annotation file, so that I had to comment it out in the config?
b) What should these annotation files look like? Would it be possible to include them in the repo, or at least post the head of each so users know what the content should be? My guess was that a GTF converted to BED should work, at least for one of them.
I did not find much explanation of these in the GitHub README.

2- I followed the steps in the GitHub README, but they did not work completely. According to the README section "Find ADBs from BAM files":
MARIO -dA outputs/SRR.bam -O outputs -C CONFIG.txt -G variants.xls
ERROR: Missing options -c or -B
Adding "-c" got me to the next step, but I don't know how to skip peak calling, since I already have the peaks.

3- Could the pipeline throw more meaningful errors? Here is what I got at the next step; I'm not sure what I need to provide to make it work, or whether it is a dependency issue or a missing input file.
MARIO -cdA outputs/SRR.bam -O outputs -C CONFIG.txt -G variants.xls
Can't locate object method "split_gen_by_chr" via package "0" (perhaps you forgot to load "0"?) at MARIO line 510.
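
A note on reading that message: in Perl, it means a method was called on the plain value 0 rather than on an object. One possible cause, which is an assumption and has not been checked against MARIO's source, is that an earlier step returned 0 to signal failure and the result was used without a check. The Python sketch below is only an analogy for the kind of guard that would give a more meaningful error; `Table`, `load_table`, and `split_or_explain` are hypothetical names, and only `split_gen_by_chr` comes from the error above.

```python
class Table:
    """Hypothetical stand-in for the object split_gen_by_chr is called on."""

    def __init__(self, rows):
        self.rows = rows

    def split_gen_by_chr(self):
        # Group (chromosome, value) rows by chromosome.
        by_chr = {}
        for chrom, value in self.rows:
            by_chr.setdefault(chrom, []).append(value)
        return by_chr


def load_table(rows):
    """Mimic a loader that signals failure by returning 0 (an assumption)."""
    return Table(rows) if rows else 0


def split_or_explain(rows, source="input file"):
    table = load_table(rows)
    if not isinstance(table, Table):
        # Fail here and name the input, instead of at the later method call.
        raise RuntimeError(f"could not load data from {source!r}")
    return table.split_gen_by_chr()
```

Without the `isinstance` check, the failure would surface at the method call with a message about the value 0 rather than about the input that could not be loaded.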

Thanks again,
Mehdi

Contributor

ernstki commented Sep 23, 2022

I don't have any deep insights about MARIO's internals, because I'm just maintaining code that was written by a former lab member. But I can look into those things. If the README doesn't match reality, that's a problem we need to fix.

Given the number of options, and the numerous combinations of those options possible, I think the only viable solution going forward is to have an automated test suite that tries to hit all the common use cases. However, we're not using the MARIO pipeline much anymore internally, so I can't promise if or when we'll be able to dedicate the resources to do that.
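
As a starting point, such a suite could be as small as the sketch below. It lists only the two invocations reported in this thread and records which ones fail. It assumes that MARIO exits with a nonzero status on error, which has not been verified, and the names `CASES`, `run_case`, and `smoke_test` are hypothetical.

```python
import subprocess

# The two invocations reported in this thread; extend with other common cases.
COMMON = ["outputs/SRR.bam", "-O", "outputs", "-C", "CONFIG.txt", "-G", "variants.xls"]
CASES = {
    "adb_from_bam": ["MARIO", "-dA"] + COMMON,
    "adb_from_bam_with_c": ["MARIO", "-cdA"] + COMMON,
}


def run_case(argv):
    """Run one invocation and return (exit_code, last_stderr_line)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = proc.stderr.strip().splitlines()
    return proc.returncode, (lines[-1] if lines else "")


def smoke_test(cases, runner=run_case):
    """Return the failing cases as {name: (exit_code, message)}."""
    failures = {}
    for name, argv in cases.items():
        code, message = runner(argv)
        if code != 0:  # assumption: nonzero exit status means failure
            failures[name] = (code, message)
    return failures
```

The `runner` argument is there so the bookkeeping can be exercised with a stub before MARIO and its input files are available on the test machine.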

Either way, I'll try to test the two particular invocations you tried locally on my end and see what the result is. I may need to open separate issues for each of those, just to keep this issue on track with fixing the FTP link problem. Hope that makes sense.
