Having trouble accessing preloaded datasets #26

Closed
lkothera opened this issue Mar 26, 2019 · 12 comments

@lkothera

Hi, novice Linux user here.

I work for the CDC and our scientific computing people have installed CATCH on our biolinux platform. I have loaded CATCH and was trying to run the line of code to have the program make probes for the installed Zika virus data set. I'm getting error messages that seem to say the .gz file can't be found, although if I move around the directories, I can see the .gz file that is supposed to be used to generate the probe designs.

Here is the line of code and the error messages:
fph6@biolinux> design.py zika -pl 75 -m 2 -l 60 -e 50 -o zika-probes.fasta --verbose
2019-03-26 15:09:11,298 - catch.utils.seq_io [INFO] Reading fasta file /apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz
Traceback (most recent call last):
  File "/apps/x86_64/catch/catch/bin/design.py", line 811, in <module>
    main(args)
  File "/apps/x86_64/catch/catch/bin/design.py", line 60, in main
    genomes_grouped += [seq_io.read_dataset_genomes(dataset)]
  File "/apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/utils/seq_io.py", line 71, in read_dataset_genomes
    seqs = list(read_fasta(fn).values())
  File "/apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/utils/seq_io.py", line 152, in read_fasta
    with gzip.open(fn, 'rt') as f:
  File "/apps/x86_64/python/3.6.1/lib/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/apps/x86_64/python/3.6.1/lib/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz'

Can you help?
Thanks,
Linda

@haydenm
Collaborator

haydenm commented Mar 27, 2019

Hi Linda,

I'm sorry to hear about the issue. I haven't seen this before, and it's not obvious to me what the cause of the problem is if you're able to see that the .gz file is there. It may have something to do with how CATCH was installed on your platform.

Can you start by running ls -l /apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz and pasting the results, so I can see the file size? If the size is small, it may consist of only the hash and suggest the data has not been pulled via git lfs pull, although I'm not sure if this would yield the FileNotFoundError.
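
For anyone who wants to script this check: the snippet below is a minimal sketch and not part of CATCH. The helper name describe_dataset_file is hypothetical. It reports whether the path is a regular file, its size in bytes, and whether the first bytes look like a Git LFS pointer (a small text file that begins with "version https://git-lfs.github.com/spec/v1") or real gzip data (which begins with the bytes 0x1f 0x8b).

```python
import os
import sys

# First line of a Git LFS pointer file, and the two-byte gzip magic number.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"
GZIP_MAGIC = b"\x1f\x8b"


def describe_dataset_file(path):
    """Return (kind, size); kind is 'missing', 'lfs-pointer', 'gzip' or 'unknown'.

    Hypothetical helper for illustration; CATCH does not provide it.
    """
    if not os.path.isfile(path):
        return "missing", 0
    size = os.path.getsize(path)
    # Read just enough bytes to recognize either signature.
    with open(path, "rb") as f:
        head = f.read(len(LFS_PREFIX))
    if head.startswith(LFS_PREFIX):
        return "lfs-pointer", size
    if head.startswith(GZIP_MAGIC):
        return "gzip", size
    return "unknown", size


if __name__ == "__main__":
    # Usage: python check_dataset.py /path/to/zika.fasta.gz
    for arg in sys.argv[1:]:
        kind, size = describe_dataset_file(arg)
        print(f"{arg}: {kind} ({size} bytes)")
```

A result of 'lfs-pointer' would suggest the data was never pulled with git lfs pull; 'missing' would mean that the interpreter cannot see the file at that exact path.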

@lkothera
Author

lkothera commented Mar 27, 2019 via email

@haydenm
Collaborator

haydenm commented Mar 27, 2019

I'm not certain, but based on the path you provided (containing an egg) it looks like CATCH may have been installed by your team using easy_install, which I haven't used or tested. As noted in the README, I'd recommend pip -- in particular (but optionally), from within a virtual environment. Installing via conda is another option.

It looks like the design.py on your PATH is in a different directory than where the data lives. One quick fix might be to try running python /apps/x86_64/catch/bin/design.py zika ..., instead of design.py zika .... Can you let me know if that works?
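
One way to look for this kind of mismatch is to compare the design.py that the shell resolves on PATH with the directory Python imports the catch package from. The following is a minimal sketch using only the standard library. The function name locate_script_and_package is made up for illustration, and the result is only meaningful when it runs under the same Python interpreter that runs design.py.

```python
import importlib.util
import os
import shutil


def locate_script_and_package(script="design.py", package="catch"):
    """Return (script path found on PATH, package directory).

    Either element is None when it cannot be found. Hypothetical helper
    for illustration only.
    """
    # shutil.which searches PATH for an executable, much like the shell does.
    script_path = shutil.which(script)

    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    package_dir = None
    if spec is not None:
        if spec.origin is not None:
            # For a regular package, origin is the path to its __init__.py.
            package_dir = os.path.dirname(spec.origin)
        elif spec.submodule_search_locations:
            package_dir = list(spec.submodule_search_locations)[0]
    return script_path, package_dir


if __name__ == "__main__":
    script_path, package_dir = locate_script_and_package()
    print("design.py on PATH:", script_path)
    print("catch package dir:", package_dir)
```

If the two paths belong to different installations (for example, a source checkout and an egg under site-packages), the script and the package data are probably not coming from the same place.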

@lkothera
Author

lkothera commented Mar 27, 2019 via email

@haydenm
Collaborator

haydenm commented Mar 27, 2019

There's a space in python /apps/x86_64/catch/catch/bin/ design.py between bin/ and design.py. Does it work if you run it without that space?

@lkothera
Author

lkothera commented Mar 27, 2019 via email

@lkothera
Author

lkothera commented Mar 27, 2019 via email

@lkothera
Author

lkothera commented Mar 27, 2019 via email

@haydenm
Collaborator

haydenm commented Mar 28, 2019

Unfortunately, I think this is going to be tough to resolve given how CATCH was installed. As I mentioned earlier, because of the egg file in the site-packages directory, I suspect that CATCH was installed using Distutils (python setup.py install) or with easy_install. I have not tested either method and can't recommend them. The basic problem is that these methods copy the Python files into the egg but not the data, so the Python modules cannot locate the data, which would normally be in the same directory structure. These installation methods should be fine if you do not plan to use the data distributed with CATCH, so you could alternatively move on and use your own input FASTA files.
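
To illustrate the failure mode: a package that ships data next to its code typically builds the data path from the location of one of its own modules. The sketch below shows that general pattern under stated assumptions; it is not CATCH's actual code, and the names bundled_data_path and check_bundled_data are hypothetical. If an installer copies only the .py files, the computed path points at a file that does not exist, which matches the FileNotFoundError in the traceback above.

```python
import os


def bundled_data_path(module_file, *parts):
    """Build a data path relative to the directory holding module_file."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, *parts)


def check_bundled_data(module_file, *parts):
    """Report where bundled data is expected and whether it is present.

    Hypothetical diagnostic helper, for illustration only.
    """
    path = bundled_data_path(module_file, *parts)
    # An '.egg' path component hints at an easy_install / setup.py install.
    inside_egg = any(p.endswith(".egg") for p in path.split(os.sep))
    return {
        "path": path,
        "exists": os.path.exists(path),
        "inside_egg": inside_egg,
    }


if __name__ == "__main__":
    # Inside a package module one would pass that module's __file__; here
    # this script's own location stands in for it.
    print(check_bundled_data(__file__, "data", "zika.fasta.gz"))
```

With an editable install (pip install -e .), the module directory is the source checkout itself, so a path computed this way resolves to the data files that live alongside the code.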

I think this will be easiest to resolve by asking your compute team to reinstall CATCH using pip, as recommended in the README, with either pip install -e . or pip install --user -e . (either way, the -e is needed to use the data distributed with the package). It would also be helpful if they could run the test suite, as described in the README, to verify that everything is working correctly.

@lkothera
Author

lkothera commented Mar 28, 2019 via email

@haydenm
Collaborator

haydenm commented Mar 28, 2019

Yes, of course. Katie Siddle (kjsiddle@broadinstitute.org), my co-first author on the paper, is the right person to reach out to about those questions. Or you can email me (hayden@mit.edu) and I'll pass them along.

@lkothera
Author

Thank you!

haydenm closed this as completed Mar 29, 2019