proof of concept genomepy integration #323

Merged
merged 14 commits into from
Jun 1, 2020


simonvh
Contributor

simonvh commented May 24, 2020

Hey @daler, this is more of a question at the moment than a full-fledged PR. Would you be interested in including genomepy support in pybedtools? I have a proof-of-concept implementation in this PR. Genomepy is a Python module to manage and use genomes. It supports downloading genomes from UCSC, Ensembl or NCBI and streamlines a lot of things. One of the files it creates by default contains the chromosome sizes, and its path can be accessed with genomepy as follows:

import genomepy

g = genomepy.Genome("hg38")
print(g.sizes_file)
# /home/simon/.local/share/genomes/hg38/hg38.fa.sizes

The idea here would be that if genomepy is installed, the name of a genome could be passed as the genome argument to any pybedtools function that requires a genome. I think there is little overhead if you don't have genomepy installed, and it doesn't touch any of the existing functionality. If you do have genomepy, however, there is no need to provide a full path to a chromosome sizes file. It would be really useful to us, but then again, as the developers of genomepy we use it a lot :).
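To make the proposed fallback concrete, here is a minimal sketch, assuming the behavior described above: an existing path is used as-is, and otherwise the string is treated as a genomepy genome name. The helper `resolve_genome_file` is hypothetical and is not the code in this PR; only `genomepy.Genome(...).sizes_file` comes from the example above.

```python
import os

try:
    import genomepy  # optional dependency; only used when installed
except ImportError:
    genomepy = None


def resolve_genome_file(genome):
    """Hypothetical helper: return a path to a chromosome sizes file.

    - If `genome` is an existing file, it is returned unchanged
      (the pre-existing behavior of passing a path).
    - Otherwise, if genomepy is importable, `genome` is treated as a
      genomepy genome name and the path of its sizes file is returned.
    """
    if os.path.exists(genome):
        return genome
    if genomepy is None:
        raise ValueError(
            "%r is not an existing file and genomepy is not installed" % genome
        )
    return genomepy.Genome(genome).sizes_file
```

Wrapping the import in `try`/`except ImportError` means users without genomepy pay only the cost of one failed import, which matches the low-overhead goal.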

Let me know what you think!

@daler
Owner

daler commented May 24, 2020

@simonvh yes, this is a good idea! This is my first time hearing of genomepy and it looks really useful. (edit: apparently it's not the first I've heard of it...looks like I had starred it a while ago but never tried it)

I think this proof of concept is pretty complete; all it would need is a test with and without genomepy, and maybe adding genomepy to https://github.com/daler/pybedtools/blob/master/optional-requirements.txt.
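A test "without genomepy" can simulate the missing package instead of requiring a second environment. This is a minimal sketch, assuming a hypothetical helper `genomepy_sizes_file` that imports genomepy lazily. Mapping a module name to `None` in `sys.modules` makes `import genomepy` raise `ImportError`, and `unittest.mock.patch.dict` restores `sys.modules` when the block exits.

```python
import sys
from unittest import mock


def genomepy_sizes_file(genome):
    """Hypothetical helper: return genomepy's sizes file for `genome`,
    or None when genomepy cannot be imported."""
    try:
        import genomepy
    except ImportError:
        return None
    return genomepy.Genome(genome).sizes_file


def test_without_genomepy():
    # A None entry in sys.modules makes the import statement raise
    # ImportError, simulating an environment without genomepy.
    with mock.patch.dict(sys.modules, {"genomepy": None}):
        assert genomepy_sizes_file("hg38") is None
```

The "with genomepy" counterpart would typically be guarded by `pytest.importorskip("genomepy")` and needs a genome that genomepy can find locally.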

Thanks for the contribution!

@simonvh
Contributor Author

simonvh commented May 25, 2020

OK, sounds good, thanks! We'll update and add a few tests. I'll ping you when I think it's ready for review.

@simonvh
Contributor Author

simonvh commented May 25, 2020

@daler Tests are implemented, and I added a short section to the docs in what I think is the relevant place. Let me know if you need additional changes.

@daler
Owner

daler commented Jun 1, 2020

@simonvh sorry for the lag time. I went down a couple of different rabbit holes trying to fix the failing test.

Some notes for my future reference:

  1. For some reason, conda was pulling in an earlier version of genomepy (0.5.5, I think?). That caused one of the new tests to fail because support for chromsizes from a FASTA file was not in that version. The solution was to pin genomepy to a recent (>0.8) version in the optional requirements.

  2. pybedtools does some import-time trickery to build the docstrings from the bedtools command-line programs. There's also a mechanism for setting paths, and that does some module reloading too. I finally figured out that the order of the pytest tests was important! The import manipulation in test_genomepy_integration.py was breaking the import manipulation in the other tests. The solution was to run the genomepy test on its own in a separate pytest call.

  3. Python version issues. There are a couple of py27 stragglers out there I'm trying to support, so I'm still testing on py27. I had to remove genomepy from the requirements for the py27 tests because only an early version (0.5.x) is available there, and it would otherwise cause the issue in 1) above. On py35, bucketcache (a dependency of genomepy) was causing solver conflicts in conda. The solution there was to remove py35 from the tests, since I don't know of anyone specifically needing py35.
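The workaround in 2) can be sketched as two separate pytest runs, so that the genomepy test's import manipulation happens in a fresh interpreter. This is a minimal sketch, not the project's actual CI setup; the `pybedtools/test` directory is an assumption, and only the file name `test_genomepy_integration.py` comes from the discussion.

```python
import subprocess
import sys

# Assumed layout: only the file name test_genomepy_integration.py is
# taken from the discussion; the directory is illustrative.
TEST_DIR = "pybedtools/test"
GENOMEPY_TEST = TEST_DIR + "/test_genomepy_integration.py"


def pytest_commands(test_dir=TEST_DIR, isolated=GENOMEPY_TEST):
    """Build two pytest invocations: every test except `isolated`,
    then `isolated` alone in its own interpreter process."""
    base = [sys.executable, "-m", "pytest"]
    return [
        base + [test_dir, "--ignore=" + isolated],
        base + [isolated],
    ]


def run_all():
    for cmd in pytest_commands():
        # check_call raises CalledProcessError if any run fails
        subprocess.check_call(cmd)
```

`--ignore` is a standard pytest option; because each command is a separate process, module reloading done by the isolated test cannot leak into the others.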

daler merged commit 3d83fbd into daler:master on Jun 1, 2020
@daler
Owner

daler commented Jun 1, 2020

Thanks for the contribution!
