Use pyfaidx as the default SequenceFileDB backend, add support for Python 3 #25

mdshw5 · 2016-04-21T15:03:08Z

Also added an option to ignore transcript version numbers when they don't match with the HGVS representation. That was a minor change, and something I wouldn't mind throwing out as I don't think it's a good idea in practice.

mdshw5 · 2016-04-21T15:04:26Z

It's also worth noting that tests are passing on @travisci for Python 2.6, 2.7, 3.4, and 3.5.

jdavisp3 · 2016-04-21T16:00:31Z

Adding the venv2.7 directory was not intentional, right?

mdshw5 · 2016-04-21T18:04:18Z

:) Completely unintentional.

lucaswiman · 2016-04-26T01:26:26Z

examples/example1.py

@@ -72,10 +73,10 @@ def get_transcript(name):
 # hgvs_name.ref_allele = 'A'
 # hgvs_name.alt_allele = 'G'

-print (hgvs_name.transcript,
+print((hgvs_name.transcript,


This is a bit of a nit, and I realize this is technically valid Python 2 code, but to guarantee compatibility between Python 2.7 and Python 3, could you add a from __future__ import print_function to the top of files that use print as a function?

No problem - will do.

lucaswiman · 2016-04-26T01:36:27Z

pyhgvs/variants.py

@@ -173,13 +173,13 @@ def _on_forward_strand(self):
            seq_3p = self.seq_3p
            self.seq_5p = revcomp(seq_3p)
            self.seq_3p = revcomp(seq_5p)
-            self.alleles = map(revcomp, self.alleles)
+            self.alleles = list(map(revcomp, self.alleles))


Nit: a list comprehension would be more idiomatic (and prevent duplicating memory overhead in python 2).
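A minimal sketch of the two spellings side by side. The `revcomp` below is a hypothetical stand-in written for this example, not the project's own helper, and the alleles are made up.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (hypothetical stand-in helper)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


alleles = ["A", "AC", "GGT"]

# On Python 3, map() returns a lazy iterator, so list() is needed to get a
# real list. On Python 2, map() already returns a list and list() copies it.
via_map = list(map(revcomp, alleles))

# A list comprehension builds the list once on both Python 2 and Python 3.
via_comprehension = [revcomp(allele) for allele in alleles]

print(via_comprehension)  # ['T', 'GT', 'ACC']
```

Both forms produce the same list; the comprehension just avoids the extra copy on Python 2.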

lucaswiman · 2016-04-26T01:41:23Z

This is a really nice cleanup, @mdshw5!

lucaswiman · 2016-04-28T00:26:39Z

These changes LGTM from a code perspective. @mdrasmus: Do you have any comments about the pyfaidx changes before merge?

naegelyd · 2017-03-08T16:45:42Z

This LGTM too. @mdrasmus could you take a look at this?

mdrasmus

I had only a few questions and suggestions, otherwise this looks pretty good to me.

Thanks for these contributions!

mdrasmus · 2017-03-09T05:39:22Z

examples/example1.py

-# Read genome sequence using pygr.
-genome = SequenceFileDB('hg19.fa')
+# Read genome sequence using pyfaidx.
+genome = Genome('/tmp/hg19.fa')


Does Genome() allow relative paths? If so, I think relative paths would be more convenient for people trying out the library. If it requires absolute paths, then this is a very reasonable change to make.

Genome should actually be Fasta from the pyfaidx library. I'm not sure why I wrote it differently. Yes, Fasta supports relative paths, as it's using Python's base library IO classes, so it works as you would expect.

mdrasmus · 2017-03-09T05:46:48Z

pyhgvs/__init__.py

    """
    Parse an HGVS name into (chrom, start, end, ref, alt)

    hgvs_name: HGVS name to parse.
    genome: pygr compatible genome object.
    transcript: Transcript corresponding to HGVS name.
    normalize: If True, normalize allele according to VCF standard.
+    lazy: If True, discard version information from incoming transcript/gene.


I can see that this is convenient, so it SGTM. However, the user should be very careful, because the sequence can change quite a bit between transcript versions, especially with indel changes.

Yes, I agree this is dangerous. I can't even remember what my use case for this was, so it can just as well be dumped.

mdrasmus · 2017-03-09T05:47:42Z

pyhgvs/__init__.py

    """
    hgvs = HGVSName(hgvs_name)

    # Determine transcript.
    if hgvs.kind == 'c' and not transcript:
+        if '.' in hgvs.transcript and lazy:
+            hgvs.transcript, version = hgvs.transcript.split('.')
+        elif '.' in hgvs.gene and lazy:


Curious, can genes have versions? I don't believe I have seen a HUGO identifier with a .{version} suffix.

I think this might have been a case where I was being too careful to consider all cases. According to Ensembl, genes do not appear to be versioned within annotation releases.
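For reference, a hedged sketch of what discarding the version amounts to. `strip_version` is a hypothetical helper written for this comment, not part of pyhgvs, and the accessions are only illustrative. It uses `str.partition` rather than unpacking `split('.')` into two names, since the latter raises `ValueError` if a name ever contains more than one dot.

```python
def strip_version(accession):
    """Split 'NAME.VERSION' into (name, version).

    Hypothetical helper: version is None when the accession has no dot.
    """
    base, dot, version = accession.partition(".")
    return base, (version if dot else None)


print(strip_version("NM_000352.3"))  # ('NM_000352', '3')
print(strip_version("BRCA1"))        # ('BRCA1', None)
```

Returning the discarded version alongside the name would let a caller log or warn about a mismatch instead of silently ignoring it, which speaks to the concern about sequences changing between transcript versions.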

mdrasmus · 2017-03-09T05:49:14Z

pyhgvs/tests/genome.py

@@ -115,7 +119,7 @@ def read(self, filename):

        filename: a filename string or file stream.
        """
-        if isinstance(filename, basestring):


@lucaswiman do you suggest a six usage here that allows python 2 unicode filenames?

That's a good catch. Should probably use six.string_types instead.

I don't think that's quite accurate. Both str (textual) and bytes (binary) can be used as filenames in both Python 2 and Python 3, but six.string_types does not include bytes in Python 3. IMO the most Pythonic way to write this would be to use duck typing:

    if hasattr(filename, 'read'):
        infile = filename
    else:
        with open(filename) as infile:
            return self.read(infile)
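A self-contained sketch of the duck-typing suggestion above, assuming Python 3 (it uses `os.fsencode` to build a bytes path). `read_lines` is a hypothetical free function standing in for the `read` method, and the file contents are made up.

```python
import io
import os
import tempfile


def read_lines(source):
    """Return stripped lines from a path (str or bytes) or an open text stream."""
    if hasattr(source, "read"):
        # Already file-like: iterate it and leave closing to the caller.
        return [line.rstrip("\n") for line in source]
    # Otherwise treat it as a path; open() accepts both str and bytes paths.
    with open(source) as infile:
        return read_lines(infile)


# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("chr1\t100\nchr2\t200\n")
path = tmp.name

from_stream = read_lines(io.StringIO("chr1\t100\nchr2\t200\n"))
from_str_path = read_lines(path)
from_bytes_path = read_lines(os.fsencode(path))  # bytes filename, Python 3

os.remove(path)
```

All three calls go through the same code path without any `isinstance` check, so nothing depends on `basestring` or `six`.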

mdrasmus · 2017-03-09T05:50:50Z

pyhgvs/variants.py


    def _trim_common_prefix(self):
        """
        Trim the common prefix amongst all alleles.
        """
-        minlength = min(map(len, self.alleles))
+        minlength = min(list(map(len, self.alleles)))


I believe min can take an iterable. Did you find that list was needed here?

I think list was probably a holdover from running 2to3, which tends to assume all iterables need to become sequence types.
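A quick check that `min` is happy with the iterator directly. `trim_common_prefix` below is a hypothetical standalone function written for illustration, not the method from `pyhgvs/variants.py`, and the alleles are made up.

```python
alleles = ["ATG", "AT", "ATGC"]

# min() consumes any iterable, so the map object can be passed straight in.
minlength = min(map(len, alleles))
assert minlength == min(len(allele) for allele in alleles) == 2


def trim_common_prefix(alleles):
    """Return (prefix, trimmed_alleles) for a non-empty list of strings.

    Hypothetical sketch, not the pyhgvs implementation.
    """
    minlength = min(map(len, alleles))
    common = 0
    for i in range(minlength):
        # Stop at the first position where the alleles disagree.
        if len(set(allele[i] for allele in alleles)) > 1:
            break
        common += 1
    return alleles[0][:common], [allele[common:] for allele in alleles]


print(trim_common_prefix(alleles))  # ('AT', ['G', '', 'GC'])
```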

mdshw5 · 2017-03-09T14:37:31Z

Thanks for the code review! It seems like I've deleted my original branch, so if you're interested in making any of the changes feel free. If you need me to pull this PR, modify, and submit a new PR I can do that, but things are a bit tricky since the PR lives entirely within your repo now...

mdrasmus · 2017-03-09T18:52:27Z

Thanks for the code review! It seems like I've deleted my original branch, so if you're interested in making any of the changes feel free. If you need me to pull this PR, modify, and submit a new PR I can do that, but things are a bit tricky since the PR lives entirely within your repo now...

No worries. I can follow up with the last few nits and merge. Thanks again @mdshw5.

davmlaw · 2019-09-25T06:03:41Z

Hi, this looks good, any chance it can be merged into master?

I have come to depend on this project but it appears that nobody is maintaining it and taking fixes. Any chance I could be given commit access?

mdshw5 · 2019-09-25T15:20:32Z

Sorry to have deleted my fork. You can get a patch from the PR using this URL: https://github.com/counsyl/hgvs/pull/25.patch.

davmlaw · 2019-09-26T00:48:43Z

Thanks, yeah, I actually implemented the same basic change (faidx & Python 3) myself. Then, when I realised you had already done it, I re-forked and merged your code so that it at least reduces the divergence from this project if it ever gets patched.

Personally I use a database to load transcripts on demand (rather than loading a massive hash table, which reduces startup time), but I'd also like to implement a quick tabix genePred version, which I'm sure lots of people would find useful.

Can you contact the authors and ask for commit access so you can merge this? Or if they're not interested, should we make a pyhgvs2? I will be happy to contribute my fixes to your version if you want to host it (or we can both have commit access)

The other Python HGVS library (Invitae) won't install, and has open bugs about it. With the official branch of this project being Python 2 only (which will die in a few months) and a few HGVS conversion bugs open, this really needs an active maintainer IMO.


jtratner · 2019-11-21T01:41:29Z

@davmlaw @mdshw5 - I'm not 100% sure why we didn't push commits from these changes publicly before, but I think they're actually in the master branch of this repo now (would love a double check from you, @davmlaw). Unfortunately GitHub marked this PR as "merged" because of a difference in numbering between our internal repo and the remote one. However, I think all of these changes are now actually in the public repo.

We use the code currently on master in production at Myriad Women's Health.

I'm also going to advocate for us to eliminate our internal hgvs repo so that we do all development going forward on the public one.

mdshw5 · 2019-11-21T15:29:20Z

Thanks @jtratner! Sounds like a bit of a headache - good luck!

mdshw5 added 21 commits January 30, 2014 15:42

Refactor for python3 and use pyfaidx instead of pygr

26c2fe6

update setup and example

dfff0e2

ignore virtualenv

71c5714

Updated Readme

f4c969b

Update Readme

aea10ae

revert to 1-based coordinates

6e782d2

Update nose tests to work on Python3

8445d42

Example description updated

c8a8c23

correct typo in attribution

df20e52

Add lazy evaluation of transcript/gene accession version numbers.

d4b420b

Update and rename requirements-dev.txt to requirements.txt

bdbe101

Delete runtests.sh

408144c

Travis-CI integration

fd7b8a5

Updated slicing methods

f981b99

Test case for direct refseq file updated with wrapped fasta file

aef53cb

Build status [skip ci]

e807a1f

Merge upstream master and update pyfaidx API.

d75da86

Add pyfaidx as requirement, tests are passing.

a8e27bd

Bytes -> String

a9996a7

Hack for py3

3ed2aa4

Screw python 3.2

bab230a

Let's try pypy

83fc78b

mdshw5 added 2 commits April 21, 2016 14:05

@jdavisp3 pointed out that I added a virtualenvironment - removing that.

a8daf8e

Merge branch 'master' of https://github.com/mdshw5/hgvs

d520822

lucaswiman reviewed Apr 26, 2016

Implement recommendations from @lucaswiman code review.

7f2f454

mdrasmus approved these changes Mar 9, 2017


naegelyd mentioned this pull request Mar 23, 2017

Update copyright and remove obsolete config #37

Closed

ctk3b self-assigned this Oct 10, 2017

ctk3b mentioned this pull request Oct 13, 2017

Python 3 support #42

Closed

davmlaw mentioned this pull request Sep 25, 2019

Incorrect HGVS to VCF conversion for some genomic indels #50

Open

jtratner pushed a commit that referenced this pull request Nov 20, 2019

Merge pull request #25 from dev/REP-1044-add-inversion-naming

cc06bfe

REP-1044: Handle inversion event

jtratner merged commit 7f2f454 into counsyl:master Nov 21, 2019
