Added Seqhash functions #6
@Koeng101 this looks awesome! I want to refactor and expand on this before merging, but to do that I need two things. 1.) Can you run 2.) Can you also give me push access to your fork so I can make those changes there and then merge them into the prime branch of poly? This is the best way I know to make sure GitHub credits you for this pull request while still letting me edit it before it's merged. 🙏 🙏 🙏
1.) Ran the command above, then added and committed the result to the prime branch of the fork. 2.) Gave you access to the fork.
Read the comments; looks good. The reason for uppercasing is that case never matters sequence-wise, but it does change the hash. Lowercase would work for the same reason, but uppercase is easier to read for sequences. Also, the check failed because you didn't fix the
So I just made a big push and a couple of smaller comment-related commits. Everything looks good so far, but it still needs testing before it's merged. Key things to test:
I don't see the need to test individual hash functions; that seems tedious.
All basic commands have been implemented and the tests pass. Please review before I merge, @Koeng101.
Looks good
This adds seqhash functionality to the Poly library. The Seqhash function hashes any DNA, RNA, or protein sequence with the blake3 algorithm to give it a universal identifier. If the sequence is circular, like a plasmid, it first computes the lexicographically minimal string rotation using Booth's algorithm and then hashes that rotation with blake3, so every rotation of the same circular sequence gets the same hash.
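The normalization step described above can be sketched in Go as follows. This is an illustration under stated assumptions, not Poly's actual implementation: `leastRotation` and `normalizeForHash` are hypothetical names, the code compares raw bytes (so it assumes ASCII sequences such as "ATGC"), and the final blake3 hashing step is left out because blake3 is not in Go's standard library. `leastRotation` follows the standard failure-function formulation of Booth's algorithm, which finds the minimal rotation in O(n) time by scanning the doubled string.

```go
package main

import (
	"fmt"
	"strings"
)

// leastRotation returns the start index of the lexicographically minimal
// rotation of s using Booth's algorithm (O(n) time). It compares bytes,
// so it assumes an ASCII sequence.
func leastRotation(s string) int {
	doubled := s + s // every rotation of s is a len(s)-byte window of s+s
	failure := make([]int, len(doubled))
	for i := range failure {
		failure[i] = -1
	}
	k := 0 // start index of the smallest rotation found so far
	for j := 1; j < len(doubled); j++ {
		c := doubled[j]
		i := failure[j-k-1]
		for i != -1 && c != doubled[k+i+1] {
			if c < doubled[k+i+1] {
				k = j - i - 1 // a smaller candidate starts later
			}
			i = failure[i]
		}
		if c != doubled[k+i+1] { // only reachable when i == -1
			if c < doubled[k] {
				k = j
			}
			failure[j-k] = -1
		} else {
			failure[j-k] = i + 1
		}
	}
	return k
}

// normalizeForHash is a hypothetical helper (not Poly's API): it uppercases
// the sequence and, if circular, rotates it to its minimal rotation. The
// result is what would then be passed to a blake3 hash.
func normalizeForHash(seq string, circular bool) string {
	seq = strings.ToUpper(seq)
	if circular {
		k := leastRotation(seq)
		seq = seq[k:] + seq[:k]
	}
	return seq
}

func main() {
	// Two rotations of the same circular sequence normalize to the same
	// string, so hashing the normalized form yields one identifier.
	fmt.Println(normalizeForHash("gcat", true)) // ATGC
	fmt.Println(normalizeForHash("TGCA", true)) // ATGC
}
```

Doubling the string is what makes the linear-time scan possible: instead of materializing all n rotations and sorting them, Booth's algorithm walks the doubled string once and uses a KMP-style failure table to skip candidates that cannot be minimal.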