Position Frequency Matrix Featurizer #2896
Merged
tonydavis629 changed the title from "Position Frequency Featurizer" to "Position Frequency Matrix Featurizer" on Apr 11, 2022
Hm, any idea why biopython is not importing in the CI?
arunppsg reviewed on Apr 20, 2022
deepchem/utils/sequence_utils.py
Outdated
@@ -1,6 +1,7 @@
from logging import raiseExceptions
import os
import subprocess
from bio import SeqIO
This should be in a try/except block, or SeqIO should be imported locally in the function where it is used.
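A minimal sketch of the two options mentioned in this comment, not the code that was eventually merged. The helper names `read_records` and `read_records_local` are hypothetical and exist only for illustration. Note that Biopython's import name is `Bio` (capitalized).

```python
# Option 1: guard the optional dependency at module level.
try:
    from Bio import SeqIO  # Biopython is an optional dependency
except ImportError:
    SeqIO = None


def read_records(path, fmt="fasta"):
    """Hypothetical helper: parse sequence records if Biopython is available."""
    if SeqIO is None:
        raise ImportError("This function requires Biopython to be installed.")
    return list(SeqIO.parse(path, fmt))


# Option 2: import locally, inside the function that needs it.
def read_records_local(path, fmt="fasta"):
    """Hypothetical helper: same behavior, with the import deferred to call time."""
    try:
        from Bio import SeqIO as _SeqIO
    except ImportError as err:
        raise ImportError(
            "This function requires Biopython to be installed.") from err
    return list(_SeqIO.parse(path, fmt))
```

With either pattern, importing the module succeeds even when Biopython is missing, and the error surfaces only when a function that needs it is called.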
A couple of suggestions:
Pull Request Template
Description
This PR adds a new featurizer, PFMFeaturizer. It takes in a multiple sequence alignment (MSA) and outputs a position frequency matrix (PFM). The utility function PFM_to_PPM converts the PFM to a position probability matrix (PPM) and is included in the PFMFeaturizer module.
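To illustrate the idea, here is a minimal pure-Python sketch of counting a position frequency matrix from aligned sequences and normalizing it to a position probability matrix. It is not the PFMFeaturizer or PFM_to_PPM implementation from this PR. The function names, the `ACGT` alphabet, and the choice to skip characters outside the alphabet (such as gaps) are assumptions made for the example.

```python
def position_frequency_matrix(sequences, alphabet="ACGT"):
    """Count how often each alphabet character occurs at each alignment column.

    Returns a list of rows, one per alphabet character, each of alignment length.
    Characters not in the alphabet (e.g. gaps) are ignored (an assumption).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must all have the same length")

    row_of = {char: i for i, char in enumerate(alphabet)}
    pfm = [[0] * length for _ in alphabet]
    for seq in sequences:
        for pos, char in enumerate(seq):
            row = row_of.get(char)
            if row is not None:
                pfm[row][pos] += 1
    return pfm


def pfm_to_ppm(pfm):
    """Divide each column by its total so that non-empty columns sum to 1."""
    n_cols = len(pfm[0]) if pfm else 0
    totals = [sum(row[j] for row in pfm) for j in range(n_cols)]
    return [[row[j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
            for row in pfm]


# Toy alignment of three sequences, four columns.
msa = ["ACGT", "ACGA", "AGGT"]
pfm = position_frequency_matrix(msa)
ppm = pfm_to_ppm(pfm)
```

In this toy alignment the first column contains only `A`, so the `A` row of the PPM has probability 1.0 at position 0.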
Also included is MSA_to_dataset in sequence_utils. This can be used to convert an MSA generated by the hhblits or hhsearch utilities into a dataset that can be read by PFMFeaturizer.
Testing is included for PFMFeaturizer, PFM_to_PPM, and MSA_to_dataset. Docs are updated as well.
Type of change
Please check the option that is related to your PR.
Checklist
- yapf -i <modified file> and check no errors (yapf version must be 0.22.0)
- mypy -p deepchem and check no errors
- flake8 <modified file> --count and check no errors
- python -m doctest <modified file> and check no errors