Development of PDBManager Class (WIP) #272
Codecov Report
Patch coverage: …
@@            Coverage Diff             @@
##           master     #272      +/-   ##
==========================================
+ Coverage   40.27%   43.87%   +3.60%
==========================================
  Files          48      113      +65
  Lines        2811     7718    +4907
==========================================
+ Hits         1132     3386    +2254
- Misses       1679     4332    +2653
Wow, thanks for the refactor & additions! I added deposition dates for temporal splits. I also had to rename the file as |
for more information, see https://pre-commit.ci
@a-r-j, no problem! Happy to do so. Funnily enough, earlier today I also implemented parsing of deposition dates by accident (I should have checked my GitHub notifications beforehand). To minimize development-style friction, I've conformed my implementation to reuse most parts of yours. In this commit I've also added support for time-based (i.e., deposition date) splits, as well as new PDB filters.
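For readers unfamiliar with time-based splits, the idea mentioned above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the PDBManager implementation from this PR: the function name `time_based_split`, the column names `pdb` and `deposition_date`, and the toy data are all hypothetical.

```python
import pandas as pd


def time_based_split(df, train_end, val_end, date_col="deposition_date"):
    """Partition rows into train/val/test by two date cutoffs.

    Hypothetical helper, not part of the PDBManager API:
      train: date <  train_end
      val:   train_end <= date < val_end
      test:  date >= val_end
    """
    dates = pd.to_datetime(df[date_col])
    train_end = pd.Timestamp(train_end)
    val_end = pd.Timestamp(val_end)
    if train_end > val_end:
        raise ValueError("train_end must not be later than val_end")
    return {
        "train": df[dates < train_end],
        "val": df[(dates >= train_end) & (dates < val_end)],
        "test": df[dates >= val_end],
    }


# Toy table with made-up identifiers and dates (assumed column names).
toy = pd.DataFrame(
    {
        "pdb": ["aaaa", "bbbb", "cccc", "dddd"],
        "deposition_date": ["2015-03-01", "2019-07-15", "2020-06-30", "2021-11-02"],
    }
)

splits = time_based_split(toy, train_end="2020-01-01", val_end="2021-01-01")
for name, part in splits.items():
    print(name, list(part["pdb"]))
```

Because the cutoffs are fixed dates, the same input table always yields the same partition.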
Eep! Sorry, should have dropped a note first. Re SonarCloud, it's not a problem; I can manually override the flag since there's no HTTPS alternative (AFAIK). |
@a-r-j, apologies for my delayed response time here. I just ran the latest version of the PDBManager tutorial notebook, and I was successfully able to perform all time-based splitting and sequence-based clustering in succession. I did not encounter any freezing issues, and (very nicely) I was able to reproduce all outputs from your most recent execution of the tutorial notebook. This suggests that our entire data fetching and splitting procedure is truly deterministic and reproducible! Great work on putting this notebook together. The only small comment I have on it is that users may also like to know that they can compose multiple types of splitting operations together. For example, they can perform a time-based split after clustering sequences into train/val/test splits without corrupting the sequence-based splits. Other than that, everything looks good to me! |
Great! It seems it was the clustering step with mmseqs that Jupyter wasn't happy with.
Good point! Will add this.
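One way to picture the composition discussed above (a time-based restriction applied after a sequence-cluster split) is the pandas sketch below. It is an assumption-laden illustration rather than the PDBManager API: the helpers `split_by_cluster` and `restrict_by_date`, the columns `pdb`, `cluster`, `deposition_date`, and `split`, and the toy data are all hypothetical. The date step only drops rows and never moves a row to another split, so no cluster can end up in two splits.

```python
import random

import pandas as pd


def split_by_cluster(df, fractions=(0.5, 0.25, 0.25), seed=42, cluster_col="cluster"):
    """Assign whole clusters to train/val/test (hypothetical helper).

    Cluster IDs are sorted before a seeded shuffle so the assignment is
    reproducible regardless of row order.
    """
    clusters = sorted(df[cluster_col].unique())
    random.Random(seed).shuffle(clusters)
    n_train = int(len(clusters) * fractions[0])
    n_val = int(len(clusters) * fractions[1])
    assignment = {}
    for i, cluster in enumerate(clusters):
        if i < n_train:
            assignment[cluster] = "train"
        elif i < n_train + n_val:
            assignment[cluster] = "val"
        else:
            assignment[cluster] = "test"
    return df.assign(split=df[cluster_col].map(assignment))


def restrict_by_date(df, cutoff, date_col="deposition_date"):
    """Keep train/val rows deposited before `cutoff` and test rows on or after it.

    Rows are only removed, never reassigned, so the cluster-level split
    produced earlier stays intact.
    """
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    is_test = df["split"] == "test"
    keep = (is_test & (dates >= cutoff)) | (~is_test & (dates < cutoff))
    return df[keep]


# Toy table: four made-up clusters, each with one older and one newer entry.
toy = pd.DataFrame(
    {
        "pdb": ["a1", "a2", "b1", "b2", "c1", "c2", "d1", "d2"],
        "cluster": ["A", "A", "B", "B", "C", "C", "D", "D"],
        "deposition_date": ["2018-01-01", "2021-01-01"] * 4,
    }
)

clustered = split_by_cluster(toy)
composed = restrict_by_date(clustered, cutoff="2020-01-01")
print(composed[["pdb", "cluster", "split", "deposition_date"]])
```

The trade-off of this design is that the time filter shrinks each split; entries on the wrong side of the cutoff are discarded rather than reassigned.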
…nto pr/amorehead/272-1
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
Reference Issues/PRs
#270 @a-r-j
What does this implement/fix? Explain your changes
Adds a utility for creating selections of experimental PDB structures
What testing did you do to verify the changes in this PR?
WIP
Draws the following metadata:
Currently missing:
… train, val, and test).
Pull Request Checklist
- … ./CHANGELOG.md file (if applicable)
- … ./graphein/tests/* directories (if applicable)
- … ./notebooks/ (if applicable)
- … python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
- … black . and isort .