add PDB manager #270 #271

Closed: a-r-j wants to merge 3 commits

a-r-j (Owner) commented Feb 24, 2023

Reference Issues/PRs

#270 @amorehead

What does this implement/fix? Explain your changes

Adds a utility for creating selections of experimental PDB structures.

What testing did you do to verify the changes in this PR?

WIP

The utility draws the following metadata for each chain:

id | pdb | chain | length | molecule_type | name | sequence | ligands | source | resolution | experiment_type
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
100d_A | 100d | A | 10 | na | DNA/RNA (5'-R(*CP*)-D(*CP*GP*GP*CP*GP*CP*CP*GP... | CCGGCGCCGG | [SPM] |   | 1.90 | diffraction
100d_B | 100d | B | 10 | na | DNA/RNA (5'-R(*CP*)-D(*CP*GP*GP*CP*GP*CP*CP*GP... | CCGGCGCCGG | [SPM] |   | 1.90 | diffraction
101d_A | 101d | A | 12 | na | DNA (5'-D(*CP*GP*CP*GP*AP*AP*TP*TP*(CBR)P*GP*C... | CGCGAATTCGCG | [CBR, MG, NT] |   | 2.25 | diffraction
101d_B | 101d | B | 12 | na | DNA (5'-D(*CP*GP*CP*GP*AP*AP*TP*TP*(CBR)P*GP*C... | CGCGAATTCGCG | [CBR, MG, NT] |   | 2.25 | diffraction
101m_A | 101m | A | 154 | protein | MYOGLOBIN | MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDR... | [HEM, NBN, SO4] | Physeter catodon | 2.07 | diffraction
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
9xia_A | 9xia | A | 388 | protein | XYLOSE ISOMERASE | MNYQPTPEDRFTFGLWTVGWQGRDPFGDATRRALDPVESVQRLAEL... | [DFR, MN] | Streptomyces rubiginosus | 1.90 | diffraction
9xim_A | 9xim | A | 393 | protein | D-XYLOSE ISOMERASE | SVQATREDKFSFGLWTVGWQARDAFGDATRTALDPVEAVHKLAEIG... | [MN, XLS] | Actinoplanes missouriensis | 2.40 | diffraction
9xim_B | 9xim | B | 393 | protein | D-XYLOSE ISOMERASE | SVQATREDKFSFGLWTVGWQARDAFGDATRTALDPVEAVHKLAEIG... | [MN, XLS] | Actinoplanes missouriensis | 2.40 | diffraction
9xim_C | 9xim | C | 393 | protein | D-XYLOSE ISOMERASE | SVQATREDKFSFGLWTVGWQARDAFGDATRTALDPVEAVHKLAEIG... | [MN, XLS] | Actinoplanes missouriensis | 2.40 | diffraction
9xim_D | 9xim | D | 393 | protein | D-XYLOSE ISOMERASE | SVQATREDKFSFGLWTVGWQARDAFGDATRTALDPVEAVHKLAEIG... | [MN, XLS] | Actinoplanes missouriensis | 2.40 | diffraction
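
To make "creating selections" concrete, here is a minimal sketch of filtering such per-chain metadata. It is not the PR's implementation: `ChainRecord` and `select_chains` are hypothetical names, the records use plain Python objects rather than whatever structure the manager holds, and only a few rows and columns from the table above are reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ChainRecord:
    """One row of the metadata table (hypothetical type, subset of columns)."""
    id: str
    pdb: str
    chain: str
    length: int
    molecule_type: str
    ligands: Tuple[str, ...]
    resolution: float


# A few rows copied from the table above.
RECORDS = [
    ChainRecord("100d_A", "100d", "A", 10, "na", ("SPM",), 1.90),
    ChainRecord("101d_A", "101d", "A", 12, "na", ("CBR", "MG", "NT"), 2.25),
    ChainRecord("101m_A", "101m", "A", 154, "protein", ("HEM", "NBN", "SO4"), 2.07),
    ChainRecord("9xia_A", "9xia", "A", 388, "protein", ("DFR", "MN"), 1.90),
    ChainRecord("9xim_A", "9xim", "A", 393, "protein", ("MN", "XLS"), 2.40),
]


def select_chains(
    records: List[ChainRecord],
    molecule_type: Optional[str] = None,
    max_resolution: Optional[float] = None,
    min_length: Optional[int] = None,
    has_ligand: Optional[str] = None,
) -> List[ChainRecord]:
    """Return the records passing every filter that is not None."""
    selected = []
    for record in records:
        if molecule_type is not None and record.molecule_type != molecule_type:
            continue
        if max_resolution is not None and record.resolution > max_resolution:
            continue
        if min_length is not None and record.length < min_length:
            continue
        if has_ligand is not None and has_ligand not in record.ligands:
            continue
        selected.append(record)
    return selected


if __name__ == "__main__":
    proteins = select_chains(RECORDS, molecule_type="protein", max_resolution=2.1)
    print([r.id for r in proteins])  # ['101m_A', '9xia_A']
```

Leaving a filter as `None` disables it, so criteria can be combined freely without a separate method per column.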

Currently missing:

  • Download method.
  • Clustering method with MMseqs2.
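
For the clustering item, one possible shape is sketched below, assuming the `mmseqs` binary is installed and its `easy-cluster` workflow is used. The helper names (`to_fasta`, `mmseqs_easy_cluster_cmd`, `parse_cluster_tsv`) are hypothetical and not taken from this PR, and the 0.3 identity threshold is only an illustrative default. The sketch builds the command and parses the two-column `<prefix>_cluster.tsv` output (representative, member) without running MMseqs2 itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def to_fasta(sequences: Dict[str, str]) -> str:
    """Render {chain_id: sequence} as FASTA text."""
    return "".join(f">{chain_id}\n{seq}\n" for chain_id, seq in sequences.items())


def mmseqs_easy_cluster_cmd(
    fasta_path: str, out_prefix: str, tmp_dir: str, min_seq_id: float = 0.3
) -> List[str]:
    """Build (but do not run) an `mmseqs easy-cluster` command line."""
    return [
        "mmseqs", "easy-cluster", fasta_path, out_prefix, tmp_dir,
        "--min-seq-id", str(min_seq_id),
    ]


def parse_cluster_tsv(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines of '<representative>\\t<member>' into {representative: members}."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        representative, member = line.split("\t")
        clusters[representative].append(member)
    return dict(clusters)
```

The command list can be passed to `subprocess.run(cmd, check=True)`, and the resulting TSV file can be handed to `parse_cluster_tsv` as an open file object, since it only needs an iterable of lines.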

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • Checked for style issues by running black . and isort .

sonarcloud bot commented Feb 24, 2023

SonarCloud Quality Gate failed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 1 (rating E)
  • Code Smells: 4 (rating A)
  • Coverage: no coverage information
  • Duplication: 0.0%

a-r-j (Owner, Author) commented Feb 24, 2023

@amorehead I've added the clustering utils - does this cover what you were hoping for?

You also mentioned structural clustering, which I think would be good too. Do you have a preferred method?

amorehead (Contributor) commented Feb 25, 2023

Hi, @a-r-j.

All these changes look great! I've gone ahead and created another pull request from a personal fork of Graphein's latest master branch. In particular, I've revised some of the documentation for each class method and, more importantly, added initial support for splitting a dataset (e.g., the sequence-clustered one) into an arbitrary number of "splits" (e.g., train, val, and test). Feel free to push changes directly to this forked branch of mine if you would like to make additions or edits. My hope is that we can use this new pull request to finish developing the remaining functionality listed. Let me know if you have any questions, comments, or concerns.

Looking forward to the final result!
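
As an illustration of the splitting idea described in the comment above, here is a minimal sketch that assigns whole sequence clusters to named splits so that no cluster spans two splits. It is an assumption about how such a split could work, not the code from the forked branch; `split_clusters` and its parameters are hypothetical names.

```python
import math
import random
from typing import Dict, List


def split_clusters(
    clusters: Dict[str, List[str]],
    fractions: Dict[str, float],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole clusters to splits, approximating the requested fractions.

    clusters:  {cluster_id: [member_id, ...]}
    fractions: {split_name: fraction}, which must sum to 1.
    """
    if not math.isclose(sum(fractions.values()), 1.0):
        raise ValueError("split fractions must sum to 1")

    # Shuffle cluster ids deterministically so the result is reproducible.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    total = sum(len(members) for members in clusters.values())
    targets = {name: frac * total for name, frac in fractions.items()}
    splits: Dict[str, List[str]] = {name: [] for name in fractions}

    for cluster_id in cluster_ids:
        # Greedy choice: the split currently furthest below its target size.
        name = max(splits, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(clusters[cluster_id])
    return splits
```

Because clusters are never divided, the realized split sizes only approximate the requested fractions, and the approximation is coarser when a few clusters are very large.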

a-r-j closed this Feb 26, 2023