Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: check PSSMs #401

Merged
merged 45 commits into from May 11, 2023
Merged

fix: check PSSMs #401

merged 45 commits into from May 11, 2023

Conversation

DaniBodor
Copy link
Collaborator

@DaniBodor DaniBodor commented Mar 20, 2023

Summary:

  • Change the assumed order in which the pssm file is read to alphabetical by 3-letter-code (was 1-letter-code, but this did not match pssm files used).
    • Perhaps would be even nicer to read the order from the top of the pssm file instead of assuming a fixed order in the future.
  • Include sanity check on matching amino acid order of pssm and pdb file
    • Test for both correct aa order (test_incorrect_pssm_order) and missing entries (test_incomplete_pssm) in pssm in test_query.py
  • Test that not having a pssm file throws an error if conservation module is selected
    • Improve the error message when no pssm file is found
  • Change default feature modules in query.process to only components and contact
    • Option to use 'all' as input, which will use all existing feature modules
  • Improvements to test_querycollection.py
    • Create function to check for expected presence/absence of features
    • Make separate tests for all features and default feature modules
    • Other minor improvements
  • Improve test speed, e.g. by reducing the number of queries and epochs in some of the tests (as this PR was on hold for a while, that may have been implemented by some other PR in the meantime).
  • Fixed pytorch 2.0.0 in the workflow, because no wheel is available for 2.0.1. This may cause problems in the future as well if pytorch and the wheels are not updated synchronously.

Note: It would be nice if there were some form of documentation on what the pssm files should look like and/or where and how to build it correctly, as it is very easy get it wrong and very difficult to diagnose or even detect if it is. This is not implemented in this PR.

@github-actions
Copy link

This PR is stale because it has been open for 14 days with no activity.

@github-actions github-actions bot added the stale issue not touched from too much time label Apr 19, 2023
@DaniBodor DaniBodor changed the title fix: match the assumed aa order of the code to the pssm files and perform checks on correctness of pssm files fix: check PSSMs May 8, 2023
Copy link
Collaborator

@gcroci2 gcroci2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice checks :-)

I'd avoid adding exclude_feature_module (the least parameters the better), and I'd implement the solution we were discussing, which is defaulting the modules to the essentials and making feature_modules required. I don't think opening a new PR for this is needed, right?

@@ -60,7 +60,7 @@ runs:
# Only way to install msms is through conda
conda install -c bioconda msms
# Safest way to install PyTorch and PyTorch Geometric is through conda
conda install pytorch torchvision torchaudio cpuonly -c pytorch
conda install pytorch torchvision torchaudio cpuonly -c pytorch==2.0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I've also just noticed that in one of my branches. Also I had a question: why is the line 66 needed? (# Install optional pyg dependencies) Shouldn't conda install pyg -c pyg take care of the pyg dependencies?

Copy link
Collaborator Author

@DaniBodor DaniBodor May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know why, but it doesn't. I guess because they are optional?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With conda, from their website it should not be necessary. Only with pip. So if it doesn't work using only conda install pyg -c pyg their docs are wrong

README.md Show resolved Hide resolved
deeprankcore/query.py Outdated Show resolved Hide resolved
deeprankcore/query.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@gcroci2 gcroci2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments about minor edits in the process method. I think that explicitly defaulting the modules (not using None) and allowing for 'all' but not None is clear from the user perspective.

@DaniBodor
Copy link
Collaborator Author

I added comments about minor edits in the process method. I think that explicitly defaulting the modules (not using None) and allowing for 'all' but not None is clear from the user perspective.

Agreed!

@DaniBodor DaniBodor merged commit 6ec03ce into main May 11, 2023
8 checks passed
@DaniBodor DaniBodor deleted the 332_pssmformat_dbodor branch May 11, 2023 15:23
@gcroci2 gcroci2 mentioned this pull request May 15, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Input format of PSSM data
2 participants