-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kmer association #115
Kmer association #115
Conversation
…p params changed. Linted
…ct-Oriented Manner. Cleaned and added doc strings. Altered Learn output to take up 40-50% less storage. Much cleaner for a future publication. More readable and managable for future updates. Currently working for standard usage. To do: Check if this still works for the additive database method in Learn.
…coring. Linted. Still to do: Documentation
Failing on Model in the test. Is this a known issue? Looks like it may be unrelated to model but maybe just an issue with the test. |
Hi @christinehc, just checking in on this. Do you know if this is a known or previous issue with model? |
Hmm, I took a look and didn't see anything obvious. I would try rerunning the tests and seeing if that works? If it fails again, I'll do some more digging Edit: started the rerun a short while ago; we'll see how it goes Edit 2: failed again, hmm. Not getting much clarity from the debug log itself but let me try a few things. Seems to be an issue with the actions workflow itself |
@jjacobson95: Still failing but I found this possibly related issue? AKA try changing |
Ubuntu-18.04 failed - look like its no longer supported with github. Looks like only the latest (ubuntu-22.04) and ubuntu-20.04 are available - options are here. |
Note to all (#116) - ubuntu 20.04 works. |
Ready to merge! @biodataganache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: I would use an input function to handle the conditional creation of optional files. It's a bit "cleaner" code style-wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stylistic comments:
- See previous comment on apply.smk about input functions for conditional rule all
- Lines 606/616/903:
if not x
is preferable toif x == False
- Line 791: I would print a more informative error message.
- Thoughts for future development: I wonder if some aspects of the classes, e.g. the checking function stream, can be streamlined. I do think the object-based approach is great, but the classes are a bit large/unwieldy and in future development I'd consider strategies to simplify, even if that means moving some of the heavy lifting to module-level functions rather than classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions @christinehc. I think all of those are good ideas. If not in this version, I'll make these changes for the next version. The classes are pretty large, in a future iteration, I'll think on how to handle functions. would you recommend using an additional helper file and importing them or keeping them within the current scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can discuss what makes the most sense when we plan the next major code update, as part of the complexity of the classes arises from the complexity of the workflow itself and we'd have to see which areas would be most ripe for simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including some commentary on minor suggested changes, but everything seems to be working / CI is passing, so I'll submit an approval formally.
Couple of items:
|
Changes made. But before merging, I should also update the docs to reflect parameter changes. I'll try to have this done by next Tuesday. |
Please also remember to update the version here before pushing |
changelog: - kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116) - to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created. - extensive changes have been made to `snekmer.score` to accommodate the new changes, including: - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method - `snekmer.score.feature_class_probabilities` now also integrates the scoring method - the main scoring rule itself has been significantly altered as follows" - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored - scoring method now integrated
changelog: - kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116) - to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created. - extensive changes have been made to `snekmer.score` to accommodate the new changes, including: - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method - `snekmer.score.feature_class_probabilities` now also integrates the scoring method - the main scoring rule itself has been significantly altered as follows" - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored - scoring method now integrated
Not ready for pull yet, but just wanted this to be on the radar.
Updates:
Before merging pull request several things must still be done: