Kmer association #115

jjacobson95 · 2023-10-16T20:54:11Z

Not ready to merge yet, but I wanted this to be on the radar.

Updates:

Learn output storage space reduced by 40-50%.
Apply output storage space reduced by ~95% when using the new "save_apply_associations" parameter.
Small increases in speed and efficiency for both Learn and Apply.
Learn now trains on full-length sequences and can evaluate confidence based on user-defined fragment sizes.
Code fully restructured to be object-oriented. It is now much more readable and maintainable, and cleaner for a future publication.
- Docstrings added to all functions. Overall, the purpose of each section of code should be much clearer.

Before merging this pull request, several things must still be done:

Confirm that the additive method for kmer count matrix generation is still working for Learn.
Update Documentation with new parameters.
Lint.

…ct-Oriented Manner. Cleaned and added doc strings. Altered Learn output to take up 40-50% less storage. Much cleaner for a future publication. More readable and manageable for future updates. Currently working for standard usage. To do: Check if this still works for the additive database method in Learn.

jjacobson95 · 2023-10-31T02:14:34Z

The test is failing on Model. Is this a known issue? It looks like it may be unrelated to Model itself and just an issue with the test.

jjacobson95 · 2023-11-06T22:48:00Z

Hi @christinehc, just checking in on this. Do you know if this is a known or previous issue with model?

christinehc · 2023-11-07T00:09:56Z

Hmm, I took a look and didn't see anything obvious. I would try rerunning the tests and seeing if that works? If it fails again, I'll do some more digging

Edit: started the rerun a short while ago; we'll see how it goes

Edit 2: failed again, hmm. Not getting much clarity from the debug log itself but let me try a few things. Seems to be an issue with the actions workflow itself

christinehc · 2023-11-28T22:18:46Z

@jjacobson95: Still failing but I found this possibly related issue?

AKA try changing ubuntu-latest to ubuntu-18.04 in the actions.yml and see if that works

jjacobson95 · 2023-11-29T23:31:07Z

Ubuntu-18.04 failed; it looks like it's no longer supported by GitHub. Only the latest (ubuntu-22.04) and ubuntu-20.04 appear to be available - options are here.
Currently testing ubuntu-20.04.

jjacobson95 · 2023-11-29T23:46:54Z

Note to all (#116) - ubuntu 20.04 works.
Note to me - Learn/Apply unit test needs updating as several new arguments need to be updated in the config file.

jjacobson95 · 2023-12-04T23:31:27Z

Ready to merge! @biodataganache

christinehc · 2023-12-05T20:45:03Z

snekmer/rules/apply.smk

Minor suggestion: I would use an input function to handle the conditional creation of optional files. It's a bit "cleaner" code style-wise.
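
A minimal sketch of what the input-function suggestion could look like, not the actual Snekmer rule. The output paths, the `build_apply_targets` helper, the `SAMPLES` name, and the flat `config` layout are all assumptions made for illustration; only the `save_apply_associations` parameter name comes from this PR.

```python
def build_apply_targets(config, samples):
    """Collect the files that `rule all` should request (hypothetical paths)."""
    # Outputs that are always produced.
    targets = [f"output/apply/{sample}-scores.csv" for sample in samples]

    # Optional outputs are only requested when the user enables them.
    if config.get("save_apply_associations", False):
        targets += [f"output/apply/{sample}-associations.csv" for sample in samples]
    return targets


# In a Snakefile, this could be wired up as an input function, e.g.
# (SAMPLES is a hypothetical list of sample names):
#
#   rule all:
#       input:
#           lambda wildcards: build_apply_targets(config, SAMPLES)

if __name__ == "__main__":
    print(build_apply_targets({"save_apply_associations": True}, ["A", "B"]))
```

Keeping the conditional logic in one function means the rule body itself stays free of `if` branches, which is the style point being made.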

christinehc · 2023-12-05T21:09:16Z

snekmer/rules/learn.smk

Minor stylistic comments:

See previous comment on apply.smk about input functions for conditional rule all

Lines 606/616/903: `if not x` is preferable to `if x == False`

Line 791: I would print a more informative error message.
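
As a small illustration of the `if not x` point (the helper names here are hypothetical and not taken from learn.smk): the two checks are not strictly equivalent, so the switch is worth a quick look wherever `x` can be `None` or an empty container.

```python
def is_off_by_equality(x):
    # Style flagged in the review: only values equal to False match.
    return x == False  # noqa: E712


def is_off_by_truthiness(x):
    # Suggested style: any falsy value (False, 0, None, empty containers) matches.
    return not x


if __name__ == "__main__":
    for value in (False, 0, None, [], ""):
        print(repr(value), is_off_by_equality(value), is_off_by_truthiness(value))
```

If only the literal `False` should count, `x is False` states that intent explicitly; otherwise `not x` is the idiomatic form.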

Thoughts for future development: I wonder if some aspects of the classes, e.g. the checking function stream, can be streamlined. I do think the object-based approach is great, but the classes are a bit large/unwieldy and in future development I'd consider strategies to simplify, even if that means moving some of the heavy lifting to module-level functions rather than classes.

jjacobson95 · 2023-12-05

Thanks for the suggestions @christinehc. I think all of those are good ideas. If not in this version, I'll make these changes for the next version. The classes are pretty large; in a future iteration, I'll think about how to handle the functions. Would you recommend moving them to an additional helper file and importing them, or keeping them within the current scope?

christinehc · 2023-12-12

I think we can discuss what makes the most sense when we plan the next major code update. Part of the complexity of the classes arises from the complexity of the workflow itself, so we'd have to see which areas are most ripe for simplification.

christinehc

Including some commentary on minor suggested changes, but everything seems to be working / CI is passing, so I'll submit an approval formally.

biodataganache · 2023-12-06T00:03:40Z

Couple of items:

Please add a script in the Snekmer/resources/tutorial/learnapp_tutorial_files/ folder that will run the learn and apply examples (see Snekmer/resources/tutorial/demo_example/ for the idea).
Please remove the base file in the Snekmer/resources/tutorial/learnapp_tutorial_files/learn/ folder from the repo (this should be generated by running the example)
Please fix the following error when snekmer learn is run:
/Users/d3p620/lib/Snekmer/resources/tutorial/learnapp_tutorial_files/learn

(snekmer) d3p620@WE48427 learn % snekmer learn
KeyError in line 914 of /Applications/anaconda3/envs/snekmer/lib/python3.10/site-packages/snekmer/rules/learn.smk:
'conf_weight_modifier'
File "/Applications/anaconda3/envs/snekmer/lib/python3.10/site-packages/snekmer/rules/learn.smk", line 914, in
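
One way to turn a bare `KeyError` like the one above into a more informative message is to check the config up front. This is a sketch under assumptions: the `require_config_keys` helper and the flat config layout are hypothetical, and only the `conf_weight_modifier` and `save_apply_associations` names come from this thread.

```python
def require_config_keys(config, required_keys, workflow="learn"):
    """Fail early, naming every missing parameter instead of just the first one hit."""
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(
            f"snekmer {workflow}: missing config parameter(s): "
            f"{', '.join(missing)}. Please add them to the config file."
        )
    return config


if __name__ == "__main__":
    # Hypothetical config that predates the new parameter.
    old_config = {"save_apply_associations": False}
    try:
        require_config_keys(old_config, ["conf_weight_modifier"])
    except KeyError as err:
        print(err.args[0])
```

Called at the top of the workflow, a check like this would report the problem before any rule runs, rather than partway through at the line that first reads the key.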

jjacobson95 · 2023-12-08T22:21:05Z

Changes made. But before merging, I should also update the docs to reflect parameter changes. I'll try to have this done by next Tuesday.

christinehc · 2023-12-12T00:18:58Z

Please also remember to update the version here before pushing

changelog:

- kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family
  - note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116)
- to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created.
- extensive changes have been made to `snekmer.score` to accommodate the new changes, including:
  - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method
  - `snekmer.score.feature_class_probabilities` now also integrates the scoring method
- the main scoring rule itself has been significantly altered as follows:
  - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed
  - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored
  - scoring method now integrated

jjacobson95 added 5 commits September 8, 2023 11:18

Confidence rework for fragmentated data added to Learn Apply

89f948e

Fragmentation and conf generation option added succcessfully. LearnAp…

771df2c

…p params changed. Linted

snekmer LA storage reduction applied

bc0062f

Weighted Average with modifer written for additive Learn confidence s…

dbaee7f

…coring. Linted. Still to do: Documentation

christinehc added 2 commits November 6, 2023 16:33

ci: add recursive flag to mkdir

94336d2

ci: add explicit permissions for google auth

de36c75

christinehc mentioned this pull request Nov 29, 2023

Functionmotifs #116

Merged

jjacobson95 added 2 commits November 29, 2023 15:18

Update action.yml

d310bc8

Update action.yml

4fde561

updated test

d77a22e

christinehc reviewed Dec 5, 2023

christinehc approved these changes Dec 5, 2023

biodataganache mentioned this pull request Dec 6, 2023

LA tutorial issues #117

Open

fixed LA demo and made requested changes

a65ee54

christinehc mentioned this pull request Dec 12, 2023

Enable background subtraction / file unzipping #118

Open

Docs updated with LA parameters

df4bdb7

Version incremented

5579c7f

christinehc merged commit 94a1374 into main Dec 13, 2023
3 checks passed

christinehc mentioned this pull request Dec 20, 2023

kmer-association learn workflow implementation #70

Closed
