Updated motif processing to Tangermeme 0.4 #69

avantikalal · 2024-10-16T23:42:59Z

Replaced pymemesuite with tangermeme for all motif handling.

Specifically

Removing pymemesuite:

Removed the pymemesuite dependency from setup.cfg
Removed pymemesuite from the Dockerfile

Reading motifs into a dictionary format instead of pymemesuite format

Changed read_meme_file so that it now reads motifs in the dictionary format that tangermeme expects
Updated the test for the above function
Removed modisco_to_meme, since a MEME file input is no longer necessary. Instead, added a read_modisco_report function that reads modisco motifs directly into the dictionary format that tangermeme expects
Added tests for the above function
Renamed grelu.io.meme to grelu.io.motifs, as this module now handles modisco .h5 files in addition to MEME files
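To make the "dictionary format" concrete, here is a minimal sketch of parsing MEME-style text into a dictionary that maps each motif name to a matrix. It is not the code in this PR: the function name read_meme_text is hypothetical, and the (4, L) orientation is an assumption based on the "transpose motifs" commits listed further down.

```python
import io

import numpy as np


def read_meme_text(text):
    """Parse minimal MEME-format text into {motif_name: array of shape (4, L)}.

    Hypothetical helper for illustration only; it handles just the MOTIF and
    letter-probability lines and ignores everything else.
    """
    motifs = {}
    name, rows, remaining = None, [], 0
    for line in io.StringIO(text):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            name = fields[1]
        elif fields[0] == "letter-probability":
            # Header looks like: "letter-probability matrix: alength= 4 w= 3 ..."
            remaining = int(fields[fields.index("w=") + 1])
            rows = []
        elif remaining > 0:
            rows.append([float(x) for x in fields])
            remaining -= 1
            if remaining == 0:
                # MEME stores one row per position; transpose to (4, L).
                # The (4, L) orientation is an assumption.
                motifs[name] = np.array(rows).T
    return motifs


EXAMPLE_MEME = """MEME version 4

ALPHABET= ACGT

MOTIF toy_motif
letter-probability matrix: alength= 4 w= 3 nsites= 10 E= 0
0.7 0.1 0.1 0.1
0.1 0.7 0.1 0.1
0.1 0.1 0.1 0.7
"""

motifs = read_meme_text(EXAMPLE_MEME)
print({name: matrix.shape for name, matrix in motifs.items()})
```

A plain dictionary like this is easy to filter, subset, or build by hand, which is presumably part of the appeal over a library-specific motif object.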

Changing downstream functions

Changed motifs_to_strings to take the dictionary format as input
Changed scan_sequences to use the tangermeme fimo function as its backend
Improved the test for scan_sequences
Changed the grelu.transforms.seq_transforms.MotifScore class to take the dictionary format as input
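For readers unfamiliar with motif scanning, the sketch below shows the core idea behind a FIMO-style scan over a motif dictionary: convert each probability matrix to log-odds and score every window of the sequence. It is not tangermeme's fimo and not the scan_sequences code in this PR. The names one_hot and scan, the uniform background, and the pseudocount are all assumptions, and it scans the forward strand only without computing p-values.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (4, len(seq)) array."""
    arr = np.zeros((4, len(seq)))
    for i, base in enumerate(seq):
        arr[ALPHABET.index(base), i] = 1.0
    return arr


def scan(seq, motifs, background=0.25, pseudocount=1e-3, min_score=0.0):
    """Return (motif_name, start, score) for each window scoring above min_score.

    motifs is a dict of name -> (4, L) probability matrix (assumed layout).
    """
    x = one_hot(seq)
    hits = []
    for name, ppm in motifs.items():
        # Log-odds of the motif versus a uniform background (assumption).
        log_odds = np.log2((ppm + pseudocount) / background)
        width = ppm.shape[1]
        for start in range(len(seq) - width + 1):
            window = x[:, start:start + width]
            score = float((log_odds * window).sum())
            if score > min_score:
                hits.append((name, start, score))
    return hits


# Toy motif whose consensus is "ACT".
toy_motifs = {
    "toy": np.array([
        [0.97, 0.01, 0.01],
        [0.01, 0.97, 0.01],
        [0.01, 0.01, 0.01],
        [0.01, 0.01, 0.97],
    ])
}
hits = scan("GGACTGG", toy_motifs, min_score=3.0)
print(hits)
```

In this toy input only the window starting at index 2 ("ACT") clears the threshold; every other window contains at least one strongly penalized base.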

Misc

Removed the padding argument from grelu.interpret.motifs.trim_pwm, since we no longer have a use case for it


avantikalal · 2024-10-16T23:47:02Z

Partially addresses #55


suragnair · 2024-10-29T00:17:49Z

src/grelu/io/motifs.py

+
+                # Trim PPMs based on information content
+                start, end = trim_pwm(
+                    cwm, trim_threshold=trim_threshold, return_indices=True


I think the trimming should be done on the PWM, not the CWM. CWM values are generally not normalized in any way.

avantikalal · 2024-10-30

I based this on the modiscolite repo, where trimming seems to be done on the CWM, e.g.:
https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/report.py#L98
https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/report.py#L236

Is it incorrect?

suragnair · 2024-11-04

OK, I had a look. Trimming on the CWM is fine.

It looks like trim_pwm recomputes the threshold internally as
trim_thresh = np.max(score) * trim_threshold
This is a bit unexpected to me; I would expect the threshold to be used as is. I imagine the scaling was originally meant for PWMs, where the range is fixed (probabilities from 0 to 1).

My suggestion would be to compute the exact threshold outside the function call.

avantikalal · 2024-11-04

Actually, I think the threshold is meant to be calculated on the log-odds matrix, not on the probability matrix. Let me find a reference for that.

avantikalal · 2024-11-04

Here you go: https://bioconductor.org/packages/devel/bioc/vignettes/universalmotif/inst/doc/IntroductionToSequenceMotifs.pdf

Specifically, this section on page 4:

While the context here is motif scanning, not trimming, this is the threshold that we're calculating. It's related to entropy of each position.
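Since this comment ties the threshold to per-position entropy, and the diff comment above mentions trimming by information content, here is a sketch of the textbook per-position information content for a probability matrix. It is not claimed to be the formula from the linked vignette or the one used inside trim_pwm. The function name, the uniform background, and the (4, L) layout are assumptions.

```python
import numpy as np


def information_content(ppm, pseudocount=1e-6):
    """Per-position information content (bits) of a (4, L) probability matrix.

    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    """
    p = ppm + pseudocount
    p = p / p.sum(axis=0, keepdims=True)  # renormalize each position
    entropy = -(p * np.log2(p)).sum(axis=0)
    return 2.0 - entropy


# First position is fully determined (A); second is uniform.
example_ppm = np.array([
    [1.0, 0.25],
    [0.0, 0.25],
    [0.0, 0.25],
    [0.0, 0.25],
])
ic = information_content(example_ppm)
print(ic)
```

A fully determined position approaches 2 bits and a uniform position carries 0 bits, which is why low-information flanks are natural candidates for trimming.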

suragnair · 2024-11-04

I see. In either case, the exact threshold calculation should happen outside the trim_pwm function. trim_pwm should just take a matrix and a threshold, and perform the trimming using that threshold.
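The separation suggested here can be sketched as follows: the trimming function takes an absolute threshold, and the caller decides how to derive it (for example, as a fraction of the maximum score, as in the trim_thresh line quoted above). This is an illustration, not the merged implementation. The name trim_by_threshold, the score definition (sum of absolute values per position), and the (4, L) layout are assumptions.

```python
import numpy as np


def trim_by_threshold(matrix, threshold):
    """Return (start, end) spanning the first to last position whose score
    is at least `threshold`. The threshold is used exactly as given.

    Score per position = sum of absolute values over the 4 bases (assumption).
    """
    score = np.abs(matrix).sum(axis=0)
    keep = np.where(score >= threshold)[0]
    if keep.size == 0:
        return 0, 0  # nothing passes; empty span
    return int(keep[0]), int(keep[-1]) + 1


# Toy CWM with weak flanks and a strong two-position core.
cwm = np.array([
    [0.0, 0.1, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.05],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])

# The caller computes the exact threshold, here 30% of the maximum score.
threshold = 0.3 * np.abs(cwm).sum(axis=0).max()
start, end = trim_by_threshold(cwm, threshold)
trimmed = cwm[:, start:end]
print(start, end, trimmed.shape)
```

Keeping the relative-to-maximum scaling in the caller means the same function works for CWMs, probability matrices, or log-odds matrices, whose value ranges differ.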

src/grelu/interpret/motifs.py


lala8 and others added 5 commits October 16, 2024 21:21

updated motif input functions to tangermeme and added tests

563ef3b

updated scan_sequences to tangermeme, updated test

33adef5

transpose motifs before passing to tangermeme

25597bc

removed pymemesuite from tutorials

954e17d

[pre-commit.ci] auto fixes from pre-commit.com hooks

0061410

for more information, see https://pre-commit.ci

avantikalal requested review from gokceneraslan and suragnair October 16, 2024 23:46

lala8 added 2 commits October 17, 2024 00:05

updated MotifScore

1b25d7c

Merge branch 'tangermeme_04' of https://github.com/Genentech/gReLU in…

aef1b46

…to tangermeme_04

avantikalal changed the title ~~Updated motif processing to Tangermeme 04~~ Updated motif processing to Tangermeme 0.4 Oct 17, 2024

lala8 and others added 2 commits October 17, 2024 00:49

transpose motifs when reading from files

d6e7771

[pre-commit.ci] auto fixes from pre-commit.com hooks

7cd09a2

for more information, see https://pre-commit.ci

suragnair requested changes Oct 29, 2024

avantikalal and others added 3 commits October 30, 2024 17:26

update documentation

4b51a1f

use tangermeme read_meme function

6c018e3

[pre-commit.ci] auto fixes from pre-commit.com hooks

7317b33

for more information, see https://pre-commit.ci

avantikalal requested a review from suragnair October 30, 2024 17:47

suragnair approved these changes Nov 4, 2024

avantikalal merged commit a1c3be4 into main Nov 4, 2024
1 check passed

avantikalal deleted the tangermeme_04 branch November 4, 2024 22:19
