Collaborator

@avantikalal avantikalal commented Oct 16, 2024

Replaced pymemesuite with tangermeme for all motif handling.

Specifically:

  1. Removing pymemesuite:
     • removed the pymemesuite dependency from setup.cfg
     • removed pymemesuite from the Dockerfile
  2. Reading motifs into a dictionary format instead of the pymemesuite format:
     • changed read_meme_file so that it now reads motifs in the dictionary format that tangermeme expects
     • updated the test for the above function
     • removed modisco_to_meme since it is no longer necessary to have a meme file input. Instead, added a read_modisco_report function that reads modisco motifs directly into the dictionary format that tangermeme expects
     • added tests for the above function
     • renamed grelu.io.meme to grelu.io.motifs as this module now involves modisco .h5 files in addition to meme files
  3. Changing downstream functions:
     • changed motifs_to_strings to use the dictionary format as input
     • changed scan_sequences to use the tangermeme fimo function as the backend
     • improved the test for scan_sequences
     • changed the grelu.transforms.seq_transforms.MotifScore class to use the dictionary format as input
  4. Misc:
     • also removed the padding argument from grelu.interpret.motifs.trim_pwm since we no longer have a use case for it
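
To make the "dictionary format" above concrete, here is a minimal sketch of reading MEME-style text into a dictionary keyed by motif name. It is not the actual read_meme_file implementation: the helper names parse_meme_text and consensus are hypothetical, the 4 x L layout (rows A, C, G, T) is an assumption, and plain Python lists stand in for whatever array type the library uses.

```python
EXAMPLE_MEME = """MEME version 4

ALPHABET= ACGT

MOTIF motif_a
letter-probability matrix: alength= 4 w= 3
0.7 0.1 0.1 0.1
0.1 0.1 0.7 0.1
0.1 0.6 0.2 0.1

MOTIF motif_b
letter-probability matrix: alength= 4 w= 2
0.0 1.0 0.0 0.0
0.0 0.0 0.0 1.0
"""


def parse_meme_text(text):
    """Parse minimal MEME-format text into {motif_name: matrix}.

    Hypothetical helper. Each matrix is stored as 4 rows (A, C, G, T)
    by L columns, which is an assumed layout for illustration.
    """
    motifs = {}
    name, rows = None, []

    def flush():
        # Store the motif collected so far, transposing L x 4 into 4 x L.
        if name is not None and rows:
            motifs[name] = [list(col) for col in zip(*rows)]

    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            flush()
            name, rows = fields[1], []
        elif name is not None and len(fields) == 4:
            try:
                rows.append([float(x) for x in fields])
            except ValueError:
                pass  # a non-numeric four-field line is not a matrix row
    flush()
    return motifs


def consensus(matrix, alphabet="ACGT"):
    """Most probable base at each position of a 4 x L matrix (hypothetical helper)."""
    return "".join(alphabet[col.index(max(col))] for col in zip(*matrix))


motifs = parse_meme_text(EXAMPLE_MEME)
strings = {name: consensus(matrix) for name, matrix in motifs.items()}
```

Keeping motifs in a plain name-to-matrix mapping means downstream helpers only need to iterate over the dictionary, regardless of whether the motifs came from a meme file or a modisco report.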

@avantikalal
Collaborator Author

Partially addresses #55

@avantikalal avantikalal changed the title from "Updated motif processing to Tangermeme 04" to "Updated motif processing to Tangermeme 0.4" Oct 17, 2024

# Trim PPMs based on information content
start, end = trim_pwm(
cwm, trim_threshold=trim_threshold, return_indices=True
Collaborator

I think the trimming should be done on the PWM and not the CWM? CWM values are generally not normalized in any way.

Collaborator Author

I based this on the modiscolite repo where trimming seems to be done on the CWM, e.g.:
https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/report.py#L98
https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/report.py#L236

Is it incorrect?

Collaborator

Ok I had a look. Trimming on CWM is fine.

So it looks like within trim_pwm, the function recomputes the threshold as
trim_thresh = np.max(score) * trim_threshold
This is a bit unexpected to me, as the threshold should be used as is. Originally, I imagine it's for PWMs where the range is fixed (probs from 0 to 1).

My suggestion would be to compute the exact threshold outside the function call.
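
A small sketch of the behavior described in this comment, in plain Python so it runs without dependencies. It assumes the per-position score is the sum of absolute values in each column of a 4 x L matrix; that scoring choice and the helper names position_scores and relative_cutoff are assumptions for illustration, not the actual trim_pwm code. Only the line `np.max(score) * trim_threshold` comes from the discussion.

```python
def position_scores(matrix):
    """Sum of absolute values in each column of a 4 x L matrix (assumed scoring)."""
    return [sum(abs(v) for v in col) for col in zip(*matrix)]


def relative_cutoff(matrix, trim_threshold):
    """Mirror of `np.max(score) * trim_threshold`: the cutoff scales with the matrix."""
    return max(position_scores(matrix)) * trim_threshold


# A toy CWM with arbitrary, unnormalized values.
cwm = [
    [0.0, 0.5, 2.0, 0.1],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, -0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
# The same matrix on a 10x larger scale.
scaled_cwm = [[10 * v for v in row] for row in cwm]

cutoff_small = relative_cutoff(cwm, trim_threshold=0.3)
cutoff_large = relative_cutoff(scaled_cwm, trim_threshold=0.3)
```

Because the cutoff is a fraction of the maximum column score, the same trim_threshold yields a different absolute cutoff for each matrix. A caller who passes a value expecting it to be applied "as is" would get a different result.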

Collaborator Author

Actually, I think the threshold is meant to be calculated on the log-odds matrix, not on the probability matrix. Let me find a reference for that.

Collaborator Author

Here you go: https://bioconductor.org/packages/devel/bioc/vignettes/universalmotif/inst/doc/IntroductionToSequenceMotifs.pdf

Specifically, this section on page 4:
[image not reproduced]

While the context here is motif scanning, not trimming, this is the threshold that we're calculating. It's related to entropy of each position.
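
Since the referenced image is not available here, the following is only a sketch of one standard entropy-related quantity: the per-position information content of a probability matrix, in bits. It is not necessarily the exact formula shown on that page. The uniform background of 0.25 and the function name information_content are assumptions.

```python
import math


def information_content(ppm, background=0.25):
    """Per-position information content (bits) of a 4 x L probability matrix.

    Computed as sum_b p_b * log2(p_b / background), which equals
    2 - entropy for a uniform background over four bases.
    """
    return [
        sum(p * math.log2(p / background) for p in col if p > 0)
        for col in zip(*ppm)
    ]


# Column 0 is fully determined (A); column 1 is uniform.
ppm = [
    [1.0, 0.25],
    [0.0, 0.25],
    [0.0, 0.25],
    [0.0, 0.25],
]
ic = information_content(ppm)
```

A fully determined position carries the maximum of 2 bits and a uniform position carries none, which is why a threshold expressed as a fraction of the maximum score behaves predictably on matrices with a fixed range.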

Collaborator

I see. In either case, the exact threshold calculation should happen outside the trim_pwm function. Trim should just take a matrix and a threshold, and perform the trimming using that threshold.
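
The separation suggested here could look roughly like the sketch below: the trimming function applies whatever threshold it is given, and the caller decides how that threshold is derived. The names trim_matrix and column_scores are hypothetical, and scoring positions by the sum of absolute column values is an assumption, not the merged implementation.

```python
def column_scores(matrix):
    """Sum of absolute values per column of a 4 x L matrix (assumed scoring)."""
    return [sum(abs(v) for v in col) for col in zip(*matrix)]


def trim_matrix(matrix, threshold):
    """Return (start, end) spanning all columns whose score >= threshold.

    `threshold` is used exactly as passed; `end` is exclusive.
    Returns (0, 0) if no column passes.
    """
    passing = [i for i, s in enumerate(column_scores(matrix)) if s >= threshold]
    if not passing:
        return 0, 0
    return passing[0], passing[-1] + 1


cwm = [
    [0.0, 0.5, 2.0, 0.1],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, -0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]

# The caller chooses how to derive the threshold, e.g. relative to the maximum score...
relative = 0.3 * max(column_scores(cwm))
rel_start, rel_end = trim_matrix(cwm, relative)

# ...or passes an absolute value directly.
abs_start, abs_end = trim_matrix(cwm, 0.15)
```

With this split, the relative scaling becomes an explicit choice at the call site instead of hidden behavior inside the trimming function.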

@avantikalal avantikalal requested a review from suragnair October 30, 2024 17:47
@avantikalal avantikalal merged commit a1c3be4 into main Nov 4, 2024
1 check passed
@avantikalal avantikalal deleted the tangermeme_04 branch November 4, 2024 22:19