@avantikalal avantikalal commented Mar 14, 2025

Note: these changes were made on top of #118 which should be merged first.

Expanded the simulations module along with tutorial and tests.

  1. Added a function shuffle_tiles to shuffle successive tiles along an input sequence, e.g. for enhancer discovery
  2. Added a function marginalize_pattern_spacing to insert motifs at different distances in shuffled background sequences
  3. Added a function plot_position_effect in `visualize` to plot the outputs of these functions
  4. Expanded the tutorial to show applications of these functions.
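
For readers unfamiliar with tile shuffling, the idea in item 1 can be sketched as follows. This is a minimal standalone illustration, not the implementation added in this PR: the name `shuffle_tiles_sketch`, its parameters, and the use of a simple per-base shuffle are all assumptions made for the example.

```python
import random


def shuffle_tiles_sketch(seq, tile_len, stride=None, seed=0):
    """Hypothetical helper: for each successive tile along `seq`, return
    (tile_start, variant), where `variant` is `seq` with only that tile's
    bases shuffled and everything outside the tile left untouched."""
    stride = tile_len if stride is None else stride
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    variants = []
    for start in range(0, len(seq) - tile_len + 1, stride):
        tile = list(seq[start:start + tile_len])
        rng.shuffle(tile)  # simple per-base shuffle (an assumption)
        variant = seq[:start] + "".join(tile) + seq[start + tile_len:]
        variants.append((start, variant))
    return variants


variants = shuffle_tiles_sketch("ACGTACGTACGT", tile_len=4)
```

Scoring each variant with a model and comparing against the unshuffled sequence would then indicate which tiles matter most, which is the enhancer-discovery use case mentioned above.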

Obviously, there is a lot of unnecessary duplication and complexity in the dataset module. This will be cleaned up in later PRs.

Args:
seqs: DNA sequences as intervals, strings, integer encoded or one-hot encoded.
fixed_pattern: A subsequence to insert in the center of each background sequence.
variable_pattern: A subsequence to insert into the background sequences at
Collaborator

variable_pattern is a bit misleading, as it suggests the pattern itself is changing. Maybe moving_pattern?

Collaborator Author

good idea

Collaborator Author

done

Comment on lines 1102 to 1106
max_pos = self.seq_len - self.moving_pattern_len + 1
excl_start = self.fixed_pattern_start - self.moving_pattern_len + 1
excl = range(excl_start, self.fixed_pattern_end)

positions = [x for x in range(0, max_pos, self.stride) if x not in excl]
Collaborator

This logic is shared with Tiling Shuffle Dataset. Should it be abstracted out? Do you see it being used again?

Collaborator Author

Good idea, I will break it out
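
One way the shared position logic could be factored out is sketched below. The function name `allowed_positions` and its signature are hypothetical; the body simply mirrors the quoted lines, with the instance attributes passed in as arguments.

```python
def allowed_positions(seq_len, pattern_len, fixed_start, fixed_end, stride=1):
    """Hypothetical helper: start positions at which a pattern of length
    `pattern_len` fits inside the sequence without overlapping the fixed
    region [fixed_start, fixed_end)."""
    # Last valid start is seq_len - pattern_len, so the range stops one past it.
    max_pos = seq_len - pattern_len + 1
    # Any start in [fixed_start - pattern_len + 1, fixed_end) would overlap
    # the fixed region by at least one base.
    excl = range(fixed_start - pattern_len + 1, fixed_end)
    return [x for x in range(0, max_pos, stride) if x not in excl]


# Sequence of length 10, pattern of length 2, fixed region at [4, 6)
print(allowed_positions(10, 2, 4, 6))  # [0, 1, 2, 6, 7, 8]
```

Both dataset classes could then call this with their own attributes instead of repeating the range arithmetic.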

@avantikalal avantikalal merged commit 143be0f into main Mar 27, 2025
2 checks passed
@avantikalal avantikalal deleted the simulations-3 branch March 27, 2025 21:54