Simulations #129
src/grelu/data/dataset.py
Outdated
Args:
    seqs: DNA sequences as intervals, strings, integer encoded or one-hot encoded.
    fixed_pattern: A subsequence to insert in the center of each background sequence.
    variable_pattern: A subsequence to insert into the background sequences at
`variable_pattern` is a bit misleading, as it suggests the pattern itself is changing. Maybe `moving_pattern`?
good idea
done
src/grelu/data/dataset.py
Outdated
max_pos = self.seq_len - self.moving_pattern_len + 1
excl_start = self.fixed_pattern_start - self.moving_pattern_len + 1
excl = range(excl_start, self.fixed_pattern_end)

positions = [x for x in range(0, max_pos, self.stride) if x not in excl]
This logic is shared with Tiling Shuffle Dataset. Should it be abstracted out? Do you see it being used again?
Good idea, I will break it out
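For reference, one way the shared position logic could be pulled out is sketched below. This is only an illustration of the idea discussed in this thread: the function name `valid_positions` and its parameters are hypothetical and are not taken from the PR. It assumes the excluded region is the half-open interval `[fixed_start, fixed_end)`, mirroring the snippet under review.

```python
def valid_positions(seq_len, pattern_len, stride=1, fixed_start=None, fixed_end=None):
    """Return start positions where a pattern of length `pattern_len` fits in a
    sequence of length `seq_len` without overlapping [fixed_start, fixed_end).

    Hypothetical helper; the name and signature are assumptions for illustration.
    """
    # Last valid start is seq_len - pattern_len, so the range end is one past it.
    max_pos = seq_len - pattern_len + 1

    if fixed_start is None or fixed_end is None:
        # No fixed region to avoid.
        excl = range(0)
    else:
        # Any start in this range would place the pattern over the fixed region.
        excl = range(fixed_start - pattern_len + 1, fixed_end)

    return [x for x in range(0, max_pos, stride) if x not in excl]


# A 2-bp pattern in a 10-bp sequence, avoiding a fixed region at positions 4-5.
print(valid_positions(10, 2, stride=1, fixed_start=4, fixed_end=6))  # [0, 1, 2, 6, 7, 8]
```

Both datasets could then call such a helper with their own lengths and stride, with the tiling case simply omitting the fixed region.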
Note: these changes were made on top of #118 which should be merged first.
Expanded the simulations module along with tutorial and tests.
- `shuffle_tiles` to shuffle successive tiles along an input sequence, e.g. for enhancer discovery
- `marginalize_pattern_spacing` to insert motifs at different distances in shuffled background sequences
- `plot_position_effect` in `visualize` to plot the outputs of these functions

Obviously, there is a lot of unnecessary duplication and complexity in the dataset module. This will be cleaned up in later PRs.
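As a rough illustration of what shuffling successive tiles means, here is a minimal plain-Python sketch. It is not the implementation in this PR: the function name `shuffle_each_tile`, its parameters, and the simple per-base shuffle are all assumptions made for the example.

```python
import random


def shuffle_each_tile(seq, tile_len, stride=None, seed=0):
    """For each tile along `seq`, return (start, sequence with that tile shuffled).

    Hypothetical sketch; the real function may differ in signature and in how
    it shuffles (for example, it may preserve dinucleotide frequencies).
    """
    stride = tile_len if stride is None else stride
    rng = random.Random(seed)  # seeded so results are reproducible
    results = []
    for start in range(0, len(seq) - tile_len + 1, stride):
        tile = list(seq[start:start + tile_len])
        rng.shuffle(tile)  # shuffle only the bases inside this tile
        shuffled = seq[:start] + "".join(tile) + seq[start + tile_len:]
        results.append((start, shuffled))
    return results


for start, shuffled in shuffle_each_tile("ACGTACGTAC", tile_len=4, stride=3):
    print(start, shuffled)
```

Scoring each shuffled sequence with a model and comparing it against the unshuffled prediction would indicate which tiles matter most, which is the kind of position-wise output that a plot like `plot_position_effect` is described as showing.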