Conversation

@avantikalal
Collaborator

@avantikalal avantikalal commented Feb 3, 2025

In LabeledSeqDataset, we add padding at the end of intervals to load extra bases for data augmentation by shifting/jittering. Similarly, in VariantDataset, we generate intervals surrounding each variant. In both cases, the extended or generated intervals may extend beyond the chromosome ends, which results in an error when we try to read the sequences.

This PR introduces the following changes:

  1. Adds a function, check_chrom_ends, which raises an error if any intervals exceed the chromosome ends and prints the offending intervals.
  2. Adds a test for this function.
  3. Adds this function to the dataset classes.
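
For readers who want a concrete picture of the check described above, here is a minimal, standalone sketch. It is not the code merged in this PR: the `Interval` tuple, the `chrom_sizes` dictionary, and the function names `find_out_of_bounds` and `check_chrom_ends_sketch` are all hypothetical, and the actual signature of check_chrom_ends may differ. It assumes 0-based, end-exclusive coordinates and treats intervals on unknown chromosomes as out of bounds.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical interval record (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


def find_out_of_bounds(
    intervals: List[Interval], chrom_sizes: Dict[str, int]
) -> List[Interval]:
    """Return the intervals that fall outside their chromosome."""
    bad = []
    for iv in intervals:
        size = chrom_sizes.get(iv.chrom)
        # Assumption: an unknown chromosome is treated as out of bounds.
        if size is None or iv.start < 0 or iv.end > size:
            bad.append(iv)
    return bad


def check_chrom_ends_sketch(
    intervals: List[Interval], chrom_sizes: Dict[str, int]
) -> None:
    """Print offending intervals and raise if any exceed chromosome ends."""
    bad = find_out_of_bounds(intervals, chrom_sizes)
    if bad:
        for iv in bad:
            print(f"{iv.chrom}:{iv.start}-{iv.end}")
        raise ValueError(f"{len(bad)} interval(s) exceed chromosome ends")
```

With `chrom_sizes = {"chr1": 1000}`, an interval `Interval("chr1", 990, 1010)` produced by padding would be printed and reported, while `Interval("chr1", 0, 1000)` would pass. Running a check like this when the dataset is constructed surfaces the problem early, rather than at the point where sequences are read.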

Addresses #54

@avantikalal avantikalal changed the title from "WIP: prevent variant intervals from exceeding chromosome ends" to "Prevent padded intervals from exceeding chromosome ends" Feb 3, 2025
@avantikalal avantikalal requested a review from suragnair February 3, 2025 22:07
@avantikalal avantikalal merged commit 21227de into main Feb 20, 2025
2 checks passed
@avantikalal avantikalal deleted the filter-variants-chrom branch February 20, 2025 00:42

3 participants