Annotation Sprint 2 Task List #273

Closed
3 of 5 tasks
cristinaetrv opened this issue Sep 26, 2023 · 3 comments

Comments

@cristinaetrv
Collaborator

cristinaetrv commented Sep 26, 2023

Background:

Improving Annotation for Maintainability

Completion Criteria:

hg19 and hg38 databases that have exact matches for dbSNP and ClinVar; add back 'intergenic'

Implementation Summary

  • Switch from the dbSNP tab-separated file to the VCF file for exact matching (the tab-separated file gives no guarantee that the allelic representation matches)
  • Switch from the ClinVar tab-separated file to the VCF file for exact matching (the tab-separated file gives no guarantee that the allelic representation matches)
  • Ensure global regexes are applied to values being inserted into the database
    - Every track will have to ensure that it applies the replacement method, which will be defined in class Tracks::build
    - Document this
  • Replace unit separator with forward slash
  • Write replacement unit tests for overlapping values
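
To illustrate why the VCF files help with exact matching, here is a minimal Python sketch, not the Bystro implementation. It assumes the standard VCF column order (CHROM, POS, ID, REF, ALT) and indexes every ALT allele under a `(chrom, pos, ref, alt)` key, so a lookup succeeds only when the allelic representation is identical. The function names `iter_vcf_alleles`, `build_index`, and `exact_match`, as well as the sample records and IDs, are hypothetical and exist only for this example.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (chrom, pos, ref, alt) identifies one allele exactly as written in the VCF.
AlleleKey = Tuple[str, int, str, str]


def iter_vcf_alleles(lines: Iterable[str]) -> Iterator[Tuple[AlleleKey, str]]:
    """Yield one (key, record ID) pair per ALT allele in VCF-formatted lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, rec_id, ref, alts = fields[:5]
        # Multiallelic sites list ALT alleles comma-separated; index each one.
        for alt in alts.split(","):
            if alt in (".", "*"):
                continue  # no alternate allele to match against
            yield (chrom, int(pos), ref, alt), rec_id


def build_index(lines: Iterable[str]) -> Dict[AlleleKey, str]:
    """Build an in-memory allele index (a later duplicate key overwrites an earlier one)."""
    return dict(iter_vcf_alleles(lines))


def exact_match(index: Dict[AlleleKey, str], chrom: str, pos: int,
                ref: str, alt: str) -> Optional[str]:
    """Return the record ID only if chrom, pos, ref, and alt all match exactly."""
    return index.get((chrom, pos, ref, alt))


# Made-up records for illustration only; these are not real dbSNP or ClinVar entries.
SAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\trsExampleA\tT\tC,G\t.\t.\t.",
    "chr1\t200\trsExampleB\tTA\tT\t.\t.\t.",
]

index = build_index(SAMPLE_VCF)
print(exact_match(index, "chr1", 100, "T", "G"))  # second ALT of a multiallelic site
print(exact_match(index, "chr1", 100, "T", "A"))  # same position, different allele
```

A position-only lookup would report a hit for both queries at `chr1:100`; keying on REF and ALT as well is what makes the match exact.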
@cristinaetrv cristinaetrv added this to the Sprint 2 milestone Sep 26, 2023
@cristinaetrv cristinaetrv added .task list A checklist of smaller tasks annotation labels Sep 26, 2023
@akotlar
Collaborator

akotlar commented Oct 19, 2023

See #304 and #300 for the dbSNP 2-related PRs.

I've tested building the resulting transformed file (100k lines), and that works well. For instance, querying chr1:10114 shows the expected data (see the attached screenshots).

Expected features (these will be displayed in order, from index 0 to index 10 in the next image):
Screenshot 2023-10-19 at 1 13 27 AM

The result in Bystro
Screenshot 2023-10-19 at 1 12 58 AM

The expected allele frequencies, per population
Screenshot 2023-10-19 at 1 14 56 AM

@akotlar
Collaborator

akotlar commented Oct 19, 2023

"Write replacement unit tests for overlapping values" is already done, see:

"Write replacement unit tests for overlapping values" and "Ensure global regexes applied" is done, see:

@cristinaetrv
Collaborator Author

Remaining work moved to Sprint 3
