Add aggregation to BigBedAdapter to group bigGenePred transcripts #4456

Merged
merged 1 commit into from
Jun 19, 2024

Conversation

cmdcolin
Collaborator

@cmdcolin cmdcolin commented Jun 18, 2024

This adds the ability to aggregate multiple bigbed entries into a single 'gene' feature based on an attribute such as geneName

BigBed (and BED in general) can only store one transcript per line and doesn't explicitly represent child->parent relationships with a gene-level feature
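
A minimal sketch of the grouping idea described above, not the code in this PR: transcripts that share a value for a chosen attribute are wrapped in one synthetic 'gene' feature spanning all of them. The `Transcript` and `Gene` shapes and the `aggregateByField` name are hypothetical and only serve the illustration.

```typescript
// Hypothetical feature shapes for illustration; not JBrowse's real types.
interface Transcript {
  id: string
  refName: string
  start: number
  end: number
  attributes: Record<string, string>
}

interface Gene {
  id: string
  type: 'gene'
  refName: string
  start: number
  end: number
  subfeatures: Transcript[]
}

// Group transcripts by the value of `field` (e.g. 'geneName').
// Transcripts without a value for `field` are returned unchanged.
function aggregateByField(
  transcripts: Transcript[],
  field: string,
): (Gene | Transcript)[] {
  const groups = new Map<string, Transcript[]>()
  const ungrouped: Transcript[] = []
  for (const t of transcripts) {
    const value = t.attributes[field]
    if (!value) {
      ungrouped.push(t)
      continue
    }
    // Key on refName too, so same-named genes on different
    // reference sequences are not merged together.
    const key = `${t.refName}:${value}`
    const group = groups.get(key)
    if (group) {
      group.push(t)
    } else {
      groups.set(key, [t])
    }
  }
  const genes: Gene[] = []
  for (const [key, members] of groups) {
    genes.push({
      id: `gene-${key}`,
      type: 'gene',
      refName: members[0].refName,
      // The gene spans the union of its transcripts.
      start: Math.min(...members.map(m => m.start)),
      end: Math.max(...members.map(m => m.end)),
      subfeatures: members,
    })
  }
  return [...genes, ...ungrouped]
}
```

In this sketch, a transcript with no value for the aggregation attribute is passed through as-is rather than dropped; whether the adapter behaves the same way is an assumption.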

result on volvox

before
image

this PR
image

motivation: better UCSC2jbrowse mega-instance functionality

@cmdcolin
Collaborator Author

uses a redispatching approach similar to gff3tabix to handle cases where you retrieve a child and need to re-fetch to get all children within its bounds. it's a heuristic that could be broken but hopefully holds up for general usage
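
A rough sketch of what a single-pass redispatch can look like, under the assumption that the data source is queried by region: the first fetch may return a transcript that extends past the query, so the query is widened to cover everything returned and fetched once more. The `Row`, `Region`, `Fetch`, and `fetchWithRedispatch` names are hypothetical, and this is not the adapter's actual code.

```typescript
// Hypothetical shapes for illustration only.
interface Region {
  start: number
  end: number
}

interface Row {
  name: string
  geneName: string
  start: number
  end: number
}

// Any function that returns the rows overlapping a region.
type Fetch = (region: Region) => Row[]

function fetchWithRedispatch(fetch: Fetch, query: Region): Row[] {
  const first = fetch(query)

  // Widen the query to cover every row the first pass returned.
  let start = query.start
  let end = query.end
  for (const row of first) {
    start = Math.min(start, row.start)
    end = Math.max(end, row.end)
  }

  // Nothing extended past the query, so the first pass is enough.
  if (start === query.start && end === query.end) {
    return first
  }

  // Single re-fetch over the widened region. This is a heuristic:
  // a sibling that only overlaps other siblings found in this
  // second pass can still be missed.
  return fetch({ start, end })
}
```

The sketch re-fetches only once, which keeps the cost bounded but is exactly why it remains a heuristic rather than a guarantee.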

@cmdcolin
Collaborator Author

on the UCSC Gencode bigGenePred file, i think it has much nicer default behavior

this branch
image

main branch
image

note that a similar thing could potentially be done for bedTabix, but the UCSC bedTabix files that i exported from the sql tables don't have any geneName-type attribute to aggregate by, afaik

@cmdcolin cmdcolin changed the title Add aggregation function to bigbed Add aggregation configuration to BigBedAdapter to improve bigGenePred type annotations Jun 18, 2024
@cmdcolin cmdcolin changed the title Add aggregation configuration to BigBedAdapter to improve bigGenePred type annotations Add aggregation configuration to BigBedAdapter to improve bigGenePred transcript grouping Jun 18, 2024
@cmdcolin cmdcolin changed the title Add aggregation configuration to BigBedAdapter to improve bigGenePred transcript grouping Add aggregation to BigBedAdapter to improve bigGenePred transcript grouping Jun 19, 2024
@cmdcolin cmdcolin changed the title Add aggregation to BigBedAdapter to improve bigGenePred transcript grouping Add aggregation to BigBedAdapter to group bigGenePred transcripts Jun 19, 2024
@cmdcolin cmdcolin merged commit c470c16 into main Jun 19, 2024
9 of 10 checks passed
@cmdcolin cmdcolin deleted the bigbed_aggre branch June 19, 2024 03:01
@cmdcolin cmdcolin added the enhancement New feature or request label Jun 19, 2024
1 participant