SIG: Annotation harmonization (BioC 2019 BoF proposal) #22
There seems to be sufficient and widespread interest for this:
https://twitter.com/KMS_Meltzy/status/1118613260138369024 (note epic Twitter thread)
As it happens, this also showcases some of the great strengths of Bioconductor, which has been dealing with this chaos for the past 20 (25?) years or so. Perhaps it is also worth discussing MeSH terms and so forth. There is always an annotation session, but this would be different -- a focus on harmonizing existing literature and data using best practices and the BioC infrastructure.
It seems like @lwaldron ought to have veto power over this, having done more than most to present, expose, and improve the underlying BioC and HGNC[helper] infrastructures, but there are many other (perhaps lesser known) resources that arose in the discussion. Enough, perhaps, to justify a BoF this year and a full-length workshop next year?
For what it's worth, I can provide some historical background on mitochondrial genome annotation; the lack of much visible infrastructure in BioC for dealing with mitochondrial references and variants motivated me to write the MTseeker package. Compared to Levi's efforts and those of others, mine is quite germinal; yet it is already sufficient for my group (and perhaps others) to begin leveraging the vast publicly available MT resources of SRA in the BioC sphere.
This appears to be a recurrent problem among people who get serious about systematic reviews of genome information, clinical trials, immunology, and so forth, so the topic should be popular enough.
(Edited to comply with format requirements)
Intro: I am an assistant professor of bioinformatics at the Van Andel Research Institute. I have been a Bioconductor user for about 15 years and a Bioconductor developer for approximately 10 years.
Desired outputs for the BoF: a roadmap to BioC "annotation harmonization" or, less poetically, decrufting tools, suitable to form the backbone of a workshop. Code examples and/or workflows could include literature search and (attempted) disambiguation of gene names, symbols, protein names and symbols, compounds, disease names or ontological classifications, and perhaps MeSH terms (as a lead-in to controlled vocabularies and ontologies). Ideally the outputs would be use cases suitable for a workflow paper in a venue such as F1000R.
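As a rough illustration of what the symbol disambiguation step might look like, here is a minimal sketch in Python (a BioC workflow would more naturally do this in R, for example with HGNChelper or an org.*.eg.db annotation package). The alias table, `build_index`, and `harmonize` are all hypothetical names made up for this example, and the handful of aliases listed should be checked against the current HGNC release rather than taken as authoritative.

```python
from collections import defaultdict

# Hypothetical, hand-written alias table for illustration only.
# Keys are approved symbols; values are aliases or previous symbols.
# A real workflow would load the current HGNC table instead.
ALIASES = {
    "TP53": ["P53", "LFS1"],
    "ERBB2": ["HER2", "NEU"],
    "BRCA1": ["RNF53"],
    "KMT2D": ["MLL2"],  # MLL2 has been used for both KMT2D and KMT2B
    "KMT2B": ["MLL2"],
}


def build_index(aliases):
    """Map every upper-cased name (approved or alias) to its approved symbols."""
    index = defaultdict(set)
    for approved, alternatives in aliases.items():
        index[approved.upper()].add(approved)
        for alt in alternatives:
            index[alt.upper()].add(approved)
    return index


def harmonize(symbols, index):
    """Classify each input symbol as unique, ambiguous, or unmapped."""
    result = {}
    for symbol in symbols:
        hits = sorted(index.get(symbol.strip().upper(), ()))
        if len(hits) == 1:
            status = "unique"
        elif len(hits) > 1:
            status = "ambiguous"
        else:
            status = "unmapped"
        result[symbol] = (status, hits)
    return result


if __name__ == "__main__":
    index = build_index(ALIASES)
    for symbol, (status, hits) in harmonize(["p53", "HER2", "MLL2", "FOO1"], index).items():
        print(f"{symbol}\t{status}\t{','.join(hits)}")
```

The point of returning a status rather than silently picking the first hit is that ambiguous and unmapped names are exactly the cases a harmonization workflow needs to surface for manual review.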
I would definitely be interested in this SIG as I struggle with this issue all the time, and not just with human and mouse. What complicates matters even more is that there is not one global definitive gene set for even human (NCBI vs. Ensembl/Gencode/Havana - just standardize already!!) and mapping between the two is a nightmare.
Indeed, Ensembl and HGNC both chimed in on Twitter regarding the MANE project, which aims to address this. Annotation, variant calling, and literature mining are vastly more difficult than many people realize. I am hopeful that enough “once bitten” folks attend BioC to ensure attendance :-) Thank you for supporting this BoF proposal!…
On May 9, 2019, at 9:30 AM, Jenny Drnevich wrote: I would definitely be interested in this SIG as I struggle with this issue all the time, and not just with human and mouse. What complicates matters even more is that there is not one global definitive gene set for even human (NCBI vs. Ensembl/Gencode/Havana - just standardize already!!) and mapping between the two is a nightmare.