Refactor genomes.py duplication #321

BurkovBA · 2018-02-06T12:04:47Z

Currently, genome handling logic is scattered across the codebase, and we rely on workarounds to do format conversions. This chaos needs to go.

I suggest the following roadmap:

I'll look into the genome-related code in:
- database
- Python and Django models and serializer methods
- data import pipelines
- urls
- Genoverse genome browser
- text search and Lucene index
- user-readable representations on the website (backend- and frontend-generated)
- hyperlink generation for external resources (E!, UCSC, ...)

I'll create a GitHub issue with hyperlinks so that Anton and Blake can quickly recap.

Anton and Blake, using the hyperlinks I provide, refresh their memory of the whole problem and come up with their visions of:
- how this should be done
- how to get from where we are to where we need to be ASAP

We hold a short meeting and agree on which formats we use for genome names in each part of our site. I write meeting notes that will serve as a documentation prototype.
Using the meeting notes, I document the formats used to store data and the pipelines that transfer it. I make this documentation available and we keep it up to date.
Following the documentation, we create a single data flow with well-defined interfaces and adapter functions for conversions between formats. This pipeline is used:

- by data import pipelines to transport data from external sources to the database and Python code
- by backend code to retrieve data from the DB into Python/Django models
- by various frontend modules to request genomes from the backend
- by various frontend modules to display data

We rewrite our code to use this pipeline and remove any duplicated logic and ad-hoc code.
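To make the idea of adapter functions concrete, here is a minimal sketch of what a single conversion layer could look like. The `Genome` class, its fields and the three formats (canonical, URL and display) are assumptions chosen for illustration; the actual formats are what the meeting is supposed to settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    """Canonical in-memory form (hypothetical; field names are assumptions)."""
    scientific_name: str  # e.g. "Homo sapiens"
    assembly_id: str      # e.g. "GRCh38"


def to_url_name(genome: Genome) -> str:
    """Canonical -> URL form: lowercase, spaces replaced with underscores."""
    return genome.scientific_name.strip().lower().replace(" ", "_")


def from_url_name(url_name: str, assembly_id: str) -> Genome:
    """URL form -> canonical: underscores become spaces, first letter is capitalized."""
    words = url_name.strip().replace("_", " ")
    return Genome(scientific_name=words[:1].upper() + words[1:],
                  assembly_id=assembly_id)


def to_display_name(genome: Genome) -> str:
    """Canonical -> user-readable form shown on the website."""
    return f"{genome.scientific_name} ({genome.assembly_id})"


human = Genome("Homo sapiens", "GRCh38")
url_name = to_url_name(human)
display_name = to_display_name(human)
```

Every other module would call these functions instead of doing its own string manipulation. Note that the URL form in this sketch loses capitalization inside the name, so a real implementation would more likely look the canonical record up than reconstruct it from the URL.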

We can download all the available genomes from the E! public MySQL database into our own database table.

Then we can get rid of config/genomes.py and similar code on the frontend, and expose genomes through a REST API endpoint.
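As a rough illustration of what such an endpoint might return, the sketch below builds a JSON response body from a list of genome records using only the standard library. The field names (`assembly_id`, `common_name`, `subdomain`, `example_chromosome`, and so on) and the sample values are assumptions made up for the example, not the actual schema, and a real endpoint would sit behind the web framework's serializers.

```python
import json


def genomes_payload(assemblies):
    """Serialize genome records into a JSON response body.

    Records without an example location are skipped; this filtering is an
    assumption about what the frontend needs, not a confirmed requirement.
    """
    visible = [a for a in assemblies if a.get("example_chromosome")]
    return json.dumps({"count": len(visible), "results": visible}, sort_keys=True)


# Placeholder records; all values here are invented for illustration.
assemblies = [
    {"assembly_id": "GRCh38", "common_name": "human",
     "subdomain": "www.ensembl.org",
     "example_chromosome": "X", "example_start": 1, "example_end": 1000},
    {"assembly_id": "demo_assembly", "common_name": "demo",
     "subdomain": "www.ensembl.org",
     "example_chromosome": None, "example_start": None, "example_end": None},
]

body = genomes_payload(assemblies)
```

With a payload like this, frontend modules could request the genome list once instead of keeping their own hard-coded copy.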

This script is an example of how to retrieve genome information from the E! public MySQL database:
https://github.com/RNAcentral/rnacentral-webcode/blob/master/rnacentral/portal/management/commands/update_ensembl_genome_mapping.py
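For readers who do not want to open the script, here is a simplified sketch of the general approach, not a copy of it. It assumes each Ensembl core database has a `meta` table with `meta_key` and `meta_value` columns and that the listed keys exist; both should be checked against the linked script and the Ensembl schema. The function accepts any DB-API 2.0 connection, and an in-memory SQLite database stands in for the public MySQL server so the example runs offline.

```python
import sqlite3

# Assumed meta_key values; confirm them against the real schema before use.
WANTED_KEYS = frozenset({
    "assembly.default",
    "species.production_name",
    "species.taxonomy_id",
})


def read_genome_meta(connection, wanted_keys=WANTED_KEYS):
    """Return {meta_key: meta_value} for the wanted keys of one database.

    Works with any DB-API 2.0 connection. If a key occurs on several rows,
    the last row fetched wins.
    """
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT meta_key, meta_value FROM meta")
        return {key: value for key, value in cursor.fetchall()
                if key in wanted_keys}
    finally:
        cursor.close()


def make_demo_connection():
    """Build an in-memory SQLite stand-in for one Ensembl core database."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE meta (meta_key TEXT, meta_value TEXT)")
    connection.executemany(
        "INSERT INTO meta (meta_key, meta_value) VALUES (?, ?)",
        [
            ("assembly.default", "GRCh38"),
            ("species.production_name", "homo_sapiens"),
            ("species.taxonomy_id", "9606"),
            ("schema_version", "91"),
        ],
    )
    return connection


genome_meta = read_genome_meta(make_demo_connection())
```

Against the real server, the same function would be called once per species database with a MySQL connection, and the resulting dictionaries written to our own table.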

We also have multiple genome-related functions, such as Xref.get_ucsc_db_id(), Xref.get_ensembl_division(), Accession.get_ensembl_species_url().


blakesweeney · 2018-02-06T12:34:07Z

How often do we need to run this? If it is something we should run when we import E! data, I would prefer to add it to the pipeline as part of the Ensembl update. pgloader supports pulling from a MySQL database into a Postgres one: http://pgloader.readthedocs.io/en/latest/ref/mysql.html

AntonPetrov · 2018-02-06T12:38:38Z

This would need to run every time Ensembl is updated so it's a good idea to merge this script with the Ensembl import pipeline.

Not sure if pgloader can help here because we need to pull data from several tables across multiple Ensembl databases.

blakesweeney · 2018-02-06T12:46:57Z

Ok, I can work on adding it as part of the import pipeline later then. I'll aim for after I update Ensembl data for this release.

BurkovBA added the enhancement, data problem, and technical debt labels Feb 6, 2018

BurkovBA self-assigned this Feb 6, 2018

blakesweeney self-assigned this Feb 6, 2018

BurkovBA added a commit that referenced this issue Apr 4, 2018

#321 Implemented EnsemblAssembly API endpoint.

6a31fbd

BurkovBA added a commit that referenced this issue Apr 4, 2018

#321 Stub for unit-tests of EnsemblAssembly endpoint.

a8f9dcb

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Fixed EnsemblAssemblyTestCase.test_detail().

e7e480c

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Minor fixes in unit-tests.

0f57e12

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Added example location fields to EnsemblAccession and corresponding migration.

7741fcf

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Separate update_ensembl_assembly management command.

00a8dd0

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Copied example locations to update_ensembl_assembly command.

db69184

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Fixed handling of assemblies without example location.

95314c7

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Improved logging in update_ensembl_assembly.

ef2ea6a

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Added connection to ensembl genomes public MySQL.

2d43a4a

BurkovBA added a commit that referenced this issue Apr 6, 2018

#321 Tiny fix in update_ensembl_genome_mapping.

2fb0300

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Removed genomes from config.genomes.py.

0fc8df3

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Deleted config.genomes entirely, moved url2db, db2url to accession.py.

f97ebbd

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Removed redundant EnsemblAssembly endpoint, fixed test case.

d017e39

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Download genomes from server side in genome-browser.controller.js.

970cc59

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Refactoring frontend appropriately.

7f462f5

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Simplifying genomes handling on frontend.

2389193

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Bugfix in E! subdomain handling.

79d2d57

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Filtered out genomes without example location.

6988d43

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Added subdomain column to EnsemblAssembly.

7dee506

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Got rid of getEnsemblSubdomainByDivision().

103b54b

BurkovBA added a commit that referenced this issue Apr 8, 2018

#321 Removed debugger.

33a5003

BurkovBA added a commit that referenced this issue Apr 9, 2018

#321 Fix in example locations.

1205e99

BurkovBA added a commit that referenced this issue Apr 9, 2018

#321 Fixed genoverse display in sequence page.

471f0b2

BurkovBA added a commit that referenced this issue Apr 9, 2018

#321 Display common_name in genome browser ng-options.

6ac2ec6

BurkovBA added a commit that referenced this issue Apr 9, 2018

#321 Updated angularjs-genoverse.

068339d

BurkovBA added a commit that referenced this issue Apr 9, 2018

#321 Leftover migration for EnsemblAssembly.subdomain.

e41472d

BurkovBA mentioned this issue Apr 11, 2018

321 ensembl assembly refactor #374

Merged

BurkovBA added a commit that referenced this issue Apr 12, 2018

#321 Manually fixed conflicts, overlooked by automatic merge.

658ac2d

BurkovBA added a commit that referenced this issue Apr 12, 2018

#321 Don't color selectedLocation, if it doesn't exist.

0f9b006

BurkovBA added a commit that referenced this issue Apr 12, 2018

#321 Fix in fabfile to print deployed branch.

1a2f87c

BurkovBA added a commit that referenced this issue Apr 12, 2018

#321 Removed Handlebars from README.md just to check Github hook delivery.

49c3e82

BurkovBA added a commit that referenced this issue Apr 12, 2018

#321 Removed datatables to check Jenkins hook setup.

9bc223b

BurkovBA added a commit that referenced this issue May 11, 2018

#321 Made if-else clauses in $doCheck mutually exclusive.

ecbe8e9

BurkovBA added a commit that referenced this issue May 11, 2018

#321 Unified style of genome names display.

647e2af

BurkovBA closed this as completed in #374 May 12, 2018

BurkovBA added a commit that referenced this issue May 14, 2018

#321 genoverseConfig: removed '100000: false', hiding the tracks.

3e6375d
