Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate variant browser to postgres #528

Merged
merged 170 commits into from
Apr 2, 2019
Merged

Conversation

talavis
Copy link
Contributor

@talavis talavis commented Mar 22, 2019

Describe the pull request:

  • Bug fix
  • Functional change
  • New feature
  • Code cleanup
  • Build system change
  • Documentation change
  • Language translation

Pull request long description:

Migration of the variant browser from mongo to postgres. Also includes a lot of bug fixes in both backend and frontend. A lot of browser tests are also added.

Changes made:

A lot.

…. Most hacks can be removed once database is fixed
Copy link
Member

@viklund viklund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone over most of the python code. None of the JS so far.

Apart from the specific comments I have left I have a few general ones.

  1. The beacon style dataset reference parsing. Wouldn't it be better to put that in a function and call that in all handlers that get the dataset name. That way we are both future proof and it's easier to find all the places to fix the beacon-version parsing bugs.

  2. There's a lot of [x for x in db.query.dicts()], esp. in the lookups module, is this really necessary? In one or two cases maybe, but I feel that the conversion from peewee objects into dicts should be made as late as possible. Maybe this requires a larger refactor than what we should do at the moment, but I would like to make sure.

  3. This is even more general. Should the exception handling happen close to where the exceptions happen or should the lookups just let them pass and then we deal with them in the handlers. The latter approach would (possibly) reduce the amount of exception handling. I'm not super convinced either way, but as it is now the exceptions are handled in both the lookup and the handler (although, there exceptions are the case when there's a None return from the underlying lookup function).

For example GetGene in browser_handlers.py. Line 89 checks wheter gene in None, which only happens in case get_gene in lookups.py (line 196) gets an exception. To me this looks like we are doing double work.

backend/beacon.py Outdated Show resolved Hide resolved
backend/db.py Outdated Show resolved Hide resolved
backend/handlers.py Outdated Show resolved Hide resolved
backend/handlers.py Show resolved Hide resolved
backend/handlers.py Outdated Show resolved Hide resolved
backend/modules/browser/browser_handlers.py Outdated Show resolved Hide resolved
(db.Coverage.pos <= end_pos) &
(db.Coverage.chrom == chrom) &
(db.Coverage.dataset_version == dataset_version.id))
.dicts())]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you just return the db.Coverage...where() witihout .dicts() and without a list-comprehension?

Copy link
Contributor Author

@talavis talavis Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list comprehension is needed; otherwise it only returns a SelectQuery object that needs to be executed on the database.

return db.Gene.select().where((db.Gene.gene_id == gene_id) &
(db.Gene.reference_set == ref_set)).dicts().get()
except db.Gene.DoesNotExist:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be in just one try: with two except clauses instead, would be a bit clearer to me at least.

(db.Gene.start <= stop_pos) &
(db.Gene.stop >= start_pos) &
(db.Gene.chrom == chrom)).dicts()
return [gene for gene in gene_query]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'm not really sure you need to do the dicts() and list comprehension here.

if variant['rsid']:
if not str(variant['rsid']).startswith('rs'):
variant['rsid'] = 'rs{}'.format(variant['rsid'])
return variant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this method should instead be:

variant = get_raw_variant(dataset, pos, chrom, ref, alt, ds_version)
if variant and 'rsid' in variant and not str(variant['rsid']).startswith('rs'):
    variant['rsid'] = f"{rs{variant['rsid']}"
return variant

(the main point was the updated ifs, not the f-string)

Copy link
Member

@viklund viklund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the js as well.

frontend/src/js/controller.browserController.js Outdated Show resolved Hide resolved
frontend/templates/ng-templates/browser-region.html Outdated Show resolved Hide resolved
<p>Your region is too large. Please submit a region of at most 100 kb.</p>
<p>If you require larger regions, please see our <a href="/dataset/{{ ctrl.dataset.shortName }}/download">Download</a> page for raw data.</p>
</div>
<div ng-if='ctrl.region.statusText=="The region is too large"'>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about comparing with a string in this way here. Isn't there a statusCode for too large region?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using 400 (bad request) for too large region. 400 is also used for problems with parsing the region (which in my opinion also fits bad request). Thus the need to check the actual text. But maybe I should change it to ===?

@talavis
Copy link
Contributor Author

talavis commented Mar 29, 2019

  1. Fine by me. Though there are ~20 functions using dataset.

  2. Apparently the .dicts() call performs the select. I got used that I need I either had to use it as an iterator (thus the for part) or perform a second call with execute. As for using dicts, that's only done because the exac browser was doing it. I've been considering to change to peewee objects, but chose to mimic exac for this initial migration.

  3. I've been trying to keep the exception handling close to where it occurs, but I'm open to do it mainly in the handlers instead.

Copy link
Member

@viklund viklund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good. Just a small comment about the parse_beacon_dataset.

backend/db.py Outdated Show resolved Hide resolved
@talavis talavis force-pushed the feature/browser-postgres branch 2 times, most recently from 9a86973 to 5c17392 Compare April 1, 2019 20:23
@viklund viklund merged commit 8874f2b into develop Apr 2, 2019
@viklund viklund deleted the feature/browser-postgres branch April 2, 2019 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants