Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query OLS to update term status #77

Merged
merged 41 commits into from
Aug 27, 2020
Merged

Query OLS to update term status #77

merged 41 commits into from
Aug 27, 2020

Conversation

joj0s
Copy link
Collaborator

@joj0s joj0s commented Aug 3, 2020

This PR adds the ability to query OLS for status updates, via asynchronous background tasks. Closes #31.

@joj0s joj0s added the Scope: Backend Backend logic & data processing scripts label Aug 3, 2020
@tskir tskir temporarily deployed to trait-curation-pr-77 August 3, 2020 08:51 Inactive
@joj0s
Copy link
Collaborator Author

joj0s commented Aug 3, 2020

@tskir testing this functionality is tricky, because it turns out that django-computedfields doesn't play well with the Django admin site. Changing a status value directly from the admin site causes an error in the term. So if you want I could write a script that changes some status values of terms via the Django shell, so that you can test if they are updated correctly when the OLS queries are made.

@joj0s joj0s requested a review from tskir August 3, 2020 09:12
Copy link
Member

@tskir tskir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good & succinct for the most part! Just left some comments about reorganising the logic flow of the status updates

traitcuration/traits/datasources/ols.py Outdated Show resolved Hide resolved
traitcuration/traits/tasks.py Show resolved Hide resolved
term.status = get_term_status(term_info['is_obsolete'], term_ontology_id)
else:
term.status = Status.DELETED
term.save()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a question of personal preference, but I feel it would be better to have just one function here which would, in one cycle, go through all traits and make the necessary queries/status updates. That would make the logic more readable.

Also, I see that this currently the “current” terms are only queried in their native ontology, and only “awaiting import” are queried in EFO. But actually, we want to query all terms both in their native ontology and in EFO (of course, if EFO is the native ontology, then we only want to make one query), and then make the appropriate decisions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a question of personal preference, but I feel it would be better to have just one function here which would, in one cycle, go through all traits and make the necessary queries/status updates. That would make the logic more readable.

I am not sure I understood that, you mean we should merge the check_awaiting_import_terms and the check_term_status functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. Into one function which would go through all traits (regardless of their current status), and then for each trait do the necessary logic for queries and status updates. Something like (pseudocode):

def ols_update():
    for trait in all_traits:
        query efo
        query native ontology
        new_status = decide_status(current_status, efo_status, native_ontology_status)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that example, the decide_status doesn't have to be a separate function (and probably even shouldn't be). This could all be a monolithic function which both does the queries and then immediately decides the new status of the traits based on a bunch of branching if statements. The purpose of this is to translate the logic of choosing the new status as directly into code as possible, so that it's easy to verify its correctness

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this is still not clear, I'll be happy to elaborate more

Copy link
Collaborator Author

@joj0s joj0s Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this approach makes much more sense. Although I think it would be best then to separate the status calculation, since there are going to be a bunch of if statements, and it is going to make the code much more readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I think it would be best then to separate the status calculation, since there are going to be a bunch of if statements, and it is going to make the code much more readable.

Well yes, actually a good point. I agree.

traitcuration/traits/views.py Show resolved Hide resolved
@jerch
Copy link

jerch commented Aug 4, 2020

testing this functionality is tricky, because it turns out that django-computedfields doesn't play well with the Django admin site.

@joj0s May I ask whats not working for you with django-computedfields? Also feel free to write issues, if you find something not working or feels like going into the wrong direction with computed fields. I still need to collect evidence where and to what extend it can be helpful at all.

@joj0s
Copy link
Collaborator Author

joj0s commented Aug 4, 2020

Hi @jerch, first of all thanks for coming here and providing assistance.

I have been investigating an issue since yesterday, where a TextChoices CharField which is being computed using your package (the computation works perfectly fine), suddenly breaks when it is being changed manually through Django Admin. I am still looking for the source of it, so I might have been quick to blame it on the package itself.

Once I identify the exact problem (if it is indeed something that has to do with the package), I can open an issue in the package's page to investigate it further, and of course provide any other feedback needed to improve it.

@jerch
Copy link

jerch commented Aug 5, 2020

... suddenly breaks when it is being changed manually through Django Admin

Yes, thats indeed not supported, computed fields are non editable by default (not even sure, how you got that changed) and get recalculated from the method code on .save(). Setting values manually to CFs has no meaning, those values would be lost.

May I ask whats your exact usecase here? For me it was never an issue to be able to set a CF value manually, furthermore it would work against the idea of computed fields, as it skips the method logic (prolly leading to desync issues).

@joj0s
Copy link
Collaborator Author

joj0s commented Aug 5, 2020

@jerch Bad wording on my part, I didn't try to manually change the computed field itself, rather the field that the computed field depended on.

I did manage to fix it though, and the mentioned problem was unrelated to the django-computedfields in the end, so the package is continuing to serve us well.

@jerch
Copy link

jerch commented Aug 5, 2020

@joj0s Ah ok, np.

I had a quick look at your defined CFs in the code, I see one issue with this one

@computed(models.BooleanField(), depends=[
['review_set', []],
])
def is_reviewed(self):
return self.review_set.count() >= 2

The concrete field listing on the right side is empty, which basically makes this rule a NOOP (no explicit dependency is added). This might lead to desync issues, if elements of review_set get moved between Mapping instances. To avoid that, you should list the fk field pointing back on the right side as concrete source field, here ['review_set', ['mapping_id']].

@joj0s
Copy link
Collaborator Author

joj0s commented Aug 5, 2020

@jerch Thank you very much for this suggestion, I will add that!

@joj0s joj0s temporarily deployed to trait-curation-pr-77 August 5, 2020 13:15 Inactive
@joj0s joj0s temporarily deployed to trait-curation-pr-77 August 5, 2020 16:11 Inactive
@joj0s joj0s temporarily deployed to trait-curation-pr-77 August 7, 2020 12:42 Inactive
@joj0s joj0s requested a review from tskir August 7, 2020 15:11
@joj0s joj0s temporarily deployed to trait-curation-pr-77 August 21, 2020 10:57 Inactive
@tskir tskir changed the base branch from master to auth August 27, 2020 11:02
@tskir tskir changed the base branch from auth to master August 27, 2020 11:02
@tskir tskir merged commit 3b42095 into master Aug 27, 2020
@tskir tskir deleted the ols_queries branch August 27, 2020 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Backend logic & data processing scripts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement querying OLS to update term status
3 participants