Proonto #41

Merged
merged 14 commits into from
Sep 7, 2020

JakeWolfe
Contributor

Integrates protein fragment entities from the Protein Ontology.

@MihaiSurdeanu
Contributor

@bgyori: we added protein fragments from the Protein Ontology (PO) here. Does this PR look OK to you? There is a parallel PR for this in Reach; I will mention you there as well. Thanks!

@bgyori
Contributor

bgyori commented Sep 4, 2020

Thanks @MihaiSurdeanu and @JakeWolfe. I'll try to get to this by tomorrow!

@bgyori
Contributor

bgyori commented Sep 7, 2020

I made a few changes: I added another category of synonyms we previously missed and removed some synonyms that would not actually be used in text. I still have some concerns. Unlike with other protein resources, here we don't use the species information, so all entries default to human. As a result, SARS-CoV-2 protein fragments, as well as some other viral proteins, are now picked up as human. This could cause issues downstream of Reach, though it isn't very serious (since the grounding itself is more important than what Reach says about the organism). I am also generally a bit disappointed with the Protein Ontology (this is my first time working with it), since it seems to be missing some human protein entries that I would have expected to see (e.g., bradykinin), and it doesn't seem to have many viral proteins other than SARS-CoV-2. I think we can use these synonyms, but we may also want to include UniProt fragments, which as far as I can see have better coverage and are more "synonym-like" (i.e., found in actual text).
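
To illustrate the species concern: a minimal sketch, assuming each ontology entry can carry an optional organism field, of how a synonym table might keep the organism instead of defaulting every entry to human. The `Entry` type, its field names, the placeholder identifiers, and the `to_rows` helper are all hypothetical; none of them are taken from this PR or from Reach.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

DEFAULT_SPECIES = "human"  # assumption: mirrors the default described in the comment


@dataclass(frozen=True)
class Entry:
    """Hypothetical ontology entry; field names are illustrative only."""
    identifier: str
    synonyms: Tuple[str, ...]
    species: Optional[str] = None  # None when the resource gives no organism


def to_rows(entries: Iterable[Entry], use_species: bool = True) -> List[Tuple[str, str, str]]:
    """Return (synonym, identifier, species) rows for a grounding table.

    With use_species=False every row falls back to DEFAULT_SPECIES, which is
    the behaviour that would label viral fragments as human.
    """
    rows = []
    for entry in entries:
        species = entry.species if (use_species and entry.species) else DEFAULT_SPECIES
        for synonym in entry.synonyms:
            rows.append((synonym, entry.identifier, species))
    return rows


# Placeholder data, not real Protein Ontology identifiers or synonyms.
entries = [
    Entry("EXAMPLE:0001", ("fragment A",), species="SARS-CoV-2"),
    Entry("EXAMPLE:0002", ("fragment B",)),
]
```

Calling `to_rows(entries)` keeps the viral organism on the first entry, while `to_rows(entries, use_species=False)` reproduces the all-human fallback.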

@MihaiSurdeanu MihaiSurdeanu merged commit 931911e into master Sep 7, 2020
@MihaiSurdeanu
Contributor

Thanks @bgyori !

  • @bgyori, @JakeWolfe: what are the next steps if we want to add the protein fragments from UniProt?
  • @bgyori: We just noticed that we introduced a bug in LexiconNER that might cause it to overmatch. @kwalcock will look into this, but for now the bug might generate some false positives in Reach...

@MihaiSurdeanu MihaiSurdeanu deleted the proonto branch September 7, 2020 01:49

3 participants