Improve overrides and fix outdated GO IDs #30
Actually, this might change the NER files so I'll add those |
Should I see what is happening with the tests, or wait?
Here is the summary of the results and my analysis. Let me try to fix what I can and I'll comment if there are any left that I'm unsure how to fix.
-> This test tries to distinguish between FamPlex families and complexes, but there's no need to do so; labeling them as Family is fine.
-> I reviewed MAF yesterday: it is a gene symbol, so I think it will currently resolve to the gene instead of a family. There are also changes to the groundings here; for instance, COX4 will now resolve to FamPlex's COX4.
-> TGFB is a FamPlex family
-> ERbB is actually a family
-> Same as the one before, ERbB is a family
-> SAPK is actually a family |
I fixed almost all the tests and pushed the changes to the bio31 branch of Reach. I have one question, though, about the one remaining failure: there is a test for the string "FLOT" that expects it to ground to FamPlex FLOT. The relevant current grounding entries are:
In PFAM-families.tsv
There is currently nothing for this entity in NER-Grounding-Override.tsv. Why could it be that "FLOT" gets grounded to PF15975 under these circumstances (unless I'm missing something unrelated to the grounding logic)? |
There were a couple of issues:
I had to prioritize FamPlex over the other Family KBs in ReachEntityLookup. There was also a hidden issue: I had to add FamilyOrComplex to the NER list of KBs. Note that these are configured separately because grounding is a different component. The second issue did not affect FLOT, because PFAM had the string, but it was a bug nonetheless. FLOT is now correctly grounded. Please check. Is my assumption correct that FamPlex should be prioritized over PFAM for grounding? |
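To illustrate the prioritization described in this comment, here is a minimal Python sketch of a priority-ordered lookup. It is not the ReachEntityLookup code: the dictionaries, the `ground` function, and the KB labels are hypothetical, and only the two FLOT identifiers come from the discussion above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory KBs mapping lowercased entity text to an ID.
# Only the two FLOT entries mirror the case discussed in this thread.
FAMPLEX: Dict[str, str] = {"flot": "FLOT"}
PFAM: Dict[str, str] = {"flot": "PF15975"}

# Order matters: an earlier KB wins when several KBs contain the same text.
FAMILY_KBS: List[Tuple[str, Dict[str, str]]] = [("FamPlex", FAMPLEX), ("PFAM", PFAM)]


def ground(text: str, kbs: List[Tuple[str, Dict[str, str]]] = FAMILY_KBS) -> Optional[Tuple[str, str]]:
    """Return (kb_name, id) from the first KB that contains the text, else None."""
    key = text.strip().lower()
    for name, entries in kbs:
        if key in entries:
            return (name, entries[key])
    return None
```

With FamPlex listed first, `ground("FLOT")` picks the FamPlex entry; reversing the list reproduces the earlier behavior, where the PFAM entry won.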
I see, I don't fully understand why FamilyOrComplex comes into the picture though since we decided to put FamPlex entries under Family, and in ner_kb.config, we have:
I thought that what that meant was that ner/Family.tsv.gz would contain entries from both FamPlex and Pfam. So where does ner/FamilyOrComplex.tsv.gz come into the picture? |
You're right. The change in reference.conf is not needed. I suspect FamilyOrComplex.tsv.gz was a leftover from a previous version, which confused me. I'll revert that change. |
I just ran the tests again, and I think the reordering of the priority now broke another override test:
[info] - should match expected grounding IDs in the text Activin A, Activin AB, Inhibin A, Inhibin B,
[info] AMPK alpha1beta1gamma1, AMPK alpha1beta1gamma2, AMPK alpha1beta1gamma3,
[info] AMPK alpha1beta2gamma1, AMPK alpha1beta2gamma2, AMPK alpha2beta1gamma1,
[info] AMPK alpha2beta2gamma1, AMPK alpha2beta2gamma2, AMPK alpha2beta2gamma3,
[info] AMPK a1b1g1, AMPK a1b1g2, AMPK a1b1g3,
[info] AMPK a1b2g1, AMPK a1b2g2, AMPK a2b1g1,
[info] AMPK a2b2g1, AMPK a2b2g2, AMPK a2b2g3,
[info] alpha1beta1gamma1, alpha1beta1gamma2, alpha1beta1gamma3,
[info] alpha1beta2gamma1, alpha1beta2gamma2, alpha2beta1gamma1,
[info] alpha2beta2gamma1, alpha2beta2gamma2, and alpha2beta2gamma3
[info] are important complexes. *** FAILED *** (10 milliseconds)
[info] false was not true (TestOverrides.scala:188)
I'll look into this and see what needs to be changed. |
Thank you! |
Okay, I pushed a change for that test and now all the Reach tests are passing. |
I confirm that tests pass. I'll release next. |
Bioresources has been released, and the Reach branch was merged into master. |
This PR makes several changes to the NER-Grounding-Override.tsv:
(I also added a one-off script to help find these cases automatically).
The PR also updates bio_process.tsv and GO-subcellular-locations.tsv to replace deprecated GO IDs with current ones. (I have been planning to implement a script that fully regenerates these resource files from GO, rather than only replacing the outdated IDs in the old files, but this is a good first step for the next release.)
@MihaiSurdeanu if you see any surprising effects on the Reach tests you have been updating, let me know.
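The GO ID replacement described above could be sketched roughly as follows. This is not the script used in the PR: the function name, the assumption that the ID sits in the second tab-separated column, and the mapping passed in are all hypothetical, and any IDs used with it here would be made-up placeholders rather than real deprecated GO terms.

```python
from typing import Dict, Tuple


def replace_deprecated_ids(
    tsv_text: str, replacements: Dict[str, str], id_column: int = 1
) -> Tuple[str, int]:
    """Swap deprecated IDs in one column of a TSV and count the changed rows.

    Assumption: each row is tab-separated and holds the ID at `id_column`.
    Rows that are too short or whose ID is not in `replacements` are kept as-is.
    """
    out_lines = []
    changed = 0
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) > id_column and fields[id_column] in replacements:
            fields[id_column] = replacements[fields[id_column]]
            changed += 1
        out_lines.append("\t".join(fields))
    # Preserve a trailing newline only when there is at least one row.
    return "\n".join(out_lines) + ("\n" if out_lines else ""), changed
```

Returning the number of changed rows makes it easy to check that every deprecated ID in the mapping was actually found in the file.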