-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do we want obsolete nodes? #995
Comments
Good point. Thanks for bringing this issue to my attention. As you implied, the advantage of deleting them is decrufting KG2. But.... I wonder if there might conversely be an advantage to keeping them. Other nodes within KG2 (or outside of KG2) may connect to these nodes due to legacy datasets. At this point, this is speculation; I would have to check whether this is really the case. Certainly, deleting them is possible during the KG2 build process. |
Another possibility is to keep them during the KG2 build process, but to drop deprecated nodes during canonicalization. |
Tagging @amykglen to solicit her thoughts on this |
hmm.. I agree it's a bit strange to have obsolete nodes in our knowledge graph, but I checked and it's true that at least some of them are connected to non-deprecated nodes... (about 3,500 out of 36,000 deprecated nodes in
I guess I'd say it'd be ideal if we kept them in the regular KG2, but that the although it looks like 21,000 of the 36,000 deprecated nodes don't have a name, which means they can't really be synonymized by the |
OK, if a node satisfies:
then I think it is a reasonable candidate for deletion at the |
The above from @saramsey seems reasonable to me.
|
Yes, for example, a PathWhiz reaction node that connects to all of its reactants and products. |
The If additional cases of uncaught obsolete nodes are pasted here, I can update our filters to catch them. |
in the latest KG2 (
|
Thanks, @amykglen. Would you mind pasting a couple of example |
sure! here are some examples:
(obtained via this query in kg2-3-1: |
Thanks, I've pushed a possible fix for this (38a2e6a ) |
Testing on |
As of KG2.3.5, Amy's ORPHANET query returned no records, so that particular issue seems to be fixed.
However, when running
For that reason, I'm going to leave this issue open for now. |
Hi @kvarforl what's the status of this issue? I see that we had "verify this fix in the next KG2 build" but then that label was removed last October. Just wondering about what I should do. |
I haven't changed anything on this since october! Did we decide that it's okay to have nodes with "obsolete" in the name and if so, we can probably close this issue. If not, I can try to track down the offending nodes in Kg2.5.1 |
Hi @kvarforl ah, I missed where you wrote an explanation:
|
I wonder if in Our test-case for whether or not it worked could be this Cypher query in the Neo4j:
currently returns 2227, but after your fix, it should return a count of 0. |
OK @edeutsch after thinking about it, I believe the best approach is as follows:
|
okay, fine with me. I do not recall a specific problem caused by obsolete nodes. |
sure! if we don't want obsolete nodes, that's fine with me to remove them during the KG2c build process. so in that case, they'd still be a part of the NodeSynonymizer's knowledge base (since it's built off the regular KG2).... I suppose one little hitch is that we would need to make sure the synonymizer never chooses an obsolete curie as the |
I don't currently have a way of checking the deprecated property during the NodeSynonymizer build process. The process works from file dumps not a live connection. One possibility is to include the word OBSOLETE in the name. That I can check. And is already done to some extent. |
Okay, as of KG2.5.2 there are very few nodes that have obsolete in the name but deprecated as false:
@saramsey is this sufficient, or should we put a catch somewhere to try to get them all to be marked as deprecated? |
Hi @kvarforl can I see the actual names for some example nodes, along with their |
Maybe something like this?
|
I'm going to close this issue and move the discussion over to #1261 since they're discussing the same question/problem. I added examples in https://github.com/RTXteam/RTX/issues/1261#issuecomment-797736336 :) |
I've been meaning to mention this for a while but keep forgetting. I'm thinking it is not a good thing to include obsolete nodes in KG2. Should we filter these out? Perhaps just keep as a synonym to the new node?
The text was updated successfully, but these errors were encountered: