New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linking refactor #233
Linking refactor #233
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a few small comments/questions. One question about naming, i tagged everywhere that you still called things "UMLS". Was this an intentional decision for backwards compat? an oversight? something else?
scispacy/umls_utils.py
Outdated
self.semantic_type_tree: UmlsSemanticTypeTree = construct_umls_tree_from_tsv( | ||
types_file_path | ||
) | ||
|
||
|
||
class MeshKnowledgeBase(KnowledgeBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe mesh terms have a tree as well, if one of us is able to find that tree somewhere on the internet it would be nice to package it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do, but the only reason we have the tree in there in the first place was because we were messing around with changing the granularity of the types for the NER model, which turned out to not be super useful. Also, the MESH KB lives in this hyper complex sparql database and I really don't want to figure out how to extract the tree haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol fair enough. i do think that one of the pieces of value we add here is easy access to these kbs that are pretty complicated to figure out how to access (and people do use the type tree, or at least the levels of the types). so if its not too much work, it'd be great to extract and add, if its difficult, im fine letting it go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true - I'll add a TODO, maybe we can do it later. Mesh has a much better browser for viewing the entities, e.g https://meshb.nlm.nih.gov/record/ui?ui=D002430
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just open an issue, and if someone has a compelling use case we can add it, or maybe they will add it :)
Ok, I moved from |
This is an attempt to make it easier to load other KBs trained via the same mechanism as a pipeline.
Changes:
UmlsEntity
->Entity
types
argument toUmlsEntity
is now optional, as not every KB will have typesUmlsKnowledgeBase
intoKnowledgeBase
, which doesn't hold the semantic type tree of UMLS.LinkerPaths
namedtuple, so we can reference groups of them by namename
arg to the linker and candidate generator, which is enough to construct the various pre-defined linkers we have.I'll actually add the MESH linker in a different PR to keep this one manageable.edit: Yolo it's not much extra code, i'll just add it here.
Once you've reviewed i'll add the data and stuff to s3!