
Linking refactor #233

Merged: 11 commits merged on May 19, 2020

@DeNeutoy (Collaborator) commented May 18, 2020

This is an attempt to make it easier to load other KBs trained via the same mechanism as a pipeline.

Changes:

  • UmlsEntity -> Entity
  • types argument to UmlsEntity is now optional, as not every KB will have types
  • Abstract UmlsKnowledgeBase into KnowledgeBase, which doesn't hold the semantic type tree of UMLS.
  • Wrap up all the linker paths into a LinkerPaths namedtuple, so we can reference groups of them by name
  • Pass a name arg to the linker and candidate generator, which is enough to construct the various pre-defined linkers we have.
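A minimal sketch of how these changes could fit together, assuming plain typing.NamedTuples and a module-level dict keyed by linker name. The field names, the file layout, and the DEFAULT_PATHS, _paths_for and resolve_paths names are hypothetical illustrations, not the actual scispacy code.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    """A generic KB entry (formerly UmlsEntity)."""
    concept_id: str
    canonical_name: str
    aliases: List[str]
    # Optional, because not every KB has types; an immutable default avoids
    # sharing one mutable list across instances.
    types: Tuple[str, ...] = ()
    definition: Optional[str] = None


class LinkerPaths(NamedTuple):
    """Groups the artifacts one linker needs, so they can be referenced by name."""
    ann_index: str
    tfidf_vectorizer: str
    tfidf_vectors: str
    concept_aliases_list: str


def _paths_for(prefix: str) -> LinkerPaths:
    # Hypothetical layout: every linker keeps its files under a single prefix.
    return LinkerPaths(
        ann_index=f"{prefix}/nmslib_index.bin",
        tfidf_vectorizer=f"{prefix}/tfidf_vectorizer.joblib",
        tfidf_vectors=f"{prefix}/tfidf_vectors_sparse.npz",
        concept_aliases_list=f"{prefix}/concept_aliases.json",
    )


# Hypothetical registry: the `name` argument is just a key into this dict.
DEFAULT_PATHS: Dict[str, LinkerPaths] = {
    "umls": _paths_for("data/linkers/umls"),
    "mesh": _paths_for("data/linkers/mesh"),
}


def resolve_paths(name: str) -> LinkerPaths:
    """Look up the pre-defined paths for a linker by name."""
    try:
        return DEFAULT_PATHS[name]
    except KeyError:
        raise ValueError(
            f"Unknown linker {name!r}; expected one of {sorted(DEFAULT_PATHS)}"
        ) from None
```

With this shape, supporting another KB built the same way is one more registry entry, and an entity from a KB without types is simply constructed without the types field.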

I'll actually add the MESH linker in a different PR to keep this one manageable.
edit: Yolo, it's not much extra code, I'll just add it here.

Once you've reviewed, I'll add the data and stuff to S3!

@dakinggg (Collaborator) left a comment

LGTM, a few small comments/questions. One question about naming: I tagged every place where you still call things "UMLS". Was this an intentional decision for backwards compat? An oversight? Something else?

scispacy/candidate_generation.py: 6 resolved review threads (2 outdated)
self.semantic_type_tree: UmlsSemanticTypeTree = construct_umls_tree_from_tsv(
types_file_path
)


class MeshKnowledgeBase(KnowledgeBase):
Collaborator

I believe MeSH terms have a tree as well; if one of us can find that tree somewhere on the internet, it would be nice to package it here.

Collaborator Author

They do, but the only reason we have the tree in there in the first place is that we were experimenting with changing the granularity of the types for the NER model, which turned out not to be very useful. Also, the MeSH KB lives in this hyper-complex SPARQL database, and I really don't want to figure out how to extract the tree, haha.

Collaborator

lol, fair enough. I do think one of the pieces of value we add here is easy access to these KBs, which are pretty complicated to figure out how to access (and people do use the type tree, or at least the levels of the types). So if it's not too much work, it'd be great to extract and add it; if it's difficult, I'm fine letting it go.
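To make "the levels of the types" concrete, here is a minimal sketch of a type tree stored as child-to-parent links, with a helper that collapses a type to its ancestor at a given depth (the same idea as changing type granularity for the NER model). The TypeTree class, its method names, and the toy type IDs are hypothetical and do not reflect scispacy's UmlsSemanticTypeTree API or real UMLS data.

```python
from typing import Dict, List, Optional


class TypeTree:
    """Toy type hierarchy stored as child -> parent links (hypothetical)."""

    def __init__(self, parents: Dict[str, Optional[str]]) -> None:
        # Maps each type id to its parent id; the root maps to None.
        self.parents = parents

    def path_to_root(self, type_id: str) -> List[str]:
        """Return [type_id, parent, grandparent, ..., root]."""
        path = [type_id]
        while self.parents[path[-1]] is not None:
            path.append(self.parents[path[-1]])
        return path

    def depth(self, type_id: str) -> int:
        """The root is at depth 0."""
        return len(self.path_to_root(type_id)) - 1

    def collapse_to_level(self, type_id: str, level: int) -> str:
        """Map a type to its ancestor at `level`; shallower types are kept as-is."""
        path = self.path_to_root(type_id)
        if level >= len(path):
            return type_id
        # path[-1] is the root (depth 0), so path[-1 - level] sits at depth `level`.
        return path[-1 - level]
```

For example, with parents {"root": None, "organism": "root", "virus": "organism"}, collapsing "virus" to level 1 yields "organism", which is how fine-grained types could be coarsened before training.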

Collaborator Author

That's true. I'll add a TODO; maybe we can do it later. MeSH has a much better browser for viewing the entities, e.g. https://meshb.nlm.nih.gov/record/ui?ui=D002430

Collaborator

Maybe just open an issue; if someone has a compelling use case we can add it, or maybe they will add it :)
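The class split in the diff above (UmlsKnowledgeBase builds a semantic_type_tree, MeshKnowledgeBase subclasses the generic KnowledgeBase) can be sketched as follows. This is an in-memory illustration under stated assumptions: the attribute names cui_to_entity and alias_to_cuis, the constructor signatures, and the plain dict standing in for the type tree are hypothetical, and the real classes load their data from files.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Entity(NamedTuple):
    concept_id: str
    canonical_name: str
    aliases: List[str]
    types: Tuple[str, ...] = ()  # optional: not every KB has types


class KnowledgeBase:
    """Generic KB: entity and alias lookups only, no type hierarchy."""

    def __init__(self, entities: Iterable[Entity]) -> None:
        self.cui_to_entity: Dict[str, Entity] = {}
        self.alias_to_cuis: Dict[str, Set[str]] = {}
        for entity in entities:
            self.cui_to_entity[entity.concept_id] = entity
            # Index the canonical name alongside the aliases.
            for alias in {entity.canonical_name, *entity.aliases}:
                self.alias_to_cuis.setdefault(alias, set()).add(entity.concept_id)


class UmlsKnowledgeBase(KnowledgeBase):
    """Adds a semantic type tree; here a toy child -> parent dict stands in for it."""

    def __init__(
        self, entities: Iterable[Entity], type_parents: Dict[str, Optional[str]]
    ) -> None:
        super().__init__(entities)
        self.semantic_type_tree = type_parents


class MeshKnowledgeBase(KnowledgeBase):
    """Reuses the generic base unchanged; no type tree is packaged."""
```

Keeping the tree in the UMLS subclass means the base class makes no assumption that a KB has types, which matches types becoming optional on Entity.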

scispacy/linking.py: 3 resolved review threads (2 outdated)
@DeNeutoy (Collaborator Author)

OK, I renamed umls to kb throughout, including the spaCy span annotations, but I left umls_ents in place for the time being. We can remove it in a later version.
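One way to keep an old attribute name working while pointing users at the new one is a read-through alias. The sketch below uses a hypothetical pure-Python stand-in for a span (LinkedSpan is not a spaCy or scispacy class); in spaCy itself a comparable alias could be registered with Span.set_extension and a getter. The deprecation warning is an optional extra: the thread only says umls_ents was kept, not that it warns.

```python
import warnings
from typing import List, Tuple


class LinkedSpan:
    """Hypothetical stand-in for a spaCy span carrying linker output."""

    def __init__(self, text: str) -> None:
        self.text = text
        # New name: (concept_id, score) candidates produced by the linker.
        self.kb_ents: List[Tuple[str, float]] = []

    @property
    def umls_ents(self) -> List[Tuple[str, float]]:
        """Old name, kept for backwards compatibility; reads through to kb_ents."""
        warnings.warn(
            "umls_ents is deprecated; use kb_ents instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.kb_ents
```

Because the alias returns the same list object, existing code that reads umls_ents sees exactly what the linker wrote to kb_ents, and removing the property later is a one-line change.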

@DeNeutoy merged commit 0fb1bb0 into allenai:master on May 19, 2020

2 participants