Concept not found if token order is slightly changed, contrary to a note in the paper #344

Closed
KimBenjaminTang opened this issue Sep 12, 2023 · 3 comments


@KimBenjaminTang

Hello,

Thank you for providing MedCAT and a demo to try it out!

I found the paper very interesting and read that "MedCAT can ignore token order, but only for up-to two tokens".
This feature seems useful, but I could not reproduce it in the available demo.

As an example, I used these two sentences:

We report on a patient who was under our inpatient care.

Diagnoses: Triple vessel coronary artery disease - slightly reduced syst. LVF

The demo correctly recognizes "Triple vessel coronary artery disease".

However, when I change the token order, the concept is no longer matched.

Swapping "vessel" and "coronary" ("Triple coronary vessel artery disease"): not matched.

Swapping "coronary" and "artery" ("Triple vessel artery coronary disease"): not matched.

I also tried this with the downloadable .zip model pack for MedCAT trained on SNOMED CT and MIMIC-III, and the concept was no longer detected whenever two of its tokens were swapped.

In the documentation for CAT.get_entities(), I did not find an option to enable this behavior. Is it possible to turn it on, or is it not available?

Thanks for your help!

@baixiac
Member

baixiac commented Oct 6, 2023

Hi @KimBenjaminTang, I believe the "only for up-to two tokens" rule applies only when the longest match consists of two tokens, whereas your concept has five. For instance, the two-token concept "intracerebral hemorrhage" can be found in both "Description: Intracerebral hemorrhage (very acute ..." and "Description: Hemorrhage intracerebral (very acute ...".
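To make the rule concrete, here is a minimal, hypothetical sketch in Python. It is not MedCAT's actual NER implementation: the toy CDB dictionary, the concept IDs C_ICH and C_TVD, and the tokenize and find_concepts helpers are all invented for illustration. The matcher prefers the longest exact-order span and, only for two-token spans, also accepts the reversed order.

```python
from typing import Dict, List, Tuple

# Hypothetical toy concept database: token tuple -> concept ID.
# These entries and IDs are invented for illustration only.
CDB: Dict[Tuple[str, ...], str] = {
    ("intracerebral", "hemorrhage"): "C_ICH",
    ("triple", "vessel", "coronary", "artery", "disease"): "C_TVD",
}
MAX_LEN = max(len(name) for name in CDB)
PUNCT = ".,:;()"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def find_concepts(text: str) -> List[Tuple[str, str]]:
    """Return (matched text, concept ID) pairs, preferring the longest span."""
    tokens = tokenize(text)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        hit = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in CDB:
                hit = (span, CDB[span])
                break
            # Order is ignored only for two-token spans: try the reversed pair.
            if n == 2 and span[::-1] in CDB:
                hit = (span, CDB[span[::-1]])
                break
        if hit is None:
            i += 1
        else:
            found.append((" ".join(hit[0]), hit[1]))
            i += len(hit[0])
    return found


print(find_concepts("Description: Hemorrhage intracerebral (very acute ..."))
# [('hemorrhage intracerebral', 'C_ICH')]
print(find_concepts("Diagnoses: Triple coronary vessel artery disease"))
# []
```

In this sketch, swapping the two tokens of "intracerebral hemorrhage" still yields a match, while swapping any two tokens of the five-token "triple vessel coronary artery disease" does not, consistent with the behavior reported in this issue.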

@KimBenjaminTang
Author

Hi @baixiac, thanks for the clarification! I now see that I misunderstood what the phrase "only for up-to two tokens" meant in the paper. I thought it meant that up to two tokens within a concept can be out of order, not that the whole concept must consist of two tokens, in which case those two tokens may appear in either order.

This topic was discussed further in the issue I previously opened on CogStack. It is resolved from my side, since a workaround was suggested there and the underlying problem is more conceptually complex, so I will close it here.

@mart-r
Collaborator

mart-r commented Oct 11, 2023

"I will close it on here."

Closing on their behalf.

mart-r closed this as completed Oct 11, 2023