Query failure for MeSH terms that contain parentheses #165

Open
AlastairKelly opened this issue May 17, 2021 · 1 comment

@AlastairKelly

Although these were all cleaned up around 2019, some MeSH descriptors used to contain parentheses, like "outcome assessment (health care)". These variants are still included in the entry terms. However, I am unable to get a match on these terms using either the SPARQL endpoint or the Lookup API. Neither generates an error, but both return an empty set, with no matches. I noticed that neither option on the Swagger page percent-encodes the parentheses, so I tried replacing them manually in the generated URLs with %28 and %29, but this still yields an empty result.

We have only three of these in our dataset, so I think we will end up just altering them manually, but I would still like to know if there is something I could be doing that would make these function as expected.

@danizen
Contributor

danizen commented May 17, 2021

Smart efforts on your part. Terms have a preferred label, meshv:prefLabel, and zero or more alternate labels, meshv:altLabel. See https://hhs.github.io/meshrdf/terms. Our string literals also carry a language code, which I believe is likely to be the issue in your case. In any case, despite the clean-up effort in 2019, there are definitely terms still in MeSH containing parentheses in their labels - https://id.nlm.nih.gov/mesh/T082417.html is one example.

I can query for it by using a string literal. Even though we do not have any alternate languages of MeSH, we still use language-tagged string literals to allow for this in the future, so a plain literal without the language tag will not match. That is likely the issue you have run into. Here is an example query:

PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>


SELECT ?term
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
  ?term a meshv:Term .
  ?term meshv:prefLabel "3-methyl-s-triazolo(3,4-a)phthalazine"@en .
}

Without the "@en" after the label, it would not match.
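To make the language-tag point concrete, here is a minimal Python sketch that builds the same kind of query for an arbitrary label and percent-encodes it into a request URL. It is an illustration, not an official client. The helper names (escape_literal, build_label_query, build_request_url) are made up for this example, and the endpoint URL and the "format" parameter are assumptions that should be checked against the Swagger page. The query also checks meshv:altLabel through a SPARQL 1.1 property path, since entry terms may only appear as alternate labels.

```python
from urllib.parse import urlencode

# Assumption: the public MeSH SPARQL endpoint; verify against the Swagger page.
SPARQL_ENDPOINT = "https://id.nlm.nih.gov/mesh/sparql"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_label_query(label: str, lang: str = "en") -> str:
    """Build a query matching `label` as a preferred or alternate term label.

    The literal carries a language tag (for example @en); a plain literal
    without the tag would not match the stored labels.
    """
    literal = f'"{escape_literal(label)}"@{lang}'
    return (
        "PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>\n"
        "SELECT ?term\n"
        "FROM <http://id.nlm.nih.gov/mesh>\n"
        "WHERE {\n"
        "  ?term a meshv:Term .\n"
        f"  ?term meshv:prefLabel|meshv:altLabel {literal} .\n"
        "}"
    )


def build_request_url(query: str) -> str:
    """Percent-encode the query; urlencode turns '(' and ')' into %28 and %29.

    Assumption: the endpoint accepts 'query' and 'format' parameters.
    """
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'JSON'})}"


if __name__ == "__main__":
    query = build_label_query("3-methyl-s-triazolo(3,4-a)phthalazine")
    print(query)
    print(build_request_url(query))
```

Note that matching on a literal is exact, including letter case, so the label has to be spelled the way it is stored. Percent-encoding the parentheses alone does not help if the "@en" tag is missing.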

Another possible issue - I have told others to do fast string search for terms using bif:contains, and may have told you as well. bif:contains is a fast boolean keyword search, but because it searches keywords, it will not match punctuation such as parentheses. Text information retrieval has it easy - one can decide to ignore punctuation, and thus argue forever about the definition of "keyword"! In this case, arguing is fruitless because the powerful keyword search bif:contains comes built into Virtuoso, and we have not had the resources to chart our own course for this product.
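If bif:contains is still wanted for speed, one workaround is to drop the punctuation before building the keyword expression and then compare the returned labels exactly on the client side. The sketch below assumes the usual Virtuoso form of single-quoted words joined with AND. The helper names are hypothetical, and how Virtuoso treats very short tokens or noise words is not covered here.

```python
import re


def keyword_expression(label: str) -> str:
    """Turn a label into a boolean keyword expression without punctuation."""
    tokens = re.findall(r"[A-Za-z0-9]+", label)
    if not tokens:
        raise ValueError("label contains no searchable keywords")
    return " AND ".join(f"'{token}'" for token in tokens)


def build_keyword_query(label: str) -> str:
    """Build a candidate query; the caller still compares ?label exactly."""
    expression = keyword_expression(label)
    return (
        "PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>\n"
        "SELECT ?term ?label\n"
        "FROM <http://id.nlm.nih.gov/mesh>\n"
        "WHERE {\n"
        "  ?term a meshv:Term .\n"
        "  ?term meshv:prefLabel|meshv:altLabel ?label .\n"
        f'  ?label bif:contains "{expression}" .\n'
        "}"
    )


if __name__ == "__main__":
    print(build_keyword_query("outcome assessment (health care)"))
```

The keyword search only narrows the candidates. Any label containing all four words would be returned, so a final exact comparison is still needed.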

If none of these suggestions help, consider downloading our MeSH database locally. Once it is local, you can also use edit distance metrics such as Levenshtein depending on your use case.
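For the local option, a minimal sketch of Levenshtein-based matching in plain Python is shown here. The label list is a hypothetical stand-in for labels extracted from a local copy of MeSH, and the distance threshold is an arbitrary choice for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def closest_labels(query: str, labels: list[str], max_distance: int = 3):
    """Return (distance, label) pairs within max_distance, nearest first.

    Comparison is case-insensitive; punctuation counts like any other character.
    """
    folded = query.casefold()
    scored = [(levenshtein(folded, label.casefold()), label) for label in labels]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


if __name__ == "__main__":
    # Hypothetical labels standing in for a local MeSH extract.
    local_labels = ["Outcome Assessment (Health Care)", "Health Care Costs"]
    print(closest_labels("outcome assessment (healthcare)", local_labels))
```

Because parentheses are treated as ordinary characters here, a label that differs only in punctuation or spacing still comes back with a small distance.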
