Feature / warning requests: Usage of " versus ' and wildcard #180

Open
rkrug opened this issue Oct 19, 2023 · 4 comments

Comments

@rkrug

rkrug commented Oct 19, 2023

After some discussion with OpenAlex support, I have solved an issue I had with search: I used the single quote (') while OpenAlex expects the double quotation mark (") to specify words that must be adjacent to each other (a phrase).

Also, the wildcard character * is stripped from the search (not a problem in itself), as is the single quotation mark ' (which does cause a problem).

The stripping is done by Elasticsearch and is not under the control of OpenAlex.

I would therefore suggest two things:

  1. For the wildcard character, give a warning in the function oa_query() that the wildcard character is stripped and that OpenAlex does no wildcard expansion (it applies stemming by default).
  2. For a single quote in the search string (and possibly also in the other search filters?), raise an error, as it changes the query and will give completely wrong results (the default operator is AND if there is no operator between two words).
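A minimal sketch of the two proposed checks, written in Python for illustration only (openalexR itself is an R package, and this is not its code). The name check_search_string is hypothetical; it warns on * and raises an error on a straight single quote, as suggested above.

```python
import warnings

def check_search_string(search: str) -> str:
    """Hypothetical pre-flight check for an OpenAlex search string.

    - '*' only triggers a warning: it is stripped, so no wildcard expansion
      happens (OpenAlex applies stemming instead).
    - A straight single quote is an error: it is stripped too, so a quoted
      phrase silently turns into separate words joined by the default AND.
    """
    if "*" in search:
        warnings.warn(
            "The wildcard character '*' is stripped by OpenAlex; no wildcard "
            "expansion is done (stemming is applied by default).",
            UserWarning,
            stacklevel=2,
        )
    if "'" in search:
        raise ValueError(
            "Single quotes (') are stripped by OpenAlex, so the quoted words are "
            'not searched as a phrase. Use double quotes (") instead.'
        )
    return search

# A double-quoted phrase passes through unchanged.
check_search_string('"natural environment" OR biodiversity')
```

Making the wildcard a warning but the single quote an error follows the reasoning in the issue: a dropped * only loses expansion, while a dropped ' changes which documents match.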

This would help to make openalexR easier to use and more reliable.

Thanks,

Rainer

@trangdata
Collaborator

Thank you for this excellent suggestion @rkrug. 💯 Do you have any particular query examples you could share?

@rkrug
Author

rkrug commented Oct 20, 2023

Yes - here is the example which I used to solve the "issue" with OpenAlex support.

In a nutshell:

  • bidiversity OR ‘natural environment’ becomes bidiversity OR natural environment, which is then read as bidiversity OR natural AND environment
  • ‘natural environment' OR bidiversity becomes natural environment OR bidiversity, which is then read as natural AND environment OR bidiversity
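The rewriting in these two bullets can be mimicked in a few lines of Python. This is a deliberately simplified model built only from what OpenAlex support reported (single quotes and * are stripped; adjacent words without an operator are joined by AND). It is not Elasticsearch's query parser, it ignores double-quoted phrases and parentheses, and treating the typographic quotes ‘ ’ like ' is an assumption. The function names are hypothetical.

```python
# Characters treated as stripped: the straight single quote, '*', and
# (assumption) the typographic single quotes that appear in the examples.
STRIPPED = "'\u2018\u2019*"
OPERATORS = {"AND", "OR", "NOT"}

def strip_unsupported(query: str) -> str:
    """Delete the stripped characters from the query."""
    return query.translate(str.maketrans("", "", STRIPPED))

def show_implicit_and(query: str) -> str:
    """Write out the implicit AND between adjacent words.

    Simplified model: splits on whitespace and does not handle
    double-quoted phrases or parentheses.
    """
    out: list[str] = []
    for token in query.split():
        if out and out[-1] not in OPERATORS and token not in OPERATORS:
            out.append("AND")
        out.append(token)
    return " ".join(out)

for q in ["bidiversity OR ‘natural environment’", "‘natural environment' OR bidiversity"]:
    print(show_implicit_and(strip_unsupported(q)))
# bidiversity OR natural AND environment
# natural AND environment OR bidiversity
```

The model only makes the hidden AND visible; how Elasticsearch then combines AND and OR is exactly the part whose precedence is unclear.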

I do a search with the search term bidiversity OR ‘natural environment’ (the typo in "bidiversity" does not matter) and filter for the DOI https://doi.org/10.1111/conl.12377.

The result should be one, as it is with this call:

https://api.openalex.org/works?filter=doi%3Ahttps%3A%2F%2Fdoi.org%2F10.1111%2Fconl.12377&search=%27natural%20environment%2A%27%20OR%20bidiversity

{
  "meta":{
    "count":1,
    "db_response_time_ms":54,
    "page":1,
    "per_page":25,
  },
  ...

But when I change the order of the search terms, the result is zero:

https://api.openalex.org/works?filter=doi%3Ahttps%3A%2F%2Fdoi.org%2F10.1111%2Fconl.12377&search=bidiversity%20OR%20%27natural%20environment%2A%27

{
  "meta":{
    "count":0,
    "db_response_time_ms":71,
    "page":1,
    "per_page":25,
  },
  "results":[
  ],
  "group_by":[
  ]
}
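To see where the %27 (single quote) and %2A (*) in these URLs come from, here is a minimal Python sketch, using only the standard library, that rebuilds the request URL from a search string and a DOI filter. works_url is a hypothetical helper, not part of openalexR or the OpenAlex API, and nothing is fetched. The last call shows the double-quoted phrase form that OpenAlex expects; its result count was not part of this thread.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.openalex.org/works"
DOI = "https://doi.org/10.1111/conl.12377"

def works_url(search: str, doi: str) -> str:
    """Build an OpenAlex /works URL with a DOI filter and a search string.

    quote_via=quote encodes spaces as %20 (instead of '+'), and urlencode's
    default safe='' also encodes ':' and '/', matching the URLs above.
    """
    params = {"filter": f"doi:{doi}", "search": search}
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

# The two requests from this comment (' -> %27, * -> %2A):
print(works_url("'natural environment*' OR bidiversity", DOI))
print(works_url("bidiversity OR 'natural environment*'", DOI))

# The phrase written with double quotes (" -> %22):
print(works_url('"natural environment" OR bidiversity', DOI))
```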

Hope this helps.

Also: as the precedence rules are not that clear, the OpenAlex support contact strongly recommended using brackets.
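Following that advice, a query can be assembled so that every alternative is explicitly grouped and multi-word terms are double-quoted. The helper or_query below is hypothetical (not part of openalexR), and it assumes, as the support recommendation implies, that the OpenAlex search parameter accepts parentheses for grouping.

```python
def or_query(*alternatives: str) -> str:
    """Join search alternatives with OR, each wrapped in parentheses.

    Multi-word alternatives are put in double quotes so they are
    searched as a phrase rather than as separate words.
    """
    parts = []
    for alt in alternatives:
        alt = alt.strip()
        term = f'"{alt}"' if " " in alt else alt
        parts.append(f"({term})")
    return " OR ".join(parts)

print(or_query("biodiversity", "natural environment"))
# (biodiversity) OR ("natural environment")
```

Generating the brackets in code means the order of the alternatives no longer matters to how the query is grouped.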

@yjunechoe
Collaborator

Just to be clear - does OpenAlex strip ' as well?

Also, I don't know if it's just a formatting thing, but your example sometimes uses ‘ and ’, which are not the same as the single quote character '. Not sure if we should catch these for users as well.
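Catching these for users could look like the following Python sketch: it reports the positions of typographic quotes and maps them to their ASCII counterparts. The function names are hypothetical, and whether OpenAlex treats ‘ ’ “ ” the same as ' and " is not established in this thread; the mapping is an assumption.

```python
# Typographic ("smart") quotes and the ASCII characters they are usually
# meant to be. This mapping is an assumption, not documented OpenAlex behaviour.
TYPOGRAPHIC_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

def find_typographic_quotes(search: str) -> list[tuple[int, str]]:
    """Return (position, character) for every typographic quote in search."""
    return [(i, ch) for i, ch in enumerate(search) if ch in TYPOGRAPHIC_QUOTES]

def normalize_quotes(search: str) -> str:
    """Replace typographic quotes with their ASCII counterparts."""
    return search.translate(str.maketrans(TYPOGRAPHIC_QUOTES))

query = "bidiversity OR ‘natural environment’"
print(find_typographic_quotes(query))  # [(15, '‘'), (35, '’')]
print(normalize_quotes(query))         # bidiversity OR 'natural environment'
```

Normalizing ‘ ’ to ' rather than silently dropping them keeps them visible to any check that looks for the straight single quote.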

@rkrug
Author

rkrug commented Oct 20, 2023

Yes - according to the info I got from OpenAlex, the single inverted comma / quote ' is stripped as well.

Yes - I copied the code, so everything should be the single inverted comma / single quote.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants