Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DS-4010] Removed the escaping on the query parameter for the discove… #2206

Merged

Conversation

@Raf-atmire
Copy link
Contributor

Raf-atmire commented Sep 17, 2018

This PR removes the escaping that is done on the input from the query parameter in the discover endpoints to pull this functionality equal to previous versions of DSpace which didn't take this into account either. This fixes simple issues like not being able to query on the dc.date.issued metadata field through a query paremeter.

Related Jira ticket: https://jira.duraspace.org/browse/DS-4010

@@ -183,7 +183,7 @@ private DiscoverQuery buildCommonDiscoverQuery(Context context, DiscoveryConfigu
if (StringUtils.isNotBlank(query)) {
//Note that these quotes are needed incase we try to query OR for example.
//If the quotes aren't present, it'll crash.

This comment has been minimized.

Copy link
@tdonohue

tdonohue Sep 17, 2018

Member

@Raf-atmire : Have we tested this change with boolean OR queries? The comment here specifically says the quotes are needed in the setQuery() call, or else boolean OR won't work, etc. If the quotes are definitely not needed, we should also remove the comment.

This comment has been minimized.

Copy link
@Raf-atmire

Raf-atmire Sep 17, 2018

Author Contributor

@tdonohue : I think I typed my explanation wrong. OR queries will work if we're talking about query= dc.date.issued:2007-01-08 OR dc.date.issued:2007-01-09 for example.
A query that doesn't work is query=OR I was told that this query not working properly was a deliberate design and it will be brought up in a coming DSpace meeting.

This comment has been minimized.

Copy link
@tdonohue

tdonohue Sep 17, 2018

Member

@Raf-atmire : Ok, then I'd suggest removing the two line comment here as part of this PR. The comment no longer describes the code.

This comment has been minimized.

Copy link
@Raf-atmire

Raf-atmire Sep 17, 2018

Author Contributor

@tdonohue : That's a good point, I've removed the comments in a new commit. Thanks!

@tdonohue

This comment has been minimized.

Copy link
Member

tdonohue commented Sep 17, 2018

@Raf-atmire : It looks like this small change has test failures, because we have tests that check for the removed double quotes.

@Raf-atmire

This comment has been minimized.

Copy link
Contributor Author

Raf-atmire commented Sep 18, 2018

@tdonohue : Indeed, I completely missed those tests, apologies. The new commit resolves the test failures.

@abollini

This comment has been minimized.

Copy link
Member

abollini commented Sep 20, 2018

it looks good to me. I will appreciate if you can add a test to check for the failure reported in DS-4010
https://dspace7.4science.it/dspace-spring-rest/#https://dspace7.4science.it/dspace-spring-rest/api/discover/search/objects?query=dc.date.issued:"2015-08-27"

Raf-atmire added 5 commits Sep 19, 2018
…t valid and the searchservice threw an exception
…nt instead so that the exception handling picks it up easily and returns a proper error response with headers included
…larguement instead so that the exception handling picks it up easily and returns a proper error response with headers included"

This reverts commit b192552.
…w illegalarguement instead so that the exception handling picks it up easily and returns a proper error response with headers included""

This reverts commit ea42765.
…containing dc.date.issued
Copy link
Contributor

tomdesair left a comment

I'm not sure we can disable escaping without giving this some further thought. You must take into account that may publication titles contain a ":" and brackets which are also a Lucene special characters, for example https://www.repository.cam.ac.uk/handle/1810/280596

We need to make sure that if a user enters a query like "Faithful Infidel: Exploring Conformity (2nd edition)" this does not lead to a Lucene parse exception. This has been discussed before in the past:

@Raf-atmire Can you update the code and add additional integration tests to make sure that users can still enter queries like "Faithful Infidel: Exploring Conformity (2nd edition)" (without quotes) and that these queries return the expected items.

@tdonohue

This comment has been minimized.

Copy link
Member

tdonohue commented Sep 27, 2018

@tomdesair : after discussion in today's DSpace 7 meeting, we've decided to keep this PR mostly "as-is", but ensure that we provide feedback to users to escape special characters in titles by using quotes.

So, my recommendation would be to ensure we can support queries like "Faithful Infidel: Exploring Conformity (2nd edition)" but with quotes (as that search is for an exact match). Therefore, @Raf-atmire, an integration test that can prove that special characters are escaped when surrounded by quotes would be useful.

If someone enters that query without quotes, it will result in no results (and/or an error), but we are looking at ways to provide user feedback in those scenarios. For example, we plan to add in a note at the UI level that says something like "Special characters in phrases or titles must be escaped by using quotes" (and obviously give a list of the special characters).

Overall, this problem seems hard to solve perfectly either way. Escaping the special characters causes issues with advanced search syntax (where you want Solr to recognize the special character). So, we feel we must avoid escaping the characters in order to fully support advanced search syntax -- and therefore provide better error messages / suggestions as to which special characters to escape with quotes (when a user wants an exact match search).

@mwoodiupui

This comment has been minimized.

Copy link
Member

mwoodiupui commented Sep 27, 2018

The problem is simple: we want users to be able to enter either plain strings or Lucene query syntax through the same path. That is: we want colons to be both meaningful and meaningless. Thus, even users who aren't using advanced query syntax have to know enough of it to escape it, and to know when they need to escape it. Unless we separate the two types of queries, so that we can treat them separately, the user is the only entity that knows what his query means and will have to do the treatment himself. We can at best try to guess the user's intent and offer hints.

@tomdesair

This comment has been minimized.

Copy link
Contributor

tomdesair commented Sep 27, 2018

That's fine by me. But I think it's important that we capture this decision in (integration) tests so that next time someone decides to re-add escaping, there is a test that alerts us that it breaks our current decision.

The integration tests we need are:

  1. A query using the Lucene syntax, something like (dc.date.issued:2017 OR dc.date.issued:2018) AND dc.title:*, and verifying the correct outcome (it needs a decent test item set).
  2. A query with "escaped" Lucene characters, something like "Faithful Infidel: Exploring Conformity (2nd edition)" and verify that this does not result in an error and returns the correct item.
  3. A query with Lucene characters that are not escaped and which results in an error, something like Warning: (un)escaped characters ahead!. This should result in an error that is properly handled and if possible presents the client with a meaningful error message.

@Raf-atmire, ping me if you need help!

Copy link
Contributor

tomdesair left a comment

Thanks for the tests @Raf-atmire! Only only have one small minor remark regarding the exception message.

@@ -108,6 +110,7 @@ public SearchResultsRest getSearchObjects(final String query, final String dsoTy

} catch (SearchServiceException e) {
log.error("Error while searching with Discovery", e);
throw new IllegalArgumentException();

This comment has been minimized.

Copy link
@tomdesair

tomdesair Oct 3, 2018

Contributor

Maybe change this to new IllegalArgumentException("Error while searching with Discovery: " + e.getMessage()); or new IllegalArgumentException("Error while searching with Discovery, this might be a query syntax exception"); to make clients aware that we use a specific query syntax.

This comment has been minimized.

Copy link
@tomdesair

tomdesair Oct 3, 2018

Contributor

Could you also make sure that that message reaches the client (e.g. HAL Browser)?

This comment has been minimized.

Copy link
@tomdesair

tomdesair Oct 3, 2018

Contributor

I now also realize that Warning: (un)escaped characters ahead! has probably a valid syntax. But it would not return the item with title "Warning: (un)escaped characters ahead!".

Can you maybe also add an additional test query call for the "Faithful Infidel: Exploring Conformity (2nd edition)" test item but without escaping (thus query Faithful Infidel: Exploring Conformity (2nd edition))? My guess is that that query returns 0 results.

Copy link
Member

tdonohue left a comment

@Raf-atmire : Overall this is looking good, but I agree with @tomdesair on one point below. I think the Tests are likely "good enough" as-is, as the provide the main verification we were looking for. We can enhance them later as needed, but they seem to prove out that complex Lucene queries work, character escaping works, and that invalid Lucene syntax throws an error.

Once the below empty error is fixed, I'd give this a 👍

@@ -108,6 +110,7 @@ public SearchResultsRest getSearchObjects(final String query, final String dsoTy

} catch (SearchServiceException e) {
log.error("Error while searching with Discovery", e);
throw new IllegalArgumentException();

This comment has been minimized.

Copy link
@tdonohue

tdonohue Oct 26, 2018

Member

As @tomdesair noted here, this Exception should not be empty, as it provides no information to the user (and "swallows" the actual error). Please update this to be: new IllegalArgumentException("Error while searching with Discovery: " + e.getMessage()); or similar.

This comment has been minimized.

Copy link
@benbosman

benbosman Oct 31, 2018

Member

@tdonohue: @Raf-atmire just committed this modification

Copy link
Member

tdonohue left a comment

👍 Looks good to me now. Merging, as this is at +2.

@tdonohue tdonohue merged commit 1f15bb8 into DSpace:master Oct 31, 2018
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+1.7%) to 29.032%
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.