Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17263: HttpJdkSolrClient doesn't encode curly braces etc (follow-up PR) #2435

Closed
wants to merge 2 commits into from

Conversation

jdyer1
Copy link
Contributor

@jdyer1 jdyer1 commented May 1, 2024

This is a follow up to PR .

Using @andywebb1975 fix for HttpJdkSolrClient this adds a unit test, adding fq="{!terms f=myfield}value1,value2 to both GET, POST and PUT requests both for HttpJdkSolrClient and Http2SolrClient. Prior to the fix, the test addition fails for GET requests on HttpJdkSolrClient.

I found the modifications to POST and PUT unnecessary, but the GET change is needed.

- fix for GET only
- Do not double the ?
@@ -238,7 +238,7 @@ private PreparedRequest prepareGet(
validateGetRequest(solrRequest);
reqb.GET();
decorateRequest(reqb, solrRequest);
reqb.uri(new URI(url + "?" + queryParams));
reqb.uri(new URI(url + queryParams.toQueryString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a closer look. I should say I'm never looked "under the hood" of the clients before and also I'm not familiar with all their use-cases.

I hadn't clocked that toQueryString() adds its own ? - so yes, this line is better than my version.

I do think line 316 should change this way too, as toQueryString() does a more comprehensive job of encoding strings than toString() and it explicitly uses UTF-8 encoding, while the description of toString() indicates it's just intended for logging purposes.

My PR changed the method used at line 305 too, which would have injected an unwanted ? - but I'm not sure that using toString() is right either for the same reasons as above. I think it'd be worth revisiting this block in general - I'm not clear on the intent of the code to be able to say how it should change, but my understanding is that line 302 makes a second reference to the same object, which doesn't seem right to me. I may be wrong!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had removed the change from lines 305 and 316 as adding the fq parameter you supplied the unit test only seemed to show that GET was broken. Likes 305 & 316 are building POST or PUT requests. On the other hand, so long as the extra ? is removed, adding toQueryString() there doesn't seem to break anything either. Yet again, I do not like "fixing" things if I cannot first demonstrate it is broken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi James - I've explored this some more. I'm even more convinced we should use toQueryString() now.

#2433 now has a series of commits where I've added test cases and other checks to demonstrate the value of the changes.

The current tests don't exercise the URI encoding enough to show the difference in behaviour - but adding in another U+1234 and some curly quotes triggers failures for the PUT/POST URL generation when using toString() - I saw the same exception as we had previously for GETs.

I've also seen that the request body content length becomes inconsistent with that of the provided query string unless it's pre-encoded by toQueryString() - when we use toString() it's relying on HttpRequest.BodyPublisher re-encoding it. As I've noted in the code comment I wouldn't suggest doing that check for real, but I'll see if I can convert it to a test.

Hope this helps!
Andy

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we embrace randomized testing by indexing a random chunk of text and then query for it using the raw QP? Over time, this will show that escaping is handled properly.

@dsmiley
Copy link
Contributor

dsmiley commented May 16, 2024

I merged the first PR to main, branch_9x, and branch_9_6. Shall this PR be merged too? There are conflicts. Are we all "Watching" the parent JIRA issue? I commented there yesterday.

@andywebb1975
Copy link
Contributor

I merged the first PR to main, branch_9x, and branch_9_6. Shall this PR be merged too? There are conflicts. Are we all "Watching" the parent JIRA issue? I commented there yesterday.

Thanks @dsmiley - no, this one's redundant now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants