SOLR-17189 Fix DockMakerTest.testRealisticUnicode #2327

dsmiley · 2024-03-01T04:57:53Z

https://issues.apache.org/jira/browse/SOLR-17189

-Dsolr.bench.seed=1392507964231541

WIP. This doesn't fix the problem yet; so far it only makes the benchmark tests actually repeatable.

dsmiley · 2024-03-02T04:11:31Z

I wrote a tiny script that loops over the code points here, and there are many whitespace chars, including the space char (ASCII code 32). This and many others are in the first block. @markrmiller did you intend the "realistic unicode" to include whitespace? What makes these characters "realistic" anyway?
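
For illustration, a scan of that kind could look roughly like the sketch below. This is not the actual script: the class name `WhitespaceScan` is hypothetical, and it assumes `Character.isWhitespace` as the whitespace definition and Basic Latin (U+0000..U+007F) as the block, neither of which is confirmed to match what the test checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Solr benchmark module.
final class WhitespaceScan {

  /** Returns every code point in [first, last] that Character.isWhitespace accepts. */
  static List<Integer> whitespaceIn(int first, int last) {
    List<Integer> found = new ArrayList<>();
    for (int cp = first; cp <= last; cp++) {
      if (Character.isWhitespace(cp)) {
        found.add(cp);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Assumption: the "first block" is Basic Latin, U+0000..U+007F.
    for (int cp : whitespaceIn(0x0000, 0x007F)) {
      System.out.printf("U+%04X%n", cp);
    }
  }
}
```

The space character (code point 32) is among the results, alongside tab, the line terminators, and a few control characters.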

markrmiller · 2024-03-16T07:43:31Z

Realistic is not referring to the characters.

The random Unicode character code likely came from Lucene. If there is a regex check that fails in the test, then it’s likely the generator wasn’t intended to generate whitespace characters. I’d bet random string generation is meant to generate a sequence of non-whitespace characters.

dsmiley · 2024-03-23T05:34:21Z

Okay. For simplicity, let's just remap each whitespace to the first non-whitespace in the chosen block. Or maybe even simpler -- the letter 'X' (hey why not?). Or maybe you might recommend something else.
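
As a rough sketch of the 'X' variant, in plain Java outside the generator framework: the class and method names here are hypothetical, and `Character.isWhitespace` is an assumed definition of whitespace.

```java
// Hypothetical helper, shown only to illustrate the remapping idea.
final class WhitespaceRemap {

  /** Replaces every whitespace code point in s with the letter 'X'. */
  static String remapWhitespace(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    s.codePoints()
        .map(cp -> Character.isWhitespace(cp) ? 'X' : cp)
        .forEach(sb::appendCodePoint);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(remapWhitespace("a b\tc"));
  }
}
```

Working on code points rather than `char` values keeps supplementary characters intact, which matters for randomly generated Unicode.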

The coding style / framework here is unusual to me and, I think, to most people. If I had to name it, it'd be "extreme-streaming" or "latent-generation" or I dunno. I won't even bother giving it to ChatGPT as it doesn't know this unique framework. Do you have advice or a tip on how to approach this little programming problem? Feel free to send a commit to this branch :-)

Separately, note this PR includes a fix for the non-repeatability of the randomness. It's not perfect -- the RandomizedContext seed isn't being passed in unless I set it explicitly via the standard tests.seed.
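
To illustrate what "repeatable" means here, a minimal sketch of seeding from a system property. It is not the code in this PR: the helper `SeededRandom` is hypothetical, and `java.util.Random` stands in for whatever random source the benchmark module actually uses. Only the property name `solr.bench.seed` comes from the description above.

```java
import java.util.Random;

// Hypothetical helper; the real benchmark code may wire its seed differently.
final class SeededRandom {

  /** Uses the given system property as the seed when it is set, else a fresh unseeded Random. */
  static Random fromProperty(String property) {
    Long seed = Long.getLong(property); // null when the property is unset or not a number
    return seed != null ? new Random(seed) : new Random();
  }

  public static void main(String[] args) {
    // Run with -Dsolr.bench.seed=1392507964231541 to get the same value every time.
    System.out.println(fromProperty("solr.bench.seed").nextLong());
  }
}
```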

dsmiley

I fixed the issue; the test passed.
I used a wrapping generator to ensure all strings that come through do not have a whitespace char.
If I don't hear back, I'll merge in a couple days.

dsmiley · 2024-04-07T22:51:58Z

After reading some QuickTheories docs, it seems using assuming(Predicate) would be an alternative, with less code too. I'll switch to it.
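
The idea of filtering a generator by a predicate can be sketched in plain Java as below. This only mimics the concept and is not QuickTheories code or the change in this PR: `FilteredStrings`, its `assuming` method, the `maxAttempts` limit, and `randomBasicLatin` are all hypothetical stand-ins.

```java
import java.util.Random;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for a predicate-filtered generator.
final class FilteredStrings {

  /** Wraps a generator so that values failing the predicate are discarded and regenerated. */
  static <T> Supplier<T> assuming(Supplier<T> gen, Predicate<T> accept, int maxAttempts) {
    return () -> {
      for (int i = 0; i < maxAttempts; i++) {
        T candidate = gen.get();
        if (accept.test(candidate)) {
          return candidate;
        }
      }
      throw new IllegalStateException("no acceptable value after " + maxAttempts + " attempts");
    };
  }

  /** True when no code point of s is whitespace according to Character.isWhitespace. */
  static boolean hasNoWhitespace(String s) {
    return s.codePoints().noneMatch(Character::isWhitespace);
  }

  /** Stand-in generator: random strings over U+0020..U+007E, which includes the space char. */
  static Supplier<String> randomBasicLatin(Random random, int length) {
    return () -> {
      StringBuilder sb = new StringBuilder(length);
      for (int i = 0; i < length; i++) {
        sb.appendCodePoint(0x20 + random.nextInt(0x5F));
      }
      return sb.toString();
    };
  }

  public static void main(String[] args) {
    Supplier<String> gen =
        assuming(randomBasicLatin(new Random(42), 8), FilteredStrings::hasNoWhitespace, 1000);
    System.out.println(gen.get());
  }
}
```

Compared with remapping characters, filtering leaves the accepted strings untouched, at the cost of some discarded candidates.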

These strings must not have whitespace. Includes a fix for the non-repeatability of the randomness. It's not perfect -- the RandomizedContext seed isn't being passed in unless it is set explicitly via the standard tests.seed property. (cherry picked from commit 453a23b)

SOLR-17189 Fix DockMakerTest.testRealisticUnicode

513f40b

dsmiley requested a review from markrmiller March 1, 2024 04:57

github-actions bot added the tests label Mar 1, 2024

dsmiley and others added 2 commits April 6, 2024 19:09

Remove whitespace from random unicode strings

fdf5286

Revert DockMakerTest log (it was temporary)

1eeeb4e

github-actions bot removed the tests label Apr 7, 2024

dsmiley commented Apr 7, 2024

dsmiley marked this pull request as ready for review April 7, 2024 22:32

Use assuming(Predicate) instead

3e390c6

dsmiley merged commit 453a23b into apache:main Apr 10, 2024
2 of 3 checks passed

dsmiley deleted the solr17189-bench branch April 10, 2024 12:48
