Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17189 Fix DockMakerTest.testRealisticUnicode #2327

Merged
merged 4 commits into from
Apr 10, 2024

Conversation

dsmiley
Copy link
Contributor

@dsmiley dsmiley commented Mar 1, 2024

https://issues.apache.org/jira/browse/SOLR-17189

-Dsolr.bench.seed=1392507964231541

WIP. Didn't fix the problem yet but just tried to make the benchmark tests actually repeatable

@dsmiley
Copy link
Contributor Author

dsmiley commented Mar 2, 2024

I wrote a tiny script that loops over the code points here and there are many whitespace chars, including a space char (ASCII digit 32). This and many others are in the first block. @markrmiller did you intend the "realistic unicode" to include whitespace? What makes these characters "realistic" anyway?

@markrmiller
Copy link
Member

Realistic is not referring to the characters.

The random Unicode character code likely came from Lucene. If there is a regex check that fails in the test, then it’s likely the generator wasn’t intended to generate whitespace characters. I’d bet random string generation is meant to generate a sequence of none whitespace characters.

@dsmiley
Copy link
Contributor Author

dsmiley commented Mar 23, 2024

Okay. For simplicity, let's just remap each whitespace to the first non-whitespace in the chosen block. Or maybe even simpler -- the letter 'X' (hey why not?). Or maybe you might recommend something else.

The coding style / framework here is unusual to me and I think most people. If I had to name it, it'd be "extreme-streaming" or "latent-generation" or I dunno. I won't even bother giving it to ChatGPT as it doesn't know this unique framework. Do you have advice or a tip on how to approach this little programming problem? Feel free to send a commit to this branch :-)

Separately, note this PR includes a fix for the non-repeatability of the randomness. It's not perfect -- the RandomizedContext seed isn't being passed in unless I set it explicitly via the standard tests.seed.

@github-actions github-actions bot removed the tests label Apr 7, 2024
Copy link
Contributor Author

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the issue; the test passed.
I used a wrapping generator to ensure all strings that come through do not have a whitespace char.
If I don't hear back, I'll merge in a couple days.

@dsmiley dsmiley marked this pull request as ready for review April 7, 2024 22:32
@dsmiley
Copy link
Contributor Author

dsmiley commented Apr 7, 2024

After reading some QuickTheories docs, it seems using an assume(Predicate) would be an alternative; less code too. I'll switch it.

@dsmiley dsmiley merged commit 453a23b into apache:main Apr 10, 2024
2 of 3 checks passed
@dsmiley dsmiley deleted the solr17189-bench branch April 10, 2024 12:48
dsmiley added a commit that referenced this pull request Apr 10, 2024
These strings must not have whitespace.

Includes a fix for the non-repeatability of the randomness. It's not perfect -- the RandomizedContext seed isn't being passed in unless it is set explicitly via the standard tests.seed property.

(cherry picked from commit 453a23b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants