[SPARK-48573][SQL] Upgrade ICU version #47011
mihailomilosevic2001 wants to merge 18 commits into apache:master
Conversation
This reverts commit d54453e.
@dbatomic Could you review this PR?
I think we should update the benchmark result of CollationBenchmark; we should run the benchmark again.
@LuciferYang Regenerated the golden file. As for the benchmarks, we are currently working on a new plan to update them, as they are pretty unstable, so those files will be regenerated in a separate PR.
Converting to draft first to avoid it being merged unexpectedly.
@mihailom-db FYI: good to go.
Can we reverse the order of these cases, and then the ...
@yaooqinn do you mean printing out 1/Relative, i.e. representing relative time instead of relative speed? Otherwise, if you ask me, it makes no difference whether we sort fastest-to-slowest or slowest-to-fastest.
@mihailom-db @yaooqinn this is an interesting topic (which has nothing to do with this PR). One pro of using relative time (essentially the inverse of relative speed in this context) would be better precision: no loss of decimals. However, all other benchmarks in Spark rely on BenchmarkBase and compute relative speed, so I would suggest adding a parameter to BenchmarkBase instead.
No, the current Relative column is okay to me, but CollationBenchmark is not: some of the relative values are rounded down to x0.0.
I will run the benchmarks now; when I do, I will upload the results and mark this as ready.
Actually, @yaooqinn, I am not quite sure what you are referring to. Our control row is UTF8_BINARY, as it is the default, backward-compatible implementation of string collators. UTF8_LCASE is our implementation of a collation that is expected to be faster than the ICU-implemented collations, while UNICODE and UNICODE_CI are implemented entirely with ICU. What ordering exactly would you like to see: ordered on which column, and in ascending or descending order?
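To make the comparison groups concrete, here is an illustrative sketch in plain Python (not Spark's actual collators; the function names are hypothetical) of how the four collations discussed here differ on string equality. UTF8_BINARY compares raw bytes, UTF8_LCASE compares lowercased strings, and UNICODE / UNICODE_CI delegate to ICU, which is only roughly approximated below with `casefold`:

```python
# Illustrative only: Spark's real collators live on the JVM and use ICU.

def utf8_binary_eq(a: str, b: str) -> bool:
    # Byte-for-byte comparison of the UTF-8 encodings.
    return a.encode("utf-8") == b.encode("utf-8")

def utf8_lcase_eq(a: str, b: str) -> bool:
    # Compare after simple lowercasing.
    return a.lower() == b.lower()

def unicode_ci_eq(a: str, b: str) -> bool:
    # Real UNICODE_CI uses an ICU collator; casefold is a rough stand-in
    # that still shows full case folding (e.g. German sharp s).
    return a.casefold() == b.casefold()

print(utf8_binary_eq("Spark", "spark"))    # False
print(utf8_lcase_eq("Spark", "spark"))     # True
print(unicode_ci_eq("STRASSE", "straße"))  # True: casefold maps ß -> ss
```

The last case is why the ICU-backed collations are both slower and more correct than simple lowercasing: they handle full Unicode case folding, not just ASCII-style case mapping.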
Using the run order, such as ...
@yaooqinn I wouldn't say that would be a good way to compare collations right now; most (or all) of these collations are still under development, and it would only make sense to compare them against ... As for the "x0.0" problem, this stems from the fact that some collations are very slow compared to others (with the current implementation), but this precision loss can simply be solved by computing the inverse value: instead of saying that this is a "x0.0" speed-up, let's say it's a "x21.0" slow-down (that is, let's compute 1/Relative instead). Thoughts?
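The inverse-value idea can be sketched in a few lines of Python (hypothetical names; Spark's actual BenchmarkBase is Scala and formats its output differently). A `relative_time` flag flips the ratio so that very slow cases keep precision instead of collapsing to "x0.0":

```python
# Illustrative sketch of a benchmark-report helper with a toggle between
# relative speed (baseline / time, as BenchmarkBase computes today) and
# relative time (time / baseline, the proposed inverse).

def format_results(timings_ns, relative_time=False):
    """timings_ns: dict of case name -> avg time in ns; the first entry
    is treated as the baseline (e.g. UTF8_BINARY)."""
    baseline = next(iter(timings_ns.values()))
    rows = []
    for name, t in timings_ns.items():
        ratio = t / baseline if relative_time else baseline / t
        rows.append(f"{name:<12} {t:>10} ns  x{ratio:.1f}")
    return "\n".join(rows)

results = {"UTF8_BINARY": 100, "UNICODE_CI": 2100}
print(format_results(results))                      # UNICODE_CI shows x0.0
print(format_results(results, relative_time=True))  # UNICODE_CI shows x21.0
```

With one decimal place, a 21x-slower case rounds to "x0.0" as a speed-up but prints as an exact "x21.0" as a slow-down, which is the precision argument made above.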
However, what we have been discussing is not a blocker for merging this PR.
@yaooqinn or @LuciferYang, could we move forward with merging this PR? We will create a PR for the benchmark reorganisation under a separate ticket.
Merged to master, thank you all.
What changes were proposed in this pull request?
Upgrade of the ICU version from 72.1 to 75.1.
Why are the changes needed?
We need to keep the version up-to-date.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests continue to pass.
Was this patch authored or co-authored using generative AI tooling?
No.