[LANG-1576] refine StringUtils.chomp #565

Open: wants to merge 1 commit into master

XenoAmess
Contributor

As per the title.

@XenoAmess XenoAmess changed the title refine chomp [LANG-1576] refine chomp Jun 29, 2020
@XenoAmess XenoAmess changed the title [LANG-1576] refine chomp [LANG-1576] refine StringUtils.chomp Jun 29, 2020
@coveralls

coveralls commented Jun 29, 2020

Coverage increased (+0.0003%) to 94.958% when pulling 46350d4 on xenoamess-fork:refine_chomp into 8d35d66 on apache:master.

@sebbASF
Contributor

sebbASF commented Jul 26, 2020

If I understand the benchmark correctly, the New method is approx 3% slower for Test1 and about 2% faster for Test2.
Does not seem like a huge gain.

@XenoAmess
Contributor Author

> Does not seem like a huge gain.

No, it isn't.

Contributor

@aherbert aherbert left a comment

Currently, this benchmark is not testing the differences between the methods in the manner recommended by JMH. For both test1 and test2, the new and old results lie within each other's 99.9% confidence intervals, so they are inconclusive.
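
In JMH's summary table, the Error column is the half-width of the 99.9% confidence interval around Score. A minimal sketch of the comparison described here, assuming symmetric intervals: if [score - error, score + error] for the old and new methods overlap, the benchmark has not resolved a difference. This is a conservative rule of thumb rather than a formal significance test, and the class and method names are hypothetical. The sample figures are the test1 and test3 rows from a later results table in this thread.

```java
/** Hypothetical helper: compares two JMH results reported as "score +/- error". */
final class CiOverlap {

    /**
     * Returns true if [scoreA - errorA, scoreA + errorA] and
     * [scoreB - errorB, scoreB + errorB] overlap, i.e. the benchmark has not
     * clearly separated the two results at the reported confidence level.
     */
    static boolean overlaps(double scoreA, double errorA, double scoreB, double errorB) {
        double lowA = scoreA - errorA;
        double highA = scoreA + errorA;
        double lowB = scoreB - errorB;
        double highB = scoreB + errorB;
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // test1New 2.368 +/- 0.011 vs test1Old 2.434 +/- 0.102 (ns/op)
        System.out.println("test1 overlaps: " + overlaps(2.368, 0.011, 2.434, 0.102));
        // test3New 2.355 +/- 0.007 vs test3Old 2.556 +/- 0.035 (ns/op)
        System.out.println("test3 overlaps: " + overlaps(2.355, 0.007, 2.556, 0.035));
    }
}
```

With those figures the test1 intervals overlap (inconclusive), while the test3 intervals do not.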

I would improve the test to hit all code paths and ensure the result of the chomp method is used.

@sebbASF
Contributor

sebbASF commented Jul 26, 2020

The JMH results are very long and obscure the thread of the PR.

Is it possible to post just the summary inline, with a link to the full results in an attachment?
That would make it much easier to follow the PR.

@XenoAmess XenoAmess force-pushed the refine_chomp branch 2 times, most recently from 441603a to 833bfc0 Compare July 28, 2020 10:42
@XenoAmess
Contributor Author

XenoAmess commented Jul 28, 2020

> The JMH results are very long and obscure the thread of the PR.
>
> Is it possible to post just the summary inline, with a link to the full results in an attachment?
> That would make it much easier to follow the PR.

@sebbASF
Yes, you are right.
I will try the Ubuntu pastebin.

@XenoAmess
Contributor Author

XenoAmess commented Jul 28, 2020

Here are the results.
I refined the code again and redid the JMH test.
Sorry for the delay: I was quite busy yesterday, and the test takes a long time (several hours on my PC).
Full JMH results at:
https://pastebin.ubuntu.com/p/wbsqtzrkyC/
In short:

Benchmark                            Mode  Cnt      Score     Error  Units
StringUtilsChompTest.test1New        avgt   25      2.368 ±   0.011  ns/op
StringUtilsChompTest.test1Old        avgt   25      2.434 ±   0.102  ns/op
StringUtilsChompTest.test2New        avgt   25      2.364 ±   0.010  ns/op
StringUtilsChompTest.test2Old        avgt   25      2.368 ±   0.013  ns/op
StringUtilsChompTest.test3New        avgt   25      2.355 ±   0.007  ns/op
StringUtilsChompTest.test3Old        avgt   25      2.556 ±   0.035  ns/op
StringUtilsChompTest.test4New        avgt   25      6.352 ±   0.149  ns/op
StringUtilsChompTest.test4Old        avgt   25      6.494 ±   0.363  ns/op
StringUtilsChompTest.test5New        avgt   25      6.272 ±   0.086  ns/op
StringUtilsChompTest.test5Old        avgt   25      6.248 ±   0.035  ns/op
StringUtilsChompTest.test6New        avgt   25      6.304 ±   0.079  ns/op
StringUtilsChompTest.test6Old        avgt   25      6.298 ±   0.061  ns/op
StringUtilsChompTest.testStringsNew  avgt   25  72976.756 ± 304.822  ns/op
StringUtilsChompTest.testStringsOld  avgt   25  78290.023 ± 657.792  ns/op

The test shows roughly equal speed when the string ends with \r or \n, but the new version is 6.7% faster when it ends with neither \r nor \n, which I think is the most common case in practice.
If I have still done something wrong, just tell me and I will try to fix it when I have time.
Thanks.
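
For average-time (avgt) results, where lower is better, one common convention is to express the gain relative to the old score: (old - new) / old. A small sketch of that arithmetic with a hypothetical class name; on the testStrings rows above it gives about 6.8%, and on the test3 rows about 7.9%. Dividing by the new score instead gives a larger number (about 7.3% for testStrings), so it is worth stating which baseline a quoted percentage uses.

```java
/** Hypothetical helper: percentage speed-up between two JMH avgt scores. */
final class Speedup {

    /**
     * For average-time results (lower is better), returns how much faster the
     * new score is, as a percentage of the old one: (old - new) / old * 100.
     */
    static double percentFaster(double oldScore, double newScore) {
        return (oldScore - newScore) / oldScore * 100.0;
    }

    public static void main(String[] args) {
        // testStringsOld vs testStringsNew, ns/op
        System.out.printf("testStrings: %.2f%% faster%n", percentFaster(78290.023, 72976.756));
        // test3Old vs test3New, ns/op
        System.out.printf("test3: %.2f%% faster%n", percentFaster(2.556, 2.355));
    }
}
```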

@XenoAmess
Contributor Author

Performance tests refined, thanks to help from @aherbert.
Full test results at: https://pastebin.ubuntu.com/p/mC3wTgsCKT/
In short:

Benchmark                                       Mode  Cnt      Score       Error  Units
StringUtilsChompTest.test10_Random_Strings_New  avgt    5  67574.592 ±  9156.972  ns/op
StringUtilsChompTest.test10_Random_Strings_Old  avgt    5  76213.928 ± 12127.719  ns/op
StringUtilsChompTest.test1_Empty_New            avgt    5      2.375 ±     0.087  ns/op
StringUtilsChompTest.test1_Empty_Old            avgt    5      2.387 ±     0.085  ns/op
StringUtilsChompTest.test2_No_Chomp_New         avgt    5      2.370 ±     0.120  ns/op
StringUtilsChompTest.test2_No_Chomp_Old         avgt    5      2.386 ±     0.104  ns/op
StringUtilsChompTest.test3_No_Chomp_New         avgt    5      2.419 ±     0.096  ns/op
StringUtilsChompTest.test3_No_Chomp_Old         avgt    5      2.377 ±     0.162  ns/op
StringUtilsChompTest.test4_R_New                avgt    5      2.402 ±     0.106  ns/op
StringUtilsChompTest.test4_R_Old                avgt    5      2.361 ±     0.023  ns/op
StringUtilsChompTest.test5_N_New                avgt    5      2.357 ±     0.005  ns/op
StringUtilsChompTest.test5_N_Old                avgt    5      2.397 ±     0.167  ns/op
StringUtilsChompTest.test6_R_N_New              avgt    5      6.013 ±     0.245  ns/op
StringUtilsChompTest.test6_R_N_Old              avgt    5      6.133 ±     0.788  ns/op
StringUtilsChompTest.test7_a_N_New              avgt    5      6.407 ±     0.343  ns/op
StringUtilsChompTest.test7_a_N_Old              avgt    5      6.717 ±     1.407  ns/op
StringUtilsChompTest.test8_a_N_New              avgt    5      6.678 ±     0.752  ns/op
StringUtilsChompTest.test8_a_N_Old              avgt    5      6.580 ±     0.208  ns/op
StringUtilsChompTest.test9_a_R_N_New            avgt    5      6.498 ±     0.672  ns/op
StringUtilsChompTest.test9_a_R_N_Old            avgt    5      6.770 ±     2.015  ns/op

@XenoAmess XenoAmess requested a review from aherbert August 8, 2020 04:28

@sebbASF
Contributor

sebbASF commented Aug 10, 2020

Thanks - much easier to see what has been changed now.
i.e. cache the string length, and don't use substring unless it is needed.
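
A minimal sketch of the kind of change being described, for readers who have not opened the diff. `chompOld` is modelled on the single-argument `StringUtils.chomp` in Commons Lang 3.x before this PR; `chompNew` is a hypothetical rewrite (not the code from this PR) that reads `length()` once and returns the input unchanged when it has no trailing CR or LF, so `substring` is only called when characters are actually removed. Both use the literals '\r' and '\n' in place of `CharUtils.CR` and `CharUtils.LF` to stay self-contained.

```java
import java.util.Objects;

/** Hypothetical side-by-side sketch; not the code from this PR. */
final class ChompSketch {

    /** Modelled on the pre-PR Commons Lang 3.x StringUtils.chomp(String). */
    static String chompOld(final String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        if (str.length() == 1) {
            final char ch = str.charAt(0);
            return (ch == '\r' || ch == '\n') ? "" : str;
        }
        int lastIdx = str.length() - 1;
        final char last = str.charAt(lastIdx);
        if (last == '\n') {
            if (str.charAt(lastIdx - 1) == '\r') {
                lastIdx--; // drop CR LF as a unit
            }
        } else if (last != '\r') {
            lastIdx++; // no line ending: keep every character
        }
        // Reached for every input of length >= 2, even when nothing is removed.
        return str.substring(0, lastIdx);
    }

    /** Hypothetical rewrite: cache length() and skip substring when nothing is removed. */
    static String chompNew(final String str) {
        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return str;
        }
        final char last = str.charAt(len - 1);
        if (last == '\n') {
            // Remove CR LF together, otherwise just the LF.
            final boolean crlf = len >= 2 && str.charAt(len - 2) == '\r';
            return str.substring(0, crlf ? len - 2 : len - 1);
        }
        if (last == '\r') {
            return str.substring(0, len - 1);
        }
        return str; // no line ending: return the input itself, no copy
    }

    public static void main(String[] args) {
        String[] inputs = {null, "", "a", "\r", "\n", "ab", "a\r", "a\n", "a\r\n", "\r\n", "\n\r", "\n\n"};
        for (String s : inputs) {
            if (!Objects.equals(chompOld(s), chompNew(s))) {
                throw new AssertionError("results differ for input " + s);
            }
        }
        System.out.println("chompOld and chompNew agree on all sample inputs");
    }
}
```

The behavioural contract is unchanged; the only intended difference is that the common "nothing to remove" path returns early instead of going through `substring`.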

I don't think the RandomStrings test is a fair benchmark.
The behaviour of the method depends only on the last one or two characters (and the length).
AFAICT the strings are not random and don't have a mix of line-endings.
So I think the test only measures the efficiency of String.substring where nothing needs to be dropped.

@XenoAmess
Contributor Author

XenoAmess commented Aug 10, 2020

@sebbASF

> I don't think the RandomStrings test is a fair benchmark.

Actually it is fair.
See the string array generation function of that test.

@XenoAmess XenoAmess requested a review from sebbASF August 11, 2020 06:09
@sebbASF
Contributor

sebbASF commented Aug 11, 2020

Sorry, I see now that the strings do have a mix of CR and LF endings (or neither).
However, the strings are all of length 2.
The strings are not representative of the likely use cases.

As to the change itself, it looks fine, but IMO the benchmark needs some work.

@XenoAmess
Contributor Author

@sebbASF
Hi.

> Sorry, I see now that the strings do have a mix of CR and LF endings (or neither).
> However, the strings are all of length 2.
> The strings are not representative of the likely use cases.

> The behaviour of the method depends only on the last one or two characters (and the length).

And I don't think adding more characters will help.
If we increased the length to a typical length for our use cases (for example 6), it would become a really large test.

@sebbASF
Contributor

sebbASF commented Aug 15, 2020

I think the current huge list of input strings is more about testing functionality rather than performance.

The method only cares about 3 characters: CR, LF or something else (unless it has a bug).
So the performance test needs to check those in various combinations. I think that is just 9 combinations.
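
Concretely, the method inspects at most the last two positions, and each position matters only as CR, LF or "something else", so a two-character ending has 3 × 3 = 9 variants. A minimal sketch that generates them, optionally after a prefix so the same endings can be attached to longer strings; the class name and the choice of 'a' for "something else" are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for the 9 two-character endings. */
final class EndingCombinations {

    // The three character classes chomp distinguishes; 'a' stands in for "anything else".
    static final char[] CLASSES = {'\r', '\n', 'a'};

    /** Returns prefix + c1 + c2 for every pair drawn from CLASSES: 3 * 3 = 9 strings. */
    static List<String> endings(String prefix) {
        List<String> out = new ArrayList<>();
        for (char c1 : CLASSES) {
            for (char c2 : CLASSES) {
                out.add(prefix + c1 + c2);
            }
        }
        return out;
    }

    /** Makes CR and LF visible when printing. */
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        for (String s : endings("")) {
            System.out.println(escape(s));
        }
    }
}
```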

It also needs to test the length, because that is used when doing the substring.
The method is likely to be used with textual input so it would make sense to try with a selection of lengths.
Not sure what the maximum should be, probably at least 1000, maybe considerably more.
It might make sense to do these as separate tests to see if the length affects the performance.

@XenoAmess
Contributor Author

XenoAmess commented Aug 16, 2020

@sebbASF

> I think the current huge list of input strings is more about testing functionality rather than performance.
>
> The method only cares about 3 characters: CR, LF or something else (unless it has a bug).
> So the performance test needs to check those in various combinations. I think that is just 9 combinations.

Not really.
There are more cases:

  1. a null string
  2. a string of length 0 ("")
  3. a string of length 1 that is '\r'
  4. a string of length 1 that is '\n'
  5. a string of length 1 that is a normal character

Although 3, 4 and 5 are actually handled by the same if, I'd prefer to treat them separately.
The same goes for 1 and 2.
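
The five extra cases can be pinned down with expected results. A minimal sketch, assuming the documented behaviour of `StringUtils.chomp` (remove one trailing "\r\n", "\n" or "\r"); the `chomp` below is a simple reference written with `endsWith` for this sketch, not the Commons Lang implementation, and the class name is hypothetical.

```java
import java.util.Objects;

/** Hypothetical check of the five extra cases against a simple reference chomp. */
final class ChompEdgeCases {

    /** Reference chomp: removes one trailing "\r\n", "\n" or "\r", if present. */
    static String chomp(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        if (str.endsWith("\r\n")) {
            return str.substring(0, str.length() - 2);
        }
        if (str.endsWith("\n") || str.endsWith("\r")) {
            return str.substring(0, str.length() - 1);
        }
        return str;
    }

    // {input, expected output} for cases 1 to 5.
    static final String[][] CASES = {
        {null, null}, // 1. null string
        {"", ""},     // 2. length 0
        {"\r", ""},   // 3. length 1, CR
        {"\n", ""},   // 4. length 1, LF
        {"a", "a"},   // 5. length 1, normal character
    };

    public static void main(String[] args) {
        for (String[] c : CASES) {
            String actual = chomp(c[0]);
            if (!Objects.equals(actual, c[1])) {
                throw new AssertionError("unexpected result for case " + c[0] + ": " + actual);
            }
        }
        System.out.println("all 5 edge cases pass");
    }
}
```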

> It also needs to test the length, because that is used when doing the substring.
> The method is likely to be used with textual input so it would make sense to try with a selection of lengths.
> Not sure what the maximum should be, probably at least 1000, maybe considerably more.
> It might make sense to do these as separate tests to see if the length affects the performance.

So you mean we should add tests for some specific string lengths?
For example, "a"*1024, "a"*10240 and "a"*102400?
Fine, I will add them and rerun later today if you need this data.
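
Inputs like these can be built with a small helper. A minimal sketch, assuming a run of repeated 'a' characters followed by an optional line ending; the helper name is hypothetical, and `StringBuilder` is used rather than `String.repeat` (Java 11+) so it also compiles on Java 8. In a JMH benchmark such strings would normally be created once in a `@Setup` method, for example selected by a `@Param` field, so that building them is not part of the measured time.

```java
/** Hypothetical helper for length-parameterized chomp benchmark inputs. */
final class LengthInputs {

    /** Returns n copies of 'a' followed by the given line ending ("" for none). */
    static String make(int n, String ending) {
        StringBuilder sb = new StringBuilder(n + ending.length());
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append(ending).toString();
    }

    public static void main(String[] args) {
        int[] lengths = {1024, 10240, 102400};
        String[] endings = {"", "\r", "\n", "\r\n"};
        for (int n : lengths) {
            for (String ending : endings) {
                String s = make(n, ending);
                String shown = ending.replace("\r", "\\r").replace("\n", "\\n");
                System.out.println("n=" + n + " ending=\"" + shown + "\" length=" + s.length());
            }
        }
    }
}
```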

@sebbASF
Contributor

sebbASF commented Aug 16, 2020

You are correct - I was forgetting that the characters could be missing.

As to testing the length, yes I think we do need to test for various lengths.
Longer substrings may take longer to create; if that is true it should show that the new code is better.

@XenoAmess
Contributor Author

Hi.
I added more string lengths to the benchmark, as suggested by @sebbASF. Thanks.
Full benchmark results at https://pastebin.ubuntu.com/p/yPH4xnKqZd/
In short:

Benchmark                                               (data)  (name)  Mode  Cnt      Score      Error  Units
StringUtilsChompTest.singleString                         NULL     old  avgt    5      3.246 ±    0.091  ns/op
StringUtilsChompTest.singleString                         NULL     new  avgt    5      3.265 ±    0.153  ns/op
StringUtilsChompTest.singleString                        CHAR0     old  avgt    5      3.901 ±    0.251  ns/op
StringUtilsChompTest.singleString                        CHAR0     new  avgt    5      3.863 ±    0.769  ns/op
StringUtilsChompTest.singleString                     CHAR0_CR     old  avgt    5      4.393 ±    0.519  ns/op
StringUtilsChompTest.singleString                     CHAR0_CR     new  avgt    5      4.335 ±    0.393  ns/op
StringUtilsChompTest.singleString                     CHAR0_LF     old  avgt    5      4.279 ±    0.295  ns/op
StringUtilsChompTest.singleString                     CHAR0_LF     new  avgt    5      4.315 ±    0.230  ns/op
StringUtilsChompTest.singleString                  CHAR0_CR_LF     old  avgt    5     10.768 ±    0.643  ns/op
StringUtilsChompTest.singleString                  CHAR0_CR_LF     new  avgt    5      9.901 ±    0.136  ns/op
StringUtilsChompTest.singleString                        CHAR1     old  avgt    5      4.358 ±    0.257  ns/op
StringUtilsChompTest.singleString                        CHAR1     new  avgt    5      3.989 ±    0.117  ns/op
StringUtilsChompTest.singleString                     CHAR1_CR     old  avgt    5     19.181 ±    0.715  ns/op
StringUtilsChompTest.singleString                     CHAR1_CR     new  avgt    5     19.502 ±    1.590  ns/op
StringUtilsChompTest.singleString                     CHAR1_LF     old  avgt    5     20.168 ±    1.003  ns/op
StringUtilsChompTest.singleString                     CHAR1_LF     new  avgt    5     19.578 ±    0.575  ns/op
StringUtilsChompTest.singleString                  CHAR1_CR_LF     old  avgt    5     19.984 ±    1.332  ns/op
StringUtilsChompTest.singleString                  CHAR1_CR_LF     new  avgt    5     19.834 ±    1.017  ns/op
StringUtilsChompTest.singleString                        CHAR2     old  avgt    5      4.408 ±    0.098  ns/op
StringUtilsChompTest.singleString                        CHAR2     new  avgt    5      4.033 ±    0.231  ns/op
StringUtilsChompTest.singleString                     CHAR2_CR     old  avgt    5     19.056 ±    1.116  ns/op
StringUtilsChompTest.singleString                     CHAR2_CR     new  avgt    5     18.928 ±    0.903  ns/op
StringUtilsChompTest.singleString                     CHAR2_LF     old  avgt    5     21.658 ±    2.624  ns/op
StringUtilsChompTest.singleString                     CHAR2_LF     new  avgt    5     20.125 ±    1.185  ns/op
StringUtilsChompTest.singleString                  CHAR2_CR_LF     old  avgt    5     19.865 ±    0.865  ns/op
StringUtilsChompTest.singleString                  CHAR2_CR_LF     new  avgt    5     20.033 ±    1.695  ns/op
StringUtilsChompTest.singleString                     CHAR1024     old  avgt    5      4.469 ±    0.186  ns/op
StringUtilsChompTest.singleString                     CHAR1024     new  avgt    5      4.051 ±    0.146  ns/op
StringUtilsChompTest.singleString                  CHAR1024_CR     old  avgt    5    102.333 ±    1.823  ns/op
StringUtilsChompTest.singleString                  CHAR1024_CR     new  avgt    5    102.237 ±    0.913  ns/op
StringUtilsChompTest.singleString                  CHAR1024_LF     old  avgt    5    104.155 ±    2.604  ns/op
StringUtilsChompTest.singleString                  CHAR1024_LF     new  avgt    5    103.365 ±    2.392  ns/op
StringUtilsChompTest.singleString               CHAR1024_CR_LF     old  avgt    5    103.665 ±    4.241  ns/op
StringUtilsChompTest.singleString               CHAR1024_CR_LF     new  avgt    5    103.453 ±    1.482  ns/op
StringUtilsChompTest.singleString                    CHAR10240     old  avgt    5      4.772 ±    1.314  ns/op
StringUtilsChompTest.singleString                    CHAR10240     new  avgt    5      4.064 ±    0.085  ns/op
StringUtilsChompTest.singleString                 CHAR10240_CR     old  avgt    5    981.277 ±   15.790  ns/op
StringUtilsChompTest.singleString                 CHAR10240_CR     new  avgt    5    985.230 ±   13.679  ns/op
StringUtilsChompTest.singleString                 CHAR10240_LF     old  avgt    5    983.508 ±   15.570  ns/op
StringUtilsChompTest.singleString                 CHAR10240_LF     new  avgt    5   1000.211 ±   48.633  ns/op
StringUtilsChompTest.singleString              CHAR10240_CR_LF     old  avgt    5    990.806 ±    7.380  ns/op
StringUtilsChompTest.singleString              CHAR10240_CR_LF     new  avgt    5    990.001 ±    9.229  ns/op
StringUtilsChompTest.singleString                   CHAR102400     old  avgt    5      4.474 ±    0.142  ns/op
StringUtilsChompTest.singleString                   CHAR102400     new  avgt    5      4.154 ±    0.325  ns/op
StringUtilsChompTest.singleString                CHAR102400_CR     old  avgt    5  10077.261 ±  616.639  ns/op
StringUtilsChompTest.singleString                CHAR102400_CR     new  avgt    5  10002.896 ±   85.774  ns/op
StringUtilsChompTest.singleString                CHAR102400_LF     old  avgt    5  10081.275 ±   64.053  ns/op
StringUtilsChompTest.singleString                CHAR102400_LF     new  avgt    5  10085.129 ±   43.548  ns/op
StringUtilsChompTest.singleString             CHAR102400_CR_LF     old  avgt    5  10044.759 ±   68.778  ns/op
StringUtilsChompTest.singleString             CHAR102400_CR_LF     new  avgt    5  10097.769 ±  101.311  ns/op
StringUtilsChompTest.test_Random_Strings_New               N/A     N/A  avgt    5  76270.389 ± 1885.347  ns/op
StringUtilsChompTest.test_Random_Strings_Old               N/A     N/A  avgt    5  82757.374 ± 9869.516  ns/op
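
In this table, the rows ending in CR, LF or CR LF grow roughly tenfold with each tenfold increase in length, while the rows with no line ending stay between about 4 and 5 ns/op for both methods. That is consistent with the cost of the long inputs being dominated by `substring` copying the retained characters when something is removed. A quick check of the per-character cost for the CHARn_CR "new" rows, dividing each score by the nominal length n; the class and method names are hypothetical.

```java
/** Hypothetical arithmetic on the CHARn_CR "new" rows of the table above. */
final class SubstringScaling {

    /** Average nanoseconds per input character for one benchmark score. */
    static double nsPerChar(double scoreNs, int length) {
        return scoreNs / length;
    }

    public static void main(String[] args) {
        int[] lengths = {1024, 10240, 102400};
        double[] crNewScores = {102.237, 985.230, 10002.896}; // ns/op, copied from the table
        for (int i = 0; i < lengths.length; i++) {
            System.out.printf("n=%d: %.4f ns/char%n", lengths[i], nsPerChar(crNewScores[i], lengths[i]));
        }
    }
}
```

All three come out at roughly 0.1 ns per character, which is what linear scaling would predict. For the 1024-character and longer CR/LF rows, the old and new results are within each other's error margins, so this part of the table does not by itself favour either version.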

@XenoAmess
Contributor Author

@garydgregory Rebased. Please find some time to review. Thanks.

5 participants