ICU performance test results. Maintained by ICU-TC.
Performance test execution and time measurement is a layered process.
- API invocation iteration: A performance test invocation for an API under test iterates over the full test data and calls the API once for each test data item. A test data item can be a character, a word, a line, or the entirety of the test data.
- Loops: Calibration runs determine how many of the API invocation iterations can be executed within a given time, typically 5 seconds. The resulting number is called loops.
- Passes: A pre-defined number of performance test runs, typically ten. In each pass, the API invocation iteration is executed loops times, as determined by the calibration.
Example, for the character property API isAlpha():
- Test data: The test data for all ICU character property APIs consists of all code points from U+0000 to U+10FFFF, i.e. 1,114,112 code points. Each API invocation iteration calls isAlpha() 1,114,112 times, once for each code point.
- Loops: For illustration, let's assume that the calibration phase determines that the above 1,114,112 iterations can be repeated 1,600 times within 5 seconds.
- Passes: Each pass runs 1,600 loops of 1,114,112 iterations and records the execution time of the 1,600 loops. If the number of passes is set to 10, we get 10 performance numbers.
- Final calculation: The minimum time among the passes is selected and divided by loops * iterations. In this example: "minimum pass" / (1,600 * 1,114,112) = "minimum pass" / 1,782,579,200. The result is the average execution time of one call of isAlpha() in the fastest pass.
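The layers described above can be sketched as a small timing harness. This is only an illustration under stated assumptions, not the ICU perf framework itself: the function names calibrate() and measure() are hypothetical, Python's str.isalpha stands in for ICU's isAlpha(), and the calibration strategy (doubling the loop count, then scaling it to the target time) is one plausible approach rather than the one ICU necessarily uses.

```python
import time


def calibrate(run_iteration, target_seconds):
    """Estimate how many full iterations over the test data fit in target_seconds.

    Hypothetical strategy: double the loop count until a run takes at least
    a tenth of the target time, then scale linearly to the target.
    """
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    loops = 1
    while True:
        start = time.perf_counter()
        for _ in range(loops):
            run_iteration()
        elapsed = time.perf_counter() - start
        if elapsed >= target_seconds / 10:
            return max(1, int(loops * target_seconds / elapsed))
        loops *= 2


def measure(api, test_data, passes=10, target_seconds=5.0):
    """Return (seconds per API call in the fastest pass, loops)."""

    def run_iteration():
        # One API invocation iteration: call the API once per test data item.
        for item in test_data:
            api(item)

    loops = calibrate(run_iteration, target_seconds)
    pass_times = []
    for _ in range(passes):
        start = time.perf_counter()
        for _ in range(loops):
            run_iteration()
        pass_times.append(time.perf_counter() - start)
    # Final calculation: minimum pass time / (loops * iterations).
    return min(pass_times) / (loops * len(test_data)), loops


if __name__ == "__main__":
    # All code points U+0000..U+10FFFF, as single-character strings.
    code_points = [chr(cp) for cp in range(0x110000)]
    # Shorter target time and fewer passes than the typical 5 s / 10 passes,
    # only to keep this demonstration quick.
    per_call, loops = measure(str.isalpha, code_points, passes=3, target_seconds=0.5)
    print(f"loops={loops}, time per call={per_call * 1e9:.1f} ns")
```

Taking the minimum rather than the mean of the passes discards passes that were slowed down by unrelated system activity, which is why the reported figure is described as coming from the fastest pass.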
The performance tests are all run on Ubuntu and broken out by feature.
- Charperf: Performance tests for character property APIs, cf. the charts. The measured time is for one call of the respective character property API.
- Utfperf: Performance tests for Unicode converters: ucnv_fromUnicode(), ucnv_toUnicode(), ucnv_convertEx(). The measured time is the processing time per code point in the input string.
- Usetperf: Performance tests for the UnicodeSet APIs add(), contains(), and UnicodeSetIterator::get(). The time measured is relative to the size of the set, which is either small (containing U_TITLECASE_LETTER characters) or large (containing U_UNASSIGNED code points). Also tests UnicodeSet::applyPattern() for three pre-defined patterns of different complexity.
The following performance tests are additionally broken out by type of test data processed.
- Collperf: Performance tests for the ICU collation feature. The time measured is per line of the test data for the KeyGen test, and per entire operation for the two more complex test scenarios, qsort and BinarySearch.
- NormPerf: Performance tests for ICU normalization. The time measured is per character of the test data.
- Ustrperf: Performance tests for a variety of Unicode string functions and operators: constructors, assignments (setTo, =), charAt(), concatenation (+), indexOf(). The time measured is per line of the test data.
- Strsrchperf: Performance tests for usearch_next() and usearch_previous() for a variety of locales and corresponding test data files. The time measured is per character in the test data file.
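The tests above report time per line, per character, or per entire operation. As a rough sketch of how one pass time could be normalized to each of these units, the hypothetical helper below divides the fastest pass time by loops times the number of units in the test data. The helper name, its parameters, and the choice to count characters with Python's len() (code points, which may differ from the units ICU counts) are assumptions made for illustration.

```python
def per_unit_time(fastest_pass_seconds, loops, lines, unit):
    """Normalize the fastest pass time to the unit a test reports.

    unit is "line", "character", or "operation" (hypothetical labels).
    """
    if unit == "line":
        units_per_iteration = len(lines)
    elif unit == "character":
        # len() counts code points here; this is an assumption of the sketch.
        units_per_iteration = sum(len(line) for line in lines)
    elif unit == "operation":
        # The whole pass over the test data counts as one unit.
        units_per_iteration = 1
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    if loops <= 0 or units_per_iteration <= 0:
        raise ValueError("loops and test data size must be positive")
    return fastest_pass_seconds / (loops * units_per_iteration)
```

For instance, with two lines of test data holding five characters in total, a fastest pass of 10 seconds over 2 loops works out to 2.5 seconds per line, 1.0 second per character, or 5.0 seconds per operation.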
Copyright © 2022-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
A CLA is required to contribute to this project - please refer to the CONTRIBUTING.md file (or start a Pull Request) for more information.
The contents of this repository are governed by the Unicode Terms of Use and are released under LICENSE.