Improve conversion performance #60

SethMMorton · 2023-01-30T21:08:48Z

In addition to addressing #57 and #59, this PR optimizes performance across many aspects of fastnumbers. Some enhancements are algorithmic, some come from using more efficient libraries, and some are micro-optimizations and bit hacks.

This PR also solves #28 by using a more robust string to double conversion function (the one mentioned in #57).

codecov · 2023-01-30T21:32:32Z

Codecov Report

Base: 91.64% // Head: 88.15% // Decreases project coverage by -3.50% ⚠️

Coverage data is based on head (e8ef8c7) compared to base (c14959f).
Patch coverage: 85.93% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #60      +/-   ##
==========================================
- Coverage   91.64%   88.15%   -3.50%     
==========================================
  Files           5        6       +1     
  Lines         934      996      +62     
==========================================
+ Hits          856      878      +22     
- Misses         78      118      +40

Impacted Files	Coverage Δ
src/cpp/extractor.cpp	`89.74% <ø> (ø)`
src/cpp/argparse.cpp	`71.24% <71.24%> (ø)`
src/cpp/fastnumbers.cpp	`84.88% <96.92%> (-0.12%)`	⬇️
src/cpp/c_str_parsing.cpp	`99.36% <100.00%> (-0.15%)`	⬇️
src/cpp/parser.cpp	`93.50% <100.00%> (+1.19%)`	⬆️


SethMMorton · 2023-01-31T01:36:24Z

Before-and-after performance charts for the drop-in replacement functions. Granted, some of the poor "before" performance was due to bad decisions made in the C++ refactor, and the current C implementation is slightly better, but the "after" is certainly much better than the current C implementation.

Python 3.7

Timing comparison of `int` functions

Before

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	14.588 ± 0.366	11.751 ± 0.390
Int String	16.197 ± 2.065	16.573 ± 3.420
Large Int String	23.700 ± 1.429	28.062 ± 0.614
Int	9.575 ± 0.193	11.536 ± 3.479
Float	20.762 ± 2.199	24.159 ± 0.449

After

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	14.186 ± 0.248	7.790 ± 0.138
Int String	15.145 ± 0.093	10.171 ± 0.949
Medium Int String	17.014 ± 0.271	10.367 ± 0.075
Large Int String	22.850 ± 0.070	21.071 ± 0.489
Int	9.258 ± 0.097	6.889 ± 0.079
Float	19.593 ± 0.538	20.350 ± 0.188

Timing comparison of `float` functions

Before

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	12.193 ± 0.287	15.934 ± 2.573
Int String	12.841 ± 2.024	17.251 ± 4.057
Large Int String	38.674 ± 6.709	46.769 ± 2.007
Small Float String	14.901 ± 4.136	15.891 ± 1.810
Float String	31.982 ± 4.383	18.625 ± 1.801
Large Float String	55.922 ± 3.676	64.327 ± 3.662
Int	8.534 ± 0.885	11.937 ± 0.777
Float	7.511 ± 0.249	9.281 ± 1.310

After

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	11.829 ± 0.092	9.585 ± 0.090
Int String	12.384 ± 0.560	9.919 ± 0.055
Medium Int String	13.359 ± 0.051	10.308 ± 0.167
Large Int String	35.909 ± 7.434	14.476 ± 0.049
Small Float String	12.384 ± 1.176	10.130 ± 0.288
Float String	29.721 ± 0.108	11.757 ± 1.495
Large Float String	52.927 ± 1.115	11.749 ± 0.270
Int	8.040 ± 0.050	8.839 ± 0.059
Float	7.213 ± 0.050	6.478 ± 0.057

Python 3.10

Timing comparison of `int` functions

Before

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	9.833 ± 1.317	11.301 ± 0.952
Int String	10.943 ± 0.125	13.034 ± 0.129
Large Int String	17.007 ± 0.399	24.291 ± 0.805
Int	6.068 ± 0.049	10.130 ± 0.627
Float	15.456 ± 0.386	22.657 ± 0.311

After

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	9.631 ± 0.476	7.466 ± 0.151
Int String	10.730 ± 0.715	9.469 ± 0.185
Medium Int String	11.573 ± 0.726	9.414 ± 0.098
Large Int String	16.108 ± 0.168	18.211 ± 0.764
Int	5.848 ± 0.814	6.317 ± 0.146
Float	15.029 ± 0.244	19.011 ± 0.175

Timing comparison of `float` functions

Before

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	8.967 ± 0.509	14.056 ± 1.034
Int String	9.648 ± 0.184	14.064 ± 0.207
Large Int String	28.892 ± 1.026	40.313 ± 0.423
Small Float String	8.720 ± 0.081	13.713 ± 0.017
Float String	26.218 ± 0.220	15.497 ± 0.035
Large Float String	50.403 ± 0.354	61.001 ± 0.382
Int	4.739 ± 0.048	10.855 ± 0.017
Float	3.968 ± 0.051	8.958 ± 0.493

After

Input type	builtin (ms)	fastnumbers (ms)
Small Int String	8.512 ± 0.565	9.487 ± 0.746
Int String	9.216 ± 0.227	9.715 ± 0.150
Medium Int String	10.486 ± 0.162	10.028 ± 0.087
Large Int String	28.977 ± 1.327	14.435 ± 0.182
Small Float String	8.719 ± 0.304	9.946 ± 0.652
Float String	26.140 ± 0.447	11.401 ± 0.158
Large Float String	52.146 ± 1.277	12.464 ± 0.450
Int	4.869 ± 0.045	9.016 ± 0.121
Float	4.060 ± 0.064	6.736 ± 0.116

The previous implementation had as its first priority not invoking the Python exception mechanism, as this is a bit expensive, especially if one is just going to unset it anyway. This optimized the "fail" path at the expense of the numeric path. The logic has been re-thought to optimize for the "happy" path first, and assume that an acceptable trade-off for very fast conversions is to have the "fail" path take a little bit longer than before.

Scope of the changes

- Remove "is_likely" and "might_overflow" pre-checks as they are no longer needed
- Remove the Parser's as_int and as_float methods; we now always just return Python objects
- Remove Python's string to double conversion since the C++ parser is now robust enough
- Use the integer parser to examine the validity of a string, instead of using a separate full string checker
- Add some "peek" methods to the Parser to see if the contained value is of a certain type without having to do a full inspection

NOTE: THERE IS A LIE ABOVE!! This commit does not change the C++ string-to-double parser to the more robust version, so many tests fail. The next commit will implement that change.
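As a rough illustration of the "happy path first" idea, here is a sketch in Python rather than the project's C++. A single-pass integer parser doubles as the validity check, so valid input is converted without a separate pre-scan and only invalid input pays for the failure handling. The names `parse_int`, `try_int`, and `on_fail` are hypothetical and do not come from the library's internals.

```python
from typing import Callable, Optional


def parse_int(text: str) -> Optional[int]:
    """Parse a base-10 integer in one pass; return None if the text is not one."""
    i, n = 0, len(text)
    negative = False
    # Confirm we are not at the end of the input before reading a character.
    if i < n and text[i] in "+-":
        negative = text[i] == "-"
        i += 1
    if i == n:
        return None  # empty string or a lone sign
    value = 0
    while i < n:
        digit = ord(text[i]) - ord("0")
        if not 0 <= digit <= 9:
            return None  # invalid input is discovered while parsing
        value = value * 10 + digit
        i += 1
    return -value if negative else value


def try_int(text: str, on_fail: Callable[[str], object]) -> object:
    """Happy path first: convert directly, and only then handle failure."""
    value = parse_int(text)
    if value is not None:
        return value
    return on_fail(text)  # the slower "fail" path (hypothetical callback)
```

For example, `try_int("123", str.upper)` converts directly, while `try_int("12a", str.upper)` falls through to the callback.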

Much of the character "parsing" code has been micro-optimized.

- When comparing against characters, instead of looking at both upper- and lower-case, we use a bit-hack to force to lower and just make one comparison
- Testing for whitespace and digits is now done with lookup tables
- Conversion from character to a digit is now done with a lookup table
- All these functions are now constexpr (doesn't help performance, but it makes me feel better :) )
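The micro-optimizations listed above can be sketched in a few lines. This is an illustrative Python version, not the PR's C++ code; the table names, the `is_letter` helper, and the exact set of whitespace bytes are assumptions made for the example.

```python
# Lookup tables indexed by byte value (0-255), built once at import time.
IS_DIGIT = [False] * 256
IS_WHITESPACE = [False] * 256
DIGIT_VALUE = [-1] * 256  # -1 marks "not a digit"

for d in range(10):
    IS_DIGIT[ord("0") + d] = True
    DIGIT_VALUE[ord("0") + d] = d
for byte in b" \t\n\r\v\f":  # assumed whitespace set
    IS_WHITESPACE[byte] = True


def is_letter(byte: int, lower: str) -> bool:
    """Case-insensitive match against a lowercase ASCII letter.

    ASCII upper- and lower-case letters differ only in bit 0x20, so OR-ing
    that bit in forces a letter to lowercase and one comparison suffices.
    """
    return (byte | 0x20) == ord(lower)
```

The bit-hack is only safe when the target is a lowercase ASCII letter: the only bytes that map onto `a` through `z` after OR-ing in `0x20` are the letters themselves.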

After rewriting parse_int and parse_float, many of the helpers that had existed are no longer needed, so the parsing file has been cleaned up. As a plus, it was found that we were not being safe about parsing: we have to check that we are not at the end of the char array before we evaluate a character. Parsing is now safer.

This uses the "vector call" functionality to call C functions. It requires fewer memory allocations by Python to call the function and thus is a bit faster. Unsurprisingly, this gave a 1-2 microsecond speedup across the board in the performance tests I ran. Unfortunately, Python does not ship a parser for this, so I adapted the parser from the numpy project. This closes #59.

Python uses long to store non-big integers internally, so originally that is what was being used to parse non-big integers. Some compilers (MSVC) still have a 32-bit long. So it is more efficient to parse into a 64-bit int, which lets us use our fast parser whenever possible, and then let Python decide how to represent the value internally.
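To see why the integer width matters, the short Python calculation below counts how many decimal digits are guaranteed to fit in a signed integer of a given width. Whether the library gates its fast path on exactly this kind of digit count is an assumption; `safe_digit_count` is a hypothetical helper for the arithmetic only.

```python
def safe_digit_count(bits: int) -> int:
    """Largest n such that every n-digit decimal number fits in a signed
    integer with the given number of bits."""
    max_value = 2 ** (bits - 1) - 1
    # max_value itself has one more digit than is always safe.
    return len(str(max_value)) - 1


print(safe_digit_count(32))  # 32-bit long, as on MSVC
print(safe_digit_count(64))  # 64-bit integer
```

With a 32-bit type only strings of up to 9 digits can be parsed without any overflow concern, while a 64-bit type raises that to 18 digits, so far more inputs stay on a fixed-width fast path.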

SethMMorton force-pushed the improve-conversion-performance branch 11 times, most recently from cc594fd to 1140c95 Compare January 31, 2023 01:22

SethMMorton added 15 commits January 30, 2023 19:45

I hate Jupyter notebooks

11371b1

So I am removing those as a means of communicating timing data. Instead I will just generate markdown directly and store this in the repo.

Use fast_float::from_chars

cf3a51a

This is a very fast implementation of string to double conversion that is also very accurate. WIN WIN. This closes #57 and closes #28.

Add fast-path parsing for non-base-10 integers

1b37e0b

Instead of 100% relying on the python converter for these integers, use std::from_chars which is fast and provides feedback on overflow, which can be used to determine if we must fall back on the python converter.
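The fallback pattern described here can be sketched as follows. This is a Python stand-in for the C++ logic: `fast_parse` plays the role of a fixed-width parser such as `std::from_chars` that reports overflow separately from invalid input, and `int(text, base)` plays the role of the Python converter. The `Status` enum and both function names are hypothetical, and sign and prefix handling are omitted for brevity.

```python
from enum import Enum
from typing import Tuple


class Status(Enum):
    OK = 0
    INVALID = 1
    OVERFLOW = 2


INT64_MAX = 2 ** 63 - 1
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def fast_parse(text: str, base: int) -> Tuple[Status, int]:
    """Fixed-width fast path that distinguishes overflow from bad input."""
    if not text:
        return Status.INVALID, 0
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            return Status.INVALID, 0
        value = value * base + digit
        if value > INT64_MAX:
            return Status.OVERFLOW, 0  # too big for 64 bits
    return Status.OK, value


def parse_with_fallback(text: str, base: int) -> int:
    status, value = fast_parse(text, base)
    if status is Status.OK:
        return value
    if status is Status.OVERFLOW:
        return int(text, base)  # slower arbitrary-precision converter
    raise ValueError(f"invalid literal for base {base}: {text!r}")
```

Only inputs that genuinely overflow the fixed-width type reach the slower converter; everything else is handled by the fast path.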

Avoid unnecessary object allocation in coerce/force int

cf4cd71

Instead of creating a Python float object, checking whether it is int-like, and then creating a Python long object from it, we now check up front which of these two objects should be created.

Add fast bulk parsing

79d62ab

Code has been added that can parse 8 characters at a time.
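The commit message does not say how the 8-character parsing is done. One well-known way to do it is the SWAR ("SIMD within a register") trick of loading 8 ASCII bytes into a single 64-bit word; the Python sketch below emulates that with masked integers. It is an assumption that the PR uses something similar, and the function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def is_eight_digits(chunk: bytes) -> bool:
    """True if chunk is exactly 8 ASCII digits, tested in one word-wide step."""
    if len(chunk) != 8:
        return False
    val = int.from_bytes(chunk, "little")
    # Adding 0x46 sets a byte's high bit if the byte is above '9';
    # subtracting 0x30 borrows into the high bit if the byte is below '0'.
    flags = (val + 0x4646464646464646) | (val - 0x3030303030303030)
    return (flags & 0x8080808080808080) == 0


def parse_eight_digits(chunk: bytes) -> int:
    """Convert 8 ASCII digits (already validated) to an int without a loop."""
    val = int.from_bytes(chunk, "little") - 0x3030303030303030
    # Combine neighbouring digits into two-digit values in each byte.
    val = (val * 10 + (val >> 8)) & MASK64
    mask = 0x000000FF000000FF
    mul1 = 100 + (1_000_000 << 32)
    mul2 = 1 + (10_000 << 32)
    # Combine the two-digit values; the answer lands in the upper 32 bits.
    val = ((val & mask) * mul1 + ((val >> 16) & mask) * mul2) & MASK64
    return val >> 32
```

A caller would typically check `is_eight_digits` on each 8-byte chunk and fall back to a character-by-character loop for the remainder or for chunks that fail the check.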

Remove dig, max_exp, min_exp, and max_int_len

29fb99a

dig, max_exp, and min_exp no longer have any meaning since fast_float::from_chars was introduced. max_int_len will not have any meaning in a future commit.

Put erroneous Buffer function definition into buffer.h

a4f72ba

Not a performance issue, but still needed to be fixed.

Update documentation to reflect new profiling results

429c7c2

Correct static analysis issues

395767c

Fix backwards-compatibility test bug

e8ef8c7

Old and new functions have different allow_underscores default, so we have to normalize to do the tests.

SethMMorton force-pushed the improve-conversion-performance branch from 1140c95 to e8ef8c7 Compare January 31, 2023 03:54

SethMMorton merged commit d7e41fa into main Jan 31, 2023

SethMMorton deleted the improve-conversion-performance branch January 31, 2023 04:26
