WIP Compile and cache Regexes for SUMIFS and COUNTIFS #736

igitur · 2018-03-15T09:33:31Z

Follow on from #734. Possible performance improvement.

igitur · 2018-03-15T09:44:37Z

Doesn't seem to make that much of a difference. I suspected it would - especially for whole-column lookups.

Pankraty

This way of caching is not thread-safe, but since the entire calculation engine is not thread-safe too I don't think it's a problem. As for the performance of column-wide computations, I am currently experimenting with the alternative approach that does not require enumeration of empty values. After all, ranges in formulas may cover the whole sheet, and there is not any possibility to enumerate 17 billion values in reasonable time. If I succeed with this optimization the effect of the current PR will become more significant.

igitur · 2018-05-08T15:50:16Z

@Pankraty Do you think we should first quantify the effect on performance before I merge?

Pankraty · 2018-05-08T15:57:47Z

If you have time to measure that would be great. This will give us more information for future optimizations.

Pankraty · 2019-07-09T17:28:20Z

I finally managed to do a little performance testing. Unfortunately, the effect is negligible, if any. Here is the test file I used.

SumIfs.xlsx

Reading I1 value would take ~13 seconds, reading I2 - around 112 seconds, both with this PR and without it. As for I3 and I4, I gave up before the evaluation finished.

Will check out if there are other ways to speed up the computation.

PS. For testing, I rebased this branch on latest develop

igitur · 2019-07-09T18:11:40Z

Yeah, I was thinking to close this eventually without merging.

jahav · 2023-04-13T14:26:33Z

I am closing this PR, because the regexps are not a good way to parse criteria for SUMIF/COUNTIF/database functions.

What is a better way forward: Use a custom parser. There already is a allocation free fast parser for wildcards (Wildcard.Search), as well as proper parser of numbers ScalarValue.TextToNumber (kind of awkward place, but it works), as well as XLErrorParser and few others.

A custom parser will parse the criteria into a comparison op and a value/bool/error/wildcard using other parsers that already exist. Benefits:

More in-line with excel criteria (e.g. ">1 1/2" is a valid criteria in Excel that finds only values greater than 1.5)
No startup cost for compiling
IIRC wildcard was faster than regex
Less dangerous from DoS point of view

Current approach has several drawbacks:

Potential denial of service, there is a reason why regex match function has optional timeout argument. Regex is a basically a program and attacker could supply a very slow program.
Requires allocations
Is mismatched with Excel

igitur requested a review from Pankraty March 15, 2018 09:33

Pankraty approved these changes Mar 15, 2018

View reviewed changes

igitur mentioned this pull request Mar 19, 2018

Optimize tally computations #740

Merged

2 tasks

igitur force-pushed the cache-regexes-for-sumifs branch from b494f41 to 86d1aaf Compare May 4, 2018 15:25

Compile and cache Regexes for SUMIFS and COUNTIFS

b5054c7

igitur force-pushed the cache-regexes-for-sumifs branch from 86d1aaf to b5054c7 Compare May 8, 2018 12:22

igitur changed the title ~~Compile and cache Regexes for SUMIFS and COUNTIFS~~ WIP Compile and cache Regexes for SUMIFS and COUNTIFS Aug 3, 2018

Pankraty mentioned this pull request Jul 9, 2019

Improve computations by caching LastCellUsed and FirstCellUsed #1246

Open

jahav closed this Apr 13, 2023

igitur mentioned this pull request Apr 14, 2023

SUMIFS multiple dimensions #2061

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP Compile and cache Regexes for SUMIFS and COUNTIFS #736

WIP Compile and cache Regexes for SUMIFS and COUNTIFS #736

igitur commented Mar 15, 2018

igitur commented Mar 15, 2018

Pankraty left a comment

igitur commented May 8, 2018

Pankraty commented May 8, 2018

Pankraty commented Jul 9, 2019 •

edited

igitur commented Jul 9, 2019

jahav commented Apr 13, 2023

WIP Compile and cache Regexes for SUMIFS and COUNTIFS #736

WIP Compile and cache Regexes for SUMIFS and COUNTIFS #736

Conversation

igitur commented Mar 15, 2018

igitur commented Mar 15, 2018

Pankraty left a comment

Choose a reason for hiding this comment

igitur commented May 8, 2018

Pankraty commented May 8, 2018

Pankraty commented Jul 9, 2019 • edited

igitur commented Jul 9, 2019

jahav commented Apr 13, 2023

Pankraty commented Jul 9, 2019 •

edited