Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Compile and cache Regexes for SUMIFS and COUNTIFS #736

Closed
wants to merge 1 commit into from

Conversation

igitur
Copy link
Member

@igitur igitur commented Mar 15, 2018

Follow on from #734. Possible performance improvement.

@igitur igitur requested a review from Pankraty March 15, 2018 09:33
@igitur
Copy link
Member Author

igitur commented Mar 15, 2018

Doesn't seem to make that much of a difference. I suspected it would - especially for whole-column lookups.

Copy link
Member

@Pankraty Pankraty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of caching is not thread-safe, but since the entire calculation engine is not thread-safe too I don't think it's a problem. As for the performance of column-wide computations, I am currently experimenting with the alternative approach that does not require enumeration of empty values. After all, ranges in formulas may cover the whole sheet, and there is not any possibility to enumerate 17 billion values in reasonable time. If I succeed with this optimization the effect of the current PR will become more significant.

@igitur igitur mentioned this pull request Mar 19, 2018
2 tasks
@igitur igitur force-pushed the cache-regexes-for-sumifs branch from b494f41 to 86d1aaf Compare May 4, 2018 15:25
@igitur igitur force-pushed the cache-regexes-for-sumifs branch from 86d1aaf to b5054c7 Compare May 8, 2018 12:22
@igitur
Copy link
Member Author

igitur commented May 8, 2018

@Pankraty Do you think we should first quantify the effect on performance before I merge?

@Pankraty
Copy link
Member

Pankraty commented May 8, 2018

If you have time to measure that would be great. This will give us more information for future optimizations.

@igitur igitur changed the title Compile and cache Regexes for SUMIFS and COUNTIFS WIP Compile and cache Regexes for SUMIFS and COUNTIFS Aug 3, 2018
@Pankraty
Copy link
Member

Pankraty commented Jul 9, 2019

I finally managed to do a little performance testing. Unfortunately, the effect is negligible, if any. Here is the test file I used.

SumIfs.xlsx

Reading I1 value would take ~13 seconds, reading I2 - around 112 seconds, both with this PR and without it. As for I3 and I4, I gave up before the evaluation finished.

Will check out if there are other ways to speed up the computation.

PS. For testing, I rebased this branch on latest develop

@igitur
Copy link
Member Author

igitur commented Jul 9, 2019

Yeah, I was thinking to close this eventually without merging.

@jahav
Copy link
Member

jahav commented Apr 13, 2023

I am closing this PR, because the regexps are not a good way to parse criteria for SUMIF/COUNTIF/database functions.

What is a better way forward: Use a custom parser. There already is a allocation free fast parser for wildcards (Wildcard.Search), as well as proper parser of numbers ScalarValue.TextToNumber (kind of awkward place, but it works), as well as XLErrorParser and few others.

A custom parser will parse the criteria into a comparison op and a value/bool/error/wildcard using other parsers that already exist. Benefits:

  • More in-line with excel criteria (e.g. ">1 1/2" is a valid criteria in Excel that finds only values greater than 1.5)
  • No startup cost for compiling
  • IIRC wildcard was faster than regex
  • Less dangerous from DoS point of view

Current approach has several drawbacks:

  • Potential denial of service, there is a reason why regex match function has optional timeout argument. Regex is a basically a program and attacker could supply a very slow program.
  • Requires allocations
  • Is mismatched with Excel

@jahav jahav closed this Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants