Skip to content

Conversation

@oleibman
Copy link
Collaborator

@oleibman oleibman commented Mar 26, 2021

Continuing the process of breaking MathTrip.php up into smaller classes. This round takes care of all functions which might be an impediment to installing due to either uncovered code or "complexity":

  • BASE
  • FACT
  • LCM
  • MDETERM, MINVERSE, MMULT
  • MULTINOMIAL
  • PRODUCT
  • QUOTIENT
  • SERIESSUM
  • SUM
  • SUMPRODUCT

MathTrig and the members in directory MathTrig are now 100% covered. Many tests have been added, and some edge-case bugs are corrected. Some cases where PhpSpreadsheet had rejected numeric values stored as strings have been changed to accept them whenever Excel does; there had been no tests for that condition.

Boolean arguments are now accepted as arguments wherever Excel accpets them. Taking a cue from what has been done in Engineering, the parameter validation now happens in a routine which issues Exceptions for invalid values; this simplifies the code in the functions themselves. Thank you for doing that; I did not foresee how useful that was when I first looked at it.

Consistent with earlier changes of this nature, the versions in the MathTrig class remain, with a doc block indicating deprecation, and a stub call to the new routines.

All tests except for MINVERSE and MMULT are now handled in the context of a spreadsheet rather than a direct call to the calculation function which implements it. PhpSpreadsheet would need to handle dynamic arrays in order to test MINVERSE and MMULT in a spreadsheet context. Implementing that looks like it might be very challenging. It is not something I plan to look at, at least not in the near future.

One parsing problem turned up in the test conversion. It is in one of the SUMIF tests. It takes me to an area in Calculation where the comment says "I don't even want to know what you did to get here". It did not show up in the previous incarnation because, by using a direct call, the previous test managed to bypass the parsing. I have confirmed that this problem shows up in earlier releases of PhpSpreadsheet, so the changes in this PR did not cause it - they merely exposed it. I have left the test intact, but marked it "incomplete" for documentation purposes. I have not been able to get a handle on what's going wrong yet. I will probably open an issue on it if I can't resolve it soon. However, the test in question isn't a "real world" issue, and the error wasn't caused by this change, so I see no reason to delay this pending a resolution of the problem.

SUM had an idiosyncratic moment of its own. It had been ignoring non-numeric values, but Excel returns VALUE in that situation. So I changed it and wrote some new tests, which worked, but ... SUMIF uses several levels of indirection to get to SUM, and SUMIF does ignore non-numeric values, so a SUMIF test broke. SUM is a really simple function; the most practical approach seemed to be to clone it, with the string-accepting version being used by the Legacy version (which is called by SUMIF), and the non-string-accepting version being used in the Calculation Function table. That seems far easier and more practical than, for instance, adding a boolean parameter to the variable parameter list. As a follow-up, I will change SUMIF to explicitly call the appropriate new version, but I did not want to add that to this already large change.

SUM again - although it was fully covered beforehand, there was not a specific test member for it. There is now.

FACT had been coded to fail Gnumeric requests where the numeric argument has a decimal portion. However, Gnumeric does accept such an argument, and, unlike Excel and ODS, does not truncate it, but returns the result of a Gamma function call instead. This has been corrected.

When LCM included arguments which contained both 0 and a negative number, it returned 0 or NUM, whichever it found first. It is changed to always return NUM in that circumstance, as Excel does.

QUOTIENT had been documented as taking a variadic list of arguments. In fact, it takes exactly 2 - numerator and denominator - and the docblock and signature is fixed, even in the deprecated version.

The SERIESSUM docbock and signature are more accurate, even in the deprecated version. It is changed to ignore nulls, as Excel does, rather than return VALUE, and is one of the routines which previously rejected numbers in string form.

SUBTOTAL tests had used mocking for some reason. These are replaced with normal tests. And SUBTOTAL had a big surprise in store. That part of it which deals with hidden cells cares only whether the row is hidden, and doesn't care about the column's visibility.

I struggled with whether it should be SubTotal or Subtotal. I think the latter is correct, so that's how I proceeded. I don't think there are likely to be any other capitalization controversies.

This is:

- [ ] a bugfix
- [ ] a new feature
- [x] code refactoring

Checklist:

Why this change is needed?

oleibman added 17 commits March 25, 2021 18:00
Continuing the process of breaking MathTrip.php up into smaller classes. This round takes care of all functions which might be an impediment to installing due to either uncovered code or "complexity":
- BASE
- FACT
- LCM
- MDETERM, MINVERSE, MMULT
- MULTINOMIAL
- PRODUCT
- QUOTIENT
- SERIESSUM
- SUM
- SUMPRODUCT

MathTrig and the members in directory MathTrig are now 100% covered. Many tests have been added, and some edge-case bugs are corrected. Some cases where PhpSpreadsheet had rejected numeric values stored as strings have been changed to accept them whenever Excel does; there had been no tests for that condition.

Boolean arguments are now accepted as arguments wherever Excel accpets them. Taking a cue from what has been done in Engineering, the parameter validation now happens in a routine which issues Exceptions for invalid values; this simplifies the code in the functions themselves. Thank you for doing that; I did not foresee how useful that was when I first looked at it.

Consistent with earlier changes of this nature, the versions in the MathTrig class remain, with a doc block indicating deprecation, and a stub call to the new routines.

All tests except for MINVERSE and MMULT are now handled in the context of a spreadsheet rather than a direct call to the calculation function which implements it. PhpSpreadsheet would need to handle dynamic arrays in order to test MINVERSE and MMULT in a spreadsheet context. Implementing that looks like it might be *very* challenging. It is not something I plan to look at, at least not in the near future.

One parsing problem turned up in the test conversion. It is in one of the SUMIF tests. It takes me to an area in Calculation where the comment says "I don't even want to know what you did to get here". It did not show up in the previous incarnation because, by using a direct call, the previous test managed to bypass the parsing. I have confirmed that this problem shows up in earlier releases of PhpSpreadsheet, so the changes in this PR did not cause it - they merely exposed it. I have left the test intact, but marked it "incomplete" for documentation purposes. I have not been able to get a handle on what's going wrong yet. I will probably open an issue on it if I can't resolve it soon. However, the test in question isn't a "real world" issue, and the error wasn't caused by this change, so I see no reason to delay this pending a resolution of the problem.

SUM had an idiosyncratic moment of its own. It had been ignoring non-numeric values, but Excel returns VALUE in that situation. So I changed it and wrote some new tests, which worked, but ... SUMIF uses several levels of indirection to get to SUM, and SUMIF *does* ignore non-numeric values, so a SUMIF test broke. SUM is a really simple function; the most practical approach seemed to be to clone it, with the string-accepting version being used by the Legacy version (which is called by SUMIF), and the non-string-accepting version being used in the Calculation Function table. That seems far easier and more practical than, for instance, adding a boolean parameter to the variable parameter list. As a follow-up, I will change SUMIF to explicitly call the appropriate new version, but I did not want to add that to this already large change.

SUM again - although it was fully covered beforehand, there was not a specific test member for it. There is now.

FACT had been coded to fail Gnumeric requests where the numeric argument has a decimal portion. However, Gnumeric does accept such an argument, and, unlike Excel and ODS, does not truncate it, but returns the result of a Gamma function call instead. This has been corrected.

When LCM included arguments which contained both 0 and a negative number, it returned 0 or NUM, whichever it found first. It is changed to always return NUM in that circumstance, as Excel does.

QUOTIENT had been documented as taking a variadic list of arguments. In fact, it takes exactly 2 - numerator and denominator - and the docblock and signature is fixed, even in the deprecated version.

The SERIESSUM docbock and signature are more accurate, even in the deprecated version. It is changed to ignore nulls, as Excel does, rather than return VALUE, and is one of the routines which previously rejected numbers in string form.

SUBTOTAL tests had used mocking for some reason. These are replaced with normal tests. And SUBTOTAL had a big surprise in store. That part of it which deals with hidden cells cares only whether the row is hidden, and doesn't care about the column's visibility.

I struggled with whether it should be SubTotal or Subtotal. I think the latter is correct, so that's how I proceeded. I don't think there are likely to be any other capitalization controversies.
@oleibman
Copy link
Collaborator Author

I've got to tell you, the last 4 hours playing whack-a-mole with Scrutinizer over doc blocks were not fun.

@PowerKiKi PowerKiKi merged commit 9239b3d into PHPOffice:master Mar 26, 2021
@PowerKiKi
Copy link
Member

Excellent contribution again, thank you !

It's already merged, but I still have remarks. Have you considered alternatives to "one class with one static method for each excel functions" ?

If all classes have exactly one single public static method, then technically we could consider using pure functions instead ?

Also why the prefix "func" and duplication of class in method name ? Have you considered an alternative such as MathTrig\Base::execute() instead of MathTrig\Base::funcBase() ?

To me it seems slightly more predictable and more discoverable, so better learning curve ?

Also for Scrutinizer, I plan to add PHPStan soonish. This should significantly improve the feedback loop for developers and reduce alerts on Scrutinizer...

@oleibman
Copy link
Collaborator Author

Thank you for merging so quickly. As for your questions:

Using "execute" rather than "funcWhatever" seems like a good idea. I'll use that moving forward. I chose the "func" option because, even though this has been deprecated, I was worried that PHP or one of the static analysis tools might confuse it with a constructor if it matched the class name, even case-insensitively. "Execute" will also solve that problem.

I'll look forward to PHPStan. I can easily run the other tools in my test environment, but not Scrutinizer, so anything which it reports comes as a surprise, and it comes very late - usually about 15 minutes after everything else has completed. If PHPStan can eliminate some of those surprises, I'm all for it.

My background isn't writing object-oriented code (I'm old), so it s possible that I haven't made the same choice as someone more familiar with it would have made. What follows is my explanation of why I did it as I did, not an attempt to say that it's the best way to have done it.

Breaking everything up into its own class seemed like a reasonable choice to me. For example, it follows exactly the model that is being used for all the Test classes. I could have grouped similar methods together (as I did for the three matrix methods), but "similar" is in the mind of the beholder, and the simplistic approach of one Excel function per member seems pretty discoverable, whereas grouping things would make that harder. And there are, of course, many Excel functions which aren't really related to anything else.

A lot of the classes have private methods in addition to the one public method, So, just making them pure functions isn't as straightforward as defining them inside a class. Anyhow, I'm not sure why making them pure functions would be advantageous.

I did consider making all the functions inherit from a base class (like Helpers) which implements the common validation methods as protected rather than public. I just couldn't see why that would be an advantage, and, anyhow, that doesn't really change the one public function per class model. But I would be willing to switch to that if you think that's preferable.

@MarkBaker
Copy link
Member

If all classes have exactly one single public static method, then technically we could consider using pure functions instead ?

Not really a good idea at all, because PHP can't autoload functions. Instead, userland functions are all loaded into memory for every server request. Upshot of that would be excessive memory usage, even if a spreadsheet didn't have any formulae.
Before we started this refactoring exercise, if a spreadsheet referenced just one date or one math function in a formula, then it would load the entirety of the DateTime functions class, or the entirety of the MathTrig functions class. This refactoring means that the script will now load a much smaller class containing just the required function or group of related functions, with less memory usage overall.
A single Excel function per class isn't necessarily optimal though, because that can lead to a lot of code duplication, not just for argument validation checks, but also commonality of execution code, such as returning a PHP DateTime formatted value simply with different format arguments; so grouping avoids those duplications of code.

Using "execute" rather than "funcWhatever" seems like a good idea. I'll use that moving forward. I chose the "func" option because, even though this has been deprecated, I was worried that PHP or one of the static analysis tools might confuse it with a constructor if it matched the class name, even case-insensitively. "Execute" will also solve that problem.

The only real areas of concern are Excel function names that are a PHP reserved word. Way back when, I discovered that the hard way with the VAR() function; TextData wasn't called simply Text because there was an internals RFC at the time to make Text a PHP reserved word (it was subsequently withdrawn before it even reached the voting stage); and last year I had to "play" with MATCH() for PHP8.

execute() is a good alternative. I've been looking at grouping functions more, partly based on their commonality of validations, partly based on their purpose in Excel.... in many ways, that's easier to do with the Statistical and Financial categories of functions. Then I'm defining an appropriate class name based on that purpose (the "loose" groupings found at ExcelFunctions.net are helpful here, though some of their groupings can be taken further, and I'm sub-namespacing in a few cases). Then I try to come up with a method name for the public facing Excel function call based on the purpose as described in the Excel docs, rather than necessarily using Excel's abbreviated (and often obscure) name. But for the Database functions, I did use a more generic evaluate().

e.g.

  • A Financial\Securities\Price class, with price(), priceDiscounted() and priceAtMaturity() methods for Excel's PRICE(), PRICEDISC() and PRICEMAT() functions.
  • or the Financial\InterestRate class, with effective() and nominal() methods for Excel's EFFECT() and NOMINAL()
  • or Statistical\Distributions\Beta class, with distribution() and inverse() methods for Excel's BETADIST() and BETAINV()

A couple of obvious groupings for the Date functions might be DAY()/MONTH()/YEAR() in a single Date class, and HOUR()/MINUTE()/SECOND() in a Time class

In MathTrig, there's the ROUND()/FLOOR()/CEILING() groups of functions.

At least method renaming is easy with an IDE, allowing it to hunt down all calls to that method in the codebase; and it's only the big phpSpreadsheetFunctions array in Calculation.php that needs manually updating. I'll want to sanity check that anyway once this exercise is complete (though there's a unit test that verifies all the entries are callable to help verify that); and I'm also considering a way that can be reduced in memory.
Grouping functions is slightly harder, but again the IDE should be able to identify where an existing class/method is called (keep a list) then once you've grouped similar/related functions together, use that list to change all the callers. Slightly more manual work, but hopefully not too difficult.

My background isn't writing object-oriented code (I'm old),

Age is no excuse ;-) I'm over 60, but I always advocate that learning and exploring is good to keep you feeling younger. And I've been working with PHPExcel/PHPSpreadsheet for over 14 years now. I guess that makes me to blame for allowing a lot of these problems to build up over the years.

And I'm still learning even now, and experimenting with whether validations should be in a separate Helper or Validations class; or a base class that the Excel function classes can extend; or as a Trait.

But the work that we're doing refactoring those Excel function classes is paying real dividends improving the overall quality of the code (making it easier to maintain and easier to test function by function in the future), and improving the test coverage to catch and handle those cases where an invalid argument would otherwise trigger a fatal exception in userland code. Scrutinizer bears this out, with code quality rating going from 4.98 at the beginning of the year to 5.98 today, test coverage up from 77.6% to 81.55%, and issues down from 972 to 780.
And just as importantly, it's putting the codebase in a good place for going forward over the coming year with the new changes that Microsoft have been introducing in Office 365, with Lambdas, and array functions, and objects in cells rather than simply scalar values.

@MarkBaker
Copy link
Member

Ad as a general comment about Helpers class, validation base class or a validation trait; they're all basically internal, not public access, so any approach to rationalising them after refactoring all the Excel functions can be done without breaking BC. It's only the refactoring that requires deprecating the public Excel function methods.

I also want to do a rationalisation of the deprecation docblocks as well before the next release.

@PowerKiKi
Copy link
Member

Thank for your inputs. If I read you both correctly, we should:

  • avoid artificial prefixes such as funcWhatever
  • either use a "standard" evaluate() if there is a single method, or use a descriptive name
  • never use pure function, so we save memory and keep the possibility to have private methods
  • sometimes group a few public methods per class

PS: @oleibman, we invited you to join the team. If you accept you'll be able to close issues, merge PRs, and push directly to this repo (but not on master)

PowerKiKi added a commit that referenced this pull request May 14, 2021
BlackyTay pushed a commit to BlackyTay/PhpSpreadsheet that referenced this pull request Aug 8, 2025
Continuing the process of breaking MathTrip.php up into smaller classes. This round takes care of all functions which might be an impediment to installing due to either uncovered code or "complexity":
- BASE
- FACT
- LCM
- MDETERM, MINVERSE, MMULT
- MULTINOMIAL
- PRODUCT
- QUOTIENT
- SERIESSUM
- SUM
- SUMPRODUCT

MathTrig and the members in directory MathTrig are now 100% covered. Many tests have been added, and some edge-case bugs are corrected. Some cases where PhpSpreadsheet had rejected numeric values stored as strings have been changed to accept them whenever Excel does; there had been no tests for that condition.

Boolean arguments are now accepted as arguments wherever Excel accpets them. Taking a cue from what has been done in Engineering, the parameter validation now happens in a routine which issues Exceptions for invalid values; this simplifies the code in the functions themselves. Thank you for doing that; I did not foresee how useful that was when I first looked at it.

Consistent with earlier changes of this nature, the versions in the MathTrig class remain, with a doc block indicating deprecation, and a stub call to the new routines.

All tests except for MINVERSE and MMULT are now handled in the context of a spreadsheet rather than a direct call to the calculation function which implements it. PhpSpreadsheet would need to handle dynamic arrays in order to test MINVERSE and MMULT in a spreadsheet context. Implementing that looks like it might be *very* challenging. It is not something I plan to look at, at least not in the near future.

One parsing problem turned up in the test conversion. It is in one of the SUMIF tests. It takes me to an area in Calculation where the comment says "I don't even want to know what you did to get here". It did not show up in the previous incarnation because, by using a direct call, the previous test managed to bypass the parsing. I have confirmed that this problem shows up in earlier releases of PhpSpreadsheet, so the changes in this PR did not cause it - they merely exposed it. I have left the test intact, but marked it "incomplete" for documentation purposes. I have not been able to get a handle on what's going wrong yet. I will probably open an issue on it if I can't resolve it soon. However, the test in question isn't a "real world" issue, and the error wasn't caused by this change, so I see no reason to delay this pending a resolution of the problem.

SUM had an idiosyncratic moment of its own. It had been ignoring non-numeric values, but Excel returns VALUE in that situation. So I changed it and wrote some new tests, which worked, but ... SUMIF uses several levels of indirection to get to SUM, and SUMIF *does* ignore non-numeric values, so a SUMIF test broke. SUM is a really simple function; the most practical approach seemed to be to clone it, with the string-accepting version being used by the Legacy version (which is called by SUMIF), and the non-string-accepting version being used in the Calculation Function table. That seems far easier and more practical than, for instance, adding a boolean parameter to the variable parameter list. As a follow-up, I will change SUMIF to explicitly call the appropriate new version, but I did not want to add that to this already large change.

SUM again - although it was fully covered beforehand, there was not a specific test member for it. There is now.

FACT had been coded to fail Gnumeric requests where the numeric argument has a decimal portion. However, Gnumeric does accept such an argument, and, unlike Excel and ODS, does not truncate it, but returns the result of a Gamma function call instead. This has been corrected.

When LCM included arguments which contained both 0 and a negative number, it returned 0 or NUM, whichever it found first. It is changed to always return NUM in that circumstance, as Excel does.

QUOTIENT had been documented as taking a variadic list of arguments. In fact, it takes exactly 2 - numerator and denominator - and the docblock and signature is fixed, even in the deprecated version.

The SERIESSUM docbock and signature are more accurate, even in the deprecated version. It is changed to ignore nulls, as Excel does, rather than return VALUE, and is one of the routines which previously rejected numbers in string form.

SUBTOTAL tests had used mocking for some reason. These are replaced with normal tests. And SUBTOTAL had a big surprise in store. That part of it which deals with hidden cells cares only whether the row is hidden, and doesn't care about the column's visibility.

I struggled with whether it should be SubTotal or Subtotal. I think the latter is correct, so that's how I proceeded. I don't think there are likely to be any other capitalization controversies.
BlackyTay pushed a commit to BlackyTay/PhpSpreadsheet that referenced this pull request Aug 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants