-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Continue MathTrig Breakup - Problem Children #1954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Continuing the process of breaking MathTrip.php up into smaller classes. This round takes care of all functions which might be an impediment to installing due to either uncovered code or "complexity": - BASE - FACT - LCM - MDETERM, MINVERSE, MMULT - MULTINOMIAL - PRODUCT - QUOTIENT - SERIESSUM - SUM - SUMPRODUCT MathTrig and the members in directory MathTrig are now 100% covered. Many tests have been added, and some edge-case bugs are corrected. Some cases where PhpSpreadsheet had rejected numeric values stored as strings have been changed to accept them whenever Excel does; there had been no tests for that condition. Boolean arguments are now accepted as arguments wherever Excel accpets them. Taking a cue from what has been done in Engineering, the parameter validation now happens in a routine which issues Exceptions for invalid values; this simplifies the code in the functions themselves. Thank you for doing that; I did not foresee how useful that was when I first looked at it. Consistent with earlier changes of this nature, the versions in the MathTrig class remain, with a doc block indicating deprecation, and a stub call to the new routines. All tests except for MINVERSE and MMULT are now handled in the context of a spreadsheet rather than a direct call to the calculation function which implements it. PhpSpreadsheet would need to handle dynamic arrays in order to test MINVERSE and MMULT in a spreadsheet context. Implementing that looks like it might be *very* challenging. It is not something I plan to look at, at least not in the near future. One parsing problem turned up in the test conversion. It is in one of the SUMIF tests. It takes me to an area in Calculation where the comment says "I don't even want to know what you did to get here". It did not show up in the previous incarnation because, by using a direct call, the previous test managed to bypass the parsing. I have confirmed that this problem shows up in earlier releases of PhpSpreadsheet, so the changes in this PR did not cause it - they merely exposed it. I have left the test intact, but marked it "incomplete" for documentation purposes. I have not been able to get a handle on what's going wrong yet. I will probably open an issue on it if I can't resolve it soon. However, the test in question isn't a "real world" issue, and the error wasn't caused by this change, so I see no reason to delay this pending a resolution of the problem. SUM had an idiosyncratic moment of its own. It had been ignoring non-numeric values, but Excel returns VALUE in that situation. So I changed it and wrote some new tests, which worked, but ... SUMIF uses several levels of indirection to get to SUM, and SUMIF *does* ignore non-numeric values, so a SUMIF test broke. SUM is a really simple function; the most practical approach seemed to be to clone it, with the string-accepting version being used by the Legacy version (which is called by SUMIF), and the non-string-accepting version being used in the Calculation Function table. That seems far easier and more practical than, for instance, adding a boolean parameter to the variable parameter list. As a follow-up, I will change SUMIF to explicitly call the appropriate new version, but I did not want to add that to this already large change. SUM again - although it was fully covered beforehand, there was not a specific test member for it. There is now. FACT had been coded to fail Gnumeric requests where the numeric argument has a decimal portion. However, Gnumeric does accept such an argument, and, unlike Excel and ODS, does not truncate it, but returns the result of a Gamma function call instead. This has been corrected. When LCM included arguments which contained both 0 and a negative number, it returned 0 or NUM, whichever it found first. It is changed to always return NUM in that circumstance, as Excel does. QUOTIENT had been documented as taking a variadic list of arguments. In fact, it takes exactly 2 - numerator and denominator - and the docblock and signature is fixed, even in the deprecated version. The SERIESSUM docbock and signature are more accurate, even in the deprecated version. It is changed to ignore nulls, as Excel does, rather than return VALUE, and is one of the routines which previously rejected numbers in string form. SUBTOTAL tests had used mocking for some reason. These are replaced with normal tests. And SUBTOTAL had a big surprise in store. That part of it which deals with hidden cells cares only whether the row is hidden, and doesn't care about the column's visibility. I struggled with whether it should be SubTotal or Subtotal. I think the latter is correct, so that's how I proceeded. I don't think there are likely to be any other capitalization controversies.
|
I've got to tell you, the last 4 hours playing whack-a-mole with Scrutinizer over doc blocks were not fun. |
|
Excellent contribution again, thank you ! It's already merged, but I still have remarks. Have you considered alternatives to "one class with one static method for each excel functions" ? If all classes have exactly one single public static method, then technically we could consider using pure functions instead ? Also why the prefix "func" and duplication of class in method name ? Have you considered an alternative such as To me it seems slightly more predictable and more discoverable, so better learning curve ? Also for Scrutinizer, I plan to add PHPStan soonish. This should significantly improve the feedback loop for developers and reduce alerts on Scrutinizer... |
|
Thank you for merging so quickly. As for your questions: Using "execute" rather than "funcWhatever" seems like a good idea. I'll use that moving forward. I chose the "func" option because, even though this has been deprecated, I was worried that PHP or one of the static analysis tools might confuse it with a constructor if it matched the class name, even case-insensitively. "Execute" will also solve that problem. I'll look forward to PHPStan. I can easily run the other tools in my test environment, but not Scrutinizer, so anything which it reports comes as a surprise, and it comes very late - usually about 15 minutes after everything else has completed. If PHPStan can eliminate some of those surprises, I'm all for it. My background isn't writing object-oriented code (I'm old), so it s possible that I haven't made the same choice as someone more familiar with it would have made. What follows is my explanation of why I did it as I did, not an attempt to say that it's the best way to have done it. Breaking everything up into its own class seemed like a reasonable choice to me. For example, it follows exactly the model that is being used for all the Test classes. I could have grouped similar methods together (as I did for the three matrix methods), but "similar" is in the mind of the beholder, and the simplistic approach of one Excel function per member seems pretty discoverable, whereas grouping things would make that harder. And there are, of course, many Excel functions which aren't really related to anything else. A lot of the classes have private methods in addition to the one public method, So, just making them pure functions isn't as straightforward as defining them inside a class. Anyhow, I'm not sure why making them pure functions would be advantageous. I did consider making all the functions inherit from a base class (like Helpers) which implements the common validation methods as protected rather than public. I just couldn't see why that would be an advantage, and, anyhow, that doesn't really change the one public function per class model. But I would be willing to switch to that if you think that's preferable. |
Not really a good idea at all, because PHP can't autoload functions. Instead, userland functions are all loaded into memory for every server request. Upshot of that would be excessive memory usage, even if a spreadsheet didn't have any formulae.
The only real areas of concern are Excel function names that are a PHP reserved word. Way back when, I discovered that the hard way with the
e.g.
A couple of obvious groupings for the Date functions might be In MathTrig, there's the ROUND()/FLOOR()/CEILING() groups of functions. At least method renaming is easy with an IDE, allowing it to hunt down all calls to that method in the codebase; and it's only the big
Age is no excuse ;-) I'm over 60, but I always advocate that learning and exploring is good to keep you feeling younger. And I've been working with PHPExcel/PHPSpreadsheet for over 14 years now. I guess that makes me to blame for allowing a lot of these problems to build up over the years. And I'm still learning even now, and experimenting with whether validations should be in a separate Helper or Validations class; or a base class that the Excel function classes can extend; or as a Trait. But the work that we're doing refactoring those Excel function classes is paying real dividends improving the overall quality of the code (making it easier to maintain and easier to test function by function in the future), and improving the test coverage to catch and handle those cases where an invalid argument would otherwise trigger a fatal exception in userland code. Scrutinizer bears this out, with code quality rating going from 4.98 at the beginning of the year to 5.98 today, test coverage up from 77.6% to 81.55%, and issues down from 972 to 780. |
|
Ad as a general comment about Helpers class, validation base class or a validation trait; they're all basically internal, not public access, so any approach to rationalising them after refactoring all the Excel functions can be done without breaking BC. It's only the refactoring that requires deprecating the public Excel function methods. I also want to do a rationalisation of the deprecation docblocks as well before the next release. |
|
Thank for your inputs. If I read you both correctly, we should:
PS: @oleibman, we invited you to join the team. If you accept you'll be able to close issues, merge PRs, and push directly to this repo (but not on master) |
As decided in #1954 (comment)
Continuing the process of breaking MathTrip.php up into smaller classes. This round takes care of all functions which might be an impediment to installing due to either uncovered code or "complexity": - BASE - FACT - LCM - MDETERM, MINVERSE, MMULT - MULTINOMIAL - PRODUCT - QUOTIENT - SERIESSUM - SUM - SUMPRODUCT MathTrig and the members in directory MathTrig are now 100% covered. Many tests have been added, and some edge-case bugs are corrected. Some cases where PhpSpreadsheet had rejected numeric values stored as strings have been changed to accept them whenever Excel does; there had been no tests for that condition. Boolean arguments are now accepted as arguments wherever Excel accpets them. Taking a cue from what has been done in Engineering, the parameter validation now happens in a routine which issues Exceptions for invalid values; this simplifies the code in the functions themselves. Thank you for doing that; I did not foresee how useful that was when I first looked at it. Consistent with earlier changes of this nature, the versions in the MathTrig class remain, with a doc block indicating deprecation, and a stub call to the new routines. All tests except for MINVERSE and MMULT are now handled in the context of a spreadsheet rather than a direct call to the calculation function which implements it. PhpSpreadsheet would need to handle dynamic arrays in order to test MINVERSE and MMULT in a spreadsheet context. Implementing that looks like it might be *very* challenging. It is not something I plan to look at, at least not in the near future. One parsing problem turned up in the test conversion. It is in one of the SUMIF tests. It takes me to an area in Calculation where the comment says "I don't even want to know what you did to get here". It did not show up in the previous incarnation because, by using a direct call, the previous test managed to bypass the parsing. I have confirmed that this problem shows up in earlier releases of PhpSpreadsheet, so the changes in this PR did not cause it - they merely exposed it. I have left the test intact, but marked it "incomplete" for documentation purposes. I have not been able to get a handle on what's going wrong yet. I will probably open an issue on it if I can't resolve it soon. However, the test in question isn't a "real world" issue, and the error wasn't caused by this change, so I see no reason to delay this pending a resolution of the problem. SUM had an idiosyncratic moment of its own. It had been ignoring non-numeric values, but Excel returns VALUE in that situation. So I changed it and wrote some new tests, which worked, but ... SUMIF uses several levels of indirection to get to SUM, and SUMIF *does* ignore non-numeric values, so a SUMIF test broke. SUM is a really simple function; the most practical approach seemed to be to clone it, with the string-accepting version being used by the Legacy version (which is called by SUMIF), and the non-string-accepting version being used in the Calculation Function table. That seems far easier and more practical than, for instance, adding a boolean parameter to the variable parameter list. As a follow-up, I will change SUMIF to explicitly call the appropriate new version, but I did not want to add that to this already large change. SUM again - although it was fully covered beforehand, there was not a specific test member for it. There is now. FACT had been coded to fail Gnumeric requests where the numeric argument has a decimal portion. However, Gnumeric does accept such an argument, and, unlike Excel and ODS, does not truncate it, but returns the result of a Gamma function call instead. This has been corrected. When LCM included arguments which contained both 0 and a negative number, it returned 0 or NUM, whichever it found first. It is changed to always return NUM in that circumstance, as Excel does. QUOTIENT had been documented as taking a variadic list of arguments. In fact, it takes exactly 2 - numerator and denominator - and the docblock and signature is fixed, even in the deprecated version. The SERIESSUM docbock and signature are more accurate, even in the deprecated version. It is changed to ignore nulls, as Excel does, rather than return VALUE, and is one of the routines which previously rejected numbers in string form. SUBTOTAL tests had used mocking for some reason. These are replaced with normal tests. And SUBTOTAL had a big surprise in store. That part of it which deals with hidden cells cares only whether the row is hidden, and doesn't care about the column's visibility. I struggled with whether it should be SubTotal or Subtotal. I think the latter is correct, so that's how I proceeded. I don't think there are likely to be any other capitalization controversies.
Continuing the process of breaking MathTrip.php up into smaller classes. This round takes care of all functions which might be an impediment to installing due to either uncovered code or "complexity":
MathTrig and the members in directory MathTrig are now 100% covered. Many tests have been added, and some edge-case bugs are corrected. Some cases where PhpSpreadsheet had rejected numeric values stored as strings have been changed to accept them whenever Excel does; there had been no tests for that condition.
Boolean arguments are now accepted as arguments wherever Excel accpets them. Taking a cue from what has been done in Engineering, the parameter validation now happens in a routine which issues Exceptions for invalid values; this simplifies the code in the functions themselves. Thank you for doing that; I did not foresee how useful that was when I first looked at it.
Consistent with earlier changes of this nature, the versions in the MathTrig class remain, with a doc block indicating deprecation, and a stub call to the new routines.
All tests except for MINVERSE and MMULT are now handled in the context of a spreadsheet rather than a direct call to the calculation function which implements it. PhpSpreadsheet would need to handle dynamic arrays in order to test MINVERSE and MMULT in a spreadsheet context. Implementing that looks like it might be very challenging. It is not something I plan to look at, at least not in the near future.
One parsing problem turned up in the test conversion. It is in one of the SUMIF tests. It takes me to an area in Calculation where the comment says "I don't even want to know what you did to get here". It did not show up in the previous incarnation because, by using a direct call, the previous test managed to bypass the parsing. I have confirmed that this problem shows up in earlier releases of PhpSpreadsheet, so the changes in this PR did not cause it - they merely exposed it. I have left the test intact, but marked it "incomplete" for documentation purposes. I have not been able to get a handle on what's going wrong yet. I will probably open an issue on it if I can't resolve it soon. However, the test in question isn't a "real world" issue, and the error wasn't caused by this change, so I see no reason to delay this pending a resolution of the problem.
SUM had an idiosyncratic moment of its own. It had been ignoring non-numeric values, but Excel returns VALUE in that situation. So I changed it and wrote some new tests, which worked, but ... SUMIF uses several levels of indirection to get to SUM, and SUMIF does ignore non-numeric values, so a SUMIF test broke. SUM is a really simple function; the most practical approach seemed to be to clone it, with the string-accepting version being used by the Legacy version (which is called by SUMIF), and the non-string-accepting version being used in the Calculation Function table. That seems far easier and more practical than, for instance, adding a boolean parameter to the variable parameter list. As a follow-up, I will change SUMIF to explicitly call the appropriate new version, but I did not want to add that to this already large change.
SUM again - although it was fully covered beforehand, there was not a specific test member for it. There is now.
FACT had been coded to fail Gnumeric requests where the numeric argument has a decimal portion. However, Gnumeric does accept such an argument, and, unlike Excel and ODS, does not truncate it, but returns the result of a Gamma function call instead. This has been corrected.
When LCM included arguments which contained both 0 and a negative number, it returned 0 or NUM, whichever it found first. It is changed to always return NUM in that circumstance, as Excel does.
QUOTIENT had been documented as taking a variadic list of arguments. In fact, it takes exactly 2 - numerator and denominator - and the docblock and signature is fixed, even in the deprecated version.
The SERIESSUM docbock and signature are more accurate, even in the deprecated version. It is changed to ignore nulls, as Excel does, rather than return VALUE, and is one of the routines which previously rejected numbers in string form.
SUBTOTAL tests had used mocking for some reason. These are replaced with normal tests. And SUBTOTAL had a big surprise in store. That part of it which deals with hidden cells cares only whether the row is hidden, and doesn't care about the column's visibility.
I struggled with whether it should be SubTotal or Subtotal. I think the latter is correct, so that's how I proceeded. I don't think there are likely to be any other capitalization controversies.
This is:
Checklist:
Why this change is needed?