Skip to content

IsShortArrayOrList: refactor for better accuracy and improved performance #392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jrfnl
Copy link
Member

@jrfnl jrfnl commented Oct 25, 2022

Aside from adding a sh*tload more tests, this PR also contains functional changes, though nothing which impacts the API.

Functional (non-test) commits

Arrays::isShortArray()/Lists::isShortList(): consolidate the logic [1]

This is step 1 in a refactor to improve short list/short array determination.

In this commit, the pre-existing code from the Arrays::isShortArray() and Lists::isShortList() methods is consolidated into a class which handles both determinations.

This initial refactor does not contain any functional changes other than to remove (most of) the code which handled BC with PHPCS < 3.7.1.

Includes updating the @covers tags. Only the caching related tests are now still testing the Arrays::isShortArray() and Lists::isShortList() methods. Everything else is testing the new PHPCSUtils\Internal\IsShortArrayOrList class.

IsShortArrayOrList::isShortArrayBracket(): account for newly discovered tokenizer issue

Turns out a square bracket after a non-braced control structure is always tokenized as T_OPEN_SQUARE_BRACKET, while it should be tokenized as T_OPEN_SHORT_ARRAY.

The fix for this issue will be included in PHPCS 3.7.2, but that version hasn't been released yet.

Refs:

IsShortArrayOrList::isInForeach(): improve logic

This commit improves the stability of the IsShortArrayOrList::isInForeach() method by:

  • Using the Context::inForeachCondition() method to determine which part of a foreach condition we are looking at.
  • Handling non-nested short arrays in the "before as" part of the condition, which should improve performance a little.
  • Ensuring that only non-nested short lists in the "after as" part of the condition are handled by this method.
    Anything nested should be handled based on the outer set of brackets as otherwise the results will be unreliable.

This also prevents incorrect results for code like this foreach(["a"=>[]] as $v), where the empty array in the "before as" would previously be identified as a short list due to the double arrow before it.

Includes additional tests.

Refs:

Arrays::isShortArray()/Lists::isShortList(): consolidate the caching [2]

This is step 2 in a refactor to improve short list/short array determination.

While caching had previously already been implemented, as things were, there could be four caches which would each refer to the same result, i.e.

  1. A cache for an opener token passed to Arrays::isShortArray().
  2. A cache for an closer token of the same set of brackets passed to Arrays::isShortArray().
  3. A cache for an opener token of the same set of brackets passed to Lists::isShortList().
  4. A cache for an closer token of the same set of brackets passed to Lists::isShortList().

This also meant that, depending on how the methods were called, the underlying IsShortArrayOrList class could be called up to four times to determine what a set of brackets signified, while the result would already be known after the first time and is not subject to change.

This commit introduces a wrapper class around the IsShortArrayOrList class which will handle the caching and will only create one cache entry for all four potential starting points, making the caching far more efficient and effective.

Includes adding a set of dedicated tests to cover the functionality in the IsShortArrayOrListWithCache class, replacing the previous caching tests in the IsShortArrayOrListTest class.

IsShortArrayOrList: only accept bracket opener tokens

As the new IsShortArrayOrListWithCache class will now be the entry point to the IsShortArrayOrList class, we can rely upon the token being passed to the IsShortArrayOrList always being an opener for a complete set of brackets and can remove some redundancy from the IsShortArrayOrList class.

Includes making the exception for when an incorrect token is passed more specific.

Includes updating the IsShortArrayOrList tests to no longer pass closer tokens to the IsShortArrayOrList class.

IsShortArrayOrList: refactor for accuracy and improved performance [3]

This is step 3 in a refactor to improve short list/short array determination.

This refactor adds a number of extra interim steps to try and avoid, whenever possible, walking the complete file to find "outer" brackets for nested short lists/arrays.

This should improve performance of the functionality.

Additionally, the interim steps will also provide higher accuracy of the results.

Includes adding a set of dedicated tests to cover the functionality in each method in the IsShortArrayOrList class.

The generic PHPCSUtils\Internal\IsShortArrayOrList\IsShortArrayOrListTest still remains for arbitrary short list vs short array tests (though some tests have moved to the method specific test classes).

jrfnl added 19 commits October 25, 2022 10:29
* Add test with PHP 8.0 nullsafe object operators to safeguard against a potential tokenizer issue.
    As per upstream PR squizlabs/PHP_CodeSniffer 3046
* Improve a test case.
    Ensure that the test actually needs to token walk and doesn't hit the "next after is closer of wrapping brackets" condition.
* Improve a test case.
    Ensure that the test actually needs to token walk and doesn't hit the "next after is semicolon" condition.
* Improve some test case names ensuring that all test cases are prefixed with the expected outcome, i.e. "short array" or "short list".
* Add extra test covering a particular parse error with `foreach`.
* Add extra tests involving arrays with operators before/after.
* Move a test up to a more appropriate place as it doesn't involve an invalid syntax.
This is step 1 in a refactor to improve short list/short array determination.

In this commit, the pre-existing code from the `Arrays::isShortArray()` and `Lists::isShortList()` methods is consolidated into a class which handles both determinations.

This initial refactor does not contain any functional changes other than to remove (most of) the code which handled BC with PHPCS < 3.7.1.

Includes updating the `@covers` tags. Only the caching related tests are now still testing the `Arrays::isShortArray()` and `Lists::isShortList()` methods. Everything else is testing the new `PHPCSUtils\Internal\IsShortArrayOrList` class.
This commit moves the original test set for the `IsShortArrayOrList` functionality from the `Utils\Lists` test directory to the `Internal\IsShortArrayOrList` test directory.

Includes updating the `README-IsShortArrayTests.md` file and adding a `README-IsShortListTests.md` file documenting where the tests are located.
…rList` class

This commit updates most tests to use the `IsShortArrayOrList` class directly instead of going via the `Arrays::isShortArray()` and `Lists::isShortList()` wrapper functions.

Reasons:
* Running each tests via both wrapper functions, while the output from the underlying class will be the same (after the code consolidation), is redundant.
* By using the `IsShortArrayOrList` class directly in the tests, we bypass the caching which is handled in the wrapper functions.
    This makes sure that the tests are "pure" and not incidentally affected by the caching.

Additionally:
* The `testValidDataProvider()` methods, which handled some quality control for the tests, have been removed as they are no longer needed.
* Split the tests using the `IsShortArrayOrListTest::dataNotShortArrayShortListBracket()` into two different data sets with associated tests:
    1. A test set for testing the behaviour when non-square bracket/non-short array tokens are being passed.
    2. A test set for testing the behaviour when square bracket/short array tokens which are in actual fact real square brackets are being passed.
* The test class and file names and `@covers` tags  for the "TokenizerBC" tests have been adjusted to be more specific to the method within the `isShortArrayOrList` class covered by these tests.
    The `@covers` tags have also been moved to the class docblocks as these tests now only need one `@covers` tag for the whole class.

Altogether, this reduces the number of tests, while still maintaining the same level of quality control and still covering all tests cases.
… tests

Making sure that:
1. All types of nesting is being tested, i.e. start of outer, middle of outer, end of outer.
2. The contents of the brackets holds no clue to whether it is an array or list, to make sure the test is "pure".

These tests _could_ now be considered redundant as the tokenizer issues they were testing for in the context of BC support for older PHPCS version, should no longer exist now the minimum PHPCS version has gone up.

Keeping the tests in place (and expanding them) will safeguard against regressions in the PHPCS tokenizer, so is the safer choice.
Split off the tests which are specific to the `IsShortArrayOrList::__construct()` method to a dedicated test file and add some additional tests to make sure the method is fully covered.
Split off the tests which are specific to the `IsShortArrayOrList::isSquareBracket()` method to a dedicated test file and add some additional tests just in case.

Note: one test with square brackets remains in the original test file to cover the function call to `isSquareBracket()` within the `solve()` method.
…ed tokenizer issue

Turns out a square bracket after a non-braced control structure is always tokenized as `T_OPEN_SQUARE_BRACKET`, while it should be tokenized as `T_OPEN_SHORT_ARRAY`.

The fix for this issue will be included in PHPCS 3.7.2, but that version hasn't been released yet.

Refs:
* sirbrillig/phpcs-variable-analysis 263
* squizlabs/PHP_CodeSniffer 3632
Split off the tests which are specific to the `IsShortArrayOrList::isInForeach()` method to a dedicated test file.

Includes adding numerous extra tests.

Note: one test involving `foreach` remains in the original test file to cover the function call to `isInForeach()` within the `solve()` method.
This commit improves the stability of the `IsShortArrayOrList::isInForeach()` method by:
* Using the `Context::inForeachCondition()` method to determine which part of a `foreach` condition we are looking at.
* Handling non-nested short arrays in the "before as" part of the condition, which should improve performance a little.
* Ensuring that only non-nested short lists in the "after as" part of the condition are handled by this method.
    Anything nested should be handled based on the _outer_ set of brackets as otherwise the results will be unreliable.

This also prevents incorrect results for code like this `foreach(["a"=>[]] as $v)`, where the empty array in the "before as" would previously be identified as a short list due to the double arrow before it.

Includes additional tests.

Refs:
* PHPCompatibility/PHPCompatibility 1341
Make sure all named data sets use keys within the data set as well for a more readable data provider.
This is step 2 in a refactor to improve short list/short array determination.

While caching had previously already been implemented, as things were, there could be four caches which would each refer to the same result, i.e.
1. A cache for an opener token passed to `Arrays::isShortArray()`.
2. A cache for an closer token of the same set of brackets passed to `Arrays::isShortArray()`.
3. A cache for an opener token of the same set of brackets passed to `Lists::isShortList()`.
4. A cache for an closer token of the same set of brackets passed to `Lists::isShortList()`.

This also meant that, depending on how the methods were called, the underlying `IsShortArrayOrList` class could be called up to four times to determine what a set of brackets signified, while the result would already be known after the first time and is not subject to change.

This commit introduces a wrapper class around the `IsShortArrayOrList` class which will handle the caching and will only create one cache entry for all four potential starting points, making the caching far more efficient and effective.

Includes adding a set of dedicated tests to cover the functionality in the `IsShortArrayOrListWithCache` class, replacing the previous caching tests in the `IsShortArrayOrListTest` class.
As the new `IsShortArrayOrListWithCache` class will now be the entry point to the `IsShortArrayOrList` class, we can rely upon the token being passed to the `IsShortArrayOrList` always being an _opener_ for a complete set of brackets and can remove some redundancy from the `IsShortArrayOrList` class.

Includes making the exception for when an incorrect token is passed more specific.

Includes updating the `IsShortArrayOrList` tests to no longer pass closer tokens to the `IsShortArrayOrList` class.
This is step 3 in a refactor to improve short list/short array determination.

This refactor adds a number of extra interim steps to try and avoid, whenever possible, walking the complete file to find "outer" brackets for nested short lists/arrays.

This should improve performance of the functionality.

Additionally, the interim steps will also provide higher accuracy of the results.

Includes adding a set of dedicated tests to cover the functionality in each method in the `IsShortArrayOrList` class.

The generic `PHPCSUtils\Internal\IsShortArrayOrList\IsShortArrayOrListTest` still remains for arbitrary short list vs short array tests (though some tests have moved to the method specific test classes).
A significant number of these tests would previously lead to incorrect results for the short list/short array determination. The refactor fixed these.

* Add tests involving short lists with short arrays as keys.
    I seriously mean the inline remark: "Please sack anyone who writes code like this".
* Add extra test for nested short list with empties.
* Add extra tests involving comma separated lists of variables as in that case, there is the most significant risk of mis-identifying the token.
* Add extra tests for short arrays/lists as values in a short array.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant