Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

php-cs-fixer chokes on this file #3996

Closed
flack opened this issue Sep 20, 2018 · 20 comments
Closed

php-cs-fixer chokes on this file #3996

flack opened this issue Sep 20, 2018 · 20 comments
Labels

Comments

@flack
Copy link

flack commented Sep 20, 2018

As discussed in #3224, here's my standalone report:

The PHP version you are using ($ php -v):

PHP 7.2.10-0ubuntu0.18.04.1 (cli) (built: Sep 13 2018 13:45:02) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.2.10-0ubuntu0.18.04.1, Copyright (c) 1999-2018, by Zend Technologies
    with Xdebug v2.6.0, Copyright (c) 2002-2018, by Derick Rethans

PHP CS Fixer version you are using ($ php-cs-fixer -V):

PHP CS Fixer 2.13.0 Yogi's BBQ by Fabien Potencier and Dariusz Ruminski

The command you use to run PHP CS Fixer:

php-cs-fixer fix -vvv --rules=no_spaces_after_function_name,elseif,braces,encoding,full_opening_tag,single_blank_line_at_eof,function_declaration,line_ending,method_argument_space,no_trailing_whitespace_in_comment,no_closing_tag,switch_case_semicolon_to_colon,switch_case_space,no_trailing_whitespace,whitespace_after_comma_in_array,whitespace_after_comma_in_array,no_whitespace_before_comma_in_array,no_empty_statement t_user.php 

If applicable, please provide minimum samples of PHP code (as plain text, not screenshots):

I had to rename the file to .txt, otherwise github wouldn't let me upload it. Should be changed back beofre trying to reproduce

t_user.txt

@keradus
Copy link
Member

keradus commented Sep 20, 2018

worst case scenario: automatically skip files that are bigger than 1MB or sth...

@flack
Copy link
Author

flack commented Sep 20, 2018

@keradus I don't think the file size is the problem here. I just deleted half the lines and ran again (630 KB), and still nothing.

My first try to anonymize the file was to do file_get_contents, replace everything with x and write it back to a new file with file_put_contents. This new file (also 1.4MB) is processed by php-cs-fixer in 0.172s, whereas the version attached here never seems to finish (longest I tried was 10 minutes). My (completely uneducated) guess would be that it has something to do with the encoding

@flack
Copy link
Author

flack commented Sep 20, 2018

Actually, ignore my last remark. Turns out I made an error in the PHP anonymized version and had accidentally also replaced the opening <?php tag, so it saw the whole thing as text only..

@flack
Copy link
Author

flack commented Sep 20, 2018

in case it helps: I tried running the rules one by one, and this is the one that's killing it: whitespace_after_comma_in_array. All the others run in a couple of seconds, but this one doesn't seem to terminate

@keradus
Copy link
Member

keradus commented Sep 20, 2018

crazy idea - did you let rule run for 30 minutes ? I wonder is it simply slow, but in the end finishing, or it fall into some never-ending loop

@flack
Copy link
Author

flack commented Sep 20, 2018

I let it run for about an hour, but then I had to reboot the machine because of something else. Will try again tomorrow and see if it finishes before the weekend..

@keradus
Copy link
Member

keradus commented Sep 20, 2018

thanks for info. with 1 hours execution time, I bet more time won't help, it's some infinite loop in that case

@kubawerlos
Copy link
Contributor

I've found a root of the problem - it's whitespace_after_comma_in_array - any time new whitespace is inserted all tokens behind the index must be moved (in a loop).

Add some echo $index; before this line to see it is progressing.

@keradus
Copy link
Member

keradus commented Sep 21, 2018

if we instert sth in the middle of tokens, we shall rearrange the token instances, yes

but that shall still finish execution ultimately, like @flack run it for an hour

@flack
Copy link
Author

flack commented Sep 21, 2018

Small update: I tried running it again, and it actually finished eventually:

time php-cs-fixer fix -vvv --rules=whitespace_after_comma_in_array file.php 
Loaded config default.
Using cache file ".php_cs.cache".
F
Legend: ?-unknown, I-invalid file syntax, file ignored, S-Skipped, .-no changes, F-fixed, E-error
   1) file.php (whitespace_after_comma_in_array)

Fixed all files in 13289.346 seconds, 292.000 MB memory used

real    221m29,486s
user    221m26,669s
sys     0m0,777s

so it took almost four hours to run one fixer on this file. I think it's safe to say that fixing it manually would have been faster, even if I took a lunch break in between :-)

Isn't there a way to optimize this? The file contains 66307 commas, so (if we assume every one of those needed a whitespace), it took 0.2s per comma to process.

@keradus
Copy link
Member

keradus commented Sep 23, 2018

thanks for running it !
out of the blue - if you would run the PHP CS Fixer again on that already fixed file (ensure to remove .php_cs.cache first, if it exists), how long would it execute ? shall be way faster, can you confirm ?


Isn't there a way to optimize this?

if you would find one, you are welcome to incorporate it ;)

File is big, way bigger than file created by human. Maybe we shall officially suggest to not bother about CS for autogenerated, DB-migration/dump files...

Overall issue is that if we have 50k places to insert new token, we need to move the following tokens 50k times, as @kubawerlos mentioned.

In general, it is possible to change how we operate on tokens collection:

  • use SplDoublyLinkedList instead of SplFixedArray, thus adding/removing elements would be O(1) instead of O(n), yet the we would not be able to hop over indexes, so would be tricky
  • use regular array, which implements both, list and array, side effect is unknown, experiment can be done to see the result of performance change, yet experiment may not be fast to implement
  • implement sth like multiInsertAt, similar to insertAt, yet one allowing to insert multiple tokens at the same time, yet at randomly provided indexes - with that approach, we can gather "where to add new elements", without actually adding it at the same time, and then add all new items to collection in single method call, iterating over original collection only once.

Of course, there may be more possible solutions.

From my point of view, amount of work to even try a single solution is huge, while issue we are facing here is not common (almost nobody try to run a PHP CS Fixer against that kind of files)

@flack
Copy link
Author

flack commented Sep 23, 2018

@keradus yeah, that sounds like a lot of effort. A naive idea I just has would be maybe not to insert tokens into the existing stream, but rather write (both existing and new tokens) to a new one. So the function gets the old stream in and returns the modified one. Would use more memory (probably twice as much?), but if you only append to the new stream, you wouldn't have to move so much stuff.

Anyhow, I agree it's a rather exotic situation, and skipping these kinds of files is fine. But when you run against a large file tree, you still have the problem that php-cs-fixer output just stops moving and it gives you no indication where and why. #3997 would help, or like you said earlier maybe skipping files bigger than a certain size (but that size would have to be rather small to get into the minutes range in the worst case I guess).

@flack
Copy link
Author

flack commented Sep 24, 2018

P.S.: Just for completeness's sake: I ran again aginst the fixed file, and as expected it's a lot faster now:

time php-cs-fixer fix -vvv --rules=whitespace_after_comma_in_array file.php                                                                                                                                       
Loaded config default.                                                                                                                     
Using cache file ".php_cs.cache".                                                                                                          
.
Legend: ?-unknown, I-invalid file syntax, file ignored, S-Skipped, .-no changes, F-fixed, E-error

Fixed all files in 6.924 seconds, 224.000 MB memory used

real    0m7,069s
user    0m7,020s
sys     0m0,050s

@keradus
Copy link
Member

keradus commented Sep 25, 2018

awesome, thanks for checking that ;)

keradus added a commit to keradus/PHP-CS-Fixer that referenced this issue Jan 18, 2019
@keradus
Copy link
Member

keradus commented Jan 18, 2019

I know it took a while, @flack , but the fix for this was not obvious.
Please, take a look at #4250 , I added your file as test-case.

What you reported originally is:

Fixed all files in 13289.346 seconds, 292.000 MB memory used

What I experience on my PR with my laptop is:

ker@dus:~/github/PHP-CS-Fixer λ php php-cs-fixer fix -vvv --rules=whitespace_after_comma_in_array  tests/Fixtures/Integration/misc/meta_insert_64566_tokens.test-in.php
Loaded config default from "/home/d.ruminski/github/PHP-CS-Fixer/.php_cs.dist".
Paths from configuration file have been overridden by paths provided as command arguments.
F
Legend: ?-unknown, I-invalid file syntax, file ignored, S-Skipped, .-no changes, F-fixed, E-error
   1) tests/Fixtures/Integration/misc/meta_insert_64566_tokens.test-in.php (whitespace_after_comma_in_array)

Fixed all files in 4.182 seconds, 290.000 MB memory used

3.7h vs 4s ;)

@flack
Copy link
Author

flack commented Jan 22, 2019

@keradus thanks for the followup! And wow, those are some impressive numbers :-)

I've checked out #4250 locally and let it run against the same project as earlier. Unfortunately, there still seems to be something wrong with my local setup, I've started the run like 30mins ago, and it's still ongoing and hogging 100% of one CPU core. I'll try to narrow it down a bit more tomorrow and if I find something interesting, I'll post on the PR

@keradus
Copy link
Member

keradus commented Jan 22, 2019

I only updated one of the rule, maybe other one would require similar improvement. feel free to dig into it, let's solve it as well. thanks !

@flack
Copy link
Author

flack commented Jan 23, 2019

@keradus looks like this was a false alarm: Turns out when you run php-cs-fixer in the PHP-CS-Fixer directory, it'll still take the globally installed version from /usr/local/bin, you have to use ./php-cs-fixer instead, i.e. what I did yesterday was in fact testing the old version again...

So my real numbers look like this:

Old version (2.13.0): 
Fixed all files in 14345.327 seconds, 292.000 MB memory used

New version (2.12.7-DEV + PR #4250): 
Fixed all files in 34.657 seconds, 294.000 MB memory used

So everything looks good now. Thanks again for the massive speedup!

@keradus
Copy link
Member

keradus commented Jan 23, 2019

Great to hear , love to see it working !

@keradus
Copy link
Member

keradus commented Jan 25, 2021

#4250 got merged, closing this topic for now

@keradus keradus closed this as completed Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants