Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequireTidyCode doesn't work correctly on files with utf8 #925

Closed
sopov opened this issue Sep 17, 2020 · 8 comments
Closed

RequireTidyCode doesn't work correctly on files with utf8 #925

sopov opened this issue Sep 17, 2020 · 8 comments

Comments

@sopov
Copy link

sopov commented Sep 17, 2020

We have two test files, one with a comment in ASCII and one with utf8-chars:

test.pl:

#!/usr/bin/perl
#
# test

and test8.pl with same content but instert to line with commentte >utf8 chars< st

#!/usr/bin/perl
#
# te тест st

Let's check the files with the only RequireTidyCode-policy:

iMac% perlcritic --single-policy=RequireTidyCode test.pl test8.pl
test.pl source OK
test8.pl:1:1 Code is not tidy (CodeLayout::RequireTidyCode)
iMac%
@petdance petdance changed the title Wrong works of CodeLayout::RequireTidyCode-policy for files with utf8 RequireTidyCode doesn't work correctly on files with utf8 Sep 17, 2020
@intrigeri
Copy link

intrigeri commented Nov 3, 2021

I suppose #876 will fix that.

@lechner
Copy link

lechner commented Dec 15, 2021

Hi,

Earlier versions of perltidy and perlcritic worked fine for years. For some reason, this issue appeared only recently (in Debian's 1.140-1).

The associated Debian bug breaks parts of Lintian's CI (on Salsa).

We do not enforce style for builds, but Perl::Critic is, well, critical for our project maintenance. It ensures that code submitted by a wide range of contributors adheres to a common standard. We implemented almost all policies we could find, and are working on more.

Would it be possible to restore the behavior from Debian's 1.138-2?

Thanks for your great software!

Kind regards
Felix Lechner

@petdance
Copy link
Member

petdance commented Dec 15, 2021

@lechner: We need to know more about what your problems are.

Do you have a minimal example of a file that used to pass but no longer does?

Are 1.138-1 and 1.140 the versions of Perl::Critic that you were using? What is different between 1.138-1 and stock 1.138? And from 1.140-2 and stock 1.140?

I'm looking in the change log and I don't see anything that sounds related to your problem.

1.140 Tue Mar 23 21:42:19 CDT 2021

    [New Features]
    Subroutines::RequireFinalReturn now lets you specify a
    terminal_methods parameter to specify methods that should been as
    terminal.  This is like the terminal_funcs parameter, but for methods.
    Thanks, Robin Smidsrød and Mike Bruins. (GH #920)


1.139_01 Tue Sep  1 23:52:18 CDT 2020

    [Fixes]
    Removed an extra /x in RequireInterpolationOfMetachars.pm that caused
    deprecation warnings in Perl 5.22 and higher.  (GH #822)

    Documentation::RequirePackageMatchesPodName now recognizes the package
    name if it's in C<< I<> >> or C<< B<> >> markup.  Thanks, Renée Bäcker.
    (GH #913)


1.138 Fri Jan 24 15:50:34 CST 2020

    Stable release.  No changes since 1.137_01.

There seem to be a lot of moving parts, and I don't know that Perl::Critic itself is what changed to cause your problem. Nor do I know that it isn't.

@gregoa
Copy link

gregoa commented Dec 16, 2021

@lechner
Copy link

lechner commented Dec 16, 2021

Hi,

I also see the issue when keeping perlcritic at 1.138-2 and upgrading only perltidy to version 20210717-1. Was the issue perhaps misfiled?

I used the test8.pl file from above:

➤ perlcritic --single-policy=RequireTidyCode test.pl
Code is not tidy at line 1, column 1.  See page 33 of PBP.  (Severity: 1)

Thank you!

Kind regards
Felix Lechner

@shancock9
Copy link

shancock9 commented Feb 23, 2022

The latest release of Perl::Tidy has a parameter --encode-output-strings (-eos) which can be set to fix this problem. The parameter was added to fix a similar problem between Code::TidyAll and Perl::Tidy; for more information see houseabsolute/perl-code-tidyall#84.

The problem is that when perltidy decodes text as utf8, it returns a decoded string by default. The new flag -eos tells it to return an encoded string, which is what perlcritic needs.

The reason that some users saw this problem arise at some point is that in Perl::Tidy release 20200619 the default encoding was changed from none to guess because the guess algorithm was found to be very reliable and convenient, especially when mixed file types are being processed. It meant that a user did not need to remember and set the encoding of each file. But this caused files such as test8 above to be identified and processed as utf8, and the output was then returned in decoded form to perlcritic, causing an error.

So a simple way to fix the problem would be to set enc='none', and this is the workaround that some tidyall users had been using. This will probably work as long as wide characters are restricted to comments and pod, since they are normally processed by perltidy line-by-line rather than character-by-character, but a more general fix is to update to the latest Perl::Tidy and set the -eos flag. This will allow Perl::Tidy to process files with international characters correctly and avoid any chance that it encounters an unexpected character.

To illustrate how it might be done in the code, so that individual users do not need to set this flag, I updated my version of
module RequireTidyCode.pm like this

    local @ARGV = qw(-nst -nse -nb);

    # Also add --encode-output-strings (-eos) for PT releases in 2022 and later
    # to tell perltidy that we want encoded character strings returned.  See
    # https://github.com/houseabsolute/perl-code-tidyall/issues/84
    # https://github.com/perltidy/perltidy/issues/83
    push @ARGV, '-eos' if ( $Perl::Tidy::VERSION > 20220101 );

With this update, the test case test8.pl did not give an error. The check on version number insures that this module will still work with older versions of Perl::Tidy, which would not recognize the new flag.

nschloe pushed a commit to live-clones/lintian that referenced this issue Jun 11, 2022
…ntil Perl-Critic/Perl-Critic#925 is fixed

This more or less reverts 1a3f6ee
(Replace "Copyright (C)" with the Unicode copyright symbol in our own
code for consistency.) for now.
nschloe pushed a commit to live-clones/lintian that referenced this issue Jun 11, 2022
nschloe pushed a commit to live-clones/lintian that referenced this issue Jun 11, 2022
…ic/Perl-Critic#925 is fixed

Actually all of them are copyright comment except for a single Unicode
SECTION SIGN character somewhere else.
nschloe pushed a commit to live-clones/lintian that referenced this issue Jun 11, 2022
…itic#925 is fixed

Lintian's own test t/scripts/01-critic/libs.t fails otherwise.
@shancock9
Copy link

shancock9 commented Jul 25, 2022

Updating to the most recent release of Perl::Tidy (v20220613 or later) should fix this problem. It has the above mentioned -eos flag set 'on' by default.

@sopov
Copy link
Author

sopov commented Jul 26, 2022

confirmed, fixed.

bash-3.2$ docker run -v $PWD:/app -w /app perl:5.36 bash -c 'cpanm Perl::Critic && perlcritic --single-policy=RequireTidyCode test.pl test8.pl'
--> Working on Perl::Critic
...
Successfully installed Perl-Tidy-20220613
...
Successfully installed Perl-Critic-1.140
43 distributions installed
test.pl source OK
test8.pl source OK
bash-3.2$ grep -n . test*.pl
test.pl:1:#!/usr/bin/perl
test.pl:2:#
test.pl:3:# test
test8.pl:1:#!/usr/bin/perl
test8.pl:2:#
test8.pl:3:# te тест st
bash-3.2$

@sopov sopov closed this as completed Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants