Easy way to turn on utf8 encoding #374

Closed
schwern opened this issue Apr 23, 2013 · 11 comments

@schwern
Contributor

schwern commented Apr 23, 2013

Provide an easy way to turn on UTF8 filehandle encoding both as a Test::Builder method and an import option.

# Would add ":encoding(utf8)" to output, todo_output and failure_output.
use Test::More utf8 => 1;

Test::Builder->new->add_io_layers(@layers);

This makes life easier for people using UTF8 rather than having to mess with use open and do it at the right time.
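For comparison, the manual approach that this proposal would shorten can be sketched as below. It is a minimal sketch, assuming a Test::More whose builder exposes `output`, `failure_output` and `todo_output` as filehandle accessors (the three handles named above); the test description is made up for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Put the same encoding layer on all three Test::Builder handles.
my $builder = Test::More->builder;
binmode($builder->$_(), ':encoding(UTF-8)')
    for qw(output failure_output todo_output);

# A description containing characters above 0x7F (illustrative only).
ok 1, "caf\x{E9} \x{263A}";
done_testing;
```

The layers have to be set before the first test result is printed, which is the "do it at the right time" part of the problem.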

@dagolden

Now that we can target 5.8.1 as a minimum, can we just make UTF-8 the default for the Test::Builder handles now?

@schwern
Contributor Author

schwern commented Sep 10, 2013

I think this is a good idea on first reflection.

Two concerns: 1) backwards compatibility and 2) the assumption that UTF-8 is the correct default.

To the first, folks shouldn't be doing things like shoving images down those pipes; they're probably shoving text into a raw binary pipe. I don't think switching on UTF-8 will be any worse than it is now. I'm a bit concerned about things like people who are currently using Latin-1 without changing the encoding... but I'm not really an encoding person so this is faffing.

To the second, UTF-8 is probably the best default we can come up with. It's gotta be better than a raw byte pipe.

@dagolden

I think that if people are intentionally doing Latin1 or whatever then they've probably already set the handles how they need it. What the UTF-8 default does is allow perl's internal byte representation through without complaining about wide characters.

Put differently, currently, it "works" with wide character warnings (whether or not it displays right on the user's terminal), but making UTF-8 the default means the wide character warnings would go away.
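The difference can be seen in isolation with two in-memory filehandles. This is a small sketch, not Test::Builder code: the variable names are illustrative, and the in-memory handles stand in for the real output handles.

```perl
use strict;
use warnings;

my $smiley = "\x{263A}";    # one character above 0xFF

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 1) Handle without an encoding layer: perl writes its internal UTF-8
#    bytes and emits a "Wide character in print" warning.
open my $raw, '>', \my $raw_buf or die "open failed: $!";
print {$raw} $smiley;
close $raw;

# 2) Handle with a UTF-8 layer: the same bytes come out, with no warning.
open my $enc, '>:encoding(UTF-8)', \my $enc_buf or die "open failed: $!";
print {$enc} $smiley;
close $enc;

printf "warnings: %d, same bytes: %s\n",
    scalar @warnings, ($raw_buf eq $enc_buf ? 'yes' : 'no');
```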

@cpansprout
Contributor

But it would also mangle the output for those of us who explicitly encode stuff before printing it out, to avoid the whole encoded handle issue.
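The mangling described here is double encoding, and it can be reproduced without Test::Builder. The sketch below assumes only the core Encode module; the in-memory handle stands in for an output handle that has had a UTF-8 layer switched on by default.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{E9}";                  # "é" as a one-character string
my $bytes = encode('UTF-8', $text);    # explicitly encoded: 0xC3 0xA9

# Print the already-encoded bytes through a handle that encodes again.
open my $fh, '>:encoding(UTF-8)', \my $buf or die "open failed: $!";
print {$fh} $bytes;
close $fh;

# Each input byte was treated as a character and encoded a second time.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $buf;
print "$hex\n";    # prints "C3 83 C2 A9" instead of "C3 A9"
```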

@dagolden

@cpansprout I'd wager that's an even smaller percentage of authors than those who manually set their handles to UTF-8. I suspect the overall impact would be less than hash randomization was.

But, hey, someone could smoke CPAN and see what regresses.

@Astara

Astara commented Nov 11, 2013

I had to bump around in the dark to get utf8 to work and eventually ended up with a BAIL_OUT if utf8 had been included in PERL5OPT (-Mutf8). There didn't seem to be any way to fix the problems.
I either ended up with my UTF-8 encoded string being treated as latin1 (so individual bytes of the UTF-8 string were printed as latin1) OR I ended up with the string "upgraded" [sic], so each byte in the UTF-8 string was encoded again into UTF-8. Neither was great.

There seems to be no way of having -Mutf8 in your PERL5OPT and making Test::More work with UTF8 progs...

Might have to test with a shell wrapper, before the test prog is called, or use some other 'harness'...

So just wanted to add a 'me-too', to this problem as it is very difficult to work around.

@dagolden

I've since come across Term::Encoding and think maybe setting encoding to whatever is the terminal default might be very handy. Either Term::Encoding could be a prereq or it could just be inlined.
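A rough sketch of that idea, assuming Term::Encoding's `get_encoding` function and the Test::Builder handle accessors. The helper name `terminal_encoding` and the UTF-8 fallback are assumptions made for illustration, not anything agreed in this thread.

```perl
use strict;
use warnings;
use Encode ();
use Test::More;

# Hypothetical helper: the terminal's encoding name, or 'UTF-8' when
# Term::Encoding is not installed or reports a name Encode does not know.
sub terminal_encoding {
    my $name = eval { require Term::Encoding; Term::Encoding::get_encoding() };
    return 'UTF-8' unless defined $name && Encode::find_encoding($name);
    return $name;
}

my $encoding = terminal_encoding();

# Apply the detected encoding to the three Test::Builder handles.
my $builder = Test::More->builder;
binmode($builder->$_(), ":encoding($encoding)")
    for qw(output failure_output todo_output);

ok 1, 'handles use the terminal encoding';
done_testing;
```

Loading Term::Encoding inside `eval` keeps it optional rather than a hard prerequisite; inlining it, as suggested above, would remove the need for the fallback on a missing module.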

@Astara

Astara commented Nov 11, 2013

On 11/10/2013 5:58 PM, David Golden wrote:

I've since come across Term::Encoding and think maybe setting encoding
to whatever is the terminal default might be very handy. Either
Term::Encoding could be a prereq or it could just be inlined.

You can test the problem by downloading P-1.1.4.tar.gz from CPAN
(P-1.1.5.tar.gz has an attempted fix where I call a shell script to set ENV
before calling any test progs -- both to unset utf8 in the ENV, AND to
set both the PATH and PERL5LIB values that should (crossing fingers) work).

(hey, "it works for me"... snort... ;-))

But to see the utf8 error, run the make test part with PERL5OPT="-Mutf8"
in your ENV.

I think that's the trigger. (I have PERL5OPT="-Mutf8 -CSA" in mine,
BTW/perl5.16.2).

@exodist
Member

exodist commented Oct 30, 2014

The latest alphas have encoding support baked in. The easy way to specify an encoding is:

use Test::Stream encoding => 'WHATEVER';
or:
use Test::Stream 'utf8';

The toolchain group has convinced me to add as few new features directly to Test::More as possible. In the alphas, Test::Stream is the underlying heart of Test::More and Test::Builder based tools, so if you have a new enough version of Test-*, Test::Stream is present and usable. It is also the interface for turning off some legacy things, and turning on fork support.

The encoding work is not easily backported to stable, so I am going to close this ticket as fixed by the alphas.

I did not read this thread in depth, if I missed something necessary please re-open and spell it out for me :-)

@exodist exodist closed this as completed Oct 30, 2014
@karenetheridge
Member

@exodist the feature that would be good to have here was @dagolden's suggestion - #374 (comment) - default to the current Term::Encoding (it would have to be inlined, or we core this module).

@exodist exodist modified the milestone: Backlog Nov 22, 2014
@exodist
Member

exodist commented May 3, 2016

The Test2 stuff makes it very easy to do this as a Test2::Plugin::TermEncoding or similar. No need to make this part of Test-Simple.
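What such a plugin might look like is sketched below. This is an illustration only, not a released module: the package name is taken from the comment above, the approach assumes the Test2::API functions `test2_add_callback_post_load` and `test2_stack` plus a formatter that offers an `encoding` method, and the UTF-8 fallback is an added assumption.

```perl
use strict;
use warnings;

{
    package Test2::Plugin::TermEncoding;    # hypothetical plugin
    use Encode ();
    use Test2::API qw(test2_add_callback_post_load test2_stack);

    sub import {
        # Terminal encoding if Term::Encoding can tell us, else UTF-8.
        my $name = eval { require Term::Encoding; Term::Encoding::get_encoding() };
        $name = 'UTF-8' unless defined $name && Encode::find_encoding($name);

        # Once Test2 has finished loading, set the encoding on every
        # hub's formatter that supports it.
        test2_add_callback_post_load(sub {
            test2_stack->top;    # make sure at least one hub exists
            for my $hub (test2_stack->all) {
                my $format = $hub->format or next;
                next unless $format->can('encoding');
                $format->encoding($name);
            }
        });
    }
}

package main;
use Test::More;

# In a real distribution this would be "use Test2::Plugin::TermEncoding;".
Test2::Plugin::TermEncoding->import;

ok 1, 'formatter uses the terminal encoding';
done_testing;
```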

@exodist exodist closed this as completed May 3, 2016