`mbtiny generate` writes files with double encoding to UTF-8 #36

Closed
zakame opened this Issue Oct 9, 2018 · 2 comments


zakame commented Oct 9, 2018

While working on gugod/Hijk#26 for the CPAN PRC, I noticed that the META.* files written by mbtiny generate contain bad output for Ævar's name. For example, in META.yml:

author:
  - 'Kang-min Liu <gugod@gugod.org>'
  - "Ã\x86var ArnfjÃ¶rÃ° Bjarmason <avar@cpan.org>"
  - 'Borislav Nikolov <jack@sofialondonmoskva.com>'
  - 'Damian Gryski <damian@gryski.com>'

I've confirmed that the source file lib/Hijk.pm is UTF-8 encoded:

$ file lib/Hijk.pm 
lib/Hijk.pm: Perl5 module source, UTF-8 Unicode text

and it has =encoding utf8 in its pod.
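
As a rough illustration of why the name comes out this way, here is a minimal sketch using only the core Encode module. It is not code from App::ModuleBuildTiny; the variable names are made up for the example. It encodes the name once and twice, dumps the bytes of the first character, and shows that decoding twice recovers the original.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The author's name as a Perl character string (code points, not bytes).
my $name = "\x{C6}var Arnfj\x{F6}r\x{F0} Bjarmason";

# Encoding once gives correct UTF-8. Encoding the result again treats each
# byte as a Latin-1 character and encodes it a second time.
my $once  = encode('UTF-8', $name);
my $twice = encode('UTF-8', $once);

# Hex dump of the bytes that represent the first character (U+00C6).
my $hex_once  = join ' ', map { sprintf '%02X', ord($_) } split //, substr($once,  0, 2);
my $hex_twice = join ' ', map { sprintf '%02X', ord($_) } split //, substr($twice, 0, 4);
print "once:  $hex_once\n";    # C3 86
print "twice: $hex_twice\n";   # C3 83 C2 86

# Decoding twice undoes the double encoding.
my $recovered = decode('UTF-8', decode('UTF-8', $twice));
print "recovered original\n" if $recovered eq $name;
```

The byte pair C3 83 is what a UTF-8 viewer renders as "Ã", which matches the pattern seen in the generated META.yml.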

Full context in gugod/Hijk#26 (review)

zakame commented Oct 9, 2018

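I isolated this to a double encode involving $dist->get_file, so the files are not written without any encoding, as I initially claimed. get_file already returns UTF-8 encoded bytes for generated files, and the caller then passes those bytes to write_text, which appears to encode them a second time: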

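# Caller: writes each generated file with write_text, which seems to apply
# its own UTF-8 encoding to whatever it is given.
my $dist = App::ModuleBuildTiny::Dist->new(regenerate => \%files);
for my $filename ($dist->files) {
    write_text($filename, $dist->get_file($filename)) if $dist->is_generated($filename);
}

# get_file: returns generated content already encoded as UTF-8 bytes.
sub get_file {
    my ($self, $filename) = @_;
    return if not exists $self->{files}{$filename};
    my $raw = $self->{files}{$filename};
    return $raw ? encode_utf8($raw) : read_binary($filename);
}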

  DB<22> x $filename
0  'META.json'
  DB<23> x $dist->{files}{$filename}
0  "{\cJ   \"abstract\" : \"Fast & minimal low-level HTTP client\",\cJ   \"author\" : [\cJ      \"Kang-min Liu <gugod\@gugod.org>\",\cJ      \"Ævar Arnfjörð Bjarmason <avar\@cpan.org>\",\cJ      \"Borislav Nikolov <jack\@sofialondonmoskva.com>\",\cJ      \"Damian Gryski <damian\@gryski.com>\"\cJ   ],\cJ   \"dynamic_config\" : 0,\cJ   \"generated_by\" : \"App::ModuleBuildTiny version 0.023\",\cJ   \"license\" : [\cJ      \"mit\"\cJ   ],\cJ   \"meta-spec\" : {\cJ      \"url\" : \"http://search.cpan.org/perldoc?CPAN::Meta::Spec\",\cJ      \"version\" : 2\cJ   },\cJ   \"name\" : \"Hijk\",\cJ   \"prereqs\" : {\cJ      \"configure\" : {\cJ         \"requires\" : {\cJ            \"Module::Build::Tiny\" : \"0\"\cJ         }\cJ      },\cJ      \"develop\" : {\cJ         \"requires\" : {\cJ            \"App::ModuleBuildTiny\" : \"0.023\"\cJ         }\cJ      },\cJ      \"runtime\" : {\cJ         \"requires\" : {\cJ            \"Time::HiRes\" : \"0\"\cJ         }\cJ      },\cJ      \"test\" : {\cJ         \"requires\" : {\cJ            \"HTTP::Server::Simple::PSGI\" : \"0\",\cJ            \"Net::Ping\" : \"2.41\",\cJ            \"Plack\" : \"0\",\cJ            \"Test::Exception\" : \"0\",\cJ            \"Test::More\" : \"0\"\cJ         }\cJ      }\cJ   },\cJ   \"provides\" : {\cJ      \"Hijk\" : {\cJ         \"file\" : \"lib/Hijk.pm\",\cJ         \"version\" : \"0.27\"\cJ      }\cJ   },\cJ   \"release_status\" : \"stable\",\cJ   \"resources\" : {\cJ      \"repository\" : {\cJ         \"type\" : \"git\",\cJ         \"url\" : \"https://github.com/gugod/Hijk.git\",\cJ         \"web\" : \"https://github.com/gugod/Hijk\"\cJ      }\cJ   },\cJ   \"version\" : \"0.27\",\cJ   \"x_serialization_backend\" : \"JSON::PP version 2.97001\",\cJ   \"x_static_install\" : \"1\"\cJ}\cJ"
  DB<24> x $dist->get_file($filename)
0  "{\cJ   \"abstract\" : \"Fast & minimal low-level HTTP client\",\cJ   \"author\" : [\cJ      \"Kang-min Liu <gugod\@gugod.org>\",\cJ      \"�var Arnfjörð Bjarmason <avar\@cpan.org>\",\cJ      \"Borislav Nikolov <jack\@sofialondonmoskva.com>\",\cJ      \"Damian Gryski <damian\@gryski.com>\"\cJ   ],\cJ   \"dynamic_config\" : 0,\cJ   \"generated_by\" : \"App::ModuleBuildTiny version 0.023\",\cJ   \"license\" : [\cJ      \"mit\"\cJ   ],\cJ   \"meta-spec\" : {\cJ      \"url\" : \"http://search.cpan.org/perldoc?CPAN::Meta::Spec\",\cJ      \"version\" : 2\cJ   },\cJ   \"name\" : \"Hijk\",\cJ   \"prereqs\" : {\cJ      \"configure\" : {\cJ         \"requires\" : {\cJ            \"Module::Build::Tiny\" : \"0\"\cJ         }\cJ      },\cJ      \"develop\" : {\cJ         \"requires\" : {\cJ            \"App::ModuleBuildTiny\" : \"0.023\"\cJ         }\cJ      },\cJ      \"runtime\" : {\cJ         \"requires\" : {\cJ            \"Time::HiRes\" : \"0\"\cJ         }\cJ      },\cJ      \"test\" : {\cJ         \"requires\" : {\cJ            \"HTTP::Server::Simple::PSGI\" : \"0\",\cJ            \"Net::Ping\" : \"2.41\",\cJ            \"Plack\" : \"0\",\cJ            \"Test::Exception\" : \"0\",\cJ            \"Test::More\" : \"0\"\cJ         }\cJ      }\cJ   },\cJ   \"provides\" : {\cJ      \"Hijk\" : {\cJ         \"file\" : \"lib/Hijk.pm\",\cJ         \"version\" : \"0.27\"\cJ      }\cJ   },\cJ   \"release_status\" : \"stable\",\cJ   \"resources\" : {\cJ      \"repository\" : {\cJ         \"type\" : \"git\",\cJ         \"url\" : \"https://github.com/gugod/Hijk.git\",\cJ         \"web\" : \"https://github.com/gugod/Hijk\"\cJ      }\cJ   },\cJ   \"version\" : \"0.27\",\cJ   \"x_serialization_backend\" : \"JSON::PP version 2.97001\",\cJ   \"x_static_install\" : \"1\"\cJ}\cJ"
  DB<25> 
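
The interaction above can be reproduced in isolation. The following is only a sketch under assumptions: get_file_sketch and write_text_sketch are hypothetical stand-ins, not the real App::ModuleBuildTiny or write_text code, and the writer uses an in-memory handle with an :encoding(UTF-8) layer to imitate a text writer that encodes on output.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical stand-in for get_file: returns already-encoded UTF-8 bytes.
sub get_file_sketch {
    my ($content) = @_;
    return encode_utf8($content);
}

# Hypothetical stand-in for a text writer that applies its own UTF-8 layer.
# It writes to an in-memory buffer and returns the bytes that would reach disk.
sub write_text_sketch {
    my ($content) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer or die "Could not open buffer: $!";
    print {$fh} $content;
    close $fh or die "Could not close buffer: $!";
    return $buffer;
}

my $author = "\x{C6}var";   # four characters

my $double = write_text_sketch(get_file_sketch($author));   # encoded twice
my $single = write_text_sketch($author);                    # encoded once

printf "double: %d bytes\n", length $double;   # 7
printf "single: %d bytes\n", length $single;   # 5
```

Passing the character string straight to the text writer gives the expected 5 bytes; routing it through the pre-encoding step first inflates the first character from 2 bytes to 4.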

It seems the correct way would be something along the lines of:

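if ($dist->is_generated($filename)) {
	# Encode exactly once, then write the bytes through a raw handle.
	open my $fh, '>:raw', $filename or die "Could not generate $filename: $!";
	print {$fh} Encode::encode('UTF-8', $dist->{files}{$filename});
	close $fh or die "Could not write $filename: $!";
}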
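
One way to check an approach like this is a round trip: encode once, write raw bytes, read them back, and decode once. The sketch below uses only core modules and a temporary file; the sample content and variable names are assumptions made for illustration, not taken from mbtiny.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Sample generated content as a character string (illustrative only).
my $content = qq({"author":["\x{C6}var Arnfj\x{F6}r\x{F0} Bjarmason"]}\n);

# Encode exactly once, then write the bytes through a raw handle.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} encode('UTF-8', $content);
close $out or die "Could not close $path: $!";

# Read the bytes back and decode once; a correctly written file round-trips.
open my $in, '<:raw', $path or die "Could not read $path: $!";
my $bytes = do { local $/; <$in> };
close $in or die "Could not close $path: $!";

my $roundtrip = decode('UTF-8', $bytes);
print $roundtrip eq $content ? "round trip ok\n" : "mismatch\n";
```

If the file had been encoded twice, a single decode would yield a string that differs from the original, so the comparison would report a mismatch.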

@zakame zakame changed the title from `mbtiny generate` writes files without setting encoding to `mbtiny generate` writes files with double encoding to UTF-8 Oct 9, 2018

Owner

Leont commented Oct 12, 2018

This is fixed in 0.024

@Leont Leont closed this Oct 12, 2018
