Encode::MIME::Header clean up #68

Merged
merged 22 commits into from Oct 26, 2016

pali (Contributor) commented Oct 21, 2016

This patch series cleans up and refactors the Encode::MIME::Header module.

New features:

  • support for CHECK flags in Encode::MIME::Header
  • new function Encode::find_mime_encoding() which returns the encoding object specified by its MIME name
  • extended test cases
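
As a rough illustration of the new lookup function, the sketch below asks for encoding objects by MIME name only. It assumes an Encode version that already ships Encode::find_mime_encoding(); the charset name x-no-such-charset is made up for the example and is not expected to resolve.

```perl
use strict;
use warnings;
use Encode ();

# Look up encoding objects by MIME charset name. A name that is not a
# known MIME name yields undef instead of an object.
for my $mime_name ('UTF-8', 'ISO-8859-1', 'x-no-such-charset') {
    my $enc = Encode::find_mime_encoding($mime_name);
    if (defined $enc) {
        printf "%s -> %s\n", $mime_name, $enc->name;
    }
    else {
        print "$mime_name -> (not found)\n";
    }
}
```

Unlike find_encoding(), this lookup is described as ignoring Perl-specific aliases, so a caller that also wants aliases would have to fall back to find_encoding() explicitly.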

Changes:

  • strict UTF-8 encoding is now used in Encode::MIME::Header
  • invalid UTF-8 sequences are replaced by the Unicode replacement character (U+FFFD)
  • no more croak on an invalid input string when called with FB_DEFAULT
  • fix for https://rt.cpan.org/Public/Bug/Display.html?id=114034
  • rewritten Encode::MIME::Header POD documentation

Please check whether you agree with the changes to the POD documentation. If something needs to be extended or fixed, let me know.

pali added 22 commits October 6, 2016 21:48
…block

Otherwise function prototypes will be totally ignored and our variables will not be
parsed correctly. This is also the reason why perl throws the warning message:

Name "Encode::MIME::Header::STRICT_DECODE" used only once: possible typo
…E words

Broken email clients (like Thunderbird 38) sometimes do not encode spaces,
so they appear in MIME words. To make sure that Encode::MIME::Header
can decode emails generated by Thunderbird, allow spaces in the
non-strict mode of the decoder.
Fix the functions which calculate the length of an encoded MIME word; the formula for
base64 was incorrect. Also do not call the bytes::length() function, because
using it is the wrong approach. Instead, encode the string to UTF-8 and then
count bytes with the standard length().
It acts the same as find_encoding() but searches only for an object with the given MIME name.
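
A minimal sketch of the length calculation described above, assuming the usual base64 rule that every group of up to 3 bytes becomes 4 characters. The helper names are hypothetical and are not taken from the module's source.

```perl
use strict;
use warnings;
use Encode ();

# Number of bytes the string occupies once encoded as UTF-8.
# This replaces any use of bytes::length().
sub utf8_byte_length {
    my ($str) = @_;
    return length(Encode::encode('UTF-8', $str));
}

# Length of the base64 text for $n input bytes:
# each group of 3 bytes (rounded up) becomes 4 characters.
sub base64_length {
    my ($n) = @_;
    return 4 * int(($n + 2) / 3);
}

# Total length of one "B" encoded-word: =?charset?B?...?=
sub b_encoded_word_length {
    my ($charset, $str) = @_;
    return length("=?$charset?B?")
         + base64_length(utf8_byte_length($str))
         + length('?=');
}

# U+00A5 takes 2 bytes in UTF-8, so the text is 6 bytes -> 8 base64 characters.
print b_encoded_word_length('UTF-8', "\x{A5}Test"), "\n";   # prints 20
```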
Address https://rt.cpan.org/Public/Bug/Display.html?id=114034
* Use Encode::find_mime_encoding() for retrieving encoding object needed for inner string
* Always use strict UTF-8 encoder and decoder for inner strings
* Decoding inner strings is not called with flag Encode::FB_PERLQQ anymore
* Propagate check flags from MIME encoder/decoder to inner string encoder/decoder

The previous behaviour with Encode::FB_PERLQQ can be achieved by passing the
Encode::FB_PERLQQ check flag to decode().
Modification of the input string (when requested by flags) is also implemented.
The implementation of the encoder should now match the behaviour described in
the Encode module documentation.
…ort for check flags

Now the decoder fully respects check flags, so it no longer croaks when
it is called without any flag (or with the Encode::FB_DEFAULT flag) and the
input string is invalid or uses an incorrect charset. This means that the implementation
of the MIME decoder should match the behaviour described in the Encode module
documentation.

Invalid and unknown MIME words are not touched; they stay in the decoded
output as they were in the input.

Previous behaviour (croak on unknown charset) can be achieved by passing
Encode::FB_CROAK check flag to decode().
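
The difference between the two check modes can be sketched as follows. The charset name in the sample header is made up so that the lookup fails; the expected behaviour follows the description above and assumes an Encode version that includes this patch series.

```perl
use strict;
use warnings;
use Encode ();

# An encoded-word with a charset name Encode does not know
# (invented for illustration).
my $header = '=?x-no-such-charset?B?SGVsbG8=?=';

# Default check (FB_DEFAULT): no exception is raised, and the unknown
# MIME word is expected to stay in the output as it was on input.
my $lenient = Encode::decode('MIME-Header', $header);
print "default: $lenient\n";

# FB_CROAK: the same input raises an exception, captured here by eval.
# A copy is passed because decode() may modify its argument when a
# check flag is set.
my $copy   = $header;
my $strict = eval { Encode::decode('MIME-Header', $copy, Encode::FB_CROAK) };
print defined $strict ? "croak mode: decoded\n" : "croak mode: died: $@";
```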
Perl versions prior to 5.9.4 have a recursive regular expression engine,
which causes a stack overflow.
Basically I rewrote the whole module, so add me as an author of it.
Add examples and describe what the module is doing.
…ded-word as 76 bytes

The MIME::Base64 module has its own hardcoded limit of at most 76 bytes per
string, which cannot be changed. Longer strings are first split and then
each part is encoded separately. Therefore, for Encode::MIME::Header to
work correctly, MIME encoded-words cannot be larger than
76 bytes.
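
The 76-character line limit is easy to observe with MIME::Base64 itself. This small sketch only demonstrates the default splitting behaviour of encode_base64(); it does not reproduce how Encode::MIME::Header builds its encoded-words.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use List::Util qw(max);

# 100 input bytes become 136 base64 characters, which encode_base64()
# breaks into lines of at most 76 characters by default.
my $encoded = encode_base64('x' x 100);
my @lines   = split /\n/, $encoded;

printf "%d lines, longest line: %d characters\n",
    scalar(@lines), max(map { length($_) } @lines);   # 2 lines, longest 76
```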
dankogai (Owner)

Thank you!

jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Nov 28, 2016
Upstream changes:
$Revision: 2.87 $ $Date: 2016/10/28 05:03:52 $
! Encode.xs t/taint.t
  Pulled: Disable _utf8_on and _utf8_off for tainted values
  dankogai/p5-encode#74
! Encode.xs MANIFEST t/rt65541.t t/rt76824.t t/rt86327.t
  Pulled: Fix crash 'panic: sv_setpvn called with negative strlen'
  dankogai/p5-encode#73
! Encode.xs MANIFEST t/rt113164.t
  Pulled: Fix crash caused by undefined behaviour between
  two sequence points
  dankogai/p5-encode#72
! Encode.xs  MANIFEST lib/Encode/CN/HZ.pm lib/Encode/Encoder.pm
  t/decode.t t/magic.t t/rt85489.t t/utf8ref.t
  Pulled: Fix handling of undef, ref, typeglob, UTF8, COW and magic
  scalar argument in all XS functions
  dankogai/p5-encode#70
! Encode/_T.e2x t/at-cn.t t/at-tw.t t/enc_data.t t/enc_module.t
  t/encoding-locale.t t/encoding.t t/jperl.t t/mime-name.t t/undef.t
  Pulled: Fix unit tests
  dankogai/p5-encode#69
! Encode.pm lib/Encode/MIME/Header.pm lib/Encode/MIME/Name.pm
  t/mime-header.t t/mime-name.t t/taint.t
  Pulled: Encode::MIME::Header clean up
  dankogai/p5-encode#68
! Encode.xs
  Pulled: Generate CHECK value functions with newCONSTSUB()
    instead with direct XS
  dankogai/p5-encode#67
! Encode.xs
  Pulled: Encode::utf8: Fix count of replacement characters
  for overflowed and overlong UTF-8 sequences
  dankogai/p5-encode#65
! Encode.xs t/fallback.t t/utf8strict.t
  Pulled: Encode::utf8: Fix processing invalid UTF-8 subsequences
  dankogai/p5-encode#63
! Encode.pm t/utf8ref.t
  Pulled: Fix return value of Encode::encode_utf8(undef)
  https://rt.cpan.org/Ticket/Display.html?id=116904
  dankogai/p5-encode#62

amit777 commented Feb 15, 2017

I found a difference in the way perl 5.20.3 and 5.24.1 are encoding a test string that's being used in an email From: header. It's not clear why this is breaking some downstream processing of email (which may be a separate bug), however I'm curious whether both outputs are technically correct:

# This code prints different output on Perl 5.20.3 and 5.24.1.
use Encode;

my $perl_str = '¥Test User <testuser@example.com>';
my $encoded_str = encode("MIME-Header", $perl_str);
print $encoded_str;

# 5.24.1 output: =?UTF-8?B?w4LCpVRlc3QgVXNlciA8dGVzdHVzZXJAZXhhbXBsZS5jb20+?=
# 5.20.3 output: =?UTF-8?B?w4LCpVRlc3QgVXNlciA=?=<testuser@example.com>
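
A side observation, offered as an inference from the bytes rather than something stated in the thread: decoding the base64 payload from the 5.20.3 output shows the yen sign stored as four bytes instead of two. That is consistent with the script lacking `use utf8`, so that the two source bytes of ¥ were read as two separate Latin-1 characters before encoding. The sketch below only inspects the reported payload.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode ();

# Base64 payload copied from the 5.20.3 output above.
my $payload = 'w4LCpVRlc3QgVXNlciA=';
my $bytes   = decode_base64($payload);

# Hex dump of the raw bytes: the text starts with c3 82 c2 a5.
print unpack('H*', $bytes), "\n";   # c382c2a554657374205573657220

# Decoded as UTF-8, the first two characters are U+00C2 and U+00A5,
# not a single U+00A5.
my $chars = Encode::decode('UTF-8', $bytes);
printf "U+%04X U+%04X\n", ord(substr($chars, 0, 1)), ord(substr($chars, 1, 1));
```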


amit777 commented Feb 15, 2017

I was reading through the RFC http://www.faqs.org/rfcs/rfc2047.html and it seems like the definition of "encoded-word" may not be adhered to in this rewrite of the module. Is this the correct place to post comments on this or should I file a github issue? I don't interact with perl much so I'm not sure what the best way to flag an issue is.

pali (Contributor, Author) commented Feb 15, 2017

@amit777 The output =?UTF-8?B?w4LCpVRlc3QgVXNlciA=?=<testuser@example.com> is incorrect because, per RFC 2047, there must be a space between ...A=?= and <testu...>. The output from perl 5.24.1 is technically correct per RFC 2047, but not suitable for a From header. Because the From header is structured and has its own grammar, a generic module like this MIME-Header one cannot be used for it. If you look at the updated documentation for the Encode::MIME::Header module, you will see that the module is meant for unstructured email headers, i.e. the RFC 822 'text' token. Note that it is not possible to write a "generic" module that works for any structured email header, because the module itself does not know which structure the whole header should be encoded according to.

So if you want to MIME-encode a From header (or To/Cc/Bcc), you first need to split it into RFC 822 'text' tokens. Then MIME-encode each text token that RFC 2047 allows to be encoded, and after that combine the output into one string.

use utf8;
use Encode;

my $name = '¥Test User';
my $address = 'testuser@example.com';
my $encoded = encode('MIME-Header', $name) . " <$address>";

The token representing the address cannot be MIME-encoded, so some check should be used, e.g. that it contains only ASCII characters. Better still, validate the email address, but be careful! The grammar for email addresses is strange; see RFC 2822 for it.
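
The two steps above (encode only the display name, keep the address as plain ASCII) can be combined into a small helper. This is only a sketch: format_mailbox is a hypothetical name, and the ASCII test is a minimal sanity check, not a validation of the RFC 2822 address grammar.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: MIME-encode the display name only and append the
# address unchanged, separated by a space as RFC 2047 requires.
sub format_mailbox {
    my ($name, $address) = @_;

    # The address token must not be MIME-encoded, so refuse anything
    # that is not pure ASCII instead of silently encoding it.
    die "address must contain ASCII characters only\n"
        if $address =~ /[^\x00-\x7F]/;

    return Encode::encode('MIME-Header', $name) . " <$address>";
}

print format_mailbox("\x{A5}Test User", 'testuser@example.com'), "\n";
```

For anything beyond a single name and address pair, a dedicated address module is the safer choice.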

Btw, if you think there should be a module which encodes the From header correctly, then look at my patches for Email::MIME (rjbs/Email-MIME#35) and my Email::Address::XS module (https://github.com/pali/Email-Address-XS). Note that Email::Address is broken.

I hope this helps you understand the whole problem with From/To/Cc/... headers and how terribly broken MIME-Header encoding was prior to Encode 2.83 (https://metacpan.org/pod/Encode::MIME::Header#BUGS). Emails generated by old perl versions were simply broken and could not be parsed by RFC 2047 compliant parsers. That is not acceptable, and it is better to break compatibility in order to fix these problems than to stay with a nonsensical, broken encoder.


amit777 commented Feb 15, 2017 via email

pali (Contributor, Author) commented Feb 15, 2017

If you need to construct strings for From/To/Cc/... headers, look at my Email::Address::XS module (https://github.com/pali/Email-Address-XS). It is not on CPAN yet, but I would like to hear some feedback about it (whether it is really useful for users!). It provides everything needed and should be fast and RFC 2822 correct. MIME-encoding of the /phrase/ needs to be done manually, but you can pass it via encode:

my $email = Email::Address::XS->new(phrase => encode('MIME-Header', $phrase), address => $address);
my $value = $email->format();


amit777 commented Feb 15, 2017 via email


amit777 commented Feb 15, 2017

I was able to install your module in my perlbrew environment. I will test it and report back whether it works for me. Thank you!


amit777 commented Feb 16, 2017

It looks like Email::Address::XS works well for me. I don't have extensive test cases around it, but it seems to work as a drop-in replacement for Email::Address. Would love to see this on CPAN! Thanks again.

pali (Contributor, Author) commented Feb 18, 2017

I made some final fixes to Email::Address::XS and now it is on CPAN: https://metacpan.org/pod/Email::Address::XS

halstead pushed a commit to openembedded/meta-openembedded that referenced this pull request Feb 11, 2018
* Fix RDEPENDS
* RCONFLICTS with perl-misc
* LIC_FILES_CHKSUM is based on META.json, which has changed
  but license remains the same

Changes:

2.94 2018/01/09 05:53:00
! lib/Encode/Alias.pm
  Fixed: deep recursion in Encode::find_encoding when decoding
  bad MIME header
  dankogai/p5-encode#127
! Encode.pm
  Pulled: Include more information about Encode::is_utf8() that it
  should not be normally used
  dankogai/p5-encode#126
  Pulled: Remove misleading documentation about UTF8 flag
  dankogai/p5-encode#125

2.93 2017/10/06 22:21:53
! lib/Encode/MIME/Name.pm t/mime-name.t
  Pulled: Add "euc-cn" => "EUC-CN" alias to Encode::MIME::Name
  dankogai/p5-encode#124
! encoding.pm
  Pulled: Propagate fatal errors from the encoding pragma back to the caller
  Resolves rt #100427
  dankogai/p5-encode#123
  https://rt.cpan.org/Ticket/Display.html?id=100427
! lib/Encode/CN/HZ.pm lib/Encode/JP/JIS7.pm lib/Encode/MIME/Header.pm
  t/decode.t
  Pulled: Uninitialized value fixes #122
  dankogai/p5-encode#122
! Makefile.PL
  Pulled: Fix -Werror=declaration-after-statement for gcc 4.1.2
  dankogai/p5-encode#121

2.92 2017/07/18 07:15:29
! Encode.pm  MANIFEST lib/Encode/Alias.pm
+ t/use-Encode-Alias.t
  Pulled: Fix loading Encode::Alias before Encode
  dankogai/p5-encode#118
! Makefile.PL
  Pulled: Fix gccversion Argument "630 20170516" isn't numeric
   dankogai/p5-encode#118
! lib/Encode/MIME/Header.pm t/mime-header.t
  Pulled: Encode::MIME::Header: Fix parsing quoted-printable text
    in strict mode
  dankogai/p5-encode#115
! Encode.pm
  use define_encoding() instead of tweaking $Encode::Encoding{utf8}.
  dankogai/p5-encode@208d094#commitcomment-22698036

2.91 2017/06/22 08:11:05
! Encode.pm
  Addressed: RT#122167: use parent q{Encode::Encoding}; fails:
    Can't locate object
  https://rt.cpan.org/Ticket/Display.html?id=122167
! Makefile.PL
  Pulled: fix gcc warnings for older gcc < 4.0
  dankogai/p5-encode#114

2.90 2017/06/10 17:23:50
! Makefile.PL
  Pulled: Include all contributors into META
  dankogai/p5-encode#111
! bin/enc2xs bin/ucmlint encoding.pm
  lib/Encode/Encoding.pm lib/Encode/GSM0338.pm t/CJKT.t
  Pulled: Where possible do not depend on value of $@,
    instead use return value of eval
  dankogai/p5-encode#110
! Encode.xs
  Pulled: Fix more XS problems in Encode.xs file
  dankogai/p5-encode#109
! encoding.pm lib/Encode/Encoding.pm t/guess.t
  Pulled: Small fixes
  dankogai/p5-encode#108
! Encode.pm Makefile.PL
  Pulled: Load modules Encode::MIME::Name and Storable normally
  dankogai/p5-encode#107
! Unicode/Unicode.pm lib/Encode/Alias.pm lib/Encode/Encoding.pm
  lib/Encode/Unicode/UTF7.pm
  Pulled: Remove no warnings 'redefine'; and correctly loaddependences
  dankogai/p5-encode#106
! Encode.pm Encode.xs Unicode/Unicode.pm Unicode/Unicode.xs
  Pulled: Remove PP stubs and reformat predefine_encodings()
  dankogai/p5-encode#104
! Encode.pm Encode.xs
  Pulled: Run Encode XS BOOT code at compile time
  dankogai/p5-encode#103
! Encode.pm Unicode/Unicode.pm lib/Encode/Encoding.pm
  lib/Encode/Guess.pm lib/Encode/JP/JIS7.pm lib/Encode/MIME/Header.pm
  lib/Encode/MIME/Header/ISO_2022_JP.pm
  Pulled: Use Encode::define_encoding and propagate carp/croak message
  dankogai/p5-encode#102
! t/truncated_utf8.t t/utf8messages.t
  Pulled: Fixes for older perl versions
  dankogai/p5-encode#101
! Encode.xs encoding.pm t/enc_eucjp.t t/enc_utf8.t
  Pulled: cperl fixes: encoding undeprecated, no strict hashpairs
  dankogai/p5-encode#100
! MANIFEST
  Pulled: Add missing tests into MANIFEST file
  dankogai/p5-encode#99
! Encode.xs t/fallback.t
  Pulled: Cleanup code for handling fallback/replacement characters
  dankogai/p5-encode#98

2.89 2017/04/21 05:20:14
! Encode.pm Encode.xs MANIFEST t/enc_eucjp.t t/enc_utf8.t
+ t/utf8messages.t
  Pulled: Fixes for Encode::utf8
  dankogai/p5-encode#97
! Encode.pm
  Pulled: Fix documentation about CHECK coderef
  dankogai/p5-encode#96
! Encode.xs
  Pulled: For efficiency use newSVpvn() instead of newSVpv()
    in do_fallback_cb()
  dankogai/p5-encode#95
! Encode.xs
  Pulled Call Encode callback function with integer argument correctly
  dankogai/p5-encode#94
! lib/Encode/CN/HZ.pm lib/Encode/GSM0338.pm lib/Encode/JP/JIS7.pm
  lib/Encode/KR/2022_KR.pm lib/Encode/MIME/Header.pm
  lib/Encode/MIME/Header/ISO_2022_JP.pm lib/Encode/Unicode/UTF7.pm
  t/undef.t
  Pulled: Fix all Encode modules so their encode(undef) and decode(undef)
    calls returns undef
  dankogai/p5-encode#93
+ t/whatwg-aliases.json t/whatwg-aliases.t
  Pulled: New (failing) tests for aliases defined in WHATWG Encoding spec #92
  dankogai/p5-encode#92
! Encode.pm
  Pulled: Update documentation for UTF-8
  dankogai/p5-encode#91
! Encode.xs t/truncated_utf8.t
  Pulled: Consume correct number of bytes on malformed
! Encode.pm Unicode/Unicode.pm
  Pulled: document str2bytes and bytes2str
  dankogai/p5-encode#86
! Encode.xs t/fallback.t t/truncated_utf8.t
  Pulled: Fix appending correct number of Unicode replacement characters
  dankogai/p5-encode#84

2.88 2016/11/29 23:29:23
! t/taint.t
  Pulled: Fix test t/taint.t to pass when Encode::ConfigLocal is present
  dankogai/p5-encode#83
! Makefile.PL Unicode/Makefile.PL bin/enc2xs lib/Encode/Alias.pm
  t/Aliases.t t/enc_data.t t/enc_module.t t/encoding.t t/jperl.t
  Pulled: various fixes
  dankogai/p5-encode#82
! t/mime-header.t
  Pulled: Fix test t/mime-header.t to pass on HP-UX 11.23/64 U
    with perl v5.8.3
  dankogai/p5-encode#81
! t/Encode.t
  Pulled: Extend COW tests for UTF-8 and Latin1
  dankogai/p5-encode#80
! Encode.xs Unicode/Unicode.xs
  Pulled: Rmv impediment to compiling under C++11
  dankogai/p5-encode#78
! Encode.xs Unicode/Unicode.xs
  Pulled: Do not use expressions in macros SvTRUE, SvPV, SvIV,
    attr and attr_true
  dankogai/p5-encode#77
! Unicode/Unicode.xs t/magic.t
  Pulled: Fix handling of undef, COW and magic scalar argument
    in Unicode.xs
  dankogai/p5-encode#76
! Encode.xs encoding.pm
  Fix 2 of 3 problems Steve Hay found.
  1. C89 compiler failures (patch attached).
  2. encoding.pm has changed slightly but has no $VERSION++
  Message-Id: <CADED=K6ve_DAzRXPX=EsjtUDnZppAaw+BP1Ziw_fU5f32k+Wyg@mail.gmail.com>

2.87 2016/10/28 05:03:52
! Encode.xs t/taint.t
  Pulled: Disable _utf8_on and _utf8_off for tainted values
  dankogai/p5-encode#74
! Encode.xs MANIFEST t/rt65541.t t/rt76824.t t/rt86327.t
  Pulled: Fix crash 'panic: sv_setpvn called with negative strlen'
  dankogai/p5-encode#73
! Encode.xs MANIFEST t/rt113164.t
  Pulled: Fix crash caused by undefined behaviour between
  two sequence points
  dankogai/p5-encode#72
! Encode.xs  MANIFEST lib/Encode/CN/HZ.pm lib/Encode/Encoder.pm
  t/decode.t t/magic.t t/rt85489.t t/utf8ref.t
  Pulled: Fix handling of undef, ref, typeglob, UTF8, COW and magic
  scalar argument in all XS functions
  dankogai/p5-encode#70
! Encode/_T.e2x t/at-cn.t t/at-tw.t t/enc_data.t t/enc_module.t
  t/encoding-locale.t t/encoding.t t/jperl.t t/mime-name.t t/undef.t
  Pulled: Fix unit tests
  dankogai/p5-encode#69
! Encode.pm lib/Encode/MIME/Header.pm lib/Encode/MIME/Name.pm
  t/mime-header.t t/mime-name.t t/taint.t
  Pulled: Encode::MIME::Header clean up
  dankogai/p5-encode#68
! Encode.xs
  Pulled: Generate CHECK value functions with newCONSTSUB()
    instead with direct XS
  dankogai/p5-encode#67
! Encode.xs
  Pulled: Encode::utf8: Fix count of replacement characters
  for overflowed and overlong UTF-8 sequences
  dankogai/p5-encode#65
! Encode.xs t/fallback.t t/utf8strict.t
  Pulled: Encode::utf8: Fix processing invalid UTF-8 subsequences
  dankogai/p5-encode#63
! Encode.pm t/utf8ref.t
  Pulled: Fix return value of Encode::encode_utf8(undef)
  https://rt.cpan.org/Ticket/Display.html?id=116904
  dankogai/p5-encode#62

2.86 2016/08/10 18:08:45
! encoding.pm t/enc_data.t t/enc_eucjp.t t/enc_module.t t/enc_utf8.t
  t/encoding.t t/jperl.t
  Fixed: #116196: [PATCH] Synchronize encoding.pm with blead
  https://rt.cpan.org/Ticket/Display.html?id=116196
! Byte/Makefile.PL
  Patched: #111421: Won't build with statically built perls
  https://rt.cpan.org/Public/Bug/Display.html?id=111421
! Encode.xs encoding.pm
  Pulled: Fixes for 5.8.x compilation failures
  dankogai/p5-encode#60
! Encode.xs
  Patched: RT#116817 [PATCH] Avoid a C++ comment
  https://rt.cpan.org/Ticket/Display.html?id=116817

2.85 2016/08/04 03:15:58
! Encode.pm bin/enc2xs bin/encguess bin/piconv bin/ucmlint bin/unidump
  Pulled: CVE-2016-1238: avoid loading optional modules from .
  dankogai/p5-encode#58
! Encode.pm t/utf8warnings.t
  Pulled: Rethrow 'utf8' warnings in from_to as well #57
  dankogai/p5-encode#57
! Encode.xs
  Pulled and fixed:
    Encode::utf8: Performance optimization for strict UTF-8 encoder #56
  dankogai/p5-encode#56
! t/Encode.t
  s/use Test/use Test::More/
! t/Encode.t t/decode.t
  Skip tests that pass typeglobs to decode if perl < v5.16
! Encode.xs t/cow.t
  Patched: #115540 (from_to affecting COW strings)
  https://rt.cpan.org/Ticket/Display.html?id=115540
! Encode.xs t/Encode.t t/decode.t
  Merged: RT#115168:
    [PATCH] Passing regex globals to decode() results in wrong result
  https://rt.cpan.org/Ticket/Display.html?id=115168
! Makefile.pl
  Pulled: t/encoding-locale.t fails with Test::More@0.80 or before.
  dankogai/p5-encode#55
! Encode.pm
  Pulled: In-place modifications made explicit in docs for encode(),
  decode() and decode_utf8()
  dankogai/p5-encode#54

2.84 2016/04/11 07:17:02
! lib/Encode/MIME/Header.pm
  Pulled: Encode::MIME::Header:
    Update description that this module is only for unstructured header
  dankogai/p5-encode#53
! lib/Encode/MIME/Header.pm t/mime-header.t
  Pulled: Encode::MIME::Header: Fix valid_q_chars, '-' needs to be escaped
  dankogai/p5-encode#52

Signed-off-by: Tim Orling <timothy.t.orling@linux.intel.com>
Signed-off-by: Armin Kuster <akuster808@gmail.com>