dankogai committed Sep 15, 2015
1 parent fd47e38 commit 27682d0
Showing 3 changed files with 26 additions and 4 deletions.
6 changes: 6 additions & 0 deletions Changes
@@ -3,6 +3,12 @@
 # $Id: Changes,v 2.76 2015/07/31 02:18:28 dankogai Exp dankogai $
 #
 $Revision: 2.76 $ $Date: 2015/07/31 02:18:28 $
+! Unicode/Unicode.xs Unicode/Unicode.pm
+  Address RT#107043: If no BOM is found, the routine dies.
+  When you decode from UTF-(16|32) without -BE or -LE and without a BOM,
+  Encode now assumes BE, according to RFC2781 and the Unicode
+  Standard version 8.0.
+  https://rt.cpan.org/Public/Bug/Display.html?id=107043
 ! Makefile.PL encoding.t
   Mend pull/42
 ! Encode.xs Makefile.PL encoding.pm encoding.t
8 changes: 7 additions & 1 deletion Unicode/Unicode.pm
@@ -176,7 +176,13 @@ simply treated as a normal character (ZERO WIDTH NO-BREAK SPACE).
 When BE or LE is omitted during decode(), it checks if BOM is at the
 beginning of the string; if one is found, the endianness is set to
-what the BOM says. If no BOM is found, the routine dies.
+what the BOM says.
+
+=item Default Byte Order
+
+When no BOM is found, Encode 2.76 and below croaked. Since Encode
+2.77, it falls back to BE, according to RFC2781 and the Unicode
+Standard version 8.0.
 =item *
16 changes: 13 additions & 3 deletions Unicode/Unicode.xs
@@ -166,9 +166,19 @@ CODE:
                 endian = 'V';
             }
             else {
-                croak("%"SVf":Unrecognised BOM %"UVxf,
-                      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-                      bom);
+                /* No BOM found, use big-endian fallback as specified in
+                 * RFC2781 and the Unicode Standard version 8.0:
+                 *
+                 *  The UTF-16 encoding scheme may or may not begin with
+                 *  a BOM. However, when there is no BOM, and in the
+                 *  absence of a higher-level protocol, the byte order
+                 *  of the UTF-16 encoding scheme is big-endian.
+                 *
+                 *  If the first two octets of the text is not 0xFE
+                 *  followed by 0xFF, and is not 0xFF followed by 0xFE,
+                 *  then the text SHOULD be interpreted as big-endian.
+                 */
+                s -= size;
             }
         }
 #if 1
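For illustration, the fallback introduced in the Unicode.xs hunk can be sketched in standalone C. This is a minimal sketch under stated assumptions, not Encode's actual code: the names `byte_order`, `utf16_detect_order`, and `utf16_read_unit` are hypothetical, and only UTF-16 is handled. Where the patch rewinds the input pointer (`s -= size`) so that the unit it tried to read as a BOM is decoded as ordinary text, the sketch reports zero bytes to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of Encode's XS code. */
typedef enum { ORDER_BE, ORDER_LE } byte_order;

/*
 * Decide the byte order of a UTF-16 buffer whose endianness was not
 * declared by the caller. On return, *skip holds the number of leading
 * bytes to drop: 2 if a BOM was consumed, 0 if there was none.
 */
static byte_order utf16_detect_order(const unsigned char *buf, size_t len,
                                     size_t *skip)
{
    assert(buf != NULL || len == 0);
    *skip = 0;
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF) { /* U+FEFF, big-endian */
            *skip = 2;
            return ORDER_BE;
        }
        if (buf[0] == 0xFF && buf[1] == 0xFE) { /* U+FEFF, little-endian */
            *skip = 2;
            return ORDER_LE;
        }
    }
    /* No BOM: fall back to big-endian (RFC 2781) and keep every byte,
     * so the first code unit is decoded as text rather than discarded. */
    return ORDER_BE;
}

/* Read the 16-bit code unit stored at buf[0..1] in the given order. */
static uint16_t utf16_read_unit(const unsigned char *buf, byte_order order)
{
    return order == ORDER_BE
        ? (uint16_t)((buf[0] << 8) | buf[1])
        : (uint16_t)((buf[1] << 8) | buf[0]);
}
```

A caller would run `utf16_detect_order` once, advance the buffer by `skip`, and then read code units with `utf16_read_unit`. With the bytes `00 41` and no BOM, the detected order is big-endian and the first unit reads as U+0041.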
