
Improved Unicode docs, per Yuval Kogman's advice.

1 parent 8e5fe6d commit e705288abb227414edd698f403d4ae384c20fe5c @chromatic committed Jul 2, 2010
Showing with 39 additions and 2 deletions.
  1. +3 −0 CREDITS
  2. +36 −2 sections/values.pod
@@ -89,3 +89,6 @@ E:
N: Yary Hluchan
+N: Yuval Kogman
@@ -185,12 +185,46 @@ ASCII strings.
For far more detail about what Unicode is, encodings, and how to manage incoming and
outgoing data in a Unicode world, see C<perldoc perluniintro>.
+=head4 Unicode in Your Data
+X<Encode; decode()>
+X<Encode; encode()>
+Perl 5's internal encoding shields you from most of the details of Unicode and
+character sets and representations, but you have to meet it part way. In
+particular, the sooner you can ask Perl to convert from data that you have to
+its own internal format, the fewer odd conversions and mismatches you will
+suffer. The core module C<Encode> provides a function named C<decode()> to
+convert a scalar containing data in a known format to the internal encoding.
+For example, if you have UTF-8 data:
+=begin programlisting
+ my $string = decode('utf8', $data);
+=end programlisting
+The corresponding C<encode()> function converts from Perl's internal encoding
+to the desired output encoding:
+=begin programlisting
+ my $latin1 = encode('iso-8859-1', $string);
+=end programlisting
+If you always convert at the inputs and outputs of your program, you will avoid
+many problems.
=for author
Probably need C<binmode> here.
=end for
+=head4 Unicode in Your Programs
X<pragmas; utf8>
@@ -205,9 +239,9 @@ Unicode characters in strings as well in identifiers:
use utf8;
- sub £_to_¥ { ... }
+ sub E<pound>_to_E<yen> { ... }
- my $pounds = £_to_¥('1000£');
+ my $pounds = E<pound>_to_E<yen>('1000E<pound>');
=end programlisting
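
For readers of this diff, here is a minimal, self-contained sketch of the decode-at-input, encode-at-output pattern that the new "Unicode in Your Data" text describes. The sample bytes are an assumption made for illustration and are not part of the commit. The sketch also uses the strict C<'UTF-8'> encoding name where the committed text uses the laxer C<'utf8'>; both are names the core C<Encode> module accepts.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw UTF-8 bytes for "café".
# The e-acute is the two bytes 0xC3 0xA9, so the byte string is 5 long.
my $data = "caf\xC3\xA9";

# Decode at the input boundary: bytes become Perl characters.
my $string = decode('UTF-8', $data);

# Encode at the output boundary: characters become Latin-1 bytes.
my $latin1 = encode('iso-8859-1', $string);

# Report lengths to show where bytes and characters differ.
printf "%d bytes in, %d characters, %d bytes out\n",
    length($data), length($string), length($latin1);
```

Between the two calls, C<length($string)> counts characters rather than bytes, which is the main practical reason to convert as early and as late as possible.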
