Currently, the Reader uses the same encoding for its input and output.
It is often necessary to read a stream in one encoding (e.g. UTF-8) and output
strings in another encoding (e.g. UTF-16), or, conversely, to stringify a DOM
from one encoding (e.g. UTF-16) to an output stream in another encoding (e.g.
UTF-8).
The simplest solution is to convert the whole stream into a memory buffer in
the other encoding. This requires more memory and more memory accesses.
Another solution is to convert the input stream into the other encoding before
sending it to the parser. However, only the characters inside JSON strings
actually need to be converted. Converting the other characters just wastes
time.
The third solution is to let the parser distinguish between the input and
output encodings. It uses an encoding converter to convert only the characters
of JSON strings. However, since the output may be longer than the original, in
situ parsing cannot be permitted.
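The third solution can be sketched roughly as follows. This is only an illustration, not the library's implementation: the `Utf8` and `Utf16` trait structs and the `Transcode` function are hypothetical names, and the decoders assume well-formed input. A parser would run such a converter only on the characters of JSON strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical encoding traits (illustrative names, not an existing API).
// Both decoders assume well-formed input and perform no validation.
struct Utf8 {
    using Ch = char;

    // Decodes the code point starting at s[i] and advances i past it.
    static std::uint32_t Decode(const std::string& s, std::size_t& i) {
        unsigned char c = static_cast<unsigned char>(s[i++]);
        if (c < 0x80) return c;                      // single byte (ASCII)
        int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
        std::uint32_t cp = c & (0x3F >> extra);      // payload bits of the lead byte
        while (extra-- > 0) {
            assert(i < s.size());
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        }
        return cp;
    }

    // Appends the UTF-8 encoding of cp to out.
    static void Encode(std::string& out, std::uint32_t cp) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
};

struct Utf16 {
    using Ch = char16_t;

    // Decodes the code point starting at s[i], combining a surrogate pair if present.
    static std::uint32_t Decode(const std::u16string& s, std::size_t& i) {
        std::uint32_t u = s[i++];
        if (u >= 0xD800 && u <= 0xDBFF) {            // high surrogate
            assert(i < s.size());
            std::uint32_t lo = s[i++];
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
        return u;
    }

    // Appends the UTF-16 encoding of cp to out.
    static void Encode(std::u16string& out, std::uint32_t cp) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
};

// Decodes each code point from the Source encoding and re-encodes it in Target.
template <typename Source, typename Target>
std::basic_string<typename Target::Ch> Transcode(
        const std::basic_string<typename Source::Ch>& in) {
    std::basic_string<typename Target::Ch> out;
    for (std::size_t i = 0; i < in.size();)
        Target::Encode(out, Source::Decode(in, i));
    return out;
}
```

With these definitions, `Transcode<Utf8, Utf16>(text)` converts a `std::string` into a `std::u16string`. The ASCII text "abc" occupies 3 bytes in UTF-8 but 6 bytes in UTF-16, which illustrates why the converted output cannot simply overwrite the input buffer in place.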
Try to design a mechanism that generalizes encoding conversion. It should
support UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE. It could also support
automatic encoding detection via the BOM, at the cost of some overhead from
dynamic dispatch.
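BOM-based detection could look something like the sketch below. The `UtfType` enum, `BomResult` struct and `DetectBom` function are hypothetical names chosen for illustration; only the BOM byte sequences themselves are fixed by Unicode. The UTF-32LE check has to come before the UTF-16LE check because the UTF-32LE BOM begins with the same two bytes.

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
enum class UtfType { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE, Unknown };

struct BomResult {
    UtfType type;          // detected encoding, or Unknown when no BOM is present
    std::size_t bomLength; // number of bytes to skip before the actual text
};

// Inspects the first bytes of a buffer for a byte order mark.
inline BomResult DetectBom(const unsigned char* p, std::size_t n) {
    // The 4-byte BOMs are tested first: FF FE 00 00 also starts with FF FE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return {UtfType::Utf32LE, 4};
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return {UtfType::Utf32BE, 4};
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return {UtfType::Utf8, 3};
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return {UtfType::Utf16LE, 2};
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return {UtfType::Utf16BE, 2};
    return {UtfType::Unknown, 0};
}
```

The caller would then branch on the returned `UtfType` once per operation, which is where the dynamic dispatch overhead mentioned above comes from.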
Original issue reported on code.google.com by milo...@gmail.com on 26 Nov 2011 at 4:33
Reader/Writer can now perform transcoding with Transcoder.
New EncodedInputStream can decode characters from a byte input stream.
New EncodedOutputStream can encode characters to a byte output stream.
New AutoUTFInputStream can be given a UTF encoding at runtime, or can detect
the UTF encoding from the beginning of the stream (BOM and RFC 4627). It then
dynamically delegates operations to the actual UTF encoding.
New AutoUTFOutputStream can be given a UTF encoding at runtime, and optionally
writes a BOM.
New AutoUTF performs operations according to the UTF encoding type of the
input/output stream.
All AutoXXX classes can handle UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE.
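For a stream without a BOM, the RFC 4627 detection mentioned above relies on the fact that the first two characters of a JSON text are always ASCII, so the pattern of zero octets in the first four bytes reveals the encoding. The following is a minimal sketch of that rule, not the code of AutoUTFInputStream; `UtfType` and `DetectByNullPattern` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
enum class UtfType { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE, Unknown };

// Applies the RFC 4627 section 3 heuristic to the first four octets of a
// JSON text that carries no BOM.
inline UtfType DetectByNullPattern(const unsigned char* p, std::size_t n) {
    // Assumption: a valid RFC 4627 text shorter than 4 octets can only be
    // UTF-8, since two ASCII characters need at least 4 octets in UTF-16.
    if (n < 4) return UtfType::Utf8;

    // Bit k of the pattern is set when octet k is non-zero.
    unsigned pattern = (p[0] ? 1u : 0u) | (p[1] ? 2u : 0u) |
                       (p[2] ? 4u : 0u) | (p[3] ? 8u : 0u);
    switch (pattern) {
        case 0x08: return UtfType::Utf32BE; // 00 00 00 xx
        case 0x0A: return UtfType::Utf16BE; // 00 xx 00 xx
        case 0x01: return UtfType::Utf32LE; // xx 00 00 00
        case 0x05: return UtfType::Utf16LE; // xx 00 xx 00
        case 0x0F: return UtfType::Utf8;    // xx xx xx xx
        default:   return UtfType::Unknown; // not a pattern covered by the RFC
    }
}
```

A detector would typically try the BOM first and fall back to this pattern check only when no BOM is found.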
Original comment by milo...@gmail.com on 3 Dec 2011 at 4:43