Encoding conversion #5

miloyip · 2014-06-06T07:16:33Z

From milo...@gmail.com on November 27, 2011 00:33:27

Currently, the input and output of Reader use the same encoding.

It is often necessary to read a stream in one encoding (e.g. UTF-8) and output strings in another (e.g. UTF-16), or, the other way around, to stringify a DOM from one encoding (e.g. UTF-16) to an output stream in another (e.g. UTF-8).

The simplest solution is to convert the stream into a memory buffer in the other encoding. This requires extra memory storage and extra memory accesses.

Another solution is to convert the input stream into the other encoding before sending it to the parser. However, only the characters inside JSON strings actually need to be converted; converting the other characters just wastes time.

The third solution is to let the parser distinguish between the input and output encodings. It uses an encoding converter to convert only the characters of JSON strings. However, since the output may be longer than the original, in situ parsing cannot be permitted.
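To illustrate why transcoding rules out in situ parsing, here is a minimal, self-contained sketch of a decode-then-encode step from UTF-8 to UTF-16. The function names (`DecodeUtf8`, `EncodeUtf16`, `TranscodeUtf8ToUtf16`) are hypothetical and are not RapidJSON API; the decoder is deliberately simplified and does not reject overlong forms or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not RapidJSON API): decode one code point from UTF-8
// starting at s[i], advancing i. Returns false on truncated or malformed input.
// Simplification: overlong forms and surrogate code points are not rejected.
bool DecodeUtf8(const std::string& s, std::size_t& i, std::uint32_t& cp) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i++]);
    int extra;  // number of continuation bytes expected
    if (c < 0x80)                { cp = c;        extra = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else return false;
    while (extra-- > 0) {
        if (i >= s.size()) return false;
        unsigned char t = static_cast<unsigned char>(s[i++]);
        if ((t & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (t & 0x3F);
    }
    return cp <= 0x10FFFF;
}

// Hypothetical helper: append one code point as UTF-16 code units,
// using a surrogate pair for code points above U+FFFF.
void EncodeUtf16(std::uint32_t cp, std::u16string& out) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Transcode a whole UTF-8 string (e.g. the contents of one JSON string value).
bool TranscodeUtf8ToUtf16(const std::string& in, std::u16string& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t cp;
        if (!DecodeUtf8(in, i, cp)) return false;
        EncodeUtf16(cp, out);
    }
    return true;
}
```

An ASCII character such as `A` occupies one byte in UTF-8 but two bytes in UTF-16, so the transcoded string cannot in general be written back over the input buffer it was read from.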

Try to design a mechanism that generalizes encoding conversion. It should support UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. It could also support automatic encoding detection via BOM, at the cost of some dynamic-dispatch overhead.
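As a rough sketch of what BOM-based detection could look like for the five encodings listed above, the following self-contained function inspects the first bytes of a buffer. The names `UtfType`, `BomResult`, and `DetectBom` are hypothetical and chosen for illustration only; what to do when no BOM is present is left to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; not RapidJSON API.
enum class UtfType { UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE };

struct BomResult {
    bool found;             // true if a BOM was recognized
    UtfType type;           // meaningful only when found is true
    std::size_t bomLength;  // number of bytes to skip
};

BomResult DetectBom(const unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    // UTF-32 must be checked before UTF-16: the UTF-32LE BOM (FF FE 00 00)
    // starts with the UTF-16LE BOM (FF FE).
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return {true, UtfType::UTF32LE, 4};
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return {true, UtfType::UTF32BE, 4};
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return {true, UtfType::UTF8, 3};
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return {true, UtfType::UTF16LE, 2};
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return {true, UtfType::UTF16BE, 2};
    return {false, UtfType::UTF8, 0};  // no BOM: the caller picks a fallback
}
```

The result is only known at runtime, which is where the dynamic-dispatch overhead comes from: every subsequent read has to branch or call indirectly on the detected type.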

Original issue: http://code.google.com/p/rapidjson/issues/detail?id=4

miloyip · 2014-06-06T07:16:35Z

From milo...@gmail.com on December 02, 2011 20:43:44

Reader/Writer can now perform transcoding with Transcoder.
New EncodedInputStream decodes characters from a byte input stream.
New EncodedOutputStream encodes characters to a byte output stream.
New AutoUTFInputStream can take a UTF encoding specified at runtime, or detect the UTF encoding from the beginning of the stream (BOM and RFC 4627). It then dynamically delegates operations to the actual UTF encoding.
New AutoUTFOutputStream can take a UTF encoding specified at runtime, and optionally writes a BOM.
New AutoUTF performs operations according to the UTF encoding type of the input/output stream.
All AutoXXX classes can handle UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE.
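The idea of choosing the byte-level encoding at runtime and delegating each write to it can be sketched as follows. This is not the actual implementation of the classes above: `Utf16ByteWriter` and `Utf16Order` are hypothetical names, and the sketch covers only UTF-16LE/BE for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not RapidJSON API.
enum class Utf16Order { LittleEndian, BigEndian };

// Writes UTF-16 code units to a byte buffer. The byte order is chosen at
// runtime and every Put() is dispatched through a function pointer.
class Utf16ByteWriter {
public:
    Utf16ByteWriter(std::vector<std::uint8_t>& out, Utf16Order order, bool putBom)
        : out_(out),
          put_(order == Utf16Order::LittleEndian ? &PutLE : &PutBE) {
        // Optionally emit the byte order mark U+FEFF in the chosen order.
        if (putBom) put_(out_, static_cast<char16_t>(0xFEFF));
    }

    void Put(char16_t unit) {
        assert(put_ != nullptr);
        put_(out_, unit);  // indirect call: the cost of runtime selection
    }

private:
    typedef void (*PutFn)(std::vector<std::uint8_t>&, char16_t);

    static void PutLE(std::vector<std::uint8_t>& out, char16_t u) {
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));
        out.push_back(static_cast<std::uint8_t>((u >> 8) & 0xFF));
    }
    static void PutBE(std::vector<std::uint8_t>& out, char16_t u) {
        out.push_back(static_cast<std::uint8_t>((u >> 8) & 0xFF));
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));
    }

    std::vector<std::uint8_t>& out_;
    PutFn put_;
};
```

When the encoding is known at compile time, a statically typed stream avoids the indirect call; the runtime-dispatched variant is presumably only worth its overhead when the encoding is discovered from the data or from configuration.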

Status: Fixed

miloyip closed this as completed Jun 6, 2014

tunage mentioned this issue Nov 3, 2014

MemoryPoolAllocator<> >]: Assertion `IsObject()' failed. #191

Closed

RSpliet mentioned this issue Apr 14, 2015

Crash when passing nearly-empty document #301

Closed

xinthose mentioned this issue Aug 19, 2015

Seg Fault #411

Closed

lonelymemo mentioned this issue Sep 2, 2016

GetInt coredump #729

Open

shierei mentioned this issue Mar 31, 2017

Can IStreamWrapper work with istream? #918

Closed

This was referenced May 16, 2017

Memory allocation error in AddMember #956

Closed

Core dump when writing a JSON object to StringBuffer #960

Closed

uscanner mentioned this issue Apr 26, 2018

segfault when calling HasMember() after upgrading to 5fd779d91f56fff1cd7dd2ef230010543c0c790a #1236

Open

StilesCrisis mentioned this issue May 11, 2018

Parsing "128.74836467836484838364836483643636483648e-336" causes a crash #1251

Closed

EnchantedJohn mentioned this issue May 15, 2018

ERROR: AddressSanitizer: heap-buffer-overflow in rapidjson::GenericStringStream<rapidjson::UTF8<char> >::Peek() const #1257

Closed

lichuan mentioned this issue Oct 29, 2018

rapidjson segmentation fault problem #1390

Open

dnj12345 mentioned this issue Apr 25, 2019

Crash when using custom writer #1500

Closed

amamidela mentioned this issue Sep 4, 2019

Segmentation fault when parse json object #1561

Open

MishraKhushbu mentioned this issue Sep 10, 2019

Always gets a core dump while creating json string (rapidjson) #1565

Open

hyhtemple mentioned this issue May 28, 2020

Crash when appending an object to a document after reading a JSON document #1725

Open

This was referenced Jan 13, 2021

Crash when calling Accept on a Document object #1827

Closed

Crash problem when Document calls Accept #1828

Open

jin-long mentioned this issue Sep 22, 2021

IsNumber() is true, but GetInt() assert_fail #1938

Open
