When I run the following code:

```python
import json_stream

em_dash = '—'  # chr(8212), a 3-byte character in UTF-8
payload = ('{"test": "' + em_dash * 10 + '"}').encode('utf-8')
chunk_size = 10
itr = (payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size))
data = json_stream.load(itr)
test = data['test']
print(test)
```
I get this error message:

```
OSError: I/O error while parsing (index 15): Custom { kind: Other, error: "incomplete utf-8 byte sequence from index 0" }
```
The problem seems to be in the Rust tokenizer: if I fall back to the pure-Python tokenizer with `json_stream.load(itr, tokenizer=json_stream.tokenizer.tokenize)`, it works.
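For context, a minimal check (independent of `json_stream`) suggesting why the chunking itself is the trigger: each em dash is 3 bytes in UTF-8, and the JSON prefix `{"test": "` is exactly 10 bytes, so every chunk after the first ends in the middle of a multi-byte character and is not valid UTF-8 on its own. A tokenizer that tries to decode each chunk independently would fail here:

```python
# Show that 10-byte chunks split the em dash's 3-byte UTF-8 sequence.
em_dash = '\u2014'  # em dash, chr(8212)
payload = ('{"test": "' + em_dash * 10 + '"}').encode('utf-8')
chunks = [payload[i:i + 10] for i in range(0, len(payload), 10)]

print(len(em_dash.encode('utf-8')))  # 3 bytes per em dash

# chunks[0] is the pure-ASCII prefix; chunks[1] holds three whole
# em dashes plus the first byte of a fourth, so decoding it alone fails.
try:
    chunks[1].decode('utf-8')
except UnicodeDecodeError as e:
    print(e)  # decoding stops mid-sequence at the chunk boundary
```

A correct streaming tokenizer has to buffer the trailing partial sequence and resume decoding when the next chunk arrives, which is presumably what the pure-Python tokenizer does.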