Wes McKinney / @wesm:
The case where a record batch "does not reference a dictionary" is pretty esoteric. A dictionary-encoded field would have to have all null values. Let's discuss further on the mailing list.
Ji Liu / @tianchen92:
Sure. ARROW-6040 added support for reading interleaved data, but writing in that format is not supported yet. I have started a mailing list discussion for this.
Per the discussions in the following threads, and as described in the spec (http://arrow.apache.org/docs/format/IPC.html#streaming-format), dictionary batches and record batches can be interleaved as long as a record batch does not reference a dictionary that has not yet been sent.
#4960
#5146
Currently, interleaved dictionaries and record batches can be parsed (via ARROW-6040), but it is impossible to write data in this format.
The cases below should be supported:
i. A record batch with one dictionary-encoded column S
Schema
RecordBatch: S = [null, null, null, null]
DictionaryBatch: ['abc', 'efg']
RecordBatch: S = [0, 1, 0, 1]
ii. A record batch with two dictionary-encoded columns S1, S2
Schema
DictionaryBatch S1: ['ab', 'cd']
RecordBatch: S1 = [0, 1, 0, 1] S2 = [null, null, null, null]
DictionaryBatch S2: ['cc', 'dd']
RecordBatch: S1 = [0, 1, 0, 1] S2 = [0, 1, 0, 1]
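To make the ordering rule in the two cases concrete, here is a minimal pure-Python sketch of a reader that accepts such a stream. It does not use the Arrow libraries, and the names `DictionaryBatch`, `RecordBatch`, and `decode_interleaved_stream` are hypothetical stand-ins for illustration only. It assumes a column may appear before its dictionary only when every index in that column is null.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class DictionaryBatch:
    # Hypothetical stand-in for an IPC dictionary message.
    field: str            # name of the dictionary-encoded column
    values: List[str]     # dictionary values for that column


@dataclass
class RecordBatch:
    # Hypothetical stand-in: column name -> dictionary indices (None = null).
    columns: Dict[str, List[Optional[int]]]


Message = Union[DictionaryBatch, RecordBatch]


def decode_interleaved_stream(messages: List[Message]) -> List[Dict[str, List[Optional[str]]]]:
    """Decode record batches in stream order.

    Raises ValueError if a batch uses a non-null index for a column
    whose dictionary has not been seen yet.
    """
    dictionaries: Dict[str, List[str]] = {}
    decoded = []
    for msg in messages:
        if isinstance(msg, DictionaryBatch):
            dictionaries[msg.field] = msg.values
            continue
        batch = {}
        for name, indices in msg.columns.items():
            values = dictionaries.get(name)
            if values is None and any(i is not None for i in indices):
                raise ValueError(f"column {name!r} references a dictionary not yet sent")
            # All-null columns never look anything up, so no dictionary is needed.
            batch[name] = [None if i is None else values[i] for i in indices]
        decoded.append(batch)
    return decoded


# Case i from the issue: an all-null batch precedes the dictionary.
case_i = [
    RecordBatch({"S": [None, None, None, None]}),
    DictionaryBatch("S", ["abc", "efg"]),
    RecordBatch({"S": [0, 1, 0, 1]}),
]
result_i = decode_interleaved_stream(case_i)
```

Under these assumptions, the first batch of case i decodes to four nulls and the second to the values looked up in the dictionary. A writer supporting this format would need to emit messages in an order that such a check accepts.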
This issue records the problem; the work should be done after a mailing list discussion.
Reporter: Ji Liu / @tianchen92
Assignee: Ji Liu / @tianchen92
Note: This issue was originally created as ARROW-6308. Please see the migration documentation for further details.