[Java] Support write interleaved dictionaries and batches in IPC stream #22689

Closed
asfimport opened this issue Aug 21, 2019 · 3 comments

@asfimport

Per the discussions in the threads below, and as the spec (http://arrow.apache.org/docs/format/IPC.html#streaming-format) describes, dictionary batches and record batches can be interleaved as long as no record batch references a dictionary that has not been written yet.

#4960

#5146

Currently, reading interleaved dictionaries and record batches is supported via ARROW-6040, but it is not possible to write data in this format.

The cases below should be supported:

i. A record batch with one dictionary-encoded column S

  1. Schema

  2. RecordBatch: S=[null, null, null, null]

  3. DictionaryBatch: ['abc', 'efg']

  4. RecordBatch: S=[0, 1, 0, 1]

ii. A record batch with two dictionary-encoded columns S1, S2

  5. Schema

  6. DictionaryBatch S1: ['ab', 'cd']

  7. RecordBatch: S1 = [0,1,0,1] S2 = [null, null, null, null]

  8. DictionaryBatch S2: ['cc', 'dd']

  9. RecordBatch: S1 = [0,1,0,1] S2 = [0,1,0,1]

This issue records the problem; the work should be done after a mailing-list discussion.
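
The ordering rule behind the two cases above can be sketched as a small check. This is only an illustration, not the Arrow Java API: `Message`, `schema`, `dictionary`, `recordBatch`, and `isValidOrder` are hypothetical names. Dictionary-encoded columns are modeled as arrays of nullable indices. The sketch assumes a stream is well ordered when every non-null index in a record batch refers to a dictionary that has already been written for that column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InterleavedStreamSketch {

    /** Hypothetical stand-in for one stream message; not an Arrow class. */
    static final class Message {
        final String kind;          // "Schema", "DictionaryBatch" or "RecordBatch"
        final String column;        // column a DictionaryBatch belongs to, else null
        final int dictionarySize;   // number of dictionary values, else 0
        final Map<String, Integer[]> indices = new LinkedHashMap<>();

        Message(String kind, String column, int dictionarySize) {
            this.kind = kind;
            this.column = column;
            this.dictionarySize = dictionarySize;
        }

        /** Adds one dictionary-encoded column (nullable indices) to a record batch. */
        Message with(String col, Integer... idx) {
            indices.put(col, idx);
            return this;
        }
    }

    static Message schema() { return new Message("Schema", null, 0); }

    static Message dictionary(String column, String... values) {
        return new Message("DictionaryBatch", column, values.length);
    }

    static Message recordBatch() { return new Message("RecordBatch", null, 0); }

    /** Returns true if every non-null index refers to an already written dictionary. */
    static boolean isValidOrder(List<Message> stream) {
        if (stream.isEmpty() || !stream.get(0).kind.equals("Schema")) {
            return false;
        }
        Map<String, Integer> sizes = new HashMap<>(); // column -> dictionary size seen so far
        for (Message m : stream.subList(1, stream.size())) {
            if (m.kind.equals("DictionaryBatch")) {
                sizes.put(m.column, m.dictionarySize);
            } else if (m.kind.equals("RecordBatch")) {
                for (Map.Entry<String, Integer[]> e : m.indices.entrySet()) {
                    Integer size = sizes.get(e.getKey());
                    for (Integer i : e.getValue()) {
                        if (i == null) continue; // nulls reference no dictionary entry
                        if (size == null || i < 0 || i >= size) return false;
                    }
                }
            }
        }
        return true;
    }

    /** Case i: an all-null batch precedes the dictionary for S. */
    static List<Message> caseOne() {
        return Arrays.asList(
            schema(),
            recordBatch().with("S", null, null, null, null),
            dictionary("S", "abc", "efg"),
            recordBatch().with("S", 0, 1, 0, 1));
    }

    /** Case ii: the dictionary for S2 arrives after a batch in which S2 is all null. */
    static List<Message> caseTwo() {
        return Arrays.asList(
            schema(),
            dictionary("S1", "ab", "cd"),
            recordBatch().with("S1", 0, 1, 0, 1).with("S2", null, null, null, null),
            dictionary("S2", "cc", "dd"),
            recordBatch().with("S1", 0, 1, 0, 1).with("S2", 0, 1, 0, 1));
    }

    /** Counter-example: indices are used before any dictionary for S is written. */
    static List<Message> usedBeforeDefined() {
        return Arrays.asList(
            schema(),
            recordBatch().with("S", 0, 1, 0, 1),
            dictionary("S", "abc", "efg"));
    }

    public static void main(String[] args) {
        System.out.println(isValidOrder(caseOne()));
        System.out.println(isValidOrder(caseTwo()));
        System.out.println(isValidOrder(usedBeforeDefined()));
    }
}
```

Under these assumptions both cases pass the check, while the counter-example, which uses indices before their dictionary is written, is rejected.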

Reporter: Ji Liu / @tianchen92
Assignee: Ji Liu / @tianchen92

Note: This issue was originally created as ARROW-6308. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
The case where a record batch "does not reference a dictionary" is pretty esoteric. A dictionary-encoded field would have to have all null values. Let's discuss further on the mailing list.

@asfimport

Ji Liu / @tianchen92:
Sure. ARROW-6040 added support for reading interleaved data, but not for writing in that format. I have started a mailing-list discussion about this.
