Dictionaries can only hold a maximum of 4096 indices #17433

Closed
asfimport opened this issue Aug 23, 2017 · 5 comments

@asfimport

Dictionaries seem to be able to hold only 4096 indices, meaning that only vectors with 4096 values or fewer can be dictionary-encoded. The attached image is a stack trace of what happens when I try to encode a vector containing 4097 strings with a dictionary containing two distinct values.

Basically, the error can be traced to line 95 of DictionaryEncoder.java (setter.invoke(mutator, i, encoded);). The indices array, which holds the encoded values, is allocated on line 84 with indices.allocateNew(), and allocateNew() seems to allocate only 4096 bytes of data initially. The code runs if there are 4096 rows of data or fewer; with any more, the same error occurs.
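
To illustrate the failure mode described above, here is a minimal, self-contained sketch in plain Java. It is a toy model, not Arrow's vector code: the class IndexBufferSketch, its INITIAL_CAPACITY constant, and its set/setSafe methods are hypothetical names, and it assumes the limit comes from a fixed initial allocation that an unchecked write never grows.

```java
import java.util.Arrays;

/** Toy model of a fixed-capacity index buffer. This is NOT Arrow's implementation. */
class IndexBufferSketch {
    // Hypothetical initial capacity, chosen to match the 4096-row limit in the report.
    static final int INITIAL_CAPACITY = 4096;

    private int[] data = new int[INITIAL_CAPACITY];

    /** Unchecked write: throws once index reaches the allocated capacity. */
    void set(int index, int value) {
        data[index] = value;
    }

    /** Checked write: doubles the buffer until the index fits, then writes. */
    void setSafe(int index, int value) {
        int capacity = data.length;
        while (index >= capacity) {
            capacity *= 2;
        }
        if (capacity != data.length) {
            data = Arrays.copyOf(data, capacity);
        }
        data[index] = value;
    }

    int get(int index) {
        return data[index];
    }

    int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        // 4097 rows: one more than the initial capacity.
        IndexBufferSketch unchecked = new IndexBufferSketch();
        try {
            for (int i = 0; i <= INITIAL_CAPACITY; i++) {
                unchecked.set(i, i % 2);
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked write failed: " + e.getMessage());
        }

        IndexBufferSketch checked = new IndexBufferSketch();
        for (int i = 0; i <= INITIAL_CAPACITY; i++) {
            checked.setSafe(i, i % 2);
        }
        System.out.println("checked write succeeded, capacity = " + checked.capacity());
    }
}
```

In this model, a write path that grows the buffer on demand removes the row limit, while the unchecked path fails on the first row past the initial allocation.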

Reporter: Shayan Monshizadeh
Assignee: Li Jin / @icexelloss

Original Issue Attachments:

Note: This issue was originally created as ARROW-1407. Please see the migration documentation for further details.

@asfimport

Emilio Lahr-Vivaz / @elahrvivaz:
FYI, you don't have to use the DictionaryEncoder class. If you don't mind mapping your dictionary values yourself, you can do something like:

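// Create the index vector directly; the cast is needed because createNewSingleVector returns a FieldVector.
NullableIntVector vector = (NullableIntVector) new FieldType(true, MinorType.INT.getType(), dictionaryEncoding).createNewSingleVector(name, allocator, callBack);
// i is the row position; j is the index of that row's value in your dictionary.
vector.getMutator().setSafe(i, j);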
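
For the "mapping your dictionary values yourself" part, a minimal sketch in plain Java might look like the following. The class ManualDictionarySketch and its methods are hypothetical names introduced only for illustration, no Arrow classes are involved, and the input is assumed to contain no nulls. Each int it produces plays the role of j in the setSafe(i, j) call above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that assigns each distinct string a dense integer index. */
class ManualDictionarySketch {
    // Distinct values in first-seen order; a value's position is its dictionary index.
    final List<String> dictionary = new ArrayList<>();
    private final Map<String, Integer> lookup = new HashMap<>();

    /** Returns the index for value, adding it to the dictionary if it is new. */
    int indexOf(String value) {
        Integer index = lookup.get(value);
        if (index == null) {
            index = dictionary.size();
            dictionary.add(value);
            lookup.put(value, index);
        }
        return index;
    }

    /** Encodes every row; result[i] is the dictionary index of values.get(i). */
    int[] encode(List<String> values) {
        int[] indices = new int[values.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = indexOf(values.get(i));
        }
        return indices;
    }

    public static void main(String[] args) {
        // 4097 rows drawn from two distinct values, as in the original report.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 4097; i++) {
            rows.add(i % 2 == 0 ? "foo" : "bar");
        }
        ManualDictionarySketch encoder = new ManualDictionarySketch();
        int[] indices = encoder.encode(rows);
        System.out.println(indices.length + " rows, dictionary = " + encoder.dictionary);
    }
}
```

The dictionary list would then be written to the dictionary vector, and each indices[i] written to the index vector at row i.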

@asfimport

Wes McKinney / @wesm:
@icexelloss or @alphalfalfa any interest in taking a look at this?

@asfimport

Li Jin / @icexelloss:
I'll take this one.

@asfimport

Li Jin / @icexelloss:
PR: #1024

@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request 1024
#1024
