Dictionaries seem to be able to hold only 4096 indices, meaning only vectors with 4096 values or fewer can be dictionary-encoded. The attached image is a stack trace of what happens when I try to encode a vector containing 4097 strings against a dictionary containing two distinct values.
The error can be traced to line 95 of `DictionaryEncoder.java` (`setter.invoke(mutator, i, encoded);`). The indices vector, which holds the encoded values, is allocated on line 84 with `indices.allocateNew()`, and it seems that `allocateNew()` initially allocates room for only 4096 values. The code runs if there are 4096 rows of data or fewer. With any more than that, the same error occurs.
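To make the suspected failure mode concrete, here is a toy model in plain Java. It is not Arrow's implementation: `ToyIndexVector` is a hypothetical class, the 4096 default is taken from the observation above, and the `set`/`setSafe` pair is an assumed naming for an unchecked write versus a write that grows the buffer first.

```java
import java.util.Arrays;

// Toy model of a fixed-width index vector. Hypothetical; NOT Arrow code.
final class ToyIndexVector {
    // Assumed default capacity, matching the 4096-row limit in the report.
    static final int INITIAL_CAPACITY = 4096;

    private int[] data;

    // Reserves a default-sized buffer, regardless of how many rows will follow.
    void allocateNew() {
        data = new int[INITIAL_CAPACITY];
    }

    // Unchecked write: throws ArrayIndexOutOfBoundsException once
    // index reaches the allocated capacity.
    void set(int index, int value) {
        data[index] = value;
    }

    // Growing write: doubles the buffer until the index fits, then writes.
    void setSafe(int index, int value) {
        while (index >= data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[index] = value;
    }

    int get(int index) {
        return data[index];
    }

    int capacity() {
        return data.length;
    }
}
```

In this model, writing row 4097 (index 4096) through `set` fails, while sizing the buffer up front or writing through `setSafe` does not. Whether the real encoder fails for exactly this reason would need to be confirmed against the stack trace.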
Emilio Lahr-Vivaz / @elahrvivaz:
FYI, you don't have to use the DictionaryEncoder class. If you don't mind mapping your dictionary values yourself, you can do something like:
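A minimal sketch of what mapping the dictionary values yourself could look like, written in plain Java rather than against Arrow's API. `ManualDictionary` and its methods are hypothetical names used only for illustration; writing the resulting indices and distinct values into Arrow vectors is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: assigns each distinct value an integer index by hand.
final class ManualDictionary {
    // LinkedHashMap keeps first-seen order, so index i is the i-th distinct value.
    private final Map<String, Integer> indexOf = new LinkedHashMap<>();

    // Returns the index for a value, assigning the next free index on first sight.
    int encode(String value) {
        Integer idx = indexOf.get(value);
        if (idx == null) {
            idx = indexOf.size();
            indexOf.put(value, idx);
        }
        return idx;
    }

    // Encodes every row; the output array is sized to the input, not to a default.
    int[] encodeAll(List<String> rows) {
        int[] indices = new int[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            indices[i] = encode(rows.get(i));
        }
        return indices;
    }

    // The distinct values, in index order (the dictionary itself).
    List<String> values() {
        return new ArrayList<>(indexOf.keySet());
    }
}
```

Because the index array is sized from the number of rows, a 4097-row input with two distinct values is handled the same way as a shorter one.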
Reporter: Shayan Monshizadeh
Assignee: Li Jin / @icexelloss
Original Issue Attachments:
Note: This issue was originally created as ARROW-1407. Please see the migration documentation for further details.