Paul Taylor / @trxcllnt:
@TheNeuralBit since RecordBatch extends StructVector, and StructVectors implement toJSON(), we should be able to implement toJSON() on the Table by calling toJSON() on each inner RecordBatch. This would yield rows as compact Arrays of each value, but we could also apply the MapView to each RecordBatch if we wanted rows as JS Objects of key/value pairs instead. Alternatively, we could refactor the internal tableRowsToString() method to yield compact rows.
Since serializing table rows to JS Arrays or Objects can easily exceed the memory limit of a single node process, it's probably worth exposing a row generator function. I ran into this issue piping the toString() result to the console, so we can follow the same pattern here.
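A row generator along these lines might look like the sketch below. It is only an illustration: `Batch` is a hypothetical stand-in for a record batch (plain arrays of column values), and `rows` is an assumed name, not an existing Arrow function.

```typescript
// Hypothetical stand-in for a record batch: equal-length columns stored by position.
interface Batch {
  columns: unknown[][];
}

// Lazily yield each row as a compact array, one batch at a time,
// so the full list of rows is never materialized in memory at once.
function* rows(batches: Iterable<Batch>): Generator<unknown[]> {
  for (const batch of batches) {
    const length = batch.columns.length > 0 ? batch.columns[0].length : 0;
    for (let i = 0; i < length; i++) {
      yield batch.columns.map((column) => column[i]);
    }
  }
}

// Made-up sample data split across two batches.
const batches: Batch[] = [
  { columns: [[1, 2], ["a", "b"]] },
  { columns: [[3], ["c"]] },
];

// Rows are produced on demand, so they can be streamed to the console one by one.
for (const row of rows(batches)) {
  console.log(JSON.stringify(row));
}
```

A caller that does want everything in memory can still collect the rows with `Array.from(rows(batches))`.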
It also might be valuable to name it something else, and reserve toJSON() for generating the Arrow JSON format. The rationale here is that toJSON() is automatically invoked by JSON.stringify(), which is most commonly used for serialization and deserialization, making this possible:
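The automatic toJSON() call can be seen with a small stand-in class. This is a sketch under assumptions: `TinyTable`, its `from()` method, and the `{ names, columns }` layout are hypothetical, and are neither Arrow's Table API nor the Arrow JSON format.

```typescript
// Hypothetical serialized layout, used only for this illustration.
interface TinyTableJSON {
  names: string[];
  columns: unknown[][];
}

// TinyTable is a hypothetical stand-in, not Arrow's Table.
class TinyTable {
  names: string[];
  columns: unknown[][];

  constructor(names: string[], columns: unknown[][]) {
    this.names = names;
    this.columns = columns;
  }

  // JSON.stringify() calls this automatically and serializes its return value.
  toJSON(): TinyTableJSON {
    return { names: this.names, columns: this.columns };
  }

  // Rebuild a table from the parsed output of toJSON().
  static from(json: TinyTableJSON): TinyTable {
    return new TinyTable(json.names, json.columns);
  }
}

const table = new TinyTable(["id", "label"], [[1, 2], ["a", "b"]]);
const text = JSON.stringify(table); // toJSON() is invoked here
const copy = TinyTable.from(JSON.parse(text)); // round trip back to a table
console.log(text);
```

Because the round trip depends on toJSON() returning something `from()` can parse, a row-oriented accessor would need a different name.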
Brian Hulette / @TheNeuralBit:
[~paul.e.taylor] Good points. Definitely makes sense that toJSON() should return data parse-able by Table.from. I made a new ticket for creating that version of toJSON(). Maybe this accessor could be called toObjectArray()?
My primary motivation for this was to be able to easily access data from a relatively small Table, one that has been heavily filtered or aggregated somehow, so I wasn't too concerned with exceeding memory limits. That being said, making it a generator, or at least having a generator version, could be useful for large tables as well.
Currently, `CountByResult` has its own `toJSON` method, but there should be a more general one for every `DataFrame`.
`CountByResult.toJSON` returns:
A more general `toJSON` could just return a list of objects with an entry for each column. For the above `CountByResult`, the output would look like:
Reporter: Brian Hulette / @TheNeuralBit
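As a rough illustration of the more general `toJSON` proposed in the description (a list of objects with an entry for each column), the sketch below converts named columns into row objects. The `Columns` type, the `toObjectList` name, and the sample column names and values are all assumptions made for this example; they are not taken from the issue or from Arrow's API.

```typescript
// Hypothetical shape: column name -> array of values, all the same length.
type Columns = Record<string, unknown[]>;

// Build one object per row, keyed by column name.
function toObjectList(columns: Columns): Record<string, unknown>[] {
  const names = Object.keys(columns);
  const length = names.length > 0 ? columns[names[0]].length : 0;
  const out: Record<string, unknown>[] = [];
  for (let i = 0; i < length; i++) {
    const row: Record<string, unknown> = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    out.push(row);
  }
  return out;
}

// Made-up data standing in for a small aggregated result.
const result = toObjectList({ category: ["x", "y"], count: [3, 5] });
console.log(JSON.stringify(result));
```

This eager form suits the small, heavily filtered or aggregated tables mentioned in the comments; it holds every row in memory at once.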
Note: This issue was originally created as ARROW-2202. Please see the migration documentation for further details.