[JS] Add DataFrame.toJSON #18166

Open
asfimport opened this issue Feb 22, 2018 · 2 comments

Comments

@asfimport (Collaborator)

Currently, CountByResult has its own toJSON method, but there should be a more general one for every DataFrame.

CountByResult.toJSON returns:

{
  "keyA": 10,
  "keyB": 10,
  ...
}

A more general toJSON could just return a list of objects with an entry for each column. For the above CountByResult, the output would look like:

[
  {value: "keyA", count: 10},
  {value: "keyB", count: 10},
  ...
]
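A hypothetical sketch of the proposed shape, assuming plain arrays as a stand-in for the actual Arrow JS column vectors: given column-oriented data (like the `value` and `count` columns of a CountByResult), produce one object per row with an entry for each column. The `toRowObjects` helper name is illustrative, not part of the library.

```javascript
// Hypothetical helper: map column-oriented data to a list of row
// objects, the shape proposed above. The { name: values[] } layout
// is a plain stand-in for Arrow's internal column vectors.
function toRowObjects(columns) {
  const names = Object.keys(columns);
  const length = columns[names[0]].length;
  const rows = [];
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    rows.push(row);
  }
  return rows;
}

console.log(toRowObjects({ value: ['keyA', 'keyB'], count: [10, 10] }));
// → [ { value: 'keyA', count: 10 }, { value: 'keyB', count: 10 } ]
```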

Reporter: Brian Hulette / @TheNeuralBit

Note: This issue was originally created as ARROW-2202. Please see the migration documentation for further details.

@asfimport (Collaborator, Author)

Paul Taylor / @trxcllnt:
@TheNeuralBit since RecordBatch extends StructVector, and StructVectors implement toJSON(), we should be able to implement toJSON() on the Table by calling toJSON() on each inner RecordBatch. This would yield rows as compact Arrays of each value, but we could also apply the MapView to each RecordBatch if we wanted rows as JS Objects of key/value pairs instead. Alternatively, we could refactor the internal tableRowsToString() method to yield compact rows.

Since serializing table rows to JS Arrays or Objects can easily exceed the memory limit of a single Node process, it's probably worth exposing a row generator function. I ran into this issue piping the toString() result to the console, so we can follow the same pattern here.
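The generator idea could be sketched like this, again using a plain `{ name: values[] }` layout as a stand-in for the real Arrow column vectors (the `rowGenerator` name is illustrative): rows are yielded one at a time, so a large table never materializes as one giant array in memory.

```javascript
// Hypothetical generator: yield one row object at a time instead of
// building the whole array, so consumers can stream rows (e.g. pipe
// them to the console) without exhausting process memory.
function* rowGenerator(columns) {
  const names = Object.keys(columns);
  const length = columns[names[0]].length;
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    yield row;
  }
}

// Rows are produced lazily, one per iteration:
for (const row of rowGenerator({ value: ['keyA', 'keyB'], count: [10, 10] })) {
  console.log(JSON.stringify(row));
}
```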

It also might be valuable to name it something else, and reserve toJSON() for generating the Arrow JSON format. The rationale here is that toJSON() is automatically invoked by JSON.stringify(), which is most commonly used for serialization and deserialization, making this possible:

const newTable = Table.from(JSON.parse(JSON.stringify(oldTable)));
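A minimal demonstration of the mechanism behind that round trip, with a toy object standing in for a Table (the returned shape is a made-up placeholder, not the actual Arrow JSON format): JSON.stringify() automatically calls an object's toJSON() method, so whatever toJSON() returns becomes the serialized form.

```javascript
// Toy stand-in for a Table: JSON.stringify() will call its toJSON()
// method and serialize the return value, not the object itself.
const table = {
  toJSON() {
    // Placeholder payload; the real case would emit the Arrow JSON format.
    return { schema: ['value', 'count'], data: [['keyA', 10]] };
  }
};

console.log(JSON.stringify(table));
// → {"schema":["value","count"],"data":[["keyA",10]]}
```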

@asfimport (Collaborator, Author)

Brian Hulette / @TheNeuralBit:
[~paul.e.taylor] Good points. It definitely makes sense that toJSON() should return data parseable by Table.from. I made a new ticket for creating that version of toJSON(). Maybe this accessor could be called toObjectArray()?

My primary motivation for this was to be able to easily access data from a relatively small Table, one that has been heavily filtered or aggregated somehow, so I wasn't too concerned with exceeding memory limits. That being said, making it a generator, or at least having a generator version, could be useful for large tables as well.
