[JS] Add DataFrame.toJSON #18166

Open
asfimport opened this issue Feb 22, 2018 · 2 comments

Comments

@asfimport (Collaborator)

Currently, CountByResult has its own toJSON method, but there should be a more general one for every DataFrame.

CountByResult.toJSON returns:

{
  "keyA": 10,
  "keyB": 10,
  ...
}

A more general toJSON could just return a list of objects with an entry for each column. For the above CountByResult, the output would look like:

[
  {value: "keyA", count: 10},
  {value: "keyB", count: 10},
  ...
]
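A hypothetical sketch of the proposed shape, assuming plain arrays as a stand-in for the actual Arrow JS column vectors: given column-oriented data (like the `value` and `count` columns of a CountByResult), produce one object per row with an entry for each column. The `toRowObjects` helper name is illustrative, not part of the library.

```javascript
// Hypothetical helper: map column-oriented data to a list of row
// objects, the shape proposed above. The { name: values[] } layout
// is a plain stand-in for Arrow's internal column vectors.
function toRowObjects(columns) {
  const names = Object.keys(columns);
  const length = columns[names[0]].length;
  const rows = [];
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    rows.push(row);
  }
  return rows;
}

console.log(toRowObjects({ value: ['keyA', 'keyB'], count: [10, 10] }));
// → [ { value: 'keyA', count: 10 }, { value: 'keyB', count: 10 } ]
```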

Reporter: Brian Hulette / @TheNeuralBit

Note: This issue was originally created as ARROW-2202. Please see the migration documentation for further details.

@asfimport (Collaborator, Author)

Paul Taylor / @trxcllnt:
@TheNeuralBit since RecordBatch extends StructVector, and StructVectors implement toJSON(), we should be able to implement toJSON() on the Table by calling toJSON() on each inner RecordBatch. This would yield rows as compact Arrays of each value, but we could also apply the MapView to each RecordBatch if we wanted rows as JS Objects of key/value pairs instead. Alternatively, we could refactor the internal tableRowsToString() method to yield compact rows.

Since serializing table rows to JS Arrays or Objects can easily exceed the memory limit of a single Node process, it's probably worth exposing a row generator function. I ran into this issue piping the toString() result to the console, so we can follow the same pattern here.
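The generator idea could be sketched like this, again using a plain `{ name: values[] }` layout as a stand-in for the real Arrow column vectors (the `rowGenerator` name is illustrative): rows are yielded one at a time, so a large table never materializes as one giant array in memory.

```javascript
// Hypothetical generator: yield one row object at a time instead of
// building the whole array, so consumers can stream rows (e.g. pipe
// them to the console) without exhausting process memory.
function* rowGenerator(columns) {
  const names = Object.keys(columns);
  const length = columns[names[0]].length;
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    yield row;
  }
}

// Rows are produced lazily, one per iteration:
for (const row of rowGenerator({ value: ['keyA', 'keyB'], count: [10, 10] })) {
  console.log(JSON.stringify(row));
}
```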

It also might be valuable to name it something else, and reserve toJSON() for generating the Arrow JSON format. The rationale here is that toJSON() is automatically invoked by JSON.stringify(), which is most commonly used for serialization and deserialization, making this possible:

const newTable = Table.from(JSON.parse(JSON.stringify(oldTable)));
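A minimal demonstration of the mechanism behind that round trip, with a toy object standing in for a Table (the returned shape is a made-up placeholder, not the actual Arrow JSON format): JSON.stringify() automatically calls an object's toJSON() method, so whatever toJSON() returns becomes the serialized form.

```javascript
// Toy stand-in for a Table: JSON.stringify() will call its toJSON()
// method and serialize the return value, not the object itself.
const table = {
  toJSON() {
    // Placeholder payload; the real case would emit the Arrow JSON format.
    return { schema: ['value', 'count'], data: [['keyA', 10]] };
  }
};

console.log(JSON.stringify(table));
// → {"schema":["value","count"],"data":[["keyA",10]]}
```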

@asfimport (Collaborator, Author)

Brian Hulette / @TheNeuralBit:
[~paul.e.taylor] Good points. It definitely makes sense that toJSON() should return data parseable by Table.from. I made a new ticket for creating that version of toJSON(). Maybe this accessor could be called toObjectArray()?

My primary motivation for this was to be able to easily access data from a relatively small Table, one that has been heavily filtered or aggregated somehow, so I wasn't too concerned with exceeding memory limits. That being said, making it a generator, or at least having a generator version, could be useful for large tables as well.
