Added a Note about Large Numbers of Columns (#42)
* Added information about large numbers of columns

* Update README.md

Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
TheCedarPrince and quinnj committed Oct 18, 2020
1 parent 8ba8002 commit 5fd86f6cab7338eb49a28b103d9a20cfb850389c
Showing 1 changed file with 5 additions and 0 deletions.
@@ -50,6 +50,11 @@ Read an arrow formatted table, from:

Returns an `Arrow.Table` object that allows column access via `table.col1`, `table[:col1]`, or `table[1]`.

The Apache Arrow standard is foremost a "columnar" format and stores metadata with each column (such as the column name, type, and length).
A data set with tens of thousands of columns is probably not well suited to the arrow format, and saving it to an `arrow` file may increase the file size dramatically.
If the data can be reshaped to have fewer columns, `Arrow.Table` should run into fewer problems.
One simple option, when the data is held in a Julia matrix, is to call `transpose(data)` (or `permutedims(data)` for a materialized copy) to swap the rows and columns, provided that does not interfere with the analysis.
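
As a rough sketch of the reshaping idea above, assuming the data starts out as a plain numeric matrix with few rows and many columns: the names `wide`, `tall`, `colnames`, and `cols` are hypothetical and used only for illustration, and only functions from Julia's Base are called.

```julia
# Hypothetical wide data: 3 observations (rows) by 10_000 variables (columns).
wide = rand(3, 10_000)

# Swap rows and columns. `permutedims` allocates a new matrix, whereas
# `transpose` would return a lazy wrapper around `wide`.
tall = permutedims(wide)    # 10_000 rows by 3 columns

# Build a column table (a NamedTuple of vectors) with one entry per remaining column.
colnames = Tuple(Symbol("obs", j) for j in axes(tall, 2))
cols = NamedTuple{colnames}(Tuple(tall[:, j] for j in axes(tall, 2)))
```

A NamedTuple of vectors such as `cols` is a Tables.jl-compatible column table, so it now describes 3 columns instead of 10,000. Whether swapping rows and columns is meaningful depends on the data set.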

NOTE: the columns in an `Arrow.Table` are views into the original arrow memory, and hence are not easily
modifiable (with e.g. `push!`, `append!`, etc.). To mutate arrow columns, call `copy(x)` to materialize
the arrow data as a normal Julia array.
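
A minimal sketch of the `copy` approach described in the note, assuming a small hypothetical table with a single column `col1` that is written to an in-memory buffer and read back:

```julia
using Arrow

# Write a small hypothetical table to an in-memory buffer, then read it back.
io = IOBuffer()
Arrow.write(io, (col1 = [1, 2, 3],))
seekstart(io)
table = Arrow.Table(io)

col = table.col1    # a view into the arrow memory, not a plain Vector
v = copy(col)       # materialize the column as a normal Julia array
push!(v, 4)         # the copy can be mutated; `table` itself is unchanged
```

The copy is independent of the arrow memory, so changes to `v` are not reflected in `table`.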
