Added parquet format #27
private

def read_table
So the big downside of reading Parquet in Ruby is that you MUST load the whole file into memory, which is 💩. Nothing we can do about it, but something to keep in mind.
As long as the input data is partitioned sensibly, we should be OK.
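A minimal sketch of why partitioning helps, assuming each partition file has to be loaded whole: if files are read one at a time, peak memory is bounded by the largest partition rather than by the full dataset. `each_row_by_partition` and `load_partition` are hypothetical names, not part of this PR; `load_partition` stands in for whatever call actually reads one Parquet file.

```ruby
# Hypothetical helper (not from the PR): walks partitions one at a time.
# load_partition is an assumed callable that returns all rows of a single file.
def each_row_by_partition(paths, load_partition)
  return enum_for(:each_row_by_partition, paths, load_partition) unless block_given?

  paths.each do |path|
    # The whole partition is in memory here, but only this one.
    rows = load_partition.call(path)
    rows.each { |row| yield row }
    # rows becomes eligible for garbage collection once this iteration ends.
  end
end

# Stand-in data so the sketch runs without a Parquet library.
fake_files = { "part-0" => [1, 2], "part-1" => [3] }
loader = ->(path) { fake_files.fetch(path) }

total = 0
each_row_by_partition(fake_files.keys, loader) { |row| total += row }
```

The smaller and more even the partitions, the lower the peak memory; one very large partition would bring the original problem back.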
Parquet sucks! Is there any reason the devs wouldn't update to add stream support? Is there something about the implementation that requires it to be in memory?
msg = msg.to_parquet(@schema, **opts) if msg.respond_to?(:to_parquet)

res = @batch.push(msg)
flush_table if @batch.size >= @batch_size
On the writer side, we can work in batches. It's not ideal but at least it's a viable method.
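The batching idea can be sketched in isolation as follows. This is an illustration under assumptions, not the PR's implementation: `BatchBuffer` and its sink block are hypothetical, and the sink stands in for whatever actually writes a batch out as a Parquet table.

```ruby
# Hypothetical batching buffer (names are illustrative, not the PR's classes).
class BatchBuffer
  attr_reader :flushed_batches

  def initialize(batch_size:, &sink)
    raise ArgumentError, "batch_size must be positive" unless batch_size.positive?
    raise ArgumentError, "a sink block is required" unless sink

    @batch_size = batch_size
    @batch = []
    @sink = sink # called with each batch; stands in for writing a table
    @flushed_batches = 0
  end

  # Buffer one message and flush once the batch is full.
  def push(msg)
    @batch.push(msg)
    flush if @batch.size >= @batch_size
    self
  end

  # Hand the buffered rows to the sink and start a fresh batch.
  # Call this once more on close so a partial final batch is not lost.
  def flush
    return if @batch.empty?

    @sink.call(@batch)
    @flushed_batches += 1
    @batch = []
  end
end

written = []
buffer = BatchBuffer.new(batch_size: 3) { |rows| written << rows }
7.times { |i| buffer.push({ id: i }) }
buffer.flush # flush the trailing partial batch
```

Memory on the write path is then bounded by the batch size, at the cost of having to remember the final flush when the writer is closed.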
Looks good and makes sense.
No description provided.