Feat: Replace Parquet File Writer with Gzipped Jsonl File Writer #60

Merged (6 commits), Feb 22, 2024

@aaronsteers (Contributor) commented Feb 22, 2024

Resolves: #50

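Adds a gzipped JSONL file writer to replace the Parquet file writer.

This file writer better supports variable schemas, which were breaking the Parquet writer. Each JSONL line is an independent JSON object, so records with differing fields do not need to fit a single file-level schema.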
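A minimal sketch of why a line-delimited format copes with variable schemas, using only Python's standard library (`gzip`, `json`). The names `write_jsonl_gz` and `read_jsonl_gz` are hypothetical and are not the writer class added in this PR. In contrast, a Parquet file stores one schema for all of its rows, so records with shifting fields have to be reconciled before writing.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_jsonl_gz(records: Iterable[Dict[str, Any]], path: Path) -> int:
    """Write one JSON object per line to a gzip-compressed file.

    Each line is serialized on its own, so records are free to have
    different keys and value types. Returns the number of records written.
    """
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_jsonl_gz(path: Path) -> List[Dict[str, Any]]:
    """Read a gzipped JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Three records whose fields do not line up: a single fixed columnar
    # schema would have to reconcile these, but JSONL just writes them.
    records = [
        {"id": 1, "name": "alpha"},
        {"id": 2, "tags": ["x", "y"]},
        {"id": "3", "nested": {"k": None}},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "batch.jsonl.gz"
        print(write_jsonl_gz(records, path))  # 3
        print(read_jsonl_gz(path) == records)  # True
```

Text mode (`"wt"` / `"rt"`) keeps the sketch simple; a production writer would also need to handle values that `json.dumps` cannot serialize (for example `datetime` objects).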

@aaronsteers changed the title from "fix lint issues" to "Feat: Add Jsonl File Writer" Feb 22, 2024
@aaronsteers changed the title from "Feat: Add Jsonl File Writer" to "Feat: Replace Parquet File Writer with Gzipped Jsonl File Writer" Feb 22, 2024
@aaronsteers (Contributor, Author) left a comment

I tested this successfully with source-shopify and source-klaviyo.

@bindipankhudi (Contributor) left a comment

Very clean! Looks good to me.

@aaronsteers (Contributor, Author) commented

@bindipankhudi, just FYI: there's apparently a bug in DuckDB where loading a blank JSON file hangs indefinitely. Because of this, I had to refactor the no-data handling slightly. We no longer create a batch when there is no data, but we still loop through all the streams and finalize each of them. During the finalize step, if a stream has no batches, we exit after making sure its table exists.
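A minimal sketch of the no-data flow described in this comment, under stated assumptions: `write_batch_if_nonempty`, `finalize_streams`, and `RecordingDestination` are hypothetical names, not the project's actual classes, and `RecordingDestination` stands in for DuckDB by only recording the calls it receives. It shows that an empty stream produces no batch file at all, yet still gets its table created during finalize, so the loader is never handed a blank file.

```python
import gzip
import itertools
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple


def write_batch_if_nonempty(
    records: Iterable[Dict[str, Any]], path: Path
) -> Optional[Path]:
    """Write a gzipped JSONL batch, but only if there is at least one record.

    Returns the file path, or None when no file was created.
    """
    iterator = iter(records)
    first = next(iterator, None)
    if first is None:
        return None  # No records: never create a blank file.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in itertools.chain([first], iterator):
            f.write(json.dumps(record) + "\n")
    return path


@dataclass
class RecordingDestination:
    """Hypothetical stand-in for the SQL backend (e.g. DuckDB).

    It only records the operations requested, so the control flow can be
    run without a database.
    """

    created_tables: Set[str] = field(default_factory=set)
    loaded_files: List[Tuple[str, Path]] = field(default_factory=list)

    def ensure_table_exists(self, stream_name: str) -> None:
        self.created_tables.add(stream_name)

    def load_file(self, stream_name: str, path: Path) -> None:
        self.loaded_files.append((stream_name, path))


def finalize_streams(
    stream_names: List[str],
    pending_batches: Dict[str, List[Path]],
    destination: RecordingDestination,
) -> None:
    """Finalize every stream, including streams that produced no records."""
    for stream_name in stream_names:
        # Always make sure the table exists, even for empty streams.
        destination.ensure_table_exists(stream_name)
        batches = pending_batches.get(stream_name, [])
        if not batches:
            continue  # No batches: exit early; nothing goes to the loader.
        for path in batches:
            destination.load_file(stream_name, path)


if __name__ == "__main__":
    dest = RecordingDestination()
    with tempfile.TemporaryDirectory() as tmp:
        pending: Dict[str, List[Path]] = {}
        for name, recs in {"orders": [{"id": 1}], "refunds": []}.items():
            batch = write_batch_if_nonempty(recs, Path(tmp) / f"{name}.jsonl.gz")
            pending[name] = [batch] if batch is not None else []
        finalize_streams(["orders", "refunds"], pending, dest)
    print(sorted(dest.created_tables))  # ['orders', 'refunds']
    print([name for name, _ in dest.loaded_files])  # ['orders']
```

Peeking at the first record with `next(iterator, None)` lets the writer decide whether to open a file without buffering the whole stream in memory.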
