live-tests: add duckdb backend #35923

Merged
merged 1 commit into from Mar 11, 2024


alafanechere
Contributor

@alafanechere alafanechere commented Mar 8, 2024

Small 🎁 for @bleonard: the live testing tool now persists Airbyte messages to a DuckDB file.



@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_pass_connection_id branch from e1cb546 to 470af72 Compare March 8, 2024 14:55
@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_add_duckdb_backend branch 2 times, most recently from 9630662 to 644e3ea Compare March 8, 2024 15:00
@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_pass_connection_id branch from 470af72 to f3f8112 Compare March 8, 2024 15:27
@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_add_duckdb_backend branch from 644e3ea to fed1b63 Compare March 8, 2024 15:28
@alafanechere alafanechere marked this pull request as ready for review March 8, 2024 16:07
@alafanechere alafanechere requested a review from a team as a code owner March 8, 2024 16:07
@bleonard
Contributor

bleonard commented Mar 8, 2024

It's not a present for me. This will help run comparisons at scale!

@octavia-squidington-iv octavia-squidington-iv requested review from a team March 9, 2024 00:03
@bleonard
Contributor

bleonard commented Mar 9, 2024

I bet @aaronsteers knows a way to load the jsonl files directly into DuckDB

@aaronsteers
Collaborator

Indeed! There's a function called read_json_auto(), which we're using in PyAirbyte.

https://duckdb.org/docs/data/json/overview.html#read_json_auto-function
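
For reference, a minimal sketch of what loading a JSONL file with `read_json_auto()` could look like through the `duckdb` Python package. The helper names (`build_load_query`, `load_jsonl`), the table name, and the paths are illustrative assumptions; this is not the PyAirbyte or live-tests code.

```python
def build_load_query(table_name: str, jsonl_path: str) -> str:
    """Build a CREATE TABLE statement that lets DuckDB infer the schema from a JSONL file."""
    # Escape single quotes so the path stays a valid SQL string literal.
    # table_name is assumed to be a trusted identifier (hypothetical helper).
    escaped_path = jsonl_path.replace("'", "''")
    return f"CREATE TABLE {table_name} AS SELECT * FROM read_json_auto('{escaped_path}')"


def load_jsonl(db_path: str, table_name: str, jsonl_path: str) -> int:
    """Load one JSONL file into a DuckDB database file and return the row count."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect(db_path)
    try:
        con.execute(build_load_query(table_name, jsonl_path))
        return con.execute(f"SELECT count(*) FROM {table_name}").fetchone()[0]
    finally:
        con.close()
```

Something like `load_jsonl("messages.duckdb", "records", "records.jsonl")` would then create the table straight from the file, without building the rows in Python first.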

@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_pass_connection_id branch 4 times, most recently from 84304b1 to ee897ca Compare March 11, 2024 08:20
@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_add_duckdb_backend branch from fed1b63 to 41c5bc7 Compare March 11, 2024 08:20
Base automatically changed from augustin/03-08-live-tests_pass_connection_id to master March 11, 2024 08:50
@alafanechere alafanechere force-pushed the augustin/03-08-live-tests_add_duckdb_backend branch from 41c5bc7 to 715e72e Compare March 11, 2024 10:26
@alafanechere
Contributor Author

@bleonard @aaronsteers @erohmensing thanks for your suggestions.
I ended up re-using the FileBackend logic to write jsonl files and then load them into DuckDB.
I think this has a smaller memory footprint than using Pandas, and it makes the backend code a lot lighter.
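
A rough sketch of that flow, assuming messages arrive as plain dicts with a `type` key. The helper names and the one-file-per-type layout are hypothetical and do not reproduce the actual `FileBackend` code. Each message is written to disk as it streams in, so nothing accumulates in memory, and DuckDB then reads the files itself.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def write_messages_as_jsonl(messages: Iterable[dict], output_dir: Path) -> Dict[str, Path]:
    """Stream messages to one JSONL file per message type and return the written paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    paths: Dict[str, Path] = {}
    try:
        for message in messages:
            # Assumed message shape: {"type": "RECORD", ...}; the type names the file.
            message_type = str(message.get("type", "unknown")).lower()
            if message_type not in handles:
                paths[message_type] = output_dir / f"{message_type}.jsonl"
                handles[message_type] = paths[message_type].open("w", encoding="utf-8")
            # One JSON document per line, written immediately rather than buffered in a list.
            handles[message_type].write(json.dumps(message) + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return paths


def load_jsonl_files_into_duckdb(paths: Dict[str, Path], db_path: Path) -> None:
    """Create one DuckDB table per JSONL file, letting DuckDB infer each schema."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect(str(db_path))
    try:
        for table_name, path in paths.items():
            escaped_path = str(path).replace("'", "''")
            con.execute(
                f"CREATE OR REPLACE TABLE {table_name} AS "
                f"SELECT * FROM read_json_auto('{escaped_path}')"
            )
    finally:
        con.close()
```

The writer needs only the standard library; `duckdb` is imported only at load time.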

@alafanechere alafanechere enabled auto-merge (squash) March 11, 2024 10:29
@alafanechere alafanechere merged commit 4e05272 into master Mar 11, 2024
28 checks passed
@alafanechere alafanechere deleted the augustin/03-08-live-tests_add_duckdb_backend branch March 11, 2024 10:32