Only write airbyte messages to duckdb on read #38647

Merged: clnoll merged 2 commits into master from catherine/fix-duckdb-on-catalog-write on May 24, 2024

@clnoll clnoll commented May 24, 2024

This fixes an issue that @askarpets encountered during regression test runs for multiple connection IDs for source-pinterest (report here).

    | Traceback (most recent call last):
    |   File "/root/.cache/pypoetry/virtualenvs/live-tests-9TtSrW0h-py3.10/lib/python3.10/site-packages/asyncer/_main.py", line 164, in value_wrapper
    |     value = await partial_f()
    |   File "/app/src/live_tests/commons/connector_runner.py", line 171, in _run
    |     await execution_result.save_artifacts(self.output_dir, self.duckdb_path)
    |   File "/app/src/live_tests/commons/models.py", line 374, in save_artifacts
    |     self.save_airbyte_messages(output_dir, duckdb_path)
    |   File "/app/src/live_tests/commons/models.py", line 361, in save_airbyte_messages
    |     self.backend.write(self.airbyte_messages)
    |   File "/app/src/live_tests/commons/backends/duckdb_backend.py", line 64, in write
    |     duck_db_conn.sql(
    | duckdb.duckdb.NotImplementedException: Not implemented Error: Duplicate name "ad_group_id" in struct auto-detected in JSON, try ignore_errors=true

I was able to reproduce the issue locally.
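
For anyone trying to find the offending record: the following is a minimal sketch, not part of this PR, for spotting JSON objects with colliding key names before they reach DuckDB. It assumes the error is triggered by keys that collide within a single object, possibly only after case folding; the thread does not confirm that cause. `find_duplicate_keys` is a hypothetical helper name.

```python
import json
from collections import Counter


def find_duplicate_keys(raw_json: str) -> list[str]:
    """Return key names that occur more than once inside any single JSON object.

    Keys are compared case-insensitively (an assumption about how the
    collision arises, not something confirmed in this thread).
    """
    duplicates: list[str] = []

    def check_pairs(pairs):
        # object_pairs_hook receives every object's (key, value) pairs,
        # including repeated keys that a plain dict would silently drop.
        counts = Counter(key.lower() for key, _ in pairs)
        duplicates.extend(sorted(k for k, n in counts.items() if n > 1))
        return dict(pairs)

    json.loads(raw_json, object_pairs_hook=check_pairs)
    return duplicates


print(find_duplicate_keys('{"ad_group_id": 1, "AD_GROUP_ID": 2}'))  # ['ad_group_id']
```

Running this over the raw message lines of a failing run may show whether the source really emits colliding keys or whether the collision is introduced during schema auto-detection.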

I'm still not entirely sure why this is only coming up now for source-pinterest, but I don't see a major advantage to storing the results of non-read commands in duckdb, so I'm proposing this change. I haven't tried setting ignore_errors=true because it didn't feel like quite the right thing to do (e.g. during read), but I'm open to other opinions.

Note: I found this issue that may be related, but rolling back the version didn't help.

@clnoll clnoll requested a review from a team as a code owner May 24, 2024 13:03
@clnoll clnoll requested a review from alafanechere May 24, 2024 13:03

@alafanechere alafanechere left a comment

@clnoll Duckdb was updated to a new version (0.10.3) in #38571. Can you try downgrading it to 0.10.2 to check if it's a regression due to the update?

clnoll commented May 24, 2024

Just added a note on that @alafanechere - I downgraded to 0.10.2 and it didn't fix the problem; perhaps a bigger downgrade will work. Checking.

@alafanechere alafanechere left a comment

🙏 Can you please bump the package version?

    @@ -168,7 +168,7 @@ async def _run(
                 http_dump=await self.http_proxy.retrieve_http_dump() if self.http_proxy else None,
                 executed_container=executed_container,
             )
    -        await execution_result.save_artifacts(self.output_dir, self.duckdb_path)
    +        await execution_result.save_artifacts(self.output_dir, self.duckdb_path if airbyte_command == "read" else None)

Can we add a mapping like:

    persist_command_to_duck_db = {
        "read": True,
        "write": True,
        "spec": False,
        "discover": False
    }

Suggested change

    - await execution_result.save_artifacts(self.output_dir, self.duckdb_path if airbyte_command == "read" else None)
    + await execution_result.save_artifacts(self.output_dir, self.duckdb_path if persist_command_to_duck_db[airbyte_command] else None)

With a comment saying why we disable persistence on other commands?
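
A minimal sketch of how the suggested mapping could be wrapped, for illustration only (this approach was not merged; the PR ended up downgrading duckdb instead). The mapping values come from the suggestion above; the helper name `resolve_duckdb_path` and the choice to default unlisted commands to "do not persist" are assumptions.

```python
from pathlib import Path
from typing import Optional

# Persistence is disabled for commands whose output is not record data;
# values are taken from the review suggestion.
PERSIST_COMMAND_TO_DUCKDB = {
    "read": True,
    "write": True,
    "spec": False,
    "discover": False,
}


def resolve_duckdb_path(airbyte_command: str, duckdb_path: Optional[Path]) -> Optional[Path]:
    """Return the DuckDB path only for commands that should be persisted."""
    # .get() with a False default avoids a KeyError for commands that are
    # missing from the mapping (for example "check").
    if PERSIST_COMMAND_TO_DUCKDB.get(airbyte_command, False):
        return duckdb_path
    return None
```

With a helper like this, the call site would pass `resolve_duckdb_path(airbyte_command, self.duckdb_path)` as the second argument to `save_artifacts`. Indexing the dict directly, as in the suggested change, would raise for any command not listed.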

clnoll commented May 24, 2024

@alafanechere downgrading to 0.10.1 fixed the issue, so I think that's the way to go here. Sorry for the back and forth.

@clnoll clnoll requested a review from alafanechere May 24, 2024 13:24
@clnoll clnoll force-pushed the catherine/fix-duckdb-on-catalog-write branch from 85fa9d0 to 11d9b7e Compare May 24, 2024 13:25
@alafanechere alafanechere left a comment

Cool! Definitely a preferable fix!

@clnoll clnoll merged commit 828a637 into master May 24, 2024
26 checks passed
@clnoll clnoll deleted the catherine/fix-duckdb-on-catalog-write branch May 24, 2024 13:46
natikgadzhi commented May 28, 2024 via email

3 participants