Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added 'extend_existing' to Sqla Table object #626

Merged
merged 3 commits into from
Aug 11, 2022
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion src/astro/databases/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -477,7 +477,10 @@ def get_sqla_table(self, table: Table) -> SqlaTable:
:param table: Astro Table to be converted to SQLAlchemy table instance
"""
return SqlaTable(
table.name, table.sqlalchemy_metadata, autoload_with=self.sqlalchemy_engine
table.name,
table.sqlalchemy_metadata,
autoload_with=self.sqlalchemy_engine,
extend_existing=True,
Comment on lines +482 to +483
Copy link
Collaborator

@tatiana tatiana Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without these arguments, the Python SDK would raise an exception if concurrent processes tried to create the same table (what was happening in the CI).
By introducing this change, we may be hiding this type of issue - making it harder for users to troubleshoot this anti-pattern (having two processes trying to create a table with same name).

Copy link
Collaborator Author

@utkarsharma2 utkarsharma2 Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana This was my first guess as well I converted persistent tables to temp tables, but the issue still persisted - https://github.com/astronomer/astro-sdk/runs/7765749178?check_suite_focus=true

And I don't think busting the cache will hide the issue if anything this will bring up the issue sooner since now you will be referring to the database directly and not the cache. I don't understand how this is an antipattern.

However, this comes at the cost of running a query to get table columns every time we create the table object, but since we are not handling all the table operations via SQLA. SQLA doesn't have any knowledge about updates done to a table by some operator. So, in my opinion, this should be added to keep such situations in check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana But I do understand your point, having temp would be better, and chances of collision would be reduced. I can make this change on top of existing changes. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the discussion on a call with @tatiana, we concluded the following:

  1. The extend_existing is required and was the root cause of the issue in CI.
  2. Persistent tables in example dag can also lead to conflicts when there are parallel test suites running.

Based on the above points I have updated the PR to reflect those changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thank you very much, @utkarsharma2 !

)

# ---------------------------------------------------------
Expand Down