Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added 'extend_existing' to Sqla Table object #626

Merged
merged 3 commits into from
Aug 11, 2022
Merged

Conversation

utkarsharma2
Copy link
Collaborator

@utkarsharma2 utkarsharma2 commented Aug 10, 2022

Description

What is the current behavior?

Quite often this test group fails to run in the CI:

nox -s "test-3.8(airflow='2.3')" -- --splits 12 --group 1 --cov=src --cov-report=xml --cov-branch

With:

=========================== short test summary info ============================
FAILED tests/test_example_dags.py::test_example_dag[example_snowflake_partial_table_with_append]
==== 1 failed, 29 passed, 329 deselected, 22 warnings in 294.40s (0:04:54) =====

Even if there were no changes that affected this test/our code-base, for instance:
#480
https://github.com/astronomer/astro-sdk/runs/7231300739

closes: #516

What is the new behavior?

The issue was with SQLA's reflection cache which keeps track of the tables created and maintains all the tables in a map
schema_columns[self.normalize_name(table_name)]. When this test was individually run there are no issues on local since SQLA cache is just initialized just for this test and because of which this was an intermittent issue CI. The fix is to introduce a parameter extend_existing which basically ensures the cache is not used.

Ran test multiple time to ensure this is working:
https://github.com/astronomer/astro-sdk/actions/runs/2832783413

Does this introduce a breaking change?

Nope

@codecov
Copy link

codecov bot commented Aug 10, 2022

Codecov Report

Merging #626 (580dea0) into main (1288291) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #626   +/-   ##
=======================================
  Coverage   93.30%   93.30%           
=======================================
  Files          41       41           
  Lines        1672     1672           
  Branches      211      211           
=======================================
  Hits         1560     1560           
  Misses         89       89           
  Partials       23       23           
Impacted Files Coverage Δ
src/astro/databases/base.py 96.12% <ø> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@utkarsharma2 utkarsharma2 marked this pull request as ready for review August 10, 2022 14:15
Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM 👍

@tatiana
Copy link
Collaborator

tatiana commented Aug 10, 2022

@utkarsharma2 isn't the root cause of the issue because concurrent tests are trying to create a table with the same name (home and homes2)?

Wouldn't it be better to change the example DAG to create tables with unique names? Would this be enough to fix the problem? If we didn't set the names of these tables explicitly, it would also have the advantage that clean up would delete the temporary tables with unique names by the end of the execution.

Comment on lines +482 to +483
autoload_with=self.sqlalchemy_engine,
extend_existing=True,
Copy link
Collaborator

@tatiana tatiana Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without these arguments, the Python SDK would raise an exception if concurrent processes tried to create the same table (what was happening in the CI).
By introducing this change, we may be hiding this type of issue - making it harder for users to troubleshoot this anti-pattern (having two processes trying to create a table with same name).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana This was my first guess as well I converted persistent tables to temp tables, but the issue still persisted - https://github.com/astronomer/astro-sdk/runs/7765749178?check_suite_focus=true

And I don't think busting the cache will hide the issue if anything this will bring up the issue sooner since now you will be referring to the database directly and not the cache. I don't understand how this is an antipattern.

However, this comes at the cost of running a query to get table columns every time we create the table object, but since we are not handling all the table operations via SQLA. SQLA doesn't have any knowledge about updates done to a table by some operator. So, in my opinion, this should be added to keep such situations in check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana But I do understand your point, having temp would be better, and chances of collision would be reduced. I can make this change on top of existing changes. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the discussion on a call with @tatiana, we concluded the following:

  1. The extend_existing is required and was the root cause of the issue in CI.
  2. Persistent tables in example dag can also lead to conflicts when there are parallel test suites running.

Based on the above points I have updated the PR to reflect those changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thank you very much, @utkarsharma2 !

tatiana
tatiana previously approved these changes Aug 11, 2022
@tatiana tatiana dismissed their stale review August 11, 2022 13:34

Noticed an issue after approving

@utkarsharma2 utkarsharma2 merged commit cd5c5e9 into main Aug 11, 2022
@utkarsharma2 utkarsharma2 deleted the FixFlakyTests branch August 11, 2022 13:39
utkarsharma2 added a commit that referenced this pull request Aug 12, 2022
What is the current behavior?

Quite often this test group fails to run in the CI:

nox -s "test-3.8(airflow='2.3')" -- --splits 12 --group 1 --cov=src --cov-report=xml --cov-branch

With:

=========================== short test summary info ============================
FAILED tests/test_example_dags.py::test_example_dag[example_snowflake_partial_table_with_append]
==== 1 failed, 29 passed, 329 deselected, 22 warnings in 294.40s (0:04:54) =====

Even if there were no changes that affected this test/our code-base, for instance:
#480
https://github.com/astronomer/astro-sdk/runs/7231300739

closes: #516
What is the new behavior?

The issue was with SQLA's reflection cache which keeps track of the tables created and maintains all the tables in a map
schema_columns[self.normalize_name(table_name)]. When this test was individually run there are no issues on local since SQLA cache is just initialized just for this test and because of which this was an intermittent issue CI. The fix is to introduce a parameter extend_existing which basically ensures the cache is not used.

Ran test multiple time to ensure this is working:
https://github.com/astronomer/astro-sdk/actions/runs/2832783413
Does this introduce a breaking change?

Nope
kaxil pushed a commit that referenced this pull request Aug 18, 2022
What is the current behavior?

Quite often this test group fails to run in the CI:

nox -s "test-3.8(airflow='2.3')" -- --splits 12 --group 1 --cov=src --cov-report=xml --cov-branch

With:

=========================== short test summary info ============================
FAILED tests/test_example_dags.py::test_example_dag[example_snowflake_partial_table_with_append]
==== 1 failed, 29 passed, 329 deselected, 22 warnings in 294.40s (0:04:54) =====

Even if there were no changes that affected this test/our code-base, for instance:
#480
https://github.com/astronomer/astro-sdk/runs/7231300739

closes: #516
What is the new behavior?

The issue was with SQLA's reflection cache which keeps track of the tables created and maintains all the tables in a map
schema_columns[self.normalize_name(table_name)]. When this test was individually run there are no issues on local since SQLA cache is just initialized just for this test and because of which this was an intermittent issue CI. The fix is to introduce a parameter extend_existing which basically ensures the cache is not used.

Ran test multiple time to ensure this is working:
https://github.com/astronomer/astro-sdk/actions/runs/2832783413
Does this introduce a breaking change?

Nope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix flaky test_example_dag[example_snowflake_partial_table_with_append]
4 participants