Snowflake Extractor hangs jobs it is a part of, connection must be closed. #868
Comments
I've also faced this issue. Another possible solution is adapting the base extractor for SQLAlchemy to use engine-level transactions, documented here and here.
It is simple and good to implement a method. Anyway, I'm waiting for the pull request because I've also faced this issue.
I've opened the PR here: amundsen-io/amundsendatabuilder#453
The change is merged.
I realized that PR #453 would cause AttributeErrors if there were problems during I've raised a second PR to fix this, along with a small refactoring: amundsen-io/amundsendatabuilder#454
…n-io#868) * Revert "fix: fix cron to run later today (amundsen-io#867)" This reverts commit 160d08a5f1a7f93026e093033f2ecca94a9a4ce3. * won't mess up again Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
This addresses amundsen-io#868, where extractions using the Snowflake extractor would hang until the connection was closed. I've added close() calls to the other SQLAlchemy-based extractors, since closing connections is a general expectation of SQLAlchemy and most of the infrastructure to do it already existed. Signed-off-by: Dave Cameron <dcameron@digitalocean.com>
Expected Behavior
With appropriate configuration, the Snowflake Extractor should run to completion, as other extractors do.
Current Behavior
When running on our CI/CD server, where I run all of our jobs, jobs that use the SnowflakeMetadataExtractor were hanging indefinitely. Here is a snippet of the end of the log:
The process hung at the second-to-last line until I manually canceled it, which produced the final line.
Possible Solution
A "monkey patch" like the following resolves the issue for me:
This subclasses the existing SnowflakeMetadataExtractor, reaches into the SQLAlchemyExtractor that it delegates to, and closes the connection. This resolves the problem, and the [Rollback] line where the process was hanging before now appears at the end of the extraction process, as I would expect:
Snowflake's documentation warns about not explicitly closing connections, and I think this hang is the consequence of failing to close the connection.
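The subclassing workaround described above can be sketched roughly as follows. This is a minimal illustration using stand-in classes, not the real amundsen-databuilder ones: FakeConnection is hypothetical, and the attribute names `_alchemy_extractor` and `connection` are assumptions about how the delegate and its connection are stored.

```python
class FakeConnection:
    """Hypothetical stand-in for a SQLAlchemy connection; records close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SQLAlchemyExtractor:
    """Stand-in for the delegate that owns the connection (assumed attribute)."""

    def __init__(self) -> None:
        self.connection = FakeConnection()


class SnowflakeMetadataExtractor:
    """Stand-in for the extractor that delegates to SQLAlchemyExtractor."""

    def __init__(self) -> None:
        self._alchemy_extractor = SQLAlchemyExtractor()

    def close(self) -> None:
        # Mirrors the inherited no-op close described in this issue.
        pass


class ClosingSnowflakeMetadataExtractor(SnowflakeMetadataExtractor):
    """Workaround subclass: close the delegate's connection explicitly."""

    def close(self) -> None:
        # Reach into the delegate and release the connection it holds.
        self._alchemy_extractor.connection.close()


extractor = ClosingSnowflakeMetadataExtractor()
extractor.close()
print(extractor._alchemy_extractor.connection.closed)
```

With the real classes, the job would be configured to use the subclass in place of SnowflakeMetadataExtractor, so the close call that the task already makes reaches the connection.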
As the code stands today, the Task class already calls the extractor's close method. But the SQLAlchemyExtractor and SnowflakeMetadataExtractor don't implement close, so their method is a simple pass statement implemented on the Scoped class. I think a good approach to address this would be to implement close behavior for both SQLAlchemyExtractor and SnowflakeMetadataExtractor.
SQLAlchemy engines can also be disposed, and SQLAlchemy's documentation there suggests that some applications might benefit from not even using the connection pool, but I think that is getting further into uncommon approaches.
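The proposed approach might look something like the sketch below. It uses simplified stand-ins for Scoped, the two extractors, and the task, so every name and signature here is illustrative rather than the library's actual API. The `getattr` guards are an assumption about how to keep close() safe when initialization failed before a connection was created.

```python
class Scoped:
    """Stand-in base class whose close() is a no-op by default."""

    def close(self) -> None:
        pass


class FakeConnection:
    """Hypothetical stand-in for a SQLAlchemy connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SQLAlchemyExtractor(Scoped):
    def init(self, fail: bool = False) -> None:
        if fail:
            # Simulate a failure before the connection attribute exists.
            raise RuntimeError("could not connect")
        self.connection = FakeConnection()

    def close(self) -> None:
        # Guard against init() having failed before `connection` was set.
        connection = getattr(self, "connection", None)
        if connection is not None:
            connection.close()


class SnowflakeMetadataExtractor(Scoped):
    def init(self, fail: bool = False) -> None:
        self._alchemy_extractor = SQLAlchemyExtractor()
        self._alchemy_extractor.init(fail=fail)

    def close(self) -> None:
        # Forward close() to the delegate, if one was created.
        delegate = getattr(self, "_alchemy_extractor", None)
        if delegate is not None:
            delegate.close()


def run_task(extractor: Scoped, fail: bool = False) -> bool:
    """Simplified stand-in for a task: close() runs even if init() fails."""
    try:
        extractor.init(fail=fail)
        return True
    except RuntimeError:
        return False
    finally:
        extractor.close()


ok = SnowflakeMetadataExtractor()
ok_result = run_task(ok)

bad = SnowflakeMetadataExtractor()
bad_result = run_task(bad, fail=True)  # close() must not raise here
```

Because the task only ever calls close() on the extractor it was given, forwarding the call from SnowflakeMetadataExtractor to its delegate keeps the cleanup in one place.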
I would be happy to submit a PR to add a concrete close method to both SQLAlchemyExtractor and SnowflakeMetadataExtractor, if that approach seems correct to others.
Steps to Reproduce
I am not sure how much of this is specific to our environment, or whether others have been working around this by killing the tasks.
Context
I had to avoid fully automating our Snowflake ingests before fixing this problem.
Your Environment
python:3.7 Docker image under GoCD.