Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake Extractor hangs jobs it is a part of, connection must be closed. #868

Closed
davcamer opened this issue Jan 5, 2021 · 5 comments
Closed
Labels
type:bug An unexpected problem or unintended behavior

Comments

@davcamer
Copy link
Contributor

davcamer commented Jan 5, 2021

Expected Behavior

With appropriate configuration, the Snowflake Extractor should run to completion, as other extractors do.

Current Behavior

When running on our CI/CD server, where I'm running all of our jobs, jobs that use the SnowflakeMetadataExtractor were hanging indefinitely. Here is a snippet of the end of the log:

INFO:databuilder.loader.file_system_neo4j_csv_loader:Deleting directory /var/tmp/amundsen/snowflake_production/relationships
INFO:databuilder.loader.file_system_neo4j_csv_loader:Deleting directory /var/tmp/amundsen/snowflake_production/nodes
INFO:root:Job completed
INFO:snowflake.connector.cursor:query: [ROLLBACK]
[..] On Cancel Task: Kills child processes

The process hung a the second to last line, before I manually canceled it, resulting in the final line.

Possible Solution

A "monkey patch" like the following resolves the issue for me:

    class ClosingSnowflakeMetadataExtractor(SnowflakeMetadataExtractor):
        def close(self) -> None:
            logging.info("Closing the SnowflakeMetadataExtractor's _alchemy_extractor's connection.")
            self._alchemy_extractor.connection.close()

    snowflake_extractor = ClosingSnowflakeMetadataExtractor()
    noop_transformer = NoopTransformer()
    csv_loader = FsNeo4jCSVLoader()

    snowflake_structure_task = DefaultTask(extractor=snowflake_extractor, loader=csv_loader, transformer=noop_transformer)

This subclasses the existing SnowflakeMetadataExtractor, and reaches in to the SQLAlchemyExtractor that it delegates to, and closes the connection. This resolves the problem, and the [Rollback] line where the process was hanging before is now seen at the end of the extraction process, like I would expect:

INFO:root:Closing the SnowflakeMetadataExtractor's _alchemy_extractor's connection.
INFO:snowflake.connector.cursor:query: [ROLLBACK]
INFO:snowflake.connector.cursor:query execution done
INFO:databuilder.publisher.neo4j_csv_publisher:Publishing Node csv files [

Snowflake's documentation warns about not explicitly closing connections, and I think this is the consequence of failing to close the connection.

As the code stands today, the Task class already calls the extractor's close method. But the SQLAlchemyExtractor and SnowflakeMetadataExtractor don't implement close, so their method is a simple pass statement implemented on the Scoped class. I think a good approach to address this would be to implement close behavior for both SQLAlchemyExtractor and SnowflakeMetadataExtractor.

SQLAlchemy engines can also be disposed, and SQLAlchemy's documentation there suggests that some applications might benefit from not even using the connection pool, but I think that is getting further in to uncommon approaches.

I would be happy to submit a PR to add a concrete close method to both SQLAlchemyExtractor and SnowflakeMetadataExtractor, if that approach seems correct to others.

Steps to Reproduce

  1. Run a SnowflakeMetadataExtractor job...
  2. In a docker container...
  3. Observe that it hangs after publishing with a [Rollback] message from the Snowflake driver.

I am not sure how much of this is specific to our environment, or if others were working around this by killing the tasks?

Context

I had to avoid fully automating our Snowflake ingests before fixing this problem.

Your Environment

  • Amundsen version used: amundsen-databuilder==4.0.3
  • Data warehouse stores: Snowflake, MySQL
  • Deployment (k8s or native): Amundsen Databuilder is running in a python:3.7 Docker image under GoCD.
@feng-tao feng-tao added Project: Databuilder type:bug An unexpected problem or unintended behavior labels Jan 8, 2021
@raels0
Copy link

raels0 commented Jan 12, 2021

I've also faced this issue, another possible solution is adapting the base extractor for SQLAlchemy to use engine-level transactions, documented here and here.

@shuichiro-makigaki
Copy link
Contributor

It is simple and good to implement a method SQLAlchemyExtractor.close(), I think. (and, diff size becomes small.)
As a basic of programming, opened connections always should be closed.

Anyway, I'm waiting for the Pull Request because I've also faced this issue.

@davcamer
Copy link
Contributor Author

davcamer commented Mar 3, 2021

I've opened the PR here: amundsen-io/amundsendatabuilder#453

@feng-tao feng-tao closed this as completed Mar 3, 2021
@feng-tao
Copy link
Member

feng-tao commented Mar 3, 2021

change is merged

@davcamer
Copy link
Contributor Author

davcamer commented Mar 4, 2021

I realized that PR #453 would cause AttributeErrors if there were problems during init for any of the extractors: close would still be called but the attribute for the inner extractor or connection might be undefined or a None value.

I've raised a second PR to fix this, along with a small refactoring: amundsen-io/amundsendatabuilder#454

dorianj pushed a commit to dorianj/amundsen that referenced this issue Apr 25, 2021
…n-io#868)

* Revert "fix: fix cron to run later today (amundsen-io#867)"

This reverts commit 160d08a5f1a7f93026e093033f2ecca94a9a4ce3.

* won't mess up again

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
dorianj pushed a commit to dorianj/amundsen that referenced this issue Apr 25, 2021
This addresses amundsen-io#868, where extractions using the Snowflake extractor would
hand until the connection was closed. I've added close() calls to the other
SQL Alchemy based extractors since it is a general expectation of SQL Alchemy
and most of the infrastructure existed to do it already.

Signed-off-by: Dave Cameron <dcameron@digitalocean.com>
feng-tao pushed a commit that referenced this issue May 7, 2021
* Revert "fix: fix cron to run later today (#867)"

This reverts commit 160d08a5f1a7f93026e093033f2ecca94a9a4ce3.

* won't mess up again

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
feng-tao pushed a commit that referenced this issue May 7, 2021
This addresses #868, where extractions using the Snowflake extractor would
hand until the connection was closed. I've added close() calls to the other
SQL Alchemy based extractors since it is a general expectation of SQL Alchemy
and most of the infrastructure existed to do it already.

Signed-off-by: Dave Cameron <dcameron@digitalocean.com>
zacr pushed a commit to SaltIO/amundsen that referenced this issue May 13, 2022
…n-io#868)

* Revert "fix: fix cron to run later today (amundsen-io#867)"

This reverts commit 160d08a5f1a7f93026e093033f2ecca94a9a4ce3.

* won't mess up again

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this issue Jun 30, 2022
This addresses amundsen-io#868, where extractions using the Snowflake extractor would
hand until the connection was closed. I've added close() calls to the other
SQL Alchemy based extractors since it is a general expectation of SQL Alchemy
and most of the infrastructure existed to do it already.

Signed-off-by: Dave Cameron <dcameron@digitalocean.com>
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this issue Jun 30, 2022
…n-io#868)

* Revert "fix: fix cron to run later today (amundsen-io#867)"

This reverts commit 160d08a5f1a7f93026e093033f2ecca94a9a4ce3.

* won't mess up again

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug An unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants