Extremely slow import times on Python 3.12 #369
Databricks is excluded for Python 3.12 because importing databricks-sql-python on Python 3.12 causes extremely long import times (databricks/databricks-sql-python#369). Until the problem is fixed, we exclude Python 3.12 for the Databricks provider.
Analysis
After eluding me for several months I've finally snaked out the issue. The good news is that it's not an issue with
tl;dr There is a bug in the
I created a minimum reproducible example.
# test_file.py
import time


def test_downloader_import_time():
    # Time only the import statement itself.
    start = time.time()
    from databricks.sql.cloudfetch import downloader
    end = time.time()
    difference = end - start
    assert difference < 1, f"It took {difference} seconds to import the downloader"
$ python -m pytest test_file.py
================ 1 passed in 0.07s ===============
$ python -m pytest test_file.py --cov
FAILED test_file.py::test_downloader_import_time - AssertionError: It took 207.19054412841797 seconds to import the downloader
Workarounds
From what I can tell, there is no way to avoid this completely until
cc: @benc-db
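The same measurement can be taken outside pytest to compare a plain run with a run under coverage. This is only a sketch: `time_import` is a hypothetical helper, not part of any library, and it is shown with a standard-library module so it runs anywhere; substituting `databricks.sql.cloudfetch.downloader` assumes that package is installed.

```python
import importlib
import sys
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds needed to import `module_name`.

    Hypothetical helper for illustration. Only the named module's own
    cache entry is dropped, so already-imported dependencies stay
    cached and the result is a lower bound on a cold import.
    """
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in module; replace with the module under investigation.
    elapsed = time_import("colorsys")
    print(f"import took {elapsed:.4f} seconds")
```

Running the script once directly and once under the coverage tool should show whether the slowdown comes from tracing rather than from the import itself.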
Oh wow! Fantastic find. That would also explain some of the other instabilities we observed with Python 3.12. Yep. It will be easy to disable --coverage for Python 3.12 for us!
Python 3.12 introduced a new (much faster) way for tools such as coverage trackers to monitor the execution of Python code, using sysmon (PEP 669). However, this apparently also heavily degraded the performance of coverage tracking on Python 3.12 when PEP 669 is not used. Since 7.4.0, the coverage library has experimental support for PEP 669 that can be enabled with the COVERAGE_CORE=sysmon environment variable, and a number of users confirmed it fixes the problem. We already use coverage 7.4.4, so we should enable this mode to speed up our coverage tracking. That should also allow us to remove databricks from the excluded providers. See databricks/databricks-sql-python#369 for the databricks case and nedbat/coveragepy#1665 for the coverage bug.
PR here: apache/airflow#38194 - once I see it working I will likely be able to remove databricks provider exclusion for 3.12 :) . We are already using the version of coverage that should have PEP 669 support so I just enabled it for coverage tests and 🤞 |
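A minimal sketch of how a test runner might opt in to that mode only on Python 3.12 and later. `COVERAGE_CORE=sysmon` is the variable named in this thread; the `coverage_env` helper and its parameters are assumptions made for illustration and are not taken from the Airflow PR.

```python
import os
import sys


def coverage_env(base_env=None, version_info=sys.version_info):
    """Build an environment mapping for a coverage-enabled test run.

    Hypothetical helper: on Python 3.12+ it requests coverage's
    experimental PEP 669 core via COVERAGE_CORE=sysmon, unless the
    caller has already chosen a core. Older versions are left as-is.
    """
    env = dict(os.environ if base_env is None else base_env)
    if tuple(version_info[:2]) >= (3, 12):
        env.setdefault("COVERAGE_CORE", "sysmon")
    return env


if __name__ == "__main__":
    # Pass the result as `env=` to subprocess.run(...) when launching tests.
    print(coverage_env().get("COVERAGE_CORE", "<default core>"))
```

Reports in this thread vary on how much this helps, so it seems worth timing a run with and without the variable before relying on it.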
Very curious to see how this goes for you. I tried using a newer version of coverage for my reproduction above but didn't see any noticeable difference (perhaps 5% faster).
I will let you know :) |
Generally speaking, what we saw in 3.12 builds with coverage enabled was that the tests sometimes took much longer to complete (30%-50% slower), but not always. It was really puzzling that I saw it in our
However, the coverage theory fits perfectly: our PRs usually run only the subset of tests relevant to the incoming change, so we do not run coverage there; we only use coverage in the
It happened frequently enough (multiple times a day in canary builds) that we should see the result of that change rather quickly.
So far, so good.
We excluded Python 3.12 from the Databricks provider because it was failing our Python 3.12 tests intermittently (but often enough to make a difference). It turned out that this was caused by running the tests with coverage enabled: the PEP 669 implementation in Python 3.12 intermittently degraded the performance of tests run with coverage. However, it seems that the experimental PEP 669 support implemented in coverage 7.4.0 handles the performance issues nicely, and after apache#38194 we should be able to enable Python 3.12 for Databricks without impacting our tests. Related: databricks/databricks-sql-python#369
Looks good so far. After several builds, I have not yet seen the side effects previously observed in main for Python 3.12. Re-enabling databricks now in apache/airflow#38207, and we will know for sure once we merge, because it was failing almost always before.
Eventually I just disabled coverage for Python 3.12. The tests with coverage on Python 3.12 took a long time and some of them even timed out inside coverage's sysmon. |
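One way to express that fallback is to add the `--cov` flag (as used in the reproduction above) only on interpreters older than 3.12. The `pytest_args` helper below is hypothetical and only illustrates the idea; it does not reflect how any project in this thread wires its CI.

```python
import sys


def pytest_args(test_path, version_info=sys.version_info):
    """Return arguments for `python <args>` that run pytest on `test_path`.

    Hypothetical helper: coverage collection (`--cov`) is requested only
    on Python versions before 3.12, where the slowdown was not reported.
    """
    args = ["-m", "pytest", test_path]
    if tuple(version_info[:2]) < (3, 12):
        args.append("--cov")
    return args


if __name__ == "__main__":
    # Example: subprocess.run([sys.executable, *pytest_args("test_file.py")])
    print(pytest_args("test_file.py"))
```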
We just enabled Python 3.12 support for Apache Airflow and started to experience extremely slow import times for example DAGs that refer to Databricks.
This was previously reported in #292 as occurring on Python 3.12 and was closed after the authors switched to earlier Python versions, but the underlying issue was never investigated or fixed. It still occurs and is very apparent in Airflow CI.
Example failure that we experience (https://github.com/apache/airflow/actions/runs/8246010879/job/22551487439#step:6:15264)
This is not a blocker for Airflow: we will simply exclude Python 3.12 for the databricks provider, so the provider will not be available for Python 3.12. But it would be great to fix it eventually. Once the problem is fixed, we should be able to test it by re-enabling databricks for Python 3.12 and running a number of Airflow "canary" CI builds, where the issue is apparent.