Slow package import times in debug #292

Closed
ofx53 opened this issue Nov 27, 2023 · 6 comments

Comments

@ofx53

ofx53 commented Nov 27, 2023

Hi,

Environment:
Python 3.12.0 (also tried 3.8, 3.10)
databricks-sql-connector==3.0.0
SQLAlchemy==2.0.23
IDE: VS Code

When connecting to Databricks with SQLAlchemy, we noticed huge delays in debug mode: connecting to the endpoint takes multiple minutes. Digging through the code, we found that the hang happens when importing the databricks.sqlalchemy package.

import databricks.sqlalchemy

Digging further to locate the problem, we found that it comes from ttypes.py, which contains ~111k lines that take a long time to parse in debug mode.

Timing the import with a simple script:

import time

if __name__ == "__main__":
    start_time = time.time()
    from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink
    print("Time to load: from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink", time.time() - start_time)

Run without debugging:
Time to load: from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink 0.02857661247253418
Run with debugging:
Time to load: from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink 203.38593983650208
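To compare the two runs inside a single process, the measurement above can be repeated with and without a trace hook. This is only a minimal sketch: it assumes the slowdown is tied to a `sys.settrace`-style hook (which this thread does not confirm), `time_import` and `noop_trace` are hypothetical names, and `json` is a stand-in for the module under investigation.

```python
import importlib
import sys
import time


def time_import(module_name, with_trace=False):
    """Return the seconds taken to import module_name from scratch."""
    # Drop any cached copy so the import really executes the module body.
    saved = sys.modules.pop(module_name, None)
    previous_trace = sys.gettrace()

    def noop_trace(frame, event, arg):
        # Returning the function keeps line-level tracing active,
        # roughly imitating the overhead a debugger hook adds.
        return noop_trace

    if with_trace:
        sys.settrace(noop_trace)
    start = time.perf_counter()
    try:
        importlib.import_module(module_name)
    finally:
        elapsed = time.perf_counter() - start
        if with_trace:
            sys.settrace(previous_trace)
        if saved is not None:
            # Put the original module object back for existing references.
            sys.modules[module_name] = saved
    return elapsed


if __name__ == "__main__":
    name = "json"  # stand-in; swap in the module under investigation
    print("plain :", time_import(name))
    print("traced:", time_import(name, with_trace=True))
```

For a per-module breakdown without writing any code, `python -X importtime your_script.py` prints the import time of every module to stderr.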

Is there a way to reduce the package's load time in debug mode?

Thanks in advance!

@susodapop
Contributor

I think the culprit is somewhere else in your environment or code. I used your reproduction steps and I get sub-second results in VSCode debug mode. I develop this connector in VSCode as well and have never seen such delays.

ttypes.py is integral to the connector's functioning so eliminating it is not an option.

@susodapop
Contributor

If you can share a complete code sample we can try to dig deeper into this. It should not take more than a few seconds to connect to a Databricks compute resource. But ttypes alone isn't the issue.

@ofx53
Author

ofx53 commented Nov 28, 2023

Hi!

Thank you for the quick answer. Upon further research, the issue lies with the Docker image used.

Using FROM python:3.12 as the base image in the Dockerfile leads to the issue. We have not found the underlying cause.

For now, the team will use an Ubuntu image in development and a minimal python3.12 image for the production app.

Thanks!

@susodapop
Contributor

Thanks for following up with your findings!

@boorboor

boorboor commented Feb 27, 2024

Hi,
I'm experiencing the same problem when importing this module with Python 3.12 (not in a container); attempts with Python 3.11 are fine.

This happens only when I run my application's test suite. I need to do more analysis...


@susodapop
Contributor

After some experimentation, I've found the root cause of this and written about it here: #369 (comment)

The only workaround I can think of at the moment is to run your tests without the --cov flag ☹️
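If dropping the coverage flag is not an option, one thing that might be worth trying is pausing the trace hook just around the heavy import. This is an untested sketch, not something the connector provides: it assumes the tool in use hooks in through `sys.settrace`, `tracing_paused` and `import_untraced` are hypothetical helpers, and `json` stands in for the slow module. Lines executed while tracing is paused will not be measured by coverage, breakpoints inside them will not fire, and coverage tools may warn that the trace function changed.

```python
import contextlib
import importlib
import sys


@contextlib.contextmanager
def tracing_paused():
    """Temporarily remove the current thread's sys.settrace hook."""
    previous = sys.gettrace()
    sys.settrace(None)
    try:
        yield
    finally:
        # Restore whatever hook (debugger, coverage, or None) was active.
        sys.settrace(previous)


def import_untraced(module_name):
    """Import module_name while no trace function is installed."""
    with tracing_paused():
        return importlib.import_module(module_name)


if __name__ == "__main__":
    # Stand-in module; replace with the import that is slow under tracing.
    module = import_untraced("json")
    print(module.__name__)
```

Tools that instrument the interpreter through a different mechanism would not be affected by this, so treat it as an experiment rather than a fix.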
