sql.connect wrongly reports a timeout when attempting to connect to a non-existing warehouse #481

@haschdl

Description

Disclaimer: I am a Databricks employee.

Calling databricks.sql.connect with a non-existent warehouse_id raises a generic timeout exception originating from the ThriftBackend.

Only by digging into the source code can one find the default of 30(!) retries, which can be changed by passing _retry_stop_after_attempts_count to sql.connect.
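For anyone hitting this in the meantime, here is a minimal sketch of lowering the retry cap. The only connector-specific name used is _retry_stop_after_attempts_count, as mentioned above. The helper names (build_connect_kwargs, connect_with_fewer_retries) and the value 3 are hypothetical choices for illustration, not part of the connector.

```python
def build_connect_kwargs(server_hostname, warehouse_id, credentials_provider,
                         max_attempts=3):
    """Assemble keyword arguments for sql.connect with a lower retry cap.

    Hypothetical helper: only the keyword names come from this issue.
    """
    return {
        "server_hostname": server_hostname,
        "http_path": f"/sql/1.0/warehouses/{warehouse_id}",
        "credentials_provider": credentials_provider,
        # Underscore-prefixed option named in this issue; the default is 30.
        "_retry_stop_after_attempts_count": max_attempts,
    }


def connect_with_fewer_retries(**kwargs):
    """Open a connection using the reduced retry cap (hypothetical wrapper)."""
    # Imported lazily so build_connect_kwargs works without the connector installed.
    from databricks import sql

    return sql.connect(**build_connect_kwargs(**kwargs))
```

This only shortens the wait before the failure surfaces. It does not change which exception is raised, so it does not address the request below.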

As a user of the SQL client, I'd like to easily distinguish between a timeout (= unknown circumstances, retrying makes sense) and a non-existent or deleted warehouse (= hard fact, retrying makes no sense).

from databricks import sql

connection = sql.connect(
    server_hostname=server_hostname,
    http_path=f"/sql/1.0/warehouses/{warehouse_id}",
    credentials_provider=_credentials_provider,
)

Exception:

  
  HTTPSConnectionPool(host='xxxx.databricks.com', port=443):
  Max retries exceeded with url: /sql/1.0/warehouses/786786d78562786
  (Caused by ResponseError('too many 404 error responses')).

Alternatives considered

  1. One can add the SDK as another dependency and check whether the warehouse exists. It seems overkill to add the SDK just for that purpose. This check really should be part of the connect client.
  2. One can parse the exception text and apply some heuristics to form an educated guess based on the 404 reply. This is hacky and might have too many edge cases. E.g. calling /sql/1.0/**warehoses**/xxxxx would raise the exact same exception.
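To make alternative 2 concrete, here is a rough sketch of such a heuristic. It assumes the message text shown under "Exception:" above ("too many 404 error responses") stays stable, which is not guaranteed. The function name looks_like_missing_endpoint is hypothetical.

```python
import re

# Substring taken from the exception text quoted in this issue.
_HTTP_404_PATTERN = re.compile(r"too many 404 error responses")


def looks_like_missing_endpoint(exc: BaseException) -> bool:
    """Return True if the exception or anything in its chain mentions repeated 404s.

    Heuristic only: a mistyped path segment produces the same message as a
    non-existent warehouse, so a True result is a guess, not proof.
    """
    seen = set()
    current = exc
    # Walk explicit (__cause__) and implicit (__context__) chaining, guarding against cycles.
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if _HTTP_404_PATTERN.search(str(current)):
            return True
        current = current.__cause__ or current.__context__
    return False
```

A caller could wrap sql.connect in try/except and skip further retries when this returns True. It cannot tell a deleted warehouse from a malformed URL, which is exactly the limitation described in point 2.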
