
Too large queries produce MaxRetryError #413

Closed
rth opened this issue Jul 11, 2024 · 5 comments · Fixed by #414

Comments

@rth

rth commented Jul 11, 2024

Previously, a too-large query returned 0 rows as output (as discussed in #383). With the changes in #405, it now produces a MaxRetryError, which is better, but the error message is misleading (and retrying so many times is also slow).

The minimal code I'm using is:

    import os

    from databricks import sql as databricks_sql

    db = databricks_sql.connect(
        server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
        http_path=os.getenv("DATABRICKS_HTTP_PATH"),
        access_token=os.getenv("DATABRICKS_TOKEN"),
        _tls_no_verify=True
    )
    cursor = db.cursor()
    cursor.execute("<my-query>")
    data = cursor.fetchall()

If the query is small, it works with no warnings.

If the query is too big, it produces the following MaxRetryError with a nested SSLError. Is there no way to detect a too-big query from the HTTP response status without retrying N times and hitting MaxRetryError? I also have the impression that _tls_no_verify is not passed through somewhere in this case, which produces those SSLErrors. cc @kravets-levko

urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='xxxx.blob.core.windows.net', port=443): Max retries exceeded with url: /jobs/999999/sql/2024-07-11/14/results_2024-07-11T14:44:24Z_ef4c56f4-6fc6-43ca-b8dd-009eeb472cd4?sig=xxxx&se=2024-07-11T14%3A59%3A26Z&sv=2019-02-02&spr=https&sp=r&sr=b (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "fetch-databricks.py", line 13, in main
    cursor.execute("select * from sgg.site_cbs_coatify_view")
  File "databricks/sql/client.py", line 768, in execute
    execute_response = self.thrift_backend.execute_command(
  File "databricks/sql/thrift_backend.py", line 869, in execute_command
    return self._handle_execute_response(resp, cursor)
  File "databricks/sql/thrift_backend.py", line 966, in _handle_execute_response
    return self._results_message_to_execute_response(resp, final_operation_state)
  File "databricks/sql/thrift_backend.py", line 770, in _results_message_to_execute_response
    arrow_queue_opt = ResultSetQueueFactory.build_queue(
  File "databricks/sql/utils.py", line 84, in build_queue
    return CloudFetchQueue(
  File "databricks/sql/utils.py", line 175, in __init__
    self.table = self._create_next_table()
  File "databricks/sql/utils.py", line 238, in _create_next_table
    downloaded_file = self.download_manager.get_next_downloaded_file(
  File "databricks/sql/cloudfetch/download_manager.py", line 68, in get_next_downloaded_file
    file = task.result()
  File "python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "databricks/sql/cloudfetch/downloader.py", line 95, in run
    response = session.get(
  File "python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "python3.10/site-packages/requests/adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='xxxxxx.blob.core.windows.net', port=443): Max retries exceeded with url: /jobs/123445/sql/2024-07-11/14/results_2024-07-11T14:44:24Z_ef4c56f4-6fc6-43ca-b8dd-009eeb472cd4?sig=xxxxxxxse=2024-07-11T14%3A59%3A26Z&sv=2019-02-02&spr=https&sp=r&sr=b (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

Versions

requests                      2.32.3
urllib3                       2.2.2
databricks-sql-python  main
@kravets-levko
Contributor

@rth Can you please enable debug logging and share the log?

import logging
from databricks import sql

logging.basicConfig(level=logging.DEBUG)

# ... the rest of your code

Also, can you try to access that failed link using wget/curl/a browser? I'm curious whether there is some SSL issue on the server, or whether that's something on our side.

@kravets-levko
Contributor

kravets-levko commented Jul 11, 2024

+ additional question: do you use any kind of proxy, firewall, VPN, or anything else that may affect SSL cert validation?

@kravets-levko
Contributor

+ if you are able to patch databricks/sql on your machine, can you try this? Locate this file and line: https://github.com/databricks/databricks-sql-python/blob/main/src/databricks/sql/cloudfetch/downloader.py#L96 and add a verify=False parameter.
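For illustration, the call in question (per the traceback above) is a plain requests `session.get`; the suggested local patch just adds `verify=False` to it. A minimal, hypothetical sketch of what that patched call could look like; the function name, URL handling, and surrounding code are assumptions, not the library's actual implementation:

```python
import requests


def download_result_file(session: requests.Session, url: str) -> bytes:
    # Hypothetical sketch of the patched call in downloader.py: the only
    # change suggested above is adding verify=False, which skips TLS
    # certificate validation for this one request. Unsafe outside of
    # debugging behind a TLS-intercepting proxy.
    response = session.get(url, verify=False)
    response.raise_for_status()
    return response.content
```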

@rth
Author

rth commented Jul 11, 2024

Thanks for your feedback @kravets-levko !

Yes, I'm behind a corporate proxy that does SSL certificate rewriting, so SSLErrors by themselves are expected.
If I modify downloader.py#L96 to add verify=False, it actually works, even for a big query that previously failed. It's just confusing because I thought I had already disabled SSL verification, since smaller queries worked fine.

Any chance you could allow users to disable SSL verification in that code path without editing the code? For instance, either via the _tls_no_verify (or ssl_verify) parameter passed to connect, or, if that would be difficult, via monkeypatching some object in databricks.sql.cloudfetch.downloader.
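As an illustration of the monkeypatching route, here is a workaround sketch (my own assumption, not part of the library): it patches requests globally, so it affects every session in the process, including the ones the CloudFetch downloader creates.

```python
# Workaround sketch: force verify=False on every request made through a
# requests.Session, including those issued by the CloudFetch downloader.
# This disables TLS certificate verification process-wide; use only for
# debugging behind a TLS-intercepting proxy.
import requests

_orig_request = requests.Session.request

def _insecure_request(self, method, url, **kwargs):
    kwargs["verify"] = False  # override any per-call verify setting
    return _orig_request(self, method, url, **kwargs)

requests.Session.request = _insecure_request
```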

If it's still relevant, some of the debug logs from the first case where it failed are below:

DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_delay_min given_or_default 1.0
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_delay_max given_or_default 60.0
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_stop_after_attempts_count given_or_default 30
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_delay_default given_or_default 5.0
DEBUG:databricks.sql.thrift_backend:Sending request: OpenSession(<REDACTED>)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxx.14.azuredatabricks.net:443
DEBUG:urllib3.connectionpool:https://xxxx.azuredatabricks.net:443 "POST /sql/protocolv1/o/xxxx/0605-141634-2g07xge9 HTTP/11" 200 171
DEBUG:databricks.sql.thrift_backend:Received response: TOpenSessionResp(<REDACTED>)
INFO:databricks.sql.client:Successfully opened session 37f9a404-d4f2-4f21-aafc-464e03cf22e0
DEBUG:databricks.sql.thrift_backend:Sending request: ExecuteStatement(<REDACTED>)
DEBUG:urllib3.connectionpool:https://xxxx.azuredatabricks.net:443 "POST /sql/protocolv1/o/xxxxx/0605-141634-2g07xge9 HTTP/11" 200 14371
DEBUG:databricks.sql.thrift_backend:Received response: TExecuteStatementResp(<REDACTED>)
DEBUG:databricks.sql.utils:Initialize CloudFetch loader, row set start offset: 0, file list:
DEBUG:databricks.sql.utils:- start row offset: 0, row count: 49152
DEBUG:databricks.sql.utils:- start row offset: 49152, row count: 49152
DEBUG:databricks.sql.utils:- start row offset: 98304, row count: 22480
DEBUG:databricks.sql.utils:- start row offset: 120784, row count: 49152
DEBUG:databricks.sql.utils:- start row offset: 169936, row count: 49152
DEBUG:databricks.sql.cloudfetch.download_manager:ResultFileDownloadManager: adding file link, start offset 0, row count: 49152
DEBUG:databricks.sql.cloudfetch.download_manager:ResultFileDownloadManager: adding file link, start offset 49152, row count: 49152
DEBUG:databricks.sql.cloudfetch.download_manager:ResultFileDownloadManager: adding file link, start offset 98304, row count: 22480
DEBUG:databricks.sql.cloudfetch.download_manager:ResultFileDownloadManager: adding file link, start offset 120784, row count: 49152
DEBUG:databricks.sql.cloudfetch.download_manager:ResultFileDownloadManager: adding file link, start offset 169936, row count: 49152
DEBUG:databricks.sql.utils:CloudFetchQueue: Trying to get downloaded file for row 0
DEBUG:databricks.sql.cloudfetch.download_manager:ResultFileDownloadManager: schedule downloads
DEBUG:databricks.sql.cloudfetch.download_manager:- start: 0, row count: 49152
DEBUG:databricks.sql.cloudfetch.downloader:ResultSetDownloadHandler: starting file download, offset 0, row count 49152
DEBUG:databricks.sql.cloudfetch.download_manager:- start: 49152, row count: 49152
DEBUG:databricks.sql.cloudfetch.downloader:ResultSetDownloadHandler: starting file download, offset 49152, row count 49152
DEBUG:databricks.sql.cloudfetch.download_manager:- start: 98304, row count: 22480
DEBUG:databricks.sql.cloudfetch.downloader:ResultSetDownloadHandler: starting file download, offset 98304, row count 22480
DEBUG:databricks.sql.cloudfetch.download_manager:- start: 120784, row count: 49152
DEBUG:databricks.sql.cloudfetch.downloader:ResultSetDownloadHandler: starting file download, offset 120784, row count 49152
DEBUG:databricks.sql.cloudfetch.download_manager:- start: 169936, row count: 49152
DEBUG:databricks.sql.cloudfetch.downloader:ResultSetDownloadHandler: starting file download, offset 169936, row count 49152
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxx.blob.core.windows.net:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxx.blob.core.windows.net:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxx.blob.core.windows.net:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxx.blob.core.windows.net:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxx.blob.core.windows.net:443
DEBUG:urllib3.util.retry:Incremented Retry for (url='/jobs/xxxx/sql/2024-07-11/15/results_2024-07-11T15:40:05Z_05e1b6a6-3439-440c-8522-b428f40d3b6f?sig=xxx%2FxxtrcDCT8U%3D&se=2024-07-11T15%3A55%3A07Z&sv=2019-02-02&spr=https&sp=r&sr=b'): Retry(total=4, connect=None, read=None, redirect=None, status=None)

@kravets-levko
Contributor

The thing is that CloudFetch is not used for smaller results; that's why you were able to get them. Thank you for all the feedback and for helping me debug this. A PR will come in a minute.
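A hedged sketch of the dispatch described above (names are loosely modeled on ResultSetQueueFactory.build_queue from the traceback; the actual implementation is assumed, not reproduced): small result sets arrive inline with the Thrift response, while large ones come back as presigned cloud-storage links that the client downloads itself, so only the large-result path makes the extra HTTPS requests that need their own TLS settings.

```python
# Illustrative sketch only (assumed structure, not the library's real code):
# the server either inlines rows in the response or returns presigned links.
def build_queue(rows, cloud_links=None):
    if cloud_links:
        # Large result: CloudFetch path. Each link is fetched over HTTPS by
        # the client itself, so the Thrift connection's TLS options do not
        # automatically apply to these downloads.
        return ("cloudfetch", list(cloud_links))
    # Small result: rows came inline with the Thrift response; no extra
    # HTTP requests, hence no separate TLS verification step.
    return ("inline", rows)
```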
