Issues with using http proxy #22

Closed
andresebastian-moelle-ext opened this issue Jul 25, 2022 · 22 comments · Fixed by #81

@andresebastian-moelle-ext

Hi there,

I am not able to use sql.connect() with an HTTP proxy:

os.environ['http_proxy'] = f"http://{user}:{pwd}@10.185.190.100:8080"
os.environ['https_proxy'] = f"http://{user}:{pwd}@10.185.190.100:8080"
self.connection = sql.connect(server_hostname, http_path, access_token)

I am getting the following error:

File "C:\Users\User\Documents\GitHub\eve_data_ingestion\3.8.10 64 bit\lib\site-packages\databricks\sql\__init__.py", line 48, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
  File "C:\Users\User\Documents\GitHub\eve_data_ingestion\3.8.10 64 bit\lib\site-packages\databricks\sql\client.py", line 109, in __init__
    self.thrift_backend = ThriftBackend(self.host, self.port, http_path,
  File "C:\Users\User\Documents\GitHub\eve_data_ingestion\3.8.10 64 bit\lib\site-packages\databricks\sql\thrift_backend.py", line 115, in __init__
    self._transport = thrift.transport.THttpClient.THttpClient(
  File "C:\Users\User\Documents\GitHub\eve_data_ingestion\3.8.10 64 bit\lib\site-packages\thrift\transport\THttpClient.py", line 86, in __init__
    self.proxy_auth = self.basic_proxy_auth_header(parsed)
  File "C:\Users\User\Documents\GitHub\eve_data_ingestion\3.8.10 64 bit\lib\site-packages\thrift\transport\THttpClient.py", line 101, in basic_proxy_auth_header
    cr = base64.b64encode(ap).strip()
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)

TypeError: a bytes-like object is required, not 'str'

Is this a known issue? Is there another way to use proxies?


Python 3.8.10 (64bit Windows)
databricks-sql-connector 2.0.2

@susodapop
Contributor

Proxies aren't supported yet. We have an internal ticket to implement soon.

Related: databricks/dbt-databricks#111

@susodapop susodapop self-assigned this Jul 25, 2022
@andresebastian-moelle-ext
Author

@susodapop Thanks for the quick answer! Is there any workaround possible at the moment? Or any ETA for when this will be implemented? :)

@susodapop
Contributor

I don't know of a workaround.

ETA: within four weeks 👌

@susodapop
Contributor

Quick update for anyone watching this issue: the work was postponed slightly. We're targeting September to release it.

@MRehanMS

I know this sounds impatient - but is a September release still on the cards?

@susodapop
Contributor

Not impatient at all! Thanks for the ping. Still targeting for this month.

@vijayinani

Is this fixed? I am not able to continue development because of proxy not working in Python SQL connector.

@other-ode

Please, when is the release date for the fix?

@bilalaslamseattle

Commenting since @susodapop is out of office -- we're still working on this.

@MRehanMS

MRehanMS commented Nov 12, 2022 via email

@bilalaslamseattle

@MRehanMS let me ask the team and find out the status.

@bilalaslamseattle

@MRehanMS @susodapop is looking into this right now. Please expect a reply in the near future.

@susodapop
Contributor

Just picking this up after the holiday. More soon

@susodapop
Contributor

susodapop commented Nov 29, 2022

Thanks everyone for your patience.

tl;dr proxies should already work with databricks-sql-connector. I've provided an example. But proxies that use auth are blocked by apache/thrift#2565.

Background

Originally I wrote the following:

Proxies aren't supported yet. We have an internal ticket to implement soon.

I was wrong. databricks-sql-connector already supports proxies because the underlying thrift does (see here). As @andresebastian-moelle-ext noticed, it's important to specify your proxy as an environment variable in a way that can be detected by Python's urllib.request.getproxies() method (defined here).
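
To check what Python will actually detect, a minimal standard-library sketch like the following may help. The proxy address is a placeholder and the variable names are illustrative, not part of the connector. On platforms with case-sensitive environment variable names, a lowercase https_proxy takes precedence over HTTPS_PROXY, so the sketch clears it first.

```python
import os
import urllib.request

# Placeholder proxy address, assumed for illustration only.
proxy_url = "http://localhost:8080"

# A lowercase https_proxy would take precedence, so clear it first.
os.environ.pop("https_proxy", None)
os.environ["HTTPS_PROXY"] = proxy_url

# getproxies() returns a dict mapping a URL scheme to the proxy URL found for it.
detected = urllib.request.getproxies()
print(detected.get("https"))
```

If the "https" entry is missing or different from what you set, the connector will not see the proxy you intended either.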

Example using a proxy

I made a fresh virtual environment and installed the connector and pproxy, a package which lets us make a simple proxy for testing:

$ pip install databricks-sql-connector pproxy

Then I made a Python script that sets the HTTPS_PROXY environment variable and connects to Databricks:

# ~/db.py
import os
from databricks import sql

os.environ["HTTPS_PROXY"] = "https://localhost:8080"

host="***.cloud.databricks.com"
http_path="/sql/1.0/warehouses/***"
access_token="dapi***"


connection = sql.connect(
  server_hostname=host,
  http_path=http_path,
  access_token=access_token)


cursor = connection.cursor()

cursor.execute('SELECT * FROM RANGE(10)')
result = cursor.fetchall()
for row in result:
  print(row)

cursor.close()
connection.close()

In one terminal window I ran:

$ pproxy -v
Serving on :8080 by http,socks4,socks5 

And in a separate terminal window I ran:

$ python db.py
Row(id=0)
Row(id=1)
Row(id=2)
Row(id=3)
Row(id=4)
Row(id=5)
Row(id=6)
Row(id=7)
Row(id=8)
Row(id=9)

And in my first terminal window I see the following:

$ pproxy -v
Serving on :8080 by http,socks4,socks5 
http ::1:56259 -> ***.cloud.databricks.com:443
http ::1:56261 -> ***.cloud.databricks.com:443
http ::1:56263 -> ***.cloud.databricks.com:443

Issue RCA

So why does @andresebastian-moelle-ext's example not work? I can reproduce it with a small change to my db.py script above by setting the following:

os.environ["HTTPS_PROXY"] = "https://user:pass@localhost:8080"

And then running it:

$ python db.py
Traceback (most recent call last):
  File "./db.py", line 11, in <module>
    connection = sql.connect(
  File "./venv/lib/python3.10/site-packages/databricks/sql/__init__.py", line 50, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
  File "./venv/lib/python3.10/site-packages/databricks/sql/client.py", line 170, in __init__
    self.thrift_backend = ThriftBackend(
  File "./venv/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 146, in __init__
    self._transport = databricks.sql.auth.thrift_http_client.THttpClient(
  File "./venv/lib/python3.10/site-packages/databricks/sql/auth/thrift_http_client.py", line 21, in __init__
    super().__init__(
  File "./venv/lib/python3.10/site-packages/thrift/transport/THttpClient.py", line 86, in __init__
    self.proxy_auth = self.basic_proxy_auth_header(parsed)
  File "./venv/lib/python3.10/site-packages/thrift/transport/THttpClient.py", line 102, in basic_proxy_auth_header
    cr = base64.b64encode(ap).strip()#.decode("utf-8")
  File "../.pyenv/versions/3.10.5/lib/python3.10/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'

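This isn't a problem with the proxy specification. The issue is within thrift, in its basic_proxy_auth_header function: it passes a Python string to base64.b64encode, which only accepts bytes-like objects (this is the TypeError in the traceback), and it would then try to concatenate a Python string ("Basic ") with the bytes-like object output by the base64 library.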

    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap).strip()
        return "Basic " + cr

If I fix this in thrift, then the connection including username and password works. This is a bug in the thrift implementation and a pull request was opened to address it here: apache/thrift#2565. That pull request is stalled because it's lacking a unit test. I'll see if we can provide one to get it merged.
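
The str/bytes mismatch can be reproduced without thrift or a proxy. The following is only a sketch using the standard library, with placeholder credentials; it shows the two failing steps and one possible way to produce a valid header value, which is not necessarily the change made in apache/thrift#2565.

```python
import base64

# Placeholder credentials, assumed for illustration only.
credentials = "user:pass"

# Step 1: b64encode() rejects str on Python 3, as in the traceback above.
try:
    base64.b64encode(credentials)
except TypeError as exc:
    first_error = str(exc)

# Step 2: encoding to bytes first works, but the result is bytes,
# so concatenating it with the str "Basic " fails as well.
encoded = base64.b64encode(credentials.encode("utf-8"))
try:
    "Basic " + encoded
except TypeError as exc:
    second_error = str(exc)

# Decoding the base64 output back to str gives a usable header value.
header = "Basic " + encoded.decode("ascii")
print(first_error)
print(second_error)
print(header)
```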

@susodapop
Contributor

susodapop commented Jan 12, 2023

After discussing this with @andrefurlan-db and @rcypher-databricks, we realised that we don't need to wait for thrift to solve this upstream because we are already subclassing their THttpClient class in our library. We've implemented the fix in #81. This will be released as part of databricks-sql-connector 2.3.1 in the coming days.
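
The reason subclassing is enough can be sketched as follows. To keep the example runnable without thrift installed, UpstreamHttpClient is a hypothetical stand-in that only mimics the relevant part of thrift's THttpClient, and PatchedHttpClient is likewise an illustrative name. This is not the actual code from #81. The point is that the parent constructor calls the method through self, so an override in the subclass is picked up.

```python
import base64
import urllib.parse


class UpstreamHttpClient:
    """Hypothetical stand-in for an upstream class with the str/bytes bug."""

    def __init__(self, proxy_url):
        parsed = urllib.parse.urlparse(proxy_url)
        # Looked up through self, so a subclass override is used here.
        self.proxy_auth = self.basic_proxy_auth_header(parsed)

    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap).strip()  # raises TypeError on Python 3
        return "Basic " + cr


class PatchedHttpClient(UpstreamHttpClient):
    """Subclass that replaces only the buggy helper."""

    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        # Treat a missing password as empty (an assumption of this sketch).
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password or ""))
        # Encode to bytes for base64, then decode the result back to str.
        cr = base64.b64encode(ap.encode("utf-8")).strip()
        return "Basic " + cr.decode("ascii")


client = PatchedHttpClient("http://user:pass@localhost:8080")
print(client.proxy_auth)
```

Because only one helper is overridden, the rest of the parent class keeps working unchanged, and the override can be dropped once the upstream fix is released.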

@susodapop
Contributor

Hey @andresebastian-moelle-ext or @MRehanMS, is either of you willing to help us test that this fix works? We have a preview build that you can install with pip install databricks-sql-connector==2.3.1.dev1. Could you check that your proxies work as expected?

@mbdtesting

Hi there, I'm trying to use this python package to connect to Databricks and at the company I need a proxy with authentication. The proxy configuration is: http://<company-domain>\<username>:<password>@<company-proxy>:8080.
I tried the quick example shown by @susodapop with preview build 2.3.1-dev1 but I always got the error: Error during request to server: Tunnel connection failed: 407 authenticationrequired. Could you please help me with the configuration?

@susodapop
Contributor

@mbdtesting This doesn't seem to be an issue with our connector. See this stackoverflow question with the exact error you pasted.

The issue may be that your corporate proxy needs extra HTTP headers included in requests from our connector. You can set these when you create your connection by passing http_headers, a list of (name, value) tuples, to sql.connect(). Since the exception is coming from your proxy, it looks like Python is already routing the request correctly. You need to learn from the proxy server what part of the authentication flow has failed.

@mbdtesting

Thank you so much for your help. Analyzing the problem with our IT department, I found out that our proxy requires NTLM authentication. Looking on GitHub, I found out how to manage the connection using px. Once the px script is started (with my proxy configuration), I can connect to Databricks using the following proxy configuration: os.environ["HTTPS_PROXY"] = "http://localhost:<px_chosen_port>" (the same example reported in this thread).

@bohrasaurabh

bohrasaurabh commented Mar 10, 2023

@susodapop I see the stack trace below again on the 2.4.0 release, whereas the 2.3.1.dev release works great with a proxy.

requirements.txt

dash-design-kit==1.6.7
dash==2.5.1
gunicorn==20.0.4
pandas==1.3.5
numpy==1.21.1
databricks-sql-connector==2.4.0
requests==2.28.2

Error in execution:

i am here
Traceback (most recent call last):
  File "my.py", line 85, in <module>
    access_token='dapiMY_FULL_TOKEN')
  File "/app/.heroku/python/lib/python3.7/site-packages/databricks/sql/__init__.py", line 50, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
  File "/app/.heroku/python/lib/python3.7/site-packages/databricks/sql/client.py", line 186, in __init__
    **kwargs,
  File "/app/.heroku/python/lib/python3.7/site-packages/databricks/sql/thrift_backend.py", line 152, in __init__
    ssl_context=ssl_context,
  File "/app/.heroku/python/lib/python3.7/site-packages/databricks/sql/auth/thrift_http_client.py", line 22, in __init__
    uri_or_host, port, path, cafile, cert_file, key_file, ssl_context
  File "/app/.heroku/python/lib/python3.7/site-packages/thrift/transport/THttpClient.py", line 86, in __init__
    self.proxy_auth = self.basic_proxy_auth_header(parsed)
  File "/app/.heroku/python/lib/python3.7/site-packages/thrift/transport/THttpClient.py", line 101, in basic_proxy_auth_header
    cr = base64.b64encode(ap).strip()
  File "/app/.heroku/python/lib/python3.7/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'

@susodapop
Contributor

Yep that's to be expected @bohrasaurabh because the PR hasn't merged.

@jakewski

Is there any way to configure this per connection rather than through an environment variable? For example, in a multithreaded web server where we don't want to override the proxy for other traffic.
