search_entities API might be throttled #206

Open
xiaoyongzhu opened this issue Jun 11, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@xiaoyongzhu
Contributor

Describe the bug
When there is a large number of Purview entities (we currently have more than 20K), the search_entities API may fail with this error:

Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 700, in urlopen
self._prepare_proxy(conn)
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 994, in _prepare_proxy
conn.connect()
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/opt/miniconda3/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/opt/miniconda3/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/opt/miniconda3/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)


During handling of the above exception, another exception occurred:


Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='feathrazuretest3-purview1.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

From the code, it seems that the search_entities API repeatedly calls the query API, and Purview appears to enforce some throttling in the backend, which results in this error.
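If throttling really is the cause, one common mitigation is to retry the failing call with exponential backoff. The following is only a minimal sketch of that idea, not what pyapacheatlas does: `call_with_backoff` and all of its parameters are hypothetical names introduced here for illustration.

```python
import random
import time


def call_with_backoff(fn, retryable=(Exception,), max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    The wait doubles after each failure (capped at max_delay) and gets a
    small random jitter so parallel clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))
```

In this scenario, `fn` would be whatever callable issues the HTTP request, and `retryable` could be narrowed to `requests.exceptions.SSLError`, the exception type raised at the end of the traceback.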

@amiket23
Contributor

We have been facing the same issue. We are dealing with a similar number of entities, i.e. more than 20K.

@microcassidy

Could you please supply your method parameters? I believe the discovery endpoint has a limit of 10k per page, so it's normal to expect paging in the API.

@amiket23
Contributor

amiket23 commented Jul 5, 2022

These are the parameters I supplied: "("*", search_filter=filter_setup, limit=1000, starting_offset=0)". I noticed that, irrespective of the parameters given, the generator object returns all the values, so I have been converting the generator directly to a list and getting all 18K entities at once. Here is the error message I received:

Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 1041, in _create
self.do_handshake()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 1310, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='datapurviewprod.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 437, in
upload_entities(inputfile_path, environment_type, subfolder)
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 304, in upload_entities
field_typename_df = build_df(field_typename)
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 61, in build_df
result = list(search)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\pyapacheatlas\core\discovery\purview.py", line 233, in _search_generator
results = self.query(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\pyapacheatlas\core\discovery\purview.py", line 163, in query
postResult = requests.post(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\adapters.py", line 563, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='datapurviewprod.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

@microcassidy

.discovery.search_entities returns a generator that paginates. The limit you are specifying is the HTTP page size, not the number of results returned by exhausting the iterator.
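To illustrate the difference between page size and result count, here is a self-contained sketch of a paginating generator. It does not call Purview: `fake_query` and `search_generator` are hypothetical stand-ins written for this example, and the real library's internals may differ.

```python
from itertools import islice


def fake_query(offset, limit, total=18):
    """Stand-in for one HTTP search request: returns at most `limit` items."""
    return list(range(offset, min(offset + limit, total)))


def search_generator(limit, starting_offset=0):
    """Yield every match, fetching one page of `limit` items per request."""
    offset = starting_offset
    while True:
        page = fake_query(offset, limit)
        if not page:
            return  # an empty page means the results are exhausted
        yield from page
        offset += len(page)


# limit=5 only sets the page size; exhausting the generator still yields all 18.
everything = list(search_generator(limit=5))

# To stop early, cap the iteration itself instead of relying on limit.
first_seven = list(islice(search_generator(limit=5), 7))
```

Calling `list(...)` on such a generator triggers every page request back to back, whereas `islice` (or a plain loop with `break`) only fetches the pages it needs.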

I was unable to replicate your error, even though I made >20K individual requests to an endpoint. Are you going through a proxy? A quick Google search of your error brought up a related issue with the az CLI. Perhaps look into your proxy config and the versions of your dependencies.

Azure/azure-cli#19456: (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
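For anyone checking their proxy config and dependency versions, a small diagnostic along these lines may help when reporting back. It is a sketch using only the standard library; `connection_diagnostics` is a hypothetical helper name, and the package list is an assumption based on the libraries that appear in the tracebacks above. Note that the first traceback passes through urllib3's `_prepare_proxy`, which suggests a proxy was in use there.

```python
import ssl
import urllib.request
from importlib import metadata


def connection_diagnostics(packages=("requests", "urllib3", "pyapacheatlas")):
    """Collect proxy settings and library versions relevant to TLS errors."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "openssl": ssl.OPENSSL_VERSION,
        # Proxies picked up from environment variables (HTTP_PROXY,
        # HTTPS_PROXY, ...) or from OS settings on some platforms.
        "proxies": urllib.request.getproxies(),
        "versions": versions,
    }


if __name__ == "__main__":
    for key, value in connection_diagnostics().items():
        print(f"{key}: {value}")
```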
