You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Background dlt has built in request client that we use mostly in our verified sources. we need to refine the retries and timeouts based on production behavior
Tasks
make the connection and retry timeout configurable. look at RunConfigurationrequest_timeout. let's use it. change the default connection timeout to 1 minute
make sure the session retries on connection timeouts
make sure the session retries on read timeouts
make sure the session retries on broken connections
Details
Actual problems on production
connection timeout (30 secs too short). (not sure we retry but we should)
broken connections with exception below
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - Resources: 0/1 (0.0%) | Time: 14059.22s | Rate: 0.00/s
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ticket_metric_events: 2425 | Time: 14051.34s | Rate: 0.17/s
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ValueError: invalid literal for int() with base 16: b''
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - During handling of the above exception, another exception occurred:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - Traceback (most recent call last):
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/urllib3/response.py", line 444, in _error_catcher
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - yield
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/urllib3/response.py", line 828, in read_chunked
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - self._update_chunk_length()
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/urllib3/response.py", line 765, in _update_chunk_length
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - raise InvalidChunkLength(self, line)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - During handling of the above exception, another exception occurred:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - Traceback (most recent call last):
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 816, in generate
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - yield from self.raw.stream(chunk_size, decode_content=True)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/urllib3/response.py", line 624, in stream
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - for line in self.read_chunked(amt, decode_content=decode_content):
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/urllib3/response.py", line 816, in read_chunked
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - with self._error_catcher():
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - self.gen.throw(typ, value, traceback)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/urllib3/response.py", line 461, in _error_catcher
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - raise ProtocolError("Connection broken: %r" % e, e)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - During handling of the above exception, another exception occurred:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - Traceback (most recent call last):
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/extract/pipe.py", line 702, in _get_source_item_current
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - item = next(gen)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/dags/zendesk/__init__.py", line 348, in ticket_metric_table
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - yield from metric_event_pages
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/dags/zendesk/helpers/talk_api.py", line 131, in get_pages_incremental
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - yield from self.get_pages(
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/dags/zendesk/helpers/talk_api.py", line 88, in get_pages
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - response = client.get(
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/sources/helpers/requests/retry.py", line 183, in <lambda>
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - self.get = lambda *a, **kw: self.session.get(*a, **kw)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 602, in get
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return self.request("GET", url, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return self(f, *args, **kw)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - do = self.iter(retry_state=retry_state)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return fut.result()
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return self.__get_result()
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - raise self._exception
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - result = fn(*args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/sources/helpers/requests/session.py", line 36, in request
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - resp = super().request(*args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - resp = self.send(prep, **send_kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 747, in send
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - r.content
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 899, in content
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 818, in generate
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - raise ChunkedEncodingError(e)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - The above exception was the direct cause of the following exception:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - Traceback (most recent call last):
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 273, in extract
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - self._extract_source(storage, source, max_parallel_items, workers)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 823, in _extract_source
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - extract_id = extract_with_schema(storage, source, source_schema, self.collector, max_parallel_items, workers)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/extract/extract.py", line 171, in extract_with_schema
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - extractor = extract(extract_id, source, storage, collector, max_parallel_items=max_parallel_items, workers=workers)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/extract/extract.py", line 111, in extract
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - for pipe_item in pipes:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/extract/pipe.py", line 525, in __next__
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - pipe_item = self._get_source_item()
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/extract/pipe.py", line 686, in _get_source_item
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return self._get_source_item_current()
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/extract/pipe.py", line 720, in _get_source_item_current
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - raise ResourceExtractionError(pipe.name, gen, str(ex), "generator") from ex
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - dlt.extract.exceptions.ResourceExtractionError: In processing pipe ticket_metric_events: extraction of resource ticket_metric_events in generator ticket_metric_table caused an exception: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - The above exception was the direct cause of the following exception:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - Traceback (most recent call last):
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/dags/zendesk_by_resource.py", line 39, in <module>
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - resource_task(source_name, source_function, resource)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/dags/zendesk_by_resource.py", line 33, in resource_task
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - load_info = pipeline.run(
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 104, in _wrap
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - step_info = f(self, *args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 138, in _wrap
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return f(self, *args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 448, in run
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - self.extract(data, table_name=table_name, write_disposition=write_disposition, columns=columns, primary_key=primary_key, schema=schema)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 104, in _wrap
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - step_info = f(self, *args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 78, in _wrap
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - rv = f(self, *args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 64, in _wrap
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return f(self, *args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 138, in _wrap
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - return f(self, *args, **kwargs)
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - File "/usr/local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 282, in extract
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - raise PipelineStepFailed(self, "extract", exc, ExtractInfo()) from exc
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage extract with exception:
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO -
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - <class 'dlt.extract.exceptions.ResourceExtractionError'>
[2023-07-04, 15:34:02 UTC] {pod_manager.py:235} INFO - In processing pipe ticket_metric_events: extraction of resource ticket_metric_events in generator ticket_metric_table caused an exception: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
The text was updated successfully, but these errors were encountered:
Background
dlt
has built in request client that we use mostly in our verified sources. we need to refine the retries and timeouts based on production behaviorTasks
RunConfiguration
request_timeout
. let's use it. change the default connection timeout to 1 minuteDetails
Actual problems on production
The text was updated successfully, but these errors were encountered: