Issue: TimeoutErrors raised by Playwright are handled gracefully, but a TimeoutError raised by an asyncio.wait_for call inside a request handler is not, and it makes the Crawlee engine stop.
Expectation: all TimeoutErrors should be handled gracefully.
Crawlee version: 1.1.1
Python version: 3.13
Stacktrace:
[crawlee.crawlers._playwright._playwright_crawler] ERROR An exception occurred during handling of failed request. This places the crawler and its underlying storages into an unknown state and crawling will be terminated.
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_context_pipeline.py", line 114, in __call__
    await final_context_consumer(cast('TCrawlingContext', crawling_context))
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\router.py", line 98, in __call__
    return await self._default_handler(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\projects\xxx\src\xxx\crawl\shopee\timeout_repro.py", line 13, in default_handler
    await asyncio.wait_for(never_future, timeout=5)
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 506, in wait_for
    async with timeouts.timeout(timeout):
               ~~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1415, in __run_task_function
    await self._run_request_handler(context=context)
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1510, in _run_request_handler
    await wait_for(
    ...<5 lines>...
    )
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\_utils\wait.py", line 37, in wait_for
    return await asyncio.wait_for(operation(), timeout.total_seconds())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_context_pipeline.py", line 120, in __call__
    raise RequestHandlerError(e, crawling_context) from e
crawlee.errors.RequestHandlerError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1158, in _handle_request_error
    await wait_for(
    ...<5 lines>...
    )
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\_utils\wait.py", line 37, in wait_for
    return await asyncio.wait_for(operation(), timeout.total_seconds())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1128, in _handle_request_retries
    f'{get_one_line_error_summary_if_possible(error)}'
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_logging_utils.py", line 52, in get_one_line_error_summary_if_possible
    most_relevant_part = ',' + reduce_asyncio_timeout_error_to_relevant_traceback_parts(error)[-1]
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^
IndexError: list index out of range
[crawlee._autoscaling.autoscaled_pool] INFO Waiting for remaining tasks to finish
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_context_pipeline.py", line 114, in __call__
    await final_context_consumer(cast('TCrawlingContext', crawling_context))
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\router.py", line 98, in __call__
    return await self._default_handler(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\projects\xxx\src\xxx\crawl\shopee\timeout_repro.py", line 13, in default_handler
    await asyncio.wait_for(never_future, timeout=5)
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 506, in wait_for
    async with timeouts.timeout(timeout):
               ~~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1415, in __run_task_function
    await self._run_request_handler(context=context)
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1510, in _run_request_handler
    await wait_for(
    ...<5 lines>...
    )
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\_utils\wait.py", line 37, in wait_for
    return await asyncio.wait_for(operation(), timeout.total_seconds())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_context_pipeline.py", line 120, in __call__
    raise RequestHandlerError(e, crawling_context) from e
crawlee.errors.RequestHandlerError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\projects\xxx\src\xxx\crawl\shopee\timeout_repro.py", line 20, in <module>
    asyncio.run(main())
    ~~~~~~~~~~~^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\base_events.py", line 725, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "C:\projects\xxx\src\xxx\crawl\shopee\timeout_repro.py", line 16, in main
    await crawler.run(["https://browserleaks.com/"])
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 719, in run
    await run_task
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 774, in _run_crawler
    await self._autoscaled_pool.run()
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\_autoscaling\autoscaled_pool.py", line 126, in run
    await run.result
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\_autoscaling\autoscaled_pool.py", line 277, in _worker_task
    await asyncio.wait_for(
    ...<2 lines>...
    )
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1449, in __run_task_function
    await self._handle_request_error(primary_error.crawling_context, primary_error.wrapped_exception)
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1158, in _handle_request_error
    await wait_for(
    ...<5 lines>...
    )
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\_utils\wait.py", line 37, in wait_for
    return await asyncio.wait_for(operation(), timeout.total_seconds())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python313\Lib\asyncio\tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_basic_crawler.py", line 1128, in _handle_request_retries
    f'{get_one_line_error_summary_if_possible(error)}'
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "C:\projects\xxx\.venv\Lib\site-packages\crawlee\crawlers\_basic\_logging_utils.py", line 52, in get_one_line_error_summary_if_possible
    most_relevant_part = ',' + reduce_asyncio_timeout_error_to_relevant_traceback_parts(error)[-1]
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^
IndexError: list index out of range
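Note that the crash is a secondary failure in Crawlee's own error logging: while summarizing the handler's TimeoutError for the retry log, get_one_line_error_summary_if_possible takes the last element of the list returned by reduce_asyncio_timeout_error_to_relevant_traceback_parts, and that list is apparently empty for TimeoutErrors that do not originate in Crawlee's internal wait helpers. A minimal sketch of a defensive guard (the helper names come from the traceback above; the surrounding code in _logging_utils.py is assumed, not quoted):

# Sketch of a guard around _logging_utils.py line 52 (names from the traceback;
# the helper is assumed to return a list of traceback-part strings).
relevant_parts = reduce_asyncio_timeout_error_to_relevant_traceback_parts(error)
if relevant_parts:
    most_relevant_part = ',' + relevant_parts[-1]
else:
    # Empty for TimeoutErrors raised by user code (e.g. a plain asyncio.wait_for),
    # so fall back to no summary instead of raising IndexError via [-1].
    most_relevant_part = ''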
Repro code:
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main():
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def default_handler(ctx: PlaywrightCrawlingContext):
        ctx.log.info("Request %s", ctx.request.url)
        never_future = asyncio.get_running_loop().create_future()
        await asyncio.wait_for(never_future, timeout=5)
        ctx.log.info("Never finished %s", ctx.request.url)

    await crawler.run(["https://browserleaks.com/"])


if __name__ == '__main__':
    asyncio.run(main())
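Until this is fixed in Crawlee, a possible user-side workaround (a sketch, assuming the crashing summary path is only taken for built-in TimeoutErrors, which is what the Playwright-vs-asyncio difference above suggests) is to catch the timeout inside the handler and re-raise it as a different exception type. Since Python 3.11, asyncio.wait_for raises the built-in TimeoutError, so a plain except TimeoutError catches it:

# Inside the request handler: convert the bare TimeoutError into another
# type so the request fails and retries normally instead of killing the crawler.
never_future = asyncio.get_running_loop().create_future()
try:
    await asyncio.wait_for(never_future, timeout=5)
except TimeoutError as exc:  # asyncio.TimeoutError is an alias since Python 3.11
    # RuntimeError is an arbitrary choice; any non-TimeoutError type works here.
    raise RuntimeError(f'handler wait timed out for {ctx.request.url}') from exc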