aplustools.web
TBA
class aplustools.web.request.AioHttpRequestHandler
Bases: object
Unified request handler that uses aiohttp for asynchronous HTTP requests and integrates with AsyncIOResult to handle multiple async requests concurrently. Public-facing methods are synchronous, while the internal logic is fully async.
__init__() -> None
request(url: str, method: str = 'GET', data: Any = None) -> bytes
Public-facing synchronous method to submit a request. Internally runs the async code.
request_many(urls: list[str], method: str = 'GET', data: Any = None) -> list[bytes]
Public-facing synchronous method to submit multiple requests. Internally runs the async code.
shutdown() -> None
Shuts down the request handler and closes the session if it exists.
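The pattern described for this class, synchronous public methods that drive async code internally, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: `SyncFacade` is an invented name, and `_fake_fetch` is a hypothetical stand-in for an aiohttp call so that the example runs without network access.

```python
import asyncio


async def _fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for an aiohttp GET; returns the URL as bytes."""
    await asyncio.sleep(0.01)  # simulate network latency
    return url.encode("utf-8")


class SyncFacade:
    """Invented example class: synchronous methods that run async code internally."""

    def request(self, url: str) -> bytes:
        # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
        return asyncio.run(_fake_fetch(url))

    def request_many(self, urls: list[str]) -> list[bytes]:
        async def _gather() -> list[bytes]:
            # gather runs the coroutines concurrently and preserves input order.
            return await asyncio.gather(*(_fake_fetch(u) for u in urls))

        return asyncio.run(_gather())


handler = SyncFacade()
single = handler.request("https://example.com/a")
many = handler.request_many(["https://example.com/a", "https://example.com/b"])
```

The caller never touches an event loop; each public call blocks until the async work it started has finished.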
class aplustools.web.request.BatchRequestHandler(min_workers: int = 2, max_workers: int = 10, workers_step: int = 5, check_interval: float = 5.0)
Bases: object
A request handler that submits HTTP requests using a dynamic thread pool executor. It supports both synchronous and asynchronous modes for handling single or multiple requests at once.
Configurations known to work well, written as (min_workers, max_workers, workers_step), check_interval:
- (2, 10, 5), 5.0 (Diamond; in testing, 0.47->0.81 seconds for 10 web requests and 0.9->1.3 seconds for 20 web requests)
- ({min_workers, e.g. 5}, {a large n such as 100}, 10), 2.0 (Gold; in testing, 0.7->1.2 seconds for 200 web requests)
- ({min_workers, e.g. 5}, {a large n such as 100}, 5), 1.0 (Silver; in testing, 0.8->1.6 seconds for 200 web requests)
_pool
A dynamically sized thread pool for executing tasks.
Type: _LazyDynamicThreadPoolExecutor
__init__(min_workers: int = 2, max_workers: int = 10, workers_step: int = 5, check_interval: float = 5.0) -> None
Initializes the request handler with a dynamic thread pool.
Parameters:
- min_workers – The minimum number of worker threads to maintain in the pool.
- max_workers – The maximum number of worker threads allowed in the pool.
- workers_step – The increment or decrement in the number of workers during pool resizing.
- check_interval – The time interval, in seconds, to wait before checking if the pool needs resizing based on current load.
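To make the roles of these parameters concrete, here is a minimal sketch of one resize decision. The policy shown (grow by one step when pending tasks outnumber workers, shrink by one step when they do not, then clamp to the bounds) is an assumption for illustration only; the actual resizing logic of the pool is not documented here, and `next_pool_size` is a hypothetical helper, not part of the library.

```python
def next_pool_size(current: int, pending_tasks: int,
                   min_workers: int = 2, max_workers: int = 10,
                   workers_step: int = 5) -> int:
    """Assumed policy: move one step toward the load, then clamp to the bounds."""
    if pending_tasks > current:
        target = current + workers_step
    elif pending_tasks < current:
        target = current - workers_step
    else:
        target = current
    # Never go below min_workers or above max_workers.
    return max(min_workers, min(max_workers, target))


grown = next_pool_size(current=2, pending_tasks=20)   # 2 + 5 = 7
capped = next_pool_size(current=7, pending_tasks=20)  # 12, clamped to 10
shrunk = next_pool_size(current=10, pending_tasks=0)  # 10 - 5 = 5
floored = next_pool_size(current=5, pending_tasks=0)  # 0, clamped to 2
```

Under this assumed policy, a check like this would run once every check_interval seconds, so a smaller interval reacts faster to load changes at the cost of more frequent checks.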
property current_size: int
Returns the current size of the dynamic thread pool.
Returns: Number of currently active threads in the pool.
property pool: LazyDynamicThreadPoolExecutor
Returns the internal pool instance, so that you don’t need a separate pool exclusively for web requests.
Returns: _LazyDynamicThreadPoolExecutor
request(url: str, async_mode: bool = False, method: str = 'GET', data: Any = None) -> Future | bytes
Submits an HTTP request. Supports both synchronous and asynchronous execution.
Parameters:
- url – The URL to make the request to.
- async_mode – Whether to run the request in asynchronous mode (returns a future).
- method – The HTTP method to use (default is “GET”).
- data – Any additional data to be passed with the request.
Returns: A future if in async mode, or the result bytes in synchronous mode.
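The difference between the two modes can be sketched with a plain `concurrent.futures.ThreadPoolExecutor`. This is an illustration only: the free function `request` below is a simplified stand-in for the method, it uses a fixed-size pool rather than the dynamic one, and `_fake_fetch` is a hypothetical replacement for the real HTTP call.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Union


def _fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for a blocking HTTP call."""
    return f"body of {url}".encode("utf-8")


def request(pool: ThreadPoolExecutor, url: str,
            async_mode: bool = False) -> Union[Future, bytes]:
    future = pool.submit(_fake_fetch, url)
    # In async mode hand back the future; otherwise block for the result.
    return future if async_mode else future.result()


with ThreadPoolExecutor(max_workers=2) as pool:
    sync_body = request(pool, "https://example.com")
    pending = request(pool, "https://example.com", async_mode=True)
    async_body = pending.result()  # block only when the value is needed
```

Returning the future lets the caller start several requests and do other work before collecting any of the results.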
request_many(urls: list[str], async_mode: bool = False, method: str = 'GET', data: Any = None) -> Result | list[bytes]
Submits multiple requests concurrently. Supports both synchronous and asynchronous execution.
Parameters:
- urls – A list of URLs to make requests to.
- async_mode – Whether to run the requests in asynchronous mode (returns a Result object).
- method – The HTTP method to use for all requests (default is “GET”).
- data – Any additional data to be passed with the request.
Returns: A Result object in async mode or a list of bytes in synchronous mode.
shutdown() -> None
Shuts down the thread pool gracefully, waiting for all threads to finish their tasks.
class aplustools.web.request.Result(futures: list[Future])
Bases: object
A class to handle a list of futures, allowing the user to process results once the futures are completed. It supports transformations on results and different modes for collecting them.
futures
A list of future objects representing asynchronous tasks.
Type: list[_concurrent_futures.Future]
no_into_results
A list initialized to store raw results, which can later be populated.
Type: list[_ty.Any]
_into_done
A flag to indicate whether an ‘into’ method has been used.
Type: bool
__init__(futures: list[Future])
Initializes the Result class with a list of future objects.
Parameters: futures – A list of futures representing asynchronous operations.
await_() -> Self
Waits for all futures to finish by blocking until they are done.
Returns: Self, after all futures have completed.
into(container: list[Callable[[Any], None] | Type]) -> Self
Process each result as it finishes and apply the transformation specified by the container’s elements (e.g., str, json.loads, etc.) to the result.
Parameters: container – A list of callable transformations (e.g., str, json.loads) or types.
Returns: Self, to allow for chaining.
Raises: ValueError – If the length of the container doesn’t match the number of futures or if another ‘into’ method has already been called.
no_into() -> Self
Collects raw results and stores them in the no_into_results list without any transformation.
Returns: Self, to allow for chaining.
Raises: ValueError – If another ‘into’ method has already been called.
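The chaining behaviour described for this class can be sketched as follows. `MiniResult` is a hypothetical, simplified stand-in written for this example, not the library's class: its `results` attribute is an assumption about where collected values end up, and it applies transformations in list order rather than as each future finishes.

```python
import json
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Any, Callable


class MiniResult:
    """Hypothetical, simplified stand-in for the documented Result class."""

    def __init__(self, futures: list[Future]) -> None:
        self.futures = futures
        self.results: list[Any] = []  # assumption: storage for collected values
        self._into_done = False

    def await_(self) -> "MiniResult":
        wait(self.futures)  # block until every future has completed
        return self

    def into(self, container: list[Callable[[Any], Any]]) -> "MiniResult":
        if self._into_done:
            raise ValueError("an 'into' method has already been called")
        if len(container) != len(self.futures):
            raise ValueError("container length must match the number of futures")
        # Apply the i-th transformation to the i-th future's result.
        self.results = [fn(f.result()) for fn, f in zip(container, self.futures)]
        self._into_done = True
        return self

    def no_into(self) -> "MiniResult":
        if self._into_done:
            raise ValueError("an 'into' method has already been called")
        self.results = [f.result() for f in self.futures]  # raw, untransformed
        self._into_done = True
        return self


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(lambda: b"42"), pool.submit(lambda: b'{"ok": true}')]
    collected = MiniResult(futures).await_().into([bytes.decode, json.loads]).results
```

Because every method returns the object itself, waiting and collecting read as a single expression, and the `_into_done` flag ensures results are collected only once.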
aplustools.web.request.fetch(url: str) -> bytes
Fetch a web resource from the specified URL.
Parameters: url (str) – The URL of the web resource to fetch.
Returns: The fetched resource as raw bytes.
Return type: bytes
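A function with this contract (URL in, raw bytes out) can be sketched with `urllib.request` from the standard library. `fetch_bytes` and its `timeout` parameter are hypothetical and only illustrate the behaviour; how the library's fetch is actually implemented is not stated here. The example uses a `data:` URL, which urllib resolves locally, so it runs without network access.

```python
import urllib.request


def fetch_bytes(url: str, timeout: float = 5.0) -> bytes:
    """Hypothetical equivalent: return the raw body of the resource at url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# A data: URL is handled by urllib itself, so no network request is made.
body = fetch_bytes("data:text/plain,hello")
```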
TBA
class aplustools.web.utils.WebPage(url: str)
Bases: object
TBA
__init__(url: str) -> None
static check_url(url: str, timeout: float = 2.0) -> int | None
TBA. Returns the status code, or None if unclear.
fetch_page(timeout: float = 2.0) -> None
TBA
from_soup(func_name: str, *args: Any, **kwargs: Any) -> Any | None
TBA
static generate_user_agent() -> str
TBA
get_by_class(class_name: str) -> list[Tag] | None
TBA
get_by_tag(tag: str) -> list[Tag] | None
TBA
static is_crawlable(url: str, timeout: float = 2.0) -> bool | None
TBA. Returns None if unclear.
aplustools.web.utils.url_validator(url: str) -> bool
TBA
TBA