aplustools.web

github-actions[bot] edited this page Jul 17, 2025 · 2 revisions

aplustools.web package

Submodules

aplustools.web.request module

TBA

class aplustools.web.request.AioHttpRequestHandler

Bases: object

Unified request handler that uses aiohttp for asynchronous HTTP requests and integrates with AsyncIOResult to handle multiple async requests concurrently. Public-facing methods are synchronous, while the internal logic is fully async.

__init__() -> None

request(url: str, method: str = 'GET', data: Any = None) -> bytes

Public-facing synchronous method to submit a request. Internally runs the async code.

request_many(urls: list[str], method: str = 'GET', data: Any = None) -> list[bytes]

Public-facing synchronous method to submit multiple requests. Internally runs the async code.

shutdown() -> None

Shuts down the request handler and closes the session if it exists.
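A minimal usage sketch, assuming the handler is called from ordinary synchronous code. The helper name `managed` is hypothetical and not part of aplustools; it relies only on the documented `shutdown()` method, so the session is closed even if a request raises.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed(handler: Any) -> Iterator[Any]:
    """Yield the handler and always call its documented shutdown() afterwards."""
    try:
        yield handler
    finally:
        handler.shutdown()


# Intended use (needs aplustools and network access, so it is left as a comment):
#
#   from aplustools.web.request import AioHttpRequestHandler
#
#   with managed(AioHttpRequestHandler()) as handler:
#       body = handler.request("https://example.com")           # bytes
#       bodies = handler.request_many(["https://example.com"])  # list[bytes]
```

Because the helper only needs a `shutdown()` method, the same wrapper should also work for any other handler on this page that exposes one.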

class aplustools.web.request.BatchRequestHandler(min_workers: int = 2, max_workers: int = 10, workers_step: int = 5, check_interval: float = 5.0)

Bases: object

A request handler that submits HTTP requests using a dynamic thread pool executor. It supports both synchronous and asynchronous modes for handling single or multiple requests at once.

Configurations known to work well, written as (min_workers, max_workers, workers_step), check_interval:

  • (2, 10, 5), 5.0 (Diamond; in testing, 0.47 to 0.81 seconds for 10 web requests and 0.9 to 1.3 seconds for 20 web requests)
  • ({min_workers, e.g. 5}, {a large n, e.g. 100}, 10), 2.0 (Gold; in testing, 0.7 to 1.2 seconds for 200 web requests)
  • ({min_workers, e.g. 5}, {a large n, e.g. 100}, 5), 1.0 (Silver; in testing, 0.8 to 1.6 seconds for 200 web requests)

_pool

A dynamically sized thread pool for executing tasks.

Type: _LazyDynamicThreadPoolExecutor

__init__(min_workers: int = 2, max_workers: int = 10, workers_step: int = 5, check_interval: float = 5.0) -> None

Initializes the request handler with a dynamic thread pool.

Parameters:
  • min_workers – The minimum number of worker threads to maintain in the pool.
  • max_workers – The maximum number of worker threads allowed in the pool.
  • workers_step – The increment or decrement in the number of workers during pool resizing.
  • check_interval – The time interval, in seconds, to wait before checking if the pool needs resizing based on current load.
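The configurations listed for this class can be written down as keyword-argument presets. This is only a sketch: the names `PoolConfig` and `PRESETS` are hypothetical, and the Gold and Silver entries use the illustrative values from the list (5 and 100), not tuned recommendations.

```python
from typing import TypedDict


class PoolConfig(TypedDict):
    min_workers: int
    max_workers: int
    workers_step: int
    check_interval: float


# Keys are the tier names used in the documentation. For Gold and Silver,
# min_workers=5 and max_workers=100 are the documentation's example values.
PRESETS: dict[str, PoolConfig] = {
    "diamond": {"min_workers": 2, "max_workers": 10, "workers_step": 5, "check_interval": 5.0},
    "gold": {"min_workers": 5, "max_workers": 100, "workers_step": 10, "check_interval": 2.0},
    "silver": {"min_workers": 5, "max_workers": 100, "workers_step": 5, "check_interval": 1.0},
}

# Intended use (needs aplustools, so it is left as a comment):
#
#   from aplustools.web.request import BatchRequestHandler
#   handler = BatchRequestHandler(**PRESETS["gold"])
```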

property current_size: int

Returns the current size of the dynamic thread pool.

Returns: Number of currently active threads in the pool.

property pool: LazyDynamicThreadPoolExecutor

Returns the internal pool instance, so that the same pool can be reused for other work instead of being reserved for web requests.

Returns: _LazyDynamicThreadPoolExecutor

request(url: str, async_mode: bool = False, method: str = 'GET', data: Any = None) -> Future | bytes

Submits an HTTP request. Supports both synchronous and asynchronous execution.

Parameters:
  • url – The URL to make the request to.
  • async_mode – Whether to run the request in asynchronous mode (returns a future).
  • method – The HTTP method to use (default is “GET”).
  • data – Any additional data to be passed with the request.
Returns: A future if in async mode, or the result bytes in synchronous mode.
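Since the return type depends on `async_mode`, calling code may want one path that handles both cases. A minimal sketch, assuming the returned future is a standard `concurrent.futures.Future`; the helper name `resolve` is hypothetical.

```python
from concurrent.futures import Future
from typing import Optional, Union


def resolve(result: Union[Future, bytes], timeout: Optional[float] = None) -> bytes:
    """Return the response bytes, waiting on the future if one was returned."""
    if isinstance(result, Future):
        # Future.result() blocks until the task completes and re-raises any
        # exception that the task raised.
        return result.result(timeout=timeout)
    return result


# Intended use (needs aplustools and network access, so it is left as a comment):
#
#   handler = BatchRequestHandler()
#   pending = handler.request("https://example.com", async_mode=True)
#   body = resolve(pending, timeout=10.0)
```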

request_many(urls: list[str], async_mode: bool = False, method: str = 'GET', data: Any = None) -> Result | list[bytes]

Submits multiple requests concurrently. Supports both synchronous and asynchronous execution.

Parameters:
  • urls – A list of URLs to make requests to.
  • async_mode – Whether to run the requests in asynchronous mode (returns a Result object).
  • method – The HTTP method to use for all requests (default is “GET”).
  • data – Any additional data to be passed with the request.
Returns: A Result object in async mode or a list of bytes in synchronous mode.

shutdown() -> None

Shuts down the thread pool gracefully, waiting for all threads to finish their tasks.

class aplustools.web.request.Result(futures: list[Future])

Bases: object

A class to handle a list of futures, allowing the user to process results once the futures are completed. It supports transformations on results and different modes for collecting them.

futures

A list of future objects representing asynchronous tasks.

Type: list[_concurrent_futures.Future]

no_into_results

A list initialized to store raw results, which can later be populated.

Type: list[_ty.Any]

_into_done

A flag to indicate whether an ‘into’ method has been used.

Type: bool

__init__(futures: list[Future])

Initializes the Result class with a list of future objects.

Parameters: futures – A list of futures representing asynchronous operations.

await_() -> Self

Waits for all futures to finish by blocking until they are done.

Returns: Self, after all futures have completed.

into(container: list[Callable[[Any], None] | Type]) -> Self

Process each result as it finishes and apply the transformation specified by the container’s elements (e.g., str, json.loads, etc.) to the result.

Parameters: container – A list of callable transformations (e.g., str, json.loads) or types.
Returns: Self, to allow for chaining.
Raises: ValueError – If the length of the container doesn’t match the number of futures or if another ‘into’ method has already been called.

no_into() -> Self

Collects raw results and stores them in the no_into_results list without any transformation.

Returns: Self, to allow for chaining.
Raises: ValueError – If another ‘into’ method has already been called.
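The idea behind `into` (one transformation per future, with a length check) can be illustrated with the standard library alone. This is a stand-in sketch, not the library's implementation: `collect_into` and `demo` are hypothetical names, and unlike the description above it waits for the futures in list order instead of processing each result as it finishes.

```python
import json
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def collect_into(futures: list[Future], transforms: list[Callable[[Any], Any]]) -> list[Any]:
    """Wait for each future and apply the transform at the same position."""
    if len(transforms) != len(futures):
        raise ValueError("need exactly one transform per future")
    return [transform(future.result()) for future, transform in zip(futures, transforms)]


def demo() -> list[Any]:
    # Two fake "responses" produced on worker threads stand in for web requests.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(lambda: b'{"a": 1}'), pool.submit(lambda: b"plain")]
        # json.loads accepts bytes; bytes.decode turns the second body into str.
        return collect_into(futures, [json.loads, bytes.decode])
```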

aplustools.web.request.fetch(url: str) -> bytes

Fetch a web resource from the specified URL.

Parameters: url (str) – The URL of the web resource to fetch.
Returns: The fetched resource as raw bytes.
Return type: bytes

aplustools.web.utils module

TBA

class aplustools.web.utils.WebPage(url: str)

Bases: object

TBA

__init__(url: str) -> None

static check_url(url: str, timeout: float = 2.0) -> int | None

TBA. Returns the status code, or None if unclear.

fetch_page(timeout: float = 2.0) -> None

TBA

from_soup(func_name: str, *args: Any, **kwargs: Any) -> Any | None

TBA

static generate_user_agent() -> str

TBA

get_by_class(class_name: str) -> list[Tag] | None

TBA

get_by_tag(tag: str) -> list[Tag] | None

TBA

static is_crawlable(url: str, timeout: float = 2.0) -> bool | None

TBA. Returns None if unclear.
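Because the result is `bool | None`, a plain truthiness test would silently treat "unclear" the same as "not crawlable". A small sketch that makes that choice explicit; the helper name `should_crawl` and its `default` parameter are hypothetical.

```python
from typing import Optional


def should_crawl(crawlable: Optional[bool], default: bool = False) -> bool:
    """Collapse the documented bool | None result into a yes/no decision."""
    # None means the check was inconclusive, so fall back to the caller's default.
    return default if crawlable is None else crawlable


# Intended use (needs aplustools and network access, so it is left as a comment):
#
#   from aplustools.web.utils import WebPage
#
#   if should_crawl(WebPage.is_crawlable("https://example.com")):
#       ...
```

Defaulting to `False` is the conservative choice here: an inconclusive check is treated as "do not crawl" unless the caller opts in.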

aplustools.web.utils.url_validator(url: str) -> bool

TBA

Module contents

TBA
