diff --git a/docs/02_concepts/01_actor_lifecycle.mdx b/docs/02_concepts/01_actor_lifecycle.mdx index 3918cc8cd..40826305a 100644 --- a/docs/02_concepts/01_actor_lifecycle.mdx +++ b/docs/02_concepts/01_actor_lifecycle.mdx @@ -7,6 +7,7 @@ description: How an Apify Actor starts, runs, and shuts down, including context import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import ApiLink from '@theme/ApiLink'; import ClassContextExample from '!!raw-loader!roa-loader!./code/01_class_context.py'; import ClassManualExample from '!!raw-loader!roa-loader!./code/01_class_manual.py'; @@ -26,7 +27,7 @@ This guide explains how an **Apify Actor** starts, runs, and shuts down, describ During initialization, the SDK prepares all the components required to integrate with the Apify platform. It loads configuration from environment variables, initializes access to platform storages such as the [key-value store, dataset, and request queue](https://docs.apify.com/platform/storage), sets up event handling for [platform events](https://docs.apify.com/platform/integrations/webhooks/events), and configures logging. -The recommended approach in Python is to use the global [`Actor`](https://docs.apify.com/sdk/python/reference/class/Actor) class as an asynchronous context manager. This approach automatically manages setup and teardown and keeps your code concise. When entering the context, the SDK loads configuration and initializes clients lazily—for example, a dataset is opened only when it is first accessed. If the Actor runs on the Apify platform, it also begins listening for platform events. +The recommended approach in Python is to use the global `Actor` class as an asynchronous context manager. This approach automatically manages setup and teardown and keeps your code concise. 
When entering the context, the SDK loads configuration and initializes clients lazily—for example, a dataset is opened only when it is first accessed. If the Actor runs on the Apify platform, it also begins listening for platform events. When the Actor exits, either normally or due to an exception, the SDK performs a graceful shutdown. It persists the final Actor state, stops event handling, and sets the terminal exit code together with the [status message](https://docs.apify.com/platform/actors/development/programming-interface/status-messages). @@ -43,9 +44,9 @@ When the Actor exits, either normally or due to an exception, the SDK performs a -You can also create an [`Actor`](https://docs.apify.com/sdk/python/reference/class/Actor) instance directly. This does not change its capabilities but allows you to specify optional parameters during initialization. The key parameters are: +You can also create an `Actor` instance directly. This does not change its capabilities but allows you to specify optional parameters during initialization. The key parameters are: -- `configuration` — a custom [`Configuration`](https://docs.apify.com/sdk/python/reference/class/Configuration) instance to control storage paths, API URLs, and other settings. +- `configuration` — a custom `Configuration` instance to control storage paths, API URLs, and other settings. - `configure_logging` — whether to set up default logging configuration (default `True`). Set to `False` if you configure logging yourself. - `exit_process` — whether the Actor calls `sys.exit()` when the context manager exits. Defaults to `True`, except in IPython, Pytest, and Scrapy environments. - `event_listeners_timeout` — maximum time to wait for Actor event listeners to complete before exiting. 
@@ -72,18 +73,18 @@ Good error handling lets your Actor fail fast on critical errors, retry transien The SDK provides helper methods for explicit control: -- [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit) - terminates the run successfully (default exit code 0). -- [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail) - marks the run as failed (default exit code 1). +- `Actor.exit` - terminates the run successfully (default exit code 0). +- `Actor.fail` - marks the run as failed (default exit code 1). Any non-zero exit code is treated as a `FAILED` run. You rarely need to call these methods directly unless you want to perform a controlled shutdown or customize the exit behavior. -Catch exceptions only when necessary - for example, to retry network timeouts or map specific errors to exit codes. Keep retry loops bounded with backoff and re-raise once exhausted. Make your processing idempotent so that restarts don't corrupt results. Both [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit) and [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail) perform the same cleanup, so complete any long-running persistence before calling them. +Catch exceptions only when necessary - for example, to retry network timeouts or map specific errors to exit codes. Keep retry loops bounded with backoff and re-raise once exhausted. Make your processing idempotent so that restarts don't corrupt results. Both `Actor.exit` and `Actor.fail` perform the same cleanup, so complete any long-running persistence before calling them. Below is a minimal context-manager example where an unhandled exception automatically fails the run, followed by a manual pattern giving you more control. 
{ErrorHandlingContextExample} -If you need explicit control over exit codes or status messages, you can manage the Actor manually using [`Actor.init`](https://docs.apify.com/sdk/python/reference/class/Actor#init), [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit), and [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail). +If you need explicit control over exit codes or status messages, you can manage the Actor manually using `Actor.init`, `Actor.exit`, and `Actor.fail`. {ErrorHandlingManualExample} @@ -105,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid ## Conclusion -This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the [reference docs](https://docs.apify.com/sdk/python/reference), [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform). +This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the reference docs, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform). 
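Note on the lifecycle page touched above: its error-handling paragraph advises keeping retry loops "bounded with backoff" and re-raising once exhausted. A minimal sketch of that advice follows, using only the standard library. The helper `retry_with_backoff` and its parameters (`max_attempts`, `base_delay`, `retry_on`) are hypothetical names chosen for illustration and are not part of the Apify SDK. Inside an Actor, the re-raised exception would propagate out of the context manager, which, per the page, fails the run.

```python
import asyncio
import random


async def retry_with_backoff(
    operation,
    *,
    max_attempts=3,
    base_delay=0.5,
    retry_on=(TimeoutError,),
):
    """Await `operation()` until it succeeds or `max_attempts` is reached.

    Hypothetical helper (not an Apify SDK API). Only exceptions listed in
    `retry_on` are retried; anything else propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                # Retries exhausted: re-raise so the caller (or the Actor
                # context manager) can mark the run as failed.
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            # A small random jitter avoids many clients retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

The loop is bounded by `max_attempts`, so a persistent failure surfaces as an exception instead of retrying forever. Because an operation may run more than once, it should be idempotent, as the page also recommends.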
diff --git a/docs/02_concepts/02_actor_input.mdx b/docs/02_concepts/02_actor_input.mdx index 8aa44720b..15807c05d 100644 --- a/docs/02_concepts/02_actor_input.mdx +++ b/docs/02_concepts/02_actor_input.mdx @@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import InputExample from '!!raw-loader!roa-loader!./code/02_input.py'; import RequestListExample from '!!raw-loader!roa-loader!./code/02_request_list.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The Actor gets its [input](https://docs.apify.com/platform/actors/running/input) from the input record in its default [key-value store](https://docs.apify.com/platform/storage/key-value-store). diff --git a/docs/02_concepts/03_storages.mdx b/docs/02_concepts/03_storages.mdx index dd3010280..059682b2b 100644 --- a/docs/02_concepts/03_storages.mdx +++ b/docs/02_concepts/03_storages.mdx @@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import OpeningStoragesExample from '!!raw-loader!roa-loader!./code/03_opening_storages.py'; import OpeningStoragesAliasExample from '!!raw-loader!roa-loader!./code/03_opening_storages_alias.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import DeletingStoragesExample from '!!raw-loader!roa-loader!./code/03_deleting_storages.py'; import DatasetReadWriteExample from '!!raw-loader!roa-loader!./code/03_dataset_read_write.py'; import DatasetExportsExample from '!!raw-loader!roa-loader!./code/03_dataset_exports.py'; diff --git a/docs/02_concepts/04_actor_events.mdx b/docs/02_concepts/04_actor_events.mdx index 924a43a3d..3dd4cd8b3 100644 --- a/docs/02_concepts/04_actor_events.mdx +++ b/docs/02_concepts/04_actor_events.mdx @@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import ActorEventsExample from '!!raw-loader!roa-loader!./code/04_actor_events.py'; import UseStateExample from 
'!!raw-loader!roa-loader!./code/04_use_state.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; During its runtime, the Actor receives Actor events sent by the Apify platform or generated by the Apify SDK itself. diff --git a/docs/02_concepts/05_proxy_management.mdx b/docs/02_concepts/05_proxy_management.mdx index 400696a07..d60763b5c 100644 --- a/docs/02_concepts/05_proxy_management.mdx +++ b/docs/02_concepts/05_proxy_management.mdx @@ -14,7 +14,7 @@ import CustomProxyFunctionExample from '!!raw-loader!roa-loader!./code/05_custom import ProxyActorInputExample from '!!raw-loader!roa-loader!./code/05_proxy_actor_input.py'; import ProxyHttpxExample from '!!raw-loader!roa-loader!./code/05_proxy_httpx.py'; import TieredProxyExample from '!!raw-loader!roa-loader!./code/05_tiered_proxy.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The Apify SDK provides built-in proxy management through the `ProxyConfiguration` class, supporting both [Apify Proxy](https://apify.com/proxy) and custom proxy servers. Proxies are essential for web scraping to avoid [IP address blocking](https://en.wikipedia.org/wiki/IP_address_blocking) and distribute requests across multiple addresses. 
diff --git a/docs/02_concepts/06_interacting_with_other_actors.mdx b/docs/02_concepts/06_interacting_with_other_actors.mdx index 19e272373..cb28d7cf9 100644 --- a/docs/02_concepts/06_interacting_with_other_actors.mdx +++ b/docs/02_concepts/06_interacting_with_other_actors.mdx @@ -11,7 +11,7 @@ import InteractingCallExample from '!!raw-loader!roa-loader!./code/06_interactin import InteractingCallTaskExample from '!!raw-loader!roa-loader!./code/06_interacting_call_task.py'; import InteractingMetamorphExample from '!!raw-loader!roa-loader!./code/06_interacting_metamorph.py'; import InteractingAbortExample from '!!raw-loader!roa-loader!./code/06_interacting_abort.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The Apify SDK lets you start, call, and transform (metamorph) other Actors directly from your Actor code. This is useful for composing complex workflows from smaller, reusable Actors. diff --git a/docs/02_concepts/07_webhooks.mdx b/docs/02_concepts/07_webhooks.mdx index 25ff1c4d6..5016d8a8a 100644 --- a/docs/02_concepts/07_webhooks.mdx +++ b/docs/02_concepts/07_webhooks.mdx @@ -6,7 +6,7 @@ description: Set up webhooks to trigger actions when Actor run events occur. 
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import WebhookExample from '!!raw-loader!roa-loader!./code/07_webhook.py'; import WebhookPreventingExample from '!!raw-loader!roa-loader!./code/07_webhook_preventing.py'; diff --git a/docs/02_concepts/08_access_apify_api.mdx b/docs/02_concepts/08_access_apify_api.mdx index 050b7e6be..5c4ba7ae7 100644 --- a/docs/02_concepts/08_access_apify_api.mdx +++ b/docs/02_concepts/08_access_apify_api.mdx @@ -6,7 +6,7 @@ description: Use the built-in Apify API client to access platform features not c import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import ActorClientExample from '!!raw-loader!roa-loader!./code/08_actor_client.py'; import ActorNewClientExample from '!!raw-loader!roa-loader!./code/08_actor_new_client.py'; diff --git a/docs/02_concepts/09_logging.mdx b/docs/02_concepts/09_logging.mdx index 0fe482ee0..b2066053a 100644 --- a/docs/02_concepts/09_logging.mdx +++ b/docs/02_concepts/09_logging.mdx @@ -6,7 +6,7 @@ description: Configure log levels, formatting, and log redirection between Actor import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import LogConfigExample from '!!raw-loader!roa-loader!./code/09_log_config.py'; import LoggerUsageExample from '!!raw-loader!roa-loader!./code/09_logger_usage.py'; import RedirectLog from '!!raw-loader!roa-loader!./code/09_redirect_log.py'; diff --git a/docs/02_concepts/10_configuration.mdx b/docs/02_concepts/10_configuration.mdx index fea05911e..6296b864b 100644 --- a/docs/02_concepts/10_configuration.mdx +++ b/docs/02_concepts/10_configuration.mdx @@ -9,7 +9,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import 
ConfigExample from '!!raw-loader!roa-loader!./code/10_config.py'; import GetEnvExample from '!!raw-loader!roa-loader!./code/10_get_env.py'; import PlatformDetectionExample from '!!raw-loader!roa-loader!./code/10_platform_detection.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The `Actor` class is configured through the `Configuration` class, which reads its settings from environment variables. When running on the Apify platform or through the Apify CLI, configuration is automatic — manual setup is only needed for custom requirements. diff --git a/docs/02_concepts/11_pay_per_event.mdx b/docs/02_concepts/11_pay_per_event.mdx index e0fcfe226..d8c04443e 100644 --- a/docs/02_concepts/11_pay_per_event.mdx +++ b/docs/02_concepts/11_pay_per_event.mdx @@ -8,7 +8,7 @@ import ActorChargeSource from '!!raw-loader!roa-loader!./code/11_actor_charge.py import ConditionalActorChargeSource from '!!raw-loader!roa-loader!./code/11_conditional_actor_charge.py'; import ChargeLimitCheckSource from '!!raw-loader!roa-loader!./code/11_charge_limit_check.py'; import AdvancedChargingExample from '!!raw-loader!roa-loader!./code/11_advanced_charging.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; Apify provides several [pricing models](https://docs.apify.com/platform/actors/publishing/monetize) for monetizing your Actors. The most recent and most flexible one is [pay-per-event](https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event), which lets you charge your users programmatically directly from your Actor. As the name suggests, you may charge the users each time a specific event occurs, for example a call to an external API or when you return a result. 
diff --git a/docs/03_guides/06_scrapy.mdx b/docs/03_guides/06_scrapy.mdx index 5af4d6a50..12525609a 100644 --- a/docs/03_guides/06_scrapy.mdx +++ b/docs/03_guides/06_scrapy.mdx @@ -7,6 +7,7 @@ description: Convert Scrapy spiders into Apify Actors with platform storage and import CodeBlock from '@theme/CodeBlock'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import ApiLink from '@theme/ApiLink'; import UnderscoreMainExample from '!!raw-loader!./code/scrapy_project/src/__main__.py'; import MainExample from '!!raw-loader!./code/scrapy_project/src/main.py'; @@ -42,10 +43,10 @@ Within the Actor's main coroutine, the Actor's input is processed as usual. The The Apify SDK provides several custom components to support integration with the Apify platform: -- [`apify.scrapy.ApifyScheduler`](https://docs.apify.com/sdk/python/reference/class/ApifyScheduler) - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests. -- [`apify.scrapy.ActorDatasetPushPipeline`](https://docs.apify.com/sdk/python/reference/class/ActorDatasetPushPipeline) - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset. -- [`apify.scrapy.ApifyHttpProxyMiddleware`](https://docs.apify.com/sdk/python/reference/class/ApifyHttpProxyMiddleware) - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service. 
-- [`apify.scrapy.extensions.ApifyCacheStorage`](https://docs.apify.com/sdk/python/reference/class/ApifyCacheStorage) - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work. +- `apify.scrapy.ApifyScheduler` - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests. +- `apify.scrapy.ActorDatasetPushPipeline` - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset. +- `apify.scrapy.ApifyHttpProxyMiddleware` - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service. +- `apify.scrapy.extensions.ApifyCacheStorage` - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work. 
Additional helper functions in the [`apify.scrapy`](https://github.com/apify/apify-sdk-python/tree/master/src/apify/scrapy) subpackage include: diff --git a/website/src/components/ApiLink.jsx b/website/src/components/ApiLink.jsx deleted file mode 100644 index 44bba3529..000000000 --- a/website/src/components/ApiLink.jsx +++ /dev/null @@ -1,10 +0,0 @@ -import React from 'react'; -import Link from '@docusaurus/Link'; - -const ApiLink = ({ to, children }) => { - return ( - {children} - ); -}; - -export default ApiLink; diff --git a/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx b/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx index feb858b91..4dbecb546 100644 --- a/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx +++ b/website/versioned_docs/version-2.7/02_guides/05_scrapy.mdx @@ -6,6 +6,7 @@ title: Using Scrapy import CodeBlock from '@theme/CodeBlock'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import ApiLink from '@theme/ApiLink'; import UnderscoreMainExample from '!!raw-loader!./code/scrapy_project/src/__main__.py'; import MainExample from '!!raw-loader!./code/scrapy_project/src/main.py'; @@ -41,10 +42,10 @@ Within the Actor's main coroutine, the Actor's input is processed as usual. The The Apify SDK provides several custom components to support integration with the Apify platform: -- [`apify.scrapy.ApifyScheduler`](https://docs.apify.com/sdk/python/reference/class/ApifyScheduler) - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests. 
-- [`apify.scrapy.ActorDatasetPushPipeline`](https://docs.apify.com/sdk/python/reference/class/ActorDatasetPushPipeline) - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset. -- [`apify.scrapy.ApifyHttpProxyMiddleware`](https://docs.apify.com/sdk/python/reference/class/ApifyHttpProxyMiddleware) - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service. -- [`apify.scrapy.extensions.ApifyCacheStorage`](https://docs.apify.com/sdk/python/reference/class/ApifyCacheStorage) - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work. +- `apify.scrapy.ApifyScheduler` - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests. +- `apify.scrapy.ActorDatasetPushPipeline` - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset. 
+- `apify.scrapy.ApifyHttpProxyMiddleware` - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service. +- `apify.scrapy.extensions.ApifyCacheStorage` - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work. Additional helper functions in the [`apify.scrapy`](https://github.com/apify/apify-sdk-python/tree/master/src/apify/scrapy) subpackage include: diff --git a/website/versioned_docs/version-2.7/03_concepts/11_pay_per_event.mdx b/website/versioned_docs/version-2.7/03_concepts/11_pay_per_event.mdx index 0d8edbbcd..6a80927c0 100644 --- a/website/versioned_docs/version-2.7/03_concepts/11_pay_per_event.mdx +++ b/website/versioned_docs/version-2.7/03_concepts/11_pay_per_event.mdx @@ -6,7 +6,7 @@ description: Monetize your Actors using the pay-per-event pricing model import ActorChargeSource from '!!raw-loader!./code/actor_charge.py'; import ConditionalActorChargeSource from '!!raw-loader!./code/conditional_actor_charge.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import CodeBlock from '@theme/CodeBlock'; Apify provides several [pricing models](https://docs.apify.com/platform/actors/publishing/monetize) for monetizing your Actors. The most recent and most flexible one is [pay-per-event](https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event), which lets you charge your users programmatically directly from your Actor. 
As the name suggests, you may charge the users each time a specific event occurs, for example a call to an external API or when you return a result. diff --git a/website/versioned_docs/version-3.3/02_concepts/01_actor_lifecycle.mdx b/website/versioned_docs/version-3.3/02_concepts/01_actor_lifecycle.mdx index 3918cc8cd..40826305a 100644 --- a/website/versioned_docs/version-3.3/02_concepts/01_actor_lifecycle.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/01_actor_lifecycle.mdx @@ -7,6 +7,7 @@ description: How an Apify Actor starts, runs, and shuts down, including context import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import ApiLink from '@theme/ApiLink'; import ClassContextExample from '!!raw-loader!roa-loader!./code/01_class_context.py'; import ClassManualExample from '!!raw-loader!roa-loader!./code/01_class_manual.py'; @@ -26,7 +27,7 @@ This guide explains how an **Apify Actor** starts, runs, and shuts down, describ During initialization, the SDK prepares all the components required to integrate with the Apify platform. It loads configuration from environment variables, initializes access to platform storages such as the [key-value store, dataset, and request queue](https://docs.apify.com/platform/storage), sets up event handling for [platform events](https://docs.apify.com/platform/integrations/webhooks/events), and configures logging. -The recommended approach in Python is to use the global [`Actor`](https://docs.apify.com/sdk/python/reference/class/Actor) class as an asynchronous context manager. This approach automatically manages setup and teardown and keeps your code concise. When entering the context, the SDK loads configuration and initializes clients lazily—for example, a dataset is opened only when it is first accessed. If the Actor runs on the Apify platform, it also begins listening for platform events. 
+The recommended approach in Python is to use the global `Actor` class as an asynchronous context manager. This approach automatically manages setup and teardown and keeps your code concise. When entering the context, the SDK loads configuration and initializes clients lazily—for example, a dataset is opened only when it is first accessed. If the Actor runs on the Apify platform, it also begins listening for platform events. When the Actor exits, either normally or due to an exception, the SDK performs a graceful shutdown. It persists the final Actor state, stops event handling, and sets the terminal exit code together with the [status message](https://docs.apify.com/platform/actors/development/programming-interface/status-messages). @@ -43,9 +44,9 @@ When the Actor exits, either normally or due to an exception, the SDK performs a -You can also create an [`Actor`](https://docs.apify.com/sdk/python/reference/class/Actor) instance directly. This does not change its capabilities but allows you to specify optional parameters during initialization. The key parameters are: +You can also create an `Actor` instance directly. This does not change its capabilities but allows you to specify optional parameters during initialization. The key parameters are: -- `configuration` — a custom [`Configuration`](https://docs.apify.com/sdk/python/reference/class/Configuration) instance to control storage paths, API URLs, and other settings. +- `configuration` — a custom `Configuration` instance to control storage paths, API URLs, and other settings. - `configure_logging` — whether to set up default logging configuration (default `True`). Set to `False` if you configure logging yourself. - `exit_process` — whether the Actor calls `sys.exit()` when the context manager exits. Defaults to `True`, except in IPython, Pytest, and Scrapy environments. - `event_listeners_timeout` — maximum time to wait for Actor event listeners to complete before exiting. 
@@ -72,18 +73,18 @@ Good error handling lets your Actor fail fast on critical errors, retry transien The SDK provides helper methods for explicit control: -- [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit) - terminates the run successfully (default exit code 0). -- [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail) - marks the run as failed (default exit code 1). +- `Actor.exit` - terminates the run successfully (default exit code 0). +- `Actor.fail` - marks the run as failed (default exit code 1). Any non-zero exit code is treated as a `FAILED` run. You rarely need to call these methods directly unless you want to perform a controlled shutdown or customize the exit behavior. -Catch exceptions only when necessary - for example, to retry network timeouts or map specific errors to exit codes. Keep retry loops bounded with backoff and re-raise once exhausted. Make your processing idempotent so that restarts don't corrupt results. Both [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit) and [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail) perform the same cleanup, so complete any long-running persistence before calling them. +Catch exceptions only when necessary - for example, to retry network timeouts or map specific errors to exit codes. Keep retry loops bounded with backoff and re-raise once exhausted. Make your processing idempotent so that restarts don't corrupt results. Both `Actor.exit` and `Actor.fail` perform the same cleanup, so complete any long-running persistence before calling them. Below is a minimal context-manager example where an unhandled exception automatically fails the run, followed by a manual pattern giving you more control. 
{ErrorHandlingContextExample} -If you need explicit control over exit codes or status messages, you can manage the Actor manually using [`Actor.init`](https://docs.apify.com/sdk/python/reference/class/Actor#init), [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit), and [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail). +If you need explicit control over exit codes or status messages, you can manage the Actor manually using `Actor.init`, `Actor.exit`, and `Actor.fail`. {ErrorHandlingManualExample} @@ -105,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid ## Conclusion -This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the [reference docs](https://docs.apify.com/sdk/python/reference), [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform). +This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the reference docs, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform). 
diff --git a/website/versioned_docs/version-3.3/02_concepts/02_actor_input.mdx b/website/versioned_docs/version-3.3/02_concepts/02_actor_input.mdx index 8aa44720b..15807c05d 100644 --- a/website/versioned_docs/version-3.3/02_concepts/02_actor_input.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/02_actor_input.mdx @@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import InputExample from '!!raw-loader!roa-loader!./code/02_input.py'; import RequestListExample from '!!raw-loader!roa-loader!./code/02_request_list.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The Actor gets its [input](https://docs.apify.com/platform/actors/running/input) from the input record in its default [key-value store](https://docs.apify.com/platform/storage/key-value-store). diff --git a/website/versioned_docs/version-3.3/02_concepts/03_storages.mdx b/website/versioned_docs/version-3.3/02_concepts/03_storages.mdx index dd3010280..059682b2b 100644 --- a/website/versioned_docs/version-3.3/02_concepts/03_storages.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/03_storages.mdx @@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import OpeningStoragesExample from '!!raw-loader!roa-loader!./code/03_opening_storages.py'; import OpeningStoragesAliasExample from '!!raw-loader!roa-loader!./code/03_opening_storages_alias.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import DeletingStoragesExample from '!!raw-loader!roa-loader!./code/03_deleting_storages.py'; import DatasetReadWriteExample from '!!raw-loader!roa-loader!./code/03_dataset_read_write.py'; import DatasetExportsExample from '!!raw-loader!roa-loader!./code/03_dataset_exports.py'; diff --git a/website/versioned_docs/version-3.3/02_concepts/04_actor_events.mdx b/website/versioned_docs/version-3.3/02_concepts/04_actor_events.mdx index 924a43a3d..3dd4cd8b3 
100644 --- a/website/versioned_docs/version-3.3/02_concepts/04_actor_events.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/04_actor_events.mdx @@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import ActorEventsExample from '!!raw-loader!roa-loader!./code/04_actor_events.py'; import UseStateExample from '!!raw-loader!roa-loader!./code/04_use_state.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; During its runtime, the Actor receives Actor events sent by the Apify platform or generated by the Apify SDK itself. diff --git a/website/versioned_docs/version-3.3/02_concepts/05_proxy_management.mdx b/website/versioned_docs/version-3.3/02_concepts/05_proxy_management.mdx index a0effccee..fef7ca8c8 100644 --- a/website/versioned_docs/version-3.3/02_concepts/05_proxy_management.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/05_proxy_management.mdx @@ -14,7 +14,7 @@ import CustomProxyFunctionExample from '!!raw-loader!roa-loader!./code/05_custom import ProxyActorInputExample from '!!raw-loader!roa-loader!./code/05_proxy_actor_input.py'; import ProxyHttpxExample from '!!raw-loader!roa-loader!./code/05_proxy_httpx.py'; import TieredProxyExample from '!!raw-loader!roa-loader!./code/05_tiered_proxy.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The Apify SDK provides built-in proxy management through the `ProxyConfiguration` class, supporting both [Apify Proxy](https://apify.com/proxy) and custom proxy servers. Proxies are essential for web scraping to avoid [IP address blocking](https://en.wikipedia.org/wiki/IP_address_blocking) and distribute requests across multiple addresses. 
diff --git a/website/versioned_docs/version-3.3/02_concepts/06_interacting_with_other_actors.mdx b/website/versioned_docs/version-3.3/02_concepts/06_interacting_with_other_actors.mdx index 19e272373..cb28d7cf9 100644 --- a/website/versioned_docs/version-3.3/02_concepts/06_interacting_with_other_actors.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/06_interacting_with_other_actors.mdx @@ -11,7 +11,7 @@ import InteractingCallExample from '!!raw-loader!roa-loader!./code/06_interactin import InteractingCallTaskExample from '!!raw-loader!roa-loader!./code/06_interacting_call_task.py'; import InteractingMetamorphExample from '!!raw-loader!roa-loader!./code/06_interacting_metamorph.py'; import InteractingAbortExample from '!!raw-loader!roa-loader!./code/06_interacting_abort.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The Apify SDK lets you start, call, and transform (metamorph) other Actors directly from your Actor code. This is useful for composing complex workflows from smaller, reusable Actors. diff --git a/website/versioned_docs/version-3.3/02_concepts/07_webhooks.mdx b/website/versioned_docs/version-3.3/02_concepts/07_webhooks.mdx index 25ff1c4d6..5016d8a8a 100644 --- a/website/versioned_docs/version-3.3/02_concepts/07_webhooks.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/07_webhooks.mdx @@ -6,7 +6,7 @@ description: Set up webhooks to trigger actions when Actor run events occur. 
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import WebhookExample from '!!raw-loader!roa-loader!./code/07_webhook.py'; import WebhookPreventingExample from '!!raw-loader!roa-loader!./code/07_webhook_preventing.py'; diff --git a/website/versioned_docs/version-3.3/02_concepts/08_access_apify_api.mdx b/website/versioned_docs/version-3.3/02_concepts/08_access_apify_api.mdx index 050b7e6be..5c4ba7ae7 100644 --- a/website/versioned_docs/version-3.3/02_concepts/08_access_apify_api.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/08_access_apify_api.mdx @@ -6,7 +6,7 @@ description: Use the built-in Apify API client to access platform features not c import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import ActorClientExample from '!!raw-loader!roa-loader!./code/08_actor_client.py'; import ActorNewClientExample from '!!raw-loader!roa-loader!./code/08_actor_new_client.py'; diff --git a/website/versioned_docs/version-3.3/02_concepts/09_logging.mdx b/website/versioned_docs/version-3.3/02_concepts/09_logging.mdx index 0fe482ee0..b2066053a 100644 --- a/website/versioned_docs/version-3.3/02_concepts/09_logging.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/09_logging.mdx @@ -6,7 +6,7 @@ description: Configure log levels, formatting, and log redirection between Actor import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import LogConfigExample from '!!raw-loader!roa-loader!./code/09_log_config.py'; import LoggerUsageExample from '!!raw-loader!roa-loader!./code/09_logger_usage.py'; import RedirectLog from '!!raw-loader!roa-loader!./code/09_redirect_log.py'; diff --git 
a/website/versioned_docs/version-3.3/02_concepts/10_configuration.mdx b/website/versioned_docs/version-3.3/02_concepts/10_configuration.mdx index fea05911e..6296b864b 100644 --- a/website/versioned_docs/version-3.3/02_concepts/10_configuration.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/10_configuration.mdx @@ -9,7 +9,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; import ConfigExample from '!!raw-loader!roa-loader!./code/10_config.py'; import GetEnvExample from '!!raw-loader!roa-loader!./code/10_get_env.py'; import PlatformDetectionExample from '!!raw-loader!roa-loader!./code/10_platform_detection.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; The `Actor` class is configured through the `Configuration` class, which reads its settings from environment variables. When running on the Apify platform or through the Apify CLI, configuration is automatic — manual setup is only needed for custom requirements. 
diff --git a/website/versioned_docs/version-3.3/02_concepts/11_pay_per_event.mdx b/website/versioned_docs/version-3.3/02_concepts/11_pay_per_event.mdx index e0fcfe226..d8c04443e 100644 --- a/website/versioned_docs/version-3.3/02_concepts/11_pay_per_event.mdx +++ b/website/versioned_docs/version-3.3/02_concepts/11_pay_per_event.mdx @@ -8,7 +8,7 @@ import ActorChargeSource from '!!raw-loader!roa-loader!./code/11_actor_charge.py import ConditionalActorChargeSource from '!!raw-loader!roa-loader!./code/11_conditional_actor_charge.py'; import ChargeLimitCheckSource from '!!raw-loader!roa-loader!./code/11_charge_limit_check.py'; import AdvancedChargingExample from '!!raw-loader!roa-loader!./code/11_advanced_charging.py'; -import ApiLink from '@site/src/components/ApiLink'; +import ApiLink from '@theme/ApiLink'; import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock'; Apify provides several [pricing models](https://docs.apify.com/platform/actors/publishing/monetize) for monetizing your Actors. The most recent and most flexible one is [pay-per-event](https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event), which lets you charge your users programmatically directly from your Actor. As the name suggests, you may charge the users each time a specific event occurs, for example a call to an external API or when you return a result. 
diff --git a/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx b/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx index 5af4d6a50..12525609a 100644 --- a/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx +++ b/website/versioned_docs/version-3.3/03_guides/06_scrapy.mdx @@ -7,6 +7,7 @@ description: Convert Scrapy spiders into Apify Actors with platform storage and import CodeBlock from '@theme/CodeBlock'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import ApiLink from '@theme/ApiLink'; import UnderscoreMainExample from '!!raw-loader!./code/scrapy_project/src/__main__.py'; import MainExample from '!!raw-loader!./code/scrapy_project/src/main.py'; @@ -42,10 +43,10 @@ Within the Actor's main coroutine, the Actor's input is processed as usual. The The Apify SDK provides several custom components to support integration with the Apify platform: -- [`apify.scrapy.ApifyScheduler`](https://docs.apify.com/sdk/python/reference/class/ApifyScheduler) - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests. -- [`apify.scrapy.ActorDatasetPushPipeline`](https://docs.apify.com/sdk/python/reference/class/ActorDatasetPushPipeline) - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset. -- [`apify.scrapy.ApifyHttpProxyMiddleware`](https://docs.apify.com/sdk/python/reference/class/ApifyHttpProxyMiddleware) - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. 
This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service. -- [`apify.scrapy.extensions.ApifyCacheStorage`](https://docs.apify.com/sdk/python/reference/class/ApifyCacheStorage) - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work. +- `apify.scrapy.ApifyScheduler` - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests. +- `apify.scrapy.ActorDatasetPushPipeline` - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset. +- `apify.scrapy.ApifyHttpProxyMiddleware` - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service. +- `apify.scrapy.extensions.ApifyCacheStorage` - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work. 
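As a rough illustration of how the four components listed above might be wired into a Scrapy project, the sketch below builds a plain settings dictionary. The setting keys (`SCHEDULER`, `ITEM_PIPELINES`, `DOWNLOADER_MIDDLEWARES`, `HTTPCACHE_*`) are standard Scrapy settings, and the dotted paths are copied from the list above. The helper name `with_apify_components`, the priority values, and the default expiration are assumptions made for this example, not values confirmed by the SDK.

```python
from copy import deepcopy

# Dotted paths as they appear in the component list above.
APIFY_SCHEDULER = "apify.scrapy.ApifyScheduler"
APIFY_PIPELINE = "apify.scrapy.ActorDatasetPushPipeline"
APIFY_PROXY_MIDDLEWARE = "apify.scrapy.ApifyHttpProxyMiddleware"
APIFY_CACHE_STORAGE = "apify.scrapy.extensions.ApifyCacheStorage"

# Scrapy's built-in proxy middleware, which the Apify middleware replaces.
SCRAPY_PROXY_MIDDLEWARE = "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware"


def with_apify_components(base=None, cache_expiration_secs=3600):
    """Return a copy of `base` with the Apify components enabled.

    Hypothetical helper for illustration only; priorities are assumed.
    """
    settings = deepcopy(base) if base is not None else {}

    # Store requests in the Apify request queue instead of the default scheduler.
    settings["SCHEDULER"] = APIFY_SCHEDULER

    # Push every scraped item to the dataset; a high number runs it last (assumed).
    pipelines = settings.setdefault("ITEM_PIPELINES", {})
    pipelines[APIFY_PIPELINE] = 1000

    # Disable the default proxy middleware and enable the Apify one (priority assumed).
    middlewares = settings.setdefault("DOWNLOADER_MIDDLEWARES", {})
    middlewares[SCRAPY_PROXY_MIDDLEWARE] = None
    middlewares[APIFY_PROXY_MIDDLEWARE] = 750

    # The cache storage backend only takes effect when HTTP caching is enabled.
    settings["HTTPCACHE_ENABLED"] = True
    settings["HTTPCACHE_EXPIRATION_SECS"] = cache_expiration_secs
    settings["HTTPCACHE_STORAGE"] = APIFY_CACHE_STORAGE

    return settings


if __name__ == "__main__":
    project_settings = {"ITEM_PIPELINES": {"myproject.pipelines.CleanPipeline": 300}}
    for key, value in with_apify_components(project_settings).items():
        print(f"{key} = {value!r}")
```

The helper copies the input rather than mutating it, so existing project pipelines and middlewares are preserved alongside the Apify entries.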
Additional helper functions in the [`apify.scrapy`](https://github.com/apify/apify-sdk-python/tree/master/src/apify/scrapy) subpackage include: