17 changes: 9 additions & 8 deletions docs/02_concepts/01_actor_lifecycle.mdx
@@ -7,6 +7,7 @@ description: How an Apify Actor starts, runs, and shuts down, including context
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import ApiLink from '@theme/ApiLink';

import ClassContextExample from '!!raw-loader!roa-loader!./code/01_class_context.py';
import ClassManualExample from '!!raw-loader!roa-loader!./code/01_class_manual.py';
@@ -26,7 +27,7 @@ This guide explains how an **Apify Actor** starts, runs, and shuts down, describ

During initialization, the SDK prepares all the components required to integrate with the Apify platform. It loads configuration from environment variables, initializes access to platform storages such as the [key-value store, dataset, and request queue](https://docs.apify.com/platform/storage), sets up event handling for [platform events](https://docs.apify.com/platform/integrations/webhooks/events), and configures logging.

The recommended approach in Python is to use the global [`Actor`](https://docs.apify.com/sdk/python/reference/class/Actor) class as an asynchronous context manager. This approach automatically manages setup and teardown and keeps your code concise. When entering the context, the SDK loads configuration and initializes clients lazily—for example, a dataset is opened only when it is first accessed. If the Actor runs on the Apify platform, it also begins listening for platform events.
The recommended approach in Python is to use the global <ApiLink to="class/Actor">`Actor`</ApiLink> class as an asynchronous context manager. This approach automatically manages setup and teardown and keeps your code concise. When entering the context, the SDK loads configuration and initializes clients lazily—for example, a dataset is opened only when it is first accessed. If the Actor runs on the Apify platform, it also begins listening for platform events.

When the Actor exits, either normally or due to an exception, the SDK performs a graceful shutdown. It persists the final Actor state, stops event handling, and sets the terminal exit code together with the [status message](https://docs.apify.com/platform/actors/development/programming-interface/status-messages).
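
The enter/exit flow described above can be sketched without the SDK. The `LifecycleSketch` class below is a hypothetical stand-in, not the SDK's `Actor`: it only mimics the order of events (setup on enter, cleanup on exit, exit code 0 or 1 depending on whether an exception escaped), and makes no claim about how the real class handles exceptions internally.

```python
import asyncio


class LifecycleSketch:
    """Hypothetical stand-in that mimics the enter/exit flow; not the SDK's Actor."""

    def __init__(self):
        self.events = []
        self.exit_code = None

    async def __aenter__(self):
        # Setup step: the real SDK loads configuration and prepares storages here.
        self.events.append("init")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Cleanup runs on both normal exit and exceptions.
        self.events.append("cleanup")
        self.exit_code = 0 if exc_type is None else 1
        return False  # this sketch does not swallow the exception


async def run(should_fail):
    sketch = LifecycleSketch()
    try:
        async with sketch:
            sketch.events.append("work")
            if should_fail:
                raise RuntimeError("simulated failure")
    except RuntimeError:
        pass  # caught only so the sketch can be inspected afterwards
    return sketch
```

Calling `asyncio.run(run(True))` returns an object whose `events` list still ends with the cleanup step and whose `exit_code` is 1, which is the guarantee a context manager gives over manual setup and teardown.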

@@ -43,9 +44,9 @@ When the Actor exits, either normally or due to an exception, the SDK performs a
</TabItem>
</Tabs>

You can also create an [`Actor`](https://docs.apify.com/sdk/python/reference/class/Actor) instance directly. This does not change its capabilities but allows you to specify optional parameters during initialization. The key parameters are:
You can also create an <ApiLink to="class/Actor">`Actor`</ApiLink> instance directly. This does not change its capabilities but allows you to specify optional parameters during initialization. The key parameters are:

- `configuration` — a custom [`Configuration`](https://docs.apify.com/sdk/python/reference/class/Configuration) instance to control storage paths, API URLs, and other settings.
- `configuration` — a custom <ApiLink to="class/Configuration">`Configuration`</ApiLink> instance to control storage paths, API URLs, and other settings.
- `configure_logging` — whether to set up default logging configuration (default `True`). Set to `False` if you configure logging yourself.
- `exit_process` — whether the Actor calls `sys.exit()` when the context manager exits. Defaults to `True`, except in IPython, Pytest, and Scrapy environments.
- `event_listeners_timeout` — maximum time to wait for Actor event listeners to complete before exiting.
@@ -72,18 +73,18 @@ Good error handling lets your Actor fail fast on critical errors, retry transien

The SDK provides helper methods for explicit control:

- [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit) - terminates the run successfully (default exit code 0).
- [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail) - marks the run as failed (default exit code 1).
- <ApiLink to="class/Actor#exit">`Actor.exit`</ApiLink> - terminates the run successfully (default exit code 0).
- <ApiLink to="class/Actor#fail">`Actor.fail`</ApiLink> - marks the run as failed (default exit code 1).

Any non-zero exit code is treated as a `FAILED` run. You rarely need to call these methods directly unless you want to perform a controlled shutdown or customize the exit behavior.

Catch exceptions only when necessary - for example, to retry network timeouts or map specific errors to exit codes. Keep retry loops bounded with backoff and re-raise once exhausted. Make your processing idempotent so that restarts don't corrupt results. Both [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit) and [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail) perform the same cleanup, so complete any long-running persistence before calling them.
Catch exceptions only when necessary - for example, to retry network timeouts or map specific errors to exit codes. Keep retry loops bounded with backoff and re-raise once exhausted. Make your processing idempotent so that restarts don't corrupt results. Both <ApiLink to="class/Actor#exit">`Actor.exit`</ApiLink> and <ApiLink to="class/Actor#fail">`Actor.fail`</ApiLink> perform the same cleanup, so complete any long-running persistence before calling them.
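
A bounded retry loop with backoff might look like the sketch below. The helper name `retry_with_backoff` and its parameters are illustrative assumptions, not part of the SDK; it uses only `asyncio` and re-raises once the attempts are exhausted so the run can fail normally.

```python
import asyncio


async def retry_with_backoff(operation, *, max_attempts=3, base_delay=0.5, retry_on=(TimeoutError,)):
    """Await operation() up to max_attempts times, doubling the delay after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # retries exhausted: let the error propagate
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Only the exception types listed in `retry_on` are retried; anything else propagates immediately, which keeps critical errors failing fast.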

Below is a minimal context-manager example where an unhandled exception automatically fails the run, followed by a manual pattern giving you more control.

<RunnableCodeBlock className="language-python" language="python">{ErrorHandlingContextExample}</RunnableCodeBlock>

If you need explicit control over exit codes or status messages, you can manage the Actor manually using [`Actor.init`](https://docs.apify.com/sdk/python/reference/class/Actor#init), [`Actor.exit`](https://docs.apify.com/sdk/python/reference/class/Actor#exit), and [`Actor.fail`](https://docs.apify.com/sdk/python/reference/class/Actor#fail).
If you need explicit control over exit codes or status messages, you can manage the Actor manually using <ApiLink to="class/Actor#init">`Actor.init`</ApiLink>, <ApiLink to="class/Actor#exit">`Actor.exit`</ApiLink>, and <ApiLink to="class/Actor#fail">`Actor.fail`</ApiLink>.

<RunnableCodeBlock className="language-python" language="python">{ErrorHandlingManualExample}</RunnableCodeBlock>

@@ -105,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid

## Conclusion

This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the [reference docs](https://docs.apify.com/sdk/python/reference), [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="">reference docs</ApiLink>, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
2 changes: 1 addition & 1 deletion docs/02_concepts/02_actor_input.mdx
@@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import InputExample from '!!raw-loader!roa-loader!./code/02_input.py';
import RequestListExample from '!!raw-loader!roa-loader!./code/02_request_list.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';

The Actor gets its [input](https://docs.apify.com/platform/actors/running/input) from the input record in its default [key-value store](https://docs.apify.com/platform/storage/key-value-store).

2 changes: 1 addition & 1 deletion docs/02_concepts/03_storages.mdx
@@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import OpeningStoragesExample from '!!raw-loader!roa-loader!./code/03_opening_storages.py';
import OpeningStoragesAliasExample from '!!raw-loader!roa-loader!./code/03_opening_storages_alias.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';
import DeletingStoragesExample from '!!raw-loader!roa-loader!./code/03_deleting_storages.py';
import DatasetReadWriteExample from '!!raw-loader!roa-loader!./code/03_dataset_read_write.py';
import DatasetExportsExample from '!!raw-loader!roa-loader!./code/03_dataset_exports.py';
2 changes: 1 addition & 1 deletion docs/02_concepts/04_actor_events.mdx
@@ -8,7 +8,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import ActorEventsExample from '!!raw-loader!roa-loader!./code/04_actor_events.py';
import UseStateExample from '!!raw-loader!roa-loader!./code/04_use_state.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';

During its runtime, the Actor receives Actor events sent by the Apify platform or generated by the Apify SDK itself.

2 changes: 1 addition & 1 deletion docs/02_concepts/05_proxy_management.mdx
@@ -14,7 +14,7 @@ import CustomProxyFunctionExample from '!!raw-loader!roa-loader!./code/05_custom
import ProxyActorInputExample from '!!raw-loader!roa-loader!./code/05_proxy_actor_input.py';
import ProxyHttpxExample from '!!raw-loader!roa-loader!./code/05_proxy_httpx.py';
import TieredProxyExample from '!!raw-loader!roa-loader!./code/05_tiered_proxy.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';

The Apify SDK provides built-in proxy management through the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> class, supporting both [Apify Proxy](https://apify.com/proxy) and custom proxy servers. Proxies are essential for web scraping to avoid [IP address blocking](https://en.wikipedia.org/wiki/IP_address_blocking) and distribute requests across multiple addresses.

2 changes: 1 addition & 1 deletion docs/02_concepts/06_interacting_with_other_actors.mdx
@@ -11,7 +11,7 @@ import InteractingCallExample from '!!raw-loader!roa-loader!./code/06_interactin
import InteractingCallTaskExample from '!!raw-loader!roa-loader!./code/06_interacting_call_task.py';
import InteractingMetamorphExample from '!!raw-loader!roa-loader!./code/06_interacting_metamorph.py';
import InteractingAbortExample from '!!raw-loader!roa-loader!./code/06_interacting_abort.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';

The Apify SDK lets you start, call, and transform (metamorph) other Actors directly from your Actor code. This is useful for composing complex workflows from smaller, reusable Actors.

2 changes: 1 addition & 1 deletion docs/02_concepts/07_webhooks.mdx
@@ -6,7 +6,7 @@ description: Set up webhooks to trigger actions when Actor run events occur.

import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';
import WebhookExample from '!!raw-loader!roa-loader!./code/07_webhook.py';
import WebhookPreventingExample from '!!raw-loader!roa-loader!./code/07_webhook_preventing.py';

2 changes: 1 addition & 1 deletion docs/02_concepts/08_access_apify_api.mdx
@@ -6,7 +6,7 @@ description: Use the built-in Apify API client to access platform features not c

import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';
import ActorClientExample from '!!raw-loader!roa-loader!./code/08_actor_client.py';
import ActorNewClientExample from '!!raw-loader!roa-loader!./code/08_actor_new_client.py';

2 changes: 1 addition & 1 deletion docs/02_concepts/09_logging.mdx
@@ -6,7 +6,7 @@ description: Configure log levels, formatting, and log redirection between Actor

import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';
import LogConfigExample from '!!raw-loader!roa-loader!./code/09_log_config.py';
import LoggerUsageExample from '!!raw-loader!roa-loader!./code/09_logger_usage.py';
import RedirectLog from '!!raw-loader!roa-loader!./code/09_redirect_log.py';
2 changes: 1 addition & 1 deletion docs/02_concepts/10_configuration.mdx
@@ -9,7 +9,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import ConfigExample from '!!raw-loader!roa-loader!./code/10_config.py';
import GetEnvExample from '!!raw-loader!roa-loader!./code/10_get_env.py';
import PlatformDetectionExample from '!!raw-loader!roa-loader!./code/10_platform_detection.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';

The <ApiLink to="class/Actor">`Actor`</ApiLink> class is configured through the <ApiLink to="class/Configuration">`Configuration`</ApiLink> class, which reads its settings from environment variables. When running on the Apify platform or through the Apify CLI, configuration is automatic — manual setup is only needed for custom requirements.

2 changes: 1 addition & 1 deletion docs/02_concepts/11_pay_per_event.mdx
@@ -8,7 +8,7 @@ import ActorChargeSource from '!!raw-loader!roa-loader!./code/11_actor_charge.py
import ConditionalActorChargeSource from '!!raw-loader!roa-loader!./code/11_conditional_actor_charge.py';
import ChargeLimitCheckSource from '!!raw-loader!roa-loader!./code/11_charge_limit_check.py';
import AdvancedChargingExample from '!!raw-loader!roa-loader!./code/11_advanced_charging.py';
import ApiLink from '@site/src/components/ApiLink';
import ApiLink from '@theme/ApiLink';
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

Apify provides several [pricing models](https://docs.apify.com/platform/actors/publishing/monetize) for monetizing your Actors. The most recent and most flexible one is [pay-per-event](https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event), which lets you charge your users programmatically directly from your Actor. As the name suggests, you may charge the users each time a specific event occurs, for example a call to an external API or when you return a result.
9 changes: 5 additions & 4 deletions docs/03_guides/06_scrapy.mdx
@@ -7,6 +7,7 @@ description: Convert Scrapy spiders into Apify Actors with platform storage and
import CodeBlock from '@theme/CodeBlock';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import ApiLink from '@theme/ApiLink';

import UnderscoreMainExample from '!!raw-loader!./code/scrapy_project/src/__main__.py';
import MainExample from '!!raw-loader!./code/scrapy_project/src/main.py';
@@ -42,10 +43,10 @@ Within the Actor's main coroutine, the Actor's input is processed as usual. The

The Apify SDK provides several custom components to support integration with the Apify platform:

- [`apify.scrapy.ApifyScheduler`](https://docs.apify.com/sdk/python/reference/class/ApifyScheduler) - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests.
- [`apify.scrapy.ActorDatasetPushPipeline`](https://docs.apify.com/sdk/python/reference/class/ActorDatasetPushPipeline) - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset.
- [`apify.scrapy.ApifyHttpProxyMiddleware`](https://docs.apify.com/sdk/python/reference/class/ApifyHttpProxyMiddleware) - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service.
- [`apify.scrapy.extensions.ApifyCacheStorage`](https://docs.apify.com/sdk/python/reference/class/ApifyCacheStorage) - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work.
- <ApiLink to="class/ApifyScheduler">`apify.scrapy.ApifyScheduler`</ApiLink> - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests.
- <ApiLink to="class/ActorDatasetPushPipeline">`apify.scrapy.ActorDatasetPushPipeline`</ApiLink> - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset.
- <ApiLink to="class/ApifyHttpProxyMiddleware">`apify.scrapy.ApifyHttpProxyMiddleware`</ApiLink> - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service.
- <ApiLink to="class/ApifyCacheStorage">`apify.scrapy.extensions.ApifyCacheStorage`</ApiLink> - A storage backend for Scrapy's built-in [HTTP cache middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache). This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Make sure to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, or caching won't work.
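
For the cache storage item above, the relevant Scrapy settings could be sketched as follows. `HTTPCACHE_ENABLED`, `HTTPCACHE_EXPIRATION_SECS`, and `HTTPCACHE_STORAGE` are standard Scrapy settings; the expiration value is an arbitrary example, and whether the SDK's own settings helper already sets the storage path for you is not confirmed here, so treat the last line as an assumption.

```python
# settings.py (sketch): enable Scrapy's HTTP cache with the Apify storage backend.
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 7200  # example value (2 hours); an assumption, tune for your crawl
# Dotted path taken from the component name listed above; assumed to be set manually.
HTTPCACHE_STORAGE = "apify.scrapy.extensions.ApifyCacheStorage"
```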

Additional helper functions in the [`apify.scrapy`](https://github.com/apify/apify-sdk-python/tree/master/src/apify/scrapy) subpackage include:

10 changes: 0 additions & 10 deletions website/src/components/ApiLink.jsx

This file was deleted.
