From bc260e02b44a5c0acafdc0f0b2c9070d91ffcce1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Olender?= <92638966+TC-MO@users.noreply.github.com> Date: Mon, 7 Oct 2024 03:11:53 +0200 Subject: [PATCH 1/5] docs: rewrite state persistence Restructure content with additional headings & subheadings Simplified language Adjusted heading levels --- .../builds_and_runs/state_persistence.md | 55 +++++++++++-------- 1 file changed, 33 insertions(+), 22 deletions(-) diff --git a/sources/platform/actors/development/builds_and_runs/state_persistence.md b/sources/platform/actors/development/builds_and_runs/state_persistence.md index 54baab5986..88709b7c99 100644 --- a/sources/platform/actors/development/builds_and_runs/state_persistence.md +++ b/sources/platform/actors/development/builds_and_runs/state_persistence.md @@ -1,51 +1,62 @@ --- title: State persistence -description: Maintain a long-running Actor's state to prevent unexpected restarts. See a code example on how to prevent a run in the case of a server shutdown. +description: Learn how to maintain an Actor's state to prevent data loss during unexpected restarts. Includes code examples for handling server migrations. slug: /actors/development/builds-and-runs/state-persistence --- # [](#state-persistence)State persistence -**Maintain a long-running Actor's state to prevent unexpected restarts. See a code example on how to prevent a run in the case of a server shutdown.** +**Learn how to maintain an Actor's state to prevent data loss during unexpected restarts. Includes code examples for handling server migrations.** import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; --- -Long-running [Actor](../../index.mdx) jobs may need to migrate from one server to another. Unless you save your job's progress, it will be lost during the migration. The Actor will restart from scratch on the new server, which can be costly. +Long-running [Actor](../../index.mdx) jobs may need to migrate between servers. 
Without state persistence, your job's progress, is lost during migration, causing it to restart from the beginning on the new server. This can be costly and time-consuming. -To avoid this, long-running Actors should save (persist) their state periodically and listen for [migration events](/sdk/js/api/apify/class/PlatformEventManager). When started, these Actors should [check for persisted state](#code-examples), so they can continue where they left off. +To prevent data loss, long-running Actors should: -For short-running Actors, the chance of a restart and the cost of repeated runs are low, so restarts can be ignored. +- Periodically save (persist) their state. +- Listen for [migration events](/sdk/js/api/apify/class/PlatformEventManager). +- Check for persisted state when starting, allowing them to resume from where they left off. -## [](#what-is-a-migration)What is a migration? +For short-running Actors, the risk of restarts and the cost of repeated runs are low, so you can typically ignore state persistence. -A migration is when a process running on a server has to stop and move to another. All in-progress processes on the current server are stopped. Unless you have saved your state, the Actor run will restart on the new server. For example, if a request in your [request queue](../../../storage/request_queue.md) has not been updated as **crawled** before the migration, it will be crawled again. +## Undersanding migrations -**When a migration event occurs, you only have a few seconds to save your work.** +A migration occurs when a process running on one srever must stop and move to another. During this process: -## [](#why-do-migrations-happen)Why do migrations happen? +- All in-progress processes on the current server are stopped +- Unless you've saved your state, the Actor run will restart on the new server +- You only have a few seconds to save your work when a migration event occurs -- To optimize server workloads. -- When a server crashes (unlikely). 
-- When we release new features and fix bugs. +### Causes of migration -## [](#how-often-do-migrations-occur)How often do migrations occur? +Migrations can happen for several reasons: -Migrations have no specific interval at which they happen. They are caused by the [above events](#why-do-migrations-happen), so they can happen at any time. +- Server workload optimization +- Server crashes (rare) +- New feature releases and bug fixes -## [](#why-is-state-lost-during-migration)Why is state lost during migration? +### Frequency of migrations -Unless instructed to save its output or state to a [storage](../../../storage/index.md), an Actor keeps them in the server's memory. When it switches servers, the run loses access to the previous server's memory. Even if data were saved on the server's disk, we would also lose access to that. +Migrations don't follow a specific schedule. They can occur at any time due to the events mentioned above. -## [](#how-to-persist-state)How to persist state +## Why state is lost during migration -The [Apify SDKs](/sdk) persist their state automatically. In JavaScript, this is done using the `migrating` and `persistState` events in the [PlatformEventManager](/sdk/js/api/apify/class/PlatformEventManager). The `persistState` event notifies SDK components to persist their state at regular intervals in case a migration happens. The `migrating` event is emitted just before a migration. +By default, an Actor keeps its output and state in the server's memory. During a server switch, the run loses access to the previous server's memory. Even if data were saved on the server's disk, access to that would also be lost. -### [](#code-examples)Code examples +## Implementing state persistence -To persist state manually, you can use the `Actor.on` method in the Apify SDK. +The [Apify SDKs](/sdk) handle state persistence automatically. 
In JavaScript, this is done using the `migrating` and `persistState` events in the [PlatformEventManager](/sdk/js/api/apify/class/PlatformEventManager). + +- The `persistState` event prompts SDK components to save their state at regular intervals +- The `migrating` event is triggered just before a migration occurs. + +### Code examples + +To manually persis state, use the `Actor.on` method in the Apify SDK: @@ -83,7 +94,7 @@ async def main(): -To check for state saved in a previous run, use: +To check for state saved in a previous run: @@ -114,4 +125,4 @@ async def main(): -To improve your Actor's performance, you can also [cache repeated page data](/academy/expert-scraping-with-apify/saving-useful-stats). +TFor improved Actor performance consider [caching repeated page data](/academy/expert-scraping-with-apify/saving-useful-stats). From ade3068a7faaf0c6a79e2269f9872c3e849ed8ca Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Olender?= <92638966+TC-MO@users.noreply.github.com> Date: Tue, 8 Oct 2024 13:48:41 +0200 Subject: [PATCH 2/5] further rewrites --- .../actors/development/builds_and_runs/state_persistence.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sources/platform/actors/development/builds_and_runs/state_persistence.md b/sources/platform/actors/development/builds_and_runs/state_persistence.md index 88709b7c99..e5b70f242a 100644 --- a/sources/platform/actors/development/builds_and_runs/state_persistence.md +++ b/sources/platform/actors/development/builds_and_runs/state_persistence.md @@ -4,7 +4,7 @@ description: Learn how to maintain an Actor's state to prevent data loss during slug: /actors/development/builds-and-runs/state-persistence --- -# [](#state-persistence)State persistence +# State persistence **Learn how to maintain an Actor's state to prevent data loss during unexpected restarts. 
Includes code examples for handling server migrations.** @@ -125,4 +125,4 @@ async def main(): -TFor improved Actor performance consider [caching repeated page data](/academy/expert-scraping-with-apify/saving-useful-stats). +For improved Actor performance consider [caching repeated page data](/academy/expert-scraping-with-apify/saving-useful-stats). From dc6841d9a9baf27fca552ffe5b86b3c9f53cb409 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Olender?= <92638966+TC-MO@users.noreply.github.com> Date: Thu, 10 Oct 2024 11:03:01 +0200 Subject: [PATCH 3/5] add explanation about state persistence in Python --- .../development/builds_and_runs/state_persistence.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/sources/platform/actors/development/builds_and_runs/state_persistence.md b/sources/platform/actors/development/builds_and_runs/state_persistence.md index e5b70f242a..0aada1ed52 100644 --- a/sources/platform/actors/development/builds_and_runs/state_persistence.md +++ b/sources/platform/actors/development/builds_and_runs/state_persistence.md @@ -49,11 +49,19 @@ By default, an Actor keeps its output and state in the server's memory. During a ## Implementing state persistence -The [Apify SDKs](/sdk) handle state persistence automatically. In JavaScript, this is done using the `migrating` and `persistState` events in the [PlatformEventManager](/sdk/js/api/apify/class/PlatformEventManager). +The [Apify SDKs](/sdk) handle state persistence automatically. + +In JavaScript, this is done using the `migrating` and `persistState` events in the [PlatformEventManager](/sdk/js/api/apify/class/PlatformEventManager). - The `persistState` event prompts SDK components to save their state at regular intervals - The `migrating` event is triggered just before a migration occurs. +In Python, state persistence is handled using the `Actor.on()` method and the migrating event, similar to JavaScript. 
The Apify SDK for Python provides mechanisms to save and retrieve state data. + +- The `migrating` event is triggered just before a migration occurs, allowing you to save your state. +- To retrieve previously saved state, you can use the `Actor.get_value()` method. + + ### Code examples To manually persis state, use the `Actor.on` method in the Apify SDK: From 248313740875f2fb95cb3e25cba78d76d7203ca0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Olender?= <92638966+TC-MO@users.noreply.github.com> Date: Thu, 10 Oct 2024 13:08:40 +0200 Subject: [PATCH 4/5] remove mentions of PlatformEventManager & expand info about migrating event & Actor methods --- .../development/builds_and_runs/state_persistence.md | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/sources/platform/actors/development/builds_and_runs/state_persistence.md b/sources/platform/actors/development/builds_and_runs/state_persistence.md index 0aada1ed52..921a2f0556 100644 --- a/sources/platform/actors/development/builds_and_runs/state_persistence.md +++ b/sources/platform/actors/development/builds_and_runs/state_persistence.md @@ -51,16 +51,10 @@ By default, an Actor keeps its output and state in the server's memory. During a The [Apify SDKs](/sdk) handle state persistence automatically. -In JavaScript, this is done using the `migrating` and `persistState` events in the [PlatformEventManager](/sdk/js/api/apify/class/PlatformEventManager). - -- The `persistState` event prompts SDK components to save their state at regular intervals -- The `migrating` event is triggered just before a migration occurs. - -In Python, state persistence is handled using the `Actor.on()` method and the migrating event, similar to JavaScript. The Apify SDK for Python provides mechanisms to save and retrieve state data. +This is done using the `Actor.on()` method and the `migrating` event. - The `migrating` event is triggered just before a migration occurs, allowing you to save your state. 
-- To retrieve previously saved state, you can use the `Actor.get_value()` method. - +- To retrieve previously saved state, you can use the [`Actor.getValue`](/sdk/js/reference/class/Actor#getValue)/[`Actor.get_value()`](/sdk/python/reference/class/Actor#get_value) methods. ### Code examples From 280fdf0339dba06311381b2f756c29b4188cc817 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Olender?= <92638966+TC-MO@users.noreply.github.com> Date: Thu, 10 Oct 2024 13:31:13 +0200 Subject: [PATCH 5/5] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: František Nesveda --- .../builds_and_runs/state_persistence.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/sources/platform/actors/development/builds_and_runs/state_persistence.md b/sources/platform/actors/development/builds_and_runs/state_persistence.md index 921a2f0556..c9889483f5 100644 --- a/sources/platform/actors/development/builds_and_runs/state_persistence.md +++ b/sources/platform/actors/development/builds_and_runs/state_persistence.md @@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem'; --- -Long-running [Actor](../../index.mdx) jobs may need to migrate between servers. Without state persistence, your job's progress, is lost during migration, causing it to restart from the beginning on the new server. This can be costly and time-consuming. +Long-running [Actor](../../index.mdx) jobs may need to migrate between servers. Without state persistence, your job's progress is lost during migration, causing it to restart from the beginning on the new server. This can be costly and time-consuming. To prevent data loss, long-running Actors should: @@ -23,12 +23,12 @@ To prevent data loss, long-running Actors should: For short-running Actors, the risk of restarts and the cost of repeated runs are low, so you can typically ignore state persistence. 
-## Undersanding migrations +## Understanding migrations -A migration occurs when a process running on one srever must stop and move to another. During this process: +A migration occurs when a process running on one server must stop and move to another. During this process: - All in-progress processes on the current server are stopped -- Unless you've saved your state, the Actor run will restart on the new server +- Unless you've saved your state, the Actor run will restart on the new server with an empty internal state - You only have a few seconds to save your work when a migration event occurs ### Causes of migration @@ -45,7 +45,7 @@ Migrations don't follow a specific schedule. They can occur at any time due to t ## Why state is lost during migration -By default, an Actor keeps its output and state in the server's memory. During a server switch, the run loses access to the previous server's memory. Even if data were saved on the server's disk, access to that would also be lost. +By default, an Actor keeps its state in the server's memory. During a server switch, the run loses access to the previous server's memory. Even if data were saved on the server's disk, access to that would also be lost. Note that the Actor run's default dataset, key-value store, and request queue are preserved across migrations; by "state", we mean the contents of runtime variables in the Actor's code. ## Implementing state persistence @@ -54,11 +54,11 @@ The [Apify SDKs](/sdk) handle state persistence automatically. This is done using the `Actor.on()` method and the `migrating` event. - The `migrating` event is triggered just before a migration occurs, allowing you to save your state. -- To retrieve previously saved state, you can use the [`Actor.getValue`](/sdk/js/reference/class/Actor#getValue)/[`Actor.get_value()`](/sdk/python/reference/class/Actor#get_value) methods. 
+- To retrieve previously saved state, you can use the [`Actor.getValue`](/sdk/js/reference/class/Actor#getValue)/[`Actor.get_value`](/sdk/python/reference/class/Actor#get_value) methods. ### Code examples -To manually persis state, use the `Actor.on` method in the Apify SDK: +To manually persist state, use the `Actor.on` method in the Apify SDK:
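
As a rough illustration of the pattern this page describes (restore on start, save periodically, save once more before a migration), here is a minimal Python sketch. It assumes a recent Apify SDK for Python in which `Actor.on`, `Event.PERSIST_STATE`, `Event.MIGRATING`, `Actor.get_value`, and `Actor.set_value` are available; the key name `my-crawling-state`, the `restore_state` helper, and the `processed_urls` field are hypothetical names chosen for this example and are not part of the SDK.

```python
from typing import Any, Optional

# Hypothetical key under which state is kept in the default key-value store.
STATE_KEY = "my-crawling-state"


def restore_state(saved: Optional[dict]) -> dict:
    """Return a copy of previously saved state, or a fresh state on first start."""
    return dict(saved) if saved else {"processed_urls": []}


async def main() -> None:
    # Imported lazily so the helper above can be used without the `apify` package.
    from apify import Actor, Event

    async with Actor:
        # On start (including after a migration), resume from any saved state.
        state = restore_state(await Actor.get_value(STATE_KEY))

        async def save_state(_event_data: Any) -> None:
            # Write the in-memory state to the key-value store.
            await Actor.set_value(STATE_KEY, state)

        # Save at regular intervals and once more just before a migration.
        Actor.on(Event.PERSIST_STATE, save_state)
        Actor.on(Event.MIGRATING, save_state)

        # Placeholder for the Actor's real work (hypothetical).
        state["processed_urls"].append("https://example.com")
```

To try it, run `asyncio.run(main())` in an environment where the `apify` package is installed. Keeping the restore logic in a small pure function makes it easy to check separately from the SDK wiring.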