Skip to content

Commit

Permalink
Merge pull request #4 from Shopify/mac-adam-chaieb-patch-1
Browse files Browse the repository at this point in the history
Update README.md
  • Loading branch information
kirs committed Jul 27, 2018
2 parents 4cd4741 + 306c7c1 commit a76168b
Show file tree
Hide file tree
Showing 2 changed files with 15 additions and 15 deletions.
12 changes: 6 additions & 6 deletions README.md
Expand Up @@ -18,15 +18,15 @@ class SimpleJob < ActiveJob::Base
end
```

The job would run fairly quickly when you only have a hundred User records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate and the job will end up taking hours and days.
The job would run fairly quickly when you only have a hundred `User` records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate and the job will end up taking hours or even days.

With frequent deploys and worker restarts, it would mean that a job will be either lost of started from the beginning. Some records (especially those in the beginning of the relation) will be processed more than once.

Cloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours and days. What if AWS diagnosed the instance as unhealthy and will restart it in 5 minutes? What if Kubernetes pod is getting [evicted](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/)? Again, all job progress will be lost. At Shopify, we also use it to interrupt workloads safely when moving tenants between shards and move shards between regions.
Cloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours and days. What if AWS diagnosed the instance as unhealthy and will restart it in 5 minutes? What if a Kubernetes pod is getting [evicted](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/)? Again, all job progress will be lost. At Shopify, we also use it to interrupt workloads safely when moving tenants between shards and move shards between regions.

Software that is designed for high availability [must be friendly](https://12factor.net/disposability) to interruptions that come from the infrastructure. That's exactly what Iteration brings to ActiveJob. It's been developed at Shopify to safely process long-running jobs, in Cloud, and has been working in production since May 2017.
Software that is designed for high availability [must be resilient](https://12factor.net/disposability) to interruptions that come from the infrastructure. That's exactly what Iteration brings to ActiveJob. It's been developed at Shopify to safely process long-running jobs, in Cloud, and has been working in production since May 2017.

We recommend you to watch a [conference talk](https://www.youtube.com/watch?v=XvnWjsmAl60) about the ideas and history behind Iteration API.
We recommend that you watch one of our [conference talks](https://www.youtube.com/watch?v=XvnWjsmAl60) about the ideas and history behind Iteration API.

## Getting started

Expand Down Expand Up @@ -134,11 +134,11 @@ There a few configuration assumptions that are required for Iteration to work wi

**What happens with retries?** An interruption of a job does not count as a retry. The iteration of job that caused the job to fail will be retried and progress will continue from there on.

**What happens if my iteration takes a long time?** We recommend that a single `each_iteration` should take no longer than 30 seconds. In the future, this may raise.
**What happens if my iteration takes a long time?** We recommend that a single `each_iteration` should take no longer than 30 seconds. In the future, this may raise an exception.

**Why is it important that `each_iteration` takes less than 30 seconds?** When the job worker is scheduled for restart or shutdown, it gets a notice to finish remaining unit of work. To guarantee that no progress is lost we need to make sure that `each_iteration` completes within a reasonable amount of time.

**What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job. We may expose primitives in the future to do this more effectively, but this is not terribly common today. We recommend to read https://goo.gl/UobaaU to learn more about nested operations.
**What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job, which will run your nested business logic. We may expose primitives in the future to do this more effectively, but this is not terribly common today. We recommend to read https://goo.gl/UobaaU to learn more about nested operations.

**Why do I use have to use this ugly helper in `build_enumerator`? Why can't you automatically infer it?** This is how the first version of the API worked. We checked the type of object returned by `build_enumerable`, and whether it was ActiveRecord Relation or an Array, we used the matching adapter. This caused opaque type branching in Iteration internals and it didn’t allow developers to craft their own Enumerators and control the cursor value. We made a decision to _always_ return Enumerator instance from `build_enumerator`. Now we provide explicit helpers to convert ActiveRecord Relation or an Array to Enumerator, and for more complex iteration flows developers can build their own `Enumerator` objects.

Expand Down
18 changes: 9 additions & 9 deletions guides/iteration-how-it-works.md
@@ -1,8 +1,8 @@
# Iteration: how it works

The main idea behind Iteration is to provide an API to describe jobs in interruptible manner, on the contrast with one massive `def perform` that is impossible to interrupt safely.
The main idea behind Iteration is to provide an API to describe jobs in an interruptible manner, in contrast with implementing one massive `#perform` method that is impossible to interrupt safely.

Exposing the enumerator and the action to apply allows us to keep the cursor and interrupt between iterations. Let's see how it looks like on example of an ActiveRecord relation (and Enumerator).
Exposing the enumerator and the action to apply allows us to keep a cursor and interrupt between iterations. Let's see what this looks like with an ActiveRecord relation (and Enumerator).

1. `build_enumerator` is called, which constructs `ActiveRecordEnumerator` from an ActiveRecord relation (`Product.all`)
2. The first batch of records is loaded:
Expand All @@ -11,9 +11,9 @@ Exposing the enumerator and the action to apply allows us to keep the cursor and
SELECT `products`.* FROM `products` ORDER BY products.id LIMIT 100
```

3. Job iterates over two records of the relation and then receives `SIGTERM` (graceful termination signal) caused by a deploy
4. Signal handler sets a flag that makes `job_should_exit?` to return `true`
5. After the last iteration is completed, we will check `job_should_exit?` which now returns `true`
3. The job iterates over two records of the relation and then receives `SIGTERM` (graceful termination signal) caused by a deploy.
4. The signal handler sets a flag that makes `job_should_exit?` return `true`.
5. After the last iteration is completed, we will check `job_should_exit?` which now returns `true`.
6. The job stops iterating and pushes itself back to the queue, with the latest `cursor_position` value.
7. Next time when the job is taken from the queue, we'll load records starting from the last primary key that was processed:

Expand All @@ -23,18 +23,18 @@ SELECT `products`.* FROM `products` WHERE (products.id > 2) ORDER BY products.i

## Signals

It's critical to know UNIX signals in order to understand how interruption works. There are two main signals that Sidekiq and Resque use: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push job back to the queue with the last cursor position.
It's critical to know [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) in order to understand how interruption works. There are two main signals that Sidekiq and Resque use: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push job back to the queue with the last cursor position.
`SIGTERM` is what allows Iteration to work. In contrast, `SIGKILL` means immediate exit. It doesn't let the worker terminate gracefully, instead it will drop the job and exit as soon as possible.

Most of deploy strategies (Kubernetes, Heroku, Capistrano) send `SIGTERM` before the shut down, then wait for the a timeout (usually from 30 seconds to a minute) and send `SIGKILL` if the process haven't terminated yet.
Most of the deploy strategies (Kubernetes, Heroku, Capistrano) send `SIGTERM` before shutting down a node, then wait for a timeout (usually from 30 seconds to a minute) to send `SIGKILL` if the process has not terminated yet.

Further reading: [Sidekiq signals](https://github.com/mperham/sidekiq/wiki/Signals).

## Enumerators

In the early versions of Iteration, `build_enumerator` used to return ActiveRecord relations directly, and we would infer the Enumerator based on the type of object. We used to support ActiveRecord relations, arrays and CSVs. This way it was hard to add support for anything else to enumerate, and it was easy for developers to make a mistake and return an array of ActiveRecord objects, and for us starting to threat that as an array instead of ActiveRecord relation.
In the early versions of Iteration, `build_enumerator` used to return ActiveRecord relations directly, and we would infer the Enumerator based on the type of object. We used to support ActiveRecord relations, arrays and CSVs. This made it hard to add support for other types of enumerations, and it was easy for developers to make mistakes and return an array of ActiveRecord objects, and for us starting to treat that as an array instead of as an ActiveRecord relation.

In the current version of Iteration, it supports _any_ Enumerator. We expose helpers to build enumerators conveniently (`enumerator_builder.active_record_on_records`), but it's up for a developer to implement a custom Enumerator. Consider this example:
The current version of Iteration supports _any_ Enumerator. We expose helpers to build enumerators conveniently (`enumerator_builder.active_record_on_records`), but it's up for a developer to implement a custom Enumerator. Consider this example:

```ruby
class MyJob < ActiveJob::Base
Expand Down

0 comments on commit a76168b

Please sign in to comment.