# Enable Status Tracking for Business-Critical Applications Using Amazon DynamoDB

In this example, we introduce a pattern for tracking the status of business-critical tasks using Amazon DynamoDB as the backend. The pattern lets you track the status of each task and provides error handling, retries, and concurrency control out of the box.

When managing a large number of business-critical tasks, it is crucial to monitor and identify which tasks have succeeded, which have failed, and which are still in progress. If the business logic is a pipeline consisting of a sequence of tasks, it is important to keep track of its current status and to be able to recover from any failed task. We have also seen some advanced requirements from AWS customers, including:

- Each task should be consumed exactly once.
- Each task should be handled by only one worker at a time, which requires a concurrency lock to avoid double consumption.
- For succeeded tasks, store additional information such as the output, statistics, and metadata of the task, and log the success time.
- For failed tasks, log the detailed error message for debugging.
- Get all failed tasks with one simple query and rerun them with updated business logic.
- A task might be impossible to complete. To avoid an endless retry loop, ignore tasks that fail too many times.
- Run custom queries based on task status for analytics purposes.

With the [pynamodb_mate](https://github.com/MacHu-GWU/pynamodb_mate-project) Python library, you can enable these features without refactoring your existing application code: you wrap your business logic in a context manager and get all of the features above.

## A Real-World Example: Tracking Document Processing Tasks

Let's take an example and see how we use ``pynamodb_mate`` for status tracking.

Let's say we have to process thousands of PDF documents. For each document, we need to extract text and key-value data, and then use an LLM (Large Language Model) to answer some business questions based on the extracted data. In this example, the task has two steps: "extract data" and "answer business question". We want to ensure that all documents are processed properly. We may distribute thousands of PDF documents to a swarm of workers and process them in parallel, but each document should be worked on by only one worker at a time. Since the job is complicated and a worker may fail, we want a retry mechanism to ensure that all jobs are eventually processed. However, an input document may be malformed, making it impossible to finish. We need to avoid deadlocks and infinite retries.

Given this example use case, I would like to define the following concepts:

- **Task and Task Id**: Each task that we want to track the status of. Each task has a unique ``task_id``. In this use case, the task ID is the document ID.
- **Status Code**: The status of each task. Each task moves through the lifecycle ``pending`` -> ``in_progress`` -> ``failed`` / ``succeeded`` / ``ignored``. ``pending`` means the task is ready to be executed at any time, ``in_progress`` means the task is being executed, ``failed`` means the execution failed, ``succeeded`` means the execution was successful, and ``ignored`` means the task was ignored after too many failed retries. ``pynamodb_mate`` doesn't allow adding other status codes to this lifecycle, because each lifecycle corresponds to one idempotent operation. If your work has intermediate states inside ``in_progress``, for example because it consists of 2 steps, split it into two lifecycles, where the ``succeeded`` status code of the first step equals the ``pending`` status code of the second step. Once intermediate states are introduced within a single lifecycle, idempotency can no longer be guaranteed, which is why the steps must be split. In this use case, the status codes for Step 1 are 10 for pending, 20 for in progress, 30 for failed, 40 for succeeded, and 50 for ignored (failed too many times, ignored to avoid deadlock). For Step 2, 40 is pending (Step 1 succeeded), 70 is in progress, 80 is failed, 90 is succeeded, and 100 is ignored.
- **Use Case and Use Case Id**: In DynamoDB, we often use a single table to do many things that would require multiple tables in a relational database. In this article, we put Tasks of completely different types in the same table. These different Tasks may share the same status codes; for example, several of them might use 0 to represent "pending". Therefore, we need a field to separate these Tasks, and this field is ``use_case_id``. All ``task_id`` values under a ``use_case_id`` are unique. In this example, the only use case is PDF processing, and the code below uses the placeholder use case ID ``"test"``. By design, ``pynamodb_mate`` is able to manage many use cases in a single DynamoDB table.
- **Execution Context (ctx)**: In our program design, we use Python's Context Manager to start a task's lifecycle and automatically update the status in the database based on the task's execution result (success or failure). This Execution Context is a container for all the contextual data in the lifecycle of executing a task, including any user data you need to save during the processing of this task.
- **Execution**: The process of attempting to execute a task, which is the "lifecycle" of the task that has been mentioned many times above.
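
The chaining rule for status codes described above can be illustrated without the library. The sketch below is a plain-Python illustration, not part of ``pynamodb_mate``: the dictionaries ``STEP1`` and ``STEP2`` and the helper ``startable_step`` are hypothetical names, and the rule that a step may start only from its own pending or failed status is taken from the lifecycle described in this article.

```python
from typing import Optional

# Hypothetical status codes mirroring the two-step example in this article.
STEP1 = {"pending": 10, "in_progress": 20, "failed": 30, "succeeded": 40, "ignored": 50}
STEP2 = {"pending": 40, "in_progress": 70, "failed": 80, "succeeded": 90, "ignored": 100}

# Chaining rule: finishing Step 1 is the same state as being ready for Step 2.
assert STEP1["succeeded"] == STEP2["pending"]


def startable_step(status: int) -> Optional[str]:
    """Return the name of the step that may start from ``status``, or None.

    A step may start only from its own pending or failed status.
    """
    for name, codes in (("step1", STEP1), ("step2", STEP2)):
        if status in (codes["pending"], codes["failed"]):
            return name
    return None


print(startable_step(10))   # a brand-new document
print(startable_step(40))   # Step 1 succeeded, so Step 2 is pending
print(startable_step(100))  # Step 2 ignored, nothing can start
```

Because one integer encodes both "Step 1 is done" and "Step 2 is ready", a single status attribute is enough to locate a document anywhere in the pipeline.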

Now, let's start writing some code to implement this use case.

## Import pynamodb_mate

All public APIs of ``pynamodb_mate`` are under the ``pynamodb_mate.api`` module. We will import this module first.

In [1]:
import pynamodb_mate.api as pm

print(f"{pm.__version__ = }")

# make the status_tracker submodule name space shorter
st = pm.patterns.status_tracker

pm.__version__ = '6.0.0.1'


## Define Status Code Enum

First, we define some status codes using Python's standard ``enum`` library. This improves code readability and avoids hard-coding meaningless integers throughout the code base.

The ``pm.patterns.status_tracker.BaseStatusEnum`` base class is a subclass of the standard ``enum.Enum``. It gives your status a human-readable name and a machine-readable integer. Usually, we only transition from a smaller integer to a bigger integer. However, this is not enforced by the library. You can define your own status code as long as you follow the idempotent operation principle.

In [2]:
# usually the closer to success, the bigger the integer is    
class Step1StatusEnum(st.BaseStatusEnum):
    pending = 10
    in_progress = 20
    failed = 30
    succeeded = 40
    ignored = 50


class Step2StatusEnum(st.BaseStatusEnum):
    pending = 40
    in_progress = 70
    failed = 80
    succeeded = 90
    ignored = 100

Some useful methods are provided by the ``BaseStatusEnum`` class:

In [3]:
print(f"{Step1StatusEnum.value_to_name(10) = }")
print(f"{Step1StatusEnum.pending.status_name = }")
print(f"{Step1StatusEnum.pending.values() = }")

Step1StatusEnum.value_to_name(10) = 'pending'
Step1StatusEnum.pending.status_name = 'pending'
Step1StatusEnum.pending.values() = [10, 20, 30, 40, 50]
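
For comparison, similar lookups can be done with the standard library alone. This is a sketch under the assumption that, for these lookups, ``BaseStatusEnum`` behaves like an ordinary ``enum.Enum``; ``PlainStatus`` is a hypothetical stand-in and not a ``pynamodb_mate`` class.

```python
import enum


class PlainStatus(enum.Enum):  # hypothetical stand-in for Step1StatusEnum
    pending = 10
    in_progress = 20
    failed = 30
    succeeded = 40
    ignored = 50


# value -> name, comparable to value_to_name(10)
name = PlainStatus(10).name
# all integer values in definition order, comparable to values()
all_values = [member.value for member in PlainStatus]
print(name, all_values)
```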


## Declare Your DynamoDB Status Tracking Table

Now, we can declare a DynamoDB ORM model for the ``Task`` concept. Since we have two steps in this use case, we define two model classes: one for Step 1 and one for Step 2. Because they share a lot in common, including the table definition, we create a base class to avoid code duplication.

In [4]:
class Task(st.BaseTask):
    class Meta:
        table_name = "pynamodb-mate-test-status-tracker"
        region = "us-east-1"
        billing_mode = pm.constants.PAY_PER_REQUEST_BILLING_MODE

    status_and_update_time_index = st.StatusAndUpdateTimeIndex()

class Step1(Task):
    status_and_update_time_index = st.StatusAndUpdateTimeIndex()

    config = st.TrackerConfig.make(
        use_case_id="test",
        pending_status=Step1StatusEnum.pending.value,
        in_progress_status=Step1StatusEnum.in_progress.value,
        failed_status=Step1StatusEnum.failed.value,
        succeeded_status=Step1StatusEnum.succeeded.value,
        ignored_status=Step1StatusEnum.ignored.value,
        n_pending_shard=5,
        n_in_progress_shard=5,
        n_failed_shard=5,
        n_succeeded_shard=10,
        n_ignored_shard=5,
        status_zero_pad=3,
        status_shard_zero_pad=3,
        max_retry=3,
        lock_expire_seconds=60,
    )


class Step2(Task):
    status_and_update_time_index = st.StatusAndUpdateTimeIndex()

    config = st.TrackerConfig.make(
        use_case_id="test",
        pending_status=Step2StatusEnum.pending.value,
        in_progress_status=Step2StatusEnum.in_progress.value,
        failed_status=Step2StatusEnum.failed.value,
        succeeded_status=Step2StatusEnum.succeeded.value,
        ignored_status=Step2StatusEnum.ignored.value,
        n_pending_shard=5,
        n_in_progress_shard=5,
        n_failed_shard=5,
        n_succeeded_shard=10,
        n_ignored_shard=5,
        status_zero_pad=3,
        status_shard_zero_pad=3,
        max_retry=3,
        lock_expire_seconds=60,
    )

In [6]:
# Create the table if it doesn't exist
Task.create_table(wait=True)
Task.delete_all() # clean up the table to ensure a fresh start

0

## Initialize a Task

At the very beginning, you should create a new task with pending status and save it to DynamoDB. The ``Task.make_and_save(...)`` constructor method does this. If you only want to create an instance without saving it to DynamoDB, use the ``Task.make(...)`` method instead.

In [7]:
from rich import print as rprint

task_id = "t-1"

# create a new task
step1 = Step1.make_and_save(task_id=task_id, data={"version": 1})

rprint(step1.to_dict())
print(f"{step1.status_name = }")
print(f"{step1.is_locked() = }")

step1.status_name = 'pending'
step1.is_locked() = False


## Start an Execution to finish Step 1

The ``Task.start(...)`` method is the **CORE** of this best practice. It is a context manager that wraps your task execution logic. It does the following:

1. It sets the status to the ``in_progress_status`` and sets the lock. If the task is already locked, it raises a ``TaskLockedError``.
2. If the task succeeds, it sets the status to the ``succeeded_status``.
3. If the task fails, it sets the status to the ``failed_status``, logs the error to the ``.errors`` attribute, and increments the retry count.
4. If the task fails N times in a row, it sets the status to the ``ignored_status``.

The code below is a "happy path". Use ``with Step1.start(task_id=..., detailed_error=..., debug=...) as execution_context:`` to start the lifecycle of a task execution; ``detailed_error`` and ``debug`` are optional booleans. The context manager returns an ``ExecutionContext`` object, which is a container for all the contextual data in the lifecycle of executing a task, including the instance of the DynamoDB ORM model (the DynamoDB item). It also lets you use the ``ExecutionContext.set_data()`` method to update any user data you need to save at the end of the task.


In [8]:
exec_ctx: st.ExecutionContext
# by default, it won't print any trace log, you can set debug=True to enable that
with Step1.start(task_id, debug=True) as exec_ctx:
    print(f"{exec_ctx.task.status_name = }") # in progress
    print(f"{exec_ctx.task.is_locked() = }") # is locked
    # --- put your business logic here, start of business logic
    print("do some work ...")
    # you can use exec_ctx.set_data() method to update the data attribute
    # you should not manually update other attributes like
    # create_time, update_time, status, etc.
    exec_ctx.set_data({"version": 1})
    # --- end of business logic

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-1'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
exec_ctx.task.status_name = 'in_progress'
exec_ctx.task.is_locked() = True
do some work ...
✅ 🔐 task succeeded, set status 'success' and unlock the task.
--------------------- ⏹️ end Task(use_case_id='test', task_id='t-1'), status=40)) (aka 'succeeded')---------------------


When it's done, you can get the DynamoDB item to verify the status of the task.

In [9]:
step1 = Step1.get_one_or_none(task_id=task_id)
print(f"{step1.status_name = }")
print(f"{step1.create_time = }")
print(f"{step1.update_time = }")
print(f"{step1.is_locked() = }") # not locked
print(f"{step1.retry = }")
print(f"{step1.data = }")
print(f"{step1.errors = }") # no error

step1.status_name = 'succeeded'
step1.create_time = datetime.datetime(2024, 5, 23, 6, 7, 26, 444861, tzinfo=datetime.timezone.utc)
step1.update_time = datetime.datetime(2024, 5, 23, 6, 7, 50, 917536, tzinfo=datetime.timezone.utc)
step1.is_locked() = False
step1.retry = 0
step1.data = {'version': 1}
step1.errors = {'history': []}
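
To make the lifecycle handled by ``start()`` more concrete, here is a minimal, library-independent sketch of the same idea. It is not how ``pynamodb_mate`` is implemented: the in-memory ``TABLE``, the ``Context`` class, and this ``start`` function are hypothetical, and locking and retry counting are left out to keep the example short.

```python
import contextlib

PENDING, IN_PROGRESS, FAILED, SUCCEEDED = 10, 20, 30, 40

# Hypothetical in-memory stand-in for the DynamoDB table: task_id -> item.
TABLE = {"t-1": {"status": PENDING, "data": {}}}


class Context:
    """Holds data the business logic wants to persist on success."""

    def __init__(self, item):
        self.item = item
        self.pending_data = None

    def set_data(self, data):
        self.pending_data = data


@contextlib.contextmanager
def start(task_id):
    item = TABLE[task_id]
    if item["status"] not in (PENDING, FAILED):
        raise RuntimeError(f"{task_id} is not ready to start")
    item["status"] = IN_PROGRESS  # mark the task as being executed
    ctx = Context(item)
    try:
        yield ctx  # the caller's business logic runs here
    except Exception:
        item["status"] = FAILED  # record the failure, drop unsaved data
        raise
    else:
        if ctx.pending_data is not None:
            item["data"] = ctx.pending_data
        item["status"] = SUCCEEDED  # persist data and mark success


with start("t-1") as ctx:
    ctx.set_data({"version": 1})
print(TABLE["t-1"])
```

The ``try`` / ``except`` / ``else`` around ``yield`` is what lets a context manager decide the final status from the outcome of the wrapped block while still re-raising the original exception to the caller.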


## Run Step 1 Again

Let's run Step 1 again. Since it has already succeeded, it should not run again. Logically, a task should be executed only if its status is pending or failed. If it is already in progress, another worker is working on it. If it is ignored, it has failed too many times and a human needs to investigate the root cause.

In [10]:
with Step1.start(task_id, debug=False) as exec_ctx:
    pass

TaskIsNotReadyToStartError: Task(use_case_id='test', task_id='t-1') is not ready to start, either it is locked or status is not in 'pending' or 'failed'. You may use ``with Task.start(task_id=..., detailed_error=True) as execution_context:`` to get more details.

In [11]:
with Step1.start(task_id, detailed_error=True, debug=False) as exec_ctx:
    pass

TaskAlreadySucceedError: Task(use_case_id='test', task_id='t-1') is already succeeded.

## Start an Execution to try Step 2

This time we continue with Step 2. Before we start the execution, let's take a look at the current status of the task. This is not necessary; it is just for demonstration purposes.

In [12]:
step2 = Step2.get_one_or_none(task_id=task_id)
rprint(step2.to_dict())
print(f"{step2.status_name = }")
print(f"{step2.is_locked() = }")

step2.status_name = 'pending'
step2.is_locked() = False


Now, let's start the execution. We manually raise an exception to simulate a failure. In this example, the application code sets the data to ``{"version": 2}`` and then raises a ``UserError``. Since the task failed, the update is not saved to DynamoDB.

In [13]:
class UserError(Exception):
    pass

with Step2.start(task_id, debug=True) as exec_ctx:
    exec_ctx.set_data({"version": 2})
    raise UserError("something is wrong!")

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-1'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ 🔐 task failed, set stats 'failed' and unlock the task.
---------------------- ⏹️ end Task(use_case_id='test', task_id='t-1'), status=80)) (aka 'failed')-----------------------


UserError: something is wrong!

In [14]:
step2 = Step2.get_one_or_none(task_id=task_id)
rprint(step2.to_dict())
print(f"{step2.status_name = }")
print(f"{step2.is_locked() = }")

step2.status_name = 'failed'
step2.is_locked() = False


If you really want to save the intermediate data to DynamoDB before the execution is done, you can use the ``exec_ctx.update()`` method.

In [15]:
with Step2.start(task_id, debug=True) as exec_ctx:
    exec_ctx.set_data({"version": 2})
    exec_ctx.update()
    raise UserError("something is wrong!")

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-1'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ 🔐 task failed, set stats 'failed' and unlock the task.
-------------------- ⏹️ end Task(use_case_id='test', task_id='t-1'), status=70)) (aka 'in_progress')--------------------


UserError: something is wrong!

In [16]:
step2 = Step2.get_one_or_none(task_id=task_id)
rprint(step2.to_dict())
print(f"{step2.status_name = }")
print(f"{step2.is_locked() = }")

step2.status_name = 'failed'
step2.is_locked() = False


## Run Step 2 Again

This time we let the task succeed.

In [17]:
with Step2.start(task_id, debug=True) as exec_ctx:
    exec_ctx.set_data({"version": 3})

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-1'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
✅ 🔐 task succeeded, set status 'success' and unlock the task.
--------------------- ⏹️ end Task(use_case_id='test', task_id='t-1'), status=90)) (aka 'succeeded')---------------------


The ``ExecutionContext.task`` attribute is the in-memory representation of the DynamoDB item. It is updated automatically when the context manager exits, so you don't have to fetch the latest data from DynamoDB manually.

In [18]:
step2 = exec_ctx.task
# this is the same as the above
# step2 = Step2.get_one_or_none(task_id=task_id)

rprint(step2.to_dict())
print(f"{step2.status_name = }")
print(f"{step2.is_locked() = }")
print(f"{step2.data = }") # should be v3

step2.status_name = 'succeeded'
step2.is_locked() = False
step2.data = {'version': 3}


## Concurrency Lock Mechanism

To ensure that a task is processed by only one worker at a time, we need a concurrency lock. Essentially, a worker tries to acquire a lock when it is about to start an execution. If it fails to obtain the lock, it immediately raises a ``TaskLockedError``. Acquiring a lock is actually an update operation that sets a unique UUID as an attribute of the DynamoDB item. The worker that successfully acquires the lock must release it (unset the lock value) at the end of the execution. This is handled automatically by the ``start()`` context manager. If the worker fails to release the lock for any reason (e.g., a power outage), the lock automatically expires after a specified time. The expiration time is defined in the ``TrackerConfig`` (``lock_expire_seconds``).


In [19]:
task_id = "t-2"

# create a new task
step1 = Step1.make_and_save(task_id=task_id, data={"version": 0})

print("worker 1 is trying to start the execution")
with Step1.start(task_id, debug=True) as exec_ctx:
    # another worker is trying to start the same task
    print("worker 2 is trying to start the execution")
    try:
        with Step1.start(
            task_id,
            detailed_error=True,
            debug=True,
        ) as exec_ctx_1:
            pass
    # gracefully just leave when it's locked
    except st.TaskLockedError:
        pass
    # the task is processed by worker 1 at this moment
    # let's examine the status of the task
    step1 = Step1.get_one_or_none(task_id=task_id)
    rprint(step1.to_dict())
    print(f"{step1.status_name = }")
    print(f"{step1.is_locked() = }")

worker 1 is trying to start the execution
----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-2'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
worker 2 is trying to start the execution
----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-2'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ task failed to get lock, because it is already locked by another worker.
------------------------------------ ⏹️ end Task(use_case_id='test', task_id='t-2'))------------------------------------


step1.status_name = 'in_progress'
step1.is_locked() = True
✅ 🔐 task succeeded, set status 'success' and unlock the task.
--------------------- ⏹️ end Task(use_case_id='test', task_id='t-2'), status=40)) (aka 'succeeded')---------------------


Now worker 1 should have finished the Step 1 execution and released the lock.

In [20]:
step1 = Step1.get_one_or_none(task_id=task_id)
rprint(step1.to_dict())
print(f"{step1.status_name = }")
print(f"{step1.is_locked() = }")

step1.status_name = 'succeeded'
step1.is_locked() = False
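
The locking rules described in this section (a UUID token, release by the owner, and automatic expiration) can be sketched on a plain dictionary. This is an illustrative model only, not the library's code: ``try_acquire``, ``release``, and the ``lock`` / ``lock_time`` field names are hypothetical. Against a real DynamoDB table, the check and the write would have to happen atomically, for example in a single conditional update, rather than as two separate steps as shown here.

```python
import time
import uuid
from typing import Optional

LOCK_EXPIRE_SECONDS = 60  # mirrors lock_expire_seconds in the config above


def try_acquire(item: dict, now: float) -> Optional[str]:
    """Return a new lock token if the lock is free or expired, else None."""
    locked = item.get("lock") is not None
    expired = locked and now - item["lock_time"] >= LOCK_EXPIRE_SECONDS
    if locked and not expired:
        return None
    token = uuid.uuid4().hex
    item["lock"], item["lock_time"] = token, now
    return token


def release(item: dict, token: str) -> bool:
    """Release the lock only if the caller still owns it."""
    if item.get("lock") != token:
        return False
    item["lock"] = None
    return True


item = {"lock": None, "lock_time": 0.0}
t0 = time.time()
worker1 = try_acquire(item, now=t0)
worker2 = try_acquire(item, now=t0 + 1)   # rejected, worker 1 holds the lock
worker3 = try_acquire(item, now=t0 + 61)  # accepted, the old lock has expired
print(worker1 is not None, worker2 is None, worker3 is not None)
```

Using a unique token rather than a boolean flag means a worker whose lock has expired cannot accidentally release a lock that now belongs to someone else.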


## Ignoring Tasks that Fail Too Many Times

You don't want a task that logically can never succeed to fall into an endless loop. For example, you may accidentally upload a video clip as the document, making it impossible to process it as a PDF. In this example, we defined the maximum number of retry attempts as 3. If a task fails 3 times in a row, it will be ignored. If you attempt to start a task that has been ignored, you will encounter a ``TaskIgnoredError``. During the 3 retry attempts, you will observe how the status changes and the retry count increases. You can also review the error history in the ``errors`` attribute.

In [21]:
task_id = "t-3"

# create a new task
step1 = Step1.make_and_save(task_id=task_id)

print("at the 0th attempt, the task is:")
rprint(step1.to_dict())

at the 0th attempt, the task is:


In [22]:
with Step1.start(task_id=task_id, debug=True):
    raise UserError()

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-3'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ 🔐 task failed, set stats 'failed' and unlock the task.
---------------------- ⏹️ end Task(use_case_id='test', task_id='t-3'), status=30)) (aka 'failed')-----------------------


UserError: 

In [23]:
print("after the 1st attempt, the task is:")
step1 = Step1.get_one_or_none(task_id=task_id)
# rprint(step1.to_dict())
print(f"{step1.status_name = }")
print(f"{step1.retry = }")
print("errors:")
rprint(step1.errors)

after the 1st attempt, the task is:
step1.status_name = 'failed'
step1.retry = 1
errors:


In [24]:
with Step1.start(task_id=task_id, debug=True):
    raise UserError()

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-3'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ 🔐 task failed, set stats 'failed' and unlock the task.
---------------------- ⏹️ end Task(use_case_id='test', task_id='t-3'), status=30)) (aka 'failed')-----------------------


UserError: 

In [25]:
print("after the 2nd attempt, the task is:")
step1 = Step1.get_one_or_none(task_id=task_id)
# rprint(step1.to_dict())
print(f"{step1.status_name = }")
print(f"{step1.retry = }")
print("errors:")
rprint(step1.errors)

after the 2nd attempt, the task is:
step1.status_name = 'failed'
step1.retry = 2
errors:


This is the 3rd attempt. This time, the task status becomes ``ignored``.

In [26]:
with Step1.start(task_id=task_id, debug=True):
    raise UserError()

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-3'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ 🔐 task failed 3 times already, set status 'ignore' and unlock the task.
---------------------- ⏹️ end Task(use_case_id='test', task_id='t-3'), status=50)) (aka 'ignored')----------------------


UserError: 

In [27]:
print("after the 3rd attempt, the task is:")
step1 = Step1.get_one_or_none(task_id=task_id)
# rprint(step1.to_dict())
print(f"{step1.status_name = }")
print(f"{step1.retry = }")
print("errors:")
rprint(step1.errors)

after the 3rd attempt, the task is:
step1.status_name = 'ignored'
step1.retry = 3
errors:


You will see a ``TaskIgnoredError`` if you try to start the task again.

In [28]:
with Step1.start(task_id=task_id, detailed_error=True, debug=True):
    raise UserError()

----------------------------------- ▶️ start Task(use_case_id='test', task_id='t-3'))-----------------------------------
🔓 set status 'in_progress' and lock the task.
❌ task failed to get lock, because it is ignored.
------------------------------------ ⏹️ end Task(use_case_id='test', task_id='t-3'))------------------------------------


TaskIgnoredError: Task(use_case_id='test', task_id='t-3') is ignored.
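
The retry bookkeeping observed in the cells above (retry count 1 and 2 with status ``failed``, then 3 with status ``ignored``) can be reproduced with a few lines of plain Python. This is a sketch of the rule only, with a hypothetical ``record_failure`` helper and a simplified item layout; it does not reflect how the library stores its error history.

```python
MAX_RETRY = 3  # mirrors max_retry in the config above
FAILED, IGNORED = 30, 50


def record_failure(item: dict, error: Exception) -> dict:
    """Increment the retry counter, log the error, and pick the next status."""
    item["retry"] += 1
    item["errors"].append(repr(error))
    item["status"] = IGNORED if item["retry"] >= MAX_RETRY else FAILED
    return item


item = {"status": 10, "retry": 0, "errors": []}
history = []
for attempt in range(3):
    record_failure(item, ValueError(f"attempt {attempt + 1}"))
    history.append((item["retry"], item["status"]))
print(history)
```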

## Query Tasks by Status

To rerun tasks that previously failed, you need to be able to query tasks by status. The ``Task.query_by_status()`` method allows you to get tasks by one or many status codes. By default, it returns tasks in ascending order of ``update_time`` (older data comes first).

First, let's create some test data.

In [29]:
# create some test data
Step1.delete_all()
with Step1.batch_write() as batch:
    for ith, status_enum in enumerate(Step1StatusEnum, start=1):
        step1 = Step1.make(
            task_id=f"t-{ith}",
            _status=status_enum.value,
            data={"status_code": status_enum.value},
        )
        batch.save(step1)
        rprint(step1)

The ``query_by_status()`` method uses a Global Secondary Index (GSI) under the hood. The GSI defined by ``pynamodb_mate`` uses an [INCLUDE projection](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Projection.html) that returns only the ``task_id``, ``status``, and ``update_time`` attributes when querying the index. By default, you need to call the ``task.refresh()`` method to get the values of all attributes from DynamoDB. You can also use ``query_by_status(..., auto_refresh=True)`` to refresh the attributes automatically.

In [30]:
for step1 in Step1.query_by_status(status=Step1StatusEnum.succeeded):
    print("by default, the `data` attributes is the default value (set by ORM model)")
    rprint(step1.data)
    print("after refresh, the `data` attributes is the real value ")
    step1.refresh()
    rprint(step1.data)

by default, the `data` attributes is the default value (set by ORM model)


after refresh, the `data` attributes is the real value 


In [31]:
print("With auto_refresh = True, the `data` attributes is the real value ")
for step1 in Step1.query_by_status(Step1StatusEnum.ignored, auto_refresh=True):
    rprint(step1.data)

With auto_refresh = True, the `data` attributes is the real value 


## GSI Sharding on Status Attribute

To query tasks by status code, we need to use a GSI (Global Secondary Index) that uses the status attribute as the hash key. However, the status attribute doesn't have good cardinality, as it only has 5 different values. If you have a large number of tasks, the GSI can become very unbalanced. We can use the [GSI Sharding technique](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html) to optimize this.

In production, a healthy task execution system usually ends up with most tasks in the succeeded state, a roughly constant number of tasks in progress, and a small number of tasks in the pending, failed, and ignored states. ``pynamodb_mate`` allows you to split succeeded tasks into many shards while keeping the other statuses in fewer shards. Since a query has to read all shards and merge the results, this strategy makes GSI queries for non-succeeded tasks more efficient (fewer shards to merge) while distributing the succeeded tasks evenly across the GSI.

The ``TrackerConfig`` class has the following parameters to control the GSI sharding:

```python
class Step1(Task):
    status_and_update_time_index = st.StatusAndUpdateTimeIndex()

    config = st.TrackerConfig.make(
        use_case_id="test",
        pending_status=Step1StatusEnum.pending.value,
        in_progress_status=Step1StatusEnum.in_progress.value,
        failed_status=Step1StatusEnum.failed.value,
        succeeded_status=Step1StatusEnum.succeeded.value,
        ignored_status=Step1StatusEnum.ignored.value,
        n_pending_shard=5, # <--- number of shards
        n_in_progress_shard=5, # <--- number of shards
        n_failed_shard=5, # <--- number of shards
        n_succeeded_shard=10, # <--- number of shards
        n_ignored_shard=5, # <--- number of shards
        status_zero_pad=3,
        status_shard_zero_pad=3, # <--- number of zero pad for shard id
        max_retry=3,
        lock_expire_seconds=60,
    )
```

You don't have to explicitly specify the shard ID when using the ``Task.query_by_status(status=...)`` API. It will automatically query all shards and merge the results for you. However, if you really want to query a specific shard, you can use the GSI index object directly. The ``Task.make_value(...)`` method can help you generate the GSI hash key value.

In [32]:
for step1 in Step1.status_and_update_time_index.query(
    hash_key=Step1.make_value(
        status=Step1StatusEnum.succeeded.value,
        _shard_id=4,
    ),
):
    step1.refresh()
    rprint(f"{step1.to_dict()}")
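
The sharding idea itself can be sketched independently of DynamoDB. Everything below is hypothetical and for illustration only: the ``shard_key`` format (``"040_004"``), the 1-based shard numbering, the CRC32-based ``pick_shard``, and the in-memory ``index`` are assumptions, not the key layout or algorithm that ``pynamodb_mate`` actually uses. The sketch shows a status being spread over several zero-padded hash keys and a query that reads every shard and merges the results by ``update_time``.

```python
import heapq
import zlib


def shard_key(status: int, shard_id: int, status_pad: int = 3, shard_pad: int = 3) -> str:
    """Build a zero-padded hash key such as '040_004' (hypothetical format)."""
    return f"{str(status).zfill(status_pad)}_{str(shard_id).zfill(shard_pad)}"


def pick_shard(task_id: str, n_shard: int) -> int:
    """Deterministically spread task ids over shards 1..n_shard."""
    return zlib.crc32(task_id.encode("utf-8")) % n_shard + 1


def query_all_shards(index: dict, status: int, n_shard: int) -> list:
    """Read every shard (each sorted by update_time) and merge into one list."""
    shards = [index.get(shard_key(status, i), []) for i in range(1, n_shard + 1)]
    return list(heapq.merge(*shards, key=lambda row: row["update_time"]))


# index: hash key -> rows kept in update_time order, as a GSI range key would do
index = {}
for update_time, task_id in enumerate(["t-1", "t-2", "t-3", "t-4", "t-5", "t-6"]):
    key = shard_key(40, pick_shard(task_id, n_shard=10))
    index.setdefault(key, []).append({"task_id": task_id, "update_time": update_time})

merged = query_all_shards(index, status=40, n_shard=10)
print([row["task_id"] for row in merged])
```

Zero padding keeps the keys a fixed width, and the merge cost grows with the number of shards, which is why statuses that hold few items are given fewer shards.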

## Conclusion

In general, this solution can improve the visibility, resilience, and reliability of a business-critical application. Because DynamoDB is a fully managed service, there is no infrastructure to set up in advance, and the solution scales naturally to very high or unpredictable workloads.

The usage of this solution is not limited to the above examples. If you see potential to use this solution in your business problems, please don’t hesitate to let us know by creating an [issue](https://github.com/MacHu-GWU/pynamodb_mate-project/issues).