# How to Design a DynamoDB Data Model for Production


![Amazon-DynamoDB](https://user-images.githubusercontent.com/6800411/212983034-b0d8f048-228e-4be6-b591-1e39a70d64ec.png)

**Table of Contents**

- [Overview](#overview)
- [Case Study - Design YouTube Data Model](#case-study---design-youtube-data-model)
    - [Analyze Business Requirement](#analyze-business-requirement)

## Overview

DynamoDB is a serverless, fully managed, key-value NoSQL database that is suitable for a wide range of business-critical use cases. It is schemaless: apart from the key attributes (the hash key and the optional range key), you do not have to declare a schema, so items of different shapes can live in a single table. This is a great advantage for developers who want to build applications quickly. However, it also means that you need to design your data model carefully to avoid performance and data consistency issues.
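
To make the single-table idea concrete, here is a minimal in-memory sketch. The attribute names (`pk`, `sk`, `user_name`, `video_title`) mirror the ones used later in this walkthrough; the `table` dict and `put_item` helper are purely illustrative stand-ins, not a DynamoDB API.

```python
# A toy stand-in for a DynamoDB table: items are plain dicts and only the
# key attributes (pk, sk) are mandatory. Everything else may vary per item.
table = {}


def put_item(item: dict) -> None:
    # the (hash key, range key) pair uniquely identifies an item
    table[(item["pk"], item["sk"])] = item


put_item({"pk": "user-1", "sk": "_root", "user_name": "alice", "subscribers": 0})
put_item({"pk": "video-1", "sk": "_root", "video_title": "video 1 title", "views": 0})

user = table[("user-1", "_root")]
video = table[("video-1", "_root")]

# the two items share a table but not a set of attributes
print(sorted(set(user) ^ set(video)))
```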

Unlike the [Database Normalization](https://en.wikipedia.org/wiki/Database_normalization) technique used in relational databases, there is no one-size-fits-all solution for designing a data model in DynamoDB. It is often a matter of weighing the pros and cons of various approaches to solving a given business problem in DynamoDB. It may be necessary to conduct your own experiments to determine the best solution.

In this example, I would like to share a methodology that provides an interactive development experience and lets you iterate quickly on the data model design. Below are the highlights of this method:

- **Fast start**: no need to set up a real DynamoDB table or any infrastructure (you can use a real DynamoDB table, but it is not necessary).
- **Fast iteration**: it is easy to try different data model ideas.
- **Ready to be deployed as an application**: at the end, you will have a working ORM (object-relational mapping) database model and an application data model that can be deployed on AWS Lambda / AWS ECS / EC2.
- **User interaction implemented**: every user interaction with your application is implemented as a method.
- **Query pattern verified**: every required business query is ready to use as a method.
- **Self documented**: each business requirement and user interaction is implemented and documented as a method. The sample input / output data of each user interaction is also documented.

Now, let's use a real-world example to demonstrate how to design a data model for production.

## Case Study - Design YouTube Data Model

![Youtube](https://user-images.githubusercontent.com/6800411/212983031-356dc229-251b-46f7-911f-d3e897ce89fd.jpg)

Let's learn this best practice by "Reinventing YouTube in DynamoDB".

### Analyze Business Requirement

The first step is to list all the entities involved in your application and how users interact with it.

Entities:

- 😀 User: a user can upload videos and view other users' videos
- 📺 Video: a video is created (uploaded) by a user

User Interaction:

- People can sign up as a new user.
- A user can upload a video.
- A user can view another user's profile.
- A user can view the list of another user's videos, ordered by creation time, latest first.
- A user can watch a video, which increases the video's view count by one.
- A user can subscribe to another user.
- The system knows whether a user has watched a video before.
- The system can push new video notifications to a user based on their subscriptions.

Query Pattern:

- Given a user id, we can get the detailed information of the user.
- Given a video id, we can get the detailed information of the video.
- Given a user id, we can get all the videos that user uploaded, ordered by creation time, latest first.
- Given a user id, we can get the list of users they subscribed to, ordered by subscription time.
- Given a user id, we can get the number of users subscribed to them.
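
Before writing any model code, it can help to write the query patterns down as data and decide which key each one will hit. The sketch below is one possible mapping that anticipates the key design implemented later in this walkthrough; the `AccessPattern` class is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    description: str
    target: str      # "table" or the name of a secondary index
    hash_key: str    # template for the hash key value
    range_key: str   # template for the range key value, or its ordering


ACCESS_PATTERNS = [
    AccessPattern("get user details", "table", "{user_id}", "_root"),
    AccessPattern("get video details", "table", "{video_id}", "_root"),
    AccessPattern("list a user's videos, latest first", "user-s-video-index",
                  "{user_id}", "created_at (descending)"),
    AccessPattern("list a user's subscriptions", "table",
                  "subscriber-{user_id}", "{publisher_id}"),
    AccessPattern("count a user's subscribers", "table", "{user_id}", "_root"),
]

for p in ACCESS_PATTERNS:
    print(f"{p.description:36} -> {p.target}: pk={p.hash_key}, sk={p.range_key}")
```

Every pattern resolves to a single key lookup or a single query on one hash key, which is the property to check for when designing a DynamoDB table.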


### Import SDK

For a POC, we use the following tools to simplify our development:

- [moto](http://docs.getmoto.org/en/latest/index.html): a library that lets you mock AWS services in tests.
- [implemented DynamoDB features in moto](http://docs.getmoto.org/en/latest/docs/services/dynamodb.html): the list of DynamoDB features implemented in moto.
- [pynamodb_mate](https://pynamodb-mate.readthedocs.io/en/latest/): a DynamoDB SDK library for Python, used to implement your ORM DynamoDB model.
- [dataclasses](https://docs.python.org/3/library/dataclasses.html): the Python standard library module used to implement your application data model.
- [rich](https://github.com/Textualize/rich): pretty-prints your output for humans.

You can run ``pip install -r requirements.txt`` to install all the required libraries.

In [129]:
import typing as T
import enum
from datetime import datetime

import dataclasses
import pynamodb_mate as pm
from moto import mock_dynamodb

from rich import print as rprint
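
Depending on the installed moto version, the `mock_dynamodb` import above may fail: moto 5 replaced the per-service decorators with a single `mock_aws`. The helper below is a small compatibility sketch; the name `load_dynamodb_mock` is made up for this example, and it returns `None` when moto is not installed at all.

```python
import importlib


def load_dynamodb_mock():
    """Return moto's mock factory for DynamoDB, or None if moto is missing."""
    try:
        moto = importlib.import_module("moto")
    except ImportError:
        return None
    # moto >= 5 exposes ``mock_aws``; older releases expose ``mock_dynamodb``
    return getattr(moto, "mock_aws", None) or getattr(moto, "mock_dynamodb", None)


mock_factory = load_dynamodb_mock()
print("moto available:", mock_factory is not None)
```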

### Define DynamoDB Data Model

It is always good to define a class for all your data models. It gives you a centralized place to access all the application logic and query patterns. Also, with proper type hints, your IDE or a static type checker can catch typos and data type errors for you.

To learn more about how to write DynamoDB application code in Python, read these:

- [Pynamodb getting started](https://pynamodb.readthedocs.io/en/stable/tutorial.html#getting-started)
- [Pynamodb mate](https://pynamodb-mate.readthedocs.io/en/latest/)

In [130]:
class EntityTypeEnum(str, enum.Enum):
    """
    Enumerate all entity types to avoid hard-coded strings.
    """
    USER = "USER"
    VIDEO = "VIDEO"
    SUBSCRIPTION = "SUBSCRIPTION"


ROOT = "_root"  # sentinel range key: the item is identified by its hash key alone (the range key is logically unused)


class UsersVideoIndex(pm.GlobalSecondaryIndex):
    """
    This index is used to query all videos of a user.

    We assume that it is impossible to upload two videos at the same timestamp.
    """
    class Meta:
        index_name = "user-s-video-index"  # PynamoDB reads the index name from ``index_name``
        projection = pm.IncludeProjection([
            "pk",
            "video_title",
        ])

    video_creator_id: T.Union[str, pm.UnicodeAttribute] = pm.UnicodeAttribute(hash_key=True)
    created_at: T.Union[datetime, pm.UTCDateTimeAttribute] = pm.UTCDateTimeAttribute(range_key=True)


# Type hint notation helper
REQUIRED_STR = T.Union[str, pm.UnicodeAttribute]
OPTIONAL_STR = T.Optional[REQUIRED_STR]
REQUIRED_INT = T.Union[int, pm.NumberAttribute]
OPTIONAL_INT = T.Optional[REQUIRED_INT]
REQUIRED_DATETIME = T.Union[datetime, pm.UTCDateTimeAttribute]
OPTIONAL_DATETIME = T.Optional[REQUIRED_DATETIME]


class Model(pm.Model):
    """
    The main DynamoDB table data model definition.
    """
    class Meta:
        table_name = "entities"
        region = "us-east-1"
        billing_mode = pm.PAY_PER_REQUEST_BILLING_MODE

    # --- hash key, range key, and entity type
    pk: REQUIRED_STR = pm.UnicodeAttribute(hash_key=True)
    sk: REQUIRED_STR = pm.UnicodeAttribute(range_key=True)
    entity_type: OPTIONAL_STR = pm.UnicodeAttribute(default=None, null=True)

    # --- user related
    user_name: OPTIONAL_STR = pm.UnicodeAttribute(default=None, null=True)
    subscribers: OPTIONAL_INT = pm.NumberAttribute(default=None, null=True)

    # --- video related
    video_title: OPTIONAL_STR = pm.UnicodeAttribute(default=None, null=True)
    video_creator_id: OPTIONAL_STR = pm.UnicodeAttribute(default=None, null=True)
    views: OPTIONAL_INT = pm.NumberAttribute(default=None, null=True)

    # --- subscription related
    publisher_id: OPTIONAL_STR = pm.UnicodeAttribute(default=None, null=True)

    # --- common
    created_at: OPTIONAL_DATETIME = pm.UTCDateTimeAttribute(default=None, null=True)

    user_s_video_index = UsersVideoIndex()

    # counters used to auto-increment user ids and video ids for dummy data
    _USER_ID_STARTED = 0
    _VIDEO_ID_STARTED = 0

# use moto to mock DynamoDB; it is an in-memory implementation of DynamoDB
# to use a real DynamoDB table instead, comment out the two lines below
mock = mock_dynamodb()
mock.start()

# create a DynamoDB connection; make sure your default AWS credentials are valid
# if you are using the mock, this line always works
connect = pm.Connection()

# create the DynamoDB table and index if they do not exist
Model.create_table(wait=True)

# Clean up existing dummy data before testing
Model.delete_all()

pass

### Implement User Related Features


In [131]:
class Model(Model):
    _USER_ID_STARTED = 0

    @classmethod
    def signup_user(
        cls,
        user_name: str,
        created_at: datetime,
    ) -> "Model":
        """
        Implement "User Interaction": People can sign up as a new user.
        """
        cls._USER_ID_STARTED += 1
        user_model = cls(
            pk=f"user-{cls._USER_ID_STARTED}",
            sk=ROOT,
            entity_type=EntityTypeEnum.USER.value,
            user_name=user_name,
            subscribers=0,
            created_at=created_at,
        )
        user_model.save()
        return user_model

    @classmethod
    def get_user(cls, user_id: str) -> "Model":
        """
        Implement "Query Pattern": Given a User id, we can get the detailed information of the user.
        """
        return cls.get(hash_key=user_id, range_key=ROOT)

# Test your implementation
print("Create some sample users")
user1 = Model.signup_user(user_name="alice", created_at=datetime(2020, 1, 1))
user2 = Model.signup_user(user_name="bob", created_at=datetime(2020, 1, 2))
user3 = Model.signup_user(user_name="cathy", created_at=datetime(2020, 1, 3))
rprint([
    user1.to_dict(),
    user2.to_dict(),
    user3.to_dict(),
])

print("Get user by user id, user-3 details:")
rprint(Model.get_user("user-3").to_dict())

Create some sample users


Get user by user id, user-3 details:
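
The class-level counter `_USER_ID_STARTED` is convenient for dummy data, but it resets whenever the process restarts and is not shared between processes. A production system would need a different id scheme. One common option, shown here only as an assumption and not something the example above uses, is a random UUID (`new_user_id` is a hypothetical helper):

```python
import uuid


def new_user_id() -> str:
    # random ids need no shared counter, so concurrent sign-ups cannot collide
    # on the same sequence number
    return f"user-{uuid.uuid4().hex}"


first_id, second_id = new_user_id(), new_user_id()
print(first_id)
```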


### Implement Video Related Features

In [132]:
class Model(Model):
    _VIDEO_ID_STARTED = 0

    @classmethod
    def upload_video(
        cls,
        user_id: str,
        video_title: str,
        created_at: datetime,
    ) -> "Model":
        """
        Implement "User Interaction": A user can upload a video.
        """
        cls._VIDEO_ID_STARTED += 1

        video_model = cls(
            pk=f"video-{cls._VIDEO_ID_STARTED}",
            sk=ROOT,
            entity_type=EntityTypeEnum.VIDEO.value,
            video_title=video_title,
            video_creator_id=user_id,
            views=0, # views start with 0
            created_at=created_at,
        )
        video_model.save()
        return video_model

    @classmethod
    def get_video(cls, video_id: str) -> "Model":
        """
        Implement "Query Pattern": Given a Video id, we can get the detailed information of the video.
        """
        return cls.get(hash_key=video_id, range_key=ROOT)

    @classmethod
    def get_users_videos(cls, user_id: str, limit: int = 5):
        """
        Implement "Query Pattern": Given a User id, we can get all the videos of the user.
        """
        return cls.iter_query_index(
            index=cls.user_s_video_index,
            hash_key=user_id,
            scan_index_forward=False,
            limit=limit,
        )

# Test your implementation
print("Create some sample videos, user-1 has 1 video, user-2 has 2 videos, user-3 has 3 videos")
video1 = Model.upload_video(user_id="user-1", video_title="video 1 title", created_at=datetime(2020, 2, 1))
video2 = Model.upload_video(user_id="user-2", video_title="video 2 title", created_at=datetime(2020, 2, 2))
video3 = Model.upload_video(user_id="user-2", video_title="video 3 title", created_at=datetime(2020, 2, 3))
video4 = Model.upload_video(user_id="user-3", video_title="video 4 title", created_at=datetime(2020, 2, 4))
video5 = Model.upload_video(user_id="user-3", video_title="video 5 title", created_at=datetime(2020, 2, 5))
video6 = Model.upload_video(user_id="user-3", video_title="video 6 title", created_at=datetime(2020, 2, 6))
rprint(video1.to_dict())

print("Get video by video id, video-2 details:")
rprint(Model.get_video("video-2").to_dict())

print("Get user's videos by user id, user-3's videos")
rprint([video.to_dict() for video in Model.get_users_videos("user-3").all()])

Create some sample videos, user-1 has 1 video, user-2 has 2 videos, user-3 has 3 videos


Get video by video id, video-2 details:


Get user's videos by user id, user-3's videos
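
To see what the index query in `get_users_videos` does, here is a plain-Python imitation of it. It assumes the standard behavior of a global secondary index: only items that carry the index key attributes (`video_creator_id` and `created_at`) appear in the index, and `scan_index_forward=False` returns them in descending range key order. The `items` list and `query_users_videos` function are illustrative only.

```python
from datetime import datetime

items = [
    {"pk": "user-3", "sk": "_root", "created_at": datetime(2020, 1, 3)},
    {"pk": "video-2", "sk": "_root", "video_creator_id": "user-2", "created_at": datetime(2020, 2, 2)},
    {"pk": "video-4", "sk": "_root", "video_creator_id": "user-3", "created_at": datetime(2020, 2, 4)},
    {"pk": "video-5", "sk": "_root", "video_creator_id": "user-3", "created_at": datetime(2020, 2, 5)},
    {"pk": "video-6", "sk": "_root", "video_creator_id": "user-3", "created_at": datetime(2020, 2, 6)},
]


def query_users_videos(user_id: str, limit: int = 5) -> list:
    # sparse index: the user item has no video_creator_id, so it is not indexed
    indexed = [i for i in items if "video_creator_id" in i and "created_at" in i]
    matched = [i for i in indexed if i["video_creator_id"] == user_id]
    # descending range key order, i.e. the latest video comes first
    matched.sort(key=lambda i: i["created_at"], reverse=True)
    return matched[:limit]


print([i["pk"] for i in query_users_videos("user-3")])  # ['video-6', 'video-5', 'video-4']
```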


### Implement Subscription Related Features

In [133]:
class Model(Model):
    @classmethod
    def subscribe(
        cls,
        subscriber_id: str,
        publisher_id: str,
        created_at: datetime,
    ) -> "Model":
        """
        Implement "User Interaction": A user can subscribe to another user.
        """
        subscription_model = cls(
            pk=f"subscriber-{subscriber_id}",
            sk=publisher_id,
            entity_type=EntityTypeEnum.SUBSCRIPTION.value,
            created_at=created_at,
        )
        subscription_model.save()

        cls(pk=publisher_id, sk=ROOT).update(actions=[
            cls.subscribers.set(cls.subscribers + 1)
        ]) # the publisher's subscribers count + 1

        return subscription_model

    @classmethod
    def get_user_subscriptions(
        cls,
        user_id: str,
    ):
        """
        Implement "Query Pattern": Given a User id, we can get all the subscriptions of the user.
        """
        return cls.iter_query(hash_key=f"subscriber-{user_id}")

# Test your implementation
print("Create some sample subscriptions, user-1 subscribes to user-2 and user-3")
Model.subscribe(subscriber_id="user-1", publisher_id="user-2", created_at=datetime(2020, 4, 1))
Model.subscribe(subscriber_id="user-1", publisher_id="user-3", created_at=datetime(2020, 4, 2))

print("Get subscription list by user id, user-1 subscribes to user-2 and user-3")
rprint([subscription.to_dict() for subscription in Model.get_user_subscriptions("user-1")])

print("Get number of subscribers by user id, user-3 has 1 subscriber")
user = Model.get_user("user-3")
rprint(f"User {user.pk} has {user.subscribers} subscribers")

Create some sample subscriptions, user-1 subscribes to user-2 and user-3
Get subscription list by user id, user-1 subscribes to user-2 and user-3


Get number of subscribers by user id, user-3 has 1 subscriber
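
Note that with `sk=publisher_id`, a query on `subscriber-{user_id}` returns items sorted by publisher id rather than by subscription time, because DynamoDB orders query results by range key. If ordering by subscription time matters, one option is to prefix the range key with a fixed-width ISO 8601 timestamp, since such strings sort lexicographically in chronological order. This is an assumption of the sketch below and not part of the model above; `subscription_sort_key` is a hypothetical helper.

```python
from datetime import datetime


def subscription_sort_key(created_at: datetime, publisher_id: str) -> str:
    # hypothetical key format: "<ISO timestamp>#<publisher id>"
    return f"{created_at.isoformat(timespec='seconds')}#{publisher_id}"


keys = [
    subscription_sort_key(datetime(2020, 4, 2), "user-2"),
    subscription_sort_key(datetime(2020, 4, 1), "user-3"),
]

# string order now follows subscription time, not publisher id
print(sorted(keys))
```

The trade-off is that looking up a single subscription by publisher id alone would then need a separate access path, because the full range key is no longer known in advance.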


### Implement Video Views Related Features

In [134]:
class Model(Model):
    @classmethod
    def watch_video(
        cls,
        user_id: str,
        video_id: str,
    ):
        """
        Implement "User Interaction": A user can watch a video.
        """
        cls(pk=video_id, sk=ROOT).update(actions=[
            cls.views.set(cls.views + 1)
        ])
        # save() returns the raw API response, so keep a reference to the item itself
        view_activity = cls(pk=user_id, sk=video_id)
        view_activity.save()
        return view_activity

    @classmethod
    def is_user_has_viewed(
        cls,
        user_id: str,
        video_id: str,
    ) -> bool:
        """
        Implement "Query Pattern": System knows whether a user watched a video before.
        """
        return cls.get_one_or_none(hash_key=user_id, range_key=video_id) is not None

print("Mock some view activities, both user-2 and user-3 watched video-1")
Model.watch_video(user_id="user-2", video_id="video-1")
Model.watch_video(user_id="user-3", video_id="video-1")

print("Get number of views of a video-1")
video = Model.get_video("video-1")
rprint(f"Video {video.pk} has {video.views} views")

print("Check if a user has viewed a video")
rprint("has user-2 viewed video-1: ", Model.is_user_has_viewed("user-2", "video-1"))
rprint("has user-2 viewed video-2: ", Model.is_user_has_viewed("user-2", "video-2"))

Mock some view activities, both user-2 and user-3 watched video-1
Get number of views of a video-1


Check if a user has viewed a video
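
The `views + 1` update is evaluated by DynamoDB on the server side as part of a single `UpdateItem` request, so concurrent viewers do not overwrite each other's increments. As a rough sketch, the request resembles the following arguments for the boto3 low-level client's `update_item` call. The exact expression and placeholder names that PynamoDB generates will differ, and `build_view_increment_request` is a made-up helper that only builds the arguments without sending anything.

```python
def build_view_increment_request(video_id: str) -> dict:
    """Arguments for a boto3 DynamoDB client ``update_item`` call (not sent here)."""
    return {
        "TableName": "entities",
        "Key": {"pk": {"S": video_id}, "sk": {"S": "_root"}},
        # the increment is applied atomically by DynamoDB
        "UpdateExpression": "SET #v = #v + :one",
        # alias the attribute name to stay clear of DynamoDB reserved words
        "ExpressionAttributeNames": {"#v": "views"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }


request = build_view_increment_request("video-1")
print(request["UpdateExpression"])
```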


### Implement Video Recommendation Related Features

In [135]:
class Model(Model):
    @classmethod
    def push_new_videos(
        cls,
        user_id: str,
        limit_per_publisher: int = 3,
        max_recommendations: int = 10,
    ):
        """
        Implement "User Interaction": A user can get recommendations of new videos based on
        his subscriptions.

        :param limit_per_publisher: for each publisher, max number of videos to push.
        :param max_recommendations: max number of recommendations to push.
        """
        publishers = cls.get_user_subscriptions(user_id).all()
        new_videos = list()
        for publisher in publishers:
            videos = cls.get_users_videos(publisher.sk, limit=limit_per_publisher).all()
            for video in videos:
                if cls.is_user_has_viewed(user_id, video.pk) is False:
                    new_videos.append(video)
                    if len(new_videos) >= max_recommendations:
                        return new_videos
        return new_videos


print("User 2 subscribes to User 3")
Model.subscribe(subscriber_id="user-2", publisher_id="user-3", created_at=datetime(2020, 5, 1))

print("User 3 has three videos: video 4, 5, 6")
print("User 2 has watched video 4 already")
Model.watch_video(user_id="user-2", video_id="video-4")

print("System should push video 5, 6 to user 2")
print("It won't recommend video-4 because user 2 already watched it")
rprint([i.to_dict() for i in Model.push_new_videos("user-2")])

User 2 subscribes to User 3
User 3 has three videos: video 4, 5, 6
User 2 has watched video 4 already
System should push video 5, 6 to user 2
It won't recommend video-4 because user 2 already watched it
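
`push_new_videos` does its work at read time: one query for the subscriptions, one index query per publisher, and one point read per candidate video. A back-of-the-envelope helper (hypothetical, for estimation only, ignoring pagination and the early exit at `max_recommendations`) gives the worst-case number of requests:

```python
def max_read_requests(num_publishers: int, limit_per_publisher: int = 3) -> int:
    subscriptions_query = 1                                  # get_user_subscriptions
    video_queries = num_publishers                           # get_users_videos per publisher
    viewed_checks = num_publishers * limit_per_publisher     # is_user_has_viewed per video
    return subscriptions_query + video_queries + viewed_checks


print(max_read_requests(100))  # 401
```

The cost grows linearly with the number of subscriptions, which is worth keeping in mind for users who follow many publishers.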


### Integrate the ORM with your Application

In [136]:
class Model(Model):
    def is_user(self) -> bool:
        return self.entity_type == EntityTypeEnum.USER.value

    def is_video(self) -> bool:
        return self.entity_type == EntityTypeEnum.VIDEO.value

    def is_subscription(self) -> bool:
        return self.entity_type == EntityTypeEnum.SUBSCRIPTION.value

    @property
    def user_id(self) -> str:
        return self.pk

    @property
    def videos(self) -> T.List[Model]:
        if self.is_user() is False:
            raise TypeError("Only a user has the videos property!")
        return self.get_users_videos(self.user_id, limit=999).all()

    @property
    def subscriptions(self) -> T.List[Model]:
        if self.is_user() is False:
            raise TypeError("Only a user has the subscriptions property!")
        return self.get_user_subscriptions(self.user_id).all()

    @property
    def creator(self) -> Model:
        if self.is_video() is False:
            raise TypeError("Only a video has the creator property!")
        return self.get_user(self.video_creator_id)

# Test your implementation
print("User 2 details:")
user2 = Model.get_user("user-2")
rprint(user2.to_dict())

print("User 2's videos:")
rprint([video.to_dict() for video in user2.videos])

print("User 2 has subscribed to:")
rprint([subscription.to_dict() for subscription in user2.subscriptions])

print("Video 4 details:")
video4 = Model.get_video("video-4")
rprint(video4.to_dict())

print("Video 4's creator:")
rprint(video4.creator.to_dict())

User 2 details:


User 2's videos:


User 2 has subscribed to:


Video 4 details:


Video 4's creator:
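
The tool list earlier mentions `dataclasses` for the application data model. A minimal sketch of that idea follows, assuming the dict produced by `to_dict()` uses the attribute names defined on `Model`. The `User` class and the `user_from_item` helper are hypothetical, and the sample `item` is hand-written for illustration.

```python
import dataclasses
from datetime import datetime


@dataclasses.dataclass
class User:
    user_id: str
    user_name: str
    subscribers: int
    created_at: datetime


def user_from_item(item: dict) -> User:
    # map storage-level names (pk) to application-level names (user_id)
    return User(
        user_id=item["pk"],
        user_name=item["user_name"],
        subscribers=item["subscribers"],
        created_at=item["created_at"],
    )


item = {
    "pk": "user-2",
    "sk": "_root",
    "entity_type": "USER",
    "user_name": "bob",
    "subscribers": 1,
    "created_at": datetime(2020, 1, 2),
}
print(user_from_item(item))
```

Keeping this mapping in one place means the rest of the application never has to know about `pk`, `sk`, or the `_root` sentinel.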


In [137]:
mock.stop() # stop mocking DynamoDB