## DynamoDB Intro Workbook ##

This workbook walks through the basics of getting started with DynamoDB. It is intended to demonstrate the API building blocks that can then be combined for complex modeling (in subsequent labs). This lab uses Python 3 and Boto3, the AWS SDK for Python. The client setup and code samples are demonstrated in interactive cells.  

<em>The AWS Boto3 SDK is pre-installed and a role with permissions to create and modify DynamoDB resources has been assigned to this instance. If you are running this on another Jupyter notebook you may need to install boto3 and configure permissions (instructions can be found in the README).</em>  

Let's get started! 

DynamoDB is a fast, serverless, scalable key-value and document database hosted on AWS.

The basic functional abstraction of DynamoDB is a Table. Data is organized on a table by its primary key (the base table index), which is required on all tables. A primary key can be simple, consisting of a partition key only <em>(used for 1:1 modeling of key-value data)</em>, or composite, consisting of a partition key and a sort key <em>(used for 1:n modeling of parent-child relationships and related data)</em>.

Update the 'Name' value in the cell below with your initials. This will be the name of the table that you will be working with. While the cell below is active, click 'Run' to initialize the DynamoDB client and import the JSON library. <em> If at any point in the Lab you disconnect, you may need to rerun this cell to reactivate the client. </em>

In [1]:
import boto3 # <----- This is the Python SDK to interact with DynamoDB
import json

Name='initialshereXX-RecipeTable'

# Set up the DynamoDB client (a low level API). All requests are HTTPS requests by default. 
client = boto3.client('dynamodb', region_name = 'us-east-1')
print('\n DynamoDB client is now active. \n')


 DynamoDB client is now active. 



So, let's create our first table. The code below specifies the base table index attributes and table name. While the below cell is active, select 'Run.'

It is going to take a minute to complete the creation of the new table. While it is pending, let's take a look at the code. You will notice that there are a couple important variables:  
1. Table Name: this must be unique per account within a region.
2. Key Schema: in the below example the primary key is a composite key with the partition (or HASH) key as RecipeName and the sort (or RANGE) key as RecipeStuff. You will notice that both are of AttributeType 'S', which means both are strings. DynamoDB enforces data types only on the specified index attributes; all other attributes are flexible in type, or even in whether they are present.

In [None]:
print('Submitting table creation request... this may take a minute or two. When it completes, it will print details below... \n')

# Create the DynamoDB table.
response = client.create_table(
    TableName=Name, # <----- This is the table name from above
    KeySchema=[
        {
            'AttributeName': 'RecipeName', # <----- This is the Partition Key
            'KeyType': 'HASH' # <----- It is also referred to as the Hash key, because DynamoDB uses it to hash the keys internally.
        },
        {
            'AttributeName': 'RecipeStuff', # <----- This is the Sort Key
            'KeyType': 'RANGE' # <----- It is also referred to as a Range key, because it supports range queries.
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'RecipeName',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'RecipeStuff',
            'AttributeType': 'S'
        },
    ],
    BillingMode= 'PAY_PER_REQUEST' # <----- We'll tackle this in the understanding capacity planning section
)

# Wait until the table exists.
client.get_waiter('table_exists').wait(TableName=Name)

# Print out detailed data about the new table.
response = client.describe_table(TableName=Name)

print(json.dumps(response, indent=1, default=str))
print("Table is active... \n")


Now, what did we just do? We created our first DynamoDB table called <em>YourInitials-RecipeTable</em>.  

We set the Partition Key to RecipeName and the Sort Key to RecipeStuff. While it doesn't have data yet, this table could look like:  

![RecipeStuffMC.png](attachment:RecipeStuffMC.png)  

We chose RecipeName as the Partition Key because we are going to build a mock cooking application backend where we track all of the best recipes. Recipe names are the natural way to organize the data based on our application retrieval pattern, which is primarily to get recipes by name. Names also provide high cardinality and reasonably even distribution of access, which conforms to DynamoDB best practices when choosing a partition key ([Best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html "DynamoDB Best Practices")). 

'RecipeStuff' is a generalized name for our related items Sort Key. We deliberately chose a generic name, because we can add any elements related to a specific recipe - these could be ingredients, instructions, authors, even reviews! This is our first 1:N relationship. One recipe to many (any) related items.

If you are familiar with relational databases, you may wonder - <em> 'won't I have duplicate ingredients across recipes?'</em> or <em>'shouldn't ingredients have their own table?' </em> You have identified a fundamental difference between relational and NoSQL data models. With NoSQL design, items are denormalized into prebuilt aggregates that are optimal for performance and use minimal CPU overhead to retrieve data. This design enables a table to horizontally scale and serve millions of requests per second without impacting performance. In the above example, we can very efficiently retrieve all the related items of a single recipe rather than joining across a recipes table, an instructions table, and an ingredients table... but we are getting ahead of ourselves.

Let's put some data into the table to see what it looks like!

We are going to load a recipe instruction record, or 'How To.' This item (or row) will have a partition key matching the recipe name, but the sort key will simply be "HowTo#RecipeName." We can programmatically retrieve the instructions for any recipe we upload by using the recipe name and the known pattern of the sort key. 
  
Inspect the below code and click 'Run' to insert the instructions for Egg Nog (a festive favorite)!

In [None]:
response = client.put_item(
    TableName=Name, # <---- this is the same table we created above
    Item={
        'RecipeName': {
            'S': 'Egg Nog'
        },
        'RecipeStuff': {
            'S': 'HowTo#Egg Nog'
        },
        'HowTo': {
            'S': 'Blend together eggs, sugar, milk, vanilla and nutmeg. Serve chilled.\n'
        }
    },
    ReturnConsumedCapacity='TOTAL'
)
print(json.dumps(response, indent=1, default=str) + "\n")

Great! We can see by the HTTPStatusCode: 200 that this was successful! We just inserted this item:

![EggNogRecipe.png](attachment:EggNogRecipe.png)  

We can even see that this request consumed 1 CapacityUnit. We will discuss that more later, but it is important to understand that DynamoDB measures usage based on reads and writes, scaled by item size. 
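
As a rough worked example of how item size maps to capacity units: one write unit covers a standard write of up to 1 KB, and one read unit covers a strongly consistent read of up to 4 KB (an eventually consistent read costs half as much). The helper names below are hypothetical, the item sizes are made up for illustration, and the sketch ignores transactional requests, which cost more.

```python
import math

def write_units(item_size_bytes: int) -> int:
    """Write units for one standard write: 1 unit per 1 KB, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))

def read_units(item_size_bytes: int, strongly_consistent: bool = False) -> float:
    """Read units for one read: 1 unit per 4 KB (rounded up), halved if eventually consistent."""
    units = max(1, math.ceil(item_size_bytes / 4096))
    return units if strongly_consistent else units / 2

# Illustrative sizes only (not measured from the table above).
print(write_units(150))        # 1   -> a small item costs a single write unit
print(write_units(2500))       # 3   -> 2.44 KB rounds up to 3 KB
print(read_units(2500))        # 0.5 -> eventually consistent read under 4 KB
print(read_units(9000, True))  # 3   -> 8.79 KB rounds up to three 4 KB blocks
```

This is consistent with the single unit reported for the small 'HowTo' item we just wrote.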

Let's go ahead and insert each of the ingredients related to the Egg Nog recipe. We can use the BatchWriteItem operation to insert many items at once (up to a maximum of 25 items). 

We've left the last item generic - so you can add your own twist to the Egg Nog recipe. Update the last item in the code below, and when you're ready click 'Run.'

In [None]:
response = client.batch_write_item(
    RequestItems={
        Name : [ # <---- this indicates we are using the same table we created above
            {
                'PutRequest': { # <---- Each of these is an ingredient to insert. BWI also supports DeleteRequest.
                    'Item': {
                        'RecipeName': {
                        'S': 'Egg Nog' # <---- Partition Key 
                        },
                        'RecipeStuff': {
                            'S': 'Ingredient#Eggs'  # <---- Sort Key: Ingredient associated with Egg Nog (1:n relationship)
                        },
                        'Quantity': {
                            'S': '2'
                        },                        
                    }
                }
            }   
            ,
            {
                'PutRequest': {
                    'Item': {
                        'RecipeName': {
                        'S': 'Egg Nog'
                        },
                        'RecipeStuff': {
                            'S': 'Ingredient#Sugar' 
                        },
                        'Quantity': {
                            'S': '3 TBsp'
                        },                        
                    }
                }
            },
            {
                'PutRequest': {
                    'Item': {
                        'RecipeName': {
                        'S': 'Egg Nog'
                        },
                        'RecipeStuff': {
                            'S': 'Ingredient#Milk' 
                        },
                        'Quantity': {
                            'S': '2 1/3 cups'
                        },                        
                    }
                }
            },
            {
                'PutRequest': {
                    'Item': {
                        'RecipeName': {
                        'S': 'Egg Nog'
                        },
                        'RecipeStuff': {
                            'S': 'Ingredient#Vanilla Extract' 
                        },
                        'Quantity': {
                            'S': '1 tsp'
                        },                        
                    }
                }
            },
            {
                'PutRequest': {
                    'Item': {
                        'RecipeName': {
                        'S': 'Egg Nog'
                        },
                        'RecipeStuff': {
                            'S': 'Ingredient#Nutmeg' 
                        },
                        'Quantity': {
                            'S': '1 dash'
                        },                        
                    }
                }
            },
            {
                'PutRequest': {
                    'Item': {
                        'RecipeName': {
                        'S': 'Egg Nog'
                        },
                        'RecipeStuff': {
                            'S': 'Ingredient#IngredientName' # <---- ** Your ingredient here **
                        },
                        'Quantity': {
                            'S': 'IngredientQuantity' # <---- ** Quantity of the ingredient here **
                        },                        
                    }
                }
            }]
            },
    ReturnConsumedCapacity='TOTAL'
    )
                

print(json.dumps(response, indent=1, default=str) + "\n")

Great - we inserted all the ingredients as a single request! Did you see that the ConsumedCapacity was higher as we added many items simultaneously?
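
Because BatchWriteItem accepts at most 25 items per request, and can hand some items back unprocessed when the table is under load, a loader typically chunks its input and resubmits whatever comes back under `UnprocessedItems`. Here is a minimal sketch of that pattern. The `batch_put` helper and its `max_attempts` parameter are hypothetical names, and real code would also wait (back off) between retries.

```python
def batch_put(client, table_name, items, batch_size=25, max_attempts=5):
    """Write items (already in DynamoDB attribute-value format) in chunks of up to 25."""
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        request = {table_name: [{'PutRequest': {'Item': item}} for item in chunk]}
        # Resubmit anything DynamoDB reports as unprocessed, up to max_attempts times.
        for _ in range(max_attempts):
            response = client.batch_write_item(RequestItems=request)
            request = response.get('UnprocessedItems') or {}
            if not request:
                break
        else:
            raise RuntimeError('Some items were still unprocessed after retries')
```

In this notebook it could be called as `batch_put(client, Name, items)`, where `items` is a list of dictionaries shaped like the `Item` entries above.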

Our table should now look like :  
<em>(plus your ingredient)</em>

  
![EggNogIngredients.png](attachment:EggNogIngredients.png)
 

You may notice that not all the items conform to the same schema. There is also no limitation on adding different types of items <em>(Ingredients and HowTo's)</em> to the index as long as each item adheres to the data type (String) of the index we specified at table creation. If you're wondering why we've labeled each ingredient, it is because we can create focused queries by leveraging this label. We will discuss this in depth when we work with the Query API, but this labeling is a foundational data modeling construct.


Let's take a look at our table to see what it looks like in real time. <em>You can come back and run this at any time during the lab.</em>

The Scan API will retrieve all the items on a table. It can paginate through the items and return them in responses of up to 1 MB each. Scans can be filtered on conditions and limited by item count. It is a good way for us to validate our work. In general, Scans are not used as part of a regular application design because they can consume many read units and underperform compared to the targeted retrieval APIs GetItem and Query. DynamoDB workloads tend to be transactional in nature, which calls for very specific data retrieval, not full table scans.
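
Pagination works through the `LastEvaluatedKey` field of each response: when it is present, passing it back as `ExclusiveStartKey` fetches the next page. Here is a minimal sketch of that loop. The `scan_all` helper and its `page_limit` parameter are hypothetical names, and it expects a low-level client like the one created in the first cell.

```python
def scan_all(client, table_name, page_limit=None):
    """Yield every item in the table, following LastEvaluatedKey across pages."""
    kwargs = {'TableName': table_name}
    if page_limit is not None:
        kwargs['Limit'] = page_limit  # max items evaluated per page, not overall
    while True:
        response = client.scan(**kwargs)
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break  # no more pages
        kwargs['ExclusiveStartKey'] = last_key
```

For example, `for item in scan_all(client, Name): print(item)` would walk the whole table. boto3 also offers `client.get_paginator('scan')`, which wraps the same loop.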

Run the following cell to scan the current table...

In [None]:
import pandas as pd  # <-- Let's use Pandas & HTML to make the scan response more human readable
from IPython.display import HTML

response = client.scan( # <-- Scan request. Note it will return a max of 10 items.
    TableName=Name,
    Limit=10,
    ReturnConsumedCapacity='TOTAL')

# ** Pandas transformation of the response items to a table, rendered as HTML ** #
# Each cell keeps its typed form, e.g. {'S': 'Egg Nog'}; missing attributes are shown as blanks.
dataMap = response['Items']
HTML(pd.DataFrame(dataMap, columns=["RecipeName", "RecipeStuff", "Quantity", "HowTo"]).fillna('').to_html(index=False))


Did you notice how each item contains the attribute type as well as the value? So far we've only added strings.

In [None]:
#Run this cell to see the actual response from the scan. This is what your application will parse and use.

print(json.dumps(response, indent=1, default=str) + "\n")
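
Applications usually convert this typed format into plain Python values before using it. boto3 ships a `TypeDeserializer` class (in `boto3.dynamodb.types`) for that purpose. The hand-rolled `unwrap` function below is a simplified, hypothetical stand-in that covers only a few attribute types, just to show what the conversion involves.

```python
from decimal import Decimal

def unwrap(attr):
    """Convert one DynamoDB attribute value such as {'S': 'Egg Nog'} to a plain Python value."""
    (type_code, value), = attr.items()  # each attribute value has exactly one type key
    if type_code == 'S':
        return value
    if type_code == 'N':
        return Decimal(value)  # numbers travel as strings
    if type_code == 'BOOL':
        return value
    if type_code == 'NULL':
        return None
    if type_code == 'L':
        return [unwrap(v) for v in value]
    if type_code == 'M':
        return {k: unwrap(v) for k, v in value.items()}
    raise ValueError(f'Unsupported attribute type: {type_code}')

item = {'RecipeName': {'S': 'Egg Nog'}, 'RecipeStuff': {'S': 'Ingredient#Eggs'}, 'Quantity': {'S': '2'}}
print({name: unwrap(attr) for name, attr in item.items()})
# {'RecipeName': 'Egg Nog', 'RecipeStuff': 'Ingredient#Eggs', 'Quantity': '2'}
```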

Typically, transactional workloads use either GetItem or Query to retrieve records. Let's assume that our backend system has a function that retrieves the instructions for a recipe. This is a very straightforward request, and we've modeled our data in such a way that it can be retrieved using GetItem. The GetItem function requires the full primary key (just the partition key if it is a simple primary key, or the partition and sort key combination if it is a composite primary key) to return a single item. There is also a BatchGetItem API, which allows a single API request to return up to 100 items.

For our system, let's test out the GetItem function below. Inspect and 'Run' the code below:

In [None]:
# GetItem takes the full primary key (partition key + sort key) in the 'Key' parameter

response = client.get_item(
    TableName=Name, # <---- this is the same table we created above
    Key={
        'RecipeName': {
            'S': 'Egg Nog'
        },
        'RecipeStuff': {
            'S': 'HowTo#Egg Nog'
        }
    },
    ReturnConsumedCapacity='TOTAL'
)
print(json.dumps(response, indent=1, default=str) + "\n")
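
For comparison, a BatchGetItem request lists several full primary keys under the table name and returns the matching items under `Responses`. Here is a minimal sketch. The `get_recipe_items` helper is a hypothetical name, and it does not retry keys returned under `UnprocessedKeys`, which a production caller should.

```python
def get_recipe_items(client, table_name, recipe_name, sort_keys):
    """Fetch several items of one recipe by their full primary keys in a single request."""
    keys = [
        {'RecipeName': {'S': recipe_name}, 'RecipeStuff': {'S': sort_key}}
        for sort_key in sort_keys
    ]
    response = client.batch_get_item(RequestItems={table_name: {'Keys': keys}})
    # Items come back grouped by table name, in no guaranteed order.
    return response['Responses'].get(table_name, [])
```

In this notebook it could be called as `get_recipe_items(client, Name, 'Egg Nog', ['HowTo#Egg Nog', 'Ingredient#Eggs'])`.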

We see that the instructions are missing the ingredient that you added above! How would the author of this recipe change the instructions? 

They could either use the PutItem API above to overwrite the item with a new item, or use the UpdateItem API to update only the existing attribute "HowTo." UpdateItem can also be used to create counters that increment or decrement number values. 

In [None]:
#TODO UpdateItem

#Return Consumed Capacity
#Return Old Values
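
One way the placeholder cell above could be filled in is sketched below. It is an assumption about the intended request, not the lab's official solution. The `update_how_to` helper is a hypothetical name. It sets only the `HowTo` attribute, asks for the old value back with `ReturnValues='UPDATED_OLD'`, and requests the consumed capacity.

```python
def update_how_to(client, table_name, recipe_name, new_instructions):
    """Overwrite only the HowTo attribute and return the raw UpdateItem response."""
    return client.update_item(
        TableName=table_name,
        Key={
            'RecipeName': {'S': recipe_name},
            'RecipeStuff': {'S': f'HowTo#{recipe_name}'},
        },
        UpdateExpression='SET HowTo = :h',
        ExpressionAttributeValues={':h': {'S': new_instructions}},
        ReturnValues='UPDATED_OLD',      # return the previous value of the updated attribute
        ReturnConsumedCapacity='TOTAL',  # report the write units consumed
    )
```

In this notebook it could be called as `response = update_how_to(client, Name, 'Egg Nog', '...new instructions...')`. The previous instructions then come back under `response['Attributes']['HowTo']`.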

By inspecting the response, we can see several interesting attributes. We used the flag to return the old value - this can be very handy when building a programmatic interface to update and retrieve an old or new value from a counter or update. We also see that the update operation consumed the same write units as the PutItem we used to initially insert the 'HowTo.' This is because the consumed capacity of an update corresponds to the full item size (the larger of the item's size before and after the update), not the size of the change. 

<em>It is also worth noting that every API to DynamoDB will consume only Read Units or Write Units. While UpdateItem and PutItem operations can have conditions we will demonstrate later, such as 'exists' or '>, =, etc...' they will only consume write units. </em>
    
Now that we've updated the item, let's work with Querying the Recipe. The first Query we want should return all items related to a recipe. 

Inspect and 'Run' the code below.
    

In [None]:
#TODO Query - partition key Egg Nog
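
A minimal sketch of what this query could look like, offered as an assumption rather than the lab's official cell. The `query_recipe` helper is a hypothetical name. The key condition matches on the partition key only, so every item stored under the recipe is returned.

```python
def query_recipe(client, table_name, recipe_name):
    """Return every item stored under one recipe's partition key."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression='RecipeName = :recipe',
        ExpressionAttributeValues={':recipe': {'S': recipe_name}},
        ReturnConsumedCapacity='TOTAL',
    )
    # Items arrive sorted by the sort key (RecipeStuff), ascending by default.
    return response['Items']
```

In this notebook it could be called as `query_recipe(client, Name, 'Egg Nog')`.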

What if we just want to retrieve the ingredients of the recipe? Take a look at how we can refine the same query by using a Key Condition Expression.

Inspect and 'Run' the code below.

In [None]:
#TODO Query - pk = Egg Nog, sk begins_with Ingredient#
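
A hedged sketch of this refined query: the key condition pins the partition key and adds `begins_with` on the sort key, so only items labeled `Ingredient#` come back. The `query_ingredients` helper is a hypothetical name, and stripping the label from the result is just one possible way to present it.

```python
def query_ingredients(client, table_name, recipe_name, prefix='Ingredient#'):
    """Return the ingredient names of one recipe, using the sort-key label as a filter."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression='RecipeName = :recipe AND begins_with(RecipeStuff, :prefix)',
        ExpressionAttributeValues={
            ':recipe': {'S': recipe_name},
            ':prefix': {'S': prefix},
        },
    )
    # 'Ingredient#Eggs' -> 'Eggs'
    return [item['RecipeStuff']['S'].split('#', 1)[1] for item in response['Items']]
```

In this notebook it could be called as `query_ingredients(client, Name, 'Egg Nog')`.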

This could be adapted to any type of related item. For instance, if reviews were also inserted as related items we could retrieve them in sorted order!

The following code block inserts 3 reviews for this Egg Nog. The last review is blank and needs your input!

In [None]:
#TODO PutItem - 3 reviews

# <---point out timestamp
# <--- condition exists ??
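
A sketch of how one review could be written, under stated assumptions: the `put_review` helper, the `Reviewer` and `Review` attribute names, and the timestamp format are all hypothetical choices, not part of the lab. The sort key embeds a UTC timestamp after the `Review#` label, and the condition expression makes the write fail rather than overwrite an item that already has the same primary key.

```python
from datetime import datetime, timezone

def put_review(client, table_name, recipe_name, reviewer, text, when=None):
    """Insert a review whose sort key embeds a UTC timestamp; refuse to overwrite."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # ISO 8601 timestamps sort chronologically when compared as strings.
    sort_key = 'Review#' + when.strftime('%Y-%m-%dT%H:%M:%SZ')
    return client.put_item(
        TableName=table_name,
        Item={
            'RecipeName': {'S': recipe_name},
            'RecipeStuff': {'S': sort_key},
            'Reviewer': {'S': reviewer},
            'Review': {'S': text},
        },
        # Only write if no item with this primary key exists yet.
        ConditionExpression='attribute_not_exists(RecipeStuff)',
    )
```

In this notebook it could be called as `put_review(client, Name, 'Egg Nog', 'your name', 'your review')`. If the condition fails, boto3 raises a `ConditionalCheckFailedException` client error.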

Concatenating a timestamp to the Sort Key is a great way to organize the data as it is inserted. Let's Query the Egg Nog reviews!

Inspect and 'Run' the code below.

In [None]:
#TODO Query - begins_with Review#, ScanIndexForward=False
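
A sketch of this query, assuming reviews were stored with sort keys of the form `Review#<timestamp>`. The `latest_reviews` helper and its `limit` parameter are hypothetical names. Setting `ScanIndexForward=False` reverses the sort-key order, so the most recent timestamps come first.

```python
def latest_reviews(client, table_name, recipe_name, limit=10):
    """Return up to `limit` reviews of one recipe, newest first."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression='RecipeName = :recipe AND begins_with(RecipeStuff, :prefix)',
        ExpressionAttributeValues={
            ':recipe': {'S': recipe_name},
            ':prefix': {'S': 'Review#'},
        },
        ScanIndexForward=False,  # descending sort-key order
        Limit=limit,
    )
    return response['Items']
```

In this notebook it could be called as `latest_reviews(client, Name, 'Egg Nog', limit=3)`.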

Congratulations! You've worked through the foundational APIs used to work with DynamoDB. 

To recap, you:
1. Created a Table with a composite primary key.
2. Inserted heterogeneous items using PutItem and BatchWriteItem.
3. Scanned the Table to retrieve all the items.
4. Retrieved a single item using GetItem.
5. Updated an Item.
6. Queried for all items of a Partition Key.
7. Created a conditional PutItem.
8. Queried for specific types of related items for a given Partition Key, in sorted order, using begins_with.

Hopefully you also gleaned the data modeling that made this go smoothly. We used a high cardinality Partition Key within a composite Primary Key, which allowed for targeted queries on the Sort Key's related items that matched our mock cooking application's access patterns. 

In the next workbook we will use these concepts to build more complex data models and explore:
- Global and Local Secondary Indexes
- N:M Data Models
- Adjacency Lists
- Atomic Counters
