![https://pieriantraining.com/](../PTCenteredPurple.png)


In this notebook, we discuss the most important functions for working with [DynamoDB tables](https://aws.amazon.com/dynamodb) using boto3.

## DynamoDB

Amazon DynamoDB is a fully managed [NoSQL](https://aws.amazon.com/nosql/) database service. It is schema-less and offers automatic scaling, allowing you to build applications without having to manage the underlying infrastructure.

The main benefit of NoSQL databases is that they are designed to handle various types of data and use cases that might not fit well with traditional relational databases.

Use cases include large-scale web applications, big data and analytics, IoT data storage, and document and content management.

## Table Creation
The most important part of our database is, of course, the table!<br />
We can create a [table](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/create_table.html) using `client.create_table(**kwargs)`.
This function requires at least:

1) AttributeDefinitions
   - AttributeName
   - AttributeType (S|N|B) (String, Number, Binary)
2) TableName
3) KeySchema
   - AttributeName (Name of key attribute)
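   - KeyType (HASH | RANGE) (HASH: partition key, RANGE: sort key. The partition key alone can serve as the primary key if every item has a unique value for it. If several items share a partition key, the primary key is the combination of partition key and sort key, and items with the same partition key are stored sorted by the sort key. The sort key is optional.)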

If the **BillingMode** is `PROVISIONED` (which it is by default), we also need to define the **ProvisionedThroughput**. This is the capacity that you reserve for read and write operations on a table. It determines how much traffic your table can handle and helps ensure consistent performance. The pricing of this service scales with the ProvisionedThroughput.


4) ProvisionedThroughput
   - ReadCapacityUnits (number of strongly consistent reads per second, for items of up to 4 KB, that the table can handle. Strongly consistent reads return the most recent version of the data, ensuring that you get the latest updates.)
   - WriteCapacityUnits (number of writes per second, for items of up to 1 KB, that the table can handle)
  
Remember that in NoSQL databases like DynamoDB, your data model needs to be designed around the queries you plan to perform. If your queries involve complex filtering or sorting, you might need to consider a different approach or potentially a different database solution.
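
As a minimal sketch of how the billing mode affects the arguments, the hypothetical helper `build_create_table_kwargs` below (not part of boto3) assembles the keyword arguments for `create_table`. It assumes that you want on-demand billing (`PAY_PER_REQUEST`) whenever no capacity units are given, in which case ProvisionedThroughput is left out.

```python
def build_create_table_kwargs(table_name, attributes, key_schema,
                              read_units=None, write_units=None):
    """Assemble keyword arguments for client.create_table.

    If both capacity values are given, the table uses provisioned billing;
    otherwise the arguments request on-demand (PAY_PER_REQUEST) billing.
    """
    kwargs = {
        "TableName": table_name,
        "AttributeDefinitions": attributes,
        "KeySchema": key_schema,
    }
    if read_units is not None and write_units is not None:
        kwargs["BillingMode"] = "PROVISIONED"
        kwargs["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    else:
        # On-demand tables must not specify ProvisionedThroughput
        kwargs["BillingMode"] = "PAY_PER_REQUEST"
    return kwargs


on_demand = build_create_table_kwargs(
    "Movies",
    [{"AttributeName": "Title", "AttributeType": "S"}],
    [{"AttributeName": "Title", "KeyType": "HASH"}],
)
# client.create_table(**on_demand) would then create an on-demand table
```

The result is a plain dictionary, so it can be inspected before anything is sent to AWS.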

In [46]:
import boto3

In [47]:
client = boto3.client("dynamodb", region_name="us-east-1")

Let's build a movie rating database!

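Note that every attribute listed in the attribute definitions has to be used in a key schema (of the table or of an index).
Non-key attributes such as the director or the year are not declared up front. We simply include them when we insert items later!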

In [48]:
table_name = "Movies"
attributes = [
    {
        "AttributeName": "Title",
        "AttributeType" : "S"  # String
    },

    {
        "AttributeName": "Rating",
        "AttributeType" : "N"  # Number
    },

]

key_schema = [
    {
        'AttributeName': 'Title',
        'KeyType': 'HASH'  # Hash Key for Primary Key
    },
    {
        'AttributeName': 'Rating',
        'KeyType': 'RANGE'  # Range key for sorting
    }
]

provisioned_throughput = {
    'ReadCapacityUnits': 5,
    'WriteCapacityUnits': 5
}
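
The rule that every defined attribute has to appear in a key schema can be checked locally before calling `create_table`. The following sketch uses a hypothetical helper, `check_attribute_definitions`, which is not part of boto3 and only compares the attribute names.

```python
def check_attribute_definitions(attributes, *key_schemas):
    """Compare attribute definitions with the names used in the key schemas.

    Returns the defined-but-unused names and the used-but-undefined names.
    Pass the table's key schema and, if present, the key schemas of indexes.
    """
    defined = {a["AttributeName"] for a in attributes}
    used = {k["AttributeName"] for schema in key_schemas for k in schema}
    return {
        "unused": sorted(defined - used),
        "undefined": sorted(used - defined),
    }


example_attributes = [
    {"AttributeName": "Title", "AttributeType": "S"},
    {"AttributeName": "Rating", "AttributeType": "N"},
]
example_key_schema = [
    {"AttributeName": "Title", "KeyType": "HASH"},
    {"AttributeName": "Rating", "KeyType": "RANGE"},
]
report = check_attribute_definitions(example_attributes, example_key_schema)
```

Both lists in `report` are empty here, which means the definitions and the key schema agree.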


Make sure that your IAM user or role has DynamoDB permissions.

In [49]:
response = client.create_table(
        TableName=table_name,
        AttributeDefinitions=attributes,
        KeySchema=key_schema,
        ProvisionedThroughput=provisioned_throughput
)

In [50]:
response

{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'Rating',
    'AttributeType': 'N'},
   {'AttributeName': 'Title', 'AttributeType': 'S'}],
  'TableName': 'Movies',
  'KeySchema': [{'AttributeName': 'Title', 'KeyType': 'HASH'},
   {'AttributeName': 'Rating', 'KeyType': 'RANGE'}],
  'TableStatus': 'CREATING',
  'CreationDateTime': datetime.datetime(2023, 8, 30, 10, 19, 40, 269000, tzinfo=tzlocal()),
  'ProvisionedThroughput': {'NumberOfDecreasesToday': 0,
   'ReadCapacityUnits': 5,
   'WriteCapacityUnits': 5},
  'TableSizeBytes': 0,
  'ItemCount': 0,
  'TableArn': 'arn:aws:dynamodb:us-east-1:472948420345:table/Movies',
  'TableId': '620571b1-3b27-4cd9-b40e-22fe30eb032c',
  'DeletionProtectionEnabled': False},
 'ResponseMetadata': {'RequestId': '2JHSKS68PTSA0O73E9HG1318SVVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:19:40 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length'

You can now head over to DynamoDB in the AWS console and check that the table was created.
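
The response above reports the status `CREATING`, so the table may not be usable right away. A minimal sketch for waiting, assuming you are happy to block until the table is ready: boto3's DynamoDB client offers a `table_exists` waiter, and the hypothetical wrapper `wait_for_table` simply calls it with a configurable delay and number of attempts.

```python
def wait_for_table(client, table_name, delay=5, max_attempts=24):
    """Block until the table is ready, using boto3's built-in waiter.

    delay is the pause in seconds between checks; the waiter raises an
    error if the table is not ready after max_attempts checks.
    """
    waiter = client.get_waiter("table_exists")
    waiter.wait(
        TableName=table_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
```

With the client from above, this would be called as `wait_for_table(client, "Movies")` right after `create_table`.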

## Data Insertion

You can use [put_item](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/put_item.html) to insert an item into the table (or to replace an existing item with the same key). It accepts
1) The TableName
2) An Item

Note that you need to pass the data type for each field as well!

Let's define a sample entry:

In [51]:
entry = {"Title": {"S": "The Matrix"},
         "Director": {"S": "Lana Wachowski"},
         "Year": {"N": "1999"},
         "Rating": {"N": "5"}}

In [53]:
client.put_item(TableName="Movies", Item=entry)

{'ResponseMetadata': {'RequestId': 'MJHEBN4BP7IE20F0NM4627AL2RVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:20:09 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'MJHEBN4BP7IE20F0NM4627AL2RVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}
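
Writing the type descriptors by hand gets tedious for larger items. The sketch below shows a hypothetical helper pair, `to_attribute_value` and `to_item`, that wraps plain Python values in the typed format used above. It is an illustration only and covers a handful of types; boto3 itself ships a more complete `TypeSerializer` in `boto3.dynamodb.types`. Floats are rejected on purpose, since numbers should be passed as `int` or `Decimal`.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a plain Python value in DynamoDB's typed format."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are transmitted as strings
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


def to_item(plain):
    """Convert a plain dict into an Item for put_item."""
    return {name: to_attribute_value(value) for name, value in plain.items()}


matrix_item = to_item({
    "Title": "The Matrix",
    "Director": "Lana Wachowski",
    "Year": 1999,
    "Rating": 5,
})
```

`matrix_item` has the same structure as the hand-written entry above and could be passed to `put_item` as `Item`.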

## Grab data

Note that the item is stored immediately. However, AWS only refreshes table statistics such as the item count approximately every six hours, so the table overview in the console probably won't reflect this item right now!
We can of course retrieve it using [client.get_item](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/get_item.html)(TableName, Key) (or click on *Inspect Table Elements* in the console).

Note that your key has to match the key schema, so you need to provide both the title and the rating.

In [54]:
item_key = {"Title": {"S": "The Matrix"},
            "Rating": {"N": "5"}
           }


In [55]:
response = client.get_item(TableName="Movies", Key=item_key)

In [56]:
response

{'Item': {'Title': {'S': 'The Matrix'},
  'Director': {'S': 'Lana Wachowski'},
  'Year': {'N': '1999'},
  'Rating': {'N': '5'}},
 'ResponseMetadata': {'RequestId': 'H4D8BGNQ9JKQKPJMC6BN3OCBF7VV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:20:11 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '110',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'H4D8BGNQ9JKQKPJMC6BN3OCBF7VV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '992882519'},
  'RetryAttempts': 0}}

In [57]:
response["Item"]

{'Title': {'S': 'The Matrix'},
 'Director': {'S': 'Lana Wachowski'},
 'Year': {'N': '1999'},
 'Rating': {'N': '5'}}
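
The typed format is awkward to work with in application code. As a rough sketch, the hypothetical helpers `from_attribute_value` and `from_item` below unwrap the most common type tags into plain Python values, with numbers becoming `Decimal`. boto3 provides a complete `TypeDeserializer` in `boto3.dynamodb.types`; this version only illustrates the idea.

```python
from decimal import Decimal


def from_attribute_value(attr):
    """Unwrap one typed value such as {'S': 'The Matrix'}."""
    (type_tag, value), = attr.items()  # a typed value has exactly one key
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        return Decimal(value)  # Decimal keeps the number exact
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attribute_value(v) for v in value]
    if type_tag == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    raise ValueError(f"Unsupported type tag: {type_tag}")


def from_item(item):
    """Convert a returned Item into a plain dict."""
    return {name: from_attribute_value(attr) for name, attr in item.items()}


plain_movie = from_item({
    "Title": {"S": "The Matrix"},
    "Director": {"S": "Lana Wachowski"},
    "Year": {"N": "1999"},
    "Rating": {"N": "5"},
})
```

In the notebook, `from_item(response["Item"])` would give the same plain dictionary.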

## Updates

We can alter an entry using the [update_item](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/update_item.html) function.
It accepts the TableName, the (primary) key of the element you want to update and the [UpdateExpression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html) ([Link to boto documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/update_item.html#:~:text=statistics%20are%20returned.-,UpdateExpression,-(string)%20%E2%80%93)).
Additionally, we need to pass the new value to **ExpressionAttributeValues**.
Let's change the Director attribute to include both directors.

In [58]:
key = {"Title": {"S": "The Matrix"}, "Rating": {"N": "5"}}
update = "set Director = :r"

client.update_item(TableName="Movies",
                   Key=key,
                   UpdateExpression=update,
                   ExpressionAttributeValues={':r': {'S': 'Lana Wachowski, Lilly Wachowski'}}
)

{'ResponseMetadata': {'RequestId': 'HQSL7VC00J7CS50Q3NK0P8DFJBVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:20:12 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'HQSL7VC00J7CS50Q3NK0P8DFJBVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}

In [59]:
response = client.get_item(TableName="Movies", Key=key)

In [60]:
response["Item"]["Director"]

{'S': 'Lana Wachowski, Lilly Wachowski'}

Note that you cannot change the key attributes with update_item! To change a key, delete the item and insert a new one.
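
When several attributes change at once, the expression and its placeholders can be generated. The sketch below uses a hypothetical helper, `build_update`, that builds a `SET` expression with `#n…` placeholders for names (passed via **ExpressionAttributeNames**) and `:v…` placeholders for values. Name placeholders are useful because attribute names that collide with DynamoDB's reserved words (`Year` appears to be one of them) cannot be written directly in an expression.

```python
def build_update(changes):
    """Build update_item arguments from {attribute name: typed value}."""
    names, values, assignments = {}, {}, []
    for i, (attribute, typed_value) in enumerate(changes.items()):
        name_key, value_key = f"#n{i}", f":v{i}"
        names[name_key] = attribute       # placeholder -> real attribute name
        values[value_key] = typed_value   # placeholder -> typed value
        assignments.append(f"{name_key} = {value_key}")
    return {
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


update_args = build_update({
    "Director": {"S": "Lana Wachowski, Lilly Wachowski"},
    "Year": {"N": "1999"},
})
# client.update_item(TableName="Movies", Key=key, **update_args)
```

The generated expression here is `SET #n0 = :v0, #n1 = :v1`; the key attributes must not be included in `changes`.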

## Deleting an Item

To delete an item we can use the [delete_item](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/delete_item.html) function.
It accepts the TableName and the Key

In [61]:
key = {"Title": {"S": "The Matrix"}, "Rating": {"N": "5"}}
client.delete_item(TableName="Movies", Key=key)

{'ResponseMetadata': {'RequestId': 'QIHO21VTAJ0TNLUNO8PL18E8HFVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:20:15 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'QIHO21VTAJ0TNLUNO8PL18E8HFVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}

In [62]:
response = client.get_item(TableName="Movies", Key=key)

In [63]:
response

{'ResponseMetadata': {'RequestId': 'KCI55DNM2FVF46TR8ILP8H9THVVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:20:15 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'KCI55DNM2FVF46TR8ILP8H9THVVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}

The response no longer contains an `Item` entry, so the item got deleted.

## Querying

Of course, one of the most important operations when handling databases is querying.
Before diving into this topic, let us first fill the table with some data.

To this end, we are going to use batch operations.

## Batch Operations

## Batch Write
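To write (or delete) multiple items in one call we can use [batch_write_item](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/batch_write_item.html). A single call can contain up to 25 put or delete requests.

It accepts RequestItems, a dictionary that maps each table name to a list of requests. Each request is either a *PutRequest* or a *DeleteRequest*.
Note that for put requests, you have to use **Item** as the dictionary key.
For delete requests, you have to use **Key**. The individual items look like the ones above. A put request replaces an existing item with the same key; batch_write_item cannot partially update an item.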

In [64]:
movies = [
    {"Title": "The Matrix",
     "Director": "Lana Wachowski",
     "Year": "1999",
     "Rating": "4.7"},

    {"Title": "The Matrix 2",
     "Director": "Lana Wachowski",
     "Year": "2003",
     "Rating": "4.6"},

    {"Title": "The Matrix 3",
     "Director": "Lana Wachowski",
     "Year": "2003",
     "Rating": "4.5"},

    {"Title": "Inception",
     "Director": "Christopher Nolan",
     "Year": "2010",
     "Rating": "4.6"},

    {"Title": "Saving Private Ryan",
     "Director": "Steven Spielberg",
     "Year": "1999",
     "Rating": "4.7"},
]

In [65]:
batch_request = []
for movie in movies:
    batch_request.append({
        'PutRequest': {
            'Item': {
                'Title': {'S': movie['Title']},
                'Rating': {'N': str(movie['Rating'])},
                'Director': {'S': movie['Director']},
                'Year': {'N': str(movie['Year'])}
            }
        }
    })


In [66]:
batch_request

[{'PutRequest': {'Item': {'Title': {'S': 'The Matrix'},
    'Rating': {'N': '4.7'},
    'Director': {'S': 'Lana Wachowski'},
    'Year': {'N': '1999'}}}},
 {'PutRequest': {'Item': {'Title': {'S': 'The Matrix 2'},
    'Rating': {'N': '4.6'},
    'Director': {'S': 'Lana Wachowski'},
    'Year': {'N': '2003'}}}},
 {'PutRequest': {'Item': {'Title': {'S': 'The Matrix 3'},
    'Rating': {'N': '4.5'},
    'Director': {'S': 'Lana Wachowski'},
    'Year': {'N': '2003'}}}},
 {'PutRequest': {'Item': {'Title': {'S': 'Inception'},
    'Rating': {'N': '4.6'},
    'Director': {'S': 'Christopher Nolan'},
    'Year': {'N': '2010'}}}},
 {'PutRequest': {'Item': {'Title': {'S': 'Saving Private Ryan'},
    'Rating': {'N': '4.7'},
    'Director': {'S': 'Steven Spielberg'},
    'Year': {'N': '1999'}}}}]

In [67]:
response = client.batch_write_item(
    RequestItems={
        "Movies": batch_request  # You can also update multiple tables at once
        #"Table2": batch_request2
    }
)


In [None]:
response
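
A batch write can succeed only partially: requests that were not processed come back under `UnprocessedItems` in the response. The following sketch, with a hypothetical helper `batch_write_all`, splits a longer request list into chunks of 25 and resends unprocessed requests with a growing pause. The retry count and delays are assumptions chosen for illustration.

```python
import time


def batch_write_all(client, table_name, requests,
                    batch_size=25, max_retries=5, base_delay=0.1):
    """Send requests in chunks and resend whatever is reported as unprocessed.

    Returns the requests that were still unprocessed after max_retries
    attempts per chunk (an empty list if everything was written).
    """
    leftover = []
    for start in range(0, len(requests), batch_size):
        pending = requests[start:start + batch_size]
        for attempt in range(max_retries):
            response = client.batch_write_item(RequestItems={table_name: pending})
            pending = response.get("UnprocessedItems", {}).get(table_name, [])
            if not pending:
                break
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying
        leftover.extend(pending)
    return leftover
```

With the variables from above, `batch_write_all(client, "Movies", batch_request)` would replace the single `batch_write_item` call.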

## Batch Get
We can use the [batch_get_item](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/batch_get_item.html) function to grab items based on their primary keys.
Its syntax is similar to the above. The only difference is that you pass a list of keys under the **Keys** entry of each table's request dictionary.

In [None]:
batch_request_2 = {"Keys": []}
for movie in movies:
    batch_request_2["Keys"].append({
            'Title': {'S': movie['Title']},
            'Rating': {'N': str(movie['Rating'])},
        }
    )


In [None]:
batch_request_2

In [None]:
client.batch_get_item(
    RequestItems={
        "Movies": batch_request_2
    }
)
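
The found items are returned under `Responses`, grouped by table name, and keys that could not be processed come back under `UnprocessedKeys`. Below is a sketch of a hypothetical helper, `batch_get_all`, that collects the items and asks again for unprocessed keys. It assumes chunks of at most 100 keys per call and does not preserve the order of the keys, since the service does not return items in any particular order.

```python
def batch_get_all(client, table_name, keys, batch_size=100, max_retries=5):
    """Fetch items for all keys, re-requesting keys reported as unprocessed."""
    items = []
    for start in range(0, len(keys), batch_size):
        pending = {table_name: {"Keys": keys[start:start + batch_size]}}
        for _ in range(max_retries):
            response = client.batch_get_item(RequestItems=pending)
            items.extend(response.get("Responses", {}).get(table_name, []))
            pending = response.get("UnprocessedKeys", {})
            if not pending:  # an empty dict means nothing is left to fetch
                break
    return items
```

In the notebook this could be called as `batch_get_all(client, "Movies", batch_request_2["Keys"])`.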

## Scanning
To obtain all items from a table we can use the [scan](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/scan.html) function, to which you simply pass the TableName.

Note that a single scan call returns at most 1 MB of data, so you might have to iterate (paginate) through the table.

In [None]:
items = []
response = client.scan(TableName="Movies")
items.extend(response["Items"])

# Paginate through the remaining results
while "LastEvaluatedKey" in response:
    response = client.scan(TableName="Movies", ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])


In [None]:
items

You can also filter these results using [FilterExpression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.FilterExpression). The filter is applied after the items have been read, so a filtered scan still reads the whole table.

Similar to updating items above, we need to use **ExpressionAttributeValues**.<br />
Let's filter all movies with a rating of 4.7 or higher.

The following comparison operators are available, among others (they are the same ones used in [key condition expressions](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.KeyConditionExpressions)):

| Expression             | Description                                                                                          |
|------------------------|------------------------------------------------------------------------------------------------------|
| `a = b`                | true if the attribute `a` is equal to the value `b`                                                  |
| `a < b`                | true if `a` is less than `b`                                                                         |
| `a <= b`               | true if `a` is less than or equal to `b`                                                             |
| `a > b`                | true if `a` is greater than `b`                                                                      |
| `a >= b`               | true if `a` is greater than or equal to `b`                                                          |
| `a BETWEEN b AND c`    | true if `a` is greater than or equal to `b`, and less than or equal to `c`                           |
| `begins_with (a, substr)` | true if the value of attribute `a` begins with a particular substring                             |


In [None]:
client.scan(TableName="Movies",
            FilterExpression="Rating >= :num",  # Rating has to be greater than or equal to num
            ExpressionAttributeValues={":num":{"N":"4.7"}}  # set num to 4.7
            )

In [None]:
client.scan(TableName="Movies",
            FilterExpression="begins_with(Title, :title)",  # Filter all titles that begin with title
            ExpressionAttributeValues={":title":{"S":"The"}}  # set title to "The"
            )
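
Filtered scans need pagination as well: because the filter runs after the read, a page can contain few or no items and still carry a `LastEvaluatedKey`. As a sketch, the hypothetical generator `scan_all` below forwards any scan arguments (including a filter) and follows the pages until none are left.

```python
def scan_all(client, **scan_kwargs):
    """Yield every item of a scan, following LastEvaluatedKey across pages."""
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments stay untouched
    while True:
        response = client.scan(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break  # no further pages
        kwargs["ExclusiveStartKey"] = last_key
```

For example, `list(scan_all(client, TableName="Movies", FilterExpression="Rating >= :num", ExpressionAttributeValues={":num": {"N": "4.7"}}))` would collect all matching movies.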

## Querying
Now it's time to query the database. Note that in DynamoDB a query always has to specify the partition key. If you want to search by other attributes, you need to use scan (or a secondary index, see below).

You should prefer query over scan whenever possible, as a query only reads the items that match the key condition, whereas a scan reads the entire table. [More Information](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-query-scan.html)

In [None]:
client.query(TableName="Movies",
             KeyConditionExpression='Title = :title',
            ExpressionAttributeValues={':title': {'S': "The Matrix"}}
)

We can now build queries using the partition key and the sort key. If you try to use another attribute (e.g. Year) in the KeyConditionExpression, the request fails with an exception, as only key attributes are allowed there.
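
To illustrate a condition on the sort key, the sketch below builds query arguments that combine an equality condition on the partition key (Title) with an optional lower bound on the sort key (Rating). The helper name `movies_query_kwargs` and the placeholder `:r` are made up for this example.

```python
def movies_query_kwargs(title, min_rating=None):
    """Build client.query arguments for the Movies table.

    The partition key is always matched exactly; if min_rating is given,
    the sort key is additionally restricted to Rating >= min_rating.
    """
    kwargs = {
        "TableName": "Movies",
        "KeyConditionExpression": "Title = :title",
        "ExpressionAttributeValues": {":title": {"S": title}},
    }
    if min_rating is not None:
        kwargs["KeyConditionExpression"] += " AND Rating >= :r"
        kwargs["ExpressionAttributeValues"][":r"] = {"N": str(min_rating)}
    return kwargs


matrix_query = movies_query_kwargs("The Matrix", min_rating="4.6")
# client.query(**matrix_query) would run the query against the table
```

Only one condition on the sort key is allowed per query, and the partition key condition must always be an equality.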

## Using Resources
The querying and scanning operations can be performed in a simpler way using **Resources**. <br />
First, we import **Key** and **Attr** from boto3.dynamodb.conditions.

The following table lists the available comparison / filtering operations. **Attr** supports all of them, while **Key** only supports `eq`, `lt`, `lte`, `gt`, `gte`, `begins_with` and `between`:

| Method       | Description                                             |
|--------------|---------------------------------------------------------|
| `begins_with`| Checks for a prefix.                                     |
| `between`    | Checks for a value between two specified values.         |
| `contains`   | Checks for a substring or an element in a list.          |
| `eq`         | Checks for equality.                                     |
| `exists`     | Checks if the attribute exists.                          |
| `gt`         | Checks if the value is greater than the specified value. |
| `gte`        | Checks if the value is greater than or equal to.         |
| `is_in`      | Checks if the value is in a list of values.              |
| `lt`         | Checks if the value is less than the specified value.    |
| `lte`        | Checks if the value is less than or equal to.            |
| `ne`         | Checks for inequality.                                   |
| `not_exists` | Checks if the attribute does not exist.                  |
| `size`       | Returns the size of the attribute.                       |


In [43]:
from boto3.dynamodb.conditions import Key, Attr


In [44]:
dynamodb = boto3.resource('dynamodb', region_name="us-east-1")
table = dynamodb.Table('Movies')


In [None]:
dir(Attr)  # List of all attributes including the comparison operators

In [None]:
table.scan(FilterExpression=Attr('Year').gte(2003))  # Year greater than or equal to 2003

In [None]:
table.scan(FilterExpression=Attr('Year').gte(2003) & Attr("Rating").gt(4.5))  # Year >= 2003 and Rating > 4.5 (raises an error because of the float)

Oh! By default, float types are not supported. However, we can convert them to decimals first!

In [None]:
from decimal import Decimal
table.scan(FilterExpression=Attr('Year').gte(2003) & Attr("Rating").gt(Decimal("4.5")))  # Year >= 2003 and Rating > 4.5
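
When converting floats, it is safer to go through the string representation. A small sketch with a hypothetical helper `to_decimal` shows why: `Decimal(4.7)` captures the binary approximation of the float, whereas `Decimal(str(4.7))` yields exactly 4.7. Values like 4.5 happen to be exactly representable, which is why the difference does not always show up.

```python
from decimal import Decimal


def to_decimal(number):
    """Convert an int or float to Decimal via its string form."""
    return Decimal(str(number))


exact = to_decimal(4.7)    # equals Decimal("4.7")
inexact = Decimal(4.7)     # carries the float's binary rounding error
same_for_half = Decimal(4.5) == Decimal("4.5")  # 4.5 is exact in binary
```

boto3 is likely to reject a value such as `inexact` because it cannot be stored without rounding, so building decimals from strings avoids surprises.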

## Helper functions
There are some very helpful functions to obtain information about your database!

In [None]:
table.key_schema

In [None]:
client.describe_table(TableName="Movies")

## Table Deletion
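To delete a table you can simply run table.delete() or client.delete_table(TableName="Movies"). Note that the next section works with the Movies table again, so only delete it once you are done (or recreate and refill it as shown above).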

In [45]:
table.delete()

{'TableDescription': {'TableName': 'Movies',
  'TableStatus': 'DELETING',
  'ProvisionedThroughput': {'NumberOfDecreasesToday': 0,
   'ReadCapacityUnits': 5,
   'WriteCapacityUnits': 5},
  'TableSizeBytes': 0,
  'ItemCount': 0,
  'TableArn': 'arn:aws:dynamodb:us-east-1:472948420345:table/Movies',
  'TableId': '0a67e98d-6723-4517-b3a1-4d096202acef',
  'DeletionProtectionEnabled': False},
 'ResponseMetadata': {'RequestId': 'JONLVU259SAFATA69H4699SUS7VV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:19:33 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '348',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'JONLVU259SAFATA69H4699SUS7VV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '1285755675'},
  'RetryAttempts': 0}}

## Global Secondary Index

To query by attributes other than the table's key, we can use a [Global Secondary Index](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.scenario) ([Doc](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/create_table.html#:~:text=(string)%20%E2%80%93-,GlobalSecondaryIndexes,-(list)%20%E2%80%93)).
You can either add the index to an existing table or create a new table with it ([Link](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.OnlineOps.html#GSI.OnlineOps.Creating)).

Let's add the secondary index to this table:

1) Add the attribute to the attributes list we initially defined
2) Define the key schema for the Global Secondary Index
3) Define the provisioned throughput for the GSI
4) Define the Projection which describes the attributes that are copied from the table to the index. Usually you want to copy all of the table's attributes.
5) Use client.update_table(TableName, AttributeDefinitions, GlobalSecondaryIndexUpdates) to add the secondary index


In [68]:
attributes.append({"AttributeName": "Director", "AttributeType" : "S"})

In [69]:
attributes

[{'AttributeName': 'Title', 'AttributeType': 'S'},
 {'AttributeName': 'Rating', 'AttributeType': 'N'},
 {'AttributeName': 'Director', 'AttributeType': 'S'}]

In [70]:
gsi_key_schema = [
    {
        'AttributeName': 'Director',
        'KeyType': 'HASH'
    }
]

gsi_provisioned_throughput = {
    'ReadCapacityUnits': 5,
    'WriteCapacityUnits': 5
}

response = client.update_table(
    TableName="Movies",
    AttributeDefinitions=attributes,
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {  # Create secondary Index
                'IndexName': "idx1",  # The name of the global secondary index. Must be unique only for this table.
                'KeySchema': gsi_key_schema,
                'Projection': {
                    'ProjectionType': 'ALL'  # Project all information
                },
                'ProvisionedThroughput': gsi_provisioned_throughput
            }
        }
    ]
)

response

{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'Director',
    'AttributeType': 'S'},
   {'AttributeName': 'Rating', 'AttributeType': 'N'},
   {'AttributeName': 'Title', 'AttributeType': 'S'}],
  'TableName': 'Movies',
  'KeySchema': [{'AttributeName': 'Title', 'KeyType': 'HASH'},
   {'AttributeName': 'Rating', 'KeyType': 'RANGE'}],
  'TableStatus': 'UPDATING',
  'CreationDateTime': datetime.datetime(2023, 8, 30, 10, 19, 40, 269000, tzinfo=tzlocal()),
  'ProvisionedThroughput': {'NumberOfDecreasesToday': 0,
   'ReadCapacityUnits': 5,
   'WriteCapacityUnits': 5},
  'TableSizeBytes': 0,
  'ItemCount': 0,
  'TableArn': 'arn:aws:dynamodb:us-east-1:472948420345:table/Movies',
  'TableId': '620571b1-3b27-4cd9-b40e-22fe30eb032c',
  'GlobalSecondaryIndexes': [{'IndexName': 'idx1',
    'KeySchema': [{'AttributeName': 'Director', 'KeyType': 'HASH'}],
    'Projection': {'ProjectionType': 'ALL'},
    'IndexStatus': 'CREATING',
    'Backfilling': False,
    'ProvisionedThroughput'

Note that creating the index might take some time in the background. If you query the index while it is still being built, you will receive the following exception:

ClientError: An error occurred (ValidationException) when calling the Query operation: Cannot read from backfilling global secondary index: idx1
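
One way to avoid this error is to poll the index status before querying. The sketch below uses a hypothetical helper, `wait_until_index_active`, which reads `IndexStatus` from `describe_table` until the index reports `ACTIVE`. The delay and the number of attempts are arbitrary assumptions.

```python
import time


def wait_until_index_active(client, table_name, index_name,
                            delay=5.0, max_attempts=60):
    """Poll describe_table until the given index reports ACTIVE.

    Returns True once the index is ACTIVE and False if it is not ACTIVE
    (or not found) after max_attempts checks.
    """
    for _ in range(max_attempts):
        table = client.describe_table(TableName=table_name)["Table"]
        indexes = table.get("GlobalSecondaryIndexes", [])
        status = next(
            (i["IndexStatus"] for i in indexes if i["IndexName"] == index_name),
            None,
        )
        if status == "ACTIVE":
            return True
        time.sleep(delay)  # wait before checking again
    return False
```

In the notebook, `wait_until_index_active(client, "Movies", "idx1")` could be run before the query below.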

You can now answer the question: *What movies were directed by Lana Wachowski?* without using the scan operation

In [79]:
client.query(TableName="Movies",
             KeyConditionExpression='Director = :d',
             IndexName='idx1',
            ExpressionAttributeValues={':d': {'S': "Lana Wachowski"}}
)

{'Items': [{'Title': {'S': 'The Matrix 2'},
   'Director': {'S': 'Lana Wachowski'},
   'Year': {'N': '2003'},
   'Rating': {'N': '4.6'}},
  {'Title': {'S': 'The Matrix 3'},
   'Director': {'S': 'Lana Wachowski'},
   'Year': {'N': '2003'},
   'Rating': {'N': '4.5'}},
  {'Title': {'S': 'The Matrix'},
   'Director': {'S': 'Lana Wachowski'},
   'Year': {'N': '1999'},
   'Rating': {'N': '4.7'}}],
 'Count': 3,
 'ScannedCount': 3,
 'ResponseMetadata': {'RequestId': 'UGQLPU4EBS5BKHUTT51MO9UPUVVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:26:53 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '354',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'UGQLPU4EBS5BKHUTT51MO9UPUVVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '750746483'},
  'RetryAttempts': 0}}

We just saw that a secondary index lets us query by attributes other than the table's primary key!
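
The index above has only a partition key and projects all attributes. As a sketch of the other options, the hypothetical helper `gsi_definition` below builds an index definition with an optional sort key and a choice of projection: `ALL`, `INCLUDE` with a list of non-key attributes, or `KEYS_ONLY`. Any attribute used as an index key (for example Year as a sort key) would also have to be added to the attribute definitions.

```python
def gsi_definition(index_name, partition_key, sort_key=None,
                   include=None, read_units=5, write_units=5):
    """Build one global secondary index definition.

    include=None projects all attributes, a non-empty list projects only
    those non-key attributes, and an empty list projects the keys only.
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})

    if include is None:
        projection = {"ProjectionType": "ALL"}
    elif include:
        projection = {"ProjectionType": "INCLUDE",
                      "NonKeyAttributes": list(include)}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}

    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": projection,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


by_director_and_year = gsi_definition(
    "idx2", "Director", sort_key="Year", include=["Title"]
)
```

The returned dictionary has the same shape as the `Create` entry used in `update_table` above.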

### Create new table with secondary index

To learn how to directly create a table with a secondary index, let's quickly delete our Movies table.

In [80]:
client.delete_table(TableName="Movies")

{'TableDescription': {'TableName': 'Movies',
  'TableStatus': 'DELETING',
  'ProvisionedThroughput': {'NumberOfDecreasesToday': 0,
   'ReadCapacityUnits': 5,
   'WriteCapacityUnits': 5},
  'TableSizeBytes': 0,
  'ItemCount': 0,
  'TableArn': 'arn:aws:dynamodb:us-east-1:472948420345:table/Movies',
  'TableId': '620571b1-3b27-4cd9-b40e-22fe30eb032c',
  'DeletionProtectionEnabled': False},
 'ResponseMetadata': {'RequestId': '0EKEH7A20MLJF8I6SEI8892ODBVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:27:14 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '348',
   'connection': 'keep-alive',
   'x-amzn-requestid': '0EKEH7A20MLJF8I6SEI8892ODBVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2901361696'},
  'RetryAttempts': 0}}

We can start using the same attributes as before and add the Director to the attributes.


In [81]:
table_name = "Movies"
attributes = [
    {
        "AttributeName": "Title",
        "AttributeType" : "S"  # String
    },

    {
        "AttributeName": "Rating",
        "AttributeType" : "N"  # Number
    },
    
    {
        "AttributeName": "Director",  # Note that this is not part of the key schema
        "AttributeType" : "S"  # String
    },


]

key_schema = [
    {
        'AttributeName': 'Title',
        'KeyType': 'HASH'  # Hash Key for Primary Key
    },
    {
        'AttributeName': 'Rating',
        'KeyType': 'RANGE'  # Range key for sorting
    }
]

provisioned_throughput = {
    'ReadCapacityUnits': 5,
    'WriteCapacityUnits': 5
}


When creating the table, you need to pass the **GlobalSecondaryIndexes** argument, a list of index definitions that each contain the *IndexName*, the *KeySchema*, the *Projection* and the *ProvisionedThroughput*.
*KeySchema* and *ProvisionedThroughput* have the same structure as above.

Let's create the table with the secondary index for the Director 

In [82]:
response = client.create_table(
    TableName=table_name,
    AttributeDefinitions=attributes,
    KeySchema=key_schema,
    ProvisionedThroughput=provisioned_throughput,
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'idx1',  # The name of the global secondary index. Must be unique only for this table.
            'KeySchema': [
                {
                    'AttributeName': 'Director',
                    'KeyType': 'HASH'
                }
            ],
            'Projection': {
                'ProjectionType': 'ALL'  # Project all information
            },
            'ProvisionedThroughput': {
                'ReadCapacityUnits': 10,
                'WriteCapacityUnits': 10
            }
        }
    ],
)

Let's add the data from above once more

In [87]:
movies = [
    {"Title": "The Matrix",
     "Director": "Lana Wachowski",
     "Year": "1999",
     "Rating": "4.7"},

    {"Title": "The Matrix 2",
     "Director": "Lana Wachowski",
     "Year": "2003",
     "Rating": "4.6"},

    {"Title": "The Matrix 3",
     "Director": "Lana Wachowski",
     "Year": "2003",
     "Rating": "4.5"},

    {"Title": "Inception",
     "Director": "Christopher Nolan",
     "Year": "2010",
     "Rating": "4.6"},

    {"Title": "Saving Private Ryan",
     "Director": "Steven Spielberg",
     "Year": "1999",
     "Rating": "4.7"},
]

batch_request = []
for movie in movies:
    batch_request.append({
        'PutRequest': {
            'Item': {
                'Title': {'S': movie['Title']},
                'Rating': {'N': str(movie['Rating'])},
                'Director': {'S': movie['Director']},
                'Year': {'N': str(movie['Year'])}
            }
        }
    })


response = client.batch_write_item(
    RequestItems={
        "Movies": batch_request
    }
)


In [89]:
client.query(TableName="Movies",
             KeyConditionExpression='Director = :d',
             IndexName='idx1',
            ExpressionAttributeValues={':d': {'S': "Lana Wachowski"}}
)

{'Items': [{'Title': {'S': 'The Matrix 2'},
   'Director': {'S': 'Lana Wachowski'},
   'Year': {'N': '2003'},
   'Rating': {'N': '4.6'}},
  {'Title': {'S': 'The Matrix 3'},
   'Director': {'S': 'Lana Wachowski'},
   'Year': {'N': '2003'},
   'Rating': {'N': '4.5'}},
  {'Title': {'S': 'The Matrix'},
   'Director': {'S': 'Lana Wachowski'},
   'Year': {'N': '1999'},
   'Rating': {'N': '4.7'}}],
 'Count': 3,
 'ScannedCount': 3,
 'ResponseMetadata': {'RequestId': 'RB3U2OTUF6NSULETJQHT43SK6VVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Wed, 30 Aug 2023 08:28:09 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '354',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'RB3U2OTUF6NSULETJQHT43SK6VVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '750746483'},
  'RetryAttempts': 0}}