![https://pieriantraining.com/](../PTCenteredPurple.png)


## Capstone Project: AWS Media Library Management System

In this capstone project, we integrate AWS services with Python through Boto3. The goal is to build a system that lets users upload media files (images, videos, and audio), which are then processed, cataloged, and stored securely on AWS.

This project offers the opportunity to apply your knowledge of AWS and Boto3 in a practical scenario.


### Components:

1. S3 Bucket - To store the uploaded media files.
2. DynamoDB - To store the metadata of each media file.
3. Lambda Functions - To process media files after upload.
4. Searching - To obtain stored results and download them.


## Tasks

### 1. Infrastructure
1. Create an S3 bucket for storing media files.
2. Set up a DynamoDB table to store the metadata of the media files. Optionally, create a secondary index on the file type.
3. Configure and verify the IAM roles and permissions needed for Lambda, S3, and DynamoDB.

In [1]:
### Create S3 Bucket ###
import boto3

s3_client = boto3.client('s3', region_name="us-east-1")
s3_client.create_bucket(Bucket='media-library-bucket-capstone')


{'ResponseMetadata': {'RequestId': 'X5S1XF7P787EZYGT',
  'HostId': 'gTpirE87Stxyyi8ojrqd++ds5fHzBJTr4zl9Tvk6BdXH4zibmLw3/89SxZFw+a4Wtq6HRWJYU1U=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'gTpirE87Stxyyi8ojrqd++ds5fHzBJTr4zl9Tvk6BdXH4zibmLw3/89SxZFw+a4Wtq6HRWJYU1U=',
   'x-amz-request-id': 'X5S1XF7P787EZYGT',
   'date': 'Mon, 11 Sep 2023 07:46:26 GMT',
   'location': '/media-library-bucket-capstone',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'Location': '/media-library-bucket-capstone'}
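The call above needs no location settings because the bucket is created in us-east-1. In any other region, S3 expects a CreateBucketConfiguration with a matching LocationConstraint. A minimal sketch, assuming a hypothetical helper named bucket_creation_kwargs (it is not part of Boto3):

```python
def bucket_creation_kwargs(bucket_name, region="us-east-1"):
    """Build keyword arguments for s3_client.create_bucket (hypothetical helper)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage, assuming s3_client was created for the same region:
# s3_client.create_bucket(**bucket_creation_kwargs("media-library-bucket-capstone", "eu-west-1"))
print(bucket_creation_kwargs("media-library-bucket-capstone", "eu-west-1"))
```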

2. For the DynamoDB table, we create a global secondary index on the filetype attribute so that we can query by file type without scanning the whole table.

In [2]:
### Basic Table Definition ###
table_name = "MediaMetadata"
attributes = [
    {
        "AttributeName": "id",
        "AttributeType": "S"  # String
    },
    {
        "AttributeName": "filetype",
        "AttributeType": "S"  # String
    },
    {
        "AttributeName": "size",
        "AttributeType": "N"  # Number
    },
]

key_schema = [
    {
        'AttributeName': 'id',
        'KeyType': 'HASH'  # Hash Key for Primary Key
    },
    {
        'AttributeName': 'size',
        'KeyType': 'RANGE'  # Range key for sorting
    }
]

provisioned_throughput = {
    'ReadCapacityUnits': 5,
    'WriteCapacityUnits': 5
}


In [4]:
dynamo_client = boto3.client('dynamodb', region_name="us-east-1")

response = dynamo_client.create_table(
        TableName=table_name,
        AttributeDefinitions=attributes,
        KeySchema=key_schema,
        ProvisionedThroughput=provisioned_throughput,
        GlobalSecondaryIndexes=[
        {
            'IndexName': 'idx1',
            'KeySchema': [
               {
                  'AttributeName': 'filetype',
                  'KeyType': 'HASH'
               }
             ],
             'Projection': {
               'ProjectionType': 'ALL'
             },
             'ProvisionedThroughput': {
                  'ReadCapacityUnits': 5,
                  'WriteCapacityUnits': 5
             }
        }
    ],

)
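create_table returns while the table is still being created, so writes issued right away can fail. A sketch of how one might wait for the table, assuming a hypothetical helper wait_for_table that receives the Boto3 DynamoDB client as an argument:

```python
def wait_for_table(client, table_name):
    """Block until the table exists, then return its status (hypothetical helper)."""
    # 'table_exists' is a built-in Boto3 waiter that polls describe_table
    client.get_waiter("table_exists").wait(TableName=table_name)
    description = client.describe_table(TableName=table_name)
    return description["Table"]["TableStatus"]


# Usage with the client from the cell above:
# wait_for_table(dynamo_client, "MediaMetadata")
```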

### 2. Media Upload

1. Create a Python function to upload media files to the S3 bucket.


In [3]:
from pathlib import Path
def upload_to_s3(local_file_path, bucket_name):
    key = Path(local_file_path).name
    s3_client.upload_file(Filename=local_file_path, Bucket=bucket_name, Key=key)
    return f"{bucket_name}/{key}"
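By default, the upload does not tell S3 what kind of media it received. A possible variant sets the Content-Type from the file extension through the ExtraArgs parameter of upload_file. This is a sketch, assuming a hypothetical helper upload_with_content_type that takes the S3 client as an argument:

```python
import mimetypes
from pathlib import Path


def upload_with_content_type(client, local_file_path, bucket_name):
    """Upload a file and set its Content-Type from the extension (hypothetical helper)."""
    path = Path(local_file_path)
    # guess_type returns (None, None) when the extension is unknown
    content_type, _ = mimetypes.guess_type(path.name)
    extra_args = {"ContentType": content_type or "application/octet-stream"}
    client.upload_file(
        Filename=str(path), Bucket=bucket_name, Key=path.name, ExtraArgs=extra_args
    )
    return f"{bucket_name}/{path.name}"


# Usage with the client from the first cell:
# upload_with_content_type(s3_client, "data/cat.jpeg", "media-library-bucket-capstone")
```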


### 3. Processing Media
1. Create a Lambda function that extracts the metadata from the media (filetype, size) and saves it to the DynamoDB table.
2. This function should be triggered once a file is uploaded.


In [6]:
### Lambda code. Similar to lambda.py. Do not run this code here, as it will raise an exception ###

from pathlib import Path
from urllib.parse import unquote_plus

import boto3

dynamodb = boto3.client('dynamodb')

### Extract metadata from the file ###
def extract_metadata(event):
    # Only the first record of the S3 event is processed
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # Object keys arrive URL-encoded in S3 events (e.g. spaces become '+')
    key = unquote_plus(record['object']['key'])
    size = record['object']['size']  # Size in bytes
    file_type = Path(key).suffix[1:]
    if not file_type:
        file_type = "None"
    return bucket, key, file_type, size

### Add metadata to database. Use file identifier as id ###
def add_to_database(bucket, key, file_type, size):
    item_id = f"{bucket}/{key}"
    response = dynamodb.put_item(
        TableName='MediaMetadata',  # Change to your table
        Item={
            'id': {'S': item_id},
            'filetype': {'S': file_type},
            'size': {'N': str(size)}  # Numbers are sent to DynamoDB as strings
        }
    )
    print(f"Data added to DynamoDB: {response}")

### Lambda handler routine ###
def lambda_handler(event, context):
    # Extract bucket, file key, file type and size from the S3 event
    bucket, key, file_type, size = extract_metadata(event)
    size_kb = size / 1024  # Store the size in KiB
    print(file_type, size_kb)
    add_to_database(bucket, key, file_type, size_kb)


NoRegionError: You must specify a region.
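The parsing logic can be checked locally, without deploying, by feeding it a hand-written event. The dictionary below is a trimmed, hypothetical S3 notification that contains only the fields the handler reads; real events carry many more. The helper parse_record is an illustrative stand-in for the metadata extraction and decodes the key, since S3 URL-encodes object keys in event notifications.

```python
from pathlib import Path
from urllib.parse import unquote_plus

# Trimmed, hand-written event; the values are made up for illustration
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "media-library-bucket-capstone"},
                "object": {"key": "holiday+photo.jpeg", "size": 2048},
            }
        }
    ]
}


def parse_record(event):
    """Return (bucket, key, file_type, size_in_bytes) for the first record."""
    s3_info = event["Records"][0]["s3"]
    key = unquote_plus(s3_info["object"]["key"])  # '+' decodes to a space
    file_type = Path(key).suffix[1:] or "None"
    return s3_info["bucket"]["name"], key, file_type, s3_info["object"]["size"]


print(parse_record(sample_event))
```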

In [7]:
import json
# Create an IAM role for the Lambda trigger. It needs S3 read access as well as DynamoDB PutItem access
iam_client = boto3.client('iam', region_name="us-east-1")

# Define the Lambda execution role policy
lambda_execution_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [  # Necessary to interact with s3 objects
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem"  # Necessary to update dynamodb table
            ],
            "Resource": "arn:aws:dynamodb:*:*:table/MediaMetadata"  # Replace with your tablename
        }
    ]
}


role_name = 'LambdaMetaDataTrigger'
role_description = 'Role for Trigger Lambda'
role_response = iam_client.create_role(
    RoleName=role_name,
    Description=role_description,
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "lambda.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    })
)

policy_name = 'LambdaTriggerPolicy'
iam_client.put_role_policy(
    RoleName=role_name,
    PolicyName=policy_name,
    PolicyDocument=json.dumps(lambda_execution_policy)
)

# Get the ARN of the created role
role_arn = role_response['Role']['Arn']


In [8]:
with open("lambda.py", "r") as f:
    function_code = f.read()

In [10]:
### Create Lambda function ###
function_name = "metadata"

import io
import zipfile

lambda_client = boto3.client('lambda', region_name='us-east-1')

with io.BytesIO() as deployment_package:
    with zipfile.ZipFile(deployment_package, 'w') as zipf:
        zipf.writestr('lambda_function.py', function_code)

    create_function_response = lambda_client.create_function(
       FunctionName=function_name,
       Runtime="python3.8",  # Replace with a Python runtime that Lambda currently supports, if needed
       Role=role_arn,
       Handler="lambda_function.lambda_handler",
       Code={
           'ZipFile': deployment_package.getvalue()
       }
    )
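The in-memory archive can be inspected before it is sent to Lambda. A small sketch, assuming a hypothetical helper build_deployment_package; the file name inside the archive has to match the module part of the Handler setting (lambda_function in this project).

```python
import io
import zipfile


def build_deployment_package(source_code, filename="lambda_function.py"):
    """Zip a single source file in memory and return the archive bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zipf:
        zipf.writestr(filename, source_code)
    # The archive is complete only after the ZipFile context has closed
    return buffer.getvalue()


package = build_deployment_package("def lambda_handler(event, context):\n    return 'ok'\n")
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(archive.namelist())
```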


In [11]:
### Resource-based permission: allow S3 to invoke the function ###

bucket_arn = "arn:aws:s3:::media-library-bucket-capstone"  # Change media-library-bucket-capstone to your bucket name
lambda_client.add_permission(
    FunctionName=function_name,
    StatementId='metadata_trigger',  # Unique statement ID
    Action='lambda:InvokeFunction',  # Allow invoking the function
    Principal='s3.amazonaws.com',  # The S3 service is the caller
    SourceArn=bucket_arn,  # Only events from this bucket
)


{'ResponseMetadata': {'RequestId': '1219c64b-fea9-4623-968a-9d51ab886f6d',
  'HTTPStatusCode': 201,
  'HTTPHeaders': {'date': 'Mon, 11 Sep 2023 06:37:29 GMT',
   'content-type': 'application/json',
   'content-length': '321',
   'connection': 'keep-alive',
   'x-amzn-requestid': '1219c64b-fea9-4623-968a-9d51ab886f6d'},
  'RetryAttempts': 0},
 'Statement': '{"Sid":"metadata_trigger","Effect":"Allow","Principal":{"Service":"s3.amazonaws.com"},"Action":"lambda:InvokeFunction","Resource":"arn:aws:lambda:us-east-1:472948420345:function:metadata","Condition":{"ArnLike":{"AWS:SourceArn":"arn:aws:s3:::media-library-bucket-capstone"}}}'}

In [12]:
### Define the event configuration ###
event_configuration = {
    'LambdaFunctionConfigurations': [
        {
            'LambdaFunctionArn': create_function_response["FunctionArn"],
            'Events': ['s3:ObjectCreated:*'],
        }
    ]
}

# Configure the S3 event trigger
s3_client.put_bucket_notification_configuration(
    Bucket="media-library-bucket-capstone",
    NotificationConfiguration=event_configuration
)



{'ResponseMetadata': {'RequestId': 'G1EQ2YNN5GXJEH2J',
  'HostId': 'KFjSTmF0hdvCr5uyvmmNQ6C5suRAp9fVNbLbfyIyRz9SkAbXSzR538g4k6UcAlups8XftSJuUyo=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'KFjSTmF0hdvCr5uyvmmNQ6C5suRAp9fVNbLbfyIyRz9SkAbXSzR538g4k6UcAlups8XftSJuUyo=',
   'x-amz-request-id': 'G1EQ2YNN5GXJEH2J',
   'date': 'Mon, 11 Sep 2023 06:37:31 GMT',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0}}
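The configuration above fires for every new object in the bucket. S3 notifications also accept key filters, so the trigger can be limited to certain suffixes. A sketch, assuming a hypothetical helper build_notification_config and a placeholder function ARN:

```python
def build_notification_config(function_arn, suffixes):
    """Create one Lambda configuration per key suffix (hypothetical helper)."""
    configurations = []
    for suffix in suffixes:
        configurations.append({
            "LambdaFunctionArn": function_arn,
            "Events": ["s3:ObjectCreated:*"],
            # Only keys ending in this suffix trigger the function
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}},
        })
    return {"LambdaFunctionConfigurations": configurations}


# Placeholder ARN for illustration; use create_function_response["FunctionArn"] in practice
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:metadata", [".jpeg", ".mp4"]
)
```

Keep in mind that put_bucket_notification_configuration replaces the bucket's entire notification configuration, so the dictionary passed in should contain every trigger that is meant to stay active.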

### 4. Upload Data
Let's upload some data.

In [14]:
for data in Path("data/").glob("*"):
    if data.is_file():  # Skip sub-directories
        upload_to_s3(str(data), "media-library-bucket-capstone")

Feel free to head over to your DynamoDB table and inspect the items.

### 5. Search and Retrieve Media:

1. Create a Python function using Boto3 that allows users to search the DynamoDB table by file type and minimum size.
2. Allow users to download the files from the S3 bucket using a pre-signed URL.

In [5]:
### Example query ###
dynamo_client.query(TableName="MediaMetadata",
             KeyConditionExpression='filetype = :f',
             IndexName='idx1',
            ExpressionAttributeValues={':f': {'S': "jpeg"}})

{'Items': [{'size': {'N': '79.4306640625'},
   'filetype': {'S': 'jpeg'},
   'id': {'S': 'media-library-bucket-capstone/beer.jpeg'}},
  {'size': {'N': '307.1044921875'},
   'filetype': {'S': 'jpeg'},
   'id': {'S': 'media-library-bucket-capstone/cat.jpeg'}},
  {'size': {'N': '422.533203125'},
   'filetype': {'S': 'jpeg'},
   'id': {'S': 'media-library-bucket-capstone/salk.jpeg'}}],
 'Count': 3,
 'ScannedCount': 3,
 'ResponseMetadata': {'RequestId': '5NEBG9CITNJNP0JBMKSQ015OL7VV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Mon, 11 Sep 2023 07:47:16 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '362',
   'connection': 'keep-alive',
   'x-amzn-requestid': '5NEBG9CITNJNP0JBMKSQ015OL7VV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2090695012'},
  'RetryAttempts': 0}}

In [42]:
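def search_dynamodb(file_type=None, size=0):
    # "size" clashes with a DynamoDB expression keyword, so refer to it through the alias #s
    names = {'#s': 'size'}
    if file_type:
        # Query the secondary index by file type, then filter on the minimum size
        response = dynamo_client.query(
            TableName="MediaMetadata",
            IndexName='idx1',
            KeyConditionExpression='filetype = :f',
            FilterExpression='#s >= :s',
            ExpressionAttributeNames=names,
            ExpressionAttributeValues={':f': {'S': file_type}, ':s': {'N': str(size)}}
        )
    else:
        # Without a file type, fall back to a full table scan
        response = dynamo_client.scan(
            TableName="MediaMetadata",
            FilterExpression='#s >= :s',
            ExpressionAttributeNames=names,
            ExpressionAttributeValues={':s': {'N': str(size)}},
        )

    return response['Items']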
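A single query or scan call returns a limited amount of data per response. When more matches exist, the response carries a LastEvaluatedKey that has to be passed back as ExclusiveStartKey to fetch the next page. A sketch of following those pages, assuming a hypothetical helper query_all_pages that takes the DynamoDB client as an argument:

```python
def query_all_pages(client, **query_kwargs):
    """Collect the items of every page of a DynamoDB query (hypothetical helper)."""
    items = []
    while True:
        response = client.query(**query_kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume where the previous page stopped
        query_kwargs["ExclusiveStartKey"] = last_key


# Usage with the client and index from this notebook:
# query_all_pages(
#     dynamo_client,
#     TableName="MediaMetadata",
#     IndexName="idx1",
#     KeyConditionExpression="filetype = :f",
#     ExpressionAttributeValues={":f": {"S": "jpeg"}},
# )
```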


In [66]:
def generate_presigned_urls(query_response_list, expiration=300):
    item_url_dict = {}
    for item in query_response_list:
        item_id = item["id"]["S"]
        # Split only on the first "/" so that keys containing "/" stay intact
        bucket, key = item_id.split("/", 1)
        item_url_dict[item_id] = s3_client.generate_presigned_url(
            'get_object',
            Params={'Bucket': bucket, 'Key': key},
            ExpiresIn=expiration  # Validity of the URL in seconds
        )
    return item_url_dict

In [67]:
generate_presigned_urls(search_dynamodb(file_type="jpeg"))

{'media-library-bucket-capstone/beer.jpeg': 'https://media-library-bucket-capstone.s3.amazonaws.com/beer.jpeg?AWSAccessKeyId=AKIAW4HPMF34XSD4CG63&Signature=KGefbKl4%2B1urqN8b4Yw3rByaldw%3D&Expires=1694420042',
 'media-library-bucket-capstone/cat.jpeg': 'https://media-library-bucket-capstone.s3.amazonaws.com/cat.jpeg?AWSAccessKeyId=AKIAW4HPMF34XSD4CG63&Signature=y2sT6hhPWFTOh9A4j6wVlb%2BopsI%3D&Expires=1694420042',
 'media-library-bucket-capstone/salk.jpeg': 'https://media-library-bucket-capstone.s3.amazonaws.com/salk.jpeg?AWSAccessKeyId=AKIAW4HPMF34XSD4CG63&Signature=mZkdUDS%2BBbiQf83Y%2BgXqvKr5WdI%3D&Expires=1694420042'}
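A pre-signed URL can be fetched with any HTTP client, without AWS credentials, until it expires. A sketch using only the standard library, assuming a hypothetical helper download_from_url:

```python
import shutil
import urllib.request
from pathlib import Path


def download_from_url(url, destination):
    """Stream the content behind a URL into a local file (hypothetical helper)."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, destination.open("wb") as target:
        shutil.copyfileobj(response, target)
    return destination


# Usage with the dictionary returned by generate_presigned_urls:
# for item_id, url in urls.items():
#     download_from_url(url, Path(item_id).name)
```

An expired URL makes urlopen raise an HTTPError, so callers may want to regenerate the URL and retry in that case.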