# Generating Personalized Emails with Amazon Bedrock and Amazon Personalize


In this notebook, we prepare the data for a product recommendation engine that provides personalized recommendations to users based on their past ratings and reviews. We use pandas to filter the dataset, then export the results to DynamoDB and S3 for later processing.

In [None]:
import time
from time import sleep
import json
from datetime import datetime
import boto3
import pandas as pd
import os
from io import StringIO

***Retrieve Bucket name*** 

Set stack_name to the name of the stack created from your CloudFormation template, and remove the <> placeholder brackets.

In [None]:
cf_client = boto3.client('cloudformation')

stack_name = '<YOUR-STACK-NAME>'
response = cf_client.describe_stacks(StackName=stack_name)
stack_outputs = response['Stacks'][0]['Outputs']

bucket_name = None
for output in stack_outputs:
    if output['OutputKey'] == 'S3Bucket': 
        bucket_name = output['OutputValue']
        break
        
print(bucket_name)
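
If the stack has no output named S3Bucket, the loop above leaves bucket_name as None and later cells fail with a less obvious error. As a minimal sketch, the lookup could be wrapped in a helper that fails early. The function name get_stack_output and the sample bucket name are assumptions made for illustration; they are not part of the original notebook.

```python
def get_stack_output(outputs, key):
    """Return the OutputValue for `key`, or raise if the stack has no such output."""
    for output in outputs:
        if output.get('OutputKey') == key:
            return output['OutputValue']
    available = [o.get('OutputKey') for o in outputs]
    raise KeyError(f"Output {key!r} not found; available outputs: {available}")


# Hand-written list shaped like response['Stacks'][0]['Outputs'].
# The bucket name is invented for illustration.
sample_outputs = [
    {'OutputKey': 'S3Bucket', 'OutputValue': 'example-bucket-name'},
]
bucket = get_stack_output(sample_outputs, 'S3Bucket')
print(bucket)
```

In the notebook, the same helper could be called as get_stack_output(stack_outputs, 'S3Bucket').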

***Creating the Product Review Dataset***

Starting from the initial amazon_data.csv file, we keep only the columns needed for the product dataset: timestamp, product ID, product name, category, rating count, and description.

In [None]:
# Create the product review dataset filtering out respective columns


review_data_key = 'amazon_data.csv'
data_s3_location1 = "s3://{}/{}".format(bucket_name, review_data_key) 
product_data = pd.read_csv(data_s3_location1, encoding='latin1')

# Drop the pricing, user, and review columns that the product dataset does not need
product_data = product_data.drop(columns=[
    'discounted_price', 'actual_price', 'discount_percentage',
    'user_id', 'user_name', 'review_id', 'review_title', 'review_content',
    'img_link', 'product_link', 'rating', 'age',
])



# Rename the remaining columns to the upper-case names used in the rest of the notebook
product_data.rename(columns={
    'timestamp': 'CREATION_TIMESTAMP',
    'product_id': 'ITEM_ID',
    'product_name': 'PRODUCT_NAME',
    'category': 'CATEGORY',
    'rating_count': 'RATING_COUNT',
    'about_product': 'DESCRIPTION',
}, inplace=True)
# Convert to a Unix timestamp in seconds; 'int64' avoids platform-dependent integer widths
product_data['CREATION_TIMESTAMP'] = pd.to_datetime(product_data['CREATION_TIMESTAMP']).astype('int64') // 10**9




product_data.head()
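
The timestamp conversion above relies on pandas storing datetimes as nanoseconds since the epoch. The sketch below shows an alternative on two invented rows: subtracting the epoch and integer-dividing by one second, which does not depend on the stored resolution. The sample values are assumptions made for illustration only.

```python
import pandas as pd

# Toy rows standing in for the 'timestamp' column (values are invented).
sample = pd.DataFrame({'timestamp': ['2023-01-01 00:00:00', '2023-06-15 12:30:00']})

# Parse the strings, then count whole seconds elapsed since 1970-01-01.
parsed = pd.to_datetime(sample['timestamp'])
epoch = pd.Timestamp('1970-01-01')
sample['CREATION_TIMESTAMP'] = (parsed - epoch) // pd.Timedelta(seconds=1)

print(sample)
```

Both approaches treat the parsed values as timezone-naive, so they give the same seconds value whenever the datetimes are stored with nanosecond resolution.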

## Uploading the data to DynamoDB

DynamoDB is a fully managed, highly scalable, low-latency NoSQL database service, well suited to applications that need fast and consistent performance, such as personalized recommendation engines. It is a reliable and durable choice for storing and retrieving large volumes of data. We use on-demand capacity to accommodate unpredictable traffic patterns against our dataset.


Using the boto3 DynamoDB resource, we create a table whose primary key is ITEM_ID. Once the table is ready, we convert the RATING_COUNT column in the product_data DataFrame to strings and iterate through the DataFrame, inserting each row as an item into the DynamoDB table. Table creation may take up to ***one minute*** to complete.

In [None]:
# Upload the product data to DynamoDB


session = boto3.Session()
dynamodb = session.resource('dynamodb')

table_name = 'productdata'



try:
    table = dynamodb.Table(table_name)
    table.load()
    print(f"Table {table_name} already exists.")
except dynamodb.meta.client.exceptions.ResourceNotFoundException:
    # Define the table schema
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'ITEM_ID',
                'KeyType': 'HASH'  
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'ITEM_ID',
                'AttributeType': 'S'
            }
        ],
        BillingMode='PAY_PER_REQUEST'
    )
    table.meta.client.get_waiter('table_exists').wait(TableName=table_name)
    print(f"Table {table_name} created successfully.")


    
product_data['RATING_COUNT'] = product_data['RATING_COUNT'].astype(str)

# Define the DynamoDB table
table = dynamodb.Table(table_name)

# Insert data into DynamoDB
for index, row in product_data.iterrows():
    item = {
        'TIMESTAMP': row['CREATION_TIMESTAMP'],
        'ITEM_ID': row['ITEM_ID'],
        'PRODUCT_NAME': row['PRODUCT_NAME'],
        'CATEGORY': row['CATEGORY'],
        'RATING_COUNT': row['RATING_COUNT'],
        'DESCRIPTION': row['DESCRIPTION']
    }
    
    table.put_item(Item=item)

print("Data inserted successfully.")
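
The row-by-row loop above issues one request per item and fails if a value is a Python float, since the boto3 DynamoDB resource rejects floats and expects Decimal. As a hedged sketch, the helpers below convert a record into accepted types and write through the table's batch writer. The names to_dynamodb_item and upload_items, and the sample row, are hypothetical and not part of the original notebook.

```python
from decimal import Decimal
import math


def to_dynamodb_item(record):
    """Convert a plain dict into values the boto3 DynamoDB resource accepts.

    Floats become Decimal, and missing values (None or NaN) are dropped
    instead of being stored.
    """
    item = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, float):
            if math.isnan(value):
                continue
            value = Decimal(str(value))
        item[key] = value
    return item


def upload_items(table, records, key_name):
    """Write records through a batch writer (hypothetical helper)."""
    # overwrite_by_pkeys de-duplicates items that share a key within one batch.
    with table.batch_writer(overwrite_by_pkeys=[key_name]) as batch:
        for record in records:
            batch.put_item(Item=to_dynamodb_item(record))


# Local demonstration of the conversion only; no AWS call is made here.
row = {'ITEM_ID': 'B000TEST', 'RATING_COUNT': '120',
       'SCORE': 4.5, 'DESCRIPTION': float('nan')}
print(to_dynamodb_item(row))
```

Assuming the table object from the cell above, a call such as upload_items(table, product_data.to_dict('records'), 'ITEM_ID') would write the rows in batches. Note that this stores the column names as they appear in the DataFrame, so the timestamp attribute would be CREATION_TIMESTAMP rather than TIMESTAMP.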

***Create the user dataset***

Next, we load the same review data from the Amazon S3 bucket into a DataFrame, drop the columns that are not needed for the user dataset, and rename the remaining columns for consistency. The TIMESTAMP column is converted to a Unix timestamp in seconds.

In [None]:

# Create the user review dataset filtering out respective columns

user_data_key = 'amazon_data.csv'
data_s3_location1 = "s3://{}/{}".format(bucket_name, user_data_key)  # S3 URL
user_data = pd.read_csv(data_s3_location1, encoding='latin1')

# Drop the product and pricing columns that the user dataset does not need
user_data = user_data.drop(columns=[
    'rating_count', 'category', 'about_product', 'img_link', 'product_link',
    'discounted_price', 'actual_price', 'discount_percentage',
])


# Rename the remaining columns to the upper-case names used in the rest of the notebook
user_data.rename(columns={
    'timestamp': 'TIMESTAMP',
    'product_id': 'ITEM_ID',
    'product_name': 'PRODUCT_NAME',
    'rating': 'RATING',
    'user_id': 'USER_ID',
    'age': 'AGE',
    'user_name': 'USERNAME',
    'review_id': 'REVIEW_ID',
    'review_title': 'REVIEW_TITLE',
    'review_content': 'REVIEW_CONTENT',
}, inplace=True)

# Convert to a Unix timestamp in seconds; 'int64' avoids platform-dependent integer widths
user_data['TIMESTAMP'] = pd.to_datetime(user_data['TIMESTAMP']).astype('int64') // 10**9


user_data.info()
user_data.head()


***Write back user data to DynamoDB***


Using the boto3 DynamoDB resource, we create a table whose primary key is REVIEW_ID. Once the table is ready, we convert the RATING and REVIEW_CONTENT columns in the user_data DataFrame to strings, then iterate through the DataFrame, inserting each row as an item into the DynamoDB table. Table creation may take up to ***one minute*** to complete.

In [None]:
session = boto3.Session()

# Get the DynamoDB resource
dynamodb = session.resource('dynamodb')

# Define the table
table_name = 'userdata'

# Check if the table exists, if not, create it
try:
    table = dynamodb.Table(table_name)
    table.load()
    print(f"Table {table_name} already exists.")
except dynamodb.meta.client.exceptions.ResourceNotFoundException:
    # Define the table schema
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'REVIEW_ID',
                'KeyType': 'HASH' 
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'REVIEW_ID',
                'AttributeType': 'S'
            }
        ],
        
        BillingMode='PAY_PER_REQUEST'
        
    )

    table.meta.client.get_waiter('table_exists').wait(TableName=table_name)
    print(f"Table {table_name} created successfully.")

table = dynamodb.Table(table_name)
user_data['RATING'] = user_data['RATING'].astype(str)
user_data['REVIEW_CONTENT'] = user_data['REVIEW_CONTENT'].astype(str)



for index, row in user_data.iterrows():
    item = {
        'TIMESTAMP': row['TIMESTAMP'],
        'AGE': row['AGE'],
        'PRODUCT_ID': row['ITEM_ID'],
        'PRODUCT_NAME': row['PRODUCT_NAME'],
        'REVIEW_ID': row['REVIEW_ID'],
        'REVIEW_TITLE': row['REVIEW_TITLE'],
        'USER_ID': row['USER_ID'],
        'USERNAME': row['USERNAME'],
        'REVIEW_CONTENT': row['REVIEW_CONTENT']
    }
    
    table.put_item(Item=item)

print("Data inserted successfully.")


***Creating the interaction dataset***

We perform sentiment analysis on the product reviews using Amazon Comprehend in preparation for training a personalized recommendation model with Amazon Personalize. The sentiment analysis captures users' opinions and attitudes towards specific products, which Amazon Personalize can use to generate more accurate and relevant recommendations.

- Enriching User Interactions: The sentiment analysis adds an extra layer of information to the user interaction data. By analyzing the sentiment of reviews, we can better understand the user's experience and preferences with different products. Positive sentiments indicate satisfaction, while negative sentiments may suggest dissatisfaction or areas for improvement.

- Improving Recommendation Relevance: Amazon Personalize uses various data sources, including user interactions, to train its recommendation models. By incorporating sentiment information, the model can learn not only what products a user interacted with but also their subjective opinions about those products. This additional context can help the model better understand user preferences and provide more relevant recommendations.

The sentiment analysis may take up to a **minute** to run.

In [None]:
import boto3
import pandas as pd

# Initialize the Amazon Comprehend client
comprehend = boto3.client('comprehend', region_name='us-west-2')

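def chunk_text(text, chunk_size=4800):
    # Split on UTF-8 byte boundaries so that no chunk exceeds chunk_size bytes
    data = text.encode('utf-8')
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + chunk_size, len(data))
        # Back up while the byte at `end` is a UTF-8 continuation byte (0b10xxxxxx),
        # so a multi-byte character is never split across two chunks
        while end < len(data) and 128 <= data[end] < 192:
            end -= 1
        chunks.append(data[start:end].decode('utf-8'))
        start = end
    return chunks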

def get_comprehend_sentiment(text, max_length=5000):
    if not text:
        return None, None
    
    if len(text.encode('utf-8')) > max_length:
        text = text[:max_length//2]  
    chunks = chunk_text(text)
    sentiments = []
    sentiment_scores = []
    
    for chunk in chunks:
        if len(chunk.encode('utf-8')) > max_length:
            raise ValueError(f"Chunk size exceeds {max_length} bytes limit.")
        response = comprehend.detect_sentiment(Text=chunk, LanguageCode='en')
        sentiment = response['Sentiment'].upper()
        sentiment_score = response['SentimentScore'][sentiment.capitalize()]
        sentiments.append(sentiment)
        sentiment_scores.append(sentiment_score)
    
    avg_sentiment_score = sum(sentiment_scores) / len(sentiment_scores) if sentiment_scores else None
    return sentiments[0], avg_sentiment_score

# Assuming user_data is your DataFrame containing the user data
processed_data = user_data.copy()
processed_data['RATING'] = pd.to_numeric(processed_data['RATING'], errors='coerce')

# Apply sentiment analysis to the 'REVIEW_CONTENT' column
processed_data[['SENTIMENT', 'SENTIMENTSCORE']] = processed_data['REVIEW_CONTENT'].apply(lambda x: pd.Series(get_comprehend_sentiment(x)))

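# Add the EVENT_TYPE column: ratings below 4.0 map to 'read', ratings above 4.0 to 'click'.
# Rows with a rating of exactly 4.0 (or a non-numeric rating) keep EVENT_TYPE = None
# and are dropped by the filter below.
processed_data['EVENT_TYPE'] = None
processed_data.loc[processed_data['RATING'] < 4.0, 'EVENT_TYPE'] = 'read'
processed_data.loc[processed_data['RATING'] > 4.0, 'EVENT_TYPE'] = 'click'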


# Filter rows that have an EVENT_TYPE assigned; .copy() avoids pandas' SettingWithCopyWarning
interactions_df = processed_data[processed_data['EVENT_TYPE'].notna()]
interactions_df = interactions_df[['TIMESTAMP', 'USERNAME', 'USER_ID', 'ITEM_ID', 'PRODUCT_NAME', 'EVENT_TYPE', 'SENTIMENT', 'SENTIMENTSCORE']].copy()
interactions_df['SENTIMENTSCORE'] = interactions_df['SENTIMENTSCORE'].astype(str)
interactions_df['USER_ID'] = interactions_df['USER_ID'].astype(str)


# Print the final DataFrame
print(interactions_df)
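
The cell above calls detect_sentiment once per review. Amazon Comprehend also offers batch_detect_sentiment, which accepts a list of documents per call. The sketch below shows one way to use it, assuming each text is non-empty and already within the per-document size limit. The function name batch_sentiment, the FakeComprehend stand-in, and the batch size of 25 (the documented per-call limit at the time of writing, worth re-checking) are assumptions made for illustration.

```python
def batch_sentiment(client, texts, language_code='en', batch_size=25):
    """Return one (label, score) tuple per input text, or (None, None) on error.

    `client` is expected to behave like boto3.client('comprehend').
    """
    results = [(None, None)] * len(texts)
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code)
        for entry in response.get('ResultList', []):
            label = entry['Sentiment']  # e.g. 'POSITIVE'
            score = entry['SentimentScore'][label.capitalize()]
            # 'Index' is the position within this batch, so offset by `start`.
            results[start + entry['Index']] = (label, score)
        # Documents listed in response['ErrorList'] keep (None, None).
    return results


class FakeComprehend:
    """Stand-in client so the sketch runs without AWS credentials."""

    def batch_detect_sentiment(self, TextList, LanguageCode):
        return {
            'ResultList': [
                {'Index': i, 'Sentiment': 'POSITIVE',
                 'SentimentScore': {'Positive': 0.9, 'Negative': 0.02,
                                    'Neutral': 0.07, 'Mixed': 0.01}}
                for i, _ in enumerate(TextList)
            ],
            'ErrorList': [],
        }


print(batch_sentiment(FakeComprehend(), ['Great cable', 'Works well']))
```

With a real client, the first argument would be the comprehend client created earlier in the notebook. Reviews longer than the size limit would still need truncation or chunking before being batched.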

## Write datasets to S3

Upload the interaction, product, and user datasets to the S3 bucket retrieved earlier. Amazon Personalize requires its input data to be stored in an S3 bucket before the data can be imported and used to train and deploy models.

In [None]:
# Write interaction data to S3

from io import StringIO
interactions_filename = "interactions.csv"

interactions_df.info()
print(interactions_df.columns)


csv_buffer = StringIO()
interactions_df.to_csv(csv_buffer, index=False)

s3_client = boto3.client('s3')
s3_client.put_object(Bucket=bucket_name, Key=interactions_filename, Body=csv_buffer.getvalue())
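
An Amazon Personalize interactions dataset requires USER_ID, ITEM_ID, and TIMESTAMP fields, so it can help to check for them before uploading. The sketch below validates the columns and serializes a DataFrame to CSV text. The helper name interactions_to_csv and the sample rows are assumptions made for illustration; the schema you define in Personalize must still match the columns in the file.

```python
from io import StringIO

import pandas as pd

# Fields that an Amazon Personalize interactions dataset requires.
REQUIRED_INTERACTION_COLUMNS = ['USER_ID', 'ITEM_ID', 'TIMESTAMP']


def interactions_to_csv(df):
    """Validate required columns and return the DataFrame as CSV text."""
    missing = [c for c in REQUIRED_INTERACTION_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    buffer = StringIO()
    df.to_csv(buffer, index=False)
    return buffer.getvalue()


# Invented sample rows that mirror the column names used in this notebook.
sample = pd.DataFrame({
    'USER_ID': ['u1', 'u2'],
    'ITEM_ID': ['i1', 'i2'],
    'TIMESTAMP': [1672531200, 1672617600],
    'EVENT_TYPE': ['click', 'read'],
})
csv_text = interactions_to_csv(sample)
print(csv_text)
```

The returned text can be passed as the Body argument of put_object, in the same way as csv_buffer.getvalue() above.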


In [None]:
# Write product data to S3

csv_buffer = StringIO()
product_data.to_csv(csv_buffer, index=False)

s3_client = boto3.client('s3')
s3_client.put_object(Bucket=bucket_name, Key='product_data.csv', Body=csv_buffer.getvalue())


In [None]:
# Write user data to S3

# Drop the free-text review column before export
user_data = user_data.drop('REVIEW_CONTENT', axis=1)
csv_buffer = StringIO()
user_data.to_csv(csv_buffer, index=False)

s3_client = boto3.client('s3')
s3_client.put_object(Bucket=bucket_name, Key='user_data.csv', Body=csv_buffer.getvalue())


***Store the variables for future use***

In [None]:
%store product_data
%store user_data
%store interactions_df
%store interactions_filename
%store bucket_name