# Content moderation for generative AI applications

As we've seen in these workshops, using generative AI to create content such as marketing copy and images can improve the quality of your messaging and boost productivity. However, turning this important task over to AI is not without risk. Most foundation models (FMs) are trained and deployed with safeguards that prevent them from generating unsafe or inappropriate content. For example, you can read more about ["Claude's Constitution"](https://www.anthropic.com/index/claudes-constitution), which outlines the "values" that Anthropic built into the FM we used in the first notebook. In addition, Amazon Bedrock has its own [abuse detection](https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html) layer on top of the FMs that it provides access to.

Nevertheless, adding a moderation step that checks that generated content stays consistent with your brand voice and does not inadvertently include inappropriate themes or messages can add a layer of confidence as you scale generative AI across your organization. Content moderation can be implemented in many different ways. For example, you could ask a large language model to evaluate its own output by building a prompt that expresses your standards and asks the model to check the generated text against them. Alternatively, you can use separate AI models or services that are designed and tuned for classification tasks to check the generated content.
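The prompt-based approach is not used in the rest of this notebook, but a minimal sketch may help make it concrete. The standards, the `<verdict>` tag convention, and both function names below are hypothetical choices made for illustration. The sketch only builds the prompt and parses a reply; sending the prompt to a model is left out.

```python
import re
from typing import List

# Hypothetical moderation standards; replace with your own brand guidelines.
STANDARDS = [
    "No profanity, insults, or threats.",
    "No discriminatory or sexual content.",
    "Tone is friendly and professional.",
]

def build_moderation_prompt(text: str, standards: List[str]) -> str:
    """Build a prompt that asks an LLM to judge text against the standards."""
    rules = "\n".join(f"- {rule}" for rule in standards)
    return (
        "You are a content moderator. Evaluate the text inside the <text> tags "
        "against these standards:\n"
        f"{rules}\n\n"
        f"<text>{text}</text>\n\n"
        "Answer with exactly one word inside <verdict> tags: "
        "COMPLIANT or NON_COMPLIANT."
    )

def parse_verdict(model_output: str) -> bool:
    """Return True only if the model explicitly answered COMPLIANT."""
    match = re.search(
        r"<verdict>\s*(COMPLIANT|NON_COMPLIANT)\s*</verdict>", model_output
    )
    return bool(match) and match.group(1) == "COMPLIANT"

prompt = build_moderation_prompt("Book your dream beach holiday today!", STANDARDS)
print(prompt)
```

Treating a missing or malformed verdict as non-compliant is a deliberately conservative design choice: content is only approved when the model says so explicitly.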

In this notebook, we will use another FM available in Amazon Bedrock, Titan Text Embeddings, to generate embeddings that will be used to train a classifier model. Embeddings are numerical representations of values or objects like text, images, and audio that can be fed to machine learning models. In this case, the model will be trained on examples of compliant and non-compliant text so that it is able to predict whether our text is compliant. 

For the generated banner images, we will take a different approach and use [Amazon Rekognition](https://aws.amazon.com/rekognition/) to identify unsafe or inappropriate content in images.

## Notebook overview

We will complete the following steps in this notebook.

1. Text moderation:
    - Load and examine the labeled dataset of toxic and non-toxic text.
    - Generate embeddings using the Amazon Titan Text Embedding model for all text values in the dataset.
    - Split the dataset 80/20 into training and testing portions.
    - Train a classifier model using the embeddings of the training data using the scikit-learn [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).
    - Calculate and review metrics that measure the model's ability to properly classify the text in the held-out test data.
    - Finally, use the model to classify the email subject and email body that we generated in the previous notebook.
1. Image moderation:
    - Use the Amazon Rekognition [DetectModerationLabels](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html) API for a test image that we expect to return some moderation labels.
    - Use the Amazon Rekognition [DetectModerationLabels](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html) API for the images we created in the previous notebook to ensure that no moderation labels are present.

## Upgrade and install dependencies <a id="installdeps"></a>

Run the cell below to install or update the Python dependencies.

In [None]:
# First, let's get the latest installations of our dependencies
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall boto3
!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall botocore
!{sys.executable} -m pip install --quiet "pillow>=9.5,<10"

### Load variables

Let's load the variables passed from the prior notebooks so we can access them in this notebook.

In [None]:
%store -r

### Import dependencies

Let's load some of the dependencies that we'll need for this notebook as well as print their versions.

In [None]:
import json
import pandas as pd
import boto3
import botocore

# Get the Boto3 version
boto3_version = boto3.__version__

# Get the Botocore version
botocore_version = botocore.__version__

# Print the Boto3 version
print("Current Boto3 Version:", boto3_version)

# Print the Botocore version
print("Current Botocore Version:", botocore_version)

### Initialize AWS service clients

Let's initialize the boto3 clients for S3 and Bedrock.

In [None]:
s3 = boto3.client("s3")
bedrock = boto3.client("bedrock")

# Foundation models that generate embeddings from text

Now we're ready to get started. Let's first ask Bedrock to list the foundation models that it currently supports with an output modality of embedding.

In [None]:
response = bedrock.list_foundation_models(
    byOutputModality = "EMBEDDING"
)
print(json.dumps(response["modelSummaries"], indent=2))

For this notebook, we will be using the "Titan Embeddings G1 - Text" model which has a modelId of `amazon.titan-embed-text-v1`.

## Prepare custom classification training dataset 
Download and unzip the sample data toxicity.zip to the local volume.

In [None]:
s3.download_file(
    "personalize-solution-staging-us-east-1", 
    "personalize-immersionday-travel/toxicity.zip", 
    data_dir + "/toxicity.zip"
)

In [None]:
!unzip -o $data_dir/toxicity.zip -d $data_dir/toxicity_dataset

This CSV file contains 500 toxic and 500 non-toxic comments from a variety of popular social media platforms. Click on toxicity_en.csv to see a spreadsheet of 1000 English examples.

Columns:
- text: the text of the comment
- is_toxic: whether or not the comment is toxic

(The dataset contained in **$data_dir/toxicity.zip** is an unaltered redistribution of [the toxicity dataset](https://github.com/surge-ai/toxicity) made available by Surge AI under MIT License.)

In [None]:
toxicity_df = pd.read_csv(data_dir + "/toxicity_dataset/toxicity_en.csv")
toxicity_df

In [None]:
# Count the number of toxic and not toxic labels
toxicity_df["is_toxic"].value_counts()

# Generate embeddings

Embeddings are a key concept in generative AI and in machine learning in general. An embedding is a representation of an object (such as a word, image, or video) in a vector space. Typically, semantically similar objects have embeddings that are close together in the vector space. This makes embeddings very powerful for use cases like semantic search, recommendations, and classification.
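To make "close together in the vector space" concrete, here is a small sketch that measures closeness with cosine similarity, one common choice. The three-dimensional vectors are made-up toy values for illustration only; real Titan embeddings have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy, hand-written "embeddings" (hypothetical values, not model output)
beach = [0.9, 0.1, 0.0]
ocean = [0.8, 0.2, 0.1]
invoice = [0.0, 0.2, 0.9]

print("beach vs ocean:  ", cosine_similarity(beach, ocean))
print("beach vs invoice:", cosine_similarity(beach, invoice))
```

The first pair scores much higher than the second, which is the property a downstream classifier can exploit.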

## Define embedding utility function

The following function will generate and return an embedding for a piece of text.

In [None]:
bedrock_runtime = boto3.client("bedrock-runtime")

def get_embedding(body, model_id, accept = "application/json", content_type = "application/json"):
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id, accept=accept, contentType=content_type)
    response_body = json.loads(response.get('body').read())
    embedding = response_body.get('embedding')
    return embedding

## Generate embedding vectors for labeled text dataset

Next we'll generate embeddings for the labeled text in our dataset. This should take about 2 minutes to complete.

In [None]:
%%time

# Initialize a list to store the results
embeddings = []

model_id = "amazon.titan-embed-text-v1"

print(f"Generating embeddings for {len(toxicity_df)} labeled text")

for _,row in toxicity_df.iterrows():
    text = row["text"]
    label = row["is_toxic"]
    
    # Calculate the embedding for the text
    body = json.dumps({"inputText": text})
    embedding = get_embedding(body, model_id)

    embeddings.append({
        'label': label,
        'embedding': embedding
    })

# The results can be saved to a file so they can be re-used later if necessary.
with open('moderation_vectors.json', 'w', encoding='utf-8') as output_file:
    json.dump(embeddings, output_file, indent=2)

print('Embedding vectors have been saved to moderation_vectors.json')
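Since generating the embeddings takes a couple of minutes, it can be convenient to reload the saved file instead of calling Bedrock again. The following is a minimal sketch; `load_vectors` is a hypothetical helper name, and it assumes the file has the same list-of-records layout written by the cell above.

```python
import json
import os

def load_vectors(path):
    """Load saved records and return (labels, embedding_vectors)."""
    with open(path, "r", encoding="utf-8") as input_file:
        records = json.load(input_file)
    labels = [record["label"] for record in records]
    vectors = [record["embedding"] for record in records]
    return labels, vectors

# Only attempt the reload if the file from the previous cell exists.
if os.path.exists("moderation_vectors.json"):
    saved_labels, saved_vectors = load_vectors("moderation_vectors.json")
    print(f"Loaded {len(saved_vectors)} embeddings")
```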

# Train classifier model

With the embeddings generated, we can now train our classifier.

## Split embeddings for training and evaluation

So that we can evaluate the accuracy of the classifier, we will split the dataset into training and test sets. We'll use an 80/20 split where we train on 80% of the data and test on the 20% that was held out.

In [None]:
# Extract the first 100 and last 100 records
first_100_records = embeddings[:100]
last_100_records = embeddings[-100:]

# Create 'test.json' with the combined 200 records
test_data = first_100_records + last_100_records
with open('test.json', 'w') as test_file:
    json.dump(test_data, test_file)

# Create 'train.json' with the remaining 800 records
train_data = embeddings[100:-100]
with open('train.json', 'w') as train_file:
    json.dump(train_data, train_file)
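The cell above holds out the first and last 100 records, which yields a balanced test set only if the file happens to be ordered by label. If you are unsure about the ordering, a stratified random split is a safer alternative. The sketch below is dependency-free and uses a hypothetical helper name, `stratified_split`; scikit-learn's `train_test_split` with its `stratify` argument provides comparable behavior.

```python
import random
from collections import defaultdict

def stratified_split(records, test_fraction=0.2, seed=42):
    """Split records so each label keeps the same share in train and test."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

# Toy records standing in for the real embeddings (hypothetical values)
toy_records = (
    [{"label": "Toxic", "embedding": [float(i)]} for i in range(10)]
    + [{"label": "Not Toxic", "embedding": [float(i)]} for i in range(10)]
)
toy_train, toy_test = stratified_split(toy_records)
print(len(toy_train), "training records,", len(toy_test), "test records")
```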

### Train a RandomForestClassifier on the embedding vectors

In [None]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, classification_report

# Extract features (embedding vectors) and labels from the datasets
X_train = [data_point["embedding"] for data_point in train_data]
y_train = [data_point["label"] for data_point in train_data]

X_test = [data_point["embedding"] for data_point in test_data]
y_test = [data_point["label"] for data_point in test_data]

# Convert lists to numpy arrays for scikit-learn
X_train = np.array(X_train)
y_train = np.array(y_train)

X_test = np.array(X_test)
y_test = np.array(y_test)

# Build the classification model (Random Forest in this example)
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
clf.fit(X_train, y_train)

# Evaluate the model
y_pred = clf.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Calculate and print precision
precision = precision_score(y_test, y_pred, average='weighted')
print("Precision:", precision)

# Calculate and print recall
recall = recall_score(y_test, y_pred, average='weighted')
print("Recall:", recall)

# Calculate and print F1-score
f1 = f1_score(y_test, y_pred, average='weighted')
print("F1-score:", f1)

# Calculate and print ROC-AUC score (Note: ROC-AUC is typically used for binary classification)
if len(np.unique(y_test)) == 2:  # Check if it's a binary classification problem
    roc_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print("ROC-AUC:", roc_auc)

# Print the detailed classification report
classification_report_str = classification_report(y_test, y_pred)
print("Classification Report:\n", classification_report_str)
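To see what these metrics measure, it can help to compute them by hand from the counts of true and false positives and negatives. The sketch below does this for a single positive class on a small made-up set of predictions. Note that it is not identical to the `average='weighted'` setting used above, which averages the per-class values weighted by class frequency. The `Not Toxic` label string and the helper name are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels and predictions: 2 TP, 1 FN, 1 FP, 4 TN
toy_true = ["Toxic"] * 3 + ["Not Toxic"] * 5
toy_pred = ["Toxic", "Toxic", "Not Toxic", "Toxic"] + ["Not Toxic"] * 4
toy_metrics = binary_metrics(toy_true, toy_pred, positive="Toxic")
print(toy_metrics)
```

For moderation, recall on the toxic class is often the number to watch, since a missed toxic message usually costs more than a false alarm.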

### Save the trained classifier model

The model can then be saved to a file so that it can be loaded later to perform inference. You can use the joblib library to save and load scikit-learn models.

In [None]:
# saving the model after training:
from joblib import dump

# Save the trained model to a file
dump(clf, 'trained_model.joblib')

If you later want to reuse the trained model, you can load it from the file.

In [None]:
# Load the trained model from a file
from joblib import load

clf = load('trained_model.joblib')

## Classify text using model

We'll create a utility function to perform inference against the model. This function will generate an embedding for our input text and then use the model to predict the label for the text as well as predict probabilities for each label.

In [None]:
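def classify_text(text: str, model_id: str) -> tuple:
    """Return (predicted_label, class_probabilities) for a piece of text."""
    # Embed the text with the same model that was used for the training data
    embedding = get_embedding(json.dumps({"inputText": text}), model_id)
    predicted_label = clf.predict([embedding])
    # Probabilities are ordered to match clf.classes_
    probability_estimates = clf.predict_proba([embedding])
    return predicted_label[0], probability_estimates[0]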

### Classify sample text

To test our classifier, let's try a string that should be classified as `Toxic`.

In [None]:
prediction = classify_text("Why don't you shoot him?! I hate you all!", model_id)
print(f"Test string classification: {prediction[0]} with probability {prediction[1]}")
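The predicted label corresponds to the class with the highest probability, but for moderation you may prefer to flag text at a lower toxic probability and accept more false alarms. The sketch below shows one way to apply such a threshold. The helper name, the `Not Toxic` label string, and the 0.3 threshold are assumptions for illustration; in the notebook you would pass `clf.classes_` and the probabilities returned by the utility function above.

```python
def is_flagged(classes, probabilities, toxic_label="Toxic", threshold=0.3):
    """Flag text when the toxic-class probability reaches the threshold.

    `classes` must be in the same order as `probabilities`, which is how
    scikit-learn orders predict_proba columns (matching clf.classes_).
    """
    toxic_index = list(classes).index(toxic_label)
    return probabilities[toxic_index] >= threshold

# Hypothetical probability outputs for two pieces of text
example_classes = ["Not Toxic", "Toxic"]
print(is_flagged(example_classes, [0.65, 0.35]))  # flagged despite "Not Toxic" winning
print(is_flagged(example_classes, [0.90, 0.10]))  # not flagged
```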

## Classify generated email subject and body

Now let's try out our classifier on the generated email subject and body from the last notebook. First, we need to isolate the email subject and body from the response generated by the Claude Instant model. If you recall from the prompt in the last notebook, we asked Claude to place the email title/subject and body within XML tags. This instruction in the prompt allows us to more easily parse the response and separate these two pieces of content. To do so, we'll wrap the output in an outer `<email></email>` tag and then parse it as an XML document.

In [None]:
import xml.etree.ElementTree as ET
root = ET.fromstring("<email>" + email2 + "</email>")
subject = root.find("email_title").text.strip()
body = root.find("email_body").text.strip()

print(f"Email subject: {subject}")
print(f"Email body: {body}")
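Strict XML parsing can fail if the generated text contains characters that are not valid in XML, such as a bare `&`. As a more forgiving alternative, the tags can be extracted with a regular expression. This is a minimal sketch; `extract_tag` is a hypothetical helper, and the sample text is made up rather than taken from the model.

```python
import re

def extract_tag(text, tag):
    """Return the stripped content of the first <tag>...</tag>, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

# Made-up response containing a bare ampersand
sample_response = (
    "<email_title> Sun, sand & savings </email_title>\n"
    "<email_body>\nHi there,\nYour next trip is waiting.\n</email_body>"
)
print(extract_tag(sample_response, "email_title"))
print(extract_tag(sample_response, "email_body"))
```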

### Classify email subject

Let's start with the email subject.

In [None]:
prediction = classify_text(subject, model_id)
print(f"Email subject classification: {prediction[0]} with probability {prediction[1]}")

### Classify email body

Now let's run the email body through the classifier.

In [None]:
prediction = classify_text(body, model_id)
print(f"Email body classification: {prediction[0]} with probability {prediction[1]}")

# Moderate images

Finally let's explore how we can add moderation for the images generated with foundation models like Stable Diffusion. We'll use the AWS AI service, [Amazon Rekognition](https://aws.amazon.com/rekognition/), for this task.

## Moderate sample image
We'll start with a sample image that Rekognition should flag as containing suggestive content. The image shows a woman in a bikini, and Rekognition Image Moderation will label it as "Suggestive".

Here is the sample image:

![Moderate Image](images/yoga_swimwear.jpg "Test image to moderate")

### Upload and moderate sample image

Let's upload the image to our S3 bucket to stage it for moderation by Amazon Rekognition.

In [None]:
s3_key = 'content-moderation-im/image-moderation/yoga_swimwear.jpg'
s3.upload_file("images/yoga_swimwear.jpg", bucket_name, s3_key)

Next we'll create an SDK client for Rekognition and call the [DetectModerationLabels](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html) API for our test image.

In [None]:
rekognition = boto3.client('rekognition')

In [None]:
response = rekognition.detect_moderation_labels(
    Image={
       'S3Object': {
           'Bucket': bucket_name,
           'Name': s3_key,
       }
    }
)
print(json.dumps(response["ModerationLabels"], indent=2))

Note the response from a call to DetectModerationLabels:

- ModerationLabels – The example shows a list of labels for inappropriate or offensive content found in the image. The list includes the top-level label and each second-level label that is detected in the image. Please see the Amazon Rekognition documentation for the complete list of supported top-level and second-level labels.

- Name/ParentName – Each label has a name, a confidence estimate, and the name of its parent label. The parent name for a top-level label is an empty string (`""`).

- Confidence – Each label has a confidence value between 0 and 100 that indicates the percentage confidence that Amazon Rekognition has that the label is correct. You can specify the minimum confidence level for a label to be returned by setting `MinConfidence` in the API request.

As we can see in the Rekognition response, the Image Moderation API labeled the image in 3 categories with confidence scores:

- Top level category: Suggestive with a confidence score > 99%
- Second level category: Female Swimwear Or Underwear with a confidence score > 99%
- Second level category: Revealing Clothes with a confidence score > 89%
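When a response contains several labels, it can be easier to read once the labels are filtered by confidence and grouped under their parent. The sketch below does this for a list shaped like `response["ModerationLabels"]`. The helper name is hypothetical, and the sample confidence values are illustrative numbers chosen to match the ranges above, not actual output.

```python
def summarize_labels(moderation_labels, min_confidence=50.0):
    """Group (name, confidence) pairs by parent label; top level uses ''."""
    grouped = {}
    for label in moderation_labels:
        if label["Confidence"] < min_confidence:
            continue  # drop labels below the chosen threshold
        parent = label.get("ParentName", "")
        grouped.setdefault(parent, []).append(
            (label["Name"], round(label["Confidence"], 1))
        )
    return grouped

# Illustrative data in the shape of response["ModerationLabels"]
sample_labels = [
    {"Name": "Suggestive", "ParentName": "", "Confidence": 99.5},
    {"Name": "Female Swimwear Or Underwear", "ParentName": "Suggestive", "Confidence": 99.5},
    {"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 89.9},
]
print(summarize_labels(sample_labels, min_confidence=95.0))
```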

## Moderate email campaign banner images

Now let's run the generated images from the last notebook through Rekognition.

In [None]:
from PIL import Image
image = Image.open(image_1_path)
image

In [None]:
s3_key = 'content-moderation-im/image-moderation/image_1.png'
s3.upload_file(image_1_path, bucket_name, s3_key)

In [None]:
response = rekognition.detect_moderation_labels(
    Image={
       'S3Object': {
           'Bucket': bucket_name,
           'Name': s3_key,
       }
    }
)
print(json.dumps(response["ModerationLabels"], indent=2))

If no moderation labels are returned (i.e., the response contains an empty list `[]`), then the model had no findings for the image.

In [None]:
image = Image.open(image_2_path)
image

In [None]:
s3_key = 'content-moderation-im/image-moderation/image_2.png'
s3.upload_file(image_2_path, bucket_name, s3_key)

In [None]:
response = rekognition.detect_moderation_labels(
    Image={
       'S3Object': {
           'Bucket': bucket_name,
           'Name': s3_key,
       }
    }
)
print(json.dumps(response["ModerationLabels"], indent=2))
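Because the same call is repeated for every image, it may be worth wrapping it in a small helper. The sketch below is one way to do that; the function names, the bucket and key strings, and the threshold of 50 are assumptions for illustration. A stub client stands in for `boto3.client('rekognition')` so the example runs without AWS access; in the notebook you would pass the real `rekognition` client, `bucket_name`, and `s3_key` instead.

```python
def moderate_s3_image(rekognition_client, bucket, key, min_confidence=50.0):
    """Return the moderation labels Rekognition reports for an S3 image."""
    response = rekognition_client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return response["ModerationLabels"]

def is_image_safe(labels):
    """Treat an image as safe only when no moderation labels came back."""
    return len(labels) == 0

class StubRekognition:
    """Offline stand-in for the Rekognition client (illustration only)."""

    def detect_moderation_labels(self, Image, MinConfidence):
        return {"ModerationLabels": []}

stub_labels = moderate_s3_image(StubRekognition(), "example-bucket", "image_1.png")
print("Safe to use:", is_image_safe(stub_labels))
```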

## Summary

In this notebook we illustrated how to train a random forest classifier model using embeddings generated using the Amazon Titan Text Embeddings FM from Amazon Bedrock. This model was then used to test the email subject and email body generated in the previous notebook for unsafe and inappropriate content. Then we used the AWS AI service [Amazon Rekognition](https://aws.amazon.com/rekognition/) to perform a similar analysis of the images we generated in the previous notebook. Adding a content moderation step to the use of generative AI can help safeguard against unsafe and inappropriate content.

## Cleanup

To clean up the Amazon Personalize resources created in the first notebook, you can execute the `04_Clean_Up` notebook. If you're running these notebooks as part of an AWS-led workshop where temporary AWS accounts are provided for you, this cleanup will be done automatically. Otherwise, if you're running this notebook in a personal or work account, be sure to run that clean-up notebook to shut down resources that could incur ongoing AWS charges.