# Social Media Sentiment Analysis with AWS SageMaker

This notebook demonstrates two approaches for performing sentiment analysis on social media posts:

1. **RoBERTa on AWS SageMaker:** Tokenize posts locally and send them to a deployed SageMaker endpoint for scalable inference.
2. **DistilBERT for Real-Time, On-Device Analysis:** Use a lightweight model locally for quick inference.

Each section contains detailed comments and explanations.

In [None]:
# Import necessary libraries
import boto3              # AWS SDK for Python to interact with AWS services
import json               # To convert Python objects to JSON format
from transformers import RobertaTokenizer, RobertaForSequenceClassification  # For RoBERTa model and tokenizer
import torch              # PyTorch for tensor operations

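# Load the pre-trained RoBERTa tokenizer and a sequence-classification model
# 'num_labels=2' sets up binary sentiment classification (Negative and Positive)
# Note: "roberta-base" ships without a sentiment head, so the classifier layer is
# randomly initialized and must be fine-tuned before deployment.
# The endpoint call below only needs the tokenizer; inference runs on the endpoint.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)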

# Define example social media posts
posts = [
    "I love this product, it works wonders!",
    "This is the worst experience ever."
]

# Tokenize the input posts
# - padding=True: Pads sequences to the same length
# - truncation=True: Truncates sequences longer than the model's max length
# - return_tensors="pt": Returns the results as PyTorch tensors
inputs = tokenizer(posts, padding=True, truncation=True, return_tensors="pt")

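# Convert the input tensors to lists and prepare a JSON payload for the SageMaker endpoint
# The attention mask is included because padding=True adds pad tokens the model should ignore
# The key names must match what your endpoint's inference handler expects
payload = json.dumps({
    "input_ids": inputs["input_ids"].tolist(),
    "attention_mask": inputs["attention_mask"].tolist(),
})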

# Initialize the SageMaker runtime client (documented service name: "sagemaker-runtime")
sagemaker_runtime = boto3.client("sagemaker-runtime")

# Specify the SageMaker endpoint name (update this to your deployed endpoint name)
endpoint_name = "sentiment-analysis-endpoint"

try:
    # Invoke the SageMaker endpoint with the JSON payload
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload
    )

    # Read and decode the response from the endpoint
    result = response['Body'].read().decode('utf-8')
    print("SageMaker Endpoint Response:", result)
except Exception as e:
    # Print the error if the endpoint invocation fails
    print("Error invoking SageMaker endpoint:", str(e))
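
The raw response string still has to be turned into labels. The sketch below shows one way to do that, assuming the endpoint returns JSON shaped like `{"logits": [[...], ...]}` with one row per post. That shape, the `LABELS` order, and the `decode_response` helper are all assumptions for illustration; your endpoint's inference handler defines the real format.

```python
import json
import math

# Assumed label order; it must match the order used when the model was fine-tuned.
LABELS = ["Negative", "Positive"]


def softmax(logits):
    """Convert raw scores to probabilities (max is subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_response(body, posts):
    """Pair each post with its most likely label and that label's probability.

    `body` is assumed to be a JSON string like {"logits": [[...], ...]};
    this is a hypothetical format, not a SageMaker standard.
    """
    logits = json.loads(body)["logits"]
    results = []
    for post, row in zip(posts, logits):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append({"post": post, "label": LABELS[best], "score": probs[best]})
    return results


# Stand-in for response['Body'].read().decode('utf-8'); the numbers are made up.
sample_body = '{"logits": [[-1.0, 2.0], [3.0, -0.5]]}'
sample_posts = [
    "I love this product, it works wonders!",
    "This is the worst experience ever.",
]
decoded = decode_response(sample_body, sample_posts)
for item in decoded:
    print(f'{item["label"]} ({item["score"]:.2f}): {item["post"]}')
```

In the cell above, `result` would take the place of `sample_body` and `posts` the place of `sample_posts`, provided the endpoint really returns this shape.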

### DistilBERT for Real-Time, On-Device Sentiment Analysis

This section demonstrates how to perform sentiment analysis locally using DistilBERT, a smaller, distilled version of BERT. It trades a small amount of accuracy for faster inference and lower memory use, which makes it suitable for real-time applications.
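
Whether inference is fast enough for a real-time use case depends on your hardware, so it is worth measuring. The sketch below is a minimal timing helper, not part of the original notebook: `measure_latency` and `fake_inference` are hypothetical names, and the placeholder workload stands in for a real model call.

```python
import statistics
import time


def measure_latency(fn, *args, warmup=3, runs=20):
    """Return median and max wall-clock latency in milliseconds for fn(*args)."""
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}


def fake_inference(texts):
    """Placeholder workload standing in for a model forward pass."""
    return [len(text) % 2 for text in texts]


stats = measure_latency(fake_inference, ["Amazing product! Will recommend."])
print(stats)
```

Once the tokenizer and model in the next cell are loaded, the same helper could time the real forward pass, for example by passing `lambda: model(**inputs)` as `fn`.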

In [None]:
# Import PyTorch and the DistilBERT tokenizer and model for sequence classification
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Load a DistilBERT checkpoint that is already fine-tuned for binary sentiment (SST-2)
# The plain "distilbert-base-uncased" checkpoint has an untrained classification head,
# so its predictions would be arbitrary until it is fine-tuned
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = DistilBertTokenizer.from_pretrained(checkpoint)
model = DistilBertForSequenceClassification.from_pretrained(checkpoint)

# Define example social media posts
posts = [
    "Amazing product! Will recommend.",
    "I'm disappointed with the quality."
]

# Tokenize the posts
inputs = tokenizer(posts, padding=True, truncation=True, return_tensors="pt")

# Run inference without gradient computation for performance
with torch.no_grad():
    outputs = model(**inputs)
    # The model returns logits (raw scores); argmax selects the predicted class
    predictions = torch.argmax(outputs.logits, dim=-1)

# Print the sentiment predictions
# Convention: 0 indicates Negative sentiment, 1 indicates Positive sentiment
print("DistilBERT Sentiment Predictions:", predictions.tolist())
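
A list of class indices is hard to read on its own. As a minimal sketch, the indices can be mapped back to label names and tallied. It assumes the 0 = Negative, 1 = Positive convention stated above. `ID2LABEL` and `label_posts` are hypothetical helpers, and `example_ids` is a hand-written stand-in for `predictions.tolist()`, not real model output.

```python
from collections import Counter

# Convention from the cell above: 0 is Negative, 1 is Positive.
ID2LABEL = {0: "Negative", 1: "Positive"}


def label_posts(posts, class_ids):
    """Pair each post with the label name for its predicted class index."""
    if len(posts) != len(class_ids):
        raise ValueError("posts and class_ids must have the same length")
    return [(post, ID2LABEL[class_id]) for post, class_id in zip(posts, class_ids)]


example_posts = [
    "Amazing product! Will recommend.",
    "I'm disappointed with the quality.",
]
example_ids = [1, 0]  # stand-in for predictions.tolist(); illustrative values only

labelled = label_posts(example_posts, example_ids)
summary = Counter(label for _, label in labelled)

for post, label in labelled:
    print(f"{label}: {post}")
print("Label counts:", dict(summary))
```

A tally like `summary` is one simple way to track overall sentiment for a brand or product across many posts.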

### Conclusion

This notebook has demonstrated two distinct methods for social media sentiment analysis:

1. **RoBERTa on AWS SageMaker:** Suitable for scalable and customizable sentiment analysis in production environments.
2. **DistilBERT for On-Device Analysis:** Ideal for real-time inference with lower resource usage.

You can extend these examples to integrate sentiment analysis into your applications for brands, products, or services.