# Applications of Large Language Models
**Dataset Used**: IMDB Movie Reviews
    
**Models used**:

- Claude 3 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet
- Llama-3-70B-instruct-v1: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct

**Instance**: g4dn.xlarge

In [None]:
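import json
import random

import boto3  # AWS SDK for Python; used below to create the Bedrock runtime client
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

random.seed(42)     # seed Python's random module (used to pick single samples and few-shot examples)
np.random.seed(42)  # seed NumPy too, since the test set is drawn with np.random.choice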

In [None]:
# importing the dataset
import s3fs
fs=s3fs.S3FileSystem(anon=False) # initialize the file system
s3_url='s3://amazon-sagemaker-058264306111-us-east-1-e23504aef6c5/dzd_5l5kah6gnsnq3r/bzbm82rtfbxpgn/dev/IMDB Dataset.csv' # location of the file in s3 bucket
data=pd.read_csv(s3_url)
data

|       | review | sentiment |
|-------|--------|-----------|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |
| ... | ... | ... |
| 49995 | I thought this movie did a down right good job... | positive |
| 49996 | Bad plot, bad dialogue, bad acting, idiotic di... | negative |
| 49997 | I am a Catholic taught in parochial elementary... | negative |
| 49998 | I'm going to have to disagree with the previou... | negative |


In [None]:
# Create a Bedrock Runtime client.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
# each model on Bedrock has a unique model identifier
model_id='anthropic.claude-3-sonnet-20240229-v1:0' # Claude 3 Sonnet

## Zero-shot prompting

Zero-shot prompting is a technique in natural language processing (NLP) where a language model is asked to perform a task without any prior examples or training specific to that task. Instead, the model relies on its pre-trained knowledge to generate responses based on the prompt alone.
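A zero-shot prompt is just a task instruction followed by the input, with no labeled examples. As a minimal sketch (the helper `build_zero_shot_prompt` is hypothetical and not part of this notebook; its wording only mirrors the prompt used below), assembling the prompt in one function keeps the instruction identical for every review:

```python
LABELS = ("positive", "negative")


def build_zero_shot_prompt(review: str) -> str:
    """Return a zero-shot sentiment prompt: task instruction + review, no examples.

    Hypothetical helper; the instruction text mirrors the notebook's prompt.
    """
    instruction = (
        "You are a highly accurate sentiment analysis model trained to classify movie reviews. "
        "Classify the sentiment of the review below as one of: " + ", ".join(LABELS) + ".\n"
        "Provide only the sentiment label in one word without any additional explanation.\n\n"
    )
    return instruction + "Review: " + review.strip()


prompt = build_zero_shot_prompt("A wonderful little production.")
print(prompt)
```

Because no labeled examples appear in the prompt, the model has to rely entirely on what it learned during pre-training to map the review to a label.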

In [None]:
# Draw a random sample from the dataset and prompt the model to perform sentiment analysis.
random_index=random.randint(0, len(data) - 1) # random.randint includes both endpoints, so stop at the last valid row
data_sample=data.iloc[random_index] # access a random sample from the dataset using random_index
print(data_sample['review'])
print(data_sample['sentiment'])

Water shows the plight of Indian widows in the late 1930s, says in the end that the problem still exists largely by giving statistics in the end, refers to Gandhi several times in the movie before finally having a scene depicting him and does nothing extra ordinarily innovative or new in the movie. Yes, the cinematography is pretty impressive but that cannot be the soul of any movie for me. <br /><br />India has had several problems like many other nations but it has got rid of many of these problems at large. What if a movie is made on racism in America in a particular year which ends with 'x number of Americans still experience racism today'. <br /><br />a) How would it be relevant, and, b) How would it be some thing so extra ordinary being depicted in cinema.<br /><br />A view I read from a Deepa Mehta interview was that this movie is being interpreted as a voice for the marginalised every where. From reviews I read every where, the common thing I am hearing is how the director did 

In [None]:
# now prompt the model
# Extracting Data from the Dataset
review=data_sample['review']
sentiment=data_sample['sentiment']
# Defines a formatted string (f-string) that instructs the model to classify the sentiment.
prompt=f''' You are a highly accurate sentiment analysis model trained to classify movie reviews. Your task is to analyze the sentiment of the given review and categorize it as one of the following labels:

Positive: The review expresses a favorable opinion about the movie.
Negative: The review expresses an unfavorable opinion about the movie.

Expected Output Format:
positive/negative

Provide only the sentiment label in one word without any additional explanation.
            {review}
        '''
# Prepare the API request for Anthropic Claude (Messages API format on Bedrock)
native_request= {
    "anthropic_version" : "bedrock-2023-05-31", # Anthropic API version string expected by Bedrock
    "max_tokens":25, # maximum number of new tokens the model may generate
    "temperature": 0.4, # controls randomness; lower values give more deterministic output
    "messages": [
          {"role" : "user" , "content" : prompt}  # the prompt is sent as a single "user" message
    ]
}

In [None]:
# Serialize the request dictionary to a JSON string; invoke_model expects the request body as JSON, not a Python dict.
request = json.dumps(native_request)
# pass the JSON request and call the model
response= client.invoke_model(modelId=model_id,body=request)

# Parse the model's response: response["body"] is a stream, so read it and decode the JSON.
model_response= json.loads(response["body"].read())
# Claude returns a list of content blocks; the generated text is in the first block's "text" field.
prediction = model_response.get("content", [{}])[0].get("text", "")
# The instruction "Provide only the sentiment label in one word" in the prompt steers the model toward a bare positive/negative label.
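The Messages API returns `content` as a list of blocks, each with a `type` and, for text blocks, a `text` field. A slightly more defensive parser than the one-liner above is sketched here; `extract_claude_text` is a hypothetical helper, and the sample body only imitates the response shape, it is not real model output.

```python
import json


def extract_claude_text(model_response: dict) -> str:
    """Concatenate the text of all "text" content blocks in a Claude Messages API response.

    Returns an empty string if the response has no text blocks.
    """
    blocks = model_response.get("content") or []
    texts = [block.get("text", "") for block in blocks if block.get("type") == "text"]
    return "".join(texts).strip()


# A hand-written body imitating the response shape (not real model output).
sample_body = json.dumps({
    "role": "assistant",
    "content": [{"type": "text", "text": "negative\n"}],
    "stop_reason": "end_turn",
})
print(extract_claude_text(json.loads(sample_body)))  # negative
print(repr(extract_claude_text({})))                 # ''
```

Joining all text blocks, rather than reading only the first one, avoids silently dropping text if a response ever contains more than one block.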

In [None]:
print("---------------Predicted sentiment (From LLM)-----------------------")
print(prediction)

print("\n---------------Actual sentiment (From dataset)-----------------------")
print(sentiment)


---------------Predicted sentiment (From LLM)-----------------------
negative

---------------Actual sentiment (From dataset)-----------------------
negative


In [None]:
# `test_set` is assumed to be a DataFrame with 'review' and 'sentiment' columns

from tqdm import tqdm  # tqdm is a popular Python library for creating progress bars in loops and long-running tasks.


def predict_sentiment(test_set, model_id, client):
    predictions=[]
    actual=[]
    for index, row in tqdm(test_set.iterrows()):
        review = row['review']
        actual_sentiment = row['sentiment']

        prompt = f""" You are a highly accurate sentiment analysis model trained to classify movie reviews. Your task is to analyze the sentiment of the given review and categorize it as one of the following labels:

positive: The review expresses a favorable opinion about the movie.
negative: The review expresses an unfavorable opinion about the movie.

Expected Output Format:
positive/negative

Provide only the sentiment label in one word without any additional explanation.
                     {review}
                 """

        native_request = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 20,
            "temperature": 0.4,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        }

        request = json.dumps(native_request)

        response = client.invoke_model(modelId=model_id, body=request)

        # Parse the response; the generated text is in the first content block
        model_response = json.loads(response["body"].read())
        prediction = model_response.get("content", [{}])[0].get("text", "").strip()

        predictions.append(prediction.lower())
        actual.append(actual_sentiment)

    return predictions,actual
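`predict_sentiment` lowercases and strips each reply, but if the model ever answers with trailing punctuation (for example `Negative.`) or a short sentence, `classification_report` would count that string as an extra class. One way to guard against this, sketched with a hypothetical `normalize_label` helper that is not part of the notebook, is to map each reply onto the two expected labels and flag anything else:

```python
import re


def normalize_label(raw: str, labels=("positive", "negative"), fallback="unknown") -> str:
    """Map a free-form model reply onto one of the expected labels.

    Returns the single label whose word appears in the reply; if none or both
    appear, returns `fallback` so the case can be inspected instead of miscounted.
    """
    words = set(re.findall(r"[a-z]+", raw.lower()))
    found = [label for label in labels if label in words]
    return found[0] if len(found) == 1 else fallback


for reply in ["negative", "Negative.", " POSITIVE\n", "The sentiment is positive.", "mixed"]:
    print(repr(reply), "->", normalize_label(reply))
```

This word matching is deliberately simple: a reply such as "not negative" would still map to `negative`, so it works best alongside a prompt that asks for a one-word answer.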


In [None]:
# Select 200 random reviews (without replacement) as the test set.
random_indices = np.random.choice(data.index, size=200, replace=False)
test_data = data.loc[random_indices].reset_index(drop=True)
test_data

|     | review | sentiment |
|-----|--------|-----------|
| 0 | Nicole Finn (Madonna) is just being released f... | positive |
| 1 | It is a pleasure to see such creativity on TV ... | positive |
| 2 | I really enjoyed watching this movie about the... | positive |
| 3 | An entertaining kung fu film, with acting, plo... | positive |
| 4 | This anime seriously rocked my socks. When the... | positive |
| ... | ... | ... |
| 195 | SEPARATE LIES is such an elegant, intelligent ... | positive |
| 196 | Tedious girls-at-reform-school flick, which pl... | negative |
| 197 | I have no idea as to which audience director G... | negative |
| 198 | Kenneth Branagh's "Hamlet" hits all the marks.... | positive |


In [None]:
predictions,actual=predict_sentiment(test_data,model_id,client)

200it [02:31,  1.32it/s]


In [None]:
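# classification_report expects (y_true, y_pred): ground-truth labels first, predictions second.
# The report below was generated with the two arguments in the opposite order, so its precision and
# recall columns are swapped and "support" counts predicted rather than actual labels.
print(classification_report(actual, predictions))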

              precision    recall  f1-score   support

    negative       0.97      0.93      0.95       112
    positive       0.91      0.97      0.94        88

    accuracy                           0.94       200
   macro avg       0.94      0.95      0.94       200
weighted avg       0.95      0.94      0.95       200
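`classification_report(y_true, y_pred)` expects the ground-truth labels first. Precision for a class divides by the number of *predicted* members and recall by the number of *actual* members, so swapping the two arguments swaps the precision and recall columns. A small from-scratch sketch on toy labels (invented for illustration, not results from this notebook) makes this visible; `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, computed from scratch."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # denominator for precision
    actual = sum(1 for t in y_true if t == label)     # denominator for recall
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy labels, invented for illustration only.
actual_toy = ["negative", "negative", "negative", "positive"]
predicted_toy = ["negative", "negative", "positive", "positive"]

correct = precision_recall(actual_toy, predicted_toy, "negative")
swapped = precision_recall(predicted_toy, actual_toy, "negative")
print(tuple(round(x, 2) for x in correct))  # (1.0, 0.67)
print(tuple(round(x, 2) for x in swapped))  # (0.67, 1.0)
```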



## Review

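On this 200-review test set, zero-shot prompting reached an accuracy of 0.94: the model labeled most reviews correctly from the instruction alone, without task-specific training or labeled examples, which suggests that its pre-trained knowledge captures the overall tone of a review well.

Next, let's try another prompting technique on the same dataset: few-shot prompting.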


## Few-Shot Prompting

Instead of relying solely on pre-trained knowledge (zero-shot prompting), few-shot prompting provides the model with a small number of labeled examples (typically 2–5) before asking it to complete the task.
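Concretely, a few-shot prompt is the task instruction, then each labeled example in a fixed format, then the new review with its label left blank for the model to complete. A minimal sketch with a hypothetical `build_few_shot_prompt` helper (the examples are short invented strings, not dataset rows):

```python
def build_few_shot_prompt(examples, review):
    """Build a few-shot prompt from (review, label) pairs plus the review to classify."""
    parts = ["Classify the sentiment of each movie review as positive or negative.\n"]
    for example_review, label in examples:
        parts.append(f"Review: {example_review}\nSentiment: {label}\n")
    # The final review ends with an empty "Sentiment:" slot for the model to complete.
    parts.append(f"Review: {review}\nSentiment:")
    return "\n".join(parts)


examples = [
    ("An absolute joy from start to finish.", "positive"),
    ("Two hours I will never get back.", "negative"),
]
print(build_few_shot_prompt(examples, "The acting was wooden and the plot made no sense."))
```

Using the same `Review:` / `Sentiment:` format for the examples and the query helps the model infer that it should answer with just the label.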

In [None]:
# Create n labeled examples for the few-shot prompt.
# Few-shot (n-shot) prompting shows the model n examples, so this function picks n distinct
# random reviews from rows 0-100 and formats each one as a "Review / Sentiment" pair.
def create_n_samples(data, n=3):
    example_indices = random.sample(range(101), n)  # n distinct row positions in 0..100, so no example repeats
    n_shot_examples = ""
    for i in example_indices:
        data_sample = data.iloc[i]
        n_shot_examples = n_shot_examples + '\n' + "Review: " + data_sample['review'] + '\n\n' + "Sentiment :" + data_sample['sentiment'] + '\n\n'
    return n_shot_examples


In [None]:
n = 2 # number of examples; keep n small (8 or fewer): each example adds a full review to the prompt, increasing input tokens, latency, and cost
n_shot_prompt = create_n_samples(data , n)
print(n_shot_prompt)


Review: This a fantastic movie of three prisoners who become famous. One of the actors is george clooney and I'm not a fan but this roll is not bad. Another good thing about the movie is the soundtrack (The man of constant sorrow). I recommand this movie to everybody. Greetings Bart

Sentiment :positive


Review: Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.<br /><br />3 out of 10 just for the well playing parents 

In [None]:
random_index=random.randint(101, len(data) - 1) # pick a query review outside the example pool (rows 0-100)
data_sample=data.iloc[random_index] # access a random sample from the dataset using random_index
data_sample

review       Don't get me wrong , I want to see marijuana l...
sentiment                                             negative
Name: 48699, dtype: object

In [None]:
# now prompt the model
review=data_sample['review']
sentiment=data_sample['sentiment']
# crafting the prompt with the n examples
prompt=f''' Classify the sentiment of the following movie reviews as positive or negative. Here are some examples:\n
{n_shot_prompt}\n\n
            predict the sentiment of the review given below as positive or negative in one word\n\n
            {review}
        '''

formatted_prompt = f"Human: {prompt}\n\nAssistant:"    # legacy text-completions style wrapper; optional with the Messages API, where "role": "user" already marks the speaker

native_request= {
    "anthropic_version" : "bedrock-2023-05-31", # Anthropic API version string expected by Bedrock
    "max_tokens":20, # maximum number of new tokens the model may generate
    "temperature": 0.1, # low temperature for more deterministic output
    "messages": [
          {"role" : "user" , "content" : formatted_prompt}
    ]
}
print(prompt)

 Classify the sentiment of the following movie reviews as positive or negative. Here are some examples:


Review: This a fantastic movie of three prisoners who become famous. One of the actors is george clooney and I'm not a fan but this roll is not bad. Another good thing about the movie is the soundtrack (The man of constant sorrow). I recommand this movie to everybody. Greetings Bart

Sentiment :positive


Review: Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watche
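The cell above packs all examples into a single user message. An alternative layout that the Messages API also accepts, sketched here and not what this notebook runs, is to give each example as a user turn followed by an assistant turn holding its label, so the final user turn is the review to classify. `build_few_shot_messages` is a hypothetical helper; the request fields mirror the ones used above, plus the API's top-level `system` field for the instruction.

```python
import json


def build_few_shot_messages(examples, review, max_tokens=20, temperature=0.1):
    """Return a Bedrock request body for Claude with few-shot examples as alternating turns."""
    messages = []
    for example_review, label in examples:
        messages.append({"role": "user", "content": f"Review: {example_review}"})
        messages.append({"role": "assistant", "content": label})
    # The last turn must be a user turn: the review the model should label.
    messages.append({"role": "user", "content": f"Review: {review}"})
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "system": "Classify the sentiment of each movie review. Answer with one word: positive or negative.",
        "messages": messages,
    }


examples = [("Loved every minute of it.", "positive"), ("Dull and far too long.", "negative")]
body = build_few_shot_messages(examples, "A clumsy, forgettable sequel.")
print([m["role"] for m in body["messages"]])  # ['user', 'assistant', 'user', 'assistant', 'user']
request = json.dumps(body)  # this string could be passed as body= to client.invoke_model
```

Presenting examples as prior conversation turns shows the model the exact answer format it is expected to produce.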

In [None]:
request = json.dumps(native_request)  # convert the request into a json request
response= client.invoke_model(modelId=model_id,body=request) # passing the json request and calling the model
# the next two lines of code include parsing the model's response
model_response= json.loads(response["body"].read())
prediction = model_response.get("content", [{}])[0].get("text", "")
print(prediction)

negative


In [None]:
print("---------------Predicted sentiment (From LLM)-----------------------")
print(prediction)
print("\n---------------Actual sentiment (From dataset)-----------------------")
print(sentiment)

---------------Predicted sentiment (From LLM)-----------------------
negative

---------------Actual sentiment (From dataset)-----------------------
negative


In [None]:
# `test_set` is assumed to be a DataFrame with 'review' and 'sentiment' columns
from tqdm import tqdm
def predict_sentiment(test_set,n_shot_prompt, model_id, client):
    """
    Predicts sentiment (positive/negative) for a set of movie reviews using a few-shot prompting approach
    with an LLM model accessed via AWS Bedrock.

    Parameters:
    - test_set: Pandas DataFrame containing movie reviews and their actual sentiments.
    - n_shot_prompt: Few-shot prompt containing example sentiment classifications.
    - model_id: Identifier for the LLM model (e.g., Anthropic Claude via AWS Bedrock).
    - client: API client to interact with the model.

    Returns:
    - predictions: List of predicted sentiment labels from the model.
    - actual: List of actual sentiment labels from the dataset.
    """
    predictions = []  # List to store model-generated sentiment predictions
    actual = []  # List to store actual sentiment labels from the dataset

    # Iterate over each row in the test dataset with a progress bar
    for index, row in tqdm(test_set.iterrows()):
        review = row['review']  # Extract the movie review text
        actual_sentiment = row['sentiment']  # Extract the actual sentiment label

        # Construct the few-shot prompt with example classifications
        prompt = f''' Classify the sentiment of the following movie reviews as positive or negative. Here are some examples:\n
        {n_shot_prompt}\n\n
            Predict the sentiment of the review given below as positive or negative in one word:\n\n
            {review}
        '''

        # Wrap the prompt in the legacy "Human:/Assistant:" text format (optional with the Messages API,
        # where the "role" field below already identifies the speaker)
        formatted_prompt = f"Human: {prompt}\n\nAssistant:"

        # Create the API request payload for AWS Bedrock
        native_request = {
            "anthropic_version": "bedrock-2023-05-31",  # Specify the version of the API
            "max_tokens": 20,  # Cap the response length; the prompt itself asks for a one-word answer
            "temperature": 0.4,  # Moderate temperature; lower values give more consistent responses
            "messages": [
                {"role": "user", "content": formatted_prompt}  # Pass the formatted user prompt
            ]
        }

        request = json.dumps(native_request)  # Convert the request payload to JSON format

        # Invoke the AWS Bedrock model with the request
        response = client.invoke_model(modelId=model_id, body=request)

        # Parse the response from the model
        model_response = json.loads(response["body"].read())  # Read and parse the response body

        # Extract the text of the first content block (empty string if the response has no content)
        prediction = model_response.get("content", [{}])[0].get("text", "").strip()

        # Store the lowercase version of the prediction for consistency
        predictions.append(prediction.lower())

        # Store the actual sentiment from the dataset for evaluation
        actual.append(actual_sentiment)

    # Return the list of predicted sentiments and actual sentiments for comparison
    return predictions, actual


In [None]:
predictions, actual = predict_sentiment(test_data, n_shot_prompt, model_id, client)

200it [02:47,  1.20it/s]


In [None]:
print(classification_report(actual,predictions))

              precision    recall  f1-score   support

    negative       0.94      0.96      0.95       107
    positive       0.96      0.92      0.94        93

    accuracy                           0.94       200
   macro avg       0.95      0.94      0.94       200
weighted avg       0.95      0.94      0.94       200



## Using Multiple LLMs for Sentiment Analysis
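So far, we have crafted prompts that give a single model both the context and the question. Some tasks are easier to handle as a sequence of steps, where one model's output becomes the next model's input.

Here's a simple use case: Claude 3 Sonnet summarizes a review, and Llama 3 70B Instruct then lists what the reviewer liked and disliked, based on that summary.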

In [None]:
model_id_1='anthropic.claude-3-sonnet-20240229-v1:0' # LLM1
model_id_2='meta.llama3-70b-instruct-v1:0' # LLM2
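Claude and Llama expect different request bodies on Bedrock: Claude uses the Messages format (`anthropic_version`, `max_tokens`, `messages`), while the Llama request later in this notebook sends a single `prompt` string with `max_gen_len`. When chaining models, a small dispatcher can keep that difference in one place. This is a sketch: `build_request_body` is hypothetical, it chooses the format from the model-ID prefix, and the Llama 3 chat template follows Meta's published format.

```python
import json


def build_request_body(model_id: str, prompt: str, max_tokens: int = 256, temperature: float = 0.5) -> str:
    """Return a JSON request body for invoke_model, shaped for the model family in `model_id`."""
    if model_id.startswith("anthropic."):
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": temperature,
            "messages": [{"role": "user", "content": prompt}],
        }
    elif model_id.startswith("meta.llama3"):
        # Llama 3 chat template: a user turn, then an open assistant header for the model to complete.
        formatted = (
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        )
        body = {"prompt": formatted, "max_gen_len": max_tokens, "temperature": temperature}
    else:
        raise ValueError(f"Unsupported model family: {model_id}")
    return json.dumps(body)


claude_body = json.loads(build_request_body("anthropic.claude-3-sonnet-20240229-v1:0", "Summarize this review."))
llama_body = json.loads(build_request_body("meta.llama3-70b-instruct-v1:0", "Summarize this review."))
print(sorted(claude_body))  # ['anthropic_version', 'max_tokens', 'messages', 'temperature']
print(sorted(llama_body))   # ['max_gen_len', 'prompt', 'temperature']
```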

In [None]:
random_index=random.randint(101, len(data) - 1) # random.randint includes both endpoints, so stop at the last valid row
data_sample=data.iloc[random_index] # access a random sample from the dataset using random_index
print(data_sample['review'])
print(data_sample['sentiment'])

How powerful and captivating simple quality filmmaking can be. This film tells it's tale with everyday scenes that manage to revel the poignancy hidden within. It's true as others have stated, how this film really makes it glaringly obvious how lost Hollywood is in it's special effects, overblown emotionalism and over the top climatic endings and have forgotten the essence of a meaningful story told with simple realism. So much of what these characters are going through is implied by the scene rather than spelled out in wordy dialogue. One aspect that I really enjoyed about the film was the contrast of the two brothers, one so very openly expressive in his childlike way and the other completely stoic but both able to evoke deep emotion. The older brother needed to say little, as he usually did, it was all there in that deadpan face of his! Beautiful cinematography, wonderful acting, great direction! Not to be missed!
positive


In [None]:
review=data_sample['review']
sentiment=data_sample['sentiment']

### First LLM

Let's ask the first model (Claude 3 Sonnet) to summarize the review.

In [None]:
prompt=f'''
        Review: {review}

        summarize the given review.

        '''
native_request = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 100,  # caps the summary length; a low cap can cut the summary off mid-sentence, as in the output below
    "temperature": 0.5,
    "messages": [
        {"role": "user", "content": prompt}
    ]
}

In [None]:
request = json.dumps(native_request)  # convert the request into a json request
response= client.invoke_model(modelId=model_id_1,body=request) # passing the json request and calling the model
# the next two lines of code include parsing the model's response
model_response= json.loads(response["body"].read())
prediction = model_response.get("content", [{}])[0].get("text", "")  # the summary text produced by the first LLM
print(prediction)

The given review highly praises the film for its powerful and captivating storytelling through simple, quality filmmaking. It commends the film's ability to reveal poignancy in everyday scenes, contrasting it with Hollywood's overreliance on special effects, overblown emotionalism, and over-the-top climactic endings. The review appreciates the film's meaningful story told with simple realism, where much is implied through scenes rather than wordy dialogue. It highlights the


### Second LLM
Let's pass the summary produced by the first LLM to the second LLM and ask it to break down what the reviewer liked and disliked.

In [None]:
prompt=f'''
        Summary : {prediction}
        Now from the summary, list the things the reviewer likes and dislikes
        '''

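# Note that the request format for Llama differs from Claude's: the prompt is a single string in the Llama 3 chat template,
# where each header is followed by a blank line ("\n\n") before the content.
formatted_prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""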
native_request = {
    "prompt": formatted_prompt,
    "max_gen_len": 512,
    "temperature": 0.5,
}


In [None]:
request = json.dumps(native_request) # converting into json request
response = client.invoke_model(modelId=model_id_2, body=request) # invoking the model
# parsing the model's response
model_response = json.loads(response["body"].read())
output = model_response.get("generation", "")   # Llama on Bedrock returns the generated text under "generation" (Claude uses "content")
print(output)

<|end_header_id|><|start_header_id|><|end_header_id|>assistant<|end_header_id|>

Based on the summary, here are the things the reviewer likes and dislikes:

**Likes:**

* Powerful and captivating storytelling
* Simple, quality filmmaking
* Ability to reveal poignancy in everyday scenes
* Meaningful story told with simple realism
* Implication of meaning through scenes rather than wordy dialogue

**Dislikes:**

* Hollywood's overreliance on special effects
* Overblown emotionalism
* Over-the-top climactic endings
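The two steps above form a simple chain: the first model's output becomes part of the second model's prompt. Below is a sketch of that pattern as a function, with the model calls passed in as plain callables so the chaining logic can be checked without AWS access. `summarize_then_analyze` and the stub functions are hypothetical; in this notebook, each callable would wrap `client.invoke_model` together with the response parsing shown above.

```python
from typing import Callable, Dict


def summarize_then_analyze(review: str,
                           summarize: Callable[[str], str],
                           analyze: Callable[[str], str]) -> Dict[str, str]:
    """Chain two LLM calls: summarize the review, then list likes/dislikes from the summary."""
    summary = summarize(f"Review: {review}\n\nSummarize the given review.")
    analysis = analyze(
        f"Summary: {summary}\nNow from the summary, list the things the reviewer likes and dislikes."
    )
    return {"summary": summary, "analysis": analysis}


# Stub "models" for illustration only: the summarizer returns fixed text,
# and the analyzer checks that the summary actually reached its prompt.
def fake_summarizer(prompt: str) -> str:
    return "The reviewer praises the acting."


def fake_analyzer(prompt: str) -> str:
    return "Likes: acting" if "praises the acting" in prompt else "Likes: unknown"


result = summarize_then_analyze("Great acting!", fake_summarizer, fake_analyzer)
print(result["analysis"])  # Likes: acting
```

Keeping the intermediate summary in the returned dictionary makes it easy to inspect what the second model actually saw when its answer looks wrong.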
