### 1. **Imports and Environment Setup**
   - Imports necessary libraries, including `os`, `dotenv`, `pandas`, `langchain`, and others.
   - Loads the `.env` file to retrieve the OpenAI API key.
   - Sets the model to "gpt-3.5-turbo" for sarcasm detection.

https://github.com/MirunaPislar/Sarcasm-Detection/blob/master/res/README.md

In [None]:
import os
from dotenv import load_dotenv
import pandas as pd
from langchain.prompts import PromptTemplate
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
import time
from tqdm import tqdm

load_dotenv()

#OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-3.5-turbo"

### 2. **Define the PromptTemplate and Query Function**
   - Defines a `PromptTemplate` for sarcasm classification.
   - Implements a `query_ai` function that initializes the OpenAI model, processes input using the defined template, and returns a sarcasm classification label ("sarcastic" or "non-sarcastic").


In [9]:
# Define the PromptTemplate 
template = """
This is a sarcasm classification task. Determine whether the following input text expresses sarcasm.
Input: {input}
If it does, output 'sarcastic'; otherwise, output 'non-sarcastic'.
"""

# Create the PromptTemplate object
prompt = PromptTemplate.from_template(template)

def query_ai(input: str) -> str:
    try:
        # Initialize the model with the API key and model name
        model = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
        
        # Define the chain
        chain = (
            { 
                "input": itemgetter("input"),
            }
            | prompt
            | model
            | StrOutputParser()
        )
        
        # Execute the chain with the provided inputs
        result = chain.invoke({"input": input})
        
        # Ensure only the label is returned
        # Strip 'Output:' if it exists and trim any extra whitespace
        label = result.replace("Output:", "").strip()
        return label
    
    except Exception as e:
        # Improved error handling
        print(f"Error occurred: {str(e)}")
        return "An error occurred."

In [11]:
query_ai("I love doing 20 sprints in 103 degree weather")

'sarcastic'

### 3. **Data Loading and Preparation**
   - Reads input text and corresponding labels from two files (`test.txt` and `labels_test.txt`).
   - Strips extra spaces and ensures matching text-label pairs.
   - Converts the data into a `pandas` DataFrame.

In [14]:
import pandas as pd

# Define file paths
text_file = 'riloff tweet/test.txt'
label_file = 'riloff tweet/labels_test.txt'

# Read text samples
with open(text_file, 'r', encoding='utf-8') as tf:
    texts = tf.readlines()

# Read labels
with open(label_file, 'r', encoding='utf-8') as lf:
    labels = lf.readlines()

# Strip any extra whitespace or newlines from the text and labels
texts = [text.strip() for text in texts]
labels = [label.strip() for label in labels]

# Ensure the lengths of both lists match
if len(texts) != len(labels):
    raise ValueError("The number of text samples and labels must be equal.")

# Convert to DataFrame
df = pd.DataFrame({
    'text': texts,
    'label': labels
})

In [19]:
df

Unnamed: 0,text,label
0,Absolutely love when water is spilt on my phon...,1
1,I was hoping just a LITTLE more shit could hit...,1
2,@pdomo Don't forget that Nick Foles is also th...,0
3,I constantly see tweets about Arsenal on twitt...,0
4,Can feel the feet pulsating...slow one...becau...,0
...,...,...
583,"Somewhere in the desert of Nevada, there is a ...",0
584,I just love getting up this early to go into s...,1
585,"Somewhere in the desert of Nevada, there is a ...",0
586,"Lol 😂 RT“@ReeseButCallMeV: When I'm high, I tu...",0


### 4. **Processing and AI Query Execution**
   - Iterates through the DataFrame, sending each text to the AI model for sarcasm classification.
   - Appends responses to a list and saves progress after each iteration to avoid data loss in case of errors.
   - After processing, the final results are saved in `result_final.csv`.

In [20]:
from tqdm import tqdm

# Initialize lists to hold data
responses = []

# Iterate over each question
for i in tqdm(range(0, df.shape[0])):
    try:
        # Extract text for the current row
        text = df.iloc[i]["text"]
        
        # Get the AI response
        response = query_ai(text)
        
        # Append the response to the list
        responses.append(response)
        
        # Save progress in case of an exception
        temp_df = pd.DataFrame({
            "text": df["text"].iloc[:i+1],
            "response": responses,
            "Predicted Label": [0 if r == 'sarcastic' else 1 for r in responses[:i+1]],
            "True Label": df["label"].iloc[:i+1],
        })
        temp_df.to_csv(f"result_progress.csv", index=False)
    
    except Exception as e:
        print(f"Error at index {i}: {e}")
        break

# Save the final results after completing the loop
final_df = pd.DataFrame({
    "text": df["text"],
    "response": responses,
    "Predicted Label": [0 if r == 'sarcastic' else 1 for r in responses],
    "True Label": df["label"],
})
final_df.to_csv("result_final.csv", index=False)

100%|██████████| 588/588 [07:39<00:00,  1.28it/s]


In [21]:
final_df

Unnamed: 0,text,response,Predicted Label,True Label
0,Absolutely love when water is spilt on my phon...,sarcastic,0,1
1,I was hoping just a LITTLE more shit could hit...,sarcastic,0,1
2,@pdomo Don't forget that Nick Foles is also th...,sarcastic,0,0
3,I constantly see tweets about Arsenal on twitt...,non-sarcastic,1,0
4,Can feel the feet pulsating...slow one...becau...,non-sarcastic,1,0
...,...,...,...,...
583,"Somewhere in the desert of Nevada, there is a ...",non-sarcastic,1,0
584,I just love getting up this early to go into s...,sarcastic,0,1
585,"Somewhere in the desert of Nevada, there is a ...",non-sarcastic,1,0
586,"Lol 😂 RT“@ReeseButCallMeV: When I'm high, I tu...",sarcastic,0,0


### 5. **Model Evaluation**
   - Calculates evaluation metrics (accuracy, precision, recall, F1 score) by comparing predicted and true labels.
   - Prints the metrics and a detailed classification report showing performance for both "Non-Sarcastic" and "Sarcastic" labels.

In [22]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

# Ensure True Label and Predicted Label are integers for metric calculations
final_df["True Label"] = final_df["True Label"].astype(int)
final_df["Predicted Label"] = final_df["Predicted Label"].astype(int)

# Extract true labels and predicted labels
true_labels = final_df["True Label"]
predicted_labels = final_df["Predicted Label"]

# Calculate evaluation metrics
accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

# Display the metrics
print("Evaluation Metrics:")
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(true_labels, predicted_labels, target_names=["Non-Sarcastic", "Sarcastic"]))


Evaluation Metrics:
Accuracy: 0.43
Precision: 0.06
Recall: 0.17
F1 Score: 0.09

Classification Report:
               precision    recall  f1-score   support

Non-Sarcastic       0.75      0.48      0.59       495
    Sarcastic       0.06      0.17      0.09        93

     accuracy                           0.43       588
    macro avg       0.41      0.33      0.34       588
 weighted avg       0.64      0.43      0.51       588



### 6. **Define the PromptTemplate with Few-Shot Examples**
   - Defines a `PromptTemplate` for sarcasm classification with a few-shot learning approach, providing examples of sarcastic and non-sarcastic text.
   - Implements a `query_ai_few_shot_examples` function that initializes the OpenAI model, processes input using the few-shot prompt, and returns a sarcasm classification label ("sarcastic" or "non-sarcastic").


In [24]:
# Define the PromptTemplate 
template_few_shot_examples = """
This is a sarcasm classification task. Determine whether the following input text expresses sarcasm.

Here are some examples:

1. Input: Absolutely love when water is spilt on my phone.. Just love it.. #timeforanewphone
   Output: non-sarcastic

2. Input: I was hoping just a LITTLE more shit could hit the fan this week.
   Output: non-sarcastic

3. Input: @pdomo Don't forget that Nick Foles is also the new Tom Brady. What a preseason! #toomanystudQBs #thankgodwedonthavetebow
   Output: sarcastic

4. Input: I constantly see tweets about Arsenal on twitter. Thanks for keeping the world updated @ZachBaugus &amp; @shawnxh . #HugeArsenalFans
   Output: sarcastic

Now, classify the following input text:

Input: {input}

If it expresses sarcasm, output 'sarcastic'; otherwise, output 'non-sarcastic'.
"""


# Create the PromptTemplate object
prompt_few_shot_examples = PromptTemplate.from_template(template)

def query_ai_few_shot_examples(input: str) -> str:
    try:
        # Initialize the model with the API key and model name
        model = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
        
        # Define the chain
        chain = (
            { 
                "input": itemgetter("input"),
            }
            | prompt_few_shot_examples
            | model
            | StrOutputParser()
        )
        
        # Execute the chain with the provided inputs
        result = chain.invoke({"input": input})
        
        # Ensure only the label is returned
        # Strip 'Output:' if it exists and trim any extra whitespace
        label = result.replace("Output:", "").strip()
        return label
    
    except Exception as e:
        # Improved error handling
        print(f"Error occurred: {str(e)}")
        return "An error occurred."

### 7. **Processing and AI Query Execution with Few-Shot Examples**
   - Starts processing the DataFrame from the fifth sample and sends each text to the AI model for sarcasm classification using the few-shot prompt.
   - Appends responses to a list and saves progress in a CSV file (`result_progress_few_shot_examples.csv`) after each iteration to prevent data loss.
   - After processing, the final results are saved in `result_final_few_shot_examples.csv`.

In [25]:
from tqdm import tqdm
import pandas as pd

# Initialize lists to hold data
responses = []

# Start from the fifth sample
start_index = 4

# Iterate over each question starting from the specified index
for i in tqdm(range(start_index, df.shape[0])):
    try:
        # Extract text for the current row
        text = df.iloc[i]["text"]
        
        # Get the AI response
        response = query_ai_few_shot_examples(text)
        
        # Append the response to the list
        responses.append(response)
        
        # Save progress in case of an exception
        temp_df = pd.DataFrame({
            "text": df["text"].iloc[start_index:i+1],  # Start saving from the fifth sample
            "response": responses,
            "Predicted Label": [0 if r == 'sarcastic' else 1 for r in responses],
            "True Label": df["label"].iloc[start_index:i+1],
        })
        temp_df.to_csv(f"result_progress_few_shot_examples.csv", index=False)
    
    except Exception as e:
        print(f"Error at index {i}: {e}")
        break

# Save the final results after completing the loop
final_df = pd.DataFrame({
    "text": df["text"].iloc[start_index:],  # Final DataFrame starting from the fifth sample
    "response": responses,
    "Predicted Label": [0 if r == 'sarcastic' else 1 for r in responses],
    "True Label": df["label"].iloc[start_index:],
})
final_df.to_csv("result_final_few_shot_examples.csv", index=False)


100%|██████████| 584/584 [10:40<00:00,  1.10s/it]


### 8. **Model Evaluation**
   - Calculates evaluation metrics (accuracy, precision, recall, F1 score) by comparing predicted and true labels.
   - Prints the metrics and a detailed classification report showing performance for both "Non-Sarcastic" and "Sarcastic" labels.

In [26]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

# Ensure True Label and Predicted Label are integers for metric calculations
final_df["True Label"] = final_df["True Label"].astype(int)
final_df["Predicted Label"] = final_df["Predicted Label"].astype(int)

# Extract true labels and predicted labels
true_labels = final_df["True Label"]
predicted_labels = final_df["Predicted Label"]

# Calculate evaluation metrics
accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

# Display the metrics
print("Evaluation Metrics:")
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(true_labels, predicted_labels, target_names=["Non-Sarcastic", "Sarcastic"]))


Evaluation Metrics:
Accuracy: 0.45
Precision: 0.05
Recall: 0.14
F1 Score: 0.07

Classification Report:
               precision    recall  f1-score   support

Non-Sarcastic       0.76      0.50      0.60       493
    Sarcastic       0.05      0.14      0.07        91

     accuracy                           0.45       584
    macro avg       0.41      0.32      0.34       584
 weighted avg       0.65      0.45      0.52       584

