<a href="https://colab.research.google.com/github/Biruk-gebru/Advanced-Prompt-Engineering-/blob/main/Task2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis: Fine-tuning vs Prompt Engineering

## 📊 Results Summary

| Approach       | Accuracy | Precision | Recall | F1 Score |
|----------------|----------|-----------|--------|----------|
| **Fine-tuned** | 1.000    | 0.000     | 0.000  | 0.000    |
| **Zero-shot**  | 0.999    | 0.000     | 0.000  | 0.000    |
| **Few-shot**   | 0.999    | 0.000     | 0.000  | 0.000    |

## 📈 Model Architecture
```
        +------------+
        | DistilBERT |
        +------------+
              |
        +------------+
        | Fine-tuned |
        +------------+

        +-------------+
        | DistilGPT-2 |
        +-------------+
           /       \
     Zero-shot   Few-shot
```

## 📊 Accuracy Comparison
```mermaid
xychart-beta
    title "Accuracy Comparison"
    x-axis ["Fine-tuned", "Zero-shot", "Few-shot"]
    y-axis "Accuracy" 0 --> 1
    bar [1.000, 0.999, 0.999]
```

## 🔍 Analysis of Current Results

### Accuracy Anomaly:
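- Perfect or near-perfect accuracy combined with zero precision and recall points to a data or evaluation problem, not to a strong model

**Possible causes:**
- Test set contamination (the same samples in train and test)
- Label ordering mismatch during evaluation
- A degenerate dataset (e.g., all positive or all negative). This is the most likely cause here: the raw Sentiment140 CSV lists all 800,000 negative tweets before the positive ones, so `split='train[:5000]'` loads only negative tweets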

### Precision/Recall/F1 Issue:
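Zero values indicate:
- Potential class imbalance (all labels, and nearly all predictions, in one class)
- A metric calculation error
- Undefined values: with no positive labels, recall is 0/0, which `zero_division=0` reports as 0, and any positive prediction is a false positive, so precision is also 0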
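
The zero rows follow directly from a single-class test set. The sketch below uses a hypothetical all-negative test set of 1,000 labels (the size of the notebook's test split) and scikit-learn's `precision_recall_fscore_support` with the same settings as the notebook; it reproduces the table's numbers without any model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical test set: 1,000 labels, all negative (0)
y_true = np.zeros(1000, dtype=int)

# Fine-tuned row: every prediction is negative
pred_all_negative = np.zeros(1000, dtype=int)

# Zero-/few-shot rows: a single positive prediction, i.e. one false positive
pred_one_false_positive = np.zeros(1000, dtype=int)
pred_one_false_positive[0] = 1


def summarize(y_true, y_pred):
    """Return (accuracy, precision, recall, F1) for the positive class."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1, zero_division=0
    )
    return round(float(acc), 3), float(prec), float(rec), float(f1)


print(summarize(y_true, pred_all_negative))        # (1.0, 0.0, 0.0, 0.0)
print(summarize(y_true, pred_one_false_positive))  # (0.999, 0.0, 0.0, 0.0)
```

In both cases recall is 0/0 because there are no positive labels, and precision is 0/0 or 0/1; `zero_division=0` turns the undefined values into 0.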

## 🛠️ Recommended Fixes

### Data Verification:
```python
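import numpy as np

# train_dataset/test_dataset are local to setup_fine_tuning(), so reload them here
train_dataset, test_dataset = load_sentiment_data()

# Check label distribution (both counts should be non-zero)
print("Train labels:", np.bincount(train_dataset['label'], minlength=2))
print("Test labels:", np.bincount(test_dataset['label'], minlength=2))

# Verify no data leakage
overlap = set(train_dataset['text']) & set(test_dataset['text'])
assert len(overlap) == 0, f"{len(overlap)} texts appear in both splits"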
```
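
If the label counts show a single class, the cause is the slice itself: the raw Sentiment140 CSV is sorted by label, so its first 5,000 rows are all negative. One possible fix, sketched below, is to sample an equal number of tweets per label with pandas before writing `sentiment140_clean.csv`. The helper `balanced_sample` and the toy frame are illustrative, not part of the original notebook.

```python
import pandas as pd


def balanced_sample(df: pd.DataFrame, n_per_class: int, seed: int = 42) -> pd.DataFrame:
    """Draw n_per_class rows from each label, then shuffle the combined result."""
    sampled = df.groupby("label").sample(n=n_per_class, random_state=seed)
    return sampled.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy frame sorted by label, mimicking the raw Sentiment140 layout (negatives first)
toy = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(1600)],
    "label": [0] * 800 + [1] * 800,
})

# A head slice sees only the first class; the balanced sample sees both
print(toy.head(500)["label"].value_counts().sort_index())
print(balanced_sample(toy, 250)["label"].value_counts().sort_index())
```

Inside `load_sentiment_data`, this would mean calling `df = balanced_sample(df, 2500)` before `to_csv` and loading with `split='train'` instead of `split='train[:5000]'`.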

### Evaluation Correction:
```python
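# Update metric calculation (inside evaluate_all_models, where `labels` and `ft_preds` exist).
# average='binary', pos_label=1 and zero_division=0 match the current call, so this alone
# will not change the zeros; it only makes the positive class explicit.
ft_metrics = precision_recall_fscore_support(
    labels,
    ft_preds,
    average='binary',
    pos_label=1,
    zero_division=0
)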
```
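
A more informative change is to report a confusion matrix and macro-averaged F1 over both labels. Passing `labels=[0, 1]` keeps the matrix 2x2 even when a class never appears, so a missing positive class shows up as an all-zero row. A minimal sketch with scikit-learn; the `report` helper and the all-negative test set are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score


def report(y_true, y_pred):
    """Confusion matrix, macro F1 and per-class support over both labels."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    macro_f1 = f1_score(y_true, y_pred, labels=[0, 1], average="macro", zero_division=0)
    support = np.bincount(np.asarray(y_true), minlength=2)
    return cm, float(macro_f1), support


y_true = np.zeros(1000, dtype=int)  # hypothetical: no positive labels
y_pred = np.zeros(1000, dtype=int)
y_pred[0] = 1                       # one false positive

cm, macro_f1, support = report(y_true, y_pred)
print(cm)        # rows = true label, columns = predicted label
print(support)   # a 0 in the second slot means the test set has no positive tweets
print(round(macro_f1, 4))
```

Macro F1 lands near 0.5 here: the negative class scores almost perfectly while the absent positive class contributes 0, which makes the imbalance visible instead of hiding it behind 0.999 accuracy.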

### Model Validation:
```python
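# Test with sample predictions (uses globals from the main cell:
# model, tokenizer, gpt2_tokenizer, env, sentiment_pipeline, device)
samples = ["I love this!", "This is terrible", "Meh, it's okay"]
model.eval()
for text in samples:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128).to(device)
    with torch.no_grad():
        ft_label = model(**inputs).logits.argmax(-1).item()
    fs_label = predict_sentiment_fewshot(text, gpt2_tokenizer, env, sentiment_pipeline)
    print(f"\nText: {text}")
    print(f"Fine-tuned: {'Positive' if ft_label else 'Negative'}")
    print(f"Few-shot: {'Positive' if fs_label else 'Negative'}")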
```
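
The few-shot function reads the last line of the full generated text. If generation stops on the `"\n"` stop sequence and that newline is kept in the output, the last line is empty and the function returns 0 (negative) whatever the model wrote. The sketch below contrasts that parser with one that reads only the continuation after the prompt. Both helpers are hypothetical and operate on plain strings; in the notebook, passing `return_full_text=False` to the text-generation pipeline returns only the newly generated text and achieves the same separation.

```python
def parse_sentiment_last_line(generated_text: str) -> int:
    """Mirrors the notebook's few-shot parser: inspect the last line of the full output."""
    last_line = generated_text.split("\n")[-1]
    sentiment = last_line.split("Sentiment:")[-1].strip().lower()
    return 1 if "positive" in sentiment else 0


def parse_sentiment_continuation(generated_text: str, prompt: str) -> int:
    """Inspect only the text the model added after the prompt, using its first word."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text  # e.g. output produced with return_full_text=False
    words = continuation.strip().lower().split()
    first = words[0].strip(".,!\"'") if words else ""
    return 1 if first == "positive" else 0


prompt = "Tweet: I love this!\nSentiment:"
output = prompt + " positive\n"  # model answered, then emitted the stop newline

print(parse_sentiment_last_line(output))             # 0 (the answer is lost)
print(parse_sentiment_continuation(output, prompt))  # 1
```

Matching on the first word, rather than searching for the substring `"positive"`, also avoids counting continuations such as "not positive" as positive.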

## 📝 Methodology

### Training Parameters

| Parameter | Fine-tuned | Prompt Engineering |
|-----------|------------|-------------------|
| Epochs | 2 | N/A |
| Learning Rate | 2e-5 | N/A |
| Batch Size | 16 | N/A |
| Max Length | 128 tokens (input) | 50 (zero-shot) / 150 (few-shot) tokens, prompt plus generation |
| Examples | 4,000 tweets (80% of a 5,000-row slice) | 3 (few-shot), 0 (zero-shot) |
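
For scale, the fine-tuning settings above translate into a small number of optimizer updates. A back-of-the-envelope sketch, assuming a single GPU, no gradient accumulation, and the 4,000-tweet training split:

```python
import math

train_size = 4000     # 80% of the 5,000-row slice
batch_size = 16       # per_device_train_batch_size, single device assumed
epochs = 2            # num_train_epochs
logging_steps = 50    # from TrainingArguments

steps_per_epoch = math.ceil(train_size / batch_size)  # last partial batch is kept
total_steps = steps_per_epoch * epochs
log_points = total_steps // logging_steps

print(steps_per_epoch, total_steps, log_points)  # 250 500 10
```

If the training split contains a single class, a near-zero training loss after the first 250 updates (0.0006 at epoch 1) is unsurprising: the model only has to learn one constant output.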

## 🚀 Next Steps

### 1. Data Quality Check:
- Verify label distribution
- Check for duplicate texts
- Ensure proper train/test separation

### 2. Model Debugging:
- Examine sample predictions
- Test with simplified data
- Add evaluation sanity checks
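
One form such a sanity check could take is a guard that runs before any metric is computed. The function below is a hypothetical sketch, not part of the notebook: it rejects a test set that is missing a class and warns when the predictions never use a class.

```python
import warnings

import numpy as np


def check_eval_inputs(y_true, y_pred, n_classes=2):
    """Fail fast before computing metrics on a degenerate evaluation set."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")

    # Every class must appear in the ground truth, otherwise recall is undefined
    true_counts = np.bincount(y_true, minlength=n_classes)
    if (true_counts == 0).any():
        raise ValueError(f"test labels are missing a class: counts={true_counts.tolist()}")

    # A model that never predicts a class is suspicious, but not an input error
    pred_counts = np.bincount(y_pred, minlength=n_classes)
    if (pred_counts == 0).any():
        warnings.warn(f"predictions never use some class: counts={pred_counts.tolist()}")
    return true_counts, pred_counts
```

Calling `check_eval_inputs(labels, ft_preds)` at the top of `print_metrics` would stop the run on a single-class test split instead of printing a row of zeros.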

### 3. Hyperparameter Tuning:
```python
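# Example tuning for fine-tuning
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",         # required argument in transformers 4.30
    per_device_train_batch_size=8,  # Try smaller batches
    learning_rate=1e-5,             # Try lower learning rate
    num_train_epochs=3,             # Additional epoch
)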
```
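
To try more than one configuration, the combinations can be generated programmatically. The sketch below only builds keyword-argument dicts, so it runs without transformers installed; the grid values and the `training_grid` helper are illustrative. Each dict would then be passed as `TrainingArguments(**cfg)`. Tuning is only meaningful once the evaluation split contains both classes.

```python
from itertools import product


def training_grid(learning_rates, batch_sizes, epochs):
    """Yield TrainingArguments keyword arguments, one dict per combination."""
    for lr, bs, ep in product(learning_rates, batch_sizes, epochs):
        yield {
            "output_dir": f"./results/lr{lr}_bs{bs}_ep{ep}",  # separate dir per run
            "learning_rate": lr,
            "per_device_train_batch_size": bs,
            "num_train_epochs": ep,
        }


grid = list(training_grid([1e-5, 2e-5], [8, 16], [2, 3]))
print(len(grid))  # 8
for cfg in grid[:2]:
    print(cfg)
```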


In [6]:
# Install required packages
!pip install -q kaggle

# Upload kaggle.json
from google.colab import files
print("Please upload your kaggle.json file:")
uploaded = files.upload()

# Set up Kaggle
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Verify setup
!kaggle datasets list -s sentiment140

Please upload your kaggle.json file:


Saving kaggle.json to kaggle.json
ref                                                                 title                                                     size  lastUpdated                 downloadCount  voteCount  usabilityRating  
------------------------------------------------------------------  --------------------------------------------------  ----------  --------------------------  -------------  ---------  ---------------  
kazanova/sentiment140                                               Sentiment140 dataset with 1.6 million tweets          84855679  2017-09-13 22:43:19.117000         214099       2124  0.88235295       
krishbaisoya/tweets-sentiment-analysis                              Sentiment140                                          41251977  2023-05-27 11:24:02.467000           1657         13  1.0              
zphudzz/tweets-clean-posneg-v1                                      tweets_clean_posneg_v1                               178637961  2025-02-23 03:56:3
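
The upload step above is interactive. As a non-interactive alternative, the same `~/.kaggle/kaggle.json` file, with the owner-only permissions that `chmod 600` sets, can be written from Python. The helper below is a hypothetical sketch; where the username and key come from (for example Colab secrets) is left open. The Kaggle CLI also reads the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables directly, which avoids writing the file at all.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username: str, key: str,
                             config_dir: Path = Path.home() / ".kaggle") -> Path:
    """Write kaggle.json readable and writable only by the current user (mode 0o600)."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "kaggle.json"
    # Create the file with restrictive permissions from the start
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, 0o600)  # also fixes permissions if the file already existed
    return path
```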

In [12]:
# Install required packages
!pip install transformers==4.30.1 jinja2==3.1.2 torch datasets scikit-learn kaggle pandas

import torch
import numpy as np
from datasets import load_dataset
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
import warnings
import jinja2
import os
import pandas as pd
warnings.filterwarnings('ignore')

# Import transformers components
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
    pipeline
)

# Select the compute device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Download Sentiment140 dataset from Kaggle
if not os.path.exists('sentiment140.csv'):
    !kaggle datasets download -d kazanova/sentiment140
    !unzip sentiment140.zip
    !mv training.1600000.processed.noemoticon.csv sentiment140.csv

# ---- Data Loading ----
def load_sentiment_data():
    cols = ['target', 'ids', 'date', 'flag', 'user', 'text']
    df = pd.read_csv('sentiment140.csv',
                    encoding='latin-1',
                    header=None,
                    names=cols)

    # Sentiment140 uses 0=negative, 4=positive; map to 0=negative, 1=positive
    df['label'] = df['target'].apply(lambda x: 0 if x == 0 else 1)
    df[['text', 'label']].to_csv('sentiment140_clean.csv', index=False)

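    # Load with datasets library (the label column is cast to ClassLabel below).
    # NOTE: the raw CSV is sorted by label (all negative tweets first), so this
    # 5,000-row slice contains only negative tweets.
    dataset = load_dataset('csv', data_files='sentiment140_clean.csv', split='train[:5000]')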

    # Convert label column to ClassLabel type
    from datasets import ClassLabel
    dataset = dataset.cast_column('label', ClassLabel(names=['negative', 'positive']))

    # stratify_by_column requires a ClassLabel column
    dataset = dataset.train_test_split(
        test_size=0.2,
        stratify_by_column='label',
        seed=42
    )
    return dataset['train'], dataset['test']

# ---- Prompt Templates ----
ZERO_SHOT_TEMPLATE = """
Tweet: {{ text }}
Sentiment:"""

FEW_SHOT_TEMPLATE = """
Classify these tweet sentiments:
Tweet: "This is fantastic!"
Sentiment: positive

Tweet: "I hate this product"
Sentiment: negative

Tweet: "It was okay I guess"
Sentiment: negative

Now classify:
Tweet: {{ text }}
Sentiment:"""

# ---- Model Setup ----
def setup_fine_tuning():
    try:
        model_name = "distilbert-base-uncased"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name,
            num_labels=2
        ).to(device)

        train_dataset, test_dataset = load_sentiment_data()

        def tokenize_function(examples):
            return tokenizer(
                examples["text"],
                padding="max_length",
                truncation=True,
                max_length=128,
                return_tensors=None
            )

        tokenized_train = train_dataset.map(tokenize_function, batched=True)
        tokenized_test = test_dataset.map(tokenize_function, batched=True)

        tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "label"])
        tokenized_test.set_format("torch", columns=["input_ids", "attention_mask", "label"])

        return model, tokenizer, tokenized_train, tokenized_test, test_dataset
    except Exception as e:
        print(f"Error in setup: {str(e)}")
        raise

def setup_prompt_engineering():
    try:
        gpt2_model = GPT2LMHeadModel.from_pretrained("distilgpt2").to(device)
        gpt2_tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
        gpt2_tokenizer.pad_token = gpt2_tokenizer.eos_token

        env = jinja2.Environment(trim_blocks=True, lstrip_blocks=True)
        sentiment_pipeline = pipeline(
            "text-generation",
            model=gpt2_model,
            tokenizer=gpt2_tokenizer,
            device=0 if torch.cuda.is_available() else -1
        )
        return gpt2_tokenizer, env, sentiment_pipeline
    except Exception as e:
        print(f"Error in GPT-2 setup: {str(e)}")
        raise

# ---- Prediction Functions ----
def predict_sentiment_zeroshot(text, tokenizer, env, pipeline):
    try:
        prompt = env.from_string(ZERO_SHOT_TEMPLATE).render(text=text[:100])
        generated_text = pipeline(
            prompt,
            max_length=50,
            num_return_sequences=1,
            temperature=0.7
        )[0]['generated_text']
        sentiment = generated_text.split("Sentiment:")[-1].strip().lower()
        return 1 if "positive" in sentiment else 0
    except Exception as e:
        print(f"Zero-shot error: {str(e)}")
        return 0

def predict_sentiment_fewshot(text, tokenizer, env, pipeline):
    try:
        prompt = env.from_string(FEW_SHOT_TEMPLATE).render(text=text[:100])
        generated_text = pipeline(
            prompt,
            max_length=150,
            num_return_sequences=1,
            temperature=0.3,
            stop_sequence="\n"
        )[0]['generated_text']
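        # NOTE: if the output ends with the "\n" stop sequence, the last line is
        # empty and the parser below falls back to 0 (negative)
        last_line = generated_text.split("\n")[-1]
        sentiment = last_line.split("Sentiment:")[-1].strip().lower()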
        return 1 if "positive" in sentiment else 0
    except Exception as e:
        print(f"Few-shot error: {str(e)}")
        return 0

# ---- Training ----
def train_model(model, tokenized_train, tokenized_test):
    try:
        training_args = TrainingArguments(
            output_dir="./results",
            num_train_epochs=2,
            per_device_train_batch_size=16,
            per_device_eval_batch_size=16,
            learning_rate=2e-5,
            weight_decay=0.01,
            evaluation_strategy="epoch",
            logging_dir="./logs",
            logging_steps=50,
            save_strategy="no"
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=tokenized_train,
            eval_dataset=tokenized_test
        )

        print("Fine-tuning DistilBERT...")
        trainer.train()
        return trainer
    except Exception as e:
        print(f"Training error: {str(e)}")
        raise

# ---- Evaluation ----
def evaluate_all_models(trainer, tokenized_test, gpt2_tokenizer, env, pipeline, test_dataset):
    try:
        print("\n=== Evaluating All Approaches ===")

        # 1. Fine-tuned model
        print("\n1. Evaluating Fine-tuned DistilBERT...")
        ft_preds = trainer.predict(tokenized_test).predictions.argmax(-1)
        ft_labels = tokenized_test['label'].numpy()

        # 2. Zero-shot
        print("\n2. Evaluating Zero-shot DistilGPT-2...")
        zs_preds = [predict_sentiment_zeroshot(text, gpt2_tokenizer, env, pipeline)
                   for text in test_dataset["text"]]

        # 3. Few-shot
        print("\n3. Evaluating Few-shot DistilGPT-2...")
        fs_preds = [predict_sentiment_fewshot(text, gpt2_tokenizer, env, pipeline)
                   for text in test_dataset["text"]]

        # Metrics
        labels = np.array(test_dataset["label"])

        def print_metrics(name, true, pred):
            acc = accuracy_score(true, pred)
            prec, rec, f1, _ = precision_recall_fscore_support(true, pred, average="binary", zero_division=0)
            print(f"\n{name} Results:")
            print(f"Accuracy:  {acc:.3f}")
            print(f"Precision: {prec:.3f}")
            print(f"Recall:    {rec:.3f}")
            print(f"F1:        {f1:.3f}")

        print_metrics("Fine-tuned", labels, ft_preds)
        print_metrics("Zero-shot", labels, np.array(zs_preds))
        print_metrics("Few-shot", labels, np.array(fs_preds))

    except Exception as e:
        print(f"Evaluation error: {str(e)}")

# ---- Main Execution ----
if __name__ == "__main__":
    try:
        # Setup
        model, tokenizer, tokenized_train, tokenized_test, test_dataset = setup_fine_tuning()
        gpt2_tokenizer, env, sentiment_pipeline = setup_prompt_engineering()

        # Train
        trainer = train_model(model, tokenized_train, tokenized_test)

        # Evaluate all approaches
        evaluate_all_models(
            trainer, tokenized_test,
            gpt2_tokenizer, env, sentiment_pipeline,
            test_dataset
        )

    except Exception as e:
        print(f"Fatal error: {str(e)}")

Using device: cuda


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.


Fine-tuning DistilBERT...


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0006        | 0.000412        |
| 2     | 0.0003        | 0.000227        |



=== Evaluating All Approaches ===

1. Evaluating Fine-tuned DistilBERT...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



2. Evaluating Zero-shot DistilGPT-2...




3. Evaluating Few-shot DistilGPT-2...


Setting `pad_token_id` to `eos_token_id`:198 for open-end generation.


Fine-tuned Results:
Accuracy:  1.000
Precision: 0.000
Recall:    0.000
F1:        0.000

Zero-shot Results:
Accuracy:  0.999
Precision: 0.000
Recall:    0.000
F1:        0.000

Few-shot Results:
Accuracy:  0.999
Precision: 0.000
Recall:    0.000
F1:        0.000
