# Step 1: Dataset Selection & Preprocessing


We use the **Kaggle Disaster Tweets dataset** (about 10k tweets in total, of which the roughly 7,600 training tweets are labeled as disaster or not).  
It is a useful test case because tweets are short, noisy, and often contain slang or emojis, all of which are real NLP challenges.  


In [None]:

import pandas as pd
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

# Load dataset (assumes the labeled Kaggle file, train.csv, was saved as disaster_tweets.csv)
# You can download it from Kaggle: https://www.kaggle.com/c/nlp-getting-started/data
df = pd.read_csv("disaster_tweets.csv")
df.head()
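
Before cleaning, it can help to confirm that the file has the expected columns and to look at the class balance. The sketch below is only an illustration: it reads a tiny made-up CSV in place of `disaster_tweets.csv`, it assumes the Kaggle column names (`id`, `keyword`, `location`, `text`, `target`), and the helper name `summarize` is hypothetical.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for disaster_tweets.csv (illustration only)
sample_csv = io.StringIO(
    "id,keyword,location,text,target\n"
    "1,fire,,Forest fire near the lake,1\n"
    "2,,,I love this song,0\n"
    "3,flood,Texas,Flood warning issued for the county,1\n"
)
sample_df = pd.read_csv(sample_csv)


def summarize(frame):
    """Return the row count, number of missing texts, and class proportions."""
    return {
        "rows": len(frame),
        "missing_text": int(frame["text"].isna().sum()),
        "class_share": frame["target"].value_counts(normalize=True).to_dict(),
    }


summary = summarize(sample_df)
print(summary)
```

Running the same helper on the real `df` would show whether one class dominates, which matters later when reading accuracy and recall.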


In [None]:

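# Basic preprocessing: lowercase, then strip links, mentions, and special characters
def clean_text(text):
    text = text.lower()
    text = re.sub(r"http\S+", "", text)        # remove URLs
    text = re.sub(r"@[a-z0-9_]+", "", text)    # remove @mentions (text is already lowercase)
    text = re.sub(r"[^a-z0-9\s]", "", text)    # drop punctuation, emojis, and other non-ASCII symbols
    text = re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
    return text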

df["clean_text"] = df["text"].apply(clean_text)
df[["text", "clean_text", "target"]].head()


# Step 2: Prompt Engineering


We design **3 prompt styles** for the task *"Is this tweet about a real disaster?"*  


In [None]:

tweet = "Huge fire near the city center, people are being evacuated!"

prompt_1 = f"Classify: {tweet} (Answer Disaster / Not Disaster)"
prompt_2 = f"Read carefully: '{tweet}'. Does it describe a real disaster event?"
prompt_3 = f"You are an emergency analyst. Decide if the tweet below is about a real disaster. Tweet: {tweet}"

print("Prompt 1:", prompt_1)
print("Prompt 2:", prompt_2)
print("Prompt 3:", prompt_3)



👉 Expected differences:  
- Prompt 1 → short, may lead to ambiguous answers.  
- Prompt 2 → clearer, more natural.  
- Prompt 3 → role-based, encourages contextual reasoning.  
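
To compare the three styles on more than one tweet, the prompts can be kept as templates and the model's free-text replies mapped back to labels. The sketch below makes no model call; the names `PROMPT_TEMPLATES`, `build_prompts`, and `parse_answer` are hypothetical, and the parsing rule is an assumption about what replies might look like, not a guarantee of how any particular model answers.

```python
import re

# Hypothetical templates mirroring the three prompt styles above
PROMPT_TEMPLATES = {
    "direct": "Classify: {tweet} (Answer Disaster / Not Disaster)",
    "careful": "Read carefully: '{tweet}'. Does it describe a real disaster event?",
    "role": (
        "You are an emergency analyst. Decide if the tweet below is about "
        "a real disaster. Tweet: {tweet}"
    ),
}


def build_prompts(tweet):
    """Fill every template with the same tweet."""
    return {name: tpl.format(tweet=tweet) for name, tpl in PROMPT_TEMPLATES.items()}


def parse_answer(answer):
    """Map a free-text reply to 1 (disaster), 0 (not disaster), or None."""
    text = answer.strip().lower()
    if re.match(r"(not disaster|no)\b", text):
        return 0
    if re.match(r"(disaster|yes)\b", text):
        return 1
    return None  # ambiguous reply: needs manual review or a re-prompt


prompts = build_prompts("Huge fire near the city center, people are being evacuated!")
for name, prompt in prompts.items():
    print(f"{name}: {prompt}")
```

Counting how often `parse_answer` returns `None` for each style is one simple way to check whether a prompt really produces ambiguous answers.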


# Step 3: Model Training & Evaluation

In [None]:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X = df["clean_text"]
y = df["target"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # keep the class ratio equal in both splits
)

# TF-IDF vectorization
vectorizer = TfidfVectorizer(max_features=5000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train Logistic Regression baseline
model = LogisticRegression(max_iter=200)
model.fit(X_train_vec, y_train)

# Evaluate
y_pred = model.predict(X_test_vec)
print(classification_report(y_test, y_pred))
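
The vectorizer and classifier can also be combined into a single scikit-learn pipeline, so that raw text goes in and labels come out, and the TF-IDF vocabulary is always fitted on training data only. This is a minimal sketch on a made-up toy corpus, not on the Kaggle data; the variable names are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy tweets (illustration only, not from the Kaggle data)
toy_texts = [
    "forest fire spreading near the town",
    "earthquake destroys several buildings",
    "flood waters force evacuation",
    "wildfire smoke covers the valley",
    "i love this new song",
    "great pizza for dinner tonight",
    "my exam was a total disaster lol",
    "watching a movie with friends",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same two steps as the baseline above, chained into one estimator
baseline = make_pipeline(
    TfidfVectorizer(max_features=5000),
    LogisticRegression(max_iter=200),
)
baseline.fit(toy_texts, toy_labels)

# The fitted pipeline accepts raw (cleaned) strings directly
new_tweets = ["fire near the school evacuation underway", "pizza and a movie"]
toy_pred = baseline.predict(new_tweets)
print(toy_pred)
```

With the real data, `baseline.fit(X_train, y_train)` followed by `baseline.predict(X_test)` would replace the separate `fit_transform` / `transform` calls.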


# Step 4: Troubleshooting


**Likely Issue:**  
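Tweets may contain sarcasm, jokes, or slang, which the model can misclassify. Figurative use of disaster words (sarcasm, jokes) tends to produce false positives and lowers **precision**, while real disasters described in slang or unusual wording can be missed, which leads to **lower recall**.  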

**Solution:**  
- Collect more diverse training data.  
- Use transformer models (like DistilBERT) for better context.  
- Improve prompts by explicitly warning the model about sarcasm.  
- Focus evaluation on recall, not just accuracy.  
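
One way to act on the recall point is to lower the decision threshold on the predicted disaster probability and watch how recall and precision move. The sketch below uses made-up labels and probabilities; with the fitted baseline, the probabilities would come from `model.predict_proba(X_test_vec)[:, 1]`. The helper name `scores_at` and the threshold values are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels and predicted disaster probabilities (illustration only)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.6, 0.45, 0.3, 0.55, 0.4, 0.1, 0.35])


def scores_at(threshold):
    """Return (recall, precision) when predicting 1 for prob >= threshold."""
    y_hat = (y_prob >= threshold).astype(int)
    return recall_score(y_true, y_hat), precision_score(y_true, y_hat)


for threshold in (0.5, 0.3):
    recall, precision = scores_at(threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, precision={precision:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches every real disaster but admits more false alarms, which is the trade-off to weigh when recall is the priority.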
