# Lab 0 — Setup + Warmups (ML & NLP)

**Goal:** Verify that Colab runs, then work through quick Python, pandas, and NLP warmups.

✅ By the end, you can load data, clean text, and train a tiny text classifier.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

print('✅ Imports successful!')

## Warmup 1 — Python text cleaning
We’ll use a tiny `simple_clean()` function throughout the workshop.

In [None]:
texts = ["Hello World!!!", "ML & NLP workshop", "Spam detection is fun!!!"]

def simple_clean(text: str) -> str:
    return "".join([c.lower() for c in text if c.isalnum() or c.isspace()]).strip()

cleaned = [simple_clean(t) for t in texts]
cleaned
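
One detail worth noticing: when a symbol such as `&` is dropped, the spaces on either side of it remain, so the result can contain a double space. The sketch below repeats `simple_clean()` so it runs on its own and adds a variant that collapses whitespace. The name `simple_clean_ws` is hypothetical and not part of the workshop code.

```python
def simple_clean(text: str) -> str:
    # Same function as in the cell above, repeated so this block is self-contained.
    return "".join([c.lower() for c in text if c.isalnum() or c.isspace()]).strip()


def simple_clean_ws(text: str) -> str:
    # Hypothetical variant: split() with no arguments breaks on any run of
    # whitespace, and " ".join() puts back exactly one space between words.
    return " ".join(simple_clean(text).split())


sample = "ML & NLP workshop"
original_out = simple_clean(sample)
collapsed_out = simple_clean_ws(sample)

# repr() makes the extra space visible in the printed output.
print(repr(original_out))
print(repr(collapsed_out))
```

For bag-of-words models the extra space is usually harmless, since the tokenizer splits on whitespace anyway. It matters more if you compare or deduplicate cleaned strings directly.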

## Warmup 2 — Pandas mini dataset

In [None]:
df = pd.DataFrame({
    "text": ["Win money now", "Hi bro how are you", "Claim your prize today"],
    "label": [1, 0, 1]
})

df
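
The two warmups can be combined by applying the cleaning function to the `text` column. This is a minimal sketch: the column name `clean_text` is an assumption chosen for illustration, and `simple_clean()` is repeated so the block runs on its own.

```python
import pandas as pd


def simple_clean(text: str) -> str:
    # Same cleaning function as in Warmup 1.
    return "".join([c.lower() for c in text if c.isalnum() or c.isspace()]).strip()


df = pd.DataFrame({
    "text": ["Win money now", "Hi bro how are you", "Claim your prize today"],
    "label": [1, 0, 1]
})

# Series.apply calls simple_clean once per row and returns a new Series.
df["clean_text"] = df["text"].apply(simple_clean)

# value_counts shows how many rows each label has (1 = spam, 0 = not spam).
label_counts = df["label"].value_counts()

print(df)
print(label_counts)
```

Checking the label counts early is a useful habit: here two of the three rows are spam, which affects how a model behaves on text it has never seen.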

## Warmup 3 — Tiny ML model (text classification)
We convert text → word-count vectors using **CountVectorizer**, then train a **LogisticRegression** classifier on those vectors.

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

X = df["text"]
y = df["label"]

vec = CountVectorizer()
X_vec = vec.fit_transform(X)

model = LogisticRegression(max_iter=200)
model.fit(X_vec, y)

test_samples = ["Win prize now", "Hello friend"]
preds = model.predict(vec.transform(test_samples))

list(zip(test_samples, preds))

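✅ **Expected:** the spam-like phrase is predicted as `1`. The friendly phrase is less predictable with only three training rows: `"Hello friend"` shares no words with the training text, so the model sees an all-zero vector and falls back on its intercept, which may lean toward `1` because two of the three training labels are spam.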

---

## Next: Lab 1 (Easy ML)
We’ll train a real classifier on a built-in dataset.