# Step 5 – Model Evaluation (Sales Data: Hit_Target Yes/No)

This notebook focuses **only on model evaluation** using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- K-fold Cross-Validation

Preprocessing and model training were covered in earlier steps; a minimal version of both is repeated here so the notebook runs on its own.
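
As a quick reference for the metrics listed above, the sketch below computes each one by hand from confusion-matrix counts. The counts are made up for illustration (they are not taken from the sales data), and "Yes" is assumed to be the positive class.

```python
# Hypothetical confusion-matrix counts, with "Yes" treated as the positive class.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)           # share of all predictions that are correct
precision = tp / (tp + fp)                           # of predicted "Yes", how many were right
recall = tp / (tp + fn)                              # of actual "Yes", how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(f"Accuracy : {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall   : {recall:.3f}")
print(f"F1-score : {f1:.3f}")
```

The scikit-learn functions used later in this notebook compute the same quantities from `y_test` and `y_pred`.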

## 1. Import Libraries

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix
)

sns.set_theme(style="whitegrid")


## 2. Load Sales Dataset
This dataset contains a Yes/No target column `Hit_Target`.

In [None]:
df = pd.read_csv("sales_data.csv")
df.head()


## 3. Define Features & Target
We separate the features (`X`) from the target (`y`) and list the categorical and numeric columns used by the preprocessing step.

In [None]:
X = df.drop(columns=["Hit_Target"])
y = df["Hit_Target"]

categorical_cols = ["Month", "Region", "Product_Category"]
numeric_cols = ["Revenue", "Units_Sold", "Marketing_Spend", "Monthly_Sales"]


## 4. Train/Test Split
We hold out 20% of the rows as a test set, stratifying on `Hit_Target` so that both splits keep the same Yes/No ratio.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
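
One way to confirm that `stratify` did its job is to compare class proportions in the two splits. The sketch below uses a small synthetic stand-in (`X_demo`, `y_demo` are hypothetical, with an assumed 70/30 No/Yes balance) rather than the real `sales_data.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 70 "No" and 30 "Yes" labels.
y_demo = pd.Series(["No"] * 70 + ["Yes"] * 30, name="Hit_Target")
X_demo = pd.DataFrame({"Units_Sold": range(100)})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Class shares should be (nearly) identical in both splits.
train_share = y_tr.value_counts(normalize=True)
test_share = y_te.value_counts(normalize=True)
print(pd.DataFrame({"train": train_share, "test": test_share}))
```

In the notebook itself, the same check would be run on `y_train` and `y_test`.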


## 5. Minimal Preprocessing + Model (Step 5 builds on earlier steps)
The pipeline one-hot encodes the categorical columns, median-imputes the numeric ones, and fits a Random Forest with default settings.

In [None]:
from sklearn.impute import SimpleImputer

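preprocess = ColumnTransformer([
    # One-hot encode categoricals; categories unseen during fit become all-zero rows
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    # Fill missing numeric values with the column median
    ("num", SimpleImputer(strategy="median"), numeric_cols)
])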

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(random_state=42))
])

model.fit(X_train, y_train)
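
The `handle_unknown="ignore"` setting matters at evaluation time, because the test set may contain a category the encoder never saw during fitting. A minimal sketch of that behaviour follows; the region names are hypothetical and not taken from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical values: "East" appears only in the test frame.
train_demo = pd.DataFrame({"Region": ["North", "South", "North"]})
test_demo = pd.DataFrame({"Region": ["South", "East"]})

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_demo)

# The unseen category is encoded as a row of zeros instead of raising an error.
encoded = encoder.transform(test_demo).toarray()
print(encoder.categories_[0])
print(encoded)
```

Without this setting, the encoder raises an error on unseen categories, so `model.predict(X_test)` would fail instead of returning predictions.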


## 6. Predictions

In [None]:
y_pred = model.predict(X_test)

pd.DataFrame({
    "Actual": y_test.values,
    "Predicted": y_pred
}).head()


## 7. Evaluation Metrics

In [None]:
accuracy  = accuracy_score(y_test, y_pred)
# average='weighted' averages the per-class scores, weighted by each class's support
precision = precision_score(y_test, y_pred, average='weighted')
recall    = recall_score(y_test, y_pred, average='weighted')
f1        = f1_score(y_test, y_pred, average='weighted')

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
print("F1-score :", f1)
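
Weighted averages blend both classes. If the question is specifically how well the model identifies rows that hit the target, scores for the "Yes" class alone may be more informative. The sketch below shows one way to get them with `pos_label`, using small hand-written label lists (hypothetical, not model output) and assuming the labels are the strings "Yes" and "No".

```python
from sklearn.metrics import (
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels for illustration only.
y_true_demo = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
y_pred_demo = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]

# Scores for the "Yes" class only.
precision_yes = precision_score(y_true_demo, y_pred_demo, pos_label="Yes")
recall_yes = recall_score(y_true_demo, y_pred_demo, pos_label="Yes")
f1_yes = f1_score(y_true_demo, y_pred_demo, pos_label="Yes")

# Support-weighted average across both classes, as used in this notebook.
precision_weighted = precision_score(y_true_demo, y_pred_demo, average="weighted")

print("Precision (Yes)     :", precision_yes)
print("Recall (Yes)        :", recall_yes)
print("F1-score (Yes)      :", f1_yes)
print("Precision (weighted):", precision_weighted)

# Per-class breakdown in a single table.
print(classification_report(y_true_demo, y_pred_demo))
```

On the real data, `y_test` and `y_pred` would replace the demo lists.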


## 8. Bar Chart of Metrics

In [None]:
metrics = [accuracy, precision, recall, f1]
names = ["Accuracy", "Precision", "Recall", "F1-score"]

plt.figure(figsize=(6,4))
sns.barplot(x=names, y=metrics)
plt.ylim(0,1)
plt.title("Model Evaluation Metrics")
plt.show()


## 9. Confusion Matrix

In [None]:
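# Use the model's own class order so rows, columns, and tick labels always agree
labels = model.classes_
cm = confusion_matrix(y_test, y_pred, labels=labels)

plt.figure(figsize=(5,4))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=labels, yticklabels=labels)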
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
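
The four cells of a binary confusion matrix can also be read out by name, which makes the link to precision and recall explicit. This is a sketch on hand-written labels (hypothetical, not model output), assuming string labels "No" and "Yes" with "Yes" as the positive class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only.
y_true_demo = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
y_pred_demo = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]

# Rows are actual classes, columns are predicted classes, in the order given by `labels`.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=["No", "Yes"])
tn, fp, fn, tp = cm_demo.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Row-normalised version: each row sums to 1, so the diagonal holds per-class recall.
cm_norm = confusion_matrix(
    y_true_demo, y_pred_demo, labels=["No", "Yes"], normalize="true"
)
print(cm_norm)
```

A row-normalised matrix is often easier to read than raw counts when the classes are imbalanced.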


## 10. K-Fold Cross-Validation

In [None]:
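# For classifiers, cv=5 uses stratified 5-fold splits; the whole pipeline is refit on
# each fold, so preprocessing never sees the held-out rows
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")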

print("Cross-validation scores:", cv_scores)
print("Mean CV Accuracy:", cv_scores.mean())
print("Standard Deviation:", cv_scores.std())
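
`cross_val_score` reports a single metric per run. To cross-validate all four metrics from this notebook at once, `cross_validate` accepts a list of scorers. The sketch below runs on synthetic data from `make_classification` (a stand-in for the sales data) with an explicitly shuffled `StratifiedKFold`. The data, the fold settings, and `n_estimators=50` are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the sales data (assumption for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Shuffled, stratified folds with a fixed seed for reproducibility.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]

results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X_demo, y_demo, cv=cv, scoring=scoring,
)

# Each metric's per-fold scores are stored under the key "test_<metric>".
for name in scoring:
    scores = results[f"test_{name}"]
    print(f"{name:>20}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

In the notebook, the `model` pipeline together with `X` and `y` would take the place of the synthetic classifier and data.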
