# 2️⃣ PyCaret Classification: Predicting Survival on the Titanic
Classification is a supervised learning task where the goal is to predict **categorical labels** (discrete classes).
In this notebook, we aim to predict whether a passenger survived (`1`) or not (`0`).

## Key Learning Objectives:
1. **Automated Preprocessing**: Handling categorical variables like 'Sex' and 'Embarked'.
2. **Model Leaderboard**: Comparing classifiers (Random Forest, XGBoost, CatBoost, etc.).
3. **Performance Metrics**: Analyzing Accuracy, AUC, Precision, and Recall.
4. **Model Deployment**: Exporting and re-loading the classification pipeline.

## Environment Preparation

In [None]:
%%capture
# !pip install pycaret

In [None]:
import pandas as pd
from pycaret.classification import *
import os

# Create the output folder (no error if it already exists)
output_dir = './Output'
os.makedirs(output_dir, exist_ok=True)

## 1. Initializing the Experiment
PyCaret's `setup()` function automatically detects feature types and handles missing values (Imputation).
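
To make the imputation step concrete, here is a minimal pure-Python sketch of the idea: fill numeric gaps with the column mean and categorical gaps with the most frequent value. The `impute` helper and the toy rows are hypothetical and not part of PyCaret. The mean/mode strategies are assumed to match PyCaret's defaults (`numeric_imputation` and `categorical_imputation`), so check the documentation for the version you have installed.

```python
from statistics import mean, mode

# Toy rows with gaps (None), loosely modelled on two Titanic columns.
rows = [
    {"Age": 22.0, "Embarked": "S"},
    {"Age": None, "Embarked": "C"},
    {"Age": 30.0, "Embarked": None},
    {"Age": 38.0, "Embarked": "S"},
]

def impute(rows, column, strategy):
    """Return copies of rows with None in `column` replaced by the mean or mode."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

filled = impute(rows, "Age", "mean")         # numeric column: mean of observed values
filled = impute(filled, "Embarked", "mode")  # categorical column: most frequent value
print(filled[1]["Age"], filled[2]["Embarked"])  # 30.0 S
```

Inside PyCaret the fill values are learned from the training split and stored in the pipeline, so the same values are reused at prediction time.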

In [None]:
df = pd.read_csv('./Data/Titanic-Dataset.csv')
df.head()

In [None]:
# Initialize setup
# target: 'Survived' (0 = No, 1 = Yes)
# session_id: For reproducibility
# fix_imbalance: Not enabled here; pass fix_imbalance=True if one class has far fewer samples than the other
clf_setup = setup(data=df, target='Survived', session_id=42, verbose=False)

print("✅ Classification Setup Complete: Pipeline is ready.")

In [None]:
models()

## 2. Comparing and Fine-Tuning Models
We compare all available classifiers with `compare_models()`, which ranks them by Accuracy by default, and then use `tune_model()` to optimize the **F1-Score**, the harmonic mean of Precision and Recall.
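
As a quick illustration of why F1 is said to balance the two metrics, the sketch below computes it for two hypothetical models that share the same arithmetic mean of Precision and Recall (0.8). The `f1_score` helper is written here for illustration only and is not PyCaret's implementation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.8, 0.8)  # precision and recall agree
lopsided = f1_score(1.0, 0.6)  # same arithmetic mean, but unbalanced
print(round(balanced, 3), round(lopsided, 3))  # 0.8 0.75
```

The lopsided model scores lower, so a tuner that maximizes F1 is pushed toward models that do reasonably well on both Precision and Recall.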

In [None]:
# Compare all models and return the best one based on Accuracy
best_clf_model = compare_models()

In [None]:
# Fine-tune the best model, optimizing F1 instead of the default Accuracy
tuned_clf_model = tune_model(best_clf_model, optimize='F1')

## 3. Visual Analysis & Performance Metrics
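In classification, the **Confusion Matrix** shows exactly where the model makes mistakes. Treating survival (`1`) as the positive class:
- **True Positives**: Correctly predicted survivors.
- **False Positives**: Passengers predicted to survive who unfortunately didn't.
- **False Negatives**: Passengers predicted not to survive who actually did.
- **True Negatives**: Correctly predicted non-survivors.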
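
The sketch below counts the four cells of a binary confusion matrix by hand and derives Accuracy, Precision, and Recall from them. The labels are made up for illustration and `confusion_counts` is a hypothetical helper, not a PyCaret function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

# Made-up labels: 1 = survived, 0 = did not survive
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

c = confusion_counts(y_true, y_pred)
accuracy = (c["TP"] + c["TN"]) / len(y_true)
precision = c["TP"] / (c["TP"] + c["FP"])  # of predicted survivors, how many survived
recall = c["TP"] / (c["TP"] + c["FN"])     # of actual survivors, how many were found
print(c, accuracy, precision, recall)
```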

In [None]:
# Open the interactive evaluation dashboard
evaluate_model(tuned_clf_model)

In [None]:
# Specifically plot the Confusion Matrix
plot_model(tuned_clf_model, plot='confusion_matrix')

In [None]:
# Specifically plot the ROC Curve
plot_model(tuned_clf_model, plot='auc')
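
The area under the ROC curve (AUC) has a handy interpretation: it is the probability that a randomly chosen survivor receives a higher score than a randomly chosen non-survivor. The sketch below computes it that way on made-up data. `roc_auc` is a hypothetical helper, and the scores are assumed to be predicted probabilities of the positive class.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and positive-class scores
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

auc = roc_auc(y_true, scores)
print(round(auc, 4))  # 0.8333 (5 of 6 pairs ranked correctly)
```

This pairwise version is quadratic in the number of samples, which is fine for a demonstration but not how libraries compute AUC on large datasets.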

## 4. Finalizing the Model
We check performance on the hold-out set that `setup()` reserved, then use `finalize_model()` to retrain on the full dataset (training plus hold-out) for production.

In [None]:
# Predict on hold-out/test data
classification_results = predict_model(tuned_clf_model)

# Finalize the model (retrain on 100% of the data, including the hold-out set)
final_titanic_model = finalize_model(tuned_clf_model)

print("--- Sample Predictions (Survived vs. Prediction_Label) ---")
print(classification_results[['Survived', 'prediction_label', 'prediction_score']].head())

## 5. Model Export and Re-use
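`save_model()` writes the whole pipeline (preprocessing steps plus the trained model) to a `.pkl` file, appending the extension to the path we pass, so it can be reloaded in future applications.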
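
For intuition about what saving and reloading involves, here is a minimal round trip using Python's standard `pickle` module. The `pipeline` dict is a hypothetical stand-in for a fitted pipeline, and PyCaret's own serialization may work differently under the hood, so treat this only as a sketch of the general idea.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted pipeline: learned fill value plus a decision threshold
pipeline = {"age_fill": 29.7, "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_pipeline.pkl")
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)     # serialize the object to disk
    with open(path, "rb") as f:
        restored = pickle.load(f)    # rebuild an equal object from the file

print(restored == pipeline)  # True
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.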

In [None]:
# Save the model
clf_save_path = os.path.join(output_dir, 'classification_titanic_survival_model')
save_model(final_titanic_model, clf_save_path)

# --- RE-LOADING THE MODEL ---
# Load the saved pipeline
loaded_survival_model = load_model(clf_save_path)

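# Predict on "new" data (first 5 rows reused for the demo, with the target dropped as it would be for unseen passengers)
new_passengers = df.drop(columns=['Survived']).head(5)
final_preds = predict_model(loaded_survival_model, data=new_passengers)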

print("\n✅ Predictions from LOADED Classification model:")
print(final_preds[['prediction_label', 'prediction_score']])