## PyCaret: The Low-Code Machine Learning Framework

### 1. What is PyCaret?
**PyCaret** is an open-source, low-code machine learning library for Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle considerably.

Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few. This makes experiments much faster to set up and run.



---

### 2. Why PyCaret for Data Science?
* **Productivity:** It allows you to go from raw data to a deployed model in minutes.
* **Ease of Use:** It features a simple and consistent syntax across all modules.
* **Business Ready:** Designed for fast prototyping and production-grade deployments.
* **Automatic Preprocessing:** During the `setup()` phase it handles missing-value imputation, categorical encoding, and the train-test split by default. Feature scaling is available as an option (`normalize=True`).
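
The preprocessing choices listed above can also be stated explicitly when calling `setup()`. The sketch below is a minimal illustration, assuming a PyCaret 3.x installation; the `SETUP_OPTIONS` dictionary and the `run_setup` helper are hypothetical names introduced here, and the values are examples rather than recommendations.

```python
# Explicit preprocessing options passed to setup().
# The values are illustrative assumptions, not tuned recommendations.
SETUP_OPTIONS = {
    "train_size": 0.7,                 # fraction of rows kept for training
    "numeric_imputation": "mean",      # fill missing numbers with the column mean
    "categorical_imputation": "mode",  # fill missing categories with the most frequent value
    "normalize": True,                 # request feature scaling explicitly
    "normalize_method": "zscore",      # scale to zero mean and unit variance
    "session_id": 123,                 # random seed for reproducibility
}


def run_setup(df, target):
    """Hypothetical helper: call PyCaret's regression setup() with SETUP_OPTIONS."""
    # Imported lazily so the sketch can be loaded without PyCaret installed.
    from pycaret.regression import setup

    return setup(data=df, target=target, verbose=False, **SETUP_OPTIONS)
```

Calling `run_setup(df, 'medv')` on the housing data used later in this notebook would initialize the experiment with these options.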

---

### 3. Core Modules and Use Cases
PyCaret is modular. Each module is designed for a specific machine learning task:

| Module | Purpose | Real-World Example |
| :--- | :--- | :--- |
| **Classification** | Predict categorical labels | Customer Churn, Spam Detection |
| **Regression** | Predict continuous values | House Prices, Stock Value |
| **Clustering** | Group similar data points | Customer Segmentation |
| **Anomaly Detection** | Identify rare events | Fraud Detection, System Failures |
| **Time Series** | Forecasting based on time | Sales Forecasting, Weather Prediction |
| **NLP** (PyCaret 2.x only; removed in 3.0) | Topic modeling | Text Theme Extraction |



---

### 4. The Standard Workflow
Every PyCaret experiment follows these standardized functional steps:

1.  **`setup()`**: Initializes the experiment and the transformation pipeline.
2.  **`compare_models()`**: Trains every available model with cross-validation and ranks them by a chosen metric.
3.  **`create_model()`**: Trains a specific algorithm for deeper analysis.
4.  **`tune_model()`**: Automatically optimizes the hyperparameters of a model.
5.  **`plot_model()`**: Generates diagnostic plots (ROC curve, residuals, etc.).
6.  **`finalize_model()`**: Retrains the model on the complete dataset, including the hold-out set, for production.
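
Step 2 ranks candidate models by a cross-validated score. The following library-free sketch shows the idea on toy data. It is a conceptual illustration, not PyCaret's implementation, and every name in it (`fit_mean`, `fit_line`, `cv_mae`, `rank_models`) is hypothetical.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-feature least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_mae(fit, xs, ys, k=3):
    """Mean absolute error averaged over k interleaved folds."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th row is held out in this fold
        train_idx = [i for i in range(n) if i not in test_idx]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(mean(abs(ys[i] - model(xs[i])) for i in test_idx))
    return mean(scores)


def rank_models(candidates, xs, ys, k=3):
    """Return (name, score) pairs sorted from best (lowest MAE) to worst."""
    scored = [(name, cv_mae(fit, xs, ys, k)) for name, fit in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2 * x + 1 for x in xs]  # noiseless toy data, linear by construction
leaderboard = rank_models({"mean": fit_mean, "line": fit_line}, xs, ys)
best_name = leaderboard[0][0]
```

On this data the line model ranks first because it reproduces the targets almost exactly, while the mean baseline cannot follow the trend.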

---

## Installation

In [None]:
%%capture
# !pip install pycaret

In [None]:
import pandas as pd
import os

# Create the output directory for model storage (no error if it already exists)
output_dir = './Output'
os.makedirs(output_dir, exist_ok=True)

# 1️⃣ PyCaret Regression: From Data to Deployment
Regression is a Supervised Learning task used to predict **continuous numerical outcomes**. 
In this notebook, we use the **Boston Housing Dataset** to predict house prices (`medv`).

## Key Learning Objectives:
1. **Automated Setup:** Handling missing data and feature engineering.
2. **Benchmark Comparison:** Ranking 20+ algorithms with a single function call.
3. **Interactive Evaluation:** Using dashboards for error analysis.
4. **Model Persistence:** Saving and loading models for production.

## Environment Preparation

In [None]:
from pycaret.regression import *

## 1. Initializing the Experiment
The `setup()` function is the engine of PyCaret. It creates a transformation pipeline that 
ensures your data is clean and ready for machine learning.
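
A transformation pipeline matters because whatever is learned from the training data must be applied unchanged to later data. The sketch below illustrates that principle with a toy mean imputer. It is a conceptual example, not PyCaret's internal code, and `MeanImputer` is a hypothetical class written for this illustration.

```python
from statistics import mean


class MeanImputer:
    """Learns a fill value on training data and reuses it on later data."""

    def fit(self, values):
        observed = [v for v in values if v is not None]
        self.fill_value_ = mean(observed)  # learned once, from training rows only
        return self

    def transform(self, values):
        return [self.fill_value_ if v is None else v for v in values]


train_rooms = [6.0, None, 7.0, 5.0]  # toy column with one missing entry
new_rooms = [None, 6.5]              # unseen rows arriving at prediction time

imputer = MeanImputer().fit(train_rooms)
train_clean = imputer.transform(train_rooms)  # [6.0, 6.0, 7.0, 5.0]
new_clean = imputer.transform(new_rooms)      # [6.0, 6.5]
```

The missing value in `new_rooms` is filled with the mean learned from the training rows, not with a statistic computed from the new rows.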

In [None]:
# Load dataset
df = pd.read_csv('./Data/Boston.csv')
df.head()

In [None]:
# Initialize setup
# target: 'medv' (median house value)
# session_id: fixes the random seed for reproducibility
# To track experiments, additionally pass log_experiment=True (not used here)
reg_setup = setup(data=df, target='medv', session_id=123, verbose=False)

print("✅ Pipeline Setup Complete: Data is now cleaned and split.")

In [None]:
# List the regression models available in this environment
models()

## 2. Comparing and Fine-Tuning Models
We first find the best base model, then use `tune_model()` to search for 
hyperparameters that may improve its $R^2$ score. Tuning is not guaranteed to beat the base model, so compare the two results.
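
For reference, $R^2$ compares the model's squared error with that of always predicting the mean: $R^2 = 1 - SS_{res} / SS_{tot}$. A minimal plain-Python sketch of the formula follows; the `r2_score` function here is written for illustration and is not imported from any library.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # error of the mean baseline
    return 1.0 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0, 40.0]
perfect = r2_score(actual, actual)        # 1.0: predictions match exactly
baseline = r2_score(actual, [25.0] * 4)   # 0.0: no better than predicting the mean
```

A score near 1 means the model explains most of the variance in the target, and a score at or below 0 means it does no better than the mean.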

In [None]:
# Compare all models and pick the best one
best_model = compare_models()

In [None]:
# Optional: Fine-tune the best model to squeeze out more performance
tuned_model = tune_model(best_model)

## 3. Visual Analysis
PyCaret provides an interactive dashboard through `evaluate_model()`. 
You can inspect:
- **Residuals:** To check for non-linear patterns in errors.
- **Feature Importance:** To see which variables (such as `rm`, the average number of rooms per dwelling) affect the price most.
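
A residual is the actual value minus the predicted value. The toy sketch below, which is independent of PyCaret, fits a straight line to curved data and shows the kind of systematic pattern a residual plot is meant to reveal.

```python
from statistics import mean

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # curved toy data: 4, 1, 0, 1, 4

# Least-squares straight line through the points.
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Residual = actual - predicted, one per data point.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
# residuals -> [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. Such a U-shape suggests that the model is missing a non-linear relationship.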

In [None]:
# This opens an interactive GUI within the notebook
evaluate_model(tuned_model)

## 4. Predicting and Finalizing
`predict_model()` shows how the model performs on the hold-out set. 
`finalize_model()` then trains it on 100% of the available data.
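
The two-stage logic (score on unseen rows first, then refit on everything) can be sketched without any library. This is a conceptual illustration on toy data, not PyCaret's implementation; `fit_slope` and `mean_abs_error` are hypothetical helpers.

```python
def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin, y = b * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mean_abs_error(b, xs, ys):
    """Average absolute gap between targets and predictions b * x."""
    return sum(abs(y - b * x) for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]  # toy data, roughly y = 2x

# Stage 1: hold out the last rows, fit on the rest, score on the unseen rows.
train_x, test_x = xs[:4], xs[4:]
train_y, test_y = ys[:4], ys[4:]
b_train = fit_slope(train_x, train_y)
holdout_mae = mean_abs_error(b_train, test_x, test_y)

# Stage 2: once the hold-out score is acceptable, refit on every row.
b_final = fit_slope(xs, ys)
```

The hold-out error is the honest estimate of future performance. The final refit has no unseen rows left to score against, which is why evaluation comes first.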

In [None]:
# Check performance on test data
holdout_predictions = predict_model(tuned_model)

# Finalize the model for saving
final_reg_model = finalize_model(tuned_model)

print("--- Sample Predictions ---")
print(holdout_predictions[['medv', 'prediction_label']].head())

## 5. Saving and Re-loading the Model
To use this model in a real application, we save it to disk with `save_model()`, which appends the `.pkl` extension to the given path, and then load it back with `load_model()`.
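
The save-and-reload idea can be shown with a generic round trip using Python's standard `pickle` module. This sketch does not reproduce how PyCaret stores its pipeline; the `model_state` dictionary and `predict` function are hypothetical stand-ins for a trained model.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: parameters of y = slope * x + intercept.
model_state = {"slope": 2.0, "intercept": 1.0}


def predict(state, xs):
    """Apply the stored parameters to new inputs."""
    return [state["slope"] * x + state["intercept"] for x in xs]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(model_state, fh)      # serialize to disk
    with open(path, "rb") as fh:
        restored_state = pickle.load(fh)  # read back, as a new script would

same_predictions = predict(restored_state, [1, 2, 3]) == predict(model_state, [1, 2, 3])
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.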

In [None]:
save_path = os.path.join(output_dir, 'regression_boston_house_model')
save_model(final_reg_model, save_path)

# --- RE-LOADING THE MODEL ---
# Load the saved model (pretending we are in a new script)
loaded_house_model = load_model(save_path)

# Predict on new data using the loaded model
# Real new data has no target column, so drop 'medv' from the 5 sample rows
new_data = df.drop(columns=['medv']).head(5)
final_preds = predict_model(loaded_house_model, data=new_data)

print("\n✅ Predictions from LOADED model:")
print(final_preds[['prediction_label']])