## PyCaret: The Low-Code Machine Learning Framework

### 1. What is PyCaret?
**PyCaret** is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle considerably.

Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few. This makes experiments much faster to set up and run.



---

### 2. Why PyCaret for Data Science?
* **Productivity:** It allows you to go from raw data to a deployed model in minutes.
* **Ease of Use:** It features a simple and consistent syntax across all modules.
* **Business Ready:** Designed for fast prototyping and production-grade deployments.
* **Automatic Preprocessing:** During the `setup()` phase it handles missing-value imputation, categorical encoding, and the train-test split by default. Optional steps such as feature scaling are switched on through parameters (for example `normalize=True`).

---

### 3. Core Modules and Use Cases
PyCaret is modular. Each module is designed for a specific machine learning task:

| Module | Purpose | Real-World Example |
| :--- | :--- | :--- |
| **Classification** | Predict categorical labels | Customer Churn, Spam Detection |
| **Regression** | Predict continuous values | House Prices, Stock Value |
| **Clustering** | Group similar data points | Customer Segmentation |
| **Anomaly Detection** | Identify rare events | Fraud Detection, System Failures |
| **Time Series** | Forecasting based on time | Sales Forecasting, Weather Prediction |
| **NLP** | Topic modeling (PyCaret 2.x; the module was removed in 3.0) | Text Theme Extraction |



---

### 4. The Standard Workflow
Every PyCaret experiment follows these standardized functional steps:

1.  **`setup()`**: Initializes the experiment and the transformation pipeline.
2.  **`compare_models()`**: Trains the available models with cross-validation, ranks them by performance, and returns the best one.
3.  **`create_model()`**: Trains a specific algorithm for deeper analysis.
4.  **`tune_model()`**: Automatically optimizes the hyperparameters of a model.
5.  **`plot_model()`**: Generates performance visualizations (ROC, Residuals, etc.).
6.  **`finalize_model()`**: Retrains the model on the complete dataset, hold-out set included, for production.
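
The six steps map onto a handful of calls. The sketch below is a minimal illustration for the regression module, assuming PyCaret 3.x is installed. The function name `run_regression_workflow`, the `"et"` model ID, and the `"residuals"` plot are illustrative choices, not requirements of the library.

```python
def run_regression_workflow(data, target, seed=123):
    """Run the six workflow steps once on a pandas DataFrame.

    Returns the model picked by compare_models() and the finalized pipeline.
    """
    # Imported lazily so this function can be defined without PyCaret installed.
    from pycaret.regression import (
        setup, compare_models, create_model,
        tune_model, plot_model, finalize_model,
    )

    setup(data=data, target=target, session_id=seed, verbose=False)  # step 1
    best = compare_models()                          # step 2: cross-validated ranking
    model = create_model("et")                       # step 3: one specific estimator
    tuned = tune_model(model)                        # step 4: hyperparameter search
    plot_model(tuned, plot="residuals", save=True)   # step 5: save the plot to disk
    final = finalize_model(tuned)                    # step 6: refit on all rows
    return {"best_from_comparison": best, "final": final}
```

Steps 2 and 3 are alternatives in practice: either take the winner from `compare_models()` or name an estimator yourself with `create_model()`.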

---

## Installation

In [None]:
%%capture
# !pip install pycaret

In [13]:
import pandas as pd
import os

# Create Output directory for model storage
output_dir = './Output'
os.makedirs(output_dir, exist_ok=True)

# 1️⃣ PyCaret Regression: From Data to Deployment
Regression is a Supervised Learning task used to predict **continuous numerical outcomes**. 
In this notebook, we use the **Boston Housing Dataset** to predict house prices (`medv`).

## Key Learning Objectives:
1. **Automated Setup:** Handling missing data and feature engineering.
2. **Benchmark Comparison:** Ranking 20+ algorithms with a single function call.
3. **Interactive Evaluation:** Using dashboards for error analysis.
4. **Model Persistence:** Saving and loading models for production.

## Environment Preparation

In [14]:
from pycaret.regression import *

## 1. Initializing the Experiment
The `setup()` function is the engine of PyCaret. It creates a transformation pipeline that 
ensures your data is clean and ready for machine learning.

In [15]:
# Load dataset
df = pd.read_csv('./Data/Boston.csv')
df.head()

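| | Id | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |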


In [16]:
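# Initialize setup
# target: 'medv' (Median House Value)
# session_id: fixes the random seed for reproducibility
# log_experiment: optional; pass log_experiment=True to track experiments (not set here)
# Note: 'Id' is only a row identifier; passing ignore_features=['Id'] would keep it
# out of the feature set (not done in this run)
reg_setup = setup(data=df, target='medv', session_id=123, verbose=False)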

print("✅ Pipeline Setup Complete: Data is now cleaned and split.")

✅ Pipeline Setup Complete: Data is now cleaned and split.
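
What "cleaned and split" means can be sketched with the standard library alone. This is a conceptual illustration, not PyCaret's implementation: the helper names are hypothetical, and mean imputation and the 70/30 split are assumed here to mirror PyCaret's defaults. The fixed seed plays the role of `session_id`.

```python
import random
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def train_test_split_indices(n_rows, train_size=0.7, seed=123):
    """Shuffle row indices with a fixed seed and cut them into train/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_size)
    return indices[:cut], indices[cut:]


column = [6.575, None, 7.185, 6.998]  # toy column with one missing value
print(impute_mean(column))

train_a, test_a = train_test_split_indices(10)
train_b, test_b = train_test_split_indices(10)
print(len(train_a), len(test_a))  # 7 3
print(train_a == train_b)         # True: same seed, same split
```

Because the shuffle is seeded, rerunning the split yields the same train and test rows, which is what makes results comparable across runs.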


In [17]:
models()

| ID | Name | Reference | Turbo |
| :--- | :--- | :--- | :--- |
| lr | Linear Regression | sklearn.linear_model._base.LinearRegression | True |
| lasso | Lasso Regression | sklearn.linear_model._coordinate_descent.Lasso | True |
| ridge | Ridge Regression | sklearn.linear_model._ridge.Ridge | True |
| en | Elastic Net | sklearn.linear_model._coordinate_descent.Elast... | True |
| lar | Least Angle Regression | sklearn.linear_model._least_angle.Lars | True |
| llar | Lasso Least Angle Regression | sklearn.linear_model._least_angle.LassoLars | True |
| omp | Orthogonal Matching Pursuit | sklearn.linear_model._omp.OrthogonalMatchingPu... | True |
| br | Bayesian Ridge | sklearn.linear_model._bayes.BayesianRidge | True |
| ard | Automatic Relevance Determination | sklearn.linear_model._bayes.ARDRegression | False |
| par | Passive Aggressive Regressor | sklearn.linear_model._passive_aggressive.Passi... | True |


## 2. Comparing and Fine-Tuning Models
We first find the best base model, then use `tune_model()` to search its hyperparameters automatically. 
Tuning does not always raise the $R^2$ score; when the tuned candidate is worse, PyCaret returns the original model.

In [18]:
# Compare all models and pick the best one
best_model = compare_models()

In [19]:
# Optional: Fine-tune the best model to squeeze out more performance
tuned_model = tune_model(best_model)

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 2.6518 | 14.0225 | 3.7447 | 0.8431 | 0.1419 | 0.0993 |
| 1 | 1.9549 | 6.555 | 2.5603 | 0.8585 | 0.1191 | 0.1045 |
| 2 | 2.4888 | 9.4524 | 3.0745 | 0.9056 | 0.1672 | 0.1495 |
| 3 | 2.9493 | 23.1717 | 4.8137 | 0.7715 | 0.2005 | 0.1565 |
| 4 | 2.4372 | 10.4098 | 3.2264 | 0.9117 | 0.1486 | 0.1242 |
| 5 | 2.2619 | 9.9709 | 3.1577 | 0.8145 | 0.1583 | 0.1215 |
| 6 | 1.6819 | 4.6246 | 2.1505 | 0.9314 | 0.0907 | 0.0783 |
| 7 | 3.1966 | 35.4074 | 5.9504 | 0.6751 | 0.2108 | 0.1386 |
| 8 | 1.9977 | 6.616 | 2.5722 | 0.8817 | 0.1604 | 0.1276 |
| 9 | 2.1001 | 10.502 | 3.2407 | 0.8785 | 0.1424 | 0.0911 |


Fitting 10 folds for each of 10 candidates, totalling 100 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).
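
The per-fold scores of the tuned candidate can be summarised by hand to see how much they vary. The snippet below is a small worked example using only the standard library; the numbers are copied from the fold output above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

# (MSE, RMSE, R2) per fold, copied from the tuning output above.
folds = [
    (14.0225, 3.7447, 0.8431),
    (6.555, 2.5603, 0.8585),
    (9.4524, 3.0745, 0.9056),
    (23.1717, 4.8137, 0.7715),
    (10.4098, 3.2264, 0.9117),
    (9.9709, 3.1577, 0.8145),
    (4.6246, 2.1505, 0.9314),
    (35.4074, 5.9504, 0.6751),
    (6.616, 2.5722, 0.8817),
    (10.502, 3.2407, 0.8785),
]

r2_scores = [r2 for _, _, r2 in folds]
print(f"mean R2 = {mean(r2_scores):.4f}")  # mean R2 = 0.8472
print(f"std  R2 = {pstdev(r2_scores):.4f}")

# The fold with the lowest R2 shows where the model struggles most.
worst_fold = min(range(len(folds)), key=lambda i: folds[i][2])
print("weakest fold:", worst_fold)         # weakest fold: 7

# RMSE is the square root of MSE; the table agrees up to rounding.
max_gap = max(abs(sqrt(mse) - rmse) for mse, rmse, _ in folds)
print(f"largest |sqrt(MSE) - RMSE| = {max_gap:.5f}")
```

The spread between fold 6 and fold 7 is a reminder that a single fold score says little on its own; the average across folds is the number to compare between models.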


## 3. Visual Analysis
PyCaret provides an interactive widget through `evaluate_model()` that lets you switch between plots inside the notebook. 
You can inspect:
- **Residuals:** To check for non-linear patterns in errors.
- **Feature Importance:** To see which variables (like `rm`, the average number of rooms) impact the price most.

In [20]:
# This opens an interactive GUI within the notebook
evaluate_model(tuned_model)


## 4. Predicting and Finalizing
`predict_model()` scores the model on the hold-out set that `setup()` kept aside. 
`finalize_model()` then retrains it on 100% of the available data, hold-out rows included.

In [21]:
# Check performance on test data
holdout_predictions = predict_model(tuned_model)

# Finalize the model for saving
final_reg_model = finalize_model(tuned_model)

print("--- Sample Predictions ---")
print(holdout_predictions[['medv', 'prediction_label']].head())

| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | Extra Trees Regressor | 2.1953 | 12.9735 | 3.6019 | 0.8395 | 0.1515 | 0.1163 |


--- Sample Predictions ---
          medv  prediction_label
410  15.000000            39.085
85   26.600000            26.834
280  45.400002            47.134
422  20.799999            19.472
199  34.900002            31.408
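
To see how the reported metrics are built, MAE and RMSE can be recomputed by hand on the five rows printed above. This is a worked example with values copied from that output; five rows are far too few to match the hold-out MAE of 2.1953, and the point is only to show the arithmetic.

```python
from math import sqrt

# (row index, actual medv, predicted value) copied from the sample output above.
sample = [
    (410, 15.000000, 39.085),
    (85, 26.600000, 26.834),
    (280, 45.400002, 47.134),
    (422, 20.799999, 19.472),
    (199, 34.900002, 31.408),
]

errors = [pred - actual for _, actual, pred in sample]
mae = sum(abs(e) for e in errors) / len(errors)        # mean absolute error
rmse = sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error
worst_row = max(sample, key=lambda r: abs(r[2] - r[1]))[0]

print(f"MAE  on 5 rows: {mae:.3f}")  # MAE  on 5 rows: 6.175
print(f"RMSE on 5 rows: {rmse:.3f}")
print("largest error at row", worst_row)  # largest error at row 410
```

Row 410 alone accounts for most of the error, and RMSE reacts to it more strongly than MAE because the errors are squared before averaging.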


## 5. Saving and Re-loading the Model
To use this model in a real application, we save it as a `.pkl` file (`save_model()` appends the extension to `save_path`) and demonstrate how to load it back.

In [22]:
save_path = os.path.join(output_dir, 'regression_boston_house_model')
save_model(final_reg_model, save_path)

# --- RE-LOADING THE MODEL ---
# Load the saved model (pretending we are in a new script)
loaded_house_model = load_model(save_path)

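# Predict using the loaded model
# Note: these 5 rows were part of the training data and still carry the target column,
# so the perfect metrics printed below are not an estimate of real-world accuracy.
new_data = df.head(5) # Taking 5 rows as "new" data
final_preds = predict_model(loaded_house_model, data=new_data)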

print("\n✅ Predictions from LOADED model:")
print(final_preds[['prediction_label']])

Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | Extra Trees Regressor | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |



✅ Predictions from LOADED model:
   prediction_label
0         24.000000
1         21.600000
2         34.700001
3         33.400002
4         36.200001
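
In a real application the incoming rows have no `medv` column. A minimal sketch of scoring such rows follows, assuming PyCaret 3.x; the helper name `score_new_rows` is hypothetical and not part of PyCaret.

```python
def score_new_rows(model_path, rows, target="medv"):
    """Load a saved PyCaret pipeline and return predictions for unseen rows.

    `rows` is a pandas DataFrame; `model_path` is the path that was passed
    to save_model(), without the .pkl suffix.
    """
    # Imported lazily so this helper can be defined without PyCaret installed.
    from pycaret.regression import load_model, predict_model

    pipeline = load_model(model_path)
    # Real incoming data has no label, so drop the target if it is present.
    features = rows.drop(columns=[target], errors="ignore")
    scored = predict_model(pipeline, data=features)
    return scored["prediction_label"]
```

A call such as `score_new_rows(save_path, df.head(5))` would return only the predictions, with no metrics table, because the target is no longer available for comparison.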
