# Introduction to PyCaret

**PyCaret** is an open-source, low-code machine learning library in Python that streamlines building and deploying models. It automates most steps of the machine learning workflow, from data preparation to model deployment, in a few lines of code, which makes it accessible to both beginners and experienced practitioners.

## Key Features of PyCaret

1. **Ease of Use**: PyCaret simplifies complex machine learning tasks into just a few lines of code.
  
2. **End-to-End Workflow**: It covers data preprocessing, model training, hyperparameter tuning, and deployment in a unified framework.

3. **Model Selection**: PyCaret provides various algorithms for classification, regression, clustering, and more, allowing users to easily compare performance.

4. **Interpretability**: Built-in functions such as `interpret_model()` help users understand which features drive a model's predictions.

5. **Integration**: It works directly with pandas DataFrames and builds on libraries such as NumPy, scikit-learn, and Matplotlib.

## Getting Started

To get started with PyCaret, follow these steps:

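1. **Installation**: Ensure you have PyCaret installed in your environment. The leading `!` runs the command from a notebook cell; drop it in a terminal.
   ```python
   !pip install pycaret
   ```

2. **Import Libraries**: Import the necessary libraries, including PyCaret. This example uses the classification module; for a regression task, as in the walkthrough below, import from `pycaret.regression` instead.
   ```python
   import pandas as pd
   from pycaret.classification import *
   ```

3. **Load Data**: Load your dataset into a pandas DataFrame.
   ```python
   data = pd.read_csv('your_dataset.csv')
   ```

4. **Set Up the Environment**: Initialize the PyCaret environment with your dataset and the name of the target column.
   ```python
   clf = setup(data, target='target_column_name')
   ```

5. **Train Models**: Train and compare models; `compare_models()` returns the best-performing one.
   ```python
   best_model = compare_models()
   ```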


<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">1- Installation</h1>

```python
!pip install pycaret
```

```text
Collecting pycaret
  Downloading pycaret-3.3.2-py3-none-any.whl.metadata (17 kB)
Collecting pandas<2.2.0 (from pycaret)
  Downloading pandas-2.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting scipy<=1.11.4,>=1.6.1 (from pycaret)
  Downloading scipy-1.11.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting joblib<1.4,>=1.2.0 (from pycaret)
  Downloading joblib-1.3.2-py3-none-any.whl.metadata (5.4 kB)
Collecting scikit-learn>1.4.0 (from pycaret)
  Downloading scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting pyod>=1.1.3 (from pycaret)
  Downloading pyod-2.0.2.tar.gz (165 kB)
  Preparing metadat
```

<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">2- Import Libraries</h1>

```python
import pycaret
print(pycaret.__version__)
```

```text
3.3.2
```


```python
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split

from pycaret.regression import *

import warnings
warnings.filterwarnings('ignore')
```

<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">3- Load Data</h1>

```python
df = pd.read_csv('/kaggle/input/laptop-prices/laptop_prices.csv')
df.head()
```

| | Company | Product | TypeName | Inches | Ram | OS | Weight | Price_euros | Screen | ScreenW | ... | RetinaDisplay | CPU_company | CPU_freq | CPU_model | PrimaryStorage | SecondaryStorage | PrimaryStorageType | SecondaryStorageType | GPU_company | GPU_model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | MacBook Pro | Ultrabook | 13.3 | 8 | macOS | 1.37 | 1339.69 | Standard | 2560 | ... | Yes | Intel | 2.3 | Core i5 | 128 | 0 | SSD | No | Intel | Iris Plus Graphics 640 |
| 1 | Apple | Macbook Air | Ultrabook | 13.3 | 8 | macOS | 1.34 | 898.94 | Standard | 1440 | ... | No | Intel | 1.8 | Core i5 | 128 | 0 | Flash Storage | No | Intel | HD Graphics 6000 |
| 2 | HP | 250 G6 | Notebook | 15.6 | 8 | No OS | 1.86 | 575.0 | Full HD | 1920 | ... | No | Intel | 2.5 | Core i5 7200U | 256 | 0 | SSD | No | Intel | HD Graphics 620 |
| 3 | Apple | MacBook Pro | Ultrabook | 15.4 | 16 | macOS | 1.83 | 2537.45 | Standard | 2880 | ... | Yes | Intel | 2.7 | Core i7 | 512 | 0 | SSD | No | AMD | Radeon Pro 455 |
| 4 | Apple | MacBook Pro | Ultrabook | 13.3 | 8 | macOS | 1.37 | 1803.6 | Standard | 2560 | ... | Yes | Intel | 3.1 | Core i5 | 256 | 0 | SSD | No | Intel | Iris Plus Graphics 650 |


```python
print(df.head())
print(df.describe())
print(df.columns)
```

```text
  Company      Product   TypeName  Inches  Ram     OS  Weight  Price_euros  \
0   Apple  MacBook Pro  Ultrabook    13.3    8  macOS    1.37      1339.69   
1   Apple  Macbook Air  Ultrabook    13.3    8  macOS    1.34       898.94   
2      HP       250 G6   Notebook    15.6    8  No OS    1.86       575.00   
3   Apple  MacBook Pro  Ultrabook    15.4   16  macOS    1.83      2537.45   
4   Apple  MacBook Pro  Ultrabook    13.3    8  macOS    1.37      1803.60   

     Screen  ScreenW  ...  RetinaDisplay CPU_company CPU_freq      CPU_model  \
0  Standard     2560  ...            Yes       Intel      2.3        Core i5   
1  Standard     1440  ...             No       Intel      1.8        Core i5   
2   Full HD     1920  ...             No       Intel      2.5  Core i5 7200U   
3  Standard     2880  ...            Yes       Intel      2.7        Core i7   
4  Standard     2560  ...            Yes       Intel      3.1        Core i5   

  PrimaryStorage  SecondaryStorage PrimaryStorageT
```

```python
# Split the data into a training set (80%) and a held-out test set (20%).
features = df.drop(columns='Price_euros')
target = df['Price_euros']
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

# Rebuild two DataFrames: one for training (passed to setup) and one kept for final testing.
train_data = pd.concat([x_train, y_train], axis=1)
test_data = pd.concat([x_test, y_test], axis=1)
```
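Because `train_test_split` is called here without `random_state`, each run draws a different 80/20 split, so the scores further down will shift from run to run; `session_id=123` in `setup()` seeds PyCaret's own steps but cannot affect this earlier split. The sketch below is a plain-Python stand-in for a seeded holdout split, not scikit-learn's implementation. It sizes the test set as `ceil(test_size * n)`, which is how scikit-learn treats a float `test_size`, and the 1,000 row ids are hypothetical.

```python
import math
import random


def holdout_split(rows, test_size=0.2, seed=None):
    """Shuffle row positions, then cut off a test fraction.

    The test set gets ceil(test_size * n) rows and the train set gets the rest;
    seed=None means a different shuffle on every call.
    """
    n = len(rows)
    n_test = math.ceil(test_size * n)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # a fixed seed makes the shuffle repeatable
    test = [rows[i] for i in order[:n_test]]
    train = [rows[i] for i in order[n_test:]]
    return train, test


rows = list(range(1000))  # hypothetical row ids
train_a, test_a = holdout_split(rows, test_size=0.2, seed=123)
train_b, test_b = holdout_split(rows, test_size=0.2, seed=123)
print(len(train_a), len(test_a))  # 800 200
print(train_a == train_b)  # True: same seed, same split
```

In the notebook itself, the equivalent change would be passing `random_state=123` (or any fixed integer) to `train_test_split`.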

<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">4- Set Up Environment</h1>

```python
s = setup(train_data, target='Price_euros', session_id=123)
```

| Description | Value |
|---|---|
| Session id | 123 |
| Target | Price_euros |
| Target type | Regression |
| Original data shape | (1020, 23) |
| Transformed data shape | (1020, 68) |
| Transformed train set shape | (714, 68) |
| Transformed test set shape | (306, 68) |
| Numeric features | 8 |
| Categorical features | 14 |
| Preprocess | True |
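The shapes reported by `setup()` show a second split inside the 1,020 rows it received: 714 rows for training and 306 for PyCaret's internal test set, a 70/30 ratio consistent with the default `train_size=0.7` of `setup()`. Combined with the earlier 80/20 split, `compare_models()` cross-validates on about 56% of the full dataset, while `test_data` stays unseen until the prediction step. The check below uses exact fractions on the reported shapes; the 4/5 outer fraction comes from `test_size=0.2`.

```python
from fractions import Fraction

# Shapes reported by setup() above.
setup_rows = 1020
setup_train, setup_test = 714, 306

# Share of the rows passed to setup() that land in each internal split.
inner_train_frac = Fraction(setup_train, setup_rows)
inner_test_frac = Fraction(setup_test, setup_rows)
print(inner_train_frac, inner_test_frac)  # 7/10 3/10

# The outer train_test_split kept 4/5 of the data (test_size=0.2),
# so the share of the full dataset used for training inside PyCaret is:
outer_train_frac = Fraction(4, 5)
print(outer_train_frac * inner_train_frac)  # 14/25, i.e. 56%
```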


<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">5- Model Training</h1>

```python
# Train and cross-validate the available regressors, then return the top-ranked model.
best_model = compare_models()
```

| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
| knn | K Neighbors Regressor | 233.9255 | 137033.0516 | 349.3671 | 0.718 | 0.2861 | 0.2307 | 0.277 |
| ridge | Ridge Regression | 255.9043 | 127856.4402 | 350.6778 | 0.7149 | 0.3734 | 0.3196 | 0.136 |
| lasso | Lasso Regression | 261.022 | 133246.5445 | 357.9582 | 0.7025 | 0.3779 | 0.3264 | 0.146 |
| llar | Lasso Least Angle Regression | 261.5102 | 133600.2628 | 358.5139 | 0.7016 | 0.3785 | 0.327 | 0.172 |
| br | Bayesian Ridge | 262.2864 | 134770.8879 | 360.0684 | 0.6992 | 0.3792 | 0.3285 | 0.138 |
| catboost | CatBoost Regressor | 262.3986 | 157254.9882 | 386.0829 | 0.6588 | 0.3766 | 0.3231 | 2.305 |
| en | Elastic Net | 283.6483 | 159117.8144 | 390.5194 | 0.6454 | 0.4001 | 0.3552 | 0.15 |
| gbr | Gradient Boosting Regressor | 283.0893 | 172063.6688 | 406.3084 | 0.619 | 0.4102 | 0.3664 | 0.601 |
| et | Extra Trees Regressor | 279.1394 | 175900.2943 | 407.743 | 0.6188 | 0.3964 | 0.3467 | 1.078 |
| lightgbm | Light Gradient Boosting Machine | 282.872 | 178001.9338 | 411.3816 | 0.6101 | 0.4103 | 0.3636 | 0.417 |
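`compare_models()` scores each candidate with K-fold cross-validation on the training rows (PyCaret's default is 10 folds) and, for regression, sorts the table by R2 unless `sort` is changed. Each metric is averaged over the folds, which is why the table's RMSE is not the square root of its MSE: for `knn`, √137033.05 ≈ 370.2, while the reported mean RMSE is 349.37, because the mean of square roots never exceeds the square root of the mean. The plain-Python sketch below illustrates that procedure with two hypothetical models (a mean baseline and a one-feature least-squares line) on synthetic data; it is not PyCaret's implementation.

```python
import math
import random
import statistics


def kfold_indices(n, k):
    """Contiguous, unshuffled K-fold split: the first n % k folds get one extra row."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs, ys):
    """Baseline: always predict the mean target of the training folds."""
    mu = statistics.fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def fold_metrics(y_true, y_pred):
    """MSE, RMSE and R2 on one held-out fold."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    y_bar = statistics.fmean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / len(y_true)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}


def cross_validate(fit, xs, ys, k=10):
    """Fit on k-1 folds, score on the remaining fold, then average each metric over the k folds."""
    per_fold = []
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        per_fold.append(fold_metrics([ys[i] for i in test_idx], [model(xs[i]) for i in test_idx]))
    return {name: statistics.fmean(m[name] for m in per_fold) for name in per_fold[0]}


# Hypothetical synthetic data: price grows linearly with one feature, plus noise.
rng = random.Random(123)
xs = [rng.uniform(1, 10) for _ in range(200)]
ys = [300 + 150 * x + rng.gauss(0, 100) for x in xs]

candidates = {"mean baseline": fit_mean, "linear": fit_line}
results = {name: cross_validate(fit, xs, ys) for name, fit in candidates.items()}

# Rank the way compare_models() does by default for regression: highest mean R2 first.
ranking = sorted(results, key=lambda model_name: results[model_name]["R2"], reverse=True)
for name in ranking:
    r = results[name]
    print(f"{name:14} R2={r['R2']:.3f}  mean RMSE={r['RMSE']:.1f}  sqrt(mean MSE)={math.sqrt(r['MSE']):.1f}")
```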

<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">6- Evaluate the Best Model</h1>

```python
evaluate_model(best_model)
```

In a live notebook, this call renders an interactive widget with a **Plot Type** selector (starting with *Pipeline Plot*) for browsing diagnostic plots of the model.

<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">7- Predict on the Test Set</h1>

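```python
# Score the held-out test set; the returned DataFrame holds the test rows plus a prediction_label column.
predictions = predict_model(best_model, data=test_data)
```

| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| K Neighbors Regressor | 228.4103 | 106464.4609 | 326.2889 | 0.7459 | 0.3106 | 0.2541 |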
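The metrics in this table (and in the `compare_models()` table) follow standard definitions. The sketch below computes them in plain Python on four hypothetical prices; it uses textbook formulas, and PyCaret's own implementation may treat edge cases differently (for example, negative values in RMSLE or zero targets in MAPE). MAPE is a fraction, so the 0.2541 above means the predictions miss the true price by about 25% on average.

```python
import math


def regression_metrics(y_true, y_pred):
    """Textbook definitions of the metrics in the predict_model() table."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    # RMSLE compares log(1 + y); it assumes targets and predictions are non-negative.
    rmsle = math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n)
    # MAPE is returned as a fraction (0.25 means 25%); it is undefined when a true value is 0.
    mape = sum(abs(e) / abs(t) for t, e in zip(y_true, errors)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2, "RMSLE": rmsle, "MAPE": mape}


# Hypothetical prices in euros, not taken from the laptop dataset.
y_true = [500.0, 1000.0, 1500.0, 2000.0]
y_pred = [600.0, 900.0, 1500.0, 2200.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```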


<h1 style="background-color: #f9f9f9; color: #333; padding: 10px; border-radius: 5px;">8- Save the Model</h1>

```python
save_model(best_model, 'my_saved_model')
```

```text
Transformation Pipeline and Model Successfully Saved

(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(include=['Inches', 'Ram', 'Weight',
                                              'ScreenW', 'ScreenH', 'CPU_freq',
                                              'PrimaryStorage',
                                              'SecondaryStorage'],
                                     transformer=SimpleImputer())),
                 ('categorical_imputer',
                  TransformerWrapper(include=['Company', 'Product', 'TypeName',
                                              'OS', 'Screen', 'Touchscreen',
                                              'IPSpanel', 'RetinaDispla...
                                                                     'CPU_company',
                                                                     'PrimaryStorageType',
                                                                     'SecondaryStorageType',
```
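The returned object is a scikit-learn `Pipeline` whose first steps are imputers, so the saved file carries the statistics learned from the training data (such as the values used to fill missing numeric entries) together with the fitted model; PyCaret's `load_model()` reads it back for later predictions. The stdlib sketch below shows why that matters, using a hypothetical two-step "pipeline" stored as a plain dict: a mean imputer and a one-parameter price model are learned together, serialized with `pickle`, and reloaded without refitting. It illustrates the idea only and does not reproduce PyCaret's file format.

```python
import pickle
import statistics


def fit_pipeline(xs, ys):
    """Learn the preprocessing statistic and the model parameter together.

    Step 1 (imputer): mean of the non-missing feature values, like SimpleImputer's default strategy.
    Step 2 (model): a single price-per-unit rate, a hypothetical stand-in for the real regressor.
    """
    impute_mean = statistics.fmean(x for x in xs if x is not None)
    filled = [impute_mean if x is None else x for x in xs]
    return {"impute_mean": impute_mean, "rate": sum(ys) / sum(filled)}


def predict(params, xs):
    """Apply the saved imputation first, then the model."""
    filled = [params["impute_mean"] if x is None else x for x in xs]
    return [params["rate"] * x for x in filled]


# Hypothetical RAM sizes (one missing) and prices.
params = fit_pipeline([8, 16, None, 8], [900.0, 1800.0, 1300.0, 950.0])
blob = pickle.dumps(params)    # stands in for writing the saved model file
restored = pickle.loads(blob)  # stands in for load_model()
print(predict(restored, [None, 8]) == predict(params, [None, 8]))  # True
```

Saving only the regressor would lose `impute_mean`, and new rows with missing values could no longer be transformed the same way they were during training.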
                            