## 1. Installing PyCaret

In [None]:
# Uncomment the next line to install PyCaret from within the notebook.
# %pip install pycaret

## 2. Import the necessary libraries

In [2]:
import pandas as pd
from pycaret.regression import *

## Example code: startup profit prediction with PyCaret

## 3. Importing the 50 Startups dataset

In [3]:
data = pd.read_csv('50_Startups.csv')
data

,R&D Spend,Administration,Marketing Spend,State,Profit
0,165349.2,136897.8,471784.1,New York,192261.83
1,162597.7,151377.59,443898.53,California,191792.06
2,153441.51,101145.55,407934.54,Florida,191050.39
3,144372.41,118671.85,383199.62,New York,182901.99
4,142107.34,91391.77,366168.42,Florida,166187.94
5,131876.9,99814.71,362861.36,New York,156991.12
6,134615.46,147198.87,127716.82,California,156122.51
7,130298.13,145530.06,323876.68,Florida,155752.6
8,120542.52,148718.95,311613.29,New York,152211.77
9,123334.88,108679.17,304981.62,California,149759.96


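## 4. Initialize the setup
* The `setup()` function initializes the experiment: it preprocesses the data (imputing missing values and one-hot encoding `State`) and splits it into training and test sets.
* Passing `session_id=123` fixes the random seed so that the run is reproducible.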
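
The setup summary below reports 35 training rows and 15 test rows, which is consistent with a 70/30 shuffle split of the 50 rows. The following is a minimal sketch of that idea in plain Python. It is not PyCaret's internal code: the helper name `shuffle_split` and the 0.7 training fraction are assumptions made for illustration.

```python
import random


def shuffle_split(n_rows, train_fraction=0.7, seed=123):
    """Return (train_indices, test_indices) for a seeded shuffle split."""
    indices = list(range(n_rows))
    # A fixed seed gives the same shuffle, and so the same split, on every run.
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = shuffle_split(50)
print(len(train_idx), len(test_idx))  # 35 15
```

Fixing the seed plays the same role here as `session_id` does in `setup()`: rerunning the cell yields the same rows in each set.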

In [4]:

regression = setup(data, target='Profit', session_id=123)
regression

,Description,Value
0,Session id,123
1,Target,Profit
2,Target type,Regression
3,Original data shape,"(50, 5)"
4,Transformed data shape,"(50, 7)"
5,Transformed train set shape,"(35, 7)"
6,Transformed test set shape,"(15, 7)"
7,Numeric features,3
8,Categorical features,1
9,Preprocess,True


<pycaret.regression.oop.RegressionExperiment at 0x1dd3f03dc90>
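
The summary above shows the column count growing from 5 to 7. One plausible reading, assuming the dataset contains only the three states visible in the preview, is that the single `State` column is replaced by three indicator columns. The sketch below illustrates one-hot encoding in plain Python; the `one_hot` helper and the `State_<name>` column naming are assumptions for illustration, not PyCaret's implementation.

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# First three rows of the dataset preview shown earlier.
rows = [
    {"R&D Spend": 165349.2, "Administration": 136897.8,
     "Marketing Spend": 471784.1, "State": "New York", "Profit": 192261.83},
    {"R&D Spend": 162597.7, "Administration": 151377.59,
     "Marketing Spend": 443898.53, "State": "California", "Profit": 191792.06},
    {"R&D Spend": 153441.51, "Administration": 101145.55,
     "Marketing Spend": 407934.54, "State": "Florida", "Profit": 191050.39},
]

encoded = one_hot(rows, "State")
print(len(rows[0]), "->", len(encoded[0]))  # 5 -> 7
```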

## 5. Compare and evaluate the different models
* Next, we call `compare_models()` to train and compare the available regression models with their default settings.
* PyCaret scores each model on several metrics, including R-squared (R2), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), and returns the top-ranked model, here the Extra Trees Regressor.
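
To make the metric columns concrete, here is a small sketch that computes MAE, MSE, RMSE, R2, and MAPE from their standard definitions. The four target values and predictions are made-up toy numbers, not rows from the dataset, and `regression_metrics` is a hypothetical helper rather than a PyCaret function.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from their textbook definitions."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Toy values for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are in the units of the target (profit), while R2 and MAPE are unitless, which makes them easier to compare across datasets.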

In [5]:
best_model = compare_models()
best_model

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
et,Extra Trees Regressor,8439.491,151995494.0469,10096.8588,0.8889,0.1706,0.1561,0.045
xgboost,Extreme Gradient Boosting,7917.4076,118977639.2,10203.3711,0.858,0.1829,0.1224,0.099
br,Bayesian Ridge,7759.7853,117731002.5676,9473.9148,0.8562,0.1525,0.1384,0.02
en,Elastic Net,7873.8324,118420853.3844,9537.6059,0.8465,0.1529,0.1389,0.017
ridge,Ridge Regression,8100.0242,122473530.3254,9725.4221,0.8288,0.1535,0.1406,0.018
lr,Linear Regression,8152.8553,123518717.5635,9771.9435,0.8245,0.1537,0.141,0.723
lasso,Lasso Regression,8151.8396,123507256.6629,9771.4025,0.8245,0.1537,0.141,0.017
lar,Least Angle Regression,8152.9283,123519790.9567,9771.9813,0.8245,0.1537,0.141,0.018
llar,Lasso Least Angle Regression,8151.8396,123507256.6023,9771.4025,0.8245,0.1537,0.141,0.017
rf,Random Forest Regressor,9202.3593,163149994.941,10881.6519,0.8107,0.1824,0.1712,0.056



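## 6. Tune the hyperparameters of the best model
* After identifying the best model, we fine-tune its hyperparameters using the `tune_model()` function.
* PyCaret runs an automatic hyperparameter search, here 10 candidate configurations evaluated with 10-fold cross-validation, and returns the original model if none of the candidates improves on it.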
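
The log below reports 10 candidates, which is what a random search with a fixed budget looks like: draw a handful of configurations from a grid, score each one, and keep the best. The sketch below shows that loop in plain Python. The grid, the `toy_score` function, and the `random_search` helper are all hypothetical stand-ins; in a real run the score would be a cross-validated metric.

```python
import random


def random_search(param_grid, score_fn, n_iter=10, seed=123):
    """Sample n_iter configurations at random and return the best one found."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, for illustration only.
grid = {"n_estimators": [50, 100, 200, 300], "max_depth": [2, 4, 6, 8, 10]}


def toy_score(params):
    # Stand-in for a cross-validated R2; peaks at n_estimators=200, max_depth=6.
    return (1.0
            - abs(params["n_estimators"] - 200) / 1000
            - abs(params["max_depth"] - 6) / 100)


best_params, best_score = random_search(grid, toy_score)
print(best_params, round(best_score, 3))
```

Because only a sample of the grid is tried, the result is the best of the candidates drawn, not necessarily the best configuration overall.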

In [6]:
tuned_model = tune_model(best_model)
tuned_model

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,41067.0075,2028720304.1812,45041.3177,0.5425,0.8726,1.2692
1,3015.6459,16816551.6504,4100.7989,0.9071,0.032,0.0257
2,14260.7373,481324328.6195,21939.105,0.7453,0.3959,0.3281
3,23583.0006,785960623.7273,28034.9893,0.641,0.1992,0.1737
4,6518.6413,63461869.6309,7966.2959,0.9057,0.0704,0.0592
5,5083.55,32385171.6163,5690.7971,0.7305,0.0626,0.056
6,5795.0448,45830140.0508,6769.7962,0.9256,0.0594,0.0508
7,13838.1259,212391019.3541,14573.6413,0.5548,0.1381,0.1341
8,702.1902,1231349.8169,1109.662,0.9963,0.0135,0.0084
9,19946.0405,841606501.3523,29010.455,0.698,0.19,0.1384



Fitting 10 folds for each of 10 candidates, totalling 100 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).
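
The message above can be checked against the fold scores. Assuming that R2 is the metric being optimized and that models are compared on their mean cross-validated score, averaging the ten fold values from the table gives a figure below the 0.8889 reported for the untuned Extra Trees Regressor in section 5. The check below uses only numbers copied from the two tables.

```python
from statistics import mean, stdev

# R2 per fold for the tuned candidate, copied from the table above.
fold_r2 = [0.5425, 0.9071, 0.7453, 0.641, 0.9057,
           0.7305, 0.9256, 0.5548, 0.9963, 0.698]

# R2 of the untuned Extra Trees Regressor from the compare_models() table.
original_r2 = 0.8889

tuned_mean_r2 = mean(fold_r2)
print(f"tuned mean R2: {tuned_mean_r2:.4f}")  # 0.7647
print(f"fold std dev:  {stdev(fold_r2):.4f}")
print("original better:", original_r2 > tuned_mean_r2)
```

The wide spread between folds is expected on data this small: if the 10 folds are drawn from the 35 training rows, each validation fold holds only 3 or 4 rows, so a single hard-to-predict startup can dominate a fold's score.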


## 7. Evaluate the model performance
* We then inspect the model with the `evaluate_model()` function. Because tuning did not improve on it, `tuned_model` here is the original Extra Trees Regressor.
* In a notebook, PyCaret displays an interactive widget for switching between diagnostic plots, starting with the pipeline plot.

In [7]:
evaluate_model(tuned_model)


## 8. Make predictions on new data
* To make predictions on new data, we load the new dataset, `1000_Companies.csv`, into `new_data` and pass it to the `predict_model()` function together with the tuned model.

In [8]:
new_data = pd.read_csv('1000_Companies.csv')
predictions = predict_model(tuned_model, data=new_data)

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extra Trees Regressor,5305.4327,243313798.0882,15598.5191,0.8676,0.0839,0.0452


## 9. Save the model for future use
* Finally, we save the tuned model using the `save_model()` function. The preprocessing pipeline is saved together with the model, so new data can later be passed in raw form.

In [9]:
save_model(tuned_model, 'Profit_prediction_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(include=['R&D Spend', 'Administration',
                                              'Marketing Spend'],
                                     transformer=SimpleImputer())),
                 ('categorical_imputer',
                  TransformerWrapper(include=['State'],
                                     transformer=SimpleImputer(strategy='most_frequent'))),
                 ('onehot_encoding',
                  TransformerWrapper(include=['State'],
                                     transformer=OneHotEncoder(cols=['State'],
                                                               handle_missing='return_nan',
                                                               use_cat_names=True))),
                 ('clean_column_names',
                  TransformerWrapper(transformer=CleanColumnNames())),
                 ('trained_model',
                  ExtraTr
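
To reuse the saved model in a later session, it can be loaded back with PyCaret's `load_model()` and passed to `predict_model()`. The sketch below wraps those two calls. The wrapper `predict_with_saved_model` is a hypothetical helper, and the sketch assumes PyCaret 3.x, where `save_model()` writes a `.pkl` file and predictions are returned in a `prediction_label` column.

```python
from pathlib import Path


def predict_with_saved_model(new_data, model_name="Profit_prediction_model"):
    """Load a pipeline saved by save_model() and score a DataFrame with it."""
    # save_model() appends the .pkl extension to the name it is given.
    model_file = Path(f"{model_name}.pkl")
    if not model_file.exists():
        raise FileNotFoundError(f"No saved model found at {model_file}")

    # Imported here so the helper can be defined without PyCaret installed.
    from pycaret.regression import load_model, predict_model

    pipeline = load_model(model_name)  # name is passed without the extension
    return predict_model(pipeline, data=new_data)
```

A call such as `predict_with_saved_model(pd.read_csv('1000_Companies.csv'))` would then return the input rows with the predicted profit appended, provided the file has the same feature columns as the training data.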

## 10. Conclusion

PyCaret offers an accessible and efficient approach to utilizing machine learning for a wide range of tasks. With its extensive feature set and streamlined workflow, PyCaret enables users to quickly prototype and deploy machine learning models. Whether you are new to the field or an experienced data scientist, PyCaret can greatly simplify the machine learning process, allowing you to concentrate on extracting valuable insights from your data.