# **PyCaret**
Inspired by [Greg Hogg](https://www.youtube.com/watch?v=NbBoZQZ3bxo).

[PyCaret](https://pycaret.gitbook.io/docs/get-started/tutorials) promises **low-code** machine learning: a handful of function calls cover setup, model comparison, evaluation, and prediction.

In [1]:
# Uncomment the next line to install the PyCaret pre-release, then restart the runtime.
# !pip install --pre pycaret

## Regression
Regression predicts a quantitative target (e.g. housing prices, insurance premiums) from a given set of features.
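To make the idea concrete, here is a minimal sketch of a one-feature regression fitted by ordinary least squares in plain Python. The numbers are made up for illustration and `fit_line` is a hypothetical helper; the models PyCaret trains later in this notebook are far richer.

```python
# Toy data (made up for illustration): one feature and a quantitative target.
incomes = [2.0, 3.0, 4.0, 5.0, 6.0]            # feature, arbitrary units
prices = [110.0, 150.0, 205.0, 240.0, 295.0]   # target, arbitrary units


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(incomes, prices)
# Predict the target for a feature value that is not in the toy data.
predicted = slope * 4.5 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at 4.5: {predicted:.2f}")
```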

In [2]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/timothypesi/Data-Sets-For-Machine-Learning-/main/california_housing_train.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17000 entries, 0 to 16999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           17000 non-null  float64
 1   latitude            17000 non-null  float64
 2   housing_median_age  17000 non-null  float64
 3   total_rooms         17000 non-null  float64
 4   total_bedrooms      17000 non-null  float64
 5   population          17000 non-null  float64
 6   households          17000 non-null  float64
 7   median_income       17000 non-null  float64
 8   median_house_value  17000 non-null  float64
dtypes: float64(9)
memory usage: 1.2 MB


In [3]:
from pycaret.regression import *

s = setup(df, target='median_house_value')

,Description,Value
0,Session id,8365
1,Target,median_house_value
2,Target type,Regression
3,Original data shape,"(17000, 9)"
4,Transformed data shape,"(17000, 9)"
5,Transformed train set shape,"(11900, 9)"
6,Transformed test set shape,"(5100, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple
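
The 11,900 / 5,100 row counts in the summary are a 70/30 train/hold-out split of the 17,000 rows, matching PyCaret's default `train_size` of 0.7. The sketch below only illustrates that arithmetic and the role of a fixed seed, using the standard library. It is not PyCaret's own splitting code, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_rows, train_fraction=0.7, seed=8365):
    """Shuffle row indices with a fixed seed and cut them into train / hold-out parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle -> same split
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(17000)
print(len(train_idx), len(test_idx))  # → 11900 5100
```

Passing `session_id=...` to `setup` pins the seed in the same spirit, so reruns produce the same split instead of a new random session id each time.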


In [4]:
best = compare_models()

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
lightgbm,Light Gradient Boosting Machine,32604.5785,2337020176.4911,48318.7329,0.8272,0.2376,0.1814,0.207
rf,Random Forest Regressor,33008.9457,2535422814.385,50328.261,0.8126,0.2381,0.1813,4.601
et,Extra Trees Regressor,35709.3854,2833030403.4134,53200.1279,0.7905,0.2497,0.1963,1.918
gbr,Gradient Boosting Regressor,38027.854,3007075468.0443,54805.5873,0.7777,0.268,0.2143,1.422
lasso,Lasso Regression,50682.5887,4741509043.2,68795.9672,0.6497,0.4224,0.303,0.059
ridge,Ridge Regression,50682.3668,4741508249.6,68795.9621,0.6497,0.4238,0.303,0.031
llar,Lasso Least Angle Regression,50682.8102,4741499161.6,68795.8984,0.6497,0.4225,0.303,0.033
br,Bayesian Ridge,50680.7625,4741506432.0,68795.9578,0.6497,0.4189,0.303,0.03
lr,Linear Regression,50682.9141,4741538611.2,68796.175,0.6496,0.4232,0.303,0.257
dt,Decision Tree Regressor,45076.9955,5024584218.8694,70795.7906,0.629,0.3263,0.2411,0.108
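
The table is sorted by R2, the default ordering for regression in `compare_models`, and the ranking can change under another metric. A small sketch in plain Python, using the MAE and R2 values copied from the table above, shows the Decision Tree moving from last place by R2 to fifth by MAE. The variable names are illustrative.

```python
# (MAE, R2) per model ID, copied from the comparison table.
scores = {
    "lightgbm": (32604.5785, 0.8272),
    "rf": (33008.9457, 0.8126),
    "et": (35709.3854, 0.7905),
    "gbr": (38027.8540, 0.7777),
    "lasso": (50682.5887, 0.6497),
    "ridge": (50682.3668, 0.6497),
    "llar": (50682.8102, 0.6497),
    "br": (50680.7625, 0.6497),
    "lr": (50682.9141, 0.6496),
    "dt": (45076.9955, 0.6290),
}

# Higher R2 is better; lower MAE is better.
by_r2 = sorted(scores, key=lambda m: scores[m][1], reverse=True)
by_mae = sorted(scores, key=lambda m: scores[m][0])

print("dt rank by R2: ", by_r2.index("dt") + 1)   # → 10
print("dt rank by MAE:", by_mae.index("dt") + 1)  # → 5
```

In PyCaret itself, `compare_models(sort='MAE')` requests the MAE ordering directly.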


In [5]:
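# finalize_model refits the pipeline on the full dataset (train + hold-out) and returns it.
# The return value is not captured here, so the cells below keep using `best`.
finalize_model(best)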

In [6]:
# Opens an interactive widget for browsing diagnostic plots (renders only in a notebook).
evaluate_model(best)


In [7]:
# With no data argument, predict_model scores the hold-out set created by setup().
predict_model(best)

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Light Gradient Boosting Machine,32528.4274,2340428482.2405,48377.9752,0.8235,0.2416,0.1853


,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,prediction_label
11900,-117.110001,32.910000,15.0,1840.0,235.0,855.0,241.0,7.5992,310600.0,327266.251812
11901,-118.150002,33.849998,30.0,4071.0,1067.0,2144.0,970.0,2.7268,218100.0,195707.940162
11902,-121.949997,37.810001,5.0,7178.0,898.0,2823.0,907.0,9.0776,450400.0,419818.755685
11903,-120.959999,41.119999,29.0,779.0,136.0,364.0,123.0,2.5000,59200.0,66263.394737
11904,-122.080002,37.389999,46.0,1115.0,248.0,543.0,248.0,3.2083,334300.0,254538.081485
...,...,...,...,...,...,...,...,...,...,...
16995,-120.459999,34.709999,17.0,2830.0,430.0,1035.0,416.0,4.9292,207200.0,222598.550969
16996,-120.250000,38.040001,22.0,4173.0,763.0,1086.0,444.0,2.5562,136200.0,116049.435497
16997,-117.239998,32.810001,33.0,1588.0,289.0,683.0,301.0,5.4103,332400.0,295825.365610
16998,-117.709999,34.080002,29.0,1276.0,283.0,1216.0,316.0,2.5972,134300.0,116210.170847
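
The metric columns follow standard definitions, which can be checked by hand on the nine hold-out rows displayed above. The sketch below uses only the standard library and assumes the usual formulas (with RMSLE computed via `log1p`). Its results describe these nine rows only, so they will not match the full hold-out scores, and `regression_metrics` is a hypothetical helper rather than a PyCaret function.

```python
import math

# (median_house_value, prediction_label) for the nine rows displayed above.
rows = [
    (310600.0, 327266.251812),
    (218100.0, 195707.940162),
    (450400.0, 419818.755685),
    (59200.0, 66263.394737),
    (334300.0, 254538.081485),
    (207200.0, 222598.550969),
    (136200.0, 116049.435497),
    (332400.0, 295825.365610),
    (134300.0, 116210.170847),
]


def regression_metrics(pairs):
    """Compute MAE, MSE, RMSE, R2, RMSLE and MAPE for (actual, predicted) pairs."""
    n = len(pairs)
    actuals = [a for a, _ in pairs]
    mean_actual = sum(actuals) / n
    mae = sum(abs(a - p) for a, p in pairs) / n
    mse = sum((a - p) ** 2 for a, p in pairs) / n
    # R2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_tot = sum((a - mean_actual) ** 2 for a in actuals)
    r2 = 1.0 - (mse * n) / ss_tot
    rmsle = math.sqrt(sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in pairs) / n)
    mape = sum(abs(a - p) / abs(a) for a, p in pairs) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse),
            "R2": r2, "RMSLE": rmsle, "MAPE": mape}


for name, value in regression_metrics(rows).items():
    print(f"{name}: {value:.4f}")
```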


In [8]:
test_df = pd.read_csv('https://raw.githubusercontent.com/timothypesi/Data-Sets-For-Machine-Learning-/main/california_housing_test.csv')

# Score unseen data; test_df still contains median_house_value, so metrics are reported too.
predict_model(best, data=test_df)

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Light Gradient Boosting Machine,32670.4902,2395166783.0923,48940.4412,0.8128,0.2452,0.1849


,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,prediction_label
0,-122.050003,37.369999,27.0,3885.0,661.0,1537.0,606.0,6.6085,344700.0,395354.292351
1,-118.300003,34.259998,43.0,1510.0,310.0,809.0,277.0,3.5990,176500.0,195596.582312
2,-117.809998,33.779999,27.0,3589.0,507.0,1484.0,495.0,5.7934,270500.0,256908.972669
3,-118.360001,33.820000,28.0,67.0,15.0,49.0,11.0,6.1359,330000.0,303431.071641
4,-119.669998,36.330002,19.0,1241.0,244.0,850.0,237.0,2.9375,81700.0,72733.402552
...,...,...,...,...,...,...,...,...,...,...
2995,-119.860001,34.419998,23.0,1450.0,642.0,1258.0,607.0,1.1790,225000.0,238061.496340
2996,-118.139999,34.060001,27.0,5257.0,1082.0,3496.0,1036.0,3.3906,237200.0,215523.976691
2997,-119.699997,36.299999,10.0,956.0,201.0,693.0,220.0,2.2895,62000.0,66963.331178
2998,-117.120003,34.099998,40.0,96.0,14.0,46.0,14.0,3.2708,162500.0,87147.046880
