### **lazypredict: An Automated Library for Regression and Classification**


In this notebook, I show how to use lazypredict, an automated machine learning library, to build a classification model and a regression model. The classification example uses the bank-full dataset. I previously worked through the same problem manually, with hyperparameter tuning, ensemble techniques, and a DNN: https://github.com/KrishArul26/bank-full_data.csv-Ensemble-Techniques-and-DNN-/blob/main/bank_full_Ensample_Analysis.ipynb

DATASET: https://raw.githubusercontent.com/KrishArul26/Data_set/main/bank-full.csv

For more information: https://lazypredict.readthedocs.io/en/latest/

Automated Machine Learning, often abbreviated as AutoML, is an emerging field in which the process of building machine learning models to model data is automated. AutoML has the capability to make modelling easier and more accessible for everyone.

### **lazypredict**

Lazy Predict builds many basic models with very little code and helps identify which models work better without any parameter tuning.

**It doesn't take care of NA values. So, we have to deal with NA values manually.**


In [None]:
pip install lazypredict

In [42]:
import lazypredict

### **Loading The Dataset**

In [31]:
import pandas as pd
import numpy as np

url="https://raw.githubusercontent.com/KrishArul26/Data_set/main/bank-full.csv"
bank = pd.read_csv(url,sep=";")

In [15]:
bank = bank.iloc[:10000, :]  # The dataset has more than 40,000 observations; keep only the first 10,000

In [16]:
bank.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   age        10000 non-null  int64 
 1   job        10000 non-null  object
 2   marital    10000 non-null  object
 3   education  10000 non-null  object
 4   default    10000 non-null  object
 5   balance    10000 non-null  int64 
 6   housing    10000 non-null  object
 7   loan       10000 non-null  object
 8   contact    10000 non-null  object
 9   day        10000 non-null  int64 
 10  month      10000 non-null  object
 11  duration   10000 non-null  int64 
 12  campaign   10000 non-null  int64 
 13  pdays      10000 non-null  int64 
 14  previous   10000 non-null  int64 
 15  poutcome   10000 non-null  object
 16  y          10000 non-null  object
dtypes: int64(7), object(10)
memory usage: 1.3+ MB


### **Train-Test Split**

In [17]:
import lazypredict

from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import train_test_split


X=bank.iloc[:,:-1]
y=bank.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=.20,random_state =100)



### **Classification Problems**

In [18]:
clf = LazyClassifier(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = clf.fit(X_train, X_test, y_train, y_test)

print(models)


100%|██████████| 29/29 [00:14<00:00,  1.94it/s]

                               Accuracy  ...  Time Taken
Model                                    ...            
NearestCentroid                    0.93  ...        0.11
GaussianNB                         0.95  ...        0.08
LinearDiscriminantAnalysis         0.96  ...        0.20
XGBClassifier                      0.97  ...        0.92
DecisionTreeClassifier             0.95  ...        0.10
LabelSpreading                     0.96  ...        4.16
LabelPropagation                   0.96  ...        2.87
LGBMClassifier                     0.97  ...        0.25
BaggingClassifier                  0.97  ...        0.24
ExtraTreeClassifier                0.95  ...        0.08
CalibratedClassifierCV             0.97  ...        1.31
LogisticRegression                 0.97  ...        0.23
AdaBoostClassifier                 0.96  ...        0.74
RandomForestClassifier             0.97  ...        0.69
KNeighborsClassifier               0.97  ...        0.38
LinearSVC                      




LazyClassifier fitted 26 algorithms to classify the dependent variable, as listed in `models`. Next, we select the algorithms whose accuracy is above 95%.

In [29]:
#### Which models have accuracy above 95%

models[models['Accuracy']>0.95]

Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
LinearDiscriminantAnalysis,0.96,0.77,,0.96,0.2
XGBClassifier,0.97,0.68,,0.97,0.92
LabelSpreading,0.96,0.65,,0.96,4.16
LabelPropagation,0.96,0.65,,0.96,2.87
LGBMClassifier,0.97,0.65,,0.97,0.25
BaggingClassifier,0.97,0.64,,0.97,0.24
CalibratedClassifierCV,0.97,0.63,,0.96,1.31
LogisticRegression,0.97,0.63,,0.96,0.23
AdaBoostClassifier,0.96,0.62,,0.96,0.74
RandomForestClassifier,0.97,0.62,,0.97,0.69
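
The table shows accuracy near 0.97 alongside balanced accuracy between 0.62 and 0.77. A gap like this usually appears when one class dominates the target. Balanced accuracy is the mean of per-class recall, so it drops whenever the rare class is mostly missed. The sketch below illustrates this with made-up labels (95 "no", 5 "yes"). These are not the bank data, and the helper functions are hypothetical stand-ins written for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Made-up, imbalanced labels: 95 "no" and 5 "yes".
y_true = ["no"] * 95 + ["yes"] * 5
# A model that finds only one of the five "yes" cases.
y_pred = ["no"] * 95 + ["yes"] + ["no"] * 4

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy          = {acc:.2f}")  # 0.96
print(f"balanced accuracy = {bal:.2f}")  # 0.60
```

For a target like this, ranking the candidates by balanced accuracy or F1 score is likely to be more informative than ranking by accuracy alone.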


#### Finally, we can pick the first five models with good precision and recall (or F1 score), tune their hyperparameters manually, and finalize our ML model.
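
As a rough illustration of the manual tuning step, here is a minimal sketch using scikit-learn's `GridSearchCV` on `LogisticRegression`, one of the shortlisted models. The data are synthetic (generated with `make_classification`), and the parameter grid and the choice of F1 as the scoring metric are assumptions made for the example, not settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the bank data (assumption: not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=100
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=100, stratify=y_demo
)

# Hypothetical search grid over the regularisation strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # F1 instead of accuracy, because the classes are imbalanced
    cv=5,
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
print("F1 on the held-out split:", round(search.score(X_te, y_te), 3))
```

The same pattern applies to any other shortlisted model: swap in the estimator and a grid that matches its parameters.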

### **Regression Problems**

In [43]:
import numpy as np

url="https://raw.githubusercontent.com/KrishArul26/Data_set/main/Airquality_index.csv"
air_quality= pd.read_csv(url,sep=",")

In [44]:
air_quality.head()

,T,TM,Tm,SLP,H,VV,V,VM,PM 2.5
0,7.4,9.8,4.8,1017.6,93.0,0.5,4.3,9.4,219.72
1,7.8,12.7,4.4,1018.5,87.0,0.6,4.4,11.1,182.19
2,6.7,13.4,2.4,1019.4,82.0,0.6,4.8,11.1,154.04
3,8.6,15.5,3.3,1018.7,72.0,0.8,8.1,20.6,223.21
4,12.4,20.9,4.4,1017.3,61.0,1.3,8.7,22.2,200.65


**Dealing with NA values in the dataset**

In [48]:
air_quality.isna().sum()

T         0
TM        0
Tm        0
SLP       0
H         0
VV        0
V         0
VM        0
PM 2.5    1
dtype: int64

In [49]:
air_quality=air_quality.dropna()

In [50]:
air_quality.isna().sum()

T         0
TM        0
Tm        0
SLP       0
H         0
VV        0
V         0
VM        0
PM 2.5    0
dtype: int64
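
Dropping rows works well here because only one value is missing, and it is in the target column. If gaps also appeared in the feature columns, one alternative would be to drop only the rows with a missing target and fill the feature gaps. The sketch below does this with the column median on a small made-up frame. The values and the `demo` and `target` names are hypothetical, and the median is only one possible fill strategy.

```python
import numpy as np
import pandas as pd

# Made-up frame with gaps in two features and in the target (not the real data).
demo = pd.DataFrame(
    {
        "T": [7.4, np.nan, 6.7, 8.6],
        "H": [93.0, 87.0, np.nan, 72.0],
        "PM 2.5": [219.72, 182.19, 154.04, np.nan],
    }
)
target = "PM 2.5"

# Rows without a target value cannot be used for supervised training, so drop them.
demo = demo.dropna(subset=[target])

# Fill the remaining gaps in the feature columns with each column's median.
feature_cols = demo.columns.drop(target)
demo = demo.fillna(demo[feature_cols].median())

print(demo.isna().sum())
```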

In [53]:
air_quality.head()

,T,TM,Tm,SLP,H,VV,V,VM,PM 2.5
0,7.4,9.8,4.8,1017.6,93.0,0.5,4.3,9.4,219.72
1,7.8,12.7,4.4,1018.5,87.0,0.6,4.4,11.1,182.19
2,6.7,13.4,2.4,1019.4,82.0,0.6,4.8,11.1,154.04
3,8.6,15.5,3.3,1018.7,72.0,0.8,8.1,20.6,223.21
4,12.4,20.9,4.4,1017.3,61.0,1.3,8.7,22.2,200.65


In [54]:
from lazypredict.Supervised import LazyRegressor
from sklearn.model_selection import train_test_split


X=air_quality.iloc[:,:-1]
y=air_quality.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=.20,random_state =100)

In [55]:
regressor= LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
models, predictions = regressor.fit(X_train, X_test, y_train, y_test)

 29%|██▊       | 12/42 [00:00<00:01, 17.04it/s]

GammaRegressor model failed to execute
Some value(s) of y are out of the valid range for family GammaDistribution


100%|██████████| 42/42 [00:03<00:00, 10.78it/s]
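
The `GammaRegressor` message above points to target values outside the range the Gamma family accepts. In scikit-learn, Gamma regression requires strictly positive targets, so zeros or negative values in `y` would trigger it. A quick way to check is to count such values. The sketch below uses a made-up series in place of the real `PM 2.5` column, so whether zeros actually occur in this dataset is an assumption.

```python
import pandas as pd

# Hypothetical target values standing in for the PM 2.5 column (not the real data).
y_demo = pd.Series([219.72, 182.19, 0.0, 154.04, 0.0], name="PM 2.5")

# Gamma regression is only defined for strictly positive targets.
non_positive = (y_demo <= 0).sum()
print(f"{non_positive} of {len(y_demo)} target values are <= 0")
```

The failure affects only that one model; the remaining regressors still run, as the completed progress bar shows.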


In [56]:
models

Model,Adjusted R-Squared,R-Squared,RMSE,Time Taken
ExtraTreesRegressor,0.86,0.86,32.67,0.23
XGBRegressor,0.83,0.83,35.84,0.08
HistGradientBoostingRegressor,0.81,0.82,37.64,0.35
LGBMRegressor,0.81,0.81,38.2,0.09
RandomForestRegressor,0.8,0.81,38.43,0.4
BaggingRegressor,0.79,0.8,39.83,0.05
GradientBoostingRegressor,0.72,0.73,45.95,0.19
KNeighborsRegressor,0.65,0.66,51.4,0.01
GaussianProcessRegressor,0.62,0.64,53.04,0.1
DecisionTreeRegressor,0.61,0.62,54.33,0.01
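
The `Adjusted R-Squared` column is slightly lower than `R-Squared` because it penalises the score for the number of predictors. The standard formula is `1 - (1 - R²)(n - 1)/(n - p - 1)`, where `n` is the number of samples scored and `p` is the number of features. The sketch below assumes lazypredict follows this standard definition. The sample count of 200 is hypothetical, and `p = 8` matches the eight predictor columns (`T` through `VM`).

```python
def adjusted_r_squared(r2: float, n_samples: int, n_features: int) -> float:
    """Standard adjusted R-squared: penalises R-squared for the number of features."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("n_samples must exceed n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical example: R-squared of 0.86 scored on 200 rows with 8 features.
adj = adjusted_r_squared(0.86, n_samples=200, n_features=8)
print(f"adjusted R-squared = {adj:.3f}")  # 0.854
```

With few features relative to the number of rows, the two columns stay close, which is consistent with the table above.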


**Conclusion**

#### In summary, this automated library is very helpful for quickly deciding which algorithms are best suited to a classification or regression problem. From there, we can tune hyperparameters manually to find the most suitable parameters for each problem.