<h1 style="text-align: center;">Ensemble Learning: Boosting</h1>

#### Sources:
- https://www.analyticsvidhya.com/blog/2021/04/best-boosting-algorithm-in-machine-learning-in-2021/
- https://medium.com/analytics-vidhya/catboost-101-fb2fdc3398f3
#### Dataset:
- https://www.kaggle.com/datasets/uciml/mushroom-classification?select=mushrooms.csv

#### Boosting is a method used in machine learning to improve the accuracy
#### of a model by combining the predictions of multiple weaker models.
#### The idea is to train a sequence of models in a stage-wise manner,
#### where each model tries to correct the mistakes of the previous model.
#### The final prediction is made by combining the predictions of all
#### the models in the sequence, typically through a weighted majority vote.
#### One of the most popular boosting algorithms is called AdaBoost.
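
#### The weighted majority vote mentioned above can be sketched in a few lines.
#### The predictions and model weights below are made-up illustration values,
#### not results from this notebook.

```python
import numpy as np

# Predictions of three hypothetical weak models for four samples, encoded as -1 / +1.
predictions = np.array([
    [+1, -1, +1, +1],   # model 1
    [+1, +1, -1, +1],   # model 2
    [-1, -1, -1, +1],   # model 3
])

# Hypothetical model weights: models assumed to be more accurate get a larger say.
weights = np.array([0.5, 0.3, 0.9])

# Weighted vote: sum the weighted predictions per sample and take the sign.
scores = weights @ predictions
final_prediction = np.sign(scores).astype(int)

print("Weighted scores: ", scores)
print("Final prediction:", final_prediction)
```

#### Model 3 carries the largest weight, so it overrules the other two models
#### on the first sample even though they agree with each other.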

### Importing libraries

In [7]:
from catboost import CatBoostClassifier
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier 
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
import warnings
import xgboost as xgb

### Settings

In [8]:
warnings.filterwarnings("ignore")

### Getting Data

In [3]:
df = pd.read_csv('./datasets/mushrooms.csv')
df = df.sample(frac = 1)

print("Dataframe:")
print("--------")
print(df)
print("-------------------------------------------------------------------")
print("Columns:")
print("--------")
print(df.columns)
print("-------------------------------------------------------------------")

# Encode every column (features and target) as integer codes
for label in df.columns:
    df[label] = LabelEncoder().fit_transform(df[label])

print("Info:")
print("-----")
print(df.info())

# Separate the features from the target column
X = df.drop(['class'], axis=1)
y = df['class']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Dataframe:
--------
     class cap-shape cap-surface cap-color bruises odor gill-attachment  \
2600     e         f           f         n       t    n               f   
1309     e         f           s         g       f    n               f   
6988     p         f           s         e       f    s               f   
4826     p         b           f         y       f    n               f   
3438     e         f           f         e       t    n               f   
...    ...       ...         ...       ...     ...  ...             ...   
2592     e         f           f         n       t    n               f   
1630     p         f           s         w       t    p               f   
959      e         x           y         y       t    l               f   
1617     e         f           f         g       f    n               f   
2606     e         x           f         e       t    n               f   

     gill-spacing gill-size gill-color  ... stalk-surface-below-ring  \
2600   

## Gradient Boosting

#### Gradient boosting improves the accuracy of a model by training an
#### ensemble of decision trees one after another. It performs gradient descent
#### on the loss function: at each iteration a new tree is fitted to the
#### negative gradient of the loss (the residual errors of the current ensemble)
#### and added to the ensemble, scaled by the learning rate.
#### The final prediction is the sum of the predictions of all the
#### decision trees in the ensemble.
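
#### The idea of fitting a new tree at each iteration to the errors of the
#### current ensemble can be sketched from scratch. This is a simplified
#### illustration, not scikit-learn's implementation: it assumes a regression
#### task with squared error (where the negative gradient is just the residual)
#### and uses synthetic data; `X_demo`, `y_demo` and the settings are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_trees = 50

# Start from a constant model: the mean minimizes the squared error.
prediction = np.full_like(y_demo, y_demo.mean())
trees = []
mse_history = [np.mean((y_demo - prediction) ** 2)]

for _ in range(n_trees):
    # For squared error, the negative gradient is simply the residual.
    residuals = y_demo - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X_demo, residuals)
    # Add the new tree to the ensemble, scaled by the learning rate.
    prediction += learning_rate * tree.predict(X_demo)
    trees.append(tree)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print("Training MSE before boosting:", mse_history[0])
print("Training MSE after boosting: ", mse_history[-1])
```

#### For classification, as in the cell below, the same loop is run on a
#### different loss (log loss), so the residuals are replaced by its gradient.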

### Implementation

In [4]:
cl = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
cl.fit(X_train, y_train)

y_pred = cl.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy of Gradient Boosting: ", accuracy)

Accuracy of Gradient Boosting:  1.0


## AdaBoost

#### AdaBoost, short for Adaptive Boosting, combines the predictions of multiple
#### weak models, typically shallow decision trees. It fits the models in
#### sequence on reweighted training data: after each round, the weights of
#### misclassified samples are increased so that the next model focuses on them.
#### The final prediction is a weighted majority vote, in which models with a
#### lower weighted error get a larger say.
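
#### A minimal sketch of how a sequence of models can correct earlier mistakes
#### through sample reweighting and a weighted vote. It follows the classic
#### discrete AdaBoost update with decision stumps on synthetic data; it is an
#### illustration under these assumptions, not what `AdaBoostClassifier` runs
#### internally, and all names in it are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (illustration only), relabelled from {0, 1} to {-1, +1}.
X_demo, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y_demo = 2 * y01 - 1

n_rounds = 20
sample_weights = np.full(len(y_demo), 1 / len(y_demo))
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a decision stump on the currently weighted training data.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_demo, y_demo, sample_weight=sample_weights)
    pred = stump.predict(X_demo)

    # Weighted error of this stump, clipped to avoid division by zero.
    err = np.clip(sample_weights[pred != y_demo].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weight of misclassified samples, decrease the others.
    sample_weights *= np.exp(-alpha * y_demo * pred)
    sample_weights /= sample_weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the stump votes.
scores = sum(a * s.predict(X_demo) for a, s in zip(alphas, stumps))
ensemble_pred = np.sign(scores)

first_stump_acc = np.mean(stumps[0].predict(X_demo) == y_demo)
ensemble_acc = np.mean(ensemble_pred == y_demo)
print("Training accuracy of first stump:", first_stump_acc)
print("Training accuracy of ensemble:   ", ensemble_acc)
```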

### Implementation

In [9]:
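# Base estimator. Note: an unrestricted tree is not a weak learner;
# AdaBoost is more commonly used with shallow trees such as max_depth=1.
dtree = DecisionTreeClassifier()

# `estimator` requires scikit-learn 1.2+ (older versions call it `base_estimator`)
cl = AdaBoostClassifier(n_estimators=100, estimator=dtree, learning_rate=1)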
cl.fit(X_train,y_train)

y_pred = cl.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)

print("Accuracy of AdaBoost: ", accuracy)

Accuracy of AdaBoost:  1.0


## XGBoost - an optimized implementation of the gradient boosting algorithm

#### XGBoost (eXtreme Gradient Boosting) is an open-source implementation
#### of the gradient boosting algorithm. It is designed to be efficient and
#### scalable, making it a popular choice for large-scale machine learning
#### tasks such as classification and regression. XGBoost is known for its
#### high performance, fast training times, and ability to handle missing
#### values and large datasets.

### Implementation

In [12]:
# Note: learning_rate=0.001 is very small for only 100 trees, so the model is likely underfit
xgb_model = xgb.XGBClassifier(learning_rate=0.001, max_depth=2, n_estimators=100) # learning_rate=0.01, max_depth=20
xgb_model.fit(X_train, y_train)

y_pred = xgb_model.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)

print("Accuracy of XGBoost: ", accuracy)

Accuracy of XGBoost:  0.9181538461538462
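
#### The lower accuracy above is probably a consequence of the very small
#### learning rate rather than of XGBoost itself. The sketch below illustrates
#### the effect under stated assumptions: it uses scikit-learn's
#### `GradientBoostingClassifier` as a stand-in and synthetic data, so the
#### numbers it prints are not comparable to the XGBoost result above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# Synthetic binary data (illustration only).
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

train_loss = {}
for lr in (0.001, 0.1):
    # Same number of trees and depth, only the learning rate differs.
    model = GradientBoostingClassifier(
        n_estimators=100, learning_rate=lr, max_depth=2, random_state=0
    )
    model.fit(X_demo, y_demo)
    train_loss[lr] = log_loss(y_demo, model.predict_proba(X_demo))

for lr, loss in train_loss.items():
    print(f"learning_rate={lr}: training log loss = {loss:.4f}")
```

#### With the smaller learning rate, 100 trees move the predictions only
#### slightly away from the initial guess, so the training loss stays high.
#### Raising the learning rate or the number of trees is the usual remedy.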


## CatBoost - works well on heterogeneous data

#### CatBoost is a gradient boosting algorithm that is specifically designed to
#### handle categorical features without manual preprocessing such as one-hot
#### encoding. It encodes categorical values with ordered target statistics
#### computed over random permutations of the training data, so that the
#### encoding of a row never uses that row's own target. It also has
#### built-in handling of missing values, is efficient on large datasets,
#### and ships with visualization tools that help in understanding the model.
#### Note that in this notebook the features were already label-encoded and no
#### categorical columns are declared, so CatBoost treats them as numeric here.
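
#### One way to picture the permutation-based handling of categorical features
#### described above is an ordered target statistic: rows are visited in a
#### random order, and each category value is replaced by a smoothed mean of
#### the targets seen so far for that category. The function below is a
#### simplified illustration with made-up names, parameters and toy data; it
#### is not CatBoost's actual implementation.

```python
import numpy as np

def ordered_target_statistic(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode each category using only targets of rows that come earlier
    in a random permutation (simplified illustration)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(categories))
    sums, counts = {}, {}
    encoded = np.empty(len(categories), dtype=float)
    for idx in order:
        cat = categories[idx]
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        # Only previously visited rows contribute, so the row's own target never leaks in.
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded

odor = ["n", "n", "p", "n", "p", "f"]   # made-up toy category values
label = [0, 0, 1, 0, 1, 1]              # made-up toy targets (1 = poisonous)

encoded = ordered_target_statistic(odor, label)
print(encoded)
```

#### A category seen for the first time falls back to the prior, and later
#### occurrences drift toward the target mean of that category.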

### Implementation

In [15]:
cat_model = CatBoostClassifier(
    iterations = 1000, # number of boosting iterations (trees)
    loss_function='MultiClass',
    bootstrap_type = "Bayesian",
    eval_metric = 'MultiClass',
    leaf_estimation_iterations = 100,
    random_strength = 0.5,
    depth = 7,
    l2_leaf_reg = 5,
    learning_rate=0.1,
    bagging_temperature = 0.5,
    task_type = "CPU",
)

# training the model
cat_model.fit(X_train,y_train)

# predicting the model output
y_pred_cat = cat_model.predict(X_test)

0:	learn: 0.3549149	total: 73.6ms	remaining: 1m 13s
1:	learn: 0.1756214	total: 130ms	remaining: 1m 4s
2:	learn: 0.1100814	total: 191ms	remaining: 1m 3s
3:	learn: 0.0673281	total: 249ms	remaining: 1m 2s
4:	learn: 0.0362715	total: 306ms	remaining: 1m
...
499:	learn: 0.0000054	total: 29.2s	remaining: 29.2s
...
982:	learn: 0.0000024	total: 58.4s	remaining: 1.01s
983:	learn: 0.0000024	total: 58.4s	remaining: 950ms
984:	learn: 0.0000024	total: 58.5s	remaining: 891ms
...

In [17]:
# printing the accuracy of the CatBoost model
print("Accuracy of CatBoost: ", accuracy_score(y_test,y_pred_cat))

# computing and printing the confusion matrix of the CatBoost model
confusion_matrix_catboost = confusion_matrix(y_test,y_pred_cat)
print(confusion_matrix_catboost)

Accuracy of CatBoost:  1.0
[[853   0]
 [  0 772]]
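
#### The confusion matrix above can be read as follows: rows are the true
#### classes and columns the predicted classes. The short sketch below
#### recomputes the usual metrics from the printed numbers. It assumes that
#### class 1 is the poisonous class, which holds if `LabelEncoder` mapped
#### `e` to 0 and `p` to 1 (it sorts labels alphabetically).

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
cm = np.array([[853, 0],
               [0, 772]])

# For a binary problem, ravel() yields TN, FP, FN, TP in this order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("Test samples:", cm.sum())
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall)
```

#### All off-diagonal entries are zero, so every test sample was classified
#### correctly, which matches the reported accuracy of 1.0.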


<h2 style="text-align: center;">Thank you for your attention.</h2>