# Classification using Bagging and Boosting

In this Colab notebook, we take the Abalone data set and apply several bagging and boosting techniques. In particular, we apply the following:

**Bagging:**
* `sklearn.ensemble.BaggingClassifier`
* `sklearn.ensemble.RandomForestClassifier`

**Boosting:**
* `sklearn.ensemble.GradientBoostingClassifier`
* `sklearn.ensemble.AdaBoostClassifier`

We also apply a **VotingClassifier**, which is implemented in sklearn as:

`sklearn.ensemble.VotingClassifier`



Below, we describe the most relevant parameters of the constructors of these classes:

# `sklearn.ensemble.BaggingClassifier`

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a base estimator (e.g., a decision tree).

If the samples are drawn with replacement, the method is known as **bagging**, whereas if they are drawn without replacement, it is known as **pasting**.
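To make the difference concrete, here is a small NumPy sketch (not part of the original notebook) that draws one bagging sample (with replacement) and one pasting sample (without replacement) from 1,000 hypothetical row indices. A bootstrap sample of size $n$ contains on average about $1 - 1/e \approx 63.2\%$ distinct rows, while a pasting sample never repeats a row.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = np.arange(n)  # stand-in for the row indices of a training set

# Bagging: draw n row indices WITH replacement (a bootstrap sample)
bag_sample = rng.choice(indices, size=n, replace=True)

# Pasting: draw half of the rows WITHOUT replacement
paste_sample = rng.choice(indices, size=n // 2, replace=False)

bag_unique_frac = len(np.unique(bag_sample)) / n
paste_has_duplicates = len(np.unique(paste_sample)) < len(paste_sample)

print(f"distinct rows in bootstrap sample: {bag_unique_frac:.3f}")  # close to 0.632
print(f"pasting sample has duplicates: {paste_has_duplicates}")     # False
```

Because some rows are repeated and others are left out, each bootstrap sample gives a slightly different view of the data, which is what makes the base estimators diverse.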

Following are some useful parameters:

* `base_estimator:` object, default=None

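The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a DecisionTreeClassifier. In scikit-learn 1.2 and later this parameter is called `estimator`; `base_estimator` was deprecated and has since been removed.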

* `n_estimators:` int, default=10

The number of base estimators in the ensemble.

* `max_samples:` int or float, default=1.0

The number of samples to draw from X to train each base estimator (with replacement by default). An int is an absolute count; a float is a fraction of `X.shape[0]`.

* `max_features:` int or float, default=1.0

The number of features to draw from X to train each base estimator (without replacement by default). An int is an absolute count; a float is a fraction of `X.shape[1]`.

* `bootstrap:` bool, default=True

Whether samples are drawn with replacement. If False, sampling without replacement is performed.
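A minimal sketch of these parameters together, assuming a synthetic data set from `make_classification` instead of the Abalone data: with `bootstrap=False` the ensemble does pasting, and the float values of `max_samples` and `max_features` are turned into row and feature counts. No base estimator is passed, so the default decision tree is used (this also sidesteps the `base_estimator`/`estimator` rename).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Pasting + random feature subsets: each tree sees 50% of the rows
# (drawn without replacement) and 4 of the 8 features.
bag = BaggingClassifier(
    n_estimators=10,
    max_samples=0.5,     # float -> fraction of rows per estimator (100 of 200)
    max_features=0.5,    # float -> fraction of features per estimator (4 of 8)
    bootstrap=False,     # sample rows without replacement (pasting)
    random_state=0,
).fit(X, y)

print(len(bag.estimators_))                         # 10 fitted trees
print(len(bag.estimators_samples_[0]))              # 100 row indices for the first tree
print(sorted(bag.estimators_features_[0].tolist())) # the 4 feature indices it used
```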



# `sklearn.ensemble.RandomForestClassifier`

Random forests differ from plain bagging of decision trees by letting each tree consider only a random subset of the available features at every split (controlled by `max_features`). The trees that make up a random forest therefore differ from one another both because each tree is built on a different random (bootstrap) sample of the data and because of this random feature selection.
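The per-split feature sampling can be checked on a fitted forest. This sketch assumes a synthetic data set with 16 features; with `max_features="sqrt"`, every tree reports that it considers $\sqrt{16} = 4$ candidate features at each split via the fitted tree attribute `max_features_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# max_features="sqrt": every split in every tree draws only
# sqrt(16) = 4 randomly chosen candidate features.
rf = RandomForestClassifier(n_estimators=25, max_features="sqrt",
                            random_state=0).fit(X, y)

per_split = {tree.max_features_ for tree in rf.estimators_}
print(per_split)  # {4}
```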

Following are some useful parameters:
* `n_estimators:` int, default=100

The number of trees in the forest.

* `criterion:` {“gini”, “entropy”}, default=”gini”

The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. 

* `max_depth:` int, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

* `min_samples_split:` int or float, default=2

The minimum number of samples required to split an internal node.

* `min_samples_leaf:` int or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. 

* `min_impurity_decrease:` float, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

* `bootstrap:` bool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.


# `sklearn.ensemble.GradientBoostingClassifier`

In gradient boosting, each predictor tries to improve on its predecessors by reducing their errors. Instead of refitting a predictor to the original targets at each iteration, GradientBoostingClassifier fits a new regression tree to the residual errors of the ensemble built so far (more precisely, to the negative gradient of the loss, often called pseudo-residuals).

Following are some useful parameters:

* `n_estimators:` int, default=100

The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

* `learning_rate:` float, default=0.1

Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

* `max_features:` int or float, default=None

The number of features to consider when looking for the best split. If None, all features are considered.

* `max_depth:` int, default=3

The maximum depth of the individual estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

* `min_impurity_decrease:` float, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
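The `learning_rate`/`n_estimators` trade-off can be inspected with `staged_predict`, which yields the ensemble's predictions after each boosting stage, so a single fit shows accuracy as a function of the number of trees. This is a sketch on synthetic data (`make_classification`), not the Abalone set, and picking the best stage on the test split is done here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for lr in (1.0, 0.1):
    gb = GradientBoostingClassifier(learning_rate=lr, n_estimators=200,
                                    max_depth=3, random_state=0).fit(X_tr, y_tr)
    # Accuracy of the partial ensemble after 1, 2, ..., 200 trees
    test_acc = [np.mean(pred == y_te) for pred in gb.staged_predict(X_te)]
    best_stage = int(np.argmax(test_acc)) + 1
    print(f"learning_rate={lr}: best test accuracy {max(test_acc):.3f} "
          f"after {best_stage} trees")
```

Typically, a smaller learning rate needs more trees to reach its best accuracy, which is why the two parameters are usually tuned together.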


# `sklearn.ensemble.AdaBoostClassifier`

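The basic idea behind AdaBoost is to reweight the training samples at each iteration so that observations misclassified by the previous classifiers receive more weight, forcing the next classifier to focus on these hard cases. Each classifier is in turn weighted by its accuracy when the predictions are combined.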

Following are some useful parameters:

* `base_estimator:` object, default=None

The base estimator from which the boosted ensemble is built. If None, then the base estimator is DecisionTreeClassifier initialized with max_depth=1. In scikit-learn 1.2 and later this parameter is called `estimator`.

* `n_estimators:` int, default=50

The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.

* `learning_rate:` float, default=1.0

Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learning_rate and n_estimators parameters.
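A quick sketch, assuming synthetic data from `make_classification`, to confirm the defaults described above: without an explicit base estimator, every member of the fitted ensemble is a decision stump (depth at most 1), and the ensemble holds at most `n_estimators` members. Depending on the scikit-learn version, this may emit a warning about the `algorithm` parameter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# No base estimator given: AdaBoost falls back to DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(n_estimators=50, learning_rate=0.5, random_state=0).fit(X, y)

depths = {est.get_depth() for est in ada.estimators_}
print(depths)                     # every member is at most one split deep
print(len(ada.estimators_))       # at most n_estimators (fewer on a perfect fit)
print(ada.estimator_weights_[:5]) # weights of the first five classifiers
```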


# `sklearn.ensemble.VotingClassifier`

A Voting Classifier trains an ensemble of several different models and predicts the output class either by majority vote or by the highest average predicted probability.

It aggregates the predictions of every classifier passed to it. Instead of building separate dedicated models and measuring the accuracy of each, we create a single model that trains all of them and predicts the class that wins their combined vote.

It is of two types:

* **Hard Voting:** the predicted output class is the class with the most votes. Suppose five classifiers predict the output classes (A, B, B, A, B); the majority predicts B, so B is the final prediction.

* **Soft Voting:** the predicted output class is the one with the highest average predicted probability. Suppose that, for some input, the predicted probabilities for classes A and B from five classifiers are (0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.6, 0.4) and (0.3, 0.7). The average probability is 0.48 for class A and 0.52 for class B, so class B wins.
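Both worked examples can be reproduced in a few lines of Python. The votes and probabilities below are exactly the ones from the text; only the variable names are new.

```python
from collections import Counter

import numpy as np

# Hard voting: labels predicted by the five classifiers
votes = ["A", "B", "B", "A", "B"]
hard_winner = Counter(votes).most_common(1)[0][0]

# Soft voting: (P(A), P(B)) from the five classifiers
probas = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
avg = probas.mean(axis=0)                       # average probability per class
soft_winner = ["A", "B"][int(np.argmax(avg))]

print(hard_winner)   # B
print(avg)           # approximately [0.48 0.52]
print(soft_winner)   # B
```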

**Parameters:**

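* `estimators:` list of (str, estimator) tuples

The named estimators whose predictions are combined; calling `fit` on the VotingClassifier fits clones of them.

* `voting:` {‘hard’, ‘soft’}, default=’hard’

If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities; this requires every estimator to implement `predict_proba`.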
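As a sanity check of the `voting='soft'` rule, this sketch (synthetic data from `make_classification`; the three member models are arbitrary choices) compares the ensemble's `predict_proba` with the plain average of its fitted members' probabilities. Without `weights`, the two coincide, and taking the argmax of the average is equivalent to taking the argmax of the sum.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

estimators = [
    ("lr", LogisticRegression()),
    ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("nb", GaussianNB()),
]
soft = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)
hard = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)

# Soft voting: ensemble probability = mean of the fitted members' probabilities
manual = np.mean([est.predict_proba(X) for est in soft.estimators_], axis=0)
print(np.allclose(soft.predict_proba(X), manual))   # True
print((soft.predict(X) != hard.predict(X)).sum(), "rows where soft and hard votes differ")
```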




Let us now apply these different models on Abalone data set.

# Loading the data set

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
column_names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']
abalone_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', header=None, names=column_names)


In [None]:
abalone_data.shape

(4177, 9)

In [None]:
abalone_data.head(10)

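|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
| 5 | I | 0.425 | 0.3 | 0.095 | 0.3515 | 0.141 | 0.0775 | 0.12 | 8 |
| 6 | F | 0.53 | 0.415 | 0.15 | 0.7775 | 0.237 | 0.1415 | 0.33 | 20 |
| 7 | F | 0.545 | 0.425 | 0.125 | 0.768 | 0.294 | 0.1495 | 0.26 | 16 |
| 8 | M | 0.475 | 0.37 | 0.125 | 0.5095 | 0.2165 | 0.1125 | 0.165 | 9 |
| 9 | F | 0.55 | 0.44 | 0.15 | 0.8945 | 0.3145 | 0.151 | 0.32 | 19 |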


In [None]:
X = abalone_data.iloc[:, :-1]
y = abalone_data.iloc[:, -1]


In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

In [None]:
numeric_features = ['Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight']
categorical_features = ["Sex"]


In [None]:
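# Zeros in the numeric columns are treated as missing and replaced by a fixed constant;
# the features are then standardized to zero mean and unit variance.
numeric_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(missing_values=0, strategy="constant", fill_value=0.107996)), ("scaler", StandardScaler())]
)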
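The imputer step is easy to misread, so here is a small sketch on a hypothetical 3×2 toy array (not the Abalone data) showing what `SimpleImputer(missing_values=0, strategy="constant", ...)` does, next to `strategy="mean"` as one possible alternative. The notebook does not explain where the constant 0.107996 comes from; it is simply reused here.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Three toy rows; 0.0 marks an impossible (missing) measurement.
X_toy = np.array([[0.095, 0.0],
                  [0.0,   0.2],
                  [0.125, 0.4]])

# Same settings as the notebook: every 0, in every column, becomes the fixed constant.
const_imp = SimpleImputer(missing_values=0, strategy="constant", fill_value=0.107996)
const_out = const_imp.fit_transform(X_toy)
print(const_out)

# Alternative: replace 0 with the mean of the non-zero entries of its own column.
mean_imp = SimpleImputer(missing_values=0, strategy="mean")
mean_out = mean_imp.fit_transform(X_toy)
print(mean_out)   # column means of non-zero values: 0.11 and 0.3
```

The constant strategy uses the same fill value for every column, whereas the mean strategy adapts to each feature's scale.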


In [None]:
categorical_transformer = OneHotEncoder(handle_unknown="ignore")



In [None]:
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ]
)

In [None]:
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", BaggingClassifier())]
)

In [None]:
clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))

model score: 0.224




---



# BaggingClassifier

In [None]:
from sklearn.model_selection import cross_val_score
acc = cross_val_score(estimator = clf, X = X_train, y = y_train, cv = 10)
print(type(acc))
print('Accuracy of each fold ', list(acc*100))
print("Accuracy: {:.2f} %".format(acc.mean()*100))



<class 'numpy.ndarray'>
Accuracy of each fold  [20.8955223880597, 24.251497005988025, 21.55688622754491, 20.059880239520957, 21.856287425149702, 23.353293413173652, 24.251497005988025, 27.245508982035926, 24.550898203592812, 20.35928143712575]
Accuracy: 22.84 %


In [None]:
X_train_new = preprocessor.fit_transform(X_train)


In [None]:
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{'n_estimators': [10, 50, 100, 500],
                    'max_samples': [0.05, 0.1, 0.2, 0.5]
                     }]
scores = ['recall']
for score in scores:
    
    print()
    print(f"Tuning hyperparameters for {score}")
    print()
    
    clf_CV = GridSearchCV(
        BaggingClassifier(), tuned_parameters,
        scoring = f'{score}_macro'
    )
    clf_CV.fit(X_train_new, y_train)
    
    print("Best parameters:")
    print()
    print(clf_CV.best_params_)
    print()
    print("Grid scores:")
    means = clf_CV.cv_results_["mean_test_score"]
    stds = clf_CV.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds,
                                 clf_CV.cv_results_['params']):
        print(f"{mean:0.3f} (+/-{std*2:0.03f}) for {params}")



Tuning hyperparameters for recall





Best parameters:

{'max_samples': 0.2, 'n_estimators': 500}

Grid scores:
0.135 (+/-0.014) for {'max_samples': 0.05, 'n_estimators': 10}
0.143 (+/-0.019) for {'max_samples': 0.05, 'n_estimators': 50}
0.138 (+/-0.016) for {'max_samples': 0.05, 'n_estimators': 100}
0.141 (+/-0.019) for {'max_samples': 0.05, 'n_estimators': 500}
0.122 (+/-0.028) for {'max_samples': 0.1, 'n_estimators': 10}
0.135 (+/-0.016) for {'max_samples': 0.1, 'n_estimators': 50}
0.138 (+/-0.008) for {'max_samples': 0.1, 'n_estimators': 100}
0.143 (+/-0.009) for {'max_samples': 0.1, 'n_estimators': 500}
0.129 (+/-0.023) for {'max_samples': 0.2, 'n_estimators': 10}
0.142 (+/-0.016) for {'max_samples': 0.2, 'n_estimators': 50}
0.149 (+/-0.023) for {'max_samples': 0.2, 'n_estimators': 100}
0.149 (+/-0.025) for {'max_samples': 0.2, 'n_estimators': 500}
0.138 (+/-0.019) for {'max_samples': 0.5, 'n_estimators': 10}
0.142 (+/-0.025) for {'max_samples': 0.5, 'n_estimators': 50}
0.145 (+/-0.023) for {'max_samples': 0.5, 'n_est
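In the grid-search cells, `preprocessor` is fit on all of `X_train` before `GridSearchCV` splits it into folds, so the scaler also sees each validation fold. One way to avoid this is to search over the whole pipeline, sketched below on synthetic data from `make_classification` with a plain `StandardScaler` standing in for the notebook's `preprocessor`. Step parameters are then addressed as `<step name>__<parameter>`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

pipe = Pipeline(steps=[("scaler", StandardScaler()),
                       ("classifier", BaggingClassifier(random_state=0))])

# Parameters of the "classifier" step get the "classifier__" prefix
param_grid = {"classifier__n_estimators": [10, 50],
              "classifier__max_samples": [0.2, 0.5]}

search = GridSearchCV(pipe, param_grid, scoring="recall_macro", cv=5)
search.fit(X, y)   # the scaler is refit inside every CV training fold
print(search.best_params_)
```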

In [None]:
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf2 = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", BaggingClassifier(max_samples =0.2, n_estimators = 500, random_state = 42))]
)

In [None]:
clf2.fit(X_train, y_train)
print("model score: %.3f" % clf2.score(X_test, y_test))

model score: 0.254




---



# RandomForestClassifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf_RFC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier())]
)

clf_RFC.fit(X_train, y_train)
print("model score: %.3f" % clf_RFC.score(X_test, y_test))

model score: 0.236


In [None]:
X_train_new = preprocessor.fit_transform(X_train)

In [None]:
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{'n_estimators': [50, 100, 250, 500],
                    'max_features': ['sqrt', 'log2'],  # 'auto' (same as 'sqrt' for classifiers) was removed in scikit-learn 1.3
                    'max_depth' : [4,5,6,7,8,9,10],
                    'criterion' :['gini', 'entropy']
                     }]
scores = ['recall']
for score in scores:
    
    print()
    print(f"Tuning hyperparameters for {score}")
    print()
    
    clf_RFC_CV = GridSearchCV(
        RandomForestClassifier(), tuned_parameters,
        scoring = f'{score}_macro'
    )
    clf_RFC_CV.fit(X_train_new, y_train)
    
    print("Best parameters:")
    print()
    print(clf_RFC_CV.best_params_)
    print()
    print("Grid scores:")
    means = clf_RFC_CV.cv_results_["mean_test_score"]
    stds = clf_RFC_CV.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds,
                                 clf_RFC_CV.cv_results_['params']):
        print(f"{mean:0.3f} (+/-{std*2:0.03f}) for {params}")


In [None]:
clf_RFC2 = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier(criterion= 'gini', max_depth= 8, max_features = 'sqrt', n_estimators = 200))]
)

clf_RFC2.fit(X_train, y_train)
print("model score: %.3f" % clf_RFC2.score(X_test, y_test))

model score: 0.266




---



# Gradient Boosted Decision Trees or GBDT

Decision trees (DTs) are among the simplest yet most powerful machine learning algorithms. However, they tend to overfit the training data. Ensemble (meta-learning) techniques built on top of decision trees can improve test accuracy substantially. While bagging methods do yield significant improvements over a single decision tree, there are cases where Gradient Boosted Decision Trees (GBDTs), using different boosting techniques, achieve better performance.

We take a look at the following boosting techniques:
- Gradient Boosting in scikit-learn
- AdaBoost
- XGBoost
- LightGBM
- CatBoost

# Gradient Boosting in scikit-learn

This is a generalization of boosting methods which can be used for both classification and regression. Let us understand the concept before implementing the module.

At a high level, an ensemble of weak learners (learners whose predictions are only slightly better than random guessing) is trained sequentially. Each new learner looks at the residuals (ground truth minus the current prediction) and tries to reduce them. The outputs of the learners are combined in a weighted manner, and the average loss of the ensemble decreases along a gradient-descent path as learners are added. This procedure is a form of Forward Stagewise Additive Modeling.


# Gradient Boosting - Algorithm

Gradient boosting builds on Forward Stagewise Additive Modeling, whose pseudocode is given below:
1. Initialize $f_0(x) = 0$.
2. For $m = 1$ to $M$:
   - Compute $(\beta_m,\gamma_m) = \underset{\beta,\gamma}{\mathrm{argmin}} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i)+\beta\, b(x_i;\gamma)\big)$
   - Set $f_m(x) = f_{m-1}(x)+\beta_m b(x;\gamma_m)$

where
- $f_m(x)$ is the additive model (the ensemble) after $m$ steps
- $b(x;\gamma)$ is the weak learner (basis function), e.g. a tree
- $M$ is the number of steps
- $L$ is the loss function
- $y_i$ is the true output of the $i$th observation
- $\beta_m$ is the step size, or expansion coefficient
- $\gamma$ is the set of parameters which, in the case of trees, parameterizes the split variables and split points at the internal nodes and the predictions at the terminal nodes.

For the squared-error loss, the objective at step $m$ becomes
$$
\begin{aligned}
L(y,f(x)) &= (y-f(x))^2\\
L\big(y_i,f_{m-1}(x_i)+\beta b(x_i;\gamma)\big) &= \big(y_i - f_{m-1}(x_i)-\beta b(x_i;\gamma)\big)^2\\
&= \big(r_{im} -\beta b(x_i;\gamma)\big)^2
\end{aligned}
$$

where $r_{im} = y_i - f_{m-1}(x_i)$ is the residual of the current model on the $i$th observation.
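With squared-error loss, each step therefore just fits a new weak learner to the current residuals $r_{im}$. The sketch below shows this on a synthetic 1-D regression problem, assuming shallow `DecisionTreeRegressor` trees as $b(x;\gamma)$ and a fixed shrinkage factor (`learning_rate`) in place of solving for $\beta_m$; the fitted tree's leaf values already play the role of the optimal step inside each region.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_squared_error(X, y, n_steps=50, learning_rate=0.1, max_depth=2):
    """Forward stagewise boosting with squared-error loss and regression trees."""
    f = np.zeros(len(y))                           # f_0(x) = 0
    trees, losses = [], [np.mean(y ** 2)]          # training MSE of f_0
    for _ in range(n_steps):
        r = y - f                                  # residuals r_im = y_i - f_{m-1}(x_i)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        f = f + learning_rate * tree.predict(X)    # f_m = f_{m-1} + shrunk tree
        trees.append(tree)
        losses.append(np.mean((y - f) ** 2))
    return trees, losses

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

trees, losses = boost_squared_error(X, y)
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because a least-squares tree's leaf values are the mean residuals in each leaf, adding a shrunk copy of the tree can never increase the training MSE.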

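Squared-error loss is not well suited to classification problems, so for classification we use the 'deviance' (also known as the binomial negative log-likelihood or cross-entropy). For labels $Y \in \{-1, 1\}$, the deviance is
$-\ell(Y,f(x)) = \log\left(1+e^{-2Yf(x)}\right)$
In scikit-learn this is the default `loss` of GradientBoostingClassifier, named `'log_loss'` (formerly `'deviance'`).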
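To see why the deviance is the same as cross-entropy, interpret $f(x)$ as half the log-odds, so that $P(Y=1 \mid x) = 1/(1+e^{-2f(x)})$ (an assumption about the parameterization of $f$). The numeric check below, on an arbitrary grid of $f$ values, confirms that $\log(1+e^{-2Yf})$ equals $-\log p$ for $Y=1$ and $-\log(1-p)$ for $Y=-1$.

```python
import numpy as np

f = np.linspace(-3, 3, 13)                 # model outputs f(x)
p = 1.0 / (1.0 + np.exp(-2.0 * f))         # implied P(Y = +1 | x)

results = {}
for y in (+1, -1):
    deviance = np.log1p(np.exp(-2.0 * y * f))               # log(1 + e^{-2 y f})
    cross_entropy = -np.log(p) if y == 1 else -np.log(1.0 - p)
    results[y] = np.allclose(deviance, cross_entropy)

print(results)  # {1: True, -1: True}
```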


# GradientBoostingClassifier

In [None]:
from sklearn.ensemble import GradientBoostingClassifier

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf_GBC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", GradientBoostingClassifier())]
)

clf_GBC.fit(X_train, y_train)
print("model score: %.3f" % clf_GBC.score(X_test, y_test))

model score: 0.231


In [None]:
X_train_new = preprocessor.fit_transform(X_train)

In [None]:
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{
    "learning_rate": [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2],
    # "min_samples_split": np.linspace(0.1, 0.5, 12),
    # "min_samples_leaf": np.linspace(0.1, 0.5, 12),
    "max_depth":[3,5,8],
    "max_features":["log2","sqrt"],
    # "subsample":[0.5, 0.618, 0.8, 0.85, 0.9, 0.95, 1.0],
    "n_estimators":[10]
}]


scores = ['recall']
for score in scores:
    
    print()
    print(f"Tuning hyperparameters for {score}")
    print()
    
    clf_GBC_CV = GridSearchCV(
        GradientBoostingClassifier(), tuned_parameters,
        scoring = f'{score}_macro'
    )
    clf_GBC_CV.fit(X_train_new, y_train)
    
    print("Best parameters:")
    print()
    print(clf_GBC_CV.best_params_)
    print()
    print("Grid scores:")
    means = clf_GBC_CV.cv_results_["mean_test_score"]
    stds = clf_GBC_CV.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds,
                                 clf_GBC_CV.cv_results_['params']):
        print(f"{mean:0.3f} (+/-{std*2:0.03f}) for {params}")


In [None]:
clf_GBC2 = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", GradientBoostingClassifier(learning_rate= 0.075, max_depth= 3, max_features='log2', n_estimators=10))]
)

clf_GBC2.fit(X_train, y_train)
print("model score: %.3f" % clf_GBC2.score(X_test, y_test))

model score: 0.275




---



# AdaBoostClassifier - Algorithm

AdaBoost was introduced by Yoav Freund and Robert Schapire; Schapire is a Partner Researcher at Microsoft Research.

AdaBoost can be viewed as the same forward stagewise additive modeling procedure described for GBDT above. However, instead of the deviance, AdaBoost uses the exponential loss $L(y, f(x)) = e^{-yf(x)}$.

The steps involved are:
- Assign a weight $w_i$ to each training data point. For the first learner, use equal weights $w_i = 1/N$ for $i = 1,2,\dots,N$.
- For $m = 1$ to $M$:
  - Train a learner $C_m(x)$ on the weighted training data.
  - Compute its weighted prediction error
  $err_m = \dfrac{\sum_{i=1}^{N} w_i\, I(y_i\neq C_m(x_i))}{\sum_{i=1}^{N} w_i}$
  - Compute the step size, i.e. the weight of this learner:
  $\alpha_m = \log\big((1-err_m)/err_m\big)$
  - Update the weights of the observations: $w_i \leftarrow w_i \cdot \exp\big[\alpha_m\, I(y_i\neq C_m(x_i))\big]$
- Output the final classifier as the sign of the weighted sum of all $M$ learners: $C(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m C_m(x)\right)$
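The steps above can be sketched in Python as a minimal AdaBoost.M1 for labels in $\{-1, +1\}$. This assumes decision stumps (`DecisionTreeClassifier(max_depth=1)`) as the weak learners and a synthetic 2-D data set with a diagonal class boundary; the clipping of $err_m$ and the renormalization of the weights are numerical safeguards added here, and the code is an illustration rather than scikit-learn's `AdaBoostClassifier` itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, n_rounds=50):
    """AdaBoost.M1 with decision stumps; y must be an array of -1/+1 labels."""
    n = len(y)
    w = np.full(n, 1.0 / n)                            # w_i = 1/N
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (stump.predict(X) != y).astype(float)   # I(y_i != C_m(x_i))
        err = np.sum(w * miss) / np.sum(w)             # weighted error err_m
        err = np.clip(err, 1e-10, 1 - 1e-10)           # guard against log(0)
        alpha = np.log((1 - err) / err)                # alpha_m
        w = w * np.exp(alpha * miss)                   # up-weight misclassified points
        w = w / w.sum()                                # rescale; relative weights unchanged
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)

def adaboost_predict(learners, alphas, X):
    """C(x) = sign(sum_m alpha_m * C_m(x)), with ties mapped to +1."""
    score = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # diagonal boundary: hard for one stump

learners, alphas = adaboost_m1(X, y)
acc_stump = np.mean(learners[0].predict(X) == y)
acc_boost = np.mean(adaboost_predict(learners, alphas, X) == y)
print(f"single stump: {acc_stump:.3f}, AdaBoost (50 stumps): {acc_boost:.3f}")
```

A single stump can only split on one axis, but a weighted sum of stumps builds a staircase that approximates the diagonal boundary.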

# AdaBoost - Modus Operandi
*Sources:*

*- An Introduction to Machine Learning by Prof. Balaraman Ravindran*

*- The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman*

We now look at the intuition behind AdaBoost. AdaBoost trains the individual classifiers $C_m(x) \in \{-1,1\}$ so that the exponential loss is minimized:

$(\beta_m,C_m) = \underset{\beta,C}{\mathrm{argmin}} \sum_{i=1}^{N} \exp\big(-y_i(f_{m-1}(x_i)+\beta C(x_i))\big)$

Now, let $w_i^{(m)} = \exp(-y_i f_{m-1}(x_i))$. Since these weights depend neither on $\beta$ nor on $C$, we can rewrite the above objective as
$(\beta_m,C_m) = \underset{\beta,C}{\mathrm{argmin}} \sum_{i=1}^{N} w_i^{(m)} \exp(-y_i \beta C(x_i))$

Because $y_i C(x_i) = 1$ for correctly classified points and $-1$ otherwise, this objective can be rewritten as

$e^{-\beta} \sum\limits_{y_i = C(x_i)} w_i^{(m)} + e^{\beta} \sum\limits_{y_i \neq C(x_i)} w_i^{(m)}$

Basically, there are two bins: one contains the correctly classified points and the other contains the incorrectly classified points. The goal of AdaBoost, or for that matter of any boosting technique, is to move more points from the incorrectly classified bin to the correctly classified bin.
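Minimizing the two-bin objective gives closed forms for both $C_m$ and $\beta_m$. The short derivation below follows the standard textbook argument, writing $A = \sum_{y_i = C(x_i)} w_i^{(m)}$ for the weight of the correct bin and $B = \sum_{y_i \neq C(x_i)} w_i^{(m)}$ for the weight of the incorrect bin:

$$
\begin{aligned}
G(\beta) &= e^{-\beta} A + e^{\beta} B = e^{-\beta}(A+B) + \left(e^{\beta}-e^{-\beta}\right) B \\
\frac{dG}{d\beta} &= -e^{-\beta} A + e^{\beta} B = 0
\;\Rightarrow\; e^{2\beta} = \frac{A}{B} \\
\beta_m &= \frac{1}{2}\log\frac{A}{B} = \frac{1}{2}\log\frac{1-err_m}{err_m},
\qquad err_m = \frac{B}{A+B}
\end{aligned}
$$

The first line shows that, for any $\beta > 0$, the best $C_m$ is the classifier with the smallest weighted error $B$. The last line gives $\alpha_m = 2\beta_m$, the step size used in the algorithm. Since $-y_i C_m(x_i) = 2I(y_i \neq C_m(x_i)) - 1$, the next weights $w_i^{(m+1)} = w_i^{(m)} e^{-\beta_m y_i C_m(x_i)}$ equal $w_i^{(m)} e^{\alpha_m I(y_i \neq C_m(x_i))} e^{-\beta_m}$; the factor $e^{-\beta_m}$ is the same for every observation, so only misclassified points gain relative weight.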

In [None]:
from sklearn.ensemble import AdaBoostClassifier

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf_ABC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", AdaBoostClassifier())]
)

clf_ABC.fit(X_train, y_train)
print("model score: %.3f" % clf_ABC.score(X_test, y_test))

model score: 0.194


In [None]:
X_train_new = preprocessor.fit_transform(X_train)

In [None]:
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{
    'n_estimators': [1,50,100,150],    
    'learning_rate': [0.1,0.4,0.7,1]
}]


scores = ['recall']
for score in scores:
    
    print()
    print(f"Tuning hyperparameters for {score}")
    print()
    
    clf_ABC_CV = GridSearchCV(
        AdaBoostClassifier(), tuned_parameters,
        scoring = f'{score}_macro'
    )
    clf_ABC_CV.fit(X_train_new, y_train)
    
    print("Best parameters:")
    print()
    print(clf_ABC_CV.best_params_)
    print()
    print("Grid scores:")
    means = clf_ABC_CV.cv_results_["mean_test_score"]
    stds = clf_ABC_CV.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds,
                                 clf_ABC_CV.cv_results_['params']):
        print(f"{mean:0.3f} (+/-{std*2:0.03f}) for {params}")



Tuning hyperparameters for recall





Best parameters:

{'learning_rate': 0.4, 'n_estimators': 50}

Grid scores:
0.073 (+/-0.009) for {'learning_rate': 0.1, 'n_estimators': 1}
0.070 (+/-0.006) for {'learning_rate': 0.1, 'n_estimators': 50}
0.092 (+/-0.011) for {'learning_rate': 0.1, 'n_estimators': 100}
0.094 (+/-0.013) for {'learning_rate': 0.1, 'n_estimators': 150}
0.073 (+/-0.009) for {'learning_rate': 0.4, 'n_estimators': 1}
0.100 (+/-0.014) for {'learning_rate': 0.4, 'n_estimators': 50}
0.097 (+/-0.023) for {'learning_rate': 0.4, 'n_estimators': 100}
0.098 (+/-0.019) for {'learning_rate': 0.4, 'n_estimators': 150}
0.073 (+/-0.009) for {'learning_rate': 0.7, 'n_estimators': 1}
0.093 (+/-0.009) for {'learning_rate': 0.7, 'n_estimators': 50}
0.091 (+/-0.006) for {'learning_rate': 0.7, 'n_estimators': 100}
0.091 (+/-0.007) for {'learning_rate': 0.7, 'n_estimators': 150}
0.073 (+/-0.009) for {'learning_rate': 1, 'n_estimators': 1}
0.081 (+/-0.012) for {'learning_rate': 1, 'n_estimators': 50}
0.081 (+/-0.012) for {'learning

In [None]:
clf_ABC2 = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", AdaBoostClassifier(learning_rate= 0.4, n_estimators= 50))]
)

clf_ABC2.fit(X_train, y_train)
print("model score: %.3f" % clf_ABC2.score(X_test, y_test))

model score: 0.246


# VotingClassifier

In [None]:
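from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = list()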
models.append(('knn1', KNeighborsClassifier(n_neighbors=1)))
models.append(('knn3', KNeighborsClassifier(n_neighbors=3)))
models.append(('knn5', KNeighborsClassifier(n_neighbors=5)))
models.append(('knn7', KNeighborsClassifier(n_neighbors=7)))
models.append(('knn9', KNeighborsClassifier(n_neighbors=9)))


clf_VC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", VotingClassifier(estimators=models,voting='hard'))]
)

clf_VC.fit(X_train, y_train)
print("model score: %.3f" % clf_VC.score(X_test, y_test))


model score: 0.206


In [None]:
models = list()
models.append(('svm1', SVC(probability=True, kernel='poly', degree=1)))
models.append(('svm2', SVC(probability=True, kernel='poly', degree=2)))
models.append(('svm3', SVC(probability=True, kernel='poly', degree=3)))
models.append(('svm4', SVC(probability=True, kernel='poly', degree=4)))
models.append(('svm5', SVC(probability=True, kernel='poly', degree=5)))


clf_VC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", VotingClassifier(estimators=models,voting='hard'))]
)

clf_VC.fit(X_train, y_train)
print("model score: %.3f" % clf_VC.score(X_test, y_test))


model score: 0.257


In [None]:
models = list()
models.append(('cart1', DecisionTreeClassifier(max_depth=1)))
models.append(('cart2', DecisionTreeClassifier(max_depth=2)))
models.append(('cart3', DecisionTreeClassifier(max_depth=3)))
models.append(('cart4', DecisionTreeClassifier(max_depth=4)))
models.append(('cart5', DecisionTreeClassifier(max_depth=5)))

clf_VC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", VotingClassifier(estimators=models,voting='hard'))]
)

clf_VC.fit(X_train, y_train)
print("model score: %.3f" % clf_VC.score(X_test, y_test))


model score: 0.269


In [None]:
models = list()
models.append(('lr1', LogisticRegression(penalty = 'l1', solver='liblinear')))
models.append(('lr2', LogisticRegression(penalty = 'l2', solver='liblinear')))
models.append(('lr3', LogisticRegression(penalty = 'elasticnet', solver='saga', l1_ratio=0.5)))
models.append(('lr4', LogisticRegression(penalty=None, solver='saga')))  # penalty=None needs scikit-learn >= 1.2; older versions used the string 'none'

clf_VC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", VotingClassifier(estimators=models,voting='hard'))]
)

clf_VC.fit(X_train, y_train)
print("model score: %.3f" % clf_VC.score(X_test, y_test))





model score: 0.273




In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

lr = LogisticRegression()
dt = DecisionTreeClassifier()
svm= SVC(probability=True)
knn= KNeighborsClassifier()

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf_VC = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", VotingClassifier(estimators=[('lr', lr), ('dt', dt), ('svc', svm), ('knn',knn)],voting='hard'))]
)

clf_VC.fit(X_train, y_train)
print("model score: %.3f" % clf_VC.score(X_test, y_test))


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


model score: 0.269




---




