# 1 Ensemble Learning

This task is a hands-on programming exercise covering two ensemble learning techniques: bagging and boosting. More specifically, we will use the [Bagging Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html) and the [AdaBoost Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) provided by the scikit-learn library.

As you are already familiar with loading datasets, splitting them into training and test data, fitting a classifier, etc., this time we will only give you hints about which steps to take. We also provide some imports up front to give you an idea of what you might need for this task. You can of course add further imports, but please stick with the libraries we have used in this course so far!

In [5]:
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn import metrics
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier

# Additional imports here, if needed
from sklearn.datasets import load_digits

np.random.seed(42)

## Task 1.1 Load and Split **Digits** Dataset

### Task 1.1.1 Load dataset 

Load the [Digits Toy Dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset) provided by scikit-learn and save it in the variable given below. Make sure that **all 10 digits** are included in the dataset.

In [15]:
digits = load_digits()

# Inspect the target labels of the loaded dataset
print(digits.target)

[0 1 2 ... 8 9 8]


### Task 1.1.2 Save feature and target data

Save the feature data from the dataset (i.e. the vectors representing the digits) and the respective target labels in two different variables *X* and *y*.

In [19]:
X = digits.data
y = digits.target
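
A quick sanity check can confirm that the feature matrix and the labels line up and that all ten digit classes are present. The following is a minimal sketch, not part of the original solution; the variable names `classes` and `counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits

# Load the full dataset (load_digits returns all ten classes by default)
digits = load_digits()
X, y = digits.data, digits.target

# Count how often each label occurs (hypothetical helper variables)
classes, counts = np.unique(y, return_counts=True)

print("Feature matrix shape:", X.shape)
print("Classes present:", classes)
print("Samples per class:", counts)
```

Each row of `X` is one flattened 8x8 image, so the second dimension of `X.shape` is the number of pixels per digit.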

### Task 1.1.3 Split train and test data

Split the data into a training set *(X_train, y_train)* and a test set *(X_test, y_test)*. Use 70% of the samples for training and 30% for testing.

In [23]:
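# Use the first 70% of the samples (in dataset order, no shuffling) for training
n_train = int(len(X) * 0.7)
X_train = X[:n_train]
y_train = y[:n_train]
# The remaining 30% form the test set
X_test = X[n_train:]
y_test = y[n_train:]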
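
The slicing above keeps the samples in their original order. A common alternative is a shuffled, stratified split with `train_test_split` from `sklearn.model_selection`. The sketch below is an assumption about how one might do this, not the approach used in this notebook; `random_state=42` and `stratify=y` are illustrative choices, and using this split would change the accuracy values printed further down.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Shuffle the data and keep the class proportions equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])
```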

## Task 1.2 Bagging Classifier

### Task 1.2.1 Train Classifier

Create a **Bagging Classifier** object and train it on the training data from Task 1.1.3. You can experiment with different parameter settings for the classifier (such as the number of estimators), but please keep the default base estimator (i.e. a decision tree).

In [27]:
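# 50 decision trees (the default base estimator); an integer max_samples
# means each tree is trained on 20 samples drawn from the training set
bc = BaggingClassifier(n_estimators=50, max_samples=20, n_jobs=8)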

bc_model = bc.fit(X_train, y_train)
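
When experimenting with parameters, note that `max_samples` is interpreted differently depending on its type: an integer is an absolute number of samples per estimator, while a float is a fraction of the training set. The sketch below compares the two under the same sequential 70/30 split; the value `0.5`, the `random_state`, and the `settings` and `scores` dictionaries are assumptions made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier

X, y = load_digits(return_X_y=True)
n_train = int(len(X) * 0.7)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# int: absolute number of samples per tree; float: fraction of the training set
settings = {"max_samples=20": 20, "max_samples=0.5": 0.5}

scores = {}
for name, max_samples in settings.items():
    bc = BaggingClassifier(n_estimators=50, max_samples=max_samples, random_state=42)
    bc.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data
    scores[name] = bc.score(X_test, y_test)
    print(f"{name}: test accuracy {scores[name]:.4f}")
```

Passing `random_state` directly to the classifier makes the bootstrap sampling reproducible without relying on the global NumPy seed.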

### Task 1.2.2 Evaluate Classifier

Now it's time to see how the classifier performs. Predict the target labels *y_pred* for the test samples and print the accuracy of the model.

In [28]:
y_pred = bc_model.predict(X_test)

bc_accuracy = np.sum(y_pred == y_test) / len(y_pred)

print("Accuracy of Bagging Classifier on Digits Dataset: %.4f" % bc_accuracy)

Accuracy of Bagging Classifier on Digits Dataset: 0.7556
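
The accuracy computed by hand above is the fraction of correctly predicted labels. The already imported `sklearn.metrics` module provides the same quantity via `accuracy_score`, and `confusion_matrix` shows which classes get confused. The following sketch uses small made-up label arrays purely for illustration; they are not results from the digits dataset.

```python
import numpy as np
from sklearn import metrics

# Made-up labels for illustration only (not taken from the digits dataset)
y_true = np.array([0, 1, 2, 2, 1])
y_hat = np.array([0, 1, 1, 2, 1])

# Manual accuracy: fraction of positions where prediction and label agree
manual_accuracy = np.sum(y_hat == y_true) / len(y_hat)

# The same quantity via scikit-learn
sk_accuracy = metrics.accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_true, y_hat)

print("Manual accuracy: %.4f" % manual_accuracy)
print("accuracy_score:  %.4f" % sk_accuracy)
print(cm)
```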


Great, you're done! Now it's time to do the same few steps using the AdaBoost classifier.

## Task 1.3 AdaBoost Classifier

### Task 1.3.1 Train Classifier

Create an **AdaBoost Classifier** object and train it on the training data from Task 1.1.3. You can experiment with different parameter settings for the classifier (such as the number of estimators or the learning rate), but please keep the default base estimator (i.e. a decision tree).

In [76]:
abc = AdaBoostClassifier(n_estimators=10, learning_rate=5., random_state=42)

abc_model = abc.fit(X_train, y_train)

### Task 1.3.2 Evaluate Classifier

Once again it's time to see how the classifier performs. Predict the target labels *y_pred* for the test samples and print the accuracy of the model.

In [77]:
y_pred = abc_model.predict(X_test)

abc_accuracy = np.sum(y_pred == y_test) / len(y_pred)

print("Accuracy of AdaBoost Classifier on Digits Dataset: %.4f" % abc_accuracy)

Accuracy of AdaBoost Classifier on Digits Dataset: 0.3685
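
The `learning_rate` parameter scales the contribution of each weak learner, and the value of 5 used above is well above scikit-learn's default of 1.0. One way to see how sensitive the result is to this choice is to loop over a few values. The sketch below does this under the same sequential 70/30 split; the specific learning rates, `n_estimators=50`, and the `results` dictionary are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier

X, y = load_digits(return_X_y=True)
n_train = int(len(X) * 0.7)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Illustrative learning rates: the value used above, the default, and a smaller one
results = {}
for learning_rate in (5.0, 1.0, 0.5):
    abc = AdaBoostClassifier(
        n_estimators=50, learning_rate=learning_rate, random_state=42
    )
    abc.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data
    results[learning_rate] = abc.score(X_test, y_test)
    print("learning_rate=%.1f: test accuracy %.4f" % (learning_rate, results[learning_rate]))
```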


_________________________

You're done! Are the accuracy results what you expected? If not, feel free to adjust some parameters to improve them. Either way, you may have just written your very first ensemble classifiers in Python!