---

## Week 8: Combining Different Models for Ensemble Learning   


### Unit Convenor & Lecturer    
[George Milunovich](https://www.georgemilunovich.com)  
[george.milunovich@mq.edu.au](mailto:george.milunovich@mq.edu.au)

### References     

1. Python Machine Learning 3rd Edition by Raschka & Mirjalili - Chapter 7
2. Various open-source material

### Overview    


- Learning with Ensembles
    - Using the Majority Voting Principle to Make Predictions
- Bagging - Building an Ensemble of Classifiers from Bootstrap Samples
    - Applying Bagging to Classify Examples in the Wine Dataset
- Adaptive Boosting (AdaBoost) - Leveraging Weak Learners
    - Applying AdaBoost to Classify Examples in the Wine Dataset
    
---

# Learning with Ensembles   

- In predictive analytics, **ensembles** refer to methods that combine multiple predictive models to improve accuracy and reduce the likelihood of an incorrect prediction.

<hr style="width:35%;margin-left:0;">
 

<img src="images/07_02.png" alt="Drawing" style="width: 400px;"/>  

**Ensemble Methods**
- Combine different classifiers into a meta-classifier that aims to achieve better generalization performance than each of the individual classifiers alone  
- **Majority voting** principle  
    - Select the class label that has been predicted by the majority of classifiers, i.e., that received more than 50% of the votes  
- **Plurality voting** (multi-class settings): select the class label that has received the most votes  
- Ensemble methods have the ability to improve **bias** and/or **variance** depending on the method used
    - Some methods improve bias, some variance, and some both
    - These are general tendencies; they do not guarantee that an ensemble will improve classification performance in every application
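Why can combining classifiers help? A standard back-of-the-envelope argument assumes, optimistically, that $n$ base classifiers for a binary task make *independent* errors, each with the same error rate $\varepsilon$. The majority vote is then wrong only if more than half of the classifiers are wrong, which is a binomial tail probability:

$$P(\text{ensemble error}) = \sum_{k=\lfloor n/2\rfloor+1}^{n}\binom{n}{k}\varepsilon^k(1-\varepsilon)^{n-k}$$

A minimal sketch of this calculation (`ensemble_error` is an illustrative helper, not a library function):

```python
from math import comb


def ensemble_error(n_classifier: int, error: float) -> float:
    """Probability that a majority vote of n_classifier base classifiers is wrong,
    ASSUMING their errors are independent and all share the same error rate."""
    k_start = n_classifier // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(comb(n_classifier, k) * error**k * (1 - error)**(n_classifier - k)
               for k in range(k_start, n_classifier + 1))


# 11 independent classifiers, each wrong 25% of the time
print(f'{ensemble_error(n_classifier=11, error=0.25):.3f}')  # 0.034
```

Under these assumptions the ensemble error drops from 0.25 to roughly 0.034. The same formula shows the ensemble doing *worse* than its members when $\varepsilon > 0.5$, and in practice the errors of different classifiers are correlated, so real gains are smaller.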


<img src="images/07_01.png" alt="Drawing" style="width: 400px;"/>


<hr style="width:35%;margin-left:0;">

### Ensemble Learning    

1. Start with $m$ classifiers $(C_1, C_2, \dots,C_m)$  
    - E.g. $C_1$ = Decision Tree, $C_2$ = Support Vector Machine, etc.  
2. To predict a class label $\hat{y}$ via majority (plurality) voting, collect the prediction $C_j(x)$ of each classifier and combine them  
    - $\hat{y}=\text{mode}\left(C_1(x), C_2(x),\dots,C_m(x)\right)$  
    - Remember: **mode** is the most frequent observation  
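The mode in step 2 can be computed directly with NumPy. A minimal sketch, assuming integer class labels $0, 1, 2, \dots$ (required by `np.bincount`) and made-up predictions from three classifiers:

```python
import numpy as np

# Made-up predictions: rows = classifiers C_1..C_3, columns = examples
predictions = np.array([
    [0, 1, 2, 1],  # C_1
    [0, 2, 2, 1],  # C_2
    [1, 2, 0, 1],  # C_3
])


def plurality_vote(preds: np.ndarray) -> np.ndarray:
    """Return the most frequent label (the mode) in each column of preds."""
    return np.array([np.argmax(np.bincount(col)) for col in preds.T])


y_hat_vote = plurality_vote(predictions)
print(y_hat_vote)  # [0 2 2 1]
```

Ties are broken in favour of the smallest label, because `np.argmax` returns the index of the first maximum.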

<hr style="width:35%;margin-left:0;">

### Majority Voting in scikit-learn    

- In scikit-learn we use `from sklearn.ensemble import VotingClassifier`  
    - [https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html)  
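A minimal usage sketch on synthetic data (`make_classification`); the three base models and their names are arbitrary choices for illustration and are not part of the exercise below:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (stand-in for a real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

# Each estimator is a (name, estimator) tuple; the names are later used to address
# nested hyperparameters, e.g. 'tree__max_depth'
demo_vote = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('knn', KNeighborsClassifier()),
                ('tree', DecisionTreeClassifier(random_state=0))],
    voting='hard')

scores_demo = cross_val_score(demo_vote, X_demo, y_demo, cv=5, scoring='accuracy')
print(f'Majority-vote CV accuracy: {scores_demo.mean():.3f} +/- {scores_demo.std():.3f}')
```

The `voting` and `weights` parameters discussed below control how the individual predictions are combined.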

<hr style="width:35%;margin-left:0;">



### Ensemble Learning with Weights  

**Learning with Weights**

Sometimes the individual classifiers are given weights that reflect our confidence in them  
- Here the weights are denoted $w_j$ ($j=1,2,\dots,m$)
- By convention the weights are often normalized to sum to 1 (scikit-learn's `VotingClassifier` does not require this, since only the relative weights matter)
- E.g. $w_1 = 0.3, w_2 = 0.4, w_3 = 0.3 \rightarrow \sum_j w_j = 1$
- Classifiers with larger weights have greater influence on the vote than others 
- The weights could be allocated based on previous evidence, experience, etc.

**Hard Voting**  
- If `voting='hard'`, the predicted class labels are used for (weighted) majority-rule voting   
- The prediction $\hat{y}$ is the class with the highest sum of weights over the classifiers $(C_1, C_2, \dots,C_m)$ that voted for it    
- Example: 3 classifiers
    - Weights given to each classifier: $w_1=0.2, w_2=0.2, w_3=0.6$   
    - Two classes 0 & 1: $C_1\rightarrow0,C_2\rightarrow0, C_3\rightarrow1$
    - Sum the weights of the classifiers voting for each class
    - $\sum w_j = 0.4$ (for Class 0)  
    - $\sum w_j = 0.6$ (for Class 1)  
    - Sum of the weights of classifiers voting for Class 1 is greater -> $\hat{y}=1$  
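The weighted hard vote above amounts to summing the weights per predicted class and taking the largest sum. A minimal NumPy sketch reproducing the example (labels must be non-negative integers for `np.bincount`):

```python
import numpy as np

weights_hard = np.array([0.2, 0.2, 0.6])  # w_1, w_2, w_3
class_votes = np.array([0, 0, 1])         # C_1 -> 0, C_2 -> 0, C_3 -> 1

# Sum the weights of the classifiers voting for each class label
weighted_votes = np.bincount(class_votes, weights=weights_hard)
print(weighted_votes)  # [0.4 0.6]

y_hat_hard = np.argmax(weighted_votes)
print(y_hat_hard)      # 1
```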
  
**Soft Voting**  
- If `voting='soft'`, predict the class label with the greatest weighted average of the predicted probabilities (since the weights sum to 1, this equals the weighted sum)  
- $\hat{y}=\arg\max_{i\in\{0,1\}}\sum_{j=1}^m w_j p_{ij}$, where $p_{ij}$ is the probability of class $i$ predicted by classifier $C_j$, i.e., the class for which the sum of $w_j p_{ij}$ across all classifiers $(C_1, C_2, \dots,C_m)$ is greatest  
- Example: 3 classifiers
    - $w_1=0.2, w_2=0.2, w_3=0.6$   
    - Classes 0 & 1, with predicted probabilities $[p(\text{class 0}|x), p(\text{class 1}|x)]$ as follows: $C_1\rightarrow[0.9, 0.1],C_2\rightarrow[0.8,0.2], C_3\rightarrow[0.4,0.6]$  
    - Weighted average $p(\text{class 0}|x)=0.2\times0.9 + 0.2\times0.8 + 0.6\times0.4=0.58$  
    - Weighted average $p(\text{class 1}|x)=0.2\times0.1 + 0.2\times0.2 + 0.6\times0.6=0.42$  
    - Prediction $\hat{y}=\arg\max_{i\in\{0,1\}}\left[p(\text{class 0}|x),p(\text{class 1}|x)\right] =0$  
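The soft-voting example can be checked the same way; `np.average` computes the weighted average of each column of predicted probabilities (dividing by the sum of the weights, which is 1 here):

```python
import numpy as np

weights_soft = np.array([0.2, 0.2, 0.6])
# Row j holds [p(class 0|x), p(class 1|x)] predicted by classifier C_j
probas = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.4, 0.6]])

# Weighted average over the classifiers (axis 0)
avg_proba = np.average(probas, axis=0, weights=weights_soft)
print(avg_proba)  # [0.58 0.42]

y_hat_soft = np.argmax(avg_proba)
print(y_hat_soft)  # 0
```

With identical weights, the hard vote in the previous example picked class 1, while the soft vote picks class 0: soft voting also uses how confident $C_1$ and $C_2$ are (0.9 and 0.8 for class 0), not just which label they predict.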
        


<hr style="width:35%;margin-left:0;">   

## Using the Majority Voting Principle to Make Predictions {-}

<span style='background:orange'>**------------------ Exercise 1 ------------------**</span>  
1. Import Iris dataset from sklearn's datasets
2. Compute the accuracy of a Perceptron, a Support Vector Classifier (`SVC`) and a DecisionTreeClassifier using 10-fold cross-validation
3. Combine the three classifiers into a VotingClassifier with hard voting and compute its accuracy via cross-validation on the training data
4. Plot decision regions using the training data  
5. Print all parameters of `voting_clf`
6. Fine-tune some hyperparameters of the VotingClassifier  
7. Print `best_params_` and `best_score_` of the VotingClassifier
8. Export the optimized VotingClassifier as `final_mv_classifier` using `best_estimator_`
    - Make a prediction of `y_test`
    - Compute the accuracy on the test data

    


<hr style="width:35%;margin-left:0;">   

1. Import Iris dataset from sklearn's datasets
    - Choose only the Iris-versicolor & Iris-virginica labels -> rows 50 to the end
    - Set X as sepal width and petal length -> columns 1 & 2
    - Generate training and test (30%) datasets


```
# ------------------------------------------------------------

from sklearn import datasets

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

X, y = iris.data[50:, [1, 2]], iris.target[50:]

le = LabelEncoder()
y = le.fit_transform(y)

X_train, X_test, y_train, y_test =\
       train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

y_train
```

<hr style="width:35%;margin-left:0;">   

2. Compute accuracy for Perceptron, SVClassifier and DecisionTreeClassifier using a 10-fold cross-validation

```
import numpy as np
from sklearn.preprocessing import StandardScaler

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

from sklearn.pipeline import Pipeline

from sklearn.model_selection import cross_val_score


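# penalty='elasticnet' is needed for l1_ratio (tuned in step 6) to have any effect;
# random_state makes the results reproducible
clf1 = Perceptron(penalty='elasticnet', random_state=1)
pipe1 = Pipeline([('sc', StandardScaler()),
                  ('clf', clf1)])

# probability=True is only required for soft voting, but does no harm here
clf2 = SVC(kernel='poly', probability=True, random_state=1)
pipe2 = Pipeline([('sc', StandardScaler()),
                  ('clf', clf2)])

# decision trees are insensitive to feature scaling, so no StandardScaler is needed
clf3 = DecisionTreeClassifier(random_state=1)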

clf_labels = ['Perceptron', 'Support Vector Classifier', 'Decision Tree']

print('10-fold cross validation:\n')
for clf, label in zip([pipe1, pipe2, clf3], clf_labels):
    scores = cross_val_score(estimator=clf,
                             X=X_train,
                             y=y_train,
                             cv=10,
                             scoring='accuracy')
    print(f'Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f}) [{label}]')
```


<hr style="width:35%;margin-left:0;">   

3. Combine the three classifiers into a VotingClassifier with hard voting and compute its accuracy via cross-validation on training data

```
from sklearn.ensemble import VotingClassifier


voting_clf = VotingClassifier(estimators=[('perceptron_pipe', pipe1),
                                          ('supportvector_pipe', pipe2),
                                          ('Decision Tree', clf3)],
                              voting='hard')


clf_labels += ['Majority voting']

# print(clf_labels)

all_clf = [pipe1, pipe2, clf3, voting_clf]

for clf, label in zip(all_clf, clf_labels):

    scores = cross_val_score(estimator=clf,
                             X=X_train,
                             y=y_train,
                             cv=10,
                             scoring='accuracy')
                             # scoring='roc_auc')


    print(f'Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}, {label}')
    
# print(clf, label)
# print(clf.get_params())
```

<hr style="width:35%;margin-left:0;">  

4. Plot decision regions using training data

```
import matplotlib.pyplot as plt

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)

from itertools import product

x_min = X_train_std[:, 0].min() - 1
x_max = X_train_std[:, 0].max() + 1
y_min = X_train_std[:, 1].min() - 1
y_max = X_train_std[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(nrows=2, ncols=2, 
                        sharex='col', 
                        sharey='row', 
                        figsize=(7, 5))

for idx, clf, tt in zip(product([0, 1], [0, 1]),
                        all_clf, clf_labels):
    clf.fit(X_train_std, y_train)
    
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axarr[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.3)
    
    axarr[idx[0], idx[1]].scatter(X_train_std[y_train==0, 0], 
                                  X_train_std[y_train==0, 1], 
                                  c='blue', 
                                  marker='^',
                                  s=50)
    
    axarr[idx[0], idx[1]].scatter(X_train_std[y_train==1, 0], 
                                  X_train_std[y_train==1, 1], 
                                  c='green', 
                                  marker='o',
                                  s=50)
    
    axarr[idx[0], idx[1]].set_title(tt)

plt.text(-3.5, -5., 
         s='Sepal width [standardized]', 
         ha='center', va='center', fontsize=12)
plt.text(-12.5, 4.5, 
         s='Petal length [standardized]', 
         ha='center', va='center', 
         fontsize=12, rotation=90)

#plt.savefig('images/07_05', dpi=300)
plt.show()
```


5. Print all parameters of `voting_clf`

```
voting_clf.get_params()
```

<hr style="width:35%;margin-left:0;">  

6. **Fine tune the following hyperparameters of the VotingClassifier**
- `Decision Tree__max_depth` 
- `supportvector_pipe__clf__C`  
- `perceptron_pipe__clf__l1_ratio` 

**Note**  
The default setting for `refit` in `GridSearchCV` is `True` (i.e., `GridSearchCV(..., refit=True)`): after the search, the estimator is refit on the whole training set with the best parameter combination, so we can use the fitted `GridSearchCV` object to make predictions via its `predict` method.

```
from sklearn.model_selection import GridSearchCV

params = {'Decision Tree__max_depth': [1, 2, 3, 5, 10],
          'supportvector_pipe__clf__C': [0.001, 0.1, 10, 100.0],
          # l1_ratio only has an effect if the Perceptron uses penalty='elasticnet'
          'perceptron_pipe__clf__l1_ratio': [0.01, 0.15, 0.5, 0.75, 0.99]}



grid = GridSearchCV(estimator=voting_clf,
                    param_grid=params,
                    cv=10,
                    scoring='accuracy')

grid.fit(X_train, y_train)

# grid.fit already ran 10-fold CV for every parameter combination;
# print the mean and standard deviation of the CV accuracy for each one
results = grid.cv_results_
for mean, std, params_i in zip(results['mean_test_score'],
                               results['std_test_score'],
                               results['params']):
    print(f'{mean:.3f} +/- {std:.3f}, {params_i}')
```

<hr style="width:35%;margin-left:0;"> 

7. Print `best_params_` and `best_score_` of VotingClassifier

```
print(f'Best parameters: {grid.best_params_}')
print(f'Accuracy: {grid.best_score_:.3f}')
```
      

<hr style="width:35%;margin-left:0;">   

8. Export the optimized VotingClassifier as final_mv_classifier using `best_estimator_`
- Make a prediction of `y_test`
- Compute accuracy for `y_test`

```
final_mv_classifier = grid.best_estimator_

print('Predictions of y_test:', final_mv_classifier.predict(X_test))
print('Accuracy on test data:', final_mv_classifier.score(X_test, y_test))
```

<br>
<br>

---

# Bagging - Building an Ensemble of Classifiers from Bootstrap Samples {-}

- In Bagging (*bootstrap aggregating*) we still create ensembles of classifiers but **do not use the same training dataset** for each classifier 
- Create bootstrap samples (random samples with replacement) from the initial dataset
- For each bootstrap sample, a separate model is trained.
    - These models are of the **same type** but are trained independently of each other.
    - Typically use **unpruned decision trees** as base classifiers
    - Because the data in each bootstrap sample varies, the resulting models are different, capturing different patterns from the training data.
- **Reduces overfitting**: bagging mainly lowers the variance of the model, which reduces the risk of overfitting the training data without significantly increasing bias.
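To make the procedure concrete, here is a from-scratch sketch of bagging with unpruned decision trees and a plurality vote. It is illustrative only (synthetic data, hypothetical helper names `fit_bagging` and `predict_bagging`); in practice you would use scikit-learn's `BaggingClassifier`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (labels 0 and 1)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          random_state=0, stratify=y_demo)


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Fit one unpruned decision tree per bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        tree_j = DecisionTreeClassifier(random_state=0)  # max_depth=None: unpruned
        models.append(tree_j.fit(X[idx], y[idx]))
    return models


def predict_bagging(models, X):
    """Combine the trees' predicted labels by plurality vote (labels 0, 1, ...)."""
    preds = np.array([m.predict(X) for m in models])  # shape (n_estimators, n_samples)
    return np.array([np.bincount(col).argmax() for col in preds.T])


bagged_trees = fit_bagging(X_tr, y_tr)
y_pred = predict_bagging(bagged_trees, X_te)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(f'Single tree test accuracy: {single_tree.score(X_te, y_te):.3f}')
print(f'Bagged trees test accuracy: {np.mean(y_pred == y_te):.3f}')
```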


<img src="images/07_06.png" alt="Drawing" style="width: 400px;"/>

<hr style="width:35%;margin-left:0;">   

## Bagging Example {-}

- Say we have only 7 training examples
    - Each round of bagging will sample randomly **with replacement** from the 7 instances
        - Each such random draw is called a **bagging round**
    - Each bootstrap sample is used to fit a classifier $C_j$ which is typically **an unpruned decision tree**
    - Once the individual classifiers are fitted to the bootstrap samples, the predictions are combined using majority voting
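The bagging rounds can be simulated with NumPy. The sketch below draws three bootstrap samples of the indices of 7 training examples (the seed is arbitrary) and lists the examples left out of each round; it then estimates how many distinct examples a single bootstrap sample contains for a large training set:

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
n = 7                                # size of the training set
indices = np.arange(n)

# Each bagging round draws n indices *with replacement*
for round_ in range(1, 4):
    sample = rng.choice(indices, size=n, replace=True)
    out_of_bag = np.setdiff1d(indices, sample)  # examples not drawn in this round
    print(f'Round {round_}: sample={np.sort(sample)}, out-of-bag={out_of_bag}')

# Fraction of distinct examples in one bootstrap sample of a large training set
n_large = 10_000
sample_large = rng.choice(n_large, size=n_large, replace=True)
unique_frac = np.unique(sample_large).size / n_large
print(f'Distinct examples in one bootstrap sample (n={n_large}): {unique_frac:.3f}')
```

Each bootstrap sample typically contains repeated examples and omits others. For large $n$ the expected fraction of distinct examples is $1-(1-1/n)^n \approx 1-1/e \approx 0.632$; the omitted ("out-of-bag") examples can serve as a built-in validation set.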
    

<img src="images/07_07.png" alt="Drawing" style="width: 400px;"/>

<hr style="width:35%;margin-left:0;">   

### Bagging in scikit-learn {-}
- In scikit-learn use `from sklearn.ensemble import BaggingClassifier`
    - [https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
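    - The key parameters are `estimator` (the base model, usually a decision tree; if `None`, a `DecisionTreeClassifier` is used) and `n_estimators` (the number of models in the ensemble). In scikit-learn versions before 1.2, `estimator` was called `base_estimator`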

<hr style="width:35%;margin-left:0;">   

## Applying Bagging to Classify Examples in the Wine Dataset {-}

<span style='background:orange'>**------------------ Exercise 2 ------------------**</span>  
 
1. Import Wine dataset
    - Set y = 'Class label' column
    - Set X = 'Alcohol' and 'OD280/OD315 of diluted wines' columns
2. Encode class labels into binary format & split data into training and test (20%) datasets
3. Initialize `BaggingClassifier` with 500 Decision Trees, use `entropy` as criterion
4. Compute `accuracy_score` on the training and test datasets for both single DecisionTree and BaggingClassifier 
5. Plot decision regions and the training dataset

<hr style="width:35%;margin-left:0;">   

1. Import Wine dataset
    - Set y = 'Class label' column
    - Set X = 'Alcohol' and 'OD280/OD315 of diluted wines' columns
    
```
import pandas as pd

# header=None: the file has no header row (otherwise the first example is lost)
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',
                      header=None)

df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
                   'Alcalinity of ash', 'Magnesium', 'Total phenols',
                   'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
                   'Color intensity', 'Hue', 'OD280/OD315 of diluted wines',
                   'Proline']


# if the Wine dataset is temporarily unavailable from the
# UCI machine learning repository, un-comment the following line
# of code to load the dataset from a local path:

# df_wine = pd.read_csv('wine.data', header=None)

# ---- drop 1 class ------

df_wine = df_wine[df_wine['Class label'] != 1]  # currently 3 labels: 1, 2, 3. Drop label 1

y = df_wine['Class label'].values
X = df_wine[['Alcohol', 'OD280/OD315 of diluted wines']].values

y
```

<hr style="width:35%;margin-left:0;">   

2. Encode class labels into binary format & split data into training and test (20%) datasets

```
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split


le = LabelEncoder()
y = le.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)
```

<hr style="width:35%;margin-left:0;">   

3. Initialize `BaggingClassifier` with 500 Decision Trees, use `entropy` as criterion

```
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion='entropy', 
                              max_depth=None,
                              random_state=1)

bag = BaggingClassifier(estimator=tree,  # called base_estimator in scikit-learn < 1.2
                        n_estimators=500, 
                        n_jobs=1, 
                        random_state=1)
```

<hr style="width:35%;margin-left:0;">   

4. Compute `accuracy_score` on the training and test datasets for both single DecisionTree and BaggingClassifier 

```
tree = tree.fit(X_train, y_train)

tree_train = tree.score(X_train, y_train) 
tree_test = tree.score(X_test, y_test)
print(f'Decision tree train/test accuracies {tree_train:.3f}/{tree_test:.3f}')

bag = bag.fit(X_train, y_train)
bag_train = bag.score(X_train, y_train) 
bag_test = bag.score(X_test, y_test) 
print(f'Bagging train/test accuracies {bag_train:.3f}/{bag_test:.3f}')

```

<hr style="width:35%;margin-left:0;">   

5. Plot decision regions and the training dataset

```
import numpy as np
import matplotlib.pyplot as plt

x_min = X_train[:, 0].min() - 1
x_max = X_train[:, 0].max() + 1
y_min = X_train[:, 1].min() - 1
y_max = X_train[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(nrows=1, ncols=2, 
                        sharex='col', 
                        sharey='row', 
                        figsize=(8, 3))


for idx, clf, tt in zip([0, 1],
                        [tree, bag],
                        ['Decision tree', 'Bagging']):
    clf.fit(X_train, y_train)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axarr[idx].contourf(xx, yy, Z, alpha=0.3)
    axarr[idx].scatter(X_train[y_train == 0, 0],
                       X_train[y_train == 0, 1],
                       c='blue', marker='^')

    axarr[idx].scatter(X_train[y_train == 1, 0],
                       X_train[y_train == 1, 1],
                       c='green', marker='o')

    axarr[idx].set_title(tt)

# column 0 (Alcohol) is on the x-axis, column 1 (OD280/OD315) on the y-axis
axarr[0].set_ylabel('OD280/OD315 of diluted wines', fontsize=12)

plt.tight_layout()
plt.text(0, -0.2,
         s='Alcohol',
         ha='center',
         va='center',
         fontsize=12,
         transform=axarr[1].transAxes)

#plt.savefig('images/07_08.png', dpi=300, bbox_inches='tight')
plt.show()
```

**How to see what's inside an object**

- `dir()` function
    - Without arguments, it returns the list of names in the current local scope
    - With an argument, it attempts to return a list of valid attributes for that object

```
dir(tree)

tree.get_depth()

```

<br>
<br>

---


# Adaptive Boosting (AdaBoost) - Leveraging Weak Learners {-}

Boosting is an ensemble technique that works by sequentially combining multiple weak learners (models that are only slightly better than random guessing) into a strong learner. 

The core idea behind boosting is to correct the mistakes of previous learners in the sequence by giving more weight to the training instances that were misclassified, thereby focusing subsequent models on the harder cases.

**Boosting**: Focus on training examples that are hard to classify  
- Boosting is said to decrease bias when compared to bagging  
- In practice, boosting algorithms such as AdaBoost are known for high variance, i.e., a tendency to overfit the training data  
- The ensemble (the set of classifiers) consists of very simple classifiers (weak learners)    
    - E.g. **decision tree stumps** (one-level decision trees)  
- Let weak learners learn from misclassified training examples to improve the performance of the ensemble  


**Sensitivity to Noisy Data**: Because AdaBoost focuses on instances that are hard to classify, it can be sensitive to noise and outliers, as these can receive disproportionately high weights.

<hr style="width:35%;margin-left:0;"> 
 


## AdaBoost Method {-}

- AdaBoost (a specific boosting algorithm) uses the complete training dataset to train each weak learner  
    - Reweight training examples in each iteration to build a strong classifier that learns from the mistakes of the previous weak learners in the ensemble  

1. Fig. 1 - train a decision stump to classify two classes  
    - Weak learner misclassifies two examples (circles)  
2. Fig. 2 - the misclassified circles from above are given more weight, while the other examples are given lower weight  
    - Fit decision stump 2 to this data -> more focused on the examples which are hard to classify  
    - Weak learner 2 misclassifies 3 different examples (circles)  
3. Fig. 3 - misclassified examples from Fig 2 given even greater weight  
    - Decision stump 3 fitted to this data  
4. Continue in this fashion until a desired number of boosting rounds is reached  
5. Fig. 4 - combine the weak learners, each trained on a differently reweighted version of the training set, by a weighted majority vote  
    - Assumes 3 rounds of boosting  
    
<img src="images/07_09.png" alt="Drawing" style="width: 400px;"/>
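The reweighting step can be made concrete with the classic two-class (discrete) AdaBoost update, one common textbook formulation with labels in $\{-1, 1\}$: the weighted error is $\varepsilon = \sum_i w_i\,[\hat{y}_i \neq y_i]$, the weak learner's vote weight is $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, and each example weight becomes $w_i \leftarrow w_i \exp(-\alpha\, y_i \hat{y}_i)$, followed by normalization. The labels below are made-up toy values (10 examples, 3 misclassified); scikit-learn's `AdaBoostClassifier` implements a multi-class variant whose details differ:

```python
import numpy as np

# Toy example with labels in {-1, 1}: 10 examples, the weak learner gets 3 wrong
y_toy = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])        # true labels
y_pred_toy = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])  # weak learner's predictions
w = np.full(y_toy.shape, 1 / len(y_toy))                     # uniform initial weights

misclassified = y_pred_toy != y_toy
eps = w[misclassified].sum()           # weighted error rate of the weak learner
alpha = 0.5 * np.log((1 - eps) / eps)  # the learner's weight in the final vote

# y * y_hat is +1 for correct and -1 for wrong predictions,
# so correct examples are down-weighted and wrong ones up-weighted
w = w * np.exp(-alpha * y_toy * y_pred_toy)
w = w / w.sum()                        # renormalize to sum to 1

print(f'eps = {eps:.3f}, alpha = {alpha:.3f}')
print(np.round(w, 3))
```

After one round each correctly classified example has weight of about 0.071 and each misclassified one about 0.167, so the three misclassified examples now carry half of the total weight and dominate the next weak learner's training.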

<hr style="width:35%;margin-left:0;"> 

## Applying AdaBoost using scikit-learn {-} 
- [https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html)
- Use `from sklearn.ensemble import AdaBoostClassifier`  

- Need to specify parameters  
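    - `estimator` - The base estimator from which the boosted ensemble is built, usually a **decision tree**; if `None`, a decision stump (`DecisionTreeClassifier(max_depth=1)`) is used. It was called `base_estimator` in scikit-learn versions before 1.2  
    - `n_estimators` - The maximum number of estimators at which boosting is terminated  
    - `learning_rate` - Weight applied to each classifier at each boosting iteration; a smaller value shrinks the contribution of each classifier. There is a trade-off between `learning_rate` and `n_estimators`  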
    
    

<hr style="width:35%;margin-left:0;"> 

## Applying AdaBoost to Classify Examples in the Wine Dataset {-}

<span style='background:orange'>**------------------ Exercise 3 ------------------**</span>  
 
1. Initialize an AdaBoostClassifier  
    - estimator = decision tree stump (`DecisionTreeClassifier` with `max_depth=1`)  
    - n_estimators = 500  
    - learning_rate = 0.1  
    
2. Fit and compute accuracy_score on the training and test datasets for both single DecisionTree and AdaBoostClassifier   
3. Plot decision regions and the training dataset  

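<hr style="width:35%;margin-left:0;"> 

1. Initialize an AdaBoostClassifier with decision stumps, `n_estimators=500` and `learning_rate=0.1`
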
```
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion='entropy', 
                              max_depth=1,
                              random_state=1)

ada = AdaBoostClassifier(estimator=tree,
                         n_estimators=500, 
                         learning_rate=0.1,
                         random_state=1)
```

<hr style="width:35%;margin-left:0;"> 

2. Fit and compute accuracy_score on the training and test datasets for both single DecisionTree and AdaBoostClassifier 

```
tree = tree.fit(X_train, y_train)

tree_train_accuracy = tree.score(X_train, y_train)
tree_test_accuracy = tree.score(X_test, y_test)

print(f'Decision tree train/test accuracies {tree_train_accuracy:.3f}/{tree_test_accuracy:.3f}')

ada = ada.fit(X_train, y_train)

ada_train_accuracy = ada.score(X_train, y_train)
ada_test_accuracy = ada.score(X_test, y_test)

print(f'AdaBoost train/test accuracies {ada_train_accuracy:.3f}/{ada_test_accuracy:.3f}')
```

<hr style="width:35%;margin-left:0;"> 

3. Plot decision regions and the training dataset

```
import matplotlib.pyplot as plt
import numpy as np

x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(1, 2, sharex='col', sharey='row', figsize=(8, 3))


for idx, clf, tt in zip([0, 1],
                        [tree, ada],
                        ['Decision tree', 'AdaBoost']):
    clf.fit(X_train, y_train)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axarr[idx].contourf(xx, yy, Z, alpha=0.3)
    axarr[idx].scatter(X_train[y_train == 0, 0],
                       X_train[y_train == 0, 1],
                       c='blue', marker='^')
    axarr[idx].scatter(X_train[y_train == 1, 0],
                       X_train[y_train == 1, 1],
                       c='green', marker='o')
    axarr[idx].set_title(tt)

# column 0 (Alcohol) is on the x-axis, column 1 (OD280/OD315) on the y-axis
axarr[0].set_ylabel('OD280/OD315 of diluted wines', fontsize=12)

plt.tight_layout()
plt.text(0, -0.2,
         s='Alcohol',
         ha='center',
         va='center',
         fontsize=12,
         transform=axarr[1].transAxes)

#plt.savefig('images/07_11.png', dpi=300, bbox_inches='tight')
plt.show()
```