In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("auto.csv")
df.head()

,mpg,displ,hp,weight,accel,origin,size
0,18.0,250.0,88,3139,14.5,US,15.0
1,9.0,304.0,193,4732,18.5,US,20.0
2,36.1,91.0,60,1800,16.4,Asia,10.0
3,18.5,250.0,98,3525,19.0,US,15.0
4,34.3,97.0,78,2188,15.8,Europe,10.0


In [3]:
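# Input features: drop the target (mpg) and the categorical origin column
X = df.drop(["mpg", "origin"], axis=1)

In [4]:
# Target: fuel efficiency in miles per gallon
y = df["mpg"]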

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error 

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=3)

In [7]:
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

### The intuition behind stacking

Stacking ensembles build on the idea of a relay race. Instead of passing a baton, individual models pass their predictions, together with the input features, to the next model.

#### Stacking architecture

- Each individual model uses the same dataset and input features. These are the first-layer estimators. 

- Then, estimators pass their predictions as additional input features to the second-layer estimator. 

- So far, we have seen ensemble methods that use simple arithmetic operations like the mean or the mode as combiners. However, in Stacking, the combiner is itself a trainable model. 

- In addition, this combiner model has not only the predictions as input features, but also the original dataset. This allows it to determine which estimator is more accurate depending on the input features.
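A minimal sketch of how the second-layer input is assembled, assuming plain NumPy arrays: the first-layer predictions simply become extra columns next to the original features. The feature values and the three prediction vectors are made up for illustration.

```python
import numpy as np

# Original input features: 4 samples, 2 features (toy values)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])

# Hypothetical predictions from three first-layer estimators, one value per sample
pred_a = np.array([10.0, 12.0, 14.0, 16.0])
pred_b = np.array([11.0, 12.5, 13.0, 15.5])
pred_c = np.array([9.5, 12.0, 14.5, 16.5])

# Second-layer input: original features followed by one column per first-layer model
X_second = np.column_stack([X, pred_a, pred_b, pred_c])
print(X_second.shape)  # (4, 5)
```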

#### Combiner model as anchor 

- The combiner model plays a role similar to the anchor in a relay race: it is the last team member and the one that provides the final predictions. 

- To be effective, the combiner model must have the same qualities as a good anchor. It should learn the strengths and weaknesses of the individual estimators. 

- It also assigns roles: depending on the input features, it learns which model's prediction to rely on. 

- In addition, the combiner is itself a model, and therefore takes part in the team job of learning useful patterns to predict the target.

### Build your first stacked ensemble

Until version 0.22, scikit-learn did not provide an implementation of stacking (recent versions include StackingClassifier and StackingRegressor in sklearn.ensemble). Building a stacked ensemble from scratch is nevertheless a good opportunity to really understand how it works. For this purpose, we'll use the scikit-learn estimators you already know as building blocks.

#### General Steps
Building a stacked ensemble from scratch takes five steps:

- Step 1: prepare the dataset. 

- Step 2: build the first-layer estimators. 

- Step 3: append the predictions from the first-layer estimators to the original dataset. 

- Step 4: build the second-layer meta-estimator. 

- Step 5: use the stacked ensemble for the final predictions.
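The five steps can be condensed into one function. This is a sketch under stated assumptions: `stack_fit_predict` is a hypothetical helper name, the first-layer models are passed as a dict mapping prediction-column names to unfitted estimators, and a synthetic dataset from `make_regression` stands in for `auto.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def stack_fit_predict(first_layer, meta, X_train, y_train, X_new):
    """Hypothetical helper: fit a two-layer stack on (X_train, y_train) and predict X_new."""
    # Step 2: fit a fresh copy of each first-layer estimator on the same data
    fitted = {name: clone(est).fit(X_train, y_train) for name, est in first_layer.items()}

    def add_predictions(X):
        # Steps 3 and 5: one prediction column per model, aligned on X's index
        preds = pd.DataFrame({name: est.predict(X) for name, est in fitted.items()}, index=X.index)
        return pd.concat([X, preds], axis=1)

    # Step 4: train the meta-estimator on features + first-layer predictions
    meta_fitted = clone(meta).fit(add_predictions(X_train), y_train)
    # Step 5: apply the same transformation to new data, then predict
    return meta_fitted.predict(add_predictions(X_new))


# Step 1: prepare a synthetic dataset (stand-in for auto.csv)
X_arr, y_arr = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(5)])
y = pd.Series(y_arr)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=3)

first_layer = {
    "pred_lr": LinearRegression(),
    "pred_ridge": Ridge(),
    "pred_dt": DecisionTreeRegressor(max_depth=3, random_state=500),
}
pred_stack = stack_fit_predict(first_layer, DecisionTreeRegressor(max_depth=3, random_state=500),
                               X_train, y_train, X_test)
print(pred_stack.shape)  # (50,)
```

Because the same dict keys name the prediction columns at fit and predict time, the meta-estimator always sees identical feature names.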

#### Step 2: Build the first-layer estimators

The second step is to instantiate the first-layer estimators and fit them to the training set. As an example, we reuse the models you built previously: a linear regressor, a ridge regressor, and a decision tree regressor. 

Something important to remember is that all the estimators are trained with the same combination of input features and target.

In [8]:
### Build the first-layer estimators (they are fitted in the next cell)
# Instantiate the decision tree regressor
dt = DecisionTreeRegressor(max_depth=3, random_state=500)

# Instantiate the linear regression model
# (normalize= was removed in scikit-learn 1.2; OLS predictions do not depend on feature scaling)
reg_lm = LinearRegression()

# Instantiate the Ridge regression model
reg_ridge = Ridge(random_state=500)

In [9]:
dt.fit(X_train,y_train)
reg_lm.fit(X_train, y_train)
reg_ridge.fit(X_train, y_train)

Ridge(random_state=500)

#### Step 3: Append the predictions to the dataset

The third step is to append the predictions to the dataset. 

- We first calculate predictions on the training set with the first-layer estimators. 

- We combine those predictions to form a new DataFrame. 

- Finally, we concatenate the training features with the DataFrame using the concat function from Pandas.
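Why the predictions DataFrame in this step is built with `index=X_train.index`: `train_test_split` shuffles the rows, so `X_train` keeps scrambled index labels, and `pd.concat(..., axis=1)` aligns rows by index label rather than by position. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# A shuffled training slice keeps its original, non-consecutive index labels
X_train = pd.DataFrame({"displ": [140.0, 400.0, 318.0], "hp": [92, 150, 150]},
                       index=[75, 14, 47])
preds = np.array([27.2, 17.0, 12.9])  # toy first-layer predictions, one per row

# Without an explicit index the predictions get a default 0..n-1 index,
# so concat matches no rows and pads both sides with NaN
misaligned = pd.concat([X_train, pd.DataFrame({"pred": preds})], axis=1)

# Reusing X_train.index lines each prediction up with its own row
aligned = pd.concat([X_train, pd.DataFrame({"pred": preds}, index=X_train.index)], axis=1)

print(misaligned.shape, aligned.shape)  # (6, 3) (3, 3)
```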

In [10]:
# Predict with the first-layer estimators on the training set
pred_lr_t = reg_lm.predict(X_train)
pred_ridge_t = reg_ridge.predict(X_train)
pred_dt_t = dt.predict(X_train)

In [11]:
# Create a pandas DataFrame with the predictions, indexed like X_train so rows line up in the concat
train_pred_df = pd.DataFrame({'pred_lr_t': pred_lr_t, 'pred_ridge_t': pred_ridge_t, 'pred_dt_t': pred_dt_t}, index=X_train.index)
train_pred_df.head()

,pred_lr_t,pred_ridge_t,pred_dt_t
75,27.171174,27.169658,25.501613
14,16.95436,16.952343,15.101961
47,12.87492,12.875501,12.863636
46,28.123223,28.122714,25.501613
123,24.724055,24.725397,29.437209


In [12]:
# Concatenate X_train with the predictions DataFrame
X_train_2nd = pd.concat([X_train,train_pred_df], axis=1)
X_train_2nd.head()

,displ,hp,weight,accel,size,pred_lr_t,pred_ridge_t,pred_dt_t
75,140.0,92,2572,14.9,10.0,27.171174,27.169658,25.501613
14,400.0,150,3761,9.5,20.0,16.95436,16.952343,15.101961
47,318.0,150,4457,13.5,20.0,12.87492,12.875501,12.863636
46,119.0,97,2300,14.7,10.0,28.123223,28.122714,25.501613
123,121.0,67,2950,19.9,12.5,24.724055,24.725397,29.437209


#### Step 4: Build the second-layer meta-estimator

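The fourth step is to build the second-layer meta-estimator. Any regression model of your choice can be used here. The meta-estimator must be trained on the second training set, X_train_2nd (the original training features plus the first-layer predictions), with the same target y_train.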
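The manual stack trains the meta-estimator on first-layer predictions for rows those estimators were fitted on, so it sees predictions that are more accurate than the ones it will receive on new data. A common refinement, and what scikit-learn's own StackingRegressor does internally, is to build the second training set from out-of-fold predictions with `cross_val_predict`. A sketch on synthetic data, assuming 5 folds:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X_arr, y_arr = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(5)])
y_train = pd.Series(y_arr)

first_layer = {
    "pred_lr": LinearRegression(),
    "pred_ridge": Ridge(),
    "pred_dt": DecisionTreeRegressor(max_depth=3, random_state=500),
}

# Each row's prediction comes from a model that did not see that row during fitting
oof_preds = pd.DataFrame(
    {name: cross_val_predict(est, X_train, y_train, cv=5) for name, est in first_layer.items()},
    index=X_train.index,
)
X_train_2nd = pd.concat([X_train, oof_preds], axis=1)

# The meta-estimator learns from out-of-fold predictions
meta = DecisionTreeRegressor(max_depth=3, random_state=500).fit(X_train_2nd, y_train)

# For new data, refit each first-layer model on the full training set (as in Step 2)
for est in first_layer.values():
    est.fit(X_train, y_train)
```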

In [13]:
# Build the second-layer meta-estimator
clf_stack = DecisionTreeRegressor(max_depth=3, random_state=500)

# Train it on the second training set (original features + first-layer predictions)
clf_stack.fit(X_train_2nd, y_train)

DecisionTreeRegressor(max_depth=3, random_state=500)

#### Step 5: Use the stacked ensemble for predictions

The fifth and final step is to use the stacked ensemble for predictions on new data, like the test set. It's similar to the third step. 

- First, we obtain the predictions from the first-layer estimators and join them in a DataFrame. 

- Then, we concatenate the features with the predictions DataFrame. 

- Finally, the actual predictions are obtained from the second-layer meta estimator.

In [14]:
# Step 5: use the stacked ensemble for predictions
# Predict with each first-layer estimator on X_test
pred_lr = reg_lm.predict(X_test)
pred_ridge = reg_ridge.predict(X_test)
pred_dt = dt.predict(X_test)

In [15]:
# Create a Pandas DataFrame with the predictions
test_pred_df = pd.DataFrame({'pred_lr':pred_lr, 'pred_ridge':pred_ridge , 'pred_dt':pred_dt}, index=X_test.index)
test_pred_df.head()

,pred_lr,pred_ridge,pred_dt
352,21.542646,21.544295,20.93
16,31.26082,31.262056,32.903704
288,14.995364,14.996189,15.101961
281,29.131151,29.129484,25.501613
201,31.324604,31.32608,32.903704


In [16]:
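# Concatenate X_test with the predictions DataFrame
X_test_2nd = pd.concat([X_test, test_pred_df], axis=1)
# Reuse the training column names: the meta-estimator was fitted on pred_lr_t, pred_ridge_t, pred_dt_t,
# and recent scikit-learn versions raise an error when feature names differ between fit and predict
X_test_2nd.columns = X_train_2nd.columns
X_test_2nd.head()

,displ,hp,weight,accel,size,pred_lr_t,pred_ridge_t,pred_dt_t
352,163.0,125,3140,13.6,15.0,21.542646,21.544295,20.93
16,79.0,58,1825,18.6,10.0,31.26082,31.262056,32.903704
288,318.0,140,4080,13.7,20.0,14.995364,14.996189,15.101961
281,70.0,97,2330,13.5,7.5,29.131151,29.129484,25.501613
201,68.0,49,1867,19.5,10.0,31.324604,31.32608,32.903704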


In [17]:
# Obtain the final predictions from the second-layer estimator
pred_stack = clf_stack.predict(X_test_2nd)

In [18]:
# Evaluate the performance using the RMSE
rmse_stack1 = np.sqrt(mean_squared_error(y_test, pred_stack))
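
RMSE is the square root of the mean squared error, so it is expressed in the target's units (mpg here). A quick check on toy numbers that `np.sqrt(mean_squared_error(...))` matches the formula written out by hand:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([18.0, 9.0, 36.1, 18.5])
y_pred = np.array([20.0, 10.0, 33.1, 18.5])

# RMSE = sqrt(mean((y_true - y_pred) ** 2)), in the same units as the target
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(round(rmse_manual, 3))  # 1.871
```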

### Let’s mlxtend it!
In this lesson, you'll be introduced to the Mlxtend library, which allows you to easily build stacking ensembles.

Mlxtend stands for Machine Learning Extensions. It is a third-party Python library that contains many utilities and tools for machine learning and data science tasks, including feature selection, ensemble methods, visualization, and model evaluation. It has an intuitive API and works well with scikit-learn estimators, which is very convenient for our purpose.

#### Stacking implementation from mlxtend

Mlxtend uses a slightly different stacking architecture from the one we've seen so far. As in the architecture we already know, the first-layer estimators are trained on the complete feature set. However, by default the second-layer meta-estimator receives only the predictions as input features, which makes it lighter and faster for both training and predicting. 

Another important property of the implementation: for classification, mlxtend's StackingClassifier can train the second-layer estimator on either the predicted class labels or the predicted class probabilities (use_probas=True). Class probabilities carry each model's confidence, not only its vote, which may help on more complex problems.
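To make "class probabilities as input features" concrete, here is a manual sketch with scikit-learn classifiers on the iris dataset (a stand-in chosen for illustration, since this section's data is a regression problem). Each first-layer classifier contributes one column per class instead of a single label column:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=3, stratify=y)

first_layer = [
    DecisionTreeClassifier(max_depth=2, random_state=500),
    LogisticRegression(max_iter=1000),
]
for clf in first_layer:
    clf.fit(X_train, y_train)

# Label meta-features: one column per model (the predicted class)
meta_labels = np.column_stack([clf.predict(X_train) for clf in first_layer])
# Probability meta-features: n_classes columns per model, keeping each model's confidence
meta_probas = np.hstack([clf.predict_proba(X_train) for clf in first_layer])

# Second layer trained on the probability meta-features
meta_clf = LogisticRegression(max_iter=1000).fit(meta_probas, y_train)
test_probas = np.hstack([clf.predict_proba(X_test) for clf in first_layer])
y_pred = meta_clf.predict(test_probas)
print(meta_labels.shape, meta_probas.shape)  # (112, 2) (112, 6)
```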

#### StackingRegressor with mlxtend

- To build a stacking regressor with Mlxtend, first import StackingRegressor from the mlxtend.regressor module. 

- Then instantiate the first-layer regressors you want to use, without training them, as the StackingRegressor takes care of that. In the same way, instantiate the second-layer meta-regressor of your choice. With this, you are ready to build the StackingRegressor. 

- The first parameter it expects is regressors, a list of the first-layer regressors. The second parameter is meta_regressor, the meta-regressor you instantiated previously. 

- Another parameter you may be interested in is use_features_in_secondary. When set to True, the meta-regressor is trained on both the original input features and the individual predictions. It defaults to False. 

- After instantiating the StackingRegressor, you can call its fit and predict methods just as you would for a scikit-learn estimator.
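For comparison, scikit-learn 0.22 and later ship sklearn.ensemble.StackingRegressor. Its `estimators` parameter takes (name, estimator) pairs, `final_estimator` plays the meta-regressor role, and `passthrough=True` also feeds the original features to the final estimator, similar in spirit to use_features_in_secondary. Unlike mlxtend's StackingRegressor, it trains the final estimator on cross-validated predictions. A sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=3)

reg_stack_sk = StackingRegressor(
    estimators=[
        ("dt", DecisionTreeRegressor(max_depth=3, random_state=500)),
        ("lr", LinearRegression()),
        ("ridge", Ridge(random_state=500)),
    ],
    final_estimator=Ridge(random_state=500),
    cv=5,               # final estimator is trained on 5-fold out-of-fold predictions
    passthrough=False,  # set True to also pass the original features to the final estimator
)
reg_stack_sk.fit(X_train, y_train)
pred_stack_sk = reg_stack_sk.predict(X_test)
rmse_stack_sk = np.sqrt(mean_squared_error(y_test, pred_stack_sk))
```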

In [19]:
from mlxtend.regressor import StackingRegressor

In [20]:
# Instantiate the 1st-layer regressors
reg_dt = DecisionTreeRegressor(max_depth=3,random_state=500)
reg_lr = LinearRegression(normalize=True)
reg_ridge = Ridge(random_state = 500)

# Instantiate the 2nd-layer regressor
reg_meta = Ridge(random_state = 500)

In [21]:
# Build the Stacking regressor
reg_stack = StackingRegressor(regressors=[reg_dt,reg_lr,reg_ridge], meta_regressor=reg_meta)
reg_stack.fit(X_train,y_train)

StackingRegressor(meta_regressor=Ridge(random_state=500),
                  regressors=[DecisionTreeRegressor(max_depth=3,
                                                    random_state=500),
                              LinearRegression(normalize=True),
                              Ridge(random_state=500)])

In [22]:
# Obtain the final predictions from the second-layer estimator
pred_stack = reg_stack.predict(X_test)

# Evaluate the performance using the RMSE
rmse_stack = np.sqrt(mean_squared_error(y_test, pred_stack))


In [23]:
# RMSE of each first-layer estimator on the test set
rmse_lr = np.sqrt(mean_squared_error(y_test, pred_lr))
print('RMSE Linear Regression: {:.3f}'.format(rmse_lr))

rmse_ridge = np.sqrt(mean_squared_error(y_test, pred_ridge))
print('RMSE Ridge Regression: {:.3f}'.format(rmse_ridge))

rmse_dt = np.sqrt(mean_squared_error(y_test, pred_dt))
print('RMSE Decision Tree: {:.3f}'.format(rmse_dt))

# RMSE of the two stacked ensembles on the test set
print('RMSE after manual Stacking: {:.3f}'.format(rmse_stack1))
print('RMSE after Stacking with Mlxtend: {:.3f}'.format(rmse_stack))

RMSE Linear Regression: 5.007
RMSE Ridge Regression: 5.006
RMSE Decision Tree: 4.365
RMSE after manual Stacking: 4.917
RMSE after Stacking with Mlxtend: 4.356
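
These RMSE values come from a single train/test split, so small gaps such as the one between the decision tree and the mlxtend stack may shift with a different random_state. A sketch of a more robust comparison with 5-fold cross-validation, shown on synthetic data and using scikit-learn's StackingRegressor as a stand-in for the stacked model (on the notebook's data you would pass X and y instead):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(random_state=500),
    "Decision Tree": DecisionTreeRegressor(max_depth=3, random_state=500),
    "Stacking (scikit-learn)": StackingRegressor(
        estimators=[("dt", DecisionTreeRegressor(max_depth=3, random_state=500)),
                    ("lr", LinearRegression()),
                    ("ridge", Ridge(random_state=500))],
        final_estimator=Ridge(random_state=500),
    ),
}

cv = KFold(n_splits=5, shuffle=True, random_state=3)
results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is exposed as its negative
    scores = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: RMSE {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean makes it easier to judge whether one model is genuinely better than another.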
