# Improving Regression Models with the Voting Regressor Ensemble 🏡

**Ensemble Learning** is a powerful technique that combines the predictions from multiple machine learning models to produce a more robust and accurate final prediction. While the `VotingClassifier` is used for classification tasks, its counterpart for regression is the **`VotingRegressor`**.

### How Does a Voting Regressor Work?

A `VotingRegressor` is a simple yet effective ensemble method. It trains several different regression models (e.g., a Linear Regression, a Ridge regressor, and a Decision Tree regressor) on the same dataset. To predict a new data point, it **averages the individual predictions** of its base models; the average is unweighted unless you pass the optional `weights` argument.
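In symbols, the rule can be stated as follows: with $M$ fitted base models $f_1, \dots, f_M$ and weights $w_1, \dots, w_M$ (all equal when `weights` is not given), the ensemble prediction for an input $x$ is a weighted mean, which reduces to the plain average when the weights are equal.

$$
\hat{y}(x) = \frac{\sum_{m=1}^{M} w_m\, f_m(x)}{\sum_{m=1}^{M} w_m},
\qquad
w_1 = \dots = w_M \;\Rightarrow\; \hat{y}(x) = \frac{1}{M}\sum_{m=1}^{M} f_m(x)
$$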

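Averaging mainly reduces variance: where the base models make errors that are not perfectly correlated, those errors partially cancel, so the combined prediction is often more stable and more accurate than that of a typical individual model. It does not remove a bias that all the base models share.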
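A quick simulation makes the variance argument concrete. It is a sketch under a strong, deliberately simplified assumption: three hypothetical models whose errors are independent with equal variance $\sigma^2$, in which case the error variance of their average is $\sigma^2/3$. Real base models trained on the same data have correlated errors, so the gain in practice is smaller.

```python
import numpy as np

# Assumption: three hypothetical base models whose errors are independent,
# zero-mean Gaussian noise with the same spread. Real models' errors are
# usually correlated, which shrinks the benefit shown here.
rng = np.random.default_rng(0)
n_points, n_models, noise_sd = 10_000, 3, 1.0

true_y = rng.uniform(0, 100, size=n_points)
# Each column holds one model's predictions = truth + that model's own noise
preds = true_y[:, None] + rng.normal(0, noise_sd, size=(n_points, n_models))

single_mse = np.mean((preds[:, 0] - true_y) ** 2)             # one model alone
ensemble_mse = np.mean((preds.mean(axis=1) - true_y) ** 2)    # plain average

print(f"single model MSE: {single_mse:.3f}")
print(f"averaged MSE:     {ensemble_mse:.3f}")  # close to single_mse / 3
```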

---

## 1. Predicting Home Prices

We will use a home-prices dataset to predict `price_lakhs` (the price in lakhs, where 1 lakh = 100,000) from the area in square feet (`area_sqr_ft`) and the number of `bedrooms`.


```python
import pandas as pd

df = pd.read_csv("regression_home_prices.csv")
df.head()
```

| | area_sqr_ft | price_lakhs | bedrooms |
|---|---|---|---|
| 0 | 656.0 | 39.0 | 2 |
| 1 | 1260.0 | 83.2 | 2 |
| 2 | 1057.0 | 86.6 | 3 |
| 3 | 1259.0 | 59.0 | 2 |
| 4 | 1800.0 | 140.0 | 3 |


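First, we split the data into training and testing sets, holding out 20% of the rows for testing (`test_size=0.2`) and fixing `random_state` so the split is reproducible.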

```python
from sklearn.model_selection import train_test_split

X = df[["area_sqr_ft", "bedrooms"]]
y = df["price_lakhs"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
```
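The sketch below illustrates what `test_size` and `random_state` control. It runs on a small synthetic stand-in for the CSV (the values are made up and are not from the original dataset): 20% of the rows go to the test set, and reusing the same `random_state` reproduces exactly the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for regression_home_prices.csv: 50 synthetic rows
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "area_sqr_ft": rng.uniform(500, 2500, size=50).round(),
    "bedrooms": rng.integers(1, 5, size=50),
})
demo["price_lakhs"] = (0.06 * demo["area_sqr_ft"] + 5 * demo["bedrooms"]
                       + rng.normal(0, 5, size=50))

X_demo = demo[["area_sqr_ft", "bedrooms"]]
y_demo = demo["price_lakhs"]

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=10)
print(len(X_tr), len(X_te))  # 40 10

# Same random_state on a re-run -> identical test rows
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=10)
print(X_te.index.equals(X_te2.index))  # True
```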

## 2. Baseline Model: A Single Linear Regressor

Before building our ensemble, let's train a single `LinearRegression` model to establish a baseline R-squared ($R^2$) score.


```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
```

```text
0.8874887686858771
```

Our baseline model achieves a test-set R² of about **0.887**.
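R² is a fraction rather than a true percentage: 1 means perfect predictions, 0 means no better than always predicting the mean of `y_test`, and it can even be negative. The sketch below, on synthetic data from `make_regression` rather than the home-prices file, checks that `model.score` for a regressor is the same quantity as `sklearn.metrics.r2_score` and as the formula $1 - SS_{res}/SS_{tot}$.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption; the home-prices CSV is not reproduced here)
X, y = make_regression(n_samples=200, n_features=2, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 = 1 - SS_res / SS_tot, computed on the test set
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2, r2_score(y_test, y_pred), model.score(X_test, y_test))  # all three match
```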

## 3. Combining Models with the Voting Regressor

Now, let's create an ensemble using three different types of regression models:
1.  `LinearRegression`
2.  `Ridge` (a regularized linear model)
3.  `DecisionTreeRegressor`

The `VotingRegressor` will combine the predictions of these three models.
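If some base models deserve more say than others, `VotingRegressor` accepts an optional `weights` list. The sketch below (synthetic data; the weights `[2, 2, 1]` and `max_depth=4` are arbitrary illustrative choices) uses `transform`, which returns one column of predictions per fitted base model, to confirm that the ensemble prediction is the weighted average of those columns.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=15.0, random_state=0)

# Illustrative weights: each linear model counts twice as much as the tree
vr_w = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("rr", Ridge(alpha=1.0)),
        ("dr", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ],
    weights=[2, 2, 1],
).fit(X, y)

# transform() returns one column of predictions per base estimator
per_model = vr_w.transform(X[:5])  # shape (5, 3)
manual = np.average(per_model, axis=1, weights=[2, 2, 1])
print(np.allclose(manual, vr_w.predict(X[:5])))  # True
```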

```python
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Create the individual models
lin_reg = LinearRegression()
ridge_reg = Ridge(alpha=1.0)
dt_reg = DecisionTreeRegressor()

# Create the Voting Regressor
vr = VotingRegressor(estimators=[
    ('lr', lin_reg),
    ('rr', ridge_reg),
    ('dr', dt_reg)
])

vr.fit(X_train, y_train)
vr.score(X_test, y_test)
```

```text
0.870340859079508
```

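**Result:** Here the ensemble's test-set R² is about **0.870**, slightly below the single `LinearRegression` baseline (about 0.887). This can happen when the base models are not very diverse (`LinearRegression` and `Ridge` produce nearly identical predictions here) or when one base model is noticeably weaker than the others, because a plain average gives every model an equal say. An unconstrained `DecisionTreeRegressor` (it has no `max_depth` limit by default) is prone to overfitting, which may be part of the story here. The ensemble can often be improved by tuning the base models' hyperparameters (e.g., `alpha` in `Ridge` or `max_depth` in `DecisionTreeRegressor`) or by giving weaker models less influence through `weights`. Note also that `DecisionTreeRegressor()` is created without a `random_state`, so this score may vary slightly between runs.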
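One way to do that tuning is a grid search over the whole ensemble: scikit-learn exposes each base model's hyperparameters as `<name>__<param>` (here `rr__alpha` and `dr__max_depth`) and the ensemble's own `weights` as a top-level parameter. This is a sketch on synthetic data with an arbitrary grid; in this notebook's setting you would fit it on `X_train`/`y_train` only and keep the test set for the final score.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=15.0, random_state=0)

vr = VotingRegressor(estimators=[
    ("lr", LinearRegression()),
    ("rr", Ridge()),
    ("dr", DecisionTreeRegressor(random_state=0)),
])

# Base-model hyperparameters are addressed as "<name>__<param>";
# the grid values below are illustrative, not recommendations
param_grid = {
    "rr__alpha": [0.1, 1.0, 10.0],
    "dr__max_depth": [2, 4, 8, None],
    "weights": [None, [2, 2, 1]],
}
search = GridSearchCV(vr, param_grid, cv=5, scoring="r2")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated R^2
```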


## 4. A Look at the Predictions

To better understand how the ensemble works, let's compare the predictions of the individual models to the final prediction of the `VotingRegressor` for the first few test samples.


```python
# VotingRegressor fits *clones* of the estimators passed to it, so lin_reg,
# ridge_reg and dt_reg are still unfitted here; fit them separately.
lin_reg.fit(X_train, y_train)
ridge_reg.fit(X_train, y_train)
dt_reg.fit(X_train, y_train)

# Compare predictions for the first three test rows (.iloc = by position)
first3 = X_test.iloc[:3]
df_pred = pd.DataFrame({
    'Actual': y_test.iloc[:3].values,
    'Linear': lin_reg.predict(first3),
    'Ridge': ridge_reg.predict(first3),
    'Decision Tree': dt_reg.predict(first3),
    'Voting Regressor': vr.predict(first3)
})

df_pred
```

| | Actual | Linear | Ridge | Decision Tree | Voting Regressor |
|---|---|---|---|---|---|
| 0 | 68.0 | 79.1594 | 79.505063 | 82.0 | 80.221488 |
| 1 | 80.1 | 70.964695 | 71.280219 | 68.0 | 70.081638 |
| 2 | 69.0 | 63.514964 | 63.803088 | 68.0 | 65.106017 |


As the table shows, the `VotingRegressor` prediction is the average of the three individual predictions. For the first row: `(79.16 + 79.51 + 82.00) / 3 ≈ 80.22`.
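Refitting `lin_reg`, `ridge_reg`, and `dt_reg` separately works, but the ensemble already keeps its own fitted copies in `named_estimators_`. Reading predictions from there ensures the comparison uses exactly the models the ensemble averaged (a separately refitted tree without a fixed `random_state` could, in principle, differ). A sketch on synthetic data, reusing the estimator names from above:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=15.0, random_state=0)

vr = VotingRegressor(estimators=[
    ("lr", LinearRegression()),
    ("rr", Ridge(alpha=1.0)),
    ("dr", DecisionTreeRegressor(random_state=0)),
]).fit(X, y)

# named_estimators_ maps each name to the clone the ensemble actually fitted
comparison = pd.DataFrame(
    {name: est.predict(X[:3]) for name, est in vr.named_estimators_.items()}
)
comparison["Voting Regressor"] = vr.predict(X[:3])
comparison["Row mean"] = comparison[["lr", "rr", "dr"]].mean(axis=1)

print(comparison)
print(np.allclose(comparison["Row mean"], comparison["Voting Regressor"]))  # True
```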


## 5. Conclusion

The `VotingRegressor` is a straightforward way to build a more robust regression model. By averaging the outputs of several different models, it can cancel out part of their individual errors and produce more stable predictions. Although it did not beat the single `LinearRegression` baseline in this instance with mostly default parameters, it remains a valuable tool in the ensemble-learning toolkit, particularly once its base models are tuned and sufficiently diverse.