## Voting Regressor

Voting Regressor is an ensemble meta-estimator in scikit-learn that combines the predictions of multiple base regressor models to make a final prediction. It aggregates the predictions from individual regressor models by averaging their outputs.

Here's an explanation of the Voting Regressor and its hyperparameters:

- **Concept**: Each base regressor predicts the target independently, and the final prediction is the average of those predictions. By default every model contributes equally; the `weights` parameter turns this into a weighted average.

- **Implementation**: The base regressors can be of any type, such as linear regression, decision trees, or support vector machines. Each one is fit on the whole training set, and their predictions are averaged at prediction time.

- **Advantages**:
  - Can improve prediction accuracy and robustness, because the errors of diverse models partly cancel out when averaged.
  - Can reduce overfitting compared with a single high-variance model, although a weak base model can also pull the average down.
  - Easy to implement and use, similar to other ensemble methods in scikit-learn.
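
The averaging described above can be checked directly. The following is a minimal sketch on synthetic data from `make_regression` (an assumption for illustration, not the dataset used later in this notebook); the names `X_demo`, `y_demo`, and `vr_demo` are hypothetical. It compares the ensemble's prediction with the plain mean of the fitted base regressors' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only, not the notebook's dataset)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)

demo_estimators = [
    ("lr", LinearRegression()),
    ("dt", DecisionTreeRegressor(random_state=0)),
]
vr_demo = VotingRegressor(demo_estimators).fit(X_demo, y_demo)

# The fitted copies of the base regressors are stored in `estimators_`
individual = np.column_stack(
    [est.predict(X_demo) for est in vr_demo.estimators_]
)
manual_average = individual.mean(axis=1)

# With no weights, the ensemble prediction should equal the plain mean
averaging_matches = np.allclose(vr_demo.predict(X_demo), manual_average)
print(averaging_matches)
```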

### Hyperparameters:

1. **estimators**: The base regressors in the ensemble, given as a list of `(name, regressor)` tuples, where each name is a string identifying its regressor. An entry can also be set to `'drop'` to exclude that regressor.

2. **weights**: An optional sequence of numbers, one per base regressor, used to compute a weighted average of the predictions so that some models have more influence than others. The default, `None`, weights all models equally.

3. **n_jobs**: The number of jobs to run in parallel when fitting the base regressors. `None` means 1 (unless a joblib parallel backend is active) and `-1` uses all processors.

4. **verbose**: A boolean flag. If `True`, the time elapsed while fitting each base regressor is printed as it completes.

Note that `VotingRegressor` has no `voting` parameter. The `'hard'` and `'soft'` voting schemes belong to `VotingClassifier`; for regression the predictions are always averaged.

By adjusting these hyperparameters, you can customize the behavior of the Voting Regressor to suit the characteristics of your dataset and the specific requirements of your regression task.
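
To make the effect of `weights` concrete, here is a small sketch, again on synthetic data (an assumption; `X_w`, `y_w`, and `vr_w` are hypothetical names). It checks that the weighted ensemble's prediction matches `np.average` applied to the base regressors' predictions with the same weights.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only)
X_w, y_w = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=1
)

weights = [2, 1]  # the linear model counts twice as much as the tree
vr_w = VotingRegressor(
    [
        ("lr", LinearRegression()),
        ("dt", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    weights=weights,
    n_jobs=1,  # fit the base regressors sequentially
).fit(X_w, y_w)

# One column of predictions per fitted base regressor
preds = np.column_stack([est.predict(X_w) for est in vr_w.estimators_])

# Weighted average: sum(w_i * pred_i) / sum(w_i)
manual_weighted = np.average(preds, axis=1, weights=weights)

weighted_matches = np.allclose(vr_w.predict(X_w), manual_weighted)
print(weighted_matches)
```

Because the weights are normalized by their sum, only their ratios matter: `[2, 1]` and `[4, 2]` describe the same ensemble.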

In [1]:
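# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this notebook requires scikit-learn < 1.2 to run as written.
from sklearn.datasets import load_boston
import numpy as np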

In [2]:
X, y = load_boston(return_X_y=True)


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_ho

In [3]:
X.shape

(506, 13)

In [4]:
y.shape

(506,)

In [5]:
X

array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
        4.9800e+00],
       [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
        9.1400e+00],
       [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
        4.0300e+00],
       ...,
       [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
        5.6400e+00],
       [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
        6.4800e+00],
       [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
        7.8800e+00]])

In [6]:
y

array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
       18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
       15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
       13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
       21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
       35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
       19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
       20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
       23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
       33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
       21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
       20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
       23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
       15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21

In [7]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

In [8]:
lr = LinearRegression()
dt = DecisionTreeRegressor()
svr = SVR()

In [9]:
estimators = [('lr', lr), ('dt', dt), ('svr', svr)]

In [10]:
# Baseline: 10-fold cross-validated R^2 of each base regressor on its own
for name, model in estimators:
    cv_scores = cross_val_score(model, X, y, cv=10, scoring='r2')
    print(name, np.round(np.mean(cv_scores), 2))

lr 0.2
dt -0.27
svr -0.41


In [11]:
from sklearn.ensemble import VotingRegressor

# Equal-weight ensemble of the three base regressors
vr = VotingRegressor(estimators)
scores = cross_val_score(vr, X, y, scoring='r2', cv=10)
print('Voting Regressor', np.round(np.mean(scores), 2))

Voting Regressor 0.43


In [12]:
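# Try every combination of integer weights from 1 to 3 for (lr, dt, svr).
# dt has no random_state, so scores can differ slightly between runs.
for i in range(1, 4):
    for j in range(1, 4):
        for k in range(1, 4):
            vr = VotingRegressor(estimators, weights=[i, j, k])
            scores = cross_val_score(vr, X, y, scoring='r2', cv=10)
            print('For i = {}, j = {}, k = {}'.format(i, j, k), np.round(np.mean(scores), 2))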

For i = 1, j = 1, k = 1 0.44
For i = 1, j = 1, k = 2 0.35
For i = 1, j = 1, k = 3 0.26
For i = 1, j = 2, k = 1 0.43
For i = 1, j = 2, k = 2 0.39
For i = 1, j = 2, k = 3 0.34
For i = 1, j = 3, k = 1 0.34
For i = 1, j = 3, k = 2 0.38
For i = 1, j = 3, k = 3 0.39
For i = 2, j = 1, k = 1 0.46
For i = 2, j = 1, k = 2 0.42
For i = 2, j = 1, k = 3 0.33
For i = 2, j = 2, k = 1 0.4
For i = 2, j = 2, k = 2 0.45
For i = 2, j = 2, k = 3 0.41
For i = 2, j = 3, k = 1 0.45
For i = 2, j = 3, k = 2 0.41
For i = 2, j = 3, k = 3 0.4
For i = 3, j = 1, k = 1 0.44
For i = 3, j = 1, k = 2 0.43
For i = 3, j = 1, k = 3 0.4
For i = 3, j = 2, k = 1 0.39
For i = 3, j = 2, k = 2 0.41
For i = 3, j = 2, k = 3 0.42
For i = 3, j = 3, k = 1 0.38
For i = 3, j = 3, k = 2 0.46
For i = 3, j = 3, k = 3 0.44


In [14]:
# Re-evaluate one of the best-scoring combinations: lr weighted twice as much as dt and svr
vr = VotingRegressor(estimators, weights=[2, 1, 1])
scores = cross_val_score(vr, X, y, scoring='r2', cv=10)
print(np.round(np.mean(scores), 2))

0.46
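
The weight search above can also be written with `itertools.product`, keeping the scores in a dictionary so the best combination is picked programmatically. This is a sketch under stated assumptions: it uses synthetic data from `make_regression` and 5-fold cross-validation rather than the notebook's dataset and settings, and the names `X_s`, `y_s`, `base`, and `results` are hypothetical. The decision tree is given a fixed `random_state` so that repeated runs are comparable.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only)
X_s, y_s = make_regression(
    n_samples=150, n_features=5, noise=15.0, random_state=0
)

base = [
    ("lr", LinearRegression()),
    ("dt", DecisionTreeRegressor(random_state=0)),  # seeded for repeatability
    ("svr", SVR()),
]

# Mean cross-validated R^2 for every weight triple in {1, 2, 3}^3
results = {}
for w in product(range(1, 4), repeat=3):
    vr_s = VotingRegressor(base, weights=list(w))
    results[w] = cross_val_score(vr_s, X_s, y_s, scoring="r2", cv=5).mean()

best_weights = max(results, key=results.get)
print(best_weights, round(results[best_weights], 2))
```

Since the weights are normalized by their sum, proportional triples such as `(1, 1, 1)`, `(2, 2, 2)`, and `(3, 3, 3)` describe the same ensemble. In the table above those rows show 0.44, 0.45, and 0.44, which is most likely run-to-run variation from the unseeded `DecisionTreeRegressor` rather than a real effect of the weights. Differences of one or two hundredths between weight settings should therefore be read with caution.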
