### Bagging Regressor

*Bagging Regressor*: A Bagging Regressor is an ensemble meta-estimator that improves the accuracy and stability of regression models by training multiple base regressors on different bootstrap samples of the training data and averaging their predictions. "Bagging" is short for Bootstrap Aggregating.

**How It Works**
- A Bagging Regressor works in three main steps:
   - **1. Bootstrapping**: Multiple random subsets (bootstrap samples) are drawn from the original training dataset by selecting data points with replacement. A single data point can therefore appear several times in one subset, while others may not be selected at all.
   - **2. Parallel Training**: A separate base regression model (a decision tree regressor by default in scikit-learn) is trained on each bootstrap sample. The models do not depend on one another, so they can be trained in parallel.
   - **3. Aggregation**: For a new input, each base model makes a prediction. The Bagging Regressor averages these individual predictions to produce the final, more robust prediction.
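
The three steps above can be sketched by hand. This is a minimal illustration, not scikit-learn's internal implementation; the dataset size, the seed, and the ensemble size `n_models` are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy data; sizes and seeds are illustrative assumptions.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
rng = np.random.default_rng(0)

n_models = 10  # hypothetical ensemble size
models = []
for _ in range(n_models):
    # 1. Bootstrapping: draw len(X) row indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Training: fit one base regressor per bootstrap sample.
    #    The fits are independent, so they could also run in parallel.
    models.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# 3. Aggregation: average the base models' predictions for new inputs.
all_preds = np.stack([m.predict(X[:5]) for m in models])  # shape (n_models, 5)
bagged_pred = all_preds.mean(axis=0)                       # shape (5,)
print(bagged_pred)
```

Because sampling is done with replacement, each bootstrap sample contains on average about 63% of the distinct original rows, which is what makes the base models differ from one another.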

**Key Benefits**
  - Reduces Variance: Averaging models trained on different bootstrap samples lowers the variance of the ensemble, making it less prone to overfitting a specific training set.
  - Improves Stability & Accuracy: The aggregated prediction is usually more stable and reliable than the prediction of any single base model.
  - Flexibility: It can be used with various base learners (e.g., decision trees, neural networks), although it works best with unstable, high-variance models.
  - Parallelizable: Since each base model is trained independently, training can be spread across CPU cores, which shortens wall-clock time.
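
To make these benefits concrete, here is a hedged sketch that compares a single decision tree with scikit-learn's `BaggingRegressor` on synthetic data. The data settings mirror the ones used in this notebook; `n_estimators=50` and the seeds are illustrative assumptions, and the exact scores will depend on them.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Baseline: one high-variance base learner.
single_tree = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)

# Bagging: many trees, each fit on its own bootstrap sample.
bagging = BaggingRegressor(
    DecisionTreeRegressor(),  # base learner, passed positionally
    n_estimators=50,          # illustrative ensemble size
    n_jobs=1,                 # set to -1 to train on all available cores
    random_state=1,
).fit(X_train, y_train)

tree_r2 = r2_score(y_test, single_tree.predict(X_test))
bagging_r2 = r2_score(y_test, bagging.predict(X_test))
print(f"single tree  R^2: {tree_r2:.3f}")
print(f"bagged trees R^2: {bagging_r2:.3f}")
```

The base learner is passed positionally because the keyword for it was renamed between scikit-learn releases (`base_estimator` in older versions, `estimator` in newer ones).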

In [3]:
from IPython.display import Image
Image(url="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/08/image-7.png")

In [1]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

x, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=1)

In [2]:
x

array([[-1.24634541, -2.3575232 ,  0.60972409, ..., -1.25935848,
        -0.11048061,  0.46983129],
       [-0.68085157, -1.06787658,  0.57296273, ..., -0.01781755,
         0.45794708, -0.6001388 ],
       [-0.14894123,  1.16533544, -0.63259014, ...,  1.89716069,
        -0.20984695, -1.38139115],
       ...,
       [ 1.32960903, -1.08278525,  0.44347873, ..., -0.48363166,
         0.01880501,  0.56264832],
       [ 1.03703898,  0.67261975,  1.00568668, ...,  0.61472628,
         0.35356722, -0.34898419],
       [ 0.438562  ,  0.92781985,  0.72667997, ..., -1.09330391,
        -0.37195994,  0.22445073]])

In [3]:
y

array([-1.29703788e+02, -1.80999622e+01, -2.48373532e+02, -5.47790773e+01,
        2.53389312e+01,  1.64353500e+01, -1.62379961e+02,  8.68801965e+01,
       -3.03384035e+01, -1.99726617e+02, -1.13506241e+02, -1.90063912e+02,
       -1.87538897e+02,  1.72228819e+02,  6.16386076e+01, -3.56835616e+02,
        8.24251916e+01,  1.61358017e+02,  1.52957393e+02,  5.00424227e+02,
        1.21062317e+02,  4.95831180e+01, -1.77278355e+02, -2.49458193e+01,
       -1.65450348e+02,  4.18445290e+01, -6.76893860e+01, -6.46220878e+01,
       -1.11488894e+02,  2.30651763e+02,  2.56645774e+02, -1.84174552e+02,
        2.98886683e+02, -2.73732404e+01,  4.17040920e+02,  1.01523419e+02,
       -1.23009941e+02,  3.56833517e+02, -1.71822143e+02,  3.61894176e+02,
        1.97745189e+02, -1.85490215e+02,  1.36198413e+02, -3.26485048e+02,
       -1.14069731e+02,  4.77747550e+01,  1.62640400e+02,  9.41057983e+01,
       -4.16816637e+01, -5.20442942e+01,  3.44121608e+02,  1.04900656e+02,
       -2.01206138e+02, -

In [5]:
# Train/test split: hold out 30% of the samples for evaluation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=1)

In [6]:
x_train.shape, x_test.shape

((700, 10), (300, 10))

### Multiple regressors

In [13]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

In [14]:
lr = LinearRegression()
dtr = DecisionTreeRegressor()
svr = SVR(kernel="linear")

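### Create the ensemble

Note that the ensemble built here is a `VotingRegressor`: it averages the predictions of three different model types, each trained on the full training set. Bagging, by contrast, trains copies of one base model on different bootstrap samples.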

In [15]:
from sklearn.ensemble import VotingRegressor

In [16]:
ensemble_clf = VotingRegressor(estimators=[("Linear_reg", lr), ("Decision_tree_reg", dtr), ("SVR", svr)])

In [17]:
ensemble_clf

### Train the model

In [19]:
# Fit the ensemble on the training data
ensemble_clf.fit(x_train, y_train)

In [21]:
# Predict on the test set
y_pred = ensemble_clf.predict(x_test)
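
With default (uniform) weights, a `VotingRegressor` prediction is the plain average of its fitted members' predictions. The sketch below checks this on the same kind of data; it rebuilds the data and models so that it runs on its own, and the names `voter` and `manual_avg` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

voter = VotingRegressor(
    estimators=[
        ("Linear_reg", LinearRegression()),
        ("Decision_tree_reg", DecisionTreeRegressor(random_state=1)),
        ("SVR", SVR(kernel="linear")),
    ]
).fit(X_train, y_train)

# The fitted clones are stored in `estimators_`; average their predictions by hand.
manual_avg = np.mean([est.predict(X_test) for est in voter.estimators_], axis=0)
same = np.allclose(manual_avg, voter.predict(X_test))
print(same)
```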

In [23]:
# R² score (coefficient of determination) on the test set; this is a regression metric, not classification accuracy
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.9549606671674286
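
For reference, `r2_score` computes the coefficient of determination, R² = 1 - SS_res / SS_tot. A small sketch on made-up numbers (the arrays `y_true` and `y_hat` are illustrative, not taken from this notebook) shows the manual calculation agreeing with scikit-learn:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual_r2 = 1.0 - ss_res / ss_tot

print(manual_r2, r2_score(y_true, y_hat))
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of `y_true`.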