# 📄 Dataset Generation using `make_regression`

This example shows how to create a **synthetic regression dataset** using `sklearn.datasets.make_regression`.

---

## 📚 Libraries Used

```python
from sklearn.datasets import make_regression
```
- Import `make_regression` for generating random regression data.

---

## 🛠️ Code Explanation

```python
# Generate a synthetic regression dataset
X, y = make_regression(
    n_samples=1000,   # Create 1000 samples (rows)
    n_features=10,    # Each sample has 10 input features (columns)
    noise=0.1,        # Std. dev. of Gaussian noise added to y (makes data more realistic)
    random_state=42   # Set random seed for reproducibility (same output every run)
)
```

---

## 🧠 Key Points

- **Regression Dataset**: Output `y` is continuous (not categorical).
- **X**: Input features (shape = 1000 rows × 10 columns).
- **y**: Target variable (shape = `(1000,)`, one value per sample).
- **noise**: Standard deviation of the Gaussian noise added to `y`, which makes the dataset less "perfect" and more like real-world data.
- **random_state**: Fixes the randomness so you always get the same dataset when you run the code.
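
As a quick check of the points above, the minimal sketch below regenerates the dataset with the same arguments and confirms the shapes and the effect of `random_state`. The names `X_again`, `y_again`, and `is_reproducible` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_regression

# Same arguments as in the example above
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# A second call with the same seed (illustrative names)
X_again, y_again = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

print(X.shape, y.shape)  # shapes of the feature matrix and the target vector
print(y.dtype)           # the target is stored as floating-point values

# Identical seeds should give identical arrays
is_reproducible = np.array_equal(X, X_again) and np.array_equal(y, y_again)
print(is_reproducible)
```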

In [None]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate a synthetic regression dataset
X, y = make_regression(
    n_samples=1000,   # Create 1000 samples (rows)
    n_features=10,    # Each sample has 10 input features (columns)
    noise=0.1,        # Std. dev. of Gaussian noise added to y (makes data more realistic)
    random_state=42   # Set random seed for reproducibility (same output every run)
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

### Why Add Noise?
- Real-world data is rarely perfectly predictable. Unobserved factors (such as genetics or environment) and other random influences add variation that the input features cannot explain.
- Training and evaluating on noisy data gives a more realistic picture of how well a model handles small discrepancies when it predicts on new data.
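
To make the role of `noise` concrete, here is a small sketch that asks `make_regression` to also return the true coefficients (`coef=True`) and measures what is left of `y` after the exact linear part is subtracted. The noise level `5.0` and the helper name `residual_std` are choices made for this illustration only, and the sketch assumes the default `bias=0.0`.

```python
import numpy as np
from sklearn.datasets import make_regression


def residual_std(noise):
    """Std. dev. of y minus the exact linear signal X @ coef (hypothetical helper)."""
    X, y, coef = make_regression(
        n_samples=1000, n_features=10, noise=noise, coef=True, random_state=42
    )
    return float(np.std(y - X @ coef))


clean = residual_std(0.0)  # no noise: y is an exact linear function of X
noisy = residual_std(5.0)  # illustrative noise level, not from the notebook

print(f"noise=0.0 -> residual std {clean:.4f}")
print(f"noise=5.0 -> residual std {noisy:.4f}")
```

With `noise=0.0` the residual should be zero up to floating-point error, while a positive value should leave a residual spread close to the requested standard deviation.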

In [3]:
X_train.shape, X_test.shape

((800, 10), (200, 10))

In [4]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

In [5]:
lr = LinearRegression()
dtr = DecisionTreeRegressor()
svr = SVR(kernel="linear")

# 📄 Ensemble Learning with `VotingRegressor`

This example demonstrates how to create an **ensemble regressor** by combining multiple regression models using `VotingRegressor` from `sklearn.ensemble`.

---

## 📚 Libraries Used

```python
from sklearn.ensemble import VotingRegressor
```
- Import the `VotingRegressor` class from `sklearn.ensemble` to create the ensemble model.

---

## 🛠️ Code Explanation

```python
# Create an ensemble model that combines multiple regression models
ensemble_regressor = VotingRegressor(
    estimators=[
        ('mlr', lr),   # 'mlr' is the name, lr is the Linear Regression model
        ('dtr', dtr),  # 'dtr' is the name, dtr is the Decision Tree Regressor model
        ('svr', svr)   # 'svr' is the name, svr is the Support Vector Regressor model
    ]
)
```

---

## 🧠 Key Points

- **Ensemble Model**: Combines predictions from multiple regression models.
- **estimators**: A list of tuples with:
  - A **name** (e.g., `'mlr'`, `'dtr'`, `'svr'`).
  - The corresponding **model instance** (e.g., `lr`, `dtr`, `svr`).
- **Voting Method**: By default, `VotingRegressor` uses **average voting**, where the predictions of the individual models are averaged to give the final result.
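
The sketch below illustrates two of these points on a small dataset: the names in `estimators` give access to the fitted models through `named_estimators_`, and the ensemble prediction matches a hand-computed average of the individual predictions. The dataset size, the `random_state=0` on the tree, and the names `ensemble`, `fitted_lr`, and `manual_average` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Small dataset so the sketch runs quickly (sizes are illustrative)
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

lr = LinearRegression()
ensemble = VotingRegressor(
    estimators=[
        ("mlr", lr),
        ("dtr", DecisionTreeRegressor(random_state=0)),
        ("svr", SVR(kernel="linear")),
    ]
)
ensemble.fit(X_train, y_train)

# The name from the tuple looks up the fitted copy of that model
fitted_lr = ensemble.named_estimators_["mlr"]

# Average the individual predictions by hand and compare with the ensemble
manual_average = np.mean([est.predict(X_test) for est in ensemble.estimators_], axis=0)
matches = np.allclose(ensemble.predict(X_test), manual_average)
print(matches)
```

Note that `VotingRegressor` fits copies of the models it is given, so the `lr` object passed in stays unfitted; use `named_estimators_` or `estimators_` to inspect the trained models.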

In [6]:
# Create the ensemble
from sklearn.ensemble import VotingRegressor
ensemble_regressor = VotingRegressor(estimators=[('mlr', lr), ('dtr', dtr), ('svr', svr)])

## Voting Method
- By default, `VotingRegressor` gives every model equal weight: the final prediction is the plain average of the individual models' predictions.
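
The average does not have to be uniform. `VotingRegressor` also accepts an optional `weights` argument, which this notebook does not use. The following sketch, with an arbitrary weighting of `[2, 1, 1]` chosen purely for illustration, checks that the prediction then equals the weighted average of the individual predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Small dataset so the sketch runs quickly (size is illustrative)
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)

weights = [2, 1, 1]  # hypothetical weighting: linear regression counts double
weighted_ensemble = VotingRegressor(
    estimators=[
        ("mlr", LinearRegression()),
        ("dtr", DecisionTreeRegressor(random_state=0)),
        ("svr", SVR(kernel="linear")),
    ],
    weights=weights,
)
weighted_ensemble.fit(X, y)

# One column of predictions per fitted model, in the order of `estimators`
per_model = np.column_stack([est.predict(X) for est in weighted_ensemble.estimators_])

# Weighted average computed by hand, compared with the ensemble output
manual = np.average(per_model, axis=1, weights=weights)
weighted_matches = np.allclose(weighted_ensemble.predict(X), manual)
print(weighted_matches)
```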

In [7]:
ensemble_regressor

In [None]:
# Train the ensemble on the training data
ensemble_regressor.fit(X_train, y_train)

In [None]:
# Predict the target values on the test data
y_pred = ensemble_regressor.predict(X_test)

# Evaluate the predictions with the R² score
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print(f'R² Score: {r2:.3f}')

R² Score: 0.842
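
A single ensemble score is easier to interpret next to the scores of its members. The sketch below rebuilds the same setup and reports R², MAE, and RMSE for each model and for the ensemble. The `models` and `results` dictionaries and the `random_state=0` on the tree are additions for this illustration, and the exact numbers depend on the scikit-learn version and on the tree's randomness, so none are quoted here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Individual models (names follow the notebook; the seed on the tree is an addition)
models = {
    "mlr": LinearRegression(),
    "dtr": DecisionTreeRegressor(random_state=0),
    "svr": SVR(kernel="linear"),
}
# The ensemble fits its own copies of the three models above
models["ensemble"] = VotingRegressor(estimators=list(models.items()))

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, y_pred))),
    }
    print(f"{name:>8}: R²={results[name]['r2']:.3f}  "
          f"MAE={results[name]['mae']:.3f}  RMSE={results[name]['rmse']:.3f}")
```

Because the data is generated from a linear model with very little noise, plain linear regression should score close to 1 here, while averaging it with weaker models can pull the ensemble score down. An ensemble tends to help most when its members make different kinds of errors.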
