### Q1. What is Random Forest Regressor?
The **Random Forest Regressor** is an ensemble learning method used for regression tasks. It consists of multiple decision trees, each trained on a random subset of the data. Each decision tree produces a prediction, and the **average of all trees’ predictions** is taken as the final output. It is the regression variant of the Random Forest algorithm, which also has a classification variant.

### Q2. How does Random Forest Regressor reduce the risk of overfitting?
The **Random Forest Regressor** reduces overfitting by:
- **Training multiple decision trees** on different bootstrap samples of the dataset (rows drawn at random with replacement). Each split can also be restricted to a random subset of the features. As a result, the trees differ from one another and no single tree's fit to specific data points dominates the model.
- **Averaging predictions** across the trees, which smooths the output and reduces variance. An extreme prediction from one overfitted tree is diluted by the predictions of the others.
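
The two steps above can be sketched by hand with plain decision trees. This is a minimal illustration, not scikit-learn's internal implementation; the synthetic dataset, the number of trees (`n_trees = 25`), and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed here purely for illustration
X, y = make_regression(n_samples=400, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25
tree_preds = []
for _ in range(n_trees):
    # Bootstrap sample: draw training rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Each split considers only a random subset of the features
    tree = DecisionTreeRegressor(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    tree.fit(X_train[idx], y_train[idx])
    tree_preds.append(tree.predict(X_test))

tree_preds = np.array(tree_preds)      # shape: (n_trees, n_test_samples)
bagged_pred = tree_preds.mean(axis=0)  # average the trees' predictions

mean_single_tree_mse = np.mean(
    [mean_squared_error(y_test, p) for p in tree_preds]
)
bagged_mse = mean_squared_error(y_test, bagged_pred)
print(f"Average MSE of individual trees: {mean_single_tree_mse:.1f}")
print(f"MSE of the averaged prediction:  {bagged_mse:.1f}")
```

Because squared error is convex, the error of the averaged prediction can never exceed the average error of the individual trees. The more the trees disagree with one another, the larger the gain from averaging.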

### Q3. How does Random Forest Regressor aggregate the predictions of multiple decision trees?
In the **Random Forest Regressor**, each decision tree produces a continuous numerical prediction for the target variable. The final prediction is the **average** of the predictions from all the individual trees. Averaging makes the overall prediction more robust, because an outlying prediction from any single tree has only a small effect on the result.
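
As a quick check of this behaviour in scikit-learn, the sketch below averages the predictions of the fitted trees (exposed through the `estimators_` attribute) and compares the result with `predict`. The synthetic data and the forest size are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic dataset, assumed for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

samples = X[:5]
# One row of predictions per tree: shape (n_trees, n_samples)
per_tree = np.stack([tree.predict(samples) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_pred = forest.predict(samples)

print("Per-tree spread (std):", per_tree.std(axis=0).round(2))
print("Matches forest.predict:", np.allclose(manual_average, forest_pred))
```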

### Q4. What are the hyperparameters of Random Forest Regressor?
Some key **hyperparameters** of the Random Forest Regressor include:
- **n_estimators**: The number of trees in the forest (e.g., 100, 200).
- **max_depth**: The maximum depth of each decision tree (limits overfitting).
- **min_samples_split**: The minimum number of samples required to split an internal node.
- **min_samples_leaf**: The minimum number of samples required to be at a leaf node.
- **max_features**: The number of features to consider when looking for the best split (e.g., 'sqrt', 'log2', an integer count, or a fraction of the features).
- **bootstrap**: Whether bootstrap samples are used when building trees (defaults to `True` in scikit-learn).
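
The sketch below shows how these hyperparameters might be set on scikit-learn's `RandomForestRegressor` and how two of them could be tuned with `GridSearchCV`. The specific values, the parameter grid, and the synthetic data are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, assumed for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

model = RandomForestRegressor(
    n_estimators=50,       # number of trees
    max_depth=8,           # cap on tree depth
    min_samples_split=4,   # samples needed to split an internal node
    min_samples_leaf=2,    # samples needed at a leaf
    max_features="sqrt",   # features considered per split
    bootstrap=True,        # train each tree on a bootstrap sample
    random_state=0,
)

# Hypothetical grid: search over depth and leaf size only
param_grid = {"max_depth": [4, 8, None], "min_samples_leaf": [1, 2, 5]}
search = GridSearchCV(model, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated MSE: {-search.best_score_:.1f}")
```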

### Q5. What is the difference between Random Forest Regressor and Decision Tree Regressor?
- **Decision Tree Regressor**: A single tree that makes predictions based on recursive partitioning of the data. It is prone to **overfitting** if the tree grows too deep.
- **Random Forest Regressor**: An ensemble of multiple decision trees, which improves generalization by averaging the predictions of many trees, thus **reducing overfitting**.
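
A small experiment can make the contrast concrete. The sketch below fits an unconstrained `DecisionTreeRegressor` and a `RandomForestRegressor` on the same synthetic data (the Friedman #1 benchmark, an assumption made for illustration) and reports training and test error for each.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy non-linear synthetic data, assumed for illustration
X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "tree": DecisionTreeRegressor(random_state=0),  # grown to full depth
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train_mse": mean_squared_error(y_train, model.predict(X_train)),
        "test_mse": mean_squared_error(y_test, model.predict(X_test)),
    }
    print(
        f"{name:>6}: train MSE = {results[name]['train_mse']:.3f}, "
        f"test MSE = {results[name]['test_mse']:.3f}"
    )
```

A fully grown tree typically fits the training set almost perfectly while doing noticeably worse on held-out data. The forest usually narrows that gap, although the exact numbers depend on the data and the random seed.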

### Q6. What are the advantages and disadvantages of Random Forest Regressor?
**Advantages**:
- **Reduces overfitting**: By combining the predictions of many trees, Random Forest creates more generalized models compared to individual decision trees.
- **Handles large datasets** well: It can efficiently handle high-dimensional data with many features.
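- **Can accommodate missing values**: Some implementations handle missing data natively. In scikit-learn, native support depends on the library version; where it is unavailable, missing values must be imputed before training.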
  
**Disadvantages**:
- **Slower prediction time**: Since it averages over many trees, it can be slower to make predictions compared to a single decision tree.
- **Less interpretability**: Random Forests are harder to interpret compared to decision trees because of the ensemble nature.

### Q7. What is the output of Random Forest Regressor?
The **output of Random Forest Regressor** is a **continuous numerical value** that represents the predicted target for a given input. This value is the **average of the predictions** made by the individual decision trees in the forest.

### Q8. Can Random Forest Regressor be used for classification tasks?
No, the **Random Forest Regressor** is specifically designed for **regression** tasks, where the output is a continuous value. However, the **Random Forest algorithm** has a classification counterpart known as the **Random Forest Classifier**, which is used for classification tasks. The key difference is that the classifier combines the trees' outputs by voting, while the regressor averages continuous outputs. In scikit-learn, the vote is implemented by averaging the trees' predicted class probabilities and choosing the class with the highest mean probability.
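
The difference in output types can be seen by fitting both estimators on the same features. In this sketch the dataset and the continuous target `y_cont` are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic features with a binary label, assumed for illustration
X, y_class = make_classification(n_samples=200, n_features=6, random_state=0)
# Hypothetical continuous target derived from the first two features
y_cont = 2.0 * X[:, 0] + X[:, 1]

clf = RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y_class)
reg = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y_cont)

samples = X[:5]
class_pred = clf.predict(samples)    # discrete class labels
proba = clf.predict_proba(samples)   # mean class probabilities across trees
value_pred = reg.predict(samples)    # continuous values

print("Classifier labels:       ", class_pred)
print("Classifier probabilities:", proba.round(2).tolist())
print("Regressor values:        ", value_pred.round(2))
```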

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Random Forest Regressor model
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
rf_regressor.fit(X_train, y_train)

# Make predictions
y_pred = rf_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error of the Random Forest Regressor: {mse:.2f}")
```

Output:

```
Mean Squared Error of the Random Forest Regressor: 0.26
```


Explanation:
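- The California Housing dataset is used because its target, the median house value, is continuous, which makes it suitable for regression.
- The data is split 70/30 into training and test sets, a forest of 100 trees is fit on the training set, and the mean squared error is computed on the test set.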
