#Q1. What is a Random Forest Regressor?

Random Forest Regressor is an ensemble learning method that builds many decision trees at training time, each on a bootstrap sample of the training data, and outputs the average of the individual trees' predictions. Averaging many trees generally gives a more accurate and stable prediction than any single tree.

In [1]:
#1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate some random data for demonstration
np.random.seed(42)
X = np.random.rand(100, 1)
y = 4 * (X.squeeze()) + np.random.randn(100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

# Fit the model to the training data
rf_regressor.fit(X_train, y_train)

# Make predictions on the test data
predictions = rf_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.7741987430982417


#Q2 How does Random Forest Regressor reduce the risk of overfitting? 

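Random Forest Regressor reduces overfitting by averaging the predictions of many decision trees, each trained on a different bootstrap sample of the training data and, optionally, restricted to a random subset of features at each split. Individual trees overfit in different ways, so averaging cancels much of their variance and the ensemble generalizes better to new, unseen data. Limiting tree complexity, for example with max_depth as in the cell below, can reduce overfitting further.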


In [2]:
#2
# Using the same data as before
# ...

# Create a Random Forest Regressor with max_depth to control overfitting
rf_regressor = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)

# Fit the model to the training data
rf_regressor.fit(X_train, y_train)

# Make predictions on the test data
predictions = rf_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.6834414059910386
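
The bootstrap sampling behind this can be made visible with out-of-bag (OOB) scoring: each training sample is scored only by the trees that did not see it. The following is a minimal sketch, not part of the original notebook; the names X_demo, y_demo and rf_oob and the choice of 200 samples and 200 trees are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data in the same style as the earlier cells (assumed for this sketch)
rng = np.random.RandomState(42)
X_demo = rng.rand(200, 1)
y_demo = 4 * X_demo.squeeze() + rng.randn(200)

# bootstrap=True (the default) trains each tree on a resampled copy of the data;
# oob_score=True scores every sample using only the trees that never saw it
rf_oob = RandomForestRegressor(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=42
)
rf_oob.fit(X_demo, y_demo)

train_r2 = rf_oob.score(X_demo, y_demo)  # R^2 on data the trees were fitted to
oob_r2 = rf_oob.oob_score_               # R^2 estimated from held-out (OOB) samples

print(f"R^2 on training data: {train_r2:.3f}")
print(f"Out-of-bag R^2:       {oob_r2:.3f}")
```

The gap between the two scores gives a rough sense of how much the forest still fits noise in the training data; the OOB score is usually the more honest estimate of performance on unseen data.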


#Q3 How does Random Forest Regressor aggregate the predictions of multiple decision trees?

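Random Forest Regressor aggregates predictions by taking the mean of the individual trees' outputs. For classification, a forest instead uses majority voting or averages the trees' class probabilities; scikit-learn's RandomForestClassifier does the latter.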

In [3]:
#3
# Using the same data as before
# ...

# Create a Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

# Fit the model to the training data
rf_regressor.fit(X_train, y_train)

# Access individual tree predictions (for the first tree)
tree_predictions = rf_regressor.estimators_[0].predict(X_test)

# Average predictions across all trees
average_predictions = np.mean([tree.predict(X_test) for tree in rf_regressor.estimators_], axis=0)

# Evaluate the model using average predictions
mse = mean_squared_error(y_test, average_predictions)
print(f"Mean Squared Error with Aggregated Predictions: {mse}")

Mean Squared Error with Aggregated Predictions: 0.7741987430982417


#Q4 What are the hyperparameters of Random Forest Regressor? 

Some key hyperparameters of RandomForestRegressor include:

n_estimators: Number of trees in the forest.

max_depth: Maximum depth of the trees.

min_samples_split: Minimum number of samples required to split an internal node.

min_samples_leaf: Minimum number of samples required to be at a leaf node.

In [5]:
#4
# Using the same data as before
# ...

# Create a Random Forest Regressor with specified hyperparameters
rf_regressor = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=42
)

# Fit the model to the training data
rf_regressor.fit(X_train, y_train)

# Make predictions on the test data
predictions = rf_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.7704437570952613
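
Rather than fixing these values by hand, they are often tuned with cross-validation. The sketch below uses scikit-learn's GridSearchCV on data generated the same way as above; the grid values, the 3-fold split and the names X_demo, y_demo, param_grid and search are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data in the same style as the earlier cells (assumed for this sketch)
rng = np.random.RandomState(42)
X_demo = rng.rand(100, 1)
y_demo = 4 * X_demo.squeeze() + rng.randn(100)

# Small, illustrative grid over two of the hyperparameters listed above
param_grid = {
    "max_depth": [2, 5, None],
    "min_samples_leaf": [1, 5],
}

# 3-fold cross-validation, scored by negative MSE (higher is better)
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```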


#Q5 What is the difference between Random Forest Regressor and Decision Tree Regressor?

The main differences between Random Forest Regressor and Decision Tree Regressor are:

Ensemble vs. Single Tree: Random Forest is an ensemble of multiple decision trees, while Decision Tree Regressor consists of a single decision tree.

Overfitting: Random Forest is less prone to overfitting compared to a single Decision Tree, as it averages predictions across multiple trees.
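
The two points above can be checked side by side. This is a rough sketch on synthetic data of the same form as in Q1; the sample size, the split and the names X_demo, y_demo, tree, forest and results are assumptions made for illustration. A fully grown single tree should fit the training set almost perfectly while doing worse on held-out data than the forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy linear data (assumed for this sketch)
rng = np.random.RandomState(42)
X_demo = rng.rand(300, 1)
y_demo = 4 * X_demo.squeeze() + rng.randn(300)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# One unpruned tree versus an ensemble of 100 trees
tree = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Collect (train MSE, test MSE) for each model
results = {}
for name, model in [("Decision tree", tree), ("Random forest", forest)]:
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    results[name] = (train_mse, test_mse)
    print(f"{name}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```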


#Q6 What are the advantages and disadvantages of Random Forest Regressor?

Advantages:

Effective for both regression and classification tasks.

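Less prone to overfitting than a single tree, because predictions are averaged across the ensemble.

Can tolerate missing values in some implementations; scikit-learn's forests accept NaN inputs natively only in recent releases, so imputation may be needed beforehand.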

Provides feature importance.

Disadvantages:

Can be computationally expensive to train and to query, since cost grows with the number of trees.

Harder to interpret compared to a single decision tree.
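
As an illustration of the feature importance point, a fitted forest in scikit-learn exposes a feature_importances_ attribute. In the sketch below the three-feature dataset is invented so that only the first column drives the target; X_demo, y_demo and rf_demo are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Three features, but only column 0 influences the target (assumed setup)
rng = np.random.RandomState(42)
X_demo = rng.rand(300, 3)
y_demo = 4 * X_demo[:, 0] + 0.1 * rng.randn(300)

rf_demo = RandomForestRegressor(n_estimators=100, random_state=42)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances, normalized to sum to 1
importances = rf_demo.feature_importances_
for i, importance in enumerate(importances):
    print(f"Feature {i}: {importance:.3f}")
```

These impurity-based scores are a quick diagnostic rather than a full explanation of the model, which is part of why a forest remains harder to interpret than a single tree.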

#Q7 What is the output of Random Forest Regressor?

The output of a Random Forest Regressor is a continuous prediction for each input sample.

In [7]:
#7
# Using the rf_regressor fitted in the previous code cell
# ...

# Make predictions on new data
new_data = np.array([[0.8], [0.2]])
new_predictions = rf_regressor.predict(new_data)

print("Predictions for new data:")
print(new_predictions)

Predictions for new data:
[3.24493307 0.40735797]


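#Q8 Can Random Forest Regressor be used for classification tasks?

RandomForestRegressor itself predicts continuous values, but the Random Forest method applies to classification as well through its counterpart, RandomForestClassifier. The classifier aggregates the votes or class probabilities of the individual trees to make a final prediction.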

In [8]:
#8
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Iris dataset for classification
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model to the training data
rf_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = rf_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

Accuracy: 1.0
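
To see how the classifier aggregates its trees, the forest's class probabilities can be compared with a manual average over the individual trees. This is a small sketch on the same Iris split; the names clf, forest_proba and manual_proba are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Probabilities reported by the forest
forest_proba = clf.predict_proba(X_te)

# Manual aggregation: average the per-tree class probabilities
manual_proba = np.mean(
    [tree.predict_proba(X_te) for tree in clf.estimators_], axis=0
)

print("Forest matches manual average:", np.allclose(forest_proba, manual_proba))
```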
