# Random Forest Regression



Random forest regression is an ensemble algorithm that trains many decision trees on bootstrap samples of the data and averages their predictions. It is useful when the data has many features and non-linear relationships.
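To make the "ensemble" idea concrete, the sketch below fits a small forest on synthetic data and checks that the forest's prediction matches the mean of the individual trees' predictions. The data, the variable names, and the choice of 25 trees are assumptions made only for illustration; they are not part of the dataset used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic non-linear data (illustrative assumption, not the notebook's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

X_new = np.array([[-1.5], [0.0], [2.0]])

# Predict with each fitted tree, then average across trees
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print("Mean of tree predictions:", manual_mean)
print("Forest prediction:       ", forest.predict(X_new))
```

The two printed arrays should agree, since scikit-learn's `RandomForestRegressor` averages the predictions of the trees stored in `estimators_`.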

1. Import the necessary libraries: Import Pandas, NumPy, Matplotlib, and Scikit-learn, which provides the random forest regression model and the evaluation metrics.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

2. Load the dataset: Load the CSV file into a Pandas DataFrame with the read_csv function. The cell below mounts Google Drive in Colab and reads the file from there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
data = pd.read_csv('/content/drive/MyDrive/combank.csv')

3. Preprocess the data: Before training the model, drop the columns that are not needed, convert the date column to the Pandas datetime format, and split the data chronologically: rows before 2015 form the training set, and rows from 2015 onward form the testing set.

In [None]:
# Drop unnecessary columns
data = data.drop(['High', 'Low', 'Open', 'Trades', 'Volume', 'Turnover'], axis=1)

# Convert date column to datetime format
data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m-%d')

# Split data into training and testing datasets
train_data = data[data['Date'].dt.year < 2015]
test_data = data[data['Date'].dt.year >= 2015]

4. Prepare the data for training: Extract the independent variable (the date, converted to a numeric Unix timestamp in seconds) and the dependent variable (the closing price) from the training dataset.

In [None]:
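# Extract independent variable and dependent variable from training dataset.
# Convert dates to Unix seconds portably; strftime('%s') is a platform-specific extension.
train_x = ((train_data['Date'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)).to_numpy().reshape(-1, 1)
train_y = train_data['Close'].to_numpy()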
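The date-to-number conversion can be checked on a few known dates. The sketch below is a minimal example on a hypothetical three-row series (not the notebook's data); it subtracts the Unix epoch and floor-divides by one second, which does not depend on platform-specific `strftime` codes.

```python
import pandas as pd

# Hypothetical dates used only to illustrate the conversion
dates = pd.to_datetime(
    pd.Series(['1970-01-01', '1970-01-02', '2015-01-01']), format='%Y-%m-%d'
)

# Seconds elapsed since the Unix epoch (1970-01-01 00:00:00)
epoch_seconds = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
features = epoch_seconds.to_numpy().reshape(-1, 1)

print(features.shape)
print(features.ravel())
```

One day after the epoch corresponds to 86400 seconds, which gives a quick sanity check on the result.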

5. Train the random forest regression model: Fit a RandomForestRegressor on the training data with its fit method. Here n_estimators=100 sets the number of trees, and random_state=42 makes the result reproducible.

In [None]:
# Create a random forest regression object and fit the training data
rfr = RandomForestRegressor(n_estimators=100, random_state=42)
rfr.fit(train_x, train_y)

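6. Make predictions: Use the trained model's predict method on the testing dataset. Note that decision trees predict piecewise-constant values, so with the date as the only feature, every test date later than the last training date lands in the same leaf of each tree and receives the same prediction.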

In [None]:
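# Extract independent variable and dependent variable from testing dataset.
# Convert dates to Unix seconds portably; strftime('%s') is a platform-specific extension.
test_x = ((test_data['Date'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)).to_numpy().reshape(-1, 1)
test_y = test_data['Close'].to_numpy()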

# Make predictions on testing data
pred_y = rfr.predict(test_x)
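Because the test dates in this walkthrough all come after the training dates, it helps to see how a random forest behaves outside its training range. The sketch below uses a synthetic linear trend (an assumption for illustration, not the notebook's data) and predicts at inputs well beyond anything seen during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic upward trend (illustrative assumption): y = 2 * x for x in 0..99
x_train = np.arange(0, 100, dtype=float).reshape(-1, 1)
y_train = 2.0 * x_train.ravel()

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train)

# Inputs far beyond the largest training value
x_future = np.array([[150.0], [200.0], [1000.0]])
future_pred = model.predict(x_future)

print("Predictions beyond the training range:", future_pred)
print("Largest training target:", y_train.max())
```

All three predictions should be identical and no larger than the biggest training target, because each tree routes every out-of-range input to the same leaf. For a trending series, features such as lagged prices may be worth considering alongside or instead of the raw date.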

7. Evaluate the model: Finally, evaluate the model's performance on the testing dataset using the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared (R2) score.

In [None]:
# Calculate evaluation metrics
mse = mean_squared_error(test_y, pred_y)
rmse = np.sqrt(mse)
mae = mean_absolute_error(test_y, pred_y)
r2 = r2_score(test_y, pred_y)

# Print evaluation metrics
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"Mean Absolute Error: {mae}")
print(f"R-squared: {r2}")
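To show what these metrics measure, the sketch below computes each one by hand with NumPy on a small made-up pair of arrays and compares the results with scikit-learn's functions. The values in `y_true` and `y_hat` are hypothetical and unrelated to the stock data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

residuals = y_true - y_hat

# MSE: mean of squared residuals; RMSE: its square root (same units as the target)
mse_manual = np.mean(residuals ** 2)
rmse_manual = np.sqrt(mse_manual)

# MAE: mean of absolute residuals
mae_manual = np.mean(np.abs(residuals))

# R2: 1 minus (residual sum of squares / total sum of squares around the mean)
r2_manual = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse_manual, mean_squared_error(y_true, y_hat))
print(mae_manual, mean_absolute_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

An R2 of 1 means perfect predictions, 0 means no better than always predicting the mean of the actual values, and negative values mean worse than that baseline.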