# Random Forest Regressor Model

This notebook trains a Random Forest Regressor to predict, from a chest X-ray, whether a person has COVID-19, has pneumonia, or is in a normal state. The chest X-ray images have been preprocessed, and the extracted features are stored in .mat files. We load these files, build the Random Forest Regressor model, and print several error values: the mean absolute error, the mean squared error, and the root mean squared error. Because the model is a regressor, the class labels in the last column of each file are treated as numeric targets.

In [1]:
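# Importing the required packages
# utils is assumed to be a project-local helper module; the explicit imports
# below cover the names used in this notebook in case utils does not re-export them
from utils import *
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.io as sio
from sklearn import metrics
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.ensemble import RandomForestRegressor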

In [2]:
# Setting the working directory that contains the .mat files
source_dir='./'

In [3]:
# Accessing covid.mat file and getting the data from the file
covid_features=sio.loadmat(os.path.join(source_dir,'covid.mat')) 
covid_features=covid_features['covid'] 
# Accessing normal.mat file and getting the data from the file
normal_features=sio.loadmat(os.path.join(source_dir,'normal.mat')) 
normal_features=normal_features['normal']  
# Accessing pneumonia.mat file and getting the data from the file
pneumonia_features=sio.loadmat(os.path.join(source_dir,'pneumonia.mat')) 
pneumonia_features=pneumonia_features['pneumonia']  

In [4]:
# Extracting the input features (every column except the last) and storing them in X
X=np.concatenate((covid_features[:,:-1],normal_features[:,:-1],pneumonia_features[:,:-1]), axis=0)
# Extracting the target labels, the last column alone
y=np.concatenate((covid_features[:,-1],normal_features[:,-1],pneumonia_features[:,-1]), axis=0)

In [5]:
# Normalization of the data between 0 and 1
min_max_scaler=MinMaxScaler()
X = min_max_scaler.fit_transform(X)
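
As a minimal sketch of what the normalization step computes, the example below applies min-max scaling by hand to a small made-up matrix and compares the result with `MinMaxScaler`. The array `X_demo` and its values are illustrative assumptions, not data from the .mat files.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up matrix: 3 samples, 2 features (illustrative values only)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [4.0, 40.0]])

# Manual min-max scaling, column by column: (x - min) / (max - min)
col_min = X_demo.min(axis=0)
col_max = X_demo.max(axis=0)
X_manual = (X_demo - col_min) / (col_max - col_min)

# The same transformation done by scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X_demo)

# Both versions should agree, with every column spanning [0, 1]
print(np.allclose(X_manual, X_sklearn))
```

Each feature is scaled independently, so a feature with a large raw range does not dominate the later steps.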

In [6]:
# We use Kernel PCA with a linear kernel to reduce the input data to 64 components
transformer = KernelPCA(n_components=64, kernel='linear')
X = transformer.fit_transform(X)
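
With `kernel='linear'`, Kernel PCA should produce the same projection as ordinary PCA, up to the sign of each component. The sketch below checks this on random data; `X_demo`, its shape, and the choice of 3 components are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

# Random stand-in data: 50 samples, 10 features (not the X-ray features)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))

# Project onto 3 components with both methods
Z_kpca = KernelPCA(n_components=3, kernel='linear').fit_transform(X_demo)
Z_pca = PCA(n_components=3).fit_transform(X_demo)

# Component signs are arbitrary, so compare absolute values
same_up_to_sign = np.allclose(np.abs(Z_kpca), np.abs(Z_pca), atol=1e-6)
print(same_up_to_sign)
```

A non-linear kernel such as `'rbf'` would be needed for Kernel PCA to capture structure that plain PCA cannot.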

In [7]:
# We first split the data set into 80% training data and 20% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# From that 80%, we hold out 25% as a validation set, which gives roughly 60% train, 20% validation and 20% test overall
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

In [8]:
# Printing the shapes of the training and test sets
print("Size of train data:", np.shape(X_train))
print("Size of train label:", np.shape(y_train))
print("Size of test data:", np.shape(X_test))
print("Size of test label:", np.shape(y_test))

Size of train data: (225, 64)
Size of train label: (225,)
Size of test data: (76, 64)
Size of test label: (76,)


In [9]:
# Creating the Random Forest Regressor Object that will create 20 trees
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
# Training the model
regressor.fit(X_train, y_train)
# Predicting on the test set
y_pred = regressor.predict(X_test)
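
The validation split created earlier is not used by the cell above. One possible use, sketched here under stated assumptions, is to compare a few values of `n_estimators` on the validation set before touching the test set. The synthetic data, the label encoding 0/1/2, and the candidate tree counts are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features: 3 classes encoded as 0, 1, 2 (assumed)
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=300).astype(float)
X_demo = rng.normal(size=(300, 8)) + y_demo[:, None]

# Same two-stage split as in the notebook: 60% train, 20% validation, 20% test
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X_tr, y_tr, test_size=0.25, random_state=1)

# Fit one forest per candidate size and record its validation MSE
val_mse = {}
for n_trees in (5, 20, 50):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_tr, y_tr)
    val_mse[n_trees] = mean_squared_error(y_va, model.predict(X_va))

# Pick the candidate with the lowest validation error
best_n = min(val_mse, key=val_mse.get)
print(val_mse, best_n)
```

Selecting the hyperparameter on the validation set keeps the test set as an untouched estimate of final performance.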

In [10]:
# Printing the regression error metrics on the test set
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Mean Absolute Error: 0.22236842105263155
Mean Squared Error: 0.10585526315789473
Root Mean Squared Error: 0.3253540581549502
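
Since the underlying task is a three-way classification, the continuous regressor outputs can also be mapped back to class labels and scored with classification metrics. The sketch below rounds each prediction to the nearest label. It assumes the classes are encoded as the integers 0, 1 and 2, which the notebook does not state, and the arrays are made-up values rather than the notebook's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and continuous regressor outputs
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_cont_demo = np.array([0.1, 0.6, 0.9, 1.4, 1.8, 2.3])

# Round to the nearest integer and clip to the assumed label range 0..2
y_class_demo = np.clip(np.rint(y_cont_demo), 0, 2).astype(int)

# Score the rounded predictions as a classifier
acc_demo = accuracy_score(y_true_demo, y_class_demo)
cm_demo = confusion_matrix(y_true_demo, y_class_demo, labels=[0, 1, 2])
print(acc_demo)
print(cm_demo)
```

Rounding treats the labels as ordered, so a classifier such as `RandomForestClassifier` may be a more natural fit when the classes have no numeric ordering.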
