Student: Joel S. Mollel

Number: C00313599

Algorithm: Random Forest

Given the provided Random Forest regression code, we are required to:

i) make sure it runs

ii) change some hyperparameters and observe the impact

iii) use another dataset, perform other operations, and simulate the model as an app

i) The provided code ran once the appropriate libraries were installed.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

def model(x, sigma=0.3):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))

    return slow_oscillation + fast_oscillation + noise

y = model(x)
plt.errorbar(x, y, 0.3, fmt='o');
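The cell above only generates and plots the noisy data; the regression step itself does not appear in this write-up. A minimal sketch of fitting scikit-learn's RandomForestRegressor to the same x and y could look like the following. The choice of 200 trees, the dense prediction grid `xfit`, and the comparison with the noise-free curve are assumptions made for illustration, not part of the provided code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

def model(x, sigma=0.3):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))
    return slow_oscillation + fast_oscillation + noise

y = model(x)

# scikit-learn expects a 2-D feature matrix, so x is reshaped to (n_samples, 1)
forest = RandomForestRegressor(n_estimators=200, random_state=0)  # 200 trees: illustrative choice
forest.fit(x[:, None], y)

# Predict on a dense grid and compare with the noise-free curve (sigma=0)
xfit = np.linspace(0, 10, 1000)
yfit = forest.predict(xfit[:, None])
ytrue = model(xfit, sigma=0)

plt.errorbar(x, y, 0.3, fmt='o', alpha=0.5)
plt.plot(xfit, yfit, '-r', label='random forest')
plt.plot(xfit, ytrue, '-k', alpha=0.5, label='noise-free model')
plt.legend()
plt.show()
```

The red curve should follow both the slow and the fast oscillation, with small wiggles where the forest partly fits the noise.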

ii) Changing some hyperparameters and observing the impact

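The parameter varied here is sigma, the standard deviation of the Gaussian noise added to the model output. Strictly speaking, sigma is a parameter of the data-generating function rather than a hyperparameter of the Random Forest itself. The higher the sigma, the more scattered the data points are around the underlying curve.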
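Because the noise term in model() is sigma * rng.randn(...), that is, standard normal draws scaled by sigma, the spread of the points around the noise-free curve should be close to sigma. A quick numerical check, assuming a helper `noise_free` (a name introduced here for illustration) that holds the deterministic part of model():

```python
import numpy as np

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

def noise_free(x):
    # Deterministic part of model(): slow plus fast oscillation, no noise
    return np.sin(0.5 * x) + np.sin(5 * x)

estimates = {}
for sigma in [0.1, 0.5, 1.0]:
    y = noise_free(x) + sigma * rng.randn(len(x))  # same noise term as model()
    # The residuals are exactly the noise, so their std should be close to sigma
    estimates[sigma] = np.std(y - noise_free(x))
    print(f"sigma={sigma}: standard deviation of residuals = {estimates[sigma]:.3f}")
```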


(a) Hardcoding sigma values: we expect the points to scatter more as sigma grows, and less as it shrinks. Let us check this with three values: 0.1, 0.5 and 1.0.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)
# Define the model function with sigma as a hyperparameter
def model(x, sigma=0.3):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))
    return slow_oscillation + fast_oscillation + noise

# Test with different sigma values
sigma_values = [0.1, 0.5, 1.0]

# Plot results for different sigma values
for sigma in sigma_values:
    y = model(x, sigma=sigma)
    plt.figure()
    plt.errorbar(x, y, yerr=sigma, fmt='o', label=f'sigma={sigma}')
    plt.title(f'Model with Noise Level (sigma={sigma})')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()
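The plots show the scatter growing with sigma. To see the impact on the Random Forest itself, one could fit a regressor for each sigma and score it on held-out points. The sketch below assumes a 70/30 split and the R^2 score (sklearn.metrics.r2_score); neither choice comes from the provided code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

def model(x, sigma=0.3):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))
    return slow_oscillation + fast_oscillation + noise

scores = {}
for sigma in [0.1, 0.5, 1.0]:
    y = model(x, sigma=sigma)
    x_train, x_test, y_train, y_test = train_test_split(
        x[:, None], y, test_size=0.3, random_state=42)
    forest = RandomForestRegressor(n_estimators=100, random_state=42)
    forest.fit(x_train, y_train)
    # R^2 on held-out points: 1.0 is a perfect fit, lower is worse
    scores[sigma] = r2_score(y_test, forest.predict(x_test))
    print(f"sigma={sigma}: test R^2 = {scores[sigma]:.3f}")
```

The forest's own hyperparameters, such as n_estimators, max_depth or min_samples_leaf, can be varied inside the same loop in the same way.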


(b) Prompting user input for more flexible tests

You can test different sigma values by entering a value at the prompt below.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def get_sigma():
    while True:
        try:
            sigma = float(input("Enter the sigma value for noise: "))
            return sigma
        except ValueError:
            print("Invalid input. Please enter a numeric value.")

def model(x, sigma):
    fast_oscillation = np.sin(5 * x)
    slow_oscillation = np.sin(0.5 * x)
    noise = sigma * rng.randn(len(x))
    return slow_oscillation + fast_oscillation + noise

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

# Let us trigger user input
sigma = get_sigma()

# Output generation
y = model(x, sigma)

# We can visualize the results as follows
plt.errorbar(x, y, yerr=abs(sigma), fmt='o')  # error bars show the noise level the user entered
plt.title(f'Model with sigma = {sigma}')
plt.xlabel('x')
plt.ylabel('y')
plt.show()


iii) Using another dataset, performing other operations and simulating an app

The project uses drug200.csv, a labelled dataset with several features. Based on a patient's Age, Sex, BP (blood pressure), Cholesterol and Na_to_K (sodium-to-potassium ratio), a drug is prescribed. All the patients suffer from the same disease, but the prescribed drug differs depending on the values of these features for each patient.

The model is trained on 70% of the dataset and tested on the remaining 30%, and its performance is evaluated by accuracy.
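Accuracy is the fraction of test samples whose predicted label equals the true label: correct predictions divided by total predictions. A small check, using made-up encoded labels rather than anything from drug200.csv, that this formula agrees with scikit-learn's accuracy_score:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up encoded labels, for illustration only
y_true = np.array([0, 1, 2, 2, 4, 3, 0, 0])
y_pred = np.array([0, 1, 1, 2, 4, 3, 0, 2])

manual = np.mean(y_true == y_pred)  # correct predictions / total predictions
print(manual, accuracy_score(y_true, y_pred))
```

Here 6 of the 8 predictions match, so both values are 0.75.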

==> We start by creating, training and testing a Random Forest classifier

a) Importing the required libraries and the dataset

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv('drug200.csv')


b) Encoding the categorical features, since the model works with numbers

In [None]:
df['Sex'] = df['Sex'].map({'F': 0, 'M': 1})  
df['BP'] = df['BP'].map({'HIGH': 0, 'LOW': 1, 'NORMAL': 2})  
df['Cholesterol'] = df['Cholesterol'].map({'HIGH': 0, 'NORMAL': 1}) 
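Two details of the mapping above may be worth checking. First, pandas Series.map returns NaN for any value that is not a key in the dictionary, so an unexpected category in the CSV would pass silently. Second, the codes HIGH=0, LOW=1, NORMAL=2 do not follow the natural order of blood-pressure levels. A tree splits on thresholds such as BP <= 0.5, so an ordered coding (LOW=0, NORMAL=1, HIGH=2) may let a single split separate low from high levels. The sketch below assumes a hypothetical helper, encode_features, with that ordered coding; the example rows are made up, not taken from drug200.csv. If this coding were adopted, the app's encoding would have to use the same codes.

```python
import pandas as pd

# Ordered codes: an alternative to the mapping above, chosen for illustration
ENCODINGS = {
    'Sex': {'F': 0, 'M': 1},
    'BP': {'LOW': 0, 'NORMAL': 1, 'HIGH': 2},
    'Cholesterol': {'NORMAL': 0, 'HIGH': 1},
}

def encode_features(df):
    """Return a copy of df with categorical columns mapped to integer codes."""
    encoded = df.copy()
    for column, mapping in ENCODINGS.items():
        encoded[column] = encoded[column].map(mapping)
        # .map leaves NaN wherever a value had no entry in the mapping
        unmapped = encoded[column].isna()
        if unmapped.any():
            bad = sorted(df.loc[unmapped, column].astype(str).unique())
            raise ValueError(f"Unexpected values in {column}: {bad}")
    return encoded

# Made-up example rows, for illustration only
sample = pd.DataFrame({
    'Age': [23, 47], 'Sex': ['F', 'M'], 'BP': ['HIGH', 'LOW'],
    'Cholesterol': ['HIGH', 'NORMAL'], 'Na_to_K': [25.355, 13.093],
})
print(encode_features(sample))
```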

c) Preparing the features and the target label

In [None]:
X = df[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']]
y = df['Drug']

d) Encoding the target variable, Drug

In [None]:
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
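LabelEncoder assigns integer codes in sorted order of the distinct labels and stores that order in encoder.classes_, which is what lets inverse_transform turn predicted codes back into drug names. A small illustration; the label strings below are assumed to resemble those in the Drug column and should be checked against df['Drug'].unique():

```python
from sklearn.preprocessing import LabelEncoder

# Assumed example labels; sorting is by string, so uppercase 'DrugY' sorts before 'drugA'
labels = ['DrugY', 'drugC', 'drugX', 'DrugY', 'drugA', 'drugB']

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(list(encoder.classes_))           # the sorted distinct labels
print(codes)                            # one integer code per input label
print(encoder.inverse_transform([0, 3]))
```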

e) Splitting the data into training and testing sets (70% and 30%, respectively)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3, random_state=42)
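With only 200 rows, a purely random 70/30 split can leave a rarer drug class under-represented in the 60 test rows. Passing stratify to train_test_split keeps the class proportions the same in both parts. A sketch with made-up, imbalanced labels (140 of class 0 and 60 of class 1) rather than the real Drug column:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 200 samples, 70% class 0 and 30% class 1
X_demo = np.arange(200).reshape(-1, 1)
y_demo = np.array([0] * 140 + [1] * 60)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)

print(len(y_tr), len(y_te))  # 140 training rows, 60 test rows
print(np.bincount(y_te))     # class counts in the test set
```

For the notebook above, the equivalent call would add stratify=y_encoded to the existing train_test_split line.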

f) Creating and training the Random Forest classifier

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
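Once fitted, a Random Forest exposes feature_importances_, the impurity-based importances normalised to sum to 1, which can indicate which patient features drive the prediction. For the notebook above this would be pd.Series(model.feature_importances_, index=X.columns). Because drug200.csv is not bundled here, the runnable sketch below uses synthetic data from make_classification as a stand-in; the column names are reused only as labels, so the resulting importances say nothing about the real drug data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']

# Synthetic stand-in for the drug data: 200 rows, 5 features, 3 classes
X_syn, y_syn = make_classification(n_samples=200, n_features=5, n_informative=3,
                                   n_redundant=0, n_classes=3, random_state=42)
X_syn = pd.DataFrame(X_syn, columns=feature_names)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_syn, y_syn)

# One importance per feature, highest first
importances = pd.Series(forest.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```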

g) Running predictions and evaluating model performance

In [None]:
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
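A single 60-row test set gives a fairly noisy accuracy estimate. k-fold cross-validation with cross_val_score averages the score over several splits; for the notebook above the call would be cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42), X, y_encoded, cv=5). The runnable sketch below again uses synthetic stand-in data instead of drug200.csv:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for drug200.csv (200 rows, 5 features, 3 classes)
X_syn, y_syn = make_classification(n_samples=200, n_features=5, n_informative=3,
                                   n_redundant=0, n_classes=3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
# With an integer cv and a classifier, scikit-learn uses stratified k-fold splits
scores = cross_val_score(clf, X_syn, y_syn, cv=5, scoring='accuracy')

print(scores)
print(f"Mean accuracy: {scores.mean():.4f} (std {scores.std():.4f})")
```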

==> Prompting for user input to allow flexible feature values

This program is similar to the one above, but it prompts the user for the patient's details, so it is explained in a few combined steps.

Step 1: Loading the libraries and the dataset

In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
df = pd.read_csv('drug200.csv')

Step 2: Preparing and encoding the categorical features and the target

In [None]:
df['Sex'] = df['Sex'].map({'F': 0, 'M': 1})  
df['BP'] = df['BP'].map({'HIGH': 0, 'LOW': 1, 'NORMAL': 2}) 
df['Cholesterol'] = df['Cholesterol'].map({'HIGH': 0, 'NORMAL': 1})  

X = df[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']]
y = df['Drug']


encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

Step 3: Creating and training the model, then classifying the patient into a target label (Drug) based on user input

In [None]:
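model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y_encoded)  # the app uses the full dataset for training

def ask_choice(prompt, options):
    # Re-prompt until the answer is one of the allowed options (case-insensitive)
    while True:
        answer = input(prompt).strip().upper()
        if answer in options:
            return answer
        print(f"Invalid input. Please enter one of: {', '.join(options)}")

def ask_number(prompt, cast):
    # Re-prompt until the answer can be converted with cast (int or float)
    while True:
        try:
            return cast(input(prompt))
        except ValueError:
            print("Invalid input. Please enter a numeric value.")

print("Please provide the following details of your patient to predict the drug:")
age = ask_number("Age (e.g., 23): ", int)
sex = ask_choice("Sex (F/M): ", ['F', 'M'])
bp = ask_choice("Blood Pressure (HIGH/LOW/NORMAL): ", ['HIGH', 'LOW', 'NORMAL'])
cholesterol = ask_choice("Cholesterol (HIGH/NORMAL): ", ['HIGH', 'NORMAL'])
na_to_k = ask_number("Sodium to Potassium ratio (Na_to_K, e.g., 25.355): ", float)

# Use exactly the same encodings as in Step 2
sex_encoded = {'F': 0, 'M': 1}[sex]
bp_encoded = {'HIGH': 0, 'LOW': 1, 'NORMAL': 2}[bp]
cholesterol_encoded = {'HIGH': 0, 'NORMAL': 1}[cholesterol]

user_input = pd.DataFrame([[age, sex_encoded, bp_encoded, cholesterol_encoded, na_to_k]],
                          columns=['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K'])

predicted_label = model.predict(user_input)
predicted_drug = encoder.inverse_transform(predicted_label)
print("*" * 40)
print(f"The predicted drug for this patient based on the given details is: {predicted_drug[0]}")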
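The app prints only the most likely drug. RandomForestClassifier also provides predict_proba, the class probabilities averaged over the trees, which can show how confident the forest is. A sketch of a hypothetical helper, predict_with_confidence, that returns the drug name together with its probability; in the app it would be called as predict_with_confidence(model, encoder, user_input). Here it is exercised on random synthetic data with assumed label names, not on the trained drug model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

def predict_with_confidence(model, encoder, row):
    """Return (label, probability) for a single-row DataFrame."""
    proba = model.predict_proba(row)[0]  # one probability per column of model.classes_
    best = int(np.argmax(proba))
    label = encoder.inverse_transform([model.classes_[best]])[0]
    return label, float(proba[best])

# Synthetic stand-in data with assumed drug names, for illustration only
columns = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']
rng = np.random.RandomState(0)
X_syn = pd.DataFrame(rng.rand(60, 5), columns=columns)
labels = np.array(['drugA', 'drugB', 'drugX'])[rng.randint(0, 3, 60)]

encoder = LabelEncoder()
y_syn = encoder.fit_transform(labels)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_syn, y_syn)

drug, confidence = predict_with_confidence(forest, encoder, X_syn.iloc[[0]])
print(f"Predicted: {drug} (probability {confidence:.2f})")
```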

