# Random Forest Model for Predicting Wq

This notebook provides a detailed walkthrough of building, training, and evaluating a Random Forest model to predict the queue waiting time \(W_q\) from several input features. The explanations are designed to be comprehensive for university-level research.

## 1. Introduction and Objective

The goal of this model is to predict the waiting time in the queue, \(W_q\), using a Random Forest Regressor. The input features are:
- \(\lambda\): Arrival rate
- \(L_q\): Average number of customers in the queue
- \(s\): Number of servers
- \(\mu\): Service rate
- \(\rho\): Utilization factor

We will preprocess the data, build the Random Forest model, train it, and evaluate its performance.
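
The features listed above are not independent in classical queueing theory. In a standard steady-state multi-server model, \(\rho = \lambda / (s\mu)\), and Little's law gives \(L_q = \lambda W_q\), so \(W_q = L_q / \lambda\). The notebook does not say how the dataset was generated, so whether these identities hold for it is an assumption. If they do, the closed form is a useful baseline to compare the Random Forest against. A minimal sketch, with hypothetical helper names and made-up values:

```python
import numpy as np

def littles_law_wq(lam, lq):
    """Waiting time implied by Little's law: Wq = Lq / lambda (hypothetical helper)."""
    return np.asarray(lq, dtype=float) / np.asarray(lam, dtype=float)

def utilization(lam, s, mu):
    """Utilization of a multi-server queue: rho = lambda / (s * mu) (hypothetical helper)."""
    return np.asarray(lam, dtype=float) / (
        np.asarray(s, dtype=float) * np.asarray(mu, dtype=float)
    )

# Made-up illustrative values, not taken from the dataset
lam = np.array([2.0, 4.0, 9.0])
lq = np.array([0.5, 3.2, 7.2])

wq_baseline = littles_law_wq(lam, lq)
rho = utilization(lam, s=[1, 2, 3], mu=[4.0, 2.5, 3.5])

print("Wq baseline:", wq_baseline)
print("rho:", rho)
```

On the real data, the same check would compare `data['Lq'] / data['lambda']` with `data['Wq']`; a near-perfect match would mean the model is learning a relationship that is already known in closed form.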

## 2. Data Loading and Preprocessing

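We start by loading the dataset and selecting the input features and the target variable. The features are then standardized with `StandardScaler` (zero mean, unit variance). Random Forests split on feature thresholds and are largely insensitive to this kind of rescaling, so the step is not expected to change accuracy; it mainly keeps the inputs compatible with scale-sensitive models that might be compared later.

We also split the data into training (70%) and testing (30%) sets so that the model can be evaluated on samples it has not seen.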

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load data
data = pd.read_csv('../dataset/dataset.csv')
features = ['lambda', 'Lq', 's', 'mu', 'rho']
X = data[features]
y = data['Wq']

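# Split data first (70% train, 30% test) so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training set only, then apply it to both sets;
# fitting on all rows would leak test-set statistics into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)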

print(f"Training samples: {X_train.shape[0]}, Testing samples: {X_test.shape[0]}")
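
One way to keep preprocessing and model together is scikit-learn's `Pipeline`, which fits the scaler only on the data passed to `fit` and reuses those statistics at prediction time. The sketch below is not the notebook's code: it uses synthetic data as a stand-in for `dataset.csv`, and the target `y_demo` is a hypothetical function of the first two columns chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five features (assumption, not the real dataset)
rng = np.random.default_rng(42)
X_demo = rng.uniform(1.0, 10.0, size=(200, 5))
y_demo = X_demo[:, 1] / X_demo[:, 0]  # hypothetical target for illustration

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The pipeline fits the scaler on X_tr only, then trains the forest
pipe = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=20, random_state=42),
)
pipe.fit(X_tr, y_tr)

# The fitted scaler's means equal the training-set means, not the full-data means
fitted_scaler = pipe.named_steps["standardscaler"]
print("Scaler means:", fitted_scaler.mean_)
print(f"Test R^2: {pipe.score(X_te, y_te):.4f}")
```

Bundling the steps this way also makes the model safe to pass to cross-validation utilities, since the scaler is refit inside each fold.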

## 3. Model Building and Training

We build a Random Forest Regressor with 100 trees. The model is trained on the training data to learn the relationship between the input features and the target waiting time.

In [None]:
from sklearn.ensemble import RandomForestRegressor

# Build model
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train model
model.fit(X_train, y_train)
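
Because each tree in a Random Forest is trained on a bootstrap sample, the rows left out of a tree's sample can serve as a built-in validation set. Setting `oob_score=True` asks scikit-learn to compute this out-of-bag estimate, which for a regressor is an \(R^2\) value. The sketch below illustrates the idea on synthetic data (an assumption standing in for the real dataset); `oob_model` is a hypothetical name, not a variable from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption, not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 10.0, size=(300, 5))
y_demo = X_demo[:, 1] / X_demo[:, 0]  # hypothetical target for illustration

# Same settings as the notebook's model, plus the out-of-bag estimate
oob_model = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)
oob_model.fit(X_demo, y_demo)

# oob_score_ is R^2 computed from predictions by trees that did not see each row
print(f"OOB R^2: {oob_model.oob_score_:.4f}")
```

An out-of-bag score well below the training score is an early hint of overfitting, available before the test set is touched.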

## 4. Evaluation Metrics and Interpretation

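We evaluate the model using Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) on both the training and testing sets. MAE is the average magnitude of the prediction errors, in the same units as \(W_q\). MSE squares each error and therefore weights large errors more heavily; RMSE is its square root, which brings the value back to the units of \(W_q\). A training error far below the testing error suggests overfitting.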

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Make predictions
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate metrics
train_mae = mean_absolute_error(y_train, y_train_pred)
train_mse = mean_squared_error(y_train, y_train_pred)
train_rmse = np.sqrt(train_mse)

test_mae = mean_absolute_error(y_test, y_test_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
test_rmse = np.sqrt(test_mse)

print("RANDOM FOREST PERFORMANCE")
print(f"Training: MAE: {train_mae:.4f}, MSE: {train_mse:.4f}, RMSE: {train_rmse:.4f}")
print(f"Testing:  MAE: {test_mae:.4f}, MSE: {test_mse:.4f}, RMSE: {test_rmse:.4f}")
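
Error values are easier to judge next to a reference point. The sketch below adds the coefficient of determination \(R^2\) and compares a set of predictions with a trivial baseline that always predicts the mean. The function name `regression_report` and the small arrays are hypothetical and only illustrate the calculation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Collect MAE, MSE, RMSE, and R^2 in one dictionary (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_true, y_pred),
    }

# Made-up values for illustration only
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.5, 2.0, 2.5, 4.0])

report = regression_report(y_true_demo, y_pred_demo)

# Baseline: always predict the mean of the observed values
baseline_pred = np.full_like(y_true_demo, y_true_demo.mean())
baseline = regression_report(y_true_demo, baseline_pred)

print("Model:   ", report)
print("Baseline:", baseline)
```

In the notebook, the same helper could be called with `(y_train, y_train_pred)` and `(y_test, y_test_pred)`; the mean baseline always has an \(R^2\) of 0 on the data its mean was computed from.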

## 5. Visualization and Analysis

We plot actual and predicted \(W_q\) against three input features (\(\lambda\), \(L_q\), and \(\rho\)) for both the training and testing sets. Where the predicted points overlap the actual points, the model reproduces the relationship between that feature and the target; visible gaps show where it does not.

In [None]:
import matplotlib.pyplot as plt

# Use the first 10 samples from the test set for consistent comparison
comparison_indices = range(10)
y_test_comparison = y_test.iloc[comparison_indices]
y_test_pred_comparison = y_test_pred[comparison_indices]

# Create a 2x3 grid: rows are training/testing, columns are the plotted features
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# Convert back to original scale for plotting
X_train_orig = scaler.inverse_transform(X_train)
X_test_orig = scaler.inverse_transform(X_test)

# Features to plot and their column positions in `features`
features_plot = ['lambda', 'Lq', 'rho']
indices = [0, 1, 4]

for row, (name, X_data, y_actual, y_pred) in enumerate([
    ('Training', X_train_orig, y_train, y_train_pred),
    ('Testing', X_test_orig, y_test, y_test_pred)
]):
    for col, (feature, idx) in enumerate(zip(features_plot, indices)):
        ax = axes[row, col]
        ax.scatter(X_data[:, idx], y_actual, alpha=0.6, color='blue', label='Actual', s=20)
        ax.scatter(X_data[:, idx], y_pred, alpha=0.6, color='red', label='Predicted', s=20)
        ax.set_xlabel(feature)
        ax.set_ylabel('Wq')
        ax.set_title(f'{name}: Wq vs {feature}')
        ax.legend()
        ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
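
Scatter plots show one feature at a time. A complementary, numerical view of which features the model relies on is permutation importance: shuffle one column of the held-out data and measure how much the score drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data (an assumption standing in for the real dataset) in which, by construction, only the first two columns affect the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = ['lambda', 'Lq', 's', 'mu', 'rho']

# Synthetic stand-in data: the target depends only on columns 0 and 1
rng = np.random.default_rng(1)
X_demo = rng.uniform(1.0, 10.0, size=(300, 5))
y_demo = X_demo[:, 1] / X_demo[:, 0]  # hypothetical target for illustration

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle each column of the test set several times and record the drop in R^2
result = permutation_importance(rf, X_te, y_te, n_repeats=5, random_state=42)

# Rank features from most to least important
ranked = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:>6}: {importance:.4f}")
```

Applied to the notebook's fitted `model` with `X_test` and `y_test`, a ranking dominated by one or two features would indicate that the remaining inputs add little predictive information.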

## 6. Sample Predictions Comparison

We compare the actual and predicted \(W_q\) values for the first 10 samples in the test set to observe the prediction accuracy on individual data points.

In [None]:
import pandas as pd
import numpy as np

comparison = pd.DataFrame({
    'Actual_Wq': y_test_comparison,
    'Predicted_Wq': y_test_pred_comparison,
    'Difference': np.abs(y_test_comparison - y_test_pred_comparison)
})
print(comparison.to_string(index=False))
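
Absolute differences are hard to compare across samples when \(W_q\) varies widely. A relative error column can help, provided samples with an actual value of zero are handled explicitly. The sketch below uses made-up numbers and hypothetical column names; it is not output from the model.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
actual = pd.Series([0.50, 1.20, 0.00, 2.00], name="Actual_Wq")
predicted = np.array([0.55, 1.08, 0.02, 2.10])

abs_err = (actual - predicted).abs()

# Replace zero actual values with NaN so the division does not produce inf
rel_err_pct = 100.0 * abs_err / actual.where(actual != 0)

table = pd.DataFrame({
    "Actual_Wq": actual,
    "Predicted_Wq": predicted,
    "Abs_Error": abs_err,
    "Rel_Error_%": rel_err_pct,
})
print(table.to_string(index=False))
```

Rows where the actual value is zero show `NaN` in the relative column, which signals that only the absolute error is meaningful there.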