# Regression Models to Try

## 1. Getting the Cleaned and Processed Data

This is the statistical summary we have on record before reading the data back from the cleaned CSV:

| Statistic | radiation | temperature | pressure | humidity | hour_sin | hour_cos | zonal_wind_u | meridional_wind_v | seasonal_pc1 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| **count** | 32686.00 | 32686.00 | 32686.00 | 32686.00 | 32686.00 | 32686.00 | 32686.00 | 32686.00 | 32686.00 |
| **mean** | 207.12 | 51.10 | 30.42 | 75.02 | -0.00 | 0.00 | 1.89 | -1.86 | 0.00 |
| **std** | 315.92 | 6.20 | 0.05 | 25.99 | 0.71 | 0.70 | 3.95 | 5.34 | 1.41 |
| **min** | 1.11 | 34.00 | 30.19 | 8.00 | -1.00 | -1.00 | -29.12 | -35.53 | -2.01 |
| **25%** | 1.23 | 46.00 | 30.40 | 56.00 | -0.71 | -0.71 | 0.00 | -5.61 | -1.24 |
| **50%** | 2.66 | 50.00 | 30.43 | 85.00 | -0.00 | 0.00 | 1.75 | -2.22 | -0.15 |
| **75%** | 354.24 | 55.00 | 30.46 | 97.00 | 0.71 | 0.71 | 4.14 | 2.06 | 1.17 |
| **max** | 1601.26 | 71.00 | 30.56 | 103.00 | 1.00 | 1.00 | 19.84 | 19.93 | 2.67 |

And the `info()` output we have is:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32686 entries, 0 to 32685
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   radiation          32686 non-null  float64
 1   temperature        32686 non-null  int64  
 2   pressure           32686 non-null  float64
 3   humidity           32686 non-null  int64  
 4   hour_sin           32686 non-null  float64
 5   hour_cos           32686 non-null  float64
 6   zonal_wind_u       32686 non-null  float64
 7   meridional_wind_v  32686 non-null  float64
 8   seasonal_pc1       32686 non-null  float64
dtypes: float64(7), int64(2)
memory usage: 2.2 MB
```

Now let's read the cleaned data and check it again.

In [None]:
!pip install tqdm

In [112]:
# All import statements here

import pandas as pd
import numpy as np
import xgboost as xgb
import lightgbm as lgb

from tqdm.auto import tqdm

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

In [3]:
# All constants here
DATA_LOCATION = '..\\Data\\'
FILE_NAME = 'Clean_Data'
FILE_TYPE ='.csv'
INTERMEDIARY_FILE = 'checking.csv'

# All variables here
df = []

In [4]:
# Function to load the DF
def reload_df():
    global df
    df = pd.read_csv(DATA_LOCATION + FILE_NAME + FILE_TYPE)
reload_df()
df

Unnamed: 0,radiation,temperature,pressure,humidity,hour_sin,hour_cos,zonal_wind_u,meridional_wind_v,seasonal_pc1
0,2.58,51,30.43,103,0.000582,1.000000,10.973467,2.479016,2.670511
1,2.83,51,30.43,103,0.022542,0.999746,4.024213,-8.050200,2.670511
2,2.16,51,30.43,103,0.087590,0.996157,4.840925,-6.205026,2.670511
3,2.21,51,30.43,103,0.109228,0.994017,10.549612,-14.584433,2.670511
4,2.25,51,30.43,103,0.131175,0.991359,10.387623,4.319697,2.670511
...,...,...,...,...,...,...,...,...,...
32681,1.22,41,30.34,83,-0.108722,0.994072,-5.782236,-3.482564,-2.010188
32682,1.21,41,30.34,82,-0.087083,0.996201,-4.702078,-3.078126,-2.010188
32683,1.21,42,30.34,81,-0.065113,0.997878,-4.875505,-6.177892,-2.010188
32684,1.19,41,30.34,80,-0.043401,0.999058,-4.539889,-6.428554,-2.010188
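
The loader above joins the path with Windows-style separators and writes into a global. As a hedged alternative, the sketch below builds the same relative location with `pathlib` and returns the frame instead. The function name `load_clean_data` and its parameters are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_clean_data(data_dir="../Data", file_name="Clean_Data", file_type=".csv"):
    """Read the cleaned CSV and return it as a DataFrame.

    The defaults mirror DATA_LOCATION, FILE_NAME and FILE_TYPE above;
    the helper itself is hypothetical.
    """
    # Path picks the right separator for the current operating system.
    csv_path = Path(data_dir) / f"{file_name}{file_type}"
    return pd.read_csv(csv_path)
```

With this variant the cell would read `df = load_clean_data()`, which avoids the `global` statement.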


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32686 entries, 0 to 32685
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   radiation          32686 non-null  float64
 1   temperature        32686 non-null  int64  
 2   pressure           32686 non-null  float64
 3   humidity           32686 non-null  int64  
 4   hour_sin           32686 non-null  float64
 5   hour_cos           32686 non-null  float64
 6   zonal_wind_u       32686 non-null  float64
 7   meridional_wind_v  32686 non-null  float64
 8   seasonal_pc1       32686 non-null  float64
dtypes: float64(7), int64(2)
memory usage: 2.2 MB


In [6]:
df.describe()

Unnamed: 0,radiation,temperature,pressure,humidity,hour_sin,hour_cos,zonal_wind_u,meridional_wind_v,seasonal_pc1
count,32686.0,32686.0,32686.0,32686.0,32686.0,32686.0,32686.0,32686.0,32686.0
mean,207.124697,51.103255,30.422879,75.016307,-0.000993,0.000706,1.886346,-1.857141,0.0
std,315.916387,6.201157,0.054673,25.990219,0.709667,0.704558,3.953855,5.341352,1.406663
min,1.11,34.0,30.19,8.0,-1.0,-1.0,-29.115856,-35.531953,-2.010188
25%,1.23,46.0,30.4,56.0,-0.708083,-0.706181,0.0,-5.610554,-1.241493
50%,2.66,50.0,30.43,85.0,-0.002182,0.001345,1.753401,-2.220615,-0.152499
75%,354.235,55.0,30.46,97.0,0.708032,0.706078,4.141189,2.057191,1.169449
max,1601.26,71.0,30.56,103.0,1.0,1.0,19.84277,19.928081,2.670511


## 2. Quick Recap of the Data

Our target is `radiation`. The columns **hour_cos, hour_sin, zonal_wind_u, meridional_wind_v and seasonal_pc1** are engineered features.

The units of each column are:

- radiation: watts per square meter (W/m²)

- temperature: degrees Fahrenheit (°F)

- humidity: percent (%)

- pressure: barometric pressure in inches of mercury (inHg)

- seasonal_pc1: combined from daylight hours and day of the year

- zonal_wind_u: wind speed scaled by a direction factor between -1 and 1

- meridional_wind_v: wind speed scaled by a direction factor between -1 and 1

- hour_cos: unitless, -1 to 1

- hour_sin: unitless, -1 to 1
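
For readers unfamiliar with the engineered columns, the sketch below shows one common way such features are built. It is an assumption about the construction, not the notebook's actual preprocessing: `encode_hour` maps an hour of day onto the unit circle, and `wind_components` uses the usual meteorological convention for turning speed and direction (in degrees, the direction the wind blows from) into u and v components. Both function names are hypothetical.

```python
import numpy as np


def encode_hour(hour):
    """Map hour of day (0-24) to (sin, cos) so that 23:00 and 00:00 are close."""
    angle = 2.0 * np.pi * np.asarray(hour, dtype=float) / 24.0
    return np.sin(angle), np.cos(angle)


def wind_components(speed, direction_deg):
    """Convert wind speed and direction to zonal (u) and meridional (v) parts.

    Assumes the meteorological convention: direction is where the wind
    comes from, measured clockwise from north.
    """
    theta = np.deg2rad(np.asarray(direction_deg, dtype=float))
    speed = np.asarray(speed, dtype=float)
    u = -speed * np.sin(theta)  # positive u: wind blowing towards the east
    v = -speed * np.cos(theta)  # positive v: wind blowing towards the north
    return u, v


hour_sin, hour_cos = encode_hour([0, 6, 12, 18])
u, v = wind_components(speed=10.0, direction_deg=270.0)  # wind from the west
```

Each component is the speed multiplied by a factor between -1 and 1, which is the "speed * (-1 to 1)" range described in the list.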

Let's try out some regression models on this cleaned data.


## 3. Data splitting

In [9]:
# Separate Features (X) and Target (Y)
Y = df['radiation']
X = df.drop(columns=['radiation'])

# --- Step 1: Split into Training (60%) and Temp (40%) ---
X_train, X_temp, Y_train, Y_temp = train_test_split(
    X, Y, test_size=0.4, random_state=42
)

# --- Step 2: Split Temp into Validation (20%) and Test (20%) ---
# (0.5 of 0.4 = 0.2, or 20% of the original data)
X_val, X_test, Y_val, Y_test = train_test_split(
    X_temp, Y_temp, test_size=0.5, random_state=42
)

print(f"Train set size: {len(X_train)} samples")
print(f"Validation set size: {len(X_val)} samples")
print(f"Test set size: {len(X_test)} samples")

Train set size: 19611 samples
Validation set size: 6537 samples
Test set size: 6538 samples
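
The row counts follow from how `train_test_split` rounds: the held-out part gets the ceiling of `test_size * n`, and the rest goes to the other side. A small sketch that reproduces the 60/20/20 arithmetic on a dummy array with the same number of rows (the helper name `three_way_split` and the dummy data are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, holdout=0.4, random_state=42):
    """Split into train / validation / test, halving the hold-out part."""
    X_train, X_temp, y_train, y_temp = train_test_split(
        X, y, test_size=holdout, random_state=random_state
    )
    X_val, X_test, y_val, y_test = train_test_split(
        X_temp, y_temp, test_size=0.5, random_state=random_state
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Dummy data with the same number of rows as the cleaned dataset.
n_rows = 32686
X_demo = np.arange(n_rows).reshape(-1, 1)
y_demo = np.arange(n_rows)

train, val, test = three_way_split(X_demo, y_demo)
print(len(train[0]), len(val[0]), len(test[0]))
```

The split sizes depend only on the row count, so this should print the same three sizes as the cell above.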


## 4. Linear Regression

In [28]:
# 1. Initialize the Linear Regression model
linear_model = LinearRegression()

# 2. Train the model
linear_model.fit(X_train, Y_train)

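# 3. Predict on the validation and test sets
Y_pred_val = linear_model.predict(X_val)
Y_pred_test = linear_model.predict(X_test)

# Note: this rebinds Y (until now the target Series) to a dict of
# [actual, predicted] pairs keyed by split; later cells read the
# actual values from Y[condition][0].
Y = {
        'Test': [Y_test, Y_pred_test],
        'Validation': [Y_val, Y_pred_val]
    }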

In [32]:
# Calculate metrics
def linear_reg_metrics(condition):
    mse = mean_squared_error(Y[condition][0], Y[condition][1])
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(Y[condition][0], Y[condition][1])
    r2 = r2_score(Y[condition][0], Y[condition][1])

    print("\n### Model Evaluation on",condition,"Set ###")
    print(f"RMSE: {rmse:.9f}")
    print(f"MAE: {mae:.9f}")
    print(f"R-squared (R²): {r2:.9f}")

In [33]:
linear_reg_metrics('Validation')


### Model Evaluation on Validation Set ###
RMSE: 165.271866180
MAE: 126.524393584
R-squared (R²): 0.724386281


In [34]:
linear_reg_metrics('Test')


### Model Evaluation on Test Set ###
RMSE: 164.577978777
MAE: 125.910881523
R-squared (R²): 0.731027820
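
For reference, the three metrics reported above can be written out directly. The sketch below recomputes them with NumPy on a toy example; `regression_metrics` is an illustrative helper, not a function from the notebook.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = np.sqrt(np.mean(residuals ** 2))         # root of the mean squared error
    mae = np.mean(np.abs(residuals))                # mean absolute error
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


toy_metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
print(toy_metrics)
```

RMSE squares the residuals before averaging, so it weights large misses more heavily than MAE does. That is consistent with the RMSE being noticeably higher than the MAE for the linear model.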


## 5. Decision Tree Regressor

In [92]:
# 1. Initialize the Decision Tree Regressor
def decision_tree_reg(parameters={'max_depth':10}):
    print(f"Max_depth = {parameters['max_depth']}")
    dt_model = DecisionTreeRegressor(max_depth=parameters['max_depth'], random_state=42)

    # 2. Train and Predict
    # Assuming X_train, X_test, Y_train, Y_test are ready.
    dt_model.fit(X_train, Y_train)
    Y_pred_dt = { 
        'Test': dt_model.predict(X_test),
        'Validation': dt_model.predict(X_val)
    }
    return Y_pred_dt

# 3. Evaluate
def decision_tree_metrics(condition, parameters={'max_depth':10}):
    Y_pred_dt = decision_tree_reg(parameters=parameters)
    rmse_dt = np.sqrt(mean_squared_error(Y[condition][0], Y_pred_dt[condition]))
    r2_dt = r2_score(Y[condition][0], Y_pred_dt[condition])

    print("### Decision Tree Model Performance for",condition,"Set ###")
    print(f"RMSE: {rmse_dt:.9f}")
    print(f"R-squared (R² Score): {r2_dt:.9f}")

In [93]:
decision_tree_metrics(condition='Test',parameters={'max_depth':10})

Max_depth = 10
### Decision Tree Model Performance for Test Set ###
RMSE: 104.313894792
R-squared (R² Score): 0.891944162


In [94]:
decision_tree_metrics('Validation')

Max_depth = 10
### Decision Tree Model Performance for Validation Set ###
RMSE: 105.404268192
R-squared (R² Score): 0.887896450


In [98]:
depth_list = range(5,35,5)
for d in depth_list:
    decision_tree_metrics(condition='Validation',parameters={'max_depth':d})
    print('\n')

Max_depth = 5
### Decision Tree Model Performance for Validation Set ###
RMSE: 118.942198830
R-squared (R² Score): 0.857250401


Max_depth = 10
### Decision Tree Model Performance for Validation Set ###
RMSE: 105.404268192
R-squared (R² Score): 0.887896450


Max_depth = 15
### Decision Tree Model Performance for Validation Set ###
RMSE: 112.905606922
R-squared (R² Score): 0.871372453


Max_depth = 20
### Decision Tree Model Performance for Validation Set ###
RMSE: 116.569734355
R-squared (R² Score): 0.862888279


Max_depth = 25
### Decision Tree Model Performance for Validation Set ###
RMSE: 120.330029557
R-squared (R² Score): 0.853899731


Max_depth = 30
### Decision Tree Model Performance for Validation Set ###
RMSE: 118.732835232
R-squared (R² Score): 0.857752499




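> A max_depth of 10 is best: it gives the lowest validation RMSE (105.40) of the depths tried, and deeper trees score worse on the validation set.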
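
The validation RMSE rising again beyond depth 10 is the usual sign of overfitting. One way to make that visible is to record the training RMSE next to the validation RMSE for each depth. The sketch below does this on synthetic data, so its numbers are not comparable to the results above; `sweep_tree_depth` and the generated data are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def sweep_tree_depth(X_train, y_train, X_val, y_val, depths, random_state=42):
    """Return {depth: (train_rmse, val_rmse)} for each max_depth."""
    results = {}
    for depth in depths:
        model = DecisionTreeRegressor(max_depth=depth, random_state=random_state)
        model.fit(X_train, y_train)
        train_rmse = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
        val_rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
        results[depth] = (train_rmse, val_rmse)
    return results


# Synthetic, noisy data standing in for the real features and target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(2000, 3))
y_demo = np.sin(X_demo[:, 0]) * 100 + X_demo[:, 1] ** 2 * 10 + rng.normal(0, 20, 2000)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

depth_results = sweep_tree_depth(X_tr, y_tr, X_va, y_va, depths=range(5, 35, 5))
best_depth = min(depth_results, key=lambda d: depth_results[d][1])

for depth, (train_rmse, val_rmse) in depth_results.items():
    print(f"max_depth={depth:2d}  train RMSE={train_rmse:8.3f}  val RMSE={val_rmse:8.3f}")
print("Best depth by validation RMSE:", best_depth)
```

A training RMSE that keeps falling while the validation RMSE stalls or rises indicates that the extra depth is fitting noise.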

## 6. Random Forest Regression

In [100]:
def random_forest_reg(n_estimators=100, max_depth='', random_state=42):
    # 1. Initialize the Random Forest Regressor
    # n_estimators=100 ; n_jobs=-1 uses all available cores.
    print(f'n_estimators={n_estimators}, random_state={random_state}, depth={max_depth}')
    if max_depth:
        rf_model = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, random_state=random_state, n_jobs=-1)
    else:
        rf_model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state, n_jobs=-1)

    # 2. Train the model
    # Assuming X_train and Y_train are ready from the splitting step.
    rf_model.fit(X_train, Y_train)

    # 3. Predict on the Test and Val Set
    return {
    'Test': rf_model.predict(X_test),
    'Validation': rf_model.predict(X_val),
    'Model': rf_model
    }

# random_forest_reg()

In [101]:
def random_forest_metrics(condition, parameters = {'n_estimators':100, 'max_depth':''}):
    # Calculate evaluation metrics
    Y_pred = random_forest_reg(
        n_estimators = parameters['n_estimators'],
        max_depth = parameters['max_depth']
    )
    
    mse_rf = mean_squared_error(Y[condition][0], Y_pred[condition])
    rmse_rf = np.sqrt(mse_rf)
    r2_rf = r2_score(Y[condition][0], Y_pred[condition])

    print("### Random Forest Model Performance (on",condition,"Set) ###")
    print(f"Root Mean Squared Error (RMSE): {rmse_rf:.9f}")
    print(f"R-squared (R² Score): {r2_rf:.9f}")

In [103]:
random_forest_metrics('Test')

n_estimators=100, random_state=42, depth=
### Random Forest Model Performance (on Test Set) ###
Root Mean Squared Error (RMSE): 80.841433228
R-squared (R² Score): 0.935101909


In [105]:
random_forest_metrics('Validation')

n_estimators=100, random_state=42, depth=
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.326679223
R-squared (R² Score): 0.931611329


In [102]:
# Let's try some max_depth values
depth = range(5,36,5)
for d in depth:
    random_forest_metrics('Validation',{'n_estimators':100, 'max_depth':d})
    print('\n')

n_estimators=100, random_state=42, depth=5
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.303751369
R-squared (R² Score): 0.877232465


n_estimators=100, random_state=42, depth=10
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.605989682
R-squared (R² Score): 0.926054581


n_estimators=100, random_state=42, depth=15
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.211804177
R-squared (R² Score): 0.931802049


n_estimators=100, random_state=42, depth=20
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.359851262
R-squared (R² Score): 0.931556206


n_estimators=100, random_state=42, depth=25
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.243492680
R-squared (R² Score): 0.931749465


n_estimators=100, random_state=42, depth=30
### Random Forest Model Perform

In [109]:
# Let's try some n_estimators
estimators = range(25,301, 25)
for e in estimators:
    random_forest_metrics('Validation',{'n_estimators':e, 'max_depth':''})
    print('\n')

n_estimators=25, random_state=42, depth=
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 84.010161347
R-squared (R² Score): 0.928785799


n_estimators=50, random_state=42, depth=
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.435315929
R-squared (R² Score): 0.931430721


n_estimators=75, random_state=42, depth=
### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.224536697
R-squared (R² Score): 0.931780923


n_estimators=100, random_state=42, depth=


KeyboardInterrupt: 
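
The sweep above refits a complete forest for every value of `n_estimators`, which makes each step slower than the last. A possible shortcut, sketched below on synthetic data, is scikit-learn's `warm_start=True`: raising `n_estimators` and calling `fit` again adds trees to the existing forest instead of rebuilding it. The helper `sweep_n_estimators` and the demo data are assumptions, and a forest grown this way is not guaranteed to match a fresh fit with the same seed tree for tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def sweep_n_estimators(X_train, y_train, X_val, y_val, sizes, random_state=42):
    """Grow one forest step by step and record validation RMSE per size."""
    model = RandomForestRegressor(
        n_estimators=1, warm_start=True, random_state=random_state, n_jobs=-1
    )
    results = {}
    for n_trees in sorted(sizes):
        # With warm_start=True, fit() only trains the trees that are missing.
        model.set_params(n_estimators=n_trees)
        model.fit(X_train, y_train)
        rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
        results[n_trees] = rmse
    return model, results


# Synthetic data standing in for the real features and target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(1000, 3))
y_demo = np.sin(X_demo[:, 0]) * 100 + X_demo[:, 1] ** 2 * 10 + rng.normal(0, 20, 1000)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

forest, rmse_by_size = sweep_n_estimators(X_tr, y_tr, X_va, y_va, sizes=range(25, 101, 25))
for n_trees, rmse in rmse_by_size.items():
    print(f"n_estimators={n_trees:3d}  val RMSE={rmse:.3f}")
```

The total cost is then roughly that of fitting the largest forest once, rather than the sum over all sizes.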

In [113]:
for d in tqdm(depth):
    for e in tqdm(estimators):
        random_forest_metrics('Validation',{'n_estimators':e, 'max_depth':d})
        print('\n')

range(5, 36, 5)
range(25, 301, 25)


  0%|                                                                                                   | 0/7 [00:00<?, ?it/s]
  0%|                                                                                                  | 0/12 [00:00<?, ?it/s][A

n_estimators=25, random_state=42, depth=5



  8%|███████▌                                                                                  | 1/12 [00:00<00:10,  1.04it/s][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.417158067
R-squared (R² Score): 0.876979893


n_estimators=50, random_state=42, depth=5



 17%|███████████████                                                                           | 2/12 [00:02<00:13,  1.38s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.301247967
R-squared (R² Score): 0.877238038


n_estimators=75, random_state=42, depth=5



 25%|██████████████████████▌                                                                   | 3/12 [00:05<00:16,  1.85s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.153813836
R-squared (R² Score): 0.877565998


n_estimators=100, random_state=42, depth=5



 33%|██████████████████████████████                                                            | 4/12 [00:08<00:19,  2.41s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.303751369
R-squared (R² Score): 0.877232465


n_estimators=125, random_state=42, depth=5



 42%|█████████████████████████████████████▌                                                    | 5/12 [00:12<00:20,  2.94s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.291666350
R-squared (R² Score): 0.877259365


n_estimators=150, random_state=42, depth=5



 50%|█████████████████████████████████████████████                                             | 6/12 [00:16<00:20,  3.48s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.378417280
R-squared (R² Score): 0.877066203


n_estimators=175, random_state=42, depth=5



 58%|████████████████████████████████████████████████████▌                                     | 7/12 [00:22<00:20,  4.09s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.482406447
R-squared (R² Score): 0.876834459


n_estimators=200, random_state=42, depth=5



 67%|████████████████████████████████████████████████████████████                              | 8/12 [00:28<00:19,  4.87s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.510161892
R-squared (R² Score): 0.876772567


n_estimators=225, random_state=42, depth=5



 75%|███████████████████████████████████████████████████████████████████▌                      | 9/12 [00:35<00:16,  5.50s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.495957267
R-squared (R² Score): 0.876804244


n_estimators=250, random_state=42, depth=5



 83%|██████████████████████████████████████████████████████████████████████████▏              | 10/12 [00:44<00:13,  6.71s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.539220269
R-squared (R² Score): 0.876707754


n_estimators=275, random_state=42, depth=5



 92%|█████████████████████████████████████████████████████████████████████████████████▌       | 11/12 [00:54<00:07,  7.45s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.567999515
R-squared (R² Score): 0.876643547


n_estimators=300, random_state=42, depth=5



100%|█████████████████████████████████████████████████████████████████████████████████████████| 12/12 [01:04<00:00,  5.37s/it][A
 14%|█████████████                                                                              | 1/7 [01:04<06:26, 64.50s/it]

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 110.549730048
R-squared (R² Score): 0.876684309





  0%|                                                                                                  | 0/12 [00:00<?, ?it/s][A

n_estimators=25, random_state=42, depth=10



  8%|███████▌                                                                                  | 1/12 [00:01<00:19,  1.81s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 86.376429954
R-squared (R² Score): 0.924717598


n_estimators=50, random_state=42, depth=10



 17%|███████████████                                                                           | 2/12 [00:05<00:28,  2.85s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.732825578
R-squared (R² Score): 0.925835300


n_estimators=75, random_state=42, depth=10



 25%|██████████████████████▌                                                                   | 3/12 [00:10<00:33,  3.70s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.596514196
R-squared (R² Score): 0.926070950


n_estimators=100, random_state=42, depth=10



 33%|██████████████████████████████                                                            | 4/12 [00:16<00:39,  4.92s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.605989682
R-squared (R² Score): 0.926054581


n_estimators=125, random_state=42, depth=10



 42%|█████████████████████████████████████▌                                                    | 5/12 [00:24<00:41,  5.92s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.649957998
R-squared (R² Score): 0.925978603


n_estimators=150, random_state=42, depth=10



 50%|█████████████████████████████████████████████                                             | 6/12 [00:34<00:44,  7.36s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.725327191
R-squared (R² Score): 0.925848273


n_estimators=175, random_state=42, depth=10



 58%|████████████████████████████████████████████████████▌                                     | 7/12 [00:45<00:42,  8.46s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.790876350
R-squared (R² Score): 0.925734830


n_estimators=200, random_state=42, depth=10



 67%|████████████████████████████████████████████████████████████                              | 8/12 [00:57<00:37,  9.44s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.900601899
R-squared (R² Score): 0.925544740


n_estimators=225, random_state=42, depth=10



 75%|███████████████████████████████████████████████████████████████████▌                      | 9/12 [01:10<00:32, 10.74s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.895335015
R-squared (R² Score): 0.925553870


n_estimators=250, random_state=42, depth=10



 83%|██████████████████████████████████████████████████████████████████████████▏              | 10/12 [01:29<00:26, 13.29s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.999044934
R-squared (R² Score): 0.925373989


n_estimators=275, random_state=42, depth=10



 92%|█████████████████████████████████████████████████████████████████████████████████▌       | 11/12 [01:50<00:15, 15.67s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.979945072
R-squared (R² Score): 0.925407134


n_estimators=300, random_state=42, depth=10



100%|█████████████████████████████████████████████████████████████████████████████████████████| 12/12 [02:09<00:00, 10.80s/it][A
 29%|█████████████████████████▋                                                                | 2/7 [03:14<08:34, 102.82s/it]

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 85.992837547
R-squared (R² Score): 0.925384762





  0%|                                                                                                  | 0/12 [00:00<?, ?it/s][A

n_estimators=25, random_state=42, depth=15



  8%|███████▌                                                                                  | 1/12 [00:02<00:25,  2.30s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 83.776599678
R-squared (R² Score): 0.929181222


n_estimators=50, random_state=42, depth=15



 17%|███████████████                                                                           | 2/12 [00:06<00:34,  3.50s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.592098419
R-squared (R² Score): 0.931169651


n_estimators=75, random_state=42, depth=15



 25%|██████████████████████▌                                                                   | 3/12 [00:12<00:42,  4.78s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.130699934
R-squared (R² Score): 0.931936541


n_estimators=100, random_state=42, depth=15



 33%|██████████████████████████████                                                            | 4/12 [00:21<00:50,  6.26s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.211804177
R-squared (R² Score): 0.931802049


n_estimators=125, random_state=42, depth=15



 42%|█████████████████████████████████████▌                                                    | 5/12 [00:32<00:55,  7.87s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.125337531
R-squared (R² Score): 0.931945428


n_estimators=150, random_state=42, depth=15



 50%|█████████████████████████████████████████████                                             | 6/12 [00:44<00:56,  9.46s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.022410727
R-squared (R² Score): 0.932115906


n_estimators=175, random_state=42, depth=15



 58%|████████████████████████████████████████████████████▌                                     | 7/12 [00:59<00:56, 11.23s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.958642055
R-squared (R² Score): 0.932221418


n_estimators=200, random_state=42, depth=15



 67%|████████████████████████████████████████████████████████████                              | 8/12 [01:16<00:51, 12.96s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.979065425
R-squared (R² Score): 0.932187634


n_estimators=225, random_state=42, depth=15



 75%|███████████████████████████████████████████████████████████████████▌                      | 9/12 [01:36<00:45, 15.23s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.946188377
R-squared (R² Score): 0.932242015


n_estimators=250, random_state=42, depth=15



 83%|██████████████████████████████████████████████████████████████████████████▏              | 10/12 [01:58<00:34, 17.41s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.979743218
R-squared (R² Score): 0.932186513


n_estimators=275, random_state=42, depth=15



 92%|█████████████████████████████████████████████████████████████████████████████████▌       | 11/12 [02:21<00:19, 19.12s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.890017828
R-squared (R² Score): 0.932334873


n_estimators=300, random_state=42, depth=15



100%|█████████████████████████████████████████████████████████████████████████████████████████| 12/12 [02:46<00:00, 13.91s/it][A
 43%|██████████████████████████████████████▌                                                   | 3/7 [06:01<08:48, 132.12s/it]

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.900522860
R-squared (R² Score): 0.932317511





  0%|                                                                                                  | 0/12 [00:00<?, ?it/s][A

n_estimators=25, random_state=42, depth=20



  8%|███████▌                                                                                  | 1/12 [00:02<00:30,  2.77s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 83.786594425
R-squared (R² Score): 0.929164323


n_estimators=50, random_state=42, depth=20



 17%|███████████████                                                                           | 2/12 [00:07<00:41,  4.19s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.455629613
R-squared (R² Score): 0.931396923


n_estimators=75, random_state=42, depth=20



 25%|██████████████████████▌                                                                   | 3/12 [00:15<00:53,  5.90s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.242290849
R-squared (R² Score): 0.931751460


n_estimators=100, random_state=42, depth=20



 33%|██████████████████████████████                                                            | 4/12 [00:26<01:01,  7.66s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.359851262
R-squared (R² Score): 0.931556206


n_estimators=125, random_state=42, depth=20



 42%|█████████████████████████████████████▌                                                    | 5/12 [00:39<01:07,  9.70s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.171141761
R-squared (R² Score): 0.931869494


n_estimators=150, random_state=42, depth=20



 50%|█████████████████████████████████████████████                                             | 6/12 [00:56<01:13, 12.22s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.036923990
R-squared (R² Score): 0.932091880


n_estimators=175, random_state=42, depth=20



 58%|████████████████████████████████████████████████████▌                                     | 7/12 [01:14<01:11, 14.21s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.939337111
R-squared (R² Score): 0.932253344


n_estimators=200, random_state=42, depth=20



 67%|████████████████████████████████████████████████████████████                              | 8/12 [01:35<01:05, 16.35s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.926987550
R-squared (R² Score): 0.932273764


n_estimators=225, random_state=42, depth=20



 75%|███████████████████████████████████████████████████████████████████▌                      | 9/12 [01:59<00:55, 18.63s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.885275396
R-squared (R² Score): 0.932342710


n_estimators=250, random_state=42, depth=20



 83%|██████████████████████████████████████████████████████████████████████████▏              | 10/12 [02:25<00:41, 20.89s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.953512067
R-squared (R² Score): 0.932229903


n_estimators=275, random_state=42, depth=20



 92%|█████████████████████████████████████████████████████████████████████████████████▌       | 11/12 [02:53<00:23, 23.17s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.875349685
R-squared (R² Score): 0.932359111


n_estimators=300, random_state=42, depth=20



100%|█████████████████████████████████████████████████████████████████████████████████████████| 12/12 [03:24<00:00, 17.07s/it][A
 57%|███████████████████████████████████████████████████▍                                      | 4/7 [09:25<08:02, 160.83s/it]

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.924174244
R-squared (R² Score): 0.932278415





  0%|                                                                                                  | 0/12 [00:00<?, ?it/s][A

n_estimators=25, random_state=42, depth=25



  8%|███████▌                                                                                  | 1/12 [00:03<00:34,  3.12s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 83.723489457
R-squared (R² Score): 0.929270985


n_estimators=50, random_state=42, depth=25



 17%|███████████████                                                                           | 2/12 [00:09<00:47,  4.75s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.369282252
R-squared (R² Score): 0.931540530


n_estimators=75, random_state=42, depth=25



 25%|██████████████████████▌                                                                   | 3/12 [00:17<00:59,  6.57s/it][A

### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.086062934
R-squared (R² Score): 0.932010504


n_estimators=100, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.243492680
R-squared (R² Score): 0.931749465


n_estimators=125, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.111146266
R-squared (R² Score): 0.931968946


n_estimators=150, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.906215538
R-squared (R² Score): 0.932308102


n_estimators=175, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.894658331
R-squared (R² Score): 0.932327204


n_estimators=200, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.948166733
R-squared (R² Score): 0.932238743


n_estimators=225, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.912430705
R-squared (R² Score): 0.932297829


n_estimators=250, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.976368416
R-squared (R² Score): 0.932192096


n_estimators=275, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.899405475
R-squared (R² Score): 0.932319358


n_estimators=300, random_state=42, depth=25




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.873543768
R-squared (R² Score): 0.932362095






n_estimators=25, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 83.903858614
R-squared (R² Score): 0.928965907


n_estimators=50, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.454812073
R-squared (R² Score): 0.931398284


n_estimators=75, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.184324494
R-squared (R² Score): 0.931847632


n_estimators=100, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.338631267
R-squared (R² Score): 0.931591470


n_estimators=125, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.145003005
R-squared (R² Score): 0.931912832


n_estimators=150, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.907026836
R-squared (R² Score): 0.932306761


n_estimators=175, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.864717358
R-squared (R² Score): 0.932376678


n_estimators=200, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.928279699
R-squared (R² Score): 0.932271627


n_estimators=225, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.884303134
R-squared (R² Score): 0.932344317


n_estimators=250, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.946300667
R-squared (R² Score): 0.932241829


n_estimators=275, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.887157210
R-squared (R² Score): 0.932339600


n_estimators=300, random_state=42, depth=30




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.919687862
R-squared (R² Score): 0.932285832






n_estimators=25, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 84.011613491
R-squared (R² Score): 0.928783337


n_estimators=50, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.434712589
R-squared (R² Score): 0.931431725


n_estimators=75, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.224782874
R-squared (R² Score): 0.931780514


n_estimators=100, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.327937256
R-squared (R² Score): 0.931609239


n_estimators=125, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 82.151836168
R-squared (R² Score): 0.931901504


n_estimators=150, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.923477777
R-squared (R² Score): 0.932279566


n_estimators=175, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.856169351
R-squared (R² Score): 0.932390799


n_estimators=200, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.890347728
R-squared (R² Score): 0.932334328


n_estimators=225, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.836074658
R-squared (R² Score): 0.932423989


n_estimators=250, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.884701208
R-squared (R² Score): 0.932343659


n_estimators=275, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.833687574
R-squared (R² Score): 0.932427932


n_estimators=300, random_state=42, depth=35




### Random Forest Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 81.865155996
R-squared (R² Score): 0.932375953
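
The loop above prints one metrics block per `(depth, n_estimators)` pair, which makes the best setting hard to spot. A minimal sketch of collecting the same scores into a DataFrame and sorting by RMSE is shown here. It runs on synthetic data so that it is self-contained; `X_tr`, `X_va`, `y_tr`, `y_va` and the small grid are placeholders (assumptions), not the notebook's actual splits or search space.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data (assumption): replace with the notebook's own splits.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=400)
X_tr, X_va, y_tr, y_va = X[:300], X[300:], y[:300], y[300:]

rows = []
for depth in (5, 10):              # illustrative grid, far smaller than the one above
    for n_estimators in (25, 50):
        model = RandomForestRegressor(
            n_estimators=n_estimators, max_depth=depth, random_state=42, n_jobs=-1
        )
        model.fit(X_tr, y_tr)
        pred = model.predict(X_va)
        # One row per hyperparameter pair instead of one printed block
        rows.append({
            "depth": depth,
            "n_estimators": n_estimators,
            "rmse": float(np.sqrt(mean_squared_error(y_va, pred))),
            "r2": float(r2_score(y_va, pred)),
        })

# Lowest validation RMSE first
grid_results = pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)
print(grid_results.to_string(index=False))
best = grid_results.iloc[0]
```

In the runs printed above for depths 25 to 35, the validation RMSE stays between about 81.83 and 81.98 from 150 trees upward, so a sorted table like this mainly confirms that the search has plateaued.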







In [45]:
# Create a DataFrame to display feature importance
rf_model = random_forest_reg()['Model']
importance = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': rf_model.feature_importances_
})

# Sort by importance
importance = importance.sort_values(by='Importance', ascending=False)

print("\n### Random Forest Feature Importance ###")
print(importance.to_string(index=False))

n_estimators=100, random_state=42

### Random Forest Feature Importance ###
          Feature  Importance
         hour_cos    0.752640
         humidity    0.109138
     seasonal_pc1    0.050064
      temperature    0.029477
         hour_sin    0.016007
     zonal_wind_u    0.016001
         pressure    0.013521
meridional_wind_v    0.013152
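
`feature_importances_` reports impurity-based importance, which is computed from the training data and can favor features with many distinct values. One possible cross-check is permutation importance on held-out data via `sklearn.inspection.permutation_importance`. The sketch below uses synthetic data and hypothetical column names (`a`, `b`, `noise`); it was not run in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic stand-in data (assumption): 'a' matters most, 'noise' not at all.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a", "b", "noise"])
y = 5.0 * X["a"] + 1.0 * X["b"] + rng.normal(scale=0.1, size=500)
X_tr, X_va = X.iloc[:400], X.iloc[400:]
y_tr, y_va = y.iloc[:400], y.iloc[400:]

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle each column of the held-out set in turn and measure the score drop
result = permutation_importance(model, X_va, y_va, n_repeats=5, random_state=42)

perm_importance = pd.DataFrame({
    "Feature": X_va.columns,
    "Importance": result.importances_mean,
    "Std": result.importances_std,
}).sort_values(by="Importance", ascending=False)

print(perm_importance.to_string(index=False))
```

If the ranking from this check agreed with the table above (with `hour_cos` far ahead), that would support reading the impurity-based numbers at face value.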


## 7. XGBoost

In [46]:
# 1. Initialize the XGBoost Regressor
import xgboost as xgb

xgb_model = xgb.XGBRegressor(
    objective='reg:squarederror', 
    n_estimators=100, 
    learning_rate=0.1, 
    random_state=42, 
    n_jobs=-1
)

# 2. Train the model
xgb_model.fit(X_train, Y_train)

# 3. Predict on the test and validation sets
def xgboost_predict():
    return {
        'Test': xgb_model.predict(X_test),
        'Validation': xgb_model.predict(X_val)
    }

In [56]:
# Calculate evaluation metrics
def xgboost_metrics(condition):
    # Predictions keyed by split name ('Test' or 'Validation');
    # named `preds` so it does not shadow the global `xgb_model`
    preds = xgboost_predict()
    mse_xgb = mean_squared_error(Y[condition][0], preds[condition])
    rmse_xgb = np.sqrt(mse_xgb)
    r2_xgb = r2_score(Y[condition][0], preds[condition])

    print("### XGBoost Model Performance (on",condition,"Set) ###")
    print(f"Root Mean Squared Error (RMSE): {rmse_xgb:.9f}")
    print(f"R-squared (R² Score): {r2_xgb:.9f}")

In [57]:
xgboost_metrics('Test')

### XGBoost Model Performance (on Test Set) ###
Root Mean Squared Error (RMSE): 79.533812440
R-squared (R² Score): 0.937184399


In [58]:
xgboost_metrics('Validation')

### XGBoost Model Performance (on Validation Set) ###
Root Mean Squared Error (RMSE): 83.940928341
R-squared (R² Score): 0.928903126


## 8. Additional Models

### 8.1 Ridge Regression

In [63]:
# 1. Initialize the Ridge Regressor
# Alpha (α) is the regularization strength; 1.0 is a common default.
def ridge_regression(alpha=1.0, condition='Validation'):
    ridge_model = Ridge(alpha=alpha, random_state=42)

    # 2. Train and Predict
    ridge_model.fit(X_train, Y_train)
    Y_pred_ridge = {
        'Test': ridge_model.predict(X_test),
        'Validation': ridge_model.predict(X_val)
    }

    # 3. Evaluate
    rmse_ridge = np.sqrt(mean_squared_error(Y[condition][0], Y_pred_ridge[condition]))
    r2_ridge = r2_score(Y[condition][0], Y_pred_ridge[condition])

    print("### Ridge Regression Model Performance ###")
    print(f"RMSE: {rmse_ridge:.9f}")
    print(f"R-squared (R² Score): {r2_ridge:.9f}")

In [64]:
ridge_regression(alpha=1.0,condition='Test')

### Ridge Regression Model Performance ###
RMSE: 164.583261226
R-squared (R² Score): 0.731010553


In [65]:
ridge_regression(alpha=0.5,condition='Test')

### Ridge Regression Model Performance ###
RMSE: 164.580624873
R-squared (R² Score): 0.731019170


In [66]:
ridge_regression(alpha=1.0,condition='Validation')

### Ridge Regression Model Performance ###
RMSE: 165.271166697
R-squared (R² Score): 0.724388614


In [67]:
ridge_regression(alpha=0.5,condition='Validation')

### Ridge Regression Model Performance ###
RMSE: 165.271484001
R-squared (R² Score): 0.724387555
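
Changing `alpha` from 1.0 to 0.5 moves the RMSE by less than 0.003 on both splits, which suggests the regularization strength hardly matters in this range. A sketch of scanning a wider, log-spaced range with `RidgeCV` is given here; the data is synthetic and the alpha grid is an assumption, not something tried in the notebook.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic stand-in data (assumption): replace with X_train / Y_train.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=300)

# 13 candidate strengths from 0.001 to 1000
alphas = np.logspace(-3, 3, 13)

# With the default cv=None, RidgeCV uses an efficient leave-one-out scheme
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)

print(f"selected alpha: {ridge_cv.alpha_:.4g}")
print("coefficients:", np.round(ridge_cv.coef_, 3))
```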


### 8.2 LightGBM

In [68]:
# 1. Initialize the LightGBM Regressor
# 'objective' is the loss LightGBM optimizes; 'metric' only selects the evaluation metric it reports
lgbm_model = lgb.LGBMRegressor(
    objective='regression',
    metric='rmse',
    n_estimators=100,
    learning_rate=0.1,
    random_state=42,
    n_jobs=-1
)

# 2. Train and Predict
lgbm_model.fit(X_train, Y_train)
Y_pred_lgbm = {
    'Test': lgbm_model.predict(X_test),
    'Validation': lgbm_model.predict(X_val)
}

# 3. Evaluate
def lgbm_metrics(condition):
    rmse_lgbm = np.sqrt(mean_squared_error(Y[condition][0], Y_pred_lgbm[condition]))
    r2_lgbm = r2_score(Y[condition][0], Y_pred_lgbm[condition])

    print("### LightGBM Model Performance ###")
    print(f"RMSE: {rmse_lgbm:.9f}")
    print(f"R-squared (R² Score): {r2_lgbm:.9f}")

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003116 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1310
[LightGBM] [Info] Number of data points in the train set: 19611, number of used features: 8
[LightGBM] [Info] Start training from score 206.188860


In [69]:
lgbm_metrics('Test')

### LightGBM Model Performance ###
RMSE: 79.420756920
R-squared (R² Score): 0.937362854


In [70]:
lgbm_metrics('Validation')

### LightGBM Model Performance ###
RMSE: 82.041275320
R-squared (R² Score): 0.932084676


### 8.3 Support Vector Regressor (SVR)

In [71]:
# 1. Initialize the SVR Model
svr_model = SVR(kernel='rbf', C=10, epsilon=0.1)

# 2. Train and Predict
# (This step may take significant time)
svr_model.fit(X_train, Y_train)
Y_pred_svr = {
    'Test': svr_model.predict(X_test),
    'Validation': svr_model.predict(X_val)
}

# 3. Evaluate
def svr_metrics(condition):
    rmse_svr = np.sqrt(mean_squared_error(Y[condition][0], Y_pred_svr[condition]))
    r2_svr = r2_score(Y[condition][0], Y_pred_svr[condition])

    print("### SVR Model Performance ###")
    print(f"RMSE: {rmse_svr:.9f}")
    print(f"R-squared (R² Score): {r2_svr:.9f}")

In [72]:
svr_metrics('Test')

### SVR Model Performance ###
RMSE: 197.388583147
R-squared (R² Score): 0.613091764


In [73]:
svr_metrics('Validation')

### SVR Model Performance ###
RMSE: 196.995225447
R-squared (R² Score): 0.608425529
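
SVR with an RBF kernel is sensitive to feature scale, and `epsilon=0.1` is tiny relative to a radiation target whose standard deviation is about 316 (see the description table in section 1). If the features and target were not standardized before this cell, that could partly explain the weak score. The following is a hedged sketch of wrapping the same SVR settings in a scaling pipeline, with the target scaled through `TransformedTargetRegressor`. The data is synthetic, and this variant was not run in the notebook.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): features on very different scales
# and a target with a large spread.
rng = np.random.default_rng(42)
X = np.column_stack([rng.normal(50, 6, 400), rng.normal(30.4, 0.05, 400)])
y = 300.0 * np.sin(X[:, 0] / 6.0) + 2000.0 * (X[:, 1] - 30.4) + rng.normal(0, 5, 400)
X_tr, X_va, y_tr, y_va = X[:300], X[300:], y[:300], y[300:]

scaled_svr = TransformedTargetRegressor(
    # Standardize features before the kernel is evaluated
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10, epsilon=0.1)),
    # Standardize the target for fitting; predictions are mapped back automatically
    transformer=StandardScaler(),
)
scaled_svr.fit(X_tr, y_tr)
pred = scaled_svr.predict(X_va)

print(f"validation R²: {scaled_svr.score(X_va, y_va):.4f}")
```

Because the scalers are fitted inside the pipeline on the training split only, the validation and test sets do not leak into the scaling statistics.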


## 9. Final Statistical Summary & Conclusion

The table below summarizes all regression models tried with basic hyperparameter values, along with their validation-set metrics. LightGBM and Random Forest are the best performers and differ only slightly (RMSE 82.04 vs. 82.33). I will use Random Forest for the final web service.


| Model | RMSE | MAE | R-squared (R²) |
| :--- | :--- | :--- | :--- |
| **LightGBM** | **82.04** | N/A | **0.9321** |
| **Random Forest** | 82.33 | N/A | 0.9316 |
| **XGBoost** | 83.94 | N/A | 0.9289 |
| **Decision Tree** | 105.40 | N/A | 0.8879 |
| **Linear Regression** | 165.27 | 126.52 | 0.7244 |
| **Ridge Regression** | 165.27 | N/A | 0.7244 |
| **SVR** | 197.00 | N/A | 0.6084 |
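
The metric functions for XGBoost, Ridge, LightGBM, and SVR above compute only RMSE and R², so their MAE cells are N/A. A sketch of one shared helper that returns all three metrics is shown below; `regression_report` is a hypothetical name, and the toy arrays stand in for `Y[condition][0]` and a model's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R² for one set of predictions."""
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }


# Toy values (assumption) in place of real targets and predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])

report = regression_report(y_true, y_pred)
print(report)
```

Calling the same helper for every model and split would fill the MAE column consistently and remove the near-duplicate metric functions.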