Author:
        
        KIM, JeongYoong, jeongyoong@ccnets.org
        
    COPYRIGHT (c) 2024. CCNets. All Rights reserved.

<p align="center">
  <img src="https://storage.googleapis.com/kaggle-datasets-images/4956778/8344638/a2a6aa289fce8461958dc287f1dab799/dataset-cover.jpg?t=2024-05-07-09-36-53" alt="IMG">
</p>

<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 1. Content</i></h1>

<br>

> <h4 style = 'font-family: Times New Roman'>
This dataset explores how weather conditions affect renewable energy generation. <br><br>It spans January 1, 2017, to August 31, 2022, and provides weather data such as temperature, pressure, wind speed, and sunlight duration at 15-minute intervals. <br><br>Variables such as <code>GHI</code> and <code>sunlightTime</code> make it possible to predict solar energy production.
</h4>


- DataSource: https://www.kaggle.com/datasets/pythonafroz/renewable-power-generation-and-weather-conditions/data

<hr>

CCNet result: https://wandb.ai/ccnets/causal-learning

<blockquote>
R&sup2; score: <mark>0.90</mark>

</blockquote>

Benchmark: https://www.kaggle.com/datasets/pythonafroz/renewable-power-generation-and-weather-conditions/code?datasetId=4956778&sortBy=voteCount
<blockquote>
R&sup2; score:

- RandomForest: 0.89
- LSTM: 0.87
- KNN: 0.80
- XGBoost: 0.82

</blockquote>

<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 2. About Dataset</i></h1>

<br> 

<details>
    <summary>More Columns Info</summary>
    The dataset has 17 columns. Below is a <u>description of the main features</u>:

    *  (1) Time: The timestamp of the recorded data in the format of YYYY-MM-DD HH:MM:SS.
        
    *  (2) Energy delta[Wh]: The difference in energy consumption in Watt-hours (Wh) from the previous timestamp to the current timestamp.
        
    *  (3) GHI: Global Horizontal Irradiance in Watts per square meter (W/m²) measured by a pyranometer.
        
    *  (4) temp: The temperature in degrees Celsius (°C) measured at the same height as the pyranometer.

    *  (5) pressure: The atmospheric pressure in hectopascals (hPa) measured at the same height as the pyranometer.

    *  (6) humidity: The relative humidity in percentage (%) measured at the same height as the pyranometer.

    *  (7) wind_speed: The wind speed in meters per second (m/s) measured at the same height as the pyranometer.

    *  (8) rain_1h: The amount of precipitation in millimeters (mm) measured over the past hour.
    
    *  (9) snow_1h: The amount of snowfall in millimeters (mm) measured over the past hour.

    *  (10) clouds_all: The cloud cover in percent (%).
</details>    
    
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 3. Goal of the Notebook</i></h1>
    
> <h4 style = 'font-family: Times New Roman'>
The goal is to train and test a model using GPT and PyTorch. <br><br>
    The target column is <b>Energy delta[Wh]</b>
</h4>

 

***

<a id="1"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #42c2f5'>1.</b> Import Necessary Libraries and Dataset</b></h1>

In [1]:
import os
import sys
import warnings
warnings.filterwarnings("ignore")

path_append = "../../"
sys.path.append(path_append)  # Add the directory two levels up to the import path.

In [2]:
import pandas as pd
df = pd.read_csv(path_append + '../data/Renewable Power Generation and weather Conditions/Renewable.csv')
df.head()

Unnamed: 0,Time,Energy delta[Wh],GHI,temp,pressure,humidity,wind_speed,rain_1h,snow_1h,clouds_all,isSun,sunlightTime,dayLength,SunlightTime/daylength,weather_type,hour,month
0,2017-01-01 00:00:00,0,0.0,1.6,1021,100,4.9,0.0,0.0,100,0,0,450,0.0,4,0,1
1,2017-01-01 00:15:00,0,0.0,1.6,1021,100,4.9,0.0,0.0,100,0,0,450,0.0,4,0,1
2,2017-01-01 00:30:00,0,0.0,1.6,1021,100,4.9,0.0,0.0,100,0,0,450,0.0,4,0,1
3,2017-01-01 00:45:00,0,0.0,1.6,1021,100,4.9,0.0,0.0,100,0,0,450,0.0,4,0,1
4,2017-01-01 01:00:00,0,0.0,1.7,1020,100,5.2,0.0,0.0,100,0,0,450,0.0,4,1,1
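
The first rows above are 15 minutes apart, which matches the sampling interval stated in the dataset description. A check of this kind can be useful before building fixed-length sequences, because gaps in the timestamps would make a window cover more than its nominal time span. The sketch below uses only the five timestamps shown in the `df.head()` output; the helper name `sampling_steps` is hypothetical and not part of the notebook's code.

```python
from datetime import datetime, timedelta

# Timestamps copied from the df.head() output above.
head_times = [
    "2017-01-01 00:00:00",
    "2017-01-01 00:15:00",
    "2017-01-01 00:30:00",
    "2017-01-01 00:45:00",
    "2017-01-01 01:00:00",
]


def sampling_steps(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the time differences between consecutive timestamps."""
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    return [later - earlier for earlier, later in zip(parsed, parsed[1:])]


steps = sampling_steps(head_times)
all_regular = all(step == timedelta(minutes=15) for step in steps)
print("steps:", [str(step) for step in steps])
print("regular 15-minute sampling:", all_regular)
```

On the full data frame, `pd.to_datetime(df['Time']).diff().value_counts()` would give the same kind of summary for every row.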


<a id="2"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #4290f5'>2.</b> Modeling: Preprocess </b></h1>

In [3]:
from preprocessing.data_frame import auto_preprocess_dataframe
target_columns = ['Energy delta[Wh]']
df, description = auto_preprocess_dataframe(df, target_columns) 

description

Column 'isSun' has 2 unique values.
Column 'weather_type' has 5 unique values.


Unnamed: 0,Min,Max,Mean,Std,Null Count,Scaled,Encoded
GHI,-0.034188,4.863248,0.6623192,1.114787,0,Robust,
temp,-3.969044,3.911737,2.4265390000000002e-17,1.202485,0,Minmax,
pressure,-5.175015,4.28502,3.446264e-15,1.295453,0,Minmax,
humidity,-2.818182,0.727273,-0.1904288,0.709294,0,Robust,
wind_speed,-3.937746,10.362254,1.191893e-15,1.821694,0,,
rain_1h,-0.066035,8.023965,1.141051e-17,0.278913,0,,
snow_1h,-0.007148,2.812852,-1.306702e-18,0.06971,0,,
clouds_all,-1.801176,0.928938,-1.850958e-16,1.000003,0,Standard,
sunlightTime,-0.076923,2.538462,0.4659515,0.702313,0,Robust,
dayLength,-1.723228,1.565768,-7.633489000000001e-17,1.124434,0,Minmax,


{'num_features': 23,
 'num_classes': 1,
 'encoded_columns': Index(['day_of_year_cos', 'day_of_year_sin', 'isSun', 'month_cos', 'month_sin',
        'time_scaled', 'weather_type'],
       dtype='object'),
 'one_hot_encoded_columns': Index(['isSun', 'weather_type'], dtype='object'),
 'encoded_datatime_columns': Index(['time_scaled', 'day_of_year_sin', 'day_of_year_cos', 'month_sin',
        'month_cos'],
       dtype='object'),
 'scalers': {'GHI': 'robust',
  'SunlightTime/daylength': 'none',
  'clouds_all': 'standard',
  'dayLength': 'minmax',
  'hour': 'minmax',
  'humidity': 'robust',
  'pressure': 'minmax',
  'rain_1h': 'none',
  'snow_1h': 'none',
  'sunlightTime': 'robust',
  'temp': 'minmax',
  'wind_speed': 'none'}}
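
The `encoded_datatime_columns` entry lists sine/cosine pairs such as `month_sin` and `month_cos`. The internals of `auto_preprocess_dataframe` are not shown in this notebook, so the following is only a sketch of the usual cyclical encoding that such column names suggest, assuming a period of 12 for months. `encode_cyclical` and `distance` are hypothetical helpers.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


december = encode_cyclical(12, 12)
january = encode_cyclical(1, 12)
june = encode_cyclical(6, 12)

# As raw integers, December (12) and January (1) are 11 apart;
# on the circle they are neighbours, while June sits on the opposite side.
print("Dec-Jan distance:", distance(december, january))
print("Dec-Jun distance:", distance(december, june))
```

The point of the encoding is that values at the wrap-around boundary stay close together, which a single integer column cannot express.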

In [4]:
import torch
from sklearn.model_selection import train_test_split
from preprocessing.dataset import TemplateDataset

max_seq_len = 32
min_seq_len = 16
train_df, test_df = train_test_split(df, test_size=0.2, shuffle=False)
# predict the next value in the sequence
train_df_x = train_df.iloc[:, :-1] # all columns except the last one
train_df_y = train_df.iloc[:, -1:] # only the last column

test_df_x = test_df.iloc[:, :-1] # all columns except the last one
test_df_y = test_df.iloc[:, -1:] # only the last column

print('train df shape: ', train_df.shape)
print('test df shape: ', test_df.shape)
trainset = TemplateDataset(train_df_x, train_df_y, min_seq_len=min_seq_len, max_seq_len=max_seq_len)
# The test set uses fixed-length sequences (min_seq_len == max_seq_len)
testset = TemplateDataset(test_df_x, test_df_y, min_seq_len=max_seq_len, max_seq_len=max_seq_len)

train df shape:  (157420, 24)
test df shape:  (39356, 24)
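
With `shuffle=False`, `train_test_split` keeps the rows in their original order and takes the test set from the end, so the model is evaluated on the most recent 20% of the time series. The sketch below reproduces that split with plain slicing and checks the sizes against the shapes printed above. It then shows one way to cut fixed-length windows from a sequence. `chronological_split` and `sliding_windows` are hypothetical helpers, and the windowing is an assumption about what a sequence dataset typically does, not the actual `TemplateDataset` implementation.

```python
import math


def chronological_split(rows, test_size=0.2):
    """Split without shuffling: the last `test_size` fraction becomes the test set."""
    n_test = math.ceil(len(rows) * test_size)  # scikit-learn rounds the test size up
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


def sliding_windows(rows, seq_len):
    """Return every contiguous window of length `seq_len`."""
    return [rows[i:i + seq_len] for i in range(len(rows) - seq_len + 1)]


# 157420 + 39356 rows in total, as printed above; row indices stand in for the data.
rows = range(196776)
train_rows, test_rows = chronological_split(rows)
print("train:", len(train_rows), "test:", len(test_rows))

# Toy example: windows of length 4 over ten time steps.
windows = sliding_windows(list(range(10)), seq_len=4)
print(len(windows), windows[0], windows[-1])
```

Because every training row precedes every test row, the evaluation does not leak future observations into training.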


<a id="3"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #427bf5'>3.</b> Modeling: Causal Learning</b></h1>

### **3-1. Initialize Causal Learning Class**

<blockquote>

This cell imports the `CausalLearning` class from the `causal_learning` module and sets the computation device to the GPU if one is available, otherwise to the CPU.

It then creates a `CausalLearning` instance from the machine learning configuration (`ml_config`), the data configuration (`data_config`), and the selected device. The additional options `use_print` and `use_wandb` enable print statements and logging with *Weights & Biases*.

</blockquote>

In [5]:
from tools.config.data_config import DataConfig
from tools.config.ml_config import MLConfig
from causal_learning import CausalLearning

num_features = description['num_features']
num_classes = description['num_classes']
data_config = DataConfig(dataset_name = 'renewable-power-gen-prediction', task_type='regression', obs_shape=[num_features], label_size=num_classes)

# Create the training configuration with the MLConfig class, then override individual settings.
ml_config = MLConfig(model_name = 'gpt')
ml_config.training.error_function = 'mae'
ml_config.training.num_epoch = 20

# Set the device to GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 

# Initialize the CausalLearning class with the training configuration, data configuration, device, and use_print and use_wandb flags
causal_learning = CausalLearning(ml_config, data_config, device, use_print=True, use_wandb=True)

Trainer Name: causal_trainer


ModelConfig Parameters:


Unnamed: 0,d_model,dropout,model_name,num_layers,use_seq_input
0,256,0.05,gpt,5,True


TrainConfig Parameters:


Unnamed: 0,batch_size,error_function,max_seq_len,min_seq_len,num_epoch
0,64,mae,,,20


OptimConfig Parameters:


Unnamed: 0,clip_grad_range,decay_rate_100k,learning_rate,max_grad_norm,scheduler_type
0,,0.05,0.001,1.0,exponential


DataConfig Parameters:


Unnamed: 0,dataset_name,task_type,obs_shape,label_size,explain_size,show_image_indices
0,renewable-power-gen-prediction,regression,[23],1,,
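
The optimizer settings above (`learning_rate` 0.001, `decay_rate_100k` 0.05, `scheduler_type` exponential) suggest that the learning rate is multiplied by 0.05 over every 100,000 optimizer steps. The scheduler code is not part of this notebook, so the formula below is an inference: it reproduces the `Unified LR` values printed in the training log of section 3-2 when the step count is taken as the logged iteration plus one. `learning_rate_at` is a hypothetical helper.

```python
def learning_rate_at(step, base_lr=0.001, decay_rate_100k=0.05):
    """Exponential decay: the rate shrinks by `decay_rate_100k` every 100,000 steps."""
    return base_lr * decay_rate_100k ** (step / 100_000)


iters_per_epoch = 2459  # iterations per epoch, taken from the progress bar in the log

# (epoch, iteration) pairs taken from the training log.
logged_positions = [(0, 100), (0, 200), (1, 41)]
for epoch, iteration in logged_positions:
    step = epoch * iters_per_epoch + iteration + 1  # assumed: one scheduler step per iteration
    print(f"[{epoch}/20][{iteration}/2459] lr = {learning_rate_at(step)}")
```

Under this assumption the rate would reach 0.001 × 0.05 = 5e-5 after 100,000 steps, which is more than the 20 × 2459 iterations configured here.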








### **3-2. Train Causal Learning**

<blockquote>

Starts training with `trainset` and `testset`: the model is trained on `trainset` and evaluated on `testset`.

Every 100 iterations, the log reports the current learning rate, the CCNet losses and errors, and the regression metrics `mse`, `mae`, `rmse`, and `r2`.

The run shown here was stopped manually at `[7/20]`, which is why the log ends with a `KeyboardInterrupt`.

</blockquote>

In [6]:
causal_learning.train(trainset, testset)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mjunhopark[0m ([33mccnets[0m). Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: Adding directory to artifact (.\..\saved\renewable-power-gen-prediction\causal-learning)... Done. 0.0s


Epochs:   0%|          | 0/20 [00:00<?, ?it/s]

Iterations:   0%|          | 0/2459 [00:00<?, ?it/s]

[0/20][100/2459][Time 12.01]
Unified LR across all optimizers: 0.000996978883189373
CCNet:  Three Gpt
Inf: 0.1146	Gen: 0.4488	Rec: 0.4554	E: 0.1059	R: 0.1196	P: 0.7961

mse: 1151697.0000
mae: 479.4897
rmse: 1073.1715
r2: -0.2379

mse: 1354437.0000
mae: 548.5568
rmse: 1163.8029
r2: -0.2741

[0/20][200/2459][Time 12.18]
Unified LR across all optimizers: 0.0009939966705585644
CCNet:  Three Gpt
Inf: 0.0445	Gen: 0.3053	Rec: 0.3114	E: 0.0378	R: 0.0488	P: 0.5754

mse: 1181797.6250
mae: 494.7833
rmse: 1087.1051
r2: -0.2446

mse: 1197094.7500
mae: 500.2005
rmse: 1094.1183
r2: -0.2481

[0/20][300/2459][Time 11.71]
Unified LR across all optimizers: 0.0009910233784699313
CCNet:  Three Gpt
Inf: 0.0364	Gen: 0.2833	Rec: 0.2874	E: 0.0317	R: 0.0390	P: 0.5355

mse: 1662106.6250
mae: 622.6227
rmse: 1289.2272
r2: -0.2874

mse: 1496898.0000
mae: 589.3663
rmse: 1223.4778
r2: -0.2853

[0/20][400/2459][Time 12.14]
Unified LR across all optimizers: 0.0009880589802399075
CCNet:  Three Gpt
Inf: 0.0348	Gen: 0.269


[1/20][41/2459][Time 12.06]
Unified LR across all optimizers: 0.0009278146802343803
CCNet:  Three Gpt
Inf: 0.0185	Gen: 0.1097	Rec: 0.1068	E: 0.0217	R: 0.0158	P: 0.1994

mse: 1129925.2500
mae: 501.1025
rmse: 1062.9794
r2: -0.1591

mse: 1307215.6250
mae: 558.0349
rmse: 1143.3353
r2: -0.1875

[1/20][141/2459][Time 12.01]
Unified LR across all optimizers: 0.0009250393549941983
CCNet:  Three Gpt
Inf: 0.0177	Gen: 0.1033	Rec: 0.1010	E: 0.0201	R: 0.0158	P: 0.1889

mse: 1493143.8750
mae: 597.3525
rmse: 1221.9426
r2: -0.1780

mse: 1138658.7500
mae: 509.2680
rmse: 1067.0796
r2: -0.1428

[1/20][241/2459][Time 12.04]
Unified LR across all optimizers: 0.0009222723314443782
CCNet:  Three Gpt
Inf: 0.0193	Gen: 0.0974	Rec: 0.0962	E: 0.0209	R: 0.0195	P: 0.1764

mse: 1390056.1250
mae: 552.3404
rmse: 1179.0065
r2: -0.1298

mse: 1116670.1250
mae: 506.2162
rmse: 1056.7262
r2: -0.1399

[1/20][341/2459][Time 11.85]
Unified LR across all optimizers: 0.0009195135847524926
CCNet:  Three Gpt
Inf: 0.0199	Gen: 0.096


[2/20][82/2459][Time 12.15]
Unified LR across all optimizers: 0.0008608658697088319
CCNet:  Three Gpt
Inf: 0.0094	Gen: 0.0707	Rec: 0.0681	E: 0.0123	R: 0.0069	P: 0.1313

mse: 432458.0625
mae: 253.5985
rmse: 657.6154
r2: 0.4723

mse: 577721.8750
mae: 335.0959
rmse: 760.0802
r2: 0.4446

[2/20][182/2459][Time 12.02]
Unified LR across all optimizers: 0.0008582908050676784
CCNet:  Three Gpt
Inf: 0.0092	Gen: 0.0690	Rec: 0.0663	E: 0.0123	R: 0.0065	P: 0.1280

mse: 611573.6250
mae: 300.5927
rmse: 782.0317
r2: 0.4422

mse: 750828.7500
mae: 406.8158
rmse: 866.5037
r2: 0.4004

[2/20][282/2459][Time 11.94]
Unified LR across all optimizers: 0.0008557234430874619
CCNet:  Three Gpt
Inf: 0.0090	Gen: 0.0694	Rec: 0.0667	E: 0.0119	R: 0.0064	P: 0.1288

mse: 561122.2500
mae: 310.7600
rmse: 749.0809
r2: 0.4678

mse: 872233.6875
mae: 459.1500
rmse: 933.9345
r2: 0.3914

[2/20][382/2459][Time 12.06]
Unified LR across all optimizers: 0.0008531637607276009
CCNet:  Three Gpt
Inf: 0.0093	Gen: 0.0690	Rec: 0.0660	E: 0


[3/20][23/2459][Time 12.35]
Unified LR across all optimizers: 0.0008011443422687582
CCNet:  Three Gpt
Inf: 0.0051	Gen: 0.0494	Rec: 0.0486	E: 0.0060	R: 0.0044	P: 0.0945

mse: 443555.5625
mae: 288.8464
rmse: 665.9997
r2: 0.6801

mse: 201464.5000
mae: 205.4699
rmse: 448.8480
r2: 0.7749

[3/20][123/2459][Time 12.28]
Unified LR across all optimizers: 0.0007987479196193923
CCNet:  Three Gpt
Inf: 0.0050	Gen: 0.0497	Rec: 0.0490	E: 0.0058	R: 0.0043	P: 0.0949

mse: 320729.2500
mae: 223.9363
rmse: 566.3297
r2: 0.7170

mse: 414771.9688
mae: 305.3589
rmse: 644.0280
r2: 0.6822

[3/20][223/2459][Time 12.34]
Unified LR across all optimizers: 0.0007963586652681868
CCNet:  Three Gpt
Inf: 0.0049	Gen: 0.0494	Rec: 0.0488	E: 0.0056	R: 0.0043	P: 0.0950

mse: 290988.1875
mae: 225.3537
rmse: 539.4332
r2: 0.7315

mse: 209812.4531
mae: 189.9310
rmse: 458.0529
r2: 0.7738

[3/20][323/2459][Time 12.17]
Unified LR across all optimizers: 0.0007939765577729723
CCNet:  Three Gpt
Inf: 0.0047	Gen: 0.0490	Rec: 0.0484	E: 0


[4/20][64/2459][Time 12.22]
Unified LR across all optimizers: 0.0007433357497590818
CCNet:  Three Gpt
Inf: 0.0045	Gen: 0.0444	Rec: 0.0438	E: 0.0053	R: 0.0039	P: 0.0850

mse: 169235.9844
mae: 167.9964
rmse: 411.3830
r2: 0.8447

mse: 162822.5312
mae: 177.4315
rmse: 403.5127
r2: 0.8556

[4/20][164/2459][Time 12.09]
Unified LR across all optimizers: 0.0007411122470357627
CCNet:  Three Gpt
Inf: 0.0045	Gen: 0.0438	Rec: 0.0431	E: 0.0053	R: 0.0038	P: 0.0840

mse: 182032.1406
mae: 169.2712
rmse: 426.6523
r2: 0.8314

mse: 174567.0938
mae: 194.5031
rmse: 417.8123
r2: 0.8648

[4/20][264/2459][Time 12.17]
Unified LR across all optimizers: 0.0007388953953639536
CCNet:  Three Gpt
Inf: 0.0045	Gen: 0.0443	Rec: 0.0437	E: 0.0053	R: 0.0039	P: 0.0847

mse: 134207.9062
mae: 166.8429
rmse: 366.3440
r2: 0.9014

mse: 107022.6719
mae: 147.0144
rmse: 327.1432
r2: 0.9122

[4/20][364/2459][Time 12.24]
Unified LR across all optimizers: 0.0007366851748486997
CCNet:  Three Gpt
Inf: 0.0045	Gen: 0.0438	Rec: 0.0432	E: 0


[5/20][5/2459][Time 12.42]
Unified LR across all optimizers: 0.0006917677320939829
CCNet:  Three Gpt
Inf: 0.0039	Gen: 0.0400	Rec: 0.0394	E: 0.0048	R: 0.0034	P: 0.0767

mse: 75887.4375
mae: 116.7022
rmse: 275.4767
r2: 0.9200

mse: 177981.6250
mae: 178.6613
rmse: 421.8787
r2: 0.8725

[5/20][105/2459][Time 12.30]
Unified LR across all optimizers: 0.0006896984821800471
CCNet:  Three Gpt
Inf: 0.0039	Gen: 0.0398	Rec: 0.0391	E: 0.0046	R: 0.0033	P: 0.0762

mse: 126544.8516
mae: 151.4024
rmse: 355.7314
r2: 0.8605

mse: 159040.3594
mae: 145.2224
rmse: 398.7986
r2: 0.8295

[5/20][205/2459][Time 12.46]
Unified LR across all optimizers: 0.000687635421908974
CCNet:  Three Gpt
Inf: 0.0041	Gen: 0.0404	Rec: 0.0397	E: 0.0053	R: 0.0035	P: 0.0772

mse: 92205.0156
mae: 138.3775
rmse: 303.6528
r2: 0.9107

mse: 172841.0938
mae: 177.4893
rmse: 415.7416
r2: 0.8375

[5/20][305/2459][Time 12.40]
Unified LR across all optimizers: 0.0006855785327659985
CCNet:  Three Gpt
Inf: 0.0038	Gen: 0.0392	Rec: 0.0386	E: 0.004


[6/20][46/2459][Time 12.21]
Unified LR across all optimizers: 0.0006418514850133148
CCNet:  Three Gpt
Inf: 0.0031	Gen: 0.0376	Rec: 0.0372	E: 0.0041	R: 0.0028	P: 0.0730

mse: 82484.6719
mae: 115.6245
rmse: 287.2014
r2: 0.9179

mse: 115796.7656
mae: 151.3258
rmse: 340.2892
r2: 0.9139

[6/20][146/2459][Time 12.25]
Unified LR across all optimizers: 0.0006399315470507513
CCNet:  Three Gpt
Inf: 0.0028	Gen: 0.0361	Rec: 0.0358	E: 0.0032	R: 0.0025	P: 0.0701

mse: 63194.0664
mae: 106.3186
rmse: 251.3843
r2: 0.9477

mse: 151203.9062
mae: 161.8332
rmse: 388.8495
r2: 0.8738

[6/20][246/2459][Time 12.24]
Unified LR across all optimizers: 0.0006380173521017453
CCNet:  Three Gpt
Inf: 0.0033	Gen: 0.0365	Rec: 0.0361	E: 0.0040	R: 0.0028	P: 0.0708

mse: 53629.8125
mae: 93.8208
rmse: 231.5811
r2: 0.9414

mse: 77843.9531
mae: 122.5627
rmse: 279.0053
r2: 0.9322

[6/20][346/2459][Time 12.35]
Unified LR across all optimizers: 0.0006361088829875093
CCNet:  Three Gpt
Inf: 0.0034	Gen: 0.0365	Rec: 0.0360	E: 0.0040


[7/20][87/2459][Time 12.36]
Unified LR across all optimizers: 0.0005955370707545932
CCNet:  Three Gpt
Inf: 0.0027	Gen: 0.0342	Rec: 0.0338	E: 0.0032	R: 0.0024	P: 0.0664

mse: 97017.0234
mae: 124.1423
rmse: 311.4756
r2: 0.9101

mse: 204099.3438
mae: 156.5592
rmse: 451.7736
r2: 0.8187

[7/20][187/2459][Time 12.59]
Unified LR across all optimizers: 0.0005937556707626122
CCNet:  Three Gpt
Inf: 0.0029	Gen: 0.0341	Rec: 0.0336	E: 0.0038	R: 0.0024	P: 0.0658

mse: 84936.7734
mae: 122.3980
rmse: 291.4391
r2: 0.9276

mse: 87281.6016
mae: 126.1232
rmse: 295.4346
r2: 0.9217



KeyboardInterrupt: 

### **3-3. Test Causal Learning**

<blockquote>

The `test` method evaluates the trained `causal_learning` model on a given dataset.

It computes the evaluation metrics, logs them with *Weights & Biases* (wandb) if `use_wandb` is True, and returns them as the *test metrics*.

</blockquote>

In [7]:
causal_learning.test(testset)

{'mse': 120771.546875,
 'mae': 138.86721801757812,
 'rmse': 347.5220031738281,
 'r2': 0.8896623849868774}
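
The four reported metrics are related: `rmse` is the square root of `mse`, and `r2` compares the squared error with the spread of the targets around their mean. The sketch below shows the standard definitions on toy numbers. The toy values are made up for illustration, `regression_metrics` is a hypothetical helper, and the project's own metric code may differ in details such as batching.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R² from two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "r2": r2}


# Toy targets and predictions (illustrative values only).
toy = regression_metrics([0.0, 100.0, 200.0, 300.0], [10.0, 90.0, 210.0, 290.0])
print(toy)

# Consistency check on the reported test metrics: sqrt(mse) should match rmse.
reported_mse = 120771.546875
reported_rmse = 347.5220031738281
print(math.sqrt(reported_mse), reported_rmse)
```

An `r2` of 1 means perfect predictions, 0 means the model does no better than predicting the mean, and negative values, as in the first epochs of the training log, mean it does worse than the mean.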