Author:
        
        KIM, JeongYoong, jeongyoong@ccnets.org
        
    COPYRIGHT (c) 2024. CCNets. All Rights reserved.

<p align="center">
  <img src="https://storage.googleapis.com/kaggle-datasets-images/4956778/8344638/a2a6aa289fce8461958dc287f1dab799/dataset-cover.jpg?t=2024-05-07-09-36-53" alt="IMG">
</p>

<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 1. Content</i></h1>

<br>

> <h4 style = 'font-family: Times New Roman'>
This dataset explores how weather conditions affect renewable energy generation. <br><br>It spans January 1, 2017, to August 31, 2022, and provides climate data such as temperature, pressure, wind speed, and sunlight duration at 15-minute intervals. <br><br>Variables such as GHI and sunlightTime make it possible to predict solar energy production.
</h4>


- DataSource: https://www.kaggle.com/datasets/pythonafroz/renewable-power-generation-and-weather-conditions/data
  
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 2. About Dataset</i></h1>

<br> 

<details>
    <summary>More Columns Info</summary>
    The dataset has 17 columns; the <u>main features</u> are described below:

    *  (1) Time: The timestamp of the recorded data in the format of YYYY-MM-DD HH:MM:SS.
        
    *  (2) Energy delta[Wh]: The difference in energy consumption in Watt-hours (Wh) from the previous timestamp to the current timestamp.
        
    *  (3) GHI: Global Horizontal Irradiance in Watts per square meter (W/m²) measured by a pyranometer.
        
    *  (4) temp: The temperature in degrees Celsius (°C) measured at the same height as the pyranometer.

    *  (5) pressure: The atmospheric pressure in hectopascals (hPa) measured at the same height as the pyranometer.

    *  (6) humidity: The relative humidity in percentage (%) measured at the same height as the pyranometer.

    *  (7) wind_speed: The wind speed in meters per second (m/s) measured at the same height as the pyranometer.

    *  (8) rain_1h: The amount of precipitation in millimeters (mm) measured over the past hour.
    
    *  (9) snow_1h: The amount of snowfall in millimeters (mm) measured over the past hour.

    *  (10) clouds_all: The cloud cover as a percentage (%).
</details>    
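
Because the data are sampled every 15 minutes, an `Energy delta[Wh]` value can be read as the energy accumulated over a quarter of an hour. The following is a minimal sketch, assuming each value covers exactly one 15-minute interval; the helper name `average_power_w` is hypothetical and not part of the dataset or the notebook. It converts such a value to the average power over that interval.

```python
# Assumption: every row covers one 15-minute interval, i.e. 0.25 hours.
INTERVAL_HOURS = 15 / 60


def average_power_w(energy_delta_wh: float, interval_hours: float = INTERVAL_HOURS) -> float:
    """Return the average power (W) implied by an energy change (Wh) over one interval."""
    return energy_delta_wh / interval_hours


# 250 Wh accumulated in 15 minutes corresponds to an average of 1000 W.
print(average_power_w(250.0))  # 1000.0
```

Reading the target this way can make predictions easier to compare with power ratings, which are usually stated in watts rather than watt-hours.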
    
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 3. Goal of the Notebook</i></h1>
    
> <h4 style = 'font-family: Times New Roman'>
The goal is to train and test a GPT-based model using PyTorch. <br><br>
    The Target Column used here is <b>Energy delta[Wh]</b>
</h4>

 

***

<a id="1"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #42c2f5'>1.</b> Import Necessary Libraries </b></h1>

In [1]:
import os
import sys
import warnings
warnings.filterwarnings("ignore")

path_append = "../"
sys.path.append(path_append)  # Add the parent directory to the import path.

In [2]:
import pandas as pd
df = pd.read_csv(path_append + '../data/Renewable Power Generation and weather Conditions/Renewable.csv')
df.head()

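| | Time | Energy delta[Wh] | GHI | temp | pressure | humidity | wind_speed | rain_1h | snow_1h | clouds_all | isSun | sunlightTime | dayLength | SunlightTime/daylength | weather_type | hour | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-01 00:00:00 | 0 | 0.0 | 1.6 | 1021 | 100 | 4.9 | 0.0 | 0.0 | 100 | 0 | 0 | 450 | 0.0 | 4 | 0 | 1 |
| 1 | 2017-01-01 00:15:00 | 0 | 0.0 | 1.6 | 1021 | 100 | 4.9 | 0.0 | 0.0 | 100 | 0 | 0 | 450 | 0.0 | 4 | 0 | 1 |
| 2 | 2017-01-01 00:30:00 | 0 | 0.0 | 1.6 | 1021 | 100 | 4.9 | 0.0 | 0.0 | 100 | 0 | 0 | 450 | 0.0 | 4 | 0 | 1 |
| 3 | 2017-01-01 00:45:00 | 0 | 0.0 | 1.6 | 1021 | 100 | 4.9 | 0.0 | 0.0 | 100 | 0 | 0 | 450 | 0.0 | 4 | 0 | 1 |
| 4 | 2017-01-01 01:00:00 | 0 | 0.0 | 1.7 | 1020 | 100 | 5.2 | 0.0 | 0.0 | 100 | 0 | 0 | 450 | 0.0 | 4 | 1 | 1 |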
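
The first rows suggest a regular 15-minute cadence. As a hedged illustration, the sketch below checks that spacing on the five timestamps shown above using only the standard library; the variable names are illustrative, and the check covers these sample rows only, not the full file.

```python
from datetime import datetime, timedelta

# The five timestamps from the df.head() output above.
timestamps = [
    "2017-01-01 00:00:00",
    "2017-01-01 00:15:00",
    "2017-01-01 00:30:00",
    "2017-01-01 00:45:00",
    "2017-01-01 01:00:00",
]

# Parse using the YYYY-MM-DD HH:MM:SS format described for the Time column.
parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]

# Collect the distinct gaps between consecutive rows.
gaps = {later - earlier for earlier, later in zip(parsed, parsed[1:])}

# The sample is regular if the only gap present is 15 minutes.
is_regular = gaps == {timedelta(minutes=15)}
print(is_regular)  # True
```

On the full DataFrame, a comparable check could be written as `pd.to_datetime(df['Time']).diff().value_counts()`, which would also reveal any missing or duplicated intervals.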


<a id="2"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #4290f5'>2.</b> Modeling: Preprocess </b></h1>

In [3]:
from tools.preprocessing.data_frame import auto_preprocess_dataframe
target_columns = ['Energy delta[Wh]']
df, description = auto_preprocess_dataframe(df, target_columns) 

description

Column 'isSun' has 2 unique values.
Column 'weather_type' has 5 unique values.


| | Min | Max | Mean | Std | Null Count |
|---|---|---|---|---|---|
| GHI | -0.034188 | 4.863248 | 0.6623192 | 1.114787 | 0 |
| temp | -8.938088 | 6.823473 | -1.0 | 2.40497 | 0 |
| pressure | -11.350029 | 7.57004 | -1.0 | 2.590907 | 0 |
| humidity | -2.818182 | 0.727273 | -0.1904288 | 0.709294 | 0 |
| wind_speed | -3.937746 | 10.362254 | 1.191893e-15 | 1.821694 | 0 |
| rain_1h | -0.066035 | 8.023965 | 1.141051e-17 | 0.278913 | 0 |
| snow_1h | -0.007148 | 2.812852 | -1.306702e-18 | 0.06971 | 0 |
| clouds_all | -1.801176 | 0.928938 | -1.850958e-16 | 1.000003 | 0 |
| sunlightTime | -0.076923 | 2.538462 | 0.4659515 | 0.702313 | 0 |
| dayLength | -4.446457 | 2.131536 | -1.0 | 2.248868 | 0 |


{'num_features': 21,
 'num_classes': 1,
 'encoded_columns': Index(['day_of_year_cos', 'day_of_year_sin', 'isSun', 'time_scaled',
        'weather_type'],
       dtype='object'),
 'scalers': {'GHI': 'robust',
  'SunlightTime/daylength': 'none',
  'clouds_all': 'standard',
  'dayLength': 'minmax',
  'hour': 'minmax',
  'humidity': 'robust',
  'pressure': 'minmax',
  'rain_1h': 'none',
  'snow_1h': 'none',
  'sunlightTime': 'robust',
  'temp': 'minmax',
  'wind_speed': 'none'}}
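
The encoded columns include `day_of_year_sin` and `day_of_year_cos`, which points to a cyclical encoding of the date. How `auto_preprocess_dataframe` computes them is not shown in this notebook, so the following is only a sketch of the usual technique; the function name `encode_day_of_year` and the 365.25-day period are assumptions.

```python
import math
from datetime import datetime
from typing import Tuple


def encode_day_of_year(timestamp: str, days_in_year: float = 365.25) -> Tuple[float, float]:
    """Map a timestamp's day of year onto the unit circle as (sin, cos)."""
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").timetuple().tm_yday
    angle = 2 * math.pi * day / days_in_year
    return math.sin(angle), math.cos(angle)


sin_jan1, cos_jan1 = encode_day_of_year("2017-01-01 00:00:00")
sin_dec31, cos_dec31 = encode_day_of_year("2017-12-31 00:00:00")

# December 31 and January 1 are far apart as raw day numbers (365 vs 1),
# but close together once placed on the unit circle.
distance = math.hypot(sin_jan1 - sin_dec31, cos_jan1 - cos_dec31)
print(distance < 0.05)  # True
```

The point of the two-column form is that the model sees the end of one year as adjacent to the start of the next, which a single day counter cannot express.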

In [4]:
import torch
from sklearn.model_selection import train_test_split
from tools.preprocessing.template_dataset import TemplateDataset

max_seq_len = 32
min_seq_len = 16
# Chronological split (shuffle=False): the last 20% of rows form the test set
train_df, test_df = train_test_split(df, test_size=0.2, shuffle=False)
# predict the next value in the sequence
train_df_x = train_df.iloc[:, :-1] # all columns except the last one
train_df_y = train_df.iloc[:, -1:] # only the last column

test_df_x = test_df.iloc[:, :-1] # all columns except the last one
test_df_y = test_df.iloc[:, -1:] # only the last column

print('train df shape: ', train_df.shape)
print('test df shape: ', test_df.shape)
trainset = TemplateDataset(train_df_x, train_df_y, min_seq_len, max_seq_len)
testset = TemplateDataset(test_df_x, test_df_y, max_seq_len, max_seq_len)

train df shape:  (157420, 22)
test df shape:  (39356, 22)
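
The split above keeps the rows in time order, so the test set is the final 20% of the series. How `TemplateDataset` builds sequences of 16 to 32 steps is not shown in this notebook. The sketch below therefore assumes plain fixed-length sliding windows; both helper names are hypothetical, and the split-size helper assumes the test share is rounded up, which reproduces the shapes printed above.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split_sizes(n_rows: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test) for an unshuffled split, rounding the test share up."""
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test


def sliding_windows(values: Sequence[float], seq_len: int, stride: int = 1) -> List[List[float]]:
    """Cut a series into overlapping windows of exactly seq_len items."""
    last_start = len(values) - seq_len
    return [list(values[start:start + seq_len]) for start in range(0, last_start + 1, stride)]


# 157420 + 39356 = 196776 rows in total.
print(chronological_split_sizes(196776, 0.2))  # (157420, 39356)

# A toy series of 10 steps cut into windows of 4 with a stride of 2.
windows = sliding_windows(list(range(10)), seq_len=4, stride=2)
print(len(windows))  # 4
```

Keeping the split chronological means that windows drawn from the test rows come from timestamps later than any training row, which is generally the safer evaluation setup for forecasting.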


<a id="3"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #427bf5'>3.</b> Modeling</b></h1>

In [5]:
from tools.setting.data_config import DataConfig
from tools.setting.ml_params import MLParameters
from trainer_hub import TrainerHub

num_features = description['num_features']
num_classes = description['num_classes']
data_config = DataConfig(dataset_name = 'renewable-power-gen-prediction', task_type='regression', obs_shape=[num_features], label_size=num_classes)

# Set the training configuration with MLParameters (GPT network, no encoder).
ml_params = MLParameters(ccnet_network = 'gpt', encoder_network = 'none')
ml_params.algorithm.error_function = 'mae'

# Set the device to GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 

# Initialize the TrainerHub class with the training configuration, data configuration, device, and use_print and use_wandb flags
trainer_hub = TrainerHub(ml_params, data_config, device, use_print=True, use_wandb=False) 

Trainer Name: causal_trainer


ModelParameters(ccnet_network=gpt, encoder_network=none

TrainingParameters(num_epoch=100, max_iters=100000, batch_size=64

OptimizationParameters(learning_rate=0.0002, decay_rate_100k=0.05, scheduler_type=exponential, max_grad_norm=1.0

AlgorithmParameters(enable_diffusion=False, reset_pretrained=False, error_function=mae)

DataConfig(dataset_name=renewable-power-gen-prediction, task_type=regression, obs_shape=[21], label_size=1, explain_size=10)
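
The optimization parameters list `learning_rate=0.0002`, `decay_rate_100k=0.05`, and `scheduler_type=exponential`. One plausible reading, stated here as an assumption rather than as the library's implementation, is that the learning rate is multiplied by 0.05 over every 100,000 iterations. The function name `exponential_lr` is hypothetical.

```python
def exponential_lr(step: int, initial_lr: float = 2e-4, decay_rate_100k: float = 0.05) -> float:
    """Learning rate after `step` iterations under smooth exponential decay.

    Assumption: `decay_rate_100k` is the factor applied over 100,000 iterations.
    """
    return initial_lr * decay_rate_100k ** (step / 100_000)


for step in (0, 600, 50_000, 100_000):
    print(step, exponential_lr(step))
```

Under this reading, the rate after roughly 600 iterations is about 1.964e-4, which is close to the unified learning rate logged in the training output, so the interpretation appears consistent with the run.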




In [6]:
trainer_hub.train(trainset, testset)

[0/100][600/2459][Time 9.84]
Unified LR across all optimizers: 0.00019643135180405117
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.2966	Gen: 0.4115	Rec: 0.2778	E: 0.4541	R: 0.1650	P: 0.3931
--------------------Test Metrics------------------------
mse: 1329078.0000
mae: 554.3286
r2: -0.2869
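
For reference, the three test metrics can be computed from their textbook definitions as in the sketch below. The function name `regression_metrics` and the toy values are illustrative; this is not the code that `TrainerHub` runs internally.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, MAE, and R^2 from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    # Total variation of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 compares the squared error with that of always predicting the mean.
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mse": mse, "mae": mae, "r2": r2}


y_true = [0.0, 100.0, 200.0, 300.0]

# Always predicting the mean of the targets gives R^2 = 0.
baseline = regression_metrics(y_true, [150.0] * 4)

# Predictions that run opposite to the targets give a negative R^2.
reversed_pred = regression_metrics(y_true, [300.0, 200.0, 100.0, 0.0])
print(baseline["r2"], reversed_pred["r2"])  # 0.0 -3.0
```

A negative `r2`, as in the output above, means the model's squared error on the test set is still larger than that of a constant prediction equal to the test mean. These metrics were logged 600 iterations into the first epoch.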



In [None]:
trainer_hub.test(testset)