Authors : Jinsu Kim, JunHo Park

ⓒ 2022 CCNets, Inc. All Rights Reserved.

![](https://storage.googleapis.com/kaggle-datasets-images/312121/636393/a5097396fc07cf882d3e0d631b100a36/dataset-cover.jpg?t=2019-08-23-15-00-53)

***

<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 1. Content</i></h1>

<br>

> <h4 style = 'font-family: Times New Roman'>
This dataset is intended for developers who want to train weather-forecasting models on Indian climate data.<br><br> It covers 1st January 2013 to 24th April 2017 for the city of Delhi, India. <br><br>The 4 parameters are
meantemp, humidity, wind_speed, meanpressure.
    
  
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 2. About Dataset</i></h1>

<br> 
    
The dataset has 5 columns: <b>date</b> plus the 4 features <u>described below</u>:

*  (1) <b>meantemp</b>: Daily mean temperature, averaged over multiple 3-hour intervals. 
    
*  (2) <b>humidity</b>: Humidity value for the day (units are grams of water vapor per cubic meter volume of air).
    
    
*  (3) <b>wind_speed</b>: Wind speed measured in kmph.
    
    
*  (4) <b>meanpressure</b>: Atmospheric pressure reading (the source states atm, although values around 1015 are consistent with hPa).
    
    
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 3. Goal of the Notebook</i></h1>
    
> <h4 style = 'font-family: Times New Roman'>
The goal is to train and test a model using GPT and PyTorch. <br><br>
    The Target Column used here is <b>meantemp</b>
</h4>

 

https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data

***

<a id="1"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #42c2f5'>1.</b> Import Necessary Libraries </b></h1>

In [1]:
import os
import sys
path_append = "../../"
sys.path.append(path_append)  # Add the directory two levels up to the import path.

In [2]:
import pandas as pd
train_df = pd.read_csv(path_append + '../data/Daily Climate/DailyDelhiClimateTrain.csv')
test_df = pd.read_csv(path_append + '../data/Daily Climate/DailyDelhiClimateTest.csv')
train_df.head()

Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


<a id="2"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #4290f5'>2.</b> Modeling: Preprocess </b></h1>

In [3]:
print('Train set \n\n')
train_df.head()

Train set 




Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


In [4]:
from tools.preprocessing.data_frame import auto_preprocess_dataframe

target_columns = ['meantemp']
df = pd.concat([train_df, test_df], axis=0)
len_train = len(train_df)
df, description = auto_preprocess_dataframe(df, target_columns)

train_df = df[:len_train]
test_df = df[len_train:]

description

Unnamed: 0,Min,Max,Mean,Std,Null Count,Scaled,Encoded
humidity,-7.86787,4.777889,-1.0,2.480321,0,Minmax,
wind_speed,-1.144013,6.4461,0.096304,0.810917,0,Robust,
meanpressure,-75.954763,500.583698,0.115394,13.151422,0,Robust,
day_of_year_sin,-0.999991,0.999991,0.051243,0.709811,0,,EncodedDateTime
day_of_year_cos,-0.999963,1.0,0.034899,0.702109,0,,EncodedDateTime
meantemp,6.0,38.714286,25.221918,7.345014,0,,


{'num_features': 5,
 'num_classes': 1,
 'encoded_columns': Index(['day_of_year_sin', 'day_of_year_cos'], dtype='object'),
 'one_hot_encoded_columns': Index([], dtype='object'),
 'encoded_datatime_columns': Index(['day_of_year_sin', 'day_of_year_cos'], dtype='object'),
 'scalers': {'humidity': 'minmax',
  'meanpressure': 'robust',
  'wind_speed': 'robust'}}
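
The description above reports two kinds of transformation: a sine/cosine encoding of the day of the year, and per-column scalers labelled `minmax` and `robust`. The sketch below illustrates the textbook form of each using only the standard library. The helper names and the 365.25-day period are assumptions made for illustration; the ranges reported in the table suggest that `auto_preprocess_dataframe` applies its own variants, so this is not a reproduction of the library's behavior.

```python
import math
from datetime import date
from statistics import median, quantiles


def encode_day_of_year(d):
    """Map a date onto the unit circle so that 31 Dec and 1 Jan end up close together."""
    # Assumption: a fixed 365.25-day period; the library's exact period is not shown.
    angle = 2.0 * math.pi * d.timetuple().tm_yday / 365.25
    return math.sin(angle), math.cos(angle)


def minmax_scale(values):
    """Textbook min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Textbook robust scaling: subtract the median, divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


# wind_speed values from the first five rows of the training set
wind = [0.0, 2.98, 4.633333, 1.233333, 3.7]
print(encode_day_of_year(date(2013, 1, 1)))
print(minmax_scale(wind))
print(robust_scale(wind))
```

The cyclic encoding is the reason a single `date` column becomes two features (`day_of_year_sin`, `day_of_year_cos`): a raw day index would place late December and early January far apart even though their weather is similar.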

In [5]:
import torch
from sklearn.model_selection import train_test_split
from tools.preprocessing.template_dataset import TemplateDataset

min_seq_len = 8
max_seq_len = 16
# Chronological 70/30 split of the combined frame (shuffle=False keeps the time order).
# This replaces the train_df/test_df slices created in the previous cell.
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42, shuffle=False)
# Prepare the training data
train_df_x = train_df.iloc[:-1, :]   # Present day: every row except the last, all columns (including meantemp)
train_df_y = train_df.iloc[1:, -1:]  # Next day: every row from the second onward, last column only (meantemp)

# Prepare the testing data
test_df_x = test_df.iloc[:-1, :]   # Present day: every row except the last, all columns (including meantemp)
test_df_y = test_df.iloc[1:, -1:]  # Next day: every row from the second onward, last column only (meantemp)

print('train df shape: ', train_df.shape)
print('test df shape: ', test_df.shape)
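# Training sequences vary in length between min_seq_len and max_seq_len
trainset = TemplateDataset(train_df_x, train_df_y, min_seq_len = min_seq_len, max_seq_len = max_seq_len)
# Test sequences use a single fixed length (min_seq_len is set to max_seq_len)
testset = TemplateDataset(test_df_x, test_df_y, min_seq_len = max_seq_len, max_seq_len = max_seq_len)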

train df shape:  (1103, 6)
test df shape:  (473, 6)
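
The cell above builds next-day prediction pairs by offsetting inputs and targets by one row, then hands them to `TemplateDataset` with a sequence-length range. A minimal sketch of that alignment on plain Python lists follows. The function names are hypothetical, and the fixed-length sliding window is an assumption: how `TemplateDataset` actually samples sequences between `min_seq_len` and `max_seq_len` is not shown in this notebook.

```python
def make_next_day_pairs(rows, target_index=-1):
    """Pair each day's full feature row with the next day's target value."""
    inputs = rows[:-1]                                  # days 0 .. n-2
    targets = [row[target_index] for row in rows[1:]]   # days 1 .. n-1
    return inputs, targets


def sliding_windows(inputs, targets, seq_len):
    """Cut aligned (inputs, targets) windows of one fixed length."""
    return [
        (inputs[i:i + seq_len], targets[i:i + seq_len])
        for i in range(len(inputs) - seq_len + 1)
    ]


# (wind_speed, meanpressure, meantemp) for the first five days of the raw training set
rows = [
    (0.0, 1015.666667, 10.0),
    (2.98, 1017.8, 7.4),
    (4.633333, 1018.666667, 7.166667),
    (1.233333, 1017.166667, 8.666667),
    (3.7, 1016.5, 6.0),
]

inputs, targets = make_next_day_pairs(rows)
windows = sliding_windows(inputs, targets, seq_len=3)
print(targets)       # [7.4, 7.166667, 8.666667, 6.0]
print(len(windows))  # 2
```

Because the offset drops one row from each side, the input and target sequences are each one element shorter than the original frame.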


<a id="3"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #427bf5'>3.</b> Modeling</b></h1>

In [6]:
from tools.setting.data_config import DataConfig
from tools.setting.ml_params import MLParameters
from trainer_hub import TrainerHub

num_features = description["num_features"]
num_classes = description["num_classes"]

data_config = DataConfig(dataset_name = 'daily-delhi-climate', task_type='regression', obs_shape=[num_features + num_classes], label_size=num_classes)

# Set the training configuration with the MLParameters class (GPT as the CCNet network, no encoder).
ml_params = MLParameters(ccnet_network = 'gpt', encoder_network = 'none')

ml_params.num_epoch = 1000

# Set the device to GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 

# Initialize the TrainerHub class with the training configuration, data configuration, device, and use_print and use_wandb flags
trainer_hub = TrainerHub(ml_params, data_config, device, use_print=True, use_wandb=False) 



Trainer Name: causal_trainer


ModelParameters Parameters:


Unnamed: 0,ccnet_config,ccnet_network,encoder_config,encoder_network
0,See details below,gpt,,none


Detailed ccnet_config Configuration:


Unnamed: 0,ccnet_config_model_name,ccnet_config_num_layers,ccnet_config_d_model,ccnet_config_dropout,ccnet_config_obs_shape,ccnet_config_condition_dim,ccnet_config_z_dim
0,gpt,5,256,0.05,[6],1,2


TrainingParameters Parameters:


Unnamed: 0,batch_size,max_iters,max_seq_len,min_seq_len,num_epoch
0,64,100000,,,1000


OptimizationParameters Parameters:


Unnamed: 0,clip_grad_range,decay_rate_100k,learning_rate,max_grad_norm,scheduler_type
0,,0.05,0.0002,1.0,exponential
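
The optimization parameters above combine a base `learning_rate` of 0.0002 with an `exponential` scheduler and a `decay_rate_100k` of 0.05. One plausible reading, sketched below as an assumption rather than the library's confirmed implementation, is that the learning rate is multiplied by 0.05 over every 100,000 optimizer steps. The function name is hypothetical.

```python
def exponential_lr(step, base_lr=0.0002, decay_rate_100k=0.05):
    """Learning rate after `step` optimizer steps under smooth exponential decay.

    Assumption: the rate shrinks by a factor of `decay_rate_100k` per 100,000 steps.
    """
    return base_lr * decay_rate_100k ** (step / 100_000)


for step in (0, 101, 201, 301, 100_000):
    print(step, exponential_lr(step))
```

Under this reading, steps 101, 201, and 301 give roughly 0.00019940, 0.00019880, and 0.00019820, which is consistent with the "Unified LR" values printed in the first three training log entries of this notebook.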


AlgorithmParameters Parameters:


Unnamed: 0,enable_diffusion,error_function,reset_pretrained
0,False,mse,False


DataConfig Parameters:


Unnamed: 0,dataset_name,task_type,obs_shape,label_size,explain_size,explain_layer,state_size,show_image_indices
0,daily-delhi-climate,regression,[6],1,2,tanh,,








In [7]:
trainer_hub.train(trainset, testset)

[5/1000][15/17][Time 11.97]
Unified LR across all optimizers: 0.0001993957766378747
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0726	Gen: 2.8899	Rec: 2.8911	E: 0.0146	R: 0.0220	P: 37.9062
--------------------Test Metrics------------------------
mse: 164.8782
mae: 11.0884
r2: -2.5746



[11/1000][13/17][Time 16.04]
Unified LR across all optimizers: 0.00019879933411171295
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.1695	Gen: 2.4071	Rec: 2.4359	E: 0.0666	R: 0.2441	P: 26.1391
--------------------Test Metrics------------------------
mse: 103.7774
mae: 8.9569
r2: -1.3469



[17/1000][11/17][Time 12.03]
Unified LR across all optimizers: 0.00019820467569398644
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0932	Gen: 2.1307	Rec: 2.1401	E: 0.0400	R: 0.0503	P: 20.7418
--------------------Test Metrics------------------------
mse: 77.3103
mae: 7.6090
r2: -0.6199



[23/1000][9/17][Time 11.71]
Unified LR across all optimizers: 0.00019761179604798148
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0827	Gen: 1.7466	Rec: 1.7507	E: 0.1457	R: 0.1318	P: 13.7832
--------------------Test Metrics------------------------
mse: 69.3782
mae: 7.1709
r2: -0.4945



[29/1000][7/17][Time 12.32]
Unified LR across all optimizers: 0.0001970206898529479
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0353	Gen: 1.2500	Rec: 1.2533	E: 0.0024	R: 0.0036	P: 7.2865
--------------------Test Metrics------------------------
mse: 45.6473
mae: 5.4386
r2: 0.0838
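
The test metrics in the log move from a strongly negative `r2` toward a small positive one as training proceeds. The sketch below uses the textbook definitions of MSE, MAE, and R² to show why that matters: R² is 0 for a model that always predicts the mean of the targets and negative for anything worse than that baseline. This is an illustration with a hypothetical helper, not the library's metric code; how `TrainerHub` aggregates metrics across batches is not shown here.

```python
def regression_metrics(y_true, y_pred):
    """Return (mse, mae, r2) using the textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation in the targets
    r2 = 1.0 - (mse * n) / ss_tot
    return mse, mae, r2


# meantemp values from the first five rows of the raw training set
y_true = [10.0, 7.4, 7.166667, 8.666667, 6.0]

mean_baseline = [sum(y_true) / len(y_true)] * len(y_true)  # always predict the mean
far_off = [25.0] * len(y_true)                             # a constant, badly biased guess

print(regression_metrics(y_true, mean_baseline))
print(regression_metrics(y_true, far_off))
```

Read this way, the early log entries (for example `r2: -2.5746`) describe a model that is still worse than a constant mean prediction, and the last entry shown (`r2: 0.0838`) is the first to edge past that baseline.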

