Authors : Jinsu Kim, JunHo Park

ⓒ 2022 CCNets, Inc. All Rights Reserved.

![](https://storage.googleapis.com/kaggle-datasets-images/312121/636393/a5097396fc07cf882d3e0d631b100a36/dataset-cover.jpg?t=2019-08-23-15-00-53)

***

<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 1. Content</i></h1>

<br>

> <h4 style = 'font-family: Times New Roman'>
This dataset is intended for developers who want to train weather-forecasting models for the Indian climate.<br><br> It provides daily data from 1 January 2013 to 24 April 2017 for the city of Delhi, India. <br><br>The four parameters are
meantemp, humidity, wind_speed, and meanpressure.
</h4>
    
  
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 2. About Dataset</i></h1>

<br> 
    
The dataset has 5 columns: the <b>date</b> plus the four features <u>described below</u>:

*  (1) <b>meantemp</b>: Mean temperature, averaged over multiple 3-hour intervals in a day.
    
*  (2) <b>humidity</b>: Humidity value for the day (units are grams of water vapor per cubic meter volume of air).
    
    
*  (3) <b>wind_speed</b>: Wind speed measured in kmph.
    
    
*  (4) <b>meanpressure</b>: Atmospheric pressure reading (listed as atm in the dataset description, although sample values near 1015 are more consistent with hPa).
    
    
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 3. Goal of the Notebook</i></h1>
    
> <h4 style = 'font-family: Times New Roman'>
The goal is to train and test a model using GPT and PyTorch. <br><br>
    The target column is <b>meantemp</b>.
</h4>

 

https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data

***

<a id="1"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #42c2f5'>1.</b> Import Necessary Libraries </b></h1>

In [1]:
import os
import sys
path_append = "../../"
sys.path.append(path_append)  # Go up two directories so the project packages (e.g. tools) can be imported.

In [2]:
import pandas as pd
train_df = pd.read_csv(path_append + '../data/Daily Climate/DailyDelhiClimateTrain.csv')
test_df = pd.read_csv(path_append + '../data/Daily Climate/DailyDelhiClimateTest.csv')
train_df.head()

Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


<a id="2"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #4290f5'>2.</b> Modeling: Preprocess </b></h1>

In [3]:
print('Train set \n\n')
train_df.head()

Train set 




Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


In [4]:
from tools.preprocessing.data_frame import auto_preprocess_dataframe

target_columns = ['meantemp']
# Preprocess train and test together so both receive the same scaling and encoding
df = pd.concat([train_df, test_df], axis=0)
len_train = len(train_df)
df, description = auto_preprocess_dataframe(df, target_columns)

# Split back into the original train/test rows
train_df = df[:len_train]
test_df = df[len_train:]

description

Unnamed: 0,Min,Max,Mean,Std,Null Count,Scaled,Encoded
humidity,-3.433935,2.888945,-4.328179e-16,1.24016,0,Minmax,
wind_speed,-1.144013,6.4461,0.09630388,0.810917,0,Robust,
meanpressure,-75.954763,500.583698,0.1153938,13.151422,0,Robust,
day_of_year_sin,-0.999991,0.999991,0.05124306,0.709811,0,,EncodedDateTime
day_of_year_cos,-0.999963,1.0,0.03489912,0.702109,0,,EncodedDateTime
meantemp,6.0,38.714286,25.22192,7.345014,0,,


{'num_features': 5,
 'num_classes': 1,
 'encoded_columns': Index(['day_of_year_sin', 'day_of_year_cos'], dtype='object'),
 'one_hot_encoded_columns': Index([], dtype='object'),
 'encoded_datatime_columns': Index(['day_of_year_sin', 'day_of_year_cos'], dtype='object'),
 'scalers': {'humidity': 'minmax',
  'meanpressure': 'robust',
  'wind_speed': 'robust'}}
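
The `day_of_year_sin` and `day_of_year_cos` columns reported above suggest a cyclical encoding of the date. The sketch below shows one common way to build such features. It is an illustration only: the helper name `encode_day_of_year` and the 365.25-day period are assumptions, and `auto_preprocess_dataframe` may compute these columns differently.

```python
import numpy as np
import pandas as pd


def encode_day_of_year(dates: pd.Series) -> pd.DataFrame:
    """Map each date to a point on the unit circle (hypothetical helper)."""
    dates = pd.to_datetime(dates)
    # Assumption: a fixed 365.25-day period; the library may normalize differently.
    angle = 2.0 * np.pi * dates.dt.dayofyear / 365.25
    return pd.DataFrame(
        {
            "day_of_year_sin": np.sin(angle),
            "day_of_year_cos": np.cos(angle),
        },
        index=dates.index,
    )


dates = pd.Series(["2013-01-01", "2013-07-02", "2013-12-31"])
encoded = encode_day_of_year(dates)
print(encoded.round(3))
```

The point of using both a sine and a cosine is that 31 December and 1 January end up close together in feature space, which a raw day number (365 versus 1) would not capture.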

In [5]:
import torch
from sklearn.model_selection import train_test_split
from tools.preprocessing.dataset import TemplateDataset

min_seq_len = 8
max_seq_len = 16
# Chronological 70/30 split of the combined frame (shuffle=False keeps time order).
# This replaces the train_df/test_df defined in the previous cell.
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42, shuffle=False)
# Prepare the training data
train_df_x = train_df.iloc[:-1, :]  # Present day: all rows but the last, all columns (including the target, meantemp)
train_df_y = train_df.iloc[1:, -1:]  # Next day: all rows from the second on, last column only (the target, meantemp)

# Prepare the testing data
test_df_x = test_df.iloc[:-1, :]  # Present day: all rows but the last, all columns (including the target, meantemp)
test_df_y = test_df.iloc[1:, -1:]  # Next day: all rows from the second on, last column only (the target, meantemp)

print('train df shape: ', train_df.shape)
print('test df shape: ', test_df.shape)
trainset = TemplateDataset(train_df_x, train_df_y, min_seq_len=min_seq_len, max_seq_len=max_seq_len)
# The test set uses a fixed sequence length (min_seq_len equals max_seq_len)
testset = TemplateDataset(test_df_x, test_df_y, min_seq_len=max_seq_len, max_seq_len=max_seq_len)

train df shape:  (1103, 6)
test df shape:  (473, 6)
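
The `iloc[:-1]` / `iloc[1:]` slicing above pairs each day's features with the next day's target. A minimal sketch on a toy frame makes the alignment visible. The values and the two-column layout are made up for illustration; only the slicing pattern comes from the cell above.

```python
import pandas as pd

# Toy frame with the target in the last column, mirroring the layout above.
toy = pd.DataFrame(
    {
        "humidity": [0.1, 0.2, 0.3, 0.4],
        "meantemp": [10.0, 7.4, 7.2, 8.7],
    }
)

x = toy.iloc[:-1, :]   # days 0..n-2: all columns, target included
y = toy.iloc[1:, -1:]  # days 1..n-1: target column only

# Row i of x (day i) is paired with row i of y (day i + 1) by position.
pairs = list(zip(x["meantemp"].to_numpy(), y["meantemp"].to_numpy()))
print(pairs)
```

Note that `x` and `y` keep their original index labels, so they line up by position rather than by label. Any code that consumes them should pair rows positionally.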


<a id="3"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #427bf5'>3.</b> Modeling</b></h1>

In [6]:
from tools.setting.data_config import DataConfig
from tools.setting.ml_params import MLParameters
from trainer_hub import TrainerHub

num_features = description["num_features"]
num_classes = description["num_classes"]

data_config = DataConfig(dataset_name = 'daily-delhi-climate', task_type='regression', obs_shape=[num_features + num_classes], label_size=num_classes)

# Set the training configuration with MLParameters, selecting the GPT model.
ml_params = MLParameters(model_name = 'gpt')

# Set the device to GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 

# Initialize the TrainerHub class with the training configuration, data configuration, device, and use_print and use_wandb flags
trainer_hub = TrainerHub(ml_params, data_config, device, use_print=True, use_wandb=False) 

Trainer Name: causal_trainer


ModelParameters Parameters:


Unnamed: 0,ccnet_config,ccnet_network
0,See details below,gpt


Detailed ccnet_config Configuration:


Unnamed: 0,ccnet_config_num_layers,ccnet_config_d_model,ccnet_config_dropout,ccnet_config_obs_shape,ccnet_config_reset_pretrained,ccnet_config_network_name,ccnet_config_device,ccnet_config_model_name
0,5,256,0.05,[6],False,gpt,,gpt


TrainingParameters Parameters:


Unnamed: 0,batch_size,max_iters,max_seq_len,min_seq_len,num_epoch
0,64,25600,,,400


OptimizationParameters Parameters:


Unnamed: 0,clip_grad_range,decay_rate_100k,learning_rate,max_grad_norm,scheduler_type
0,,0.05,0.001,1.0,exponential


AlgorithmParameters Parameters:


Unnamed: 0,error_function,reset_pretrained
0,mse,False


DataConfig Parameters:


Unnamed: 0,dataset_name,task_type,obs_shape,label_size,explain_size,show_image_indices
0,daily-delhi-climate,regression,[6],1,2,
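
The optimizer settings above list a `learning_rate` of 0.001, a `decay_rate_100k` of 0.05, and an `exponential` scheduler. One reading, assumed here rather than taken from the library source, is that the rate is multiplied by 0.05 for every 100,000 optimizer steps. Under that assumption, and the further assumption that the log lines are printed at steps 101 and 201, the sketch below closely reproduces the first two "Unified LR" values in the training log (0.000996978... and 0.000993996...). The function name `exponential_lr` is hypothetical.

```python
def exponential_lr(step: int, base_lr: float = 1e-3, decay_rate_100k: float = 0.05) -> float:
    """Learning rate after `step` optimizer steps (assumed formula, not the library's code)."""
    return base_lr * decay_rate_100k ** (step / 100_000)


# Assumption: the first two log lines are printed at steps 101 and 201.
for step in (101, 201):
    print(step, exponential_lr(step))
```

With only 17 iterations per epoch, 400 epochs amount to a few thousand steps, so under this reading the learning rate would fall only modestly over the whole run.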








In [7]:
trainer_hub.train(trainset, testset)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: junhopark. Use `wandb login --relogin` to force relogin


wandb: Adding directory to artifact (.\..\saved\daily-delhi-climate\causal-learning)... Done. 0.0s



[5/400][15/17][Time 7.54]
Unified LR across all optimizers: 0.000996978883189373
CCNet:  Three Gpt
Inf: 0.0337	Gen: 2.0687	Rec: 2.0768	E: 0.0079	R: 0.0085	P: 20.5786

mse: 63.7157
mae: 7.1813
rmse: 7.9822
r2: -0.1358

mse: 65.7471
mae: 7.0582
rmse: 8.1085
r2: -0.4254
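
Each log entry reports `mse`, `mae`, `rmse`, and `r2`. As a reminder of how these relate, the sketch below computes them with NumPy using the standard definitions. It is not the trainer's own evaluation code, and the helper name `regression_metrics` and the toy values are assumptions. Two checks against the log are worth noting: `rmse` is the square root of `mse` (for example, the square root of 63.7157 is about 7.9822), and a negative `r2`, as in the first entry, means the predictions fit worse than always predicting the mean of the targets.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Standard regression metrics (hypothetical helper, not the trainer's code)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(err ** 2))                       # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


y_true = [10.0, 12.0, 14.0, 16.0]
mean_baseline = regression_metrics(y_true, [13.0] * 4)              # always predict the mean
reversed_pred = regression_metrics(y_true, [16.0, 14.0, 12.0, 10.0])  # worse than the mean
print(mean_baseline)
print(reversed_pred)
```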




[11/400][13/17][Time 7.26]
Unified LR across all optimizers: 0.0009939966705585644
CCNet:  Three Gpt
Inf: 0.3669	Gen: 1.0359	Rec: 1.1754	E: 0.1849	R: 1.2708	P: 4.5720

mse: 5.6623
mae: 1.8017
rmse: 2.3796
r2: 0.9061

mse: 20.1957
mae: 3.0302
rmse: 4.4940
r2: 0.5433




[17/400][11/17][Time 7.37]
Unified LR across all optimizers: 0.0009910233784699313
CCNet:  Three Gpt
Inf: 0.2480	Gen: 0.6975	Rec: 0.6958	E: 0.1708	R: 0.1549	P: 1.6267

mse: 4.6091
mae: 1.7670
rmse: 2.1469
r2: 0.9159

mse: 9.4194
mae: 2.2941
rmse: 3.0691
r2: 0.8026




[23/400][9/17][Time 7.20]
Unified LR across all optimizers: 0.0009880589802399075
CCNet:  Three Gpt
Inf: 0.2593	Gen: 0.6890	Rec: 0.6425	E: 0.2235	R: 0.1100	P: 1.3888

mse: 2.4500
mae: 1.2333
rmse: 1.5652
r2: 0.9583

mse: 6.8945
mae: 1.9404
rmse: 2.6257
r2: 0.8515




[29/400][7/17][Time 7.21]
Unified LR across all optimizers: 0.000985103449264739
CCNet:  Three Gpt
Inf: 0.2520	Gen: 0.6554	Rec: 0.5794	E: 0.2482	R: 0.0772	P: 1.1676

mse: 2.3561
mae: 1.2062
rmse: 1.5349
r2: 0.9509

mse: 7.0229
mae: 1.6128
rmse: 2.6501
r2: 0.8590




[35/400][5/17][Time 7.25]
Unified LR across all optimizers: 0.0009821567590202554
CCNet:  Three Gpt
Inf: 0.2590	Gen: 0.6438	Rec: 0.5547	E: 0.2692	R: 0.0721	P: 1.0680

mse: 3.1679
mae: 1.3530
rmse: 1.7799
r2: 0.9332

mse: 3.1606
mae: 1.3925
rmse: 1.7778
r2: 0.9230




[41/400][3/17][Time 7.59]
Unified LR across all optimizers: 0.000979218883061626
CCNet:  Three Gpt
Inf: 0.2510	Gen: 0.6229	Rec: 0.5345	E: 0.2548	R: 0.0636	P: 0.9841

mse: 2.9456
mae: 1.2827
rmse: 1.7163
r2: 0.9486

mse: 2.7749
mae: 1.2598
rmse: 1.6658
r2: 0.9362




[47/400][1/17][Time 7.28]
Unified LR across all optimizers: 0.0009762897950231208
CCNet:  Three Gpt
Inf: 0.2570	Gen: 0.6111	Rec: 0.5102	E: 0.2897	R: 0.0553	P: 0.9046

mse: 3.0573
mae: 1.3919
rmse: 1.7485
r2: 0.9408

mse: 3.5034
mae: 1.4788
rmse: 1.8717
r2: 0.9296




[52/400][16/17][Time 7.26]
Unified LR across all optimizers: 0.0009733694686178784
CCNet:  Three Gpt
Inf: 0.2705	Gen: 0.6224	Rec: 0.5110	E: 0.3202	R: 0.0563	P: 0.9032

mse: 3.8027
mae: 1.5176
rmse: 1.9500
r2: 0.9241

mse: 3.4577
mae: 1.4266
rmse: 1.8595
r2: 0.9274




[58/400][14/17][Time 6.88]
Unified LR across all optimizers: 0.0009704578776376673
CCNet:  Three Gpt
Inf: 0.2717	Gen: 0.6140	Rec: 0.5016	E: 0.3175	R: 0.0558	P: 0.8672

mse: 2.8419
mae: 1.2508
rmse: 1.6858
r2: 0.9503

mse: 3.3043
mae: 1.4053
rmse: 1.8178
r2: 0.9307




[64/400][12/17][Time 6.98]
Unified LR across all optimizers: 0.0009675549959526509
CCNet:  Three Gpt
Inf: 0.2630	Gen: 0.6026	Rec: 0.4935	E: 0.3084	R: 0.0517	P: 0.8411

mse: 2.8267
mae: 1.2307
rmse: 1.6813
r2: 0.9484

mse: 2.6673
mae: 1.2417
rmse: 1.6332
r2: 0.9389




[70/400][10/17][Time 6.92]
Unified LR across all optimizers: 0.0009646607975111544
CCNet:  Three Gpt
Inf: 0.2829	Gen: 0.6111	Rec: 0.4826	E: 0.3559	R: 0.0505	P: 0.8003

mse: 4.5573
mae: 1.6167
rmse: 2.1348
r2: 0.9086

mse: 3.6081
mae: 1.4417
rmse: 1.8995
r2: 0.9261




[76/400][8/17][Time 7.32]
Unified LR across all optimizers: 0.0009617752563394297
CCNet:  Three Gpt
Inf: 0.2854	Gen: 0.6067	Rec: 0.4788	E: 0.3652	R: 0.0535	P: 0.7793

mse: 2.7199
mae: 1.2842
rmse: 1.6492
r2: 0.9497

mse: 3.8355
mae: 1.4568
rmse: 1.9584
r2: 0.9134




[82/400][6/17][Time 7.08]
Unified LR across all optimizers: 0.0009588983465414223
CCNet:  Three Gpt
Inf: 0.2808	Gen: 0.6020	Rec: 0.4654	E: 0.3643	R: 0.0425	P: 0.7653

mse: 2.6511
mae: 1.1915
rmse: 1.6282
r2: 0.9506

mse: 2.4197
mae: 1.1837
rmse: 1.5555
r2: 0.9449


