Authors : Jinsu Kim, JunHo Park

ⓒ 2022 CCNets, Inc. All Rights Reserved.

![](https://storage.googleapis.com/kaggle-datasets-images/312121/636393/a5097396fc07cf882d3e0d631b100a36/dataset-cover.jpg?t=2019-08-23-15-00-53)

***

<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 1. Content</i></h1>

<br>

> <h4 style = 'font-family: Times New Roman'>
This dataset is intended for developers who want to train weather-forecasting models for the Indian climate.<br><br> It provides daily data from 1st January 2013 to 24th April 2017 for the city of Delhi, India. <br><br>The 4 parameters are
meantemp, humidity, wind_speed, and meanpressure.
</h4>
    
  
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 2. About Dataset</i></h1>

<br> 
    
The dataset has 5 columns: <code>date</code> plus the four features below. <u>Description of features</u>:

*  (1) <b>meantemp</b>: Mean temperature, averaged over multiple 3-hour intervals in a day.
    
*  (2) <b>humidity</b>: Humidity value for the day (units are grams of water vapor per cubic meter volume of air).
    
    
*  (3) <b>wind_speed</b>: Wind speed measured in kmph.
    
    
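*  (4) <b>meanpressure</b>: Atmospheric pressure reading (the dataset description states atm, although the values of roughly 1015 to 1019 in the sample rows are consistent with hPa/millibar).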
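
As a small illustration of how a daily `meantemp` value could arise from 3-hour readings, the sketch below averages eight made-up readings for one day. The readings and the helper name `daily_mean` are hypothetical; the dataset itself only ships the already-averaged daily values.

```python
from statistics import mean

# Hypothetical 3-hourly temperature readings (deg C) for a single day,
# taken at 00:00, 03:00, ..., 21:00. The numbers are invented for illustration.
readings_3h = [7.0, 6.5, 6.0, 9.0, 13.5, 14.0, 12.0, 12.0]


def daily_mean(readings):
    """Average the intra-day readings into a single daily value."""
    if not readings:
        raise ValueError("need at least one reading")
    return mean(readings)


meantemp = daily_mean(readings_3h)
print(meantemp)
```

The same reduction applies to any number of readings per day, so days with missing intervals would simply average over fewer values.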
    
    
<h1 style = 'font-family: Times New Roman'> <b>|</b><i> 3. Goal of the Notebook</i></h1>
    
> <h4 style = 'font-family: Times New Roman'>
The goal is to train and test a model using GPT and PyTorch. <br><br>
    The target column is <b>meantemp</b>.
</h4>

 

Dataset source: https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data

***

<a id="1"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #42c2f5'>1.</b> Import Necessary Libraries </b></h1>

In [1]:
import os
import sys
path_append = "../"
sys.path.append(path_append)  # Add the parent directory to the module search path so project modules (tools, trainer_hub) can be imported.

In [2]:
import pandas as pd
train_df = pd.read_csv(path_append + '../data/Daily Climate/DailyDelhiClimateTrain.csv')
test_df = pd.read_csv(path_append + '../data/Daily Climate/DailyDelhiClimateTest.csv')
train_df.head()

index,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


<a id="2"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #4290f5'>2.</b> Modeling: Preprocess </b></h1>

In [3]:
print('Train set \n\n')
train_df.head()

Train set 




index,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


In [4]:
from tools.preprocessing.data_frame import auto_preprocess_dataframe

target_columns = ['meantemp']
# Preprocess train and test together so both share the same scaling and encoding
df = pd.concat([train_df, test_df], axis=0)
len_train = len(train_df)
df, description = auto_preprocess_dataframe(df, target_columns)

# Recover the original train/test partition by row position
train_df = df[:len_train]
test_df = df[len_train:]

description

feature,Min,Max,Mean,Std,Null Count
humidity,-0.555678,2.308781,1.0,0.561831,0
wind_speed,-1.144013,6.4461,0.096304,0.810917,0
meanpressure,-75.954763,500.583698,0.115394,13.151422,0
day_of_year_sin,-0.999991,0.999991,0.051243,0.709811,0
day_of_year_cos,-0.999963,1.0,0.034899,0.702109,0
meantemp,6.0,38.714286,25.221918,7.345014,0


{'num_features': 5,
 'num_classes': 1,
 'encoded_columns': Index(['cnets_processed_day_of_year_sin', 'cnets_processed_day_of_year_cos'], dtype='object'),
 'scalers': {'humidity': 'minmax',
  'meanpressure': 'robust',
  'wind_speed': 'robust'}}
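
The summary above lists sine/cosine day-of-year columns and `'minmax'` / `'robust'` scalers. Below is a minimal standalone sketch of those three ideas. It is not the code inside `auto_preprocess_dataframe`: the function names and the 365.25-day period are assumptions made for illustration, and the library's actual formulas may differ (the ranges in the summary table suggest its min-max variant is not a plain [0, 1] rescale).

```python
import math
from datetime import date
from statistics import median, quantiles


def day_of_year_sin_cos(d, period=365.25):
    """Map a date onto the unit circle so 31 Dec and 1 Jan end up close together."""
    angle = 2 * math.pi * d.timetuple().tm_yday / period
    return math.sin(angle), math.cos(angle)


def minmax_scale(values):
    """Textbook min-max scaling: rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Textbook robust scaling: centre on the median, divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


# First five rows of the raw training data shown earlier in the notebook.
humidity = [84.5, 92.0, 87.0, 71.333333, 86.833333]
wind_speed = [0.0, 2.98, 4.633333, 1.233333, 3.7]

scaled_humidity = minmax_scale(humidity)
scaled_wind = robust_scale(wind_speed)
jan_1 = day_of_year_sin_cos(date(2013, 1, 1))
dec_31 = day_of_year_sin_cos(date(2013, 12, 31))

print(scaled_humidity)
print(scaled_wind)
print(jan_1, dec_31)
```

The point of the cyclical encoding is that the two year-end dates land next to each other on the circle, whereas the raw day numbers 1 and 365 would look maximally far apart to a model.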

In [5]:
import torch
from sklearn.model_selection import train_test_split
from tools.preprocessing.template_dataset import TemplateDataset

min_seq_len = 8
max_seq_len = 16
# Chronological 70/30 split of the combined frame (shuffle=False keeps time order);
# this replaces the train_df/test_df created in the previous cell
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42, shuffle=False)
# Prepare the training data
train_df_x = train_df.iloc[:-1, :]  # Inputs: every row but the last, all columns (present day's data, including meantemp)
train_df_y = train_df.iloc[1:, -1:]  # Targets: every row from the second onward, last column only (next day's meantemp)

# Prepare the testing data
test_df_x = test_df.iloc[:-1, :]  # Inputs: every row but the last, all columns (present day's data, including meantemp)
test_df_y = test_df.iloc[1:, -1:]  # Targets: every row from the second onward, last column only (next day's meantemp)

print('train df shape: ', train_df.shape)
print('test df shape: ', test_df.shape)
trainset = TemplateDataset(train_df_x, train_df_y, min_seq_len, max_seq_len)
testset = TemplateDataset(test_df_x, test_df_y, min_seq_len, max_seq_len)

train df shape:  (1103, 6)
test df shape:  (473, 6)
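
The cell above does two things that are easy to misread: an unshuffled 70/30 split and a one-day shift between inputs and targets. The sketch below reproduces both on plain Python lists. The helper names are hypothetical, the rows are synthetic, and the `ceil` rounding is an assumption chosen because it reproduces the 1103/473 row counts printed above for 1576 rows.

```python
import math


def chronological_split(rows, test_size):
    """Split without shuffling: the earliest rows train, the latest rows test."""
    # Assumption: round the test share up, which matches the printed shapes.
    n_test = math.ceil(test_size * len(rows))
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


def next_day_pairs(rows, target_index=-1):
    """Pair each day's full feature row with the next day's target value."""
    x = rows[:-1]                                # day t, all columns
    y = [row[target_index] for row in rows[1:]]  # day t+1, target column only
    return x, y


# Synthetic stand-in for the preprocessed frame: [some_feature, meantemp] per day.
rows = [[float(day), 10.0 + day] for day in range(1576)]

train_rows, test_rows = chronological_split(rows, test_size=0.3)
train_x, train_y = next_day_pairs(train_rows)

print(len(train_rows), len(test_rows))
print(train_x[0], train_y[0])
```

Because the shift is applied after the split, each partition loses one usable pair, and no training target comes from the test period.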


<a id="3"></a>
> <h1 style = 'font-family: Times New Roman'><b> <b style = 'color: #427bf5'>3.</b> Modeling</b></h1>

In [6]:
from tools.setting.data_config import DataConfig
from tools.setting.ml_params import MLParameters
from trainer_hub import TrainerHub

num_features = description["num_features"]
num_classes = description["num_classes"]

data_config = DataConfig(dataset_name = 'daily-delhi-climate', task_type='regression', obs_shape=[num_features + num_classes], label_size=num_classes)

# Set the training configuration with MLParameters (GPT as the core network, no encoder network)
ml_params = MLParameters(ccnet_network = 'gpt', encoder_network = 'none')

ml_params.num_epoch = 1000

# Set the device to GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 

# Initialize the TrainerHub class with the training configuration, data configuration, device, and use_print and use_wandb flags
trainer_hub = TrainerHub(ml_params, data_config, device, use_print=True, use_wandb=False) 

In [7]:
trainer_hub.train(trainset, testset)

[6/1000][4/16][Time 8.42]
Unified LR across all optimizers: 0.0001993957766378747
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0209	Gen: 2.9045	Rec: 2.9028	E: 0.0223	R: 0.0171	P: 39.2928
--------------------Test Metrics------------------------
mse: 368.0903
mae: 17.9389
r2: -7.0621
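
For reading the test metrics in this log, the sketch below computes MSE, MAE and R² from their standard definitions. It is a reference implementation in plain Python, not the evaluation code used by `TrainerHub`, and the sample values are invented. It shows why R² can be strongly negative early in training: the score drops below zero whenever predictions are worse than always predicting the mean of the targets.

```python
def regression_metrics(y_true, y_pred):
    """Return (mse, mae, r2) using the standard textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2


# Invented temperatures (deg C) for illustration only.
y_true = [20.0, 25.0, 30.0]
close_predictions = [21.0, 25.0, 29.0]
far_predictions = [5.0, 5.0, 5.0]

good = regression_metrics(y_true, close_predictions)
bad = regression_metrics(y_true, far_predictions)
print(good)
print(bad)
```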



[12/1000][8/16][Time 7.82]
Unified LR across all optimizers: 0.00019879933411171295
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0002	Gen: 2.1260	Rec: 2.1260	E: 0.0000	R: 0.0000	P: 22.7940
--------------------Test Metrics------------------------
mse: 377.6312
mae: 18.1813
r2: -6.7423



[18/1000][12/16][Time 7.73]
Unified LR across all optimizers: 0.00019820467569398644
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0006	Gen: 1.7741	Rec: 1.7742	E: 0.0000	R: 0.0000	P: 16.0498
--------------------Test Metrics------------------------
mse: 351.4143
mae: 17.1978
r2: -6.1870



[25/1000][0/16][Time 7.78]
Unified LR across all optimizers: 0.00019761179604798148
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0007	Gen: 1.7314	Rec: 1.7313	E: 0.0016	R: 0.0002	P: 13.8822
--------------------Test Metrics------------------------
mse: 336.7461
mae: 17.3567
r2: -8.5801



[31/1000][4/16][Time 7.81]
Unified LR across all optimizers: 0.0001970206898529479
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0007	Gen: 1.6221	Rec: 1.6221	E: 0.0000	R: 0.0000	P: 11.9782
--------------------Test Metrics------------------------
mse: 307.0878
mae: 16.0770
r2: -5.4333



[37/1000][8/16][Time 7.61]
Unified LR across all optimizers: 0.00019643135180405117
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0011	Gen: 1.5397	Rec: 1.5399	E: 0.0000	R: 0.0000	P: 11.1031
--------------------Test Metrics------------------------
mse: 307.5473
mae: 15.9135
r2: -4.6548



[43/1000][12/16][Time 7.67]
Unified LR across all optimizers: 0.00019584377661232514
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0013	Gen: 1.5276	Rec: 1.5280	E: 0.0000	R: 0.0000	P: 10.9440
--------------------Test Metrics------------------------
mse: 255.2017
mae: 14.3993
r2: -4.3469



[50/1000][0/16][Time 7.86]
Unified LR across all optimizers: 0.00019525795900462422
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0015	Gen: 1.5294	Rec: 1.5299	E: 0.0000	R: 0.0000	P: 11.0687
--------------------Test Metrics------------------------
mse: 251.3260
mae: 14.1918
r2: -4.0221



[56/1000][4/16][Time 7.60]
Unified LR across all optimizers: 0.00019467389372357586
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0018	Gen: 1.5213	Rec: 1.5219	E: 0.0000	R: 0.0000	P: 11.0796
--------------------Test Metrics------------------------
mse: 210.8228
mae: 12.7970
r2: -3.3734



[62/1000][8/16][Time 7.63]
Unified LR across all optimizers: 0.00019409157552753375
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0026	Gen: 1.5275	Rec: 1.5285	E: 0.0000	R: 0.0000	P: 11.0896
--------------------Test Metrics------------------------
mse: 191.0482
mae: 12.1885
r2: -3.2469



[68/1000][12/16][Time 8.07]
Unified LR across all optimizers: 0.00019351099919053054
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0043	Gen: 1.5392	Rec: 1.5410	E: 0.0000	R: 0.0001	P: 11.2138
--------------------Test Metrics------------------------
mse: 149.6744
mae: 10.3904
r2: -2.2521



[75/1000][0/16][Time 8.20]
Unified LR across all optimizers: 0.00019293215950223126
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0102	Gen: 1.5160	Rec: 1.5220	E: 0.0000	R: 0.0005	P: 10.8477
--------------------Test Metrics------------------------
mse: 86.8401
mae: 7.9695
r2: -0.7410



[81/1000][4/16][Time 8.09]
Unified LR across all optimizers: 0.00019235505126788632
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.1716	Gen: 1.4920	Rec: 1.5140	E: 0.2179	R: 0.1681	P: 10.6362
--------------------Test Metrics------------------------
mse: 38.0393
mae: 4.7678
r2: 0.1367



[87/1000][8/16][Time 8.08]
Unified LR across all optimizers: 0.0001917796693082847
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.0648	Gen: 1.5278	Rec: 1.5564	E: 0.0097	R: 0.4787	P: 10.8340
--------------------Test Metrics------------------------
mse: 68.2063
mae: 6.4076
r2: -0.4274



[93/1000][12/16][Time 8.18]
Unified LR across all optimizers: 0.00019120600845970806
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.1642	Gen: 1.3153	Rec: 1.3407	E: 0.1578	R: 0.1854	P: 7.8011
--------------------Test Metrics------------------------
mse: 47.4802
mae: 5.1092
r2: -0.0076



[100/1000][0/16][Time 8.22]
Unified LR across all optimizers: 0.0001906340635738838
--------------------Training Metrics--------------------
CCNet:  Three Gpt
Inf: 0.2636	Gen: 1.0066	Rec: 1.0748	E: 0.2456	R: 0.6145	P: 4.3615
--------------------Test Metrics------------------------
mse: 31.6750
mae: 3.6705
r2: 0.3671
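
The "Unified LR" values in the log shrink slowly from one report to the next. The sketch below takes the first four logged values, which are 100 iterations apart if each epoch has 16 iterations (6 × 16 + 4 = 100, 12 × 16 + 8 = 200, and so on), and checks whether the ratio between consecutive values is constant. A constant ratio is what an exponential decay schedule would produce; which scheduler the trainer actually uses is not shown in this notebook, so treat this as an observation about the log rather than a description of the implementation.

```python
# Learning rates copied from the first four "Unified LR" lines of the log.
logged_lr = [
    0.0001993957766378747,
    0.00019879933411171295,
    0.00019820467569398644,
    0.00019761179604798148,
]

# Ratio between consecutive reports (assumed to be 100 iterations apart).
ratios = [later / earlier for earlier, later in zip(logged_lr, logged_lr[1:])]

# Implied per-iteration decay factor if the decay is exponential.
per_step = [r ** (1 / 100) for r in ratios]

spread = max(ratios) - min(ratios)
print(ratios)
print(per_step)
print(spread)
```

On these values the ratios agree to many decimal places at roughly 0.997 per 100 iterations, so the learning rate falls by about 0.3% between reports.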


