
## Time series weather forecasting

Importing the dependencies

In [None]:
import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout


The weather dataset
We are using the weather time series dataset recorded by the Max Planck Institute for Biogeochemistry.

This dataset contains 14 different features such as air temperature, atmospheric pressure, and humidity. Observations were recorded every 10 minutes, beginning in 2003. This notebook uses only the portion collected between 2009 and 2016, which was prepared by François Chollet for his book Deep Learning with Python.

In [None]:

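# Download the zipped CSV into the local Keras cache and extract it.
zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
# Drop the ".zip" suffix to get the path of the extracted CSV file.
csv_path, _ = os.path.splitext(zip_path)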

In [None]:
csv_path

'/root/.keras/datasets/jena_climate_2009_2016.csv'

In [None]:
df = pd.read_csv(csv_path)

In [None]:
df.head()

Unnamed: 0,Date Time,p (mbar),T (degC),Tpot (K),Tdew (degC),rh (%),VPmax (mbar),VPact (mbar),VPdef (mbar),sh (g/kg),H2OC (mmol/mol),rho (g/m**3),wv (m/s),max. wv (m/s),wd (deg)
0,01.01.2009 00:10:00,996.52,-8.02,265.4,-8.9,93.3,3.33,3.11,0.22,1.94,3.12,1307.75,1.03,1.75,152.3
1,01.01.2009 00:20:00,996.57,-8.41,265.01,-9.28,93.4,3.23,3.02,0.21,1.89,3.03,1309.8,0.72,1.5,136.1
2,01.01.2009 00:30:00,996.53,-8.51,264.91,-9.31,93.9,3.21,3.01,0.2,1.88,3.02,1310.24,0.19,0.63,171.6
3,01.01.2009 00:40:00,996.51,-8.31,265.12,-9.07,94.2,3.26,3.07,0.19,1.92,3.08,1309.19,0.34,0.5,198.0
4,01.01.2009 00:50:00,996.51,-8.27,265.15,-9.04,94.1,3.27,3.08,0.19,1.92,3.09,1309.0,0.32,0.63,214.3


In this notebook, the first 350,000 rows of the data are the training dataset, and the remaining rows are the held-out test dataset. This amounts to roughly 2,430 days' worth of training data.

In [None]:
TRAIN_SPLIT = 350000

Setting seed to ensure reproducibility.

In [None]:
tf.random.set_seed(13)

In [None]:
uni_data = df['T (degC)']
uni_data.index = df['Date Time']


In [None]:
uni_data_train = uni_data.values[:TRAIN_SPLIT]
uni_data_test = uni_data.values[TRAIN_SPLIT:]

In [None]:
print(uni_data_train[:5])

[-8.02 -8.41 -8.51 -8.31 -8.27]


In [None]:
uni_train_mean = uni_data_train.mean()
uni_train_std = uni_data_train.std()
uni_data_train = (uni_data_train-uni_train_mean)/uni_train_std

In [None]:
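# Note: the held-out data is standardized with its own mean and std here.
# Reusing uni_train_mean and uni_train_std is the more common choice, because
# it keeps test-set statistics out of the preprocessing.
uni_test_mean = uni_data_test.mean()
uni_test_std = uni_data_test.std()
uni_data_test = (uni_data_test-uni_test_mean)/uni_test_std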

In [None]:
uni_data_train[:5]

array([-2.02968527, -2.07521719, -2.08689204, -2.06354234, -2.0588724 ])
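The standardization step can also be written as a pair of small helpers. The sketch below is an illustration rather than the notebook's own code: the names `standardize` and `destandardize` are hypothetical, the `held_out` values are made-up toy numbers, and it assumes the common convention of applying the training mean and standard deviation to every split.

```python
import numpy as np

def standardize(values, mean, std):
    """Shift and scale values to zero mean and unit variance (given mean/std)."""
    return (values - mean) / std

def destandardize(values, mean, std):
    """Invert standardize(), returning values in the original units."""
    return values * std + mean

# First five training temperatures from the notebook output above.
train = np.array([-8.02, -8.41, -8.51, -8.31, -8.27])
# Hypothetical held-out temperatures, for illustration only.
held_out = np.array([-8.05, -7.62])

# Statistics come from the training data only (an assumption of this sketch).
train_mean, train_std = train.mean(), train.std()

train_scaled = standardize(train, train_mean, train_std)
held_out_scaled = standardize(held_out, train_mean, train_std)

# Round trip back to degrees Celsius.
restored = destandardize(held_out_scaled, train_mean, train_std)
```

Keeping the inverse transform around makes it possible to report predictions and errors in degrees Celsius instead of standardized units.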

As you can see in the data, an observation is recorded every 10 minutes. This means that, for a single hour, you will have 6 observations. Similarly, a single day will contain 144 (6x24) observations.

Given a specific time, let's say you want to predict the temperature 6 hours in the future. In order to make this prediction, you choose to use 5 days of observations. Thus, you would create a window containing the last 720 (5x144) observations to train the model. Many such configurations are possible, making this dataset a good one to experiment with.
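The window arithmetic above can be checked with a few lines of Python. This is a small sketch; the helper name `steps_for` and the constant `MINUTES_PER_OBSERVATION` are illustrative and not part of the notebook.

```python
MINUTES_PER_OBSERVATION = 10  # sampling interval of the dataset

def steps_for(hours=0, days=0):
    """Number of 10-minute observations that cover the given time span."""
    total_minutes = (days * 24 + hours) * 60
    return total_minutes // MINUTES_PER_OBSERVATION

per_hour = steps_for(hours=1)        # observations in one hour
per_day = steps_for(days=1)          # observations in one day
history_size = steps_for(days=5)     # five days of history
target_offset = steps_for(hours=6)   # six hours ahead, in steps

print(per_hour, per_day, history_size, target_offset)  # 6 144 720 36
```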

The loops below build these windows for the training and test data. Each input window holds the previous 720 observations (the history size), and its label is the single observation that immediately follows the window, which is 10 minutes ahead. The index advances by 720 on every iteration, so consecutive windows do not overlap.

In [None]:
x_train=[]
y_train=[]

i=720

while i< len(uni_data_train):
    x_train.append(uni_data_train[i-720:i])
    y_train.append(uni_data_train[i])
    i=i+720
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

In [None]:
x_test=[]
y_test=[]

i=720

while i< len(uni_data_test):
    x_test.append(uni_data_test[i-720:i])
    y_test.append(uni_data_test[i])
    i=i+720
x_test, y_test = np.array(x_test), np.array(y_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
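The two loops above share the same logic, so they could be folded into one helper. The following is a minimal sketch under stated assumptions: the name `make_windows` and its `step` parameter are hypothetical, and `target_size` is interpreted as the number of steps between the end of the window and the label. With `target_size=0` and `step` equal to `history_size`, it reproduces the behaviour of the loops above.

```python
import numpy as np

def make_windows(series, history_size, target_size=0, step=None):
    """Slice a 1-D series into (window, label) pairs.

    history_size: number of past observations in each window.
    target_size:  extra steps between the end of the window and the label
                  (0 means the label is the very next observation).
    step:         how far the window start advances each iteration;
                  defaults to history_size (non-overlapping windows).
    """
    series = np.asarray(series)
    step = history_size if step is None else step
    x, y = [], []
    i = history_size
    while i + target_size < len(series):
        x.append(series[i - history_size:i])
        y.append(series[i + target_size])
        i += step
    # Add a trailing feature axis: (samples, timesteps, 1), as the LSTM expects.
    x = np.array(x).reshape(-1, history_size, 1)
    return x, np.array(y)

# Toy series 0.0 .. 9.0 so the windows are easy to read.
series = np.arange(10.0)
x, y = make_windows(series, history_size=3)
x2, y2 = make_windows(series, history_size=3, target_size=2, step=1)
```

Passing `step=1` yields overlapping windows and therefore many more samples than the 486 produced above, at the cost of highly correlated training examples.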

In [None]:
print(x_train[:5,719])
print(y_train[:5])
print("\nshape x_train = ",x_train.shape)
print("shape y_train = ",y_train.shape)
print("\nshape x_test = ",x_test.shape)
print("shape y_test = ",y_test.shape)

[[-2.56789584]
 [-2.34840867]
 [-1.56385878]
 [-1.43893789]
 [-1.47629741]]
[-2.58540811 -2.34023627 -1.56502626 -1.45061274 -1.47629741]

shape x_train =  (486, 720, 1)
shape y_train =  (486,)

shape x_test =  (97, 720, 1)
shape y_test =  (97,)


Defining the model

In [None]:
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 720, 50)           10400     
                                                                 
 dropout (Dropout)           (None, 720, 50)           0         
                                                                 
 lstm_1 (LSTM)               (None, 720, 50)           20200     
                                                                 
 dropout_1 (Dropout)         (None, 720, 50)           0         
                                                                 
 lstm_2 (LSTM)               (None, 50)                20200     
                                                                 
 dropout_2 (Dropout)         (None, 50)                0         
                                                                 
 dense (Dense)               (None, 1)                 51
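The parameter counts in the summary follow from the layer sizes. As a quick check, the sketch below uses the standard formulas: an LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, and a dense layer has one weight per input-output pair plus one bias per output. The function names are illustrative.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer."""
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with bias."""
    return input_dim * units + units

counts = [
    lstm_params(input_dim=1, units=50),    # lstm: one feature per timestep
    lstm_params(input_dim=50, units=50),   # lstm_1: fed by 50 LSTM outputs
    lstm_params(input_dim=50, units=50),   # lstm_2
    dense_params(input_dim=50, units=1),   # dense
]
total = sum(counts)
print(counts, total)  # [10400, 20200, 20200, 51] 50851
```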

In [None]:
model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(x_train, y_train, epochs=50, batch_size=32)

Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x7ba775b7f430>

In [None]:
scores = model.evaluate(x_test, y_test)
print(f'Test loss: {scores}')

Test loss: 0.0035949500743299723
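The test loss is a mean squared error in standardized units, so it helps to have a reference point. One simple reference is a persistence baseline that predicts each label as the last value of its window. The sketch below is illustrative only: the function names are hypothetical, the demo arrays reuse the five training values printed earlier (not the test set), and the standard deviation passed to `rmse_in_original_units` is a placeholder.

```python
import numpy as np

def persistence_mse(x, y):
    """MSE when each label is predicted as the last value of its window."""
    last_values = x[:, -1, 0]
    return float(np.mean((last_values - y) ** 2))

def rmse_in_original_units(mse_scaled, std):
    """Convert an MSE on standardized data to an RMSE in original units."""
    return float(np.sqrt(mse_scaled) * std)

# Last value of the first five training windows, and their labels,
# copied from the printed output earlier in the notebook.
x_demo = np.zeros((5, 720, 1))
x_demo[:, -1, 0] = [-2.56789584, -2.34840867, -1.56385878,
                    -1.43893789, -1.47629741]
y_demo = np.array([-2.58540811, -2.34023627, -1.56502626,
                   -1.45061274, -1.47629741])

baseline = persistence_mse(x_demo, y_demo)
print(baseline)

# Placeholder std of 8.0 degC, chosen only to show the conversion.
example_rmse = rmse_in_original_units(0.25, std=8.0)
```

Running `persistence_mse(x_test, y_test)` on the real test windows would show whether the LSTM improves on simply repeating the most recent temperature.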
