<img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>

# <!-- TITLE --> [SYNOP2] - Try a prediction
<!-- DESC --> Episode 1 : Data analysis and creation of a usable dataset
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->

## Objectives :
 - Understand the data
 - Clean up and build a usable dataset


SYNOP meteorological data, available at: https://public.opendatasoft.com

## What we're going to do :

 - Read the data
 - Clean up and build a usable dataset

## Step 1 - Import and init
### 1.1 - Python

In [65]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
import h5py, json
import os,time,sys

from importlib import reload

sys.path.append('..')
import fidle.pwk as ooo

ooo.init()

def np_print(*args):
    with np.printoptions(formatter={'float':'{:8.2f}'.format}, linewidth=np.inf):
        for a in args:
            print(a)    


FIDLE 2020 - Practical Work Module
Version              : 0.4.3
Run time             : Saturday 29 February 2020, 15:36:57
TensorFlow version   : 2.0.0
Keras version        : 2.2.4-tf


### 1.2 - Where are we ? 

In [13]:
place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv("SCRATCH_DIR","")}/PROJECTS/pr-fidle/datasets/SYNOP',
                                       'IDRIS'  : f'{os.getenv("WORK","")}/datasets/SYNOP',
                                       'HOME'   : f'{os.getenv("HOME","")}/datasets/SYNOP'} )

Well, we should be at HOME !
We are going to use: /home/pjluc/datasets/SYNOP
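
The cell above maps location names to candidate dataset directories and lets `ooo.good_place` choose one. Its implementation is not shown here, so the following is only a minimal sketch of one plausible behaviour, assuming the helper returns the first candidate whose directory exists. The function `pick_existing_dir` and the `CLUSTER` label are hypothetical and not part of Fidle.

```python
import os
import tempfile

def pick_existing_dir(candidates):
    """Return (name, path) for the first candidate directory that exists.

    Hypothetical stand-in for a location helper; returns (None, None)
    when no candidate directory is found.
    """
    for name, path in candidates.items():
        if os.path.isdir(path):
            return name, path
    return None, None

# Demo: a temporary directory stands in for a real dataset location
with tempfile.TemporaryDirectory() as tmp:
    candidates = {
        'CLUSTER': os.path.join(tmp, 'missing', 'SYNOP'),  # does not exist
        'HOME':    tmp,                                    # exists
    }
    place, dataset_dir = pick_existing_dir(candidates)
    print(place)
```

Dictionaries preserve insertion order, so in this sketch the order of the entries decides which location wins when several directories exist.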


## Step 2 - Read and prepare dataset
### 2.1 - Read it

In [74]:
dataset_filename = 'synop-LYS.csv'
schema_filename  = 'synop.json'
train_len        = 25000

df = pd.read_csv(f'{dataset_dir}/{dataset_filename}', header=0, sep=';')
display(df.head(15))

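# Training set: the first train_len rows, 12 input features and 5 target columns
x_train = df.loc[ :train_len-1, ['tend', 'cod_tend', 'dd', 'ff', 'td', 'u', 'ww', 'pres', 'rafper', 'rr1', 'rr3', 'tc'] ].to_numpy()
y_train = df.loc[ :train_len-1, ['pmer','td','u','rr3','tc'] ].to_numpy()

# Test set: the remaining rows (x_test keeps all 14 columns, date and pmer included)
x_test  = df.loc[train_len:].to_numpy()
y_test  = df.loc[train_len:, ['pmer','td','u','rr3','tc'] ].to_numpy()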

print('Dataset : ',df.shape)
print('x_train : ',x_train.shape)
print('y_train : ',y_train.shape)
print('x_test  : ',x_test.shape)
print('y_test  : ',y_test.shape)

|   | date | pmer | tend | cod_tend | dd | ff | td | u | ww | pres | rafper | rr1 | rr3 | tc |
|---|------|------|------|----------|----|----|----|---|----|------|--------|-----|-----|----|
| 0 | 2010-01-01T01:00:00+01:00 | 99080.0 | -120.0 | 6.0 | 0.0 | 0.0 | 278.75 | 88.0 | 60.0 | 96250.0 | 4.1 | 0.0 | 0.0 | 7.5 |
| 1 | 2010-01-01T04:00:00+01:00 | 98940.0 | -150.0 | 6.0 | 60.0 | 1.0 | 278.65 | 93.0 | 61.0 | 96100.0 | 2.6 | 0.2 | 0.6 | 6.6 |
| 2 | 2010-01-01T07:00:00+01:00 | 98950.0 | 10.0 | 3.0 | 280.0 | 2.1 | 278.85 | 95.0 | 58.0 | 96110.0 | 2.6 | 0.0 | 0.4 | 6.4 |
| 3 | 2010-01-01T10:00:00+01:00 | 99180.0 | 230.0 | 3.0 | 310.0 | 2.6 | 279.15 | 96.0 | 50.0 | 96340.0 | 5.7 | 0.0 | 3.0 | 6.6 |
| 4 | 2010-01-01T13:00:00+01:00 | 99480.0 | 280.0 | 1.0 | 330.0 | 4.6 | 278.15 | 94.0 | 21.0 | 96620.0 | 8.7 | 0.4 | 0.8 | 5.9 |
| 5 | 2010-01-01T16:00:00+01:00 | 99980.0 | 480.0 | 3.0 | 350.0 | 5.1 | 276.95 | 91.0 | 60.0 | 97100.0 | 8.2 | 0.2 | 0.4 | 5.2 |
| 6 | 2010-01-01T19:00:00+01:00 | 100550.0 | 530.0 | 2.0 | 350.0 | 3.1 | 274.05 | 83.0 | 21.0 | 97630.0 | 7.2 | 0.0 | 0.0 | 3.5 |
| 7 | 2010-01-01T22:00:00+01:00 | 101030.0 | 450.0 | 2.0 | 340.0 | 6.2 | 272.15 | 81.0 | 2.0 | 98080.0 | 9.3 | 0.0 | 0.0 | 1.9 |
| 8 | 2010-01-02T01:00:00+01:00 | 101330.0 | 280.0 | 1.0 | 320.0 | 6.2 | 270.15 | 74.0 | 2.0 | 98360.0 | 10.3 | 0.0 | 0.0 | 1.1 |
| 9 | 2010-01-02T04:00:00+01:00 | 101560.0 | 220.0 | 1.0 | 290.0 | 2.6 | 269.65 | 72.0 | 2.0 | 98580.0 | 5.1 | 0.0 | 0.0 | 1.0 |


Dataset :  (29165, 14)
x_train :  (25000, 12)
y_train :  (25000, 5)
x_test  :  (4165, 14)
y_test  :  (4165, 5)
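
The split above is chronological: the first `train_len` rows are used for training and the remaining rows for testing. As a minimal sketch on a toy frame, the example below shows why the training slice ends at `train_len-1`: `.loc` slices by label and includes the end label. The column names `f1`, `f2`, `target` and the `toy_` variables are hypothetical, and reusing the same feature list for the test rows is an assumption of this sketch.

```python
import pandas as pd

# Toy frame: 6 time steps, 2 features and 1 target (hypothetical column names)
toy = pd.DataFrame({
    'f1':     [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    'f2':     [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    'target': [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
})
toy_train_len = 4

# .loc slices by label and includes the end label, hence the "- 1"
toy_x_train = toy.loc[:toy_train_len - 1, ['f1', 'f2']].to_numpy()
toy_y_train = toy.loc[:toy_train_len - 1, ['target']].to_numpy()

# Everything from label toy_train_len onwards goes to the test set
toy_x_test = toy.loc[toy_train_len:, ['f1', 'f2']].to_numpy()
toy_y_test = toy.loc[toy_train_len:, ['target']].to_numpy()

print(toy_x_train.shape, toy_y_train.shape)
print(toy_x_test.shape, toy_y_test.shape)
```

With a default integer index, the two slices never overlap and together cover every row.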


### 2.2 - Prepare data generator

In [76]:
sequence_len = 10
batch_size   = 32
n_features   = x_train.shape[1]

train_generator = TimeseriesGenerator(x_train, y_train, length=sequence_len, batch_size=batch_size)

# ---- About

print('Number of available batches : ', len(train_generator))
x,y=train_generator[0]
print('batch x shape : ',x.shape)
print('batch y shape : ',y.shape)

Number of available batches :  781
batch x shape :  (32, 10, 12)
batch y shape :  (32, 5)
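
To make the batch shapes above more concrete, here is a minimal pure-Python sketch of the windowing that `TimeseriesGenerator` performs with its default settings (stride 1, sampling rate 1, no shuffling): each sample is the `length` rows preceding index `i`, and its target is row `i` of the targets. The helper names `make_windows` and `n_batches` are hypothetical and only illustrate the idea.

```python
import math

def make_windows(data, targets, length):
    """Pair each window data[i-length:i] with targets[i]."""
    samples = []
    for i in range(length, len(data)):
        samples.append((data[i - length:i], targets[i]))
    return samples

def n_batches(n_rows, length, batch_size):
    """Number of batches: one sample per row from index `length` onwards."""
    return math.ceil((n_rows - length) / batch_size)

# Toy series 0..14: the first window is 0..9 and its target is 10
series = list(range(15))
windows = make_windows(series, series, length=10)
first_x, first_y = windows[0]
print(first_x, first_y)

# Same sizes as in the cell above: 25000 rows, windows of 10, batches of 32
print(n_batches(25000, 10, 32))
```

With 25000 rows and windows of 10, there are 24990 samples, which gives 781 batches of at most 32 samples, in agreement with the output above.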


## Step 3 - Create a model

In [83]:
# One LSTM layer reads each (sequence_len, n_features) window,
# then a Dense layer outputs the 5 predicted values.
model = keras.models.Sequential()
model.add(keras.layers.LSTM(100, activation='relu', input_shape=(sequence_len, n_features)))
model.add(keras.layers.Dense(5))

model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_5 (LSTM)                (None, 100)               45200     
_________________________________________________________________
dense_5 (Dense)              (None, 5)                 505       
Total params: 45,705
Trainable params: 45,705
Non-trainable params: 0
_________________________________________________________________
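
The parameter counts in the summary can be checked by hand. As a short sketch, assuming the standard LSTM layout (4 gates, each with an input kernel, a recurrent kernel and a bias) and a fully connected output layer with bias, the helpers below reproduce the figures. The function names are hypothetical.

```python
def lstm_params(units, n_features):
    """4 gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * n_features + units * units + units)

def dense_params(n_inputs, n_outputs):
    """One weight per input/output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

lstm_count = lstm_params(units=100, n_features=12)
dense_count = dense_params(n_inputs=100, n_outputs=5)
print(lstm_count, dense_count, lstm_count + dense_count)
```

For 100 units and 12 input features this gives 45200 parameters for the LSTM layer and 505 for the Dense layer, 45705 in total.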


## Step 4 - Compile and run

In [84]:
model.compile(optimizer='adam', loss='mse')

In [85]:
model.fit_generator(train_generator, epochs=10, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fcadb275710>

---
<img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>