# Weather AI

My own attempt at making an AI. This is the first iteration, where we check whether it is raining right now or not. Kinda useless, but useful for me to learn.

## Getting data

The training data has been collected from KNMI via https://www.daggegevens.knmi.nl/klimatologie/uurgegevens (thanks William for the link!). Some values were missing in the rainfall column, so the rows with missing values have been deleted.

In [None]:
import pandas as pd

In [None]:
raw_data = pd.read_csv('data/knmi.csv')

# Drop rows with missing values
raw_data.dropna(inplace=True)

# rainfall is read in as a float; now that the missing values are gone, convert it to an int
raw_data['rainfall'] = raw_data['rainfall'].astype(int)
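
Why `rainfall` shows up as a float: pandas cannot store NaN in an integer column, so a column of whole numbers with even one missing value is read in as `float64`. The toy frame below (made-up values, not KNMI data) sketches this, and also shows that `dropna()` removes whole *rows*, which is why the cast to int only works after the drop. If only the rainfall column should be checked, `dropna(subset=['rainfall'])` would do that.

```python
import numpy as np
import pandas as pd

# Made-up toy data (not KNMI data); NaN marks a missing value
toy = pd.DataFrame({
    "hour": [1, 2, 3, 4],
    "rainfall": [0, 1, np.nan, 0],
})

# A single NaN forces the otherwise-integer column to float64
print(toy["rainfall"].dtype)  # float64

# dropna() removes every *row* that has a NaN in any column
toy.dropna(inplace=True)

# With the NaN gone, casting to an integer type is safe
toy["rainfall"] = toy["rainfall"].astype(int)
print(len(toy), toy["rainfall"].tolist())  # 3 [0, 1, 0]
```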

### Looking at the data (can be skipped)

In [None]:
raw_data.head()

In [None]:
raw_data.describe().T

In [None]:
raw_data.info()

## Preparing data

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
features = raw_data.drop(['rainfall', 'station', 'precipitation_duration', 'precipitation_amount', 'date'], axis=1)
target = raw_data['rainfall']

features.head()

In [None]:
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.1, random_state=56)

x_train.shape, x_test.shape
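
The features go into the network as raw numbers. If the columns keep KNMI's raw units (an assumption; for example pressure in 0.1 hPa gives values around 10000, while the hour only runs up to 24), the large columns can dominate training. A common fix is to standardize each column with statistics from the training set only. Below is a minimal NumPy sketch on made-up numbers; `standardize` is a hypothetical helper. Inside Keras, `layers.Normalization` with `adapt()` does the same job as part of the model.

```python
import numpy as np

def standardize(train, test):
    """Scale each column to mean 0 / std 1 using training-set statistics only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (train - mean) / std, (test - mean) / std

# Made-up rows: hour, temperature (0.1 degC), pressure (0.1 hPa), humidity (%)
x_train_demo = np.array([
    [1, 85, 10132, 91],
    [13, 154, 10098, 67],
    [19, 120, 10110, 80],
], dtype=float)
x_test_demo = np.array([[7, 101, 10120, 88]], dtype=float)

train_scaled, test_scaled = standardize(x_train_demo, x_test_demo)
print(train_scaled.mean(axis=0).round(6))  # every column now has mean ~0
```

With the notebook's data this would be `standardize(x_train.to_numpy(dtype=float), x_test.to_numpy(dtype=float))`. The scaled values are floats, so the `Input` layer would then need a float dtype instead of `tf.int32`.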

## Creating the model

In [None]:
import tensorflow as tf

layers = tf.keras.layers

In [None]:
model = tf.keras.Sequential([
    layers.Input(shape=(4,), dtype=tf.int32),
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.summary()
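
`model.summary()` reports the number of trainable parameters. A `Dense` layer has one weight per input/output pair plus one bias per output, so the total can be checked by hand. A small pure-Python sketch (the helper name `dense_param_count` is made up for this example):

```python
def dense_param_count(n_inputs, layer_sizes):
    """Total weights + biases of a stack of Dense layers."""
    total = 0
    for units in layer_sizes:
        total += n_inputs * units + units  # weight matrix + bias vector
        n_inputs = units  # the next layer's input is this layer's output
    return total

# Base case: 4 inputs -> Dense(16) -> Dense(8) -> Dense(1)
print(dense_param_count(4, [16, 8, 1]))  # 225
```

That is 4·16+16 = 80, plus 16·8+8 = 136, plus 8·1+1 = 9. For the network of the second test further down, `dense_param_count(4, [16, 16, 8, 1])` gives 497.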

In [None]:
model.fit(
    x_train, y_train,
    epochs=50,
    verbose=1,
    validation_data=(x_test, y_test)
)

In [None]:
model.evaluate(x_test, y_test)

In [None]:
model.save("model.keras")
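
The last layer is a sigmoid, so the model outputs a probability between 0 and 1 rather than a yes/no answer. With `binary_crossentropy` and `metrics=['accuracy']`, Keras reports binary accuracy, which counts a prediction as "rain" when the probability is above 0.5. A pure-Python sketch of that computation on made-up probabilities (`binary_accuracy` here is a hypothetical helper); in the notebook the probabilities would come from `model.predict(x_test)`, or from a model reloaded with `tf.keras.models.load_model("model.keras")`.

```python
def binary_accuracy(y_true, probabilities, threshold=0.5):
    """Fraction of rows where (probability > threshold) matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(pred == y) for pred, y in zip(predictions, y_true))
    return correct / len(y_true)

# Made-up labels (1 = rain) and model outputs
y_true = [0, 0, 1, 1, 0]
probs = [0.10, 0.65, 0.80, 0.30, 0.05]
print(binary_accuracy(y_true, probs))  # 0.6
```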

## Test Results

### Base case
```
Training data: hour temperature pressure humidity
network:
  layers.Input(shape=(4,), dtype=tf.int32),
  layers.Dense(16, activation='relu'),
  layers.Dense(8, activation='relu'),
  layers.Dense(1, activation='sigmoid')
```

### First test
This was the base case, which ended with an accuracy of ~79% (0.7860).
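
An accuracy of ~79% only means something next to a baseline. If about 79% of the hours in the test set are dry, a "model" that always answers "no rain" scores the same. That would also fit with none of the changes in the tests below having any effect, but it is only a hypothesis, since the class balance is not shown here. A pure-Python sketch of the check on made-up labels (`majority_baseline` is a hypothetical helper; with the notebook's data, `y_test.value_counts(normalize=True)` gives the same information):

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Made-up labels: 1 = rain, 0 = no rain
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(majority_baseline(labels))  # (0, 0.8)
```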

### Second test
Another layer of 16 units was added, so the network now looks like:
```
layers.Input(shape=(4,), dtype=tf.int32),
layers.Dense(16, activation='relu'),
layers.Dense(16, activation='relu'),
layers.Dense(8, activation='relu'),
layers.Dense(1, activation='sigmoid')
```
This test ended with exactly the same result.

### Third test
Moving the 8-unit layer between the two 16-unit layers gave more of the same. Looking at the training progress also made me conclude that adding more epochs is not going to have an effect.
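
Instead of judging by eye when more epochs stop helping, Keras can stop training automatically with the `tf.keras.callbacks.EarlyStopping` callback (for example `monitor='val_accuracy', patience=5`), passed via `model.fit(..., callbacks=[...])`. The pure-Python sketch below mimics its patience rule on a made-up list of per-epoch validation accuracies; `stop_epoch` is a hypothetical helper, not part of Keras. The real numbers are in `model.fit(...).history['val_accuracy']`.

```python
def stop_epoch(val_scores, patience=3, min_delta=0.0):
    """Return the 0-based epoch after which training would stop,
    or None if the scores keep improving until the end."""
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best + min_delta:
            best = score  # new best score: reset the counter
            wait = 0
        else:
            wait += 1  # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies that plateau around 0.786
history = [0.71, 0.77, 0.786, 0.786, 0.785, 0.786, 0.786]
print(stop_epoch(history, patience=3))  # 5
```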

### Fourth test
Removing the time from the model once again had no effect. For fun I also added the date to see if it made a difference, and once again it did not. For now I've decided to keep the time in and leave the date out. At this point I also went back to the base case, and found that it now reached its maximum of ~79% from epoch 3 onward.
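
One possible reason the hour adds nothing (a guess, not tested here): as a plain number, hour 24 and hour 1 look far apart even though they are neighbours. A common trick is to encode the hour as a point on a circle using sine and cosine. A minimal stdlib sketch; `encode_hour` and `circle_distance` are hypothetical helpers, and hours are assumed to be numbered 1 to 24.

```python
import math

def encode_hour(hour, period=24):
    """Map an hour onto the unit circle so that hour 24 sits next to hour 1."""
    angle = 2 * math.pi * hour / period
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b):
    """Euclidean distance between two encoded hours."""
    return math.dist(encode_hour(a), encode_hour(b))

# Hours 24 and 1 are now exactly as close as hours 12 and 13
print(round(circle_distance(24, 1), 4), round(circle_distance(12, 13), 4))
```

In the notebook this would replace the single hour column with two columns, so the `Input` shape would grow from 4 to 5.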

### Fifth test
Trying out different loss functions.
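
For a single sigmoid output, `binary_crossentropy` is the usual choice. Per row it is $-[y\log p + (1-y)\log(1-p)]$, which punishes a confident wrong answer much harder than squared error does. A pure-Python sketch comparing the two on one made-up rainy hour (label 1) predicted with probability 0.01; the clipping with `eps=1e-7` mirrors the small epsilon Keras uses to avoid `log(0)`.

```python
import math

def binary_crossentropy(y, p, eps=1e-7):
    """Per-row binary cross-entropy; p is clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def squared_error(y, p):
    """Per-row squared error."""
    return (y - p) ** 2

# It rained (y = 1) but the model was 99% sure it would not (p = 0.01)
print(round(binary_crossentropy(1, 0.01), 3), round(squared_error(1, 0.01), 3))  # 4.605 0.98
```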

## Mass testing

In [None]:
# losses = [
#     'BinaryCrossentropy',
#     'CategoricalCrossentropy',
#     'SparseCategoricalCrossentropy',
#     'Poisson',
#     'KLDivergence',
#     'MeanSquaredError',
#     'MeanAbsoluteError',
#     'MeanAbsolutePercentageError',
#     'MeanSquaredLogarithmicError',
#     'CosineSimilarity',
#     'Huber',
#     'LogCosh',
#     'Hinge',
#     'SquaredHinge',
#     'CategoricalHinge'
# ]

# for thing in losses:
#     try:
#         model = tf.keras.Sequential([
#             layers.Input(shape=(4,), dtype=tf.int32),
#             layers.Dense(16, activation='relu'),
#             layers.Dense(8, activation='relu'),
#             layers.Dense(1, activation='sigmoid')
#         ])

#         model.compile(optimizer='adam', loss=thing, metrics=['accuracy'])
#         model.fit(
#             x_train, y_train,
#             epochs=50,
#             verbose=0,
#             validation_data=(x_test, y_test)
#         )

#         print(thing, "=", model.evaluate(x_test, y_test, verbose=0))
#     except Exception as error:
#         print(thing, "did not work:", error)
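
Some losses in the list cannot work with this network even when they run without an error. For example, Keras's `CategoricalCrossentropy` (with the default `from_logits=False`) first rescales the predictions so they sum to 1 across the output units. With a single output unit that rescaled value is always 1, so the loss is practically 0 whatever the model predicts and nothing gets learned. A pure-Python sketch of that normalize, clip, and log step (a simplified imitation, not Keras's own code):

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Simplified from_logits=False path: normalize, clip, then -sum(y * log p)."""
    total = sum(y_pred)
    probs = [min(max(p / total, eps), 1 - eps) for p in y_pred]
    return -sum(y * math.log(p) for y, p in zip(y_true, probs))

# One sigmoid output unit: the normalized prediction is always 1,
# so the loss is the same tiny number for a bad and a good prediction
for p in (0.01, 0.5, 0.99):
    print(p, categorical_crossentropy([1], [p]))
```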


In [None]:
# methods = [
#     'elu',
#     'exponential',
#     'gelu',
#     'hard_sigmoid',
#     'linear',
#     'mish',
#     'relu',
#     'selu',
#     'sigmoid',
#     'softmax',
#     'softplus',
#     'softsign',
#     'swish',
#     'tanh'
# ]

# for thing in methods:
#     try:
#         model = tf.keras.Sequential([
#             layers.Input(shape=(4,), dtype=tf.int32),
#             layers.Dense(16, activation=thing),
#             layers.Dense(8, activation=thing),
#             layers.Dense(1, activation='sigmoid')
#         ])

#         model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
#         model.fit(
#             x_train, y_train,
#             epochs=50,
#             verbose=0,
#             validation_data=(x_test, y_test)
#         )

#         print(thing, "=", model.evaluate(x_test, y_test, verbose=0))
#     except Exception as error:
#         print(thing, "did not work:", error)
