# Emerging Technologies Project

The following notebook trains a model that can be used to predict power output from wind speed values, as defined in the `powerproduction` dataset.

## Preamble

I'll begin by importing the necessary packages and reading in the dataset.

In [None]:
import tensorflow.keras as kr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Plot style
plt.style.use("ggplot")

# Plot size
plt.rcParams["figure.figsize"] = [14, 8]

In [None]:
# Read in the dataset
df = pd.read_csv("./powerproduction.csv")

# Print the first few rows
df.head(8)

Next I'll partition the dataset into two subsets. The first, the training dataset, is used to fit the model. The second, the testing dataset, is held back so that the trained model's predictions can be checked against values it has not seen. Splitting the data this way gives an estimate of how the model performs on new data, i.e. data that wasn't used to train it [1].

The training set will contain 80% of the rows in the dataset, and the testing set the remaining 20%.

In [None]:
# Create train and test sets
# Reference: TensorFlow documentation
# https://www.tensorflow.org/tutorials/keras/regression#split_the_data_into_train_and_test
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)

print("Training:")
print(train.head())
print("\nTesting:")
print(test.head())
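
To make the idea behind the split concrete, here is a minimal sketch of the same 80/20 partition in plain Python, without pandas. The helper name `train_test_split` and the `rows` values are hypothetical and only for illustration; unlike `df.sample`, this sketch keeps the rows in their original order.

```python
import random


def train_test_split(rows, frac=0.8, seed=0):
    """Return (train, test) lists, with `frac` of the rows placed in train."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    # Pick which row positions belong to the training set
    train_idx = set(rng.sample(indices, round(frac * len(rows))))
    train = [rows[i] for i in indices if i in train_idx]
    test = [rows[i] for i in indices if i not in train_idx]
    return train, test


# Hypothetical (speed, power) rows, not taken from powerproduction.csv
rows = [(float(i), float(i * i)) for i in range(10)]
train_rows, test_rows = train_test_split(rows)

print(len(train_rows), len(test_rows))
```

The two subsets are disjoint and together cover every row, which is the property that lets the testing set stand in for unseen data.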

In order to better visualise the training and testing data sets we can plot them both using Matplotlib.

In [None]:
plt.plot(
    train.speed,
    train.power,
    "o",
    label="training"
)

plt.plot(
    test.speed, 
    test.power,
    "o",
    label="testing"
)

plt.xlabel("Speed")
plt.ylabel("Power")

plt.legend();

## Creating a Linear Model

Now I'll construct a linear model and use it to make predictions. This is done below using Keras' `Sequential` class, which represents a linear stack of layers [2]. The model contains a single dense layer. A dense layer is a layer in a neural network that's fully connected, meaning every neuron in one layer is connected to every neuron in the next layer [2].

In [None]:
# Create a neural network with one neuron
model = kr.models.Sequential()

# Add a single dense layer
model.add(
    kr.layers.Dense(
        1,
        input_shape=(1,),
        activation="linear",
        kernel_initializer="ones",
        bias_initializer="zeros"
    )
)

# Compile the model
model.compile("adam", loss="mean_squared_error")
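
As a rough illustration of what this one-neuron model computes, the sketch below reproduces a dense unit with a linear activation in plain Python: the output is just `weight * x + bias`. The function name `dense_unit` and the "adjusted" weight and bias are hypothetical values chosen for the example, not results from training.

```python
def dense_unit(x, weight, bias):
    """Output of one dense neuron with a linear activation."""
    return weight * x + bias


speeds = (0.0, 10.0, 20.0)

# With kernel_initializer="ones" and bias_initializer="zeros",
# the untrained neuron simply returns its input.
initial = [dense_unit(s, weight=1.0, bias=0.0) for s in speeds]

# Hypothetical weight and bias, standing in for values training might reach
adjusted = [dense_unit(s, weight=5.0, bias=-14.0) for s in speeds]

print(initial)
print(adjusted)
```

Whatever values training settles on, the output is always a straight line in the input, which is what limits this model on curved data.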

In [None]:
# Train the neural network on the training data
model.fit(
    train.speed,
    train.power,
    epochs=200,
    batch_size=10
)

### Analysis

Now that the training process is complete, the model can be used to make predictions. First I'll visualise the results by plotting the values from the `train` dataset alongside the predicted values of each point in the `test` dataset.

In [None]:
# Plot the training dataset
plt.plot(
    train.speed,
    train.power,
    "o",
    label="actual"
)

# Plot the predictions from the `test` dataset
plt.plot(
    test.speed, 
    model.predict(test.speed),
    label="predictions"
)

plt.xlabel("Speed")
plt.ylabel("Power")

plt.legend();

We can also predict individual values using Keras' `model.predict()` function. Below I find the predicted power output if the speed is 20, and then show where the result appears on the plot.

In [None]:
# Take a hand-picked speed value and view its predicted power.
# The input is a NumPy array of shape (1, 1): one sample with one feature.
pred_speed = 20.0
pred_power = model.predict(np.array([[pred_speed]]))[0, 0]

pred_power

In [None]:
def plot_predictions():
    """Plot the training data, the predictions for the test speeds,
    and the hand-picked prediction (uses the global model and pred_* values)."""
    # Plot the actual values from the training dataset
    plt.plot(
        train.speed,
        train.power,
        "o",
        label="actual"
    )

    # Plot the model's predictions for the `test` dataset
    plt.plot(
        test.speed,
        model.predict(test.speed),
        label="predictions"
    )

    plt.xlabel("Speed")
    plt.ylabel("Power")

    # Show the prediction location on the plot
    # Ref: https://www.mathworks.com/matlabcentral/answers/430336-draw-lines-from-both-axis-to-point-in-plot
    plt.plot(pred_speed, pred_power, "ko")
    plt.plot([pred_speed, pred_speed], [0, pred_power], "k-")
    plt.plot([0, pred_speed], [pred_power, pred_power], "k-")

    plt.legend()

In [None]:
plot_predictions()

As illustrated above, if we attempt to predict the power when the wind speed is 20, we get a value that is much lower than what we would expect. This is because the relationship between speed and power in our dataset is not linear, so a straight line cannot follow it closely, and linear regression is not the best form of regression to use if we want accurate predictions.

## Non-Linear Regression

Linear regression assumes that the relationship between an independent variable $x$ and a dependent variable $y$ can be best expressed with a line [3]. However, because the relationship between speed and power in the `powerproduction` dataset is non-linear, it makes more sense to use a form of non-linear regression. Polynomial regression, for example, expresses the relationship between two variables as a polynomial curve [3].
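
For reference, the two model forms contrasted above can be written out as follows. The symbols $w$, $b$, $a_k$ and the degree $n$ are notation chosen here for illustration and do not come from the cited source.

$$
y = wx + b \qquad \text{(linear)}
$$

$$
y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n \qquad \text{(polynomial of degree } n \text{)}
$$

Setting $n = 1$ recovers the linear case, so the line is simply the least flexible member of this family.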

Below I'll rebuild the model, this time using the *sigmoid* activation function. An activation function is simply a mathematical function that takes in a neuron's computed input and produces an output, which is then passed on to the neurons in the subsequent layer [4]. The sigmoid activation function is "S" shaped and can add non-linearity to the output [4]. The rebuilt model has a dense layer of 64 sigmoid neurons followed by a single linear output neuron.
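
To show the "S" shape numerically, here is a small sketch of the logistic sigmoid, $\sigma(z) = 1 / (1 + e^{-z})$, in plain Python. The function name `sigmoid` and the sample inputs are chosen for illustration; the two branches exist only to avoid overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real z into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z)
    e = math.exp(z)
    return e / (1.0 + e)


# Sample inputs: strongly negative, zero, strongly positive
for z in (-6.0, 0.0, 6.0):
    print(z, sigmoid(z))
```

Inputs far below zero map close to 0, inputs far above zero map close to 1, and `sigmoid(0.0)` is exactly 0.5, giving the flat-steep-flat curve that a straight line cannot produce.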

In [None]:
# Re-build the model
model = kr.models.Sequential()

model.add(
    kr.layers.Dense(
        64,
        input_shape=(1,),
        activation="sigmoid",
        kernel_initializer="glorot_uniform",
        bias_initializer="glorot_uniform"
    )
)

model.add(kr.layers.Dense(1, activation="linear"))

# `learning_rate` replaces the deprecated `lr` argument
model.compile(kr.optimizers.Adam(learning_rate=0.001), loss="mean_squared_error")

In [None]:
# Fit the data
model.fit(
    train.speed,
    train.power,
    epochs=300,
    batch_size=10
)

### Analysis

Once again, we can use this newly created model to make predictions. As done previously, I'll plot the values of the `train` dataset along with the predicted values of each point in the `test` dataset. Doing so, we'll find the predicted values appear to form a curved line.

In [None]:
plt.plot(
    train.speed,
    train.power,
    "o",
    label="actual"
)

plt.plot(
    test.speed,
    model.predict(test.speed),
    label="prediction"
)

plt.xlabel("Speed")
plt.ylabel("Power")

plt.legend();

Again, we can provide an input speed of 20 to Keras' `model.predict()` function and plot the result.

In [None]:
# Pass the input as a NumPy array of shape (1, 1): one sample with one feature
pred_power = model.predict(np.array([[pred_speed]]))[0, 0]

pred_power

In [None]:
plot_predictions()

We see that the prediction from the non-linear model lies much closer to the data than the one from the linear model tested previously.
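
One way to put a number on this comparison, sketched here under stated assumptions, is to compute the mean squared error of each model's predictions against held-out values. The function below is a plain-Python version of that metric, and all the numbers are hypothetical placeholders, not results from this notebook.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical power values and predictions, for illustration only
actual = [10.0, 50.0, 95.0]
linear_preds = [20.0, 45.0, 70.0]
nonlinear_preds = [12.0, 48.0, 94.0]

linear_mse = mean_squared_error(actual, linear_preds)
nonlinear_mse = mean_squared_error(actual, nonlinear_preds)

print(linear_mse, nonlinear_mse)
```

In the notebook itself, `actual` would correspond to `test.power` and the predictions to the output of `model.predict(test.speed)`. Since both models are compiled with the `mean_squared_error` loss, calling `model.evaluate(test.speed, test.power)` on each should report the same quantity.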

## Saving the Model

We can use `model.save()` to easily save the model for later use in the web application.

In [None]:
model.save("power_prod.h5")

## References

1. [Train-Test Split for Evaluating Machine Learning Algorithms](https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/). Jason Brownlee. Machine Learning Mastery.
2. [Linear Regression using Keras and Python](https://heartbeat.fritz.ai/linear-regression-using-keras-and-python-7cee2819a60c). Dhiraj K. Heartbeat.
3. [Polynomial Regression using tf.keras](https://medium.com/@anigasan637/polynomial-regression-using-tf-keras-17eaac771256). Ananya Gangavarapu. Medium.
4. [Neural Network Activation Function Types](https://medium.com/fintechexplained/neural-network-activation-function-types-a85963035196). Farhad Malik. Medium.