## Table of contents
1. [Introduction](#introduction)

2. [Linear Regression](#linreg)

    2.1 [Preprocessing](#preprocessing)
    
    2.2 [Algorithm](#algorithm)
    
    2.3 [Predictions](#predictions)

3. [TensorFlow](#tensorflow)
    
4. [References](#references)

## Introduction
The goal of this project is to build two models that predict wind turbine power output from wind speed.

The first model is a simple linear regression model. The second model is a neural network model.

## Linear regression
This project uses the LinearRegression class from the linear_model module of the sklearn package to perform the linear regression.

The fitted model takes an input value (x) and generates a predicted value (y) using the formula y = ax + b, where a and b are coefficients that are fitted using Ordinary Least Squares.

Ordinary Least Squares works by choosing the coefficients that minimise the sum of the squared differences between the observed and predicted values of the dataset [1] (https://scikit-learn.org/stable/modules/linear_model.html). The observed values are the actual target values of y and the predicted values are the values generated by the formula y = ax + b.
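
To make the fitting step concrete, the sketch below computes a and b by hand for a single feature using the standard closed-form least-squares expressions. The toy arrays are hypothetical values chosen for illustration and are not taken from powerproduction.csv; the variable names are likewise assumptions.

```python
import numpy as np

# Hypothetical toy data (NOT from powerproduction.csv)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Closed-form OLS for one feature:
#   a = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   b = mean(y) - a * mean(x)
x_mean, y_mean = x.mean(), y.mean()
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean

# Sum of squared differences between observed and predicted values,
# which is the quantity that OLS minimises
sse = np.sum((y - (a * x + b)) ** 2)

print('a =', a)
print('b =', b)
print('Sum of squared errors:', sse)
```

For this toy data the slope comes out at about 1.97 and the intercept at about 0.09. Any other choice of a and b would give a larger sum of squared errors on the same points.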

### Preprocessing 
First the data is imported and preprocessed. Preprocessing consists of splitting the data into feature data (X values) and labels (y values), and reshaping the X values into a 2D array, because LinearRegression() from the linear_model module does not accept a 1D array for the X values.

In [17]:
# import data 
import pandas as pd
lin_data = pd.read_csv('powerproduction.csv')
lin_data.head()

# X and y values for regression
X = lin_data.iloc[:, 0].values
y = lin_data.iloc[:, 1].values

# The X values are reshaped as 
# they only contain one feature
X = X.reshape(-1, 1)
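
The effect of the reshape can be seen on a small array. This is a minimal sketch with hypothetical speed values; it only illustrates the change from a 1D array of shape (n,) to a 2D array of shape (n, 1), which is the samples-by-features layout that scikit-learn estimators expect.

```python
import numpy as np

# Hypothetical speed values, for illustration only
speeds = np.array([3.0, 7.5, 12.0])
print(speeds.shape)       # 1D: three values, no feature axis

# -1 lets NumPy infer the number of rows; 1 is the number of features
speeds_2d = speeds.reshape(-1, 1)
print(speeds_2d.shape)    # 2D: three samples, one feature each
```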

### Algorithm
The train_test_split function is imported from the model_selection module of the sklearn package. It is used to randomly split the data into training data (80%) and testing data (20%).
A LinearRegression regressor is then created and fitted to the X_train and y_train datasets.

In [18]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_)
print(regressor.coef_)

-13.603433993820211
[4.89542079]
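
The printed intercept and coefficient can be plugged straight into y = ax + b. The sketch below does this by hand, copying the two values from the output above; the function name predict_power and the example speeds are assumptions made for illustration.

```python
# Values copied from the fitted regressor's output above
intercept = -13.603433993820211
slope = 4.89542079

def predict_power(speed):
    """Apply y = a*x + b with the fitted coefficients (illustrative helper)."""
    return slope * speed + intercept

print(predict_power(10.0))  # roughly 35.35
print(predict_power(2.0))   # negative value
```

Because the intercept is negative, the fitted line predicts negative power for speeds below about 2.8, which is not physically meaningful. This is one sign that a straight line is a poor fit at the low end of the range.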


### Predictions
The predicted values are calculated using the regressor and the X_test testing data. Then several metrics (mean absolute error, mean squared error and root mean squared error) are calculated to assess the accuracy of the model. In addition, the coefficient of variation (the root mean squared error, RMSE, as a percentage of the mean of the observed values) is calculated.

In [29]:
from sklearn import metrics
import numpy as np
y_pred = regressor.predict(X_test)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('Mean of observed y values:', np.mean(y))
# coefficient of variation 
print('Coefficient of variation:', 100*np.sqrt(metrics.mean_squared_error(y_test, y_pred))/np.mean(y))

Mean Absolute Error: 15.371033053882327
Mean Squared Error: 496.3930965626669
Root Mean Squared Error: 22.279880981788637
Mean of observed y values: 48.01458399999999
Coefficient of variation: 46.40232014045699


The coefficient of variation is 46.4%. A good coefficient of variation is considered to be less than 25% [2] (https://www.kw-engineering.com/how-to-assess-a-regressions-predictive-power-energy-use/), so the linear regression model is not very accurate at predicting power.
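
The coefficient of variation calculation can be wrapped in a small helper. This is a sketch only: the function name cv_rmse and its reference_mean parameter are hypothetical. The cell above divides by the mean of all y values rather than the mean of y_test, so the helper accepts either.

```python
import numpy as np

def cv_rmse(y_true, y_pred, reference_mean=None):
    """RMSE as a percentage of a reference mean (illustrative helper).

    If reference_mean is None, the mean of y_true is used.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    if reference_mean is None:
        reference_mean = y_true.mean()
    return 100.0 * rmse / reference_mean

# Check against the figures reported above: RMSE divided by the mean of y
reported = 100.0 * 22.279880981788637 / 48.01458399999999
print('Coefficient of variation from reported values:', reported)

# Hypothetical toy values to show the helper in use
toy_cv = cv_rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print('Toy coefficient of variation:', toy_cv)
```

With the notebook's variables, a call such as cv_rmse(y_test, y_pred, reference_mean=np.mean(y)) would be expected to reproduce the 46.4% figure.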