# Machine Learning

## Introduction

The purpose of this analysis is to predict the amount of electricity produced by a wind farm as a
function of the readings gathered by a number of physical sensors (e.g. wind speed,
temperature, ...).


## Importing all the needed libraries


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge
import numpy as np

## Load the dataset

In [2]:
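# Paths to the local NumPy files holding the sensor readings and the targets
inputs_path = "./inputs.npy"
labels_path = "./labels.npy"
inputs = np.load(inputs_path)
labels = np.load(labels_path)
# Wrap the sensor readings in a DataFrame for convenience
dataset = pd.DataFrame(inputs)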

First we need to split our dataset into two separate sets: train and validation.

We will use the train set to fit our model, then use the validation set to evaluate it.

X and Y represent, respectively, the sensor readings and the target, which is the amount of electricity produced.

In [3]:
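# Features are the sensor readings; the target is the first column of labels
X = inputs
y = labels[:, 0]
# Hold out 20% of the rows for validation, with a fixed seed for reproducibility
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=2, shuffle=True)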
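
To make the 80/20 split more concrete, here is a minimal sketch of the idea in plain Python. It is only an illustration: the helper `split_indices` is hypothetical, it works on row indices instead of the real arrays, and it will not reproduce the exact rows that `train_test_split` selects.

```python
import math
import random


def split_indices(n_samples, test_size=0.20, seed=2):
    """Shuffle the row indices and hold out a fraction for validation.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_validation = math.ceil(n_samples * test_size)
    return indices[n_validation:], indices[:n_validation]


# With 10 rows and test_size=0.20, 8 rows go to train and 2 to validation.
train_idx, validation_idx = split_indices(10)
print(len(train_idx), len(validation_idx))
```

Every row ends up in exactly one of the two sets, so the model is never evaluated on rows it was fitted on.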

## Choice of algorithm

We collect the algorithms we want to test in a list so that we can compare their scores. Here the only candidate is linear regression.

In [4]:
models = []
models.append(('LR', LinearRegression()))

In [5]:
regr = LinearRegression()
regr.fit(X_train, Y_train)

In [6]:
regr.score(X_validation, Y_validation)

0.8357371863343

As we can see, the score of the model is already fairly high, but we would like to improve it. The next step is to tune some hyperparameters.
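
For a regressor such as `LinearRegression`, `score` returns the coefficient of determination R², defined as 1 - SS_res / SS_tot. The sketch below computes it by hand; the function name `r2` and the numbers are made up for illustration and are not taken from the wind farm data.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up target values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r2(y_true, y_true)       # predictions match the targets exactly
baseline = r2(y_true, [6.0] * 4)   # always predicting the mean of y_true
print(perfect, baseline)
```

A score of 1.0 means perfect predictions, and a score of 0.0 means the model does no better than always predicting the mean of the targets.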

## Hyperparameters

Before training the final model, we need to determine its hyperparameters.

We split the dataset again, this time separating its last column from the remaining columns.

In [7]:
data = dataset.values
X, y = data[:, :-1], data[:, -1]

We need to know which parameters can be tuned, then choose the ones we want to test.

In [9]:
regr.get_params()

{'copy_X': True,
 'fit_intercept': True,
 'n_jobs': None,
 'normalize': 'deprecated',
 'positive': False}

Then we define the parameters that will be tested, which are 'copy_X', 'fit_intercept' and 'positive'.

In [10]:
grid = dict()
grid['copy_X'] = [True, False]
grid['fit_intercept'] = [True, False]
grid['positive'] = [True, False]
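
Of these three parameters, `copy_X` only controls whether the input array is copied before fitting, `positive` forces the coefficients to be non-negative, and `fit_intercept` decides whether the model learns a constant term. As a rough illustration of the last one, the sketch below fits a single-feature line with and without an intercept. The helper `fit_line` and the data are hypothetical and are not part of the notebook.

```python
def fit_line(xs, ys, fit_intercept=True):
    """Least-squares fit of y = slope * x (+ intercept) for one feature."""
    n = len(xs)
    if fit_intercept:
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        slope = sxy / sxx
        return slope, mean_y - slope * mean_x
    # Without an intercept the line is forced through the origin.
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return slope, 0.0


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

with_intercept = fit_line(xs, ys, fit_intercept=True)
without_intercept = fit_line(xs, ys, fit_intercept=False)
print(with_intercept, without_intercept)
```

On this toy data the fit with an intercept recovers the line exactly, while the fit through the origin has to compensate with a steeper slope. Which setting works better on real data depends on the data, which is why it is worth searching over.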

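To do the search we will use a grid search, which exhaustively tries every combination of the listed values. Note that the search below is run on a `Ridge` estimator, which accepts the same three parameters.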

In [11]:
search = GridSearchCV(estimator=Ridge(), param_grid=grid, scoring='r2', verbose=1, n_jobs=-1)

Then we can look at the result of the search:

In [12]:
result = search.fit(inputs, labels[:,0])
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)

Fitting 5 folds for each of 8 candidates, totalling 40 fits
Best Score: 0.886182548352332
Best Hyperparameters: {'copy_X': True, 'fit_intercept': False, 'positive': True}


We can see that the best hyperparameters are True for 'copy_X' and 'positive', and False for 'fit_intercept'.
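
The "8 candidates" and "40 fits" reported in the log above follow directly from the grid: three parameters with two values each give 2 × 2 × 2 combinations, and each one is fitted once per fold. A minimal sketch of that enumeration, using only the standard library (the variable names are our own):

```python
from itertools import product

grid = {
    'copy_X': [True, False],
    'fit_intercept': [True, False],
    'positive': [True, False],
}

# Every combination of the listed values is one candidate.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

n_folds = 5  # number of folds reported in the log above
total_fits = len(candidates) * n_folds

print(len(candidates), total_fits)
```

The size of the grid grows multiplicatively with every parameter added, so an exhaustive search is only practical when the grid stays small, as it does here.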

## Training

We will now train the chosen model with these hyperparameters on the training dataset.

In [16]:
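# Refit a linear regression with the hyperparameters selected by the grid search
model = LinearRegression(copy_X=True, fit_intercept=False, positive=True)
model.fit(X_train, Y_train)
# Predict the electricity output for the held-out validation rows
predictions = model.predict(X_validation)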

In [17]:
model.score(X_validation, Y_validation)

0.9161388831483339

## Conclusion

After training, the model predicts the amount of electricity produced by the wind farm with an R² score of about 0.92 on the validation set.
