<p align="center">
    <img src="https://github.com/FRI-Energy-Analytics/energyanalytics/blob/main/EA_logo.jpg?raw=true" width="240" height="240" />
</p>

# Perceptrons

## Freshman Research Initiative Energy Analytics CS 309


We are going to use the same packages and data organization for the rest of the semester.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
import seaborn as sns; sns.set()
%matplotlib inline

In [2]:
data = pd.read_csv(r'well_data.csv') #read it in
data.tail()

| Unnamed: 0 | DEPT | AHT10 | AHT20 | AHT30 | AHT60 | AHT90 | AHTCO60 | AHTCO90 | DPHZ | DSOZ | ... | ITT | NPOR | PEFZ | RSOZ | RXOZ | SDEV | SP | SPHI | RHOZ | TOP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5466 | 1914.0 | 1.6167 | 3.0335 | 7.5475 | 8.5244 | 9.1691 | 117.3103 | 109.0619 | -0.4129 | 0.5048 | ... | 0.2649 | 0.4847 | 10.0 | 0.0348 | 2.9599 | 1.0538 | 1.625 | 0.7009 | 3.3313 | MATANUSKA |
| 5467 | 1913.5 | 1.6164 | 3.0324 | 7.5492 | 8.5195 | 9.183 | 117.3782 | 108.8963 | -0.6763 | 0.3208 | ... | 0.265 | 0.476 | 10.0 | 0.0 | 1.7452 | 1.077 | 10.9375 | 0.6161 | 3.7659 | MATANUSKA |
| 5468 | 1913.0 | 1.6163 | 3.0317 | 7.5488 | 8.5243 | 9.1852 | 117.3116 | 108.8711 | -0.9772 | 0.2371 | ... | 0.2651 | 0.4754 | 10.0 | 0.0 | 0.3407 | 1.0509 | 43.8125 | 0.5991 | 4.2624 | MATANUSKA |
| 5469 | 1912.5 | 1.6162 | 3.0311 | 7.5493 | 8.5248 | 9.1936 | 117.3051 | 108.7711 | -1.1748 | 0.212 | ... | 0.2652 | 0.4853 | 10.0 | 0.0 | 0.2168 | 0.8236 | 79.5 | 0.6521 | 4.5884 | MATANUSKA |
| 5470 | 1912.0 | 1.6161 | 3.0305 | 7.5496 | 8.5289 | 9.1974 | 117.2483 | 108.7263 | -1.1654 | 0.208 | ... | 0.2652 | 0.4471 | 9.9845 | 0.0 | 0.1797 | 0.7958 | 108.5 | 0.6699 | 4.5729 | MATANUSKA |


We want to balance our classes so that the classifier doesn't simply learn to predict the majority class every time.

In [3]:
groups = data.groupby('TOP')
balanced = groups.apply(lambda x: x.sample(groups.size().min()).reset_index(drop=True))
balanced = balanced.reset_index(level=1, drop=True)
data = balanced
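To make the balancing step easier to follow, here is a minimal sketch on a made-up DataFrame. The column `GR`, the class names `A` and `B`, and the `random_state` value are assumptions for illustration only; the idea is the same as above: downsample every class to the size of the smallest one.

```python
import pandas as pd

# Hypothetical toy data: 6 rows of class "A" and 2 rows of class "B"
toy = pd.DataFrame({
    "GR": [10, 12, 11, 13, 14, 15, 90, 95],
    "TOP": ["A"] * 6 + ["B"] * 2,
})

# Size of the smallest class; every class is downsampled to this many rows
n_min = toy["TOP"].value_counts().min()

# Sample n_min rows per class; random_state makes the draw repeatable
parts = [grp.sample(n=n_min, random_state=0) for _, grp in toy.groupby("TOP")]
balanced_toy = pd.concat(parts).reset_index(drop=True)

print(balanced_toy["TOP"].value_counts().to_dict())
```

Passing a `random_state` to `sample` is optional, but without it each run draws a different subset, so downstream accuracy numbers can shift slightly from run to run.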

Remember that we need to label encode the formation tops, which are the classes we want to predict.

In [4]:
from sklearn import preprocessing #for label encoding
#label encode our formation data
le = preprocessing.LabelEncoder()
top_names = data.TOP
le.fit(data.TOP)
tops = le.transform(data.TOP)
tops[tops == 0] = -1 #remember perceptrons are a -1 or +1 classification scheme
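As a small illustration of the encoding step, the sketch below runs it on made-up labels. `OTHER_TOP` is a hypothetical class name, not one from the well data. It shows how `LabelEncoder` assigns 0 and 1 in alphabetical order, how those codes map to -1 and +1, and how to get back to the original names.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical formation names; "OTHER_TOP" is a placeholder class
names = np.array(["MATANUSKA", "OTHER_TOP", "MATANUSKA", "OTHER_TOP"])

enc = LabelEncoder()
codes = enc.fit_transform(names)      # classes sorted alphabetically -> 0, 1
signed = np.where(codes == 0, -1, 1)  # map 0 -> -1, keep 1 as +1

# Recover the original names from the signed labels
recovered = enc.inverse_transform(np.where(signed == -1, 0, 1))

print(codes, signed, recovered)
```

Using `np.where` returns a new array instead of modifying the encoded labels in place, which keeps the original 0/1 codes available for `inverse_transform`.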

In [5]:
data.drop('TOP', inplace=True, axis=1)

We also need to split our data into training and test subsets.

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data, tops, test_size=0.2, random_state=86)
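The split above is random, so the class proportions in the test set can drift slightly. One option, shown here as a sketch on made-up arrays (`X_toy` and `y_toy` are hypothetical), is the `stratify` argument of `train_test_split`, which keeps the class balance the same in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 2 features, balanced -1/+1 labels
X_toy = np.arange(40, dtype=float).reshape(20, 2)
y_toy = np.array([-1] * 10 + [1] * 10)

# stratify=y_toy keeps the class proportions the same in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=86, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print("test labels:", np.unique(y_te, return_counts=True))
```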

Now let's define the prediction part of our perceptron.

In [7]:
def predict(X, weights):
    n_samples = X.shape[0]
    # Append a column of ones to the feature matrix for the bias term
    X = np.concatenate([X, np.ones((n_samples, 1))], axis=1)
    y = np.matmul(X, weights)
    y = np.vectorize(lambda val: 1 if val > 0 else -1)(y)
    return y
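The decision rule in `predict` can be checked by hand on a few points. The sketch below is an equivalent variant (named `predict_where` here, a hypothetical name) that swaps `np.vectorize` for `np.where`; the weights and inputs are made up so that each score is easy to verify.

```python
import numpy as np

def predict_where(X, weights):
    """Same decision rule as predict: +1 if the score is > 0, else -1."""
    X = np.asarray(X, dtype=float)
    # Append a column of ones so the last weight acts as the bias
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    scores = X @ weights
    return np.where(scores > 0, 1, -1)

# Hypothetical weights: w = [1, -1], bias = 0.5
w = np.array([1.0, -1.0, 0.5])
X_toy = np.array([[2.0, 0.0],   # score  2.5 -> +1
                  [0.0, 2.0],   # score -1.5 -> -1
                  [1.0, 1.5]])  # score  0.0 -> -1 (not strictly positive)
print(predict_where(X_toy, w))  # [ 1 -1 -1]
```

Note that a score of exactly zero is assigned to the -1 class, because the rule tests for strictly positive scores.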

Let's also define a function that fits the perceptron to our data. The number of `epochs` is the number of times we run through the training dataset; one pass through the entire dataset is one epoch.

In [8]:
def fit(features, target, epochs, learning_rate):
    n_samples, n_features = features.shape
    lr = learning_rate
    weights = np.zeros((n_features+1))
    # this adds the bias term as an input vector on the feature tensor
    features = np.concatenate([features, np.ones((n_samples, 1))], axis=1)
    for e in range(epochs):
        for j in range(n_samples):
            # Error for sample j: error = y - y_hat, where y_hat here is the raw
            # linear output w . x (not the thresholded -1/+1 prediction)
            error = target[j] - np.dot(weights, features[j, :])
            # Update rule: weights = previous weights + learning rate * error * features
            if error != 0:
                weights += lr * error * features[j, :]
        if e % (epochs / 10) == 0:
            y_hat = predict(X_test, weights)
            print(f"Epoch: {e:^5}{'=====':^5} Accuracy: {accuracy_score(y_test, y_hat):.2f}")
            print("__________")
    print("Finished training!")
    return weights
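The `fit` function above computes the error from the raw linear output `np.dot(weights, features[j,:])`, so the weights change on almost every sample. For comparison, here is a sketch of the classic thresholded perceptron rule, where the weights only change when the -1/+1 prediction is wrong. The function name `fit_threshold` and the toy data are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def fit_threshold(features, target, epochs, learning_rate):
    """Perceptron rule: update only when the thresholded prediction is wrong."""
    X = np.asarray(features, dtype=float)
    n_samples, n_features = X.shape
    X = np.hstack([X, np.ones((n_samples, 1))])  # bias column
    weights = np.zeros(n_features + 1)
    for _ in range(epochs):
        for j in range(n_samples):
            y_hat = 1 if X[j] @ weights > 0 else -1
            error = target[j] - y_hat            # 0, +2, or -2
            if error != 0:
                weights += learning_rate * error * X[j]
    return weights

# Hypothetical, linearly separable toy data
X_toy = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, -3.0]])
y_toy = np.array([1, 1, -1, -1])

w = fit_threshold(X_toy, y_toy, epochs=10, learning_rate=0.1)
train_pred = np.where(np.hstack([X_toy, np.ones((4, 1))]) @ w > 0, 1, -1)
print(w, train_pred)
```

On this toy set a single update is enough to classify all four training points correctly. With the thresholded rule and zero initial weights, the learning rate only rescales the weights and does not change the predictions, which is one practical difference from the raw-output version above.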

Lastly, we need to set the number of epochs to train for and then run the training.

In [9]:
epochs = 100
weight = fit(X_train, y_train, epochs, 0.000000001) #0.00000001
y_hat = predict(X_test, weight)

Epoch:   0  ===== Accuracy: 0.49
__________
Epoch:  10  ===== Accuracy: 0.55
__________
Epoch:  20  ===== Accuracy: 0.75
__________
Epoch:  30  ===== Accuracy: 0.87
__________
Epoch:  40  ===== Accuracy: 0.91
__________
Epoch:  50  ===== Accuracy: 0.93
__________
Epoch:  60  ===== Accuracy: 0.94
__________
Epoch:  70  ===== Accuracy: 0.96
__________
Epoch:  80  ===== Accuracy: 0.96
__________
Epoch:  90  ===== Accuracy: 0.96
__________
Finished training!


Finally, let's check the accuracy of our perceptron on the test set.

In [11]:
print(f"The perceptron's accuracy is {round(accuracy_score(y_test, y_hat),2)}")

The perceptron's accuracy is 0.96


In [12]:
from sklearn.metrics import confusion_matrix

In [13]:
confusion_matrix(y_test, y_hat)


array([[142,  11],
       [  0, 145]], dtype=int64)
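The confusion matrix above can be tied back to the reported accuracy with a little arithmetic. The sketch below assumes scikit-learn's convention that rows are true classes and columns are predicted classes, with the classes in sorted order (-1, then +1).

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class,
# with classes ordered (-1, +1)
cm = np.array([[142, 11],
               [0, 145]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of the samples predicted +1, fraction truly +1
recall = tp / (tp + fn)      # of the samples truly +1, fraction predicted +1

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.963 precision=0.929 recall=1.000
```

All 11 errors are samples of the -1 class predicted as +1; no +1 sample was missed.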

In [14]:
from sklearn.linear_model import Perceptron
perc = Perceptron()
perc.fit(X_train, y_train)
perc.score(X_test, y_test)


0.9664429530201343

In [15]:
confusion_matrix(y_test, perc.predict(X_test))


array([[145,   8],
       [  2, 143]], dtype=int64)