In [1]:
from matplotlib import pyplot as plt
import turicreate as tc
import pandas as pd
import numpy as np

## False positive or false negative

Assuming we care most about catching the units labelled 0, the error we want to minimize is the false positive: a unit predicted as 1 (seen as working) whose true label is 0 (not working). This means we need to maintain precision, so beta stays near 0 (we chose 0.07 for the Fb-score).

In [2]:
# Naming data variable and getting the data from CSV file in a TuriCreate SFrame data structure
data = tc.SFrame('0382949_data.csv')
# Printing dataset
data

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,float,float,float]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


Condition,Voltage,Current,Temperature
1,102.4441126133345,489.164512034988,60.49123260389594
1,103.47024648368676,481.6396167938656,35.32103669668187
1,103.4647267620901,486.8053511626812,31.452368871757493
1,103.70083848416236,482.2591346937867,48.50095374502988
1,103.29952272898116,486.6640754212922,61.61425058460568
1,101.98279161829043,488.6468046742295,47.8043705344421
1,103.1069981966328,482.3743872525397,34.192342307064635
1,103.29212407453647,486.0067979741111,58.244213361641656
0,100.0699377253576,480.97700676303214,57.42543634764111
1,102.55515084540993,484.4324437254198,41.60329776520041


**The cell below splits the original dataset in two stages:**

_1. `train_data` contains approximately 80% of the data, and `rest_data` contains the remaining 20%._

_2. `test_data` holds about 50% of `rest_data` (10% of the original data), and `validation_data` holds the other 50% (10% of the original data)._

_Both random splits use the same seed value (0) for consistency._

In [3]:
train_data, rest_data = data.random_split(0.8, seed=0)
test_data, validation_data = rest_data.random_split(0.5, seed=0)
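The two-stage split can be pictured with a small pure-Python sketch. This is only an illustration of the idea, not TuriCreate's implementation: the helper `random_split` and the 1000-row stand-in list are hypothetical, and the sketch assumes each row is assigned independently, which is why the split sizes are approximate rather than exact.

```python
import random

def random_split(rows, fraction, seed):
    """Send each row to the first part with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))  # stand-in for the dataset rows (assumed size)

# Stage 1: roughly 80% for training, 20% left over
train, rest = random_split(rows, 0.8, seed=0)
# Stage 2: split the leftover rows roughly in half
test, validation = random_split(rest, 0.5, seed=0)

print(len(train), len(test), len(validation))
```

Because the assignment is random, the three parts are disjoint and cover every row, but their sizes only approximate 80/10/10.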

### By default, feature rescaling is set to True, so the features are rescaled (normalized) before fitting, which prevents features with large values from dominating the model.

This cell creates a classifier that predicts the condition from the other columns of the dataset; the validation set helps assess its performance.

The call has three components:

data: the full SFrame loaded from the CSV file (which still contains the rows later used for validation and testing)

target: the 'Condition' column is used as the label, and the other columns become features

validation_set: the validation split created above, kept apart from the test split

The model also estimates class probabilities using a logistic (sigmoid) function.
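As a rough sketch of how a logistic function turns a weighted sum of features into a class probability, consider the snippet below. The weights, bias, and 0.5 threshold are made-up values for illustration only; they are not the coefficients learned by the model in this notebook.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of class 1 for one row: sigmoid of the linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for Voltage, Current, Temperature (assumptions)
weights = [0.9, -0.1, 0.02]
bias = -45.0

row = [102.44, 489.16, 60.49]  # first data row, rounded
p = predict_proba(row, weights, bias)
label = 1 if p >= 0.5 else 0
print(p, label)
```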


In [4]:
perceptron = tc.logistic_classifier.create(data, target='Condition', validation_set=validation_data)

For the second perceptron, we tried different hyperparameter values: we specified L1 and L2 penalties and disabled feature rescaling.

In [5]:
perceptron_hyper = tc.logistic_classifier.create(data, target='Condition', validation_set=validation_data,
                                                 feature_rescaling=False, l1_penalty=0.01, l2_penalty=1,
                                                )

**We use these lines of code to assess how well the Perceptron model performs on both the training and validation datasets by calculating their respective accuracies.**
We call the model's `evaluate` function from TuriCreate on the training data and on the validation data, and store the `'accuracy'` entry of each result in `train_accuracy` and `validation_accuracy`.

In [6]:
train_accuracy = perceptron.evaluate(train_data)['accuracy']
validation_accuracy = perceptron.evaluate(validation_data)['accuracy']
print("Training accuracy for Perceptron model:", train_accuracy)
print("Validation accuracy for Perceptron model:", validation_accuracy)

Training accuracy for Perceptron model: 0.8946047678795483
Validation accuracy for Perceptron model: 0.912621359223301


The same as above but for the perceptron hyper model.

In [7]:
train_accuracy = perceptron_hyper.evaluate(train_data)['accuracy']
validation_accuracy = perceptron_hyper.evaluate(validation_data)['accuracy']
print("Training accuracy for Hyper Perceptron:", train_accuracy)
print("Validation accuracy for Hyper Perceptron:", validation_accuracy)

Training accuracy for Hyper Perceptron: 0.506900878293601
Validation accuracy for Hyper Perceptron: 0.5728155339805825


## Original Perceptron Confusion Matrix

**By examining the confusion matrix, one can understand where the model is making errors and how well it classifies instances of each class.**
We evaluate the model on the validation data with TuriCreate's `evaluate` function and store the `'confusion_matrix'` entry of the result in `confusion_matrix`.

In [8]:
confusion_matrix = perceptron.evaluate(validation_data)['confusion_matrix']
print(confusion_matrix)

+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        1        |   8   |
|      1       |        1        |   43  |
|      0       |        0        |   51  |
|      1       |        0        |   1   |
+--------------+-----------------+-------+
[4 rows x 3 columns]
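The counts in this table are enough to recompute the headline metrics by hand. The sketch below assumes label 1 is the positive class; the helper `metrics` is a hypothetical name, and the four counts are copied from the table above.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts.

    Precision or recall is None when its denominator is zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall

# Validation counts: target 1/predicted 1 = 43, target 0/predicted 1 = 8,
# target 0/predicted 0 = 51, target 1/predicted 0 = 1
accuracy, precision, recall = metrics(tp=43, fp=8, tn=51, fn=1)
print(accuracy, precision, recall)
```

The accuracy works out to 94/103, which agrees with the validation accuracy printed earlier for this model.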



### Perceptron Hyper Confusion Matrix

Same as above, for the perceptron hyper model

In [9]:
confusion_matrix = perceptron_hyper.evaluate(validation_data)['confusion_matrix']
print(confusion_matrix)

+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |   59  |
|      1       |        0        |   44  |
+--------------+-----------------+-------+
[2 rows x 3 columns]
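This table has only two rows because combinations with a count of zero are simply not listed. A small sketch, assuming the rows are available as plain dictionaries (the helper `to_counts` is hypothetical), fills in the missing cells and makes it explicit that the model never predicts class 1:

```python
# Rows copied from the table above
rows = [
    {"target_label": 0, "predicted_label": 0, "count": 59},
    {"target_label": 1, "predicted_label": 0, "count": 44},
]

def to_counts(rows, labels=(0, 1)):
    """Build a full (target, predicted) -> count mapping, defaulting to 0."""
    counts = {(t, p): 0 for t in labels for p in labels}
    for r in rows:
        counts[(r["target_label"], r["predicted_label"])] = r["count"]
    return counts

counts = to_counts(rows)
predicted_positive = counts[(0, 1)] + counts[(1, 1)]  # rows predicted as 1
accuracy = (counts[(0, 0)] + counts[(1, 1)]) / sum(counts.values())
print(predicted_positive, accuracy)
```

With no predicted positives, the accuracy is just the share of class 0 in the validation set (59/103).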



**By calculating precision and recall, we can evaluate how well the perceptron model performs in terms of both the accuracy of its positive predictions and its ability to identify positive instances in the data.**  
We store the result of `perceptron.evaluate` on the full dataset in `model`, then read the precision and recall from it.

In [10]:
model = perceptron.evaluate(data)
precision = model['precision']
recall = model['recall']
print('precision:', precision)
print('recall:', recall)

precision: 0.878
recall: 0.9032921810699589


**This cell calculates the Fb-score from the values of precision and recall, with b (beta) as a constant.**

**The Fb-score is a measure of a model's accuracy that combines precision and recall; a beta below 1 gives more weight to precision.**

We set the value of beta, then apply the formula to the precision and recall obtained from the TuriCreate classifier.

In [11]:
b = 0.07
f = (1 + b**2) * (precision * recall) / ((b**2 * precision) + recall)
f

0.8781198905851211
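The same formula can be wrapped in a small function that also copes with an undefined precision or recall. This is a sketch, not part of TuriCreate: the name `fbeta_score` and the choice to return `None` when the score cannot be computed are assumptions made for illustration.

```python
def fbeta_score(precision, recall, beta):
    """F-beta score; returns None if it is undefined (assumed convention)."""
    if precision is None or recall is None:
        return None
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return None
    return (1 + beta**2) * (precision * recall) / denominator

# Precision and recall as printed above, with beta = 0.07
score = fbeta_score(0.878, 0.9032921810699589, beta=0.07)
print(score)
```

With beta this close to 0 the score sits almost on top of the precision, which is the intended behaviour when false positives are the costly error.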

The same code, used to get the precision and recall of the perceptron hyper model

In [12]:
model_hyper = perceptron_hyper.evaluate(data)
precision_hyper = model_hyper['precision']
recall_hyper = model_hyper['recall']
print('precision:', precision_hyper)
print('recall:', recall_hyper)

precision: None
recall: 0.0


As above, we compute the Fbeta score for the perceptron hyper model

In [13]:
b = 0.07
# Compute the score only when both values are defined and non-zero
if precision_hyper and recall_hyper:
    f = (1 + b**2) * (precision_hyper * recall_hyper) / ((b**2 * precision_hyper) + recall_hyper)
else:
    print("precision or recall are equal 0")

precision or recall are equal 0


### The perceptron model works fine with the default hyperparameters

When we change the hyperparameters manually, the results get worse. There are three available solvers: Newton-Raphson, L-BFGS and FISTA. For this dataset the one that works best is Newton, because we have few features and a reasonable amount of data. Limited-memory BFGS works well for a wide dataset, which is not the case here, and FISTA is suited to L1 regularization, which selects the most important features when there are many of them; as we only have 3 features, this is not worthwhile.

If we compare accuracy, precision and recall, we can see that the second model does not produce a usable precision or recall, as it didn't learn from the data. Its confusion matrix shows that `perceptron_hyper` never predicts class 1, so it has no true positives and no false positives: precision is undefined (None) and recall is 0.

**Evaluating the model on the test data helps assess how well it generalizes to new examples. Accuracy, in this context, measures the proportion of correctly classified instances in the test dataset.**  
We call the `evaluate` function on the test data and store its accuracy in `test_accuracy`.

In [14]:
test_accuracy = perceptron.evaluate(test_data)['accuracy']
print("Test accuracy:", test_accuracy)

Test accuracy: 0.85


**With the confusion matrix, we can understand where the model is making errors on the test data and how well it classifies instances of each class.**  
We store the confusion matrix returned by the `evaluate` function in `confusion_matrix_test`.

In [15]:
confusion_matrix_test = perceptron.evaluate(test_data)['confusion_matrix']
print(confusion_matrix_test)

+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      1       |        1        |   44  |
|      0       |        1        |   10  |
|      1       |        0        |   5   |
|      0       |        0        |   41  |
+--------------+-----------------+-------+
[4 rows x 3 columns]



We repeat the steps above, but this time on the test data instead of the training or validation data, to compare the model's precision and recall

In [16]:
test_model = perceptron.evaluate(test_data)
test_precision = test_model['precision']
test_recall = test_model['recall']
print('Precision:', test_precision)
print('Recall:', test_recall)

Precision: 0.8148148148148148
Recall: 0.8979591836734694


Calculating the Fbeta score on the test data

In [17]:
b = 0.07
f = (1 + b**2) * (test_precision * test_recall) / ((b**2 * test_precision) + test_recall)
f

0.8151828628634533