## Theoretical background of the classifier

First, let's define the sigmoid function for our model.

The sigmoid lets our neural network squash arbitrary values into the range (0, 1), so that outputs can be interpreted as probabilities.

In [1]:
import numpy as np


def nonlin(x, deriv=False):
    # Sigmoid value, reused for the derivative: s'(x) = s(x) * (1 - s(x))
    s = 1 / (1 + np.exp(-x))
    if deriv:
        return s * (1 - s)

    return s
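
As a quick sanity check, the sketch below evaluates a standalone copy of the sigmoid and its derivative at a few points. The name `sigmoid` and the sample inputs are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def sigmoid(x, deriv=False):
    """Logistic sigmoid; with deriv=True, its derivative with respect to x."""
    s = 1 / (1 + np.exp(-x))
    if deriv:
        return s * (1 - s)
    return s


xs = np.array([-4.0, 0.0, 4.0])  # hypothetical sample inputs
values = sigmoid(xs)
slopes = sigmoid(xs, deriv=True)

print(values)  # every value lies strictly between 0 and 1
print(slopes)  # the slope is largest at x = 0, where it equals 0.25
```

The derivative shrinks quickly away from zero, which is why very large positive or negative inputs contribute little to the weight updates during backpropagation.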

In general, our neuron is going to look like this, with the caveat that it will be wrapped in numpy and cv2 abstractions:

In [3]:
class Neuron:
    # Defaults allow creating an empty neuron with Neuron() and filling it in later
    def __init__(self, name=None, input_data=None, weights=None, output=None):
        self.name = name
        self.input_data = input_data
        self.weights = weights
        self.output = output

Also, for illustration, here are functions that name the neurons when the architecture is built by hand rather than automatically.

In that case, each neuron would be its own instance of the class:

In [5]:
outcome_neuron_names = []


def naming_of_hlaa():
    new_neuron = Neuron()
    new_neuron.name = "is_HLAA"
    input_neuron_web = [new_neuron.name]
    return input_neuron_web


def naming_of_mhc_neurons():
    mhc_names_list = ["mhc_st_", "mhc_nd_", "mhc_rd_"]  
    input_neuron_web = []
    for number in range(0, 9):
        new_list = []
        for j in range(0, 3):
            new_neuron = Neuron()
            new_neuron.name = mhc_names_list[j] + str(number)
            new_list.append(new_neuron)
        input_neuron_web.append(new_list)
    return input_neuron_web


def naming_of_sequence_neurons():
    letters_list = ["a", "b", "c", "d", "e", "f", "g",
                    "h", "i", "k", "l", "m", "n", "p",
                    "q", "r", "s", "t", "v", "w", "y"]
    input_neuron_web = []
    for number in range(0, 9):
        new_list = []
        for j in range(0, 21):
            new_neuron = Neuron()
            new_neuron.name = letters_list[j] + str(number)
            new_list.append(new_neuron)
        input_neuron_web.append(new_list)
    return input_neuron_web


def naming_of_meas():
    new_neuron = Neuron()
    new_neuron.name = "meas"
    input_neuron_web = [new_neuron.name]
    return input_neuron_web


list_of_hlaa = naming_of_hlaa()
outcome_neuron_names.extend(list_of_hlaa)

list_of_mhc = naming_of_mhc_neurons()
for a in range(0, 9):
    for b in range(0, 3):
        outcome_neuron_names.append(list_of_mhc[a][b].name)

list_of_sequence = naming_of_sequence_neurons()
for a in range(0, 9):
    for b in range(0, 21):
        outcome_neuron_names.append(list_of_sequence[a][b].name)

list_of_meas = naming_of_meas()
outcome_neuron_names.extend(list_of_meas)
finalized_neuron_names = outcome_neuron_names
print(len(outcome_neuron_names))
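
For comparison, here is a compact sketch that produces the same list of names with list comprehensions and without creating `Neuron` objects. The variable names `mhc_prefixes`, `letters` and `names` are illustrative assumptions. The prefixes, the 21 letters and the ordering are copied from the functions above.

```python
mhc_prefixes = ["mhc_st_", "mhc_nd_", "mhc_rd_"]
letters = list("abcdefghiklmnpqrstvwy")  # the same 21 letters as letters_list

names = ["is_HLAA"]
# Position is the outer loop, so the order matches the nested loops above.
names += [prefix + str(pos) for pos in range(9) for prefix in mhc_prefixes]
names += [letter + str(pos) for pos in range(9) for letter in letters]
names.append("meas")

print(len(names))  # 1 + 9 * 3 + 9 * 21 + 1 = 218
```

The sequence part alone accounts for 9 * 21 = 189 names.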

However, since we have convenient abstractions available, we'll just use a simple neural network with a single hidden layer:

In [7]:
import cv2
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

# 189 inputs, one hidden layer of 189 neurons, 1 output
mlp = cv2.ml.ANN_MLP_create()
mlp.setLayerSizes(np.array([189, 189, 1]))
mlp.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM, 2.5, 1.0)
mlp.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)
mlp.setBackpropWeightScale(0.00001)

# Stop after 10 iterations or at a tolerance of 0.01, whichever comes first
term_mode = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS)
term_max_iter = 10
term_eps = 0.01
mlp.setTermCriteria((term_mode, term_max_iter, term_eps))

# X_train_pre and y_train_pre: preprocessed training features and labels (prepared elsewhere)
mlp.train(X_train_pre, cv2.ml.ROW_SAMPLE, y_train_pre)
_, y_hat_train = mlp.predict(X_train_pre)
accuracy_score(y_train_pre, y_hat_train.round())  # argument order is (y_true, y_pred)
...
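
To make the configured architecture more concrete, the following is a minimal numpy sketch of a forward pass through a 189-189-1 network with a symmetric sigmoid activation, f(x) = beta * (1 - exp(-alpha * x)) / (1 + exp(-alpha * x)), using alpha = 2.5 and beta = 1.0 as in the cell above. It is not the cv2 implementation: the weights and the input batch are random placeholders, the same activation is assumed on both layers, and any input or output scaling the library may apply internally is left out. The names `sym_sigmoid`, `w_hidden`, `b_hidden`, `w_out` and `b_out` are hypothetical.

```python
import numpy as np


def sym_sigmoid(x, alpha=2.5, beta=1.0):
    """Symmetric sigmoid: beta * (1 - exp(-alpha*x)) / (1 + exp(-alpha*x))."""
    return beta * (1 - np.exp(-alpha * x)) / (1 + np.exp(-alpha * x))


rng = np.random.default_rng(0)

# Hypothetical random weights; a trained network would hold learned values.
w_hidden = rng.normal(scale=0.05, size=(189, 189))
b_hidden = np.zeros(189)
w_out = rng.normal(scale=0.05, size=(189, 1))
b_out = np.zeros(1)

# Hypothetical batch of 4 samples with 189 binary features each.
x = rng.integers(0, 2, size=(4, 189)).astype(np.float32)

hidden = sym_sigmoid(x @ w_hidden + b_hidden)  # hidden layer activations
output = sym_sigmoid(hidden @ w_out + b_out)   # one output value per sample

print(output.shape)  # (4, 1)
```

With beta = 1.0 the outputs of this activation fall between -1 and 1, unlike the plain sigmoid defined earlier, whose outputs fall between 0 and 1.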

Now we'll count the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) predictions:

In [16]:
train_csv = pd.read_csv("mhc_test.csv")

listed_csv_pep = pd.Series(train_csv["pep_class"]).tolist()

listed_csv_meas = pd.Series(train_csv["meas"]).tolist()

length_of_train_csv = len(train_csv)

def predict(x):
    if x > 0.5:
        return 1
    else:
        return 0


def accuracy_test():
    real_predict = list()

    for i in range(0, length_of_train_csv):
        real_predict.append(listed_csv_pep[i])  # Collect the true labels

    TP = 0  # True positive
    TN = 0  # True negative
    FP = 0  # False positive
    FN = 0  # False negative

    for i in range(0, length_of_train_csv):
        if predict(listed_csv_meas[i]) == 1 and real_predict[i] == 1:
            TP += 1
        elif predict(listed_csv_meas[i]) == 0 and real_predict[i] == 0:
            TN += 1
        elif predict(listed_csv_meas[i]) == 1 and real_predict[i] == 0:
            FP += 1
        elif predict(listed_csv_meas[i]) == 0 and real_predict[i] == 1:
            FN += 1

    return (TP, TN, FP, FN)


print("TP,     TN,   FP,   FN")
print(accuracy_test())

TP,     TN,   FP,   FN
(5176, 14495, 0, 754)


All the quality metrics are available in 'quallityMetrics.hs' on GitHub.
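
As a rough Python sketch of how the usual quality metrics follow from the four counts printed above, the function below computes accuracy, precision, recall and F1. It is not a port of the Haskell file: the function name `quality_metrics` and the choice of metrics are assumptions made for illustration.

```python
def quality_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts taken from the output of accuracy_test() above.
metrics = quality_metrics(tp=5176, tn=14495, fp=0, fn=754)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because FP is 0, precision is exactly 1.0 here, so every error in this run is a missed positive (a false negative).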