# Assignment 09
***

# Assignment Definition
***

Build a binary classifier that separates digit 0 from all other digits in the MNIST dataset.

Let $x = (x_1, x_2, \dots, x_m)$ be a vector representing an image in the dataset.

The prediction function $f_w(x)$ is defined as the linear combination of the augmented data $(1, x)$ and the model parameter $w$:
$f_w(x) = w_0 \cdot 1 + w_1 x_1 + w_2 x_2 + \dots + w_m x_m$
where $w = (w_0, w_1, \dots, w_m)$.
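
As a minimal sketch of this prediction function, assuming NumPy and a hypothetical helper named `predict` (not part of the assignment code), prepending a constant 1 to the image vector lets the bias $w_0$ be handled by a single dot product. The toy values below are illustrative only:

```python
import numpy as np

def predict(w, x):
    """Return f_w(x) = w_0 * 1 + w_1 * x_1 + ... + w_m * x_m."""
    # Build the augmented vector (1, x) so that w_0 acts as the bias term.
    x_aug = np.insert(np.asarray(x, dtype=float), 0, 1.0)
    return float(np.dot(w, x_aug))

# Toy example with m = 2.
w = np.array([0.5, 2.0, -1.0])
x = np.array([1.0, 3.0])
print(predict(w, x))  # 0.5 + 2.0 * 1.0 - 1.0 * 3.0 = -0.5
```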

The prediction function $f_w(x)$ should take the following target values:


$f_w(x) = +1$   if   $label(x) = 0$

$f_w(x) = -1$   if   $label(x) \neq 0$

The optimal model parameter $w$ is obtained by minimizing the following objective function:
$\sum_i ( f_w(x^{(i)}) - y^{(i)} )^2$
where $y^{(i)} \in \{+1, -1\}$ is the target value for the $i$-th image $x^{(i)}$.
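
Because this objective is quadratic in $w$, its minimizer has a closed form. The short derivation below is a sketch that introduces two symbols not used in the assignment text: $A$ for the matrix whose $i$-th row is $(1, x^{(i)})$, and $y$ for the vector of targets $y^{(i)}$.

$$
E(w) = \sum_i ( f_w(x^{(i)}) - y^{(i)} )^2 = \| A w - y \|^2
$$

$$
\nabla_w E(w) = 2 A^T (A w - y) = 0 \quad \Rightarrow \quad A^T A \, w = A^T y
$$

$$
w^* = A^{+} y
$$

Here $A^{+}$ is the Moore-Penrose pseudoinverse of $A$. When $A^T A$ is invertible, $A^{+} = (A^T A)^{-1} A^T$; when it is singular, $A^{+} y$ is the minimizer with the smallest norm.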

1. Compute an optimal model parameter using the training dataset
2. Compute the (1) True Positive, (2) False Positive, (3) True Negative, and (4) False Negative values for the computed optimal model parameter on both (1) the training dataset and (2) the testing dataset.

---
### Module Definition

In [23]:
import numpy as np
import collections

---

## 1. Compute an optimal model parameter using the training dataset

In [40]:
file_data   = "mnist_train.csv"
with open(file_data, "r") as handle_file:
    data    = handle_file.readlines()

size_row    = 28    # height of the image
size_col    = 28    # width of the image

num_image   = len(data)
count       = 0     # count for the number of images


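def normalize(data):
    # Min-max scale one image vector to [0, 1]; a constant vector maps to all zeros
    value_range = data.max() - data.min()
    if value_range == 0:
        return np.zeros_like(data)
    return (data - data.min()) / value_range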


list_label  = np.empty(num_image, dtype=int)

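zero_data = []      # augmented image vectors (1, x)
zero_data_y = []    # targets: +1 for digit 0, -1 otherwise
for line in data:
    line_data   = line.split(',')
    label       = line_data[0]
    # np.asfarray was removed in NumPy 2.0; np.asarray with dtype=float is equivalent
    im_vector   = np.asarray(line_data[1:], dtype=float)
    im_vector   = normalize(im_vector)
    im_vector   = np.insert(im_vector, 0, 1)    # prepend the constant 1 for the bias w_0
    zero_data.append(im_vector)
    list_label[count]       = int(label)
    if label == '0':
        zero_data_y.append(1.0)
    else:
        zero_data_y.append(-1.0)
    count += 1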


xn = np.array(zero_data,dtype=float)
yn = np.array(zero_data_y,dtype=float)
# Least-squares solution w = pinv(xn) . yn; note that X holds the model parameter w, not the data
X = np.dot(np.linalg.pinv(xn) , yn)
print(X)


[-6.84406870e-01  1.02756547e-12 -2.53897628e-12 -1.43117440e-12
 -8.09054705e-13 -1.00663532e-12 -2.86102582e-12 -1.32965371e-12
 -7.89867287e-13 -1.49647371e-12  4.59362118e-13  7.43197493e-13
  1.89601442e-12  1.53458664e-01  1.98220986e-01 -1.53345669e-01
 -6.38940288e-03  4.47361378e-12 -1.15733488e-12  9.92574581e-13
  6.00965476e-14  1.85045849e-12  1.29649017e-12  4.87411594e-15
  2.31356154e-12  5.71223233e-14 -2.14432285e-12  2.01999570e-12
  7.60771616e-13 -1.56596542e-13  9.99917224e-13 -2.31901119e-12
  7.25859287e-15 -4.06289060e+00  1.05099744e+00  1.23101922e-01
  2.11306669e-01  6.51508445e-02  2.55392530e-03  5.52498152e-02
  1.15990558e-01 -7.23390275e-02  2.72683304e-01 -1.63118357e-01
  6.19145739e-02  3.85382309e-05  7.70688345e-02 -6.15820985e-02
  1.52778049e-02  8.53863822e-02 -1.41982541e-01  2.38270481e-01
  3.58333722e-02  1.43063859e-12 -1.45557527e-12 -6.86825758e-12
  2.23255856e-12 -3.05791426e-12  6.38880534e-12 -8.36059853e+00
  9.77593181e+00  5.34144
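
The pseudoinverse step can be cross-checked against NumPy's dedicated least-squares solver. The sketch below uses a small synthetic matrix as a stand-in for the MNIST data (the names `A`, `y`, `w_pinv`, and `w_lstsq` are illustrative and do not appear in the notebook) and compares `np.linalg.pinv` with `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 50 samples, a bias column plus 4 features.
A = np.hstack([np.ones((50, 1)), rng.random((50, 4))])
y = np.where(rng.random(50) < 0.3, 1.0, -1.0)  # targets in {+1, -1}

w_pinv = np.linalg.pinv(A) @ y                   # pseudoinverse solution
w_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]   # least-squares solver

# Both approaches should minimize the same squared error.
print(np.allclose(w_pinv, w_lstsq))
```

On a full training matrix, `np.linalg.lstsq` avoids forming the pseudoinverse explicitly, which may reduce memory use; whether that matters here depends on the machine.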

## 2. Compute True Positive, False Positive, True Negative, False Negative based on the computed optimal model parameter using training dataset

In [41]:
truthZeroCount = collections.Counter(list_label)[0]
truthNotZeroCount = len(list_label) - truthZeroCount
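answerZeroCount_y = 0       # predicted zero, truly zero (true positive)
answerZeroCount_n = 0       # predicted zero, truly not zero (false positive)
answerNotZeroCount_y = 0    # predicted not zero, truly not zero (true negative)
answerNotZeroCount_n = 0    # predicted not zero, truly zero (false negative)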


for x in range(len(xn)):
    value = np.dot(xn[x],X)
    if value >= 0.0:
        if list_label[x] == 0:
            answerZeroCount_y = answerZeroCount_y + 1
        else:
            answerZeroCount_n = answerZeroCount_n + 1
    else:
        if list_label[x] != 0:
            answerNotZeroCount_y = answerNotZeroCount_y + 1
        else:
            answerNotZeroCount_n = answerNotZeroCount_n + 1

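# These are rates, not raw counts: each count is divided by the number of true zeros or true non-zeros
TP = answerZeroCount_y/truthZeroCount
FP = answerZeroCount_n/truthNotZeroCount
FN = answerNotZeroCount_n/truthZeroCount
TN = answerNotZeroCount_y/truthNotZeroCount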

print("TP : " + str(TP))
print("FP : " + str(FP))
print("FN : " + str(FN))
print("TN : " + str(TN))

TP : 0.8723619787269965
FP : 0.003310094864729922
FN : 0.12763802127300355
TN : 0.9966899051352701
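
The counting loop can also be written with boolean masks. The following is a minimal sketch, assuming a hypothetical helper `confusion_counts` and the same decision threshold of 0 used in the loop; the toy scores and labels are made up for illustration:

```python
import numpy as np

def confusion_counts(scores, is_positive):
    """Count TP, FP, TN, FN for scores thresholded at 0."""
    pred_pos = np.asarray(scores) >= 0.0
    is_positive = np.asarray(is_positive, dtype=bool)
    tp = int(np.sum(pred_pos & is_positive))
    fp = int(np.sum(pred_pos & ~is_positive))
    tn = int(np.sum(~pred_pos & ~is_positive))
    fn = int(np.sum(~pred_pos & is_positive))
    return tp, fp, tn, fn

# Toy example: five scores and whether each sample is truly a zero.
scores = np.array([0.9, -0.2, 0.1, -0.7, 0.4])
is_zero = np.array([True, True, False, False, True])
tp, fp, tn, fn = confusion_counts(scores, is_zero)
print(tp, fp, tn, fn)  # 2 1 1 1
```

With the notebook's variables, the call would be `confusion_counts(np.dot(xn, X), list_label == 0)`. Dividing `tp` by `tp + fn` and `fp` by `fp + tn` gives the same kind of rates as the values printed above.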


## 2. Compute True Positive, False Positive, True Negative, False Negative based on the computed optimal model parameter using testing dataset

In [42]:
file_data   = "mnist_test.csv"
with open(file_data, "r") as handle_file:
    data    = handle_file.readlines()
count = 0
num_image   = len(data)
list_label  = np.empty(num_image, dtype=int)
zero_data = []
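for line in data:
    line_data   = line.split(',')
    label       = line_data[0]
    # np.asfarray was removed in NumPy 2.0; np.asarray with dtype=float is equivalent
    im_vector   = np.asarray(line_data[1:], dtype=float)
    im_vector   = normalize(im_vector)
    im_vector   = np.insert(im_vector, 0, 1)    # prepend the constant 1 for the bias w_0
    zero_data.append(im_vector)
    list_label[count]       = int(label)
    count += 1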
xn = np.array(zero_data,dtype=float)

truthZeroCount = collections.Counter(list_label)[0]
truthNotZeroCount = len(list_label) - truthZeroCount
answerZeroCount_y = 0
answerZeroCount_n = 0
answerNotZeroCount_y = 0
answerNotZeroCount_n = 0

for x in range(len(xn)):
    value = np.dot(xn[x],X)
    if value >= 0.0:
        if list_label[x] == 0:
            answerZeroCount_y = answerZeroCount_y + 1
        else:
            answerZeroCount_n = answerZeroCount_n + 1
    else:
        if list_label[x] != 0:
            answerNotZeroCount_y = answerNotZeroCount_y + 1
        else:
            answerNotZeroCount_n = answerNotZeroCount_n + 1

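# These are rates, not raw counts: each count is divided by the number of true zeros or true non-zeros
TP = answerZeroCount_y/truthZeroCount
FP = answerZeroCount_n/truthNotZeroCount
FN = answerNotZeroCount_n/truthZeroCount
TN = answerNotZeroCount_y/truthNotZeroCount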

print("TP : " + str(TP))
print("FP : " + str(FP))
print("FN : " + str(FN))
print("TN : " + str(TN))

TP : 0.8836734693877552
FP : 0.004767184035476719
FN : 0.11632653061224489
TN : 0.9952328159645233
