# Assignment 11

## Name: YangMyungCheol (양명철)
## ID: 20122776
## Submission Time: 2019.06.11. 18:05

For each digit in the MNIST dataset, build a binary classifier (that digit against all the other digits) based on k random features.

Let $x = (x_1, x_2, \dots, x_m)$ be a vector representing an image in the dataset.

The prediction function $f_d(x; w)$ for each digit d is a linear combination of k basis functions of the input vector x, weighted by the model parameter w:

$f_d(x; w) = w_0 \cdot 1 + w_1 g_1 + w_2 g_2 + \dots + w_k g_k$

where $w = (w_0, w_1, \dots, w_k)$ and each basis function $g_j$ ($j = 1, \dots, k$) is the inner product of a random vector $r_j$ and the input vector x.

You may also try $g_j = \max(\langle r_j, x \rangle, 0)$, where $\langle r_j, x \rangle$ is the inner product, to see if it improves the performance.

The prediction function $f_d(x; w)$ is trained toward the following target values:

$f_d(x; w) = +1$ if label(x) = d

$f_d(x; w) = -1$ if label(x) $\neq$ d

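The optimal model parameter w is obtained by minimizing the following objective function for each digit d:

$\sum_i ( f_d(x^{(i)}; w) - y^{(i)} )^2$

where $x^{(i)}$ is the i-th training image and $y^{(i)} \in \{+1, -1\}$ is its target value for digit d. The label of an input x is then given by:

$\arg\max_d f_d(x; w)$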

1. Compute an optimal model parameter using the training dataset for each classifier $f_d(x; w)$.
2. Compute (1) the true positive rate and (2) the error rate on (1) the training dataset and (2) the testing dataset.
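
The two steps can be sketched end to end on a small synthetic problem. This is a minimal illustration rather than the notebook's own code: the three-class Gaussian data, the dimensions, and all variable names (`R`, `G`, `A`, `W`, ...) are assumptions made for the example, and `np.linalg.lstsq` is used here as one possible way to minimize the squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 well-separated classes in 5 dimensions (not MNIST).
n_per_class, m, n_classes, k = 100, 5, 3, 50
centers = 4.0 * np.eye(n_classes, m)             # one center per class
labels = np.repeat(np.arange(n_classes), n_per_class)
X = centers[labels] + 0.5 * rng.normal(size=(labels.size, m))

# k random vectors r_j, features g_j = max(<r_j, x>, 0), plus a bias column.
R = rng.normal(size=(k, m))
G = np.maximum(X @ R.T, 0.0)
A = np.hstack([np.ones((G.shape[0], 1)), G])     # shape (n, k + 1)

# One-vs-rest targets: column d is +1 where label == d, else -1.
Y = np.where(labels[:, None] == np.arange(n_classes), 1.0, -1.0)

# Least-squares weights for all classifiers at once; column d holds w for f_d.
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

# Predicted label = argmax_d f_d(x; w).
pred = np.argmax(A @ W, axis=1)
train_accuracy = np.mean(pred == labels)
print(train_accuracy)
```

The same structure (random projection, optional max with 0, bias column, least squares, argmax) is what the cells below apply to MNIST with k = 1024.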

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import numpy.linalg as lin
import copy 
import random
from sklearn.metrics import confusion_matrix

# Train set

In [2]:
# The CSV has no header row; header=None replaces header=-1, which recent pandas versions reject
tr = pd.read_csv("mnist_train.csv", header=None)

In [3]:
tr_data = np.array(tr)

In [4]:
tr_y, tr_x = np.split(tr_data, [1], axis=1)
tr_y_bin = copy.copy(tr_y)

In [5]:
number= 60000
tr_x_short = tr_x[:number]
yy = tr_y_bin[:number]

In [6]:
def min_max(data):
    result = (data-data.min()) / (data.max() - data.min())
    return result

In [7]:
tr_x_scale = min_max(tr_x_short)

In [8]:
def random_array(k):
    # k random vectors of length 784 (one per feature), entries drawn from N(0, 2^2)
    r_a = np.random.normal(0, 2, size=(k, 784))
    return r_a

In [9]:
aa = random_array(1024)
print(aa)

[[ 0.16718117  2.33697363  0.09928906 ... -1.2831629   1.03461766
   2.2791477 ]
 [-2.5368328  -1.73690939  0.9539526  ... -1.66389915  2.27111837
  -0.1056068 ]
 [-1.74532837  3.32412085 -0.92245299 ... -4.04030082 -1.18541428
   1.34487052]
 ...
 [ 0.06410326  0.48757782 -1.3757681  ... -2.09029987  0.91268835
  -0.01466575]
 [-2.61534795  3.09261799 -0.28570716 ...  3.0392785  -0.08492525
   4.54717347]
 [ 3.31715881  0.94884237 -2.17896365 ... -0.38151747 -2.06677172
  -0.96716698]]
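
The test features have to be built with the same random matrix as the training features, so it can help to make the draw reproducible. A small sketch, assuming NumPy's `default_rng` generator; the function name `seeded_random_array` and the seed value are hypothetical and not part of the notebook:

```python
import numpy as np

def seeded_random_array(k, n_pixels=784, seed=0):
    """Draw a (k, n_pixels) matrix of N(0, 2^2) entries from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=2.0, size=(k, n_pixels))

# Two calls with the same seed give the same matrix.
r_first = seeded_random_array(16)
r_again = seeded_random_array(16)
print(r_first.shape, np.array_equal(r_first, r_again))
```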


In [10]:
new_A = np.dot(tr_x_scale, aa.T)

In [11]:
def random_array_sign(A):
    # g = max(<r, x>, 0): set negative projections to 0, in place as before
    np.maximum(A, 0, out=A)
    return A

In [12]:
new_A_sign = random_array_sign(new_A)

In [13]:
print(new_A_sign)

[[ 0.         13.02364076  0.         ...  0.          0.
   0.        ]
 [ 0.         11.17808605  0.         ...  0.          0.
   0.        ]
 [ 1.16223899  0.          0.         ...  0.          5.978483
  16.05406721]
 ...
 [ 9.15504217 34.76248681  0.         ...  0.          0.
   0.        ]
 [14.70890575  0.          0.         ...  0.          0.
   3.74390344]
 [11.3635804   7.97119556  0.         ...  0.          0.
   0.        ]]


In [14]:
def add_bias(B):
    one = np.ones((B.shape[0], 1))
    A = np.concatenate((one, B),axis = 1)
    return A

In [15]:
new_A_bias = add_bias(new_A_sign)

In [16]:
new_A_bias.shape

(60000, 1025)

In [17]:
print(new_A_bias)

[[ 1.          0.         13.02364076 ...  0.          0.
   0.        ]
 [ 1.          0.         11.17808605 ...  0.          0.
   0.        ]
 [ 1.          1.16223899  0.         ...  0.          5.978483
  16.05406721]
 ...
 [ 1.          9.15504217 34.76248681 ...  0.          0.
   0.        ]
 [ 1.         14.70890575  0.         ...  0.          0.
   3.74390344]
 [ 1.         11.3635804   7.97119556 ...  0.          0.
   0.        ]]


In [18]:
def classifier(digit):
    # One-vs-rest targets: +1 for the chosen digit, -1 for every other digit
    tr_y_bin[:, 0] = np.where(tr_y[:, 0] == digit, 1, -1)
    yy = tr_y_bin[:number]

    # Least-squares solution of the normal equations: Beta = (A^T A)^-1 A^T y
    Beta = np.dot(np.dot(lin.inv(np.dot(new_A_bias.T, new_A_bias)), new_A_bias.T), yy)

    return Beta

    

In [19]:
Beta0 = classifier(0)
Beta1 = classifier(1)
Beta2 = classifier(2)
Beta3 = classifier(3)
Beta4 = classifier(4)
Beta5 = classifier(5)
Beta6 = classifier(6)
Beta7 = classifier(7)
Beta8 = classifier(8)
Beta9 = classifier(9)
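
Since the design matrix `new_A_bias` is the same for every digit, the ten fits could share a single solve. The sketch below is an illustration on synthetic data with hypothetical names (`A`, `labels`, `Y`, `W`); it builds a +1/-1 target matrix with one column per digit and compares `np.linalg.lstsq` with the explicit normal-equation formula used in `classifier`. `lstsq` does not form an explicit inverse, which is generally the safer choice when columns of the design matrix are nearly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for new_A_bias and tr_y (small, synthetic).
n, k = 200, 5
A = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k))])   # bias + k features
labels = rng.integers(0, 10, size=n)

# Column d of Y is the +1 / -1 target vector for digit d.
Y = np.where(labels[:, None] == np.arange(10), 1.0, -1.0)   # shape (n, 10)

# One least-squares solve for all ten digits; column d of W plays the role of Beta_d.
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

# Same solution via the normal equations, as in classifier().
W_normal = np.linalg.inv(A.T @ A) @ A.T @ Y

print(W.shape, np.allclose(W, W_normal))
```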

1. Compute an optimal model parameter using the training dataset for each classifier $f_d(x; w)$


In [20]:
# optimal model parameter for 0~9
print(Beta0)
print(Beta1)
print(Beta2)
print(Beta3)
print(Beta4)
print(Beta5)
print(Beta6)
print(Beta7)
print(Beta8)
print(Beta9)

[[-9.80082513e-01]
 [ 7.21329946e-04]
 [-1.64672440e-03]
 ...
 [ 2.18516470e-03]
 [-3.24329618e-03]
 [-5.23699927e-04]]
[[-6.28656680e-01]
 [-8.47668985e-05]
 [ 1.74743768e-03]
 ...
 [-3.58556644e-04]
 [-7.40909840e-04]
 [-1.62172568e-04]]
[[-0.84726041]
 [-0.00164361]
 [ 0.00461456]
 ...
 [-0.00089012]
 [-0.00126408]
 [-0.00096525]]
[[-1.03116992e+00]
 [ 1.44100886e-03]
 [-7.30771137e-04]
 ...
 [-4.92399544e-03]
 [ 8.74749040e-04]
 [-2.75820697e-06]]
[[-6.47406019e-01]
 [ 2.52344651e-03]
 [-1.61817731e-03]
 ...
 [ 2.59439564e-03]
 [-1.25696941e-03]
 [-1.55498302e-05]]
[[-5.89585197e-01]
 [-1.45097308e-03]
 [-1.05216672e-03]
 ...
 [-1.43146642e-03]
 [ 1.24811219e-03]
 [-2.48862311e-04]]
[[-1.01139294e+00]
 [ 2.33280081e-03]
 [-1.67563063e-03]
 ...
 [-9.33661799e-04]
 [-3.25654821e-04]
 [ 1.64154831e-03]]
[[-6.81006629e-01]
 [-6.67964010e-04]
 [ 3.27117050e-03]
 ...
 [ 1.14591246e-03]
 [ 4.97266873e-03]
 [ 8.23575879e-04]]
[[-7.49044966e-01]
 [-5.80454175e-04]
 [ 5.57875332e-04]
 ...
 [

In [21]:
pr0 = np.dot(new_A_bias, Beta0)
pr1 = np.dot(new_A_bias, Beta1)
pr2 = np.dot(new_A_bias, Beta2)
pr3 = np.dot(new_A_bias, Beta3)
pr4 = np.dot(new_A_bias, Beta4)
pr5 = np.dot(new_A_bias, Beta5)
pr6 = np.dot(new_A_bias, Beta6)
pr7 = np.dot(new_A_bias, Beta7)
pr8 = np.dot(new_A_bias, Beta8)
pr9 = np.dot(new_A_bias, Beta9)

In [22]:
arg = np.column_stack((pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7, pr8, pr9))

In [23]:
# Predicted label for each image = index of the classifier with the largest output
max_index_list = list(np.argmax(arg, axis=1))

In [24]:
np.unique(max_index_list)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

In [25]:
max_index_list

[5,
 0,
 4,
 1,
 9,
 2,
 1,
 3,
 1,
 4,
 3,
 5,
 3,
 6,
 1,
 7,
 2,
 8,
 6,
 9,
 4,
 0,
 9,
 1,
 7,
 2,
 4,
 3,
 7,
 7,
 3,
 8,
 6,
 9,
 0,
 5,
 6,
 0,
 7,
 6,
 1,
 8,
 7,
 9,
 3,
 9,
 8,
 5,
 5,
 3,
 3,
 0,
 7,
 4,
 4,
 8,
 0,
 9,
 4,
 1,
 4,
 4,
 6,
 0,
 4,
 5,
 6,
 1,
 0,
 0,
 1,
 7,
 1,
 6,
 3,
 0,
 2,
 1,
 1,
 7,
 8,
 0,
 2,
 6,
 7,
 8,
 3,
 9,
 0,
 4,
 6,
 7,
 4,
 6,
 8,
 0,
 7,
 8,
 3,
 1,
 5,
 7,
 1,
 7,
 1,
 1,
 6,
 3,
 0,
 2,
 9,
 3,
 1,
 1,
 0,
 4,
 9,
 2,
 0,
 0,
 2,
 0,
 2,
 7,
 1,
 8,
 6,
 4,
 1,
 6,
 3,
 4,
 1,
 9,
 5,
 3,
 3,
 8,
 5,
 4,
 7,
 7,
 4,
 2,
 8,
 5,
 8,
 6,
 4,
 3,
 4,
 6,
 1,
 9,
 9,
 6,
 0,
 3,
 7,
 2,
 8,
 2,
 9,
 4,
 4,
 6,
 4,
 9,
 7,
 0,
 9,
 2,
 7,
 5,
 1,
 5,
 9,
 1,
 2,
 3,
 1,
 3,
 5,
 9,
 1,
 7,
 6,
 2,
 8,
 2,
 2,
 5,
 0,
 7,
 4,
 9,
 7,
 8,
 3,
 2,
 1,
 1,
 5,
 3,
 6,
 1,
 0,
 3,
 1,
 0,
 0,
 1,
 1,
 2,
 7,
 3,
 0,
 4,
 6,
 5,
 2,
 6,
 4,
 7,
 8,
 8,
 9,
 9,
 5,
 0,
 7,
 1,
 0,
 2,
 0,
 3,
 5,
 4,
 6,
 5,
 1,
 6,
 3,
 7,
 5,
 8,
 0,
 9,
 1,
 0,


In [26]:
acutal_y = tr_y.tolist()
data = {'y_Predicted': max_index_list,
        'y_Actual':    acutal_y
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
print (df)

      y_Actual  y_Predicted
0          [5]            5
1          [0]            0
2          [4]            4
3          [1]            1
4          [9]            9
5          [2]            2
6          [1]            1
7          [3]            3
8          [1]            1
9          [4]            4
10         [3]            3
11         [5]            5
12         [3]            3
13         [6]            6
14         [1]            1
15         [7]            7
16         [2]            2
17         [8]            8
18         [6]            6
19         [9]            9
20         [4]            4
21         [0]            0
22         [9]            9
23         [1]            1
24         [1]            7
25         [2]            2
26         [4]            4
27         [3]            3
28         [2]            7
29         [7]            7
...        ...          ...
59970      [2]            2
59971      [2]            2
59972      [0]            0
59973      [9]      

In [27]:
confusion_matrix_tr = confusion_matrix(acutal_y,max_index_list)

In [28]:
confusion_matrix_tr

array([[5799,    1,    6,   10,   10,   18,   30,    6,   39,    4],
       [   0, 6625,   42,    7,   19,   10,    9,   10,   15,    5],
       [  37,   27, 5581,   44,   43,    5,   31,   74,  100,   16],
       [  13,   20,   86, 5687,    5,  105,   11,   45,   92,   67],
       [   9,   38,   21,    0, 5503,    5,   45,    9,   30,  182],
       [  34,   13,   17,  109,   25, 5042,   79,   15,   52,   35],
       [  23,   12,   15,    2,   26,   68, 5755,    0,   16,    1],
       [  16,   55,   51,   21,   66,    4,    1, 5932,   11,  108],
       [  19,   74,   59,   92,   33,  100,   34,   15, 5333,   92],
       [  23,   21,   13,   77,  149,   27,    6,  111,   57, 5465]],
      dtype=int64)

In [29]:
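# Macro-averaged true positive rate: mean over digits of (correct predictions / actual count)
TP_tr = 0
r_sum = np.sum(confusion_matrix_tr, axis=1)   # number of actual samples per digit
print(r_sum)

for i in range(confusion_matrix_tr.shape[0]):
    TP_tr += confusion_matrix_tr[i][i] / r_sum[i]

TP_tr /= 10
TP_tr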

[5923 6742 5958 6131 5842 5421 5918 6265 5851 5949]


0.9447488657951189

In [30]:
TP_tr

0.9447488657951189

In [31]:
# Error rate = 1 - (correctly classified samples / all samples)
Error_tr = 1 - np.trace(confusion_matrix_tr) / np.sum(confusion_matrix_tr)

In [32]:
Error_tr

0.05463333333333331

2. Compute (1) true positive rate, (2) error rate using (1) training dataset

In [33]:
print("train TP Rate : " + str(TP_tr))
print("train Error Rate : " + str(Error_tr))


train TP Rate : 0.9447488657951189
train Error Rate : 0.05463333333333331
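
Both metrics follow directly from the confusion matrix: the true positive rate of a class is its diagonal entry divided by its row total, and the error rate is the off-diagonal share of all samples. A small sketch on a made-up 2x2 matrix; the helper name `tp_and_error_rate` and the numbers are illustrative only:

```python
import numpy as np

def tp_and_error_rate(cm):
    """Return (macro-averaged true positive rate, error rate) for a confusion
    matrix whose rows are actual classes and whose columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    per_class_tpr = np.diag(cm) / cm.sum(axis=1)
    error_rate = 1.0 - np.trace(cm) / cm.sum()
    return per_class_tpr.mean(), error_rate

# Toy example: class 0 is right 8 times out of 10, class 1 is right 9 times out of 10.
toy_cm = [[8, 2],
          [1, 9]]
tpr, err = tp_and_error_rate(toy_cm)
print(tpr, err)
```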


# Test set

In [34]:
# The CSV has no header row; header=None replaces header=-1, which recent pandas versions reject
ts = pd.read_csv("mnist_test.csv", header=None)

In [35]:
ts_data = np.array(ts)

In [36]:
ts_y, ts_x = np.split(ts_data, [1], axis=1)
ts_y_bin = copy.copy(ts_y)

In [37]:
number= 10000
ts_x_shortort = ts_x[:number]
yy_s = ts_y_bin[:number]

In [38]:
ts_x_scale = min_max(ts_x_shortort)

In [39]:
print(aa)

[[ 0.16718117  2.33697363  0.09928906 ... -1.2831629   1.03461766
   2.2791477 ]
 [-2.5368328  -1.73690939  0.9539526  ... -1.66389915  2.27111837
  -0.1056068 ]
 [-1.74532837  3.32412085 -0.92245299 ... -4.04030082 -1.18541428
   1.34487052]
 ...
 [ 0.06410326  0.48757782 -1.3757681  ... -2.09029987  0.91268835
  -0.01466575]
 [-2.61534795  3.09261799 -0.28570716 ...  3.0392785  -0.08492525
   4.54717347]
 [ 3.31715881  0.94884237 -2.17896365 ... -0.38151747 -2.06677172
  -0.96716698]]


In [40]:
new_A_ts = np.dot(ts_x_scale, aa.T)

In [41]:
new_A_sign_ts = random_array_sign(new_A_ts)

In [42]:
new_A_bias_ts = add_bias(new_A_sign_ts)

In [43]:
new_A_bias_ts.shape

(10000, 1025)

In [44]:
Beta0.shape

(1025, 1)

In [45]:
np.unique(ts_y)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

In [46]:
pr_s0 = np.dot(new_A_bias_ts, Beta0)
pr_s1 = np.dot(new_A_bias_ts, Beta1)
pr_s2 = np.dot(new_A_bias_ts, Beta2)
pr_s3 = np.dot(new_A_bias_ts, Beta3)
pr_s4 = np.dot(new_A_bias_ts, Beta4)
pr_s5 = np.dot(new_A_bias_ts, Beta5)
pr_s6 = np.dot(new_A_bias_ts, Beta6)
pr_s7 = np.dot(new_A_bias_ts, Beta7)
pr_s8 = np.dot(new_A_bias_ts, Beta8)
pr_s9 = np.dot(new_A_bias_ts, Beta9)

In [47]:
arg_s = np.column_stack((pr_s0, pr_s1, pr_s2, pr_s3, pr_s4, pr_s5, pr_s6, pr_s7, pr_s8, pr_s9))

In [48]:
# Predicted label for each test image = index of the classifier with the largest output
max_index_list_s = list(np.argmax(arg_s, axis=1))

In [49]:
np.unique(max_index_list_s)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

In [50]:
acutal_y_s = ts_y.tolist()
data_s = {'y_Predicted': max_index_list_s,
        'y_Actual':    acutal_y_s
        }

df_s = pd.DataFrame(data_s, columns=['y_Actual','y_Predicted'])
#print (df_s)

In [51]:
confusion_matrix_ts = confusion_matrix(acutal_y_s,max_index_list_s)

In [52]:
confusion_matrix_ts

array([[ 960,    0,    0,    0,    1,    2,    7,    2,    7,    1],
       [   0, 1122,    4,    1,    1,    0,    4,    1,    2,    0],
       [  11,    4,  957,    8,   11,    1,    6,   12,   17,    5],
       [   0,    1,   14,  944,    0,   20,    2,   11,   12,    6],
       [   2,    2,    4,    0,  923,    1,    6,    1,    8,   35],
       [   7,    0,    0,   18,    7,  826,   14,    7,   11,    2],
       [   6,    3,    2,    1,   10,   14,  919,    1,    2,    0],
       [   1,   14,   17,    3,    8,    1,    0,  960,    2,   22],
       [   7,    1,    6,   12,    8,   17,   10,    7,  893,   13],
       [   6,    6,    1,   10,   24,    6,    1,   17,    6,  932]],
      dtype=int64)

In [53]:
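# Macro-averaged true positive rate: mean over digits of (correct predictions / actual count)
TP_ts = 0
r_sum_s = np.sum(confusion_matrix_ts, axis=1)   # number of actual samples per digit
print(r_sum_s)

for i in range(confusion_matrix_ts.shape[0]):
    TP_ts += confusion_matrix_ts[i][i] / r_sum_s[i]

TP_ts /= 10
TP_ts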

[ 980 1135 1032 1010  982  892  958 1028  974 1009]


0.9429711570140371

In [54]:
# Error rate = 1 - (correctly classified samples / all samples)
Error_ts = 1 - np.trace(confusion_matrix_ts) / np.sum(confusion_matrix_ts)

In [55]:
Error_ts

0.056400000000000006

2. Compute (1) true positive rate, (2) error rate using (2) testing dataset.

In [56]:
print("test TP Rate : " + str(TP_ts))
print("test Error Rate : " + str(Error_ts))


test TP Rate : 0.9429711570140371
test Error Rate : 0.056400000000000006
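
The averaged rate hides per-digit differences. As a sketch, the test confusion matrix printed earlier can be re-entered by hand to look at each digit's true positive rate and the largest off-diagonal entry; the variable names here are illustrative.

```python
import numpy as np

# Test confusion matrix copied from the output above (rows: actual, columns: predicted).
cm = np.array([
    [960,    0,   0,   0,   1,   2,   7,   2,   7,   1],
    [  0, 1122,   4,   1,   1,   0,   4,   1,   2,   0],
    [ 11,    4, 957,   8,  11,   1,   6,  12,  17,   5],
    [  0,    1,  14, 944,   0,  20,   2,  11,  12,   6],
    [  2,    2,   4,   0, 923,   1,   6,   1,   8,  35],
    [  7,    0,   0,  18,   7, 826,  14,   7,  11,   2],
    [  6,    3,   2,   1,  10,  14, 919,   1,   2,   0],
    [  1,   14,  17,   3,   8,   1,   0, 960,   2,  22],
    [  7,    1,   6,  12,   8,  17,  10,   7, 893,  13],
    [  6,    6,   1,  10,  24,   6,   1,  17,   6, 932],
])

# True positive rate per digit: diagonal entry / row total.
per_digit_tpr = np.diag(cm) / cm.sum(axis=1)
for digit, rate in enumerate(per_digit_tpr):
    print(digit, round(rate, 4))

# Largest off-diagonal count = most frequent single confusion.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
actual, predicted = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print("most frequent confusion:", actual, "->", predicted, off_diag[actual, predicted])
```

On these numbers, digit 8 has the lowest true positive rate (893 of 974) and the most frequent single confusion is a 4 predicted as a 9 (35 cases).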
