## PA 3 - Linear Classification with Perceptrons and Logistic Regression

Name: Viswesh Uppalapati 

PID: A15600068

#### Problem 3.1 Perceptron Errors

| # of Passes | 2 | 3 | 4 |
| :- | :- | :- | :- |
| Training Error | 0.03761467889908257 | 0.02018348623853211 | 0.01926605504587156 |
| Testing Error | 0.0610079575596817 | 0.04509283819628647 | 0.04509283819628647 |

#### Problem 3.2 Logistic Regression Errors

| Iters of GD Algorithm | 10 | 50 | 100 |
| :- | :- | :- | :- |
| Training Error | 0.29541284403669726 | 0.03853211009174312 | 0.01926605504587156 |
| Testing Error | 0.29708222811671087 | 0.0610079575596817 | 0.04509283819628647 |

#### Problem 3.3 Words Associated Most with Positive and Negative Classes - Perceptron Algorithm

Words associated most with the positive class for the perceptron algorithm are: file, program, line

Words associated most with the negative class for the perceptron algorithm are: he, team, game

#### Problem 3.4 Words Associated Most with Positive and Negative Classes - Logistic Regression

Words associated most with the positive class for Logistic Regression are: window, file, use

Words associated most with the negative class for Logistic Regression are: he, game, they

#### Problem 3.5 Confusion Matrix Analysis

The confusion matrix is displayed as output in the last cell of this notebook.

a) The classifier is most accurate on examples belonging to class 5 (i = j = 5, diagonal entry ≈ 0.801).

b) The classifier is least accurate on examples belonging to class 3 (i = j = 3, diagonal entry ≈ 0.371).

c) Examples of class 6 are most often mistakenly classified as belonging to class 5 (i = 5 and j = 6, entry ≈ 0.120).

#### Imports and Data

In [460]:
# needed imports

import pandas as pd
import numpy as np
import os

In [461]:
# reading in the train data, test data, and the word-key dictionary that maps a
# word to each entry of an x-feature vector by index

train_data = pd.read_csv('pa3train.txt', sep = " ", header = None)
test_data = pd.read_csv('pa3test.txt', sep = " ", header = None)
word_dict = pd.read_csv('pa3dictionary.txt', sep = " ", header = None).drop(1, axis = 1)
word_dict

| Index | 0 |
| :- | :- |
| 0 | resources |
| 1 | last |
| 2 | of |
| 3 | freedom |
| 4 | from |
| ... | ... |
| 814 | ndet |
| 815 | ucontext |
| 816 | newmask |
| 817 | weick |


#### Perceptron Algorithm

In [462]:
# Getting and splitting the data into train and test sets for labels 1 and 2

dim_features = 819

train_temp = train_data[(train_data[dim_features] == 1) | (train_data[dim_features] == 2)]
test_temp = test_data[(test_data[dim_features] == 1) | (test_data[dim_features] == 2)]

X_train = train_temp.drop(dim_features, axis = 1).to_numpy()
Y_train = train_temp[dim_features].to_numpy()

X_test = test_temp.drop(dim_features, axis = 1).to_numpy()
Y_test = test_temp[dim_features].to_numpy()

In [463]:
# maps label 2 to -1 and label 1 to 1 for train and test set

Y_train = np.array(list(map(lambda x: -1 if x == 2 else 1, Y_train)))
Y_test = np.array(list(map(lambda x: -1 if x == 2 else 1, Y_test)))

In [464]:
# This function computes yt * <wt, xt> for the perceptron algorithm check
def check_dot(x_i, y_i, w_i):
    return y_i * (w_i @ x_i)

# This function passes through the perceptron algorithm once
def one_pass_perceptron(X_data, Y_data, w):
    
    # Loop through all the X_data that is passed in order
    for i in range(len(X_data)):
        
        # get the current x vector and y label
        x_i = X_data[i, :]
        y_i = Y_data[i]
        
        # perform the perceptron check and update w only on a mistake
        if check_dot(x_i, y_i, w) <= 0:
            w = w + y_i*x_i
    
    # return w after one pass through the algorithm
    return w
    

In [465]:
# This function predicts the y_label for one x vector
def predict_perceptron(w, x_i):
    dot = w @ x_i
    
    # check the dot product and return label accordingly
    # if the dot product is 0, randomly tie break
    if dot > 0:
        return 1
    elif dot < 0:
        return -1
    else:
        return np.random.choice([1, -1])

# This function uses the above function to predict for an entire set
# of X_data and return the corresponding labels based on w provided
def predict(w, X_data):
    return np.array(list(map(lambda x: predict_perceptron(w, x), X_data)))

In [466]:
# Initial w_0 is the zero vector with the same dimension as the x-vectors
w_0 = np.zeros(dim_features)

# call one pass of the perceptron alg and store the resulting w
w_one_pass = one_pass_perceptron(X_train, Y_train, w_0)

In [467]:
# Training error check after one pass of alg, should be ~ 0.04
one_pass_preds = predict(w_one_pass, X_train)
one_pass_error = np.mean(one_pass_preds != Y_train)
one_pass_error

0.04036697247706422

In [468]:
# Train and Test errors after two passes of the perceptron algorithm

w_two_pass = one_pass_perceptron(X_train, Y_train, w_one_pass)

train_preds = predict(w_two_pass, X_train)
test_preds = predict(w_two_pass, X_test)

train_error = np.mean(train_preds != Y_train)
test_error = np.mean(test_preds != Y_test)

print("Training Error after Two Passes of Perceptron Algorithm: " + str(train_error))
print("Testing Error after Two Passes of Perceptron Algorithm: " + str(test_error))

Training Error after Two Passes of Perceptron Algorithm: 0.03761467889908257
Testing Error after Two Passes of Perceptron Algorithm: 0.0610079575596817


In [469]:
# Train and Test errors after three passes of the perceptron algorithm

w_three_pass = one_pass_perceptron(X_train, Y_train, w_two_pass)

train_preds = predict(w_three_pass, X_train)
test_preds = predict(w_three_pass, X_test)

train_error = np.mean(train_preds != Y_train)
test_error = np.mean(test_preds != Y_test)

print("Training Error after Three Passes of Perceptron Algorithm: " + str(train_error))
print("Testing Error after Three Passes of Perceptron Algorithm: " + str(test_error))

Training Error after Three Passes of Perceptron Algorithm: 0.02018348623853211
Testing Error after Three Passes of Perceptron Algorithm: 0.04509283819628647


In [470]:
# Train and Test errors after four passes of the perceptron algorithm

w_four_pass = one_pass_perceptron(X_train, Y_train, w_three_pass)

train_preds = predict(w_four_pass, X_train)
test_preds = predict(w_four_pass, X_test)

train_error = np.mean(train_preds != Y_train)
test_error = np.mean(test_preds != Y_test)

print("Training Error after Four Passes of Perceptron Algorithm: " + str(train_error))
print("Testing Error after Four Passes of Perceptron Algorithm: " + str(test_error))

Training Error after Four Passes of Perceptron Algorithm: 0.01926605504587156
Testing Error after Four Passes of Perceptron Algorithm: 0.04509283819628647
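
The cells above chain calls to `one_pass_perceptron` by hand, feeding each pass the weights from the previous one. The same idea could be written as a loop. The sketch below is only an illustration: `one_pass` and `run_passes` are hypothetical names, and the data is a toy, linearly separable set rather than the assignment data.

```python
import numpy as np

def one_pass(X, y, w):
    """One in-order pass of the perceptron update rule."""
    for x_i, y_i in zip(X, y):
        # update only when the current example is misclassified (or on the boundary)
        if y_i * (w @ x_i) <= 0:
            w = w + y_i * x_i
    return w

def run_passes(X, y, num_passes):
    """Hypothetical helper: return the weight vector obtained after each pass."""
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(num_passes):
        w = one_pass(X, y, w)
        history.append(w)
    return history

# Toy data (an assumption for illustration, not the assignment data)
X_toy = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])

weights = run_passes(X_toy, y_toy, 3)
# training error after each pass
errors = [float(np.mean(np.sign(X_toy @ w) != y_toy)) for w in weights]
```

Keeping the per-pass weights in a list makes it possible to report the error after passes 2, 3, and 4 without re-running the earlier passes.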


#### Logistic Regression Algorithm

In [471]:
# This calculates one example's contribution to the negative gradient
# of the logistic regression loss function
def gradient_calc(x_i, y_i, w_i):
    return (y_i * x_i) / (1 + np.exp(y_i*(w_i @ x_i)))

# This function runs the gradient descent algorithm for a given number
# of iterations
def logistic_regression(X_data, Y_data, w, num_iterations):
    
    # initialize learning rate
    learning_rate = 0.001
    
    # runs for num_iterations steps of gradient descent
    for a in range(num_iterations):
        gradient = 0
        temp = w
        
        # calculate the gradient based on X_data
        for i in range(len(X_data)):
            
            x_i = X_data[i]
            y_i = Y_data[i]
            
            gradient += gradient_calc(x_i, y_i, w)
        
        # update step for w based on learning rate
        w = temp + learning_rate * gradient
            
    # return final w after given number of iterations
    return w

In [472]:
# predicts for one x-feature vector based on given w
def predict_log_reg(w, x_i):
    # check the value of the conditional probability
    check = 1 / (1 + np.exp(-1 * w @ x_i))
    
    # Return the corresponding label based on value of
    # check, if it's exactly 0.5, tie break randomly
    if check > 0.5:
        return 1
    elif check < 0.5:
        return -1
    else:
        return np.random.choice([1, -1])
        

# This function applies predict_log_reg to every x-feature vector
# and returns predictions
def predict_logistic(X_data, w):
    return np.array(list(map(lambda x: predict_log_reg(w, x), X_data)))

In [473]:
# Baseline check of 2 gradient descent iterations on w starting at zero vector

w_0 = np.zeros(819)
w_2_iters = logistic_regression(X_train, Y_train, w_0, 2)

In [474]:
# Baseline check for 2 iterations of gradient descent alg, train error
# should be ~ 0.497

train_preds = predict_logistic(X_train, w_2_iters)
np.mean(train_preds != Y_train)


0.4954128440366973

In [475]:
# Train and Test errors after 10 iterations of gradient descent algorithm

w_10_iters = logistic_regression(X_train, Y_train, w_0, 10)

train_preds = predict_logistic(X_train, w_10_iters)
test_preds = predict_logistic(X_test, w_10_iters)

train_error = np.mean(train_preds != Y_train)
test_error = np.mean(test_preds != Y_test)

print("Training Error after 10 iterations of gradient descent algorithm: " + str(train_error))
print("Testing Error after 10 iterations of gradient descent algorithm: " + str(test_error))


Training Error after 10 iterations of gradient descent algorithm: 0.29541284403669726
Testing Error after 10 iterations of gradient descent algorithm: 0.29708222811671087


In [476]:
# Train and Test errors after 50 iterations of gradient descent algorithm

w_50_iters = logistic_regression(X_train, Y_train, w_0, 50)

train_preds = predict_logistic(X_train, w_50_iters)
test_preds = predict_logistic(X_test, w_50_iters)

train_error = np.mean(train_preds != Y_train)
test_error = np.mean(test_preds != Y_test)

print("Training Error after 50 iterations of gradient descent algorithm: " + str(train_error))
print("Testing Error after 50 iterations of gradient descent algorithm: " + str(test_error))


Training Error after 50 iterations of gradient descent algorithm: 0.03853211009174312
Testing Error after 50 iterations of gradient descent algorithm: 0.0610079575596817


In [477]:
# Train and Test errors after 100 iterations of gradient descent algorithm

w_100_iters = logistic_regression(X_train, Y_train, w_0, 100)

train_preds = predict_logistic(X_train, w_100_iters)
test_preds = predict_logistic(X_test, w_100_iters)

train_error = np.mean(train_preds != Y_train)
test_error = np.mean(test_preds != Y_test)

print("Training Error after 100 iterations of gradient descent algorithm: " + str(train_error))
print("Testing Error after 100 iterations of gradient descent algorithm: " + str(test_error))


Training Error after 100 iterations of gradient descent algorithm: 0.01926605504587156
Testing Error after 100 iterations of gradient descent algorithm: 0.04509283819628647
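
The update in `logistic_regression` sums `y_i * x_i / (1 + exp(y_i * <w, x_i>))` over all examples, one example at a time. The same sum can be written as a single matrix product. `np.exp` also overflows for large arguments, so the sketch below computes `1 / (1 + exp(m))` as `exp(-log(1 + exp(m)))` using `np.logaddexp`. The names `logistic_gradient` and `gradient_descent` and the toy data are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def logistic_gradient(X, y, w):
    """Vectorized sum over i of y_i * x_i / (1 + exp(y_i * <w, x_i>))."""
    margins = y * (X @ w)
    # np.logaddexp(0, m) = log(1 + exp(m)), evaluated without overflow
    coeffs = np.exp(-np.logaddexp(0.0, margins))
    return X.T @ (y * coeffs)

def gradient_descent(X, y, num_iterations, learning_rate=0.001):
    """Same update rule as the notebook: w <- w + learning_rate * sum."""
    w = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        w = w + learning_rate * logistic_gradient(X, y, w)
    return w

# Toy data (an assumption for illustration, not the assignment data)
X_toy = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])

# at w = 0 every coefficient is 1/2, so the result is 0.5 * X^T y
grad_at_zero = logistic_gradient(X_toy, y_toy, np.zeros(2))
# very large margins: the coefficients underflow to 0 instead of overflowing
grad_large_w = logistic_gradient(X_toy, y_toy, np.array([500.0, 500.0]))
w_toy = gradient_descent(X_toy, y_toy, 10)
```

The summed quantity is the negative gradient of the logistic loss, which is why the update adds it to `w` rather than subtracting it.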


#### Classifier Analysis

In [478]:
# Words most associated with the positive and negative class for w given by perceptron
# algorithm after 3 passes

# combine the word_dict with w indexwise
w_data = pd.DataFrame(w_three_pass)
perceptron_words = w_data.merge(word_dict, on =  word_dict.index).drop('key_0', axis = 1)

# find most positive and most negative w-values by sorting and print words
most_positive = perceptron_words.sort_values('0_x', ascending = False)
most_negative = perceptron_words.sort_values('0_x', ascending = True)
print("Words most associated with positive class: " + str(list(most_positive.iloc[0:3]['0_y'])))
print("Words most associated with negative class: " + str(list(most_negative.iloc[0:3]['0_y'])))

Words most associated with positive class: ['file', 'program', 'line']
Words most associated with negative class: ['he', 'team', 'game']


In [479]:
# Words most associated with the positive and negative class for w given by logistic regression
# algorithm after 50 iterations

# combine the word_dict with w indexwise
w_data = pd.DataFrame(w_50_iters)
log_reg_words = w_data.merge(word_dict, on =  word_dict.index).drop('key_0', axis = 1)

# find most positive and most negative w-values by sorting and print words
most_positive = log_reg_words.sort_values('0_x', ascending = False)
most_negative = log_reg_words.sort_values('0_x', ascending = True)
print("Words most associated with positive class: " + str(list(most_positive.iloc[0:3]['0_y'])))
print("Words most associated with negative class: " + str(list(most_negative.iloc[0:3]['0_y'])))

Words most associated with positive class: ['window', 'file', 'use']
Words most associated with negative class: ['he', 'game', 'they']
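
The two cells above find the words with the largest and smallest weights by merging `w` with the dictionary and sorting. An alternative that avoids the merge is to sort the indices of `w` directly with `np.argsort`. The sketch below assumes a hypothetical helper `top_words` and a made-up five-word vocabulary with made-up weights; it does not reproduce the assignment's results.

```python
import numpy as np

def top_words(w, vocab, k=3):
    """Return the k words with the largest weights and the k with the smallest."""
    order = np.argsort(w)  # indices from most negative to most positive weight
    most_negative = [vocab[i] for i in order[:k]]
    most_positive = [vocab[i] for i in order[::-1][:k]]
    return most_positive, most_negative

# Made-up vocabulary and weights, for illustration only
vocab_toy = ["file", "he", "game", "program", "of"]
w_toy = np.array([2.5, -3.0, -1.5, 1.0, 0.1])

pos, neg = top_words(w_toy, vocab_toy, k=2)
print("Words most associated with positive class:", pos)
print("Words most associated with negative class:", neg)
```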


#### One vs. All Classifier and Confusion Matrix Analysis

In [480]:
# maps 1 to the provided label, all other labels are mapped to -1
def partition_mapper(label, y_data):
    return np.array(list(map(lambda x: 1 if x == label else -1, y_data)))

# computes the list of one vs all classifiers
def one_v_all(X_data, Y_data):
    classifiers = []
    num_labels = 6
    
    # call one pass of perceptron on each label to build
    # each classifier
    for i in range(num_labels):
        label = i + 1
        w_0 = np.zeros(len(X_data[0]))
        y = partition_mapper(label, Y_data)
        C_i = one_pass_perceptron(X_data, y, w_0)
        classifiers.append(C_i)
    
    # return the list of classifiers
    return classifiers
    

In [481]:
# process training data and compute one vs all classifiers

X_one_v_all = train_data.drop(819, axis = 1).to_numpy()
Y_one_v_all = train_data[819].to_numpy()
classifiers = one_v_all(X_one_v_all, Y_one_v_all)

In [482]:
# predicts using the list of classifiers for one point
def predict_one_v_all(classifiers, x):
    # compute <w, x> for each w in classifiers 
    # through matrix multiplication
    checks = np.asarray(classifiers) @ x
    
    count = 0
    result = 0
    
    # count the classifiers with <w, x> > 0 and remember
    # the label of the last one that fired
    for k in range(len(checks)):
        label = k + 1
        
        if checks[k] > 0:
            count += 1
            result = label
    
    # if exactly one label had <w, x> > 0, return
    # that label, otherwise prediction is 0 (undefined/unsure)
    if count == 1:
        return result
    else:
        return 0

# predicts the label for each X-vector by using predict_one_v_all
def predict_all(classifiers, X_data):
    return np.array(list(map(lambda x: predict_one_v_all(classifiers, x), X_data)))
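
The decision rule above (return label `i` only when exactly one classifier has `<w_i, x> > 0`, otherwise return 0) can also be applied to a whole data matrix at once. This is a minimal sketch under the assumption that the classifiers are stacked into a matrix `W` with one row per label; `predict_one_v_all_vec` is a hypothetical name and the inputs are toy values.

```python
import numpy as np

def predict_one_v_all_vec(W, X):
    """W: (num_labels, d) stacked classifiers, X: (n, d) examples."""
    scores = X @ W.T                      # scores[n, k] = <w_k, x_n>
    fired = scores > 0                    # which classifiers say "yes"
    exactly_one = fired.sum(axis=1) == 1  # unambiguous predictions only
    labels = fired.argmax(axis=1) + 1     # 1-based label of the first classifier that fired
    return np.where(exactly_one, labels, 0)

# Toy classifiers and points (assumptions for illustration)
W_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
X_toy = np.array([[1.0, -1.0],    # only classifier 1 fires
                  [-1.0, 1.0],    # only classifier 2 fires
                  [1.0, 1.0],     # both fire: no prediction
                  [-1.0, -1.0]])  # none fire: no prediction

preds_toy = predict_one_v_all_vec(W_toy, X_toy)
```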

In [483]:
# builds the confusion matrix of one vs all classifier on test data
def confusion_matrix(classifiers, X_data, Y_data):
    mat = np.zeros([7, 6])
    # get all the predictions
    preds = predict_all(classifiers, X_data)
    
    # compute the matrix
    for i in range(len(mat)):
        check = i + 1
        if i == 6:
            check = 0
        for j in range(len(mat[0])):
            label = j + 1
            n_j = len(Y_data[Y_data == label])
            count = 0
            
            # count the examples whose true label is j and that
            # were predicted as i (i = 7 means no prediction)
            for x in range(len(Y_data)):
                if (preds[x] == check) and (Y_data[x] == label):
                    count += 1

            # each entry is the fraction of class-j examples predicted as i
            mat[i][j] = count / n_j
    
    # return the confusion matrix
    return mat
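
The triple loop above counts, for every pair (i, j), the examples with true label j that were predicted as i. The same counts can be accumulated in one step with `np.add.at`. The sketch below is an illustration with a hypothetical function name and toy inputs; it assumes every class appears at least once in `labels`, since otherwise a column sum would be zero.

```python
import numpy as np

def confusion_matrix_vec(preds, labels, num_classes=6):
    """Rows: predicted label 1..num_classes, then a last row for 'no prediction' (0).
    Columns: true label 1..num_classes. Each column sums to 1."""
    counts = np.zeros((num_classes + 1, num_classes))
    # prediction 0 goes to the last row, label k goes to row k - 1
    rows = np.where(preds == 0, num_classes, preds - 1)
    np.add.at(counts, (rows, labels - 1), 1)
    # divide each column by n_j, the number of examples with true label j
    return counts / counts.sum(axis=0, keepdims=True)

# Toy predictions and labels with two classes (assumptions for illustration)
preds_toy = np.array([1, 2, 0, 1])
labels_toy = np.array([1, 2, 2, 2])
mat_toy = confusion_matrix_vec(preds_toy, labels_toy, num_classes=2)
```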

In [484]:
# Process test data and construct the confusion matrix

X_test_one_v_all = test_data.drop(819, axis = 1).to_numpy()
Y_test_one_v_all = test_data[819].to_numpy()
conf_mat = confusion_matrix(classifiers, X_test_one_v_all, Y_test_one_v_all)
print(conf_mat)

[[0.71891892 0.00520833 0.03428571 0.02173913 0.         0.        ]
 [0.01081081 0.65625    0.03428571 0.02717391 0.01282051 0.01851852]
 [0.         0.015625   0.37142857 0.         0.         0.02777778]
 [0.01621622 0.00520833 0.         0.69021739 0.         0.        ]
 [0.01621622 0.03125    0.07428571 0.00543478 0.80128205 0.12037037]
 [0.00540541 0.01041667 0.03428571 0.         0.07051282 0.49074074]
 [0.23243243 0.27604167 0.45142857 0.25543478 0.11538462 0.34259259]]
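
The answers to Problem 3.5 can be checked against the printed matrix. The sketch below re-enters the printed values by hand and assumes that the seventh row (no prediction) is excluded when looking for the most common confusion between two classes.

```python
import numpy as np

# Values copied from the printed confusion matrix above
conf_mat = np.array([
    [0.71891892, 0.00520833, 0.03428571, 0.02173913, 0.0,        0.0],
    [0.01081081, 0.65625,    0.03428571, 0.02717391, 0.01282051, 0.01851852],
    [0.0,        0.015625,   0.37142857, 0.0,        0.0,        0.02777778],
    [0.01621622, 0.00520833, 0.0,        0.69021739, 0.0,        0.0],
    [0.01621622, 0.03125,    0.07428571, 0.00543478, 0.80128205, 0.12037037],
    [0.00540541, 0.01041667, 0.03428571, 0.0,        0.07051282, 0.49074074],
    [0.23243243, 0.27604167, 0.45142857, 0.25543478, 0.11538462, 0.34259259],
])

square = conf_mat[:6, :]   # drop the "no prediction" row
diag = np.diag(square)     # per-class accuracy

best_class = int(np.argmax(diag)) + 1
worst_class = int(np.argmin(diag)) + 1

# mask the diagonal, then find the largest remaining entry (i = predicted, j = true)
off_diag = square.copy()
np.fill_diagonal(off_diag, -1.0)
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
most_confused = (int(i) + 1, int(j) + 1)

print(best_class, worst_class, most_confused)
```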
