> ## Make a copy of this notebook (File menu -> Make a Copy...)

### Homework Question 1

This question will explore the difference between classical Gram-Schmidt (GS) and Modified Gram-Schmidt. We use two matrices, one random matrix generated by you, and a second matrix designed to highlight the difference.

1. In lab, you wrote code for MGS. Write a similar function for GS. Show a test on a small matrix. You should get the same result as your MGS function.<br><br>
1. The *$n\times n$ Hilbert matrix* has entry $\frac{1}{i+j-1}$ in the $(i,j)$ position (where $i$ and $j$ are numbered from 1 to $n$). Write a function `hilbert(n)` that returns this matrix. You can use loops if you like. Careful to translate from NumPy (0 to $n-1$) numbering into classical numbering. Show that you get the correct matrix for $n=3$.<br><br>
1. GS and MGS produce orthogonal matrices. If $Q$ is an orthogonal matrix, what is $Q^TQ$?
> We will test how good our functions are by testing how far away from the desired result our actual result is. To do this, we will take our result, $Q$, subtract $Q^TQ$ from the matrix we 'should' get (the answer to Part 3 of this question), and find the *matrix norm* of the result. The matrix norm is simply the square root of the sum of the entries of the matrix, and can be computed using `np.linalg.norm(A)`. Call this number the *error*.
1. Generate a random $200\times 200$ matrix using `np.random.rand(200,200)`. Compute the errors we get using each of GS and MGS.<br><br>
1. Consider the matrix `0.00001*np.eye(n)+hilbert(n)`. Compute the errors for this matrix using each of GS and MGS.<br><br>
1. Comment briefly on the results in the previous two parts of this question.

**Note:** While a complete error analysis is beyond the scope of this class, see [here](https://www.math.uci.edu/~ttrogdon/105A/html/Lecture23.html) if you're interested in why MGS is so much better.

### Homework Question 2

Choose at least one of the following to extend your handwriting recognition work. Doing both well will get extra credit. If you do both, you will have created a handwritten digit recognition system that is as good as can be given only the tools we have.

#### Extending your code to compute binary classifiers for all ten digits
1. Compute binary classifiers for each of the digits between 0 and 9.<br><br>
1. Compute error rates for each digit, both on the training set and the test set.<br><br>
1. For each image in the test set, use all ten classifiers to see how many handwritten digits your classifiers give unique answers to. That is, if your classifiers determine that a particular image may be a 1 or a 2, then you cannot classify that particular image. How many images were not recognized as any digit at all?<br><br>
1. Compute your overall success rate on the test images. That is, compute how many images were correctly and uniquely classified. Also compute the rate of *false positives* (that is, images that were identified as digits, but whose label was wrong), and *false negatives* (images whose digit couldn't be assigned). This is how good your handwriting recognition is!
    
#### Optimizing your classifier(s)
1. We decided that a negative result from the model mean 'this image is not 0', and non-negative results means 'this image is 0'. Write code that searches for a better threshold than 0. That is, find the threshold that gives you the lowest error rate on the test set.<br><br>
1. What error rate does your optimized threshold give on the test set?<br><br>
1. Why can't you use the test set to optimize the threshold?
    
##### Note: if you do choose to do both, the thresholds may well not be the same for each digit!

In [6]:
from MNISTHandwriting import readimgs
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import numpy.linalg as LA
tol = 10**-10

#setting up images and matrices for all training-set classifiers
imagesTrain =  readimgs('./data/train-images-idx3-ubyte')[0].astype('float')
labelsTrain = readimgs('./data/train-labels-idx1-ubyte')[0].astype('float')

A = np.reshape(imagesTrain,(60000,784))
sumOfCols = A.sum(axis=0)
index = np.nonzero(sumOfCols)[0]

B = np.ones((60000,718))
B[:,:-1] = A[:,index]

#setting up images and matrices for testing-set classifiers
imagesTest = readimgs('./data/t10k-images-idx3-ubyte')[0].astype('float')
labelsTest = readimgs('./data/t10k-labels-idx1-ubyte')[0].astype('float')

C = np.reshape(imagesTest,(10000,784))
C = C[:,index]
D = np.ones((10000,718))
D[:,:-1] = C

def makeClassifier(num):
    #making an array with -1 and 1 depending on label for training set
    classifiedLabelTrain = np.ones(60000)
    classifiedLabelTrain[labelsTrain != num] = -1
    
    #making an array with -1 and 1 depending on label for testing set
    classifiedLabelTest = np.ones(10000)
    classifiedLabelTest[labelsTest != num] = -1
    
    #calculating the coefficients of our predictor
    ans = LA.lstsq(B,classifiedLabelTrain,rcond=tol)[0]
    
    #calculating A@xhat for training and testing sets
    evalTrain = B@ans
    evalTest = D@ans
    
    #creating array of -1 and 1 depending on the training set results
    modelresultsTrain = np.ones(60000)
    modelresultsTrain[evalTrain <= 0] = -1
    
    #creating array of -1 and 1 depending on the testing set results
    modelresultsTest = np.ones(10000)
    modelresultsTest[evalTest <= 0] = -1
    return modelresultsTrain, classifiedLabelTrain, modelresultsTest, classifiedLabelTest


def errorRates(learnedLabels,correctLabels,size):
    #compare is a 1x10000 boolean array, where True if our model correctly identified the digit
    compare = (correctLabels == learnedLabels)
    return (len(compare[compare == False])/size), compare

#make a 10x10000 array of zeros; insert modelresTest into corresponding rows of resUnique, which will
#be used to compute the number of uniquely identified images
dummy = np.zeros((10000))
resUnique = np.tile(dummy,10).reshape((10,10000))

#make a 10x10000 array of falses; insert compare into corresponding rows of resCorrect, which will
#be used to compute the number of correctly identified images
resCorrect = np.ones((10,10000),dtype=bool)

#make a 10x10000 array of zeros; insert modelresTest into corresponding rows of modelResults, which will
#be used to compute the error
modelResults = np.tile(dummy,10).reshape((10,10000))

#make a 10x10000 array of zeros; insert modelresTest into corresponding rows of classifiedResults, which will
#be used to compute the error
classifiedResults = np.tile(dummy,10).reshape((10,10000))

for i in range(10):
    modelresTrain,classLabelTrain,modelresTest,classLabelTest = makeClassifier(i)
    modelResults[i] = modelresTest
    classifiedResults[i] = classLabelTest
    resUnique[i] = modelresTest
    print("Training error rate for "+ str(i))
    print(errorRates(modelresTrain,classLabelTrain,60000)[0])
    print("Testing error rate for " + str(i))
    errorTests, errorArray = errorRates(modelresTest,classLabelTest,10000)
    print(errorTests)
    resCorrect[i] = errorArray

Training error rate for 0
0.015566666666666666
Testing error rate for 0
0.0157
Training error rate for 1
0.019016666666666668
Testing error rate for 1
0.0166
Training error rate for 2
0.03766666666666667
Testing error rate for 2
0.0416
Training error rate for 3
0.04223333333333333
Testing error rate for 3
0.0398
Training error rate for 4
0.03378333333333333
Testing error rate for 4
0.0335
Training error rate for 5
0.05556666666666667
Testing error rate for 5
0.0536
Training error rate for 6
0.021666666666666667
Testing error rate for 6
0.026
Training error rate for 7
0.03398333333333333
Testing error rate for 7
0.0354
Training error rate for 8
0.05135
Testing error rate for 8
0.051
Training error rate for 9
0.05563333333333333
Testing error rate for 9
0.052


In [19]:
sumOfCols = resUnique.sum(axis=0)

#classifier uniquely idenitifies image: only one classifier returns 1 for that image and all others
#return -1, producing a sum of -8
worksCorrectly = sumOfCols[sumOfCols == -8]
print("The number of recognized images: ")
print(len(worksCorrectly))

#classifiers cannot recognize image at all: all classifiers return -1, producing a sum of -9
unrecognized = sumOfCols[sumOfCols == -9]
print("The number of unrecognized images: ")
print(len(unrecognized))

The number of recognized images: 
7050
The number of unrecognized images: 
0


In [78]:
#boolean array of correctly identified images
resCorrectCollapsed = resCorrect.all(axis=0)

#boolean array of uniquely identified images
uniques = np.zeros(10000,dtype=bool)
uniques[sumOfCols == -8] = True

#and the arrays of correctly identified and uniquely identified images
#count of Trues in resulting array is number of correctly AND uniquely identified images
uniqueAndCorrect = np.stack((resCorrectCollapsed,uniques)).all(axis=0)

print("This is our overall success rate: ")
print(len(uniqueAndCorrect[uniqueAndCorrect==True])/10000)

#calculating false positives
totalfp = 0
totalfn = 0

for i in range(10):
    #default boolean array of falses
    fpModel = np.zeros(10000,dtype=bool)
    fpClass = np.zeros(10000,dtype=bool)
    fnModel = np.zeros(10000,dtype=bool)
    fnClass = np.zeros(10000,dtype=bool)
    
    #check where the model returns 1 and where the classified label returns -1
    fpModel[modelResults[i] == 1] = True
    fpClass[classifiedResults[i] == -1] = True
    
    #check where the model returns -1 and where the classified label returns 1
    fnModel[modelResults[i] == -1] = True
    fnClass[classifiedResults[i] == 1] = True
    
    #where fpModel and fpClass are both True, we have a false positive
    intersectionfp = np.stack((fpModel,fpClass)).all(axis=0)
    totalfp += len(intersectionfp[intersectionfp == True])
    
    #where fnModel and fnClass are both True, we have a false negative
    intersectionfn = np.stack((fnModel,fnClass)).all(axis=0)
    totalfn += len(intersectionfn[intersectionfn == True])
    
print("Total number of false positives: ")
print(totalfp)

print("Total number of false negatives: ")
print(totalfn)

This is our overall success rate: 
0.6711
Total number of false positives: 
520
Total number of false negatives: 
3132


In [79]:
def configureMatrices(digit):
    #configuring -1 and 1 depending on training labels
    classifiedLabelTrain = np.ones(60000)
    classifiedLabelTrain[labelsTrain != digit] = -1
    
    #solving for xhat and D@xhat based on training set
    ans = LA.lstsq(B,classifiedLabelTrain,rcond=tol)[0]
    evalTrain = D@ans
    
    #initializing array of -1 and 1 depending on testing labels
    classifiedLabelTest= np.ones(10000)
    classifiedLabelTest[labelsTest != digit] =-1
    
    #initializing boolean arrays for false positives and negatives
    fpClass = np.zeros(10000,dtype=bool)
    fpClass[classifiedLabelTest == -1] = True
    fnClass = np.zeros(10000,dtype=bool)
    fnClass[classifiedLabelTest == 1] = True 
    return evalTrain,fpClass,fnClass

def classifyGuess(guess,evalTest):
    #assigning -1 and 1 depending on our threshold guess
    modelresultsTest = np.ones(10000)
    modelresultsTest[evalTest <= guess] = -1
    return modelresultsTest

#search algorithm that finds and saves the optimal threshold
def threshold(begin, end, digit):
    curr = begin
    leastError = curr
    totalFalse = 10000 
    
    evalTest,fpClass,fnClass = configureMatrices(digit)
    while(curr <= end):
        falseCounter = 0
        
        #default boolean array of false
        fpModel = np.zeros(10000,dtype=bool)
        fnModel = np.zeros(10000,dtype=bool)
        
        modelresultsTest = classifyGuess(curr,evalTest)
        
        #check where the model returns 1 and where the classified label returns -1
        fpModel[modelresultsTest == 1] = True
    
        #check where the model returns -1 and where the classified label returns 1
        fnModel[modelresultsTest == -1] = True
    
        #where fpModel and fpClass are both True, we have a false positive
        intersectionfp = np.stack((fpModel,fpClass)).all(axis=0)
        falseCounter += len(intersectionfp[intersectionfp == True])
    
        #where fnModel and fnClass are both True, we have a false negative
        intersectionfn = np.stack((fnModel,fnClass)).all(axis=0)
        falseCounter += len(intersectionfn[intersectionfn == True])

        if(falseCounter < totalFalse):
            leastError = curr
            totalFalse = falseCounter
        curr += 0.001
    return leastError, totalFalse

for i in range(10):
    thresh,error = threshold(-1,1,i)
    print("This is the threshold with the lowest error rate for " + str(i) + ": ")
    print(thresh)
    print("This is our error rate: ")
    print(str((error/10000) * 100) + "%", end="\n\n")
    

This is the threshold with the lowest error rate for 0: 
-0.08099999999999918
This is our error rate: 
1.3599999999999999%

This is the threshold with the lowest error rate for 1: 
-0.010999999999999122
This is our error rate: 
1.6500000000000001%

This is the threshold with the lowest error rate for 2: 
-0.24699999999999933
This is our error rate: 
3.1399999999999997%

This is the threshold with the lowest error rate for 3: 
-0.14099999999999924
This is our error rate: 
3.4099999999999997%

This is the threshold with the lowest error rate for 4: 
-0.16499999999999926
This is our error rate: 
2.8000000000000003%

This is the threshold with the lowest error rate for 5: 
-0.3339999999999994
This is our error rate: 
3.7800000000000002%

This is the threshold with the lowest error rate for 6: 
-0.0949999999999992
This is our error rate: 
2.33%

This is the threshold with the lowest error rate for 7: 
-0.1009999999999992
This is our error rate: 
3.18%

This is the threshold with the lowest 

In [None]:
#You can't use the test set to optimize the threshold because if the test set is used, then we would be
#overfitting the model, producing high bias for this test set. An accurate predictor would ideally be 
#consistent across multiple test sets, but if you optimize and overfit for this test set, it would be
#useless for any data set in the real-world. Thus, the purpose of a training set is to find an ideal
#model that works for multiple test sets.