# Problem 1:

Note for no layers, we have 784 inputs going to 10 outputs, and we assume a bias term per each.  This will yield 
$10 \cdot 784 $ parameters, as we have each pixel weighted, and 10 outputs, with a further 10 bias terms, one for each 
output node.  Therefore there is a total of $7840 + 10 = 7850$ parameters when we don't include a hidden layer.

If a model has $k$ hidden layers with $m$ nodes a piece we have the following calculation: For the first layer we have $784m + m$ parameters for intaking from the image to the first layer.  For the next $k-1$ layers, for each layer we have $m^2$ 
interconnects, and $m$ bias terms yielding $(k-1)(m^2 + m)$ parameters.  Finally we have $10m$ connections into the output layer, with an additionally $10$ bias.  Therefore, $$Params(m,k) = 784m + m + (k-1)(m^2 + m) + 10(m+1)$$.

# Problem 2:

For a given number of parameters P, we have the equation $784m + m + (k-1)(m^2 + m) + 10(m+1) = P.$  We can rearrange the equation to find that $k = \frac{P-10}{m(m+1)} - \frac{795}{m+1} + 1$.  In order to maximize/minimize the function 
we can take the m derivative, and set the numerator of the fraction equal to zero.  In this scenerio 
we have to solve the equation $0=-795m^2 + 2(P-10)m + (P-10)$.  This has the resulting $m$ values of 
$$ m = \frac{1}{795} \left(P-10 \pm \sqrt{(P-10)^2+795(P-10)} \right) $$.  Note that for sufficently large $P$ that 
the square root will always be positive, thus we will always have two roots, a maximum and a minimum.  However, 
for the smaller of the two roots, the limit approaches -0.5, and since we know that by our model we have to have at least 1 parameter per layer implies that solving for $m=1$ will yield $k_p$.  For the smallest $k$, we know for the smaller root $m_b$ that $\lim_{P \to \infty} k(m_b) = -1$.  Thus 
the smallest possible $k$ would have to be $k=1$.  Note that if $k_p$ is not an integer then one should round $k_p$ down since $m$ is already the 
correct integer of 1 then the actual number of used parameters falls "under budget".  

# Problem 3:

In [52]:
def m_num(k,P):
    if k == 1:
        return (P-10) // 795
    else:
        return (int)(math.sqrt((794+k)**2 + 4*(P-10)*(k-1)))// (2*(k-1))

In [42]:
import torch
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.optim as optim
import random
import math

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

Using device: cuda


In [4]:
trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
testset = datasets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz


9.9%

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz


100.0%


Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw


100.0%


Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz
Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz



100.0%
100.0%

Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz
Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw






In [17]:
test_x = torch.Tensor( testset.data ) / 256.0 - 0.5
test_x = test_x.to(device)
test_y = torch.Tensor( testset.targets ).long()
test_y = test_y.to(device)
train_x = torch.Tensor( trainset.data ) / 256.0 - 0.5
train_x = train_x.to(device)
train_y = torch.Tensor( trainset.targets ).long()
train_y = train_y.to(device)

In [7]:
def get_batch(x, y, batch_size):
    n = x.shape[0]

    batch_indices = random.sample( [ i for i in range(n) ], k = batch_size )

    x_batch = x[ batch_indices ]
    y_batch = y[ batch_indices ]

    return x_batch, y_batch

In [33]:
class layerTesting(nn.Module):
    def __init__(self,k,m):
        super(layerTesting, self).__init__()

        self.layer_input = torch.nn.Linear( in_features = 28*28*1, out_features = m, bias=True )
        self.layer_output = torch.nn.Linear( in_features = m, out_features = 10, bias=True )
        self.linears = nn.ModuleList([nn.Linear(layers_size, layers_size) for i in range(k-1)])
        self.normalize = nn.LayerNorm(m)

    def forward(self, input_tensor):
        output = nn.Flatten()( input_tensor )
        output = self.layer_input(output)
        output = nn.ReLU()(output)
        output = self.normalize(output)
        for l in self.linears:
            output = l(output)
            output = nn.ReLU()(output)
            output = self.normalize(output)
        output = self.layer_output(output)
        return output

In [19]:
def confusion_matrix( model, x, y ):
    identification_counts = np.zeros( shape = (10,10), dtype = np.int32 )
    
    logits = model.forward( x )
    predicted_classes = torch.argmax( logits, dim = 1 )

    n = x.shape[0]

    for i in range(n):
        actual_class = int( y[i].item() )
        predicted_class = predicted_classes[i].item()
        identification_counts[actual_class, predicted_class] += 1

    return identification_counts

In [55]:
print(m_num(3,100000))
model = layerTesting(1,m_num(1,100000))
model.to(device)
loss_function = torch.nn.CrossEntropyLoss()

print("Initial Confusion Matrix")
print( confusion_matrix( model, test_x, test_y ) )

299.0
Initial Confusion Matrix
[[  9  47 415   0   3 147   4   0 355   0]
 [ 49 837 135   0   0 112   0   0   1   1]
 [ 33 218 315   0  32  98  93   0 228  15]
 [  5 271 318   2  11 252  13   0 136   2]
 [  0 322 486   1   0  13   0   4 126  30]
 [  1 306 293   0   4  59   6   4 213   6]
 [  3 137 389   0   8  50   0   0 362   9]
 [  0 500 281   0   0  90   0   0 157   0]
 [  2 309 176   0   5 158  15   4 292  13]
 [  0 321 487   1   0  17   0   5 157  21]]
