1.

Consider a convolutional neural network applied to an RGB input image of size $N\times N$ where,
for simplicity of analysis, $N$ is a power of 2. Suppose that


• the convolutions each cover $k \times k$ pixels,


• there are $d$ different convolutions per convolution layer,


• padding is used so that convolutions do not cause image shrinkage, and


• after each convolution layer there is a max pooling layer applied over non-overlapping
$2 \times 2$ pixel regions.

Suppose also that there are $L$ convolution layers, followed by $F$ fully-connected layers with $h$
nodes per layer, and $n^{(o)}$ nodes in the output layer. Derive an expression for the number
of learnable convolution parameters and the number of learnable parameters in the fully-connected and output layers.


For a single input channel, each convolution has $k * k + 1$ parameters: one $k\times k$ kernel that is shared across every pixel location, plus 1 bias parameter. With $c$ input channels the kernel has one $k\times k$ slice per channel, so each convolution has $c*k*k + 1$ parameters.

The first layer has 3 input channels (RGB) and $d$ output channels. Every later layer has $d$ input channels and $d$ output channels. As a result, there are
$$(3*k*k + 1)*d$$
parameters in the first layer, and 
$$(d*k*k + 1)*d$$
parameters in each of the $L-1$ subsequent layers.

This means we have 
$$(3*k*k + 1)*d + (L-1)* \left(d*k*k+1\right)*d$$
total convolutional parameters, since the first layer is itself one of the $L$ convolution layers.

Note that pooling doesn't affect the convolutional parameters because every pixel of every image uses the same kernel regardless of how many pixels there are.

Then we have to get the number of fully-connected parameters. The number of inputs to the first fully-connected layer is the number of outputs of the convolutional layers. 
The number of outputs of the convolutional layer is 
$$d*\frac{N}{2^L}*\frac{N}{2^L}$$
because it goes through $L$ pooling layers that halve the image dimensions.
This means the first fully-connected layer has 
$$d*\frac{N}{2^L}*\frac{N}{2^L}*h+h$$
parameters since it has $d*\frac{N}{2^L}*\frac{N}{2^L}$ inputs, $h$ outputs, and $h$ bias parameters.

Every subsequent fully connected layer has $h$ inputs, $h$ outputs, and $h$ bias parameters:
$$h*h+h$$
and there are $F-1$ of them, because the problem counts the $F$ fully-connected layers separately from the output layer. This means there are
$$d*\frac{N}{2^L}*\frac{N}{2^L}*h+h+(F-1)(h*h+h)$$
total fully connected parameters.

Finally, the output layer has $h$ inputs, $n^{(o)}$ outputs, and $n^{(o)}$ biases:
$$h*n^{(o)}+n^{(o)}$$

This means we have a grand total of
$$(3*k*k + 1)*d + (L-1)* \left(d*k*k+1\right)*d+d*\frac{N}{2^L}*\frac{N}{2^L}*h+h+(F-1)(h*h+h)+h*n^{(o)}+n^{(o)}$$
parameters, counting one first convolution layer plus $L-1$ later ones.
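
As a sanity check, the per-layer counts can be summed in a few lines of Python. This is only a sketch: the function name `count_parameters` and the example sizes are made up for illustration, and it assumes $L \ge 1$, $F \ge 1$, and that $N$ is divisible by $2^L$.

```python
def count_parameters(N, k, d, L, F, h, n_out):
    """Return (convolution, fully-connected, output) parameter counts."""
    # The first conv layer sees 3 channels (RGB); the other L - 1 layers see d channels.
    conv = (3 * k * k + 1) * d + (L - 1) * (d * k * k + 1) * d
    # Each of the L pooling layers halves both the height and the width.
    side = N // 2 ** L
    flat = d * side * side
    # First fully-connected layer, then F - 1 layers mapping h -> h (weights + biases).
    fc = (flat * h + h) + (F - 1) * (h * h + h)
    # Output layer: h inputs, n_out outputs, n_out biases.
    out = h * n_out + n_out
    return conv, fc, out

# Hypothetical example sizes, chosen only to exercise the formula.
conv, fc, out = count_parameters(N=32, k=3, d=16, L=3, F=2, h=64, n_out=10)
print(conv, fc, out, conv + fc + out)
```

For these example sizes the first fully-connected layer dominates, which is typical whenever the flattened feature map is much larger than $h$.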

2.

 Let B be a NumPy array storing the confusion matrix for a multiclass classifier. Write code to compute
(a) The overall accuracy.

(b) The macro average precision.

(c) The micro average recall.

In [None]:
import numpy as np

def generate_confusion_matrix(y_true, y_pred, num_classes):
    """
    Generate a confusion matrix as a NumPy array.

    Parameters:
    - y_true: list or np.array of true labels
    - y_pred: list or np.array of predicted labels
    - num_classes: int, number of classes

    Returns:
    - np.array: Confusion matrix of shape (num_classes, num_classes)
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)
    
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    
    return cm

# Example usage
y_true = [0, 1, 2, 2, 0, 1, 2, 0, 1, 0]  # True labels
y_pred = [0, 2, 2, 2, 0, 0, 1, 1, 1, 1]  # Predicted labels
num_classes = 3

cm = generate_confusion_matrix(y_true, y_pred, num_classes)
print("Confusion Matrix:")
# Prediction on the x-axis, True on the y-axis
print(cm)

# Question starts here (all the previous code was just to generate the confusion matrix)

# Note: accuracy = (correct predictions) / (all predictions) = trace / sum
#       precision = (true positives) / (true positives + false positives)
#       recall = (true positives) / (true positives + false negatives)

# Macro = average the precision/recall of each class (equally weight each class)
# Micro = pool the TP/FP/FN counts over all classes first, then compute precision/recall
#         (classes are weighted by how often they appear)
# For single-label multiclass data, micro precision and micro recall both reduce to overall accuracy

accuracy = np.trace(cm) / np.sum(cm)

# Column j holds everything predicted as class j, so diag / column sum = precision of class j
precision_per_class = np.diag(cm) / np.sum(cm, axis=0)
print(precision_per_class)
macro_avg_precision = np.mean(precision_per_class)

# Micro recall = sum(TP) / sum(TP + FN) = trace / total, the same value as overall accuracy
micro_recall = np.trace(cm) / np.sum(cm)


print(f"Accuracy: {accuracy:.4f}")
print(f"Macro Avg Precision: {macro_avg_precision:.4f}")
print(f"Micro Avg Recall: {micro_recall:.4f}")

Confusion Matrix:
[[2 2 0]
 [1 1 1]
 [0 1 2]]
[0.66666667 0.25       0.66666667]
Accuracy: 0.5000
Macro Avg Precision: 0.5278
Micro Avg Recall: 0.5000
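
The claim that micro-averaged precision and recall coincide with accuracy can be checked by pooling the per-class counts explicitly. A small sketch, re-using the example confusion matrix printed above (rows = true class, columns = predicted class):

```python
import numpy as np

cm = np.array([[2, 2, 0],
               [1, 1, 1],
               [0, 1, 2]])

tp = np.diag(cm)              # correct predictions per class
fp = cm.sum(axis=0) - tp      # predicted as the class, but actually another class
fn = cm.sum(axis=1) - tp      # belong to the class, but predicted as another class

# Micro averaging pools the counts over classes before dividing.
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())
accuracy = np.trace(cm) / cm.sum()
print(micro_precision, micro_recall, accuracy)
```

Every misclassified sample is counted once as a false positive (for the predicted class) and once as a false negative (for the true class), so both denominators equal the total number of samples.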


3.

Suppose a convolutional neural network has convolutions that cover $k\times k$ pixels. After $L$ levels
of convolutions, how many pixels are influenced by the color values at pixel location $(x, y)$
in the input image? Ignore any boundary effects. Here, the feature values at location $(u, v)$
are “influenced by” location $(x, y)$ if a change in the R,G,B values at $(x, y)$ could potentially
change the activations at $(u, v)$.

Suppose a $2\times2$ max pooling operation is inserted after every 2nd convolution layer. How does
this change the answer?

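After 1 convolution, $(x,y)$ affects $k*k$ pixels: every pixel whose $k\times k$ window contains $(x,y)$.

After 2 convolutions, each of those pixels in turn affects its own $k\times k$ neighbourhood. The neighbourhoods overlap, so the affected region only widens by $k-1$ pixels in each dimension, and $(x,y)$ affects $(2k-1)*(2k-1)$ pixels.

Extending this to $L$ convolutions, the region widens by $k-1$ per layer, so $(x,y)$ affects $(L(k-1)+1)*(L(k-1)+1)$ pixels.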

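If pooling is inserted after every 2nd convolution layer, the number of affected pixels is divided by roughly 4 at each pooling layer: an affected region $w$ pixels wide maps onto about $\lceil w/2 \rceil$ pooled pixels per dimension (one more when $w$ is even and the region is misaligned with the $2\times2$ pooling grid). Each later convolution then widens the region by $k-1$ pixels again, now measured in the downsampled feature map. The only thing that reduces the count is the reduction in resolution. The pooling operation itself (the max function) doesn't remove any influence, because any of the 4 inputs could be the maximum.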
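
The count can also be checked by brute force: mark one pixel, spread the mark to every $k\times k$ neighbour once per convolution layer, and count the marked pixels. The sketch below is an illustration only; the helper name `influenced_pixels` is hypothetical, it assumes odd $k$, and it assumes the grid (`size`) is large enough that the influence never reaches the border.

```python
import numpy as np

def influenced_pixels(k, L, pool_every=None, size=64):
    """Count feature-map pixels influenced by one input pixel (odd k, borders ignored)."""
    mask = np.zeros((size, size), dtype=bool)
    mask[size // 2, size // 2] = True      # the perturbed input pixel (x, y)
    r = k // 2
    for layer in range(1, L + 1):
        # A k x k convolution spreads influence to every pixel within r rows/columns.
        h, w = mask.shape
        padded = np.pad(mask, r)
        spread = np.zeros_like(mask)
        for dy in range(k):
            for dx in range(k):
                spread |= padded[dy:dy + h, dx:dx + w]
        mask = spread
        if pool_every and layer % pool_every == 0:
            # 2 x 2 max pooling: a pooled pixel is influenced if any of its 4 inputs is.
            mask = mask.reshape(h // 2, 2, w // 2, 2).any(axis=(1, 3))
    return int(mask.sum())

for L in (1, 2, 4):
    print(L, influenced_pixels(3, L), (L * (3 - 1) + 1) ** 2)
print(influenced_pixels(3, 4, pool_every=2))
```

Without pooling the count matches $(L(k-1)+1)^2$. With pooling, the exact count depends on how the region lines up with the pooling grid, which is why the simulation is handy for checking a specific case.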

4. 
Suppose we didn’t keep a separate validation set and only used the training set to monitor
training performance. What can go wrong during training?

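Overfitting. The model will learn to predict the noise in the training set, regardless of whether it is reflective of data outside of the training set. Because the training loss keeps improving while this happens, monitoring only the training set gives no signal for when to stop training or which hyperparameters generalise best.

The validation set provides held-out data that the model was not fit to, so performance on it estimates how well the model does outside of the training set.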
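
A toy illustration of why the training set alone is misleading, using polynomial regression as a stand-in for a neural network. Everything here is an assumption made for the example: the synthetic data, the noise level, and the chosen degrees.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    # Synthetic 1-D regression data: a smooth curve plus noise (illustrative only).
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)    # small training set, easy to overfit
x_val, y_val = make_data(200)       # held-out data from the same distribution

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit on training data only
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"val MSE {results[degree][1]:.3f}")
```

The training error can only go down as the degree grows, because each larger model contains the smaller ones. Only the held-out error can show when the extra capacity has stopped helping.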

5.
Let $B$ be a 2D NumPy array of size $N \times K$ where $N$ is the number of image retrieval queries, $K$
is the number of images retrieved for each query, and $B[q, j]$ is 1 if the $j$-th retrieved image
for query $q$ is “correct” — has the correct category. Write code to compute (a) the average
precision for row q and then (b) the mean average precision across all queries.


In [10]:
def generate_B(N, K, correctness_prob=0.5):
    """
    Generate a random B matrix of size N x K where each entry is 1 with probability correctness_prob.
    
    :param N: Number of queries/classes
    :param K: Number of retrieved images per query
    :param correctness_prob: Probability that a retrieved image is correct (1)
    :return: NumPy array B of shape (N, K)
    """
    return (np.random.rand(N, K) < correctness_prob).astype(int)

# Example usage
N, K = 5, 10  # Adjust as needed
B = generate_B(N, K)
print(B)

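# Precision at rank k = (correct images among the top k) / k
ranks = np.arange(1, B.shape[1] + 1)
precision_at_k = np.cumsum(B, axis=1) / ranks

# (a) Average precision per row: mean of precision@k over the ranks k that hold a correct image.
# Assumption: normalise by the number of correct images among the K retrieved,
# since the total number of relevant images is not given; rows with no hits get 0.
num_correct = np.sum(B, axis=1)
average_precisions_per_row = np.sum(precision_at_k * B, axis=1) / np.maximum(num_correct, 1)

# (b) Mean average precision across all queries
mean_average_precision = np.mean(average_precisions_per_row)
print(np.round(average_precisions_per_row, 4))
print(f"{mean_average_precision:.4f}")


[[1 0 1 1 0 0 0 1 1 1]
 [1 0 0 0 0 1 1 0 1 1]
 [0 1 0 0 0 1 1 1 1 0]
 [0 1 1 0 1 0 1 1 1 1]
 [0 1 1 0 1 1 0 0 1 1]]
[0.6787 0.5413 0.4635 0.6185 0.5981]
0.5800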
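
For part (a), average precision for a single row is usually defined as the mean of precision@k taken over the ranks k that hold a correct image, which makes it sensitive to the order of the results. A loop-based sketch for one row follows. The helper name `average_precision_row` is hypothetical, and it assumes normalisation by the number of correct images in the retrieved list, since the total number of relevant images is not given.

```python
import numpy as np

def average_precision_row(row):
    """AP of one ranked 0/1 relevance list, normalised by the number of hits in the list."""
    hits = 0
    precisions = []
    for rank, correct in enumerate(row, start=1):
        if correct:
            hits += 1
            precisions.append(hits / rank)   # precision@rank at each correct result
    # A row with no correct image gets AP = 0 by convention here.
    return float(np.mean(precisions)) if precisions else 0.0

print(average_precision_row([1, 0, 0]))
print(average_precision_row([0, 0, 1]))
print(average_precision_row([1, 0, 1]))
```

The rows `[1, 0, 0]` and `[0, 0, 1]` contain the same fraction of correct images, yet their average precisions are 1 and 1/3: a correct image retrieved early counts for more than one retrieved late.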
