# Google Colab Setup for Assignment

### Introduction to Google Colab
Google Colab is a free, cloud-based platform that allows you to write and execute Python code through your browser. It provides free access to computing resources including GPUs, making it an excellent tool for machine learning and deep learning projects. We'll be using Colab for our assignments this semester due to its ease of use and consistent environment across all students.

### Setting Up Your Assignment Environment
To ensure that everyone has access to the necessary files and can save their work, we'll be using Google Drive integration with Colab. The code below mounts your Google Drive to the Colab environment and sets up the correct working directory for the assignment.

Please run the following code, replacing `FOLDERNAME` with the correct path to your assignment folder in Google Drive:

In [None]:
from google.colab import drive
import os

# 1. Mounting Google Drive: This allows Colab to access files in your Google Drive
drive.mount('/content/drive')

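# 2. Tell Colab where to find your assignment files and where to save your work

# TODO: Enter the path of your assignment folder, relative to "My Drive" in Google Drive.
FOLDERNAME = "CS 7643/ps0/" # e.g. 'cs7643/ps0/'

assert FOLDERNAME is not None, "[!] Enter the foldername."

ASSIGNMENT_DIR = os.path.join("/content/drive/MyDrive", FOLDERNAME)
assert os.path.exists(ASSIGNMENT_DIR), "Make sure your FOLDERNAME is correct"

# 3. Use the assignment folder as the working directory so files are read from and saved there
os.chdir(ASSIGNMENT_DIR)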

Mounted at /content/drive


# NumPy Exercises


---


## In this assignment we are going to do some basic coding exercises using NumPy.

### **IMPORTANT**: Remember to keep all output visible. We will grade a PDF export of this notebook.

In [None]:
import numpy as np

### 1. Create a zero vector of size 10

In [None]:
a = None
###################
a = np.zeros(10)
###################
print(a)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


### 2. Create an **int64** matrix of size (10, 10) with the diagonal values set to -1

In [None]:
a = None
###################
a = np.zeros((10, 10), dtype=np.int64)
np.fill_diagonal(a, -1)
###################
print(a)
print(a.dtype)

[[-1  0  0  0  0  0  0  0  0  0]
 [ 0 -1  0  0  0  0  0  0  0  0]
 [ 0  0 -1  0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0  0  0  0]
 [ 0  0  0  0 -1  0  0  0  0  0]
 [ 0  0  0  0  0 -1  0  0  0  0]
 [ 0  0  0  0  0  0 -1  0  0  0]
 [ 0  0  0  0  0  0  0 -1  0  0]
 [ 0  0  0  0  0  0  0  0 -1  0]
 [ 0  0  0  0  0  0  0  0  0 -1]]
int64


### 3. Create a 10x10 matrix and fill it with a checkerboard pattern

In [None]:
a = None
###################
a = np.zeros((10, 10), dtype=np.int64)
a[::2, ::2] = 1
a[1::2, 1::2] = 1
###################
print(a)

[[1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]]
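
As a cross-check, a checkerboard can also be built from index parity: cell `(i, j)` is `1` exactly when `i + j` is even. The sketch below uses `np.add.outer` to form the table of `i + j` values and compares the result with the slicing approach above.

```python
import numpy as np

n = 10
# Slicing approach: 1s on (even, even) and (odd, odd) positions
slice_board = np.zeros((n, n), dtype=np.int64)
slice_board[::2, ::2] = 1
slice_board[1::2, 1::2] = 1

# Parity approach: (i + j) % 2 == 0 marks the cells that should be 1
parity_board = (np.add.outer(np.arange(n), np.arange(n)) % 2 == 0).astype(np.int64)

print(np.array_equal(slice_board, parity_board))  # True
```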


### 4. Randomly place five 1s in an 8x8 matrix

In [None]:
a = np.zeros((8, 8))
###################
indices = np.random.choice(a.size, 5, replace=False)
a.flat[indices] = 1
###################
print(a)

[[0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 1.]]
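
The output above changes on every run because the positions are random. A minimal sketch of a reproducible variant, assuming NumPy's `Generator` API (`np.random.default_rng`) is acceptable for this exercise, seeds the generator and checks that exactly five distinct cells were set:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the placement is repeatable
a = np.zeros((8, 8))
# replace=False guarantees five distinct flat positions, hence five 1s
indices = rng.choice(a.size, size=5, replace=False)
a.flat[indices] = 1

print(int(a.sum()))  # 5
```

Without `replace=False`, two draws could land on the same cell and leave fewer than five 1s in the matrix.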


### 5. Channel conventions

Most image data is stored in a *channel-last convention*: for example, an RGB image is represented as a tensor of shape `(224, 224, 3)`, with the last dimension indexing the color channels. Some deep learning frameworks, such as PyTorch, instead expect images in a *channel-first convention*, e.g. shape `(3, 224, 224)`. Given a channel-last image tensor `im`, convert it to a *channel-first* image tensor.

In [None]:
im = np.random.randn(224, 224, 3)
print(im.shape)

###################
im = np.moveaxis(im, -1, 0)
###################

print(im.shape)


(224, 224, 3)
(3, 224, 224)
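
For a 3D array, `np.moveaxis(im, -1, 0)` is equivalent to `np.transpose(im, (2, 0, 1))`; both return a view that reorders the axes without copying the pixel data. The check below illustrates this on a small random image (the shape is arbitrary):

```python
import numpy as np

im_hwc = np.random.randn(4, 5, 3)            # channel-last: (H, W, C)
chw_moveaxis = np.moveaxis(im_hwc, -1, 0)    # channel-first: (C, H, W)
chw_transpose = np.transpose(im_hwc, (2, 0, 1))

print(chw_moveaxis.shape)                                # (3, 4, 5)
print(np.array_equal(chw_moveaxis, chw_transpose))       # True
print(np.shares_memory(chw_moveaxis, im_hwc))            # True: a view, not a copy
# Channel c of the new layout is the old im_hwc[:, :, c]
print(np.array_equal(chw_moveaxis[1], im_hwc[:, :, 1]))  # True
```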


### 6. Color channels
Now, given the channel-first RGB image `im`, reorder its color channels to BGR.

**Note**: You don't need to print anything for this question.

In [None]:
###################
im = im[::-1, :, :]
###################
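
Reversing the first axis of a channel-first image moves channel 0 (R) to position 2 and channel 2 (B) to position 0, leaving G in the middle. A quick check on a tiny hypothetical image:

```python
import numpy as np

rgb = np.random.rand(3, 2, 2)  # channel-first: (C, H, W), channels in R, G, B order
bgr = rgb[::-1, :, :]          # reverse the channel axis only

print(np.array_equal(bgr[0], rgb[2]))  # True: B is now first
print(np.array_equal(bgr[1], rgb[1]))  # True: G stays in the middle
print(np.array_equal(bgr[2], rgb[0]))  # True: R is now last
```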

### 7. Given a 1D array, negate in place all elements strictly between 3 and 8.

In [None]:
a = np.arange(11)
###################
a[(a > 3) & (a < 8)] *= -1
###################
print(a)

[ 0  1  2  3 -4 -5 -6 -7  8  9 10]
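
An equivalent in-place alternative uses the ufunc `np.negative` with `out=a` and the boolean mask as its `where` argument; entries where the mask is `False` keep their current values in `out`:

```python
import numpy as np

a = np.arange(11)
mask = (a > 3) & (a < 8)           # strictly between 3 and 8
np.negative(a, out=a, where=mask)  # negate only the masked entries, writing into a
print(a)  # [ 0  1  2  3 -4 -5 -6 -7  8  9 10]
```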


### 8. Convert a `float64` array to a `uint8` array

In [None]:
a = np.zeros(10)
print(a.dtype)
###################
a = (a * 255).astype(np.uint8)
###################
print(a.dtype)

float64
uint8
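
Note that `astype(np.uint8)` truncates the fractional part rather than rounding, and the `* 255` scaling above assumes the input values lie in `[0, 1]`. A slightly more defensive sketch, under that same assumption, rounds and clips to `[0, 255]` before casting, so out-of-range inputs saturate instead of depending on how NumPy casts floats that do not fit in `uint8` (which is not well defined):

```python
import numpy as np

a = np.array([0.0, 0.5, 1.0, 1.2, -0.1])  # hypothetical values; the last two are out of range
truncated = (a[:3] * 255).astype(np.uint8)                  # in-range values only: 127.5 -> 127
safe = np.clip(np.round(a * 255), 0, 255).astype(np.uint8)  # round, then saturate

print(truncated)  # [  0 127 255]
print(safe)       # [  0 128 255 255   0]
```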


### 9. Subtract the mean of each row of a matrix

In [None]:
a = np.random.randn(3,5)
###################
a = a - a.mean(axis=1, keepdims=True)
###################
print(a)

[[-1.21146082  1.01782617 -0.20347257 -0.17403946  0.57114668]
 [ 0.26129697  0.10594407  0.2496439  -1.03796076  0.42107581]
 [-1.37466534  0.10700021 -0.16776447  2.27377404 -0.83834444]]


### 10. If you did Q9 with a loop, can you do it without a loop?
Feel free to skip this question if your Q9 solution is already loop-free.

In [None]:
a = np.random.randn(3, 5)
###################
# Your code here
###################
print(a)

[[-0.99041376 -0.70638972  0.78690337 -0.22535341  1.23769789]
 [-0.68688421  1.39441425 -0.80613746  0.18362066 -0.69620401]
 [-0.16733512 -0.9773473  -1.14995055 -0.88776314 -0.7242936 ]]
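
For reference, a loop-based version of Q9 and the vectorized `keepdims=True` version give the same result. `keepdims=True` keeps the row means as a `(3, 1)` column, so they broadcast across each row of the `(3, 5)` matrix:

```python
import numpy as np

a = np.random.randn(3, 5)

# Loop version: subtract each row's mean one row at a time
looped = a.copy()
for i in range(a.shape[0]):
    looped[i] = a[i] - a[i].mean()

# Vectorized version: (3, 5) - (3, 1) broadcasts along each row
vectorized = a - a.mean(axis=1, keepdims=True)

print(np.allclose(looped, vectorized))            # True
print(np.allclose(vectorized.mean(axis=1), 0.0))  # True: every row now has zero mean
```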


### 11. Sort a matrix `a` by its second column

In [None]:
a = np.random.randint(low=0, high=5, size=(5, 2))
###################
a = a[a[:, 1].argsort()]
###################
print(a)

[[2 0]
 [2 1]
 [3 2]
 [4 2]
 [0 3]]
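
By default `argsort` uses `kind='quicksort'`, which is not guaranteed to be stable, so rows that tie on the second column may come out in any relative order. If ties should keep their original order, `kind='stable'` guarantees it, as in this small example with hand-picked values:

```python
import numpy as np

a = np.array([[2, 1],
              [0, 0],
              [1, 1]])
# Rows [2, 1] and [1, 1] tie on the second column; a stable sort keeps their input order
sorted_a = a[a[:, 1].argsort(kind='stable')]
print(sorted_a)
# [[0 0]
#  [2 1]
#  [1 1]]
```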


### 12. One-hot encoding
One-hot encoding is a commonly used representation in deep learning. An integer class index `k` is represented as a size-*N* array whose element at (zero-based) index `k` is `1` and all other elements are `0`. For example, with `N=10`, the integer `5` becomes:

`[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]`

Convert the following array of integers to its one-hot encoding with `N=10`. The result should be a matrix (2D array) of shape `(5, 10)`.

In [None]:
a = np.arange(5)
###################
N = 10
one_hot = np.zeros((a.size, N))
# Row i gets a 1 in column a[i]; indices are zero-based, so no offset is needed
one_hot[np.arange(a.size), a] = 1
###################
print(a)
print(one_hot)

[0 1 2 3 4]
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
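
A common equivalent idiom, assuming zero-based class indices in the range `0` to `N - 1`, selects rows of the `N x N` identity matrix with integer-array indexing:

```python
import numpy as np

a = np.arange(5)
N = 10
# Row k of np.eye(N) is the one-hot vector for class k
one_hot_eye = np.eye(N)[a]

# Same result as scattering 1s with advanced indexing
one_hot_scatter = np.zeros((a.size, N))
one_hot_scatter[np.arange(a.size), a] = 1

print(one_hot_eye.shape)                             # (5, 10)
print(np.array_equal(one_hot_eye, one_hot_scatter))  # True
```

Neither form rejects negative indices: a label of `-1` silently selects the last column, so out-of-range labels should be checked beforehand.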


### 13. Broadcasting
In a single expression, multiply the n-th row of matrix `a` by the n-th element of vector `b`.

In [None]:
a = np.ones((5, 5))
b = np.arange(5)
###################
a = a * b[:, np.newaxis]
###################
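
Here `b[:, np.newaxis]` has shape `(5, 1)`, so broadcasting stretches it across the 5 columns and row `n` is scaled by `b[n]`. The result equals left-multiplying by the diagonal matrix `np.diag(b)`, which the sketch below checks:

```python
import numpy as np

a = np.ones((5, 5))
b = np.arange(5)

scaled = a * b[:, np.newaxis]               # (5, 5) * (5, 1) -> (5, 5)
print(scaled[:, 0])                         # [0. 1. 2. 3. 4.]: row n scaled by b[n]
print(np.allclose(scaled, np.diag(b) @ a))  # True
```

By contrast, `a * b` without the new axis broadcasts `b` as a row, which would scale the columns instead of the rows.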

### 14. Padding
Without using `np.pad`, pad the array below with a border of zeros. The expected result is:

```
array([[0., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 0., 0., 0., 0., 0.]])
```

In [None]:
a = np.ones((4, 4))
###################
padded = np.zeros((a.shape[0] + 2, a.shape[1] + 2))
padded[1:-1, 1:-1] = a
###################
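
Although the exercise forbids `np.pad` in the solution, it is handy for checking the answer afterwards; with its default `mode='constant'`, `np.pad` pads with zeros:

```python
import numpy as np

a = np.ones((4, 4))
padded = np.zeros((a.shape[0] + 2, a.shape[1] + 2))
padded[1:-1, 1:-1] = a  # copy a into the interior, leaving a 1-cell zero border

reference = np.pad(a, pad_width=1)       # default mode='constant' pads with 0
print(padded.shape)                      # (6, 6)
print(np.array_equal(padded, reference))  # True
```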

### 14. Numerical and Analytical Differentiation

**Background**

In deep learning and optimization, we often need to compute derivatives of complex functions of vectors and matrices. Modern frameworks usually rely on automatic differentiation, but understanding numerical differentiation builds intuition for the underlying concepts and is a standard debugging tool, for example for checking hand-derived or automatically computed gradients against finite-difference estimates.

One simple method for numerical differentiation is the finite difference method. This method approximates the derivative of a function by calculating the rate of change over a small interval.

For a function $f(\mathbf{x})$ where $\mathbf{x}$ is a vector, the $i$-th partial derivative can be approximated using the central difference formula:

$$\frac{\partial f}{\partial x_i} \approx \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x} - h\mathbf{e}_i)}{2h}$$

Where:

- $f(\mathbf{x})$ is the function we want to differentiate
- $\mathbf{x}$ is the input vector
- $h$ is a small step size
- $\mathbf{e}_i$ is the $i$-th standard basis vector (a vector with 1 in the $i$-th position and 0 elsewhere)
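
To see why the central difference is preferred, compare it with the one-sided forward difference $(f(x + h) - f(x)) / h$ on a 1D function with a known derivative. The forward difference has truncation error of order $h$, while the central difference has error of order $h^2$. A minimal sketch using $f(x) = \sin x$ at $x = 1$ (an arbitrary choice):

```python
import numpy as np

f = np.sin
x, h = 1.0, 1e-5
exact = np.cos(x)  # d/dx sin(x) = cos(x)

forward = (f(x + h) - f(x)) / h
central = (f(x + h) - f(x - h)) / (2 * h)

print(f"forward error: {abs(forward - exact):.2e}")  # roughly h/2 * sin(1), about 4e-6
print(f"central error: {abs(central - exact):.2e}")  # several orders of magnitude smaller
```

The step size cannot be made arbitrarily small, though: below some point, floating-point cancellation in the numerator dominates the error.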

Your task is to implement a function that can numerically differentiate an arbitrary Python function that takes a vector input, using the central difference method. You'll then apply this to various functions involving vectors and matrices.

### 14.a Implement Numerical Differentiation for Vector Inputs

Write a function `numerical_gradient(func, x, h=1e-5)` that takes:

- `func`: A Python function that takes a numpy array as input and returns a scalar
- `x`: The point at which to evaluate the gradient (a numpy array)
- `h`: The step size (defaults to 1e-5)

The function should return the approximate gradient of `func` at `x` as a numpy array.

Use vectorized operations where possible to keep the computation efficient.

In [None]:
def numerical_gradient(func, x, h=1e-5):
    grad = None
    ###################
    # Stack the perturbed points as rows: row i is x + h * e_i (resp. x - h * e_i)
    X_plus = x + h * np.eye(x.size)
    X_minus = x - h * np.eye(x.size)

    # func maps a single vector to a scalar, so evaluate it once per perturbed row
    f_plus = np.array([func(x_i) for x_i in X_plus])
    f_minus = np.array([func(x_i) for x_i in X_minus])

    # Central difference for all coordinates at once
    grad = (f_plus - f_minus) / (2 * h)
    ###################
    return grad
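
The loop over rows above is needed because `func` is only assumed to accept a single vector. If a function can evaluate a whole batch of points at once, the perturbed points can be stacked into a matrix and evaluated with just two calls. The sketch below assumes such a batched `func`; `numerical_gradient_batched` and `f_batched` are hypothetical names, not part of the assignment:

```python
import numpy as np

def numerical_gradient_batched(func, x, h=1e-5):
    """Central-difference gradient, assuming `func` maps an (n, n) batch of
    row vectors to an (n,) array of scalars (hypothetical batched API)."""
    x = np.asarray(x, dtype=float)
    E = h * np.eye(x.size)  # row i is h * e_i
    # Broadcasting x against E builds all n perturbed points in one array
    return (func(x + E) - func(x - E)) / (2 * h)

# Example: f(x) = x^T x written to work row-wise on a batch
f_batched = lambda X: np.sum(X * X, axis=-1)
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient_batched(f_batched, x))  # approximately [2. 4. 6.]
```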

### 14.b Differentiating Simple Functions

Apply your `numerical_gradient` function to the following functions and compare the results with their analytical gradients:

- $f(\mathbf{x}) = \mathbf{x}^T \mathbf{x}$
- $f(\mathbf{x}) = \sin(x_1) + \cos(x_2) + \cos(x_3)$
- $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$, where $A$ is a $3 \times 3$ matrix with random entries

For each function, compute the numerical gradient at $\mathbf{x} = [1, 1, 1]$ and compare it to the true (analytical) gradient.

In [None]:
# Function 1: f(x) = x^T x
def f1(x):
    return np.dot(x, x)

def analytical_gradient_f1(x):
    ###################
    return 2 * x
    ###################

# Function 2: f(x) = sin(x_1) + cos(x_2) + cos(x_3)
def f2(x):
    return np.sin(x[0]) + np.cos(x[1]) + np.cos(x[2])

def analytical_gradient_f2(x):
    ###################
    return np.array([np.cos(x[0]), -np.sin(x[1]), -np.sin(x[2])])
    ###################

# Function 3: f(x) = x^T A x
def f3(x, A):
    return np.dot(x, np.dot(A, x))

def analytical_gradient_f3(x, A):
    ###################
    return np.dot(A.T, x) + np.dot(A, x)
    ###################

# Test point (3D for all functions)
x_3d = np.array([1.0, 1.0, 1.0])

# Random 3x3 matrix for function 3
A = np.random.randn(3, 3)

# Compute and compare gradients
def compare_gradients(f, analytical_grad_fn, x, *args):
    numerical_grad = numerical_gradient(lambda x_: f(x_, *args), x)
    analytical_grad = analytical_grad_fn(x, *args)

    print(f"Numerical gradient: {numerical_grad}")
    print(f"Analytical gradient: {analytical_grad}")
    print(f"Difference: {np.linalg.norm(numerical_grad - analytical_grad)}")
    print()

# Test function 1
print("Function 1: f(x) = x^T x")
compare_gradients(f1, analytical_gradient_f1, x_3d)

# Test function 2
print("Function 2: f(x) = sin(x_1) + cos(x_2) + cos(x_3)")
compare_gradients(f2, analytical_gradient_f2, x_3d)

# Test function 3
print("Function 3: f(x) = x^T A x")
compare_gradients(f3, analytical_gradient_f3, x_3d, A)

Function 1: f(x) = x^T x
Numerical gradient: [2. 2. 2.]
Analytical gradient: [2. 2. 2.]
Difference: 2.2694036439624267e-11

Function 2: f(x) = sin(x_1) + cos(x_2) + cos(x_3)
Numerical gradient: [ 0.54030231 -0.84147098 -0.84147098]
Analytical gradient: [ 0.54030231 -0.84147098 -0.84147098]
Difference: 2.8592656090443453e-11

Function 3: f(x) = x^T A x
Numerical gradient: [2.83304539 2.93370436 2.41300163]
Analytical gradient: [2.83304539 2.93370436 2.41300163]
Difference: 2.1535225970175988e-11



### 14.c: Differentiate a Complex Vector Function
Consider the following more complex function:
$$f(\mathbf{x}) = (\mathbf{x}^T A \mathbf{x}) \cdot \sin(\mathbf{x}^T \mathbf{b}) + e^{-\mathbf{x}^T \mathbf{x}}$$
Where:

- $\mathbf{x}$ is an $n$-dimensional vector (use $n=5$ for this assignment)
- $A$ is an $n \times n$ matrix
- $\mathbf{b}$ is an $n$-dimensional vector


Implement this function in Python, generating random values for $A$ and $\mathbf{b}$.
Use your `numerical_gradient` function to compute its gradient at $\mathbf{x} = [1, 1, 1, 1, 1]$.

In [None]:
# Complex vector function
def complex_function(x, A, b):
    return (x.T @ A @ x) * np.sin(x.T @ b) + np.exp(-x.T @ x)

# Generate random A and b
n = 5
np.random.seed(42)  # for reproducibility
A = np.random.randn(n, n)
b = np.random.randn(n)

# Compute gradient at x = [1, 1, 1, 1, 1]
x = np.ones(n)
numerical_grad = numerical_gradient(lambda x: complex_function(x, A, b), x)

print(f"Gradient at x = {x}:")
print(numerical_grad)

Gradient at x = [1. 1. 1. 1. 1.]:
[-3.0171028  -1.86919823  2.25118     6.7941431   4.03785221]
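
The gradient of this function can also be derived by hand. Using $\nabla(\mathbf{x}^T A \mathbf{x}) = (A + A^T)\mathbf{x}$, $\nabla \sin(\mathbf{x}^T \mathbf{b}) = \cos(\mathbf{x}^T \mathbf{b})\,\mathbf{b}$, $\nabla e^{-\mathbf{x}^T \mathbf{x}} = -2e^{-\mathbf{x}^T \mathbf{x}}\,\mathbf{x}$ and the product rule:

$$\nabla f(\mathbf{x}) = (A + A^T)\mathbf{x}\,\sin(\mathbf{x}^T \mathbf{b}) + (\mathbf{x}^T A \mathbf{x})\cos(\mathbf{x}^T \mathbf{b})\,\mathbf{b} - 2e^{-\mathbf{x}^T \mathbf{x}}\,\mathbf{x}$$

A self-contained sketch that checks this formula against central differences, reusing the seed from the cell above; `complex_function_grad` is a hypothetical helper name:

```python
import numpy as np

def complex_function(x, A, b):
    return (x @ A @ x) * np.sin(x @ b) + np.exp(-x @ x)

def complex_function_grad(x, A, b):
    # Product rule on (x^T A x) * sin(x^T b), plus the chain rule on exp(-x^T x)
    quad, proj = x @ A @ x, x @ b
    return ((A + A.T) @ x) * np.sin(proj) + quad * np.cos(proj) * b - 2 * np.exp(-x @ x) * x

n = 5
np.random.seed(42)  # same seed as the notebook cell above
A = np.random.randn(n, n)
b = np.random.randn(n)
x = np.ones(n)

# Central differences, one coordinate at a time
h = 1e-5
fd = np.array([(complex_function(x + h * e, A, b) - complex_function(x - h * e, A, b)) / (2 * h)
               for e in np.eye(n)])

print(np.allclose(complex_function_grad(x, A, b), fd, atol=1e-6))  # True
```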


### 14.d: Compare with Analytical Gradient Computed by PyTorch

PyTorch is a popular open-source machine learning library that we'll be using throughout this semester. It builds computational graphs dynamically, which allows flexible model design, and provides automatic differentiation for efficient gradient computation. PyTorch can use GPUs to greatly accelerate training and inference of complex models. One of its key features is the ability to compute gradients automatically (through its autograd engine), which is crucial for training deep neural networks.

For a gentle introduction to PyTorch, let's compare the analytical gradient computed by PyTorch with the numerical gradient we implemented earlier. If you're running this notebook locally, make sure you have PyTorch (`torch`) installed. If you're using Google Colab, PyTorch should already be available.

In [None]:
import torch

# Complex vector function (PyTorch version)
def complex_function_torch(x, A, b):
    return (x @ A @ x) * torch.sin(x @ b) + torch.exp(-x @ x)

# Function to compute analytical gradient using PyTorch
def compute_analytical_gradient(x_np, A_np, b_np):
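    # float32 keeps roughly 7 significant digits, which limits how closely the
    # result can match the float64 finite-difference gradient
    x = torch.tensor(x_np, requires_grad=True, dtype=torch.float32)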
    A = torch.tensor(A_np, dtype=torch.float32)
    b = torch.tensor(b_np, dtype=torch.float32)

    y = complex_function_torch(x, A, b)
    # Computing the gradient is as simple as calling .backward() on the scalar output; the result is stored in x.grad
    y.backward()

    return x.grad.numpy()

# Let's compare gradients!
analytical_grad = compute_analytical_gradient(x, A, b)

print("Analytical gradient (PyTorch):")
print(analytical_grad)
print("\nNumerical gradient (Finite Difference):")
print(numerical_grad)
print("\nDifference (L2 norm):")
print(np.linalg.norm(analytical_grad - numerical_grad))

Analytical gradient (PyTorch):
[-3.0171025 -1.8691981  2.2511797  6.7941427  4.0378523]

Numerical gradient (Finite Difference):
[-3.0171028  -1.86919823  2.25118     6.7941431   4.03785221]

Difference (L2 norm):
5.997333888854135e-07
