**Note to grader:** Each question consists of parts, e.g. Q1(i), Q1(ii), etc. Each part must first be graded on a 0-4 scale, following the standard NJIT convention (A: 4, B+: 3.5, B: 3, C+: 2.5, C: 2, D: 1, F: 0). However, any given item may be worth 4 or 8 points; if an item is worth 8 points, you need to scale the 0-4 grade accordingly.


The total score must be re-scaled to 100. That should apply to all future assignments so that Canvas assigns the same weight on all assignments. 



# Assignment 1



## Preparation Steps

In [1]:
# Import all necessary python packages
import numpy as np
import torch



## <font color = 'blue'> Question 1. Basic Operations with Tensors </font>

Your task for this question is to follow the NumPy  [**tutorial**](https://jalammar.github.io/visual-numpy/?fbclid=IwAR0tSntx5mj1aHteokRKrT4G6z77M3z0Quj40AQZ9mvKlhs2RTN3xXrc6Eo) and 'mirror' each of the operations presented in the tutorial with tensors in PyTorch. 

You may find it useful to consult this PyTorch introductory [tutorial](https://jhui.github.io/2018/02/09/PyTorch-Basic-operations/), and as always the full PyTorch [documentation](https://pytorch.org/docs/stable/torch.html) is the ultimate resource.

*(Please insert cells below for your answers )*

---
#### Creating Arrays

In [4]:
np.array([1,2,3])

array([1, 2, 3])

In [15]:
torch.tensor([1,2,3])

tensor([1, 2, 3])

Use these functions to initialize a tensor of a given shape:

In [19]:
torch.ones(3)
torch.zeros(3)
torch.rand(3)


tensor([0.5248, 0.4510, 0.7375])

#### Array Arithmetic
Create two arrays and then let's do some simple arithmetic.

In [20]:
data = torch.tensor([1,2])
ones = torch.ones(2)

# Keep in mind that the tensors must have the same (or a broadcast-compatible) shape for this operation to work.
total = data + ones  # named `total` to avoid shadowing the built-in sum()

total

tensor([2., 3.])

In [21]:
difference = data - ones
product = data * ones 
quotient = data / ones 

print(difference,product,quotient)

tensor([0., 1.]) tensor([1., 2.]) tensor([1., 2.])


This can also be done between a vector and a scalar value.<br>
This is called "Broadcasting"

In [22]:
data = torch.tensor([1,2])

scalar_transforms = data * 1.6 
scalar_transforms

tensor([1.6000, 3.2000])

#### Indexing NumPy Arrays
Tensors are indexed and sliced the same way as Python lists.

In [23]:
data = torch.tensor([1,2,3])
print(data[0],data[1],data[0:2],data[1:])

tensor(1) tensor(2) tensor([1, 2]) tensor([2, 3])


#### Aggregation 
Many more operations than listed here

In [34]:
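data = torch.tensor([1,2,3])
# Variable names avoid shadowing the built-ins max(), min(), and sum()
max_val = data.max()
min_val = data.min()
total = data.sum()
# mean() and std() require a floating-point tensor
mean = data.float().mean()
product = data.float().prod()
sd = data.float().std()

print(f"Max: {max_val}, Min: {min_val}, Mean: {mean}, Sum: {total}, Product: {product}, Standard Deviation: {sd}")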

Max: 3, Min: 1, Mean: 2.0, Sum: 6, Product: 6.0, Standard Deviation: 1.0


#### In More Dimensions 
(Arrays with more than one dimension)

In [35]:
#Two dimension Matrix
# Number of brackets indicates how many dimensions
torch.tensor([[1,2],[3,4]])

tensor([[1, 2],
        [3, 4]])

In [40]:
# The shape is passed as a tuple: (rows, columns)
# The length of the shape tuple indicates how many dimensions
torch.ones((3,2))
torch.zeros((3,2))
torch.rand((3,2))

tensor([[0.4618, 0.0469],
        [0.6281, 0.9009],
        [0.7208, 0.3040]])

#### Matrix Arithmetic

In [42]:
ones = torch.ones((2,2))
data = torch.tensor([[1,2],[3,4]])

matrix_sum = ones + data  # avoids shadowing the built-in sum()

matrix_sum

tensor([[2., 3.],
        [4., 5.]])

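This can be done with all of the arithmetic operators (`+`, `-`, `*`, `/`).

We can get away with doing these arithmetic operations on matrices of different sizes only if, in every dimension where the sizes differ, one of the two sizes is one (e.g. the matrix has only one column or one row). The smaller tensor is then broadcast across the larger one.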

In [2]:
data = torch.tensor([[1,2],[3,4],[5,6]])
ones_row = torch.ones(2)

row_sum = data + ones_row  # avoids shadowing the built-in sum()

print(f"data + ones_row = \n{data} \n+ \n{ones_row} \n= \n{row_sum}")

data + ones_row = 
tensor([[1, 2],
        [3, 4],
        [5, 6]]) 
+ 
tensor([1., 1.]) 
= 
tensor([[2., 3.],
        [4., 5.],
        [6., 7.]])
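
As a complement to the row example above, here is a minimal sketch of broadcasting along the other dimension. The column values (10, 20, 30) are made up for illustration, and the second half shows the failure case when neither size is one.

```python
import torch

data = torch.tensor([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
col = torch.tensor([[10], [20], [30]])          # shape (3, 1), illustrative values

# The size-1 column dimension of `col` is stretched to match `data`
col_sum = data + col
print(col_sum)

# Shapes (3, 2) and (3,) differ in the last dimension and neither size is 1,
# so broadcasting is not possible and PyTorch raises a RuntimeError
try:
    data + torch.ones(3)
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)
```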


#### Dot Product 
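A key distinction to make with arithmetic is the case of matrix multiplication using the dot product. NumPy gives every matrix a dot() method we can use to carry out dot product operations with other matrices; the PyTorch counterpart used here is `torch.matmul`:<br>
**The inner dimensions must match: the size of the first operand's last dimension must equal the number of rows of the second!**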

In [7]:
data = torch.tensor([1,2,3])
powers_of_ten = torch.tensor([[1,10],[100,1000],[10000,100000]])

dot_prod = torch.matmul(data,powers_of_ten) 
dot_prod

tensor([ 30201, 302010])

>What is happening: 1x1 + 2x100 + 3x10,000 | 1x10 + 2x1,000 + 3x100,000 = 30201 | 302010
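
The shape requirement can be checked directly. The sketch below reuses the same tensors, uses the `@` operator (the operator form of `torch.matmul`), and shows that swapping the operands misaligns the shared dimension.

```python
import torch

data = torch.tensor([1, 2, 3])                                          # shape (3,)
powers_of_ten = torch.tensor([[1, 10], [100, 1000], [10000, 100000]])   # shape (3, 2)

# `@` is equivalent to torch.matmul: (3,) @ (3, 2) -> (2,)
result = data @ powers_of_ten
print(result, result.shape)

# Reversed order gives (3, 2) @ (3,): 2 columns cannot align with 3 entries
try:
    powers_of_ten @ data
    shapes_ok = True
except RuntimeError:
    shapes_ok = False
print(shapes_ok)
```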

#### Matrix Indexing 
Indexing and slicing operations become even more useful when we’re manipulating matrices:<br>
**array[Row,Column]**

In [8]:
data = torch.tensor([[1,2],[3,4],[5,6]])

# [row, column]: if the column index is omitted, every column of the selected row(s) is returned
ex1 = data[0,1] 

#Slicing
ex2 = data[1:3]
ex3 = data[0:2,0]


print(f"ex1 = {ex1}\nex2 = {ex2}\nex3 = {ex3}")

ex1 = 2
ex2 = tensor([[3, 4],
        [5, 6]])
ex3 = tensor([1, 3])


#### Matrix Aggregation
We can aggregate matrices the same way we aggregated vectors:<br>
Many other aggregation methods can be applied in the same way.

In [9]:
data = torch.tensor([[1,2],[3,4],[5,6]])

# Variable names avoid shadowing the built-ins max(), min(), and sum()
max_val = data.max()
min_val = data.min()
total = data.sum()

print(f"max = {max_val}, min = {min_val}, sum = {total} ")

max = 6, min = 1, sum = 21 


Not only can we aggregate all the values in a matrix, but we can also aggregate across the rows or columns by using the `dim` parameter (PyTorch's counterpart to NumPy's `axis`):


In [11]:
max_axis_0 = data.max(dim=0)
max_axis_1 = data.max(dim=1)

print(f"Axis 0 max: {max_axis_0}, Axis 1 max: {max_axis_1}")


Axis 0 max: torch.return_types.max(
values=tensor([5, 6]),
indices=tensor([2, 2])), Axis 1 max: torch.return_types.max(
values=tensor([2, 4, 6]),
indices=tensor([1, 1, 1]))
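
Unlike NumPy, `max` with a `dim` argument returns both the values and their positions. A short sketch of how the result can be unpacked, and of `torch.amax` for the values-only behavior closer to NumPy's:

```python
import torch

data = torch.tensor([[1, 2], [3, 4], [5, 6]])

# max(dim=...) returns a named tuple that unpacks into (values, indices)
col_max, col_argmax = data.max(dim=0)
print(col_max, col_argmax)

# torch.amax returns only the values, similar to NumPy's data.max(axis=1)
row_max = torch.amax(data, dim=1)
print(row_max)
```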


#### Transposing and Reshaping
A common need when dealing with matrices is the need to rotate them. This is often the case when we need to take the dot product of two matrices and need to align the dimension they share. NumPy arrays have a convenient property called **T** to get the transpose of a matrix:<br>

In [12]:
data = torch.tensor([[1,2],[3,4],[5,6]])

t_data = data.T 

print(f"Data:\n{data}\nTransposed Data:\n{t_data}")

Data:
tensor([[1, 2],
        [3, 4],
        [5, 6]])
Transposed Data:
tensor([[1, 3, 5],
        [2, 4, 6]])


In more advanced use cases, you may find yourself needing to switch the dimensions of a certain matrix. This is often the case in machine learning applications where a certain model expects a certain shape for the inputs that is different from your dataset. NumPy’s **reshape()** method is useful in these cases, and PyTorch tensors offer the same method. You just pass it the new dimensions you want for the matrix. You can pass -1 for a dimension and the library infers the correct size from the number of elements:

In [13]:
data = torch.tensor([1,2,3,4,5,6])

#Play with this:
reshape_data = data.reshape(3,2)

print(f"Data: {data}\nReshaped Data:\n{reshape_data} ")

Data: tensor([1, 2, 3, 4, 5, 6])
Reshaped Data:
tensor([[1, 2],
        [3, 4],
        [5, 6]]) 
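
The cell above passes both dimensions explicitly. A small sketch of the `-1` form mentioned earlier, where one dimension is inferred from the element count:

```python
import torch

data = torch.tensor([1, 2, 3, 4, 5, 6])

# -1 asks PyTorch to infer the row count: 6 elements / 2 columns = 3 rows
inferred = data.reshape(-1, 2)
print(inferred.shape)

# A single -1 flattens the tensor back to one dimension
flat = inferred.reshape(-1)
print(flat)
```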


#### Yet More Dimensions
NumPy can do everything we’ve mentioned in any number of dimensions. Its central data structure is called ndarray (N-Dimensional Array) for a reason.


In [14]:
n_data = torch.tensor([[[1,2],[3,4]],[[5,6],[7,8]]]) 
n_data

tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])

In [15]:
n_ones = torch.ones((4,3,2))
n_zeros = torch.zeros((4,3,2))
n_random = torch.rand((4,3,2))

print(f"n_ones:\n{n_ones}")

n_ones:
tensor([[[1., 1.],
         [1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.],
         [1., 1.]]])


##### torch.ones((4,3,2)): 4 High, 3 Wide, 2 depth or y = 4, x = 3, z = 2 in 3D Space
* Note: Keep in mind that when you print a 3-dimensional NumPy array, the text output visualizes the array differently than shown here. NumPy’s order for printing n-dimensional arrays is that the last axis is looped over the fastest, while the first is the slowest.
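
To make the axis ordering concrete, the sketch below fills a (4, 3, 2) tensor with consecutive integers (an illustrative choice) so that it is visible which axis changes fastest.

```python
import torch

n_data = torch.arange(24).reshape(4, 3, 2)

# Indexing the first (slowest) axis yields one 3 x 2 block
first_block = n_data[0]
print(first_block)

# Consecutive values sit next to each other along the last (fastest) axis
print(n_data[0, 0, 0].item(), n_data[0, 0, 1].item())

# Stepping along the first axis jumps by 3 * 2 = 6 elements
print(n_data[1, 0, 0].item())
```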

#### Practical Usage<br>
##### Formulas<br>
Implementing mathematical formulas that work on matrices and vectors is a key use case to consider NumPy for. It’s why NumPy is the darling of the scientific python community. For example, consider the mean square error formula that is central to supervised machine learning models tackling regression problems:




Mean Squared Error (MSE) is calculated as:

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_{\text{prediction}_i} - Y_i)^2
$$

where:
- $n$ is the number of observations.
- $Y_{\text{prediction}_i}$ is the predicted value for the $i$-th observation.
- $Y_i$ is the actual value for the $i$-th observation.

The result is a single error value for the predictions, which serves as a score for the quality of the model.



In [16]:
n = 3
predictions = torch.ones(3)
labels = torch.tensor([1,2,3])

error = (1/n) * torch.sum(torch.square(predictions - labels))  # torch.square keeps the computation in PyTorch
error

tensor(1.6667)
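
As a cross-check, the same quantity can be computed with PyTorch's built-in loss. This is a sketch, assuming the labels are given as floats so that both tensors share a dtype.

```python
import torch
import torch.nn.functional as F

predictions = torch.ones(3)
labels = torch.tensor([1.0, 2.0, 3.0])  # float labels (assumption) to match predictions

# Manual formula: mean of the squared differences
manual_mse = torch.mean((predictions - labels) ** 2)

# Built-in equivalent
builtin_mse = F.mse_loss(predictions, labels)

print(manual_mse, builtin_mse)
```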

#### Data Representation<br>
Think of all the data types you’ll need to crunch and build models around (spreadsheets, images, audio…etc). So many of them are perfectly suited for representation in an n-dimensional array:<br>
##### Tables and Spreadsheets
- A spreadsheet or a table of values is a two dimensional matrix. Each sheet in a spreadsheet can be its own variable. The most popular abstraction in python for those is the **pandas dataframe**, which actually uses NumPy and builds on top of it.



Generate some random data to play with - CSV

In [7]:
import csv
import random

# Specify the number of rows and the file name
num_rows = 100
file_name = './random_data.csv'

# Generate data for each column
data = [{'id': i, 'value1': random.random(), 'value2': random.random()} for i in range(num_rows)]

# Write data to a CSV file
with open(file_name, mode='w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['id', 'value1', 'value2'])
    writer.writeheader()
    writer.writerows(data)

print(f'{file_name} has been created with {num_rows} rows of random data.')


./random_data.csv has been created with 100 rows of random data.


In [13]:
import pandas as pd
df = pd.read_csv('random_data.csv')
df

    id    value1    value2
0    0  0.018787  0.826462
1    1  0.920879  0.138136
2    2  0.244157  0.758379
3    3  0.191055  0.203947
4    4  0.154480  0.189564
..  ..       ...       ...
95  95  0.599771  0.503856
96  96  0.481265  0.955024
97  97  0.693665  0.620967
98  98  0.622628  0.217780
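
Since a DataFrame is backed by NumPy arrays, it can be handed to PyTorch with little effort. The sketch below uses a small in-memory table (made-up values standing in for `random_data.csv`) and converts two of its columns to a tensor.

```python
import pandas as pd
import torch

# Hypothetical three-row table with the same column names as the CSV above
df = pd.DataFrame({
    "id": [0, 1, 2],
    "value1": [0.1, 0.2, 0.3],
    "value2": [0.9, 0.8, 0.7],
})

# Select the numeric feature columns, go through NumPy, and copy into a tensor
features = torch.tensor(df[["value1", "value2"]].to_numpy(), dtype=torch.float32)
print(features.shape, features.dtype)
```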


##### Audio and Timeseries
An audio file is a one-dimensional array of samples. Each sample is a number representing a tiny chunk of the audio signal. CD-quality audio may have 44,100 samples per second and each sample is a 16-bit integer between -32768 and 32767. This means that if you have a ten-second WAVE file of CD quality, you can load it in a NumPy array with length 10 * 44,100 = 441,000 samples. Want to extract the first second of audio? Simply load the file into a NumPy array that we’ll call audio, and get audio[:44100].
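
A minimal sketch of the same idea with a PyTorch tensor. No file is loaded here; a synthetic 440 Hz tone (an arbitrary choice) stands in for the ten-second recording.

```python
import math
import torch

sample_rate = 44_100   # CD-quality samples per second
duration_s = 10

# Time stamp of every sample, in seconds
t = torch.arange(sample_rate * duration_s) / sample_rate

# Synthetic tone scaled into the 16-bit range and stored as int16
audio = (torch.sin(2 * math.pi * 440 * t) * 32767).to(torch.int16)

# The first second is simply the first `sample_rate` samples
first_second = audio[:sample_rate]
print(audio.shape, first_second.shape)
```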

##### Images
An image is a matrix of pixels of size (height x width).

- If the image is black and white (a.k.a. grayscale), each pixel can be represented by a single number (commonly between 0 (black) and 255 (white)). Want to crop the top left 10 x 10 pixel part of the image? Just tell NumPy to get you image[:10,:10].

- If the image is colored, then each pixel is represented by three numbers - a value for each of red, green, and blue. In that case we need a 3rd dimension (because each cell can only contain one number). So a colored image is represented by an ndarray of dimensions: (height x width x 3).
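
The two cases above can be sketched with random pixel values. The 64 x 48 image size is an arbitrary assumption, and the final step shows the channels-first layout that many PyTorch vision layers expect.

```python
import torch

# Grayscale image: one value per pixel, shape (height, width)
gray = torch.randint(0, 256, (64, 48), dtype=torch.uint8)
crop = gray[:10, :10]   # top-left 10 x 10 pixels

# Color image: three values per pixel, shape (height, width, 3)
color = torch.randint(0, 256, (64, 48, 3), dtype=torch.uint8)

# Move the channel axis to the front: (3, height, width)
channels_first = color.permute(2, 0, 1)
print(crop.shape, color.shape, channels_first.shape)
```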

##### Language
If we’re dealing with text, the story is a little different. The numeric representation of text requires a step of building a vocabulary (an inventory of all the unique words the model knows) and an embedding step. Let us see the steps of numerically representing this (translated) quote by an ancient spirit:

“Have the bards who preceded me left any theme unsung?”

A model needs to look at a large amount of text before it can numerically represent the anxious words of this warrior poet. We can proceed to have it process a small dataset and use it to build a vocabulary (of 71,290 words): 

The sentence can then be broken into an array of tokens (words or parts of words based on common rules):

We then replace each word by its id in the vocabulary table:

These ids still don’t provide much information value to a model. So before feeding a sequence of words to a model, the tokens/words need to be replaced with their embeddings (50 dimension word2vec embedding in this case):

You can see that this NumPy array has the dimensions [embedding_dimension x sequence_length]. In practice these would be the other way around, but I’m presenting it this way for visual consistency. For performance reasons, deep learning models tend to preserve the first dimension for batch size (because the model can be trained faster if multiple examples are trained in parallel). This is a clear case where **reshape()** becomes super useful. A model like BERT, for example, would expect its inputs in the shape: [batch_size, sequence_length, embedding_size]. 

This is now a numeric volume that a model can crunch and do useful things with. I left the other rows empty, but they’d be filled with other examples for the model to train on (or predict).
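
The steps described above (tokenize, map to ids, embed, add a batch dimension) can be sketched as follows. The vocabulary here is a toy one built from the single sentence, and the embeddings are randomly initialized rather than trained word2vec vectors, so only the shapes are meaningful.

```python
import torch
import torch.nn as nn

sentence = "have the bards who preceded me left any theme unsung"
tokens = sentence.split()

# Hypothetical toy vocabulary: one id per unique word in this sentence
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
ids = torch.tensor([vocab[tok] for tok in tokens])

# Randomly initialized 50-dimensional embeddings (not word2vec)
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=50)
embedded = embedding(ids)          # (sequence_length, embedding_size)

# Add a leading batch dimension: (batch_size, sequence_length, embedding_size)
batch = embedded.unsqueeze(0)
print(ids.shape, embedded.shape, batch.shape)
```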

(It turned out the poet’s words in our example were immortalized more so than those of the other poets which trigger his anxieties. Born a slave owned by his father, Antarah’s valor and command of language gained him his freedom and the mythical status of having his poem as one of seven poems suspended in the kaaba in pre-Islamic Arabia).





In [None]:
# For grader use only

G = [0]*3  # index 0 is unused; G[1] and G[2] hold the scores for Questions 1 and 2


# insert grade here  (from 0 to 8)
# G[1] = 

# please justify point subtractions

##  <font color = 'yellow'> Question 2. Quadratic Regression </font>

In the lecture we discussed a simple regression problem, where points are coming from the line $y = 2x + 1$, plus some noise.  For this question you are asked to: 

(i) Generate data points coming from a quadratic function: $ y = -x^2 + 3x +10 $, plus some noise. <br>
(ii) Modify the PyTorch model from the class in order for it to learn a quadratic function of the form $y = a x^2 + bx +c$. <br>
(iii) Train your model and report what values it computes. (Sanity check: These should be close to -1, 3, 10)




https://colab.research.google.com/drive/1VfMeRBCNdTyzW4ZGBnj9AAIFXr1F2Bja?usp=drive_fs#scrollTo=eKvl7myycxoq

# (i)

In [1]:
import numpy as np
import sklearn
import torch
import torch.optim as optim
import torch.nn as nn

from torchviz import make_dot

In [14]:
# Data Generation
np.random.seed(42)
x = np.random.rand(100, 1)
y = 10 + 3 * x - x**2 + .1 * np.random.randn(100, 1)
print(f"RAW DATA (First 10 Samples):\n{y[:10]}")

# Shuffles the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:80]
# Uses the remaining indices for validation
val_idx = idx[80:]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

print(f"Training and Validation Splits:\nx_train:{x_train}\ny_train:{y_train}\nx_val:{x_val}\ny_val:{y_val}")

RAW DATA (First 10 Samples):
[[10.99204476]
 [11.91838449]
 [11.66934277]
 [11.23882658]
 [10.42174692]
 [10.47936053]
 [10.31866653]
 [11.7964403 ]
 [11.36115642]
 [11.57267525]]
Training and Validation Splits:
x_train:[[0.77127035]
 [0.06355835]
 [0.86310343]
 [0.02541913]
 [0.73199394]
 [0.07404465]
 [0.19871568]
 [0.31098232]
 [0.47221493]
 [0.96958463]
 [0.12203823]
 [0.77513282]
 [0.80219698]
 [0.72960618]
 [0.09767211]
 [0.18485446]
 [0.15601864]
 [0.02058449]
 [0.98688694]
 [0.62329813]
 [0.70807258]
 [0.59789998]
 [0.92187424]
 [0.63755747]
 [0.28093451]
 [0.25877998]
 [0.11959425]
 [0.72900717]
 [0.94888554]
 [0.60754485]
 [0.5612772 ]
 [0.4937956 ]
 [0.18182497]
 [0.27134903]
 [0.96990985]
 [0.21233911]
 [0.18340451]
 [0.86617615]
 [0.37454012]
 [0.29122914]
 [0.80839735]
 [0.05808361]
 [0.83244264]
 [0.54269608]
 [0.77224477]
 [0.88721274]
 [0.0884925 ]
 [0.04522729]
 [0.59241457]
 [0.68423303]
 [0.71324479]
 [0.03438852]
 [0.60111501]
 [0.81546143]
 [0.44015249]
 [0.325183

# (ii)

In [66]:
np.random.seed(42)
a = np.random.randn(1)
b = np.random.randn(1)
c = np.random.randn(1)

print(f"Initialized Parameters:\na:{a},b:{b},c:{c}")

# Sets learning rate
lr = 1e-1
# Defines number of epochs
n_epochs = 100000

for epoch in range(n_epochs):
    # Computes our model's predicted output
    yhat = a * x_train**2 + b * x_train + c

    # The error
    error = (y_train - yhat)
    # It is a regression, so it computes mean squared error (MSE)
    loss = (error ** 2).mean()

    # Computes gradients for the "a", "b", and "c" parameters
    a_grad = -2 * (x_train**2 * error).mean()
    b_grad = -2 * (x_train * error).mean()
    c_grad = -2 * error.mean()

    # Updates parameters using gradients and the learning rate
    a = a - lr * a_grad
    b = b - lr * b_grad
    c = c - lr * c_grad

print(f"Parameters after {n_epochs} epochs:\na:{a},b:{b},c:{c}")



Initialized Parameters:
a:[0.49671415],b:[-0.1382643],c:[0.64768854]
Parameters after 100000 epochs:
a:[-0.81125085],b:[2.78508306],c:[10.05058536]
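
The loop above derives the gradients by hand in NumPy. Since part (ii) asks for a PyTorch model, here is one possible sketch of the same fit using autograd and `torch.optim.SGD`. It is not the class model: for brevity it trains on all 100 points instead of the 80-point training split, and the epoch count of 2000 is an assumption chosen to keep the example quick, so the parameters will not match the values printed above exactly.

```python
import numpy as np
import torch

# Regenerate the data as in part (i), then convert to float32 tensors
np.random.seed(42)
x = np.random.rand(100, 1)
y = 10 + 3 * x - x**2 + 0.1 * np.random.randn(100, 1)
x_t = torch.from_numpy(x).float()
y_t = torch.from_numpy(y).float()

# Trainable parameters of y = a*x^2 + b*x + c
torch.manual_seed(42)
a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
c = torch.randn(1, requires_grad=True)

optimizer = torch.optim.SGD([a, b, c], lr=0.1)
loss_fn = torch.nn.MSELoss()

n_epochs = 2000  # assumption: fewer epochs than the NumPy loop, for speed
losses = []
for epoch in range(n_epochs):
    yhat = a * x_t**2 + b * x_t + c   # forward pass
    loss = loss_fn(yhat, y_t)         # mean squared error
    optimizer.zero_grad()             # clear gradients from the previous step
    loss.backward()                   # autograd computes d(loss)/d(a, b, c)
    optimizer.step()                  # gradient descent update
    losses.append(loss.item())

print(f"a: {a.item()}, b: {b.item()}, c: {c.item()}, final loss: {losses[-1]}")
```

Here autograd replaces the three hand-written gradient expressions, so changing the model to a different polynomial only requires editing the `yhat` line.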


# (iii)

In [79]:
# Sanity Check: do we get the same results as our gradient descent?
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

quadratic_model = make_pipeline(PolynomialFeatures(degree=2),LinearRegression())

quadratic_model.fit(x_train, y_train)
# Expose the linear model in the pipeline so that we can access its parameters.
linear_model = quadratic_model.named_steps['linearregression']
print(f"Quadratic Model Coefficients:\na:{linear_model.coef_[0][2]}, b:{linear_model.coef_[0][1]}, c:{linear_model.intercept_} ")

Quadratic Model Coefficients:
a:-0.8112508474398796, b:2.785083060532637, c:[10.05058536] 


In [None]:
# for grader use only

# insert grade here  
# part (i): 4, part(ii) 8, part (iii) 8

# G[2] = 
#
# please justify point subtractions

In [None]:
# total score
max_score = 36
final_score = sum(G)*(100/max_score)