<h1>AI in Fact and Fiction - Summer 2021</h1>
<h2>Introduction to Deep Learning and PyTorch Basics</h2>


<p>In this lab, you will learn the basics of PyTorch. You will learn about tensors and compare them to vectors and NumPy arrays, learn the basics of tensor operations on 1D and 2D tensors, learn how to do differentiation in PyTorch, and learn about the Dataset class.</p>
<p>You will also learn about the building blocks of deep learning, and explore some basic ideas in Linear Regression and Gradient Descent.</p>

* Use [Google Colab](https://colab.research.google.com/github/AIFictionFact/Summer2021/blob/master/lab1.ipynb) to run the Python code and to complete any missing lines of code.
* You might find it helpful to save this notebook on your Google Drive.
* Please make sure to fill the required information in the **Declaration** cell.
* Once you complete the lab, please download the .ipynb file (File --> Download .ipynb).
* Then, rename the downloaded notebook using the naming convention lab1_YourRCS.ipynb (make sure to replace 'YourRCS' with your RCS ID, for example 'lab1_senevo.ipynb').
* Submit the .ipynb file in LMS.

<p>Due Date/Time: Friday, Jun 18 <b>1.00 PM ET</b></p>

<p>Estimated Time Needed: <b>4 hours</b></p>

<p>Total Tasks: <b>21</b></p>
<p>Total Points: <b>50</b></p>

<hr>


**Declaration**

*Your Name* :

*Collaborators (if any)* :

*Online Resources consulted (if any):*

<h3>Preparation</h3>

Import the following libraries that you'll use for this lab:

In [None]:
import torch 
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

(By the way, `%matplotlib inline` ensures that all matplotlib plots are drawn in the output cell of the notebook and are kept in the notebook when it is saved. Commands like this are called "line magics": they start with `%`, must appear at the beginning of a line, and take the rest of the line as their argument.)

The following plotVec function is a helper function for plotting diagrams. You will use this function to plot the vectors in a coordinate system.

In [None]:
def plotVec(vectors):
    """Plot 2-D vectors as arrows starting at the origin.
    @param: vectors = [{"vector": vector variable, "name": name of vector, "color": color of the vector on diagram}]
    Each "vector" must be a NumPy array with exactly two components (x, y).
    """
    ax = plt.axes()
    # For loop to draw the vectors
    for vec in vectors:
        ax.arrow(0, 0, *vec["vector"], head_width = 0.05,color = vec["color"], head_length = 0.1)
        plt.text(*(vec["vector"] + 0.1), vec["name"])
    plt.ylim(-2,2)
    plt.xlim(-2,2)

<h2 id="Types_Shape">Types and Shape of 1-D tensors</h2>

Convert the list of integers <i>[0, 1, 2, 3, 4]</i> to a tensor with <code>torch.tensor()</code>, then inspect the data type of its elements (<code>dtype</code>) and the type of the tensor itself (<code>type()</code>):

In [None]:
# Convert an integer list with length 5 to a tensor.
# Then examine the type of the objects contained in the tensor, and the type of the tensor itself.
ints = [0, 1, 2, 3, 4]
ints_to_tensor = torch.tensor(ints)
print("The dtype of tensor object after converting it to tensor: ", ints_to_tensor.dtype)
print("The type of tensor object after converting it to tensor: ", ints_to_tensor.type())

Do the same for the float list <i>[0.0, 1.0, 2.0, 3.0, 4.0]</i>:

In [None]:
# Convert a float list with length 5 to a tensor
floats = [0.0, 1.0, 2.0, 3.0, 4.0]
floats_to_tensor = torch.tensor(floats)
print("The dtype of tensor object after converting it to tensor: ", floats_to_tensor.dtype)
print("The type of tensor object after converting it to tensor: ", floats_to_tensor.type())

<p>The float list is converted to a float tensor.</p>
<b>Note: All elements of a tensor share one data type. If the list mixes integers and floats, <code>torch.tensor()</code> promotes every element to a float type.</b>
<p>From the previous examples, you can see that <code>torch.tensor()</code> infers the tensor's data type from the elements of the list. However, what if you want to convert the list to a specific tensor type? <code>torch</code> provides methods for this conversion. The following code converts an integer list to a float tensor:</p>


In [None]:
# Convert an integer list with length 5 to a float tensor
new_float_tensor = torch.FloatTensor(ints)
print("The type of the new_float_tensor:", new_float_tensor.type())

<p>You can also convert an existing tensor object (<code><i>tensor_obj</i></code>) to another tensor type. Convert the integer tensor to a float tensor:</p>

In [None]:
# Another method to convert the integer list to float tensor
old_int_tensor = torch.tensor(ints)
new_float_tensor = old_int_tensor.type(torch.FloatTensor)
print("The type of the new_float_tensor:", new_float_tensor.type())
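
As an aside, a few other conversion routes may be handy. The sketch below, assuming a recent PyTorch version, passes <code>dtype</code> directly to <code>torch.tensor()</code>, uses the <code>.float()</code> shorthand on an existing tensor, and shows that a list mixing integers and floats is promoted to the default float type (float32 unless you change it).

```python
import torch

ints = [0, 1, 2, 3, 4]

# Choose the dtype when the tensor is created
float_from_list = torch.tensor(ints, dtype=torch.float32)

# Convert an existing integer tensor; .float() is shorthand for .to(torch.float32)
int_tensor = torch.tensor(ints)
float_from_tensor = int_tensor.float()

# A list that mixes ints and floats is promoted to the default float dtype
mixed = torch.tensor([1, 2.5, 3])

print(int_tensor.dtype, float_from_list.dtype, float_from_tensor.type(), mixed.dtype)
```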

<p>The <code><i>tensor_obj</i>.size()</code> method returns the size (shape) of <code><i>tensor_obj</i></code>.
The <code><i>tensor_obj</i>.ndimension()</code> method returns the number of dimensions of the tensor object.</p>

In [None]:
# Introduce the tensor_obj.size() & tensor_obj.ndimension() methods

print("The size of the new_float_tensor: ", new_float_tensor.size())
print("The dimension of the new_float_tensor: ",new_float_tensor.ndimension())

<p>The <code><i>tensor_obj</i>.view(<i>row, column</i>)</code> is used for reshaping a tensor object.<br></p>
<p>Suppose you have a tensor with <code>torch.Size([5])</code>, such as <code>new_float_tensor</code> from the previous example.<br>
After you execute <code>new_float_tensor.view(5, 1)</code>, the returned tensor has size <code>torch.Size([5, 1])</code>; <code>new_float_tensor</code> itself keeps its original size.<br>
This means the data has been reshaped from a one-dimensional tensor with 5 elements into a two-dimensional tensor with 5 rows and 1 column.</p>

In [None]:
# Introduce the tensor_obj.view(row, column) method
twoD_float_tensor = new_float_tensor.view(5, 1)
print("Original size: ", new_float_tensor.size())
print("Size after view method: ", twoD_float_tensor.size())
print("Tensor after view method: ", twoD_float_tensor)

<p>Note that the original size is 5. After reshaping, the tensor is 5X1, analogous to a column vector.</p>
<b>Note: The number of elements in a tensor must remain constant after applying view.</b>
<p>What if you do not know the size of one dimension in advance, or do not want to hard-code it? Pass <b>-1</b> for that dimension and PyTorch will infer it from the total number of elements.</p>

In [None]:
# Introduce the use of -1 in tensor_obj.view(row, column) method
twoD_float_tensor = new_float_tensor.view(-1, 1)
print("Original size: ", new_float_tensor.size())
print("Size after view method: ", twoD_float_tensor.size())
print("Tensor after view method: ", twoD_float_tensor)

<p>You get the same result as in the previous example. Here <b>-1</b> tells PyTorch to infer the size of that dimension from the number of elements. Be careful: only one argument can be <b>-1</b>.</p>
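
A small sketch of how <b>-1</b> is resolved, using a hypothetical 6-element tensor built with <code>torch.arange()</code>: the inferred size is the total number of elements divided by the product of the other dimensions. Asking for a shape whose element count does not match, or using <b>-1</b> twice, raises a <code>RuntimeError</code>. The helper <code>try_view</code> is illustrative only.

```python
import torch

t = torch.arange(6)       # 1-D tensor with 6 elements: 0, 1, ..., 5

a = t.view(2, -1)         # -1 inferred as 6 / 2 = 3, so size [2, 3]
b = t.view(-1, 3)         # -1 inferred as 6 / 3 = 2, so size [2, 3]
c = t.view(-1, 1)         # -1 inferred as 6 / 1 = 6, so size [6, 1]

def try_view(tensor, *shape):
    """Hypothetical helper: return the reshaped tensor, or None if the shape is invalid."""
    try:
        return tensor.view(*shape)
    except RuntimeError:
        return None

bad_count = try_view(t, 4, -1)    # 6 elements cannot fill 4 equal rows
two_minus = try_view(t, -1, -1)   # only one dimension can be inferred

print(a.size(), b.size(), c.size(), bad_count, two_minus)
```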
<p>You can also convert a <b>numpy</b> array to a <b>tensor</b>, for example: </p>

In [None]:
# Convert a numpy array to a tensor

numpy_array = np.array(floats)
new_tensor = torch.from_numpy(numpy_array)

print("The dtype of new tensor: ", new_tensor.dtype)
print("The type of new tensor: ", new_tensor.type())

<p>Converting a <b>tensor</b> to a <b>numpy</b> array is also supported in PyTorch. The syntax is shown below:</p>

In [None]:
# Convert a tensor to a numpy array

back_to_numpy = new_tensor.numpy()
print("The numpy array from tensor: ", back_to_numpy)
print("The dtype of numpy array: ", back_to_numpy.dtype)

<p><code>back_to_numpy</code> and <code>new_tensor</code> share the same underlying memory as <code>numpy_array</code>. As a result, if we change <code>numpy_array</code>, both <code>back_to_numpy</code> and <code>new_tensor</code> change too. For example, if we set all the elements in <code>numpy_array</code> to zero, <code>back_to_numpy</code> and <code>new_tensor</code> will follow suit.</p>

In [None]:
# Set all elements in numpy array to zero 
numpy_array[:] = 0
print("The new tensor points to numpy_array : ", new_tensor)
print("and back to numpy array points to the tensor: ", back_to_numpy)
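
Not every conversion shares memory. As a sketch (assuming CPU tensors), <code>torch.from_numpy()</code> shares memory with the source array, while <code>torch.tensor()</code> always copies the data, so later changes to the array do not reach the copy:

```python
import numpy as np
import torch

numpy_array = np.array([0.0, 1.0, 2.0])

shared = torch.from_numpy(numpy_array)   # shares memory with numpy_array
copied = torch.tensor(numpy_array)       # independent copy of the data

numpy_array[0] = 99.0                    # modify the original array in place

print("from_numpy:", shared)             # first element is now 99
print("tensor():  ", copied)             # still starts with 0
```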

<p>A <b>Pandas Series</b>, a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.), can also be converted: pass the NumPy array stored in <code>pandas_series.values</code> to <code>torch.from_numpy()</code>. This works for any Series with a numeric data type.</p>

In [None]:
# Convert a pandas Series to a tensor
pandas_series=pd.Series([0.1, 2, 0.3, 10.1])
new_tensor=torch.from_numpy(pandas_series.values)
print("The new tensor from numpy array: ", new_tensor)
print("The dtype of new tensor: ", new_tensor.dtype)
print("The type of new tensor: ", new_tensor.type())

<h3>Task 1 (1 point)</h3>
<p>Convert <code>your_tensor</code> to a 1X5 tensor.</p>

In [None]:
# Convert the following tensor to a tensor object with 1 row and 5 columns
your_tensor = torch.tensor([1, 2, 3, 4, 5])
# Type your code here

<h2 id="Index_Slice">Indexing and Slicing of 1-D tensors</h2>
<p>In Python, <b>the index starts with 0</b>. Therefore, the last index will always be 1 less than the length of the tensor object.
You can access the value at a certain index by using square brackets, for example:</p>

In [None]:
# A tensor for showing how the indexes work on tensors

index_tensor = torch.tensor([0, 1, 2, 3, 4])
print("The value on index 0:",index_tensor[0])
print("The value on index 1:",index_tensor[1])
print("The value on index 2:",index_tensor[2])
print("The value on index 3:",index_tensor[3])
print("The value on index 4:",index_tensor[4])

<p>Now, you'll see how to change the values on certain indexes.</p>
<p>Suppose you have a tensor as shown here: </p>

In [None]:
# A tensor for showing how to change value according to the index

tensor_sample = torch.tensor([20, 1, 2, 3, 4])

<p>Assign the value on index 0 as 100:</p>

In [None]:
# Change the value on the index 0 to 100

print("Initial value on index 0:", tensor_sample[0])
tensor_sample[0] = 100
print("Modified tensor:", tensor_sample)

<p>As you can see, the value on index 0 changes. Change the value on index 4 to 0:</p>

In [None]:
# Change the value on the index 4 to 0

print("Initial value on index 4:", tensor_sample[4])
tensor_sample[4] = 0
print("Modified tensor:", tensor_sample)

<p>Next, let's get a subset of <code>tensor_sample</code>. The subset should contain the values in <code>tensor_sample</code> from index 1 to index 3.</p>
<p><b>Note: The number on the left side of the colon represents the index of the first value. The number on the right side of the colon is always 1 larger than the index of the last value. For example, <code>tensor_sample[1:4]</code> means you get values from the index 1 to index 3 <i>(4-1)</i></b>.</p>

In [None]:
# Slice tensor_sample

subset_tensor_sample = tensor_sample[1:4]
print("Original tensor sample: ", tensor_sample)
print("The subset of tensor sample:", subset_tensor_sample)
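
Two details about slices are worth checking directly. The sketch below rebuilds a tensor with the values <code>tensor_sample</code> should hold at this point, confirms that the stop index is excluded, and shows that a basic slice is a <i>view</i>: writing into it changes the original tensor. <code>.clone()</code> is one way to get an independent copy.

```python
import torch

tensor_sample = torch.tensor([100, 1, 2, 3, 0])

subset = tensor_sample[1:4]            # indices 1, 2, 3; index 4 is excluded
print(subset, len(subset))             # 3 elements, i.e. 4 - 1

# A basic slice shares storage with the original tensor
subset[0] = -1
print(tensor_sample)                   # index 1 of the original is now -1

# .clone() makes an independent copy
independent = tensor_sample[1:4].clone()
independent[0] = 42
print(tensor_sample)                   # unchanged by the write to the clone
```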

Just as you can assign a value to a single index, you can also assign values to a slice.

Change the values of <code>tensor_sample</code> from index 3 to index 4 to something else:

In [None]:
# Change the values on index 3 and index 4

print("Initial values on index 3 and index 4:", tensor_sample[3:5])
tensor_sample[3:5] = torch.tensor([300, 400])
print("Modified tensor:", tensor_sample)

You can also store the selected indexes in a variable (for example, a list) and pass that variable inside the square brackets to pick out those elements, for example:

In [None]:
# Using variable to contain the selected index, and pass it to slice operation

selected_indexes = [3, 4]
subset_tensor_sample = tensor_sample[selected_indexes]
print("The initial tensor_sample", tensor_sample)
print("The subset of tensor_sample with the values on index 3 and 4: ", subset_tensor_sample)

You can also assign one value to all the selected indexes at once by using the variable. For example, assign 100,000 to every position in <code>selected_indexes</code> (<b>Note: Here a single value is assigned to every selected position.</b>)

In [None]:
# Use a variable to assign the value to the selected indexes

print("The initial tensor_sample", tensor_sample)
selected_indexes = [1, 3]
tensor_sample[selected_indexes] = 100000
print("Modified tensor with one value: ", tensor_sample)

<h3>Task 2 (1 point)</h3>
Change the values on index 3, index 4, and index 7 of the following tensor to 0.

In [None]:
# Change the values on index 3, 4, 7 to 0
my_tensor = torch.tensor([2, 7, 3, 4, 6, 2, 3, 1, 2])
# Type your code here

<h2 id="Tensor_Func">Functions on 1-D tensors</h2>
In this section, you'll work with some methods that you can apply to tensor objects.

<h3>Mean and Standard Deviation</h3>
You'll review the mean and standard deviation methods first. They are two basic statistical methods.
<br/>
Create a tensor with values <i>[1.0, -1, 1, -1]</i>:


In [None]:
# Sample tensor for mathematical calculation methods on tensors

math_tensor = torch.tensor([1.0, -1.0, 1, -1])
print("Tensor example: ", math_tensor)

Here is the mean method:  

In [None]:
#Calculate the mean for math_tensor

mean = math_tensor.mean()
print("The mean of math_tensor: ", mean)

The standard deviation can also be calculated by using <code><i>tensor_obj</i>.std()</code>:

In [None]:
#Calculate the standard deviation for math_tensor

standard_deviation = math_tensor.std()
print("The standard deviation of math_tensor: ", standard_deviation)
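
By default, <code>std()</code> computes the <i>sample</i> standard deviation, dividing the sum of squared deviations by N - 1 rather than N. The sketch below checks both versions by hand for <code>math_tensor</code>; it uses the keyword <code>unbiased=False</code> for the population version (newer PyTorch releases also accept <code>correction=0</code>).

```python
import torch

math_tensor = torch.tensor([1.0, -1.0, 1, -1])

sample_std = math_tensor.std()                     # divides by N - 1 (default)
population_std = math_tensor.std(unbiased=False)   # divides by N

# By hand: the mean is 0, so the squared deviations are all 1 and sum to 4
n = math_tensor.numel()
sq_dev = ((math_tensor - math_tensor.mean()) ** 2).sum()
manual_sample = torch.sqrt(sq_dev / (n - 1))       # sqrt(4 / 3), about 1.1547
manual_population = torch.sqrt(sq_dev / n)         # sqrt(4 / 4) = 1.0

print(sample_std, manual_sample)
print(population_std, manual_population)
```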

<h3>Max and Min</h3>

Now, you'll review two more useful methods: <code><i>tensor_obj</i>.max()</code> and <code><i>tensor_obj</i>.min()</code>. These two methods find the maximum value and the minimum value in the tensor.

Create a <code>max_min_tensor</code>: 

In [None]:
# Sample for introducing max and min methods

max_min_tensor = torch.tensor([1, 1, 3, 5, 5])
print("Tensor example: ", max_min_tensor)

<b>Note: The tensor contains the minimum value 1 twice and the maximum value 5 twice. Can you guess how PyTorch will deal with the duplicates?</b>

Apply <code><i>tensor_obj</i>.max()</code> on <code>max_min_tensor</code>:

In [None]:
# Method for finding the maximum value in the tensor

max_val = max_min_tensor.max()
print("Maximum number in the tensor: ", max_val)

Use <code><i>tensor_obj</i>.min()</code> on <code>max_min_tensor</code>:

In [None]:
# Method for finding the minimum value in the tensor

min_val = max_min_tensor.min()
print("Minimum number in the tensor: ", min_val)
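
<code>max()</code> and <code>min()</code> return each extreme value once, regardless of duplicates. If you also need <i>where</i> it occurs, <code>torch.argmax()</code> and <code>torch.argmin()</code> return an index; according to the PyTorch documentation, the index of the first occurrence is returned when there are ties. A quick check:

```python
import torch

max_min_tensor = torch.tensor([1, 1, 3, 5, 5])

max_val = max_min_tensor.max()            # a single value, even though 5 appears twice
max_idx = torch.argmax(max_min_tensor)    # index of the first 5
min_idx = torch.argmin(max_min_tensor)    # index of the first 1

print(max_val, max_idx, min_idx)
```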

<h3>Sin</h3>

Create a tensor with 0, π/2 and π. Then, apply the sine function to the tensor. Notice that here <code>sin()</code> is called as a function of <code>torch</code> (<code>torch.sin()</code>) rather than as a method of the tensor object:

In [None]:
# Method for calculating the sin result of each element in the tensor

pi_tensor = torch.tensor([0, np.pi/2, np.pi])
sin = torch.sin(pi_tensor)
print("The sin result of pi_tensor: ", sin)

The resulting tensor <code>sin</code> contains the result of the <code>sin</code> function applied to each element in <code>pi_tensor</code>.<br>
This is different from the previous methods. <code><i>tensor_obj</i>.mean()</code>, <code><i>tensor_obj</i>.std()</code>, <code><i>tensor_obj</i>.max()</code>, and <code><i>tensor_obj</i>.min()</code> are aggregate methods, so each returns a tensor containing a single number.<br>
In contrast, <code>torch.sin()</code> operates element-wise, so the resulting tensor has the same length as the input tensor.
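
Comparing shapes makes the distinction concrete. In this sketch, <code>torch.sin()</code> and the equivalent tensor method <code>.sin()</code> both keep the input's shape, while the aggregate <code>mean()</code> returns a zero-dimensional tensor:

```python
import numpy as np
import torch

pi_tensor = torch.tensor([0, np.pi / 2, np.pi])

elementwise = torch.sin(pi_tensor)   # one output per input element
method_form = pi_tensor.sin()        # same computation, called as a tensor method
aggregate = pi_tensor.mean()         # all elements reduced to one number

print(elementwise.shape, method_form.shape, aggregate.shape)
```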

<h3>Create Tensor by <code>torch.linspace()</code></h3>

A useful function for plotting mathematical functions is <code>torch.linspace()</code>. <code>torch.linspace()</code> returns evenly spaced numbers over a specified interval. You specify the starting point of the sequence and the ending point of the sequence. The parameter <code>steps</code> indicates the number of samples to generate. Now, you'll work with <code>steps = 5</code>.

In [None]:
# First try on using linspace to create tensor

len_5_tensor = torch.linspace(-2, 2, steps = 5)
print ("First try on linspace", len_5_tensor)

Assign <code>steps</code> with 9:

In [None]:
# Second try on using linspace to create tensor

len_9_tensor = torch.linspace(-2, 2, steps = 9)
print ("Second try on linspace", len_9_tensor)
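
The spacing between neighboring values follows directly from the three arguments: (end - start) / (steps - 1). For <code>steps = 5</code> that is 4 / 4 = 1, and for <code>steps = 9</code> it is 4 / 8 = 0.5. The sketch below rebuilds the 9-step tensor from that formula and compares:

```python
import torch

start, end, steps = -2.0, 2.0, 9
len_9_tensor = torch.linspace(start, end, steps=steps)

# Rebuild the same values from the spacing formula
spacing = (end - start) / (steps - 1)            # 4 / 8 = 0.5
manual = start + spacing * torch.arange(steps)   # -2.0, -1.5, ..., 2.0

print(len_9_tensor)
print(torch.allclose(len_9_tensor, manual))
```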

Use both <code>torch.linspace()</code> and <code>torch.sin()</code> to construct a tensor containing 100 sine values over the range 0 (0 degrees) to 2π (360 degrees):

In [None]:
# Construct the tensor within 0 to 360 degree

pi_tensor = torch.linspace(0, 2*np.pi, 100)
sin_result = torch.sin(pi_tensor)

Plot the result to get a clearer picture. Convert each tensor to a NumPy array with <code>.numpy()</code> before passing it to <code>plt.plot</code>.

In [None]:
# Plot sin_result

plt.plot(pi_tensor.numpy(), sin_result.numpy())

<h3>Task 3 (2 points)</h3>
<p>Construct a tensor with 100 steps in the range -π/2 to π/2. Print out the maximum and minimum number in the tensor. Then apply the hyperbolic sine function (<code>torch.sinh()</code>) to the tensor you created and plot the result using <code>plt.plot</code> method.</p>

In [None]:
# Construct a tensor with 100 steps in the range -π/2 to π/2. Print out the maximum and minimum number. Then apply the hyperbolic sine function (torch.sinh())  to the tensor you created and plot the result.
# Type your code here

<h2 id="Tensor_Op">Tensor Operations on 1-D tensors</h2>
In the following section, you'll work with operations that you can apply to a tensor.

<h3>Tensor Addition</h3>
You can perform addition between two tensors.

Here we have a tensor <code>u</code> with 1 dimension and 2 elements, and another tensor <code>v</code> with the same number of dimensions and the same number of elements:

In [None]:
# Create two sample tensors

u = torch.tensor([1, 0])
v = torch.tensor([0, 1])
# Add u and v

w = u + v
print("The result tensor: ", w)

Plot the result using the <code>plotVec</code> function we defined earlier in the lab.

In [None]:
# Plot u, v, w

plotVec([
    {"vector": u.numpy(), "name": 'u', "color": 'r'},
    {"vector": v.numpy(), "name": 'v', "color": 'b'},
    {"vector": w.numpy(), "name": 'w', "color": 'g'}
])

<h3>Task 4 (1 point)</h3> 
Implement tensor subtraction with <code>u</code> and <code>v</code>, i.e., <code>u - v</code>, store the result as <code>w</code>, print it, and plot it using the <code>plotVec</code> function.

In [None]:
u = torch.tensor([1, 0])
v = torch.tensor([0, 1])
# Implement the tensor subtraction with u and v, print the result and plot it.
# Type your code here

You can add a scalar to the tensor. Use <code>u</code> as the sample tensor:

In [None]:
# tensor + scalar

u = torch.tensor([1, 2, 3, -1])
v = u + 1
print ("Addition Result: ", v)
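
Adding a scalar works because PyTorch <i>broadcasts</i> the scalar to every element. As a sketch, the result matches adding a tensor of ones with the same shape:

```python
import torch

u = torch.tensor([1, 2, 3, -1])

v = u + 1                              # the scalar is added to every element
v_explicit = u + torch.ones_like(u)    # same thing, written element by element

print(v, torch.equal(v, v_explicit))
```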

<h3>Tensor Multiplication </h3>
Now, you'll review the multiplication between a tensor and a scalar.

Create a tensor with value <code>[1, 2]</code> and then multiply it by 2:

In [None]:
# tensor * scalar

u = torch.tensor([1, 2])
v = 2 * u
print("The result of 2 * u: ", v)

The result is <code>tensor([2, 4])</code>, so the code <code>2 * u</code> multiplies each element in the tensor by 2. This is how you get the product between a vector or matrix and a scalar in linear algebra.

You can also multiply two tensors.

Create two tensors <code>u</code> and <code>v</code> and then multiply them together:

In [None]:
# tensor * tensor

u = torch.tensor([1, 2])
v = torch.tensor([3, 2])
w = u * v
print ("The result of u * v", w)

The result is <code>tensor([3, 4])</code>. It is obtained by multiplying each element in <code>u</code> by the element in the same position in <code>v</code>, i.e., <i>[1 * 3, 2 * 2]</i>. This is element-wise multiplication, not a matrix product.

<h3>Dot Product</h3>
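The dot product is an operation on two vectors of the same length: multiply corresponding elements and add up the products. In PyTorch you can compute it with <code>torch.dot()</code>.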

Here is the dot product of the two tensors <code>u</code> and <code>v</code>:

In [None]:
# Calculate dot product of u, v

u = torch.tensor([1, 2])
v = torch.tensor([3, 2])

print("Dot Product of u, v:", torch.dot(u,v))

The result is <code>tensor(7)</code>. The calculation is <i>1 x 3 + 2 x 2 = 7</i>.
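
The dot product is the element-wise product followed by a sum, so there are several equivalent ways to write it. A sketch using the same <code>u</code> and <code>v</code>; the <code>@</code> operator (matrix multiplication) returns the dot product when both operands are 1-D:

```python
import torch

u = torch.tensor([1, 2])
v = torch.tensor([3, 2])

dot_builtin = torch.dot(u, v)   # 1*3 + 2*2
dot_manual = (u * v).sum()      # element-wise product, then sum
dot_matmul = u @ v              # @ on two 1-D tensors is also a dot product

print(dot_builtin, dot_manual, dot_matmul)   # tensor(7) tensor(7) tensor(7)
```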

<h3>Task 5 (1 point)</h3>

Convert the lists <i>[-1, 1]</i> and <i>[1, 1]</i> to tensors <code>u</code> and <code>v</code>. Then, plot the tensors <code>u</code> and <code>v</code> as vectors using the function <code>plotVec</code>, compute their dot product as <code>w</code>, and print it.

In [None]:
# Calculate the dot product of u and v, and plot out two vectors
# Type your code here

<h2 id="Types_Shape">Types and Shape of 2-D tensors</h2>

Let us see how to convert a 2D list to a 2D tensor. First, let us create a 3X3 2D list. Then let us try to use <code>torch.tensor()</code> which we used for converting a 1D list to 1D tensor. Is it going to work?

In [None]:
# Convert 2D List to 2D Tensor

twoD_list = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
twoD_tensor = torch.tensor(twoD_list)
print("The New 2D Tensor: ", twoD_tensor)

Let us try <code><i>tensor_obj</i>.ndimension()</code> (<code>tensor_obj</code>: This can be any tensor object), <code><i>tensor_obj</i>.shape</code>, and <code><i>tensor_obj</i>.size()</code>

In [None]:
# Try tensor_obj.ndimension(), tensor_obj.shape, tensor_obj.size()

print("The dimension of twoD_tensor: ", twoD_tensor.ndimension())
print("The shape of twoD_tensor: ", twoD_tensor.shape)
print("The size of twoD_tensor: ", twoD_tensor.size())
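
<code>.shape</code> and <code>.size()</code> return the same <code>torch.Size</code> object. The sketch below also shows <code>size(dim)</code> for a single dimension, <code>dim()</code> as the shorter equivalent of <code>ndimension()</code>, and <code>numel()</code> for the total element count:

```python
import torch

twoD_tensor = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

print(twoD_tensor.shape == twoD_tensor.size())      # True
print(twoD_tensor.dim(), twoD_tensor.ndimension())  # 2 2
print(twoD_tensor.size(0), twoD_tensor.size(1))     # rows, columns: 3 3
print(twoD_tensor.numel())                          # total number of elements: 9
```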

Now, let us try converting the tensor to a numpy array and convert the numpy array back to a tensor.

In [None]:
# Convert tensor to numpy array; Convert numpy array to tensor

twoD_numpy = twoD_tensor.numpy()
print("Tensor -> Numpy Array:")
print("The numpy array after converting: ", twoD_numpy)
print("Type after converting: ", twoD_numpy.dtype)

print("================================================")

new_twoD_tensor = torch.from_numpy(twoD_numpy)
print("Numpy Array -> Tensor:")
print("The tensor after converting:", new_twoD_tensor)
print("Type after converting: ", new_twoD_tensor.dtype)

Now let us try to convert a Pandas DataFrame to a tensor. The process is the same as the 1D conversion: we obtain the numpy array via the attribute <code>values</code>, then use <code>torch.from_numpy()</code> to convert the values of the DataFrame to a tensor.

In [None]:
# Try to convert the pandas DataFrame to a tensor

df = pd.DataFrame({'a':[11,21,31],'b':[12,22,312]})

print("Pandas Dataframe to numpy: ", df.values)
print("Type BEFORE converting: ", df.values.dtype)

print("================================================")

new_tensor = torch.from_numpy(df.values)
print("Tensor AFTER converting: ", new_tensor)
print("Type AFTER converting: ", new_tensor.dtype)

<h3>Task 6 (1 point)</h3>
Convert the following Pandas Dataframe  to a tensor.

In [None]:
# Convert the pandas DataFrame to a tensor
df = pd.DataFrame({'A':[11, 33, 22],'B':[3, 3, 2]})
# Type your code here

<h2 id="Index_Slice">Indexing and Slicing of 2D-tensors</h2>
You can use square brackets to access the different elements of the tensor. Now, let us try to access the value in the 2nd row, 3rd column. Remember that indexes start at 0, so each index is 1 less than the row or column number counted from 1. There are two ways to access a particular value of a tensor.

In [None]:
# Use tensor_obj[row, column] and tensor_obj[row][column] to access certain position

tensor_example = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
print("What is the value on 2nd-row 3rd-column? ", tensor_example[1, 2])
print("What is the value on 2nd-row 3rd-column? ", tensor_example[1][2])

What if we want to get the value on both 1st-row 1st-column and 1st-row 2nd-column? You can also use slicing in a tensor.

In [None]:
# Use tensor_obj[begin_row_number: end_row_number, begin_column_number: end_column_number] 
# and tensor_obj[row][begin_column_number: end_column_number] to do the slicing

tensor_example = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
print("What is the value on 1st-row first two columns? ", tensor_example[0, 0:2])
print("What is the value on 1st-row first two columns? ", tensor_example[0][0:2])

We get the result as <code>tensor([11, 12])</code> successfully.

But we <b>can't</b> slice the rows and then pick one column with the code <code>tensor_obj[begin_row_number: end_row_number][begin_column_number: end_column_number]</code>. The reason is that the first bracket is applied to the tensor first, and its result is again a two-dimensional tensor. The second bracket then indexes the rows of that result, not the columns. Let us see an example.

In [None]:
# Give an idea on tensor_obj[number: number][number]

tensor_example = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
sliced_tensor_example = tensor_example[1:3]
print("1. Slicing step on tensor_example: ")
print("Result after tensor_example[1:3]: ", sliced_tensor_example)
print("Dimension after tensor_example[1:3]: ", sliced_tensor_example.ndimension())
print("================================================")
print("2. Pick an index on sliced_tensor_example: ")
print("Result after sliced_tensor_example[1]: ", sliced_tensor_example[1])
print("Dimension after sliced_tensor_example[1]: ", sliced_tensor_example[1].ndimension())
print("================================================")
print("3. Combine these steps together:")
print("Result: ", tensor_example[1:3][1])
print("Dimension: ", tensor_example[1:3][1].ndimension())

Notice that the results and dimensions in scenarios 2 and 3 above are the same. Both contain the 3rd row of <code>tensor_example</code>, not the last two values in the 3rd column.

So how can we get the elements in the 3rd column with the last two rows? 

In [None]:
# Use tensor_obj[begin_row_number: end_row_number, begin_column_number: end_column_number] 

tensor_example = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
print("What is the value on 3rd-column last two rows? ", tensor_example[1:3, 2])

Fortunately, the code <code>tensor_obj[begin_row_number: end_row_number, begin_column_number: end_column number]</code> still works.
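
A colon on its own selects everything along that dimension, which gives a compact way to pull out whole columns or sub-blocks. A sketch on the same <code>tensor_example</code>:

```python
import torch

tensor_example = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

third_column = tensor_example[:, 2]        # every row, column index 2
last_two_rows = tensor_example[1:3, :]     # rows 1 and 2, every column
bottom_right = tensor_example[1:3, 1:3]    # 2x2 block in the lower-right corner

print(third_column)          # values 13, 23, 33
print(last_two_rows.shape)   # torch.Size([2, 3])
print(bottom_right)          # values [[22, 23], [32, 33]]
```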

<h3>Task 7 (1 point)</h3>
Change the values of the last two rows of the second column to 0. Basically, change the values on <code>tensor_ques[1][1]</code> and <code>tensor_ques[2][1]</code> to 0.

In [None]:
# Use slice and index to change the values on the matrix tensor_ques.
tensor_ques = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
# Type your code here

<h2 id="Tensor_Op">Operations on 2D-tensors</h2> 

<h3>Tensor Addition</h3>
You can add tensors; the process is identical to matrix addition.

In [None]:
# Calculate [[1, 0], [0, 1]] + [[2, 1], [1, 2]]

X = torch.tensor([[1, 0],[0, 1]]) 
Y = torch.tensor([[2, 1],[1, 2]])
X_plus_Y = X + Y
print("The result of X + Y: ", X_plus_Y)

<h3> Scalar Multiplication </h3>

Multiplying a tensor by a scalar is identical to multiplying a matrix by a scalar. If you multiply the matrix <b>Y</b> by the scalar 2, you simply multiply every element in the matrix by 2.

In [None]:
# Calculate 2 * [[2, 1], [1, 2]]

Y = torch.tensor([[2, 1], [1, 2]]) 
two_Y = 2 * Y
print("The result of 2Y: ", two_Y)

<h3>Element-wise Product</h3>

Multiplication of two tensors with the <code>*</code> operator corresponds to an element-wise product, also called the Hadamard product. Consider matrices <b>X</b> and <b>Y</b> with the same size. The Hadamard product multiplies the elements at the same position in both matrices. The result is a new matrix with the same size as <b>X</b> and <b>Y</b>.

In [None]:
# Calculate [[1, 0], [0, 1]] * [[2, 1], [1, 2]]

X = torch.tensor([[1, 0], [0, 1]])
Y = torch.tensor([[2, 1], [1, 2]]) 
X_times_Y = X * Y
print("The result of X * Y: ", X_times_Y)

<h3>Matrix Multiplication </h3>

We can also apply matrix multiplication to two tensors. As you may know from linear algebra, the order of the factors matters in matrix multiplication: if the product <i>XY</i> is defined, <i>YX</i> is not necessarily defined (and, even when it is, it generally differs from <i>XY</i>). The number of columns of the matrix on the left must equal the number of rows of the matrix on the right.

First, let us create a tensor <code>A</code> with size 2X3. Then, let us create another tensor <code>B</code> with size 3X2. Since the number of columns of <code>A</code> equals the number of rows of <code>B</code>, we can perform the multiplication.

We use <code>torch.mm()</code> to compute the matrix product of two 2-D tensors.

In [None]:
# Calculate [[0, 1, 1], [1, 0, 1]] * [[1, 1], [1, 1], [-1, 1]]

A = torch.tensor([[0, 1, 1], [1, 0, 1]])
B = torch.tensor([[1, 1], [1, 1], [-1, 1]])
A_times_B = torch.mm(A,B)
print("The result of A * B: ", A_times_B)
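
To see that order matters, this sketch multiplies the same two tensors in both orders: <code>AB</code> is 2X2 while <code>BA</code> is 3X3. It also shows the <code>@</code> operator, which gives the same result as <code>torch.mm()</code> for 2-D tensors, and confirms that the element-wise <code>*</code> fails for these incompatible shapes:

```python
import torch

A = torch.tensor([[0, 1, 1], [1, 0, 1]])      # 2x3
B = torch.tensor([[1, 1], [1, 1], [-1, 1]])   # 3x2

AB = torch.mm(A, B)    # (2x3)(3x2) -> 2x2
BA = torch.mm(B, A)    # (3x2)(2x3) -> 3x3, a different matrix
AB_op = A @ B          # the @ operator gives the same result as torch.mm(A, B)

# Element-wise * needs matching (or broadcastable) shapes, so it fails here
try:
    A * B
    elementwise_ok = True
except RuntimeError:
    elementwise_ok = False

print(AB)
print(BA.shape, torch.equal(AB, AB_op), elementwise_ok)
```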

<h3>Task 8 (1 point)</h3>

Create two tensors: <code>X</code> should be a 2x3 matrix and <code>Y</code> should be a 3x4 matrix. Multiply <code>X</code> and <code>Y</code> and print the result.

In [None]:
# Create two tensors: X should be a 2x3 matrix and Y should be a 3x4 matrix. Multiply X and Y and print the result.
# Type your code here

<h1>Differentiation in PyTorch</h1> 

<h2 id="Derivative">Derivatives</h2>

Let us create the tensor <code>x</code> and set the parameter <code>requires_grad</code> to <code>True</code> because you are going to take the derivative with respect to it.

In [None]:
# Create a tensor x

x = torch.tensor(2.0, requires_grad = True)
print("The tensor x: ", x)

Then let us create a tensor according to the equation $ y=x^2 $.

In [None]:
# Create a tensor y according to y = x^2

y = x ** 2
print("The result of y = x^2: ", y)

Then let us take the derivative with respect to x at x = 2:

In [None]:
# Take the derivative and print out the derivative at the value x = 2

y.backward()
print("The derivative at x = 2: ", x.grad)

The preceding lines perform the following operation:

$\frac{\mathrm{dy(x)}}{\mathrm{dx}}=2x$

$\frac{\mathrm{dy(x=2)}}{\mathrm{dx}}=2(2)=4$
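
One way to sanity-check an autograd result is to compare it with a numerical estimate of the slope. The sketch below uses a central finite difference, (f(x + h) - f(x - h)) / (2h), with a small step size h chosen for illustration; double precision keeps the rounding error small:

```python
import torch

def f(x):
    return x ** 2

x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_slope = x.grad.item()                 # 2x at x = 2 gives 4

# Central finite difference; no_grad() because no graph is needed here
h = 1e-4
with torch.no_grad():
    numeric_slope = ((f(x + h) - f(x - h)) / (2 * h)).item()

print(autograd_slope, numeric_slope)           # both approximately 4
```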

Let us try to calculate the derivative for a more complicated function. 

In [None]:
# Calculate the y = x^2 + 2x + 1, then find the derivative 

x = torch.tensor(2.0, requires_grad = True)
y = x ** 2 + 2 * x + 1
print("The result of y = x^2 + 2x + 1: ", y)
y.backward()
print("The derivative at x = 2: ", x.grad)

The function is in the following form:
$y=x^{2}+2x+1$

The derivative is given by:


$\frac{\mathrm{dy(x)}}{\mathrm{dx}}=2x+2$

$\frac{\mathrm{dy(x=2)}}{\mathrm{dx}}=2(2)+2=6$
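
One practical detail: PyTorch <i>accumulates</i> gradients. Each call to <code>backward()</code> adds to whatever is already stored in <code>x.grad</code>, so if you rerun part of a cell you may see an unexpectedly large value. A sketch of the effect, and of resetting with <code>x.grad.zero_()</code>:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

y = x ** 2
y.backward()
first = x.grad.item()          # dy/dx = 2x = 4

# A second backward pass on a new graph ADDS to the stored gradient
y = x ** 2
y.backward()
accumulated = x.grad.item()    # 4 + 4 = 8

# Reset the stored gradient before computing a fresh one
x.grad.zero_()
y = x ** 2
y.backward()
reset = x.grad.item()          # back to 4

print(first, accumulated, reset)   # 4.0 8.0 4.0
```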

<h3>Task 9 (3 points)</h3>

Determine the derivative of $ y = 2x^3+x $ at $x=1$

In [None]:
# Calculate the derivative of y = 2x^3 + x at x = 1
# Type your code here

<h2 id="Partial_Derivative">Partial Derivatives</h2>

We can also calculate <b>Partial Derivatives</b>. Consider the function: $f(u,v)=vu+u^{2}$

Let us create the tensors <code>u</code>, <code>v</code>, and <code>f</code>:

In [None]:
# Calculate f(u, v) = v * u + u^2 at u = 1, v = 2

u = torch.tensor(1.0,requires_grad=True)
v = torch.tensor(2.0,requires_grad=True)
f = u * v + u ** 2
print("The result of v * u + u^2: ", f)

This is equivalent to the following: 

$f(u=1,v=2)=(2)(1)+1^{2}=3$

Now let us take the derivative with respect to <code>u</code>:

In [None]:
# Calculate the derivative with respect to u

f.backward()
print("The partial derivative with respect to u: ", u.grad)

The expression is given by:

$\frac{\mathrm{\partial f(u,v)}}{\partial {u}}=v+2u$

$\frac{\mathrm{\partial f(u=1,v=2)}}{\partial {u}}=2+2(1)=4$

Now, take the derivative with respect to <code>v</code>:

In [None]:
# Calculate the derivative with respect to v

print("The partial derivative with respect to v: ", v.grad)

The equation is given by:

$\frac{\mathrm{\partial f(u,v)}}{\partial {v}}=u$

$\frac{\mathrm{\partial f(u=1,v=2)}}{\partial {v}}=1$
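
Note that the single <code>f.backward()</code> call above filled in both <code>u.grad</code> and <code>v.grad</code>. If you prefer to get the partial derivatives back directly, <code>torch.autograd.grad()</code> returns them as a tuple without storing them in <code>.grad</code>; a sketch for the same function:

```python
import torch

u = torch.tensor(1.0, requires_grad=True)
v = torch.tensor(2.0, requires_grad=True)
f = u * v + u ** 2

# One gradient per input, in the same order as the inputs
df_du, df_dv = torch.autograd.grad(f, (u, v))

print(df_du, df_dv)      # tensor(4.) tensor(1.)
print(u.grad, v.grad)    # None None: autograd.grad does not populate .grad
```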

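To calculate the derivative at many values of $x$ at once, we can use the sum trick: sum the outputs to produce a scalar-valued function, then take its gradient. Because each term of the sum depends on only one element of <code>x</code>, the gradient of the sum with respect to each element equals the derivative of $x^2$ at that element, $2x$: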

In [None]:
# Calculate the derivative with multiple values

x = torch.linspace(-10, 10, 10, requires_grad = True)
Y = x ** 2
y = torch.sum(x ** 2)

We can plot the function and its derivative:

In [None]:
# Take the derivative with respect to multiple value. Plot out the function and its derivative

y.backward()

plt.plot(x.detach().numpy(), Y.detach().numpy(), label = 'function')
plt.plot(x.detach().numpy(), x.grad.numpy(), label = 'derivative')
plt.xlabel('x')
plt.legend()
plt.show()

The orange line shows the slope of the blue curve at each value of <code>x</code>; that is, it is the derivative of the blue curve, $2x$.
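
The sum trick is one way to call `backward()` on a non-scalar output; another is to pass a vector of ones as the `gradient` argument. Both compute the same vector-Jacobian product, which equals the elementwise derivative here only because each output $x_j^2$ depends on $x_j$ alone. A sketch comparing the two against the analytic derivative $2x$:

```python
import torch

x = torch.linspace(-10, 10, 10, requires_grad=True)
Y = x ** 2

# Option 1: the sum trick. d(sum_i x_i^2)/dx_j = 2 x_j
Y.sum().backward()
grad_sum = x.grad.clone()

# Option 2: pass a vector of ones as the upstream gradient to backward()
x.grad.zero_()
Y = x ** 2  # rebuild the graph; the first backward() freed it
Y.backward(gradient=torch.ones_like(Y))
grad_ones = x.grad.clone()

print(torch.allclose(grad_sum, 2 * x.detach()))  # True
print(torch.allclose(grad_sum, grad_ones))       # True
```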

The <b>ReLU</b> activation function is an essential function in neural networks. We can take its derivative as follows:

In [None]:
import torch.nn.functional as F

# Take the derivative of ReLU with respect to multiple values. Plot out the function and its derivative

x = torch.linspace(-3, 3, 100, requires_grad = True)
Y = F.relu(x)
y = Y.sum()
y.backward()
plt.plot(x.detach().numpy(), Y.detach().numpy(), label = 'function')
plt.plot(x.detach().numpy(), x.grad.numpy(), label = 'derivative')
plt.xlabel('x')
plt.legend()
plt.show()
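
Since $\mathrm{ReLU}(x)=\max(0,x)$, its derivative is 0 for $x<0$ and 1 for $x>0$, which is the step shown in the plot. A small sketch evaluating the gradient at a few points; it deliberately avoids $x=0$, where ReLU has no derivative and the value returned is a library convention:

```python
import torch
import torch.nn.functional as F

# Points chosen away from 0, where ReLU is not differentiable
x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
F.relu(x).sum().backward()

print(x.grad)  # tensor([0., 0., 1., 1.])
```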

<h3>Task 10 (3 points)</h3>

Determine the partial derivative with respect to $u$ of the function $f=uv+(uv)^2$ at $u=2$ and $v=1$.

In [None]:
# Calculate the derivative of f = u * v + (u * v) ** 2 at u = 2, v = 1
# Type the code here

<h1>Simple Dataset</h1> 

The following are the libraries we are going to use for this portion of the lab. <code>torch.manual_seed()</code> seeds PyTorch's random number generator so that it produces the same random numbers every time we run the code.

In [None]:
from torch.utils.data import Dataset
torch.manual_seed(1)

Let us try to create our own dataset class.

In [None]:
class ToySet(Dataset):
    """Toy Dataset Class"""
    
    def __init__(self, length = 100, transform = None):
        """Constructor with default values"""
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform
     
    def __getitem__(self, index):
        """Getter"""
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)     
        return sample
    
    def __len__(self):
        """Get the number of items"""
        return self.len

Now, let us create our <code>ToySet</code> object, and find out the value at index 1 and the length of the initial dataset:

In [None]:
# Create Dataset Object. 
# Find out the value on index 1. 
# Find out the length of Dataset Object.

our_dataset = ToySet()
print("Our ToySet object: ", our_dataset)
print("Value on index 1 of our ToySet object: ", our_dataset[1])
print("Our ToySet length: ", len(our_dataset))

We can use the same indexing convention as with a <code>list</code>, and apply the function <code>len</code> to the <code>ToySet</code> object. The indexing and length behavior are customized by defining <code>def &#95;&#95;getitem&#95;&#95;(self, index)</code> and <code>def &#95;&#95;len&#95;&#95;(self)</code>.
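
Because `__getitem__` simply forwards `index` to tensor indexing, negative indices and slices also work. Unlike a Python list, though, a slice returns a pair of stacked tensors rather than a list of `(x, y)` pairs. A sketch using a condensed copy of `ToySet` (transform support omitted for brevity):

```python
import torch
from torch.utils.data import Dataset

class ToySet(Dataset):
    """Condensed copy of the ToySet class above (without transforms)."""
    def __init__(self, length=100):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len

ds = ToySet(length=5)
print(len(ds))             # 5, via __len__
x, y = ds[-1]              # negative indices work because tensors support them
print(x, y)                # tensor([2., 2.]) tensor([1.])
xs, ys = ds[0:3]           # a slice returns stacked rows, not a list of samples
print(xs.shape, ys.shape)  # torch.Size([3, 2]) torch.Size([3, 1])
```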

Now, let us print out the first 3 elements and assign them to x and y:

In [None]:
# Use loop to print out first 3 elements in dataset
for i in range(3):
    x, y = our_dataset[i]
    print("index: ", i, '; x:', x, '; y:', y)

<h3>Task 11 (1 point)</h3>

Create a <code>ToySet</code> object with length <b>50</b>. Print out the length of your object.

In [None]:
# Create a new object with length 50, and print the length of object out.
# Type your code here

<h2 id="Transforms">Transforms</h2>

You can also create a class for transforming the data. In this case, we will try to add 1 to x and multiply y by 2:

In [None]:
class AddMult(object):
    """Transform class that adds to x and multiplies y by the given parameters."""
    
    def __init__(self, addx = 1, muly = 2):
        """Constructor"""
        self.addx = addx
        self.muly = muly
    
    def __call__(self, sample):
        """Executor"""
        x = sample[0]
        y = sample[1]
        x = x + self.addx
        y = y * self.muly
        sample = x, y
        return sample

Now, create a transform object:

In [None]:
# Create an AddMult transform object and a ToySet object

a_m = AddMult()
data_set = ToySet()

Assign the outputs of the original dataset to <code>x</code> and <code>y</code>. Then, apply the transform <code>a_m</code> to the dataset and output the values as <code>x_</code> and <code>y_</code>, respectively:

In [None]:
# Use loop to print out first 10 elements in dataset

for i in range(10):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = a_m(data_set[i])
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)

As a result, 1 has been added to <code>x</code> and <code>y</code> has been multiplied by 2: <i>[2, 2] + 1 = [3, 3]</i> and <i>[1] x 2 = [2]</i>.

We can also apply the transform object automatically every time we create a new <code>ToySet</code> object. Recall that the <code>ToySet</code> constructor has the parameter <code>transform = None</code>.
When we create a new object, we can pass the transform object to the parameter <code>transform</code>, as the following code demonstrates.

In [None]:
# Create a new data_set object with AddMult object as transform

cust_data_set = ToySet(transform = a_m)

This applies the <code>a_m</code> transform to every element of <code>cust_data_set</code> whenever that element is accessed. Let us print out the first 10 elements of <code>cust_data_set</code> to check that <code>a_m</code> has been applied:

In [None]:
# Use loop to print out first 10 elements in dataset

for i in range(10):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = cust_data_set[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)

The result is the same as the previous method.
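
Instead of comparing printed values by eye, the equivalence can be checked element by element with `torch.equal`. A self-contained sketch using condensed copies of the `ToySet` and `AddMult` classes from above:

```python
import torch
from torch.utils.data import Dataset

class ToySet(Dataset):
    """Condensed copy of ToySet from above."""
    def __init__(self, length=100, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len

class AddMult:
    """Condensed copy of AddMult from above."""
    def __init__(self, addx=1, muly=2):
        self.addx, self.muly = addx, muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly

a_m = AddMult()
data_set = ToySet()
cust_data_set = ToySet(transform=a_m)

# Applying the transform by hand vs. letting the dataset apply it in __getitem__
same = all(
    torch.equal(a_m(data_set[i])[0], cust_data_set[i][0])
    and torch.equal(a_m(data_set[i])[1], cust_data_set[i][1])
    for i in range(len(data_set))
)
print(same)  # True
```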

<h3>Task 12 (4 points)</h3>

Construct your own <code>MyAddMult</code> class that adds 2 to both x and y and then multiplies both x and y by 10. Apply it to a new <code>ToySet</code> object, and print out the first 3 elements of the transformed dataset.

In [None]:
# Construct your own MyAddMult transform by adding x and y with 2 and multiply both x and y by 10. 
# Apply MyAddMult on a new ToySet object. Print out the first three elements from the transformed dataset.
# Type your code here.

<h2 id="Compose">Compose</h2>

You can compose multiple transforms on the dataset object. First, import <code>transforms</code> from <code>torchvision</code>:

In [None]:
# Run the command below if you do not have torchvision installed
# !pip install torchvision

from torchvision import transforms

Then, create a new transform class that multiplies each of the elements by 100: 

In [None]:
class Mult(object):
    """Transform class that multiplies both x and y by the given parameter."""

    def __init__(self, mult = 100):
        """Constructor"""
        self.mult = mult
        
    def __call__(self, sample):
        """Executor"""
        x = sample[0]
        y = sample[1]
        x = x * self.mult
        y = y * self.mult
        sample = x, y
        return sample

Now let us try to combine the transforms <code>AddMult</code> and <code>Mult</code>

In [None]:
# Combine the AddMult() and Mult()

data_transform = transforms.Compose([AddMult(), Mult()])
print("The combination of transforms (Compose): ", data_transform)

The new <code>Compose</code> object applies the transforms sequentially, in list order: first <code>AddMult()</code>, then <code>Mult()</code> on its output. Now we can pass the new <code>Compose</code> object (the combination of <code>AddMult()</code> and <code>Mult()</code>) to the constructor when creating a <code>ToySet</code> object.

In [None]:
# Create a new ToySet object with compose object as transform

compose_data_set = ToySet(transform = data_transform)

Let us print out the first 3 elements in different <code>ToySet</code> datasets in order to compare the output after different transforms have been applied: 

In [None]:
# Use loop to print out first 3 elements in dataset

for i in range(3):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = cust_data_set[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)
    x_co, y_co = compose_data_set[i]
    print('Index: ', i, 'Compose Transformed x_co: ', x_co ,'Compose Transformed y_co: ',y_co)

Let us see what happened at index 0. The original value of <code>x</code> is <i>[2, 2]</i>, and the original value of <code>y</code> is <i>[1]</i>. Applying only <code>AddMult()</code> to the original dataset gives <code>x</code> = <i>[3, 3]</i> and <code>y</code> = <i>[2]</i>. After applying both <code>AddMult()</code> and <code>Mult()</code>, <code>x</code> is <i>[300, 300]</i> and <code>y</code> is <i>[200]</i>. The equivalent calculation is <i>x = ([2, 2] + 1) x 100 = [300, 300], y = ([1] x 2) x 100 = [200]</i>.
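
Conceptually, `Compose` feeds the output of each transform into the next one, in list order. The sketch below mimics this with `functools.reduce`; `compose` is a hypothetical helper written for illustration (not part of torchvision), and the two classes are condensed copies of the ones above:

```python
from functools import reduce
import torch

class AddMult:
    """Condensed copy of AddMult from above."""
    def __init__(self, addx=1, muly=2):
        self.addx, self.muly = addx, muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly

class Mult:
    """Condensed copy of Mult from above."""
    def __init__(self, mult=100):
        self.mult = mult

    def __call__(self, sample):
        x, y = sample
        return x * self.mult, y * self.mult

def compose(transform_list):
    """Hypothetical helper: apply transforms left to right, like transforms.Compose."""
    def apply(sample):
        return reduce(lambda s, t: t(s), transform_list, sample)
    return apply

sample = (2 * torch.ones(2), torch.ones(1))
x_co, y_co = compose([AddMult(), Mult()])(sample)
print(x_co, y_co)  # tensor([300., 300.]) tensor([200.])
```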

<h3>Task 13 (3 points)</h3>

Combine <code>Mult()</code> and <code>AddMult()</code> so that <code>Mult()</code> is executed first, and apply this combination to a new <code>ToySet</code> dataset. Print out the first 3 elements of the transformed dataset.

In [None]:
# Make a compose as Mult() execute first and then AddMult(). Apply the compose on ToySet dataset. Print out the first 3 elements in the transformed dataset.
# Type your code here.

<h2 id="Prebuilt_Dataset">Prebuilt Datasets</h2> 

In this section you will focus on the following libraries. Please note that the <code>torchvision</code> package consists of popular datasets, model architectures, and common image transformations for computer vision. 

In [None]:
import torchvision.transforms as transforms
import torchvision.datasets as dsets

We can import a prebuilt dataset. In this case, we use <code>MNIST</code> (a database of handwritten digits). The <code>transform</code> argument controls how each image is processed; later, you will pass other transform objects to it.

In [None]:
# Import the prebuilt dataset into variable dataset

dataset = dsets.MNIST(
    root = './data', 
    train = False, 
    download = True, 
    transform = transforms.ToTensor()
)

Each element of the dataset object contains a tuple. Let us see whether the first element in the dataset is a tuple and what is in it.

In [None]:
# Examine whether the elements in dataset MNIST are tuples, and what is in the tuple?

print("Type of the first element: ", type(dataset[0]))
print("The length of the tuple: ", len(dataset[0]))
print("The shape of the first element in the tuple: ", dataset[0][0].shape)
print("The type of the first element in the tuple", type(dataset[0][0]))
print("The second element in the tuple: ", dataset[0][1])
print("The type of the second element in the tuple: ", type(dataset[0][1]))
print("As a result, the first element of the dataset has the structure (image tensor of shape [1, 28, 28], label 7).")

As shown in the output, the first element of the tuple is a 3-D tensor of shape [1, 28, 28]. Since the first dimension (the single grayscale channel) has size 1, it is effectively a 28 x 28 image.<br>
The second element of the tuple is the label, i.e., the digit the image shows. Since the label of the first element is 7, the image should show a handwritten 7.
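
The size-1 channel dimension can be dropped with `squeeze`, which gives the same 28 x 28 layout that the `showData` helper obtains with `reshape`. A sketch that uses a placeholder tensor of zeros instead of a real MNIST image, so it runs without downloading the dataset:

```python
import torch

# Placeholder for an MNIST image tensor: 1 channel, 28 x 28 pixels
image = torch.zeros(1, 28, 28)

# Drop the size-1 channel dimension to get a plain 2-D image
image_2d = image.squeeze(0)
print(image.shape, image_2d.shape)  # torch.Size([1, 28, 28]) torch.Size([28, 28])
```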

The following helper function plots a data sample as an image, with its label as the title.

In [None]:
def showData(data_sample, shape = (28, 28)):
    """show data by diagram
    """
    plt.imshow(data_sample[0].numpy().reshape(shape), cmap='gray')
    plt.title('y = ' + str(data_sample[1]))

Let us plot the first element in the dataset:

In [None]:
# Plot the first element in the dataset
showData(dataset[0])

Plot the second sample:   

In [None]:
# Plot the second element in the dataset
showData(dataset[1])

<h2 id="Torchvision"> Torchvision Transforms  </h2> 

We can apply some image transform functions on the MNIST dataset.

As an example, the images in the MNIST dataset can be cropped and converted to a tensor. We can use <code>transforms.Compose</code>, which we learned about in the previous section, to combine the two transform functions.

In [None]:
# Combine two transforms: crop and convert to tensor. Apply the compose to MNIST dataset

croptensor_data_transform = transforms.Compose([transforms.CenterCrop(20), transforms.ToTensor()])
dataset = dsets.MNIST(root = './data', train = False, download = True, transform = croptensor_data_transform)
print("The shape of the first element in the first tuple: ", dataset[0][0].shape)

We can see the image is now 20 x 20 instead of 28 x 28.
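
For a 28 x 28 image and a crop size of 20, center cropping keeps the middle 20 x 20 pixels and removes (28 - 20) / 2 = 4 pixels from each border. A minimal sketch of this idea with plain tensor slicing; `center_crop` is a hypothetical helper, and torchvision's own implementation may round the offset differently when the size difference is odd:

```python
import torch

def center_crop(img, size):
    """Hypothetical helper: crop a (C, H, W) tensor to (C, size, size) around its center."""
    h, w = img.shape[-2:]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[..., top:top + size, left:left + size]

img = torch.arange(28 * 28, dtype=torch.float32).reshape(1, 28, 28)
cropped = center_crop(img, 20)
print(cropped.shape)  # torch.Size([1, 20, 20])

# (28 - 20) // 2 = 4 pixels are removed from each side
print(torch.equal(cropped, img[:, 4:24, 4:24]))  # True
```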

Let us plot the first image again. You should notice that the image is cropped a bit.

In [None]:
# Plot the first element in the dataset
showData(dataset[0],shape = (20, 20))

In [None]:
# Plot the second element in the dataset
showData(dataset[1],shape = (20, 20))

In the example below, we horizontally flip the image and then convert it to a tensor, using <code>transforms.Compose()</code> to combine these two transform functions. Then we plot the flipped image.

In [None]:
# Construct the compose. Apply it on MNIST dataset. Plot the image out.

fliptensor_data_transform = transforms.Compose([transforms.RandomHorizontalFlip(p = 1),transforms.ToTensor()])
dataset = dsets.MNIST(root = './data', train = False, download = True, transform = fliptensor_data_transform)
showData(dataset[1])
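
With `p = 1`, `RandomHorizontalFlip` always flips, so it mirrors the image left to right. On a `(C, H, W)` tensor this amounts to reversing the last (width) dimension. A small sketch with `torch.flip` on a toy 2 x 3 "image":

```python
import torch

# Shape (1, 2, 3): one channel, 2 rows, 3 columns
img = torch.tensor([[[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]]])

# Horizontal flip: reverse the order of the columns (last dimension)
h_flipped = torch.flip(img, dims=[-1])
print(h_flipped)  # rows become [3, 2, 1] and [6, 5, 4]
```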

<h3>Task 14 (3 points)</h3>

Create a compose of <code>RandomVerticalFlip</code> (which flips the image vertically), a horizontal flip, and conversion to a tensor. Apply the compose to the image. Use <code>showData()</code> to plot the second image (the image with the handwritten digit <b>2</b>).

In [None]:
# Combine vertical flip, horizontal flip and convert to tensor as a compose. Apply the compose on image. Then plot the image
# Type your code here

<h2 id="lr-1d-prediction">Linear Regression Prediction</h2>
<p>In this section, we will see how to make a prediction in several different ways by using PyTorch.</p>

Let us create the following expressions:

$b=-1,w=2$

$\hat{y}=-1+2x$

First, let's define the parameters:

In [None]:
# Define w = 2 and b = -1 for y = wx + b
w = torch.tensor(2.0, requires_grad = True)
b = torch.tensor(-1.0, requires_grad = True)

Then, define the function <code>forward(x)</code> to make the prediction: 

In [None]:
def forward(x):
    yhat = w * x + b
    return yhat

Let's make the following prediction at <i>x = 1</i>

$\hat{y}=-1+2x$

$\hat{y}=-1+2(1)$

In [None]:
# Predict y = 2x - 1 at x = 1
x = torch.tensor([[1.0]])
yhat = forward(x)
print("The prediction: ", yhat)

Now, let us make predictions for multiple inputs (i.e., <code>x</code> contains <code>1.0</code> and <code>2.0</code>). First, construct the <code>x</code> tensor and check its shape:

In [None]:
# Create x Tensor and check the shape of x tensor
x = torch.tensor([[1.0], [2.0]])
print("The shape of x: ", x.shape)

Now make the prediction: 

In [None]:
# Make the prediction of y = 2x - 1 at x = [1, 2]
yhat = forward(x)
print("The prediction: ", yhat)
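
Because `w` and `b` are scalars (0-dimensional tensors), `w * x + b` is applied elementwise through broadcasting, so the output has the same shape as `x`: one prediction per row. A standalone sketch of the same computation (the parameters are created without `requires_grad` here, since no gradients are needed):

```python
import torch

w = torch.tensor(2.0)
b = torch.tensor(-1.0)

def forward(x):
    return w * x + b

x = torch.tensor([[1.0], [2.0]])  # shape (2, 1): two samples, one feature each
yhat = forward(x)
print(yhat.shape)     # torch.Size([2, 1])
print(yhat.tolist())  # [[1.0], [3.0]], i.e. 2(1) - 1 and 2(2) - 1
```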

<h3 id="task-15">Task 15 (3 points)</h3>
Make a prediction of the following <code>x</code> tensor using the <code>w</code> and <code>b</code> from above. You must return the value from the <code>lab1_task15</code> function.

In [None]:
# Task 15
x = torch.tensor([[1.0], [2.0], [3.0]])
def lab1_task15(x):
    # Type your code here to return the prediction

<h3 id="Linear">Class Linear</h3>

We can also use the <code>Linear</code> class to build more complex models. Let's first import the module.

In [None]:
from torch.nn import Linear

The parameters of a <code>Linear</code> object are randomly initialized, so we set the random seed first. Seeding the random number generator with a fixed value makes the results reproducible.
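
A quick sketch of what seeding buys us: two `Linear` layers created right after the same `torch.manual_seed` call start with identical weights and biases.

```python
import torch
from torch.nn import Linear

torch.manual_seed(1)
first = Linear(in_features=1, out_features=1)

torch.manual_seed(1)
second = Linear(in_features=1, out_features=1)

# Same seed -> same random initial weight and bias
print(torch.equal(first.weight, second.weight), torch.equal(first.bias, second.bias))  # True True
```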

In [None]:
torch.manual_seed(1)

Let us create the linear object by using the constructor. The parameters are randomly created. Let us print out to see what <i>w</i> and <i>b</i> are.

In [None]:
# Create Linear Regression Model, and print out the parameters
linear = Linear(in_features=1, out_features=1, bias=True)
print("Parameters w and b: ", list(linear.parameters()))

This is equivalent to the following expression: 

$b=-0.4414, w=0.5153$

$\hat{y}=-0.4414+0.5153x$
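
Internally, `Linear` stores `weight` with shape `(out_features, in_features)` and `bias` with shape `(out_features,)`, and computes $xW^T + b$. A sketch that checks this against a manual computation; the exact parameter values depend on the seed and PyTorch version, so it compares results instead of printing them:

```python
import torch
from torch.nn import Linear

torch.manual_seed(1)
linear = Linear(in_features=1, out_features=1, bias=True)

x = torch.tensor([[1.0], [2.0]])
# nn.Linear computes y = x W^T + b; weight has shape (out_features, in_features)
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(linear(x), manual))       # True
print(linear.weight.shape, linear.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
```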

Now let us make a single prediction at <i>x = [[1.0]]</i>.

In [None]:
# Make the prediction at x = [[1.0]]
x = torch.tensor([[1.0]])
yhat = linear(x)
print("The prediction: ", yhat)

Similarly, you can make multiple predictions for <code>x</code>. Use model <code>linear(x)</code> to predict the result.

In [None]:
# Create the prediction using linear model
x = torch.tensor([[1.0], [2.0]])
yhat = linear(x)
print("The prediction: ", yhat)

<h3 id="task-16">Task 16 (2 points)</h3>
Make a prediction of the following <code>x</code> tensor using the linear regression model <code>linear</code>. You must return the value from the <code>lab1_task16</code> function.

In [None]:
# Task 16
x = torch.tensor([[1.0],[2.0],[3.0]])
def lab1_task16(x):
    # Type your code here to return the prediction


<h3 id="Cust">Custom Modules</h3>

Now, let's build a custom module. We can make more complex models by using this method later on. 

First, import the following library.

In [None]:
from torch import nn

Now, let us define the class: 

In [None]:
class LinearRegression(nn.Module):
    """ Customized Linear Regression Class"""
        
    def __init__(self, input_size, output_size):
        """Constructor"""
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)
    
    def forward(self, x):
        """Prediction function"""
        out = self.linear(x)
        return out

Create an object by using the constructor. Print out the parameters we get and the model.

In [None]:
# Create the linear regression model. Print out the parameters.
linear_regression = LinearRegression(1, 1)
print("The parameters: ", list(linear_regression.parameters()))
print("Linear model: ", linear_regression.linear)
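
Assigning an `nn.Linear` to `self.linear` registers its weight and bias as parameters of the custom module, which is why `parameters()` returns them; calling the module object runs its `forward` method. A condensed sketch:

```python
import torch
from torch import nn

class LinearRegression(nn.Module):
    """Condensed copy of the custom module above."""
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

model = LinearRegression(1, 1)

# Parameters of submodules are registered automatically, with dotted names
names = [name for name, _ in model.named_parameters()]
print(names)  # ['linear.weight', 'linear.bias']

# Calling model(x) runs forward(x)
x = torch.tensor([[1.0], [2.0]])
print(torch.equal(model(x), model.linear(x)))  # True
```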

Let us try to make a prediction of a single input sample.

In [None]:
# Let's try the customized linear regression model with a single input
x = torch.tensor([[1.0]])
yhat = linear_regression(x)
print("The prediction: ", yhat)

Now, let us try the model with multiple inputs.

In [None]:
# Let's try the customized linear regression model with multiple inputs
x = torch.tensor([[1.0], [2.0]])
yhat = linear_regression(x)
print("The prediction: ", yhat)

<h3 id="task-17">Task 17 (2 points)</h3>

Create an object from the <code>LinearRegression</code> class we created before and make a prediction by using the following tensor <code>x</code>. You must return the value from the <code>lab1_task17</code> function. 

In [None]:
# Task 17: Use the LinearRegression class to create a model and make a prediction of the following tensor.
x = torch.tensor([[1.0], [2.0], [3.0]])
def lab1_task17(x):
    # Type your code here to return the prediction

<h2 id="lr-1d-training-one-param">Training One Parameter</h2>
<p>In this section, you will train a model with PyTorch by using some data that you will create. The model only has one parameter: the slope.</p>
The class <code>PlotDiagram</code> helps us to visualize the data space and the parameter space during training. This class has nothing to do with PyTorch, and is included here as supplemental code for the lab.

In [None]:
class PlotDiagram():
    """The class for plotting diagrams"""
    
    def __init__(self, X, Y, w, stop, go = False):
        """Constructor"""
        start = w.data
        self.error = []
        self.parameter = []
        self.X = X.numpy()
        self.Y = Y.numpy()
        self.parameter_values = torch.arange(start, stop)
        self.Loss_function = [criterion(forward_without_bias(X), Y).item() for w.data in self.parameter_values]
        w.data = start
        
    def __call__(self, Yhat, w, error, n):
        """Executor"""
        self.error.append(error)
        self.parameter.append(w.data)
        plt.subplot(212)
        plt.plot(self.X, Yhat.detach().numpy())
        plt.plot(self.X, self.Y,'ro')
        plt.xlabel("x")
        plt.ylim(-20, 20)
        plt.subplot(211)
        plt.title("Loss vs. w (top), Data Space and Estimated Line (bottom), Iteration " + str(n))
        plt.plot(self.parameter_values.numpy(), self.Loss_function)
        plt.plot(self.parameter, self.error, 'ro')
        plt.xlabel("w")
        plt.figure()
    
    def __del__(self):
        """Destructor"""
        plt.close('all')

Generate values from -3 to 3 that create a line with a slope of -3. This is the line you will estimate.

In [None]:
# Create the f(X) with a slope of -3
X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = -3 * X

Let us plot the line.

In [None]:
# Plot the line
plt.plot(X.numpy(), f.numpy(), label = 'f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

Let's add some noise to the data in order to simulate real data. Use <code>torch.randn(X.size())</code> to generate Gaussian noise that is the same size as <code>X</code> and has a standard deviation of 0.1.

In [None]:
# Add some noise to f(X) and save it in Y
Y = f + 0.1 * torch.randn(X.size())

Plot the <code>Y</code>: 

In [None]:
# Plot the data points
plt.plot(X.numpy(), Y.numpy(), 'rx', label = 'Y')
plt.plot(X.numpy(), f.numpy(), label = 'f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

<h3 id="Model_Cost">Create the Model and Cost Function (Total Loss)</h3>
In this section, let us create the model and the cost function (total loss) we are going to use to train the model and evaluate the result.
First, let's define the <code>forward_without_bias</code> function $\hat{y}=wx$. (We will add the bias soon.)


In [None]:
def forward_without_bias(x):
    """Forward function for prediction"""
    return w * x

Define the cost or criterion function using MSE (Mean Square Error): 

In [None]:
def criterion(yhat, y):
    """Create the MSE (Mean Square Error) function to evaluate the result."""
    return torch.mean((yhat - y) ** 2)
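
As a sanity check, the hand-written criterion can be compared with PyTorch's built-in `torch.nn.functional.mse_loss`, whose default reduction is also the mean. A sketch with small made-up tensors whose MSE is easy to verify by hand:

```python
import torch
import torch.nn.functional as F

def criterion(yhat, y):
    """Same MSE as above: mean of the squared residuals."""
    return torch.mean((yhat - y) ** 2)

yhat = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[1.0], [0.0], [5.0]])

# Residuals are 0, 2, -2, so the MSE is (0 + 4 + 4) / 3 = 8/3
print(criterion(yhat, y))                                       # tensor(2.6667)
print(torch.allclose(criterion(yhat, y), F.mse_loss(yhat, y)))  # True
```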

Define the learning rate <code>learning_rate</code> and an empty list <code>LOSS</code> to record the loss for each iteration:   

In [None]:
# Create Learning Rate and an empty list to record the loss for each iteration
learning_rate = 0.1
LOSS = []

Now, we create a model parameter by setting the argument <code>requires_grad</code> to <code> True</code> because the system must learn it.

In [None]:
w = torch.tensor(-10.0, requires_grad = True)

Create a <code>PlotDiagram</code> object to visualize the data space and the parameter space for each iteration during training:

In [None]:
gradient_plot = PlotDiagram(X, Y, w, stop = 5)

<h3 id="Train">Train the Model</h3>
Let us define a function for training the model. The steps are described in the comments.

In [None]:
def train_model(iter):
    """Function for training the model"""
    for epoch in range (iter):
        
        # make the prediction
        Yhat = forward_without_bias(X)
        
        # calculate the loss in each iteration
        loss = criterion(Yhat,Y)
        
        # plot the diagram for us to have a better idea
        gradient_plot(Yhat, w, loss.item(), epoch)
        
        # store the loss in the global LOSS list
        LOSS.append(loss.item())
        
        # backward pass: compute gradient of the loss with respect to all the learnable parameters
        loss.backward()
        
        # update parameters
        w.data = w.data - learning_rate * w.grad.data
        
        # zero the gradients before running the backward pass
        w.grad.data.zero_()
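
For the one-parameter model, the gradient that `loss.backward()` computes has a closed form: with $L(w)=\frac{1}{N}\sum_i (wx_i-y_i)^2$, we get $\frac{\partial L}{\partial w}=\frac{1}{N}\sum_i 2(wx_i-y_i)x_i$. A standalone sketch comparing the two and taking one update step; the data are regenerated with an assumed seed of 0, so they differ slightly from the notebook's:

```python
import torch

torch.manual_seed(0)  # assumed seed for this sketch only
X = torch.arange(-3, 3, 0.1).view(-1, 1)
Y = -3 * X + 0.1 * torch.randn(X.size())

w = torch.tensor(-10.0, requires_grad=True)
loss = torch.mean((w * X - Y) ** 2)
loss.backward()

# Analytic gradient of the MSE with respect to w: mean(2 * (w x - y) * x)
manual_grad = torch.mean(2 * (w.detach() * X - Y) * X)
print(torch.allclose(w.grad, manual_grad, atol=1e-4))  # True

# One gradient-descent step with learning rate 0.1, as in train_model
w_next = w.detach() - 0.1 * w.grad
print(w_next)  # moves from -10 toward the true slope of -3
```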

Let us try to run 4 iterations of gradient descent:  

In [None]:
# 4 iterations for training the model.
train_model(4)

Plot the cost for each iteration: 

In [None]:
# Plot the loss for each iteration
plt.plot(LOSS)
plt.tight_layout()
plt.xlabel("Epoch/Iterations")
plt.ylabel("Cost")

<h3 id="task-18">Task 18 (3 points)</h3>
Create a new learnable parameter <code>w</code> with an initial value of -15.0, and an empty list <code>LOSS2</code>. Then write your own <code>lab1_task18_train_model</code> function with loss list <code>LOSS2</code>. Run your training model with 4 iterations. Plot an overlay of the list <code>LOSS2</code> and <code>LOSS</code>.

In [None]:
# Task 18
w = # Type your code here
LOSS2 = # Type your code here
gradient_plot1 = PlotDiagram(X, Y, w, stop = 15)

def lab1_task18_train_model(iter):
    # Type your lab1_task18_train_model function definition here

lab1_task18_train_model(4)

# Type your code to plot the results using plt

<h2 id="lr-1d-training-two-params">Training Two Parameters</h2>

We'll need the following library for this section.

In [None]:
from mpl_toolkits import mplot3d

The class <code>PlotErrorSurfaces</code> is just to help you visualize the data space and the parameter space during training. This class has nothing to do with PyTorch, and is included here as supplemental code for the lab.

In [None]:
class PlotErrorSurfaces(object):
    
    def __init__(self, w_range, b_range, X, Y, n_samples = 30, go = True):
        W = np.linspace(-w_range, w_range, n_samples)
        B = np.linspace(-b_range, b_range, n_samples)
        w, b = np.meshgrid(W, B)    
        Z = np.zeros((n_samples, n_samples))
        count1 = 0
        self.y = Y.numpy()
        self.x = X.numpy()
        for w1, b1 in zip(w, b):
            count2 = 0
            for w2, b2 in zip(w1, b1):
                Z[count1, count2] = np.mean((self.y - (w2 * self.x + b2)) ** 2)
                count2 += 1
            count1 += 1
        self.Z = Z
        self.w = w
        self.b = b
        self.W = []
        self.B = []
        self.LOSS = []
        self.n = 0
        if go == True:
            plt.figure()
            plt.figure(figsize = (7.5, 5))
            plt.axes(projection='3d').plot_surface(self.w, self.b, self.Z, rstride = 1, cstride = 1,cmap = 'viridis', edgecolor = 'none')
            plt.title('Cost/Total Loss Surface')
            plt.xlabel('w')
            plt.ylabel('b')
            plt.show()
            plt.figure()
            plt.title('Cost/Total Loss Surface Contour')
            plt.xlabel('w')
            plt.ylabel('b')
            plt.contour(self.w, self.b, self.Z)
            plt.show()
    
    def set_para_loss(self, W, B, loss):
        self.n = self.n + 1
        self.W.append(W)
        self.B.append(B)
        self.LOSS.append(loss)
    
    def set_para_loss_model(self, model, loss):
        self.n = self.n + 1
        self.LOSS.append(loss)
        self.W.append(list(model.parameters())[0].item())
        self.B.append(list(model.parameters())[1].item())

    
    def final_plot(self): 
        ax = plt.axes(projection = '3d')
        ax.plot_wireframe(self.w, self.b, self.Z)
        ax.scatter(self.W,self.B, self.LOSS, c = 'r', marker = 'x', s = 200, alpha = 1)
        plt.figure()
        plt.contour(self.w,self.b, self.Z)
        plt.scatter(self.W, self.B, c = 'r', marker = 'x')
        plt.xlabel('w')
        plt.ylabel('b')
        plt.show()
    
    def plot_ps(self):
        plt.subplot(121)
        plt.plot(self.x, self.y, 'ro', label="training points")
        plt.plot(self.x, self.W[-1] * self.x + self.B[-1], label = "estimated line")
        plt.xlabel('x')
        plt.ylabel('y')
        plt.ylim((-10, 15))
        plt.title('Data Space Iteration: ' + str(self.n))

        plt.subplot(122)
        plt.contour(self.w, self.b, self.Z)
        plt.scatter(self.W, self.B, c = 'r', marker = 'x')
        plt.title('Total Loss Surface Contour Iteration: ' + str(self.n))
        plt.xlabel('w')
        plt.ylabel('b')
        plt.show()
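
The nested loops that fill `Z` in `PlotErrorSurfaces` compute the mean squared error of the line $wx+b$ at every grid point. The same surface can be computed with NumPy broadcasting. In the sketch below, `loss_surface` is a hypothetical helper, and noise-free data are used so that the minimum lands exactly on the true parameters $w=1$, $b=-1$:

```python
import numpy as np

def loss_surface(w_range, b_range, x, y, n_samples=30):
    """Hypothetical helper: MSE of y = w x + b over an n_samples x n_samples grid of (w, b)."""
    W = np.linspace(-w_range, w_range, n_samples)
    B = np.linspace(-b_range, b_range, n_samples)
    w, b = np.meshgrid(W, B)                 # each of shape (n_samples, n_samples)
    x = x.reshape(1, 1, -1)                  # put the data points along a third axis
    y = y.reshape(1, 1, -1)
    residuals = y - (w[..., None] * x + b[..., None])
    return w, b, np.mean(residuals ** 2, axis=-1)

x = np.arange(-3, 3, 0.1)
y = 1 * x - 1
w, b, Z = loss_surface(15, 15, x, y, n_samples=31)  # grid step of 1.0

# The grid point with the smallest loss is the true (w, b)
i, j = np.unravel_index(np.argmin(Z), Z.shape)
print(w[i, j], b[i, j])  # 1.0 -1.0
```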

<h3 id="Makeup_Data">Make Some Data</h3>
Start with generating values from -3 to 3 that create a line with a slope of 1 and a bias of -1. This is the line that you need to estimate.

In [None]:
# Create a 1-D tensor of size (end-start)/step, where start=-3, end=3, and step=0.1,
# with the values from the interval [start, end) taken with common difference step 
# beginning from start, and then reshape it to create a 2-D tensor.
X = torch.arange(-3, 3, 0.1).view(-1, 1)

# Create f(X) with a slope of 1 and a bias of -1
f = 1 * X - 1

Now, add some noise to the data:

In [None]:
# Add noise
Y = f + 0.1 * torch.randn(X.size())

Plot the line and <code>Y</code> with noise:

In [None]:
# Plot the line and the points with noise
plt.plot(X.numpy(), Y.numpy(), 'rx', label = 'y')
plt.plot(X.numpy(), f.numpy(), label = 'f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

<h3 id="Model_Cost">Create the Model and Cost Function (Total Loss)</h3>
Create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training:

In [None]:
# Create PlotErrorSurfaces object for viewing the data
error_surfaces = PlotErrorSurfaces(15, 15, X, Y, 30)

<h3 id="Train">Train the Model</h3>

Create model parameters <code>w</code> and <code>b</code> by setting the argument <code>requires_grad</code> to True because we must learn them from the data.

In [None]:
# Define the parameters w, b for y = wx + b
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)

Set the learning rate to 0.1 and create an empty list <code>LOSS</code> for storing the loss for each iteration.

In [None]:
# Define learning rate and create an empty list for containing the loss for each iteration.
learning_rate = 0.1
LOSS = []

Define <code>train_model</code> function to train the model.

In [None]:
def train_model(iter):
    
    # Loop
    for epoch in range(iter):
        
        # make a prediction
        Yhat = forward(X)
        
        # calculate the loss 
        loss = criterion(Yhat, Y)

        # Section for plotting
        error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        if epoch % 3 == 0:
            error_surfaces.plot_ps()
            
        # store the loss in the list LOSS
        LOSS.append(loss.item())
        
        # backward pass: compute gradient of the loss with respect to all the learnable parameters
        loss.backward()
        
        # update parameters slope and bias
        w.data = w.data - learning_rate * w.grad.data
        b.data = b.data - learning_rate * b.grad.data
        
        # zero the gradients before running the backward pass
        w.grad.data.zero_()
        b.grad.data.zero_()
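
With two parameters, the partial derivatives of $L=\frac{1}{N}\sum_i r_i^2$ with residual $r_i = wx_i + b - y_i$ are $\frac{\partial L}{\partial w}=\frac{1}{N}\sum_i 2r_ix_i$ and $\frac{\partial L}{\partial b}=\frac{1}{N}\sum_i 2r_i$. A standalone sketch checking them against autograd at the starting point $w=-15$, $b=-10$; the data are regenerated with an assumed seed of 0:

```python
import torch

torch.manual_seed(0)  # assumed seed for this sketch only
X = torch.arange(-3, 3, 0.1).view(-1, 1)
Y = 1 * X - 1 + 0.1 * torch.randn(X.size())

w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
loss = torch.mean((w * X + b - Y) ** 2)
loss.backward()

# Analytic partial derivatives of the MSE
residual = (w * X + b - Y).detach()
dL_dw = torch.mean(2 * residual * X)
dL_db = torch.mean(2 * residual)
print(torch.allclose(w.grad, dL_dw, atol=1e-4), torch.allclose(b.grad, dL_db, atol=1e-4))  # True True
```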

Run 15 iterations of gradient descent.

In [None]:
# Train the model with 15 iterations
train_model(15)

Plot total loss/cost surface with loss values for different parameters in red:

In [None]:
# Plot out the Loss Result
error_surfaces.final_plot()
plt.plot(LOSS)
plt.tight_layout()
plt.xlabel("Epoch/Iterations")
plt.ylabel("Cost")

<h3 id="task-19">Task 19 (4 points)</h3>

Using a learning rate of 0.2 and the following parameters, fill in the code for the function <code>lab1_task19_train_model(iter)</code>. As the code snippet below shows, we want to run 15 iterations of your model. Then plot <code>LOSS</code> and <code>LOSS2</code>.

In [None]:
# Task 19: train and plot the result with learning_rate = 0.2 and the following parameters
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
learning_rate = 0.2
LOSS2 = []

def lab1_task19_train_model(iter):
    # Type your code here

lab1_task19_train_model(15)

# Type your code here to plot the LOSS and LOSS2 in order to compare the Total Loss.

<h2 id="#lr-1d-training-two-params-sgd">Stochastic Gradient Descent (SGD)</h2>

<p>In this section, you will practice training a model by using stochastic gradient descent.</p>

<h3 id="Makeup_Data">Make Some Data</h3>
Set random seed: 


In [None]:
torch.manual_seed(1)

Generate values from <i>-3</i> to <i>3</i> that create a line with a slope of <i>1</i> and a bias of <i>-1</i>. This is the line that you need to estimate. Add some noise to the data:

In [None]:
# Setup the actual data and simulated data
X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = 1 * X - 1
Y = f + 0.1 * torch.randn(X.size())

Plot the data:

In [None]:
plt.plot(X.numpy(), Y.numpy(), 'rx', label = 'y')
plt.plot(X.numpy(), f.numpy(), label = 'f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

<h3 id="Model_Cost">Create the Model and Cost Function (Total Loss)</h3>
Create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training:

In [None]:
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30)

<h3 id="BGD">Train the Model: Batch Gradient Descent</h3>
Create the model parameters <code>w</code> and <code>b</code> with the argument <code>requires_grad</code> set to True, because the model must learn them.

In [None]:
# Define the parameters w, b for y = wx + b
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
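To see what `requires_grad=True` buys you, here is a minimal sketch comparing the gradients autograd computes with gradients derived by hand. It assumes `forward(x)` returns `w * x + b` and `criterion` is the mean squared error (the lab defines both earlier; they are redefined here only so the snippet runs on its own):

```python
import torch

torch.manual_seed(1)
X = torch.arange(-3, 3, 0.1).view(-1, 1)
Y = 1 * X - 1 + 0.1 * torch.randn(X.size())

w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)

def forward(x):
    # Assumed model (defined earlier in the lab): y = w * x + b
    return w * x + b

def criterion(yhat, y):
    # Assumed cost (defined earlier in the lab): mean squared error
    return torch.mean((yhat - y) ** 2)

# Autograd: backward() fills w.grad and b.grad
loss = criterion(forward(X), Y)
loss.backward()

# By hand: d/dw mean(r^2) = 2 * mean(r * x), d/db mean(r^2) = 2 * mean(r)
with torch.no_grad():
    residual = forward(X) - Y
    dw = 2 * torch.mean(residual * X)
    db = 2 * torch.mean(residual)

print(w.grad.item(), dw.item())
print(b.grad.item(), db.item())
```

The two values on each line should agree up to floating-point rounding.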

Set the learning rate to 0.1 and create an empty list <code>LOSS_BGD</code> for storing the loss for each iteration.

In [None]:
# Define learning rate and create an empty list for containing the loss for each iteration.
learning_rate = 0.1
LOSS_BGD = []

Define the <code>train_model</code> function to train the model.

In [None]:
def train_model(iter):
    
    # Loop
    for epoch in range(iter):
        
        # make a prediction
        Yhat = forward(X)
        
        # calculate the loss 
        loss = criterion(Yhat, Y)

        # Section for plotting
        error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        error_surfaces.plot_ps()
            
        # store the loss in the list LOSS_BGD
        LOSS_BGD.append(loss.item())
        
        # backward pass: compute gradient of the loss with respect to all the learnable parameters
        loss.backward()
        
        # update parameters slope and bias
        w.data = w.data - learning_rate * w.grad.data
        b.data = b.data - learning_rate * b.grad.data
        
        # zero the gradients before running the backward pass
        w.grad.data.zero_()
        b.grad.data.zero_()

Run 10 epochs of batch gradient descent.

In [None]:
train_model(10)

<h3 id="SGD">Train the Model: Stochastic Gradient Descent</h3>

Create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training.

In [None]:
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30, go = False)

Define the <code>train_model_SGD</code> function for training the model.

In [None]:
LOSS_SGD = []
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)

def train_model_SGD(iter):
    
    for epoch in range(iter):
        
        # SGD only approximates the true total loss/cost, so here we calculate the true loss/cost and store it
        Yhat = forward(X)

        # store the loss 
        LOSS_SGD.append(criterion(Yhat, Y).tolist())
        
        for x, y in zip(X, Y):
            
            # make a prediction
            yhat = forward(x)
        
            # calculate the loss 
            loss = criterion(yhat, y)

            # Section for plotting
            error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        
            # backward pass: compute gradient of the loss with respect to all the learnable parameters
            loss.backward()
        
            # update parameters slope and bias
            w.data = w.data - learning_rate * w.grad.data
            b.data = b.data - learning_rate * b.grad.data

            # zero the gradients before running the backward pass
            w.grad.data.zero_()
            b.grad.data.zero_()
            
        #plot surface and data space after each epoch    
        error_surfaces.plot_ps()

Run 10 epochs of stochastic gradient descent.

In [None]:
train_model_SGD(10)

Compare the losses of Batch Gradient Descent (BGD) and Stochastic Gradient Descent (SGD).

In [None]:
plt.plot(LOSS_BGD,label = "Batch Gradient Descent")
plt.plot(LOSS_SGD,label = "Stochastic Gradient Descent")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()

<h3 id="SGD_Loader">SGD with Dataset and DataLoader</h3>
Import the module for building a dataset class.

In [None]:
from torch.utils.data import Dataset, DataLoader

Create a custom <code>Data</code> class for loading datasets into the model.

In [None]:
class Data(Dataset):
    
    def __init__(self):
        """Constructor"""
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.y = 1 * self.x - 1
        self.len = self.x.shape[0]
        
    def __getitem__(self,index):
        """Getter"""  
        return self.x[index], self.y[index]
    
    def __len__(self):
        """Return the length"""
        return self.len
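Note that `Data` builds the noiseless line y = x - 1, whereas the `Y` used for the error surfaces includes noise. If you want the dataset to match noisy targets, here is a sketch of a variant. `NoisyData` and its `noise_std`/`seed` parameters are hypothetical. It only needs `__getitem__` and `__len__`, which is all `DataLoader` relies on for a map-style dataset:

```python
import torch
from torch.utils.data import Dataset

class NoisyData(Dataset):
    """Hypothetical variant of Data that adds Gaussian noise to the targets."""

    def __init__(self, noise_std=0.1, seed=1):
        # A private generator keeps the noise reproducible without touching the global seed
        generator = torch.Generator().manual_seed(seed)
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.f = 1 * self.x - 1
        self.y = self.f + noise_std * torch.randn(self.x.size(), generator=generator)
        self.len = self.x.shape[0]

    def __getitem__(self, index):
        # Works for an int index or a slice, like the Data class
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len

data = NoisyData()
x0, y0 = data[0]      # one point: tensors of shape (1,)
xs, ys = data[0:3]    # a slice: tensors of shape (3, 1)
print(len(data), x0.shape, xs.shape)
```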

Create a dataset object and check the length of the dataset.

In [None]:
my_dataset = Data()
print("The length of dataset: ", len(my_dataset))

Obtain the first training point:  

In [None]:
# Print the first point
x, y = my_dataset[0]
print("(", x, ", ", y, ")")

Similarly, obtain the first three training points:  

In [None]:
# Print the first 3 points
x, y = my_dataset[0:3]
print("The first 3 x: ", x)
print("The first 3 y: ", y)

Create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training:

In [None]:
# Create plot_error_surfaces for viewing the data
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30, go = False)

Create a <code>DataLoader</code> object.

In [None]:
trainloader = DataLoader(dataset = my_dataset, batch_size = 1)
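`batch_size` controls how many points each iteration over the loader yields, and therefore how many parameter updates happen per epoch. The sketch below uses `TensorDataset` as a stand-in for the lab's `Data` class (same 60 points) and checks a few batch sizes. With the default `drop_last=False`, the number of batches is 60 divided by the batch size, rounded up:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# TensorDataset stands in for the lab's Data class: the same 60 (x, y) pairs
X = torch.arange(-3, 3, 0.1).view(-1, 1)
dataset = TensorDataset(X, 1 * X - 1)

batches_per_epoch = {}
first_batch_shape = {}
for batch_size in (1, 5, 10, 20):
    loader = DataLoader(dataset=dataset, batch_size=batch_size)
    # len(loader) = number of batches, i.e. parameter updates per epoch
    batches_per_epoch[batch_size] = len(loader)
    x, y = next(iter(loader))
    first_batch_shape[batch_size] = tuple(x.shape)

print(batches_per_epoch)   # {1: 60, 5: 12, 10: 6, 20: 3}
print(first_batch_shape)   # {1: (1, 1), 5: (5, 1), 10: (10, 1), 20: (20, 1)}
```

Even with `batch_size = 1`, each `x` keeps a leading batch dimension, so it has shape `(1, 1)` rather than `(1,)`.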

Define the <code>train_model_dataloader</code> function for training the model.

In [None]:
w = torch.tensor(-15.0,requires_grad=True)
b = torch.tensor(-10.0,requires_grad=True)
LOSS_loader = []

def train_model_dataloader(epochs):
    
    # Loop
    for epoch in range(epochs):
        
        # SGD is an approximation of our true total loss/cost, 
        # in this line of code we calculate our true loss/cost and store it.
        Yhat = forward(X)
        
        # store the loss 
        LOSS_loader.append(criterion(Yhat, Y).tolist())
        
        for x, y in trainloader:
            
            # make a prediction
            yhat = forward(x)
            
            # calculate the loss
            loss = criterion(yhat, y)
            
            # Section for plotting
            error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            
            # Backward pass: compute gradient of the loss with respect to all the learnable parameters
            loss.backward()
            
            # Update parameters slope and bias
            w.data = w.data - learning_rate * w.grad.data
            b.data = b.data - learning_rate * b.grad.data
            
            # Clear gradients 
            w.grad.data.zero_()
            b.grad.data.zero_()
            
        #plot surface and data space after each epoch    
        error_surfaces.plot_ps()

Run 10 epochs of stochastic gradient descent with <code>train_model_dataloader</code>

In [None]:
train_model_dataloader(10)

Compare the losses of batch gradient descent and SGD with DataLoader:

In [None]:
plt.plot(LOSS_BGD,label="Batch Gradient Descent")
plt.plot(LOSS_loader,label="Stochastic Gradient Descent with DataLoader")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()

<h3 id="task-20">Task 20 (5 points)</h3>
Use SGD with DataLoader to train the model for 10 epochs. Complete the function <code>lab1_task20_train_model(epochs)</code>, use <code>LOSS</code> to store the total loss, and plot the total loss.

In [None]:
# Task 20
LOSS = []
w = torch.tensor(-12.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)

def lab1_task20_train_model(epochs):
    # Type your code here
    pass

lab1_task20_train_model(10)

# Type your code here to plot the LOSS.


<h2 id="lr-1d-two-params-mbgd">Mini-Batch Gradient Descent</h2>

<p>In this section, you will practice training a model by using Mini-Batch Gradient Descent.</p>
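For reference, here is the update that all three methods in this lab perform, written for a batch $\mathcal{B}$ of training points. This assumes `criterion` is the mean squared error and `forward` computes $wx + b$, as defined earlier in the lab, with $\eta$ standing for `learning_rate`:

$$\ell(w, b) = \frac{1}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} (w x_i + b - y_i)^2$$

$$w \leftarrow w - \eta \, \frac{2}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} (w x_i + b - y_i)\, x_i, \qquad b \leftarrow b - \eta \, \frac{2}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} (w x_i + b - y_i)$$

Taking $\mathcal{B}$ to be all $N$ points gives batch gradient descent, $|\mathcal{B}| = 1$ gives stochastic gradient descent, and $1 < |\mathcal{B}| < N$ gives mini-batch gradient descent.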
<h3 id="Makeup_Data">Make Some Data </h3>
Generate values from -3 to 3 that create a line with a slope of 1 and a bias of -1. This is the line that you need to estimate. Add some noise to the data and plot the results. Then create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training.


In [None]:
torch.manual_seed(1)

X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = 1 * X - 1
Y = f + 0.1 * torch.randn(X.size())

plt.plot(X.numpy(), Y.numpy(), 'rx', label = 'y')
plt.plot(X.numpy(), f.numpy(), label = 'f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

#Initialize error surfaces
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30)

<h3>Train the Model: Batch Gradient Descent (BGD)</h3>
Define <code>train_model_BGD</code> function, and run 10 epochs of batch gradient descent.

In [None]:
# Define the function for training model
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
learning_rate = 0.1
LOSS_BGD = []

def train_model_BGD(epochs):
    for epoch in range(epochs):
        Yhat = forward(X)
        loss = criterion(Yhat, Y)
        LOSS_BGD.append(loss.item())
        error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        error_surfaces.plot_ps()
        loss.backward()
        w.data = w.data - learning_rate * w.grad.data
        b.data = b.data - learning_rate * b.grad.data
        w.grad.data.zero_()
        b.grad.data.zero_()

# Run train_model_BGD with 10 iterations
train_model_BGD(10)

<h3 id="SGD">Stochastic Gradient Descent (SGD) with Dataset and DataLoader</h3>
Import the <code>Dataset</code> and <code>DataLoader</code> classes, then create a dataset object and a dataloader object:

In [None]:
from torch.utils.data import Dataset, DataLoader

my_dataset = Data()
trainloader = DataLoader(dataset = my_dataset, batch_size = 1)

#Initialize error surfaces
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30)

Define the <code>train_model_SGD</code> function for training the model.

In [None]:
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
LOSS_SGD = []
learning_rate = 0.1
def train_model_SGD(epochs):
    for epoch in range(epochs):
        Yhat = forward(X)
        error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        error_surfaces.plot_ps()
        LOSS_SGD.append(criterion(forward(X), Y).tolist())
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - learning_rate * w.grad.data
            b.data = b.data - learning_rate * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()
        error_surfaces.plot_ps()

Run 10 epochs of stochastic gradient descent: 

In [None]:
# Run train_model_SGD(iter) with 10 iterations
train_model_SGD(10)

<h3 id="Mini5">Mini Batch Gradient Descent: Batch Size Equals 5</h3> 

Create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training, then create a <code>Data</code> object and a <code>DataLoader</code> object with a batch size of 5.

In [None]:
# Create DataLoader object
my_dataset = Data()
trainloader = DataLoader(dataset = my_dataset, batch_size = 5)

#Initialize error surfaces
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30, go = False)

Define the <code>train_model_Mini5</code> function to train the model.

In [None]:
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
LOSS_MINI5 = []
learning_rate = 0.1

def train_model_Mini5(epochs):
    for epoch in range(epochs):
        Yhat = forward(X)
        error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        error_surfaces.plot_ps()
        LOSS_MINI5.append(criterion(forward(X), Y).tolist())
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - learning_rate * w.grad.data
            b.data = b.data - learning_rate * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()

Run 10 epochs of mini-batch gradient descent: 

In [None]:
train_model_Mini5(10)

<h3 id="Mini10">Mini Batch Gradient Descent: Batch Size Equals 10</h3> 
Create a <code>PlotErrorSurfaces</code> object to visualize the data space and the parameter space during training, then create a <code>Data</code> object and a <code>DataLoader</code> object with a batch size of 10.

In [None]:
# Create DataLoader object
my_dataset = Data()
trainloader = DataLoader(dataset = my_dataset, batch_size = 10)

#Initialize error surfaces
error_surfaces = PlotErrorSurfaces(15, 13, X, Y, 30)

Define the <code>train_model_Mini10</code> function for training the model.

In [None]:
# Define train_model_Mini10 function
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
LOSS_MINI10 = []
learning_rate = 0.1

def train_model_Mini10(epochs):
    for epoch in range(epochs):
        Yhat = forward(X)
        error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        error_surfaces.plot_ps()
        LOSS_MINI10.append(criterion(forward(X),Y).tolist())
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            error_surfaces.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - learning_rate * w.grad.data
            b.data = b.data - learning_rate * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()

Run 10 epochs of mini-batch gradient descent:

In [None]:
train_model_Mini10(10)

Plot the loss for each epoch:  

In [None]:
plt.plot(LOSS_BGD,label = "Batch Gradient Descent")
plt.plot(LOSS_SGD,label = "Stochastic Gradient Descent")
plt.plot(LOSS_MINI5,label = "Mini-Batch Gradient Descent, Batch size: 5")
plt.plot(LOSS_MINI10,label = "Mini-Batch Gradient Descent, Batch size: 10")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()

<h3 id="task-21">Task 21 (5 points)</h3>

Perform mini-batch gradient descent with a batch size of 20. Complete the function <code>lab1_task21_train_model(epochs)</code>, store the total loss for each epoch in the list <code>LOSS20</code>, and plot a graph that shows the loss results for all the methods (i.e., batch gradient descent, stochastic gradient descent, and mini-batch gradient descent with batch sizes of 5, 10, and 20).


In [None]:
# Task 21
my_dataset = Data()
LOSS20 = []
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
learning_rate = 0.1

# Type your code here to define the trainloader

def lab1_task21_train_model(epochs):
    # Type your code here
    pass

lab1_task21_train_model(10)

# Type your code here to plot the LOSS.