
# CS224N: PyTorch Tutorial (Winter '21)

Author: Dilara Soylu

In this notebook, we will give a basic introduction to PyTorch and work on a toy NLP task. The following resources were used in preparing this notebook:

* [Word Window Classification](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1204/materials/ww_classifier.ipynb) tutorial notebook by Matt Lamm, from the Winter 2020 offering of CS224N
* Official PyTorch Documentation on [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) by Soumith Chintala
* PyTorch Tutorial Notebook, [Build Basic Generative Adversarial Networks (GANs) | Coursera](https://www.coursera.org/learn/build-basic-generative-adversarial-networks-gans) by Sharon Zhou, offered on Coursera

Many thanks to Angelica Sun and John Hewitt for their feedback.

## Introduction
[PyTorch](https://pytorch.org/) is a machine learning framework used in both academia and industry for a wide range of applications. PyTorch started off as a more flexible alternative to [TensorFlow](https://www.tensorflow.org/), another popular machine learning framework. At the time of its release, `PyTorch` appealed to users because of its user-friendly design: instead of requiring users to define a static computation graph before running any operations, as `TensorFlow` did, `PyTorch` let users define their operations as they go, an approach that `TensorFlow` also adopted in later releases. Although `TensorFlow` is more widely used in industry, `PyTorch` is often the preferred machine learning framework for researchers.

Now that we know a bit about the background of `PyTorch`, let's start by importing it into our notebook. To install `PyTorch`, you can follow the installation instructions on the [PyTorch website](https://pytorch.org/). Alternatively, you can open this notebook in `Google Colab`, which already has `PyTorch` installed in its base kernel. Once you are done with the installation, run the following cell:

In [None]:
import torch
import torch.nn as nn

# Import pprint, a module we use to make our print statements prettier
import pprint
pp = pprint.PrettyPrinter()

We are all set to start our tutorial. Let's dive in!

# Tensors
Tensors are the most basic building blocks in `PyTorch`. Tensors are similar to matrices, but they have extra properties and can represent higher dimensions. For example, a square image with 256 pixels on each side can be represented by a `3x256x256` tensor, where the first dimension, of size 3, represents the red, green and blue color channels.
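To make the image example concrete, here is a minimal sketch that builds a tensor of that shape. The pixel values are random placeholders from `torch.rand()` rather than a real image, and `image` and `red_channel` are illustrative names:

```python
import torch

# A stand-in for a 256x256 RGB image: 3 color channels, each a 256x256 grid.
# The values are random placeholders in [0, 1), not real pixel data.
image = torch.rand(3, 256, 256)
print(image.shape)        # torch.Size([3, 256, 256])

# The first slice along dimension 0 holds the red channel.
red_channel = image[0]
print(red_channel.shape)  # torch.Size([256, 256])
```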

## Tensor Initialization

There are several ways to instantiate tensors in `PyTorch`, which we will go through next.

### From a Python List

We can initialize a tensor from a `Python` list, which can include sublists. The dimensions and the data type are inferred automatically by `PyTorch` when we use `torch.tensor()`.



In [3]:
# Initialize a tensor from a Python List
data = [
        [0, 1], 
        [2, 3],
        [4, 5]
       ]
x_python = torch.tensor(data)

# Print the tensor
x_python

tensor([[0, 1],
        [2, 3],
        [4, 5]])
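A quick sketch of the inference rules, assuming the default floating-point type has not been changed (for example with `torch.set_default_dtype()`): Python integers become `torch.int64`, Python floats become `torch.float32`, and booleans become `torch.bool`. The variable names are illustrative:

```python
import torch

# PyTorch infers the data type from the Python values it is given.
ints = torch.tensor([[0, 1], [2, 3]])
floats = torch.tensor([[0.5, 1.0], [2.0, 3.0]])
bools = torch.tensor([True, False])

print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32 (the default floating-point type)
print(bools.dtype)   # torch.bool

# Mixing integers and floats promotes the whole tensor to floating point.
mixed = torch.tensor([1, 2.5])
print(mixed.dtype)   # torch.float32
```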

We can also call `torch.tensor()` with the optional `dtype` parameter, which sets the data type. Some useful data types to be familiar with are `torch.bool`, `torch.float`, and `torch.long`.

In [4]:
# Use the dtype parameter to create a tensor of a particular type
x_float = torch.tensor(data, dtype=torch.float)
x_float

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

In [5]:
# Use the dtype parameter to create a tensor of a particular type
x_bool = torch.tensor(data, dtype=torch.bool)
x_bool

tensor([[False,  True],
        [ True,  True],
        [ True,  True]])

We can also convert an existing tensor to a specified data type using methods such as `float()` and `long()`.


In [6]:
x_python.float()

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

We can also use the `torch.FloatTensor`, `torch.LongTensor` and `torch.Tensor` classes to instantiate a tensor of a particular type. `LongTensor`s are particularly important in NLP, as many methods that deal with indices require the indices to be passed as a `LongTensor`, which holds 64-bit integers.

In [7]:
# `torch.Tensor` defaults to float
# Same as torch.FloatTensor(data)
x = torch.Tensor(data) 
x


tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

### From a NumPy Array

We can also initialize a tensor from a `NumPy` array.


In [8]:
import numpy as np

# Initialize a tensor from a NumPy array
ndarray = np.array(data)
x_numpy = torch.from_numpy(ndarray)

# Print the tensor
x_numpy

tensor([[0, 1],
        [2, 3],
        [4, 5]])
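One detail worth knowing: `torch.from_numpy()` shares memory with the original array, so changing one changes the other, whereas `torch.tensor()` always copies the data. A small sketch (the array values are arbitrary):

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

shared = torch.from_numpy(arr)  # shares memory with arr, no copy
copied = torch.tensor(arr)      # always makes an independent copy

arr[0] = 100                    # modify the NumPy array in place
print(shared[0].item())         # 100: the change shows up in the tensor
print(copied[0].item())         # 1: the copy is unaffected

# Going the other way, .numpy() on a CPU tensor also shares memory.
back = shared.numpy()
print(np.shares_memory(back, arr))  # True
```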

### From a Tensor

We can also initialize a tensor from another tensor, using the following methods:

* `torch.ones_like(old_tensor)`: Initializes a tensor of 1s.
* `torch.zeros_like(old_tensor)`: Initializes a tensor of 0s.
* `torch.rand_like(old_tensor)`: Initializes a tensor where all the elements are sampled from a uniform distribution on the interval [0, 1).
* `torch.randn_like(old_tensor)`: Initializes a tensor where all the elements are sampled from a standard normal distribution (mean 0, variance 1).

All of these methods preserve the properties of the tensor passed in, such as its shape, data type and device, which we will cover in a bit.


In [9]:
# Initialize a base tensor
x = torch.tensor([[1., 2.], [3., 4.]])
x

tensor([[1., 2.],
        [3., 4.]])

In [10]:
# Initialize a tensor of 0s
x_zeros = torch.zeros_like(x)
x_zeros

tensor([[0., 0.],
        [0., 0.]])

In [11]:
# Initialize a tensor of 1s
x_ones = torch.ones_like(x)
x_ones

tensor([[1., 1.],
        [1., 1.]])

In [12]:
# Initialize a tensor where each element is sampled from a uniform distribution
# between 0 and 1
x_rand = torch.rand_like(x)
x_rand

tensor([[0.3386, 0.0887],
        [0.6176, 0.8548]])

In [13]:
# Initialize a tensor where each element is sampled from a normal distribution
x_randn = torch.randn_like(x)
x_randn

tensor([[-1.0035, -0.6282],
        [-0.5455,  1.5023]])
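Two practical notes, as a sketch: the `*_like` functions also accept a `dtype` keyword that overrides the copied data type, and because `rand_like()` and `randn_like()` draw fresh random numbers, the values printed above will differ from run to run unless the random seed is fixed with `torch.manual_seed()`. The seed value 42 is arbitrary:

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])

# By default, the *_like functions copy shape and dtype from x...
ones = torch.ones_like(x)
print(ones.shape, ones.dtype)  # torch.Size([2, 2]) torch.float32

# ...but the dtype can be overridden with the dtype keyword.
zeros_long = torch.zeros_like(x, dtype=torch.long)
print(zeros_long.dtype)        # torch.int64

# Re-seeding the generator reproduces the same "random" values.
torch.manual_seed(42)
a = torch.rand_like(x)
torch.manual_seed(42)
b = torch.rand_like(x)
print(torch.equal(a, b))       # True
```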

### By Specifying a Shape

We can also instantiate tensors by specifying their shapes (which we will cover in more detail in a bit). The methods we can use mirror the ones in the previous section:

* `torch.zeros()`
* `torch.ones()`
* `torch.rand()`
* `torch.randn()`

In [14]:
# Initialize a 4x2x2 tensor of 0s
shape = (4, 2, 2)
x_zeros = torch.zeros(shape) # x_zeros = torch.zeros(4, 2, 2) is an alternative
x_zeros

tensor([[[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]]])
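A short sketch of the same idea with a few more options. The shape can be passed as separate integers or as a single tuple, a `dtype` can be given explicitly, and `torch.full()` (not in the list above) fills a tensor of a given shape with a chosen value:

```python
import torch

# The shape can be given as separate integers or as one tuple.
ones = torch.ones(2, 3)
randn = torch.randn((2, 3))

# torch.full fills every element with a chosen value.
sevens = torch.full((2, 3), 7.0)

# An explicit dtype can be passed as well.
int_zeros = torch.zeros(2, 3, dtype=torch.long)

print(ones.shape, randn.shape, sevens.shape, int_zeros.dtype)
# torch.Size([2, 3]) torch.Size([2, 3]) torch.Size([2, 3]) torch.int64
```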

### With torch.arange()

We can also create a tensor with `torch.arange(end)`, which returns a 1-D tensor with elements ranging from `0` to `end-1`. We can use the optional `start` and `step` parameters to create tensors with different ranges.


In [15]:
# Create a tensor with values 0-9
x = torch.arange(10)
x

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
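A sketch of the optional parameters: with three arguments the call is `torch.arange(start, end, step)`, where `end` is still excluded. A floating-point step produces a floating-point tensor:

```python
import torch

evens = torch.arange(2, 10, 2)       # start=2, end=10 (excluded), step=2
quarters = torch.arange(0, 1, 0.25)  # a float step gives a float tensor

print(evens.tolist())     # [2, 4, 6, 8]
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75]
print(evens.dtype, quarters.dtype)  # torch.int64 torch.float32
```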

## Tensor Properties

Tensors have a few properties that are important for us to cover, namely the data type, the shape, and the device.

### Data Type

The `dtype` property lets us see the data type of a tensor.


In [16]:
# Initialize a 3x2 tensor, with 3 rows and 2 columns
x = torch.ones(3, 2)
x.dtype

torch.float32
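The names used earlier, such as `torch.float` and `torch.long`, are aliases for the names PyTorch prints, as this small sketch checks. It also uses `to()` to convert to 64-bit floats:

```python
import torch

# The short names are aliases for the names PyTorch prints.
print(torch.float == torch.float32)  # True
print(torch.long == torch.int64)     # True

x = torch.ones(3, 2)
x_double = x.to(torch.float64)       # convert to 64-bit floats
print(x.dtype, x_double.dtype)       # torch.float32 torch.float64
```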

### Shape

The `shape` property tells us the shape of our tensor. This helps us identify how many dimensions our tensor has, as well as how many elements exist in each dimension.


In [17]:
# Initialize a 3x2 tensor, with 3 rows and 2 columns
x = torch.Tensor([[1, 2], [3, 4], [5, 6]])
x

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [18]:
# Print out its shape
# Same as x.size()
x.shape 

torch.Size([3, 2])

In [19]:
# Print out the number of elements in a particular dimension
# 0th dimension corresponds to the rows
x.shape[0]

3
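Besides indexing into `shape`, a couple of other methods are handy for inspecting a tensor's size, shown here as a small sketch on the same 3x2 tensor:

```python
import torch

x = torch.Tensor([[1, 2], [3, 4], [5, 6]])

print(x.dim())      # 2: the number of dimensions (x.ndim is an alias)
print(x.shape[1])   # 2: the size of dimension 1 (the columns)
print(x.shape[-1])  # 2: negative indices count from the last dimension
print(x.numel())    # 6: the total number of elements, 3 * 2
```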

We can change the shape of a tensor with the `view()` method.

In [20]:
# Example use of view()
# x_view shares the same memory as x, so changing one changes the other
x_view = x.view(3, 2)
x_view

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [21]:
# We can ask PyTorch to infer the size of a dimension with -1
x_view = x.view(-1, 3)
x_view

tensor([[1., 2., 3.],
        [4., 5., 6.]])
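The comment in the earlier cell says that a view shares memory with the original tensor. A minimal sketch that checks this, and shows that `view()` rejects a shape with the wrong number of elements (`base` and `base_view` are illustrative names, and the contents are arbitrary):

```python
import torch

base = torch.arange(6.)          # 0., 1., ..., 5. as a 1-D float tensor
base_view = base.view(2, 3)      # same data, viewed as 2 rows of 3

base_view[0, 0] = 100.           # write through the view...
print(base[0].item())            # ...and the original sees it: 100.0
print(base_view.data_ptr() == base.data_ptr())  # True: same memory

# A shape whose element count does not match (4 * 2 != 6) raises an error.
try:
    base.view(4, 2)
except RuntimeError as err:
    print("view() failed:", err)
```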

We can also use the `torch.reshape()` method for a similar purpose. There is a subtle difference between `reshape()` and `view()`: `view()` requires the data to be stored contiguously in memory. You can refer to this StackOverflow answer for more information. In simple terms, contiguous means that the way our data is laid out in memory is the same as the way we would read elements from it. A tensor can become non-contiguous because some methods, such as `transpose()`, do not actually change how our data is stored in memory. Like `view()`, they just change the meta information about our tensor (its shape and strides), so that when we use it we see the elements in the order we expect.

`reshape()` calls `view()` internally if the data is stored contiguously; if not, it returns a copy. The difference isn't too important for basic tensors, but if you perform operations that make the underlying storage non-contiguous (such as taking a `transpose`), you will have issues using `view()`. If you would like the way your tensor is stored in memory to match how it is used, you can call the `contiguous()` method, which returns a contiguously stored copy of the tensor (or the tensor itself if it is already contiguous).


In [22]:
# Change the shape of x to be 2x3
# x_reshaped could be a reference to or copy of x
x_reshaped = torch.reshape(x, (2, 3))
x_reshaped

tensor([[1., 2., 3.],
        [4., 5., 6.]])
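To see the difference between `view()` and `reshape()` described above, here is a minimal sketch using a transposed tensor, which is non-contiguous. `is_contiguous()` and `data_ptr()` are used to inspect the memory layout; `m` and `m_t` are illustrative names and the contents are arbitrary:

```python
import torch

m = torch.arange(6).view(2, 3)  # [[0, 1, 2], [3, 4, 5]], stored contiguously
m_t = m.t()                     # transpose: [[0, 3], [1, 4], [2, 5]]

# The transpose reuses m's memory and only swaps the strides,
# so its row-by-row reading order no longer matches the storage order.
print(m.is_contiguous(), m_t.is_contiguous())  # True False

# view() cannot flatten m_t without copying the data, so it raises an error.
try:
    m_t.view(6)
except RuntimeError as err:
    print("view() failed:", err)

# reshape() falls back to copying when a view is impossible.
flat = m_t.reshape(6)
print(flat.tolist())                      # [0, 3, 1, 4, 2, 5]
print(flat.data_ptr() == m_t.data_ptr())  # False: the copy lives in new memory

# On the contiguous m, reshape() returns a view that shares m's memory.
print(m.reshape(3, 2).data_ptr() == m.data_ptr())  # True

# contiguous() copies m_t into row-major order, after which view() works.
flat_again = m_t.contiguous().view(6)
print(torch.equal(flat, flat_again))      # True
```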