## Neural Network Programming - Deep Learning 

[Playlist link](https://www.youtube.com/watch?v=iTKbyFh-7GM&list=PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG&index=2)

### PyTorch - Python deep learning neural network API


**A tensor is an n-dimensional array.**

With PyTorch tensors, GPU support is built-in. It’s very easy with PyTorch to move tensors to and from a GPU if we have one installed on our system.

![](./img/diag1.png)

Let’s talk about the prospects for learning PyTorch. For beginners to deep learning and neural networks, the top reason for learning PyTorch is that it is a thin framework that stays out of the way.

**PyTorch is thin and stays out of the way!**

When we build neural networks with PyTorch, we are super close to programming neural networks from scratch. The experience of programming in PyTorch is as close as it gets to the real thing.

A common PyTorch characteristic that often pops up is that it’s great for research. The reason for this research suitability has to do with a technical design consideration. To optimize neural networks, we need to calculate derivatives, and to do this computationally, deep learning frameworks use what are called [computational graphs](http://colah.github.io/posts/2015-08-Backprop/).

Computational graphs are used to graph the function operations that occur on tensors inside neural networks.


These graphs are then used to compute the derivatives needed to optimize the neural network. PyTorch uses a computational graph that is called a dynamic computational graph. This means that the graph is generated on the fly as the operations are created.

This is in contrast to static graphs that are fully determined before the actual operations occur.

It just so happens that many of the cutting-edge research topics in deep learning require or benefit greatly from dynamic graphs.
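As a minimal sketch of what "dynamic" means in practice, the hypothetical function `f` below uses ordinary Python control flow to decide which operations run. Autograd records only the operations that actually executed on this call and differentiates through them:

```python
import torch

def f(x):
    # Hypothetical function: the branch taken depends on the data itself,
    # so the computational graph is built on the fly during this call.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = f(x)       # x.sum() is 6 > 0, so only the squaring branch is recorded
y.backward()   # derivatives flow back through the recorded operations
print(x.grad)  # tensor([2., 4., 6.]), the derivative of sum(x**2) is 2x
```

Calling `f` with different data could record a different graph, which is exactly what a static, ahead-of-time graph cannot express as directly.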



```python
import torch
import numpy as np
```

```python
t = torch.tensor([1, 2, 3])  # created on the CPU by default,
                             # so any operation we do on this tensor is carried out on the CPU
t
```

```
tensor([1, 2, 3])
```

```python
# Move tensor t onto the GPU: t.cuda() returns a copy of this tensor in CUDA memory.
# t = t.cuda()
# t
```

### Introducing tensors for deep learning


**A tensor is the primary data structure used by neural networks.**

In the table below, each row pairs a computer science term with its mathematics counterpart. The relationship within each pair is that both elements require the same number of indexes to refer to a specific element within the data structure.


| Indexes required | Computer science | Mathematics |
| --- | --- | --- |
| 0 | number | scalar |
| 1 | array | vector |
| 2 | 2d-array | matrix |

For example, suppose we have the array `a = [1,2,3,4]`, and we want to access (refer to) the number 3 in this data structure. We can do it using a single index like so:


```
> a = [1,2,3,4]

> a[2]
3

```


This logic works the same for a vector.

As another example, suppose we have the 2d-array `dd` below, and we want to access (refer to) the number 3 in this data structure. In this case, we need two indexes to locate the specific element.

```
> dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

> dd[0][2]
3 
```

#### Tensors are generalizations


When more than two indexes are required to access a specific element, we stop giving specific names to the structures, and we begin using more general language.

Indexes required |	Computer science |	Mathematics
--- | --- | ---
n |	nd-array |	nd-tensor

**Tensors and nd-arrays are the same thing!**

So tensors are multidimensional arrays or nd-arrays for short. The reason we say a tensor is a generalization is because we use the word tensor for all values of n like so:

- A scalar is a 0-dimensional tensor
- A vector is a 1-dimensional tensor
- A matrix is a 2-dimensional tensor
- An nd-array is an n-dimensional tensor
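The same progression can be checked directly in PyTorch. This short sketch uses `dim()` to report how many dimensions each tensor has, which matches the number of indexes needed to reach a single element; the variable names are just illustrative:

```python
import torch

scalar = torch.tensor(5)                 # 0 indexes needed
vector = torch.tensor([1, 2, 3, 4])      # 1 index
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])       # 2 indexes
cube = torch.zeros(2, 3, 4)              # 3 indexes: an nd-tensor with n = 3

for name, x in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("3d-tensor", cube)]:
    # dim() reports the number of dimensions (axes) of the tensor
    print(name, x.dim())

print(vector[2])     # tensor(3), one index
print(matrix[0][2])  # tensor(3), two indexes
```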

### Rank, Axes and Shape

The rank, axes, and shape are three tensor attributes that will concern us most when starting out with tensors in deep learning. These concepts build on one another starting with rank, then axes, and building up to shape, so keep an eye out for this relationship between these three.

#### Rank

The rank of a tensor refers to the number of dimensions present within the tensor. Suppose we are told that we have a rank-2 tensor. This means all of the following:

- We have a matrix
- We have a 2d-array
- We have a 2d-tensor


**A tensor's rank tells us how many indexes are needed to refer to a specific element within the tensor.**

#### Axes

An axis of a tensor is a specific dimension of a tensor.


If we say that a tensor is a rank 2 tensor, we mean that the tensor has 2 dimensions, or equivalently, the tensor has two axes.

Elements are said to exist or run along an axis. This running is constrained by the length of each axis. Let's look at the length of an axis now.

The length of each axis tells us how many indexes are available along each axis.

Suppose we have a tensor called t, and we know that the first axis has a length of three while the second axis has a length of four.

Since the first axis has a length of three, this means that we can index three positions along the first axis like so:

```
t[0]
t[1]
t[2]
```

All of these indexes are valid, but we can't move past index 2.

Since the second axis has a length of four, we can index four positions along the second axis. This is possible for each index of the first axis, so we have the following:

```
t[0][0]
t[1][0]
t[2][0]

t[0][1]
t[1][1]
t[2][1]

t[0][2]
t[1][2]
t[2][2]

t[0][3]
t[1][3]
t[2][3]
```
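To see these limits in PyTorch, here is a small sketch that builds a hypothetical tensor whose axes have lengths three and four, reads the last valid position on each axis, and confirms that stepping past the end of the first axis fails:

```python
import torch

# A hypothetical tensor: first axis has length 3, second axis has length 4
t = torch.arange(12).reshape(3, 4)
print(t.shape)   # torch.Size([3, 4])

print(t[2])      # last valid index on the first axis: the row [8, 9, 10, 11]
print(t[2][3])   # last valid index on the second axis: tensor(11)

# Indexing past the end of an axis raises an IndexError
try:
    t[3]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```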

Let's look at some examples to make this solid. We'll consider the same tensor dd as before:

```

> dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

# Each element along the first axis, is an array:

> dd[0]
[1, 2, 3]

> dd[1]
[4, 5, 6]

> dd[2]
[7, 8, 9]

# Each element along the second axis, is a number:

> dd[0][0]
1

> dd[1][0]
4

> dd[2][0]
7

```

Note that, with tensors, the elements of the last axis are always numbers. Every other axis will contain n-dimensional arrays. This is what we see in this example, but this idea generalizes.
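A quick way to confirm this with a PyTorch tensor is to compare shapes: an element taken along the first axis still has an axis of its own, while an element of the last axis has an empty shape (a 0-d tensor holding one number). This sketch reuses the `dd` data from above:

```python
import torch

dd = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

row = dd[0]      # an element of the first axis is itself an array (a 1d-tensor)
num = dd[0][0]   # an element of the last axis is a single number (a 0-d tensor)

print(row.shape)   # torch.Size([3])
print(num.shape)   # torch.Size([])
print(num.item())  # 1, the plain Python number stored there
```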

The rank of a tensor tells us how many axes a tensor has, and the length of these axes leads us to the very important concept known as the shape of a tensor.

#### Shape of a tensor


The shape of a tensor gives us the length of each axis of the tensor.


To work with this tensor's shape, we’ll create a torch.Tensor object like so:




```python
dd = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

t = torch.Tensor(dd)
t
```

```
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
```

```python
type(t)
```

```
torch.Tensor
```

```python
t.shape
```

```
torch.Size([3, 3])
```

This allows us to see the tensor's shape is 3 x 3. Note that, in PyTorch, size and shape of a tensor are the same thing.

The shape of 3 x 3 tells us that each axis of this rank two tensor has a length of 3 which means that we have three indexes available along each axis. Let's look now at why the shape of a tensor is so important.

The shape of a tensor is important for a few reasons. The first reason is that the shape allows us to conceptually think about, or even visualize, a tensor. Higher rank tensors become more abstract, and the shape gives us something concrete to think about.

The shape also encodes all of the relevant information about axes, rank, and therefore indexes.

Additionally, one of the types of operations we must perform frequently when we are programming our neural networks is called reshaping.

As our tensors flow through our networks, certain shapes are expected at different points inside the network, and as neural network programmers, it is our job to understand the incoming shape and have the ability to reshape as needed.

### CNN tensor input shape and feature maps


The shape of a CNN input typically has a length of four. This means that we have a rank-4 tensor with four axes. Each index in the tensor’s shape represents a specific axis, and the value at each index gives us the length of the corresponding axis.

Each axis of a tensor usually represents some type of real world or logical feature of the input data. If we understand each of these features and their axis location within the tensor, then we can have a pretty good understanding of the tensor data structure overall.

To break this down, we’ll work backwards, considering the axes from right to left. Remember, the last axis, which is where we’ll start, is where the actual numbers or data values are located.

If we are running along the last axis and we stop to inspect an element there, we will be looking at a number. If we are running along any other axis, the elements are multidimensional arrays.

For images, the raw data comes in the form of pixels that are represented by a number and are laid out using two dimensions, height and width.

#### Image height and width


The image height and width are represented on the last two axes. Possible values here are 28 x 28, as will be the case for our image data in the fashion-MNIST dataset we’ll be using in our CNN project, or the 224 x 224 image size that is used by the VGG16 neural network, or any other image dimensions we can imagine.

#### Image color channels

The next axis represents the color channels. Typical values here are 3 for RGB images or 1 if we are working with grayscale images. This color channel interpretation only applies to the input tensor.

As we will reveal in a moment, the interpretation of this axis changes after the tensor passes through a convolutional layer.

Up to this point using the last three axes, we have represented a complete image as a tensor. We have the color channels and the height and width all laid out in tensor form using three axes.

In terms of accessing data at this point, we need three indexes. We choose a color channel, a height, and a width to arrive at a specific pixel value.

#### Image batches


This brings us to the first axis of the four, which represents the batch size. In neural networks, we usually work with batches of samples as opposed to single samples, so the length of this axis tells us how many samples are in our batch.

This allows us to see that an entire batch of images is represented using a single rank-4 tensor.

Suppose we have the following shape [3, 1, 28, 28] for a given tensor. Using the shape, we can determine that we have a batch of three images.

**[Batch, Channels, Height, Width]**

Each image has a single color channel, and the image height and width are both 28, giving 28 x 28 pixels.


This gives us a single rank-4 tensor that will ultimately flow through our convolutional neural network.

Given a tensor of images like this, we can navigate to a specific pixel in a specific color channel of a specific image in the batch using four indexes.
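As a sketch of this navigation, the snippet below fills a [3, 1, 28, 28] tensor with random values (a stand-in for real image data, an assumption for illustration) and peels off one axis per index:

```python
import torch

# Random values stand in for real image data (assumption for illustration);
# the shape follows [Batch, Channels, Height, Width].
batch = torch.rand(3, 1, 28, 28)

image = batch[0]              # one image from the batch: [1, 28, 28]
channel = batch[0][0]         # its single color channel: [28, 28]
pixel = batch[2][0][27][27]   # four indexes reach one pixel value

print(image.shape)    # torch.Size([1, 28, 28])
print(channel.shape)  # torch.Size([28, 28])
print(pixel.shape)    # torch.Size([])
```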

#### Output channels and feature maps

Let’s look at how the interpretation of the color channel axis changes after the tensor is transformed by a convolutional layer.

Suppose we have a tensor that contains data from a single 28 x 28 grayscale image. This gives us the following tensor shape: [1, 1, 28, 28].

Now suppose this image is passed to our CNN and passes through the first convolutional layer. When this happens, the shape of our tensor and the underlying data will be changed by the convolution operation.

The convolution changes the height and width dimensions as well as the number of channels. The number of output channels changes based on the number of filters being used in the convolutional layer.

Suppose we have three convolutional filters, and let's just see what happens to the channel axis.

Since we have three convolutional filters, we will have three channel outputs from the convolutional layer. These channels are outputs from the convolutional layer, hence the name output channels as opposed to color channels.

Each of the three filters convolves the original single input channel, producing three output channels. The output channels are still composed of pixels, but the pixels have been modified by the convolution operation. Depending on the size of the filter, the height and width dimensions of the output will change also, but we'll leave those details for a future post.

With the output channels, we no longer have color channels, but modified channels that we call feature maps. These so-called feature maps are the outputs of the convolutions that take place using the input color channels and the convolutional filters.

The word “feature” is used because the outputs represent particular features from the image, like edges for example, and these mappings emerge as the network learns during the training process and become more complex as we move deeper into the network.
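A minimal sketch of this shape change uses `torch.nn.Conv2d` with three filters on a single [1, 1, 28, 28] grayscale image. The `kernel_size=5` value is an illustrative assumption, not something the text fixes; with PyTorch's defaults (stride 1, no padding) a 5 x 5 filter shrinks each 28-pixel side to 28 - 5 + 1 = 24:

```python
import torch
import torch.nn as nn

# One 28 x 28 grayscale image: [Batch, Channels, Height, Width] = [1, 1, 28, 28]
image = torch.rand(1, 1, 28, 28)

# Three filters produce three output channels (feature maps).
# kernel_size=5 is an illustrative choice for this sketch.
conv = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=5)

feature_maps = conv(image)
print(feature_maps.shape)  # torch.Size([1, 3, 24, 24])
```

The channel axis went from 1 color channel to 3 output channels, and the height and width shrank because of the filter size.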



### Introducing PyTorch Tensors

PyTorch tensors are instances of the torch.Tensor Python class. We can create a torch.Tensor object using the class constructor like so:



```python
t = torch.Tensor()
print(type(t))
print(t.dtype)
print(t.device)
print(t.layout)
```

```
<class 'torch.Tensor'>
torch.float32
cpu
torch.strided
```


The device, cpu in our case, specifies the device (CPU or GPU) where the tensor's data is allocated. This determines where tensor computations for the given tensor will be performed.

PyTorch supports the use of multiple devices, and they are specified using an index like so:

```
> device = torch.device('cuda:0')
> device
device(type='cuda', index=0)
```

If we have a device like above, we can create a tensor on the device by passing the device to the tensor’s constructor. One thing to keep in mind about using multiple devices is that tensor operations must happen between tensors that exist on the same device.
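Here is a device-agnostic sketch of that pattern. It falls back to the CPU when no GPU is available (an assumption so the snippet runs anywhere), creates one tensor directly on the device and moves another there with `.to()`, and then combines them. Mixing a CPU tensor with a CUDA tensor in one operation raises an error instead.

```python
import torch

# Use the first GPU if one is available; otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Pass the device when creating a tensor...
a = torch.tensor([1, 2, 3], device=device)
# ...or move an existing tensor there with .to()
b = torch.tensor([4, 5, 6]).to(device)

c = a + b        # fine: both operands live on the same device
print(c.device)
print(c.cpu())   # .cpu() copies the result back to CPU memory if needed
```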

The layout, strided in our case, specifies how the tensor is stored in memory.

- Tensors contain data of a uniform type (dtype).
- Tensor computations between tensors depend on the dtype and the device.
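To illustrate both points, this sketch shows that a tensor built from mixed Python numbers still ends up with a single dtype, and that the dtype of a computation's result follows PyTorch's type-promotion rules (assuming the default floating-point dtype has not been changed from float32):

```python
import torch

ints = torch.tensor([1, 2, 3])
mixed = torch.tensor([1, 2, 3.5])  # one float forces a single floating-point dtype

print(ints.dtype)           # torch.int64
print(mixed.dtype)          # torch.float32, the default floating-point dtype
print((ints + mixed).dtype) # torch.float32: the integer operand is promoted
```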


#### Creating tensors using data

These are the primary ways of creating tensor objects (instances of the torch.Tensor class), with data (array-like) in PyTorch:

1. torch.Tensor(data)
2. torch.tensor(data)
3. torch.as_tensor(data)
4. torch.from_numpy(data)

Let’s look at each of these. They all accept some form of data and give us an instance of the torch.Tensor class. Sometimes when there are multiple ways to achieve the same result, things can get confusing, so let’s break this down.

We’ll begin by just creating a tensor with each of the options and see what we get. We’ll start by creating some data.

We can use a Python list, or sequence, but numpy.ndarrays are going to be the more common option, so we’ll go with a numpy.ndarray like so:

```python
data = np.array([1, 2, 3])

type(data)
```

```
numpy.ndarray
```

```python
o1 = torch.Tensor(data)
o2 = torch.tensor(data)
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

print(o1, o2, o3, o4)
```

```
tensor([1., 2., 3.]) tensor([1, 2, 3]) tensor([1, 2, 3]) tensor([1, 2, 3])
```


All of the options (o1, o2, o3, o4) appear to have produced the same tensors except for the first one. The first option (o1) has dots after the numbers, indicating that they are floats, while the other three options have an integer dtype inferred from the NumPy array (for example int32 or int64, depending on NumPy's default integer type on the platform).
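Printing the dtypes directly makes the difference explicit. In this sketch the exact integer type depends on the platform's NumPy default, so the assertions compare the tensor dtype against `data.dtype` rather than hard-coding it:

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

print(data.dtype)                    # NumPy's default integer type on this platform
print(torch.Tensor(data).dtype)      # torch.float32: the constructor uses the default dtype
print(torch.tensor(data).dtype)      # inferred from data.dtype
print(torch.as_tensor(data).dtype)   # inferred from data.dtype
print(torch.from_numpy(data).dtype)  # inferred from data.dtype
```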



```python
data[0] = 1000

print(o1, o2, o3, o4)
```

```
tensor([1., 2., 3.]) tensor([1, 2, 3]) tensor([1000,    2,    3]) tensor([1000,    2,    3])
```


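Modifying the NumPy array also modifies the tensors created with `torch.as_tensor()` and `torch.from_numpy()`, because they share memory with the array. The tensors created with `torch.Tensor()` and `torch.tensor()` are unchanged because they hold copies of the data.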

### Creating PyTorch Tensors - Best Options

#### Uppercase/lowercase: torch.Tensor() vs torch.tensor()

Notice how the first option torch.Tensor() has an uppercase T while the second option torch.tensor() has a lowercase t. What’s up with this difference?

The first option with the uppercase T is the constructor of the torch.Tensor class, and the second option is what we call a factory function that constructs torch.Tensor objects and returns them to the caller.

You can think of the torch.tensor() function as a factory that builds tensors given some parameter inputs. Factory functions are a software design pattern for creating objects. That’s the difference between the uppercase T and the lowercase t, but which way is better between these two? The answer is that it’s fine to use either one. However, the factory function torch.tensor() has better documentation and more configuration options, so it gets the winning spot at the moment.

Alright, before we knock the torch.Tensor() constructor off our list in terms of use, let’s go over the difference we observed in the printed tensor outputs. The difference is in the dtype of each tensor.

The difference arises because the torch.Tensor() constructor uses the default dtype when building the tensor. We can verify the default dtype using the torch.get_default_dtype() function:



```python
torch.get_default_dtype()
```

```
torch.float32
```

The other calls choose a dtype based on the incoming data. This is called type inference. The dtype is inferred based on the incoming data. Note that the dtype can also be explicitly set for these calls by specifying the dtype as an argument:



```python
torch.tensor(data, dtype=torch.float32)
```

```
tensor([1000.,    2.,    3.])
```

```python
torch.as_tensor(data, dtype=torch.float32)
```

```
tensor([1000.,    2.,    3.])
```

With torch.Tensor(), we are unable to pass a dtype to the constructor. This is an example of the torch.Tensor() constructor lacking in configuration options. This is one of the reasons to go with the torch.tensor() factory function for creating our tensors.
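A small sketch of the extra configuration the factory function accepts: `dtype` and `requires_grad` can be set at creation time. For the constructor, one workaround (shown here as an option, not the only one) is to convert afterwards with `.to()`:

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

# The factory function accepts configuration such as dtype and requires_grad
t = torch.tensor(data, dtype=torch.float64, requires_grad=True)
print(t.dtype, t.requires_grad)  # torch.float64 True

# With the constructor, convert after construction instead
u = torch.Tensor(data).to(torch.float64)
print(u.dtype)  # torch.float64
```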

#### Sharing memory for performance: copy vs share

torch.Tensor() and torch.tensor() copy their input data while torch.as_tensor() and torch.from_numpy() share their input data in memory with the original input object.

This sharing just means that the actual data in memory exists in a single place. As a result, any changes that occur in the underlying data will be reflected in both objects, the torch.Tensor and the numpy.ndarray.

Sharing data is more efficient and uses less memory than copying data because the data is not written to two locations in memory.
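The sharing works in both directions. In this sketch, writing through a tensor created with `torch.as_tensor()` changes the original NumPy array, while a tensor created with `torch.tensor()` keeps its own copy:

```python
import numpy as np
import torch

data = np.array([1, 2, 3])
shared = torch.as_tensor(data)  # shares memory with data
copied = torch.tensor(data)     # holds its own copy

shared[1] = 20   # write through the tensor...
print(data)      # ...and the NumPy array sees it: [ 1 20  3]
print(copied)    # the copy is untouched: still [1, 2, 3]
```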

This establishes that torch.as_tensor() and torch.from_numpy() both share memory with their input data. However, which one should we use, and how are they different?

The torch.from_numpy() function only accepts numpy.ndarrays, while the torch.as_tensor() function accepts a wide variety of Python array-like objects including other PyTorch tensors. For this reason, torch.as_tensor() is the winning choice in the memory sharing game.
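The flexibility difference can be checked quickly. In this sketch, `torch.as_tensor()` accepts an existing tensor (and, when dtype and device already match, hands back the same underlying memory), while `torch.from_numpy()` rejects a plain Python list:

```python
import torch

base = torch.tensor([1.0, 2.0, 3.0])
same = torch.as_tensor(base)  # already a tensor with matching dtype/device: no copy
print(same.data_ptr() == base.data_ptr())  # True

try:
    torch.from_numpy([1, 2, 3])  # a plain list is not a numpy.ndarray
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```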

#### Best options for creating tensors in PyTorch

Given all of these details, these two are the best options:

- torch.tensor()
- torch.as_tensor()

The torch.tensor() call is the sort of go-to call, while torch.as_tensor() should be employed when tuning our code for performance.

Some things to keep in mind (memory sharing works where it can):


- Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.
- The memory sharing of as_tensor() doesn’t work with built-in Python data structures like lists.
- The as_tensor() call requires developer knowledge of the sharing feature. This is necessary so we don’t inadvertently make an unwanted change in the underlying data without realizing the change impacts multiple objects.
- The as_tensor() performance improvement will be greater when there are a lot of back and forth operations between numpy.ndarray objects and tensor objects. However, if there is just a single load operation, there shouldn’t be much impact from a performance perspective.
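The "works where it can" caveat is easy to observe. This sketch shows two cases where `torch.as_tensor()` has to copy: a built-in Python list, and an ndarray converted to a different dtype. In both cases later changes to the source do not reach the tensor:

```python
import numpy as np
import torch

# A Python list cannot be shared, so as_tensor() copies it
lst = [1, 2, 3]
from_list = torch.as_tensor(lst)
lst[0] = 100
print(from_list)  # tensor([1, 2, 3])

# Requesting a different dtype also forces a copy, even for an ndarray
arr = np.array([1, 2, 3])
converted = torch.as_tensor(arr, dtype=torch.float32)
arr[0] = 100
print(converted)  # tensor([1., 2., 3.])
```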

### Reshaping operations

We have the following high-level categories of operations:

- Reshaping operations
- Element-wise operations
- Reduction operations
- Access operations

#### Reshaping operations for tensors

Reshaping operations are perhaps the most important type of tensor operations. This is because, as we mentioned in the post where we introduced tensors, the shape of a tensor gives us something concrete we can use to shape an intuition for our tensors.



```python
t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3]
], dtype=torch.float32)

print(t.size())
print(t.shape)
print("Rank:", len(t.shape))
print("Num elems:", torch.tensor(t.shape).prod())
print("Num elems:", t.numel())
```

```
torch.Size([3, 4])
torch.Size([3, 4])
Rank: 2
Num elems: tensor(12)
Num elems: 12
```


The number of elements contained within a tensor is important for reshaping because the reshaping must account for the total number of elements present. Reshaping changes the tensor's shape but not the underlying data. Our tensor has 12 elements, so any reshaping must account for exactly 12 elements.
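A short sketch of this constraint: a shape whose components multiply to 12 works, while one that multiplies to 15 is rejected with a `RuntimeError`:

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3]
], dtype=torch.float32)

print(t.reshape(2, 6).shape)  # torch.Size([2, 6]): 2 * 6 == 12 elements

try:
    t.reshape(5, 3)  # 5 * 3 == 15, not 12, so this shape is invalid
    reshape_failed = False
except RuntimeError:
    reshape_failed = True
print(reshape_failed)  # True
```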

Let’s look now at all the ways in which this tensor t can be reshaped without changing the rank:



```python
t.reshape([1, 12])
t.reshape([2, 6])
t.reshape([4, 3])
t.reshape([12, 1])  # a notebook cell displays only this last result
```

```
tensor([[1.],
        [1.],
        [1.],
        [1.],
        [2.],
        [2.],
        [2.],
        [2.],
        [3.],
        [3.],
        [3.],
        [3.]])
```

The underlying logic is the same for higher dimensional tensors even though we may not be able to use the intuition of rows and columns in higher dimensional spaces. For example:

```python
t.reshape([2, 2, 3])
```

```
tensor([[[1., 1., 1.],
         [1., 2., 2.]],

        [[2., 2., 3.],
         [3., 3., 3.]]])
```

In this example, we increase the rank to 3, and so we lose the rows and columns concept. However, the product of the shape's components (2 x 2 x 3 = 12) still has to be equal to the number of elements in the original tensor (12).

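Note that PyTorch has another function that you may see called view() that does much the same thing as the reshape() function, but don't let these names throw you off. The main difference is that view() always returns a tensor that shares data with the original and only works when the requested shape is compatible with the tensor's memory layout, while reshape() returns a view when it can and a copy otherwise. No matter which deep learning framework we are using, these concepts will be the same.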
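One practical way to see the difference between view() and reshape() is with a transposed tensor, whose memory layout is no longer contiguous. In this sketch, `view()` shares data with the original, refuses the non-contiguous case with a `RuntimeError`, and `reshape()` handles it by copying:

```python
import torch

t = torch.arange(12).reshape(3, 4)

v = t.view(12)   # view() shares data with t
v[0] = 99
print(t[0][0])   # tensor(99): the change is visible through t

tt = t.T         # transposing makes the memory layout non-contiguous
try:
    tt.view(12)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

flat = tt.reshape(12)  # reshape() falls back to copying when it must
print(flat.shape)      # torch.Size([12])
```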

#### Changing shape by squeezing and unsqueezing

The next way we can change the shape of our tensors is by squeezing and unsqueezing them.

- Squeezing a tensor removes the dimensions or axes that have a length of one.
- Unsqueezing a tensor adds a dimension with a length of one.

These functions allow us to expand or shrink the rank (number of dimensions) of our tensor. Let’s see this in action.

```python
print(t.reshape([1, 12]))
print(t.reshape([1, 12]).shape)
print(t.reshape([1, 12]).squeeze())
print(t.reshape([1, 12]).squeeze().shape)

print(t.reshape([1, 12]).squeeze().unsqueeze(dim=0))
print(t.reshape([1, 12]).squeeze().unsqueeze(dim=0).shape)
print(t.reshape([1, 12]).squeeze().unsqueeze(dim=1))
print(t.reshape([1, 12]).squeeze().unsqueeze(dim=1).shape)

# t remains unchanged in all of these ops
print(t)
```

```
tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])
torch.Size([1, 12])
tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])
torch.Size([12])
tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])
torch.Size([1, 12])
tensor([[1.],
        [1.],
        [1.],
        [1.],
        [2.],
        [2.],
        [2.],
        [2.],
        [3.],
        [3.],
        [3.],
        [3.]])
torch.Size([12, 1])
tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])
```
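Both functions also take a dimension argument. This sketch, on a hypothetical [1, 3, 1] tensor, contrasts squeezing every length-one axis with squeezing a single chosen axis; asking to squeeze an axis whose length is not one leaves the shape unchanged:

```python
import torch

t = torch.zeros(1, 3, 1)

print(t.squeeze().shape)     # torch.Size([3]): every length-1 axis removed
print(t.squeeze(0).shape)    # torch.Size([3, 1]): only axis 0 removed
print(t.squeeze(1).shape)    # torch.Size([1, 3, 1]): axis 1 has length 3, so nothing changes
print(t.unsqueeze(0).shape)  # torch.Size([1, 1, 3, 1]): new length-1 axis at position 0
```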


#### Flatten a tensor

A flatten operation on a tensor reshapes the tensor to have a single axis whose length is equal to the number of elements contained in the tensor. This is the same thing as a 1d-array of elements.

Flattening a tensor means to remove all of the dimensions except for one.



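```python
t.reshape(-1)
```

```
tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])
```

```python
def flatten(t):
    t = t.reshape(1, -1)  # shape [1, N], with N inferred from the element count
    t = t.squeeze()       # remove the length-1 axis, leaving a 1d-tensor
    return t
```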

The flatten() function takes in a tensor t as an argument.

Since the argument t can be any tensor, we pass -1 as the second argument to the reshape() function. In PyTorch, the -1 tells the reshape() function to infer that value from the number of elements contained within the tensor. Remember, the product of the shape's component values must equal the number of elements in the tensor. This is how PyTorch can figure out what the value should be, given a 1 as the first argument.

Since our tensor t has 12 elements, the reshape() function is able to figure out that a 12 is required for the length of the second axis.

After squeezing, the first axis (axis-0) is removed, and we obtain our desired result, a 1d-array of length 12.

Here's an example of this in action:

```python
t = torch.ones(4, 3)
print(t)

flatten(t)
```

```
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```

In a future post when we begin building a convolutional neural network, we will see the use of this flatten() function. We'll see that flatten operations are required when passing an output tensor from a convolutional layer to a linear layer.
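PyTorch also ships a built-in `torch.flatten()`. This sketch repeats the `flatten()` helper from above so it is self-contained and checks that the two agree on the 4 x 3 example:

```python
import torch

def flatten(t):
    # Same helper as above: reshape to [1, N], then squeeze off the length-1 axis
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

t = torch.ones(4, 3)
print(torch.equal(flatten(t), torch.flatten(t)))  # True
print(torch.equal(flatten(t), t.reshape(-1)))     # True
```

One edge case separates them: for a tensor holding a single element, `squeeze()` removes every axis, so the helper returns a 0-d tensor while `torch.flatten()` returns shape [1].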

In these examples, we have flattened the entire tensor; however, it is possible to flatten only specific parts of a tensor. For example, suppose we have a tensor of shape [2,1,28,28] for a CNN. This means that we have a batch of 2 grayscale images, each with height and width dimensions of 28 x 28.

Here, we can flatten just the height and width axes of each image to get the shape [2,1,784]. We could also squeeze off the channel axis to get the shape [2,784].
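A sketch of this partial flatten uses the `start_dim` argument of `Tensor.flatten()`, on random stand-in data with the [2, 1, 28, 28] shape described above:

```python
import torch

# A stand-in batch of 2 grayscale 28 x 28 images: [Batch, Channels, Height, Width]
batch = torch.rand(2, 1, 28, 28)

per_channel = batch.flatten(start_dim=2)  # flatten height and width only
print(per_channel.shape)                  # torch.Size([2, 1, 784])

per_image = per_channel.squeeze(1)        # squeeze off the channel axis
print(per_image.shape)                    # torch.Size([2, 784])

# Equivalent one-step version: flatten everything after the batch axis
print(batch.flatten(start_dim=1).shape)   # torch.Size([2, 784])
```

Keeping the batch axis intact matters because each image must stay a separate sample when it reaches a linear layer.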

#### Concatenating tensors

We combine tensors using the cat() function, and the resulting tensor will have a shape that depends on the shapes of the two input tensors.



```python
t1 = torch.tensor([
    [1, 2],
    [3, 4]
])
t2 = torch.tensor([
    [5, 6],
    [7, 8]
])

# Combine t1 and t2 row-wise (axis-0):
print(torch.cat((t1, t2), dim=0))

# Combine t1 and t2 column-wise (axis-1):
print(torch.cat((t1, t2), dim=1))
```

```
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])
```
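The shape rule behind cat() is that lengths add up along the concatenation axis while every other axis must match. This sketch uses two hypothetical tensors of shapes [2, 3] and [4, 3] to show one case that works and one that fails:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(4, 3)

# Along dim=0 the lengths add (2 + 4); the other axis (3) matches
print(torch.cat((a, b), dim=0).shape)  # torch.Size([6, 3])

# Along dim=1, axis 0 would have to match (2 vs 4), so this fails
try:
    torch.cat((a, b), dim=1)
    cat_failed = False
except RuntimeError:
    cat_failed = True
print(cat_failed)  # True
```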
