# 1. Import statements

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim

# 2. Tensor Basics

## Create Tensor

Initialize random 5x3 matrix: 

In [7]:
x = torch.rand(5,3)

Construct a matrix filled with zeros and of dtype long:

In [8]:
x = torch.zeros(5, 3, dtype=torch.long)

Construct a tensor directly from data:

In [9]:
x = torch.tensor([5.5, 3])

Converting a torch tensor to NumPy Array and vice versa:

In [10]:
numpy_array = x.numpy()
x = torch.from_numpy(numpy_array)
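
One detail worth knowing about this conversion: for CPU tensors, `numpy()` and `torch.from_numpy()` return objects that share memory with the original rather than copies. A minimal sketch, with illustrative variable names that are not part of the cells above:

```python
import numpy as np
import torch

t = torch.ones(3)
a = t.numpy()            # NumPy array backed by the tensor's memory (CPU tensors only)
t.add_(1)                # an in-place change on the tensor ...
print(a)                 # ... is visible through the NumPy array as well

b = np.zeros(3)
u = torch.from_numpy(b)  # tensor backed by the NumPy array's memory
b += 5                   # in-place change on the array
print(u)                 # the tensor reflects it
```

If an independent copy is needed, calling `.clone()` on the tensor or `.copy()` on the array before converting avoids the shared memory.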

## Operations

Size of the tensor:

In [13]:
x.size()

torch.Size([5, 3])

Basic tensor operations:

In [22]:
x = torch.rand(5,3)
y = torch.rand(5,3)
z = x + y # Add x and y element wise
z = x * y # Multiply x and y element wise 
z = x + 2 # Add 2 to every element of x

Dot product:
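
No code cell follows this heading in the source. As a sketch of what such a cell might look like, `torch.dot` computes the dot product of two 1-D tensors, and the `@` operator (`torch.matmul`) computes matrix products. The values below are made up for illustration:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
d = torch.dot(a, b)        # 1*4 + 2*5 + 3*6 = 32

m = torch.rand(5, 3)
n = torch.rand(3, 2)
p = m @ n                  # matrix product of a 5x3 and a 3x2 matrix, shape 5x2
```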

Reshape tensor: 

In [15]:
x = torch.randn(4, 4) # 4x4
y = x.view(16) # flattened to 16 elements
z = x.view(-1, 8) # 2x8; the -1 dimension is inferred from the others

Indexing is like in NumPy:

In [None]:
x[:,1]

# 3. Autograd: Automatic Differentiation

## Dynamic Computational graph
Autograd abstracts the complicated mathematics and lets us calculate gradients of high-dimensional functions with only a few lines of code. Once `<Tensor>.requires_grad = True` is set, a tensor starts building a backward graph that tracks every operation applied to it. <br>
The autograd package is an engine for calculating derivatives (vector-Jacobian products, to be more precise). It records all operations performed on a gradient-enabled tensor in a directed acyclic graph called the dynamic computational graph. <br>
The leaves of this graph are the input tensors and the roots are the output tensors. Gradients are calculated by tracing the graph from the roots to the leaves and multiplying the local gradients along the way using the chain rule. <br>
Gradient-enabled tensors (variables) and functions (operations) together form the dynamic computational graph. The flow of data and the operations applied to it are defined at runtime, so the graph is constructed dynamically. <br>
Each tensor has a `.grad_fn` attribute that references the Function that created it (except for tensors created by the user, whose `grad_fn` is `None`). <br>
To compute the derivatives, call `.backward()` on a tensor.
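
The description above can be sketched in a few lines. The tensor values here are arbitrary and only serve to make the gradient easy to check by hand:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)  # leaf tensor, tracked by autograd
y = (x ** 2).sum()                                 # y = x0^2 + x1^2

print(x.grad_fn)   # None, because x was created by the user
print(y.grad_fn)   # the backward node recorded for the last operation

y.backward()       # walk the graph from y back to the leaf x
print(x.grad)      # dy/dx = 2 * x
```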

# 4. PyTorch terminology
## Variable 
`Variable` is a wrapper around a tensor that stores additional properties. <br>
A Variable has the following properties:
- .data (the tensor under the variable)
- .grad (the gradient computed for this variable; it must have the same shape and type as .data)
- .requires_grad (boolean indicating whether to calculate the gradient for the Variable during backpropagation)
- .grad_fn (the function that created this Variable, used when backpropagating the gradients)
- .volatile, whose function will be explained later on.

Variable is available under `torch.autograd.Variable`. Since PyTorch 0.4, `Variable` has been merged into `Tensor`, so these properties are available on tensors directly, and `.volatile` has been superseded by the `torch.no_grad()` context manager.

## Parameter
`Parameter` is a subclass of Variable (of `Tensor` since PyTorch 0.4), so most behaviors are the same.
The most important difference is that an `nn.Parameter` assigned as an attribute in an `nn.Module`'s constructor is added to the module's parameters, just as `nn.Module` objects assigned this way are registered as submodules. Here is an example:


In [43]:
class MyModule(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.variable = torch.autograd.Variable(torch.Tensor([5]))
        self.parameter = torch.nn.Parameter(torch.Tensor([10]))

This means ...
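
One way to read the example above: only the `nn.Parameter` attribute is registered with the module. The sketch below repeats the class, using a plain tensor in place of the `Variable` (an assumption that holds for current PyTorch versions, where the two are the same type), and lists what `named_parameters()` reports:

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.variable = torch.tensor([5.0])                        # plain tensor attribute
        self.parameter = torch.nn.Parameter(torch.tensor([10.0]))  # registered parameter

module = MyModule()
names = [name for name, _ in module.named_parameters()]
print(names)  # only the nn.Parameter attribute is listed
```

An optimizer built from `module.parameters()` would therefore update `parameter` but never see `variable`.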

## Functions
Functions transform their input using some operation. They do not store any state or buffers, so they have no memory of their own and are completely predictable: a log function always returns the logarithm of its inputs. A linear layer, by contrast, cannot be a function, since it has internal state such as weights and biases. <br>
To create a new function, subclass `torch.autograd.Function`.
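
As a rough illustration of such a subclass, the sketch below defines a hypothetical `Square` function (the name and the operation are chosen for this example only) with a hand-written backward pass:

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example function computing x * x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)      # keep the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x    # chain rule with d(x^2)/dx = 2x

x = torch.tensor([3.0], requires_grad=True)
y = Square.apply(x)                   # custom functions are called through .apply
y.sum().backward()
print(x.grad)
```

The function itself holds no weights; everything it needs for the backward pass is stored on `ctx` during the forward call.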

## Modules
Modules group parameters, layers and functions. During backpropagation, gradients are computed for the parameters of the module and, recursively, of its child modules.
Predefined modules are implemented under `torch.nn`, for example `torch.nn.Conv2d` and `torch.nn.Linear`. <br>
To define a new model (module), subclass `torch.nn.Module`.
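
The recursive handling of child modules can be made visible with `named_parameters()`. The classes `Block` and `Model` below are hypothetical and exist only for this sketch:

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)      # child module of Block

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()               # nested child module
        self.head = nn.Linear(2, 1)

model = Model()
param_names = [name for name, _ in model.named_parameters()]
print(param_names)  # parameters of nested modules appear with dotted names
```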

## Use it inside the code

In [28]:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2


# 5. Neural Network

## Different Layer Types

Convolutional Layer:

In [31]:
conv1 = nn.Conv2d(1, 6, 5) #in_channels, out_channels, kernel_size

Fully Connected Layer:

In [32]:
fc1 = nn.Linear(120, 84) #120 features in; 84 features out

LSTM:

In [38]:
lstm = nn.LSTM(3, 3) # input_size, hidden_size
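
To see what each layer type expects and returns, it can help to pass a random tensor through it and inspect the shapes. The input sizes below are assumptions picked to fit the constructor arguments above:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 6, 5)
fc1 = nn.Linear(120, 84)
lstm = nn.LSTM(3, 3)

img = torch.randn(1, 1, 32, 32)     # (batch, channels, height, width)
conv_out = conv1(img)               # 5x5 kernel without padding: 32 -> 28

vec = torch.randn(4, 120)           # (batch, features)
fc_out = fc1(vec)

seq = torch.randn(7, 2, 3)          # (seq_len, batch, input_size), the default layout
lstm_out, (h_n, c_n) = lstm(seq)    # outputs for every step plus final hidden and cell state

print(conv_out.shape, fc_out.shape, lstm_out.shape, h_n.shape)
```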

## Different Functions
- The input of a function is the tensor passed to it. Here `self.fc1` is applied to the previous `x`, and `F.relu` is applied to its result: `x = F.relu(self.fc1(x))`
- You can nest functions: `x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))`

Some examples:<br>
F.relu() <br>
F.dropout() <br>
F.log_softmax() <br>
F.elu() <br>
F.sigmoid() <br>
...

## Get weights of each Layer
You can access the weights of each layer defined in your network: `network.conv1.weight`. <br>
The shape of a layer's weights is accessible like this: `network.conv1.weight.shape`
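
A short sketch on a standalone layer (rather than a full network) shows the attribute names: on `nn.Conv2d` the learnable tensors are `weight` and `bias`.

```python
import torch.nn as nn

conv1 = nn.Conv2d(1, 6, 5)
weight_shape = tuple(conv1.weight.shape)  # (out_channels, in_channels, kernel_h, kernel_w)
bias_shape = tuple(conv1.bias.shape)      # one bias value per output channel
print(weight_shape, bias_shape)
```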

# 6. Build and Train Neural Network
## Define NN via Class

In [1]:
class Network(nn.Module):
    # inside __init__() you define the different layers
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    # inside forward() you define the sequence of operations
    # each function takes the tensor passed to it, i.e. the previous x, as its input
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # you can nest functions
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)
    
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num = 1
        for i in size:
            num *= i
        return num
    


## Initialize Network, Define Input and Target

Initialize Network

In [36]:
neural_network = Network()

Define input and target

In [39]:
input = Variable(torch.randn(10, 1, 32, 32)) # 10 samples, 1 channel, 32x32 pixels (the shape conv1 and fc1 expect)
target = Variable(torch.Tensor([[0,1,1,1,0,1,1,1,0,0] for _ in range(10)]))

## Define Loss-function and Optimizer

In [42]:
loss_fn = nn.MSELoss()
optimizer = optim.SGD(neural_network.parameters(), lr=0.01)

## Train the model

In [None]:
import numpy as np

num_epochs = 10 # assumed value; not set in the original
hist = np.zeros(num_epochs)

for epoch in range(num_epochs):
    # Zero out the gradients, otherwise they accumulate across iterations
    optimizer.zero_grad()
    
    # Forward pass
    out = neural_network(input)
    loss = loss_fn(out, target)
    hist[epoch] = loss.item()
    
    # Backward pass
    loss.backward()
    
    # Update parameters
    optimizer.step()
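
For a version of this loop that runs on its own, the sketch below swaps in a single `nn.Linear` layer and synthetic regression data. The model, the data, the seed and the value of `num_epochs` are all assumptions made for illustration and are not taken from the cells above:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data: the target is the sum of the 10 input features
inputs = torch.randn(64, 10)
targets = inputs.sum(dim=1, keepdim=True)

model = nn.Linear(10, 1)                 # stand-in for a larger network
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

num_epochs = 50                          # assumed value
hist = []

for epoch in range(num_epochs):
    optimizer.zero_grad()                # reset gradients from the previous step
    out = model(inputs)                  # forward pass
    loss = loss_fn(out, targets)
    hist.append(loss.item())
    loss.backward()                      # backward pass
    optimizer.step()                     # parameter update

print(hist[0], hist[-1])                 # first and last recorded loss
```

Because every step uses the full data set and a small learning rate, the recorded loss should shrink from the first to the last epoch, which makes `hist` a quick sanity check.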

# 7. Data Loader
`torch.utils.data.Dataset` is an abstract class representing a dataset. Your custom dataset should inherit from `Dataset` and override the following methods:
- `__len__`
- `__getitem__`

We create a dataset class for our dataset. We read the csv in `__init__` but leave the reading of images to `__getitem__`. <br>
This is memory efficient because the images are not all held in memory at once but are read as required.
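
The same pattern can be sketched without any files. The hypothetical `SquaresDataset` below stores only a size in `__init__` and builds each sample on demand in `__getitem__`, which mirrors the lazy loading described above:

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n                       # only lightweight metadata is stored here

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # The sample is produced when requested, not ahead of time
        x = torch.tensor([float(idx)])
        return x, x ** 2

dataset = SquaresDataset(5)
x, y = dataset[3]
print(len(dataset), x, y)
```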

In [None]:
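import os

import matplotlib.pyplot as plt
import pandas as pd
from skimage import io # assumption: io.imread below comes from scikit-image
from torch.utils.data import Dataset, DataLoader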

In [None]:
class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].to_numpy()  # as_matrix() was removed in pandas 1.0
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

In [None]:
face_dataset = FaceLandmarksDataset(csv_file='data/faces/face_landmarks.csv',
                                    root_dir='data/faces/')

fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]

    print(i, sample['image'].shape, sample['landmarks'].shape)

    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)  # plotting helper assumed to be defined elsewhere; it is not part of this section

    if i == 3:
        plt.show()
        break
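
The `DataLoader` class imported earlier is not used in the cells above. As a hedged sketch of how a dataset is typically batched, the example below wraps a small `TensorDataset` built from synthetic tensors, which stand in for the face landmarks data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten synthetic samples with two features each, plus integer labels
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# batch_size groups samples; shuffle=False keeps the original order
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_shapes = [tuple(xb.shape) for xb, yb in loader]
print(batch_shapes)  # the last batch is smaller because 10 is not divisible by 4
```

Any object implementing `__len__` and `__getitem__`, such as a custom `Dataset` subclass, can be passed to `DataLoader` in the same way, although dictionary samples containing images of different sizes would need resizing before they can be stacked into a batch.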