### PyTorch Tutorial for Beginners


If you have tried building your own deep neural networks with TensorFlow and Keras, you are probably familiar with the frustration of debugging these libraries. Although they have a Python API, it can still be hard to figure out exactly what went wrong when an error occurs. They also integrate less smoothly with NumPy, SciPy, scikit-learn, Cython, and similar tools. The PyTorch deep learning library claims the advantage of working well with Python and being built for Python enthusiasts. In addition, a nice property of PyTorch is that it builds a dynamic computational graph, as opposed to the static computational graphs used in TensorFlow and Keras. PyTorch is on the rise and is used in development by Facebook, Twitter, NVIDIA, and other companies.

The first question to consider is whether PyTorch is really better than TensorFlow. This is subjective, as there are no big differences in terms of performance. In any case, PyTorch has become a serious contender among deep learning libraries. Let's start exploring the library and leave the question of which is better for later.

### PyTorch - Introduction

PyTorch is an open source machine learning library for Python. It is used for applications such as natural language processing. It was originally developed by Facebook's artificial intelligence research group, and Uber's probabilistic programming software Pyro is built on it.

PyTorch was initially developed by Hugh Perkins as a Python wrapper for the LuaJIT-based Torch framework.

PyTorch redesigns and implements Torch in Python, sharing the same core C libraries for the backend code. The PyTorch developers have tuned this backend code to work efficiently with Python. They also retained GPU-based hardware acceleration as well as the extensibility features.

### Features
The main features of PyTorch are listed below:

1. **Simple Interface** - PyTorch offers an easy-to-use API, so it is considered very easy to work with in Python. Running code in this environment is quite easy.

2. **Using Python** - This library integrates seamlessly with the Python data science stack. Thus, it can use all the services and functionality offered by the Python environment.

3. **Computational Graphs** - PyTorch provides dynamic computational graphs, so the user can change them at run time. This is very useful when the developer does not know in advance how much memory is required to create a neural network model.

PyTorch is known for having three levels of abstraction, as listed below:

* Tensor - An imperative n-dimensional array that can run on the GPU.

* Variable - A node in the computational graph. It stores the data and the gradient. (Since PyTorch 0.4, Variable has been merged into Tensor.)

* Module - A neural network layer that stores state, such as the weights being learned.

### Benefits of PyTorch
Following are the benefits of PyTorch:

1. The code is easy to debug and understand.

2. It includes many layers, just like Torch.

3. It includes many loss functions.

4. It can be thought of as a NumPy extension for GPUs.

5. It allows you to build networks whose structure depends on the computation itself.

### Computational graphs
The first thing to understand about any deep learning library is the idea of computational graphs. A computational graph is a set of computations, called nodes, that are connected in the order in which they are computed. In other words, a given node depends on its input nodes and in turn feeds its result to other nodes. Below is a simple example of a computational graph for evaluating the expression a = (b + c) * (c + 2). You can break the calculation into the following steps:


![Simple-graph-example-260x300.png](attachment:2611ca42-e3a4-40c1-9b29-8074c0fe2ee7.png)

The advantage of using a computational graph is that each node is an independently functioning piece of code once it has received all of its input data. This makes it possible to optimize performance using multithreading or parallel computing. All major deep learning frameworks (TensorFlow, Theano, PyTorch, and so on) include computational graph constructs that perform the operations inside neural networks and backpropagate the error gradient.
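
As a minimal sketch of the graph for a = (b + c) * (c + 2), the expression can be built node by node with PyTorch tensors. The input values 1.0 and 3.0 and the intermediate names `d` and `e` are assumptions chosen for illustration; they are not taken from the figure.

```python
import torch

# Leaf nodes of the graph (values are illustrative assumptions)
b = torch.tensor(1.0, requires_grad=True)
c = torch.tensor(3.0, requires_grad=True)

# Intermediate nodes
d = b + c          # 1 + 3 = 4
e = c + 2          # 3 + 2 = 5

# Output node
a = d * e          # 4 * 5 = 20

# Backpropagate through the graph to get da/db and da/dc
a.backward()

print(a.item())       # 20.0
print(b.grad.item())  # da/db = e = 5.0
print(c.grad.item())  # da/dc = e + d = 9.0
```

Each intermediate node only needs its own inputs, which is what lets the framework trace the chain rule back from `a` to the leaves.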

### Tensors
Tensors are matrix-like data structures that are integral components in deep learning libraries and are used for efficient computation. Graphics processing units (GPUs) are efficient at computing operations between tensors, which has spurred a wave of opportunity in deep learning. In PyTorch, tensors can be defined in several ways:

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input/digit-recognizer"))

# Any results you write to the current directory are saved as output.

In [None]:
'''In PyTorch, a matrix (array) is called a tensor. For example, a 3x3 matrix is a 3x3 tensor.'''
# numpy array
array = [[1,2,3],[4,5,6]]
first_array = np.array(array) # We create a numpy array with the np.array() function -> 2x3 array
print("Array Type: {}".format(type(first_array))) # type(): type of the array. In this example it is numpy.ndarray
print("Array Shape: {}".format(np.shape(first_array))) # np.shape(): shape of the array. Row x Column
print(first_array)

In [None]:
import torch  # import pytorch library

tensor = torch.Tensor(2, 3)  # pytorch array

'''This code creates a tensor of size (2,3). Its values are uninitialized (whatever 
was in memory), so use torch.zeros(2, 3) when zeros are needed. In this example, 
the first number is the number of rows, the second is the number of columns.'''
print("Array Type: {}".format(tensor.type())) # type, e.g. torch.FloatTensor
print("Array Shape: {}".format(tensor.shape)) # shape
print(tensor)

In [None]:

'''We can also create a tensor filled with random floats:'''
x = torch.rand(2, 3)
print(x)

In [None]:
'''Multiplying tensors, adding to each other, and other algebraic operations are simple:'''
x = torch.ones(2,3)  # tensor filled with ones
print(x, '\n')
y = torch.ones(2,3) * 2  # tensor filled with twos
print(y)
print('\n\n Result of adding tensors: \n', x + y)  # result of adding tensors

Allocation is one of the most commonly used techniques in coding, so let's learn how to do it with PyTorch.
To learn, compare NumPy and tensors:
- np.ones() = torch.ones()
- np.random.rand() = torch.rand()

In [None]:
# numpy ones
print("Numpy {}\n".format(np.ones((2,3))))

# pytorch ones
print(torch.ones((2,3)))

In [None]:
# numpy random
print("Numpy {}\n".format(np.random.rand(2,3)))

# pytorch random
print(torch.rand(2,3))

Even when I use PyTorch for neural networks, I feel more comfortable with NumPy. Therefore, I usually convert the result of a neural network, which is a tensor, to a NumPy array to visualize or examine it.
Let's look at the conversion between tensors and NumPy arrays.
- torch.from_numpy(): from numpy to tensor
- numpy(): from tensor to numpy
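
One detail worth knowing about these two conversions is that, for CPU tensors, they share memory with the original object instead of copying it. The sketch below illustrates this; the variable names and values are illustrative assumptions.

```python
import numpy as np
import torch

array = np.zeros((2, 2))

# from_numpy() creates a tensor that shares memory with the array
shared = torch.from_numpy(array)
array[0, 0] = 7.0
print(shared[0, 0].item())  # 7.0, the tensor sees the change

# numpy() on a CPU tensor is also a view of the same memory
back = shared.numpy()
shared[1, 1] = 3.0
print(array[1, 1], back[1, 1])  # 3.0 3.0

# torch.tensor() makes an independent copy instead
copied = torch.tensor(array)
array[0, 1] = 5.0
print(copied[0, 1].item())  # 0.0, the copy is unaffected
```

If an independent copy is needed, `torch.tensor(array)` or `tensor.clone()` avoids surprises from in-place edits.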

In [None]:
# random numpy array
array = np.random.rand(2,2)
print("{} {}\n".format(type(array),array))

# from numpy to tensor
from_numpy_to_tensor = torch.from_numpy(array)
print("{}\n".format(from_numpy_to_tensor))

# from tensor to numpy
tensor = from_numpy_to_tensor
from_tensor_to_numpy = tensor.numpy()
print("{} {}\n".format(type(from_tensor_to_numpy),from_tensor_to_numpy))

In [None]:
print('before: \n', y)
'''NumPy-style slicing also works on tensors. For example, y[:,1] selects the second column:'''
y[:,1] = y[:,1] + 1
print('\n after: \n', y)

Now you know how to create and work with tensors in PyTorch. The next step of the tutorial will be an overview of the more complex constructs in the library.

### Basic Math with PyTorch
- a and b are tensors.
- Reshape: a.view()
- Addition: torch.add(a,b) = a + b
- Subtraction: a.sub(b) = a - b
- Element-wise multiplication: torch.mul(a,b) = a * b
- Element-wise division: torch.div(a,b) = a / b
- Mean: a.mean()
- Standard deviation (std): a.std()
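
Two of the operations in this list have behavior that is easy to miss: `std()` computes the sample standard deviation (dividing by N - 1) by default, and `view()` accepts `-1` for a dimension that should be inferred. The following is a small sketch with illustrative values.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])

mean = a.mean()                          # 3.0
std_sample = a.std()                     # divides by N - 1 (default)
std_population = a.std(unbiased=False)   # divides by N

print(mean.item(), std_sample.item(), std_population.item())

# view() reshapes without copying; -1 lets PyTorch infer that dimension
grid = torch.arange(6).view(2, -1)
print(grid.shape)  # torch.Size([2, 3])
```

NumPy's `np.std` divides by N by default, so the two libraries give different numbers for the same data unless the arguments are matched.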

In [None]:
# create tensor 
tensor = torch.ones(3,3)
print("\n",tensor)

# Resize
print("{}{}\n".format(tensor.view(9).shape,tensor.view(9)))

# Addition
print("Addition: {}\n".format(torch.add(tensor,tensor)))

# Subtraction
print("Subtraction: {}\n".format(tensor.sub(tensor)))

# Element wise multiplication
print("Element wise multiplication: {}\n".format(torch.mul(tensor,tensor)))

# Element wise division
print("Element wise division: {}\n".format(torch.div(tensor,tensor)))

# Mean
tensor = torch.Tensor([1,2,3,4,5])
print("Mean: {}".format(tensor.mean()))

# Standard deviation (std)
print("std: {}".format(tensor.std()))

### Automatic Differentiation in PyTorch
Deep learning libraries have mechanisms for calculating the error gradient and backpropagating the error through the computational graph. This mechanism, called autograd in PyTorch, is easily accessible and intuitive. The Variable class is the main component of the autograd system as used in this tutorial. (Since PyTorch 0.4, Variable has been merged into Tensor; wrapping a tensor in Variable still works and simply returns a tensor.) The Variable class wraps a tensor and lets you automatically calculate the gradient on the tensor when you call the ***.backward()*** function. The object contains the data from the tensor, the gradient of the tensor (computed with respect to some other value, such as the loss), and a reference to the function that created the variable (if the variable was created by the user, this reference is empty).



In [None]:
from torch.autograd import Variable  # import variable from pytorch library

'''Let's create a variable from a simple tensor:'''
x = Variable(torch.ones(2, 2) * 2, requires_grad=True)
print(x)

The variable declaration wraps a 2x2 tensor filled with twos and additionally states that the variable needs a gradient. When such a variable is used in a neural network, it becomes trainable. If requires_grad is False, the variable cannot be trained. In this simple example, we won't be training anything, but we want to query the gradient for this variable.


In [None]:
'''Next, let's create a new variable based on x.'''
z = 2 * (x * x) + 5 * x
print(z)

In [None]:
'''
To calculate the gradient of this operation with respect to x, dz/dx, we can analytically obtain 4x + 5. 
If all elements of x are twos, then the gradient dz/dx is a tensor of dimension (2,2) 
filled with the number 13. However, first you need to run the backpropagation operation .backward() 
to calculate the gradient with respect to something. In our case, a (2,2) tensor of ones is passed in, 
with respect to which we calculate the gradient. 
In this case, the calculation is just a d/dx operation:
'''

z.backward(torch.ones(2, 2))
print('The result is the following: \n', x.grad)

Note that this is exactly what we predicted above, and that the gradient is stored in the x variable in the .grad property.

- Variables accumulate gradients.
- We will use PyTorch for neural networks. As you know, neural networks rely on backpropagation, where gradients are calculated. Therefore we need to handle gradients.
- The difference between variables and tensors is that variables accumulate gradients (in current PyTorch, a tensor with requires_grad=True does the same).
- We can do math operations with variables, too.
- To run backpropagation we need variables.
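
To make "accumulates gradients" concrete, here is a minimal sketch: calling `.backward()` twice without resetting adds the second gradient to the first. The tensor values are illustrative assumptions, and the sketch uses a plain tensor with `requires_grad=True` in place of `Variable`.

```python
import torch

x = torch.tensor([2.0, 4.0], requires_grad=True)

# First backward pass: d/dx sum(x^2) = 2x
(x ** 2).sum().backward()
first = x.grad.clone()
print(first)   # tensor([4., 8.])

# Second backward pass without resetting: gradients are added
(x ** 2).sum().backward()
second = x.grad.clone()
print(second)  # tensor([ 8., 16.])

# Reset the accumulated gradient in place
x.grad.zero_()
print(x.grad)  # tensor([0., 0.])
```

This is why training loops clear the gradients before every backward pass.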

One more example:
- Assume we have the equation y = x^2
- Define the variable x = [2,4]
- After the calculation we find that y = [4,16] (y = x^2)
- The output o is defined as o = (1/2)sum(y) = (1/2)sum(x^2)
- The derivative of o with respect to x is x
- The result is equal to x, so the gradients are [2,4]


Let's implement it:

In [None]:
# let's do a basic backward propagation
# we have the equation y = x^2
array = [2,4]
tensor = torch.Tensor(array)
x = Variable(tensor, requires_grad = True)
y = x**2
print(" y =  ",y)

# output equation o = 1/2*sum(y)
o = (1/2)*y.sum()
print(" o =  ",o)

# backward
o.backward() # calculates gradients

# As stated above, variables accumulate gradients. In this part there is only one variable, x.
# Therefore x should have gradients.
# Let's look at the gradients with x.grad
print("gradients: ",x.grad)

We have learned the simplest operations with tensors, variables, and autograd in PyTorch. Now let's write a simple neural network in PyTorch, which will showcase these features.

### Creating a Neural Network in PyTorch
Here we will create a simple neural network with 4 layers, including an input layer and two hidden layers, to classify handwritten digits in the MNIST dataset. The architecture that we will use is shown in the picture:

![CNTK-Dense-example-architecture-1.jpg](attachment:cc02e7c8-9c9e-4821-86a9-7c9ecc97d994.jpg)


The input layer consists of 28 x 28 = 784 grayscale pixels that make up the input data in the MNIST dataset. The input data is then passed through two hidden layers, each containing 200 nodes and using the rectified linear unit (ReLU) activation function. Finally, we have an output layer with ten nodes corresponding to the ten handwritten digits from 0 to 9. For such a classification problem, we will use a softmax output layer.
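
To get a feel for the size of this architecture, the sketch below counts its trainable parameters. It uses `nn.Sequential` as a stand-in that is assumed to match the layer sizes described above; the tutorial itself builds the network as a class.

```python
import torch.nn as nn

# Stand-in for the described architecture: 784 -> 200 -> 200 -> 10
layers = nn.Sequential(
    nn.Linear(28 * 28, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 10),
)

# Each Linear layer has in*out weights plus out biases
n_params = sum(p.numel() for p in layers.parameters())
print(n_params)  # 199210 = (784*200 + 200) + (200*200 + 200) + (200*10 + 10)
```

Most of the parameters sit in the first layer, because it connects all 784 pixels to the 200 hidden nodes.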

### Class for building a neural network
To create a neural network in PyTorch, the nn.Module class is used. You inherit from it, which gives you all the functionality of the nn.Module base class while still letting you override the constructor that builds the model and the forward pass through the network. The code below will help explain this:

In [None]:
'''
The main data structure of torch.nn is the module, an abstract concept that can represent a specific layer in a neural network 
or a neural network containing many layers. In practice, the most common way is to inherit from nn.Module and write your own network/layer. 
Let's see how to use nn.Module to build a network from fully connected layers. A fully connected layer is also known as an affine layer.
'''
# libraries
import torch.nn as nn  # base class
import torch.nn.functional as F

class Net(nn.Module):  # This definition shows the inheritance from the base class nn.Module
    def __init__(self):  # Class initialization
        super(Net, self).__init__()  # The first line uses Python's super() to call the base class constructor
        
        # in the next three lines, we create fully connected layers 
        '''
        A fully connected neural network layer is represented by an nn.Linear object, 
        in which the first argument is the number of nodes in the i-th layer, and the second 
        is the number of nodes in the i+1 layer. As you can see from the code, the first layer 
        takes 28x28 pixels as input and connects to the first hidden layer with 200 nodes.
        '''
        self.fc1 = nn.Linear(28 * 28, 200)
        
        # Next comes the connection to another hidden layer with 200 nodes
        self.fc2 = nn.Linear(200, 200)
        
        
        # And finally, connecting the last hidden layer to the output layer with 10 nodes
        self.fc3 = nn.Linear(200, 10)
        
        '''
        After defining the skeleton of the network architecture, we need to define 
        how data flows through it. This is done by defining the forward() method, 
        which overrides the placeholder method in the base class and must be written for each network.
        '''

    def forward(self, x):  # For the forward() method, we take the input data x as the main argument
        # Next, load everything in the first fully connected layer self.fc1(x) and apply the ReLU activation
        # function to the nodes in that layer using F.relu()
        x = F.relu(self.fc1(x))

        # Due to the hierarchical nature of this neural network, we replace x at each stage and send 
        # it to the next layer
        x = F.relu(self.fc2(x))

        # We repeat this for both hidden layers; the last layer gets no ReLU.       
        x = self.fc3(x)

        # On the last layer, we return not ReLU, but the logarithmic softmax activation function. 
        # This, combined with the negative log-likelihood loss function, yields a multi-class 
        # cross-entropy-based loss function that we will use to train the network.  
        # dim=1 applies the softmax across the 10 class scores of each sample.
        return F.log_softmax(x, dim=1)   
        
    

In [None]:
'''We have defined a neural network. The next step is to create an instance of this architecture:'''

model = Net()
print('When outputting an instance of the Net class, we get the following: \n', model)
print('Which is very convenient, as it confirms the structure of our neural network.')

### Network training
Next, you need to specify the optimization method and the loss criterion:

In [None]:
import torch.optim as optim  # is a package implementing various optimization algorithms. 
# Most commonly used methods are already supported, and the interface is general enough, 
# so that more sophisticated ones can be also easily integrated in the future

learning_rate = 0.01
# Below, we create an optimizer based on stochastic gradient descent, 
# setting the learning rate (in our case, 0.01)

# Perform optimization by stochastic gradient descent
# The optimizer also needs the network parameters it should update; 
# this is easy in PyTorch thanks to the .parameters() method 
# that the Net class inherits from the nn.Module base class
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

'''
Next, the loss criterion is set: the negative log-likelihood loss function. 
This type of function, combined with the logarithmic softmax function at the output 
of the neural network, gives the equivalent cross-entropy loss for the 10 classes of the 
classification problem.
'''
# Create a loss function
error = nn.NLLLoss()

The outer training loop iterates over the epochs, and the inner training loop iterates over all the training data in batches whose size is set in the code as batch_size. Inside the loop, we first convert the data and the target into PyTorch variables. When MNIST images are retrieved from an image data loader, the input has a size of (batch_size, 1, 28, 28). Such a 4D tensor is better suited to a convolutional neural network architecture than to our fully connected network, so the data must be flattened from (1, 28, 28) to a one-dimensional vector for the 28 x 28 = 784 input nodes. In this notebook the CSV rows already hold 784 pixel values each, so the same reshaping call leaves the batches unchanged.
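
The flattening step can be sketched on its own with a random batch. The batch size of 100 and the random data are assumptions for illustration only.

```python
import torch

# A fake batch shaped like image data: (batch_size, channels, height, width)
batch = torch.rand(100, 1, 28, 28)

# -1 tells view() to infer the first dimension from the total number of elements
flat = batch.view(-1, 28 * 28)
print(flat.shape)  # torch.Size([100, 784])

# Applying the same call to already-flat data changes nothing
flat_again = flat.view(-1, 28 * 28)
print(flat_again.shape)  # torch.Size([100, 784])
```

Because `view()` does not copy data, this reshaping is cheap even for large batches.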

In [None]:
# Import Libraries
from torch.utils.data import DataLoader

from sklearn.model_selection import train_test_split

<a id="3"></a> <br>
### Prepare Dataset


- Dataset: 28x28 images with 10 labels from 0 to 9.
- The data is not normalized, so we divide each image by 255, which is the basic normalization for images.
- To split the data, we use the train_test_split method from the sklearn library.
- The size of the train data is 80% and the size of the test data is 20%.
- Create feature and target tensors. In the next parts we create variables from these tensors. As you remember, we need to define variables to accumulate gradients.
- batch_size: suppose we have data that includes 1000 samples. We can train on all 1000 samples at the same time, or we can divide them into 10 groups of 100 samples and train on the 10 groups in order. The batch size is the group size. For example, I choose batch_size = 100, which means that to train on all the data once we have 336 groups. We train on each of the 336 groups with batch_size 100. In the end we have trained on 33600 samples one time.
- epoch: 1 epoch means training on all samples one time.
- In our example, we have 33600 samples to train on and we set batch_size to 100. We also set the number of epochs to 29 (accuracy reaches almost its highest value at 29 epochs). The data is trained on 29 times. The question is how many iterations we need. Let's calculate: 
    - training on the data 1 time = training on 33600 samples (because the data includes 33600 samples) 
    - But we split our data into 336 groups (group_size = batch_size = 100) 
    - Therefore, 1 epoch (training on the data once) takes 336 iterations
    - We have 29 epochs, so the total number of iterations is 9744 (which is almost the 10000 that I used)
- TensorDataset(): a dataset wrapping tensors. Each sample is retrieved by indexing tensors along the first dimension.
- DataLoader(): it combines a dataset and a sampler. It also provides multi-process iterators over the dataset.
- Visualize one of the images in the dataset.
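
The batch, epoch, and iteration arithmetic from the list above can be checked in a few lines. The numbers come from the text; the variable names `n_train`, `iters_per_epoch`, and `total_iters` are illustrative.

```python
n_train = 33600      # training samples after the 80/20 split
batch_size = 100
n_iters = 10000      # target number of iterations

# Number of batches needed to see every training sample once
iters_per_epoch = n_train // batch_size
print(iters_per_epoch)  # 336

# Whole epochs that fit into the target number of iterations
num_epochs = int(n_iters / (n_train / batch_size))
print(num_epochs)  # 29

# Iterations actually run with that many whole epochs
total_iters = num_epochs * iters_per_epoch
print(total_iters)  # 9744
```

Truncating to whole epochs is why the run stops at 9744 iterations instead of the requested 10000.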

In [None]:
# Prepare Dataset
# load data
train = pd.read_csv(r"../input/digit-recognizer/train.csv",dtype = np.float32)

# split data into features(pixels) and labels(numbers from 0 to 9)
targets_numpy = train.label.values
features_numpy = train.loc[:,train.columns != "label"].values/255 # normalization

# train test split. Size of train data is 80% and size of test data is 20%. 
features_train, features_test, targets_train, targets_test = train_test_split(features_numpy,
                                                                             targets_numpy,
                                                                             test_size = 0.2,
                                                                             random_state = 42) 

# create feature and target tensors for the train set. As you remember, we need variables to accumulate gradients. Therefore we first create tensors, then we will create variables
featuresTrain = torch.from_numpy(features_train)
targetsTrain = torch.from_numpy(targets_train).type(torch.LongTensor) # data type is long

# create feature and targets tensor for test set.
featuresTest = torch.from_numpy(features_test)
targetsTest = torch.from_numpy(targets_test).type(torch.LongTensor) # data type is long

# batch_size, epoch and iteration
batch_size = 100
n_iters = 10000
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)

# Pytorch train and test sets
train = torch.utils.data.TensorDataset(featuresTrain,targetsTrain)
test = torch.utils.data.TensorDataset(featuresTest,targetsTest)

# data loader
train_loader = DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = DataLoader(test, batch_size = batch_size, shuffle = False)


In [None]:
# visualize one of the images in the data set
# the image and its title must use the same index
plt.imshow(features_numpy[10].reshape(28,28))
plt.axis("off")
plt.title(str(targets_numpy[10]))
plt.savefig('graph.png')
plt.show()

It's time to train the neural network. During training, the data is fetched from the data loader object. The loader yields input and target data in batches, which are fed into our neural network and loss function, respectively. Below is the complete code for training:

In [None]:
#  Training the Model

count = 0
loss_list = []
iteration_list = []
accuracy_list = []


for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        '''
        The .view() function works with PyTorch variables and transforms their shape. If we don't know the exact size of a given dimension, 
        we can use the '-1' notation in its place. So when using images.view(-1, 28*28) we say that the second dimension 
        should be 28 x 28 = 784 and the first dimension should be calculated from the size of the original data. In practice, 
        this means that the data will now be of size (batch_size, 784). We can pass this batch of input data into our neural network, 
        and PyTorch will do the hard work for us, efficiently performing the necessary calculations with tensors.
        '''
        # Define variables
        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        
        '''On the next line, we run optimizer.zero_grad() which zeros out or restarts the gradients in the model so that they are ready 
        for further backpropagation. Other libraries implement this implicitly, but keep in mind that PyTorch does this explicitly.'''
        # Clear gradients
        optimizer.zero_grad()
        
        
        # Forward propagation
        '''In the next line, we pass a batch of data to our model, which calls the forward() method of the Net class.'''
        outputs = model(train)
        
        # Calculate softmax and cross entropy loss
        # The line below computes the negative log-likelihood loss between the output of our neural network and the true labels of the given batch of data.
        '''After running the line above, the outputs variable holds the logarithmic softmax output of our neural network for 
        the given batch of data. This is one of the great things about PyTorch, as you can attach any standard Python debugger 
        you normally use and instantly see what is going on in the neural network. This is in contrast to other deep learning libraries, 
        TensorFlow and Keras, which require more complex debugging to find out what your neural network is actually producing. 
        I hope you play around with the code for this tutorial and see how handy debugging in PyTorch is.'''        
        loss = error(outputs, labels)  
        
        # Calculate gradients
        '''The next line runs the error backpropagation operation from the loss variable back through the neural network. 
        If you compare this with the .backward() operation we looked at earlier in the tutorial, you can see 
        that no argument is used here. Scalar variables require no argument when .backward() is called on them; 
        only non-scalar tensors need a matching gradient argument passed to .backward().'''
        loss.backward()
        
        # Update parameters
        '''In the next line, we ask PyTorch to perform stepwise gradient descent based on the gradients computed during the .backward() operation.'''
        optimizer.step()
        
        count += 1
        
        # Prediction
        if count % 50 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader:
                test = Variable(images.view(-1, 28*28))
                
                # Forward propagation
                '''In the next line, we pass a batch of test data to our model, which calls the forward() method of the Net class.'''
                outputs = model(test)
                
                # Get predictions from the maximum value
                '''torch.max(outputs, 1) returns the largest value along a particular tensor dimension together with its index. 
                The output of our neural network has a size of (batch_size, 10), where each value along the second dimension 
                of length 10 is the log probability that the neural network assigns to each output class (that is, the log 
                probability of the picture showing a digit from 0 to 9). The index of the highest log probability 
                is the digit from 0 to 9 that the neural network recognizes in the input image. 
                In other words, it is the best prediction for the given input. torch.max(..., 1) finds this maximum 
                along the second dimension (to find the maximum along the first dimension, change the 
                argument from 1 to 0) and returns both the maximum values and the corresponding indices. 
                The result is therefore a pair of tensors, each of size (batch_size,). 
                In this case, we are interested in the indices of the maximum values, which we access with [1].'''
                predicted = torch.max(outputs.data, 1)[1]
                
                # Total number of labels
                total += len(labels)
                
                # Total correct predictions
                '''We now have a neural network prediction for each example in a particular batch of inputs, and we can compare 
                it to the actual class label from the test dataset. This is used to count the number of correct answers.'''
                correct += (predicted == labels).sum().item()
            
            '''We get a count of the number of times the neural network gives the correct answer. Based on the accumulated sum 
            of correct predictions, we can determine the overall accuracy of the network on the test dataset:'''
            accuracy = 100 * correct / float(total)
            
            # store loss and iteration
            loss_list.append(loss.item())  # .item() converts the scalar loss tensor to a Python float
            accuracy_list.append(accuracy)
            iteration_list.append(count)
        
        '''Finally, we print the results every time the model reaches a certain number of iterations. 
        This shows our progress over the epochs of training and the loss of the neural network at that moment.'''
        if count % 500 == 0:
            # Print Loss
            print('Iteration: {}  Loss: {}  Accuracy: {}%  Epoch:{}'.format(count, loss.item(), accuracy, epoch))
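
The prediction step in the loop above relies on `torch.max` along dimension 1. The sketch below shows what it returns for a tiny batch; the scores and labels are made-up values for illustration.

```python
import torch

# Made-up raw scores for a batch of 2 samples and 3 classes
scores = torch.tensor([[1.0, 3.0, 2.0],
                       [0.5, 0.1, 0.2]])
log_probs = torch.log_softmax(scores, dim=1)

# torch.max along dim 1 returns (max values, indices of the max values)
values, predicted = torch.max(log_probs, 1)
print(values.shape, predicted.shape)  # torch.Size([2]) torch.Size([2])
print(predicted)                      # tensor([1, 0])

# Compare with made-up true labels to count correct predictions
labels = torch.tensor([1, 2])
correct = (predicted == labels).sum().item()
print(correct)  # 1
```

`torch.argmax(log_probs, dim=1)` gives the same indices when the maximum values themselves are not needed.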

In [None]:
# visualization
plt.plot(iteration_list,loss_list)
plt.xlabel("Number of iteration")
plt.ylabel("Loss")
plt.title("Base class: Loss vs Number of iteration")
plt.show()

# visualization accuracy 
plt.plot(iteration_list, accuracy_list,color = "red")
plt.xlabel("Number of iteration")
plt.ylabel("Accuracy")
plt.title("Base class: Accuracy vs Number of iteration")
plt.show()

### Great, we have learned how to create and train our basic model!
We went through this slowly and deliberately, understanding each step. That is how it should be done. Do not copy the code mindlessly; make sure you understand what the code is and what it does. 

[Here](https://www.kaggle.com/andrej0marinchenko/pytorch-base-class-for-beginners) I prepared the same code in a compact form, but with the test data set, and also obtained the model's predictions for evaluation on the leaderboard.

**Public Score - 0.97125**

<a id="3"></a> <br>
### Logistic Regression


- We use logistic regression for classification.
- linear regression + logistic function (softmax) = logistic regression

- **Steps of Logistic Regression**
    
    1. Create Logistic Regression Model
        - Same as linear regression.
        - However, as you would expect, there should be a logistic function in the model, right?
        - In PyTorch, the logistic function is in the loss function that we will use in the next parts.
    2. Instantiate Model
        - input_dim = 28x28 # size of image px*px
        - output_dim = 10  # labels 0,1,2,3,4,5,6,7,8,9
        - create model
    3. Instantiate Loss 
        - Cross entropy loss
        - It calculates the loss, which is no surprise :)
        - It also has softmax (logistic function) in it.
    4. Instantiate Optimizer 
        - SGD Optimizer
    5. Training the Model
    6. Prediction
- As a result, as you can see from the plot, while the loss is decreasing, the accuracy (almost 85%) is increasing and our model is learning (training).  
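
The statement that cross entropy loss "has softmax in it" can be checked directly: `nn.CrossEntropyLoss` applied to raw scores should match `nn.NLLLoss` applied to log-softmax outputs. The random scores and the target labels below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Raw model outputs (logits) for 4 samples and 10 classes, plus made-up labels
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# Cross entropy applied directly to the raw scores
ce = nn.CrossEntropyLoss()(logits, targets)

# Negative log-likelihood applied to log-softmax outputs
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True
```

This is why the logistic regression model returns the plain linear output, while a network that ends in log-softmax is paired with `nn.NLLLoss` instead.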

In [None]:
# Create Logistic Regression Model
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        # Linear part
        self.linear = nn.Linear(input_dim, output_dim)
        # There is no explicit logistic (softmax) function here:
        # in PyTorch it is applied inside the loss function,
        # so it has not been forgotten, it simply appears in a later step
    
    def forward(self, x):
        out = self.linear(x)
        return out

# Instantiate Model Class
input_dim = 28*28 # size of image px*px
output_dim = 10  # labels 0,1,2,3,4,5,6,7,8,9

# create logistic regression model
model = LogisticRegressionModel(input_dim, output_dim)

# Cross Entropy Loss  
error = nn.CrossEntropyLoss()

# SGD Optimizer 
learning_rate = 0.001
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
# Training the Model
count = 0
loss_list = []
iteration_list = []
accuracy_list = []  # reset so the plots below show this model's accuracy
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        # Define variables
        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients
        optimizer.zero_grad()
        
        # Forward propagation
        outputs = model(train)
        
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        
        # Calculate gradients
        loss.backward()
        
        # Update parameters
        optimizer.step()
        
        count += 1
        
        # Prediction
        if count % 50 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader: 
                test = Variable(images.view(-1, 28*28))
                
                # Forward propagation
                outputs = model(test)
                
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                
                # Total number of labels
                total += len(labels)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / float(total)
            
            # store loss, iteration and accuracy
            loss_list.append(loss.data)
            iteration_list.append(count)
            accuracy_list.append(accuracy)
        if count % 500 == 0:
            # Print Loss
            print('Iteration: {}  Loss: {}  Accuracy: {}%'.format(count, loss.data, accuracy))

In [None]:
# visualization
plt.plot(iteration_list,loss_list)
plt.xlabel("Number of iteration")
plt.ylabel("Loss")
plt.title("Logistic Regression Model: Loss vs Number of iteration")
plt.show()

# visualization accuracy 
plt.plot(iteration_list, accuracy_list,color = "red")
plt.xlabel("Number of iteration")
plt.ylabel("Accuracy")
plt.title("Logistic Regression Model: Accuracy vs Number of iteration")
plt.show()

<a id="4"></a> <br>
### Artificial Neural Network (ANN)
- Logistic regression is good at classification, but when the complexity (non-linearity) of the problem increases, the accuracy of the model decreases.
- Therefore, we need to increase the complexity of the model.
- To increase the complexity of the model, we add more non-linear functions as hidden layers.
- What we expect from an artificial neural network is that, as complexity increases, we can use more hidden layers and the model can adapt better. As a result, accuracy increases.
- **Steps of ANN:**
   
    1. Create ANN Model
        - We add 3 hidden layers.
        - We use ReLU, Tanh and ELU activation functions for diversity.
    2. Instantiate Model Class
        - input_dim = 28x28 # size of image px*px
        - output_dim = 10  # labels 0,1,2,3,4,5,6,7,8,9
        - Hidden layer dimension is 150. I chose 150 arbitrarily; the hidden layer dimension is a hyperparameter and should be chosen and tuned. You can try different values for it and observe the results.
        - create model
    3. Instantiate Loss
        - Cross entropy loss
        - It also has softmax(logistic function) in it.
    4. Instantiate Optimizer
        - SGD Optimizer
    5. Training the Model
    6. Prediction
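
The three activation functions named in step 1 can be sketched in plain Python to see how they differ. This is only an illustration on single numbers, with helper names chosen here; in the notebook the work is done element-wise on tensors by `nn.ReLU`, `nn.Tanh` and `nn.ELU` (the sketch assumes ELU's default `alpha` of 1.0).

```python
import math

def relu(x):
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return max(0.0, x)

def tanh(x):
    """Tanh: squashes any input into the range (-1, 1)."""
    return math.tanh(x)

def elu(x, alpha=1.0):
    """ELU: like ReLU for x > 0, but smoothly approaches -alpha for x < 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Compare the three functions on a few sample inputs
for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), tanh(x), elu(x))
```

For positive inputs ReLU and ELU behave identically. They differ for negative inputs, where ReLU outputs exactly zero and ELU still returns a small negative value.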


In [None]:
# Create ANN Model
class ANNModel(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(ANNModel, self).__init__()
        
        # Linear function 1: 784 --> 150
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity 1
        self.relu1 = nn.ReLU()
        
        # Linear function 2: 150 --> 150
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 2
        self.tanh2 = nn.Tanh()
        
        # Linear function 3: 150 --> 150
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 3
        self.elu3 = nn.ELU()
        
        # Linear function 4 (readout): 150 --> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)  
    
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        # Non-linearity 1
        out = self.relu1(out)
        
        # Linear function 2
        out = self.fc2(out)
        # Non-linearity 2
        out = self.tanh2(out)
        
        # Linear function 3
        out = self.fc3(out)
        # Non-linearity 3
        out = self.elu3(out)
        
        # Linear function 4 (readout)
        out = self.fc4(out)
        return out

# instantiate ANN
input_dim = 28*28
hidden_dim = 150 # hidden layer dimension is a hyperparameter that should be chosen and tuned; 150 is an arbitrary choice here
output_dim = 10

# Create ANN
model = ANNModel(input_dim, hidden_dim, output_dim)

# Cross Entropy Loss 
error = nn.CrossEntropyLoss()

# SGD Optimizer
learning_rate = 0.02
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
# ANN model training
count = 0
loss_list = []
iteration_list = []
accuracy_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        train = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients
        optimizer.zero_grad()
        
        # Forward propagation
        outputs = model(train)
        
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        
        # Calculating gradients
        loss.backward()
        
        # Update parameters
        optimizer.step()
        
        count += 1
        
        if count % 50 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Predict test dataset
            for images, labels in test_loader:

                test = Variable(images.view(-1, 28*28))
                
                # Forward propagation
                outputs = model(test)
                
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                
                # Total number of labels
                total += len(labels)

                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / float(total)
            
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
            accuracy_list.append(accuracy)
        if count % 500 == 0:
            # Print Loss
            print('Iteration: {}  Loss: {}  Accuracy: {} %'.format(count, loss.data, accuracy))

In [None]:
# visualization loss 
plt.plot(iteration_list,loss_list)
plt.xlabel("Number of iteration")
plt.ylabel("Loss")
plt.title("ANN: Loss vs Number of iteration")
plt.show()

# visualization accuracy 
plt.plot(iteration_list,accuracy_list,color = "red")
plt.xlabel("Number of iteration")
plt.ylabel("Accuracy")
plt.title("ANN: Accuracy vs Number of iteration")
plt.show()

- As a result, as you can see from the plots, while the loss decreases, the accuracy increases, so our model is learning.
- Thanks to the hidden layers, the model learned better, and its accuracy (almost 95%) is higher than that of the logistic regression model.
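
One way to see how much extra capacity the hidden layers add is to count trainable parameters. The sketch below does this by hand for the two models defined in this tutorial; the helper `linear_params` is an illustrative name, and it relies only on the fact that a fully connected layer has one weight per input-output pair plus one bias per output.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

input_dim, hidden_dim, output_dim = 28 * 28, 150, 10

# Logistic regression: a single linear layer 784 -> 10
logistic_params = linear_params(input_dim, output_dim)

# ANN: 784 -> 150 -> 150 -> 150 -> 10
ann_params = (
    linear_params(input_dim, hidden_dim)
    + 2 * linear_params(hidden_dim, hidden_dim)
    + linear_params(hidden_dim, output_dim)
)

print(logistic_params)  # 7850
print(ann_params)       # 164560
```

Changing `hidden_dim` in this sketch shows directly how that hyperparameter drives the size of the model.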

<a id="5"></a> <br>
### Convolutional Neural Network (CNN)
- Convolutional neural networks (CNNs) are well suited to classifying images.
- Convolutional neural networks are designed to process data that comes as multi-dimensional arrays. This type of neural network is used in applications such as image recognition or face recognition.
- The main difference between a CNN and a conventional neural network is that a CNN takes its input as a 2D array and works directly on the image, instead of relying on a separate feature extraction step.
- CNNs are used predominantly for recognition problems. Leading companies such as Google and Facebook have invested in research and development of recognition systems to make them faster.
- The CNN class of neural networks is defined as multilayer neural networks designed to detect complex features in data. It is the most widely used type of network in computer vision applications.

- **Steps of CNN:**

    1. Convolutional layer: 
        - Creates feature maps with filters (kernels).
        - Padding: after a filter is applied, the dimensions of the original image decrease. To preserve as much information about the original image as possible, we can apply padding, which increases the dimensions of the feature map produced by the convolutional layer. The model below uses padding = 0, so each convolution shrinks the feature map.
        - We use 2 convolutional layers.
        - The number of feature maps is out_channels = 16 in the first layer and 32 in the second.
        - Filter (kernel) size is 5*5
    1. Pooling layer: 
        - Prepares a condensed feature map from the output of the convolutional layer (feature map).
        - We use 2 pooling layers, both with max pooling.
        - Pooling size is 2*2
    1. Flattening: flattens the feature maps into a vector.
    1. Fully Connected Layer: 
        - This is the artificial neural network that we learned about in the previous part.
        - It can also be purely linear, like logistic regression, but at the end there is always a softmax function.
        - We will not use an activation function in the fully connected layer.
        - You can think of our fully connected layer as logistic regression.
        - We combine the convolutional part and logistic regression to create our CNN model.
    1. Instantiate Model Class
        - create model
    1. Instantiate Loss
        - Cross entropy loss
        - It also has softmax (logistic function) in it.
    1. Instantiate Optimizer
        - SGD Optimizer
    1. Training the Model
    1. Prediction        
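
The layer sizes in the steps above determine the input size of the fully connected layer. The following sketch traces a 28x28 image through two 5x5 convolutions (stride 1, no padding) and two 2x2 max-pooling layers using the standard output-size formula; `conv_out` and `pool_out` are helper names assumed for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output width/height of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1

size = 28                          # input image is 28x28
size = conv_out(size, kernel=5)    # convolution 1 -> 24x24
size = pool_out(size, kernel=2)    # max pool 1    -> 12x12
size = conv_out(size, kernel=5)    # convolution 2 -> 8x8
size = pool_out(size, kernel=2)    # max pool 2    -> 4x4

out_channels = 32                  # feature maps after convolution 2
flat_features = out_channels * size * size
print(size, flat_features)         # 4 512

# With padding=2, a 5x5 convolution keeps the 28x28 size unchanged
same_size = conv_out(28, kernel=5, padding=2)
print(same_size)                   # 28
```

This is where the `32 * 4 * 4` input size of the fully connected layer comes from. Changing the kernel size, the padding or the number of layers changes that number as well.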

In [None]:
# Create CNN Model
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
     
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1
        self.fc1 = nn.Linear(32 * 4 * 4, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2 
        out = self.maxpool2(out)
        
        # flatten
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)
        
        return out

# batch_size, epoch and iteration
batch_size = 100
n_iters = 2500
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)

# Pytorch train and test sets
train = torch.utils.data.TensorDataset(featuresTrain,targetsTrain)
test = torch.utils.data.TensorDataset(featuresTest,targetsTest)

# data loader
train_loader = torch.utils.data.DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(test, batch_size = batch_size, shuffle = False)
    
# Create CNN
model = CNNModel()

# Cross Entropy Loss 
error = nn.CrossEntropyLoss()

# SGD Optimizer
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
# CNN model training
count = 0
loss_list = []
iteration_list = []
accuracy_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        # Use -1 so that a final batch smaller than batch_size is also handled
        train = Variable(images.view(-1,1,28,28))
        labels = Variable(labels)
        
        # Clear gradients
        optimizer.zero_grad()
        
        # Forward propagation
        outputs = model(train)
        
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        
        # Calculating gradients
        loss.backward()
        
        # Update parameters
        optimizer.step()
        
        count += 1
        
        if count % 50 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                
                # Use -1 so that a final batch smaller than batch_size is also handled
                test = Variable(images.view(-1,1,28,28))
                
                # Forward propagation
                outputs = model(test)
                
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                
                # Total number of labels
                total += len(labels)
                
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / float(total)
            
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
            accuracy_list.append(accuracy)
        if count % 500 == 0:
            # Print Loss
            print('Iteration: {}  Loss: {}  Accuracy: {} %'.format(count, loss.data, accuracy))

In [None]:
# visualization loss 
plt.plot(iteration_list,loss_list)
plt.xlabel("Number of iteration")
plt.ylabel("Loss")
plt.title("CNN: Loss vs Number of iteration")
plt.show()

# visualization accuracy 
plt.plot(iteration_list,accuracy_list,color = "red")
plt.xlabel("Number of iteration")
plt.ylabel("Accuracy")
plt.title("CNN: Accuracy vs Number of iteration")
plt.show()

- As a result, as you can see from the plots, while the loss decreases, the accuracy increases, so our model is learning.
- Thanks to the convolutional layers, the model learned better, and its accuracy (almost 98%) is higher than that of the ANN. Tuning hyperparameters, increasing the number of iterations and expanding the convolutional neural network can increase accuracy further, but that takes more running time than we want to spend on Kaggle.
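
To see how the number of iterations relates to epochs and running time, the sketch below repeats the arithmetic used in the setup cells. The training set size of 33600 is a hypothetical value chosen for illustration; in the notebook it is `len(features_train)`.

```python
# Hypothetical number of training samples (assumption for this sketch)
n_train = 33600
batch_size = 100
n_iters = 2500

# One epoch is one full pass over the training data
iters_per_epoch = n_train // batch_size

# Same formula as in the notebook: fractional epochs are truncated
num_epochs = int(n_iters / (n_train / batch_size))

# Iterations that actually run once the epochs are truncated
actual_iters = num_epochs * iters_per_epoch

print(iters_per_epoch, num_epochs, actual_iters)
```

Because the epoch count is truncated to an integer, the training loop runs somewhat fewer iterations than `n_iters`. Raising `n_iters` increases the number of epochs and the running time proportionally.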

<a id="1"></a> <br>
### Recurrent Neural Network (RNN)
- An RNN is essentially an ANN applied repeatedly, with information passed on from the output of the previous step's non-linear activation function.
- The idea of recurrence is to use information sequentially. A traditional neural network processes information differently: it has no notion of sequence and treats inputs and outputs as independent of one another. This does not work well in every case. For example, when a value has to be determined on the basis of the previous value, a traditional network cannot cope with the task, but a recurrent one can. Why? Because a recurrent neural network has memory: it stores information about the previous value, and the next value is determined from it.

A recurrent network processes a sequence of values step by step: at each step it takes the current input together with the state carried over from the previous step, and it applies the same rule at every step.

![pasted image 0.png](attachment:a61f999b-4b54-4704-97a9-b24c1946a3aa.png)

- **Steps of RNN:**

    1. Create RNN Model
        - hidden layer dimension is 100
        - number of hidden layer is 1 
    2. Instantiate Model
    3. Instantiate Loss
        - Cross entropy loss
        - It also has softmax(logistic function) in it.
    4. Instantiate Optimizer
        - SGD Optimizer
    5. Training the Model
    6. Prediction
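
The "memory" described above can be sketched with a single recurrent unit in plain Python. This is a deliberately simplified scalar version with made-up weights (`w_x`, `w_h`, `b` are assumptions for illustration); `nn.RNN` applies the same kind of update with weight matrices and a hidden state vector.

```python
def relu(x):
    return max(0.0, x)

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent update: combine the current input with the previous state."""
    return relu(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.9, b=0.0, h0=0.0):
    """Feed a sequence through the unit and return the hidden state after each step."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# The same values in a different order lead to different final states,
# because each state depends on everything that came before it.
early_signal = run_rnn([1.0, 0.0, 0.0])
late_signal = run_rnn([0.0, 0.0, 1.0])

print(early_signal)
print(late_signal)
```

In the model below, each 28x28 image is treated as a sequence of 28 rows with 28 pixels each, and only the hidden state after the last row is passed to the readout layer.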

In [None]:
from torch.utils.data import DataLoader, TensorDataset

# Create RNN Model
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        
        # Number of hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # RNN
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        
        # Initialize hidden state with zeros
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))
            
        # Run the RNN over the whole sequence,
        # then feed the output of the last time step to the readout layer
        out, hn = self.rnn(x, h0)
        out = self.fc(out[:, -1, :]) 
        return out

# batch_size, epoch and iteration
batch_size = 100
n_iters = 8000
num_epochs = n_iters / (len(features_train) / batch_size)
num_epochs = int(num_epochs)

# Pytorch train and test sets
train = TensorDataset(featuresTrain,targetsTrain)
test = TensorDataset(featuresTest,targetsTest)

# data loader
train_loader = DataLoader(train, batch_size = batch_size, shuffle = False)
test_loader = DataLoader(test, batch_size = batch_size, shuffle = False)
    
# Create RNN
input_dim = 28    # input dimension
hidden_dim = 100  # hidden layer dimension
layer_dim = 1     # number of hidden layers
output_dim = 10   # output dimension

model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

# Cross Entropy Loss 
error = nn.CrossEntropyLoss()

# SGD Optimizer
learning_rate = 0.05
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
seq_dim = 28  
loss_list = []
iteration_list = []
accuracy_list = []
count = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        train  = Variable(images.view(-1, seq_dim, input_dim))
        labels = Variable(labels )
            
        # Clear gradients
        optimizer.zero_grad()
        
        # Forward propagation
        outputs = model(train)
        
        # Calculate softmax and cross entropy loss
        loss = error(outputs, labels)
        
        # Calculating gradients
        loss.backward()
        
        # Update parameters
        optimizer.step()
        
        count += 1
        
        if count % 250 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                images = Variable(images.view(-1, seq_dim, input_dim))
                
                # Forward propagation
                outputs = model(images)
                
                # Get predictions from the maximum value
                predicted = torch.max(outputs.data, 1)[1]
                
                # Total number of labels
                total += labels.size(0)
                
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / float(total)
            
            # store loss and iteration
            loss_list.append(loss.data)
            iteration_list.append(count)
            accuracy_list.append(accuracy)

        if count % 500 == 0:
            # Print Loss
            print('Iteration: {}  Loss: {}  Accuracy: {} %'.format(count, loss.data, accuracy))

In [None]:
# visualization loss 
plt.plot(iteration_list,loss_list)
plt.xlabel("Number of iteration")
plt.ylabel("Loss")
plt.title("RNN: Loss vs Number of iteration")
plt.show()

# visualization accuracy 
plt.plot(iteration_list,accuracy_list,color = "red")
plt.xlabel("Number of iteration")
plt.ylabel("Accuracy")
plt.title("RNN: Accuracy vs Number of iteration")
plt.savefig('graph.png')
plt.show()

### Conclusion
In this tutorial, we learned: 
1. Basics of PyTorch
2. Logistic regression with PyTorch
3. Artificial neural network with PyTorch
4. Convolutional neural network with PyTorch
5. Recurrent neural network with PyTorch