In this notebook, we're going to visualize, step by step, how a neural network learns. Along the way, we're also going to see why neural networks are so good at separating non-linear data: they combine non-linear activation functions with multiple layers of neurons.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import sklearn.datasets

The ***torch.nn.functional*** package gives us the ReLU activation function, which provides the non-linearity.

***sklearn.datasets*** is used to generate a ready-made dataset.

In [None]:
# Now generating the dataset
x, y = sklearn.datasets.make_moons(200, noise=0.20)

***sklearn.datasets.make_moons(200, noise=0.20)*** uses the **make_moons** function from the scikit-learn library to generate a synthetic dataset with two classes (binary classification) shaped like two interleaving crescent moons. A synthetic dataset is one that is artificially generated rather than collected from real-world observations or measurements.

**200** is the number of samples or data points to generate.

**noise=0.20** specifies the amount of random noise to be added to the data.

**x** = input = feature matrix;

**y** = expected output = target labels or classes associated with each data point. In this binary classification scenario, y will contain labels that indicate which class (0 or 1) each data point belongs to.
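As a quick check of the description above, the following sketch inspects the shapes and the class balance of the generated data. The names `x_demo` and `y_demo` and the `random_state=0` argument are assumptions added here so the sketch is reproducible; the notebook itself does not fix a seed.

```python
import numpy as np
import sklearn.datasets

# random_state=0 is an assumption for reproducibility; the notebook does not set it.
x_demo, y_demo = sklearn.datasets.make_moons(200, noise=0.20, random_state=0)

print(x_demo.shape)         # (200, 2): 200 points, 2 features each
print(y_demo.shape)         # (200,): one label per point
print(np.unique(y_demo))    # [0 1]: the two class labels
print(np.bincount(y_demo))  # [100 100]: the samples are split evenly between the classes
```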

In [None]:
x

**x** has two columns, each representing a different feature. These features represent the coordinates (x and y positions) of each data point on a two-dimensional plane.

In [None]:
y

Since this is a binary classification problem, **y** is an array of 0s and 1s. Each value indicates whether the corresponding point on the plane belongs to class 0 or class 1.

In [None]:
# Let's plot x
plt.scatter(x[:,0], x[:,1], s=40, c=y, cmap=plt.cm.Spectral)
plt.show()

As we can see, the 2 classes can't be separated by a straight line. Data like this is called non-linearly separable. That's why we need a more flexible model than a linear classifier such as logistic regression.

**x[:,0]** and **x[:,1]** are used for the x and y coordinates of the data points in the plot.

**s=40** sets the size of the data points to be relatively large (size 40).

**c=y** assigns colors to the data points based on the values in the y variable. Each class (0 or 1) is given a different color.

**cmap=plt.cm.Spectral** specifies the color scheme for the plot, with distinct colors for each class.

The resulting plot visually displays data points on a graph, where each point's position is determined by its features (x and y values), and the colors represent which class (0 or 1) each data point belongs to. 
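To get a feel for why a straight line is not enough here, one option is to fit a linear classifier as a baseline. The sketch below uses scikit-learn's `LogisticRegression`, which is not part of the original notebook; `x_demo`, `y_demo`, `linear_model`, and `random_state=0` are assumptions made for illustration.

```python
import sklearn.datasets
from sklearn.linear_model import LogisticRegression

# random_state=0 is an assumption for reproducibility; the notebook does not set it.
x_demo, y_demo = sklearn.datasets.make_moons(200, noise=0.20, random_state=0)

# Logistic regression can only draw a straight decision boundary in this 2D plane.
linear_model = LogisticRegression()
linear_model.fit(x_demo, y_demo)

# Fraction of training points that fall on the correct side of the line.
linear_accuracy = linear_model.score(x_demo, y_demo)
print(f"Training accuracy of a linear boundary: {linear_accuracy:.2f}")
```

The exact number depends on the noise and the seed, but the points that the line gets wrong tend to sit where the two moons curl into each other.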

In [None]:
# Convert the dataset into torch tensor.
x = torch.FloatTensor(x)
y = torch.LongTensor(y)

**FloatTensor**: Use this when your data contains decimal numbers or real values. It's suitable for tasks involving measurements, sensor readings, or any data with decimal points.

**LongTensor**: Use this when your data contains whole numbers, typically for tasks like classification where you have discrete class labels represented as integers.
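The same conversion can also be written with `torch.tensor` and an explicit `dtype`, which makes the resulting types easy to see. This is a small sketch on made-up values; `features`, `labels`, `x_t`, and `y_t` are illustrative names, not part of the notebook.

```python
import numpy as np
import torch

features = np.array([[0.5, -1.2], [1.5, 0.3]])  # NumPy stores these as float64
labels = np.array([0, 1])                       # integer class labels

x_t = torch.tensor(features, dtype=torch.float32)  # same dtype a FloatTensor holds
y_t = torch.tensor(labels, dtype=torch.long)       # same dtype a LongTensor holds

print(x_t.dtype)  # torch.float32
print(y_t.dtype)  # torch.int64
```

`CrossEntropyLoss` expects class-index targets in this integer (`long`) form, which is why the labels are not converted to floats.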

In [None]:
x

In [None]:
y

Now we're going to build and structure our neural network (an input layer, 1 hidden layer, and an output layer). To do that, we're going to write a class that defines the layers of the network and implements the forward propagation function.
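Written as a formula, a network with this structure computes the following for one input point. The symbols $W_1, b_1$ (hidden layer weights and biases) and $W_2, b_2$ (output layer weights and biases) are notation assumed here for illustration, with the input $x$ treated as a column vector:

$$
h = \mathrm{ReLU}(W_1 x + b_1), \qquad \mathrm{out} = W_2 h + b_2
$$

Without the ReLU in the middle, the two layers would collapse into a single linear map, $W_2 (W_1 x + b_1) + b_2$, and the network could only draw a straight boundary.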

In [None]:
class FeedForward(torch.nn.Module):
    def __init__(self, input_neurons, hidden_neurons, output_neurons):  # one size per layer
        super().__init__()
        # Fully connected layers: input -> hidden and hidden -> output.
        self.hidden = nn.Linear(input_neurons, hidden_neurons)  # hidden layer
        self.output = nn.Linear(hidden_neurons, output_neurons)  # output layer

    def forward(self, x):
        # nn.Module runs forward() whenever the network object is called,
        # so we override it to define how data flows through our NN.
        out = self.hidden(x)
        out = F.relu(out)  # the non-linearity
        out = self.output(out)  # raw class scores (logits); no softmax here
        return out

        

We've defined the class for our neural network; now let's create an object of that network.

In [None]:
network = FeedForward(input_neurons = 2, hidden_neurons = 50, output_neurons = 2)
optimizer = torch.optim.SGD(network.parameters(), lr = 0.02)
loss_function = torch.nn.CrossEntropyLoss()

Since **x** has only 2 columns, the network has 2 input features. For this NN we use 50 neurons in the hidden layer, and since this is a binary classification problem, there are 2 output neurons, one per class.
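To make these sizes concrete, the sketch below builds a stand-in network with the same layer sizes using `nn.Sequential` (an assumption made so the example is self-contained; it is not the `FeedForward` class itself) and counts its trainable parameters.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes: 2 inputs -> 50 hidden -> 2 outputs.
sketch_network = nn.Sequential(
    nn.Linear(2, 50),
    nn.ReLU(),
    nn.Linear(50, 2),
)

# Hidden layer: 2*50 weights + 50 biases. Output layer: 50*2 weights + 2 biases.
n_params = sum(p.numel() for p in sketch_network.parameters())
print(n_params)  # 252

# A batch of 5 points with 2 features each gives 5 rows of 2 class scores.
batch = torch.randn(5, 2)
scores = sketch_network(batch)
print(scores.shape)  # torch.Size([5, 2])
```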

To use BCELoss, we would need a single output neuron followed by a sigmoid, and BCELoss doesn't use the softmax function. Here we have 2 output neurons, so we use CrossEntropyLoss, which does. In PyTorch we don't have to add the softmax ourselves: CrossEntropyLoss applies it (as a log-softmax) internally, which is why the forward function returns the raw output scores.
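The following sketch checks this on a few made-up scores: `CrossEntropyLoss` on raw outputs gives the same value as applying `log_softmax` by hand and then the negative log-likelihood loss. The tensors `logits` and `targets` are illustrative values, not data from the notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)             # raw scores for 4 samples and 2 classes
targets = torch.tensor([0, 1, 1, 0])   # class index for each sample

# Built-in loss applied directly to the raw scores.
builtin_loss = torch.nn.CrossEntropyLoss()(logits, targets)

# The same computation done in two explicit steps.
manual_loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(builtin_loss, manual_loss))  # True
```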

Now we're going to train our network. And while we're training it, we're going to visualize the process.

In [None]:
plt.ion() #interactive mode on to plot the points on the same plot
for epoch in range(10000):
    #forward propagating x by calling the object that we created
    out = network(x)
    
    #calculating the loss
    loss = loss_function(out, y) #predicted=out,actual=y
    
    #backpropagation
    #clearing gradient buffer
    optimizer.zero_grad()
    
    #calculate the gradients
    loss.backward()
    
    #update weights
    optimizer.step()
    
    
    #visualization
    #we want to plot how much our NN has progressed every 1000 epochs

    if epoch % 1000 == 0:
        # show the learning process up until now
        # The predicted class is the index of the largest output score.
        # We don't need to apply softmax ourselves: it doesn't change which score is the largest,
        # and CrossEntropyLoss already applies it internally when computing the loss.
        max_value, prediction = torch.max(out, 1) #dim=1 runs across the class scores of each sample, prediction=index of that maximum value
        predicted_y = prediction.detach().numpy() #converting torch tensor to numpy array as matplotlib expects a numpy array
        target_y = y.numpy() #converting the actual y values to numpy array
        plt.cla() #clear the previous snapshot so the points and the accuracy text don't pile up
        plt.scatter(x.numpy()[:, 0], x.numpy()[:, 1], s=40, c=predicted_y, lw=0) #lw=line width

        #calculating the accuracy
        accuracy = (predicted_y == target_y).sum() / target_y.size #fraction of correct predictions, between 0 and 1
        plt.text(3, -1, 'Accuracy = {:.2f}'.format(accuracy), fontdict={'size': 14})
        plt.pause(0.1)


plt.ioff()
plt.show()
        
        

So as we can see, the neural network was able to classify the two classes very accurately even though they are not linearly separable: they can't be separated by a line. They need to be separated by a curve.
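One way to see that curve directly is to ask a trained network for its prediction at every point of a grid and color the plane accordingly. The sketch below is self-contained and makes several assumptions that differ from the notebook: it uses an `nn.Sequential` stand-in with the same layer sizes, the Adam optimizer with 2000 steps (to keep the run short), and fixed seeds. All names in it are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import sklearn.datasets

# Seeds are assumptions for reproducibility; the notebook does not set them.
torch.manual_seed(0)
x_np, y_np = sklearn.datasets.make_moons(200, noise=0.20, random_state=0)
x_t = torch.tensor(x_np, dtype=torch.float32)
y_t = torch.tensor(y_np, dtype=torch.long)

# Stand-in network with the same layer sizes as in the notebook.
sketch_network = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 2))
# Adam is an assumption made to shorten training; the notebook uses SGD.
sketch_optimizer = torch.optim.Adam(sketch_network.parameters(), lr=0.01)
loss_function = nn.CrossEntropyLoss()
for _ in range(2000):
    sketch_optimizer.zero_grad()
    loss_function(sketch_network(x_t), y_t).backward()
    sketch_optimizer.step()

# Build a 200 x 200 grid that covers the data with a small margin.
xx, yy = np.meshgrid(
    np.linspace(x_np[:, 0].min() - 0.5, x_np[:, 0].max() + 0.5, 200),
    np.linspace(x_np[:, 1].min() - 0.5, x_np[:, 1].max() + 0.5, 200),
)
grid = torch.tensor(np.column_stack([xx.ravel(), yy.ravel()]), dtype=torch.float32)

# Predict a class for every grid point and for the training points.
with torch.no_grad():
    grid_labels = sketch_network(grid).argmax(dim=1).numpy().reshape(xx.shape)
    train_accuracy = (sketch_network(x_t).argmax(dim=1) == y_t).float().mean().item()

# The border between the two shaded regions is the learned decision boundary.
plt.contourf(xx, yy, grid_labels, alpha=0.3, cmap=plt.cm.Spectral)
plt.scatter(x_np[:, 0], x_np[:, 1], s=40, c=y_np, cmap=plt.cm.Spectral)
plt.title(f"Training accuracy = {train_accuracy:.2f}")
plt.show()
```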

So that is how good neural networks are at separating, or classifying, non-linear data.