There are really two decisions that must be made regarding the hidden layers: how many hidden layers to actually have in the neural network and how many neurons will be in each of these layers. We will first examine how to determine the number of hidden layers to use with the neural network.

### The Number of Hidden Layers
Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer. Table 5.1 summarizes the capabilities of neural network architectures with various hidden layers.

| Number of Hidden Layers | Result |
|---|---|
| 0 | Only capable of representing linearly separable functions or decisions. |
| 1 | Can approximate any function that contains a continuous mapping from one finite space to another. |
| 2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy. |
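The first row of the table can be checked by hand on two small truth tables. The sketch below is only an illustration: the helper name `find_threshold_unit` and the coarse weight grid are assumptions, not part of the original text. It searches for a single threshold unit (a network with no hidden layer) that reproduces AND, then tries the same for XOR, which is not linearly separable.

```python
from itertools import product

def find_threshold_unit(truth_table, grid):
    """Search the grid for weights (w1, w2, b) such that the unit
    'output 1 if w1*x1 + w2*x2 + b > 0' reproduces the truth table.
    Returns the first matching triple, or None if none is found."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for x1, x2, y in truth_table):
            return (w1, w2, b)
    return None

# Assumption: a coarse grid from -2.0 to 2.0 in steps of 0.5
grid = [i * 0.5 for i in range(-4, 5)]

AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

and_solvable = find_threshold_unit(AND, grid) is not None
xor_solvable = find_threshold_unit(XOR, grid) is not None
print("AND representable without a hidden layer:", and_solvable)
print("XOR representable without a hidden layer:", xor_solvable)
```

The search finds weights for AND but none for XOR. For XOR this is not a limitation of the grid: no choice of weights works, which is why at least one hidden layer is needed there.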

Deciding the number of hidden neuron layers is only a small part of the problem. You must also determine how many neurons will be in each of these hidden layers. This process is covered in the next section.

### The Number of Neurons in the Hidden Layers

Deciding the number of neurons in the hidden layers is a very important part of deciding your overall neural network architecture. Though these layers do not directly interact with the external environment, they have a tremendous influence on the final output. Both the number of hidden layers and the number of neurons in each of these hidden layers must be carefully considered.

Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.

Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when the training data is sufficient. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to adequately train the neural network. Obviously, some compromise must be reached between too many and too few neurons in the hidden layers.
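One rough way to make "information processing capacity" concrete is to count trainable parameters. The sketch below assumes a fully connected network with a single hidden layer and one bias per neuron. The function name `count_parameters` and the choice of 2 inputs and 2 outputs are illustrative assumptions.

```python
def count_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected
    n_in -> n_hidden -> n_out network."""
    hidden_layer = n_in * n_hidden + n_hidden   # weights + biases
    output_layer = n_hidden * n_out + n_out     # weights + biases
    return hidden_layer + output_layer

# Parameter count grows linearly with the hidden layer size
for n_hidden in (1, 3, 10, 100):
    print(n_hidden, "hidden neurons ->",
          count_parameters(2, n_hidden, 2), "parameters")
```

Comparing this count with the number of training examples gives a quick sense of how much freedom the network has relative to the data it is asked to fit.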

There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.

These three rules provide a starting point for you to consider. Ultimately, the selection of an architecture for your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random numbers of layers and neurons at your network. To do so would be very time consuming. Chapter 8, “Pruning a Neural Network” will explore various ways to determine an optimal structure for a neural network.
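The three rules of thumb are simple arithmetic, so they can be evaluated side by side for a given layout. This is a minimal sketch: the function name `hidden_neuron_rules`, the dictionary keys, and the example sizes (10 inputs, 2 outputs) are assumptions chosen for illustration.

```python
def hidden_neuron_rules(n_in, n_out):
    """Evaluate the three rules of thumb for the given
    input and output layer sizes."""
    return {
        # Rule 1: somewhere between the output and input sizes
        "between_input_and_output": (min(n_in, n_out), max(n_in, n_out)),
        # Rule 2: 2/3 of the input size, plus the output size
        "two_thirds_input_plus_output": 2 * n_in / 3 + n_out,
        # Rule 3: strictly less than twice the input size
        "less_than": 2 * n_in,
    }

rules = hidden_neuron_rules(10, 2)
for name, value in rules.items():
    print(name, value)
```

The rules only narrow the search range. The values they produce are starting points to try, not a final answer.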

https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

https://arxiv.org/pdf/1707.09725.pdf#page=11

### About Activation Functions in Neural Networks

https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/

In [2]:
import math
def sigmoid(x):
  return 1 / (1 + math.exp(-x))

print(sigmoid(0.6))

0.6456563062257954
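For comparison, the sketch below puts the sigmoid next to two other common activation functions, tanh and ReLU, using only the standard library. The helper names `tanh` and `relu` and the sample inputs are illustrative assumptions. Sigmoid maps its input into (0, 1), tanh into (-1, 1), and ReLU sets negative inputs to zero while passing positive inputs through unchanged.

```python
import math

def sigmoid(x):
    # Squashes x into the open interval (0, 1)
    return 1 / (1 + math.exp(-x))

def tanh(x):
    # Squashes x into the open interval (-1, 1)
    return math.tanh(x)

def relu(x):
    # Zero for negative inputs, identity for positive inputs
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x))
```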


In [5]:
# Create a dataset for the AND gate.
# Each row is [input 1, input 2, output].

import numpy as np
gate = np.array([[0,0,0],[0,1,0],[1,0,0],[1,1,1]])

In [10]:
gate[:,-1]

array([0, 0, 0, 1])

In [11]:
# Split the table into inputs (first two columns) and target (last column)
train_x = gate[:,0:2]
train_y = gate[:,-1]
train_x

array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]])

In [14]:
# Define the neural network architecture
import torch
import torch.nn as nn
import torch.nn.functional as F

# Our class must extend nn.Module
class MyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # The network has 3 layers: 1 input, 1 hidden and 1 output layer.
        # Each nn.Linear applies a linear transformation to its input.
        self.layer1 = nn.Linear(2, 3)  # 2 inputs -> 3 hidden neurons
        self.layer2 = nn.Linear(3, 2)  # 3 hidden neurons -> 2 class scores

    # forward() must be implemented by every nn.Module subclass
    def forward(self, x):
        # Output of the first layer
        x = self.layer1(x)
        # Activation function is tanh. Feel free to experiment with this
        x = torch.tanh(x)
        # This produces the output (one raw score per class)
        x = self.layer2(x)
        return x

    # This function takes a batch of inputs and predicts the class (0 or 1)
    def predict(self, x):
        # Apply sigmoid to the output scores
        prediction = torch.sigmoid(self.forward(x))  # or F.softmax(self.forward(x), dim=1)
        ans = []
        # Pick the class with the larger score
        for t in prediction:
            if t[0] > t[1]:
                ans.append(0)
            else:
                ans.append(1)
        # Return only after every row has been classified
        return torch.tensor(ans)

In [15]:
import torch
#Initialize the model        
model = MyClassifier()
#Define loss criterion
criterion = nn.CrossEntropyLoss()
#Define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

In [None]:
# Convert the NumPy arrays into PyTorch tensors
train_x = torch.from_numpy(train_x).float()
train_y = torch.from_numpy(train_y).long()
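The cells above define the model, the loss criterion, and the optimizer, but stop before any training takes place. A training loop might look like the sketch below. It is not the author's code. To stay self-contained it uses an `nn.Sequential` stand-in with the same 2-3-2 layout as `MyClassifier`, and the fixed seed and the epoch count are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so that runs are repeatable

# AND-gate data: two inputs per row, one class label per row
train_x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
train_y = torch.tensor([0, 0, 0, 1])

# Stand-in with the same layout as MyClassifier: 2 -> 3 (tanh) -> 2
model = nn.Sequential(nn.Linear(2, 3), nn.Tanh(), nn.Linear(3, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epochs = 1000  # assumption: chosen for illustration only
losses = []
for epoch in range(epochs):
    optimizer.zero_grad()               # clear gradients from the last step
    logits = model(train_x)             # forward pass: raw class scores
    loss = criterion(logits, train_y)   # cross-entropy against the labels
    loss.backward()                     # backpropagate
    optimizer.step()                    # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

`nn.CrossEntropyLoss` expects raw scores and integer class labels, which is why the targets are converted to a long tensor and no softmax is applied inside `forward`.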

In [7]:
# Min-max normalization: rescale the values to the range [0, 1]
def normalization(lis):
    lo, hi = min(lis), max(lis)
    return [(i - lo) / (hi - lo) for i in lis]
normalization([1,2,34,67,290,23,56,78,60,100,7,8,9,10])

[0.0,
 0.0034602076124567475,
 0.11418685121107267,
 0.22837370242214533,
 1.0,
 0.07612456747404844,
 0.1903114186851211,
 0.2664359861591695,
 0.2041522491349481,
 0.34256055363321797,
 0.020761245674740483,
 0.02422145328719723,
 0.02768166089965398,
 0.031141868512110725]
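The min-max formula divides by `max - min`, which is zero when every value in the list is the same. The variant below is a sketch that guards against that case. The name `min_max_safe` and the choice to return all zeros for a constant list are assumptions, not part of the original notebook.

```python
def min_max_safe(values):
    """Min-max normalization that tolerates constant input."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: map a constant list to all zeros
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

print(min_max_safe([1, 2, 3]))
print(min_max_safe([5, 5, 5]))
```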
