- ANN (Artificial Neural Networks)
    - object recognition in images
    - automatic speech recognition (ASR)
    - machine translation
    - image captioning
    - video classification 

Topics
- Structure of ANNs, inspired by the human brain
- Perceptron: A simple idea as the basis for larger neural networks
- Workings of artificial neurons
- Structure and topology of neural networks
- Hyperparameters and simplifying assumptions

ANN
- In simple terms, a biological neuron works as follows: it receives signals through its dendrites, and these signals are either amplified or inhibited as they pass along the axon to the dendrites of other neurons.
- By analogy, the dendrites carry the input data and the amplification/inhibition acts as **held** information: input + held data => output

- ANNs are collections of many small devices (artificial neurons). These networks of devices are trained for particular tasks: the neurons are trained to *fire* in a certain way when given a particular input, and the network learns to inhibit or amplify the input signals to perform the task.

In [1]:
import numpy as np
w = np.array([3,1,5,7,4])
x = np.array([1,1,0,1,0])

In [5]:
w.reshape(len(w), 1)

array([[3],
       [1],
       [5],
       [7],
       [4]])

In [9]:
ci = (w.transpose() @ x) -2
ci

np.int64(9)

Neuron

- in a perceptron, the input is a weighted sum and the function applied to it is a step function; in a neuron, the function is not as simple as a step function, it is a non-linear function
- these functions are called activation functions
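
To make the contrast concrete, here is a minimal sketch that applies a step function (perceptron) and a sigmoid (one possible non-linear activation) to the same weighted sum. The weights, inputs, and offset of -2 reuse the values from the cell above; the helper names `step` and `sigmoid` are illustrative and not part of the original notes.

```python
import numpy as np

def step(z):
    """Perceptron activation: 1 if the weighted sum is non-negative, else 0."""
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    """Smooth, non-linear activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([3, 1, 5, 7, 4])   # weights
x = np.array([1, 1, 0, 1, 0])   # inputs
b = -2                          # bias (offset)

z = w @ x + b                   # weighted sum: 3 + 1 + 7 - 2 = 9

perceptron_out = step(z)        # hard 0/1 decision
neuron_out = sigmoid(z)         # graded output between 0 and 1
print(z, perceptron_out, neuron_out)
```

The perceptron only reports which side of the threshold the sum falls on, while the sigmoid neuron also reports how far from the threshold it is.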

Specifications of Neural Network
- Structure/Topology
- Graph
    - nodes => neurons
    - edges => interconnections
- Specify the input layer
- Specify the output layer
- Specify the weights
- Specify the activation function
- Specify the bias

- the number of neurons in the input layer is equal to the number of attributes
- the number of neurons in the output layer is equal to the number of classes in case of classification, and 1 in case of regression

- many media types flow across the internet: text, audio, video, and images.
- for a neural network to work with these media types, they need to be converted to numbers
- text can be converted to numbers with the help of word embeddings: each word is mapped to a vector of numbers, and the values are chosen so that related words end up close to each other
- images are made of pixels, hence they are already numbers
- audio is converted to numbers using transforms such as Fourier coefficients
- categorical variables are converted to numbers by using one-hot encoding or dense embeddings
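
As an illustration of the last point, the following is a minimal sketch of one-hot encoding in NumPy. The `colors` column is a hypothetical example, not data from these notes, and sorting the categories is just one way to fix a stable column order.

```python
import numpy as np

# Hypothetical categorical column with three distinct values.
colors = ["red", "green", "blue", "green"]

# Map each category to an integer index (sorted for a stable order).
categories = sorted(set(colors))                 # ['blue', 'green', 'red']
index = {c: i for i, c in enumerate(categories)}

# Row i of the identity matrix is the one-hot vector for category i.
one_hot = np.eye(len(categories), dtype=int)[[index[c] for c in colors]]
print(one_hot)
```

Each row contains exactly one 1, so a column with k categories becomes k input attributes.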

- one of the most popular output functions for classification is the softmax function. softmax is the activation function for multiclass classification
- ![image.png](attachment:6273be19-6bb4-4dc0-b370-cc8289bec508.png)
- for binary classification, the activation function is the sigmoid function

In [10]:
x = np.array([2,1,1])
x = x.reshape(len(x), 1)
w0 = np.array([1,1,-1])
w0 = w0.reshape(len(w0), 1)
w1 = np.array([2, 0, -1])
w1 = w1.reshape(len(w1), 1)
w2 = np.array([1,2,2])
w2 = w2.reshape(len(w2), 1)

In [21]:
# Softmax probability of class 0: exp(w0.x) divided by the sum of exp(wk.x) over all classes
e0, e1, e2 = np.exp(w0.T @ x), np.exp(w1.T @ x), np.exp(w2.T @ x)
p0 = e0 / (e0 + e1 + e2)
p0

array([[0.01714783]])
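
The same calculation can be written for all classes at once. The sketch below is one possible helper (the name `softmax` and the stacked matrix `W` are assumptions for illustration); subtracting the maximum score before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(z):
    """Convert a vector of scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()        # shifting by the max avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

x = np.array([2, 1, 1])
W = np.array([[1, 1, -1],        # w0
              [2, 0, -1],        # w1
              [1, 2,  2]])       # w2

scores = W @ x                   # one score per class: [2, 3, 6]
p = softmax(scores)
print(p)                         # p[0] matches the hand-computed p0 above
```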

- In case of ANNs the activation functions are non-linear
- activation functions should be smooth (differentiable almost everywhere), not like step functions
- types of activation functions
    - sigmoid/logistic $f(y) = \frac{1}{1 + e ^ {-y}}$
    - tanh $f(y) = \frac{e ^ {y} - e ^ {-y}}{e ^ {y} + e ^ {-y}}$
    - ReLU (Rectified Linear Unit) $f(y) = \max(0, y)$
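
A minimal NumPy sketch of the three functions listed above, written directly from their formulas. The function names and the sample input `y` are illustrative choices.

```python
import numpy as np

def sigmoid(y):
    """Logistic function: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def tanh(y):
    """Hyperbolic tangent: output in (-1, 1)."""
    return (np.exp(y) - np.exp(-y)) / (np.exp(y) + np.exp(-y))

def relu(y):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0, y)

y = np.array([-2.0, 0.0, 2.0])
print("sigmoid:", sigmoid(y))
print("tanh:   ", tanh(y))
print("relu:   ", relu(y))
```

In practice NumPy's built-in `np.tanh` can be used instead of the hand-written version; the explicit formula is shown only to mirror the notes.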

In [22]:
w = np.array([2, -6, 3])
w = w.reshape(len(w), 1)
x = np.array([3, 2, 1])
x = x.reshape(len(x), 1)
w.transpose() @ x -1

array([[-4]])

In [30]:
round(1/(1 + np.pow(np.e , -0.63)), 2)

np.float64(0.65)

Training of a neural network
- Fix the hyperparameters and tune the weights and biases
- Hyperparameters
    - number of layers
    - number of neurons in the input, hidden and the output layers
    - learning rate (the step size taken each time we update the weights and biases of the ANN)
    - number of epochs (the number of times the training data set is passed through the neural network)
    - activation function
- **W** => weight matrix
- **b** => bias
- **x** => input
- **y** => ground truth label
- **p** => probability vector of the output for the classification problem
- **$h ^ L$** => predicted output of the regression problem where L is the number of layers
- **h** => output of the hidden layers with appropriate superscript. The output of the second neuron in the nth hidden layer is denoted by $h ^ n_2$
- **z** => accumulated input to a layer; for the 3rd neuron of the nth layer it is written $z ^ n_3$
- **$b ^ 3_1$** => bias of the first neuron of the third layer
- **$W^2$** => weight matrix from the first hidden layer to the second hidden layer
- **$W^2_{31}$** => weight into the 3rd neuron of the second hidden layer from the 1st neuron of the previous (first) hidden layer
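
With this notation, the computation in layer $l$ can be sketched as follows. This assumes column-vector inputs with $h^0 = x$ and writes $\sigma$ for the layer's activation function (a symbol introduced here for illustration):

$z^l_i = \sum_j W^l_{ij} h^{l-1}_j + b^l_i$

$z^l = W^l h^{l-1} + b^l$

$h^l = \sigma(z^l)$

The index order matches $W^2_{31}$ above: the first subscript is the receiving neuron, the second is the sending neuron in the previous layer.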


Assumptions of ANN

- The neurons in an ANN are arranged in layers, and these layers are arranged sequentially.
- The neurons within the same layer do not interact with each other.
- The inputs are fed into the network through the input layer, and the outputs are sent out from the output layer.
- Neurons in consecutive layers are densely connected, i.e., all neurons in layer l are connected to all neurons in layer l+1.
- Every neuron in the neural network has a bias value associated with it, and each interconnection has a weight associated with it.
- All neurons in a particular hidden layer use the same activation function. Different hidden layers can use different activation functions, but in a hidden layer, all neurons use the same activation function.

Feed Forward ANN
![image.png](attachment:97e72835-1464-4c1b-be01-e934a7c1dfae.png)
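
Under the assumptions listed above (sequential layers, dense connections, one activation per layer), a forward pass can be sketched in a few lines of NumPy. The 3-2-1 layer sizes, the random weights, and the helper name `forward` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, activation=sigmoid):
    """Propagate a column vector x through densely connected layers in order."""
    h = x
    for W, b in zip(weights, biases):
        z = W @ h + b        # accumulated input to the layer
        h = activation(z)    # same activation for every neuron in the layer
    return h

# Hypothetical 3-2-1 network: 3 inputs, one hidden layer of 2 neurons, 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(1, 2))]
biases = [np.zeros((2, 1)), np.zeros((1, 1))]

x = np.array([[0.2], [1.0], [0.5]])
output = forward(x, weights, biases)
print(output.shape)   # (1, 1)
```

Each weight matrix has one row per neuron in the receiving layer and one column per neuron in the sending layer, which is what makes `W @ h` well defined.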

In [32]:
w = np.array([[3,4],
              [1,9],
              [6,2]])
h2 = np.array([[1],[2]])

In [39]:
p = np.pow(np.e, w@h2)

In [43]:
p_normalized = p/p.sum()
p_normalized

array([[3.35308764e-04],
       [9.99541338e-01],
       [1.23353201e-04]])

In [50]:
np.log10(0.2)

np.float64(-0.6989700043360187)

$L(w_1, w_2) = w_1^2 + w_2^2$

$\frac{\partial L}{\partial w_1} = 2w_1$

$\frac{\partial L}{\partial w_2} = 2w_2$
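
These gradients are what a weight update uses: each weight moves a small step against its gradient, with the learning rate as the step size. The following is a minimal sketch of that idea on this loss; the starting point, learning rate, and number of steps are hypothetical values.

```python
def grad(w1, w2):
    """Gradient of L(w1, w2) = w1**2 + w2**2."""
    return 2 * w1, 2 * w2

w1, w2 = 5.0, -3.0        # hypothetical starting point
learning_rate = 0.1       # hypothetical step size

for _ in range(100):
    g1, g2 = grad(w1, w2)
    w1 -= learning_rate * g1   # step against the gradient
    w2 -= learning_rate * g2

loss = w1**2 + w2**2
print(w1, w2, loss)       # all three approach 0, the minimum of L
```

With this learning rate every update multiplies each weight by 0.8, so the weights shrink steadily toward the minimum at (0, 0).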

In [53]:
np.tan(90)  # the argument is interpreted as radians, not degrees

np.float64(-1.995200412208242)

$\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial h_2} \cdot \frac{\partial h_2}{\partial z_2} \cdot \frac{\partial z_2}{\partial w_2}$

$L = \frac{1}{2}(y - h_2)^2$

$\frac{\partial L}{\partial h_2} = \frac{\partial}{\partial h_2}(\frac{1}{2}(y^2 + h_2^2 -2yh_2))$

$\frac{\partial L}{\partial h_2} = \frac{1}{2}\frac{\partial}{\partial h_2}(y^2 + h_2^2 -2yh_2)$

$\frac{\partial L}{\partial h_2} = \frac{1}{2}(0 + 2h_2 - 2y)$

$\frac{\partial L}{\partial h_2} = -(y - h_2)$

For a sigmoid activation, $h_2 = \frac{1}{1 + e^{-z_2}}$, so $\frac{\partial h_2}{\partial z_2} = h_2(1 - h_2)$

$z_2 = w_2h_1 + b_2$

$\frac{\partial z_2}{\partial w_2} = h_1$

$\frac{\partial L}{\partial w_2} = -(y - h_2) \cdot h_2(1 - h_2) \cdot h_1$
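
One way to sanity-check the final expression is to compare it with a numerical derivative. The sketch below assumes scalar quantities and a sigmoid activation; the values of `w2`, `h1`, `b2`, and `y` are hypothetical and chosen only for the check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w2, h1, b2, y):
    """L = 1/2 (y - h2)^2 with h2 = sigmoid(w2 * h1 + b2)."""
    h2 = sigmoid(w2 * h1 + b2)
    return 0.5 * (y - h2) ** 2

# Hypothetical scalar values chosen only for the check.
w2, h1, b2, y = 0.7, 0.4, 0.1, 1.0

# Analytic gradient from the chain rule above.
h2 = sigmoid(w2 * h1 + b2)
analytic = -(y - h2) * h2 * (1 - h2) * h1

# Numerical gradient via a central difference.
eps = 1e-6
numeric = (loss(w2 + eps, h1, b2, y) - loss(w2 - eps, h1, b2, y)) / (2 * eps)

print(analytic, numeric)
```

The two numbers should agree to many decimal places; a mismatch would point to a sign or index error in the derivation.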

In [3]:
%pip install tensorflow

In [4]:
import tensorflow as tf
import numpy as np

In [7]:
9/4

2.25

In [5]:
"""A vector is a 1-dimensional tensor"""

# Vector in TensorFlow
vector_tf = tf.constant([0, 1, 2, 3, 4])

# Vector in Numpy
vector_np = np.array([0, 1, 2, 3, 4])

print("Shapes:", vector_tf.shape, vector_np.shape)

""" A matrix is a 2-dimensional tensor"""

# Matrix in TensorFlow
matrix_tf = tf.constant([[0, 1], [2, 3], [4, 5]])

# Matrix in Numpy
matrix_np = np.array([[0, 1], [2, 3], [4, 5]])

print("Shapes:", matrix_tf.shape, matrix_np.shape)

"""3-dimensional tensor"""

# 3-d tensor in TensorFlow
t3_tf = tf.constant([[[0., 1.], [2, 3], [4, 5]]]*5)

# 3-d tensor in Numpy
t3_np = np.array([[[0, 1], [2, 3], [4, 5]]]*5)

print("Shapes:", t3_tf.shape, t3_np.shape)

Shapes: (5,) (5,)
Shapes: (3, 2) (3, 2)
Shapes: (5, 3, 2) (5, 3, 2)


In [6]:
print(t3_tf)

tf.Tensor(
[[[0. 1.]
  [2. 3.]
  [4. 5.]]

 [[0. 1.]
  [2. 3.]
  [4. 5.]]

 [[0. 1.]
  [2. 3.]
  [4. 5.]]

 [[0. 1.]
  [2. 3.]
  [4. 5.]]

 [[0. 1.]
  [2. 3.]
  [4. 5.]]], shape=(5, 3, 2), dtype=float32)


In [16]:
x = tf.constant([[0.2],[1],[0.5]])
w = tf.Variable([[0.5, 1, -1.2]])
b = 0.01
print(f'x shape: {x.shape}')
print(f'w shape: {w.shape}')

x shape: (3, 1)
w shape: (1, 3)


In [18]:
z = tf.matmul(w, x) + b

In [21]:
tf.sigmoid(z)

<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.62480646]], dtype=float32)>
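
The TensorFlow result above can be cross-checked with plain NumPy. This is a minimal sketch that reuses the same `x`, `w`, and `b` values; the variable name `a` for the activation is an illustrative choice.

```python
import numpy as np

x = np.array([[0.2], [1.0], [0.5]])   # shape (3, 1)
w = np.array([[0.5, 1.0, -1.2]])      # shape (1, 3)
b = 0.01

z = w @ x + b                         # 0.1 + 1.0 - 0.6 + 0.01 = 0.51
a = 1.0 / (1.0 + np.exp(-z))          # sigmoid of the accumulated input
print(z, a)
```

NumPy computes in float64 by default while the TensorFlow tensors above are float32, so the two results agree only up to float32 precision.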