# Introduction to Neural Networks 



## What is a Neural Network?
A **Neural Network** is a model built from layers of simple connected units (neurons) that learns the underlying relationships in a set of data. Its design is loosely inspired by the way neurons in the human brain pass signals to one another.

Neural Networks are made of layers:
- **Input Layer**: Receives input data
- **Hidden Layers**: Process the data using weights and activation functions
- **Output Layer**: Produces the final prediction

![image.png](attachment:image.png)

### **Simple mathematical calculation** for a **feedforward neural network** with:

* **3 inputs**
* **2 hidden layers**
  * 1st hidden layer: 4 neurons
  * 2nd hidden layer: 3 neurons
* **1 output neuron**
* **ReLU** as the activation function in the hidden layers and a **linear** activation at the output

We'll calculate the output step-by-step.

---

### **1. Input Layer**

Let the inputs be:

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1.0 \\ 0.5 \\ -1.5 \end{bmatrix}
$$

---

### **2. Hidden Layer 1 (4 neurons)**

Let the weights for the first hidden layer be:

$$
W^{[1]} = 
\begin{bmatrix}
0.2 & -0.3 & 0.5 \\
-0.4 & 0.1 & -0.2 \\
0.3 & 0.6 & -0.1 \\
-0.5 & 0.2 & 0.4
\end{bmatrix}, \quad
\mathbf{b}^{[1]} = \begin{bmatrix} 0.1 \\ -0.1 \\ 0.2 \\ 0.0 \end{bmatrix}
$$

Calculate $z^{[1]} = W^{[1]} \cdot \mathbf{x} + \mathbf{b}^{[1]}$:

$$
z^{[1]} = 
\begin{bmatrix}
0.2 \cdot 1 + (-0.3) \cdot 0.5 + 0.5 \cdot (-1.5) + 0.1 \\
-0.4 \cdot 1 + 0.1 \cdot 0.5 + (-0.2) \cdot (-1.5) - 0.1 \\
0.3 \cdot 1 + 0.6 \cdot 0.5 + (-0.1) \cdot (-1.5) + 0.2 \\
-0.5 \cdot 1 + 0.2 \cdot 0.5 + 0.4 \cdot (-1.5) + 0
\end{bmatrix}
=
\begin{bmatrix}
0.2 - 0.15 - 0.75 + 0.1 \\
-0.4 + 0.05 + 0.3 - 0.1 \\
0.3 + 0.3 + 0.15 + 0.2 \\
-0.5 + 0.1 - 0.6
\end{bmatrix}
=
\begin{bmatrix}
-0.6 \\ -0.15 \\ 0.95 \\ -1.0
\end{bmatrix}
$$

Apply ReLU: $a^{[1]} = \text{ReLU}(z^{[1]}) = \max(0, z^{[1]})$

$$
a^{[1]} = \begin{bmatrix} 0 \\ 0 \\ 0.95 \\ 0 \end{bmatrix}
$$
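
The layer-1 arithmetic above can be checked in a few lines of plain Python. This is a minimal sketch: the helper names `dense` and `relu` are illustrative and not part of the original notebook, and lists are used instead of NumPy so that each multiply-add stays visible.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, t) for t in v]

def dense(W, x, b):
    """Compute W·x + b, with W given as a list of rows (hypothetical helper)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Values copied from the worked example
x = [1.0, 0.5, -1.5]
W1 = [[0.2, -0.3, 0.5],
      [-0.4, 0.1, -0.2],
      [0.3, 0.6, -0.1],
      [-0.5, 0.2, 0.4]]
b1 = [0.1, -0.1, 0.2, 0.0]

z1 = dense(W1, x, b1)   # pre-activations
a1 = relu(z1)           # activations: negatives are clipped to 0

print([round(t, 3) for t in z1])
print([round(t, 3) for t in a1])
```

Up to floating-point rounding, the printed values should agree with $z^{[1]}$ and $a^{[1]}$ above.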

---

### **3. Hidden Layer 2 (3 neurons)**

Weights and bias:

$$
W^{[2]} = 
\begin{bmatrix}
0.1 & -0.2 & 0.3 & 0.4 \\
-0.3 & 0.6 & -0.1 & 0.2 \\
0.5 & -0.4 & 0.2 & -0.1
\end{bmatrix}, \quad
\mathbf{b}^{[2]} = \begin{bmatrix} 0.0 \\ 0.1 \\ -0.2 \end{bmatrix}
$$

Calculate $z^{[2]} = W^{[2]} \cdot a^{[1]} + \mathbf{b}^{[2]}$:

Only the third value of $a^{[1]}$ is nonzero, so only the third column of $W^{[2]}$ contributes:

$$
z^{[2]} = 
\begin{bmatrix}
0.3 \cdot 0.95 + 0 \\
-0.1 \cdot 0.95 + 0.1 \\
0.2 \cdot 0.95 - 0.2
\end{bmatrix}
=
\begin{bmatrix}
0.285 \\
-0.095 + 0.1 = 0.005 \\
0.19 - 0.2 = -0.01
\end{bmatrix}
$$

Apply ReLU:

$$
a^{[2]} = \max(0, z^{[2]}) = \begin{bmatrix} 0.285 \\ 0.005 \\ 0 \end{bmatrix}
$$

---

### **4. Output Layer (1 neuron)**

Weights and bias:

$$
W^{[3]} = \begin{bmatrix} 0.4 & -0.6 & 0.3 \end{bmatrix}, \quad b^{[3]} = 0.05
$$

$$
y = W^{[3]} \cdot a^{[2]} + b^{[3]} = 0.4 \cdot 0.285 - 0.6 \cdot 0.005 + 0.3 \cdot 0 + 0.05
= 0.114 - 0.003 + 0 + 0.05 = 0.161
$$

---

### **Final Output**:

$$
\boxed{0.161}
$$
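
The whole forward pass can also be written as a loop over layers. The sketch below is one possible way to organise it, not the notebook's own code: `forward`, `dense`, and `relu` are hypothetical helper names, and each layer is represented as an assumed `(W, b, activation)` triple.

```python
def relu(v):
    return [max(0.0, t) for t in v]

def dense(W, x, b):
    # W·x + b with W stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(x, layers):
    """Pass x through each (W, b, activation) triple in order."""
    a = x
    for W, b, activation in layers:
        z = dense(W, a, b)
        a = activation(z) if activation is not None else z  # None means linear
    return a

# Weights and biases copied from the worked example
W1 = [[0.2, -0.3, 0.5], [-0.4, 0.1, -0.2], [0.3, 0.6, -0.1], [-0.5, 0.2, 0.4]]
b1 = [0.1, -0.1, 0.2, 0.0]
W2 = [[0.1, -0.2, 0.3, 0.4], [-0.3, 0.6, -0.1, 0.2], [0.5, -0.4, 0.2, -0.1]]
b2 = [0.0, 0.1, -0.2]
W3 = [[0.4, -0.6, 0.3]]
b3 = [0.05]

layers = [(W1, b1, relu), (W2, b2, relu), (W3, b3, None)]
y = forward([1.0, 0.5, -1.5], layers)[0]
print(round(y, 3))  # 0.161
```

Storing the layers in a list keeps the loop unchanged if more hidden layers are added.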






### Understanding Neural Networks in Deep Learning
Neural networks are capable of learning and identifying patterns directly from data without pre-defined rules. These networks are built from several key components:

- Neurons: The basic units that receive inputs. Each neuron is governed by a threshold and an activation function.
- Connections: Links between neurons that carry information, regulated by weights and biases.
- Weights and Biases: These parameters determine the strength and influence of connections.
- Propagation Functions: Mechanisms that help process and transfer data across layers of neurons.
- Learning Rule: The method that adjusts weights and biases over time to improve accuracy.
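
To make the "learning rule" item concrete, here is a minimal sketch of one common choice, gradient descent on a squared error, applied to a single linear neuron. The function names, the learning rate of 0.05, and the three training pairs are assumptions chosen for illustration; the list above does not prescribe a specific rule.

```python
def predict(w, b, x):
    """A single linear neuron: weight times input plus bias."""
    return w * x + b

def sgd_step(w, b, x, target, lr):
    """One gradient-descent update for loss = (prediction - target) ** 2."""
    error = predict(w, b, x) - target
    grad_w = 2 * error * x   # d(loss)/dw
    grad_b = 2 * error       # d(loss)/db
    return w - lr * grad_w, b - lr * grad_b

# Illustrative data following target = x + 1
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

w, b = 0.0, 0.0
for _ in range(500):                 # 500 passes over the data
    for x, target in data:
        w, b = sgd_step(w, b, x, target, lr=0.05)

print(round(w, 2), round(b, 2))      # 1.0 1.0
```

Each update nudges the weight and bias in the direction that reduces the error on the current sample, which is the role the learning rule plays in a full network.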



## Activation Functions

An activation function is a mathematical operation applied to the output of each neuron in a neural network layer. Without an activation function, a neural network is just a linear function, no matter how many layers you stack:
![%7BA351B13F-3562-406C-B90D-BD8FD1DDCD8E%7D.png](attachment:%7BA351B13F-3562-406C-B90D-BD8FD1DDCD8E%7D.png)

### Example: 
$$
y = W_1 x + b_1 \;\Rightarrow\; y = W_2 (W_1 x + b_1) + b_2
$$
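
The claim that stacked linear layers stay linear can be checked numerically. In the sketch below, the matrices and the input are made-up illustrative values and the helper functions are hypothetical. It compares two layers applied in sequence with a single layer that uses $W = W_2 W_1$ and $b = W_2 b_1 + b_2$.

```python
def matvec(W, x):
    """Matrix-vector product, W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

# Illustrative values (not from the notebook)
W1 = [[1.0, 2.0], [0.0, -1.0]]
b1 = [0.5, 1.0]
W2 = [[2.0, 1.0]]
b2 = [-1.0]
x = [3.0, 4.0]

# Two linear layers, no activation in between
two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# One equivalent linear layer
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

print(two_layers, one_layer)  # [19.0] [19.0]
```

Placing a non-linear activation between the two layers breaks this equivalence, which is what gives depth its extra expressive power.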


- Activation functions allow the model to learn complex patterns and non-linear relationships.
- **Helps the model learn complex mappings**: Real-world problems (like image recognition, text generation, etc.) are non-linear in nature. Activation functions allow the network to capture such non-linear mappings.
- **Controls the output range**: Activation functions can:
  - Limit outputs (e.g., to the range 0 to 1 or -1 to 1)
  - Produce probabilities (e.g., softmax in classification)
  - Help with gradient flow (some functions like ReLU improve training speed and reduce vanishing gradients)

| Name        | Formula                            | Output Range        | Use Case                                          |
| ----------- | ---------------------------------- | ------------------- | ------------------------------------------------- |
| **ReLU**    | `f(x) = max(0, x)`                 | \[0, ∞)             | Hidden layers (fast & efficient)                  |
| **Sigmoid** | `f(x) = 1 / (1 + e^-x)`            | (0, 1)              | Binary classification output                      |
| **Tanh**    | `f(x) = (e^x - e^-x)/(e^x + e^-x)` | (-1, 1)             | Hidden layers; zero-centered, unlike sigmoid      |
| **Softmax** | `f(x)ᵢ = e^xᵢ / Σⱼ e^xⱼ`           | (0, 1), sums to 1   | Multi-class classification output                 |
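
The four formulas in the table translate almost directly into code. This is a small sketch using only the standard library; subtracting the maximum inside `softmax` is a common numerical-stability trick that is assumed here and does not change the result.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)  # same as (e^x - e^-x) / (e^x + e^-x)

def softmax(xs):
    m = max(xs)                               # shift for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))    # 0.0 3.0
print(sigmoid(0.0))             # 0.5
print(tanh(0.0))                # 0.0
probs = softmax([1.0, 2.0, 3.0])
print(sum(probs))               # approximately 1.0
```

ReLU, sigmoid, and tanh act on one value at a time, while softmax needs the whole vector because each output depends on the sum over all inputs.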


In [6]:
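# Python lists: + concatenates
a = [1, 23, 34]
b = [9, 8, 7]
a + b          # [1, 23, 34, 9, 8, 7]
type(a + b)    # list

# NumPy arrays: arithmetic is element-wise
import numpy as np
a1 = np.array(a)
b1 = np.array(b)
a1 + b1        # array([10, 31, 41])
a1 / b1        # element-wise division, result is float
a1 * b1        # last expression, so this is the value the cell displays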

array([  9, 184, 238])

In [7]:
# Import Required Libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense 

### Number Sequence Generation Task
**Objective**: Train a neural network to learn a sequence pattern. Example: `1 → 2`, `2 → 3`, ..., `9 → 10`.

In [11]:
# Prepare training data: inputs 1..49, targets 2..50 (target = input + 1)
X = np.arange(1, 50)
y = X + 1

# Exercise: try the same for y = 2x and y = x*x, using 50 or 100 values

# Reshape input for Keras (samples, features)
X = X.reshape(-1, 1)
y = y.reshape(-1, 1)

In [12]:
X
y

array([[ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12],
       [13],
       [14],
       [15],
       [16],
       [17],
       [18],
       [19],
       [20],
       [21],
       [22],
       [23],
       [24],
       [25],
       [26],
       [27],
       [28],
       [29],
       [30],
       [31],
       [32],
       [33],
       [34],
       [35],
       [36],
       [37],
       [38],
       [39],
       [40],
       [41],
       [42],
       [43],
       [44],
       [45],
       [46],
       [47],
       [48],
       [49],
       [50]])

### Build a Simple Neural Network

In [13]:
model = Sequential([
    tf.keras.Input(shape=(1,)),      # One feature per sample
    Dense(50, activation='relu'),    # Hidden layer
    Dense(1)                         # Output layer (linear)
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Summary of the model
model.summary()
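
The output of `model.summary()` is not reproduced in this export, but the parameter counts it reports for fully connected layers can be worked out by hand: each unit has one weight per input plus one bias. The helper below is a hypothetical illustration of that arithmetic, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

hidden = dense_params(1, 50)    # Dense(50) fed by 1 input feature
output = dense_params(50, 1)    # Dense(1) fed by the 50 hidden units
total = hidden + output

print(hidden, output, total)    # 100 51 151
```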


### Train the Model
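- An epoch is a single pass through the entire training data.
- The model weights are updated after each batch of training samples, so a single epoch usually contains several updates.
- `verbose` sets how much detail is printed during training: `0` is silent, `1` shows a progress bar, and `2` prints one line per epoch.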
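
The number of weight updates per epoch follows from the dataset size and the batch size. The sketch below assumes the Keras default `batch_size` of 32, since `fit()` is called without one, and uses the 49 training samples prepared earlier.

```python
import math

n_samples = 49     # X holds the integers 1..49
batch_size = 32    # Keras default when batch_size is not passed to fit()
epochs = 250

steps_per_epoch = math.ceil(n_samples / batch_size)  # batches per epoch
total_updates = steps_per_epoch * epochs             # weight updates overall

print(steps_per_epoch)  # 2
print(total_updates)    # 500
```

The value 2 corresponds to the `2/2` counter shown on every line of the training log.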

In [16]:
model.fit(X, y, epochs=250, verbose=1) 

Epoch 1/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - loss: 0.0133
Epoch 2/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 0.0130
Epoch 3/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0141
Epoch 4/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0128
Epoch 5/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0125
Epoch 6/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0144
Epoch 7/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0139
Epoch 8/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0134
Epoch 9/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0131
Epoch 10/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0128
Epoch 

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0123
Epoch 84/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0123
Epoch 85/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0123
Epoch 86/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0126
Epoch 87/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0114
Epoch 88/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0125
Epoch 89/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0115
Epoch 90/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0115
Epoch 91/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0123
Epoch 92/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0124
Epoch 93/250
2/2

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0101
Epoch 166/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.0103
Epoch 167/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0101
Epoch 168/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0104
Epoch 169/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0106
Epoch 170/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0096
Epoch 171/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.0110
Epoch 172/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.0103
Epoch 173/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.0103
Epoch 174/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0099
Epoch 175/25

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.00948
Epoch 247/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0088
Epoch 248/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0091
Epoch 249/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0091
Epoch 250/250
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.0092


<keras.src.callbacks.history.History at 0x1ca9e275750>

### Test the Model

In [21]:
# Predict the next number for an input outside the training range (1..49)
test_input = np.array([[100]])
predicted = model.predict(test_input)
print(f"Input: {test_input[0][0]} → Predicted Output: {predicted[0][0]:.2f}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Input: 100 → Predicted Output: 101.39
