# Artificial Neural Networks (ANN)

## 1. Introduction to Artificial Neural Networks

- An Artificial Neural Network (ANN) is a computational model inspired by the structure and functioning of the human brain.
- It consists of interconnected units called neurons, which work together to process information and learn patterns from data.

**ANNs are widely used in:**

- Classification

- Regression

- Pattern recognition

- Prediction tasks

## 2. Biological Inspiration

| Biological Neuron | Artificial Neuron |
| ----------------- | ----------------- |
| Dendrites         | Input values      |
| Synapse           | Weights           |
| Cell body         | Summation         |
| Axon              | Output            |

**ANNs simulate this behavior mathematically to learn from data.**

## 3. Structure of an Artificial Neural Network

**An ANN consists of three main types of layers:**

**3.1 Input Layer**

- Receives raw input data

- Each neuron represents one feature

**3.2 Hidden Layer(s)**

- Performs intermediate computations

- Extracts patterns and relationships

- More hidden layers make the network deeper, which is where the term "deep learning" comes from

**3.3 Output Layer**

- Produces the final prediction

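- The form of the output depends on the problem type (for example, a single value for regression or class probabilities for classification)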

## 4. Artificial Neuron (Perceptron)

**Components of a Neuron**

- Inputs: x₁, x₂, ..., xₙ

- Weights: w₁, w₂, ..., wₙ

- Bias: b

- Activation function

**Mathematical Representation**
**z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b**

**output = f(z)**

where f is the activation function.
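
The formula above can be sketched in a few lines of NumPy. This is only an illustration: the input values, weights, bias, and the `step` activation below are made-up assumptions, not values from this tutorial.

```python
import numpy as np

def step(z):
    """Step activation (assumed here for illustration): 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def neuron(x, w, b, f):
    """Compute f(z) where z = w1*x1 + ... + wn*xn + b."""
    z = np.dot(w, x) + b
    return f(z)

# Hypothetical inputs, weights, and bias
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1

z = np.dot(w, x) + b          # 0.5*1 + (-0.25)*2 + 0.1 = 0.1
output = neuron(x, w, b, step)
print("z =", z, "output =", output)
```

Swapping `step` for another activation function changes only the last stage; the weighted sum stays the same.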

## 5. Activation Functions

Activation functions introduce non-linearity into the network. Without them, any stack of layers would reduce to a single linear transformation of the inputs.

**Common Activation Functions**

| Function | Formula                 | Usage                              |
| -------- | ----------------------- | ---------------------------------- |
| Sigmoid  | 1 / (1 + e⁻ˣ)           | Binary classification output       |
| ReLU     | max(0, x)               | Hidden layers                      |
| Tanh     | (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ) | Hidden layers (zero-centered)      |
| Softmax  | exp(xᵢ) / Σⱼ exp(xⱼ)    | Multi-class output                 |
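
As a rough sketch, the four functions in the table can be written directly in NumPy. The input vector `z` is an arbitrary example, and subtracting the maximum inside `softmax` is a common numerical-stability trick added here, not something the table requires.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)

def softmax(x):
    # Shift by the max so exp() cannot overflow; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])  # arbitrary example values
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", tanh(z))
print("softmax:", softmax(z))
```

Sigmoid squashes each value into (0, 1), tanh into (−1, 1), and softmax turns the whole vector into probabilities that sum to 1.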


## 6. Types of Artificial Neural Networks

**6.1 Single Layer Perceptron**

- No hidden layers

- Solves only linearly separable problems

**6.2 Multilayer Perceptron (MLP)**

- One or more hidden layers

- Can solve complex, non-linear problems
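
To make the difference concrete, here is a small sketch with hand-picked weights (assumptions chosen for illustration, not learned). A single layer is enough for AND, which is linearly separable, while XOR needs a hidden layer.

```python
import numpy as np

def step(z):
    """Element-wise step activation: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Single layer perceptron: AND is linearly separable
and_out = step(X @ np.array([1, 1]) - 1.5)

# Multilayer perceptron for XOR (hand-picked weights):
# hidden unit 1 computes OR, hidden unit 2 computes AND
W1 = np.array([[1, 1],
               [1, 1]])
b1 = np.array([-0.5, -1.5])
H = step(X @ W1 + b1)

# Output fires when OR is true and AND is false
xor_out = step(H @ np.array([1, -1]) - 0.5)

print("AND:", and_out)   # [0 0 0 1]
print("XOR:", xor_out)   # [0 1 1 0]
```

No single line can separate the XOR classes in the input space, which is why the hidden layer is needed.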

**6.3 Feedforward Neural Network**

- Data flows in one direction, from input to output

- No feedback loops

- Both perceptron types above are feedforward networks

## 7. Learning Process in ANN

**ANNs learn by adjusting weights to minimize error.**

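**Steps:**

1. Forward propagation: compute predictions from the inputs

2. Loss calculation: measure how far the predictions are from the targets

3. Backpropagation: compute the gradient of the loss for each weight

4. Weight update: adjust the weights to reduce the loss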
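
The four steps can be sketched as a training loop for a single sigmoid neuron. The tiny dataset, the learning rate `lr`, and the number of epochs are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 1-feature dataset: small values -> 0, large values -> 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(1)   # one weight
b = 0.0           # bias
lr = 0.5          # learning rate (assumed value)
losses = []

for epoch in range(200):
    # 1. Forward propagation
    z = X @ w + b
    y_hat = 1.0 / (1.0 + np.exp(-z))

    # 2. Loss calculation (binary cross-entropy)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    losses.append(loss)

    # 3. Backpropagation: gradients of the loss w.r.t. w and b
    grad_z = (y_hat - y) / len(y)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()

    # 4. Weight update
    w -= lr * grad_w
    b -= lr * grad_b

print("first loss:", losses[0], "last loss:", losses[-1])
```

With a single neuron, backpropagation collapses to one application of the chain rule; deeper networks repeat that step layer by layer.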

## 8. Loss (Cost) Function

**A loss function measures how far the predictions are from the actual values.**

**Common Loss Functions**

- Mean Squared Error (MSE)

- Binary Cross Entropy

- Categorical Cross Entropy

**Example:**

**MSE = (1/n) Σ (yᵢ − ŷᵢ)²**

where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of samples.
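
A short numeric sketch of two of these losses follows. The label and prediction arrays are made-up values, and the `eps` clipping in `binary_cross_entropy` is an added safeguard against `log(0)`.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average of squared differences."""
    return np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for labels y in {0, 1} and probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical labels and predicted probabilities
y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.9, 0.2, 0.6, 0.4])

mse_value = mse(y, y_hat)                    # (0.01 + 0.04 + 0.16 + 0.16) / 4
bce_value = binary_cross_entropy(y, y_hat)
print("MSE:", mse_value)
print("Binary cross-entropy:", bce_value)
```

Both losses are zero only when every prediction matches its label exactly, and both grow as predictions move away from the labels.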

## 9. Gradient Descent

**Gradient Descent is an optimization algorithm used to minimize the loss function.**

**Weight Update Rule**

**w = w − α × ∂Loss/∂w**

**Where:**

- α is the learning rate, which controls the step size

- ∂Loss/∂w is the gradient of the loss with respect to the weight w

**Types:**

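- Batch Gradient Descent: uses the entire training set for each update

- Stochastic Gradient Descent: uses one training sample for each update

- Mini-batch Gradient Descent: uses a small subset of samples for each update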
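
The three variants differ only in how many samples feed each update, which the sketch below exposes as a `batch_size` argument. The data (`y = 3x`), the learning rate `alpha`, and the helper name `gradient_descent` are assumptions for illustration.

```python
import numpy as np

def gradient_descent(x, y, batch_size, alpha=0.05, epochs=100, seed=0):
    """Fit y ≈ w * x by minimizing mean squared error.

    batch_size = len(x)     -> batch gradient descent
    batch_size = 1          -> stochastic gradient descent
    anything in between     -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    w = 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)          # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # dLoss/dw for Loss = mean((w*x - y)^2) on this batch
            grad = np.mean(2 * (w * x[idx] - y[idx]) * x[idx])
            w = w - alpha * grad            # weight update rule
    return w

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x                                 # true weight is 3

w_batch = gradient_descent(x, y, batch_size=len(x))
w_sgd = gradient_descent(x, y, batch_size=1)
w_mini = gradient_descent(x, y, batch_size=2)
print(w_batch, w_sgd, w_mini)
```

On this noise-free toy problem all three reach the same weight; on real data the smaller batches give noisier but cheaper updates.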

## 10. Backpropagation Algorithm

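**Backpropagation computes the gradient of the loss with respect to every weight by propagating the error backward through the network, layer by layer, using the chain rule.**

**Steps:**

1. Compute the output (forward pass)

2. Calculate the error (loss)

3. Compute the gradients, from the output layer back to the input layer

4. Update the weights using those gradients (gradient descent)

Because it reuses intermediate results instead of recomputing them for each weight, it enables efficient training of deep networks.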
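
As a sketch of what the backward pass involves, the code below runs backpropagation by hand on a tiny network with one tanh hidden layer and an MSE loss, then compares one gradient against a numerical estimate. The network size, random data, and tanh activation are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # 5 samples, 2 features (random toy data)
y = rng.normal(size=(5, 1))

W1 = rng.normal(size=(2, 3)); b1 = np.zeros(3)   # input -> hidden (3 units)
W2 = rng.normal(size=(3, 1)); b2 = np.zeros(1)   # hidden -> output

def forward(W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)         # hidden activations
    y_hat = h @ W2 + b2              # linear output
    loss = np.mean((y_hat - y) ** 2) # MSE
    return h, y_hat, loss

# Forward pass
h, y_hat, loss = forward(W1, b1, W2, b2)

# Backward pass: apply the chain rule from the output back to the input
d_yhat = 2 * (y_hat - y) / len(X)    # dLoss/dy_hat
dW2 = h.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_h = d_yhat @ W2.T                  # error propagated to the hidden layer
d_z1 = d_h * (1 - h ** 2)            # tanh'(z) = 1 - tanh(z)^2
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# Numerical check of one entry of dW1 (central difference)
eps = 1e-6
W1_plus = W1.copy();  W1_plus[0, 0] += eps
W1_minus = W1.copy(); W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[2]
           - forward(W1_minus, b1, W2, b2)[2]) / (2 * eps)

print("backprop:", dW1[0, 0], "numeric:", numeric)
```

The two numbers should agree to several decimal places, which is a common sanity check when implementing backpropagation manually.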

## 11. Overfitting and Underfitting

| Problem      | Description                   |
| ------------ | ----------------------------- |
| Overfitting  | Model memorizes training data and generalizes poorly to new data |
| Underfitting | Model is too simple to learn the patterns, even in training data |

**Solutions for Overfitting**

- Regularization

- Dropout

- More training data

- Early stopping
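
Two of these remedies map directly onto scikit-learn's `MLPClassifier` parameters: `alpha` sets the L2 regularization strength, and `early_stopping=True` holds out part of the training data and stops when the validation score stops improving. The sketch below is one possible setup; the `make_moons` dataset and all parameter values are assumptions, and dropout is not shown because `MLPClassifier` has no dropout option.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic noisy dataset (an assumption, chosen only for illustration)
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = MLPClassifier(
    hidden_layer_sizes=(32,),
    alpha=1e-2,                # L2 regularization strength
    early_stopping=True,       # hold out validation data, stop when it stalls
    validation_fraction=0.2,   # share of training data used for validation
    n_iter_no_change=10,       # patience, in epochs
    max_iter=500,
    random_state=0,
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("epochs run:", model.n_iter_)
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

A large gap between training and test accuracy is the usual sign of overfitting; similar values suggest the model generalizes reasonably.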

## 12. Advantages of ANN

- Learns complex patterns

- Handles non-linear data

- Adaptive and flexible

- High accuracy with sufficient data

## 13. Limitations of ANN

- Often requires large datasets to perform well

- Computationally expensive

- Difficult to interpret

- Hyperparameter tuning is complex

## 14. Applications of Artificial Neural Networks

- Image recognition

- Speech recognition

- Medical diagnosis

- Stock market prediction

- Fraud detection

- Recommendation systems

## 15. Summary

- An ANN is a computational model loosely inspired by the human brain

- Consists of neurons, weights, and activation functions

- Learns by gradient descent, with gradients computed by backpropagation

- Foundation for deep learning models

# Simple ANN Demo in Python
## Binary Classification Using a Neural Network
### Problem Statement

**We want to train a neural network to classify a pair of numbers, given as two input features, into one of two classes:**

- 0 → Small number

- 1 → Large number

## Step 1. Import Required Libraries

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

### Explanation:

- `numpy` for numerical operations

- `MLPClassifier` is scikit-learn's multilayer perceptron classifier, a simple ANN

- `train_test_split` for splitting the data into training and test sets

- `accuracy_score` to evaluate the model

## Step 2. Create a Simple Dataset

```python
# Input features
X = np.array([
    [1, 2],
    [2, 3],
    [3, 4],
    [6, 7],
    [7, 8],
    [8, 9]
])

# Output labels
# 0 = Small number
# 1 = Large number
y = np.array([0, 0, 0, 1, 1, 1])
```


### Explanation:

- Each row has two input values

- Labels are binary

- This mimics real-world feature-based classification on a very small scale

## Step 3. Split the Dataset

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

### Explanation:

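- `test_size=0.3` requests 30 percent of the data for testing and 70 percent for training

- With only 6 samples, the test set is rounded up to 2 samples, leaving 4 for training

- `random_state` makes the split reproducible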
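
To see the split sizes, the shapes can be printed directly. The sketch below repeats the dataset so it runs on its own; the `stratify=y` variant is an optional addition, not part of the original walkthrough, that keeps both classes represented in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Same split as in the tutorial
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("train:", X_train.shape, "test:", X_test.shape)  # (4, 2) and (2, 2)

# Optional variant (an addition): preserve class proportions in both sets
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print("stratified test labels:", ys_test)
```

On a dataset this small, a plain random split can easily put only one class in the test set, which makes the accuracy figure less informative.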

## Step 4. Create the Neural Network Model

```python
model = MLPClassifier(
    hidden_layer_sizes=(4,),
    activation='relu',
    learning_rate_init=0.01,
    max_iter=1000,
    random_state=42
)
```

### Explanation:

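- One hidden layer with 4 neurons

- ReLU activation function in the hidden layer

- `learning_rate_init` sets the initial step size for weight updates

- `max_iter` is the maximum number of training iterations (epochs for the default `adam` solver)

This represents a basic Multilayer Perceptron.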

## Step 5. Train the Neural Network

```python
model.fit(X_train, y_train)
```





### Explanation:

Calling `fit` runs the whole training loop internally:

- Forward propagation

- Loss calculation

- Backpropagation

- Weight updates
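
Although training happens inside `fit`, the fitted model exposes what it learned. The sketch below rebuilds the same model so it runs on its own and inspects a few attributes of `MLPClassifier` (`coefs_`, `intercepts_`, `n_iter_`, `loss_curve_`, `out_activation_`). scikit-learn may print a convergence warning on data this small.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = MLPClassifier(
    hidden_layer_sizes=(4,),
    activation='relu',
    learning_rate_init=0.01,
    max_iter=1000,
    random_state=42
)
model.fit(X_train, y_train)

# One weight matrix per layer transition: 2 inputs -> 4 hidden -> 1 output
weight_shapes = [W.shape for W in model.coefs_]
bias_shapes = [b.shape for b in model.intercepts_]
print("weight shapes:", weight_shapes)   # [(2, 4), (4, 1)]
print("bias shapes:  ", bias_shapes)     # [(4,), (1,)]

# Training history
print("epochs run:", model.n_iter_)
print("first loss:", model.loss_curve_[0], "last loss:", model.loss_curve_[-1])

# Output layer activation chosen by scikit-learn for binary problems
print("output activation:", model.out_activation_)  # logistic
```

The weight shapes mirror the architecture: 2 input features, 4 hidden neurons, and a single output unit for the binary decision.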

## Step 6. Make Predictions

```python
y_pred = model.predict(X_test)
print("Predicted values:", y_pred)
```

Output:

```
Predicted values: [0 0]
```


## Step 7. Evaluate Accuracy

```python
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)
```

Output:

```
Model Accuracy: 1.0
```


### Explanation:

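- Compares predicted labels with actual labels and returns the fraction that match

- Higher accuracy means better predictions, but with only 2 test samples here, 1.0 is a very rough estimate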
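
Accuracy is simple enough to compute by hand, which the sketch below does next to `accuracy_score`. The label arrays are hypothetical values, chosen so that some predictions are wrong, and the confusion matrix is an added illustration of where the errors fall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions (not from the model above)
y_true = np.array([0, 0, 1, 1, 1, 0])
y_hat = np.array([0, 1, 1, 1, 0, 0])

manual_accuracy = np.mean(y_true == y_hat)      # 4 correct out of 6
library_accuracy = accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)

print("manual: ", manual_accuracy)
print("sklearn:", library_accuracy)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions per class; the off-diagonal entries show which class was mistaken for which.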

## Step 8. Test with New Data

```python
new_data = np.array([[4, 5]])
prediction = model.predict(new_data)

print("Prediction for [4, 5]:", prediction)
```

Output:

```
Prediction for [4, 5]: [0]
```

```python
new_data = np.array([[7, 9]])
prediction = model.predict(new_data)

print("Prediction for [7, 9]:", prediction)
```

Output:

```
Prediction for [7, 9]: [1]
```


### Output meaning:

- 0 → Small number

- 1 → Large number
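
Besides the hard 0/1 label, `MLPClassifier` can report how confident it is through `predict_proba`. The sketch below is self-contained and, as a simplifying assumption, trains on all six rows instead of the training split, so its numbers may differ from the outputs shown above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

model = MLPClassifier(
    hidden_layer_sizes=(4,),
    activation='relu',
    learning_rate_init=0.01,
    max_iter=1000,
    random_state=42
)
model.fit(X, y)  # assumption: all six rows used for training, for brevity

new_data = np.array([[4, 5], [7, 9]])
proba = model.predict_proba(new_data)   # one row per sample: [P(class 0), P(class 1)]
labels = model.predict(new_data)

for row, p, label in zip(new_data, proba, labels):
    print(row, "-> P(small) =", p[0], "P(large) =", p[1], "label =", label)
```

A probability close to 0.5 signals an uncertain prediction, which is worth watching for inputs like `[4, 5]` that sit between the two groups in the training data.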

## How This Relates to ANN Concepts

| ANN Concept         | Where Used            |
| ------------------- | --------------------- |
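| Neurons             | 4 hidden layer nodes plus 1 output node          |
| Weights             | Learned inside `fit` (stored in `model.coefs_`)  |
| Activation Function | ReLU in the hidden layer                         |
| Loss Function       | Log loss (cross-entropy), computed internally    |
| Backpropagation     | Automatic, inside `fit`                          |
| Output Layer        | Single sigmoid unit for binary classification    |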


## Key Takeaway

- You created a working Artificial Neural Network

- You used real ANN concepts

- This is the foundation for deep learning models

- The same training logic (forward pass, loss, backpropagation, weight update) applies to CNNs and RNNs, which use different layer types
