## Neural Networks

In the earlier chapters, we discussed the motivation behind deep learning and the growing need
for models that can automatically learn useful representations from data. Traditional machine
learning methods often rely on carefully engineered features, a process that is both
time‑consuming and highly domain‑dependent. Neural networks address this limitation by learning
representations directly from raw data in a data‑driven manner.

In this chapter, we study **neural networks**, the fundamental computational structures that
enable deep learning. Our goal here is not to immediately dive into algorithms or code, but to
develop a clear conceptual understanding of how neural networks are constructed and why they
are capable of modeling complex phenomena.


### The Artificial Neuron

The basic unit of a neural network is the **artificial neuron**. Conceptually, an artificial
neuron is designed to mimic the information‑processing behavior of a biological neuron. It
receives multiple input signals, processes them, and produces a single output signal.

Each input is associated with a **weight**, which represents the strength or importance of that
input. The neuron computes a weighted sum of its inputs and adds a **bias** term. The bias allows
the model to shift the output independently of the input values, thereby improving flexibility.

Mathematically, the operation performed by a neuron can be written as:

\[
z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b
\]

The value \( z \) is often referred to as the *pre‑activation* or *net input* of the neuron.


### Interpretation of the Neuron Computation

From a geometric perspective, the weighted sum computed by a neuron defines a linear decision
boundary in the input space. The weights determine the orientation of this boundary, while the
bias controls its position.
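
As a small illustration of this geometric picture, the sketch below evaluates \( z \) for three points in a two‑dimensional input space. The weights, the two bias values, and the points are arbitrary choices made for this example. With `w = [1, 1]` the boundary is the line \( x_1 + x_2 = -b \), so changing the bias from -1 to -3 moves the line without rotating it.

```python
import numpy as np

# Illustrative values only: weights fix the orientation of the boundary.
w = np.array([1.0, 1.0])
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

for b in (-1.0, -3.0):
    z = points @ w + b
    # Positive z: one side of the line w . x + b = 0; negative z: the other side.
    print(b, np.sign(z))
```

The point `[1, 1]` lies on the positive side when `b = -1` and on the negative side when `b = -3`, even though the weights are unchanged.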

If the neuron outputs the value \( z \) directly, its behavior is equivalent to that of a
linear regression or linear classification model. While such models are useful in simple
settings, they are fundamentally limited in their ability to represent complex patterns found
in real‑world data.


### Limitations of Linear Models

Many practical problems involve relationships that cannot be captured using linear functions.
For example, tasks such as image recognition, speech processing, and natural language
understanding exhibit intricate non‑linear dependencies among input variables.

One might expect that stacking multiple linear neurons into several layers would overcome this
issue. However, the composition of linear functions is itself linear. As a result, a network
composed solely of linear transformations—regardless of its depth—remains a linear model.

This observation highlights a critical limitation and motivates the introduction of
non‑linearity into neural networks.
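
The collapse of stacked linear layers can be checked numerically. The following is a minimal sketch, assuming two layers with randomly chosen weights; the layer sizes and the names `W1`, `b1`, `W2`, `b2` are illustrative and not taken from the text. It compares the output of the two‑layer linear network with a single layer whose parameters are \( W = W_2 W_1 \) and \( b = W_2 b_1 + b_2 \).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with no activation; shapes are arbitrary choices.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Output of the two-layer linear network.
two_layer = W2 @ (W1 @ x + b1) + b2

# Equivalent single linear layer: W = W2 W1, b = W2 b1 + b2.
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
one_layer = W_eq @ x + b_eq

print(np.allclose(two_layer, one_layer))  # True
```

The same algebra applies to any number of stacked linear layers, which is why depth alone adds no expressive power.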


### Activation Functions and Non‑Linearity

To enable neural networks to model non‑linear relationships, the output of each neuron is passed
through a **non‑linear activation function**. This function transforms the pre‑activation value
\( z \) into an output that is then forwarded to the next layer.

Activation functions play a central role in determining the expressive power of a neural
network. By introducing non‑linearity at each layer, they allow deep networks to approximate
highly complex functions that would otherwise be impossible to represent.
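
To make this concrete, the sketch below builds a tiny network that computes the absolute value \( |x| \), a function that no purely linear model can represent. It assumes ReLU, \( \max(0, z) \), as the activation, which is one common choice; the hand‑picked weights (+1 and -1) are for illustration only and are not learned.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become zero."""
    return np.maximum(0, z)

x = np.linspace(-3, 3, 7)  # sample inputs from -3 to 3

# Hidden layer: two neurons with weights +1 and -1 and no bias.
h = relu(np.stack([x, -x]))
# Output neuron: sums the two hidden activations with weights 1 and 1.
y = h.sum(axis=0)

print(np.allclose(y, np.abs(x)))  # True
```

Removing the `relu` call leaves `x + (-x)`, which is zero everywhere: without the non‑linearity the two neurons cancel.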

At this stage, it is sufficient to understand activation functions as mechanisms that regulate
information flow and introduce non‑linearity. Their specific mathematical forms and practical
trade‑offs will be examined in detail in a subsequent chapter.


### From Single Neurons to Neural Networks

A **neural network** is formed by organizing neurons into layers. The first layer receives the
input data, intermediate layers—known as **hidden layers**—perform successive transformations,
and the final layer produces the network’s output.

Information flows forward through the network, layer by layer, in what is known as the
*forward pass*. During training, the network adjusts its weights and biases so that its outputs
closely match the desired targets. This learning process is driven by optimization techniques
that will be discussed later.

With this conceptual foundation, we are now prepared to explore tensor representations,
activation functions in depth, and the mathematical operations that underpin neural network
training.



## Biological Inspiration of Neural Networks

Neural networks are inspired by the structure and functioning of the human brain.
The brain consists of billions of interconnected neurons that communicate using electrical and chemical signals.



### Key Components of a Biological Neuron

- **Dendrites**: receive signals from other neurons  
- **Cell body (soma)**: processes incoming signals  
- **Axon**: transmits signals to other neurons  

Learning occurs by strengthening or weakening synaptic connections.



## Artificial Neuron Model

An artificial neuron is a mathematical model designed to mimic
the behavior of a biological neuron.


## What is a neuron?

An artificial neuron is a mathematical function; the perceptron is one early and simple example.  
It takes one or more inputs, multiplies each by a value called a **weight**, and adds the results together with a **bias**.  
This sum is then passed to a non-linear function, known as an **activation function**,  
to become the neuron’s output.



### Core Elements of an Artificial Neuron

- Inputs represent features  
- Weights represent importance of features  
- Bias shifts the activation threshold  
- Output represents the neuron response  



## Structure of an Artificial Neuron

An artificial neuron computes its output in two stages:



- **Linear combination** of inputs  
- **Non-linear activation** of the result  


It begins with inputs. These inputs are nothing more than numbers, but each number carries meaning. In an image-processing task, they might represent pixel intensities; in a weather model, they could be temperature or humidity values. The neuron doesn’t understand what these numbers mean in the real world—it only knows their magnitudes.


The computation can be written as follows, where f is the activation function and a is the neuron’s output:

z = w₁x₁ + w₂x₂ + … + wₙxₙ + b  
a = f(z)


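```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])    # inputs
w = np.array([0.3, 0.5, -0.2])   # one weight per input
b = 0.4                          # bias

# Pre-activation: weighted sum of the inputs plus the bias
z = np.dot(w, x) + b
z  # approximately 1.1
```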



## Weights and Bias: Role and Significance

Weights are the neuron’s way of listening carefully. They decide which inputs deserve attention and which should be ignored, shaping what the neuron learns over time. As training continues, these weights change, slowly storing experience and knowledge. Bias, on the other hand, gives the neuron freedom. It allows the neuron to respond even when inputs are weak or to stay silent until the right moment. Together, weights and bias guide how a neuron thinks, reacts, and learns from data.



### Weights

- Control contribution of each input  
- Learned during training  
- Determine feature importance  



### Bias

- Allows activation even when inputs are zero  
- Shifts decision boundary  


```python
x = np.array([2.0, 3.0])
w = np.array([0.5, 0.5])

# Weighted sum without a bias, then with a bias of 1.0: (2.5, 3.5)
np.dot(w, x), np.dot(w, x) + 1.0
```



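Non-linearity allows neural networks to learn complex patterns. One common activation function is ReLU, which returns max(0, z): negative values become zero and non-negative values pass through unchanged.


```python
z = np.array([-2, -1, 0, 1, 2])
relu = np.maximum(0, z)  # element-wise max(0, z)
relu  # array([0, 0, 0, 1, 2])
```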
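
The two stages, linear combination followed by activation, can be combined into a single function. The sketch below is one possible way to write it; the helper name `neuron_output` and the choice of sigmoid as a second activation are assumptions made for illustration. It reuses the inputs, weights, and bias from the earlier cell, for which z is approximately 1.1.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0, z)

def sigmoid(z):
    """Sigmoid activation: squashes z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation):
    """Return a = f(w . x + b) for a single neuron (hypothetical helper)."""
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.3, 0.5, -0.2])
b = 0.4

print(neuron_output(x, w, b, relu))     # about 1.1
print(neuron_output(x, w, b, sigmoid))  # about 0.75
```

Passing the activation as an argument keeps the linear part identical while letting the non-linear part vary.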



## Single Neuron Model (Perceptron)

The perceptron is the simplest form of an artificial neuron, designed to make a basic decision. It begins by receiving inputs, each carrying a piece of information. These inputs are weighted according to their importance and then combined into a single value. A bias is added to give the neuron flexibility in deciding when to respond. This combined signal is passed through an activation function, usually a step function, which decides whether the neuron should activate or remain silent. In this way, the perceptron acts like a yes-or-no decision maker, forming the foundation upon which more complex neural networks are built.



### Perceptron Decision Rule

- Output = 1 if (w·x + b) ≥ 0  
- Output = 0 otherwise  
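
The decision rule above can be sketched in a few lines. The function name `perceptron` and the parameters below are illustrative: the weights and bias are picked by hand (not learned) so that the neuron behaves like a logical AND gate.

```python
import numpy as np

def perceptron(x, w, b):
    """Step-activation neuron: 1 if w . x + b >= 0, otherwise 0."""
    return int(np.dot(w, x) + b >= 0)

# Hand-picked parameters that implement logical AND (an assumption for this demo).
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))
```

Only the input `(1, 1)` pushes the weighted sum above zero, so it is the only case that produces 1.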



### Limitation of Perceptron

The perceptron can only solve linearly separable problems.
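
XOR is the classic example of a problem that is not linearly separable. The sketch below tries every combination of weights and bias on a coarse grid and checks whether any of them reproduces the target outputs under the perceptron decision rule. The grid range and step are arbitrary choices, and a failed search on a finite grid is only an illustration, not a proof; the general result is that no weights and bias of any value solve XOR with a single perceptron.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {"AND": np.array([0, 0, 0, 1]), "XOR": np.array([0, 1, 1, 0])}

# Coarse grid of candidate parameters; range and step are arbitrary.
grid = np.linspace(-2, 2, 17)

def solvable(y):
    """True if some (w1, w2, b) on the grid reproduces y with the step rule."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b >= 0).astype(int)
        if np.array_equal(pred, y):
            return True
    return False

for name, y in targets.items():
    print(name, solvable(y))
```

The search finds parameters for AND but none for XOR, because no single straight line separates the XOR inputs labelled 1 from those labelled 0.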



## Linear Transformation in Neural Networks (Wx + b)

Neural networks use matrix operations to compute outputs efficiently.



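For a single input vector x, the linear transformation is expressed as:

z = Wx + b

When several inputs are stacked as the rows of a matrix X, as in the cell below, the same computation is written XW + b. In that layout each column of W holds the weights of one neuron, and b is added to every row.


```python
X = np.array([[1, 2], [3, 4]])            # two input samples, one per row
W = np.array([[0.2, 0.4], [0.6, 0.8]])    # one column of weights per neuron
b = np.array([0.1, 0.2])                  # one bias per neuron

# Batched linear transformation; b is broadcast across the rows.
np.dot(X, W) + b  # approximately [[1.5, 2.2], [3.1, 4.6]]
```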



## From Single Neuron to Fully Connected Layer

A single neuron can make only a simple decision, based on a limited view of the input. But when many such neurons are brought together, something more powerful emerges. In a fully connected layer, every neuron receives inputs from all neurons in the previous layer. Each connection has its own weight, allowing every neuron to learn a unique perspective of the same input data.



- Each neuron has its own weights and bias  
- All neurons receive the same input  
- Enables learning multiple features  
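
A fully connected layer is the same neuron computation repeated with different parameters. The following sketch, using made-up weights for a layer of four neurons with three inputs each, computes the pre-activations one neuron at a time and then as a single matrix operation. Here each row of `W` is assumed to hold one neuron's weights, matching the form z = Wx + b.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])           # one input with 3 features

# Row j of W holds the weights of neuron j (4 neurons, 3 inputs each).
W = np.array([[ 0.2, -0.1,  0.0],
              [ 0.5,  0.5,  0.5],
              [-0.3,  0.8,  0.1],
              [ 0.0,  0.0,  1.0]])
b = np.array([0.1, -1.0, 0.0, 0.5])     # one bias per neuron

# Neuron-by-neuron computation: every neuron sees the same input x.
z_loop = np.array([np.dot(W[j], x) + b[j] for j in range(4)])
# The same result as a single matrix operation.
z_layer = W @ x + b

print(z_layer)
print(np.allclose(z_loop, z_layer))  # True
```

The matrix form is what makes layers efficient in practice, since it replaces an explicit loop over neurons with one vectorized operation.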



## Layers in a Neural Network



### Input Layer

- Receives raw input features  
- No trainable parameters  



### Hidden Layer

- Performs intermediate computations  
- Learns feature representations  



### Output Layer

- Produces final prediction  
- Depends on the task type

![Artificial Neuron Diagram](https://miro.medium.com/v2/resize:fit:750/format:webp/1*ToPT8jnb5mtnikmiB42hpQ.png)




## Forward Pass: Flow of Information Through the Network

The forward pass is the moment when a neural network puts its knowledge to work. Information enters the network through the input layer, where each value represents a feature of the data. This information then flows forward, layer by layer, without looking back. At each neuron, inputs are weighted, combined, adjusted by a bias, and passed through an activation function. The resulting outputs become inputs for the next layer, gradually transforming raw data into meaningful patterns. By the time the signal reaches the output layer, the network produces its final prediction—this entire smooth journey of information is known as the forward pass.
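
The forward pass described above can be sketched as a loop over layers. This is a minimal illustration, not a full implementation: the `forward` helper, the layer sizes, the random weights, and the choice of ReLU for the hidden layer with a linear output are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0, z)

def forward(x, layers):
    """Pass x through a list of (W, b, activation) layers in order."""
    a = x
    for W, b, activation in layers:
        # Weighted sum plus bias, then activation; the result feeds the next layer.
        a = activation(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4), relu),         # hidden layer: 3 -> 4
    (rng.normal(size=(1, 4)), np.zeros(1), lambda z: z),  # output layer: 4 -> 1
]

x = np.array([0.5, -1.0, 2.0])
y = forward(x, layers)
print(y.shape)  # (1,)
```

Because the weights here are random rather than trained, the numeric output is meaningless; the point is only the order of operations.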



## Task for the reader

1. Compute neuron output for different weights and bias  
2. Compare ReLU and Sigmoid activation functions  
3. Explain why non-linearity is necessary  
4. Identify limitations of a single neuron  
