### Understanding Neural Networks as Linear Transformations

Neural networks, at their core, can be viewed as a series of **linear transformations** followed by **non-linear activations**. This view lets us describe each layer with linear algebra and implement it in PyTorch as a handful of tensor operations.

Linear transformations in neural networks are often represented by **matrix multiplications**.

Suppose we have an input vector \( x \) and a weight matrix \( W \). The product of \( W \) and \( x \) results in a new vector \( y \), which can be expressed as:

$$
y = W \times x
$$

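This transformation can change the dimension of the data: if \( W \) has shape \( m \times n \), it maps an \( n \)-dimensional input \( x \) to an \( m \)-dimensional output \( y \). Depending on its entries, \( W \) can also rotate, scale, or project the input, which is a significant aspect of how neural networks operate.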
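
To make the shape change concrete, here is a minimal sketch of the product \( y = W \times x \) in plain Python. The helper name `matvec` and the numeric values are illustrative assumptions, not taken from any library; in practice a tensor library would perform this operation.

```python
def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x (a list of floats)."""
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must have len(x) entries")
    # Each output entry is the dot product of one row of W with x.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


# Illustrative values: W is 2x3, so it maps a 3-dimensional input
# to a 2-dimensional output.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
x = [3.0, 4.0, 5.0]

y = matvec(W, x)
print(y)  # [13.0, -1.0]
```

The input has three entries and the output has two, matching the number of rows of `W`.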

### Matrix Multiplication in Neural Networks

In a simple neural network layer, the computation involves three quantities:
- \( W \) is the weight matrix associated with the layer.
- \( b \) is the bias vector.
- \( x \) is the input vector.

The output \( y \) is computed by the expression:

$$
y = W \times x + b
$$

This operation combines a **linear transformation** with the addition of the **bias term**; strictly speaking, the result is an affine transformation.
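
The expression above can be sketched in a few lines of plain Python. The function name `linear_layer` and the values of `W`, `b`, and `x` are hypothetical, chosen only for illustration. In PyTorch, the `torch.nn.Linear` module computes the same kind of affine map, with the weights and bias stored as learnable parameters.

```python
def linear_layer(W, b, x):
    """Compute y = W x + b, with W as a list of rows and b, x as lists."""
    if len(W) != len(b):
        raise ValueError("W must have one row per bias entry")
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must have len(x) entries")
    # Dot product of each row with x, then add the matching bias entry.
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


# Illustrative values: a layer with 3 inputs and 2 outputs.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
b = [0.5, -2.0]
x = [3.0, 4.0, 5.0]

y = linear_layer(W, b, x)
print(y)  # [13.5, -3.0]
```

The bias shifts each output entry independently of the input, which is why the layer can produce a non-zero output even when \( x \) is the zero vector.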

### Non-linear Activation Function

A non-linear activation function, such as **ReLU** (Rectified Linear Unit), is then applied element-wise to the layer output \( y \) to introduce non-linearity:

$$
a = \text{ReLU}(y)
$$

The **ReLU** function is defined as:

$$
\text{ReLU}(z) = \max(0, z)
$$
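
As a minimal sketch, the definition above can be applied entry by entry to a vector. The function name `relu` and the input values are assumptions made for illustration.

```python
def relu(z):
    """Apply ReLU(z) = max(0, z) to every entry of a list of floats."""
    return [max(0.0, v) for v in z]


# Illustrative layer output with a positive, a negative, and a zero entry.
y = [13.5, -3.0, 0.0]

a = relu(y)
print(a)  # [13.5, 0.0, 0.0]
```

Positive entries pass through unchanged, while negative entries are clamped to zero.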

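This non-linearity is crucial: without it, stacking layers gains nothing, because a composition of linear maps is itself a linear map, \( W_2 (W_1 x) = (W_2 W_1) x \). Placing an activation between layers lets the network learn complex patterns that go beyond simple linear mappings.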
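
The sketch below illustrates why the activation matters. Two stacked linear layers without an activation give the same result as a single layer whose weight matrix is the product of the two, whereas inserting ReLU between them changes the result. All helper names and values are hypothetical, and biases are omitted to keep the example short.

```python
def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def relu(z):
    """Apply max(0, z) to every entry."""
    return [max(0.0, v) for v in z]


# Illustrative weights for a 2 -> 2 -> 1 network.
W1 = [[1.0, -1.0],
      [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 1.0]

stacked = matvec(W2, matvec(W1, x))          # two linear layers, no activation
collapsed = matvec(matmul(W2, W1), x)        # one layer with weights W2 W1
with_relu = matvec(W2, relu(matvec(W1, x)))  # ReLU between the layers

print(stacked)    # [0.0]
print(collapsed)  # [0.0]
print(with_relu)  # [1.0]
```

With these particular weights, the ReLU version computes the absolute difference of the two inputs, a function that no single weight matrix can represent.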
