# Vectorized and Non-Vectorized Neural Nets

#### This notebook demonstrates the substantial performance difference between vectorized and non-vectorized implementations for a simple neural network's forward pass. By comparing the execution times, we highlight the efficiency gains achieved by leveraging NumPy's optimized matrix operations. This is a fundamental concept in deep learning for processing large datasets efficiently.

In [24]:
import numpy as np
import time

## Sigmoid Activation

#### We define the sigmoid activation function, which is a common choice in binary classification problems. It squashes the input values to a range between 0 and 1.

In [25]:

# Sigmoid activation
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
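
The formula above can overflow inside `np.exp(-z)` when `z` is very negative. A minimal sketch of one common workaround follows, using the identity sigmoid(z) = 0.5 * (1 + tanh(z / 2)). The name `stable_sigmoid` is hypothetical and not part of the notebook; the inputs used in this notebook are small enough that the plain version is fine.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid computed through tanh, which saturates instead of overflowing."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * z))

# Extreme inputs saturate cleanly at 0 and 1.
z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
probs = stable_sigmoid(z)
print(probs)
```

For moderate inputs the two versions agree to floating-point precision, so the swap does not change the forward pass.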

## Neural Network and Data Parameters

#### Next, we define the architecture of our neural network and the dimensions of our dataset:

* m: The number of training examples (1000).

* n_x: The number of input features (10).

* n_h: The number of units in the hidden layer (5).

* n_y: The number of output units (1).


In [26]:
# Parameters
np.random.seed(1)
m = 1000  # Number of examples
n_x = 10  # Input features
n_h = 5   # Hidden units
n_y = 1   # Output unit

## Data and Weights Initialization

* Input Data (X): We create a random dataset X with dimensions (n_x, m), where each column represents a training example.
* Weights and Biases (W1, b1, W2, b2): We initialize the weight matrices and bias vectors for the single hidden layer and the output layer. The weights are initialized with small random values to break symmetry, and biases are initialized to zero.


In [27]:
# Data
X = np.random.randn(n_x, m)
# Parameters
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
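
To illustrate why the weights are random rather than constant, the sketch below compares the hidden activations under two initializations. The names `X_demo`, `W1_const`, and `W1_rand`, the seed, and the batch size of 4 are assumptions made for this illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)          # assumption: any seed works here
n_x, n_h, m = 10, 5, 4                  # small batch, for illustration only
X_demo = rng.standard_normal((n_x, m))
b1 = np.zeros((n_h, 1))

# Constant initialization: every hidden unit has the same weights.
W1_const = np.full((n_h, n_x), 0.01)
A1_const = sigmoid(W1_const @ X_demo + b1)

# Small random initialization: every hidden unit starts out different.
W1_rand = rng.standard_normal((n_h, n_x)) * 0.01
A1_rand = sigmoid(W1_rand @ X_demo + b1)

# Compare each row (one hidden unit) against the first row.
rows_identical_const = np.allclose(A1_const, A1_const[0])
rows_identical_rand = np.allclose(A1_rand, A1_rand[0])
print(rows_identical_const, rows_identical_rand)
```

With constant weights all hidden units compute the same activation, so they would also receive the same gradient during training and never differentiate. Random values avoid that.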

## Non-Vectorized Implementation

#### In this approach, we use a for loop to iterate through each of the m training examples one by one. For each example x_i, we compute the hidden layer activation a1_i and the final output a2_i. The results for each example are stored in the A2_nonvec matrix.

In [28]:
start_nonvec = time.time()

A2_nonvec = np.zeros((n_y, m))  # Stores the final predictions, one column per example

for i in range(m):
    x_i = X[:, i].reshape(n_x, 1)              # (n_x, 1)
    
    z1_i = np.dot(W1, x_i) + b1                # (n_h, 1)
    a1_i = sigmoid(z1_i)                       # (n_h, 1)
    
    z2_i = np.dot(W2, a1_i) + b2               # (n_y, 1)
    a2_i = sigmoid(z2_i)                       # (n_y, 1)
    
    A2_nonvec[:, i] = a2_i.flatten()
end_nonvec = time.time()
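
The loop above still relies on `np.dot` for each example. As a rough sketch of what a fully scalar version of one layer might look like, the hypothetical helper `dense_forward_loops` below computes the same quantity with nested Python loops. It is an illustration, not part of the notebook's benchmark.

```python
import numpy as np

def dense_forward_loops(W, x, b):
    """Compute sigmoid(W @ x + b) for one example using explicit loops."""
    n_out, n_in = W.shape
    a = np.zeros((n_out, 1))
    for j in range(n_out):              # one output unit at a time
        z = b[j, 0]
        for k in range(n_in):           # accumulate the weighted sum
            z += W[j, k] * x[k, 0]
        a[j, 0] = 1 / (1 + np.exp(-z))
    return a

rng = np.random.default_rng(0)          # assumption: illustrative seed
W = rng.standard_normal((5, 10)) * 0.01
b = np.zeros((5, 1))
x = rng.standard_normal((10, 1))

a_loops = dense_forward_loops(W, x, b)
a_dot = 1 / (1 + np.exp(-(np.dot(W, x) + b)))
print(np.allclose(a_loops, a_dot))
```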

## Vectorized Implementation

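#### Here, we eliminate the explicit for loop. Instead of processing one example at a time, we feed the entire dataset X (an (n_x, m) matrix) into the network at once. Each np.dot call computes the pre-activations for all m examples in a single matrix multiplication, and broadcasting adds the bias column (b1 or b2) to every column of the result. The work runs in NumPy's compiled routines rather than in the Python interpreter, which is where the speedup comes from.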

In [29]:
start_vec = time.time()

Z1 = np.dot(W1, X) + b1                        # (n_h, m)
A1 = sigmoid(Z1)                               # (n_h, m)

Z2 = np.dot(W2, A1) + b2                       # (n_y, m)
A2 = sigmoid(Z2)                               # (n_y, m)

end_vec = time.time()
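
The bias addition in the vectorized cell relies on NumPy broadcasting: a column of shape (n_h, 1) is added to a matrix of shape (n_h, m). The small sketch below, with made-up values and the illustrative names `WX`, `Z_broadcast`, and `Z_explicit`, shows that this matches copying the bias column m times by hand.

```python
import numpy as np

n_h, m = 5, 3
WX = np.arange(n_h * m, dtype=float).reshape(n_h, m)   # stands in for np.dot(W1, X)
b1 = np.arange(n_h, dtype=float).reshape(n_h, 1)       # one bias per hidden unit

Z_broadcast = WX + b1                     # (n_h, m) + (n_h, 1) -> (n_h, m)
Z_explicit = WX + np.tile(b1, (1, m))     # same result with an explicit copy

print(Z_broadcast.shape)
print(np.array_equal(Z_broadcast, Z_explicit))
```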

### Performance Time
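#### We measure and print the time taken by both implementations. Exact timings vary from run to run and from machine to machine, but the vectorized implementation is typically far faster than the non-vectorized one. One run produced:
* Non-Vectorized Time: 0.019034 seconds
* Vectorized Time: 0.001000 seconds
#### A vectorized time of 0.000000 seconds means the elapsed time was too short for time.time() to resolve on that system, not that the computation took no time.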

In [30]:
print(f"Non-Vectorized Time: {end_nonvec - start_nonvec:.6f} seconds")
print(f"Vectorized Time    : {end_vec - start_vec:.6f} seconds")

Non-Vectorized Time: 0.013117 seconds
Vectorized Time    : 0.000000 seconds
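
A single `time.time()` measurement can be too coarse for a computation this small. One possible refinement is sketched below: it uses `time.perf_counter()` and keeps the best of several repeats. The helpers `forward_loop`, `forward_vectorized`, and `best_time` are hypothetical wrappers around the cells above, and the repeat count of 5 is an arbitrary choice.

```python
import time
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_loop(X, W1, b1, W2, b2):
    """Forward pass one example at a time, as in the non-vectorized cell."""
    m = X.shape[1]
    A2 = np.zeros((W2.shape[0], m))
    for i in range(m):
        x_i = X[:, i].reshape(-1, 1)
        a1_i = sigmoid(np.dot(W1, x_i) + b1)
        A2[:, i] = sigmoid(np.dot(W2, a1_i) + b2).flatten()
    return A2

def forward_vectorized(X, W1, b1, W2, b2):
    """Forward pass over all examples at once, as in the vectorized cell."""
    A1 = sigmoid(np.dot(W1, X) + b1)
    return sigmoid(np.dot(W2, A1) + b2)

def best_time(fn, *args, repeats=5):
    """Return the shortest wall-clock time over several calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

np.random.seed(1)
m, n_x, n_h, n_y = 1000, 10, 5, 1
X = np.random.randn(n_x, m)
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))

t_loop = best_time(forward_loop, X, W1, b1, W2, b2)
t_vec = best_time(forward_vectorized, X, W1, b1, W2, b2)
print(f"Non-Vectorized Time: {t_loop:.6f} seconds")
print(f"Vectorized Time    : {t_vec:.6f} seconds")
```

Taking the minimum over repeats reduces the influence of one-off delays such as other processes competing for the CPU.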


### Correctness Check
#### To ensure that both methods produce the same result, we calculate the Frobenius norm of the difference between the output matrices A2 (from the vectorized approach) and A2_nonvec (from the non-vectorized approach).
* Difference = 0.0000000000
#### The difference is zero to ten decimal places, confirming that both implementations compute the same predictions up to floating-point rounding.

In [31]:
diff = np.linalg.norm(A2 - A2_nonvec)
print(f"Difference between vectorized and non-vectorized outputs: {diff:.10f}")

Difference between vectorized and non-vectorized outputs: 0.0000000000
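
A norm that prints as zero does not guarantee bit-for-bit equality, because floating-point results can differ in the last digits. A tolerance-based comparison such as `np.allclose` is a common alternative. The toy arrays `a` and `b` below are made up to show the distinction.

```python
import numpy as np

a = np.array([[0.1 + 0.2, 0.5]])   # 0.1 + 0.2 is not exactly 0.3 in floating point
b = np.array([[0.3, 0.5]])

exactly_equal = np.array_equal(a, b)    # strict element-wise equality
close_enough = np.allclose(a, b)        # equality within a small tolerance
max_abs_diff = np.max(np.abs(a - b))

print(exactly_equal, close_enough, max_abs_diff)
```

Applied to this notebook, the analogous check would be `np.allclose(A2, A2_nonvec)`.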


## Prediction Check
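#### Finally, we print the first 10 predictions from both methods side by side. As expected from the correctness check, the predicted values for each example are identical to four decimal places. All values sit close to 0.5 because the weights are scaled by 0.01, which keeps the pre-activations near zero, and sigmoid(0) = 0.5.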

In [32]:
# Round outputs to 4 decimal places for neat display
A2_rounded = np.round(A2, 4)
A2_nonvec_rounded = np.round(A2_nonvec, 4)

# Print first few predictions from each method
print("\nFirst 10 predictions (Vectorized vs Non-Vectorized):")
for i in range(10):  # You can increase this if you want
    v = A2_rounded[0, i]
    nv = A2_nonvec_rounded[0, i]
    print(f"Example {i+1:2d}: Vectorized = {v:.4f}, Non-Vectorized = {nv:.4f}")



First 10 predictions (Vectorized vs Non-Vectorized):
Example  1: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  2: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  3: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  4: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  5: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  6: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  7: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example  8: Vectorized = 0.4964, Non-Vectorized = 0.4964
Example  9: Vectorized = 0.4965, Non-Vectorized = 0.4965
Example 10: Vectorized = 0.4964, Non-Vectorized = 0.4964
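
The predictions above cluster tightly around 0.5. The sketch below suggests this comes from the 0.01 weight scale: it runs the same forward pass with two scales and compares how much the outputs spread. The helper `forward` and the comparison scale of 1.0 are assumptions added for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, scale, rng):
    """Vectorized forward pass with weights drawn at the given scale."""
    n_x = X.shape[0]
    n_h, n_y = 5, 1
    W1 = rng.standard_normal((n_h, n_x)) * scale
    W2 = rng.standard_normal((n_y, n_h)) * scale
    A1 = sigmoid(np.dot(W1, X))          # biases are zero, so they are omitted
    return sigmoid(np.dot(W2, A1))

rng = np.random.default_rng(1)           # assumption: illustrative seed
X = rng.standard_normal((10, 1000))

A2_small = forward(X, 0.01, rng)         # the notebook's weight scale
A2_large = forward(X, 1.0, rng)          # unscaled weights, for comparison

print(f"std with scale 0.01: {A2_small.std():.6f}")
print(f"std with scale 1.0 : {A2_large.std():.6f}")
```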
