# Audio Classification with One Hidden Layer

Welcome to your neural network programming assignment! It's time to build your first neural network with a hidden layer. You'll see a big difference between this model and logistic regression.

By the end of this assignment, you'll be able to:

- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross-entropy loss
- Implement forward and backward propagation
- Apply this to audio classification (male vs female voices)

## Table of Contents
- [1 - Packages](#1)
- [2 - Load the Dataset](#2)
    - [Exercise 1](#ex-1)
- [3 - Neural Network Model](#3)
    - [3.1 - Defining the Neural Network Structure](#3-1)
        - [Exercise 2 - layer_sizes](#ex-2)
    - [3.2 - Initialize the Model's Parameters](#3-2)
        - [Exercise 3 - initialize_parameters](#ex-3)
    - [3.3 - The Loop](#3-3)
        - [Exercise 4 - forward_propagation](#ex-4)
    - [3.4 - Compute the Cost](#3-4)
        - [Exercise 5 - compute_cost](#ex-5)
    - [3.5 - Implement Backpropagation](#3-5)
        - [Exercise 6 - backward_propagation](#ex-6)
    - [3.6 - Update Parameters](#3-6)
        - [Exercise 7 - update_parameters](#ex-7)
    - [3.7 - Integration](#3-7)
        - [Exercise 8 - nn_model](#ex-8)
- [4 - Test the Model](#4)
    - [4.1 - Predict](#4-1)
        - [Exercise 9 - predict](#ex-9)
    - [4.2 - Test the Model on the Audio Dataset](#4-2)
- [5 - Tuning Hidden Layer Size](#5)

<a name='1'></a>
## 1 - Packages

First import all the packages that you will need during this assignment.

- **numpy** is the fundamental package for scientific computing with Python.
- **matplotlib** is a library for plotting graphs in Python.
- **librosa** is a library for audio analysis.
- **IPython.display.Audio** allows playing audio in the notebook.
- **testCases_v2** provides test examples to assess the correctness of your functions.
- **public_tests** provides test functions for grading.
- **audio_utils** provides audio loading and preprocessing functions.

In [None]:
# Package imports
import numpy as np
import copy
import matplotlib.pyplot as plt
import librosa
import librosa.display
from IPython.display import Audio
from testCases_v2 import *
from public_tests import *
from audio_utils import load_dataset, sigmoid

%matplotlib inline
%load_ext autoreload
%autoreload 2

<a name='2'></a>
## 2 - Load the Dataset

Let's load the audio dataset for male vs female voice classification. The dataset contains mel-spectrograms extracted from 3-second audio clips.

In [None]:
# Load the dataset
train_x, train_y, test_x, test_y, classes = load_dataset()

print(f"Classes: {classes}")
print(f"Training set: {train_x.shape[1]} examples")
print(f"Test set: {test_x.shape[1]} examples")

The dataset consists of:
- **train_x**: Training features (mel-spectrograms) of shape (n_features, m_train)
- **train_y**: Training labels of shape (1, m_train) where 0 = female, 1 = male
- **test_x**: Test features of shape (n_features, m_test)
- **test_y**: Test labels of shape (1, m_test)
- **classes**: List of class names ['female', 'male']
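
To make the `(n_features, m)` layout concrete, here is a minimal sketch on random data. It assumes each spectrogram is flattened row by row into one column, which matches how the visualization code reshapes columns back with `reshape(n_mels, -1)`; the clip count, frame count, and labels are made up for illustration and are not taken from `load_dataset()`.

```python
import numpy as np

# Hypothetical toy batch: m clips, each an (n_mels, n_frames) spectrogram.
# Only n_mels = 128 comes from the notebook; the other sizes are illustrative.
n_mels, n_frames, m = 128, 10, 5
rng = np.random.default_rng(0)
specs = rng.random((m, n_mels, n_frames))

# Flatten each spectrogram into one column -> shape (n_features, m)
X = specs.reshape(m, -1).T
Y = np.array([[0, 1, 1, 0, 1]])  # shape (1, m); 0 = female, 1 = male

print(X.shape)  # (n_features, m)
print(Y.shape)  # (1, m)

# Column i reshapes back to the i-th spectrogram
restored = X[:, 2].reshape(n_mels, -1)
print(np.array_equal(restored, specs[2]))
```

Each column of `X` is one example, so the number of examples is always the second dimension.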

Let's visualize some examples from our dataset.

In [None]:
# Visualize a few examples
fig, axes = plt.subplots(2, 4, figsize=(15, 6))

for i, ax in enumerate(axes.flat):
    if i < train_x.shape[1]:
        # Reshape back to 2D spectrogram (assuming 128 mel bands)
        n_mels = 128
        spec = train_x[:, i].reshape(n_mels, -1)
        
        # Plot spectrogram
        img = librosa.display.specshow(spec, ax=ax, x_axis='time', y_axis='mel')
        label = 'Male' if train_y[0, i] == 1 else 'Female'
        ax.set_title(f'Sample {i}: {label}')
    else:
        ax.axis('off')

plt.tight_layout()
plt.show()

<a name='ex-1'></a>
### Exercise 1 - Dataset Exploration

How many training examples do you have? What is the `shape` of the variables `train_x` and `train_y`?

**Hint**: How do you get the shape of a numpy array? [(help)](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html)

In [None]:
# (≈ 3 lines of code)
# shape_X = ...
# shape_Y = ...
# m = ...
# YOUR CODE STARTS HERE
# YOUR CODE ENDS HERE

print('The shape of train_x is: ' + str(shape_X))
print('The shape of train_y is: ' + str(shape_Y))
print('I have m = %d training examples!' % (m))

<a name='3'></a>
## 3 - Neural Network Model

Now you're going to build a neural network with a single hidden layer for audio classification.

**The Model Architecture**:

<font color='cyan'>
Input → Hidden Layer (with tanh activation) → Output Layer (with sigmoid activation)
</font>

**Mathematically**:

For one example $x^{(i)}$:
$$z^{[1] (i)} =  W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$
$$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$
$$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$
$$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}$$
$$y^{(i)}_{prediction} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}\tag{5}$$

The cost function $J$:
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right) \tag{6}$$
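
Equations (1) to (6) can be traced by hand for a single example. The sketch below uses a hypothetical network with 2 inputs and 3 hidden units and hand-picked weights; the local `sigmoid` is a stand-in for the helper imported from `audio_utils`, whose exact implementation is not shown in this notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; assumed equivalent to the audio_utils helper."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny network: 2 inputs, 3 hidden units, 1 output
W1 = np.array([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]])  # shape (3, 2)
b1 = np.zeros((3, 1))
W2 = np.array([[0.3, -0.1, 0.2]])                      # shape (1, 3)
b2 = np.zeros((1, 1))

x = np.array([[1.0], [2.0]])  # one example as a column vector
y = 1                         # its label

z1 = W1 @ x + b1                   # equation (1)
a1 = np.tanh(z1)                   # equation (2)
z2 = W2 @ a1 + b2                  # equation (3)
a2 = sigmoid(z2)                   # equation (4)
prediction = int(a2.item() > 0.5)  # equation (5)

# This example's term inside the sum of equation (6)
loss = -(y * np.log(a2) + (1 - y) * np.log(1 - a2)).item()
print(a2.item(), prediction, loss)
```

The cost $J$ is the average of this per-example loss over all $m$ examples.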

**General Methodology**:
1. Define the neural network structure (# of input units, # of hidden units, etc.)
2. Initialize the model's parameters
3. Loop:
    - Implement forward propagation
    - Compute loss
    - Implement backward propagation to get the gradients
    - Update parameters (gradient descent)

<a name='3-1'></a>
### 3.1 - Defining the Neural Network Structure

<a name='ex-2'></a>
### Exercise 2 - layer_sizes

Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer

**Hint**: Use shapes of X and Y to find n_x and n_y. Hard code the hidden layer size to be 4.

In [None]:
# GRADED FUNCTION: layer_sizes

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    #(≈ 3 lines of code)
    # n_x = ... 
    # n_h = ...
    # n_y = ... 
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    return (n_x, n_h, n_y)

In [None]:
t_X, t_Y = layer_sizes_test_case()
(n_x, n_h, n_y) = layer_sizes(t_X, t_Y)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))

layer_sizes_test(layer_sizes)

**Expected Output**:
```
The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 1
All tests passed!
```

<a name='3-2'></a>
### 3.2 - Initialize the Model's Parameters

<a name='ex-3'></a>
### Exercise 3 - initialize_parameters

Implement the function `initialize_parameters()`.

**Instructions**:
- Initialize the weight matrices with random values using `np.random.randn(a,b) * 0.01`
- Initialize the bias vectors as zeros using `np.zeros((a,b))`
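
As a rough illustration of these two instructions, the sketch below initializes a network with hypothetical layer sizes and an arbitrary seed, then inspects the shapes. It is not the graded solution's test case, just a way to see what small random weights look like.

```python
import numpy as np

np.random.seed(0)        # arbitrary seed, only to make the sketch repeatable
n_x, n_h, n_y = 6, 4, 1  # hypothetical layer sizes

W1 = np.random.randn(n_h, n_x) * 0.01  # small random weights break symmetry
b1 = np.zeros((n_h, 1))                # biases can safely start at zero
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))

print(W1.shape, b1.shape, W2.shape, b2.shape)

# With small weights, tanh starts out in its near-linear region,
# where its gradient is close to 1
x = np.ones((n_x, 1))
hidden = np.tanh(W1 @ x + b1)
print(np.abs(hidden).max())
```

If all rows of `W1` started out identical (for example all zeros), every hidden unit would compute the same value and receive the same gradient, so the units would never differentiate. Random initialization avoids that.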

In [None]:
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """    
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [None]:
np.random.seed(2)
n_x, n_h, n_y = initialize_parameters_test_case()
parameters = initialize_parameters(n_x, n_h, n_y)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

initialize_parameters_test(initialize_parameters)

**Expected Output**:
```
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]
All tests passed!
```

<a name='3-3'></a>
### 3.3 - The Loop

<a name='ex-4'></a>
### Exercise 4 - forward_propagation

Implement `forward_propagation()` using these equations:

$$Z^{[1]} =  W^{[1]} X + b^{[1]}\tag{1}$$
$$A^{[1]} = \tanh(Z^{[1]})\tag{2}$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}\tag{3}$$
$$\hat{Y} = A^{[2]} = \sigma(Z^{[2]})\tag{4}$$

**Instructions**:
- Use the `sigmoid()` function (already imported from audio_utils)
- Use `np.tanh()` for the tanh activation
- Store intermediate values in a cache dictionary for backpropagation
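
The vectorized equations process all $m$ examples at once, with the bias columns broadcast across the examples. The sketch below, on random toy data with hypothetical sizes, checks that the vectorized pass gives the same result as looping over one example at a time. The local `sigmoid` is assumed to behave like the one from `audio_utils`.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; assumed equivalent to the audio_utils helper."""
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
n_x, n_h, m = 3, 4, 5  # hypothetical sizes
X = np.random.randn(n_x, m)
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01
b2 = np.zeros((1, 1))

# Vectorized pass: b1 and b2 broadcast across the m columns
Z1 = np.dot(W1, X) + b1
A1 = np.tanh(Z1)
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)
cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

# Same computation, one example (column) at a time
A2_loop = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i + 1]  # keep the column shape (n_x, 1)
    a1_i = np.tanh(W1 @ x_i + b1)
    A2_loop[:, i:i + 1] = sigmoid(W2 @ a1_i + b2)

print(A2.shape)
print(np.allclose(A2, A2_loop))
```

Caching `A1` and `A2` matters because backpropagation reuses them instead of recomputing the forward pass.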

In [None]:
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    # (≈ 4 lines of code)
    # Z1 = ...
    # A1 = ...
    # Z2 = ...
    # A2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    assert(A2.shape == (1, X.shape[1]))
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

In [None]:
t_X, parameters = forward_propagation_test_case()
A2, cache = forward_propagation(t_X, parameters)
print("A2 = " + str(A2))

forward_propagation_test(forward_propagation)

**Expected Output**:
```
A2 = [[0.26541412 0.42229234 0.55544138]]
All tests passed!
```

<a name='3-4'></a>
### 3.4 - Compute the Cost

<a name='ex-5'></a>
### Exercise 5 - compute_cost

Implement `compute_cost()` to compute the value of the cost $J$.

$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large{(} \small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large{)} \small\tag{6}$$

**Instructions**:
- Use `np.multiply()` and `np.sum()` or `np.dot()` to compute the cost
- Use `np.squeeze()` to ensure the cost is a scalar (single number)
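
The two routes mentioned above (`np.multiply` with `np.sum`, or `np.dot`) give the same number. Here is a small sketch on made-up activations and labels that computes the cost both ways.

```python
import numpy as np

# Made-up activations and labels for four examples
A2 = np.array([[0.9, 0.2, 0.6, 0.4]])
Y = np.array([[1, 0, 1, 0]])
m = Y.shape[1]

# Route 1: element-wise products, then a sum
logprobs = np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2))
cost_sum = -np.sum(logprobs) / m

# Route 2: dot products, which return a (1, 1) array
cost_dot = -(np.dot(Y, np.log(A2).T) + np.dot(1 - Y, np.log(1 - A2).T)) / m
cost_dot = float(np.squeeze(cost_dot))  # turn [[value]] into a plain float

print(cost_sum, cost_dot)
```

The dot-product route is where `np.squeeze()` earns its place: without it the cost would stay a nested `(1, 1)` array.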

In [None]:
# GRADED FUNCTION: compute_cost

def compute_cost(A2, Y):
    """
    Computes the cross-entropy cost
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    # (≈ 2 lines of code)
    # logprobs = ...
    # cost = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    cost = float(np.squeeze(cost))
    
    return cost

In [None]:
A2, t_Y = compute_cost_test_case()
cost = compute_cost(A2, t_Y)
print("cost = " + str(cost))

compute_cost_test(compute_cost)

**Expected Output**:
```
cost = 0.6930587610394646
All tests passed!
```

<a name='3-5'></a>
### 3.5 - Implement Backpropagation

<a name='ex-6'></a>
### Exercise 6 - backward_propagation

Implement the function `backward_propagation()`.

**Instructions**:

Backpropagation computes the gradients of the cost with respect to parameters. Use these formulas:

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * \left(1 - \left(A^{[1]}\right)^{2}\right)$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} X^{T}$$
$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$

**Note**: $*$ denotes element-wise multiplication, and the square of $A^{[1]}$ is also taken element-wise. For tanh: $g'(z) = 1 - a^2$, where $a = \tanh(z)$.
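
One way to gain confidence in the six formulas above is a numerical gradient check. The sketch below is not part of the graded code: it uses hypothetical helper names (`forward`, `cost`, `backward`), random toy data, and a locally defined `sigmoid`, and compares the analytic gradients with central finite differences of the cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])
    return A1, A2

def cost(X, Y, p):
    _, A2 = forward(X, p)
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

def backward(X, Y, p):
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)  # element-wise product
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

np.random.seed(0)
X = np.random.randn(3, 6)
Y = (np.random.rand(1, 6) > 0.5).astype(float)
p = {"W1": np.random.randn(4, 3) * 0.5, "b1": np.zeros((4, 1)),
     "W2": np.random.randn(1, 4) * 0.5, "b2": np.zeros((1, 1))}

grads = backward(X, Y, p)

# Central finite differences, one parameter entry at a time
eps = 1e-6
max_err = 0.0
for name in p:
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        c_plus = cost(X, Y, p)
        p[name][idx] = old - eps
        c_minus = cost(X, Y, p)
        p[name][idx] = old  # restore the original value
        numeric = (c_plus - c_minus) / (2 * eps)
        max_err = max(max_err, abs(numeric - grads[name][idx]))

print(max_err)  # should be tiny if the formulas are implemented correctly
```

`keepdims=True` keeps `db1` and `db2` as column vectors, so they have the same shapes as `b1` and `b2`.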

In [None]:
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # Retrieve W1 and W2 from dictionary "parameters"
    #(≈ 2 lines of code)
    # W1 = ...
    # W2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
        
    # Retrieve A1 and A2 from dictionary "cache"
    #(≈ 2 lines of code)
    # A1 = ...
    # A2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    # Backward propagation: calculate dW1, db1, dW2, db2
    #(≈ 6 lines of code)
    # dZ2 = ...
    # dW2 = ...
    # db2 = ...
    # dZ1 = ...
    # dW1 = ...
    # db1 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [None]:
parameters, cache, t_X, t_Y = backward_propagation_test_case()

grads = backward_propagation(parameters, cache, t_X, t_Y)
print("dW1 = "+ str(grads["dW1"]))
print("db1 = "+ str(grads["db1"]))
print("dW2 = "+ str(grads["dW2"]))
print("db2 = "+ str(grads["db2"]))

backward_propagation_test(backward_propagation)

**Expected Output**:
```
All tests passed!
```

<a name='3-6'></a>
### 3.6 - Update Parameters

<a name='ex-7'></a>
### Exercise 7 - update_parameters

Implement the update rule using gradient descent:

$$\theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$$

where $\alpha$ is the learning rate and $\theta$ represents a parameter.

**Hint**: Use `copy.deepcopy()` for W1 and W2 to avoid modifying the original parameters.
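
To see why the copy matters, consider the toy sketch below. The parameter and gradient values are made up, and only one weight matrix and one bias are shown for brevity. Because NumPy's `-=` updates an array in place, updating without a copy would silently change the caller's dictionary.

```python
import copy
import numpy as np

# Made-up parameters and gradients for illustration
parameters = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[0.05]])}
learning_rate = 1.2

# Work on copies so the caller's arrays stay untouched
W1 = copy.deepcopy(parameters["W1"])
b1 = copy.deepcopy(parameters["b1"])

# Gradient descent step: theta = theta - alpha * d(theta)
W1 -= learning_rate * grads["dW1"]  # in-place, but on the copy
b1 -= learning_rate * grads["db1"]

updated = {"W1": W1, "b1": b1}
print(updated["W1"])     # the stepped values
print(parameters["W1"])  # still the original values
```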

In [None]:
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate=1.2):
    """
    Updates parameters using gradient descent
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    learning_rate -- learning rate for gradient descent
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve a copy of each parameter from dictionary "parameters"
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    # Retrieve each gradient from dictionary "grads"
    #(≈ 4 lines of code)
    # dW1 = ...
    # db1 = ...
    # dW2 = ...
    # db2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    # Update rule for each parameter
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [None]:
parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

update_parameters_test(update_parameters)

**Expected Output**:
```
All tests passed!
```

<a name='3-7'></a>
### 3.7 - Integration

<a name='ex-8'></a>
### Exercise 8 - nn_model

Build your neural network model by integrating all previous functions.
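
Before wiring up your own functions, it may help to see the whole loop in one place. The sketch below is a hypothetical stand-in (`train_toy`, not the graded `nn_model`) that inlines forward propagation, cost, backpropagation, and the update, and runs on a made-up, linearly separable dataset. The seed, learning rate, and iteration count are arbitrary choices for this toy problem.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_toy(X, Y, n_h=4, num_iterations=1000, learning_rate=0.3, seed=3):
    """Hypothetical inline version of the training loop for toy data."""
    rng = np.random.default_rng(seed)
    n_x, m = X.shape
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((1, n_h)) * 0.01
    b2 = np.zeros((1, 1))
    costs = []
    for _ in range(num_iterations):
        # Forward propagation
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Cross-entropy cost
        cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
        costs.append(float(cost))
        # Backward propagation
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Gradient descent update
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, costs

# Toy data: the label is 1 when the first feature is positive
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 40))
Y = (X[0:1, :] > 0).astype(float)

params, costs = train_toy(X, Y)
print(costs[0], costs[-1])  # the cost should drop over training
```

Your `nn_model` should follow the same order of steps, but call the functions you implemented in the previous exercises instead of inlining them.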

In [None]:
# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations=10000, learning_rate=0.01, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    learning_rate -- learning rate for gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    
    Returns:
    parameters -- parameters learnt by the model
    """
    
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    
    # Initialize parameters
    #(≈ 1 line of code)
    # parameters = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        #(≈ 4 lines of code)
        # A2, cache = ...
        # cost = ...
        # grads = ...
        # parameters = ...
        # YOUR CODE STARTS HERE
        # YOUR CODE ENDS HERE
        
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" %(i, cost))

    return parameters

In [None]:
nn_model_test(nn_model)

**Expected Output**:
```
Cost after iteration 0: ...
Cost after iteration 1000: ...
...
All tests passed!
```

<a name='4'></a>
## 4 - Test the Model

<a name='4-1'></a>
### 4.1 - Predict

<a name='ex-9'></a>
### Exercise 9 - predict

Use forward propagation to predict results.

$$\text{predictions} = \mathbb{1}_{\{\text{activation} > 0.5\}}$$
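
In NumPy, the indicator function above is just a comparison. A minimal sketch on made-up activations:

```python
import numpy as np

# Made-up output activations for four examples
A2 = np.array([[0.91, 0.12, 0.5, 0.73]])

predictions = A2 > 0.5  # boolean array, same shape as A2
print(predictions)
print(predictions.astype(int))  # 0/1 labels if integers are preferred
```

Note that the inequality is strict, so an activation of exactly 0.5 maps to class 0.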

In [None]:
# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns:
    predictions -- vector of predictions (0 for female / 1 for male)
    """
    
    # Computes probabilities using forward propagation
    #(≈ 2 lines of code)
    # A2, cache = ...
    # predictions = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    
    return predictions

In [None]:
parameters, t_X = predict_test_case()

predictions = predict(parameters, t_X)
print("Predictions: " + str(predictions))

predict_test(predict)

**Expected Output**:
```
Predictions: [[ True False  True]]
All tests passed!
```

<a name='4-2'></a>
### 4.2 - Test the Model on the Audio Dataset

Time to train the model on real audio data!

In [None]:
# Build a model with a 4-unit hidden layer
print("Training neural network with 4 hidden units...")
parameters = nn_model(train_x, train_y, n_h=4, num_iterations=3000, learning_rate=0.01, print_cost=True)

# Make predictions
predictions_train = predict(parameters, train_x)
predictions_test = predict(parameters, test_x)

# Calculate accuracy
train_accuracy = ((np.dot(train_y, predictions_train.T) + np.dot(1 - train_y, 1 - predictions_train.T)) / float(train_y.size) * 100).item()
test_accuracy = ((np.dot(test_y, predictions_test.T) + np.dot(1 - test_y, 1 - predictions_test.T)) / float(test_y.size) * 100).item()

print(f'\nTraining Accuracy: {train_accuracy:.2f}%')
print(f'Test Accuracy: {test_accuracy:.2f}%')
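
The accuracy expression in the cell above counts the true positives (first dot product) and the true negatives (second dot product), then divides by the number of examples. As a sanity check on made-up labels and predictions, the sketch below shows that it matches the simpler "fraction of matching entries" form.

```python
import numpy as np

# Made-up labels and boolean predictions for five examples
y = np.array([[1, 0, 1, 1, 0]])
pred = np.array([[True, False, False, True, True]])

# Dot-product form used in the notebook
acc_dot = ((np.dot(y, pred.T) + np.dot(1 - y, 1 - pred.T)) / float(y.size) * 100).item()

# Equivalent form: percentage of entries where prediction equals label
acc_mean = float(np.mean(pred == y) * 100)

print(acc_dot, acc_mean)
```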

Great job! Your neural network can now classify male and female voices. Compare the training and test accuracy printed above to judge how well it generalizes.

The model can pick up on patterns in the audio spectrograms that tend to distinguish male from female voices, such as:
- **Pitch differences**: Female voices typically have higher fundamental frequencies
- **Formant patterns**: Different resonance characteristics
- **Spectral shape**: Overall energy distribution across frequencies

<a name='5'></a>
## 5 - Tuning Hidden Layer Size

Let's experiment with different hidden layer sizes to see how they affect performance.

In [None]:
# Test different hidden layer sizes
plt.figure(figsize=(12, 8))
hidden_layer_sizes = [1, 2, 3]

train_accuracies = []
test_accuracies = []

for n_h in hidden_layer_sizes:
    print(f"\nTraining with {n_h} hidden units...")
    parameters = nn_model(train_x, train_y, n_h, num_iterations=3000, print_cost=False)
    
    predictions_train = predict(parameters, train_x)
    predictions_test = predict(parameters, test_x)
    
    train_acc = ((np.dot(train_y, predictions_train.T) + np.dot(1 - train_y, 1 - predictions_train.T)) / float(train_y.size) * 100).item()
    test_acc = ((np.dot(test_y, predictions_test.T) + np.dot(1 - test_y, 1 - predictions_test.T)) / float(test_y.size) * 100).item()
    
    train_accuracies.append(train_acc)
    test_accuracies.append(test_acc)
    
    print(f"  Train accuracy: {train_acc:.2f}%")
    print(f"  Test accuracy: {test_acc:.2f}%")

# Plot results
plt.plot(hidden_layer_sizes, train_accuracies, 'o-', label='Train Accuracy', linewidth=2)
plt.plot(hidden_layer_sizes, test_accuracies, 's-', label='Test Accuracy', linewidth=2)
plt.xlabel('Hidden Layer Size', fontsize=12)
plt.ylabel('Accuracy (%)', fontsize=12)
plt.title('Model Performance vs Hidden Layer Size', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Observations** (general tendencies; the sweep above only tries 1 to 3 hidden units, so extend `hidden_layer_sizes` to check the larger ranges):

- **Small hidden layers** (1-2 units): May underfit, because there is not enough capacity to learn complex patterns
- **Medium hidden layers** (3-5 units): Often provide a good balance between fitting the training data and generalizing
- **Large hidden layers** (10+ units): May overfit the training data, especially with limited training examples

The optimal hidden layer size depends on:
- Complexity of the task
- Amount of training data
- Regularization techniques (covered in later modules)
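
One reason hidden layer size interacts with the amount of training data is that the number of trainable parameters grows linearly with `n_h`. The helper below (`count_parameters`, a hypothetical name) counts the entries of `W1`, `b1`, `W2`, and `b2`; the input size used in the loop is an assumed value for illustration, not the actual feature count of this dataset.

```python
def count_parameters(n_x, n_h, n_y=1):
    """Entries in W1 (n_h x n_x), b1 (n_h), W2 (n_y x n_h) and b2 (n_y)."""
    return n_h * n_x + n_h + n_y * n_h + n_y

n_x = 128 * 94  # hypothetical: 128 mel bands times an assumed 94 time frames
for n_h in [1, 2, 3, 10]:
    print(n_h, count_parameters(n_x, n_h))
```

With high-dimensional spectrogram inputs, almost all parameters sit in `W1`, so each extra hidden unit adds roughly `n_x` weights.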

## Congratulations!

You've successfully:
✅ Built a complete 2-class neural network with a hidden layer  
✅ Implemented forward and backward propagation from scratch  
✅ Applied it to real audio classification  
✅ Analyzed the impact of hidden layer size  

You've taken a major step in deep learning! This neural network architecture is the foundation for much larger and more complex networks.

**Next steps to explore**:
- Try different activation functions (ReLU, Leaky ReLU)
- Experiment with different learning rates
- Add more hidden layers (deep networks)
- Try other audio classification tasks