# Machine Learning Specialization - Advanced Learning Algorithms
## Week 1: Neural Networks Basics - Complete Solutions with Explanations

### Learning Objectives:
- Understand the basics of neural networks
- Build a simple neural network for binary classification
- Learn about activation functions and forward propagation
- Implement neural networks using TensorFlow

### Key Concepts:
- **Neural Networks**: Composed of layers of interconnected nodes (neurons)
- **Forward Propagation**: Computing the network's output by passing the input through each layer in turn
- **Activation Functions**: Non-linear functions that enable learning complex patterns
- **Hidden Layers**: Intermediate layers between input and output
- **TensorFlow**: A popular deep learning framework

Neural networks can automatically learn complex patterns and decision boundaries, making them more powerful than logistic regression for many tasks.
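
The role of the non-linear activation can be checked numerically. The sketch below uses plain NumPy with random placeholder weights (the shapes are chosen only for illustration). It shows that two stacked layers without an activation collapse into a single linear map, while inserting ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 1))
X = rng.normal(size=(5, 2))  # 5 example inputs

# Two linear layers with no activation in between...
two_linear_layers = (X @ W1) @ W2
# ...equal one linear layer whose weight matrix is W1 @ W2
one_linear_layer = X @ (W1 @ W2)
linear_collapses = np.allclose(two_linear_layers, one_linear_layer)

# With ReLU between the layers the network is no longer a single linear map
with_relu = np.maximum(0, X @ W1) @ W2
relu_differs = not np.allclose(with_relu, one_linear_layer)

print(f"Stacked linear layers collapse to one: {linear_collapses}")
print(f"ReLU output differs from the linear map: {relu_differs}")
```

Without the activation, adding layers would add parameters but no expressive power, which is why every hidden layer in this notebook uses a non-linear activation.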

### 1. Import Required Libraries

Let's import the necessary libraries for our neural network exercises.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_circles
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print("Libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")

### 2. Generate Non-Linear Classification Dataset

We'll create a dataset of two concentric circles. Separating the classes requires a non-linear decision boundary, which logistic regression on the raw features cannot represent.

In [None]:
# Generate non-linear binary classification data (circular pattern)
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=42)

print(f"Dataset shape: X = {X.shape}, y = {y.shape}")
print(f"Classes: {np.unique(y)}")
print(f"Class distribution: Class 0 = {np.sum(y == 0)}, Class 1 = {np.sum(y == 1)}")
print("First 5 samples:")
for i in range(5):
    print(f"Features: [{X[i, 0]:.2f}, {X[i, 1]:.2f}], Label: {y[i]}")

### 3. Visualize the Non-Linear Data

Let's plot our data to see why linear models won't work well here.

In [None]:
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='blue', alpha=0.7, label='Class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='red', alpha=0.7, label='Class 1')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Non-Linear Classification Dataset')
plt.legend()
plt.grid(True, alpha=0.3)
plt.axis('equal')

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

plt.subplot(1, 2, 2)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='bwr', alpha=0.7)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='bwr', alpha=0.7, edgecolors='black', linewidths=2)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Train (filled) vs Test (outlined)')
plt.grid(True, alpha=0.3)
plt.axis('equal')

plt.tight_layout()
plt.show()

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")

### 4. Neural Network Architecture

A neural network consists of:
- **Input Layer**: Receives the features
- **Hidden Layers**: Learn intermediate representations
- **Output Layer**: Produces the final prediction

For our binary classification task, we'll use:
- Input layer: 2 neurons (for 2 features)
- Hidden layer: 4 neurons with ReLU activation
- Output layer: 1 neuron with sigmoid activation

The network can be represented as:

**Forward Propagation:**
$$a^{[1]} = \mathrm{ReLU}(W^{[1]} x + b^{[1]})$$
$$\hat{y} = \sigma(W^{[2]} a^{[1]} + b^{[2]})$$
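
To make the notation concrete, the following sketch evaluates the forward pass for one input vector using the column-vector convention of the equations, so $W^{[1]}$ has shape 4×2 and $W^{[2]}$ has shape 1×4. The weight values are random placeholders rather than trained parameters, and the hidden activation is ReLU to match the architecture listed above.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder parameters for a 2 -> 4 -> 1 network (column-vector convention)
W1 = rng.normal(size=(4, 2))   # hidden layer: 4 neurons, 2 inputs each
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))   # output layer: 1 neuron, 4 inputs
b2 = np.zeros(1)

x = np.array([0.5, -0.3])      # one example with 2 features

a1 = np.maximum(0, W1 @ x + b1)   # hidden activations, shape (4,)
y_hat = sigmoid(W2 @ a1 + b2)     # predicted probability, shape (1,)

# Trainable parameters: (2*4 + 4) in the hidden layer + (4*1 + 1) in the output layer
n_params = W1.size + b1.size + W2.size + b2.size

print(f"a1 shape: {a1.shape}, y_hat shape: {y_hat.shape}")
print(f"Total trainable parameters: {n_params}")
```

The parameter count of 17 should match the total that `model.summary()` reports for the model built in the next section.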

### 5. Build Neural Network with TensorFlow

**Exercise 5.1: Build Your First Neural Network**

**Learning Objectives:**
- Learn how to use TensorFlow's Sequential API
- Understand the structure of neural networks (input → hidden → output layers)
- Implement appropriate activation functions for each layer
- Configure the model for binary classification

**Task:**
Complete the neural network model below by replacing the `None` values with appropriate Dense layers. Use:
- 4 neurons in the hidden layer with ReLU activation
- 1 neuron in the output layer with sigmoid activation
- Proper input_shape parameter

In [None]:
# Build the neural network model
model = Sequential([
    # YOUR CODE HERE - Add layers to the neural network
    # Hint: Use Dense layers with appropriate activation functions
    # Input layer -> Hidden layer -> Output layer
    
    # Input layer (implicitly created by first Dense layer)
    None,  # Replace with Dense layer for hidden layer
    
    # Output layer
    None   # Replace with Dense layer for output
], name='binary_classifier')

# Display model summary
model.summary()

# Compile the model
model.compile(
    optimizer='adam',  # Adaptive learning rate optimization
    loss='binary_crossentropy',  # Appropriate for binary classification
    metrics=['accuracy']  # Track accuracy during training
)

print("Model compiled successfully!")

**Solution**: Here's the complete neural network implementation:

In [None]:
# Build the neural network model
model = Sequential([
    # Input layer (implicitly created by first Dense layer)
    Dense(4, activation='relu', input_shape=(2,), name='hidden_layer'),  # Hidden layer with 4 neurons
    
    # Output layer with 1 neuron and sigmoid activation for binary classification
    Dense(1, activation='sigmoid', name='output_layer')
], name='binary_classifier')

# Display model summary
model.summary()

# Compile the model
model.compile(
    optimizer='adam',  # Adaptive learning rate optimization
    loss='binary_crossentropy',  # Appropriate for binary classification
    metrics=['accuracy']  # Track accuracy during training
)

print("Model compiled successfully!")

**Solution Explanation:**
- We created a neural network with one hidden layer containing 4 neurons using ReLU activation
- The output layer has 1 neuron with sigmoid activation for binary classification
- We used the Sequential API which stacks layers linearly
- The `input_shape=(2,)` parameter tells Keras the input has 2 features
- The Adam optimizer adapts the step size for each parameter using running averages of the gradient and its square
- Binary cross-entropy is the standard loss for binary classification
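
Binary cross-entropy can be computed by hand for a few predictions, which helps when reading the loss values Keras prints during training. This is a minimal NumPy sketch; the function name `binary_cross_entropy` and the clipping constant `eps` are illustrative choices, not the Keras implementation.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps clips probabilities to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

y_true = [1, 0, 1, 0]
confident_and_right = binary_cross_entropy(y_true, [0.9, 0.1, 0.9, 0.1])
uninformative = binary_cross_entropy(y_true, [0.5, 0.5, 0.5, 0.5])
confident_and_wrong = binary_cross_entropy(y_true, [0.1, 0.9, 0.1, 0.9])

print(f"Confident and right: {confident_and_right:.4f}")
print(f"Always 0.5:          {uninformative:.4f}")
print(f"Confident and wrong: {confident_and_wrong:.4f}")
```

A network that outputs roughly 0.5 for every sample has a loss near ln 2 ≈ 0.693, so a training loss that stays around that value suggests the model has not yet learned anything useful.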

### 6. Train the Neural Network

Now let's train our neural network on the training data.

In [None]:
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=100,  # Number of full passes over the training data
    batch_size=32,  # Number of samples per gradient update
    validation_split=0.2,  # Hold out the last 20% of the training data for validation
    verbose=1  # Show training progress
)

print("Training completed!")
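
The `epochs`, `batch_size`, and `validation_split` arguments together determine how many gradient updates the model receives. The arithmetic below is a sketch that assumes the 500-sample dataset and the 80/20 train/test split used above; Keras holds out the last fraction of the arrays passed to `fit` for validation.

```python
import math

n_samples = 500          # from make_circles above
test_size = 0.2          # from train_test_split above
validation_split = 0.2   # from model.fit
batch_size = 32
epochs = 100

n_train_total = n_samples - int(n_samples * test_size)          # samples passed to fit
n_fit = int(n_train_total * (1 - validation_split))             # samples used for gradient updates
n_val = n_train_total - n_fit                                   # samples held out for validation

steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(f"Samples used for updates: {n_fit}, held out for validation: {n_val}")
print(f"Gradient updates per epoch: {steps_per_epoch}")
print(f"Gradient updates in total: {total_updates}")
```

With `verbose=1`, the progress bar for each epoch should therefore count up to 10 steps.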

### 7. Evaluate Model Performance

Let's evaluate our trained neural network on the test set.

In [None]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# Make predictions
y_pred_proba = model.predict(X_test)
y_pred = (y_pred_proba > 0.5).astype(int).flatten()

# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
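
The numbers in the classification report come from counting the four outcomes of thresholded predictions. The sketch below reproduces accuracy, precision, and recall for class 1 in NumPy; the probabilities and labels are made-up values for illustration, not output from the model above.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels (illustration only)
y_prob = np.array([0.92, 0.15, 0.67, 0.48, 0.81, 0.30, 0.55, 0.05])
y_true = np.array([1, 0, 1, 1, 1, 0, 0, 0])

y_hat = (y_prob > 0.5).astype(int)   # same 0.5 threshold as above

tp = int(np.sum((y_hat == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_hat == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_hat == 0) & (y_true == 1)))  # false negatives
tn = int(np.sum((y_hat == 0) & (y_true == 0)))  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Accuracy={accuracy:.2f}, Precision={precision:.2f}, Recall={recall:.2f}")
```

Raising the threshold above 0.5 generally trades recall for precision, which is worth trying when one kind of error is more costly than the other.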

### 8. Visualize Training History

Let's plot the training and validation metrics over time.

In [None]:
# Plot training history
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model Loss Over Time')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Model Accuracy Over Time')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
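
When reading these curves, the epoch with the lowest validation loss is a useful reference point: a validation loss that rises afterwards while the training loss keeps falling suggests overfitting. A minimal sketch, using a made-up loss history in place of `history.history`:

```python
import numpy as np

# Hypothetical loss values (illustration only); in the notebook these would
# come from history.history['loss'] and history.history['val_loss']
train_loss = [0.69, 0.60, 0.48, 0.37, 0.29, 0.24, 0.20, 0.17]
val_loss = [0.70, 0.62, 0.51, 0.42, 0.38, 0.39, 0.41, 0.44]

best_epoch = int(np.argmin(val_loss))        # 0-based index of the lowest validation loss
gap_at_end = val_loss[-1] - train_loss[-1]   # gap between the curves at the last epoch

print(f"Lowest validation loss {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}")
print(f"Final gap between validation and training loss: {gap_at_end:.2f}")
```

In this made-up history the validation loss bottoms out at the fifth epoch, so training beyond that point would mostly fit noise in the training set.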

### 9. Visualize Decision Boundary

Let's visualize the complex decision boundary learned by our neural network.

In [None]:
def plot_decision_boundary_nn(X, y, model):
    """
    Plot the decision boundary for a neural network.
    
    Args:
        X: Feature matrix
        y: Target values
        model: Trained neural network model
    """
    # Create a mesh grid
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                        np.linspace(y_min, y_max, 100))
    
    # Make predictions on the grid
    grid_points = np.column_stack([xx.ravel(), yy.ravel()])
    Z = model.predict(grid_points, verbose=0)
    Z = Z.reshape(xx.shape)
    
    # Plot the decision boundary and data
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='RdYlBu_r', levels=np.linspace(0, 1, 11))
    plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)
    
    # Plot the data points
    plt.scatter(X[y == 0, 0], X[y == 0, 1], color='blue', alpha=0.8, label='Class 0', edgecolors='white')
    plt.scatter(X[y == 1, 0], X[y == 1, 1], color='red', alpha=0.8, label='Class 1', edgecolors='white')
    
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Neural Network Decision Boundary')
    plt.legend()
    plt.colorbar(label='Predicted Probability')
    plt.axis('equal')
    plt.grid(True, alpha=0.3)

# Plot decision boundary
plt.figure(figsize=(10, 8))
plot_decision_boundary_nn(X_test, y_test, model)
plt.show()
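
The grid construction inside `plot_decision_boundary_nn` is easier to follow on a tiny example. This sketch uses a 3×2 grid instead of 100×100 and a stand-in scoring function (`fake_predict`, a hypothetical placeholder for `model.predict`) to show how the shapes change at each step.

```python
import numpy as np

# A tiny grid: 3 x-values and 2 y-values
xx, yy = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 2))
# xx and yy both have shape (2, 3): rows follow y, columns follow x

# Flatten into a list of (x, y) points, one row per grid cell
grid_points = np.column_stack([xx.ravel(), yy.ravel()])   # shape (6, 2)

def fake_predict(points):
    """Hypothetical stand-in for model.predict: one score per point, shape (n, 1)."""
    return ((points[:, 0] + points[:, 1]) / 2).reshape(-1, 1)

# Reshape the (6, 1) scores back to the grid layout expected by contourf
Z = fake_predict(grid_points).reshape(xx.shape)

print(f"xx: {xx.shape}, grid_points: {grid_points.shape}, Z: {Z.shape}")
```

The reshape works because `ravel` and `reshape` use the same row-major order, so each score returns to the grid cell its point came from.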

### 10. Compare with Logistic Regression

Let's compare our neural network with a simple logistic regression model to see the improvement.

In [None]:
# Train a logistic regression model for comparison
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train, y_train)

# Evaluate logistic regression
log_reg_accuracy = log_reg.score(X_test, y_test)
print(f"Logistic Regression Test Accuracy: {log_reg_accuracy:.4f}")
print(f"Neural Network Test Accuracy: {test_accuracy:.4f}")
print(f"Relative improvement: {((test_accuracy - log_reg_accuracy) / log_reg_accuracy * 100):.2f}%")

# Plot comparison
plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)
# Logistic regression decision boundary
x_min, x_max = X_test[:, 0].min() - 0.5, X_test[:, 0].max() + 0.5
y_min, y_max = X_test[:, 1].min() - 0.5, X_test[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                    np.linspace(y_min, y_max, 100))
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
Z_log = log_reg.predict_proba(grid_points)[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, Z_log, alpha=0.4, cmap='RdYlBu_r')
plt.contour(xx, yy, Z_log, levels=[0.5], colors='black', linewidths=2)
plt.scatter(X_test[y_test == 0, 0], X_test[y_test == 0, 1], color='blue', alpha=0.8, label='Class 0')
plt.scatter(X_test[y_test == 1, 0], X_test[y_test == 1, 1], color='red', alpha=0.8, label='Class 1')
plt.title('Logistic Regression Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.axis('equal')
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
# Neural network decision boundary (already plotted above, but let's show it again)
plot_decision_boundary_nn(X_test, y_test, model)
plt.title('Neural Network Decision Boundary')

plt.tight_layout()
plt.show()

### 11. Experiment with Network Architecture

Let's experiment with different neural network architectures to see how they affect performance.

In [None]:
# Experiment with different architectures
architectures = [
    {'name': 'Small Network', 'hidden_units': [2], 'epochs': 100},
    {'name': 'Medium Network', 'hidden_units': [4], 'epochs': 100},
    {'name': 'Large Network', 'hidden_units': [8, 4], 'epochs': 100},
    {'name': 'Deep Network', 'hidden_units': [16, 8, 4], 'epochs': 150}
]

results = []

for arch in architectures:
    print(f"\nTraining {arch['name']}...")
    
    # Build model
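    # Replace spaces in the name; some TensorFlow versions reject names that are not valid scope names
    model_exp = Sequential(name=arch['name'].replace(' ', '_'))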
    
    # Add hidden layers
    for i, units in enumerate(arch['hidden_units']):
        if i == 0:
            model_exp.add(Dense(units, activation='relu', input_shape=(2,)))
        else:
            model_exp.add(Dense(units, activation='relu'))
    
    # Output layer
    model_exp.add(Dense(1, activation='sigmoid'))
    
    # Compile
    model_exp.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    # Train
    history_exp = model_exp.fit(
        X_train, y_train,
        epochs=arch['epochs'],
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    # Evaluate
    test_loss_exp, test_acc_exp = model_exp.evaluate(X_test, y_test, verbose=0)
    
    results.append({
        'name': arch['name'],
        'test_accuracy': test_acc_exp,
        'final_val_accuracy': history_exp.history['val_accuracy'][-1],
        'model': model_exp
    })
    
    print(f"{arch['name']}: Test Accuracy = {test_acc_exp:.4f}")

# Plot architecture comparison
plt.figure(figsize=(10, 6))
names = [r['name'] for r in results]
accuracies = [r['test_accuracy'] for r in results]

plt.bar(names, accuracies, color=['lightblue', 'blue', 'darkblue', 'navy'])
plt.ylabel('Test Accuracy')
plt.title('Model Performance vs Architecture Complexity')
plt.ylim(0.0, 1.05)  # Full range so that low-accuracy models remain visible
for i, acc in enumerate(accuracies):
    plt.text(i, acc + 0.005, f'{acc:.3f}', ha='center')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

### 12. Manual Neural Network Implementation

**Exercise 12.1: Implement Neural Network from Scratch**

**Learning Objectives:**
- Understand the mathematical foundations of neural networks
- Implement forward propagation manually
- Learn about parameter initialization
- Compare manual implementation with high-level frameworks

**Background:**
Neural networks perform matrix operations under the hood. Understanding these operations helps you:
- Debug complex models
- Optimize performance
- Choose appropriate architectures
- Implement custom layers and loss functions

**Task:**
Complete the manual neural network implementation below. You'll need to:
1. Initialize weights and biases with small random values
2. Implement forward propagation using matrix multiplication
3. Use ReLU activation for hidden layer and sigmoid for output

In [None]:
def sigmoid(z):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """ReLU activation function"""
    return np.maximum(0, z)

def initialize_parameters(input_size, hidden_size, output_size):
    """
    Initialize neural network parameters.
    
    Args:
        input_size: Number of input features
        hidden_size: Number of hidden units
        output_size: Number of output units
    
    Returns:
        parameters: Dictionary containing W1, b1, W2, b2
    """
    # YOUR CODE HERE - Initialize weights and biases
    # Hint: Use random initialization with small values
    W1 = None  # Replace with weight matrix for first layer
    b1 = None  # Replace with bias vector for first layer
    W2 = None  # Replace with weight matrix for second layer
    b2 = None  # Replace with bias vector for second layer
    
    parameters = {
        'W1': W1,
        'b1': b1,
        'W2': W2,
        'b2': b2
    }
    
    return parameters

def forward_propagation(X, parameters):
    """
    Perform forward propagation.
    
    Args:
        X: Input features (m x input_size)
        parameters: Network parameters
    
    Returns:
        A2: Output predictions
        cache: Intermediate values for backpropagation
    """
    # YOUR CODE HERE - Implement forward propagation
    # Hint: Z1 = X @ W1 + b1, A1 = relu(Z1), Z2 = A1 @ W2 + b2, A2 = sigmoid(Z2)
    
    W1, b1, W2, b2 = parameters['W1'], parameters['b1'], parameters['W2'], parameters['b2']
    
    Z1 = None  # Replace with linear combination for first layer
    A1 = None  # Replace with activation for first layer
    Z2 = None  # Replace with linear combination for second layer
    A2 = None  # Replace with activation for second layer
    
    cache = {
        'Z1': Z1,
        'A1': A1,
        'Z2': Z2,
        'A2': A2
    }
    
    return A2, cache

# Test the manual implementation
parameters = initialize_parameters(input_size=2, hidden_size=4, output_size=1)
predictions_manual, _ = forward_propagation(X_test[:5], parameters)
print(f"Manual neural network predictions (first 5 samples): {predictions_manual.flatten()}")

# Compare with TensorFlow model
tf_predictions = model.predict(X_test[:5], verbose=0)
print(f"TensorFlow model predictions (first 5 samples): {tf_predictions.flatten()}")
print("\n(Note: the manual network is untrained, so its predictions will differ from the trained TensorFlow model; only the computation is the same)")

**Solution**: Here's the complete manual neural network implementation:

In [None]:
def sigmoid(z):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """ReLU activation function"""
    return np.maximum(0, z)

def initialize_parameters(input_size, hidden_size, output_size):
    """
    Initialize neural network parameters.
    
    Args:
        input_size: Number of input features
        hidden_size: Number of hidden units
        output_size: Number of output units
    
    Returns:
        parameters: Dictionary containing W1, b1, W2, b2
    """
    # Initialize weights and biases with small random values
    W1 = np.random.randn(input_size, hidden_size) * 0.01
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size) * 0.01
    b2 = np.zeros((1, output_size))
    
    parameters = {
        'W1': W1,
        'b1': b1,
        'W2': W2,
        'b2': b2
    }
    
    return parameters

def forward_propagation(X, parameters):
    """
    Perform forward propagation.
    
    Args:
        X: Input features (m x input_size)
        parameters: Network parameters
    
    Returns:
        A2: Output predictions
        cache: Intermediate values for backpropagation
    """
    # Retrieve parameters
    W1, b1, W2, b2 = parameters['W1'], parameters['b1'], parameters['W2'], parameters['b2']
    
    # Forward propagation
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    
    cache = {
        'Z1': Z1,
        'A1': A1,
        'Z2': Z2,
        'A2': A2
    }
    
    return A2, cache

# Test the manual implementation
parameters = initialize_parameters(input_size=2, hidden_size=4, output_size=1)
predictions_manual, _ = forward_propagation(X_test[:5], parameters)
print(f"Manual neural network predictions (first 5 samples): {predictions_manual.flatten()}")

# Compare with TensorFlow model
tf_predictions = model.predict(X_test[:5], verbose=0)
print(f"TensorFlow model predictions (first 5 samples): {tf_predictions.flatten()}")
print("\n(Note: the manual network is untrained, so its predictions will differ from the trained TensorFlow model; only the computation is the same)")

**Solution Explanation:**
- **Parameter Initialization**: Weights are initialized with small random values (scaled by 0.01) to break symmetry; biases are initialized to zero
- **Forward Propagation**: 
  - Z1 = X·W1 + b1 (linear combination for first layer)
  - A1 = ReLU(Z1) (apply activation function)
  - Z2 = A1·W2 + b2 (linear combination for output layer)
  - A2 = sigmoid(Z2) (final prediction probabilities)
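- The manual implementation mirrors the forward-pass computation of Keras `Dense` layers; Keras differs in details such as its default Glorot uniform weight initialization and its use of float32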
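
One way to check the manual forward pass against Keras is to feed it the trained weights. The sketch below assumes the weight layout returned by `model.get_weights()` for the two-layer model above, namely `[W1, b1, W2, b2]` with kernels of shape `(inputs, units)` and 1-D biases. `forward_from_weights` is a hypothetical helper, and random arrays stand in for the trained weights so that the block runs without TensorFlow.

```python
import numpy as np

def forward_from_weights(X, weights):
    """Forward pass for a ReLU hidden layer followed by a sigmoid output.

    weights: list [W1, b1, W2, b2] in the layout assumed above.
    """
    W1, b1, W2, b2 = weights
    A1 = np.maximum(0, X @ W1 + b1)                 # hidden layer, shape (m, hidden)
    return 1.0 / (1.0 + np.exp(-(A1 @ W2 + b2)))    # probabilities, shape (m, 1)

# Stand-in weights with the same shapes as the 2 -> 4 -> 1 model (not trained)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 4)), np.zeros(4), rng.normal(size=(4, 1)), np.zeros(1)]
X_demo = rng.normal(size=(5, 2))

probs = forward_from_weights(X_demo, weights)
print(f"Output shape: {probs.shape}")
```

In the notebook, `forward_from_weights(X_test[:5], model.get_weights())` should agree with `model.predict(X_test[:5], verbose=0)` up to float32 rounding, for example when compared with `np.allclose(..., atol=1e-5)`.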

### 13. Experimentation and Advanced Concepts

**Exercise 13.1: Activation Function Comparison**

**Learning Objectives:**
- Compare different activation functions and their effects on learning
- Understand the vanishing gradient problem
- Learn about advanced activation functions

**Task:**
Experiment with different activation functions (ReLU, tanh, sigmoid) and analyze their performance differences.

**Solution and Analysis:**

In [None]:
# Experiment: Different activation functions
activation_functions = ['relu', 'tanh', 'sigmoid']
activation_results = []

for activation in activation_functions:
    print(f"\nTesting {activation} activation...")
    
    # Build model with different activation
    model_act = Sequential([
        Dense(8, activation=activation, input_shape=(2,)),
        Dense(4, activation=activation),
        Dense(1, activation='sigmoid')
    ])
    
    model_act.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    # Train
    history_act = model_act.fit(
        X_train, y_train,
        epochs=100,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    # Evaluate
    test_loss_act, test_acc_act = model_act.evaluate(X_test, y_test, verbose=0)
    activation_results.append({
        'activation': activation,
        'test_accuracy': test_acc_act,
        'history': history_act
    })
    
    print(f"{activation}: Test Accuracy = {test_acc_act:.4f}")

# Plot activation function comparison
plt.figure(figsize=(10, 6))
activations = [r['activation'] for r in activation_results]
accuracies = [r['test_accuracy'] for r in activation_results]

plt.bar(activations, accuracies, color=['lightgreen', 'green', 'darkgreen'])
plt.ylabel('Test Accuracy')
plt.title('Model Performance vs Activation Function')
plt.ylim(0.0, 1.05)  # Full range so that low-accuracy runs remain visible
for i, acc in enumerate(accuracies):
    plt.text(i, acc + 0.002, f'{acc:.3f}', ha='center')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

**Activation Function Analysis:**
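- **ReLU**: Often a strong default for hidden layers; its gradient does not shrink for positive inputs, which mitigates vanishing gradients, and it is cheap to compute
- **Tanh**: Zero-centered, which often makes optimization easier than with sigmoid, but it saturates for large |z| and can still suffer from vanishing gradients
- **Sigmoid**: Saturates for large |z| and can cause vanishing gradients in deep networks; it is mostly used in output layers rather than hidden layers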

**Key Insights:**
- ReLU mitigates vanishing gradients because its derivative is 1 for every positive input
- Sigmoid squashes outputs into (0, 1) and tanh into (-1, 1)
- Choice of activation function affects convergence speed and final performance
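
The vanishing-gradient point can be illustrated with the derivatives themselves. In this sketch, the sigmoid derivative never exceeds 0.25, so a gradient passed back through several sigmoid layers is scaled by at most 0.25 per layer (ignoring the weight matrices, which is a simplification), whereas the ReLU derivative is exactly 1 for positive inputs.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 1201)   # grid of pre-activation values around 0

sigmoid_grad = sigmoid(z) * (1 - sigmoid(z))   # derivative of sigmoid
tanh_grad = 1 - np.tanh(z) ** 2                # derivative of tanh
relu_grad = (z > 0).astype(float)              # derivative of ReLU (0 for z <= 0)

max_sigmoid_grad = float(sigmoid_grad.max())   # peaks at z = 0
max_tanh_grad = float(tanh_grad.max())         # peaks at z = 0

# Upper bound on the product of activation derivatives across 5 sigmoid layers
n_layers = 5
sigmoid_factor_bound = max_sigmoid_grad ** n_layers

print(f"Largest sigmoid derivative: {max_sigmoid_grad:.4f}")
print(f"Largest tanh derivative:    {max_tanh_grad:.4f}")
print(f"Bound after {n_layers} sigmoid layers: {sigmoid_factor_bound:.6f}")
```

The two-hidden-layer models in this experiment are shallow, so the effect is small here; it becomes more pronounced as depth grows.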

### Key Takeaways

1. **Neural Networks** can learn complex non-linear decision boundaries automatically.

2. **Forward Propagation** passes information through layers using matrix multiplications and activation functions.

3. **Activation Functions** like ReLU enable learning of non-linear patterns.

4. **Hidden Layers** allow the network to learn hierarchical features.

5. **TensorFlow/Keras** provides high-level APIs that make building neural networks easy.

### Next Steps

In the next notebook, we'll explore multi-class classification problems and learn about the softmax function, cross-entropy loss, and more advanced activation functions.

### Additional Resources

- [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) by Michael Nielsen
- [Deep Learning Book](https://www.deeplearningbook.org/) by Ian Goodfellow et al.
- [TensorFlow Documentation](https://www.tensorflow.org/guide/keras)

### Practice Exercises

1. Try implementing a neural network for the XOR problem (a classic non-linear problem).
2. Experiment with different optimizers (SGD, RMSprop, Adam) and compare their performance.
3. Add regularization techniques (dropout, L2 regularization) to prevent overfitting.
4. Implement early stopping based on validation performance.
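
As a starting point for exercise 1, XOR can be represented exactly by a 2 → 2 → 1 network with ReLU hidden units. The weights below are hand-picked for illustration rather than learned, and the output layer is left linear for readability; a trained Keras model would generally find different weights.

```python
import numpy as np

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked parameters (not learned)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])       # both hidden units compute x1 + x2
b1 = np.array([0.0, -1.0])        # the second unit only activates when both inputs are 1
W2 = np.array([[1.0], [-2.0]])    # output = h1 - 2 * h2
b2 = np.array([0.0])

hidden = np.maximum(0, X_xor @ W1 + b1)   # ReLU hidden layer, shape (4, 2)
output = (hidden @ W2 + b2).ravel()       # linear output, shape (4,)

for inputs, target, out in zip(X_xor, y_xor, output):
    print(f"inputs={inputs}, target={target:.0f}, network output={out:.0f}")
```

No single linear layer can reproduce these four targets, which makes XOR a compact test of whether a hidden layer is doing useful work.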