# Neural Networks for Function Approximation: Business Applications

This notebook demonstrates how neural networks can learn to model complex, non-linear relationships, a capability that underlies many business applications. We'll explore how even a simple neural network can approximate a curved function, and how this ability translates to real-world business problems.

## Business Context

Many business relationships are inherently non-linear:
- Marketing spend vs. return on investment
- Price vs. demand elasticity
- Product features vs. customer satisfaction
- Resource allocation vs. productivity

Traditional approaches often use oversimplified linear models or require manual feature engineering to capture these non-linearities. Neural networks can automatically learn these complex relationships directly from data.

## 1. Setup and Imports

First, let's import the necessary libraries for our deep learning exercise:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Configure plots
sns.set_style('whitegrid')  # Replaces 'seaborn-whitegrid', a style name recent Matplotlib releases no longer accept
plt.rcParams['figure.figsize'] = (12, 7)
plt.rcParams['font.size'] = 12

## 2. Generate Synthetic Data

Let's create synthetic data that represents a non-linear business relationship. We'll use a quadratic function with some noise, which could represent a real-world scenario like:

- Price optimization curve (price vs. profit)
- Marketing spend efficiency curve (spend vs. conversions)
- Production volume vs. unit cost (economies of scale)

Our function will be $y = x^2 + \varepsilon$, where $\varepsilon$ is Gaussian noise with standard deviation 0.05.

In [None]:
# Define the underlying function: y = x^2 with a bit of noise
def generate_data(x):
    noise = 0.05 * np.random.randn(*x.shape)
    return x**2 + noise

# Create the training data: 20 samples from a uniform distribution in [-1, 1]
n_train = 20
X_train = np.random.uniform(-1, 1, (n_train, 1)).astype(np.float32)
y_train = generate_data(X_train)

# Create validation data:
# 30 samples (about 30%) similar to training: uniform(-1, 1)
n_val_similar = 30
X_val_similar = np.random.uniform(-1, 1, (n_val_similar, 1)).astype(np.float32)
y_val_similar = generate_data(X_val_similar)

# 70 samples (about 70%) out-of-distribution: uniform(2, 3)
n_val_out = 70
X_val_out = np.random.uniform(2, 3, (n_val_out, 1)).astype(np.float32)
y_val_out = generate_data(X_val_out)

# Convert the numpy arrays to PyTorch tensors with explicit float32 type
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_val_similar_tensor = torch.tensor(X_val_similar, dtype=torch.float32)
y_val_similar_tensor = torch.tensor(y_val_similar, dtype=torch.float32)
X_val_out_tensor = torch.tensor(X_val_out, dtype=torch.float32)
y_val_out_tensor = torch.tensor(y_val_out, dtype=torch.float32)

Let's visualize our training data and the true underlying function:

In [None]:
# Create a fine grid of x values to plot the true function
x_grid = np.linspace(-1.5, 3.5, 500).reshape(-1, 1)
y_true = x_grid**2  # True function without noise

plt.figure(figsize=(14, 7))
plt.plot(x_grid, y_true, 'g-', linewidth=2, label='True Function (y = x²)')
plt.scatter(X_train, y_train, color='blue', s=80, alpha=0.7, label='Training Data')
plt.scatter(X_val_similar, y_val_similar, color='orange', s=40, alpha=0.7, label='Validation Data (Similar Distribution)')
plt.scatter(X_val_out, y_val_out, color='red', s=40, alpha=0.7, label='Validation Data (Out-of-Distribution)')

plt.axvspan(-1, 1, alpha=0.1, color='blue', label='Training Distribution Range')
plt.axvspan(2, 3, alpha=0.1, color='red', label='Out-of-Distribution Range')

plt.title('Training and Validation Data with True Function', fontsize=16)
plt.xlabel('x', fontsize=14)
plt.ylabel('y', fontsize=14)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()

## 3. Define Neural Network Models

We'll create two neural network architectures to demonstrate different approaches to function approximation:

1. **Small Network:** A simple architecture with one hidden layer of 5 ReLU units
2. **Large Network:** A more complex architecture with three hidden layers of 64 ReLU units each

This will help us understand the trade-offs between model complexity, generalization, and overfitting.

In [None]:
class SmallNetwork(nn.Module):
    def __init__(self):
        super(SmallNetwork, self).__init__()
        self.fc1 = nn.Linear(1, 5)  # Input dimension is 1, hidden layer has 5 neurons
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)  # Output dimension is 1
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

class LargeNetwork(nn.Module):
    def __init__(self):
        super(LargeNetwork, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(1, 64),   # Hidden layer 1 (input dimension is 1)
            nn.ReLU(),
            nn.Linear(64, 64),  # Hidden layer 2
            nn.ReLU(),
            nn.Linear(64, 64),  # Hidden layer 3
            nn.ReLU(),
            nn.Linear(64, 1)    # Output layer
        )
    
    def forward(self, x):
        return self.model(x)
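
To make the complexity gap concrete, the sketch below counts trainable parameters from the layer sizes alone. The helper `count_dense_parameters` is a hypothetical name introduced here; it assumes fully connected layers with bias terms, which is the default for `nn.Linear`.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    `layer_sizes` lists the width of each layer from input to output.
    Assumes every layer has a bias term.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


small_params = count_dense_parameters([1, 5, 1])
large_params = count_dense_parameters([1, 64, 64, 64, 1])

print(f"SmallNetwork parameters: {small_params}")
print(f"LargeNetwork parameters: {large_params}")
```

With only 20 training points, the large network has several hundred parameters per sample, which is why overfitting is a concern. On the actual models, `sum(p.numel() for p in model.parameters())` should report the same totals.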

## 4. Training Function

Now let's define a function to train our neural networks and track their performance:

In [None]:
def train_model(model, X_train, y_train, X_val_similar, y_val_similar, X_val_out, y_val_out, 
                epochs=1000, learning_rate=0.01, weight_decay=0):
    
    # Define loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    
    # Lists to store metrics for plotting
    train_losses = []
    val_similar_losses = []
    val_out_losses = []
    
    # Training loop
    for epoch in range(epochs):
        # Set model to training mode
        model.train()
        
        # Forward pass
        outputs = model(X_train)
        loss = criterion(outputs, y_train)
        
        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Set model to evaluation mode
        model.eval()
        
        # Record training and validation losses
        with torch.no_grad():
            train_loss = criterion(model(X_train), y_train).item()
            val_similar_loss = criterion(model(X_val_similar), y_val_similar).item()
            val_out_loss = criterion(model(X_val_out), y_val_out).item()
            
            train_losses.append(train_loss)
            val_similar_losses.append(val_similar_loss)
            val_out_losses.append(val_out_loss)
        
        # Print progress every 100 epochs
        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {train_loss:.4f}, '
                  f'Val Similar Loss: {val_similar_loss:.4f}, Val Out Loss: {val_out_loss:.4f}')
    
    return model, {'train': train_losses, 'val_similar': val_similar_losses, 'val_out': val_out_losses}

## 5. Train and Evaluate the Small Network

Now let's train our small network and see how well it can approximate the quadratic function:

In [None]:
small_model = SmallNetwork()
trained_small_model, small_model_history = train_model(
    small_model, X_train_tensor, y_train_tensor, 
    X_val_similar_tensor, y_val_similar_tensor,
    X_val_out_tensor, y_val_out_tensor,
    epochs=1000, learning_rate=0.01
)

Let's visualize the learning curves to see how our model trained:

In [None]:
def plot_learning_curves(history, title="Learning Curves"):
    plt.figure(figsize=(14, 7))
    
    epochs = range(1, len(history['train']) + 1)
    
    plt.plot(epochs, history['train'], 'b-', linewidth=2, label='Training Loss')
    plt.plot(epochs, history['val_similar'], 'g-', linewidth=2, label='Validation Loss (Similar Distribution)')
    plt.plot(epochs, history['val_out'], 'r-', linewidth=2, label='Validation Loss (Out-of-Distribution)')
    
    plt.title(title, fontsize=16)
    plt.xlabel('Epochs', fontsize=14)
    plt.ylabel('Loss (MSE)', fontsize=14)
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.yscale('log')
    plt.show()

plot_learning_curves(small_model_history, "Small Network Learning Curves")
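
The learning curves can also be summarized numerically. The sketch below finds the epoch with the lowest in-distribution validation loss. `best_epoch` is a hypothetical helper, and `example_history` holds made-up numbers in the same format as the dictionary returned by `train_model`.

```python
def best_epoch(history, metric="val_similar"):
    """Return (1-based epoch, loss) at which history[metric] is lowest."""
    losses = history[metric]
    index = min(range(len(losses)), key=losses.__getitem__)
    return index + 1, losses[index]


# Made-up losses for illustration only, not output from this notebook
example_history = {
    "train": [0.30, 0.12, 0.05, 0.02, 0.01],
    "val_similar": [0.32, 0.15, 0.07, 0.06, 0.08],
    "val_out": [40.0, 35.0, 31.0, 30.0, 30.5],
}

epoch, loss = best_epoch(example_history)
print(f"Lowest in-distribution validation loss {loss:.2f} at epoch {epoch}")
```

If the validation loss bottoms out well before the final epoch while the training loss keeps falling, the model has started to overfit.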

Now let's visualize how well our small model approximates the true function:

In [None]:
def plot_predictions(model, title="Model Predictions vs True Function"):
    # Convert grid to tensor for prediction
    x_grid_tensor = torch.tensor(x_grid, dtype=torch.float32)
    
    # Get model predictions
    with torch.no_grad():
        model.eval()
        y_pred = model(x_grid_tensor).numpy()
    
    plt.figure(figsize=(14, 7))
    
    # Plot true function
    plt.plot(x_grid, y_true, 'g-', linewidth=2, label='True Function (y = x²)')
    
    # Plot model predictions
    plt.plot(x_grid, y_pred, 'b--', linewidth=2, label='Model Predictions')
    
    # Plot data points
    plt.scatter(X_train, y_train, color='blue', s=80, alpha=0.7, label='Training Data')
    
    # Highlight distribution ranges
    plt.axvspan(-1, 1, alpha=0.1, color='blue', label='Training Distribution Range')
    plt.axvspan(2, 3, alpha=0.1, color='red', label='Out-of-Distribution Range')
    
    plt.title(title, fontsize=16)
    plt.xlabel('x', fontsize=14)
    plt.ylabel('y', fontsize=14)
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.ylim(-1, 10)
    plt.show()

plot_predictions(trained_small_model, "Small Network Predictions")

## 6. Train and Evaluate the Large Network

Now let's train our larger, more complex network:

In [None]:
large_model = LargeNetwork()
trained_large_model, large_model_history = train_model(
    large_model, X_train_tensor, y_train_tensor, 
    X_val_similar_tensor, y_val_similar_tensor,
    X_val_out_tensor, y_val_out_tensor,
    epochs=1000, learning_rate=0.01
)

Let's visualize the learning curves for the large network:

In [None]:
plot_learning_curves(large_model_history, "Large Network Learning Curves")

And now let's see the predictions from our large model:

In [None]:
plot_predictions(trained_large_model, "Large Network Predictions")

## 7. Improving Generalization with Regularization

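Let's see whether regularization helps. We'll train the large network again with weight decay (an L2 penalty on the weights) and check both validation sets. Weight decay mainly discourages overfitting inside the training range; it is not guaranteed to help in the out-of-distribution range.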
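
To see what weight decay does to a single parameter, here is a minimal sketch that uses plain gradient descent rather than Adam (an assumption made to keep the arithmetic visible). As in `torch.optim.Adam`, the decay term enters through the gradient as `weight_decay * w`. The function name, target value, and step size are illustrative.

```python
def fit_single_weight(target, weight_decay, lr=0.1, steps=200):
    """Minimize (w - target)**2 with an L2 penalty applied through the gradient."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)   # gradient of the squared error
        grad += weight_decay * w    # L2 penalty term, pulls w toward zero
        w -= lr * grad
    return w


target = 3.0
w_plain = fit_single_weight(target, weight_decay=0.0)
w_decay = fit_single_weight(target, weight_decay=1.0)

print(f"Without weight decay: w = {w_plain:.4f}")
print(f"With weight decay 1.0: w = {w_decay:.4f}")
```

The penalized solution settles at `2 * target / (2 + weight_decay)`, which is smaller in magnitude than the unpenalized one. The value 1.0 is exaggerated for visibility; the notebook uses 0.01.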

In [None]:
regularized_model = LargeNetwork()
trained_regularized_model, regularized_model_history = train_model(
    regularized_model, X_train_tensor, y_train_tensor, 
    X_val_similar_tensor, y_val_similar_tensor,
    X_val_out_tensor, y_val_out_tensor,
    epochs=1000, learning_rate=0.01, weight_decay=0.01  # Added weight decay
)

Let's visualize the learning curves for the regularized network:

In [None]:
plot_learning_curves(regularized_model_history, "Regularized Network Learning Curves")

And finally, let's see if regularization helped with generalizing to out-of-distribution data:

In [None]:
plot_predictions(trained_regularized_model, "Regularized Network Predictions")

## 8. Compare Model Performance

Let's quantitatively compare the performance of our three models on the training and validation datasets:

In [None]:
def evaluate_model(model, X_train, y_train, X_val_similar, y_val_similar, X_val_out, y_val_out):
    criterion = nn.MSELoss()
    model.eval()
    
    with torch.no_grad():
        # Calculate losses
        train_loss = criterion(model(X_train), y_train).item()
        val_similar_loss = criterion(model(X_val_similar), y_val_similar).item()
        val_out_loss = criterion(model(X_val_out), y_val_out).item()
    
    return {
        'Train Loss (MSE)': train_loss,
        'Validation Loss (Similar Distribution)': val_similar_loss,
        'Validation Loss (Out-of-Distribution)': val_out_loss
    }

# Evaluate all models
small_model_perf = evaluate_model(trained_small_model, X_train_tensor, y_train_tensor, 
                                 X_val_similar_tensor, y_val_similar_tensor,
                                 X_val_out_tensor, y_val_out_tensor)

large_model_perf = evaluate_model(trained_large_model, X_train_tensor, y_train_tensor, 
                                 X_val_similar_tensor, y_val_similar_tensor,
                                 X_val_out_tensor, y_val_out_tensor)

regularized_model_perf = evaluate_model(trained_regularized_model, X_train_tensor, y_train_tensor, 
                                       X_val_similar_tensor, y_val_similar_tensor,
                                       X_val_out_tensor, y_val_out_tensor)

# Create a comparison dataframe
results = pd.DataFrame({
    'Small Network': small_model_perf,
    'Large Network': large_model_perf,
    'Regularized Network': regularized_model_perf
}).T

results

Let's visualize this comparison:

In [None]:
# Plot bar chart comparison
results_melted = results.reset_index().melt(id_vars='index', var_name='Metric', value_name='Loss')

plt.figure(figsize=(14, 8))
bar_plot = sns.barplot(x='Metric', y='Loss', hue='index', data=results_melted)

plt.title('Model Performance Comparison', fontsize=16)
plt.xlabel('', fontsize=14)
plt.ylabel('Mean Squared Error (MSE)', fontsize=14)
plt.yscale('log')  # Log scale for better visualization
plt.legend(title='Model', fontsize=12)
plt.xticks(rotation=0)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

## 9. Business Implications

### Key Observations

1. **In-Distribution vs. Out-of-Distribution Performance:**
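   - All three models are expected to fit data drawn from the training range $[-1, 1]$ well
   - Performance degrades sharply on the out-of-distribution range $[2, 3]$, because a ReLU network is piecewise linear and extrapolates along a straight line while $x^2$ keeps curving
   - Regularization can reduce overfitting inside the training range, but it does not guarantee better predictions in new regions; compare the out-of-distribution losses in the table above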

2. **Model Complexity Trade-offs:**
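   - Small model: Limited capacity (16 parameters), so less room to fit noise
   - Large model: Closer fit to the training data, with a higher risk of fitting noise in the 20 training points
   - Regularized model: Gives up some training fit in exchange for smaller weights and smoother predictions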

3. **Learning Process:**
   - Models learn the quadratic relationship from limited data
   - Neural networks automatically capture non-linearity
   - Out-of-distribution performance requires careful consideration
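
The degradation outside the training range follows from how ReLU networks are built: each hidden unit contributes one hinge, so past the outermost hinge the output is a straight line. A minimal sketch with hand-picked (not trained) weights for a hypothetical 1-3-1 network:

```python
def relu(z):
    return max(z, 0.0)


# Hand-picked weights for illustration; these are not trained values
hidden_w = [1.0, -1.0, 2.0]
hidden_b = [0.5, 0.5, -1.0]
out_w = [0.6, 0.6, 0.3]
out_b = -0.2


def tiny_relu_net(x):
    """One hidden layer of three ReLU units, scalar input and output."""
    hidden = [relu(w * x + b) for w, b in zip(hidden_w, hidden_b)]
    return out_b + sum(v * h for v, h in zip(out_w, hidden))


def second_difference(f, x, h=0.5):
    """Discrete curvature: zero for any straight line."""
    return f(x - h) - 2.0 * f(x) + f(x + h)


# Each unit switches on or off where w * x + b = 0
last_kink = max(-b / w for w, b in zip(hidden_w, hidden_b))

curvature_net = second_difference(tiny_relu_net, 2.5)
curvature_true = second_difference(lambda x: x**2, 2.5)

print(f"Outermost kink at x = {last_kink}")
print(f"Network curvature on [2, 3]: {curvature_net:.6f}")
print(f"x^2 curvature on [2, 3]:     {curvature_true:.6f}")
```

A trained network typically places its hinges where the training data lies, so on $[2, 3]$ it continues with whatever slope it had near the edge of the training range while $x^2$ keeps bending upward.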

### Business Applications

1. **Price Optimization:**
   - Learning the relationship between price and demand
   - Critical to validate predictions in new price ranges
   - Regularization reduces overfitting, but predictions at untested price points still need validation

2. **Marketing Budget Allocation:**
   - Modeling how marketing spend affects campaign performance
   - Avoid overfitting to historical spending patterns
   - Regularization alone does not make predictions at new spending levels reliable; validate them before acting on them

3. **Resource Planning:**
   - Predicting how resource allocation affects productivity
   - Ensuring models generalize to new operating conditions
   - Avoiding extrapolation errors in critical planning

### Best Practices for Business Implementation

1. **Data Collection:**
   - Sample across the full range of expected values
   - Include diverse operating conditions
   - Test model on out-of-sample data before deployment

2. **Model Selection:**
   - Match model complexity to data availability
   - Use regularization for better generalization
   - Consider ensemble methods for critical applications

3. **Operational Safeguards:**
   - Monitor predictions for out-of-distribution inputs
   - Implement confidence intervals around predictions
   - Establish business rules for extreme predictions
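
The first safeguard can be as simple as a range check on incoming inputs. The sketch below is one minimal way to do it for a single feature. `make_range_guard` and its `margin` parameter are hypothetical names, and systems with many features usually need more than a min/max test.

```python
def make_range_guard(train_inputs, margin=0.0):
    """Build a check that flags inputs outside the observed training range."""
    lower = min(train_inputs) - margin
    upper = max(train_inputs) + margin

    def is_in_range(x):
        return lower <= x <= upper

    return is_in_range


# Illustrative training inputs drawn from roughly [-1, 1]
train_inputs = [-0.95, -0.4, 0.1, 0.6, 0.98]
is_in_range = make_range_guard(train_inputs, margin=0.05)

for x in [0.3, 1.0, 2.5]:
    status = "ok" if is_in_range(x) else "outside training range, review before use"
    print(f"x = {x}: {status}")
```

Predictions for flagged inputs can then be routed to a fallback rule or a human reviewer instead of being used directly.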

## 10. Learning Challenge: Marketing ROI Curve

**Scenario:** A marketing team has collected data on advertising spend vs. return on investment (ROI). The relationship is known to be non-linear, with diminishing returns at higher spending levels.

**Exercise:** 
1. Generate synthetic data that represents a typical marketing ROI curve
2. Train a neural network to learn this relationship
3. Use the model to predict the optimal marketing spend level
4. Implement regularization to improve generalization
5. Visualize and interpret the results

**Starter Code:**

In [None]:
# Generate synthetic marketing ROI data
# Typical curve: return rises with spend, flattens near the peak, then declines
def marketing_roi(spend):
    # Quadratic with a peak at spend = 2, plus Gaussian noise
    # Both spend and return are in thousands of dollars
    return 2 * spend - 0.5 * spend**2 + 0.1 * np.random.randn(*spend.shape)

# Generate training data
spend_train = np.random.uniform(0, 2, (30, 1)).astype(np.float32)  # Marketing spend from $0K to $2K
roi_train = marketing_roi(spend_train)

# Generate test data including higher spend levels
spend_test = np.linspace(0, 4, 100).reshape(-1, 1).astype(np.float32)  # Testing from $0K to $4K
roi_test = marketing_roi(spend_test)

# Visualize the data
plt.figure(figsize=(12, 7))
plt.scatter(spend_train, roi_train, color='blue', s=80, alpha=0.7, label='Training Data')
plt.plot(spend_test, 2 * spend_test - 0.5 * spend_test**2, 'g-', linewidth=2, alpha=0.5, label='True ROI Curve (noise-free)')
plt.title('Marketing Spend vs. ROI', fontsize=16)
plt.xlabel('Marketing Spend ($K)', fontsize=14)
plt.ylabel('Return on Investment ($K)', fontsize=14)
plt.axvspan(0, 2, alpha=0.1, color='blue', label='Training Range')
plt.axvspan(2, 4, alpha=0.1, color='red', label='Extrapolation Range')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Your task: Build and train a neural network to model this relationship
# Then use it to find the optimal marketing spend level

# 1. Convert data to PyTorch tensors
# YOUR CODE HERE

# 2. Define a neural network model
# YOUR CODE HERE

# 3. Train the model
# YOUR CODE HERE

# 4. Find the optimal spend level that maximizes ROI
# YOUR CODE HERE

# 5. Visualize the results
# YOUR CODE HERE
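
For step 4, it helps to know what answer to expect. Assuming the noise-free curve inside `marketing_roi`, $2s - 0.5s^2$, setting the derivative $2 - s$ to zero gives a maximum at $s = 2$. The grid search below checks this without a neural network; the function and variable names are illustrative.

```python
def noise_free_roi(spend):
    """Noise-free version of the starter curve: 2 * spend - 0.5 * spend**2."""
    return 2.0 * spend - 0.5 * spend**2


# Evaluate on a grid from $0K to $4K in steps of $0.01K
grid = [i / 100 for i in range(401)]
best_spend = max(grid, key=noise_free_roi)

print(f"Best spend on the grid: {best_spend} (return {noise_free_roi(best_spend)})")
```

The optimum sits at the upper edge of the training range (spend from 0 to 2), so a model's estimate of it depends on how the model behaves right where the data ends.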

## Conclusion

In this notebook, we've demonstrated how neural networks can learn to approximate complex non-linear functions from data - a fundamental capability that powers many business applications. Key takeaways include:

1. Neural networks automatically learn appropriate representations without manual feature engineering
2. Model complexity needs to be balanced with the amount of available training data
3. Regularization techniques can improve generalization to unseen data from the training distribution, but they do not fix extrapolation
4. Performance in out-of-distribution regions requires careful validation
5. These concepts apply directly to business problems like pricing optimization, marketing spend efficiency, and resource planning

The ability to model complex, non-linear relationships is at the heart of deep learning's value for business applications. By understanding these fundamental concepts, you can apply neural networks effectively to a wide range of business problems.