# Section 1: Battery Lifespan Analysis

In this notebook, we'll analyze battery lifespan data to make an informed purchasing decision using Maximum Likelihood Estimation (MLE) and the exponential distribution.

## Learning Objectives

By the end of this section, you will be able to:
- Understand when exponential distributions are appropriate for modeling failure times
- Apply Maximum Likelihood Estimation to real data
- Make data-driven decisions using statistical models
- Visualize and interpret probability distributions

## Setup: Import Libraries

Let's start by importing the necessary Python libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.optimize import minimize_scalar
import pandas as pd

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib for better plots
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

## Part 1: The Battery Replacement Problem

### Background

You manage a facility with 100 devices that each use a specialized battery. When a battery fails, you must replace it immediately to keep the device operational. You have two purchasing options:

- **Option A**: Buy batteries as needed at $25 each
- **Option B**: Buy batteries in bulk at $15 each, but you must purchase all 100 at once

The key question: **How long do these batteries typically last?**

### The Data

Let's load some battery failure time data (in months):

In [None]:
# Battery failure times (in months) from a sample of 20 batteries
battery_data = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])

print(f"Number of batteries tested: {len(battery_data)}")
print(f"Failure times (months): {battery_data}")
print(f"Mean failure time: {np.mean(battery_data):.2f} months")
print(f"Sample standard deviation: {np.std(battery_data, ddof=1):.2f} months")

### Visualizing the Data

Let's examine the distribution of failure times:

In [None]:
# Create histogram of failure times
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(battery_data, bins=8, density=True, alpha=0.7, color='skyblue', edgecolor='black')
plt.xlabel('Failure Time (months)')
plt.ylabel('Density')
plt.title('Distribution of Battery Failure Times')
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.boxplot(battery_data, vert=False)
plt.xlabel('Failure Time (months)')
plt.title('Box Plot of Battery Failure Times')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Part 2: Choosing the Right Model

### Why Not a Normal Distribution?

Battery failure times have some special characteristics:
- They are always positive (can't have negative failure times)
- They are often modeled as "memoryless": the probability of failing in the next month doesn't depend on how long the battery has already been working
- The distribution is typically right-skewed
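
The memoryless property can be checked numerically. The following is a minimal sketch, assuming an illustrative rate of 0.25 failures per month (chosen for the example, not estimated from the data) and using `scipy.stats.expon`, whose `scale` argument is $1/\lambda$:

```python
import numpy as np
from scipy import stats

lam = 0.25        # illustrative rate (assumption), failures per month
s, t = 3.0, 2.0   # battery has survived s months; look t months ahead

dist = stats.expon(scale=1 / lam)

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional_survival = dist.sf(s + t) / dist.sf(s)
# P(X > t) for a brand-new battery
fresh_survival = dist.sf(t)

print(f"P(survive {t} more months | survived {s}): {conditional_survival:.4f}")
print(f"P(new battery survives {t} months):        {fresh_survival:.4f}")
```

The two probabilities agree for any choice of `s` and `t`. A model with wear-out, such as a Weibull with shape greater than 1, would give a smaller conditional probability for the older battery.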

### The Exponential Distribution

For failure times, the **exponential distribution** is often appropriate. It has:
- Parameter $\lambda$ (lambda): the failure rate
- Mean: $1/\lambda$
- PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
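
An exponential distribution's standard deviation equals its mean ($1/\lambda$), so its coefficient of variation is 1. As a quick, informal check (a sketch, not a formal test), the sample's coefficient of variation can be compared against 1. The array repeats the 20 failure times from the data cell so the block runs on its own:

```python
import numpy as np

# Same 20 failure times (months) as in the data cell
failure_times = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])

sample_mean = failure_times.mean()
sample_std = failure_times.std(ddof=1)   # sample standard deviation
cv = sample_std / sample_mean            # an exponential model implies CV = 1

print(f"Sample mean: {sample_mean:.2f} months")
print(f"Sample std:  {sample_std:.2f} months")
print(f"Coefficient of variation: {cv:.2f}")
```

A value well below 1 suggests less spread than an exponential model implies, which is worth keeping in mind when validating the fit.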

In [None]:
# Let's visualize what exponential distributions look like
x = np.linspace(0, 10, 1000)

plt.figure(figsize=(10, 6))
lambdas = [0.5, 1.0, 2.0]
colors = ['red', 'blue', 'green']

for lam, color in zip(lambdas, colors):
    y = lam * np.exp(-lam * x)
    plt.plot(x, y, color=color, linewidth=2, label=rf'$\lambda$ = {lam} (mean = {1/lam:.1f})')

plt.xlabel('Time')
plt.ylabel('Probability Density')
plt.title('Exponential Distributions with Different Parameters')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Part 3: Maximum Likelihood Estimation

### The Likelihood Function

For an exponential distribution with rate $\lambda$, the likelihood of observing independent failure times $x_1, \dots, x_n$ is:

$$L(\lambda) = \prod_i \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$$

The log-likelihood is:
$$\ell(\lambda) = n \log(\lambda) - \lambda \sum x_i$$
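
Setting the derivative of the log-likelihood to zero gives the estimator in closed form:

$$\frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum x_i = 0 \quad\Longrightarrow\quad \hat{\lambda} = \frac{n}{\sum x_i} = \frac{1}{\bar{x}}$$

The second derivative, $-n/\lambda^2$, is negative for every $\lambda > 0$, so this stationary point is a maximum.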

### Finding the MLE

In [None]:
def log_likelihood_exponential(lam, data):
    """
    Calculate the log-likelihood for exponential distribution
    
    Parameters:
    lam: rate parameter (lambda)
    data: array of observed failure times
    
    Returns:
    log-likelihood value
    """
    if lam <= 0:
        return -np.inf
    
    n = len(data)
    sum_data = np.sum(data)
    
    # Log-likelihood: n*log(lambda) - lambda*sum(data)
    return n * np.log(lam) - lam * sum_data

# The MLE for exponential distribution has a closed form: lambda_hat = n / sum(x_i)
n = len(battery_data)
sum_data = np.sum(battery_data)
lambda_mle = n / sum_data

print(f"Sample size (n): {n}")
print(f"Sum of failure times: {sum_data:.2f}")
print(f"MLE estimate of lambda: {lambda_mle:.4f}")
print(f"Estimated mean failure time: {1/lambda_mle:.2f} months")
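
The closed-form estimate can be cross-checked by maximizing the log-likelihood numerically. This is a minimal sketch using `scipy.optimize.minimize_scalar`; the helper name `neg_log_likelihood` and the search bounds are choices made for this example:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Same 20 failure times (months) as in the data cell
failure_times = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])

def neg_log_likelihood(lam, data):
    """Negative exponential log-likelihood: -(n*log(lam) - lam*sum(data))."""
    return -(len(data) * np.log(lam) - lam * np.sum(data))

# Bounded search over a plausible range of rates (bounds are an assumption)
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 5.0),
                         args=(failure_times,), method='bounded')

closed_form = len(failure_times) / np.sum(failure_times)
print(f"Numerical MLE:   {result.x:.4f}")
print(f"Closed-form MLE: {closed_form:.4f}")
```

The two values should agree to within the optimizer's tolerance. The numerical route matters more for models, such as the Weibull, where no closed form exists.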

### Visualizing the Likelihood Function

In [None]:
# Plot the log-likelihood function over a grid of lambda values
lambda_values = np.linspace(0.1, 0.8, 1000)
log_likelihoods = [log_likelihood_exponential(lam, battery_data) for lam in lambda_values]

plt.figure(figsize=(10, 6))
plt.plot(lambda_values, log_likelihoods, 'b-', linewidth=2)
plt.axvline(lambda_mle, color='red', linestyle='--', linewidth=2, 
            label=rf'MLE: $\lambda$ = {lambda_mle:.4f}')
plt.xlabel(r'$\lambda$ (Rate Parameter)')
plt.ylabel('Log-Likelihood')
plt.title('Log-Likelihood Function for Exponential Distribution')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Maximum log-likelihood value: {log_likelihood_exponential(lambda_mle, battery_data):.2f}")

## Part 4: Model Validation

### Comparing Model to Data

Let's see how well our exponential model fits the observed data:

In [None]:
# Create comparison plot
plt.figure(figsize=(12, 5))

# Histogram of actual data
plt.subplot(1, 2, 1)
plt.hist(battery_data, bins=8, density=True, alpha=0.7, color='skyblue', 
         edgecolor='black', label='Observed Data')

# Overlay the fitted exponential distribution
x_model = np.linspace(0, max(battery_data) * 1.2, 1000)
y_model = lambda_mle * np.exp(-lambda_mle * x_model)
plt.plot(x_model, y_model, 'r-', linewidth=3, label=rf'Exponential Model ($\lambda$={lambda_mle:.3f})')

plt.xlabel('Failure Time (months)')
plt.ylabel('Density')
plt.title('Model vs. Observed Data')
plt.legend()
plt.grid(True, alpha=0.3)

# Q-Q plot for model checking
plt.subplot(1, 2, 2)
# Generate theoretical quantiles from exponential distribution
sorted_data = np.sort(battery_data)
n = len(sorted_data)
theoretical_quantiles = stats.expon.ppf(np.arange(1, n+1)/(n+1), scale=1/lambda_mle)

plt.scatter(theoretical_quantiles, sorted_data, alpha=0.7)
plt.plot([0, max(theoretical_quantiles)], [0, max(theoretical_quantiles)], 'r--', linewidth=2)
plt.xlabel('Theoretical Quantiles (Exponential)')
plt.ylabel('Sample Quantiles')
plt.title('Q-Q Plot: Model Check')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Goodness of Fit Test

In [None]:
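# Kolmogorov-Smirnov test against the fitted exponential distribution.
# Caveat: lambda was estimated from the same data, so the standard KS p-value
# is only approximate and tends to be too large (the test is conservative).
ks_statistic, p_value = stats.kstest(battery_data, 
                                    lambda x: stats.expon.cdf(x, scale=1/lambda_mle))

print("Kolmogorov-Smirnov Test Results:")
print(f"KS Statistic: {ks_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
verdict = 'exponential model not rejected' if p_value > 0.05 else 'exponential model rejected'
print(f"Interpretation: {verdict} (alpha = 0.05)")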
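
Because the rate is estimated from the same sample, the textbook KS p-value is not exact. One common workaround is a parametric bootstrap: simulate samples from the fitted model, refit each one, and compare the observed KS statistic with the simulated ones. The sketch below shows that idea; the function name `ks_exponential_bootstrap`, the number of simulations, and the seed are choices made for this example:

```python
import numpy as np
from scipy import stats

def ks_exponential_bootstrap(data, n_sim=1000, seed=0):
    """KS test for an exponential fit, with the p-value from a parametric bootstrap."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    scale_hat = data.mean()  # MLE of 1/lambda

    # KS distance between the data and the fitted exponential
    d_obs = stats.kstest(data, 'expon', args=(0, scale_hat)).statistic

    # Null distribution of the KS distance when the scale is re-estimated each time
    d_sim = np.empty(n_sim)
    for b in range(n_sim):
        sim = rng.exponential(scale=scale_hat, size=n)
        d_sim[b] = stats.kstest(sim, 'expon', args=(0, sim.mean())).statistic

    p_value = (1 + np.sum(d_sim >= d_obs)) / (n_sim + 1)
    return d_obs, p_value

# Same 20 failure times (months) as in the data cell
failure_times = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])

d_obs, p_boot = ks_exponential_bootstrap(failure_times)
print(f"KS statistic: {d_obs:.4f}")
print(f"Parametric-bootstrap p-value: {p_boot:.4f}")
```

A small p-value here is evidence against the exponential model; a large one only means the test found no clear departure from it.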

## Part 5: Making the Business Decision

### Cost Analysis

Now let's use our model to make the purchasing decision:

In [None]:
# Parameters
cost_individual = 25  # Cost per battery when buying individually
cost_bulk = 15       # Cost per battery when buying in bulk
num_devices = 100    # Number of devices
time_horizon = 12    # Analysis period (months)

# Expected number of failures per device in 12 months.
# Failed batteries are replaced immediately, so with exponential lifetimes the
# failures on each device form a Poisson process: expected count = lambda * t.
# P(a given battery fails by time t) = 1 - exp(-lambda*t) is reported separately.
prob_failure_12_months = 1 - np.exp(-lambda_mle * time_horizon)
expected_failures_per_device = lambda_mle * time_horizon

print("Model Results:")
print(f"Estimated failure rate (lambda): {lambda_mle:.4f} failures/month")
print(f"Estimated mean battery life: {1/lambda_mle:.2f} months")
print(f"Probability a battery fails within 12 months: {prob_failure_12_months:.3f}")
print(f"Expected failures per device in 12 months: {expected_failures_per_device:.3f}")

# Total expected failures across all devices
total_expected_failures = num_devices * expected_failures_per_device

print(f"\nCost Analysis for {num_devices} devices over {time_horizon} months:")
print(f"Expected total battery replacements: {total_expected_failures:.1f}")

# Option A: Buy as needed
cost_option_a = total_expected_failures * cost_individual

# Option B: Buy in bulk (assume we need to buy full sets)
# We need enough bulk purchases to cover expected failures
bulk_sets_needed = np.ceil(total_expected_failures / num_devices)
cost_option_b = bulk_sets_needed * num_devices * cost_bulk

print(f"\nOption A (buy as needed): ${cost_option_a:.2f}")
print(f"Option B (bulk purchase): ${cost_option_b:.2f}")
print(f"Savings with Option B: ${cost_option_a - cost_option_b:.2f}")
print(f"Recommended choice: {'Option B (bulk)' if cost_option_b < cost_option_a else 'Option A (individual)'}")
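
Since every failed battery is replaced immediately, each device runs through a sequence of battery lifetimes, and under the exponential model the expected number of failures over a horizon $t$ is $\lambda t$. The simulation below is a rough sketch that checks this; the function name `simulate_mean_replacements`, the number of simulated devices, and the per-device cap are assumptions made for this example:

```python
import numpy as np

def simulate_mean_replacements(lam, horizon, n_devices=20000,
                               max_per_device=60, seed=0):
    """Average failures per device within `horizon`, replacing each failed battery at once."""
    rng = np.random.default_rng(seed)
    # Draw successive battery lifetimes for each simulated device.
    # max_per_device must comfortably exceed lam * horizon.
    lifetimes = rng.exponential(scale=1 / lam, size=(n_devices, max_per_device))
    # Cumulative sums give the calendar time of each successive failure
    failure_clock = np.cumsum(lifetimes, axis=1)
    counts = (failure_clock <= horizon).sum(axis=1)
    return counts.mean()

# Same 20 failure times (months) as in the data cell
failure_times = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])
lam = len(failure_times) / failure_times.sum()  # closed-form MLE
horizon = 12                                    # months

simulated = simulate_mean_replacements(lam, horizon)
print(f"Simulated failures per device: {simulated:.3f}")
print(f"lambda * t:                    {lam * horizon:.3f}")
```

The probability that a single battery fails within the horizon, $1 - e^{-\lambda t}$, can never exceed 1, whereas the expected count of replacements can.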

### Sensitivity Analysis

Let's see how sensitive our decision is to uncertainty in the failure rate:

In [None]:
# Calculate confidence interval for lambda using bootstrap
def bootstrap_mle(data, n_bootstrap=1000):
    """Bootstrap confidence interval for exponential MLE"""
    bootstrap_lambdas = []
    n = len(data)
    
    for _ in range(n_bootstrap):
        # Resample with replacement
        bootstrap_sample = np.random.choice(data, size=n, replace=True)
        # Calculate MLE for this sample
        bootstrap_lambda = n / np.sum(bootstrap_sample)
        bootstrap_lambdas.append(bootstrap_lambda)
    
    return np.array(bootstrap_lambdas)

# Generate bootstrap samples
bootstrap_lambdas = bootstrap_mle(battery_data)
lambda_ci_lower = np.percentile(bootstrap_lambdas, 2.5)
lambda_ci_upper = np.percentile(bootstrap_lambdas, 97.5)

print(f"95% Confidence Interval for lambda: [{lambda_ci_lower:.4f}, {lambda_ci_upper:.4f}]")
print(f"95% CI for mean battery life: [{1/lambda_ci_upper:.2f}, {1/lambda_ci_lower:.2f}] months")

# Cost analysis with uncertainty
lambda_scenarios = [lambda_ci_lower, lambda_mle, lambda_ci_upper]
scenario_names = ['Optimistic (lower lambda)', 'Best Estimate', 'Pessimistic (higher lambda)']

print(f"\nSensitivity Analysis:")
for lam, name in zip(lambda_scenarios, scenario_names):
    # With immediate replacement, expected failures per device = lam * time_horizon
    total_failures = num_devices * lam * time_horizon
    cost_a = total_failures * cost_individual
    bulk_sets = np.ceil(total_failures / num_devices)
    cost_b = bulk_sets * num_devices * cost_bulk
    
    print(f"{name}:")
    print(f"  Expected failures: {total_failures:.1f}")
    print(f"  Cost A: ${cost_a:.2f}, Cost B: ${cost_b:.2f}")
    print(f"  Better choice: {'B' if cost_b < cost_a else 'A'}")
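
The bootstrap interval can be compared with the exact interval that holds when the data really are exponential: in that case $2\lambda \sum x_i$ follows a chi-square distribution with $2n$ degrees of freedom. A minimal sketch using `scipy.stats.chi2`, with variable names chosen for this example:

```python
import numpy as np
from scipy import stats

# Same 20 failure times (months) as in the data cell
failure_times = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])

n = failure_times.size
total_time = failure_times.sum()
alpha = 0.05

# Under the exponential model, 2 * lambda * sum(x_i) ~ chi-square(2n)
lam_lower = stats.chi2.ppf(alpha / 2, df=2 * n) / (2 * total_time)
lam_upper = stats.chi2.ppf(1 - alpha / 2, df=2 * n) / (2 * total_time)

print(f"MLE of lambda: {n / total_time:.4f}")
print(f"Exact 95% CI for lambda: [{lam_lower:.4f}, {lam_upper:.4f}]")
print(f"Exact 95% CI for mean life: [{1 / lam_upper:.2f}, {1 / lam_lower:.2f}] months")
```

This interval relies on the exponential assumption, while resampling the data does not. A noticeable gap between the two can itself hint that the exponential assumption is questionable.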

## Part 6: Visualization Summary

In [None]:
# Create a comprehensive summary plot
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Data and fitted model
ax1.hist(battery_data, bins=8, density=True, alpha=0.7, color='skyblue', edgecolor='black')
x_model = np.linspace(0, max(battery_data) * 1.2, 1000)
y_model = lambda_mle * np.exp(-lambda_mle * x_model)
ax1.plot(x_model, y_model, 'r-', linewidth=3, label=f'Exponential Model')
ax1.set_xlabel('Failure Time (months)')
ax1.set_ylabel('Density')
ax1.set_title('Battery Failure Data and Fitted Model')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Survival function
x_surv = np.linspace(0, 15, 1000)
survival_prob = np.exp(-lambda_mle * x_surv)
ax2.plot(x_surv, survival_prob, 'g-', linewidth=3)
ax2.axhline(0.5, color='red', linestyle='--', alpha=0.7, label='50% survival')
ax2.axvline(1/lambda_mle, color='red', linestyle='--', alpha=0.7, label=f'Mean life: {1/lambda_mle:.1f} months')
ax2.set_xlabel('Time (months)')
ax2.set_ylabel('Survival Probability')
ax2.set_title('Battery Survival Function')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Bootstrap distribution
ax3.hist(bootstrap_lambdas, bins=50, density=True, alpha=0.7, color='orange', edgecolor='black')
ax3.axvline(lambda_mle, color='red', linestyle='-', linewidth=2, label=f'MLE: {lambda_mle:.4f}')
ax3.axvline(lambda_ci_lower, color='red', linestyle='--', alpha=0.7, label='95% CI')
ax3.axvline(lambda_ci_upper, color='red', linestyle='--', alpha=0.7)
ax3.set_xlabel(r'$\lambda$ (Failure Rate)')
ax3.set_ylabel('Density')
ax3.set_title(r'Bootstrap Distribution of $\lambda$')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: Cost comparison
scenarios = ['Optimistic', 'Best Estimate', 'Pessimistic']
costs_a = []
costs_b = []

for lam in lambda_scenarios:
    # With immediate replacement, expected failures per device = lam * time_horizon
    total_failures = num_devices * lam * time_horizon
    cost_a = total_failures * cost_individual
    bulk_sets = np.ceil(total_failures / num_devices)
    cost_b = bulk_sets * num_devices * cost_bulk
    costs_a.append(cost_a)
    costs_b.append(cost_b)

x_pos = np.arange(len(scenarios))
width = 0.35

ax4.bar(x_pos - width/2, costs_a, width, label='Option A (Individual)', color='lightcoral')
ax4.bar(x_pos + width/2, costs_b, width, label='Option B (Bulk)', color='lightblue')
ax4.set_xlabel('Scenario')
ax4.set_ylabel('Total Cost ($)')
ax4.set_title('Cost Comparison Across Scenarios')
ax4.set_xticks(x_pos)
ax4.set_xticklabels(scenarios)
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Summary and Conclusions

Based on our Maximum Likelihood Estimation analysis:

In [None]:
print("="*60)
print("BATTERY REPLACEMENT DECISION ANALYSIS - SUMMARY")
print("="*60)
print(f"Data: {len(battery_data)} battery failure times")
print(f"Model: Exponential distribution")
print(f"Estimated failure rate (lambda): {lambda_mle:.4f} failures/month")
print(f"Estimated mean battery life: {1/lambda_mle:.2f} months")
print(f"95% CI for mean life: [{1/lambda_ci_upper:.2f}, {1/lambda_ci_lower:.2f}] months")
print()
print("BUSINESS RECOMMENDATION:")
print(f"For {num_devices} devices over {time_horizon} months:")

# Final recommendation based on best estimate
# With immediate replacement, expected failures per device = lambda * time_horizon
total_failures_best = num_devices * lambda_mle * time_horizon
cost_a_best = total_failures_best * cost_individual
bulk_sets_best = np.ceil(total_failures_best / num_devices)
cost_b_best = bulk_sets_best * num_devices * cost_bulk

if cost_b_best < cost_a_best:
    print(f"✓ Choose OPTION B (Bulk Purchase)")
    print(f"  Expected cost: ${cost_b_best:.2f}")
    print(f"  Savings: ${cost_a_best - cost_b_best:.2f}")
else:
    print(f"✓ Choose OPTION A (Buy as Needed)")
    print(f"  Expected cost: ${cost_a_best:.2f}")
    print(f"  Savings: ${cost_b_best - cost_a_best:.2f}")

print()
print("KEY INSIGHTS:")
print("• Exponential model is a simple first choice; confirm the fit with the Q-Q plot and KS test")
print("• MLE gives us principled parameter estimation")
print("• Bootstrap provides uncertainty quantification")
print("• Decision is robust across reasonable parameter uncertainty")
print("="*60)

## Next Steps

In practice, you might want to:
1. Collect more data to reduce uncertainty
2. Consider other distributions (Weibull, Gamma) for comparison
3. Account for bulk purchase storage costs
4. Model seasonal variations in failure rates
5. Incorporate supplier reliability and lead times
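
As an illustration of item 2, the sketch below fits a Weibull distribution with `scipy.stats.weibull_min` and compares it with the exponential fit. Using AIC as the comparison criterion and fixing the Weibull location at zero are choices made for this example, not part of the analysis above:

```python
import numpy as np
from scipy import stats

# Same 20 failure times (months) as in the data cell
failure_times = np.array([
    2.1, 4.3, 1.8, 6.2, 3.4, 2.9, 5.1, 1.6, 4.8, 3.7,
    2.3, 5.9, 1.9, 4.1, 3.2, 6.8, 2.7, 4.5, 3.8, 5.3
])

# Exponential fit: one parameter (the rate)
lam_hat = 1 / failure_times.mean()
loglik_expon = np.sum(stats.expon.logpdf(failure_times, scale=1 / lam_hat))
aic_expon = 2 * 1 - 2 * loglik_expon

# Weibull fit: two parameters (shape and scale), location fixed at 0
shape_hat, _, scale_hat = stats.weibull_min.fit(failure_times, floc=0)
loglik_weibull = np.sum(
    stats.weibull_min.logpdf(failure_times, shape_hat, loc=0, scale=scale_hat)
)
aic_weibull = 2 * 2 - 2 * loglik_weibull

print(f"Exponential: lambda = {lam_hat:.4f}, AIC = {aic_expon:.2f}")
print(f"Weibull: shape = {shape_hat:.2f}, scale = {scale_hat:.2f}, AIC = {aic_weibull:.2f}")
```

A lower AIC indicates the better trade-off between fit and complexity. A Weibull shape near 1 reduces to the exponential model, while a shape above 1 indicates a failure rate that increases with battery age.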

This analysis demonstrates how statistical modeling and MLE can inform real business decisions with quantified uncertainty.