<a href="https://colab.research.google.com/github/artkula/ML-retreat-tekmek-2025/blob/main/linear_and_logistic_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning for Materials Science: Linear and Logistic Regression

---

## Learning Objectives

This notebook demonstrates machine learning applications in materials engineering through two case studies:

**Part 1: Linear Regression for Stress-Strain Modeling**
1. Feature importance analysis using R² for understanding material property influence
2. Polynomial regression for capturing elastic-plastic-necking transitions
3. Regularization techniques (L2/Ridge) to prevent overfitting
4. Bias-variance tradeoff in model complexity selection

**Part 2: Binary Classification for Multi-Axial Yield Prediction**
1. Von Mises yield criterion as a physics-based decision boundary
2. Logistic regression with polynomial features for nonlinear classification
3. Decision threshold optimization for engineering risk management
4. Model validation against theoretical predictions

**Key Principle**: Machine learning performance improves when informed by physical understanding of the material system.

---

## Table of Contents

**Part 1: Stress-Strain Modeling via Polynomial Regression**
- 1.1 Constitutive Model and Data Generation
- 1.2 Feature Importance Analysis (R²)
- 1.3 Linear Regression Framework
- 1.4 Underfitting: Linear Model Limitations
- 1.5 Regularization Effects
- 1.6 Polynomial Features and Overfitting
- 1.7 Interactive Feature Engineering

**Part 2: Multi-Axial Yield Prediction via Logistic Regression**
- 2.1 Multi-Axial Loading and Failure Criteria
- 2.2 Von Mises Yield Criterion (Plane Stress)
- 2.3 Experimental Data Generation
- 2.4 Interactive Yield Surface Exploration
- 2.5 Decision Boundary Optimization
- 2.6 Logistic Regression Theory
- 2.7 Model Training and Validation
- 2.8 Learned vs. Theoretical Boundaries
- 2.9 Threshold Selection for Risk Management

**Conclusions**: Key Insights and Engineering Applications

---

## Setup: Import Required Libraries

Before we begin, we need to import all the necessary Python libraries for data manipulation, visualization, machine learning, and interactive widgets.

In [1]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

# Optional seaborn
try:
    import seaborn as sns
except ImportError:
    pass  # Seaborn optional, matplotlib defaults are fine

# Machine learning
from sklearn.linear_model import LinearRegression, Ridge, LogisticRegression
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GroupShuffleSplit, cross_val_score
from sklearn.metrics import (
    mean_squared_error,
    r2_score,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    roc_curve,
    confusion_matrix,
    classification_report,
    log_loss
)

# Interactive widgets
import ipywidgets as widgets
from IPython.display import display, clear_output
from ipywidgets import interact

# Plotting configuration
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11
try:
    sns.set_style("whitegrid")
except NameError:
    pass  # sns is not imported if seaborn was not installed

# Set random seed for reproducibility
np.random.seed(42)

# Font configuration for Greek symbols and subscripts
plt.rcParams.update({
    "font.family": "DejaVu Sans",
    "mathtext.fontset": "dejavusans",
    "axes.unicode_minus": False
})

## Notation & Units

### Physical Quantities

| Symbol | Quantity | Units (code) | Units (display) |
|--------|----------|--------------|-----------------|
| $\sigma$ | Stress | Pa | MPa, GPa |
| $\varepsilon$ | Strain | dimensionless |  -  |
| $E$ | Young's modulus | Pa | GPa |
| $\sigma_y$ | Yield strength | Pa | MPa |
| $K$, $n$ | Hardening parameters | Pa, dimensionless | MPa,  -  |
| $C$ | Carbon content | wt% | wt% |
| $d$ | Grain size | μm | μm |

### Machine Learning Notation

| Symbol | Quantity |
|--------|----------|
| $\lambda$ | Ridge regularization parameter |
| $R^2$ | Coefficient of determination |
| RMSE | Root mean squared error |

### Units Policy

**Internal representation:** All stress values are stored in **Pa** (Pascals), so every calculation uses consistent SI units. They are converted to **MPa** (÷10⁶) or **GPa** (÷10⁹) only in plots and tables, which keeps printouts readable and avoids Pa/MPa mix-ups.

---

# Part 1: Polynomial Regression for Stress-Strain Modeling

---

## 1.1 Constitutive Model and Data Generation

### Engineering Context

Tensile testing provides the fundamental stress-strain relationship σ(ε) for material characterization. This relationship exhibits three distinct regimes:

**1. Elastic Region (ε < ε_y)**
$$\sigma = E \varepsilon$$

where E is Young's modulus and ε_y is the yield strain.

**2. Plastic Region (ε_y ≤ ε ≤ ε_u)**

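Power-law hardening (teaching form):
$$\sigma = \sigma_y + K(\varepsilon - \varepsilon_y)^n$$

**Note**: The classical Hollomon form is σ = K εₚⁿ, with no yield offset. This notebook uses the offset form σ = σᵧ + K(ε - εᵧ)ⁿ, which resembles the Ludwik form, for pedagogical clarity: the plastic branch starts exactly at σᵧ and therefore joins the elastic line continuously at ε = εᵧ.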

where:
- σ_y = yield strength
- K = strength coefficient
- n = strain hardening exponent (≈ 0.2-0.3 for low-carbon steel)
- ε_u = uniform elongation (necking initiation)

**3. Necking Region (ε > ε_u)**

Engineering stress decreases as deformation localizes:
$$\sigma = \sigma_u \exp[-\alpha(\varepsilon - \varepsilon_u)]$$

where σ_u is the ultimate tensile strength and α controls the softening rate.

> **⚠️ Pedagogical Model Warning**
>
> These coefficients are **scaled for teaching purposes only**. Do not use these values for design, certification, or engineering calculations. Consult materials handbooks and standards for real applications.
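
The three regimes above can be collected into one vectorized function. The following is a minimal sketch rather than the notebook's own generator: the function name `piecewise_stress` and all default parameter values are illustrative assumptions, and the necking branch starts from the stress that the hardening law reaches at ε_u (instead of an independently specified σ_u) so that the curve stays continuous.

```python
import numpy as np

def piecewise_stress(strain, E=210e9, sigma_y=250e6, K=600e6, n=0.25,
                     eps_u=0.15, alpha=20.0):
    """Teaching-form stress (Pa) for an array of engineering strains.

    All defaults are illustrative assumptions, not handbook values.
    """
    strain = np.asarray(strain, dtype=float)
    eps_y = sigma_y / E                          # yield strain from Hooke's law
    # Stress at the onset of necking, taken from the hardening law
    sigma_at_u = sigma_y + K * (eps_u - eps_y) ** n

    # Plastic strain, clipped at zero so the power law never sees negatives
    eps_p = np.clip(strain - eps_y, 0.0, None)

    elastic = E * strain
    plastic = sigma_y + K * eps_p ** n
    necking = sigma_at_u * np.exp(-alpha * (strain - eps_u))

    # Select the branch that applies to each strain value
    return np.where(strain < eps_y, elastic,
                    np.where(strain <= eps_u, plastic, necking))

eps_y_demo = 250e6 / 210e9
curve = piecewise_stress([0.0, eps_y_demo, 0.15, 0.18])
print(curve / 1e6)  # stresses in MPa
```

Evaluating the function just below and just above ε_y or ε_u is a quick way to confirm that the branches meet without a jump.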

### Machine Learning Challenge

**Goal**: Learn σ(ε) from experimental data without explicit knowledge of the piecewise model.

**Key Questions**:
1. Can polynomial regression capture this nonlinear, multi-regime behavior?
2. What polynomial degree balances accuracy and generalization?
3. How does regularization affect the learned relationship?
4. Which material parameters (if any) improve predictions from a single test?

---

Configure material properties and plotting parameters for all experiments in this notebook.

In [None]:
# =============================================================================
# CONFIGURATION: Adjust these parameters to explore different scenarios
# =============================================================================

# Material properties (low-carbon steel)
E = 210e9          # Young's modulus (Pa)
sigma_y_base = 250e6  # Base yield strength (Pa)
sigma_u = 400e6    # Ultimate tensile strength (Pa)

# Hardening parameters
K = 600e6          # Strength coefficient (Pa), i.e. 600 MPa; overridden in the data-generation cell
n = 0.15           # Strain hardening exponent (dimensionless)

# Necking parameters
alpha = 20.0       # Exponential softening rate (dimensionless)

# Material variability ranges
carbon_range = (0.05, 0.30)     # Carbon content range (wt%) – broadened to introduce more variability
grain_size_range = (10, 100)    # Grain size range (micrometers)

# Hall-Petch effect (toy coefficients for pedagogy)
base_strength = 200e6      # Pa
hall_petch_coeff = 50e6    # Pa·μm^0.5, applied as coeff / sqrt(d in μm); enlarged toy value
carbon_effect_coeff = 2e9  # Pa per wt% C; strong carbon dependence for demonstration

# Data generation
n_specimens = 30   # Number of synthetic specimens
n_points = 100     # Points per stress-strain curve
noise_level = 5e6  # Stress measurement noise (Pa)

# Train/test splitting
test_size = 0.25   # Fraction of data for testing
random_seed = 42   # Random seed for reproducibility

# Part 2: Failure criterion
sigma_y_failure = 250e6  # Yield strength for classification (Pa)
n_samples_failure = 300  # Number of synthetic stress states

# Unit conversion helpers (prevent Pa/MPa errors)
def MPa(x):
    """Convert Pa to MPa"""
    return np.asarray(x) / 1e6

def GPa(x):
    """Convert Pa to GPa"""
    return np.asarray(x) / 1e9

# Helper function for polynomial + ridge pipeline (used throughout Part 1)
def make_poly_ridge_pipeline(degree: int, alpha: float):
    """
    Create a pipeline for polynomial regression with ridge regularization.

    Parameters:
    - degree: Polynomial degree for feature expansion
    - alpha: Ridge regularization strength (L2 penalty)

    Returns:
    - Pipeline with polynomial features, standard scaling, and ridge regression
    """
    return Pipeline([
        ("poly",   PolynomialFeatures(degree=degree, include_bias=False)),
        ("scaler", StandardScaler()),
        ("ridge",  Ridge(alpha=float(alpha), fit_intercept=True, max_iter=10000))
    ])

print("Configuration loaded successfully.")
print(f"  Material: Low-carbon steel (E={E/1e9:.0f} GPa, σ_y={sigma_y_base/1e6:.0f} MPa)")
print(f"  Specimens: {n_specimens} with {n_points} points each")
print(f"  Random seed: {random_seed}")

### Data Generation and Visualization

We will generate synthetic stress-strain data that mimics tensile testing of low-carbon steel. The data covers the elastic, plastic, and necking regions.

In [None]:

# Material properties for low-carbon steel (using E from config)
sigma_y = 250e6 + np.random.normal(0, 20e6)   # Yield strength (Pa)
sigma_u = 400e6 + np.random.normal(0, 30e6) # Ultimate tensile strength (Pa)

# Override config values for Part 1 synthetic data generation
# Work hardening (Hollomon) parameters for teaching curve
K = sigma_u * 1.5
n = 0.25
epsilon_y = sigma_y / E
epsilon_u = 0.15 + np.random.normal(0, 0.01)  # uniform elongation (~ ultimate point)

# --- Strain sampling with extra density around yield and no duplicated endpoints ---
n_points = 150
n_elastic = int(n_points * 0.20)
n_knee    = int(n_points * 0.30)
n_plastic = int(n_points * 0.30)
n_neck    = n_points - (n_elastic + n_knee + n_plastic)

knee_hi = min(epsilon_u, epsilon_y * 1.12)

epsilon_elastic = np.linspace(0.0, epsilon_y, n_elastic, endpoint=False)
epsilon_knee    = np.linspace(epsilon_y, knee_hi,  n_knee, endpoint=False)
epsilon_plastic = np.linspace(knee_hi,  epsilon_u, n_plastic, endpoint=False)
epsilon_neck    = np.linspace(epsilon_u, epsilon_u * 1.20, n_neck, endpoint=True)

strain_data = np.concatenate([epsilon_elastic, epsilon_knee, epsilon_plastic, epsilon_neck])

# --- Clean piecewise truth σ_true(ε) ---
sigma_true = np.empty_like(strain_data)

mask_elastic = strain_data < epsilon_y
mask_plastic = (strain_data >= epsilon_y) & (strain_data <= epsilon_u)
mask_neck    = strain_data > epsilon_u

sigma_true[mask_elastic] = E * strain_data[mask_elastic]
eps_p = np.maximum(strain_data[mask_plastic] - epsilon_y, 0.0)
sigma_true[mask_plastic] = sigma_y + K * eps_p**n

sigma_at_u = sigma_y + K * np.maximum(epsilon_u - epsilon_y, 0.0)**n
sigma_true[mask_neck] = sigma_at_u * np.exp(-alpha * (strain_data[mask_neck] - epsilon_u))

# --- Add measurement noise ONLY to measured stress ---
noise = np.random.normal(0, 20e6, size=strain_data.size)
stress_noisy = sigma_true + noise

# DataFrame with properties and test conditions
df = pd.DataFrame({
    'strain': strain_data,
    'stress': stress_noisy,        # measured (noisy)
    'stress_true': sigma_true,     # clean reference
    'yield_strength': sigma_y,
    'uts': sigma_u,
    'epsilon_y': epsilon_y,
    'epsilon_u': epsilon_u,
    'carbon_content': 0.18,
    'grain_size_um': 40.0,
    'temperature_C': 23.0,
    'strain_rate': 1e-4,
    'surface_roughness': np.random.uniform(0.2, 0.6, strain_data.size),
    'specimen_thickness': 5.0,
    'specimen_width': 10.0,
    'heat_treatment': 'as_rolled'
})

# Smooth reference curve for plotting
strain_grid = np.linspace(0, strain_data.max(), 600)
sigma_ref = np.empty_like(strain_grid)

m_e = strain_grid < epsilon_y
m_p = (strain_grid >= epsilon_y) & (strain_grid <= epsilon_u)
m_n = strain_grid > epsilon_u

sigma_ref[m_e] = E * strain_grid[m_e]
sigma_ref[m_p] = sigma_y + K * np.maximum(strain_grid[m_p] - epsilon_y, 0.0)**n
sigma_ref[m_n] = sigma_at_u * np.exp(-alpha * (strain_grid[m_n] - epsilon_u))

fit_curve = pd.DataFrame({'strain_grid': strain_grid, 'sigma_eng_grid': sigma_ref})

# --- Visualization ---
plt.figure(figsize=(10, 6))
plt.scatter(df['strain'], df['stress'] / 1e6, color='orange', alpha=0.6, edgecolors='k', s=26, label='Measured data')
plt.plot(fit_curve['strain_grid'], fit_curve['sigma_eng_grid'] / 1e6, 'b-', linewidth=2, alpha=0.85, label='True behavior')

plt.axvline(epsilon_y, color='g', linestyle='--', alpha=0.7, label=f'Yield (ε_y={epsilon_y:.4f})')
plt.axvline(epsilon_u, color='r', linestyle='--', alpha=0.7, label=f'Ultimate (ε_u={epsilon_u:.3f})')
plt.axhline(sigma_y / 1e6, color='g', linestyle=':', alpha=0.5)
plt.axhline(sigma_at_u / 1e6, color='r', linestyle=':', alpha=0.5)  # peak stress actually reached by the generated curve

plt.xlabel("Strain (dimensionless)")
plt.ylabel("Engineering Stress (MPa)")
plt.title("Stress vs Strain: clean truth vs noisy observations")
plt.legend(fontsize=9, loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()


---

## 1.2 Feature Importance Analysis Using R²

### Motivation

Before building complex models, quantify which parameters influence stress. The coefficient of determination (R²) measures the proportion of variance in the target that a model explains; here each model is a linear fit on a single feature:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where:
- $y_i$ = measured stress (target variable)
- $\hat{y}_i$ = stress predicted by linear regression using one feature
- $\bar{y}$ = mean of measured stress values

**Interpretation**:
- R² = 1: Feature perfectly predicts target
- R² = 0: Feature no better than mean prediction
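- R² < 0: Model predicts worse than the mean. This cannot happen for an in-sample least-squares fit with an intercept, but it can for a fixed (non-fitted) line or on held-out data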

### Expected Results

For a **single tensile test** (constant material properties):
- **Strain**: Should dominate (R² ≈ 0.5-0.7) - stress fundamentally depends on strain
- **Material parameters** (C content, grain size): R² ≈ 0 - constant within one test
- **Geometry** (thickness, width): R² ≈ 0 - normalized stress independent of dimensions

To assess material parameter importance, we would need data from **multiple specimens** with varying composition and microstructure.
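
A small sketch makes the "constant feature gives R² ≈ 0" statement concrete. It uses made-up numbers (the arrays and the helper `r2_manual` are illustrative assumptions, not the notebook's data): a least-squares fit on a constant column can only learn the intercept, so it predicts the mean and the residual sum of squares equals the total sum of squares.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def r2_manual(y, y_hat):
    """R² = 1 - SS_res / SS_tot, written out from the definition."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
strain_toy = np.linspace(0.0, 0.1, 50)
stress_toy = 200.0 * np.sqrt(strain_toy) + rng.normal(0.0, 2.0, 50)  # arbitrary units
carbon_toy = np.full(50, 0.25)   # constant within a single test

r2_by_feature = {}
for name, col in {"strain": strain_toy, "carbon": carbon_toy}.items():
    X = col.reshape(-1, 1)
    pred = LinearRegression().fit(X, stress_toy).predict(X)
    r2_by_feature[name] = r2_manual(stress_toy, pred)
    # The hand-written formula should agree with scikit-learn
    assert np.isclose(r2_by_feature[name], r2_score(stress_toy, pred))

print(r2_by_feature)
```

The strain entry is large because stress changes with strain, while the carbon entry is zero up to floating-point error even though carbon would matter across different specimens.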

---

In [None]:
# Generate example data for R² demonstration
x_demo = np.linspace(0, 10, 10)
true_slope = 2.0
true_intercept = 1.0
y_demo = true_slope * x_demo + true_intercept + np.random.normal(scale=2.0, size=len(x_demo))
y_mean_demo = np.mean(y_demo)

# Rotation center
x_mid = (x_demo.min() + x_demo.max()) / 2
y_mid = true_slope * x_mid + true_intercept

def plot_rotated_line(slope):
    """Interactive function to show how R² changes with model slope."""
    clear_output(wait=True)

    # Rotated model prediction
    y_model = slope * (x_demo - x_mid) + y_mid

    # Compute R² score
    r2 = r2_score(y_demo, y_model)

    # Plotting
    plt.figure(figsize=(10, 6))
    plt.scatter(x_demo, y_demo, label='Data', color='orange', s=80, zorder=3, edgecolors='k')
    plt.plot(x_demo, y_model, label=f'Model (Slope = {slope:.2f})', color='blue', linewidth=2)

    # Vertical residuals
    for xi, yi_data, yi_model in zip(x_demo, y_demo, y_model):
        plt.plot([xi, xi], [yi_data, yi_model], 'r--', linewidth=1, alpha=0.6)

    # Mean line
    plt.axhline(y_mean_demo, color='green', linestyle=':', linewidth=2,
                label=f'Mean = {y_mean_demo:.2f}')

    # Residual indicator
    plt.plot([], [], 'r--', linewidth=1, label='Residuals')

    plt.xlabel("Input Feature", fontsize=12)
    plt.ylabel("Output", fontsize=12)
    plt.title(f"R² Score: {r2:.3f}\n(Higher is better, max = 1.0)",
              fontsize=14, fontweight='bold')
    plt.legend(fontsize=10, loc='best')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

    if r2 > 0.9:
        print("✅ Excellent fit! R² > 0.9 indicates the model explains >90% of variance.")
    elif r2 > 0.7:
        print("🟢 Good fit! R² > 0.7 indicates the model captures most of the pattern.")
    elif r2 > 0.3:
        print("🟡 Moderate fit. R² > 0.3 indicates some predictive power.")
    else:
        print("🔴 Poor fit. R² < 0.3 indicates little predictive power.")

# Interactive slider
slope_slider = widgets.FloatSlider(
    value=true_slope, min=-5.0, max=5.0, step=0.1,
    description='Slope:', continuous_update=False, style={'description_width': 'initial'}
)

print("📊 Interactive R² Demonstration")
print("   Use the slider to adjust the model slope and observe how R² changes.")
print("   Try to maximize R² by finding the best fit!")
print()
interact(plot_rotated_line, slope=slope_slider)

### Computing R² for Material Parameters

Now we will calculate R² for each material parameter to see which ones best predict stress. From materials science, we know that certain parameters should matter more than others:

- **Strain**: Should strongly correlate with stress (fundamental stress-strain relationship)
- **Carbon content**: Affects strength through solid solution strengthening
- **Grain size**: Hall-Petch relationship (strength ∝ 1/√d)
- **Temperature**: Higher temperatures reduce strength
- **Specimen geometry**: Should have minimal effect on normalized stress

We will also create some engineered features based on physical principles:
- **grain_size_inv_sqrt**: 1/√(grain_size) for Hall-Petch relationship
- **temp_deviation**: Temperature deviation from room temperature

---

**R² with Different Units (quick note)**

- **R² is unitless and scale-invariant.** Rescaling a feature (e.g., converting strain from fraction to %) **does not change R²**.

- **Why this matters:** You can **compare univariate R² across features with different units**. R² measures the fraction of variance explained, not a per-unit effect, so the R² obtained with strain expressed as a fraction (e.g., 0.05) is identical to the R² obtained with strain expressed in percent (5%).

- **If "influence per typical change" is needed:** Standardize inputs to z-scores (mean=0, std=1) and compare **standardized coefficients**.
  - **Why z-scores?** They put all features on the same scale (1 unit = 1 standard deviation), making coefficients directly comparable.
  - **Standardized coefficients** show the change in outcome per 1-SD change in predictor, revealing which variable has more impact for its typical variation range.
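
Both points can be checked numerically. The sketch below uses a made-up linear response (all coefficients, ranges, and variable names are illustrative assumptions): it refits the same univariate model with strain in fractions and in percent, then compares raw and standardized coefficients from a two-feature fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_obs = 200
strain_frac = rng.uniform(0.0, 0.1, n_obs)        # strain as a fraction
carbon_wt = rng.uniform(0.05, 0.30, n_obs)        # carbon in wt%
# Made-up linear response in MPa, for illustration only
stress_mpa = 3000.0 * strain_frac + 400.0 * carbon_wt + rng.normal(0.0, 10.0, n_obs)

def univariate_r2(x, y):
    """In-sample R² of a one-feature least-squares fit."""
    X = np.asarray(x).reshape(-1, 1)
    return LinearRegression().fit(X, y).score(X, y)

r2_fraction = univariate_r2(strain_frac, stress_mpa)
r2_percent = univariate_r2(100.0 * strain_frac, stress_mpa)   # same data in %

# Raw coefficients depend on units; standardized ones are per 1 SD of each input
X_raw = np.column_stack([strain_frac, carbon_wt])
raw_coef = LinearRegression().fit(X_raw, stress_mpa).coef_
X_std = StandardScaler().fit_transform(X_raw)
std_coef = LinearRegression().fit(X_std, stress_mpa).coef_

print(f"R² (fraction) = {r2_fraction:.6f}, R² (percent) = {r2_percent:.6f}")
print("raw coefficients:         ", raw_coef)
print("standardized coefficients:", std_coef)
```

Each standardized coefficient equals the raw coefficient multiplied by that feature's standard deviation, which is why it reflects the effect of a typical change rather than of one unit.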



In [None]:
# Create engineered features based on physical principles
df['grain_size_inv_sqrt'] = 1 / np.sqrt(df['grain_size_um'])
df['temp_deviation'] = abs(df['temperature_C'] - 23.0)

# Select features to evaluate
features_to_test = [
    'strain',
    'carbon_content',
    'grain_size_um',
    'grain_size_inv_sqrt',
    'temperature_C',
    'temp_deviation',
    'strain_rate',
    'surface_roughness',
    'specimen_thickness',
    'specimen_width'
]

# Calculate R² for each feature
r2_scores = {}
for feature in features_to_test:
    if feature in df.columns:
        X = df[[feature]].values.reshape(-1, 1)
        y = df['stress'].values

        # Fit simple linear regression
        model = LinearRegression()
        model.fit(X, y)
        y_pred = model.predict(X)

        # Calculate R²
        r2 = r2_score(y, y_pred)
        r2_scores[feature] = r2

# Sort by R² score
r2_df = pd.DataFrame(list(r2_scores.items()), columns=['Feature', 'R² Score'])
r2_df = r2_df.sort_values('R² Score', ascending=False)

print("📊 R² Scores for Material Parameters")
print("=" * 50)
print(r2_df.to_string(index=False))
print("=" * 50)
print("\n💡 Interpretation:")
print("   - strain shows dominant R² because stress fundamentally depends on strain")
print("   - Material properties (carbon, grain size) have much lower R²")
print("   - This is expected: during a SINGLE test, material properties are constant")
print("   - To see their effect, we would need data from MULTIPLE specimens")
print("   - Specimen geometry and surface roughness have minimal predictive power")

### Visualizing R² Scores

We will create a bar chart to visualize the R² scores and see which features matter most.

In [None]:
plt.figure(figsize=(12, 6))
# Calculate dynamic x-axis limits to accommodate negative R²
xmin = min(-0.2, r2_df['R² Score'].min() - 0.05)
xmax = 1.0

colors = ['green' if x > 0.5 else 'orange' if x > 0.1 else 'red' for x in r2_df['R² Score']]
bars = plt.barh(r2_df['Feature'], r2_df['R² Score'], color=colors, edgecolor='black', alpha=0.8)

# Add value labels on bars
for i, (feature, score) in enumerate(zip(r2_df['Feature'], r2_df['R² Score'])):
    plt.text(score + 0.01, i, f'{score:.3f}', va='center', fontsize=10, fontweight='bold')

plt.xlabel('R² Score (Higher = Better Predictor)', fontsize=12)
plt.ylabel('Material Parameter', fontsize=12)
plt.title('Feature Importance: R² Scores for Predicting Stress\n(Single Tensile Test Data)',
          fontsize=14, fontweight='bold')
plt.xlim([xmin, xmax])
plt.axvline(0.5, color='black', linestyle='--', alpha=0.3, linewidth=1)
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n🎯 Key Insight:")
print("   Strain dominates because stress is fundamentally a function of strain.")
print("   Other material parameters are constant in this single test.")
print("   To assess their importance, we would need multi-specimen testing.")

---

## 1.2.1 The Critical Limitation: Why Material Parameters Show R² ≈ 0

### The Problem

The R² analysis above reveals a puzzling result: only strain matters. Material properties (carbon content, grain size) show R² ≈ 0.

**Is this correct?** Do carbon and grain size really not affect stress?

**No!** This is a **data collection artifact**, not a physical truth.

### Why This Happens

In a **single tensile test**:
- Specimen has ONE carbon content (e.g., 0.18%)
- Specimen has ONE grain size (e.g., 40 μm)
- These values are **constant** throughout the test
- ML algorithm sees: strain varies (0 → 0.2), stress varies (0 → 400 MPa)
- ML algorithm sees: carbon = 0.18% always, stress varies
- Conclusion: carbon does not predict stress (R² = 0)

**This is wrong!** Carbon DOES affect strength, but we need **multiple specimens** to see it.

### The Physical Reality

Yield strength depends on microstructure:

**Carbon content effect** (Solid solution strengthening):
$$\Delta \sigma_y^{\text{carbon}} \approx 2000 \text{ MPa} \times C_{\text{wt\%}}$$

Example: 0.20% C adds ~400 MPa compared to pure Fe

**Grain size effect** (Hall-Petch relationship):
$$\Delta \sigma_y^{\text{grain}} = k_y d^{-1/2}$$

where d = grain diameter, k_y ≈ 0.6 MPa·m^(1/2) for steel

Example: with k_y = 0.6 MPa·m^(1/2), 20 μm grains contribute ≈134 MPa and 80 μm grains ≈67 MPa, so refining the grain size from 80 μm to 20 μm adds ≈67 MPa.

**Combined effect on stress-strain curve**:
$$\sigma_y(C, d) = \sigma_{y,\text{base}} + 2000C + \frac{k_y}{\sqrt{d}}$$

This shifts the entire stress-strain curve vertically!
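
The arithmetic behind these examples is easy to reproduce. The sketch below simply evaluates the two strengthening terms with the coefficients quoted above; the helper name `hall_petch_mpa` is an illustrative assumption, and the only subtlety is converting the grain size from μm to m before taking the square root.

```python
import numpy as np

k_y = 0.6              # Hall-Petch coefficient, MPa·m^0.5 (value quoted above)
carbon_coeff = 2000.0  # MPa per wt% C (value quoted above)

def hall_petch_mpa(d_um):
    """Grain-size strengthening k_y / sqrt(d) in MPa, with d given in μm."""
    d_m = np.asarray(d_um, dtype=float) * 1e-6   # μm -> m
    return k_y / np.sqrt(d_m)

fine = float(hall_petch_mpa(20.0))     # 20 μm grains
coarse = float(hall_petch_mpa(80.0))   # 80 μm grains
carbon_term = carbon_coeff * 0.20      # 0.20 wt% C

print(f"Hall-Petch term, 20 μm grains: {fine:.1f} MPa")
print(f"Hall-Petch term, 80 μm grains: {coarse:.1f} MPa")
print(f"Difference (20 μm vs 80 μm):   {fine - coarse:.1f} MPa")
print(f"Carbon term at 0.20 wt% C:     {carbon_term:.0f} MPa")
```

Because the term scales with d^(-1/2), a fourfold reduction in grain size doubles the Hall-Petch contribution.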

### The Solution: Multi-Specimen Testing

To quantify material parameter effects, we need:
- **Multiple specimens** with varying C and d
- Each specimen produces a stress-strain curve
- ML can now see: high C → higher curve, small d → higher curve

**We will demonstrate this.**

---


### The Single-Specimen Limitation

**Critical Observation**: Material parameters show R² ≈ 0 above.

**Why?** They are constant in a single test. We need multiple specimens.

---


### Multi-Specimen Analysis: What to Expect

**Experimental design**:
- Generate 30 specimens with:
  - Carbon content: 0.05 to 0.30 wt% (6× range, set by `carbon_range`)
  - Grain size: 10 to 100 μm (10× range, set by `grain_size_range`)
- Each specimen: ~50 strain measurements
- Total: ~1500 data points

**Expected stress-strain behavior**:
- Higher C → Higher σ_y → Entire curve shifts up
- Smaller d → Higher σ_y → Entire curve shifts up
- Elastic modulus E: Held constant at 210 GPa (it varies little with composition in steels)
- Uniform elongation ε_u: Similar across specimens (≈0.20 in the generator below)

**Expected R² results**:
- Strain: Still dominant (R² ≈ 0.7) - fundamental σ-ε relationship
- Carbon content: NOW non-zero (R² ≈ 0.2-0.4)
- Grain size (1/√d): NOW non-zero (R² ≈ 0.1-0.3)
- Combined model: Even better

**Key learning**: Material parameters only emerge with multi-specimen data!
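
The "combined model" expectation can be checked with a multi-feature fit. The following is a hedged sketch on a small stand-in dataset, generated here with made-up coefficients; the DataFrame `toy` and its simplified plastic-only curves are illustrative assumptions, not the notebook's `df_multi`. For in-sample ordinary least squares, adding features can never lower R², so the combined score is at least as large as each univariate score.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
rows = []
for spec_id in range(30):
    carbon = rng.uniform(0.05, 0.30)           # wt%
    grain_um = rng.uniform(10.0, 100.0)        # μm
    # Toy yield strength in MPa (coefficients are illustrative assumptions)
    sigma_y = 200.0 + 2000.0 * carbon + 50.0 / np.sqrt(grain_um)
    for eps in np.linspace(0.01, 0.20, 20):    # plastic range only, for simplicity
        stress = sigma_y + 600.0 * eps ** 0.25 + rng.normal(0.0, 5.0)
        rows.append({"specimen_id": spec_id, "strain": eps,
                     "carbon_content": carbon,
                     "grain_size_inv_sqrt": 1.0 / np.sqrt(grain_um),
                     "stress": stress})
toy = pd.DataFrame(rows)

features = ["strain", "carbon_content", "grain_size_inv_sqrt"]
y = toy["stress"].to_numpy()

# Univariate in-sample R² for each feature
r2_single = {f: LinearRegression().fit(toy[[f]], y).score(toy[[f]], y)
             for f in features}
# All three features together
r2_combined = LinearRegression().fit(toy[features], y).score(toy[features], y)

for f, r2 in r2_single.items():
    print(f"{f:>22s}: R² = {r2:.3f}")
print(f"{'combined':>22s}: R² = {r2_combined:.3f}")
```

These are in-sample scores, so they say nothing about generalization. When scoring on held-out data, splitting by `specimen_id` keeps points from one curve out of both the training and test sets.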

---


In [None]:
# Generate 30 tensile tests with varying material properties
# Necking softening rate for the multi-specimen curves.
# NOTE: this overrides the config value (alpha = 20.0) used for the single-specimen curve.
alpha = 10.0
np.random.seed(42)

n_specimens = 30
n_points_per_test = 50
all_data = []

for spec_id in range(n_specimens):
    carbon = np.random.uniform(carbon_range[0], carbon_range[1])  # wt%
    grain_size = np.random.uniform(grain_size_range[0], grain_size_range[1])  # μm

    # Toy strengthening terms (base_strength and coefficients come from the configuration cell)
    carbon_effect = carbon * carbon_effect_coeff
    hall_petch = hall_petch_coeff / np.sqrt(grain_size)

    sigma_y_spec = base_strength + carbon_effect + hall_petch  # yield strength influenced by C and grain size
    sigma_u_spec = sigma_y_spec * 1.5  # ultimate taken as 1.5× yield

    E_spec = 210e9
    K_spec = sigma_u_spec * 1.2  # adjust K for moderate hardening
    n_spec = 0.25
    eps_y_spec = sigma_y_spec / E_spec
    eps_u_spec = 0.20 + np.random.normal(0, 0.015)  # uniform elongation with variation

    strains = np.linspace(0, eps_u_spec * 1.1, n_points_per_test)

    for eps in strains:
        if eps < eps_y_spec:
            stress = E_spec * eps
        elif eps <= eps_u_spec:
            eps_p = eps - eps_y_spec
            stress = sigma_y_spec + K_spec * eps_p**n_spec
        else:
            # Stress reached at necking onset (local name avoids overwriting the global sigma_u)
            sigma_peak = sigma_y_spec + K_spec * (eps_u_spec - eps_y_spec)**n_spec
            stress = sigma_peak * np.exp(-alpha * (eps - eps_u_spec))

        stress += np.random.normal(0, noise_level)

        all_data.append({
            'specimen_id': spec_id,
            'strain': eps,
            'stress': stress,
            'carbon_content': carbon,
            'grain_size_um': grain_size,
            'grain_size_inv_sqrt': 1 / np.sqrt(grain_size),
            'yield_strength': sigma_y_spec
        })

df_multi = pd.DataFrame(all_data)

print(f"Generated {n_specimens} specimens, {len(df_multi)} total points")
print(f"Carbon: {df_multi['carbon_content'].min():.2f}-{df_multi['carbon_content'].max():.2f}%")
print(f"Yield: {df_multi['yield_strength'].min()/1e6:.0f}-{df_multi['yield_strength'].max()/1e6:.0f} MPa")

Visualize all 30 stress-strain curves to observe material variability across specimens.

In [None]:
# Visualize the 30 stress-strain curves
plt.figure(figsize=(12, 7))

# One shared normalization so line colors and the colorbar use the same scale
carbon_norm = plt.Normalize(vmin=carbon_range[0], vmax=carbon_range[1])

for spec_id in range(n_specimens):
    spec_data = df_multi[df_multi['specimen_id'] == spec_id]
    carbon = spec_data['carbon_content'].iloc[0]

    plt.plot(spec_data['strain'], spec_data['stress'] / 1e6,
             alpha=0.6, linewidth=1.5,
             color=plt.cm.viridis(carbon_norm(carbon)))

plt.xlabel('Strain', fontsize=12)
plt.ylabel('Stress (MPa)', fontsize=12)
plt.title(f'{n_specimens} Tensile Tests: Varying Carbon Content and Grain Size',
          fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)

# Add colorbar (same normalization as the line colors)
sm = plt.cm.ScalarMappable(cmap=plt.cm.viridis, norm=carbon_norm)
sm.set_array([])
cbar = plt.colorbar(sm, ax=plt.gca())
cbar.set_label('Carbon Content (wt%)', fontsize=11)

plt.tight_layout()
plt.show()

print("\n📊 Observation:")
print("   Higher carbon → Higher yield strength → Shifted curves")
print("   This variation allows ML to learn material parameter effects")

Compute R² scores for each feature across multiple specimens to quantify their predictive power.

In [None]:
# R² analysis on multi-specimen dataset
features_multi = ['strain', 'carbon_content', 'grain_size_um', 'grain_size_inv_sqrt']

r2_scores_multi = {}
for feature in features_multi:
    X = df_multi[[feature]].values
    y = df_multi['stress'].values

    model = LinearRegression()
    model.fit(X, y)
    y_pred = model.predict(X)
    r2_scores_multi[feature] = r2_score(y, y_pred)

r2_df_multi = pd.DataFrame(list(r2_scores_multi.items()),
                           columns=['Feature', 'R² Score'])
r2_df_multi = r2_df_multi.sort_values('R² Score', ascending=False)

# Comparison plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Single specimen (from earlier r2_df)
colors1 = ['green' if x > 0.5 else 'orange' if x > 0.1 else 'red'
           for x in r2_df['R² Score']]
ax1.barh(r2_df['Feature'], r2_df['R² Score'], color=colors1,
         edgecolor='black', alpha=0.8)
ax1.set_xlabel('R² Score', fontsize=11)
ax1.set_title('Single Specimen\n(Constant properties)',
             fontsize=12, fontweight='bold')
ax1.set_xlim([0, 1])
ax1.grid(axis='x', alpha=0.3)

# Multi specimen
colors2 = ['green' if x > 0.5 else 'orange' if x > 0.1 else 'red'
           for x in r2_df_multi['R² Score']]
ax2.barh(r2_df_multi['Feature'], r2_df_multi['R² Score'], color=colors2,
         edgecolor='black', alpha=0.8)
ax2.set_xlabel('R² Score', fontsize=11)
ax2.set_title(f'{n_specimens} Specimens\n(Varying properties)',
             fontsize=12, fontweight='bold')
ax2.set_xlim([0, 1])
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("KEY INSIGHT: The Single-Specimen Trap")
print("="*70)

# Get strain R² from single specimen
strain_r2_single = r2_df[r2_df['Feature']=='strain']['R² Score'].values[0]

print(f"\nSingle specimen:")
print(f"  Strain R² = {strain_r2_single:.3f}")
print(f"  Material params R² ≈ 0 (constant in single test)")

print(f"\n{n_specimens} specimens:")
print(f"  Strain R² = {r2_scores_multi['strain']:.3f}")
print(f"  Carbon R² = {r2_scores_multi['carbon_content']:.3f}")
print(f"  Grain size R² = {r2_scores_multi['grain_size_inv_sqrt']:.3f}")

print("\n💡 Material parameters only emerge with multi-specimen data!")
print("="*70)

---

## 1.3 Linear Regression Framework

### Model Definition

Linear regression fits a hyperplane to minimize prediction error:

$$h_\theta(x) = \theta_0 + \sum_{j=1}^{n} \theta_j x_j$$

**Note**: "Linear" refers to parameters θ, not features x. We can model nonlinear relationships using polynomial features: x → [x, x², x³, ...].

### Cost Function

Half the mean squared error (MSE) quantifies model performance; the factor 1/2 is a convention that cancels when differentiating:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$

### Optimization via Gradient Descent

Iteratively update parameters to minimize J(θ):

$$\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}$$

where:
- α = learning rate (step size); unrelated to the necking softening rate α of Section 1.1
- $\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$

**Learning rate selection**:
- Too small → slow convergence
- Too large → oscillation or divergence
- Optimal → rapid convergence to global minimum (convex J(θ))
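
For this quadratic cost, "too large" has a precise meaning: the Hessian of J(θ) is XᵀX/m, and batch gradient descent is stable only when α < 2/λ_max, where λ_max is the largest eigenvalue of that matrix. The sketch below illustrates the threshold on made-up data; the variable names, the zero starting point, and the iteration count are assumptions chosen for the demonstration, not the notebook's interactive implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 2.0, x.size)   # made-up demo data
X = np.column_stack([np.ones_like(x), x])           # columns: 1 (for θ0), x (for θ1)
m = x.size

def cost(theta):
    """J(θ) = (1 / 2m) * sum of squared errors."""
    err = X @ theta - y
    return float(err @ err) / (2.0 * m)

def gradient_descent(lr, n_iters=200):
    """Batch gradient descent from θ = 0; returns the final θ."""
    theta = np.zeros(2)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m            # ∂J/∂θ from the formula above
        theta = theta - lr * grad
    return theta

# Stability limit of the step size for this quadratic cost
lam_max = np.linalg.eigvalsh(X.T @ X / m).max()
lr_limit = 2.0 / lam_max

cost_start = cost(np.zeros(2))
cost_stable = cost(gradient_descent(0.5 * lr_limit))
cost_unstable = cost(gradient_descent(1.05 * lr_limit))

print(f"largest stable learning rate ≈ {lr_limit:.4f}")
print(f"J at start: {cost_start:.3g}, lr=0.5·limit: {cost_stable:.3g}, "
      f"lr=1.05·limit: {cost_unstable:.3g}")
```

Because λ_max grows with the magnitude of the inputs, rescaling or standardizing x changes the usable range of α.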

---

### What to Expect

**What we will do**:
- Interactively adjust learning rate α
- Watch cost function J(θ) decrease over iterations
- Observe convergence behavior

**Expected outcomes**:
- α too small → Slow, steady convergence
- α optimal → Rapid convergence in ~20-50 iterations
- α too large → Oscillation or divergence

**Goal**: Find α that balances speed and stability.

---


### Interactive Gradient Descent Demonstration

We will visualize how gradient descent works by watching the algorithm find the optimal parameters. First, set the learning rate using the slider below.

In [None]:
# Generate demonstration data for gradient descent
x_gd = np.linspace(0, 10, 20)
y_gd = 2.0 * x_gd + 1.0 + np.random.normal(0, 2.0, size=len(x_gd))

def run_gradient_descent(alpha, max_iters=100):
    """Simulate gradient descent for simple linear regression, with overflow/NaN guards.
    Keeps return signature unchanged and preserves list lengths for plotting."""
    # Local fixed-seed generator: same starting point on every slider move,
    # without resetting NumPy's global random state as a side effect
    rng = np.random.RandomState(42)
    theta0 = float(rng.randn())
    theta1 = float(rng.randn())

    m = len(x_gd)

    cost_history = []
    theta0_history = [theta0]
    theta1_history = [theta1]

    # Soft ceiling to avoid axis/ticker overflows downstream
    CLIP_MAX = 1e20

    for iteration in range(max_iters):
        # Predictions
        y_pred = theta0 + theta1 * x_gd

        # Cost (MSE/2) using stable dot product
        err = y_pred - y_gd
        cost = float((err @ err) / (2.0 * m))

        # Guard against non-finite or absurdly large costs
        if (not np.isfinite(cost)) or (cost <= 0.0) or (cost > CLIP_MAX):
            cost_history.append(np.nan)
            # pad remaining to keep plotting code happy
            remaining = max_iters - (iteration + 1)
            if remaining > 0:
                cost_history.extend([np.nan] * remaining)
                theta0_history.extend([theta0] * remaining)
                theta1_history.extend([theta1] * remaining)
            break

        cost_history.append(cost)

        # Gradients
        grad_theta0 = float(err.mean())
        grad_theta1 = float((err * x_gd).mean())

        # Update steps with basic sanity checks
        step0 = alpha * grad_theta0
        step1 = alpha * grad_theta1

        if (not np.isfinite(step0)) or (not np.isfinite(step1)) \
           or (abs(step0) > CLIP_MAX) or (abs(step1) > CLIP_MAX):
            # Divergence detected; pad and stop
            remaining = max_iters - (iteration + 1)
            if remaining > 0:
                cost_history.extend([np.nan] * remaining)
                theta0_history.extend([theta0] * remaining)
                theta1_history.extend([theta1] * remaining)
            break

        # Parameter update
        theta0 = float(theta0 - step0)
        theta1 = float(theta1 - step1)

        theta0_history.append(theta0)
        theta1_history.append(theta1)

    return theta0, theta1, cost_history, theta0_history, theta1_history


# Interactive gradient descent with live cost visualization
def interactive_gd_with_plot(alpha_log):
    alpha = 10**alpha_log
    clear_output(wait=True)

    # Run gradient descent
    theta0_final, theta1_final, cost_history, theta0_hist, theta1_hist = run_gradient_descent(alpha, max_iters=100)

    # Create figure with 2 subplots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

    # Left: Cost vs iteration
    valid_costs = [c for c in cost_history if np.isfinite(c) and c > 0]
    if len(valid_costs) > 0:
        iterations = range(len(valid_costs))
        ax1.plot(iterations, valid_costs, 'b-', linewidth=2, marker='o', markersize=4)
        ax1.set_xlabel('Iteration', fontsize=12)
        ax1.set_ylabel('Cost J(θ)', fontsize=12)
        ax1.set_title(f'Cost Function Convergence\nLearning Rate α = {alpha:.4f}',
                      fontsize=13, fontweight='bold')
        ax1.grid(True, alpha=0.3)

        # Use log scale if cost spans more than two orders of magnitude
        # (covers both a large decrease and a large increase)
        if len(valid_costs) > 1 and max(valid_costs) / min(valid_costs) > 100:
            ax1.set_yscale('log')

        # Check convergence status
        # First check for divergence (NaN appeared OR cost exploded)
        if len(valid_costs) < len(cost_history):
            status = "✗ DIVERGED"
            color = 'red'
        # Check if cost exploded (increased by 100x from minimum)
        elif len(valid_costs) > 10:
            min_cost = min(valid_costs)
            if valid_costs[-1] > min_cost * 100:
                status = "✗ DIVERGED"
                color = 'red'
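            # Check for good convergence: cost fell below 1% of its start, or it
            # at least halved and then flattened (with noisy data the minimum
            # cost stays above zero, so the 1% target can be unreachable)
            elif (valid_costs[-1] < valid_costs[0] * 0.01) or (
                    valid_costs[-1] < valid_costs[0] * 0.5
                    and abs(valid_costs[-2] - valid_costs[-1]) < 1e-4 * valid_costs[-1]):
                status = "✓ CONVERGED"
                color = 'green'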
            # Ran full iterations but didn't converge well
            elif len(valid_costs) >= 100:
                status = "⚠ SLOW"
                color = 'orange'
            # Stopped early without divergence
            else:
                status = "⚠ STOPPED EARLY"
                color = 'orange'
        # Too few iterations to determine
        else:
            status = "⚠ STOPPED EARLY"
            color = 'orange'

        ax1.text(0.5, 0.95, status, transform=ax1.transAxes,
                 fontsize=14, fontweight='bold', color=color,
                 ha='center', va='top',
                 bbox=dict(boxstyle='round,pad=0.5', facecolor='white', edgecolor=color, linewidth=2))

    # Right: Data with fitted line
    ax2.scatter(x_gd, y_gd, color='orange', s=80, edgecolors='k',
                zorder=3, label='Data', alpha=0.7)

    if np.isfinite(theta0_final) and np.isfinite(theta1_final):
        x_line = np.linspace(x_gd.min(), x_gd.max(), 100)
        y_line = theta0_final + theta1_final * x_line
        ax2.plot(x_line, y_line, 'g-', linewidth=3, alpha=0.8,
                 label=f'Fit: y = {theta1_final:.2f}x + {theta0_final:.2f}')

    ax2.set_xlabel('x', fontsize=12)
    ax2.set_ylabel('y', fontsize=12)
    ax2.set_title('Fitted Line', fontsize=13, fontweight='bold')
    ax2.legend(fontsize=10)
    ax2.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Print summary
    print(f"Learning rate α = {alpha:.4f}")
    if len(valid_costs) > 0:
        print(f"Final cost: {valid_costs[-1]:.2f}")
        print(f"Iterations: {len(valid_costs)}")
        print(f"Cost reduction: {(1 - valid_costs[-1]/valid_costs[0])*100:.1f}%")
    else:
        print("DIVERGED - No valid cost values")

# Learning-rate slider on a log scale.
# With x_gd spanning 0 to 10, the largest eigenvalue of the cost Hessian (1/m)·XᵀX
# is about 35, so gradient descent is stable only for α < 2/35 ≈ 0.057.
slider_alpha = widgets.FloatSlider(
    value=-1.5, min=-4, max=-0.5, step=0.1,
    description='log₁₀(α):',
    continuous_update=False,
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='600px')
)

print("🎮 Interactive Gradient Descent with Live Cost Plot")
print("="*60)
print("Adjust learning rate and observe:")
print("  • Cost function convergence (left plot)")
print("  • Final fitted line (right plot)")
print("\nTry different values:")
print("  • log₁₀(α) = -4.0  →  α = 0.0001  (too small)")
print("  • log₁₀(α) = -1.5  →  α = 0.032   (good)")
print("  • log₁₀(α) = -1.0  →  α = 0.1     (too large)")
print("="*60)
print()
interact(interactive_gd_with_plot, alpha_log=slider_alpha)


Define a group-aware data splitting function so that all rows from a specimen land on the same side of the train/test split, preventing specimen data leakage.

In [None]:
# Group-aware train/test split helper (prevents specimen leakage)
def grouped_split(df, X_cols, y_col, group_col='specimen_id', test_size=0.25, seed=42):
    """
    Split data by groups (specimens) to avoid leakage.

    Points from the same specimen share chemistry and microstructure.
    Random row splits would leak this information into the test set.

    Parameters:
    -----------
    df : DataFrame
        Source data
    X_cols : list of str
        Feature column names
    y_col : str
        Target column name
    group_col : str
        Column defining groups (default: 'specimen_id')
    test_size : float
        Fraction for test set (default: 0.25)
    seed : int
        Random seed

    Returns:
    --------
    X_train, X_test, y_train, y_test, train_idx, test_idx
    """
    from sklearn.model_selection import GroupShuffleSplit

    groups = df[group_col].values
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(gss.split(df, groups=groups))

    X = df[X_cols].values
    y = df[y_col].values

    return X[train_idx], X[test_idx], y[train_idx], y[test_idx], train_idx, test_idx

---

## 1.4 Linear Model on Multi-Specimen Data: Underfitting

### Hypothesis: Linear Model with Strain Only

Now that we have 30 specimens with varying material properties, we will test the simplest model:

$$\sigma = \theta_0 + \theta_1 \varepsilon$$

**This model intentionally ignores**:
- Carbon content (C)
- Grain size (d)
- Nonlinearity (ε², ε³, ...)

### What to Expect

**Underfitting in TWO ways**:
1. **Linear** assumption: Stress-strain is nonlinear (elastic + plastic + necking)
2. **Missing features**: Model ignores C and d, which we KNOW affect strength

**Prediction**: Low R² because model is too simple in multiple dimensions.

---

> **⚠️ Common Pitfall: Row-Level Splits**
>
> Points from the same specimen share chemistry and microstructure. Using `train_test_split()` on individual rows can place some points from Specimen A in training and others in testing. The model then memorizes specimen-specific patterns instead of learning true material physics.
>
> **Solution**: Always split by `specimen_id` using grouped splits to ensure entire specimens stay together in either train or test sets.
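
The difference between the two splitting strategies can be checked directly by counting how many specimens appear on both sides of the split. This is a minimal sketch on a hypothetical layout (30 specimens with 50 rows each); the array names are illustrative and do not come from the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical layout: 30 specimens with 50 rows each
groups = np.repeat(np.arange(30), 50)
rows = np.arange(len(groups)).reshape(-1, 1)

# Row-level split: rows of one specimen can end up on both sides
tr_rows, te_rows = train_test_split(rows.ravel(), test_size=0.2, random_state=42)
shared_row_split = set(groups[tr_rows]) & set(groups[te_rows])

# Group-level split: each specimen is entirely in train or entirely in test
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
tr_idx, te_idx = next(gss.split(rows, groups=groups))
shared_group_split = set(groups[tr_idx]) & set(groups[te_idx])

print(f"Specimens in both train and test (row split):   {len(shared_row_split)}")
print(f"Specimens in both train and test (group split): {len(shared_group_split)}")
```

A non-zero count for the row-level split means the test score is partly measuring how well the model remembers specimens it has already seen.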

In [None]:
# Fit simple linear regression on 30-specimen data
# Model: σ = θ₀ + θ₁·ε (ONLY strain, ignoring C and d)

# Split data using grouped_split to prevent data leakage
X_train_simple, X_test_simple, y_train_simple, y_test_simple, *_ = grouped_split(
    df_multi, ['strain'], 'stress', test_size=0.2, seed=42
)

# Fit model
model_simple_multi = LinearRegression()
model_simple_multi.fit(X_train_simple, y_train_simple)

# Predictions
y_train_pred_simple = model_simple_multi.predict(X_train_simple)
y_test_pred_simple = model_simple_multi.predict(X_test_simple)

# Metrics
r2_train_simple = r2_score(y_train_simple, y_train_pred_simple)
r2_test_simple = r2_score(y_test_simple, y_test_pred_simple)
rmse_train_simple = np.sqrt(mean_squared_error(y_train_simple, y_train_pred_simple))
rmse_test_simple = np.sqrt(mean_squared_error(y_test_simple, y_test_pred_simple))

print("=" * 70)
print("LINEAR MODEL ON 30 SPECIMENS (Strain only)")
print("=" * 70)
print(f"Model: σ = {model_simple_multi.coef_[0]/1e9:.2f} GPa × ε + {model_simple_multi.intercept_/1e6:.1f} MPa")
print()
print(f"Train R²: {r2_train_simple:.4f}")
print(f"Test R²:  {r2_test_simple:.4f}")
print(f"Train RMSE: {rmse_train_simple/1e6:.2f} MPa")
print(f"Test RMSE:  {rmse_test_simple/1e6:.2f} MPa")
print("=" * 70)
print()
print("🔴 SEVERE UNDERFITTING CONFIRMED")
print("Why are the R² scores so low?")
print("  • The linear model is too simple for the nonlinear stress-strain data.")
print("  • It completely ignores the significant effects of carbon and grain size,")
print("    which cause large variations in yield strength across specimens.")
print("=" * 70)

### What to Expect

**What we will do**:
- Apply L2 regularization to the underfitting linear model
- Test λ = 0 plus a logarithmic grid from 10⁻³·m to 10³·m, where m is the number of training rows

**Expected outcome**:
- Performance will get **WORSE** as λ increases
- Why? Regularization reduces overfitting, not underfitting
- Our linear model is already too simple

**Key lesson**: Regularization helps complex models, not simple ones.

---


> **💡 What This Plot Shows**
>
> - **Left panel**: The shaded band shows the interquartile range over 30 specimens. Each colored line represents the same linear model (strain-only) with a different λ penalty.
> - **As λ grows**: The slope shrinks toward zero while the intercept remains free (we use `fit_intercept=True`).
> - **Why test R² falls**: This model is already too simple (underfitting). Regularization helps when you have *enough capacity*; here we are starving a starving model.
>
> **Key Insight**: Ridge regression penalizes large coefficients but will not fix a fundamentally underfit model. You need more features first.
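
For a single standardized feature with an unpenalized intercept, the ridge slope has a closed form: it equals the OLS slope multiplied by m/(m + λ), because the standardized column satisfies Σx² = m. This is why λ has to be comparable to the number of training rows m before any shrinkage becomes visible. The sketch below verifies the factor on synthetic data; the "strain" and "stress" values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
m = 500
x = rng.uniform(0.0, 0.2, size=(m, 1))                   # synthetic "strain"
y = 300e6 + 2e9 * x[:, 0] + rng.normal(0, 20e6, size=m)  # synthetic "stress" (Pa)

x_std = StandardScaler().fit_transform(x)  # mean 0, variance 1, so sum(x²) = m
w_ols = LinearRegression().fit(x_std, y).coef_[0]

ratios = {}
for lam in [0.01 * m, 1.0 * m, 100.0 * m]:
    w_ridge = Ridge(alpha=lam, fit_intercept=True).fit(x_std, y).coef_[0]
    ratios[lam] = w_ridge / w_ols
    print(f"λ = {lam:9.1f}   ridge/OLS slope = {ratios[lam]:.4f}   "
          f"m/(m+λ) = {m / (m + lam):.4f}")
```

With λ = m the slope is halved. Values of λ far below m leave the fitted line almost unchanged.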

### Visualizing Regularization Effect

We will plot how different regularization strengths affect the fitted line.

In [None]:
# --- 1.5 Visualizing Regularization Effect ---

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# ===== 1) Choose a λ grid that actually does something =====
m = X_train_simple.shape[0]
lambda_values = np.r_[0.0, np.logspace(np.log10(m*1e-3), np.log10(m*1e3), 8)]

# ===== 2) Recompute metrics for strain-only model (with intercept) =====
rows = []
for lam in lambda_values:
    pipe = Pipeline([
        ("scaler", StandardScaler(with_mean=True, with_std=True)),
        ("ridge",  Ridge(alpha=float(lam), fit_intercept=True))
    ])
    pipe.fit(X_train_simple, y_train_simple)

    ytr = pipe.predict(X_train_simple)
    yte = pipe.predict(X_test_simple)

    r2_tr = r2_score(y_train_simple, ytr)
    r2_te = r2_score(y_test_simple,  yte)
    rmse_te = np.sqrt(mean_squared_error(y_test_simple, yte)) / 1e6

    scaler = pipe.named_steps["scaler"]
    ridge  = pipe.named_steps["ridge"]
    w_std  = float(ridge.coef_[0])
    b_std  = float(ridge.intercept_)
    sigmaX = float(scaler.scale_[0])
    muX    = float(scaler.mean_[0])

    slope_orig = w_std / sigmaX
    intercept_orig = b_std - w_std * muX / sigmaX

    rows.append({"lambda": lam, "R² train": r2_tr, "R² test": r2_te,
                 "RMSE test (MPa)": rmse_te,
                 "slope (GPa)": slope_orig / 1e9,
                 "intercept (MPa)": intercept_orig / 1e6})

df_reg_multi = pd.DataFrame(rows)
# Add slope column in MPa for consistent plotting
df_reg_multi['slope (MPa)'] = df_reg_multi['slope (GPa)'] * 1000


# ===== 3) Empirical pooled reference (median + IQR) across SPECIMENS =====
n_bins = 60
bins = np.linspace(df_multi['strain'].min(), df_multi['strain'].max(), n_bins + 1)
cats = pd.cut(df_multi['strain'], bins, include_lowest=True)

def q25(x): return np.percentile(x, 25)
def q75(x): return np.percentile(x, 75)

ref = (df_multi.assign(bin=cats)
       .groupby('bin', observed=False)
       .agg(strain_med=('strain','median'),
            stress_med=('stress','median'),
            stress_q25=('stress', q25),
            stress_q75=('stress', q75))
       .dropna())

# ===== 4) Plot: left = pooled "truth" vs ridge lines; right = metrics + slope/bias vs λ =====
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Left: empirical pooled "truth"
ax1.plot(ref['strain_med'], ref['stress_med']/1e6, 'k-', lw=2, alpha=0.9, label='Empirical median (30 specimens)')
ax1.fill_between(ref['strain_med'], ref['stress_q25']/1e6, ref['stress_q75']/1e6,
                 color='gray', alpha=0.15, label='IQR across specimens')

# Ridge lines with intercept across calibrated λ grid
x_fit = np.linspace(df_multi['strain'].min(), df_multi['strain'].max(), 300).reshape(-1, 1)
colors = plt.cm.viridis(np.linspace(0, 1, len(lambda_values)))
for lam, color in zip(lambda_values, colors):
    pipe = Pipeline([
        ("scaler", StandardScaler(with_mean=True, with_std=True)),
        ("ridge",  Ridge(alpha=float(lam), fit_intercept=True))
    ])
    pipe.fit(X_train_simple, y_train_simple)
    y_line = pipe.predict(x_fit)
    lab = f"λ={lam:.3g}" if lam > 0 else "λ=0 (OLS)"
    ax1.plot(x_fit.ravel(), y_line/1e6, color=color, lw=2, alpha=0.9, label=lab)

ax1.set_xlabel('Strain')
ax1.set_ylabel('Stress (MPa)')
ax1.set_title('Ridge on underfitting model (strain only, bias enabled)')
# Legend positioned for clarity
ax1.legend(fontsize=8, ncol=2, loc='lower right')
ax1.grid(True, alpha=0.3)

# Right: R² on primary axis, other metrics on secondary axis
# Plot the lambda=0 case at a small value on the log scale for visibility
plot_lambda = df_reg_multi['lambda'].replace(0, 1e-1)

# R² on left axis
ax2.set_xlabel('Regularization Strength (λ)')
ax2.set_title('Performance & parameter shrinkage vs λ (test set)')
ax2.grid(True, which='both', alpha=0.3)
l1 = ax2.semilogx(plot_lambda, df_reg_multi['R² test'], 'o-', lw=2, ms=7, color='tab:blue', label='R² (test)')
ax2.set_ylabel('R² Score (Test Set)', color='tab:blue')
ax2.tick_params(axis='y', labelcolor='tab:blue')

# MPa metrics on right axis
ax2b = ax2.twinx()
l2 = ax2b.semilogx(plot_lambda, df_reg_multi['RMSE test (MPa)'], 's--', lw=2, ms=7, color='tab:orange', label='RMSE (MPa, test)')
l3 = ax2b.semilogx(plot_lambda, df_reg_multi['slope (MPa)'], '^-.', lw=2, ms=6, color='tab:green', label='Slope (MPa)')
l4 = ax2b.semilogx(plot_lambda, df_reg_multi['intercept (MPa)'], 'v-.', lw=2, ms=6, color='tab:red', label='Intercept (MPa)')
ax2b.set_ylabel('Value in Megapascals (MPa)', color='tab:red')
ax2b.tick_params(axis='y', labelcolor='tab:red')


# Combined legend
lines = l1 + l2 + l3 + l4
labels = [l.get_label() for l in lines]
ax2.legend(lines, labels, loc='upper right')

plt.tight_layout()
plt.show()

print("="*80)
print("REGULARIZATION ON UNDERFITTING MODEL (strain-only, bias enabled; λ grid ~ m)")
print("="*80)
print(df_reg_multi.drop(columns=['slope (GPa)']).to_string(index=False)) # Display table without GPa column
print("="*80)



### What to Expect

**What we will do**:
- Test polynomial degrees 1 to 8
- Track train vs test R²
- Identify bias-variance tradeoff

**Expected outcomes**:
- d=1,2 → Underfitting (low R² on both train and test)
- d=3-5 → Sweet spot (high R², small train-test gap)
- d>8 → Overfitting risk (large train-test gap)

**Goal**: Find optimal complexity balancing bias and variance.

---



## 1.6 Polynomial Features + Material Parameters Regression: Capturing Nonlinear Behavior

### Why Polynomial Features?

The stress-strain relationship in materials is fundamentally nonlinear:
- **Elastic region**: σ ∝ ε (linear)
- **Plastic region**: σ ∝ ε^n where n ≈ 0.2-0.5 (power law)
- **Necking region**: engineering stress σ decreases past the ultimate tensile strength (negative slope)

To capture this with linear regression, we use **polynomial features**: create new features like ε², ε³, ε⁴, and fit them linearly.

**Polynomial hypothesis**:
$$\sigma = \theta_0 + \theta_1 \varepsilon + \theta_2 \varepsilon^2 + \theta_3 \varepsilon^3 + ... + \theta_d \varepsilon^d$$

This is still **linear regression** because the parameters θ appear linearly, even though the features are nonlinear functions of ε.
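
A short check of this point: expand ε into powers, then fit an ordinary linear model on the expanded columns. The sketch uses a synthetic, noise-free cubic with made-up coefficients, so the fit should recover them; none of the numbers come from the notebook's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free cubic in ε with made-up coefficients (illustration only)
eps = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
true_theta = np.array([1.0, 4.0, -3.0, 0.5])  # θ0, θ1, θ2, θ3
e = eps[:, 0]
sigma = true_theta[0] + true_theta[1] * e + true_theta[2] * e**2 + true_theta[3] * e**3

# Columns after expansion are ε, ε², ε³ (the intercept is handled by the model)
poly = PolynomialFeatures(degree=3, include_bias=False)
features = poly.fit_transform(eps)

model = LinearRegression().fit(features, sigma)
recovered = np.r_[model.intercept_, model.coef_]
print("true θ:     ", true_theta)
print("recovered θ:", np.round(recovered, 6))
```

The model never sees a nonlinear optimization problem: the nonlinearity lives entirely in the feature columns.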

### The Overfitting Risk

With too many polynomial terms (high degree d), the model can fit the training data perfectly but fail to generalize:
- **Low degree (d=1,2)**: Underfitting (too simple)
- **Medium degree (d=3,4,5)**: Good balance (captures physics)
- **High degree (d>8)**: Often but not always overfitting (fits noise, not signal)

This is where **regularization becomes useful**: it prevents high-degree polynomials from overfitting by shrinking coefficients.

### Our Strategy

1. Fit polynomial models of increasing degree (d=1 to 8)
2. Compare performance WITHOUT regularization
3. Apply L2 regularization to prevent overfitting
4. Find the optimal degree-regularization combination

---

> **📊 Bias-Variance Tradeoff**
>
> As model complexity increases (higher polynomial degree):
> - **Training error decreases**: More flexible model fits training data better
> - **Test error first decreases, then increases**: The "sweet spot" balances bias and variance
>
> **Three regimes**:
> 1. **High bias (underfitting)**: Model too simple, both train/test R² are low
> 2. **Balanced**: Model captures true patterns, train/test R² are similar and high
> 3. **High variance (overfitting)**: Model memorizes noise, train R² high but test R² drops
>
> **Materials insight**: Polynomial features on strain capture the nonlinear stress-strain relationship (elastic → plastic → necking), while material features (C, d) account for specimen-to-specimen variation.

In [None]:
# Polynomial Features with Material Parameters
# Compare strain-only vs strain+material features

from sklearn.preprocessing import PolynomialFeatures

# Split multi-specimen data using grouped_split to prevent specimen leakage
X_multi = df_multi[['strain', 'carbon_content', 'grain_size_inv_sqrt']].values
y_multi = df_multi['stress'].values

X_train_multi, X_test_multi, y_train_multi, y_test_multi, _, _ = grouped_split(
    df_multi,
    ['strain', 'carbon_content', 'grain_size_inv_sqrt'],
    'stress',
    test_size=0.25,
    seed=42
)

# Store results for both approaches
results_poly_multi = []
results_bias_variance = []  # For bias-variance tradeoff

max_degree = 8  # reduced to avoid extreme overfitting

for degree in range(1, max_degree + 1):
    # APPROACH A: Polynomial features on strain only
    # Use the SAME split as X_train_multi by extracting strain column
    X_train_strain = X_train_multi[:, 0:1]  # Extract strain column from grouped split
    X_test_strain = X_test_multi[:, 0:1]    # Extract strain column from grouped split

    poly_strain = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly_strain = poly_strain.fit_transform(X_train_strain)
    X_test_poly_strain = poly_strain.transform(X_test_strain)

    model_strain = LinearRegression()
    model_strain.fit(X_train_poly_strain, y_train_multi)
    r2_test_strain = model_strain.score(X_test_poly_strain, y_test_multi)
    r2_train_strain = model_strain.score(X_train_poly_strain, y_train_multi)

    # APPROACH B: Polynomial features on strain + material features
    poly_full = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly_full = poly_full.fit_transform(X_train_multi)
    X_test_poly_full = poly_full.transform(X_test_multi)

    model_full = LinearRegression()
    model_full.fit(X_train_poly_full, y_train_multi)
    r2_test_full = model_full.score(X_test_poly_full, y_test_multi)
    r2_train_full = model_full.score(X_train_poly_full, y_train_multi)

    # Calculate RMSE using full model
    y_pred_train_full = model_full.predict(X_train_poly_full)
    y_pred_test_full = model_full.predict(X_test_poly_full)
    rmse_train_full = np.sqrt(np.mean((y_train_multi - y_pred_train_full)**2)) / 1e6
    rmse_test_full = np.sqrt(np.mean((y_test_multi - y_pred_test_full)**2)) / 1e6

    # Store comparison results
    results_poly_multi.append({
        'Degree': degree,
        'R² Test (Strain Only)': r2_test_strain,
        'R² Test (Strain+C+d)': r2_test_full,
        'Improvement': r2_test_full - r2_test_strain
    })

    # Store bias-variance results (using strain+materials approach)
    results_bias_variance.append({
        'Degree': degree,
        'R² Train': r2_train_full,
        'R² Test': r2_test_full,
        'RMSE Train (MPa)': rmse_train_full,  # Uses full model
        'RMSE Test (MPa)': rmse_test_full      # Uses full model
    })

# Create dataframes
df_poly_comp = pd.DataFrame(results_poly_multi)
df_poly = pd.DataFrame(results_bias_variance)

print("\n📊 POLYNOMIAL COMPARISON (30 specimens)")
print("=" * 60)
print(df_poly_comp.to_string(index=False))
print(f"\n💡 Material features improve R² by {df_poly_comp['Improvement'].mean():.3f} on average")

Visualize the comparison between strain-only and strain+material feature models across polynomial degrees.

In [None]:
# Visualize the comparison
fig, ax = plt.subplots(figsize=(12, 7))

ax.plot(df_poly_comp['Degree'], df_poly_comp['R² Test (Strain Only)'],
        'b-o', linewidth=2, markersize=8, label='Strain Only', alpha=0.7)
ax.plot(df_poly_comp['Degree'], df_poly_comp['R² Test (Strain+C+d)'],
        'r-s', linewidth=2, markersize=8, label='Strain + C + d', alpha=0.7)

# Fill between to show improvement
ax.fill_between(df_poly_comp['Degree'],
                df_poly_comp['R² Test (Strain Only)'],
                df_poly_comp['R² Test (Strain+C+d)'],
                alpha=0.2, color='green', label='Improvement from materials')

ax.set_xlabel('Polynomial Degree', fontsize=12)
ax.set_ylabel('Test R²', fontsize=12)
ax.set_title('Impact of Material Features on Model Performance\n30 Specimens',
             fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.set_ylim([0.5, 1.0])
ax.axhline(0.9, color='green', linestyle='--', alpha=0.3, label='Excellent fit')
# Build the legend last so the 'Excellent fit' reference line is included
ax.legend(fontsize=11, loc='lower right')

plt.tight_layout()
plt.show()

print("\n📊 Visualization shows:")
print("   • Blue line: Strain-only model (ignores material properties)")
print("   • Red line: Complete model (includes C and d)")
print("   • Green area: Performance gain from material features")
print("\n💡 Material parameters improve predictions when the polynomial degree is kept low!")


### Analysis of Catastrophic Overfitting

The result above reveals a critical insight: while adding material features to a low-degree polynomial model (d=1 to 4) significantly improves performance, the same features cause a catastrophic failure in high-degree models (d=5+), resulting in large negative R² scores.

**Why does this happen? The Curse of Dimensionality.**

When we apply `PolynomialFeatures` to the combined input `[strain, carbon, grain_size_inv_sqrt]`, the function generates not only powers of each feature (`strain²`, `carbon²`) but also all **interaction terms** (`strain*carbon`, `strain²*carbon`, etc.).

- **Degree 2**: 9 features
- **Degree 5**: 55 features
- **Degree 8**: 164 features

A high-degree polynomial on all three inputs creates an explosion of features. The model gains so much flexibility that it begins to fit the random noise in the training data perfectly, a phenomenon known as **overfitting**. When this over-specialized model is shown new test data, its predictions are wildly inaccurate, leading to an R² score far below zero.
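
The feature counts listed above follow from a standard combinatorial formula: with n inputs and total degree d, there are C(n + d, d) − 1 monomials once the constant term is excluded. The sketch below compares the formula with what `PolynomialFeatures` reports for n = 3; the zero-filled array only tells the transformer how many input columns to expect.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_inputs = 3  # strain, carbon content, 1/sqrt(grain size)
counts = {}
for degree in (2, 5, 8):
    # Closed-form count of monomials of total degree 1..d in n_inputs variables
    formula = comb(n_inputs + degree, degree) - 1
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    poly.fit(np.zeros((1, n_inputs)))
    counts[degree] = (formula, int(poly.n_output_features_))
    print(f"degree {degree}: formula = {formula}, sklearn = {counts[degree][1]}")
```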

**The Physics-Informed Solution**: The underlying physics does not suggest that interactions like `carbon³ * strain⁵` are meaningful. The primary nonlinearity is in the stress-strain relationship. A much better approach is to:
1. Apply polynomial features **only to the strain column**.
2. Combine these with the **linear** material property features.

This targeted feature engineering adds complexity only where physics suggests it is needed, leading to more robust and accurate models. This is the approach used in the interactive tool that follows.
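
One way to express this targeted expansion in scikit-learn is a `ColumnTransformer` that applies `PolynomialFeatures` to the strain column only and passes the material columns through unchanged. The following is a sketch under stated assumptions: the data are synthetic stand-ins, the column order is assumed to be `[strain, carbon_content, grain_size_inv_sqrt]`, and the degree and Ridge strength are arbitrary. It is not the implementation used by the interactive tool.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for [strain, carbon_content, grain_size_inv_sqrt]
X = np.column_stack([
    rng.uniform(0.0, 0.2, n),
    rng.uniform(0.1, 0.8, n),
    rng.uniform(5.0, 15.0, n),
])
# Made-up response: nonlinear in strain, linear in the material columns
y = (200 + 3000 * X[:, 0] - 9000 * X[:, 0] ** 2
     + 300 * X[:, 1] + 20 * X[:, 2] + rng.normal(0, 5, n))

features = ColumnTransformer([
    # Powers of strain only: ε, ε², ε³, ε⁴
    ("strain_poly", PolynomialFeatures(degree=4, include_bias=False), [0]),
    # Material columns enter linearly, with no cross terms
    ("materials", "passthrough", [1, 2]),
])
model = make_pipeline(features, StandardScaler(), Ridge(alpha=1e-3))
model.fit(X, y)

n_features = model[0].transform(X).shape[1]
r2 = model.score(X, y)
print(f"expanded features: {n_features}, training R² = {r2:.3f}")
```

With degree 4 on strain this design has 6 columns, compared with 34 for a full degree-4 expansion of all three inputs.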

### Visualizing the Bias-Variance Tradeoff

We will plot how model performance changes with polynomial degree using the **strain + material features (C, d)** model. This demonstrates the classic **bias-variance tradeoff**:

**What to expect:**
- **Low degrees (1-2)**: **Underfitting** - Model too simple, both train and test R² are low
- **Sweet spot (~3-6)**: **Good fit** - High R², small gap between train and test
- **High degrees (upper end of the tested range, up to 8)**: **Overfitting** - Training R² → 1.0, but test R² plateaus or drops. The growing gap shows the model is memorizing training data rather than learning patterns.

**Key insight**: Even with material features, we can still overfit by making the model too complex!


In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
degrees = df_poly['Degree'].tolist()  # x-axis ticks match the degrees actually fitted
ax1.plot(df_poly['Degree'], df_poly['R² Train'], 'o-', linewidth=2,
         markersize=8, label='Training R²', color='blue')
ax1.plot(df_poly['Degree'], df_poly['R² Test'], 's-', linewidth=2,
         markersize=8, label='Test R²', color='red')
ax1.fill_between(df_poly['Degree'], df_poly['R² Test'], df_poly['R² Train'],
                  alpha=0.2, color='orange', label='Overfitting gap')
ax1.set_xlabel('Polynomial Degree', fontsize=12)
ax1.set_ylabel('R² Score', fontsize=12)
ax1.set_title('Bias-Variance Tradeoff\nR² vs Model Complexity',
              fontsize=13, fontweight='bold')
ax1.legend(fontsize=10, loc='best')
ax1.grid(True, alpha=0.3)
ax1.axhline(0.95, color='green', linestyle='--', alpha=0.3)
ax1.set_xticks(degrees)
ax2.plot(df_poly['Degree'], df_poly['RMSE Train (MPa)'], 'o-', linewidth=2,
         markersize=8, label='Training RMSE', color='blue')
ax2.plot(df_poly['Degree'], df_poly['RMSE Test (MPa)'], 's-', linewidth=2,
         markersize=8, label='Test RMSE', color='red')
ax2.set_xlabel('Polynomial Degree', fontsize=12)
ax2.set_ylabel('RMSE (MPa)', fontsize=12)
ax2.set_title('Prediction Error\nRMSE vs Model Complexity',
              fontsize=13, fontweight='bold')
ax2.legend(fontsize=10, loc='best')
ax2.grid(True, alpha=0.3)
ax2.set_xticks(degrees)
plt.tight_layout()
plt.show()
optimal_idx = df_poly['R² Test'].idxmax()
optimal_degree = df_poly.loc[optimal_idx, 'Degree']
optimal_r2 = df_poly.loc[optimal_idx, 'R² Test']
print(f"\n🏆 Optimal polynomial degree: {optimal_degree}")
print(f"   Test R²: {optimal_r2:.4f}")
print(f"   Interpretation: Degree {optimal_degree} balances bias and variance optimally")

### What to Expect

Before applying ridge regularization to high-degree polynomials, we clarify the goal. We will train a degree-15 polynomial model on the multi-specimen dataset using several values of the regularization strength λ, then compare training and test R² (and the gap between them) as a function of λ. This shows how regularization controls overfitting in flexible polynomial models.

### Regularization for High-Degree Polynomials

Now we will see how L2 regularization helps when we use high-degree polynomials. We will fit a degree-15 model, well beyond the degrees scanned above, with different regularization strengths.
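
The cell below calls `make_poly_ridge_pipeline`, whose definition is not shown in this section. A helper of that kind could plausibly look like the following sketch. The name `sketch_poly_ridge_pipeline`, the step order (polynomial expansion, then standardization, then Ridge), and the synthetic demo data are all assumptions, not the notebook's actual definition.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def sketch_poly_ridge_pipeline(degree, alpha):
    """Hypothetical helper: polynomial expansion -> standardization -> Ridge."""
    return Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        # Scale after expansion so every power is penalized on a comparable scale
        ("scaler", StandardScaler()),
        ("ridge", Ridge(alpha=alpha, fit_intercept=True)),
    ])

# Synthetic demo data with three inputs (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(120, 3))
y_demo = (np.sin(3 * X_demo[:, 0]) + X_demo[:, 1] + 0.5 * X_demo[:, 2]
          + rng.normal(0, 0.05, 120))

pipe = sketch_poly_ridge_pipeline(degree=3, alpha=0.01).fit(X_demo, y_demo)
n_expanded = int(pipe.named_steps["poly"].n_output_features_)
r2_demo = pipe.score(X_demo, y_demo)
print(f"expanded features: {n_expanded}, training R² = {r2_demo:.3f}")
```

Standardizing after the expansion matters for strain data: raw powers of a small strain value shrink rapidly, so without scaling a single λ would penalize the high-order terms very unevenly.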

In [None]:
# --- Regularization for High-Degree Polynomials (degree=15) on multi-specimen data ---

from sklearn.pipeline import Pipeline

test_degree = 15  # A high degree to demonstrate overfitting
lambda_values_ridge = [0, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1.0] # A more focused lambda grid

# Use the same grouped split from the previous section
X_train_poly_ridge, X_test_poly_ridge, y_train_poly_ridge, y_test_poly_ridge, _, _ = grouped_split(
    df_multi,
    ['strain', 'carbon_content', 'grain_size_inv_sqrt'],
    'stress',
    test_size=0.25,
    seed=42
)

def pipe_template(alpha):
    return make_poly_ridge_pipeline(test_degree, alpha)

results_ridge = []
for lam in lambda_values_ridge:
    # Use a tiny alpha instead of 0 for the "unregularized" case to avoid singularity
    alpha = 1e-9 if lam == 0 else lam

    pipe = pipe_template(alpha)
    pipe.fit(X_train_poly_ridge, y_train_poly_ridge)

    ytr = pipe.predict(X_train_poly_ridge)
    yte = pipe.predict(X_test_poly_ridge)

    r2_tr = r2_score(y_train_poly_ridge, ytr)
    r2_te = r2_score(y_test_poly_ridge,  yte)

    results_ridge.append({
        "λ": lam, # Report the intended lambda (0 for OLS)
        "R² Train": r2_tr,
        "R² Test": r2_te,
        "Gap": r2_tr - r2_te
    })

df_ridge = pd.DataFrame(results_ridge)
print(f"📊 Regularization Effect on Degree-{test_degree} Polynomial")
print("=" * 70)
print(df_ridge.to_string(index=False))
print("=" * 70)

# Plotting the results
plt.figure(figsize=(8, 6))
plt.semilogx(df_ridge["λ"].replace(0, 1e-7), df_ridge["R² Train"], 'o-', label='Train R²') # Plot lambda=0 at a small value
plt.semilogx(df_ridge["λ"].replace(0, 1e-7), df_ridge["R² Test"], 's-', label='Test R²')
plt.xlabel('Regularization Strength (λ)')
plt.ylabel('R² Score')
plt.title(f'Degree {test_degree}: R² vs. Regularization')
plt.grid(True, which='both', alpha=0.3)
plt.legend()
plt.ylim(-0.5, 1.1) # Adjust ylim to see the negative R² for the unregularized case
plt.tight_layout()
plt.show()


### Visualizing Regularization Effect on Polynomials

We will plot the fitted curves with and without regularization to see the difference.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# SELECT ONE SPECIMEN to visualize
specimen_id = int(df_multi['specimen_id'].median())
specimen_data = df_multi[df_multi['specimen_id'] == specimen_id]
specimen_C = specimen_data['carbon_content'].iloc[0]
specimen_d_inv = specimen_data['grain_size_inv_sqrt'].iloc[0]

print(f"\n📍 Visualizing regularization effect for Specimen {specimen_id}")
print(f"   Carbon content: {specimen_C:.4f}")
print(f"   Grain size (1/√d): {specimen_d_inv:.2f}")

# Fine grid for smooth curves
strain_grid = np.linspace(specimen_data['strain'].min(), specimen_data['strain'].max(), 500).reshape(-1, 1)

# Create feature grid: [strain, C, 1/√d] for the SAME specimen
C_grid = np.full_like(strain_grid, specimen_C)
d_inv_grid = np.full_like(strain_grid, specimen_d_inv)
X_grid_full = np.hstack([strain_grid, C_grid, d_inv_grid])

# Use the same degree as the previous cell
test_degree = 15

# Build and fit three models at different λ, all trained on the training split
# Use a tiny alpha for the "no regularization" case to avoid singularity
pipe_no_reg = make_poly_ridge_pipeline(test_degree, 1e-9)
pipe_mid    = make_poly_ridge_pipeline(test_degree, 0.01)   # Intermediate regularization
pipe_strong = make_poly_ridge_pipeline(test_degree, 100.0)  # Strong regularization

pipe_no_reg.fit(X_train_poly_ridge, y_train_poly_ridge)
pipe_mid.fit(X_train_poly_ridge, y_train_poly_ridge)
pipe_strong.fit(X_train_poly_ridge, y_train_poly_ridge)

# Predictions on grid
y_no_reg  = pipe_no_reg.predict(X_grid_full)
y_optimal = pipe_mid.predict(X_grid_full)
y_strong  = pipe_strong.predict(X_grid_full)

# Plot - show ONLY the selected specimen's data
plt.figure(figsize=(12, 7))

plt.scatter(specimen_data['strain'], specimen_data['stress'] / 1e6,
            color='orange', alpha=0.8, s=80, edgecolors='k',
            label=f'Specimen {specimen_id} data', zorder=4)

plt.plot(strain_grid, y_no_reg / 1e6, 'r--', linewidth=2.5,
         label=f'Degree {test_degree}, λ≈0 (Overfitting)', alpha=0.9, zorder=3)
plt.plot(strain_grid, y_optimal / 1e6, 'g-', linewidth=3.0,
         label=f'Degree {test_degree}, λ=0.01 (Intermediate Fit)', alpha=0.9, zorder=3)
plt.plot(strain_grid, y_strong / 1e6, 'b-.', linewidth=2.5,
         label=f'Degree {test_degree}, λ=100 (Underfitting)', alpha=0.9, zorder=3)

plt.xlabel('Strain', fontsize=13)
plt.ylabel('Stress (MPa)', fontsize=13)
plt.title(f'Effect of L2 Regularization on Degree-{test_degree} Polynomial\n'
          f'Predicting for Specimen {specimen_id}',
          fontsize=13, fontweight='bold')
plt.legend(fontsize=11, loc='lower right')
plt.grid(True, alpha=0.3)
plt.ylim(bottom=0) # Ensure y-axis starts at 0 for stress
plt.tight_layout()
plt.show()

print(f"\n💡 Key insight:")
print(f"   • The unregularized model (red) is wildly unstable due to overfitting.")
print(f"   • A small amount of regularization (λ=0.01, green) stabilizes the fit dramatically.")
print(f"   • Too much regularization (λ=100, blue) over-simplifies the model, causing it to underfit.")

---

## 1.7 Interactive Feature Selection and Modeling

### Combining Materials Science with Machine Learning

Now you can experiment with different feature combinations to see how they affect model performance. This interactive tool lets you:
- **Select features**: Choose which material parameters to include (strain, carbon_content, grain_size_inv_sqrt)
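- **Choose polynomial degree**: Control the degree of the polynomial expansion, which is applied to all selected features including cross-terms such as ε·C (the degree is forced to 1 when strain is not selected)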
- **Apply regularization**: Prevent overfitting with L2 penalty (Ridge regression)
- **Visualize results**: See the model fit and performance metrics

**Guidance from materials science**:
- **Strain** should always be included (fundamental σ-ε relationship)
- **Carbon content** affects strength through solid solution strengthening
- **Grain size (1/√d)** affects strength via Hall-Petch relationship
- Higher polynomial degrees capture plastic deformation and necking
- Regularization becomes important for high-degree polynomials
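
To see why regularization matters at high degree, it helps to count how many columns the polynomial expansion produces. The sketch below assumes the expansion is done with scikit-learn's `PolynomialFeatures(include_bias=False)` on all selected inputs; the helper name `n_poly_features` is hypothetical and only used for illustration.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_inputs, degree):
    """Number of monomials of total degree 1..degree in n_inputs variables (no bias column)."""
    return comb(n_inputs + degree, degree) - 1


for n_inputs in (1, 3):
    for degree in (1, 3, 15):
        poly = PolynomialFeatures(degree=degree, include_bias=False)
        poly.fit(np.zeros((1, n_inputs)))  # only the input shape matters for the count
        # The closed-form count agrees with what scikit-learn generates
        assert poly.n_output_features_ == n_poly_features(n_inputs, degree)
        print(f"{n_inputs} input(s), degree {degree:2d}: {poly.n_output_features_} features")
```

With all three inputs selected, a degree-15 expansion has 815 columns, which is why the high-degree fits need an L2 penalty to stay stable.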

**What to explore**:
1. Start with strain only, degree 1 → See linear underfitting
2. Add polynomial features (degree 3-5) → Captures nonlinearity
3. Include material features (C, d) → Improves predictions across specimens
4. Increase regularization for high degrees → Prevents overfitting

Try different combinations and observe:
- Which features improve R²?
- When does adding complexity help vs hurt?
- How does regularization affect the bias-variance tradeoff?

In [None]:
# --- Controls ---
feature_options = ['strain', 'carbon_content', 'grain_size_inv_sqrt']
feature_checkboxes = [widgets.Checkbox(value=(f == 'strain'), description=f) for f in feature_options]

degree_slider = widgets.IntSlider(
    value=3, min=1, max=15, step=1,
    description='Polynomial degree:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px')
)
lambda_slider = widgets.FloatLogSlider(
    value=1e-2, base=10, min=-6, max=3, step=0.1,
    description='Regularization λ:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px')
)
button = widgets.Button(
    description="📊 Make Prediction",
    button_style='success',
    layout=widgets.Layout(width='200px', height='40px')
)
out = widgets.Output()

# -- nuke stacked handlers if the cell is re-run --
try:
    button._click_handlers.callbacks = []
except Exception:
    pass

def interactive_multi_specimen_model(_=None):
    # reentrancy guard: ignore accidental duplicate clicks/buggy double events
    if getattr(button, "_locked", False):
        return
    button._locked = True
    button.disabled = True
    try:
        with out:
            out.clear_output(wait=True)

            selected_features = [cb.description for cb in feature_checkboxes if cb.value]
            if not selected_features:
                print("⚠️  Please select at least one feature!")
                return

            degree = int(degree_slider.value)

            # Degree clamping when strain is not selected
            if 'strain' not in selected_features:
                degree = 1

            reg_lambda = float(lambda_slider.value)

            # Use grouped_split to prevent specimen leakage
            X_train, X_test, y_train, y_test, _, _ = grouped_split(
                df_multi, selected_features, 'stress', test_size=0.25, seed=42
            )

            pipe = Pipeline([
                ('poly', PolynomialFeatures(degree=degree, include_bias=False)),
                ('scaler', StandardScaler()),
                ('ridge', Ridge(alpha=reg_lambda, fit_intercept=True))
            ])
            pipe.fit(X_train, y_train)

            y_train_pred = pipe.predict(X_train)
            y_test_pred  = pipe.predict(X_test)

            r2_train = r2_score(y_train, y_train_pred)
            r2_test  = r2_score(y_test,  y_test_pred)
            rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred)) / 1e6
            gap = r2_train - r2_test
            n_features_expanded = pipe.named_steps['poly'].n_output_features_

            print(f"Features: {', '.join(selected_features)} | Degree: {degree} | λ: {reg_lambda:.1e} | Total features: {n_features_expanded}")
            print(f"R² Train: {r2_train:.4f} | R² Test: {r2_test:.4f} | Gap: {gap:.4f} | RMSE: {rmse_test:.1f} MPa")
            if gap > 0.1:
                print("⚠️ Overfitting — increase λ or decrease degree")
            elif r2_test < 0.7:
                print("⚠️ Underfitting — add features or increase degree")
            else:
                print("✅ Good balance")

            # Build the figure with the object-oriented API instead of plt.subplots:
            # no pyplot global state is touched, so display(fig) below shows it exactly once
            from matplotlib.figure import Figure
            fig = Figure(figsize=(15, 6))
            ax1, ax2 = fig.subplots(1, 2)

            # Left: y_true vs y_pred
            ax1.scatter(y_test/1e6, y_test_pred/1e6, alpha=0.6, s=50, edgecolors='k', linewidth=0.5)
            lims = [min(y_test.min(), y_test_pred.min())/1e6, max(y_test.max(), y_test_pred.max())/1e6]
            ax1.plot(lims, lims, 'r--', linewidth=2, alpha=0.7, label='Perfect')
            ax1.set_xlabel('True Stress (MPa)', fontsize=12)
            ax1.set_ylabel('Predicted Stress (MPa)', fontsize=12)
            ax1.set_title(f'Test Set: R² = {r2_test:.4f}', fontsize=13, fontweight='bold')
            ax1.legend(fontsize=10)
            ax1.grid(True, alpha=0.3)
            ax1.set_aspect('equal', adjustable='box')

            # Right: specimen curve if strain present, otherwise residuals
            if 'strain' in selected_features:
                specimen_id = int(df_multi["specimen_id"].median())
                d = df_multi[df_multi['specimen_id'] == specimen_id]
                fixed = {f: d[f].iloc[0] for f in selected_features}
                strain_grid = np.linspace(d['strain'].min(), d['strain'].max(), 500)
                if len(selected_features) == 1:
                    Xg = strain_grid.reshape(-1, 1)
                else:
                    cols = [strain_grid.reshape(-1, 1) if f == 'strain'
                            else np.full((500, 1), fixed[f]) for f in selected_features]
                    Xg = np.hstack(cols)
                yg = pipe.predict(Xg)
                ax2.scatter(d['strain'], d['stress']/1e6, color='orange', alpha=0.7,
                            s=60, edgecolors='k', label=f'Specimen {specimen_id}', zorder=4)
                ax2.plot(strain_grid, yg/1e6, linewidth=2.5, label='Model', zorder=3)
                ax2.set_xlabel('Strain', fontsize=12)
                ax2.set_ylabel('Stress (MPa)', fontsize=12)
                ax2.set_title(f'Specimen {specimen_id} Fit', fontsize=13, fontweight='bold')
                ax2.legend(fontsize=10, loc='lower right')
            else:
                residuals = (y_test - y_test_pred)/1e6
                ax2.hist(residuals, bins=30, alpha=0.7, edgecolor='black')
                ax2.axvline(0, linestyle='--', linewidth=2)
                ax2.set_xlabel('Residuals (MPa)', fontsize=12)
                ax2.set_ylabel('Frequency', fontsize=12)
                ax2.set_title('Residual Distribution', fontsize=13, fontweight='bold')

            ax2.grid(True, alpha=0.3)
            fig.tight_layout()

            # Display the figure - this will now only show once
            display(fig)

    finally:
        button._locked = False
        button.disabled = False

button.on_click(interactive_multi_specimen_model)

# UI
feature_box = widgets.VBox([widgets.HTML("<b>Select Features:</b>"),
                            widgets.HBox(feature_checkboxes)])
controls_box = widgets.VBox([feature_box, degree_slider, lambda_slider, button, out])
display(controls_box)


---

# Conclusions: Part 1

---

## Key Takeaways:

1. **Physics-aware features provide persistent value**: Adding `carbon_content` and `grain_size_inv_sqrt` gave a consistent R² lift across all polynomial degrees, because they encode specimen-to-specimen differences that strain alone cannot capture.

2. **Regularization helps capacity-rich models, not underfit ones**: Ridge regression is powerful when you have enough features to overfit. Applying it to a too-simple linear model (strain-only) does not improve performance - it just shrinks an already-inadequate slope.

3. **Group-aware validation is non-negotiable**: Points from the same specimen share chemistry and microstructure. Random row splits leak this information and inflate test metrics. Always use `GroupShuffleSplit` or similar grouped cross-validation for multi-curve materials data.

These principles generalize beyond stress-strain curves to any materials dataset with specimen-level heterogeneity.
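
As a minimal sketch of the third takeaway, the snippet below splits hypothetical toy data (8 specimens with 5 points each, random values) with scikit-learn's `GroupShuffleSplit` and checks that no specimen appears on both sides of the split. The notebook's own `grouped_split` helper is not reproduced here; this only illustrates the idea.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 8 specimens, 5 measurement points per specimen
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(8), 5)          # specimen id of every row
X = rng.normal(size=(groups.size, 3))        # stand-in features
y = rng.normal(size=groups.size)             # stand-in target

# Split by specimen, not by row
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

train_specimens = set(groups[train_idx])
test_specimens = set(groups[test_idx])

print("Train specimens:", sorted(train_specimens))
print("Test specimens: ", sorted(test_specimens))
print("Overlap:", train_specimens & test_specimens)
```

An empty overlap means every test curve comes from a specimen the model has never seen, which is the situation the test metric is supposed to measure.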

---

# Part 2: Logistic Regression for Multi-Axial Yield Prediction

---

## 2.1 Multi-Axial Loading and Failure Criteria

### Engineering Problem

Structural components experience complex stress states (σ₁, σ₂, σ₃) from combined loading. Predicting yield initiation requires:
1. A scalar effective stress σ_eff from the stress tensor
2. A failure criterion: σ_eff ≥ σ_y

### Binary Classification Formulation

- **Input**: Principal stresses (σ₁, σ₂, σ₃); the plane-stress case studied here uses σ₃ = 0
- **Output**: Material state ∈ {Elastic (0), Plastic (1)}

### Physics-Informed Approach

The von Mises criterion provides a theoretical decision boundary. Our goal: determine if logistic regression can learn this nonlinear boundary from experimental data alone.

**Key Questions**:
1. Can logistic regression discover the quadratic von Mises relationship?
2. How well does the learned boundary match theoretical predictions?
3. How should decision thresholds be selected for safety-critical applications?

---

## 2.2 Von Mises Yield Criterion

### Mathematical Formulation

For a general 3D stress state with **principal stresses** σ₁, σ₂, σ₃ (the normal stresses on the planes where the shear stress vanishes), the von Mises equivalent stress is:

$$\sigma_{VM} = \sqrt{\frac{1}{2}[(\sigma_1 - \sigma_2)^2 + (\sigma_2 - \sigma_3)^2 + (\sigma_3 - \sigma_1)^2]}$$

For **2D plane stress** (thin plates where σ₃ = 0), this simplifies to:

$$\sigma_{VM} = \sqrt{\sigma_1^2 - \sigma_1 \sigma_2 + \sigma_2^2}$$

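**Physical interpretation**:
- σ_VM is a scalar "effective" stress that can be compared directly with a uniaxial tensile test
- σ_VM² is proportional to the distortional (shape-changing) strain energy density; hydrostatic pressure does not contribute
- The material yields when this effective stress reaches the yield strength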
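
The reduction from the 3D expression to the plane-stress form can be checked numerically. The sketch below uses two illustrative helper functions (`von_mises_3d` and `von_mises_plane_stress`, names chosen here) and random stress values in an arbitrary range; it also checks that a superimposed hydrostatic stress leaves σ_VM unchanged.

```python
import numpy as np


def von_mises_3d(s1, s2, s3):
    """Von Mises equivalent stress from three principal stresses."""
    return np.sqrt(0.5 * ((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2))


def von_mises_plane_stress(s1, s2):
    """Plane-stress form (sigma_3 = 0)."""
    return np.sqrt(s1 ** 2 - s1 * s2 + s2 ** 2)


# Random principal stresses in Pa (range chosen only for illustration)
rng = np.random.default_rng(0)
s1, s2 = rng.uniform(-400e6, 400e6, size=(2, 1000))

# Setting sigma_3 = 0 in the 3D formula reproduces the 2D formula
assert np.allclose(von_mises_3d(s1, s2, 0.0), von_mises_plane_stress(s1, s2))

# Adding the same pressure p to all three principal stresses does not change sigma_VM
p = 100e6
assert np.allclose(von_mises_3d(s1 + p, s2 + p, p), von_mises_3d(s1, s2, 0.0))
print("Plane-stress reduction and pressure independence verified")
```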

### Yield Condition

The material yields when:
$$\sigma_{VM} \geq \sigma_y$$

where σ_y is the uniaxial yield strength (from a simple tensile test).

### Decision Boundary Formulation

Squaring the yield condition and moving σ_y² to the left-hand side gives a decision function whose zero level set is the yield surface:
$$f(\sigma_1, \sigma_2) = \sigma_1^2 - \sigma_1 \sigma_2 + \sigma_2^2 - \sigma_y^2$$

**Interpretation**:
- **f < 0**: Inside yield surface → Elastic (Class 0)
- **f = 0**: On yield surface → At yield point
- **f > 0**: Outside yield surface → Plastic (Class 1)

### Geometric Shape

In the σ₁-σ₂ stress space, this equation defines an **ellipse** centered at the origin. This is called the **yield surface** or **yield locus**.

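**Key properties**:
- Symmetric about the origin and about the lines σ₁ = σ₂ and σ₁ = −σ₂, which are its major and minor axes (the material yields equally in tension and compression)
- Passes through the uniaxial yield points (±σ_y, 0) and (0, ±σ_y)
- Equal and opposite stresses (pure shear, σ₁ = −σ₂) cause yield earliest, at |σ₁| = σ_y/√3 ≈ 0.58 σ_y
- Stresses of the same sign delay yield: the largest principal stress on the ellipse is 2σ_y/√3 ≈ 1.15 σ_y, reached at σ₂ = σ₁/2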
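
A few characteristic load paths can be checked against the decision function f(σ₁, σ₂) directly. This is a small sketch with a hypothetical yield strength of 250 MPa (the notebook's configured value is not assumed here); each listed stress state should lie on the ellipse, i.e. give f = 0.

```python
import numpy as np

sigma_y = 250e6  # hypothetical yield strength in Pa, for illustration only


def f_boundary(s1, s2, sigma_y):
    """Decision function: negative inside the yield surface, positive outside."""
    return s1 ** 2 - s1 * s2 + s2 ** 2 - sigma_y ** 2


k = sigma_y / np.sqrt(3)
states = {
    "uniaxial tension":       (sigma_y, 0.0),
    "equibiaxial tension":    (sigma_y, sigma_y),
    "pure shear":             (k, -k),
    "largest sigma_1":        (2 * k, k),
}

on_surface = {}
for name, (s1, s2) in states.items():
    on_surface[name] = bool(np.isclose(f_boundary(s1, s2, sigma_y), 0.0,
                                       atol=1e-6 * sigma_y ** 2))
    print(f"{name:22s} sigma1/sigma_y = {s1 / sigma_y:+.3f}, "
          f"sigma2/sigma_y = {s2 / sigma_y:+.3f}, on surface: {on_surface[name]}")

# Symmetries of f: point reflection and swapping the axes leave it unchanged,
# but mirroring a single axis does not (the ellipse is tilted by 45 degrees)
a, b = 100e6, 180e6
assert np.isclose(f_boundary(a, b, sigma_y), f_boundary(-a, -b, sigma_y))
assert np.isclose(f_boundary(a, b, sigma_y), f_boundary(b, a, sigma_y))
assert not np.isclose(f_boundary(a, b, sigma_y), f_boundary(a, -b, sigma_y))
```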

### Before We Code: Predictions

What should we expect when we generate experimental data and fit a machine learning model?

1. **Data distribution**: Points should cluster inside (elastic) and outside (plastic) the theoretical ellipse
2. **Model learning**: Logistic regression should discover the quadratic relationship (σ₁², σ₁σ₂, σ₂²)
3. **Decision boundary**: Learned boundary should approximate the von Mises ellipse
4. **Feature importance**: σ₁², σ₁σ₂, σ₂² should have the strongest coefficients

We will test these predictions empirically!

---

## 2.3 Multi-Axial Loading Data Generation

### Experimental Setup

We simulate a series of biaxial loading tests (tension and compression) where specimens are loaded with different combinations of principal stresses σ₁ and σ₂. For each test, we record whether the specimen yielded.

**Data generation strategy**:
1. Sample stress combinations uniformly in σ₁-σ₂ space
2. Calculate von Mises equivalent stress for each combination
3. Label as yield (1) if σ_VM > σ_y, elastic (0) otherwise
4. Flip the label of points within 5% of the yield surface with 20% probability, to simulate experimental uncertainty in detecting yield

This creates a realistic dataset that mirrors actual multi-axial testing campaigns.

In [None]:
# Generate biaxial loading data (using sigma_y_failure from config)
n_samples = 500
np.random.seed(42)

# Sample principal stresses uniformly
sigma1_data = np.random.uniform(-1.5 * sigma_y_failure, 1.5 * sigma_y_failure, n_samples)
sigma2_data = np.random.uniform(-1.5 * sigma_y_failure, 1.5 * sigma_y_failure, n_samples)

# Calculate von Mises stress
def von_mises_stress(s1, s2):
    """Calculate von Mises equivalent stress for plane stress (sigma_3=0)."""
    return np.sqrt(s1**2 - s1*s2 + s2**2)

sigma_vm_data = von_mises_stress(sigma1_data, sigma2_data)

# Label data: 1 if yielded (sigma_VM > sigma_y), 0 otherwise
yield_labels = (sigma_vm_data > sigma_y_failure).astype(int)

# Add some experimental noise to the boundary
# In real experiments, there is variability in when yielding is detected
noise_factor = 0.05  # 5% uncertainty in yield detection
for i in range(n_samples):
    # For points near the boundary, add some classification noise
    distance_to_boundary = abs(sigma_vm_data[i] - sigma_y_failure) / sigma_y_failure

    if distance_to_boundary < noise_factor:
        # Flip label with some probability for boundary points
        if np.random.rand() < 0.2:
            yield_labels[i] = 1 - yield_labels[i]

# Create DataFrame
df_failure = pd.DataFrame({
    'sigma1': sigma1_data,
    'sigma2': sigma2_data,
    'sigma_vm': sigma_vm_data,
    'yielded': yield_labels
})

# Summary statistics
n_elastic = (yield_labels == 0).sum()
n_plastic = (yield_labels == 1).sum()

print("✅ Multi-axial loading data generated")
print("=" * 60)
print(f"   Total experiments: {n_samples}")
print(f"   Elastic (Class 0): {n_elastic} ({n_elastic/n_samples*100:.1f}%)")
print(f"   Yielded (Class 1): {n_plastic} ({n_plastic/n_samples*100:.1f}%)")
print(f"   Material yield strength: {sigma_y_failure/1e6:.0f} MPa")
print("=" * 60)
print("\nFirst 5 rows:")
print(df_failure.head())


### Visualizing the Experimental Data

We will plot the simulated data along with the theoretical von Mises yield surface. This shows how closely the noisy labels follow the theoretical boundary.

In [None]:
# Theoretical von Mises ellipse in plane stress:
#   sigma1^2 - sigma1*sigma2 + sigma2^2 = sigma_y^2
# Substituting sigma1 = r*cos(t), sigma2 = r*sin(t) gives the polar form
#   r(t) = sigma_y / sqrt(1 - 0.5*sin(2t))
t = np.linspace(0, 2*np.pi, 300)
r = sigma_y_failure / np.sqrt(1 - 0.5*np.sin(2*t))  # von Mises in polar form
sigma1_ellipse = r * np.cos(t)
sigma2_ellipse = r * np.sin(t)

# Visualization
fig, ax = plt.subplots(figsize=(10, 10))

# Separate elastic and plastic points
elastic_mask = df_failure['yielded'] == 0
plastic_mask = df_failure['yielded'] == 1

# Plot data points
ax.scatter(df_failure.loc[elastic_mask, 'sigma1'] / 1e6,
           df_failure.loc[elastic_mask, 'sigma2'] / 1e6,
           c='blue', alpha=0.6, s=40, edgecolors='k', linewidth=0.5,
           label='Elastic (Class 0)', marker='o')

ax.scatter(df_failure.loc[plastic_mask, 'sigma1'] / 1e6,
           df_failure.loc[plastic_mask, 'sigma2'] / 1e6,
           c='red', alpha=0.6, s=40, linewidth=0.5,
           label='Yielded (Class 1)', marker='x')

# Plot theoretical von Mises ellipse
ax.plot(sigma1_ellipse / 1e6, sigma2_ellipse / 1e6,
        'k--', linewidth=3, alpha=0.8, label='Theoretical von Mises boundary')

# Mark axes
ax.axhline(0, color='gray', linestyle='-', linewidth=0.5, alpha=0.5)
ax.axvline(0, color='gray', linestyle='-', linewidth=0.5, alpha=0.5)

# Mark uniaxial yield points (±σ_y on each axis)
sy_mpa = sigma_y_failure / 1e6
ax.plot([sy_mpa, -sy_mpa, 0, 0], [0, 0, sy_mpa, -sy_mpa],
        'go', markersize=10, label='Uniaxial yield points')

ax.set_xlabel('Principal Stress σ₁ (MPa)', fontsize=13)
ax.set_ylabel('Principal Stress σ₂ (MPa)', fontsize=13)
ax.set_title('Experimental Data with Theoretical von Mises Yield Surface\n' +
             f'Yield Strength = {sigma_y_failure/1e6:.0f} MPa',
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11, loc='upper right')
ax.grid(True, alpha=0.3)
ax.set_aspect('equal', adjustable='box')

# Set reasonable limits
limit = 1.5 * sigma_y_failure / 1e6
ax.set_xlim([-limit, limit])
ax.set_ylim([-limit, limit])

plt.tight_layout()
plt.show()

print("\n📊 Visual Interpretation:")
print("   - Blue circles: Material remains elastic (no permanent deformation)")
print("   - Red crosses: Material yielded (permanent deformation)")
print("   - Dashed ellipse: Theoretical prediction from von Mises criterion")
print("   - Green dots: Uniaxial yield points (±σ_y on each axis)")
print("\n💡 Observations:")
print("   - Data clusters inside (elastic) and outside (plastic) the ellipse")
print("   - Some scatter near boundary due to experimental noise")
print("   - Shape is symmetric about the origin: material yields equally in tension and compression")


---

## 2.4 Interactive Von Mises Stress Calculator

### Understanding the Yield Surface

Before we build a machine learning model, we will develop intuition for how the von Mises criterion works. Use this interactive tool to:
- Move a point in σ₁-σ₂ stress space
- See the calculated von Mises equivalent stress
- Observe whether the material would yield at that stress state
- Visualize distance from the yield surface

**Instructions**: Drag the sliders to set σ₁ and σ₂. Watch how σ_VM changes and whether the point is inside or outside the yield ellipse.

In [None]:
def interactive_von_mises(sigma1_mpa, sigma2_mpa):
    """Interactive von Mises stress calculator with visualization."""
    clear_output(wait=True)

    # Convert to Pa
    sigma1 = sigma1_mpa * 1e6
    sigma2 = sigma2_mpa * 1e6

    # Calculate von Mises stress
    sigma_vm = von_mises_stress(sigma1, sigma2)

    # Check yield condition (guard the zero-stress state, where the safety factor is unbounded)
    yielded = sigma_vm > sigma_y_failure
    safety_factor = sigma_y_failure / sigma_vm if sigma_vm > 0 else np.inf

    # Decision boundary value
    f_boundary = sigma1**2 - sigma1*sigma2 + sigma2**2 - sigma_y_failure**2
    boundary_status = "(INSIDE - Elastic)" if f_boundary < 0 else "(OUTSIDE - Plastic)"

    # Yield prediction label
    yield_prediction = "    MATERIAL YIELDS (Class 1)" if yielded else "    MATERIAL IS ELASTIC (Class 0)"

    # Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

    # Left plot: Stress space with yield surface
    ax1.plot(sigma1_ellipse / 1e6, sigma2_ellipse / 1e6,
             'k--', linewidth=2, alpha=0.8, label='von Mises yield surface')

    # Current point
    color = 'red' if yielded else 'blue'
    marker = 'x' if yielded else 'o'
    state_label = "YIELDED" if yielded else "ELASTIC"
    ax1.plot(sigma1_mpa, sigma2_mpa, marker, markersize=20,
             markeredgewidth=3, color=color,
             label=f"Current state: {state_label}")

    # Line from origin to point
    ax1.plot([0, sigma1_mpa], [0, sigma2_mpa], 'g-', linewidth=2, alpha=0.5)

    # Radial line to yield surface
    if sigma1**2 + sigma2**2 > 0:
        scale = sigma_y_failure / von_mises_stress(sigma1, sigma2)
        sigma1_boundary = sigma1 * scale / 1e6
        sigma2_boundary = sigma2 * scale / 1e6
        ax1.plot([sigma1_mpa, sigma1_boundary],
                 [sigma2_mpa, sigma2_boundary],
                 'r--', linewidth=2, alpha=0.7, label='Distance to boundary')

    ax1.axhline(0, color='gray', linestyle='-', linewidth=0.5, alpha=0.5)
    ax1.axvline(0, color='gray', linestyle='-', linewidth=0.5, alpha=0.5)

    ax1.set_xlabel('σ₁ (MPa)', fontsize=12)
    ax1.set_ylabel('σ₂ (MPa)', fontsize=12)
    ax1.set_title('Principal Stress Space', fontsize=13, fontweight='bold')
    ax1.legend(fontsize=10)
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal', adjustable='box')
    limit = 500
    ax1.set_xlim([-limit, limit])
    ax1.set_ylim([-limit, limit])

    # Right plot: Information panel
    ax2.axis('off')

    # Build clean info text
    info_text = "\n" + "=" * 60 + "\n"
    info_text += " VON MISES STRESS ANALYSIS\n"
    info_text += "=" * 60 + "\n\n"

    info_text += " INPUT STRESS STATE:\n"
    info_text += f"   σ₁ = {sigma1_mpa:.1f} MPa\n"
    info_text += f"   σ₂ = {sigma2_mpa:.1f} MPa\n\n"

    info_text += " CALCULATED VON MISES STRESS:\n"
    info_text += f"   σ_VM = {sigma_vm/1e6:.1f} MPa\n\n"

    info_text += " MATERIAL YIELD STRENGTH:\n"
    info_text += f"   σ_y = {sigma_y_failure/1e6:.0f} MPa\n\n"

    info_text += " DECISION BOUNDARY VALUE:\n"
    info_text += f"   f(σ₁,σ₂) = {f_boundary/1e12:.2f} × 10¹² Pa²\n"
    info_text += f"   {boundary_status}\n\n"

    info_text += " SAFETY FACTOR:\n"
    info_text += f"   SF = {safety_factor:.2f}\n\n"

    info_text += " YIELD PREDICTION:\n"
    info_text += f" {yield_prediction}\n\n"

    info_text += "-" * 60 + "\n"
    info_text += " INTERPRETATION\n"
    info_text += "-" * 60 + "\n"

    if yielded:
        info_text += f" Equivalent stress ({sigma_vm/1e6:.1f} MPa) EXCEEDS\n"
        info_text += f" yield strength ({sigma_y_failure/1e6:.0f} MPa).\n"
        info_text += " → PERMANENT DEFORMATION WILL OCCUR\n"
    else:
        info_text += f" Equivalent stress ({sigma_vm/1e6:.1f} MPa) is BELOW\n"
        info_text += f" yield strength ({sigma_y_failure/1e6:.0f} MPa).\n"
        info_text += " → Material remains ELASTIC\n"

    info_text += f"\n Safety margin: {abs(safety_factor - 1)*100:.1f}%\n"
    info_text += "=" * 60 + "\n"

    ax2.text(0.05, 0.5, info_text, fontsize=10, family='monospace',
             verticalalignment='center', transform=ax2.transAxes,
             bbox=dict(boxstyle='round,pad=1', facecolor='lightblue' if not yielded else 'lightcoral',
                      edgecolor='black', linewidth=2))

    plt.tight_layout()
    plt.show()

# Create interactive sliders
sigma1_slider = widgets.FloatSlider(
    value=200, min=-450, max=450, step=10,
    description='σ₁ (MPa):', continuous_update=False,
    style={'description_width': 'initial'}, layout=widgets.Layout(width='500px')
)

sigma2_slider = widgets.FloatSlider(
    value=150, min=-450, max=450, step=10,
    description='σ₂ (MPa):', continuous_update=False,
    style={'description_width': 'initial'}, layout=widgets.Layout(width='500px')
)

print("🎮 Interactive von Mises Stress Calculator")
print("   Move the sliders to explore different stress states!")
print()
interact(interactive_von_mises, sigma1_mpa=sigma1_slider, sigma2_mpa=sigma2_slider)

---

## 2.5 Manual Decision Boundary Optimization

### Understanding the Learning Problem

In machine learning, we do not manually position the decision boundary. Instead, the algorithm learns it from data by minimizing a **cost function**. But before we let the algorithm do this automatically, we will manually try to find a good boundary ourselves.

**Your challenge**: Scale and rotate the ellipse to minimize misclassification cost. This hands-on experience will help you understand:
- What makes one decision boundary better than another
- How cost functions quantify model performance
- Why optimization algorithms are needed for complex problems

**Cost function preview**: For each point, the cost increases by 1 if:
- Elastic point (blue) is classified as yielded (outside ellipse)
- Yielded point (red) is classified as elastic (inside ellipse)

Try to minimize the total cost displayed in the plot title!

In [None]:
def manual_boundary_optimizer(scale_factor, rotation_deg):
    """Manual optimization of the decision boundary by scaling/rotating the ellipse."""
    clear_output(wait=True)

    # Transform ellipse with current parameters
    rotation_rad = np.deg2rad(rotation_deg)

    # Original von Mises ellipse points
    t = np.linspace(0, 2*np.pi, 300)
    r_original = sigma_y_failure / np.sqrt(1 - 0.5*np.sin(2*t))

    # Scale the ellipse
    r_scaled = r_original * scale_factor

    # Apply rotation
    sigma1_transformed = r_scaled * np.cos(t + rotation_rad)
    sigma2_transformed = r_scaled * np.sin(t + rotation_rad)

    # Calculate cost as a plain misclassification count (0-1 loss, not log-loss)
    s1 = df_failure['sigma1'].to_numpy()
    s2 = df_failure['sigma2'].to_numpy()
    true_labels = df_failure['yielded'].to_numpy()

    # Rotate the data points back into the ellipse frame (rotation by -rotation_rad)
    s1_rot = s1 * np.cos(-rotation_rad) - s2 * np.sin(-rotation_rad)
    s2_rot = s1 * np.sin(-rotation_rad) + s2 * np.cos(-rotation_rad)

    # A point is predicted as yielded if it lies outside the scaled ellipse
    vm_transformed = von_mises_stress(s1_rot, s2_rot) / scale_factor
    predicted_labels = (vm_transformed > sigma_y_failure).astype(int)

    cost = int((predicted_labels != true_labels).sum())
    correct_classifications = len(df_failure) - cost
    accuracy = correct_classifications / len(df_failure)

    # Visualization
    fig, ax = plt.subplots(figsize=(11, 11))

    # Plot data
    elastic_mask = df_failure['yielded'] == 0
    plastic_mask = df_failure['yielded'] == 1

    ax.scatter(df_failure.loc[elastic_mask, 'sigma1'] / 1e6,
               df_failure.loc[elastic_mask, 'sigma2'] / 1e6,
               c='blue', alpha=0.6, s=40, edgecolors='k', linewidth=0.5,
               label='Elastic (Class 0)', marker='o', zorder=3)

    ax.scatter(df_failure.loc[plastic_mask, 'sigma1'] / 1e6,
               df_failure.loc[plastic_mask, 'sigma2'] / 1e6,
               c='red', alpha=0.6, s=40, linewidth=0.5,
               label='Yielded (Class 1)', marker='x', zorder=3)

    # Plot original von Mises ellipse (gray, dashed)
    ax.plot(sigma1_ellipse / 1e6, sigma2_ellipse / 1e6,
            'gray', linestyle=':', linewidth=2, alpha=0.5,
            label='Original von Mises', zorder=1)

    # Plot transformed ellipse (green, solid)
    ax.plot(sigma1_transformed / 1e6, sigma2_transformed / 1e6,
            'g-', linewidth=3, alpha=0.8,
            label=f'Adjusted boundary (Scale={scale_factor:.2f}, Rot={rotation_deg:.0f}°)',
            zorder=2)

    ax.axhline(0, color='gray', linestyle='-', linewidth=0.5, alpha=0.5)
    ax.axvline(0, color='gray', linestyle='-', linewidth=0.5, alpha=0.5)

    ax.set_xlabel('σ₁ (MPa)', fontsize=12)
    ax.set_ylabel('σ₂ (MPa)', fontsize=12)
    ax.set_title(f'Manual Boundary Optimization\n' +
                 f'Cost (Misclassifications): {cost}   |   Accuracy: {accuracy*100:.1f}%   |   ' +
                 f'Goal: Minimize Cost!',
                 fontsize=13, fontweight='bold')
    ax.legend(fontsize=10, loc='upper right')
    ax.grid(True, alpha=0.3)
    ax.set_aspect('equal', adjustable='box')

    limit = 500
    ax.set_xlim([-limit, limit])
    ax.set_ylim([-limit, limit])

    plt.tight_layout()
    plt.show()

    # Feedback
    if cost < 20:
        print("🏆 Excellent! Cost < 20: You've found a nearly optimal boundary!")
    elif cost < 50:
        print("✅ Good work! Cost < 50: Boundary fits data reasonably well.")
    elif cost < 100:
        print("🟡 Moderate fit. Cost < 100: Try adjusting scale or rotation.")
    else:
        print("🔴 High cost. Try: scale ≈ 1.0, rotation ≈ 0° for von Mises data.")

    print(f"\n📊 Current configuration:")
    print(f"   Scale factor: {scale_factor:.2f}")
    print(f"   Rotation: {rotation_deg:.0f}°")
    print(f"   Misclassifications: {cost} / {len(df_failure)}")
    print(f"   Accuracy: {accuracy*100:.1f}%")

# Create sliders
scale_slider = widgets.FloatSlider(
    value=1.0, min=0.5, max=1.5, step=0.05,
    description='Scale factor:', continuous_update=False,
    style={'description_width': 'initial'}, layout=widgets.Layout(width='500px')
)

rotation_slider = widgets.FloatSlider(
    value=0, min=-45, max=45, step=5,
    description='Rotation (deg):', continuous_update=False,
    style={'description_width': 'initial'}, layout=widgets.Layout(width='500px')
)

print("🎮 Manual Decision Boundary Optimization")
print("   Scale and rotate the ellipse to minimize misclassification cost!")
print("   The green line is your adjusted boundary.")
print()
interact(manual_boundary_optimizer, scale_factor=scale_slider, rotation_deg=rotation_slider)


---

## 2.6 Logistic Regression for Binary Classification

### From Linear to Probabilistic Prediction

Logistic regression maps linear combinations to probabilities via the sigmoid function:

$$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-(\theta^T x)}}$$

**Properties**:
- Output ∈ (0, 1): Interpretable as P(y=1|x)
- Decision boundary: {x : θᵀx = 0}
- Smooth gradient everywhere (unlike step function)
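
The boundary θᵀx = 0 corresponds to a probability threshold of 0.5. A different threshold p* simply shifts the boundary to θᵀx = logit(p*), which is the mechanism behind threshold selection. A minimal sketch, with the helper names `sigmoid` and `logit` defined here for illustration:

```python
import numpy as np


def sigmoid(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    """Inverse of the sigmoid: the linear score at which P(y=1|x) equals p."""
    return np.log(p / (1.0 - p))


for p_star in (0.5, 0.3, 0.1):
    z_star = logit(p_star)
    # Round trip: the score z_star maps back to the chosen threshold
    assert np.isclose(sigmoid(z_star), p_star)
    print(f"threshold P(y=1|x) = {p_star:.1f}  <->  theta^T x = {z_star:+.3f}")
```

Lowering the threshold below 0.5 moves the boundary to negative scores, so more stress states are flagged as yielded.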

### Feature Engineering for Von Mises

Logistic regression on the raw inputs has a linear decision boundary (θᵀx = 0 is a straight line in the σ₁-σ₂ plane), so it cannot represent the von Mises ellipse. We use polynomial features:

- **Original**: x = [σ₁, σ₂]
- **Augmented**: x' = [1, σ₁, σ₂, σ₁², σ₁σ₂, σ₂²]

This allows the model to learn:
$$\theta_0 + \theta_1\sigma_1 + \theta_2\sigma_2 + \theta_3\sigma_1^2 + \theta_4\sigma_1\sigma_2 + \theta_5\sigma_2^2$$

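**Expected coefficients** (from von Mises: σ₁² - σ₁σ₂ + σ₂² = σ_y², in terms of the raw stresses and up to a positive scale factor):
- θ₃ ≈ θ₅ > 0 (quadratic terms; equal by isotropy, since swapping σ₁ and σ₂ leaves the criterion unchanged)
- θ₄ ≈ −θ₃ < 0 (interaction term)
- θ₁ ≈ θ₂ ≈ 0 (the yield surface is centered at the origin: equal yield in tension and compression)
- θ₀ < 0 (the origin is elastic, so the linear score must be negative there)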

### Cross-Entropy Loss

For binary classification, use log-loss instead of MSE:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)}) \log(1 - h_\theta(x^{(i)}))]$$

**Why cross-entropy?**
- MSE composed with the sigmoid is non-convex in θ
- Cross-entropy is convex in θ → any local minimum is a global minimum
- The penalty for a confident wrong prediction is unbounded: −log(h) → ∞ as h → 0 when y = 1
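
The difference between the two losses is easiest to see on a single example with true label y = 1. In this small sketch (function names chosen here for illustration), the squared error saturates below 1 while the cross-entropy keeps growing as the predicted probability approaches 0.

```python
import numpy as np


def cross_entropy(y, h):
    """Per-example log-loss for true label y and predicted probability h."""
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))


def squared_error(y, h):
    """Per-example squared error, shown for comparison."""
    return (y - h) ** 2


y_true = 1
for h in (0.99, 0.5, 0.01, 0.0001):
    print(f"h = {h:<7}  cross-entropy = {cross_entropy(y_true, h):7.3f}   "
          f"squared error = {squared_error(y_true, h):.3f}")
```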

### Gradient Descent

Update rule has identical form to linear regression:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$$

but h_θ(x) is now sigmoid-transformed.
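
The update rule can be sketched in a few lines of NumPy. This is a minimal batch gradient descent on hypothetical one-dimensional toy data, not the solver scikit-learn uses; the learning rate and iteration count are arbitrary choices for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(theta, X, y):
    """Mean cross-entropy J(theta); probabilities are clipped to avoid log(0)."""
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))


def gradient_descent(X, y, alpha=0.5, n_iter=500):
    """Batch gradient descent: theta := theta - alpha * X^T (h - y) / m."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iter):
        h = sigmoid(X @ theta)
        theta = theta - alpha * X.T @ (h - y) / len(y)
        history.append(log_loss(theta, X, y))
    return theta, history


# Hypothetical toy data: one feature, noisy labels, bias column prepended
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.5 * rng.normal(size=200) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])

theta, history = gradient_descent(X, y)
print(f"theta = {theta}")
print(f"loss: first step = {history[0]:.4f}, last step = {history[-1]:.4f}")
```

The gradient line is the same expression as in linear regression; only the definition of `h` changes.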

---

### Visualizing Sigmoid and Cost Functions

Before training the model, we will visualize these key mathematical components.

In [None]:
# Create visualization of sigmoid and cost functions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Left plot: Sigmoid function
z_vals = np.linspace(-10, 10, 200)
sigmoid_vals = 1 / (1 + np.exp(-z_vals))

ax1.plot(z_vals, sigmoid_vals, 'b-', linewidth=3, label='σ(z) = 1/(1+e^(-z))')
ax1.axhline(0.5, color='red', linestyle='--', alpha=0.7, label='Decision threshold')
ax1.axvline(0, color='gray', linestyle='--', alpha=0.5)

# Mark key points
ax1.plot(0, 0.5, 'ro', markersize=10, label='σ(0) = 0.5')
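# Evaluate the sigmoid at z = ±5 directly (indices 20 and 180 of z_vals are z ≈ ±8, not ±5)
z_marks = np.array([-5.0, 5.0])
ax1.plot(z_marks, 1 / (1 + np.exp(-z_marks)), 'go', markersize=8)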

ax1.set_xlabel('z = θᵀx (linear combination)', fontsize=12)
ax1.set_ylabel('σ(z) = P(y=1|x)', fontsize=12)
ax1.set_title('Sigmoid Function\nMaps linear output to probability (0, 1)',
              fontsize=13, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_ylim([-0.1, 1.1])

# Right plot: Cost function components
h_vals = np.linspace(0.01, 0.99, 100)  # Predicted probabilities

cost_y1 = -np.log(h_vals)  # Cost when y=1
cost_y0 = -np.log(1 - h_vals)  # Cost when y=0

ax2.plot(h_vals, cost_y1, 'r-', linewidth=3, label='Cost when y=1: -log(h)')
ax2.plot(h_vals, cost_y0, 'b-', linewidth=3, label='Cost when y=0: -log(1-h)')

# Mark key points
ax2.axvline(0.5, color='gray', linestyle='--', alpha=0.5, label='Decision threshold')

ax2.set_xlabel('Predicted Probability h(x)', fontsize=12)
ax2.set_ylabel('Cost', fontsize=12)
ax2.set_title('Log-Loss Cost Function Components\nPenalizes confident wrong predictions',
              fontsize=13, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.set_ylim([0, 5])

plt.tight_layout()
plt.show()

print("📊 Interpretation:")
print("\nLeft plot (Sigmoid):")
print("   - Maps linear combination z to a probability in (0, 1)")
print("   - z = 0 gives 50% probability (decision boundary)")
print("   - Large positive z → high confidence in Class 1")
print("   - Large negative z → high confidence in Class 0")
print("\nRight plot (Cost function):")
print("   - Red: Cost when true label is 1 (yielded)")
print("     → Low cost if we predict high probability (correct)")
print("     → High cost if we predict low probability (wrong)")
print("   - Blue: Cost when true label is 0 (elastic)")
print("     → Low cost if we predict low probability (correct)")
print("     → High cost if we predict high probability (wrong)")
print("\n💡 Key insight: Cost grows without bound as a confident prediction turns out to be wrong!")

---

## 2.7 Model Training and Evaluation

### Feature Engineering for von Mises

To help logistic regression discover the quadratic von Mises relationship, we create polynomial features:

- **Original features**: [σ₁, σ₂]
- **Engineered features**: [1, σ₁, σ₂, σ₁², σ₁σ₂, σ₂²]

This allows the model to learn:
$$\theta_0 + \theta_1 \sigma_1 + \theta_2 \sigma_2 + \theta_3 \sigma_1^2 + \theta_4 \sigma_1\sigma_2 + \theta_5 \sigma_2^2$$

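**Expected result**: The learned decision boundary (θᵀx' = 0) should approximate the von Mises ellipse:
$$\sigma_1^2 - \sigma_1\sigma_2 + \sigma_2^2 - \sigma_y^2 = 0$$

The boundary is unchanged when all coefficients are multiplied by a positive constant, so the coefficients should match this formula only up to a positive scale factor. The next cell trains the model so we can check.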

In [None]:
# Prepare features and labels
X_failure = df_failure[['sigma1', 'sigma2']].values
y_failure = df_failure['yielded'].values

# Split into train and test sets
X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
    X_failure, y_failure, test_size=0.2, random_state=42, stratify=y_failure
)

# Standardize features (important for logistic regression)
scaler_failure = StandardScaler()
X_train_f_scaled = scaler_failure.fit_transform(X_train_f)
X_test_f_scaled = scaler_failure.transform(X_test_f)

# Create polynomial features (degree 2)
poly_failure = PolynomialFeatures(degree=2, include_bias=True)
X_train_f_poly = poly_failure.fit_transform(X_train_f_scaled)
X_test_f_poly = poly_failure.transform(X_test_f_scaled)

print("📊 Feature engineering:")
print(f"   Original features: {X_train_f.shape[1]}")
print(f"   Polynomial features: {X_train_f_poly.shape[1]}")
print(f"   Feature names: {poly_failure.get_feature_names_out(['σ1', 'σ2'])}")
print(f"\n📦 Data split:")
print(f"   Training samples: {len(X_train_f)} ({len(y_train_f[y_train_f==1])} yielded, {len(y_train_f[y_train_f==0])} elastic)")
print(f"   Test samples: {len(X_test_f)} ({len(y_test_f[y_test_f==1])} yielded, {len(y_test_f[y_test_f==0])} elastic)")

# Train logistic regression model
print("\n🔄 Training logistic regression model...")
model_failure = LogisticRegression(max_iter=1000, random_state=42)
model_failure.fit(X_train_f_poly, y_train_f)

# Predictions
y_train_pred_f = model_failure.predict(X_train_f_poly)
y_test_pred_f = model_failure.predict(X_test_f_poly)

y_train_prob_f = model_failure.predict_proba(X_train_f_poly)[:, 1]
y_test_prob_f = model_failure.predict_proba(X_test_f_poly)[:, 1]

# Evaluate performance
train_acc_f = accuracy_score(y_train_f, y_train_pred_f)
test_acc_f = accuracy_score(y_test_f, y_test_pred_f)

train_loss_f = log_loss(y_train_f, y_train_prob_f)
test_loss_f = log_loss(y_test_f, y_test_prob_f)

print("\n✅ Training complete!")
print("=" * 70)
print("📊 Model Performance:")
print("-" * 70)
print(f"   Training accuracy: {train_acc_f*100:.2f}%")
print(f"   Test accuracy: {test_acc_f*100:.2f}%")
print(f"   Training log-loss: {train_loss_f:.4f}")
print(f"   Test log-loss: {test_loss_f:.4f}")
print("=" * 70)

# Display learned coefficients
print("\n🔍 Learned Coefficients:")
print("=" * 70)

# Get feature names from polynomial features
feature_names = list(poly_failure.get_feature_names_out(['σ1', 'σ2']))

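# Get all coefficients. sklearn stores the fitted intercept separately in intercept_;
# coef_[0] additionally holds a coefficient for the constant '1' column added by
# PolynomialFeatures(include_bias=True). The two are redundant, so the effective
# constant term of the decision function is their sum.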
coefficients = model_failure.coef_[0]

# Print in a simple, clean format
print(f"{'Feature':<15} {'Coefficient':>15}")
print("-" * 70)
print(f"{'Intercept':<15} {model_failure.intercept_[0]:>15.6f}")
for fname, coef in zip(feature_names, coefficients):
    print(f"{fname:<15} {coef:>15.6f}")
print("=" * 70)

print("\n💡 Interpretation:")
print("   - Intercept: Shifts the decision boundary")
print("   - σ1, σ2: Linear terms (usually small for von Mises)")
print("   - σ1², σ2²: Quadratic terms (should be positive, ~equal)")
print("   - σ1·σ2: Interaction term (should be negative for ellipse)")
print("\n🎯 Expected pattern for von Mises:")
print("   σ1² - σ1·σ2 + σ2² - constant = 0")
print("   Our model should have σ1² ≈ σ2² > 0 and σ1·σ2 < 0")

# Confusion matrix
print("\n📊 Confusion Matrix (Test Set):")
cm = confusion_matrix(y_test_f, y_test_pred_f)
print("                 Predicted")
print("                 0 (Elastic)  1 (Yielded)")
print(f"Actual  0 (Elastic)    {cm[0,0]:4d}         {cm[0,1]:4d}")
print(f"        1 (Yielded)    {cm[1,0]:4d}         {cm[1,1]:4d}")

# Classification report
print("\n📋 Detailed Classification Report:")
print(classification_report(y_test_f, y_test_pred_f,
                          target_names=['Elastic (0)', 'Yielded (1)']))

In [None]:
# Degree-2 features on raw stresses for coefficient interpretability
print("\n" + "="*60)
print("COEFFICIENT INTERPRETABILITY: Raw Units (No Scaling)")
print("="*60)

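# Train degree-2 logistic regression on RAW (unscaled) stresses for physical interpretation.
# Coefficients will differ from the scaled model because the feature scales differ,
# but the PATTERN (positive quadratic, negative cross-term) reveals the von Mises structure.
# Note: with stresses in Pa the squared features are ~1e16, so the coefficients are tiny and
# the solver may emit a ConvergenceWarning; treat the magnitudes as qualitative only.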
poly2_raw = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly_raw = poly2_raw.fit_transform(X_train_f)
X_test_poly_raw = poly2_raw.transform(X_test_f)

logit_raw = LogisticRegression(max_iter=1000, penalty="l2", C=1.0, solver="lbfgs")
logit_raw.fit(X_train_poly_raw, y_train_f)

names_raw = poly2_raw.get_feature_names_out(["σ1","σ2"])
print("\nCoefficients (raw units, degree-2):")
for name, coef in zip(["Intercept"] + list(names_raw),
                      np.r_[logit_raw.intercept_[0], logit_raw.coef_[0]]):
    print(f"  {name:>10}: {coef: .3e}")

print("\n💡 Notice:")
print("  - Quadratic terms (σ1², σ2²) have similar positive coefficients")
print("  - Cross term (σ1·σ2) has negative coefficient")
print("  - This matches von Mises pattern: σ1² - σ1·σ2 + σ2² ≈ const")
print("\nWe still use the scaled model for predictions (better numerics).")
print("="*60)


---

## 2.8 Comparing Learned vs Theoretical Boundaries

### Visualizing the Decision Boundary

Now we will visualize how well the logistic regression model learned the von Mises yield criterion. We will compare:
- **Theoretical boundary**: von Mises ellipse from materials science
- **Learned boundary**: Decision boundary from logistic regression
- **Confidence regions**: Probability contours showing model uncertainty

If the model successfully learned the physics, the learned boundary should closely match the theoretical von Mises ellipse.

In [None]:
# Create meshgrid for decision boundary visualization
x1_min, x1_max = X_failure[:, 0].min() - 50e6, X_failure[:, 0].max() + 50e6
x2_min, x2_max = X_failure[:, 1].min() - 50e6, X_failure[:, 1].max() + 50e6

xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max, 200),
                        np.linspace(x2_min, x2_max, 200))

# Prepare grid points for prediction
grid_points = np.c_[xx1.ravel(), xx2.ravel()]
grid_points_scaled = scaler_failure.transform(grid_points)
grid_points_poly = poly_failure.transform(grid_points_scaled)

# Predict probabilities
Z = model_failure.predict_proba(grid_points_poly)[:, 1]
Z = Z.reshape(xx1.shape)

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Left plot: Decision boundary with data
contour_levels = [0.1, 0.3, 0.5, 0.7, 0.9]
contour = ax1.contourf(xx1/1e6, xx2/1e6, Z, levels=contour_levels,
                       cmap='RdYlBu_r', alpha=0.6)
plt.colorbar(contour, ax=ax1, label='P(Yield)')

# Decision boundary at 0.5 probability
ax1.contour(xx1/1e6, xx2/1e6, Z, levels=[0.5], colors='green',
            linewidths=3, linestyles='solid')

# Theoretical von Mises ellipse
ax1.plot(sigma1_ellipse / 1e6, sigma2_ellipse / 1e6,
         'k--', linewidth=3, alpha=0.8, label='Theoretical von Mises')

# Data points
elastic_mask = df_failure['yielded'] == 0
plastic_mask = df_failure['yielded'] == 1

ax1.scatter(df_failure.loc[elastic_mask, 'sigma1'] / 1e6,
            df_failure.loc[elastic_mask, 'sigma2'] / 1e6,
            c='blue', alpha=0.7, s=50, edgecolors='k', linewidth=0.5,
            label='Elastic (Class 0)', marker='o', zorder=5)

ax1.scatter(df_failure.loc[plastic_mask, 'sigma1'] / 1e6,
            df_failure.loc[plastic_mask, 'sigma2'] / 1e6,
            c='red', alpha=0.7, s=50, linewidth=0.5,
            label='Yielded (Class 1)', marker='x', zorder=5)

# Add green line to legend
ax1.plot([], [], 'g-', linewidth=3, label='Learned boundary (P=0.5)')

ax1.set_xlabel('Principal Stress σ₁ (MPa)', fontsize=12)
ax1.set_ylabel('Principal Stress σ₂ (MPa)', fontsize=12)
ax1.set_title('Learned Decision Boundary vs Theoretical\nContours show yield probability',
              fontsize=13, fontweight='bold')
ax1.legend(fontsize=10, loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.set_aspect('equal', adjustable='box')

# Right plot: Probability histogram
ax2.hist(y_test_prob_f[y_test_f == 0], bins=20, alpha=0.7, color='blue',
         edgecolor='black', label='Elastic (Class 0)', density=True)
ax2.hist(y_test_prob_f[y_test_f == 1], bins=20, alpha=0.7, color='red',
         edgecolor='black', label='Yielded (Class 1)', density=True)

ax2.axvline(0.5, color='green', linestyle='--', linewidth=2, label='Decision threshold')

ax2.set_xlabel('Predicted Probability P(Yield)', fontsize=12)
ax2.set_ylabel('Density', fontsize=12)
ax2.set_title('Prediction Confidence Distribution\nTest Set',
              fontsize=13, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 Visual Interpretation:")
print("\nLeft plot:")
print("   - Green solid line: Learned decision boundary (P = 0.5)")
print("   - Black dashed ellipse: Theoretical von Mises criterion")
print("   - Color contours: Probability of yielding (red = high, blue = low)")
print("   - Close alignment = model learned the physics!")
print("\nRight plot:")
print("   - Blue histogram: Predicted probabilities for elastic points")
print("   - Red histogram: Predicted probabilities for yielded points")
print("   - Good separation = model is confident in its predictions")
print("   - Overlap near 0.5 = model uncertain about boundary cases")

# Calculate how well boundaries match
# Sample points on theoretical boundary and check prediction
theta_sample = np.linspace(0, 2*np.pi, 100)
r_sample = sigma_y_failure / np.sqrt(1 - 0.5*np.sin(2*theta_sample))
boundary_points = np.column_stack([r_sample * np.cos(theta_sample),
                                   r_sample * np.sin(theta_sample)])

boundary_scaled = scaler_failure.transform(boundary_points)
boundary_poly = poly_failure.transform(boundary_scaled)
boundary_probs = model_failure.predict_proba(boundary_poly)[:, 1]

mean_prob = np.mean(boundary_probs)
std_prob = np.std(boundary_probs)

print(f"\n🎯 Boundary Alignment Analysis:")
print(f"   Mean probability on von Mises ellipse: {mean_prob:.3f}")
print(f"   Standard deviation: {std_prob:.3f}")
print(f"\n💡 Ideal result: Mean ≈ 0.5 with a small standard deviation (model boundary matches theory)")

if abs(mean_prob - 0.5) < 0.05:
    print("   ✅ Excellent alignment! Model learned von Mises criterion accurately.")
elif abs(mean_prob - 0.5) < 0.1:
    print("   🟢 Good alignment! Model captures the general yield behavior.")
else:
    print("   🟡 Moderate alignment. Model may need more training or features.")


---

## 2.9 Interactive Threshold Selection

### Beyond the Default 0.5 Threshold

By default, scikit-learn's `predict` uses P = 0.5 as the decision threshold. In engineering applications, we often want to adjust this threshold based on the consequences of each type of error:

**Conservative design** (safety-critical):
- Set threshold < 0.5 (e.g., 0.3)
- More likely to predict yield → larger safety margins
- Reduces false negatives (predicting elastic when the material actually yields)

**Aggressive design** (cost-sensitive):
- Set threshold > 0.5 (e.g., 0.7)
- Less likely to predict yield → smaller safety margins
- Reduces false positives (predicting yield when the material actually stays elastic)

### The Precision-Recall Tradeoff

**Precision**: Of all predicted yields, how many actually yielded?
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}$$

**Recall**: Of all actual yields, how many did we predict?
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}$$
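
The tradeoff can be seen on a tiny hand-made example. The labels and probabilities below are invented for illustration (they are not the notebook's test set), and `precision_recall_at` is a hypothetical helper written for this sketch.

```python
import numpy as np

# Toy labels (1 = yielded) and predicted yield probabilities, invented for illustration.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p_yield = np.array([0.10, 0.20, 0.40, 0.60, 0.35, 0.55, 0.80, 0.90])

def precision_recall_at(threshold):
    """Precision and recall when predicting 'yield' for p_yield >= threshold."""
    y_pred = p_yield >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn)
    return precision, recall

for t in (0.3, 0.5, 0.7):
    prec, rec = precision_recall_at(t)
    print(f"threshold {t:.1f}: precision = {prec:.2f}, recall = {rec:.2f}")
# threshold 0.3: precision = 0.67, recall = 1.00
# threshold 0.5: precision = 0.75, recall = 0.75
# threshold 0.7: precision = 1.00, recall = 0.50
```

Lowering the threshold catches every yield at the price of more false alarms; raising it does the opposite.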

Use the interactive tool below to explore how threshold selection affects model behavior.

---

### Why Recall Is Critical for Imbalanced Classes

**The problem:** When yield events are rare (e.g., 5% of samples), a naive model can achieve 95% accuracy by simply predicting "no yield" every time, yet catch **zero** actual failures.

**Why standard metrics mislead:**
- **Accuracy** looks high even when all positives are missed
- **ROC-AUC** can look optimistic because the false positive rate is computed over the large negative class, so many false alarms barely move it
- **Precision** becomes unstable with few positive predictions

**Why recall matters for safety:**
In safety-critical applications, we must catch rare failures. **Recall** directly measures this:

$$\text{Recall} = \frac{\text{TP}}{\text{TP + FN}} = \frac{\text{True Yields Caught}}{\text{All Actual Yields}}$$

The complementary metric is **False Negative Rate (FNR)**, the fraction of failures we miss:

$$\text{FNR} = 1 - \text{Recall} = \frac{\text{FN}}{\text{TP + FN}}$$

**Practical strategy for imbalanced data:**

1. **Set a recall target** (e.g., 95% recall = 5% FNR) to ensure you catch most failures
2. **Use the Precision-Recall curve** instead of ROC to select thresholds—it focuses on positive class performance
3. **Report AUPRC** (Area Under PR Curve) alongside ROC-AUC for imbalanced datasets
4. **Use `class_weight="balanced"`** in training to penalize missed positives more heavily
5. **Choose cautious thresholds** (typically < 0.5) to boost recall when positives are scarce

**Example:** For rare yield detection, you might set threshold = 0.2 to achieve 95% recall, accepting lower precision (more false alarms) as the cost of not missing actual failures.
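
Step 1 of the strategy (set a recall target, then read off the threshold) can be sketched with `sklearn.metrics.precision_recall_curve`. The helper name `threshold_for_recall` and the toy data are assumptions for illustration; in practice the threshold should be chosen on validation data rather than on the final test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, y_prob, target_recall):
    """Return (threshold, precision, recall) for the highest threshold meeting the target."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision and recall have one more entry than thresholds; drop the final recall = 0 point.
    meets_target = recall[:-1] >= target_recall
    if not meets_target.any():
        raise ValueError("No threshold reaches the requested recall.")
    idx = np.flatnonzero(meets_target)[-1]  # thresholds are sorted in ascending order
    return thresholds[idx], precision[idx], recall[idx]

# Toy labels (1 = yielded) and predicted probabilities, invented for illustration.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.10, 0.20, 0.40, 0.60, 0.35, 0.55, 0.80, 0.90])

thr, prec, rec = threshold_for_recall(y_true, y_prob, target_recall=1.0)
print(f"threshold = {thr:.2f}, precision = {prec:.2f}, recall = {rec:.2f}")
```

Taking the highest threshold that still meets the recall target keeps precision as high as the target allows.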

In [None]:
from sklearn.metrics import (precision_recall_curve, roc_curve, auc,
                             precision_score, recall_score, f1_score)

def interactive_threshold_selection(threshold):
    y_pred = (y_test_prob_f >= threshold).astype(int)
    acc = accuracy_score(y_test_f, y_pred)
    prec = precision_score(y_test_f, y_pred, zero_division=0)
    rec = recall_score(y_test_f, y_pred, zero_division=0)
    f1 = f1_score(y_test_f, y_pred, zero_division=0)
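    # Fix the label order so the matrix is always 2x2, even if one class is never predicted
    tn, fp, fn, tp = confusion_matrix(y_test_f, y_pred, labels=[0, 1]).ravel()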

    fig = plt.figure(constrained_layout=True, figsize=(16, 7))
    gs = fig.add_gridspec(2, 3)
    ax1 = fig.add_subplot(gs[:, 0])
    ax2 = fig.add_subplot(gs[0, 1])
    ax3 = fig.add_subplot(gs[0, 2])
    ax4 = fig.add_subplot(gs[1, 1:])

    # Decision regions
    ax1.contourf(xx1/1e6, xx2/1e6, Z, levels=[0, threshold, 1],
                 colors=['#a6cee3', '#fb9a99'], alpha=0.6)
    ax1.contour(xx1/1e6, xx2/1e6, Z, levels=[threshold], colors='green', linewidths=3)
    ax1.plot(sigma1_ellipse/1e6, sigma2_ellipse/1e6, 'k--', lw=2.5, label='von Mises (theory)')

    # Test points
    mask_correct = (y_pred == y_test_f)
    ax1.scatter(X_test_f[mask_correct & (y_test_f==0), 0]/1e6,
                X_test_f[mask_correct & (y_test_f==0), 1]/1e6,
                c='blue', s=50, edgecolors='k', linewidth=0.5, marker='o', label='TN')
    ax1.scatter(X_test_f[mask_correct & (y_test_f==1), 0]/1e6,
                X_test_f[mask_correct & (y_test_f==1), 1]/1e6,
                c='red', s=60, linewidth=1.0, marker='x', label='TP')
    ax1.scatter(X_test_f[~mask_correct & (y_test_f==0), 0]/1e6,
                X_test_f[~mask_correct & (y_test_f==0), 1]/1e6,
                c='orange', s=70, edgecolors='k', linewidth=1.0, marker='s', label='FP')
    ax1.scatter(X_test_f[~mask_correct & (y_test_f==1), 0]/1e6,
                X_test_f[~mask_correct & (y_test_f==1), 1]/1e6,
                c='purple', s=70, edgecolors='k', linewidth=1.0, marker='s', label='FN')
    ax1.set_aspect('equal', adjustable='box')
    ax1.grid(True, alpha=0.3)
    ax1.set_xlabel('σ₁ (MPa)')
    ax1.set_ylabel('σ₂ (MPa)')
    ax1.set_title(f'Threshold = {threshold:.2f}')
    ax1.legend(fontsize=8, loc='upper right')

    # Confusion matrix heatmap; text color is chosen per cell for contrast
    cm = np.array([[tn, fp], [fn, tp]])
    im = ax2.imshow(cm, cmap='Blues')

    for i in range(2):
        for j in range(2):
            val = cm[i, j]
            # compute normalized intensity to decide text color
            c = im.cmap(im.norm(val))
            luminance = 0.299*c[0] + 0.587*c[1] + 0.114*c[2]
            text_color = 'white' if luminance < 0.5 else 'black'
            ax2.text(j, i, f'{val}', ha='center', va='center',
                     fontsize=14, color=text_color, fontweight='bold')

    ax2.set_xticks([0, 1]); ax2.set_yticks([0, 1])
    ax2.set_xticklabels(['Pred 0', 'Pred 1'])
    ax2.set_yticklabels(['True 0', 'True 1'])
    ax2.set_title(f'Acc {acc:.2f}  Prec {prec:.2f}  Rec {rec:.2f}  F1 {f1:.2f}')

    # Precision-Recall curve
    pr_prec, pr_rec, _ = precision_recall_curve(y_test_f, y_test_prob_f)
    ax3.plot(pr_rec, pr_prec, lw=2)
    ax3.scatter([rec], [prec], s=60)
    ax3.set_xlabel('Recall'); ax3.set_ylabel('Precision')
    ax3.set_title('Precision-Recall')

    # ROC curve
    fpr, tpr, roc_thr = roc_curve(y_test_f, y_test_prob_f)
    ax4.plot(fpr, tpr, lw=2, label=f'AUC = {auc(fpr, tpr):.3f}')
    # Mark the exact operating point of the current threshold from its confusion-matrix counts
    ax4.scatter([fp / (fp + tn)], [tp / (tp + fn)], s=60)
    ax4.plot([0,1],[0,1],'k--', alpha=0.3)
    ax4.set_xlabel('FPR'); ax4.set_ylabel('TPR')
    ax4.set_title('ROC')
    ax4.legend()

    plt.show()

threshold_slider = widgets.FloatSlider(
    value=0.5, min=0.05, max=0.95, step=0.05,
    description='Decision threshold:', continuous_update=False,
    style={'description_width': 'initial'}, layout=widgets.Layout(width='500px')
)
print("Threshold selection - move the slider to trade precision and recall.")
interact(interactive_threshold_selection, threshold=threshold_slider)


### How to read the ROC panel (what it shows here)

- **What it plots:** the **Receiver Operating Characteristic (ROC)** curve traces model performance as the decision **threshold sweeps from 1 → 0**.  
  Each point is a pair **(FPR, TPR)** at a particular threshold.
- **The dashed diagonal** is random guessing. Curves closer to the **top-left** are better (high TPR at low FPR).  
- **AUC (area under the curve)** summarizes performance over **all thresholds**:  
  1.0 = perfect, 0.5 = random.
- **The dot on the curve** marks the **current threshold** from the slider. Moving the slider slides this dot, trading **missed yields (FN)** against **false alarms (FP)**.

**How to use it to pick a threshold**
- If you must **limit false alarms**, pick the smallest threshold that keeps **FPR ≤ target** (e.g., 5%) while giving acceptable TPR.  
- If you want a single default choice, a common rule is to maximize **Youden’s J = TPR − FPR** (point farthest above the diagonal).
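
Both selection rules can be sketched with `sklearn.metrics.roc_curve`. The data below are invented for illustration, and the 5% FPR target is just the example value from the text. `drop_intermediate=False` is passed so that every candidate threshold is kept.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels (1 = yielded) and predicted probabilities, invented for illustration.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
p_yield = np.array([0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90, 0.95])

fpr, tpr, thresholds = roc_curve(y_true, p_yield, drop_intermediate=False)

# Rule 1: smallest threshold that keeps the false positive rate within the target.
fpr_target = 0.05
thr_fpr = thresholds[fpr <= fpr_target].min()

# Rule 2: maximize Youden's J = TPR - FPR.
j = tpr - fpr
best = np.argmax(j)
thr_youden = thresholds[best]

print(f"FPR <= {fpr_target:.0%} rule: threshold = {thr_fpr:.2f}")
print(f"Youden's J rule: threshold = {thr_youden:.2f} (J = {j[best]:.2f})")
```

On this toy data both rules pick the same threshold. On real data they generally differ, because Youden's J weights missed yields and false alarms equally.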

---

# Conclusions: Key Insights and Engineering Applications

---

## What We Learned


### Part 2: Logistic Regression for Binary Classification

**Core principle**: Model the probability of a binary outcome with the sigmoid function and fit the parameters by minimizing the log-loss cost function.

**Key insights**:
1. **Sigmoid maps linear output to probabilities**: Smooth, differentiable function suitable for optimization
2. **Log-loss penalizes confident wrong predictions**: The cost function is convex, so any minimum found by the optimizer is a global one
3. **Polynomial features enable learning complex boundaries**: Model discovered quadratic von Mises relationship from data
4. **Decision boundary approximates physics**: Learned ellipse closely matches theoretical von Mises criterion
5. **Threshold selection affects error types**: Can tune for conservative (safety) vs aggressive (cost) design

**Practical lesson**: Machine learning can discover physical relationships from data, but domain knowledge (von Mises theory) helps interpret and validate results.

---

## The Most Important Lesson

### Machine Learning + Domain Knowledge = Powerful Engineering Tool

Throughout this notebook, we saw this synergy in action:

**Physics guided feature selection**:
- Hall-Petch relationship (1/√grain_size) based on dislocation theory
- Polynomial strain terms (ε², ε³) matching plastic deformation physics
- Quadratic stress terms (σ₁², σ₁σ₂, σ₂²) from von Mises distortion energy

**Materials science informed model architecture**:
- Degree 3-5 polynomials match typical stress-strain curvature
- Quadratic features sufficient for yield surface (no need for higher orders)
- Symmetric ellipse shape consistent with isotropic material behavior

---

## Key Takeaways for Engineering Practice

### 1. Start with Physical Understanding
Before applying ML, understand:
- What physical processes govern the system?
- What are the relevant length/time scales?
- What constraints must the solution satisfy?

This prevents "garbage in, garbage out" and enables intelligent feature engineering.

### 2. Validate Against Known Physics
ML models should:
- Reproduce theoretical results in idealized conditions
- Respect conservation laws and physical constraints
- Provide interpretable coefficients matching physical expectations
- Fail gracefully when extrapolating beyond training data

### 3. Quantify Uncertainty
Engineering decisions require uncertainty quantification:
- Train-test split assesses generalization
- Regularization prevents overconfidence
- Probability outputs (logistic regression) enable risk analysis
- Confidence intervals for predictions inform safety factors

### 4. Iterate Based on Physical Insights
If model performance is poor:
- Check if underfitting (too simple) or overfitting (too complex)
- Add physics-based features before arbitrary transformations
- Use error patterns to diagnose systematic issues
- Collect more data in regions where model struggles

**Remember**:
- Start simple, add complexity only when needed
- Regularization prevents overfitting, not underfitting
- Feature engineering often matters more than algorithm choice
- Validation against physical principles catches spurious correlations
- Uncertainty quantification enables responsible engineering decisions

---