<a href="https://colab.research.google.com/github/gustafbjurstam/ML-retreat-tekmek-2025/blob/main/linear_and_logistic_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning for Materials Science: Linear and Logistic Regression

## Learning Goals

This notebook introduces common machine learning techniques in the context of materials science and engineering. By working through realistic scenarios with engineering data, you will learn to:

- Apply data analysis tools including Principal Component Analysis and feature importance evaluation to understand which material properties matter most for predicting mechanical behavior
- Understand the concept of a cost function and how it guides the learning process in machine learning algorithms
- Master linear regression for modeling continuous relationships such as stress-strain curves in materials testing
- Apply logistic regression for binary classification problems such as predicting material failure under complex loading conditions
- Combine machine learning methods with domain knowledge from materials science to build better predictive models and gain deeper insights into material behavior

Throughout this notebook, you will see how machine learning works best when combined with physical understanding of the systems we are trying to model.

## Table of Contents

### Part 1: Linear Regression for Stress-Strain Modeling
1. Problem Introduction: Tensile Testing
2. Data Generation and Visualization
3. Data Analysis Tools
   - Principal Component Analysis
   - R-squared for Feature Importance
4. Linear Regression Fundamentals
   - Hypothesis Definition
   - Cost Function
   - Gradient Descent
5. Modeling Stress-Strain Relationships
   - Simple Linear Model
   - Polynomial Regression
   - Interactive Feature Selection

### Part 2: Logistic Regression for Failure Prediction
1. Material Failure Under Complex Loading
   - Von Mises Yield Criterion
   - Multi-axial Stress States
2. Logistic Regression Theory
   - Sigmoid Function
   - Cost Function for Classification
   - Decision Boundaries
3. Building a Failure Prediction Model
   - Training the Model
   - Comparing with Theory
   - Interactive Threshold Selection
4. Key Insights and Engineering Applications

In [None]:
#@title Setup: Import all required libraries

# Core scientific computing
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

# Interactive widgets and display
import ipywidgets as widgets
from IPython.display import display, clear_output, Image, SVG, Video

# Machine learning - preprocessing
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

# Machine learning - decomposition
from sklearn.decomposition import PCA

# Machine learning - models
from sklearn.linear_model import LinearRegression, LogisticRegression

# Machine learning - metrics
from sklearn.metrics import r2_score, accuracy_score, classification_report

print("All libraries imported successfully!")
print("Ready to begin the machine learning journey through materials science.")

---
# Part 1: Linear Regression for Stress-Strain Modeling
---

## Problem introduction

A colleague in the materials testing laboratory has been conducting tensile tests on a low-carbon steel alloy. During a tensile test, a specimen is stretched until it breaks, and the applied force and resulting elongation are measured. These measurements allow us to determine the stress-strain relationship of the material.

**What are stress and strain?**

Stress is the force per unit area applied to a material, measured in pascals (Pa) or megapascals (MPa). When you pull on a metal bar, the stress tells you how the force is distributed across its cross-section.

Strain is the relative deformation of the material - how much it stretches compared to its original length. If a 100 mm bar stretches to 105 mm, the strain is 0.05 or five percent. Strain is dimensionless since it is a ratio of lengths.
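
As a quick check of these definitions, the sketch below computes engineering stress and strain for the 100 mm bar mentioned above. The applied force and the cross-section dimensions are assumed values chosen for illustration; they are not taken from the test.

```python
# Engineering stress and strain for a bar in tension.
# The force and cross-section below are assumed values for illustration.
force_N = 15_000.0          # applied force (N), assumed
width_mm = 10.0             # specimen width (mm), assumed
thickness_mm = 5.0          # specimen thickness (mm), assumed
length_0_mm = 100.0         # original gauge length (mm), from the text
length_mm = 105.0           # stretched length (mm), from the text

area_mm2 = width_mm * thickness_mm                 # cross-sectional area (mm^2)
stress_MPa = force_N / area_mm2                    # 1 N/mm^2 equals 1 MPa
strain = (length_mm - length_0_mm) / length_0_mm   # dimensionless ratio

print(f"stress = {stress_MPa:.1f} MPa, strain = {strain:.3f}")
```

Working in newtons and millimetres is convenient because the resulting stress comes out directly in MPa.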

**Elastic versus plastic deformation**

When stress is low, materials deform elastically. This means they behave like a spring - remove the load and they return to their original shape. The relationship between stress and strain in this region is linear, following Hooke's law where stress equals Young's modulus times strain.

Beyond a certain stress level called the yield strength, materials begin to deform plastically. Plastic deformation is permanent - the material does not return to its original shape when unloaded. In this region, the stress continues to increase with strain, but the relationship is no longer linear. The material work-hardens as dislocations in its crystal structure multiply and interact.

Eventually, the specimen begins to neck - forming a localized region of reduced cross-section. After necking begins, the engineering stress (force divided by original area) actually decreases even though the true stress in the necked region continues to increase until fracture.
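
The difference between engineering and true stress can be made concrete with the standard conversion for uniform deformation. It assumes that the specimen volume stays constant and it only holds up to the onset of necking. The input numbers are assumed for illustration, and this is a sketch of the textbook relation rather than the way the data generator below models necking.

```python
import math

# Convert engineering stress/strain to true stress/strain.
# Valid only for uniform deformation (before necking), assuming constant volume.
eng_strain = 0.10        # engineering strain, assumed
eng_stress_MPa = 450.0   # engineering stress (MPa), assumed

true_strain = math.log(1.0 + eng_strain)               # natural log of the stretch ratio
true_stress_MPa = eng_stress_MPa * (1.0 + eng_strain)  # force over the current area

print(f"true strain = {true_strain:.4f}, true stress = {true_stress_MPa:.1f} MPa")
```

True stress is larger than engineering stress in tension because the cross-section shrinks as the bar elongates.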

Your colleague has collected data from 150 measurement points during a single tensile test, and has also recorded various material properties and testing conditions. They have shared this data with you to help analyze which factors most strongly influence the stress-strain behavior. The challenge is to determine which material parameters are important for predicting stress and which provide little useful information.

The data looks as follows:

In [None]:
#@title Generate realistic stress-strain data
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# =============================================================================
# PHYSICS-BASED STRESS-STRAIN MODEL
# =============================================================================

def _yield_uts_epsu(carbon_wt, grain_um, temperature_C, strain_rate, heat_treatment):
    """
    Compute yield strength, UTS and uniform elongation from metallurgy principles.
    These relationships are based on established materials science:
    - Hall-Petch: strength increases with decreasing grain size
    - Carbon strengthening: higher carbon content increases strength
    - Temperature effects: strength decreases at higher temperatures
    - Strain rate sensitivity: higher rates increase flow stress
    """
    # Heat treatment effect on strength
    ht_bias = 0.0
    if heat_treatment == 'normalized':
        ht_bias = 70e6  # Normalized steel is stronger
    elif heat_treatment == 'annealed':
        ht_bias = -90e6  # Annealed steel is softer

    # Hall-Petch relationship: strength proportional to 1/sqrt(grain_size)
    k_hp = 220e6
    hall_petch = k_hp / np.sqrt(grain_um)

    # Yield strength calculation
    sy_base = 260e6
    sigma_y = (
        sy_base
        + 250e6 * carbon_wt                  # Carbon strengthening
        + hall_petch                         # Grain size effect
        + ht_bias                            # Heat treatment effect
        - 0.35e6 * (temperature_C - 23.0)    # Temperature softening
        + 25e6 * (np.log10(strain_rate) + 4) # Strain rate hardening
    )
    sigma_y = max(sigma_y, 200e6)

    # Work-hardening capacity (decreases with carbon, increases for finer grains)
    delta_uts = 260e6 * (1.0 - 0.7 * carbon_wt) + 40e6 / np.sqrt(grain_um / 40.0)
    sigma_uts = sigma_y + delta_uts

    # Uniform elongation (ductility decreases with carbon)
    epsilon_u = (
        0.18
        - 0.12 * carbon_wt
        + 0.01 * np.log10(grain_um / 40.0)
        - 0.0006 * (temperature_C - 23.0)
    )
    epsilon_u = float(np.clip(epsilon_u, 0.05, 0.18))
    return sigma_y, sigma_uts, epsilon_u

def _true_and_engineering_stress(strain, E, sigma_y, sigma_uts, epsilon_u, carbon_wt):
    """
    Build a realistic stress-strain curve with:
    - Elastic region (linear)
    - Plastic region (work hardening)
    - Necking region (engineering stress decreases)
    """
    strain = np.asarray(strain)
    epsilon_y = sigma_y / E
    b = 12.0 + 10.0 * carbon_wt  # Work hardening rate

    sigma_true = np.empty_like(strain, dtype=float)
    for i, e in enumerate(strain):
        if e <= epsilon_y:
            # Elastic region: Hooke's law
            sigma_true[i] = E * e
        else:
            # Plastic region: Voce hardening model
            eps_p = e - epsilon_y
            eps_p_u = max(epsilon_u - epsilon_y, 1e-6)
            if e <= epsilon_u:
                sigma_true[i] = sigma_uts - (sigma_uts - sigma_y) * np.exp(-b * eps_p / eps_p_u)
            else:
                # Post-uniform elongation
                sigma_true[i] = sigma_uts * (1.0 + 0.04 * (e - epsilon_u))

    # Engineering stress includes necking softening
    ksoft = 8.0 + 18.0 * carbon_wt
    sigma_eng = sigma_true.copy()
    mask = strain > epsilon_u
    sigma_eng[mask] = sigma_true[mask] * np.exp(-ksoft * (strain[mask] - epsilon_u))
    return sigma_true, sigma_eng, epsilon_y

def generate_realistic_stress_strain_data(N=150, seed=42):
    """
    Generate a single-specimen data set with realistic material behavior.
    """
    rng = np.random.default_rng(seed)
    E = 210e9  # Young's modulus for steel (Pa)

    # Baseline specimen properties (low-carbon steel)
    baseline = dict(
        carbon_wt=0.18,            # 0.18 wt% carbon
        grain_um=40.0,             # 40 μm grain size
        temperature_C=23.0,        # Room temperature
        strain_rate=1e-4,          # Quasi-static testing
        heat_treatment='as_rolled' # As-rolled condition
    )
    sy, uts, eps_u = _yield_uts_epsu(**baseline)

    # Generate a smooth, noise-free reference curve from the same model
    strain_grid = np.linspace(0.0, 0.25, 400)
    sigma_true_grid, sigma_eng_grid, eps_y = _true_and_engineering_stress(
        strain_grid, E, sy, uts, eps_u, baseline["carbon_wt"]
    )

    # Sample N points from the same specimen with measurement noise
    strain = np.sort(rng.uniform(0.0, 0.25, size=N))
    sigma_true_pts, sigma_eng_pts, _ = _true_and_engineering_stress(
        strain, E, sy, uts, eps_u, baseline["carbon_wt"]
    )
    noise = rng.normal(0.0, 0.02 * np.maximum(sigma_eng_pts, 1.0), size=N)  # 2% noise
    stress_measured = np.clip(sigma_eng_pts + noise, 0.0, None)

    df = pd.DataFrame({
        'strain': strain,
        'stress': stress_measured,
        'stress_eng_true': sigma_eng_pts,
        'stress_true': sigma_true_pts,
        'yield_strength': sy,
        'uts': uts,
        'epsilon_y': eps_y,
        'epsilon_u': eps_u,
        # Material properties
        'carbon_content': np.full(N, baseline["carbon_wt"]),
        'grain_size_um': np.full(N, baseline["grain_um"]),
        'temperature_C': np.full(N, baseline["temperature_C"]),
        'strain_rate': np.full(N, baseline["strain_rate"]),
        'surface_roughness': rng.lognormal(-1, 0.2, size=N),
        'specimen_thickness': np.full(N, 5.0),
        'specimen_width': np.full(N, 10.0),
        'heat_treatment': np.full(N, baseline["heat_treatment"]),
    })

    fit = {
        'strain_grid': strain_grid,
        'sigma_eng_grid': sigma_eng_grid
    }
    return df, fit

# Generate the data
df, fit_curve = generate_realistic_stress_strain_data(N=150, seed=42)

# Save to CSV for later use
df.to_csv("realistic_stress_data.csv", index=False)

# Preview the first few rows
print(df.head())

As we can see, this raw data contains many columns. Some represent fundamental material properties like carbon content and grain size that we know from materials science affect mechanical behavior. Others like surface roughness might have minimal impact on the bulk stress-strain response. Our goal is to use data analysis methods to identify which features truly matter for predicting stress.

## Data analysis

### Principal Component Analysis (PCA)

Principal Component Analysis is among the most widely used tools for uncovering structure in data. The main idea behind PCA is to describe data in terms of directions of highest variance. In the two-dimensional example below, we can see that data can be described by the vector direction of the highest variance (the longest arrow) and another vector perpendicular to it. Variance can be thought of as how much spread there is in the data along a particular direction.

What makes PCA powerful is that data of arbitrary dimensions can be described in terms of these variance-maximizing vector directions, called principal components. This helps us understand which combinations of features account for most of the variation in our measurements.
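
To make the idea concrete, here is a minimal sketch of PCA written with NumPy only: center the data, compute the covariance matrix, and take its eigenvectors. The two-dimensional data is synthetic and assumed for illustration; in practice a library implementation such as scikit-learn's `PCA` would be the usual choice.

```python
import numpy as np

# Minimal PCA sketch: eigen-decomposition of the covariance matrix.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Synthetic 2D data stretched along the diagonal direction (assumed example data)
data = np.column_stack([t + 0.1 * rng.normal(size=500),
                        t + 0.1 * rng.normal(size=500)])

centered = data - data.mean(axis=0)        # PCA works on centered data
cov = np.cov(centered, rowvar=False)       # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # reorder by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()  # share of total variance per component
print("First principal direction:", np.round(eigvecs[:, 0], 3))
print("Explained variance ratio:", np.round(explained_ratio, 3))
```

The first principal direction lies along the diagonal, and its sign is arbitrary: an eigenvector and its negative describe the same axis.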

In [None]:
#@title Figure 1: PCA concept
display(SVG(filename='/content/GaussianScatterPCA.svg'))

Image: https://en.wikipedia.org/wiki/Principal_component_analysis#/media/File:GaussianScatterPCA.svg

PCA is useful for data analysis to determine which aspects of data carry the most information. In the example below we can see how PCA enables dimensionality reduction. As we go from three dimensions to two dimensions, we omit PC3 (not shown in the image), which is a vector orthogonal to both PC1 and PC2. As we can see, removing PC3 does not significantly impact the shape of the data cloud because PC3 represents the direction in which the data varies the least. In proper terms, we say that PC3 has the lowest explained variance.
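
The same reduction can be sketched numerically. The example below projects synthetic three-dimensional data (assumed, with very little spread in one direction) onto its first two principal components and measures how much variance is lost.

```python
import numpy as np

# Sketch: project 3D data onto its first two principal components.
rng = np.random.default_rng(1)
# Synthetic data with large spread in two directions and little in the third (assumed)
data = rng.normal(size=(300, 3)) * np.array([5.0, 2.0, 0.2])

centered = data - data.mean(axis=0)
# Rows of vt are the principal directions, ordered by decreasing singular value
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)

scores_2d = centered @ vt[:2].T        # coordinates in the PC1-PC2 plane
reconstructed = scores_2d @ vt[:2]     # back-projection into 3D
lost_fraction = np.sum((centered - reconstructed) ** 2) / np.sum(centered ** 2)

print("Explained variance ratio:", np.round(explained_ratio, 4))
print("Fraction of variance lost by dropping PC3:", round(float(lost_fraction), 4))
```

The fraction of variance lost equals the explained variance ratio of the discarded component, which is why dropping the weakest component changes the data cloud so little.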

In [None]:
#@title Figure 2: Dimensionality reduction
display(Image(filename='/content/PCA3to2.jpg'))

Image: https://medium.com/@TheDataGyan/dimensionality-reduction-with-pca-and-t-sne-in-r-2715683819

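Now, let's apply PCA to our materials testing data. We'll focus on the meaningful material properties and testing conditions. The table below shows how each material parameter contributes to each principal component. The columns are ordered by the explained variance of each principal component, with the percentage shown in brackets. Note that PCA only looks at the input features, not at the stress itself, so it tells us which parameters vary the most among the recorded columns. Properties that stay constant throughout a single test have zero variance and cannot contribute to any principal component. Take a look at the table and consider which material parameters account for most of the variation in the recorded data.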

In [None]:
#@title PCA analysis on material properties

# Select meaningful features for PCA
# We include material properties and test conditions that we know from
# materials science can influence stress-strain behavior
feature_cols = [
    'carbon_content',
    'grain_size_um',
    'temperature_C',
    'strain_rate',
    'specimen_thickness',
    'specimen_width',
    'surface_roughness'
]

X = df[feature_cols].copy()

# Standardize features (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Run PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)

# Explained variance
explained_var = pca.explained_variance_ratio_
print(f"Explained Variance Ratio (%): {np.round(explained_var * 100, 1)}")
cumulative_var = explained_var.cumsum()

# Loadings (feature contributions to each PC)
loadings = pd.DataFrame(
    pca.components_.T,
    columns=[f"PC{i+1} ({np.round(explained_var[i], 2) * 100:.0f}%)" for i in range(len(feature_cols))],
    index=feature_cols
)
print("\nPCA Loadings (feature contributions to each principal component):")
print(loadings.round(3))
print("\nInterpretation: Large absolute values indicate that a feature strongly")
print("contributes to that principal component. Notice which material properties")
print("dominate the first few components that explain most variance.")

Next, we can plot the cumulative explained variance with respect to the principal components. This tells us how many principal components we need to capture most of the variation in our material parameters. For materials testing data where some properties are held relatively constant during a single test, we might expect the first few components to capture most of the variance.
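
Reading the number of components off such a curve can also be done in code. The sketch below uses assumed example ratios (not the values from our data) and finds the smallest number of components whose cumulative explained variance reaches a chosen threshold.

```python
import numpy as np

# Sketch: smallest number of components whose cumulative explained variance
# reaches a chosen threshold. The ratios below are assumed example values.
explained_ratio = np.array([0.62, 0.21, 0.09, 0.05, 0.03])
threshold = 0.95

cumulative = np.cumsum(explained_ratio)
# searchsorted returns the first index where cumulative >= threshold
n_components = int(np.searchsorted(cumulative, threshold) + 1)

print("Cumulative variance:", np.round(cumulative, 2))
print("Components needed:", n_components)
```

With these example values, three components reach 92% and the fourth is needed to pass 95%.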

In [None]:
#@title Cumulative explained variance
plt.figure(figsize=(8, 4))
plt.plot(range(1, len(cumulative_var) + 1), cumulative_var, marker='o')
plt.xlabel("Number of Principal Components")
plt.ylabel("Cumulative Variance Explained")
plt.title("PCA: Cumulative Explained Variance")
plt.grid(True)
plt.axhline(y=0.95, color='r', linestyle='--', label='95% variance')
plt.legend()
plt.tight_layout()
plt.show()

Lastly, we can visualize the first three principal components in three-dimensional space. The data points are colored by their corresponding stress value from the data set. This gives us a sense of how the material parameters and stress are related in this lower-dimensional representation.

In [None]:
#@title 3D PCA projection

# Use same features as before
X = df[feature_cols].copy()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Run PCA (first 3 components)
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

explained_var = pca.explained_variance_ratio_
print(f"Explained Variance Ratio (PC1–3, %): {np.round(explained_var * 100, 1)}")

# Color by stress
color_values = df['stress']

# Create interactive 3D plot
fig = go.Figure(data=[
    go.Scatter3d(
        x=X_pca[:, 0],
        y=X_pca[:, 1],
        z=X_pca[:, 2],
        mode='markers',
        marker=dict(
            size=6,
            color=color_values,
            colorscale='Viridis',
            colorbar=dict(title='Stress (Pa)'),
            opacity=0.8
        ),
        text=[f"Strain: {s:.4f}<br>Stress: {st/1e6:.1f} MPa"
              for s, st in zip(df['strain'], df['stress'])]
    )
])

fig.update_layout(
    title="Interactive 3D PCA Plot (PC1, PC2, PC3)",
    scene=dict(
        xaxis_title='PC1',
        yaxis_title='PC2',
        zaxis_title='PC3'
    ),
    margin=dict(l=0, r=0, b=0, t=30)
)

fig.show()

### R-squared (R²)

Another useful method for making sense of our materials data is R-squared, denoted as R². It measures how much better a model predicts the stress than a baseline that always guesses the mean stress value regardless of the input. Applied to one material parameter at a time, it helps us determine which parameter best predicts the stress on its own.

The R-squared value is calculated as:

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$$

Where RSS is the residual sum of squares (how much our predictions miss the actual values) and TSS is the total sum of squares (how much the actual values vary from their mean). An R² value close to 1 means the model explains most of the variation in stress, while a value close to 0 means it does no better than guessing the mean. A model that fits worse than the mean gives a negative R².
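
The formula can be checked by hand on a tiny data set. In the sketch below, the data and the three sets of predictions are assumed example values; RSS is computed as the sum of squared differences between data and prediction, and TSS as the sum of squared differences between data and their mean.

```python
import numpy as np

# Sketch: R² computed directly from its definition, on assumed example data.
y_true = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
y_mean = y_true.mean()

def r_squared(y, y_pred):
    rss = np.sum((y - y_pred) ** 2)       # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - rss / tss

good_pred = np.array([1.2, 2.8, 5.1, 6.9, 9.0])   # close to the data
mean_pred = np.full_like(y_true, y_mean)          # always guess the mean
bad_pred = y_true[::-1]                           # opposite trend

print("good model:", r_squared(y_true, good_pred))
print("mean model:", r_squared(y_true, mean_pred))
print("bad model: ", r_squared(y_true, bad_pred))
```

The three cases land near 1, exactly at 0, and below 0, which covers the full range of behavior R² can show.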

The interactive demo below shows how R² changes as we adjust the slope of a fitted line. The residuals (red dashed lines) show the vertical distance between each data point and the model prediction.

In [None]:
#@title R-squared demo

# Generate example data
x = np.linspace(0, 10, 10)
true_slope = 2.0
true_intercept = 1.0

# Noisy observations
y_data_r2 = true_slope * x + true_intercept + np.random.normal(scale=2.0, size=len(x))
y_mean_r2 = np.mean(y_data_r2)

# Rotation center
x_mid = (x.min() + x.max()) / 2
y_mid = true_slope * x_mid + true_intercept

def plot_rotated_line(slope):
    clear_output()
    # Rotated model prediction
    y_model_r2 = slope * (x - x_mid) + y_mid

    # Compute R² score
    r2 = r2_score(y_data_r2, y_model_r2)

    # Plotting
    plt.figure(figsize=(8, 5))
    plt.scatter(x, y_data_r2, label='Data', color='orange', zorder=3)

    # Plot model line
    plt.plot(x, y_model_r2, label=f'Model (Slope = {slope:.2f})', color='blue')

    once = True
    # Vertical residuals
    for xi, yi_data, yi_model in zip(x, y_data_r2, y_model_r2):
        if once:
            plt.plot([xi, xi], [yi_data, yi_model], 'r--', linewidth=1, label="Residuals")
            once = False
        else:
            plt.plot([xi, xi], [yi_data, yi_model], 'r--', linewidth=1)

    # Mean line
    plt.axhline(y_mean_r2, color='green', linestyle=':', label=f'Mean = {y_mean_r2:.2f}')

    plt.xlabel("Input feature")
    plt.ylabel("Output")
    plt.title(f"R² Score: {r2:.3f}")
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Interactive slider
slope_slider = widgets.FloatSlider(
    value=true_slope,
    min=-5.0,
    max=5.0,
    step=0.1,
    description='Slope:',
    continuous_update=False
)

widgets.interact(plot_rotated_line, slope=slope_slider)

As we can see, R² is highest when the line fits the data best. But how do we find this line mathematically? This is where linear regression comes in.

## Linear Regression

### Hypothesis definition

Linear regression is one of the most fundamental tools in machine learning and engineering. It is used for finding coefficients that fit a model to data. The model is our hypothesis - it is an analytical formula that we think can describe the data. The method is called *linear* regression because it requires that the hypothesis coefficients appear linearly. Examples of valid hypotheses are:

$$ h_{\theta}(x) = \theta_0 + \theta_1 x $$
$$ h_{\theta}(x) = \theta_0 + \theta_1 x^2 + \theta_2 \sin{x} $$

whereas invalid hypothesis examples would be:

$$ h_{\theta}(x) = x^{\theta_0} $$
$$ h_{\theta}(x) = \theta_0^{x} + \sin{(\theta_1 x)}$$

Notice that in the valid examples, each parameter θ appears linearly (it only multiplies a function of $x$), while in the invalid examples a parameter appears in an exponent, as the base of a power, or inside a nonlinear function such as sine.
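
A short sketch shows why linearity in the parameters is what matters. The second valid hypothesis above is nonlinear in $x$, yet its coefficients can be found with a single linear least-squares solve. The coefficient values and data are assumed for illustration.

```python
import numpy as np

# Sketch: fit h(x) = theta0 + theta1*x^2 + theta2*sin(x) with ordinary least squares.
# The model is nonlinear in x but linear in theta, so one linear solve suffices.
x = np.linspace(0.0, 5.0, 50)
theta_true = np.array([1.0, 0.5, -2.0])     # assumed coefficients

# Design matrix: one column per basis function (1, x^2, sin x)
A = np.column_stack([np.ones_like(x), x**2, np.sin(x)])
y = A @ theta_true                          # noise-free synthetic data

theta_fit, *_ = np.linalg.lstsq(A, y, rcond=None)
print("Recovered coefficients:", np.round(theta_fit, 6))
```

Any hypothesis that can be written as a design matrix times a parameter vector can be fitted this way; the invalid examples cannot be written in that form.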

For now, let's define our hypothesis as a simple linear model:

$$h_{\theta}(x) = \theta_0 + \theta_1 x$$

We will fit this to the stress and strain data. Let's first visualize the stress-strain relationship we are trying to model.

In [None]:
#@title Stress-strain data visualization

plt.figure(figsize=(8, 5))
plt.scatter(df['strain'], df['stress'] / 1e6, color='orange', alpha=0.7, edgecolors='k', s=30,
            label='Measured points')
plt.plot(fit_curve['strain_grid'], fit_curve['sigma_eng_grid'] / 1e6,
         'b-', linewidth=2, alpha=0.7, label='True behavior')
plt.xlabel("Strain")
plt.ylabel("Stress (MPa)")
plt.title("Stress vs. Strain for Low-Carbon Steel")
plt.legend()  # use the label attached to each artist so names cannot be swapped
plt.grid(True)
plt.tight_layout()
plt.show()

print("Notice the three distinct regions:")
print("1. Elastic region: Linear increase at low strain")
print("2. Plastic region: Nonlinear work hardening")
print("3. Necking region: Stress decreases as specimen localizes deformation")

### Cost function

To find the line of best fit, we need to define a metric that tells us what makes one line better or worse than another. Such a function is called the *cost function*. For linear regression, the cost function is typically defined as the mean squared error, here scaled by one half so that the factor of 2 cancels when we differentiate:

$$ J(\theta_0, \theta_1) = \frac{1}{2m} \sum^m_{i=1}(h_{\theta}(x^{(i)}) - y^{(i)})^2 $$

Where $y^{(i)}$ is the i-th stress value of $m$ points in our data set. By minimizing $J(\theta_0, \theta_1)$ we essentially minimize the average squared distance between what the model predicts, $h_{\theta}(x^{(i)})$, and what the actual measured value is, $y^{(i)}$.

The shape of the cost function depends on the data. For example, if our data set consists of only one point $(x, y) = (1,1)$, then the cost function would be:

\begin{align*}
J(\theta_0, \theta_1)
&= \frac{1}{2 \cdot 1} \left(\theta_0 + \theta_1 \cdot 1 - 1\right)^2 \\
&= \frac{1}{2} (\theta_0 + \theta_1 - 1)^2
\end{align*}

Notice that whenever $\theta_0 + \theta_1 = 1$, we have $J(\theta_0, \theta_1) = 0$. This means there is no single best line fitting this data set, which makes sense since infinitely many lines pass through a single point, and the cost function reflects this ambiguity.
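
The single-point case is easy to verify numerically. The sketch below implements the cost function as defined above and evaluates it for several parameter pairs; the second data point added at the end is an assumed example.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Half mean squared error J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * len(x))

# Single data point (x, y) = (1, 1), as in the text
x_single, y_single = [1.0], [1.0]
for theta0, theta1 in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.0, 0.0)]:
    print(theta0, theta1, cost(theta0, theta1, x_single, y_single))

# Adding a second point (assumed example) removes the ambiguity:
# only the line through both points has zero cost.
x_two, y_two = [1.0, 2.0], [1.0, 3.0]
print(cost(-1.0, 2.0, x_two, y_two), cost(0.0, 1.0, x_two, y_two))
```

Every pair with $\theta_0 + \theta_1 = 1$ gives zero cost for the single point, while with two points only one of those lines still does.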

### Gradient descent updates

Rather than solving for the minimum analytically, we can iteratively "move" the model coefficients toward values that reduce the cost. We do this by computing the gradient of the cost function, which points in the direction of steepest increase, and stepping in the opposite direction. The update step is:

$$
\begin{cases}
\theta_0 := \theta_0 - \alpha \frac{\partial J}{\partial \theta_0} \\
\theta_1 := \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}
\end{cases}
$$

Where the partial derivatives are (with $x_0^{(i)} = 1$ for the intercept and $x_1^{(i)} = x^{(i)}$ for the slope):

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$
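
A useful habit is to check such a derivative numerically. The sketch below compares the analytic gradient from the formula above with a central finite-difference estimate on assumed synthetic data; the two should agree to many digits.

```python
import numpy as np

# Sketch: compare the analytic gradient with a central finite-difference estimate.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 + 0.5 * x + 0.2 * rng.normal(size=x.size)   # assumed synthetic data

def cost(theta):
    return np.mean((theta[0] + theta[1] * x - y) ** 2) / 2

def analytic_gradient(theta):
    error = theta[0] + theta[1] * x - y
    # The intercept term multiplies 1, the slope term multiplies x
    return np.array([np.mean(error), np.mean(error * x)])

def numeric_gradient(theta, h=1e-6):
    grad = np.zeros(2)
    for j in range(2):
        step = np.zeros(2)
        step[j] = h
        grad[j] = (cost(theta + step) - cost(theta - step)) / (2 * h)
    return grad

theta = np.array([-0.9, -4.2])
print("analytic:", analytic_gradient(theta))
print("numeric: ", numeric_gradient(theta))
```

If the two results disagree, the derivative or its implementation contains a mistake, so this check is worth running before trusting any hand-written gradient.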

The hyperparameter $\alpha$ is called the *learning rate*, and it sets the size of each step. Repeating this update until the cost stops decreasing is called *gradient descent*. A suitable value of $\alpha$ results in quick convergence; if $\alpha$ is too small, convergence is slow, and if it is too large, the updates overshoot the minimum and can diverge. To better understand how gradient descent converges, explore the demo below. First set a learning rate value with the slider, then run the gradient descent demo cell. Experiment with what happens when $\alpha$ is too large or too small.
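
The effect of the learning rate can also be seen without any plotting. The following sketch runs a few gradient descent steps on assumed synthetic data for three values of $\alpha$ and prints the initial and final cost. For a quadratic cost like this one, the iteration is stable only while $\alpha$ stays below 2 divided by the largest curvature of the cost surface; for this particular data that limit works out to about 2.

```python
import numpy as np

# Sketch: gradient descent with three different learning rates.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 + 0.5 * x + 0.2 * rng.normal(size=x.size)   # assumed synthetic data

def cost(theta0, theta1):
    return np.mean((theta0 + theta1 * x - y) ** 2) / 2

def run_gradient_descent(alpha, n_steps=15, start=(-0.9, -4.2)):
    theta0, theta1 = start
    history = [cost(theta0, theta1)]
    for _ in range(n_steps):
        error = theta0 + theta1 * x - y
        grad0, grad1 = np.mean(error), np.mean(error * x)
        # Update both parameters simultaneously from the same gradient
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append(cost(theta0, theta1))
    return np.array(history)

for alpha in (0.1, 0.7, 2.5):
    history = run_gradient_descent(alpha)
    print(f"alpha={alpha}: cost {history[0]:.3f} -> {history[-1]:.3g}")
```

The smallest rate decreases the cost steadily but slowly, the middle one converges quickly, and the largest one makes the cost grow at every step.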

In [None]:
#@title Set the learning rate

slider_log = widgets.FloatLogSlider(
    value=0.7,
    base=10,
    min=-1,
    max=0.5,
    step=0.05,
    description='Learning rate'
)

display(slider_log)

In [None]:
#@title Gradient descent demo

# Generate simple data for demo
m = 20
theta0_true = 2
theta1_true = 0.5
x = np.linspace(-1, 1, m)
y = theta0_true + theta1_true * x + np.random.randn(m) * 0.2

def hypothesis(x, theta0, theta1):
    return theta0 + theta1 * x

def cost(theta0, theta1):
    return np.mean((hypothesis(x, theta0, theta1) - y) ** 2) / 2

# Cost surface
theta0_vals = np.linspace(-1, 4, 100)
theta1_vals = np.linspace(-5, 5, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
J_vals = np.vectorize(cost)(T0, T1)

# Gradient descent
N = 15
alpha = slider_log.value
theta0_initial = -0.9
theta1_initial = -4.2
theta_path = [np.array([theta0_initial, theta1_initial])]
J_path = [cost(theta0_initial, theta1_initial)]

for _ in range(N - 1):
    theta0, theta1 = theta_path[-1]
    pred = hypothesis(x, theta0, theta1)
    grad0 = np.mean(pred - y)
    grad1 = np.mean((pred - y) * x)
    new_theta = np.array([theta0 - alpha * grad0, theta1 - alpha * grad1])
    theta_path.append(new_theta)
    J_path.append(cost(*new_theta))

theta_path = np.array(theta_path)
J_path = np.array(J_path)

# Create animation frames
frames = []
x_line = np.linspace(-1, 1, 100)

for i in range(1, len(theta_path) + 1):
    t0, t1 = theta_path[i - 1]
    y_pred = hypothesis(x_line, t0, t1)

    frames.append(go.Frame(
        name=f"step{i}",
        data=[
            go.Surface(x=theta0_vals, y=theta1_vals, z=J_vals,
                       colorscale='Viridis', opacity=0.8, showscale=False),
            go.Scatter3d(
                x=theta_path[:i, 0],
                y=theta_path[:i, 1],
                z=J_path[:i],
                mode='lines+markers',
                line=dict(color='red', width=4),
                marker=dict(size=5, color='red')
            ),
            go.Scatter(
                x=x, y=y, mode='markers',
                marker=dict(color='black', symbol='x', size=8),
                xaxis='x2', yaxis='y2'
            ),
            go.Scatter(
                x=x_line, y=y_pred,
                mode='lines',
                line=dict(color='red', width=3),
                xaxis='x2', yaxis='y2'
            )
        ]
    ))

# Initial traces
t0, t1 = theta_path[0]
y_pred = hypothesis(x_line, t0, t1)

fig = make_subplots(rows=1, cols=2,
                    specs=[[{"type": "surface"}, {"type": "xy"}]],
                    column_widths=[0.6, 0.4],
                    subplot_titles=("Cost Function Surface", "Line Fit During Gradient Descent"))

fig.add_trace(go.Surface(
    x=theta0_vals, y=theta1_vals, z=J_vals,
    colorscale='Viridis', opacity=0.8, showscale=False
), row=1, col=1)

fig.add_trace(go.Scatter3d(
    x=[theta_path[0, 0]],
    y=[theta_path[0, 1]],
    z=[J_path[0]],
    mode='lines+markers',
    marker=dict(size=5, color='red'),
    line=dict(color='red', width=4)
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=x, y=y, mode='markers',
    marker=dict(color='black', symbol='x', size=8),
    xaxis='x2', yaxis='y2'
), row=1, col=2)

fig.add_trace(go.Scatter(
    x=x_line, y=hypothesis(x_line, *theta_path[0]),
    mode='lines',
    line=dict(color='red', width=3),
    xaxis='x2', yaxis='y2'
), row=1, col=2)

# Animation buttons
buttons = [
    dict(label="▶️ Play",
         method="animate",
         args=[None, {
             "frame": {"duration": 1000, "redraw": True},
             "fromcurrent": True,
             "transition": {"duration": 200}
         }]
    ),
    dict(label="⏸️ Pause",
         method="animate",
         args=[[None], {
             "mode": "immediate",
             "frame": {"duration": 0, "redraw": False},
             "transition": {"duration": 0}
         }]
    ),
    dict(label="🔄 Restart",
         method="animate",
         args=[["step1"], {
             "frame": {"duration": 0, "redraw": True},
             "mode": "immediate",
             "transition": {"duration": 0}
         }]
    )
]

fig.update_layout(
    title=dict(text="Gradient Descent: Surface + Line Fit", x=0.5, y=0.95),
    updatemenus=[dict(
        type="buttons",
        direction="left",
        showactive=False,
        buttons=buttons,
        x=0.5,
        y=1.02,
        xanchor="center",
        yanchor="bottom",
        pad={"r": 10, "t": 10, "b": 25}
    )],
    margin=dict(t=120),
    width=1000,
    height=600,
    scene=dict(
        xaxis_title='θ₀',
        yaxis_title='θ₁',
        zaxis_title='Cost J(θ₀, θ₁)'
    ),
    xaxis2=dict(title="x", domain=[0.6, 1.0]),
    yaxis2=dict(title="y", domain=[0.0, 1.0])
)

fig.frames = frames
fig.show()

print("\nObserve how the red path descends the cost surface toward the minimum.")
print("The right plot shows how the fitted line improves at each iteration.")

## Simple stress-strain linear model

Using linear regression, we can attempt to model the stress-strain relationship with a simple linear hypothesis. However, as we will see, a purely linear model has limitations for capturing the full complexity of material behavior, particularly the plastic deformation and necking regions.
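
To illustrate this limitation in isolation, the sketch below fits a straight line and a cubic polynomial to a synthetic curve that rises, saturates, and then falls. The curve shape is assumed for illustration and is not the notebook's data; both fits are still linear regression, since a polynomial is linear in its coefficients.

```python
import numpy as np

# Sketch: straight-line versus cubic least-squares fit on a synthetic curve
# that rises, saturates, and falls (assumed shape, not the notebook's data).
strain = np.linspace(0.0, 0.25, 60)
stress_MPa = 600.0 * (1.0 - np.exp(-40.0 * strain)) - 4000.0 * strain**2

def r_squared(y, y_pred):
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

r2_by_degree = {}
for degree in (1, 3):
    coeffs = np.polyfit(strain, stress_MPa, deg=degree)   # highest power first
    prediction = np.polyval(coeffs, strain)
    r2_by_degree[degree] = r_squared(stress_MPa, prediction)
    print(f"degree {degree}: R² = {r2_by_degree[degree]:.3f}")
```

Adding polynomial terms can never lower the R² on the training data, so a higher score alone does not prove the richer model is physically meaningful.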

In [None]:
#@title Stress-strain simple linear model

# Prepare data
X = df['strain'].values.reshape(-1, 1)
y = df['stress'].values

# Fit linear regression
model = LinearRegression()
model.fit(X, y)

# Generate points for the line of best fit
x_fit = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_fit = model.predict(x_fit)

# Calculate R²
from sklearn.metrics import r2_score
r2 = r2_score(y, model.predict(X))

# Plot with line of best fit
plt.figure(figsize=(8, 5))
plt.scatter(df['strain'], df['stress'] / 1e6, color='orange', alpha=0.7, edgecolors='k', label='Data')
plt.plot(x_fit, y_fit / 1e6, 'r-', linewidth=2,
         label=f'Linear fit: σ = ({model.coef_[0]/1e6:.0f}·ε + {model.intercept_/1e6:.1f}) MPa (R²={r2:.3f})')
plt.plot(fit_curve['strain_grid'], fit_curve['sigma_eng_grid'] / 1e6,
         'b--', linewidth=1.5, alpha=0.5, label='True behavior')

plt.xlabel("Strain")
plt.ylabel("Stress (MPa)")
plt.title("Stress vs. Strain with Linear Fit")
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()

print(f"\nSlope of the linear fit over the full strain range: {model.coef_[0]/1e9:.2f} GPa")
print("True Young's modulus (slope of the elastic region only): 210 GPa")
print("\nA single straight line cannot represent the elastic, plastic, and necking")
print("regions at once, so its slope is not an estimate of Young's modulus. This shows")
print("the importance of choosing appropriate models based on physical understanding of the system.")

## R-squared (R²) — Feature importance analysis

Now that we understand linear regression and R², let's apply this tool to learn which material parameters most strongly influence stress in our data set. Recall that R² tells us how well a given material parameter can predict the stress value. Higher R² values indicate stronger predictive power, with a maximum of R² = 1 representing perfect prediction.

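From materials science, we expect certain parameters to matter more than others. For example, carbon content significantly affects strength because carbon atoms impede dislocation motion in the steel crystal structure. The Hall-Petch relationship tells us that strength should be proportional to the inverse square root of grain size, so we will test that engineered feature as well. Keep in mind that this data comes from a single specimen: any property that was held constant during the test cannot explain variation in stress and will score an R² of essentially zero, however important it is physically.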
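
The value of an engineered feature is easiest to see when the property actually varies. The sketch below builds synthetic Hall-Petch data for several grain sizes, using constants that mirror those in the data generator above, and compares the R² of a straight-line fit against the raw grain size with one against its inverse square root. The grain sizes are assumed for illustration.

```python
import numpy as np

# Sketch: Hall-Petch data is linear in 1/sqrt(d), not in d itself.
# sigma_0 and k are illustrative constants; the grain sizes are assumed.
sigma_0_MPa = 260.0                    # base strength (MPa)
k_MPa_sqrt_um = 220.0                  # Hall-Petch coefficient (MPa * um^0.5)
grain_um = np.array([5.0, 10.0, 20.0, 40.0, 80.0, 160.0])
yield_MPa = sigma_0_MPa + k_MPa_sqrt_um / np.sqrt(grain_um)

def r_squared_of_line(feature, target):
    slope, intercept = np.polyfit(feature, target, deg=1)
    residuals = target - (slope * feature + intercept)
    return 1.0 - np.sum(residuals**2) / np.sum((target - target.mean())**2)

r2_raw = r_squared_of_line(grain_um, yield_MPa)
r2_engineered = r_squared_of_line(1.0 / np.sqrt(grain_um), yield_MPa)
print(f"R² using d:         {r2_raw:.3f}")
print(f"R² using 1/sqrt(d): {r2_engineered:.3f}")
```

The engineered feature gives a perfect linear fit because it matches the physical law, while the raw grain size does not.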

In [None]:
#@title Create engineered features
# Add physically meaningful features based on materials science

# Material property features
df['grain_size_inv_sqrt'] = 1.0 / np.sqrt(df['grain_size_um'])
df['cross_section_area'] = df['specimen_thickness'] * df['specimen_width']
df['temp_deviation'] = df['temperature_C'] - 23.0
df['log_strain_rate'] = np.log10(df['strain_rate'])

# Polynomial strain features for capturing nonlinear behavior
# These are physically meaningful because stress-strain relationships
# exhibit nonlinear work hardening and necking behavior
df['strain_squared'] = df['strain'] ** 2
df['strain_cubed'] = df['strain'] ** 3
df['strain_fourth'] = df['strain'] ** 4

print("Created physically meaningful features:")
print("\nMaterial property features:")
print("- grain_size_inv_sqrt: Hall-Petch relationship (strength ∝ 1/√d)")
print("- cross_section_area: Geometric property")
print("- temp_deviation: Temperature effect relative to room temp")
print("- log_strain_rate: Strain rate sensitivity (logarithmic scale)")
print("\nPolynomial strain features (for capturing nonlinear behavior):")
print("- strain_squared, strain_cubed, strain_fourth")
print("  These capture the nonlinear work-hardening and necking behavior")
print("  that occurs during plastic deformation.")

In [None]:
#@title R-squared scores for material parameters

# Select meaningful features to test
features_to_test = [
    'strain',
    'carbon_content',
    'grain_size_um',
    'grain_size_inv_sqrt',
    'temperature_C',
    'temp_deviation',
    'strain_rate',
    'log_strain_rate',
    'surface_roughness',
    'cross_section_area'
]

y = df["stress"]
r2_scores = {}

for feature in features_to_test:
    if feature in df.columns:
        x_feat = df[[feature]]
        model = LinearRegression()
        model.fit(x_feat, y)
        y_pred = model.predict(x_feat)
        r2 = r2_score(y, y_pred)
        r2_scores[feature] = r2

# Sort and print
r2_sorted = dict(sorted(r2_scores.items(), key=lambda item: item[1], reverse=True))

print("R² score per feature (univariate linear regression):\n")
for feat, r2 in r2_sorted.items():
    print(f"{feat:<25}: {r2:.6f}")

print("\nPhysical interpretation:")
print("- Strain has highest R² because stress fundamentally depends on strain")
print("- Material properties show lower R² because they are constant for this")
print("  single specimen test (no variation across measurements)")
print("- In a multi-specimen study, carbon_content and grain_size would show")
print("  much stronger correlations with yield strength and UTS")

In [None]:
#@title R-squared bar chart

plt.figure(figsize=(10, 5))
plt.bar(r2_sorted.keys(), r2_sorted.values(), color='steelblue')
plt.xticks(rotation=45, ha='right')
plt.ylabel("R² Score")
plt.title("Univariate R² Scores per Feature")
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

## Combining machine learning with materials science

Throughout this notebook, we have explored how machine learning tools like PCA and linear regression can help analyze materials data. However, the most powerful approach combines these mathematical tools with domain knowledge from materials science.

Key lessons:

First, feature selection should be guided by physical principles. We know from metallurgy that carbon content, grain size, temperature, and strain rate all influence mechanical properties through well-understood mechanisms. Features like the Hall-Petch term (strength increasing with the inverse square root of grain size) are not arbitrary: they emerge from the physics of dislocation motion in crystals.

Second, model choice matters. A simple linear model can capture the elastic region of stress-strain behavior, but fails to represent plastic deformation and necking. However, by using polynomial features of strain (strain², strain³, strain⁴), we can capture the nonlinear work-hardening and necking behavior with linear regression. This approach is called polynomial regression, which is still linear regression because the coefficients appear linearly, even though the features themselves are nonlinear functions of strain.

Third, machine learning helps us discover patterns in complex data, but domain knowledge helps us interpret whether those patterns are meaningful or spurious. The polynomial strain features are physically meaningful because they mathematically represent the nonlinear constitutive behavior of materials under plastic deformation.
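
The second lesson can be made concrete with a small sketch. It fits a toy curved response with ordinary least squares on the columns $[1, x, x^2]$, showing that the problem stays linear in the coefficients. The sketch uses `np.linalg.lstsq` in place of scikit-learn's `LinearRegression` to stay self-contained; the helper names and the toy curve are assumptions made for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares fit of y on the columns [1, x, x^2, ..., x^degree]."""
    A = np.vander(x, degree + 1, increasing=True)   # design matrix of powers of x
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # linear solve for the coefficients
    return coef, A @ coef

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Toy curved response (illustrative only, not the notebook's stress-strain data)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x - 5.0 * x ** 2

coef_lin, pred_lin = fit_polynomial(x, y, 1)
coef_quad, pred_quad = fit_polynomial(x, y, 2)

print("Quadratic coefficients:", np.round(coef_quad, 6))
print(f"R² linear: {r_squared(y, pred_lin):.4f}, R² quadratic: {r_squared(y, pred_quad):.4f}")
```

Only the features are nonlinear in `x`; the solver never sees anything but a linear system, which is why the same linear regression machinery applies.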

In the interactive demo below, you can select different material parameters and see how well linear regression captures their relationship to stress. Try selecting just strain first to see the linear fit, then add strain_squared and strain_cubed to capture the nonlinear behavior. Pay attention to how the R² value improves dramatically when you include polynomial terms.

In [None]:
#@title Interactive feature selection and modeling (feature checkboxes)

# Available features for selection. The polynomial strain features created
# earlier are included so that the tip printed after each run can be followed.
features = [
    'strain',
    'strain_squared',
    'strain_cubed',
    'strain_fourth',
    'carbon_content',
    'grain_size_um',
    'grain_size_inv_sqrt',
    'temperature_C',
    'temp_deviation',
    'strain_rate',
    'log_strain_rate',
    'surface_roughness'
]

y = df['stress']
strain = df['strain']

checkboxes = [widgets.Checkbox(value=(feat == 'strain'), description=feat) for feat in features]
row1 = widgets.HBox(checkboxes[:6])
row2 = widgets.HBox(checkboxes[6:])
button = widgets.Button(description="Train & Plot")
output = widgets.Output()

def on_button_clicked(b):
    with output:
        clear_output()
        selected_features = [cb.description for cb in checkboxes if cb.value]
        if not selected_features:
            print("Please select at least one feature.")
            return

        X = df[selected_features]
        model = LinearRegression()
        model.fit(X, y)
        y_pred = model.predict(X)

        r2 = r2_score(y, y_pred)
        print(f"Selected features: {selected_features}")
        print(f"R² score: {r2:.6f}")
        print(f"\nCoefficients:")
        for f, c in zip(selected_features, model.coef_):
            print(f"  {f}: {c:.4e}")
        print(f"  Intercept: {model.intercept_:.4e}")

        # Plot modeled vs. true stress-strain relationship
        plt.figure(figsize=(8, 5))
        plt.scatter(strain, y / 1e6, alpha=0.4, label='Measured stress', color='orange', s=20)
        plt.scatter(strain, y_pred / 1e6, label='Predicted stress', color='blue', s=25)
        plt.plot(fit_curve['strain_grid'], fit_curve['sigma_eng_grid'] / 1e6,
                 'k--', linewidth=1.5, alpha=0.5, label='True behavior')
        plt.xlabel("Strain")
        plt.ylabel("Stress (MPa)")
        plt.title(f"Model Performance (R² = {r2:.4f})")
        plt.legend()
        plt.grid(True)
        plt.tight_layout()
        plt.show()

        print("\nTip: To reconstruct the nonlinear stress-strain curve:")
        print("1. Start with just 'strain' to see the linear fit (poor R²)")
        print("2. Add 'strain_squared' and 'strain_cubed' to capture work hardening")
        print("3. Optionally add 'strain_fourth' for even better fit in necking region")
        print("\nMaterial properties show weak correlation because they're constant")
        print("in this single-specimen test. In multi-specimen studies, they would")
        print("strongly correlate with yield strength and ultimate tensile strength.")

button.on_click(on_button_clicked)
display(widgets.VBox([row1, row2, button, output]))

In [None]:
#@title Interactive polynomial degree selection and modeling

y = df['stress']
strain = df['strain']

# Create widget for polynomial degree selection
degree_slider = widgets.IntSlider(
    value=1,
    min=1,
    max=50,
    step=1,
    description='Polynomial Degree:',
    style={'description_width': 'initial'}
)
button = widgets.Button(description="Train & Plot")
output = widgets.Output()

def on_button_clicked(b):
    with output:
        clear_output()
        degree = degree_slider.value

        # Generate polynomial features up to specified degree
        X_data = {'strain': strain}
        for d in range(2, degree + 1):
            X_data[f'strain^{d}'] = strain ** d

        X = np.column_stack(list(X_data.values()))
        feature_names = list(X_data.keys())

        # Fit model
        model = LinearRegression()
        model.fit(X, y)
        y_pred = model.predict(X)

        r2 = r2_score(y, y_pred)
        print("="*60)
        print(f"Polynomial Degree: {degree}")
        print(f"Features: {feature_names}")
        print(f"R² score: {r2:.6f}")
        print(f"\nCoefficients:")
        for f, c in zip(feature_names, model.coef_):
            if f == 'strain':
                print(f"  {f:15s}: {c/1e9:.4f} GPa")
            else:
                print(f"  {f:15s}: {c:.4e}")
        print(f"  {'Intercept':15s}: {model.intercept_/1e6:.4f} MPa")
        print("="*60)

        # Plot modeled vs. true stress-strain relationship
        plt.figure(figsize=(10, 6))
        plt.scatter(strain, y / 1e6, alpha=0.5, label='Measured stress',
                   color='orange', s=30, edgecolors='k', linewidth=0.5)

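        # Sort by strain so the prediction is drawn as one smooth line.
        # Work on plain NumPy arrays to avoid label-based indexing of the Series.
        sort_idx = np.argsort(strain.to_numpy())
        plt.plot(strain.to_numpy()[sort_idx], y_pred[sort_idx] / 1e6,
                'r-', linewidth=2.5, label=f'Polynomial fit (degree {degree})', alpha=0.8)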

        plt.plot(fit_curve['strain_grid'], fit_curve['sigma_eng_grid'] / 1e6,
                 'b--', linewidth=1.5, alpha=0.6, label='True behavior')

        plt.xlabel("Strain", fontsize=12)
        plt.ylabel("Stress (MPa)", fontsize=12)
        plt.title(f"Model Performance (R² = {r2:.4f})", fontsize=14)
        plt.legend(fontsize=10)
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()

button.on_click(on_button_clicked)
display(widgets.VBox([degree_slider, button, output]))

# Logistic Regression: Predicting Material Failure

After analyzing the stress-strain behavior, you want to predict when the material will fail under complex loading conditions. In real engineering applications, materials experience multi-axial stress states - stress acting in multiple directions simultaneously. Your goal is to develop a model that can predict whether a material will yield (permanently deform) under these conditions.

## Understanding Material Failure

When engineers design structures, they need to know when materials will fail. For ductile materials like steel, failure typically means yielding - the onset of permanent plastic deformation. Under complex loading with multiple stress components, we need a way to combine these stresses into a single value that we can compare to the material's yield strength.

The most widely used criterion for ductile materials is the **von Mises yield criterion**, which states that yielding occurs when the distortion energy in the material reaches a critical value. This reflects the observation that ductile metals yield through shear (shape-changing) deformation, while hydrostatic (uniform) pressure has essentially no effect on yielding.

## Principal Stresses and the von Mises Criterion

In any stressed material, we can identify three mutually perpendicular directions, called principal directions, along which the stresses are purely normal (no shear). The corresponding principal stresses are denoted $\sigma_1$, $\sigma_2$, and $\sigma_3$. For a plane-stress state (such as a thin plate loaded in its plane), $\sigma_3 = 0$.

For plane stress, the von Mises equivalent stress combines the two remaining principal stresses into a single value:

$$\sigma_{VM} = \sqrt{\sigma_1^2 - \sigma_1\sigma_2 + \sigma_2^2}$$

This equation has a direct physical meaning: $\sigma_{VM}$ is the uniaxial stress that would produce the same distortion energy as the actual multi-axial stress state. When $\sigma_{VM}$ exceeds the material's yield strength $\sigma_y$ ($\sigma_{VM} > \sigma_y$), the material yields.
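
The formula above can be checked numerically with a short sketch. It implements the plane-stress expression alongside the general three-dimensional form in principal stresses, which is not shown in this notebook and is included here only for comparison. The function names and the example stress values are illustrative assumptions.

```python
import numpy as np

def von_mises_plane_stress(s1, s2):
    """Plane-stress form used in this notebook (sigma_3 = 0)."""
    return np.sqrt(s1**2 - s1 * s2 + s2**2)

def von_mises_principal(s1, s2, s3=0.0):
    """General form in terms of the three principal stresses."""
    return np.sqrt(0.5 * ((s1 - s2)**2 + (s2 - s3)**2 + (s3 - s1)**2))

# Example stress states in MPa (illustrative values)
uniaxial = von_mises_plane_stress(300.0, 0.0)         # equals the applied stress
equal_biaxial = von_mises_plane_stress(300.0, 300.0)  # also equals the applied stress
pure_shear = von_mises_plane_stress(200.0, -200.0)    # larger than either component
hydrostatic = von_mises_principal(300.0, 300.0, 300.0)  # no distortion at all

print(uniaxial, equal_biaxial, pure_shear, hydrostatic)
```

The hydrostatic case gives zero equivalent stress, which matches the statement that uniform pressure does not drive yielding, while pure shear gives a value larger than either principal stress.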

## Data: Multi-axial Loading Tests

You've conducted experiments on steel specimens under various combinations of biaxial loading (stress in two perpendicular directions). Each test applied different principal stresses and recorded whether the specimen yielded.

In [None]:
#@title Generate physically realistic failure data

def generate_realistic_failure_data(N=300, seed=42):
    """
    Generate realistic 2D stress state failure data using the von Mises criterion.

    This function creates synthetic experimental data for biaxial loading tests.
    The failure prediction is based on von Mises equivalent stress compared to
    material yield strength, with realistic experimental scatter included.

    Parameters:
    -----------
    N : int
        Number of test samples to generate
    seed : int
        Random seed for reproducibility

    Returns:
    --------
    DataFrame with columns:
        - sigma1_MPa: Principal stress in first direction (MPa)
        - sigma2_MPa: Principal stress in second direction (MPa)
        - sigma_vm_MPa: von Mises equivalent stress (MPa)
        - sigma_y_MPa: Yield strength for each specimen (MPa)
        - failure: Binary indicator (1=yielded, 0=elastic)
    """
    rng = np.random.default_rng(seed)

    # Principal stresses in Pascals (converted to MPa for output)
    # Generate stress states covering a range from zero to well beyond yield
    sigma1 = rng.uniform(0, 450e6, size=N)
    sigma2 = rng.uniform(0, 450e6, size=N)

    # Yield strength with realistic specimen-to-specimen scatter
    # Mean: 350 MPa (typical for mild steel), StdDev: 30 MPa
    sigma_y = rng.normal(350e6, 30e6, size=N)

    # Calculate von Mises equivalent stress
    # This formula represents the distortion energy in the material
    sigma_vm = np.sqrt(sigma1**2 - sigma1 * sigma2 + sigma2**2)

    # Probabilistic failure based on how close σ_vm is to σ_y
    # Using sigmoid function to model gradual transition near yield point
    # The 0.10 factor controls the "fuzziness" of the boundary (10% of yield strength)
    failure_probability = 1.0 / (1.0 + np.exp(-(sigma_vm - sigma_y) / (0.10 * sigma_y)))
    failure = rng.binomial(1, failure_probability)

    # Return data in MPa for easier interpretation
    return pd.DataFrame({
        'sigma1_MPa': sigma1 / 1e6,
        'sigma2_MPa': sigma2 / 1e6,
        'sigma_vm_MPa': sigma_vm / 1e6,
        'sigma_y_MPa': sigma_y / 1e6,
        'failure': failure
    })

# Generate the failure data
df_failure = generate_realistic_failure_data(N=300, seed=42)

# Extract key variables for later use
sigma1 = df_failure['sigma1_MPa'].values
sigma2 = df_failure['sigma2_MPa'].values
failure = df_failure['failure'].values
yield_strength = 350  # Mean yield strength in MPa

# Save the data
df_failure.to_csv('biaxial_failure_data.csv', index=False)

print(f"Generated {len(df_failure)} biaxial loading tests")
print(f"Mean yield strength: {df_failure['sigma_y_MPa'].mean():.1f} MPa")
print(f"Yield strength std dev: {df_failure['sigma_y_MPa'].std():.1f} MPa")
print(f"Number of failures: {failure.sum()}/{len(df_failure)}")
print(f"Failure rate: {failure.sum()/len(df_failure)*100:.1f}%")

The plot below shows your experimental data. Blue points remained elastic (no permanent deformation, class=0), while red points yielded (class=1). The dashed line shows the theoretical von Mises yield surface for the mean yield strength of 350 MPa, that is, the boundary where yielding should occur according to the theory.

In [None]:
#@title Visualize the failure data with theoretical yield surface
fig, ax = plt.subplots(figsize=(8, 8))

# Plot experimental points
colors = ['blue' if f == 0 else 'red' for f in failure]
scatter = ax.scatter(sigma1, sigma2, c=colors, alpha=0.6, s=40,
                    edgecolors='black', linewidth=0.5)

# Plot the theoretical von Mises yield surface. The full ellipse is drawn;
# the axis limits set below restrict the view to the first quadrant.
# For a 2D stress state, the von Mises criterion forms an ellipse
t = np.linspace(0, 2*np.pi, 200)
vm_sigma1 = yield_strength * (np.cos(t) + np.sin(t)/np.sqrt(3))
vm_sigma2 = yield_strength * (2*np.sin(t)/np.sqrt(3))

ax.plot(vm_sigma1, vm_sigma2, 'k--', linewidth=2,
        label=f'von Mises surface (σ_y = {yield_strength} MPa)')

# Add labels and formatting
ax.set_xlabel('Principal Stress σ₁ (MPa)', fontsize=12)
ax.set_ylabel('Principal Stress σ₂ (MPa)', fontsize=12)
ax.set_title('Material Failure Under Biaxial Loading\n(von Mises Criterion)', fontsize=14)
ax.grid(True, alpha=0.3)

# Focus on first quadrant where data is located
ax.set_xlim(0, 500)
ax.set_ylim(0, 500)
ax.set_aspect('equal')

# Add legend for points
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor='blue', alpha=0.6, label='Elastic (no failure)'),
                  Patch(facecolor='red', alpha=0.6, label='Yielded (failure)'),
                  plt.Line2D([0], [0], color='k', linestyle='--',
                            label=f'Theoretical yield surface')]
ax.legend(handles=legend_elements, loc='upper right')

plt.tight_layout()
plt.show()

print("\nNote: Some scatter around the theoretical boundary is expected due to:")
print("- Specimen-to-specimen variation in yield strength")
print("- Experimental measurement uncertainty")
print("- Material inhomogeneities")
print("\nWe focus on the first quadrant (positive stresses in both directions)")
print("which represents biaxial tension - the most common loading scenario.")

## Understanding the von Mises Criterion as the boundary condition

A mathematically useful reformulation of the von Mises Criterion is:

$$\sqrt{\sigma_1^2 - \sigma_1\sigma_2 + \sigma_2^2} - \sigma_{y} = 0$$

The left-hand side acts as a decision function: it is negative for stress states inside the yield surface, zero on the surface, and positive outside it. This will be useful when solving the logistic regression problem.
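
The sign convention can be verified with a minimal sketch. It evaluates the decision function at one point inside and one outside the surface, and confirms that the ellipse parametrization used for plotting in this notebook lies on the boundary. The name `decision_value` and the two test points are assumptions chosen for illustration; the yield strength of 350 MPa matches the mean value used in the data generation.

```python
import numpy as np

def decision_value(s1, s2, sigma_y):
    """von Mises equivalent stress minus yield strength (all in MPa)."""
    return np.sqrt(s1**2 - s1 * s2 + s2**2) - sigma_y

sigma_y = 350.0  # mean yield strength in MPa

# Points on the yield surface, using the same parametrization as the plots
t = np.linspace(0.0, 2.0 * np.pi, 200)
s1_curve = sigma_y * (np.cos(t) + np.sin(t) / np.sqrt(3))
s2_curve = sigma_y * (2.0 * np.sin(t) / np.sqrt(3))
on_surface = decision_value(s1_curve, s2_curve, sigma_y)

inside = decision_value(100.0, 100.0, sigma_y)   # elastic stress state
outside = decision_value(400.0, 0.0, sigma_y)    # stress state beyond yield

print(f"max |g| on the ellipse: {np.max(np.abs(on_surface)):.2e}")
print(f"inside: {inside:.1f} MPa, outside: {outside:.1f} MPa")
```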

Use this interactive tool to explore how different stress states relate to the von Mises equivalent stress. Move the point to see whether it would cause yielding.

In [None]:
#@title Interactive von Mises stress calculator with color intensity

def plot_stress_state(sigma1=100.0, sigma2=100.0):
    clear_output(wait=True)

    # Calculate von Mises stress
    sigma_vm = np.sqrt(sigma1**2 - sigma1*sigma2 + sigma2**2)

    # Calculate how far from yield surface (normalized)
    distance_ratio = sigma_vm / yield_strength

    # Calculate decision boundary value
    boundary_value = sigma_vm - yield_strength

    # Color intensity based on distance from yield surface
    # Similar to the simple decision boundary demo!
    if distance_ratio > 1:  # Yielded
        # Red intensity increases with distance beyond yield
        intensity = min((distance_ratio - 1) * 2, 1.0)  # Scale for visibility
        point_color = (1.0, 1.0 - intensity, 1.0 - intensity)
        status = "YIELDED (Failure)"
    else:  # Elastic
        # Blue intensity increases with safety (further from yield)
        intensity = 1.0 - distance_ratio
        point_color = (1.0 - intensity, 1.0 - intensity, 1.0)
        status = "ELASTIC (Safe)"

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

    # Left plot: Stress state in principal stress space
    # Plot theoretical yield surface
    t = np.linspace(0, 2*np.pi, 200)
    vm_sigma1 = yield_strength * (np.cos(t) + np.sin(t)/np.sqrt(3))
    vm_sigma2 = yield_strength * (2*np.sin(t)/np.sqrt(3))
    ax1.plot(vm_sigma1, vm_sigma2, 'k--', linewidth=2, alpha=0.5,
             label='von Mises yield surface')

    # Plot current point with color intensity
    ax1.scatter(sigma1, sigma2, s=200, c=[point_color], edgecolor='black',
               linewidth=2, zorder=5)

    # Add stress state vectors
    ax1.arrow(0, 0, sigma1, 0, head_width=10, head_length=15,
             fc='gray', ec='gray', alpha=0.5)
    ax1.arrow(0, 0, 0, sigma2, head_width=10, head_length=15,
             fc='gray', ec='gray', alpha=0.5)

    # Focus on first quadrant where data is located
    ax1.set_xlim(-20, 500)
    ax1.set_ylim(-20, 500)
    ax1.set_xlabel('σ₁ (MPa)', fontsize=12)
    ax1.set_ylabel('σ₂ (MPa)', fontsize=12)
    ax1.set_title('Principal Stress Space', fontsize=14)
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal')
    ax1.legend()

    # Right plot: Stress information
    ax2.axis('off')

    # Create background color for info box matching point intensity
    if distance_ratio > 1:
        bg_color = 'red'
    else:
        bg_color = 'blue'

    info_text = f"""
    Current Stress State:
    ────────────────────
    σ₁ = {sigma1:.1f} MPa
    σ₂ = {sigma2:.1f} MPa

    von Mises Equivalent Stress:
    σ_VM = √(σ₁² - σ₁σ₂ + σ₂²)
    σ_VM = {sigma_vm:.1f} MPa

    Yield Strength: σ_y = {yield_strength} MPa

    Decision Boundary Value:
    σ_VM - σ_y = {boundary_value:.1f} MPa
    {"(> 0: outside boundary)" if boundary_value > 0 else "(< 0: inside boundary)" if boundary_value < 0 else "(= 0: on boundary)"}

    Distance Ratio: {distance_ratio:.2f}
    Safety Factor: {yield_strength/sigma_vm if sigma_vm > 0 else np.inf:.2f}

    Status: {status}
    """

    # Move info box up to make room for status messages
    ax2.text(0.1, 0.65, info_text, fontsize=12, family='monospace',
            verticalalignment='center',
            bbox=dict(boxstyle='round', facecolor=bg_color, alpha=0.2))

    # Add status messages with more space at the bottom
    posx, posy = 0.35, 0.06
    if distance_ratio > 1:
        ax2.text(posx, posy+0.07, '⚠️ Material has yielded!', fontsize=16,
                color='red', weight='bold', ha='center')
        ax2.text(posx, posy, 'Color intensity shows severity\n(darker = further beyond yield)',
                fontsize=10, style='italic', ha='center', color='darkred')
    else:
        ax2.text(posx, posy+0.07, '✓ Material remains elastic', fontsize=16,
                color='green', weight='bold', ha='center')
        ax2.text(posx, posy, 'Color intensity shows safety margin\n(darker = further from yield)',
                fontsize=10, style='italic', ha='center', color='darkgreen')

    plt.tight_layout()
    plt.show()

# Create interactive sliders
sigma1_slider = widgets.FloatSlider(
    value=100.0, min=0.0, max=450.0, step=10.0,
    description='σ₁ (MPa):', continuous_update=False
)
sigma2_slider = widgets.FloatSlider(
    value=100.0, min=0.0, max=450.0, step=10.0,
    description='σ₂ (MPa):', continuous_update=False
)

widgets.interact(plot_stress_state, sigma1=sigma1_slider, sigma2=sigma2_slider)

## Problem definition

In reality we do not move the points relative to the decision boundary. The decision boundary is something we want to derive from the data! The points therefore stay fixed while the ellipse moves across the data to find the best fit.

In the demo below you can move the ellipse yourself and try to get the cost as low as possible. What this cost is will be explained later. Pay attention to the relation between the cost and how well the ellipse fits the data.

In [None]:
#@title Manual Optimization: Move and Rotate the Decision Boundary to Minimize Cost

def plot_manual_optimization(scale=1.0, shift_x=0.0, shift_y=0.0, rotation=0.0):
    clear_output(wait=True)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

    # Use the existing failure data
    X = df_failure[['sigma1_MPa', 'sigma2_MPa']].values
    y_true = df_failure['failure'].values

    # Calculate the "yield strength" for this ellipse position/size
    adjusted_yield = yield_strength * scale

    # Generate the base ellipse (decision boundary)
    t = np.linspace(0, 2*np.pi, 200)
    base_sigma1 = adjusted_yield * (np.cos(t) + np.sin(t)/np.sqrt(3))
    base_sigma2 = adjusted_yield * (2*np.sin(t)/np.sqrt(3))

    # Apply rotation (convert degrees to radians)
    theta = np.radians(rotation)
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)

    # Rotate the ellipse points
    ellipse_sigma1 = cos_theta * base_sigma1 - sin_theta * base_sigma2 + shift_x
    ellipse_sigma2 = sin_theta * base_sigma1 + cos_theta * base_sigma2 + shift_y

    # Calculate predictions based on adjusted and rotated boundary
    y_pred_proba = []
    for i in range(len(X)):
        # Translate point to ellipse-centered coordinates
        s1_translated = X[i, 0] - shift_x
        s2_translated = X[i, 1] - shift_y

        # Rotate point back to align with standard von Mises orientation
        s1_rotated = cos_theta * s1_translated + sin_theta * s2_translated
        s2_rotated = -sin_theta * s1_translated + cos_theta * s2_translated

        # Calculate von Mises stress in rotated frame
        sigma_vm = np.sqrt(s1_rotated**2 - s1_rotated*s2_rotated + s2_rotated**2)

        # Convert to probability using sigmoid-like function
        ratio = sigma_vm / adjusted_yield
        # Use a steep sigmoid to approximate hard classification
        prob = 1 / (1 + np.exp(-10*(ratio - 1)))
        y_pred_proba.append(prob)

    y_pred_proba = np.array(y_pred_proba)

    # Calculate logistic loss (cross-entropy)
    epsilon = 1e-7  # Small value to avoid log(0)
    cost = -np.mean(y_true * np.log(y_pred_proba + epsilon) +
                    (1 - y_true) * np.log(1 - y_pred_proba + epsilon))

    # Calculate accuracy for reference
    y_pred_binary = (y_pred_proba > 0.5).astype(int)
    accuracy = np.mean(y_pred_binary == y_true)

    # Count misclassifications
    false_positives = np.sum((y_pred_binary == 1) & (y_true == 0))
    false_negatives = np.sum((y_pred_binary == 0) & (y_true == 1))

    # Left plot: Data points and adjustable boundary
    colors = ['blue' if f == 0 else 'red' for f in y_true]
    ax1.scatter(X[:, 0], X[:, 1], c=colors, alpha=0.6, s=40,
                edgecolors='black', linewidth=0.5, label='Data points')

    # Plot user's adjusted boundary
    ax1.plot(ellipse_sigma1, ellipse_sigma2, 'orange', linewidth=3,
             label=f'Your boundary')

    # Add center point of user's ellipse
    ax1.plot(shift_x, shift_y, 'go', markersize=8, label='Ellipse center')

    ax1.set_xlabel('σ₁ (MPa)')
    ax1.set_ylabel('σ₂ (MPa)')
    ax1.set_title('Adjust the Decision Boundary')
    # Zoom in to focus on data (first quadrant) but show enough context
    # to see the ellipse structure
    ax1.set_xlim(-100, 550)
    ax1.set_ylim(-100, 550)
    ax1.set_aspect('equal')
    ax1.grid(True, alpha=0.3)
    ax1.legend(loc='upper left', fontsize=9)

    # Right plot: Cost visualization
    ax2.clear()

    # Create a bar chart showing cost components
    categories = ['Current\nCost', 'Best\nPossible']

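    # Reference cost: the theoretical von Mises boundary at the mean yield strength
    # (scale=1, no shift, no rotation). With noisy data this is a benchmark,
    # not a guaranteed minimum of the cost.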
    y_best_proba = []
    for i in range(len(X)):
        sigma_vm = np.sqrt(X[i,0]**2 - X[i,0]*X[i,1] + X[i,1]**2)
        ratio = sigma_vm / yield_strength
        prob = 1 / (1 + np.exp(-10*(ratio - 1)))
        y_best_proba.append(prob)
    y_best_proba = np.array(y_best_proba)
    best_cost = -np.mean(y_true * np.log(y_best_proba + epsilon) +
                        (1 - y_true) * np.log(1 - y_best_proba + epsilon))

    costs = [cost, best_cost]
    colors_bar = ['orange', 'green']
    bars = ax2.bar(categories, costs, color=colors_bar, alpha=0.7)

    # Add value labels on bars
    for bar, val in zip(bars, costs):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height,
                f'{val:.3f}', ha='center', va='bottom', fontsize=12, fontweight='bold')

    ax2.set_ylabel('Cross-Entropy Cost', fontsize=12)
    ax2.set_title(f'Cost Function (Lower is Better)', fontsize=14)
    ax2.set_ylim(0, max(cost * 1.2, 1.0))
    ax2.grid(True, axis='y', alpha=0.3)

    # Add metrics text
    metrics_text = f"""
    Accuracy: {accuracy:.1%}
    False Positives: {false_positives}
    False Negatives: {false_negatives}

    Parameters:
    Scale: {scale:.2f}
    Shift: ({shift_x:.0f}, {shift_y:.0f})
    Rotation: {rotation:.1f}°
    """
    ax2.text(0.98, 0.5, metrics_text, transform=ax2.transAxes,
             fontsize=10, family='monospace', ha='right',
             bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))

    # Add feedback
    cost_diff = cost - best_cost
    if cost_diff < 0.01:
        feedback = "Excellent! You've found the optimal boundary!"
        feedback_color = 'green'
    elif cost_diff < 0.05:
        feedback = "Very close! Fine-tune the parameters a bit more."
        feedback_color = 'darkgreen'
    elif cost_diff < 0.15:
        feedback = "Good progress! Keep adjusting to reduce the cost."
        feedback_color = 'orange'
    else:
        feedback = "Keep trying! Adjust scale, position, and rotation."
        feedback_color = 'red'

    ax2.text(0.5, -0.15, feedback, transform=ax2.transAxes,
             ha='center', fontsize=12, color=feedback_color, weight='bold')

    plt.tight_layout()
    plt.show()

    # Print detailed feedback
    print(f"Current Cost: {cost:.4f}")
    print(f"Optimal Cost: {best_cost:.4f}")
    print(f"Difference: {cost_diff:.4f}")
    print(f"Accuracy: {accuracy:.1%}")
    print("\nHint: The von Mises ellipse has a specific size, position, and orientation.")
    print("Watch how the cost changes as you adjust each parameter!")

# Create sliders
scale_slider = widgets.FloatSlider(
    value=1.0, min=0.5, max=1.5, step=0.02,
    description='Scale:', continuous_update=False,
    style={'description_width': 'initial'}
)

shift_x_slider = widgets.FloatSlider(
    value=-60.0, min=-100, max=100, step=5,
    description='Shift X (σ₁):', continuous_update=False,
    style={'description_width': 'initial'}
)

shift_y_slider = widgets.FloatSlider(
    value=-60.0, min=-100, max=100, step=5,
    description='Shift Y (σ₂):', continuous_update=False,
    style={'description_width': 'initial'}
)

rotation_slider = widgets.FloatSlider(
    value=90.0, min=-180, max=180, step=2,
    description='Rotation (°):', continuous_update=False,
    style={'description_width': 'initial'}
)

# Display instructions and sliders
print("=" * 60)
print("MANUAL OPTIMIZATION CHALLENGE - 4D Parameter Space")
print("=" * 60)
print("Your task: Find the optimal decision boundary by minimizing the cost!")
print("\nYou now have 4 parameters to optimize:")
print("• Scale: Changes the size of the ellipse")
print("• Shift X/Y: Moves the ellipse center in stress space")
print("• Rotation: Rotates the ellipse orientation")
print("\nThis mimics what gradient descent does automatically -")
print("searching through parameter space to minimize the cost function!")
print("=" * 60)

widgets.interact(plot_manual_optimization,
                scale=scale_slider,
                shift_x=shift_x_slider,
                shift_y=shift_y_slider,
                rotation=rotation_slider)

## Logistic Regression for Failure Prediction

Now we'll use logistic regression to learn the failure boundary from the experimental data. Unlike linear regression, which predicts continuous values, logistic regression predicts the probability of belonging to a class, which suits our binary classification problem (yielded or not yielded).

### The Physical Hypothesis

For the von Mises criterion, we know the theoretical relationship involves σ₁² - σ₁σ₂ + σ₂². Let's see if logistic regression can discover this relationship from the experimental data alone.

Our hypothesis for logistic regression is:
$$P(\text{failure}) = h_\theta(x^{(i)}) =  \frac{1}{1 + e^{-(\theta_0 + \theta_1\sigma_1 + \theta_2\sigma_2 + \theta_3\sigma_1^2 + \theta_4\sigma_1\sigma_2 + \theta_5\sigma_2^2)}}$$

This allows the model to learn polynomial relationships between the stresses. The function $1/(1 + e^{-z})$ that converts the polynomial $z$ into a probability is called the sigmoid function; it is the standard way logistic regression maps a decision value to a probability.
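
It is worth noting why this hypothesis can represent the von Mises boundary even though the criterion contains a square root: the condition $\sigma_{VM} = \sigma_y$ is equivalent to $\sigma_1^2 - \sigma_1\sigma_2 + \sigma_2^2 - \sigma_y^2 = 0$, which is linear in the quadratic features. The sketch below evaluates the hypothesis with a hand-picked parameter vector of that form. The values of `theta`, the steepness `c`, and the helper names are assumptions for illustration, not parameters learned from the data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_features(s1, s2):
    """Columns [1, s1, s2, s1^2, s1*s2, s2^2], matching theta_0 ... theta_5."""
    s1 = np.atleast_1d(np.asarray(s1, dtype=float))
    s2 = np.atleast_1d(np.asarray(s2, dtype=float))
    return np.column_stack([np.ones_like(s1), s1, s2, s1**2, s1 * s2, s2**2])

sigma_y = 350.0   # mean yield strength in MPa
c = 1e-4          # hand-picked steepness in 1/MPa^2 (assumption)

# theta reproduces c * (sigma_1^2 - sigma_1*sigma_2 + sigma_2^2 - sigma_y^2)
theta = c * np.array([-sigma_y**2, 0.0, 0.0, 1.0, -1.0, 1.0])

# Three stress states in MPa: on the surface, inside it, and outside it
s1_demo = [350.0, 100.0, 400.0]
s2_demo = [0.0, 100.0, 400.0]
p_failure = sigmoid(quadratic_features(s1_demo, s2_demo) @ theta)

print(np.round(p_failure, 4))
```

A point on the yield surface gets a probability of 0.5, points inside fall below it, and points outside rise above it. Logistic regression searches for a parameter vector with this behaviour, rather than being given one.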

In [None]:
#@title Sigmoid plot with classification visualization

# Define sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Generate input range for the curve
z = np.linspace(-10, 10, 300)
y = sigmoid(z)

# Generate sample points to show as colored dots
z_points = np.linspace(-8, 8, 17)  # Sample points along z-axis
y_points = sigmoid(z_points)

# Calculate colors for each point based on distance from z=0
colors = []
for z_val in z_points:
    sig_val = sigmoid(z_val)

    if abs(z_val) < 0.1:  # Near zero - white
        colors.append((1.0, 1.0, 1.0))
    elif z_val > 0:  # Positive z - red (failure)
        # Intensity increases with distance from zero
        intensity = min(abs(z_val) / 8, 1.0)
        colors.append((1.0, 1.0 - intensity, 1.0 - intensity))
    else:  # Negative z - blue (safe)
        # Intensity increases with distance from zero
        intensity = min(abs(z_val) / 8, 1.0)
        colors.append((1.0 - intensity, 1.0 - intensity, 1.0))

# Plot
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the sigmoid curve
ax.plot(z, y, color='black', linewidth=2, zorder=1, alpha=0.7)

# Plot colored dots
for i, (z_val, y_val, color) in enumerate(zip(z_points, y_points, colors)):
    ax.scatter(z_val, y_val, s=150, c=[color], edgecolor='black',
              linewidth=1.5, zorder=3)

# Add special emphasis on z=0 point
ax.scatter(0, 0.5, s=200, c='white', edgecolor='black',
          linewidth=2, zorder=4, marker='D')

# Reference lines
ax.axvline(0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
ax.axhline(0.5, color='gray', linestyle='--', linewidth=1, alpha=0.5)

# Add colored regions to show classification zones
ax.axvspan(-10, 0, alpha=0.1, color='blue', label='Safe region (z<0)')
ax.axvspan(0, 10, alpha=0.1, color='red', label='Failure region (z>0)')

# Labels and formatting
ax.set_xlabel('z = decision boundary value', fontsize=12)
ax.set_ylabel(r'$\sigma(z)$ = P(failure)', fontsize=12)
ax.set_title('Sigmoid Function: Converting Distance to Probability\n'
            'Color intensity shows confidence in classification', fontsize=14)

# Add text annotations
ax.text(-5, 0.15, 'High confidence\nSAFE', ha='center', fontsize=10,
        color='darkblue', weight='bold')
ax.text(5, 0.85, 'High confidence\nFAILURE', ha='center', fontsize=10,
        color='darkred', weight='bold')
ax.text(0.2, 0.52, 'Decision\nboundary', ha='left', fontsize=9,
        color='black', style='italic')

# Custom legend
from matplotlib.patches import Patch
legend_elements = [
    plt.Line2D([0], [0], color='black', linewidth=2, label=r'$\sigma(z) = \frac{1}{1 + e^{-z}}$'),
    Patch(facecolor='red', alpha=0.3, label='P(failure) > 0.5'),
    Patch(facecolor='blue', alpha=0.3, label='P(failure) < 0.5'),
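    # Proxy handle for the z = 0 marker (diamond, matching the plotted point)
    plt.Line2D([0], [0], marker='D', linestyle='None', markersize=10,
               markerfacecolor='white', markeredgecolor='black',
               label='z = 0 (threshold)')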
]
ax.legend(handles=legend_elements, loc='upper left')

ax.grid(True, alpha=0.3)
ax.set_xlim(-10, 10)
ax.set_ylim(-0.05, 1.05)

plt.tight_layout()
plt.show()

# Add explanatory text below
print("Key Insights:")
print("• z < 0: Material is safe (blue) - more negative = more confident")
print("• z = 0: Decision boundary (white) - P(failure) = 0.5")
print("• z > 0: Material fails (red) - more positive = more confident")
print("\nFor von Mises: z ∝ (σ_VM² - σ_y²)")
print("The sigmoid smoothly converts this continuous value into a probability!")

## Cost function

Just like in linear regression, we need to find a way to measure what makes for a good or bad decision boundary. In the logistic regression case, the cost function is defined as:

$$ J(\theta) = \frac{1}{m} \sum_{i=1}^m \left[ - y^{(i)} \log(h_\theta(x^{(i)})) - (1-y^{(i)}) \log(1-h_\theta(x^{(i)})) \right] $$

Here once again, the cost function is shaped by the data, since we sum over every data point. Because $y^{(i)}$ is either 0 or 1, only one of the two logarithms contributes for each point. Now, why is $J(\theta)$ defined this way? Let's take a look at the log components plotted below. If $y=1$ (left plot), the cost for this data point is $-\log(h_\theta(x))$, which is close to zero when the sigmoid output is close to 1. The sigmoid approaches 1 when its argument $z = \theta^T x$ is large and positive, that is, when the point lies well on the failure side of the decision boundary $z = 0$. In this situation the model correctly predicts $y=1$. If the opposite happens and $z$ is strongly negative, the sigmoid output approaches 0 (it is never negative), so $-\log(h_\theta(x))$ grows without bound and the cost increases sharply. Similar reasoning applies when $y=0$ (right plot), where $-\log(1-h_\theta(x))$ penalizes outputs close to 1.
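As a small worked example of this cost function, the sketch below compares a set of confident, correct predictions with a set of confident, wrong ones. The labels and probabilities are made-up illustrative numbers, and the helper name `log_loss_cost` is hypothetical.

```python
import numpy as np

def log_loss_cost(h, y):
    """Average logistic (cross-entropy) cost J for predictions h and labels y."""
    # Clip to avoid log(0) when a prediction is exactly 0 or 1
    h = np.clip(h, 1e-12, 1 - 1e-12)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

# Illustrative labels: two failed samples (1) and two safe samples (0)
y = np.array([1, 1, 0, 0])

# Predictions that agree with the labels
h_good = np.array([0.9, 0.8, 0.2, 0.1])
# Predictions that contradict the labels
h_bad = np.array([0.1, 0.2, 0.8, 0.9])

cost_good = log_loss_cost(h_good, y)
cost_bad = log_loss_cost(h_bad, y)

print(f"Cost for good predictions: {cost_good:.3f}")
print(f"Cost for bad predictions:  {cost_bad:.3f}")
```

The wrong predictions cost roughly ten times more than the correct ones, which is the asymmetry the logarithms are designed to produce.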

In [None]:
#@title Cost function components plot

# Define sigmoid and related functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 500)
sig = sigmoid(x)
log_sig = -np.log(sig)
log_one_minus_sig = -np.log(1 - sig)

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Subplot 1: sigmoid and -log(sigmoid), the cost term when y = 1
axes[0].plot(x, sig, label=r'$\sigma(x)$', color='blue')
axes[0].plot(x, log_sig, label=r'$-\log(\sigma(x))$', color='red')
axes[0].set_title("Sigmoid and -log(sigmoid): cost when y = 1")
axes[0].legend()
axes[0].grid(True)

# Subplot 2: sigmoid and -log(1 - sigmoid), the cost term when y = 0
axes[1].plot(x, sig, label=r'$\sigma(x)$', color='blue')
axes[1].plot(x, log_one_minus_sig, label=r'$-\log(1 - \sigma(x))$', color='orange')
axes[1].set_title("Sigmoid and -log(1 - sigmoid): cost when y = 0")
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

## Update step

Once again, the update is defined using the gradient of the cost function $J(\theta)$. In our 2D example, with a bias term and two features, all parameters are updated simultaneously:

$$
\begin{cases}
\theta_0 := \theta_0 - \alpha \dfrac{\partial J(\theta)}{\partial \theta_0} \\
\theta_1 := \theta_1 - \alpha \dfrac{\partial J(\theta)}{\partial \theta_1} \\
\theta_2 := \theta_2 - \alpha \dfrac{\partial J(\theta)}{\partial \theta_2}
\end{cases}
$$

Where the partial derivative is defined as:

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}
$$


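Notice that this gradient, like in the case of linear regression, is a sum of prediction errors weighted by the features. In fact, **it has the same form as the linear regression gradient despite the different cost function definitions**; the only difference is that $h_\theta$ is now the sigmoid of $\theta^T x$ rather than $\theta^T x$ itself. The logistic cost is not a quadratic surface, but **it is convex, so gradient descent still heads toward a single global minimum**. Now we are ready to train our logistic regression model!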
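The update rule can be sketched from scratch in a few lines. The example below uses synthetic two-feature data generated on the spot (an assumption for illustration, not the notebook's `df_failure`), runs plain gradient descent with an assumed learning rate `alpha = 0.5`, and checks the analytic gradient against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic cost J(theta), averaged over all samples."""
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Analytic gradient: (1/m) * sum((h - y) * x_j) for every j."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Synthetic data (assumption): label depends on x1 + x2 plus some noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([np.ones(200), x1, x2])   # column of ones for theta_0
y = (x1 + x2 + 0.3 * rng.normal(size=200) > 0).astype(float)

theta = np.zeros(3)
alpha = 0.5                                   # assumed learning rate
cost_start = cost(theta, X, y)
for _ in range(500):
    theta = theta - alpha * gradient(theta, X, y)   # simultaneous update
cost_end = cost(theta, X, y)

# Finite-difference check of the gradient formula at the final theta
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
grad_error = np.max(np.abs(numeric - gradient(theta, X, y)))

print(f"Cost before training: {cost_start:.4f}")
print(f"Cost after training:  {cost_end:.4f}")
print(f"Max gradient mismatch: {grad_error:.2e}")
```

Starting from $\theta = 0$, every prediction is 0.5 and the cost equals $\log 2$. Library solvers such as the one used in the next cell typically rely on more sophisticated optimizers, but they minimize the same kind of cost.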

In [None]:
#@title Train logistic regression model

# Prepare features - include polynomial terms to capture von Mises relationship
X = df_failure[['sigma1_MPa', 'sigma2_MPa']].values
y = df_failure['failure'].values

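# Create polynomial features up to degree 2
# This gives us: [σ₁, σ₂, σ₁², σ₁σ₂, σ₂²]
# The constant term θ₀ is fitted by LogisticRegression itself (intercept_),
# so include_bias=False avoids a redundant column of ones
poly = PolynomialFeatures(degree=2, include_bias=False)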
X_poly = poly.fit_transform(X)

# Train logistic regression
log_reg = LogisticRegression(max_iter=1000, random_state=42)
log_reg.fit(X_poly, y)

# Print the learned coefficients
feature_names = poly.get_feature_names_out(['σ₁', 'σ₂'])
print("Learned coefficients:")
print("─" * 40)
for name, coef in zip(feature_names, log_reg.coef_[0]):
    print(f"{name:8s}: {coef:8.4f}")
print(f"Intercept: {log_reg.intercept_[0]:8.4f}")

# Model accuracy
y_pred = log_reg.predict(X_poly)
accuracy = accuracy_score(y, y_pred)
print(f"\nModel accuracy: {accuracy:.2%}")

# Note about the coefficients
print("\nPhysical interpretation:")
print("The model should learn that σ₁² and σ₂² have positive coefficients")
print("while σ₁*σ₂ has a negative coefficient, matching von Mises theory.")

### Comparing Learned vs Theoretical Failure Boundary

Let's see how well the logistic regression model learned the failure criterion compared to the theoretical von Mises surface.
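Besides the visual comparison, the learned coefficients can be checked numerically. On the $P = 0.5$ boundary the polynomial inside the sigmoid is zero, so if the linear terms are negligible, dividing by the $\sigma_1^2$ coefficient should give ratios close to $(1, -1, 1)$ and an implied yield strength of $\sqrt{-\theta_0/\theta_3}$. The sketch below assumes negligible linear terms. The helper name `compare_to_von_mises` and the numbers passed to it are hypothetical, not output from the fitted model.

```python
import numpy as np

def compare_to_von_mises(theta0, theta_s1sq, theta_s1s2, theta_s2sq):
    """Normalize quadratic coefficients and estimate the implied yield strength.

    Assumes the linear coefficients are negligible, so the P = 0.5 boundary is
    theta0 + theta_s1sq*s1^2 + theta_s1s2*s1*s2 + theta_s2sq*s2^2 = 0.
    """
    ratios = np.array([theta_s1sq, theta_s1s2, theta_s2sq]) / theta_s1sq
    implied_yield = np.sqrt(-theta0 / theta_s1sq)
    return ratios, implied_yield

# Hypothetical coefficients (illustrative only) for a 250 MPa yield strength
ratios, implied_yield = compare_to_von_mises(
    theta0=-6.25, theta_s1sq=1e-4, theta_s1s2=-1e-4, theta_s2sq=1e-4
)

print("Normalized [σ₁², σ₁σ₂, σ₂²] coefficients:", ratios)
print(f"Implied yield strength: {implied_yield:.1f} MPa")
```

For a fitted scikit-learn model, the constant term would be `log_reg.intercept_[0]` plus the coefficient of any constant feature column, and the quadratic coefficients come from `log_reg.coef_[0]`. Default regularization and unscaled features can distort the ratios, so treat this as a rough sanity check.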

In [None]:
#@title Visualize learned decision boundary vs theoretical (with threshold slider)

def plot_boundary_with_threshold(threshold=0.5):
    clear_output(wait=True)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

    # Create grid for decision boundary - expanded range to prevent clipping
    xx, yy = np.meshgrid(np.linspace(-50, 550, 250),
                         np.linspace(-50, 550, 250))
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    grid_poly = poly.transform(grid_points)
    Z = log_reg.predict_proba(grid_poly)[:, 1].reshape(xx.shape)

    # Left plot: Learned boundary
    colors = ['blue' if f == 0 else 'red' for f in failure]
    ax1.scatter(sigma1, sigma2, c=colors, alpha=0.4, s=30, edgecolors='black', linewidth=0.5)

    # Add theoretical von Mises FIRST (so it appears below)
    t = np.linspace(0, 2*np.pi, 200)
    vm_sigma1 = yield_strength * (np.cos(t) + np.sin(t)/np.sqrt(3))
    vm_sigma2 = yield_strength * (2*np.sin(t)/np.sqrt(3))
    ax1.plot(vm_sigma1, vm_sigma2, 'k--', linewidth=2, alpha=0.7, label='Theoretical (von Mises)')

    # Draw contour at the selected threshold AFTER (so it appears on top)
    contour = ax1.contour(xx, yy, Z, levels=[threshold], colors='orange', linewidths=3)
    ax1.clabel(contour, inline=True, fmt=f'P={threshold:.2f}')

    ax1.set_xlabel('σ₁ (MPa)')
    ax1.set_ylabel('σ₂ (MPa)')
    ax1.set_title(f'Learned Decision Boundary (Threshold = {threshold:.2f})')
    ax1.set_xlim(0, 500)
    ax1.set_ylim(0, 500)
    ax1.set_aspect('equal')
    ax1.grid(True, alpha=0.3)

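    # Build the legend from explicit proxy handles so each label matches its line
    # (passing only labels would attach the first label to the scatter points)
    ax1_handles = [
        plt.Line2D([0], [0], color='orange', linewidth=3),
        plt.Line2D([0], [0], color='black', linewidth=2, linestyle='--', alpha=0.7),
    ]
    ax1_labels = [f'Learned boundary (P={threshold:.2f})', 'Theoretical von Mises']
    ax1.legend(ax1_handles, ax1_labels)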

    # Right plot: Probability heatmap
    im = ax2.contourf(xx, yy, Z, levels=20, cmap='RdBu_r', alpha=0.8)

    # Add theoretical von Mises FIRST (lower z-order)
    ax2.plot(vm_sigma1, vm_sigma2, 'k--', linewidth=2, alpha=0.7, zorder=2)

    # Draw contour at the selected threshold AFTER (higher z-order)
    threshold_contour = ax2.contour(xx, yy, Z, levels=[threshold], colors='yellow', linewidths=3, zorder=3)

    # Also show 0.5 contour for reference if threshold is different
    if abs(threshold - 0.5) > 0.01:
        reference_contour = ax2.contour(xx, yy, Z, levels=[0.5], colors='white', linewidths=1,
                                       linestyles='--', alpha=0.5, zorder=3)

    # Create legend handles list
    legend_handles = []
    legend_labels = []

    # Add current threshold line to legend
    legend_handles.append(plt.Line2D([0], [0], color='yellow', linewidth=3))
    legend_labels.append(f'Current threshold (P={threshold:.2f})')

    # Also show 0.5 contour for reference if threshold is different
    if abs(threshold - 0.5) > 0.01:
        legend_handles.append(plt.Line2D([0], [0], color='white', linewidth=1, linestyle='--'))
        legend_labels.append('Standard threshold (P=0.50)')

    # Add theoretical von Mises
    legend_handles.append(plt.Line2D([0], [0], color='black', linewidth=2, linestyle='--'))
    legend_labels.append('Theoretical von Mises')

    # Add legend to right plot
    ax2.legend(legend_handles, legend_labels, loc='upper right')

    plt.colorbar(im, ax=ax2, label='Probability of Failure')

    ax2.set_xlabel('σ₁ (MPa)')
    ax2.set_ylabel('σ₂ (MPa)')
    ax2.set_title('Failure Probability Map')
    # Expanded limits to prevent clipping at higher thresholds
    ax2.set_xlim(0, 550)
    ax2.set_ylim(0, 550)
    ax2.set_aspect('equal')
    ax2.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Print information about the threshold effect
    # (failure is predicted wherever P(failure) >= threshold)
    if threshold < 0.5 - 0.01:
        print(f"Threshold = {threshold:.2f}: More conservative (smaller predicted-safe region)")
        print("The model predicts failure for more cases - reducing false negatives but increasing false positives")
    elif threshold > 0.5 + 0.01:
        print(f"Threshold = {threshold:.2f}: Less conservative (larger predicted-safe region)")
        print("The model predicts failure for fewer cases - reducing false positives but increasing false negatives")
    else:
        print(f"Threshold = {threshold:.2f}: Standard classification boundary")
        print("This is the default threshold where P(failure) = P(no failure)")

# Create threshold slider
threshold_slider = widgets.FloatSlider(
    value=0.5,
    min=0.1,
    max=0.9,
    step=0.05,
    description='Threshold:',
    continuous_update=False,
    readout_format='.2f'
)

# Display the interactive plot
print("Adjust the threshold to see how it affects the decision boundary:")
print("• Lower threshold (< 0.5): More conservative, predicts failure earlier")
print("• Higher threshold (> 0.5): Less conservative, allows higher stresses before predicting failure\n")

widgets.interact(plot_boundary_with_threshold, threshold=threshold_slider)

## Key Insights and Engineering Applications

### What We Learned

Through this exploration of linear and logistic regression applied to materials science, several key insights emerge.

First, the logistic regression successfully learned an approximation of the von Mises failure criterion from experimental data alone. The learned boundary closely matches the theoretical von Mises ellipse, validating both our experimental data and the machine learning approach.

Second, the model discovered the quadratic relationship. By including polynomial features, the model could learn that failure depends on squared stress terms and their interaction, matching the von Mises formula.

Third, physics-informed features improve learning. For both linear and logistic regression, including features based on our understanding of material mechanics helped the models learn the true relationships.

Fourth, the probability output from logistic regression provides valuable uncertainty information. Unlike a hard threshold, logistic regression gives us failure probabilities, which is useful for safety factors in design.
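The trade-off behind choosing a probability threshold can be sketched with a handful of numbers. The labels and predicted probabilities below are made up for illustration, and the helper `confusion_counts` is hypothetical. In the notebook, the probabilities would come from `log_reg.predict_proba`.

```python
import numpy as np

def confusion_counts(y_true, p_fail, threshold):
    """Count missed failures (false negatives) and false alarms (false positives)."""
    y_pred = (p_fail >= threshold).astype(int)   # predict failure at or above threshold
    false_neg = int(np.sum((y_true == 1) & (y_pred == 0)))
    false_pos = int(np.sum((y_true == 0) & (y_pred == 1)))
    return false_neg, false_pos

# Illustrative data: 1 = failed, 0 = safe, with overlapping probabilities
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
p_fail = np.array([0.05, 0.10, 0.25, 0.40, 0.35, 0.60, 0.55, 0.70, 0.90, 0.95])

for threshold in (0.3, 0.5, 0.7):
    fn, fp = confusion_counts(y_true, p_fail, threshold)
    print(f"threshold={threshold:.1f}: missed failures={fn}, false alarms={fp}")
```

Lowering the threshold removes missed failures at the price of more false alarms, which is usually the safer trade when an undetected failure is costly.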

### Engineering Applications

These approaches are used extensively in real engineering contexts. Design optimization allows engineers to quickly evaluate if a design will fail under complex loading without running expensive physical tests.

Safety analysis benefits from the probability output, which helps engineers apply appropriate safety factors based on the consequences of failure. Material testing becomes more efficient because we can reduce the number of expensive tests by learning from existing data.

### The Key Lesson

The most important takeaway is that machine learning is most powerful when combined with domain knowledge. We used our understanding of material mechanics to choose appropriate features, select suitable model architectures, interpret results in the context of physical laws, and validate predictions against theoretical expectations.

This synergy between data-driven methods and physical understanding represents the future of engineering analysis. Machine learning provides the tools to extract patterns from complex data, while materials science provides the framework to ensure those patterns are physically meaningful and generalizable.