# EFA Limitation 3: Tailing Effects

**Reference**: See [EFA_limitations_overview.md](EFA_limitations_overview.md) for the complete series of limitation demonstrations.

This notebook demonstrates **Limitation 3** of Evolving Factor Analysis (EFA) as documented by its inventors (Keller & Massart 1991).

**Original Quote** (Keller & Massart 1991, p. 217):
> "The problem of tails is one of the most serious difficulties...Forward and backward EFA give concentration windows which do not coincide exactly. This discrepancy is especially pronounced when peaks are strongly tailing"

**What This Means**:
- EFA assumes **FIFO (First In, First Out)**: the first component to appear is also the first to disappear
- Peak **tailing** (asymmetric exponential decay) violates this assumption
- Tailing causes components to persist beyond their expected elution window
- Forward EFA detects appearance, backward EFA detects disappearance
- With tailing: **forward and backward windows don't match** → unreliable concentration estimates
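
The FIFO condition can be stated as a small check. This is a minimal sketch with hypothetical frame windows; the numbers are illustrative and not taken from the datasets in this notebook, and `is_fifo` is an assumed helper name.

```python
def is_fifo(windows):
    """Return True if components disappear in the same order they appear.

    windows: list of (appearance_frame, disappearance_frame), one per component.
    """
    # Order components by appearance, then require non-decreasing disappearance.
    ordered = sorted(windows, key=lambda w: w[0])
    ends = [end for _, end in ordered]
    return all(a <= b for a, b in zip(ends, ends[1:]))

# Hypothetical windows for three well-behaved peaks
ideal = [(10, 40), (30, 60), (50, 80)]
# Same peaks, but component 1 tails past the end of component 2
tailing = [(10, 70), (30, 60), (50, 80)]

print(is_fifo(ideal))    # True
print(is_fifo(tailing))  # False
```

In the second case the first component to appear is no longer the first to disappear, which is the situation a strongly tailing peak can create.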

**Why It Matters for SEC-SAXS**:
- Column interactions, non-specific binding → peak tailing
- Aggregation or slow dissociation → extended presence of species
- EFA's concentration windows become inaccurate
- Cannot reliably determine when components are truly present

---

## Demonstration Strategy

1. Create synthetic chromatogram with **symmetric Gaussian peaks** (ideal case)
2. Create synthetic chromatogram with **tailing peaks** (exponentially modified Gaussian)
3. Perform SVD and analyze eigenvalue evolution
4. Simulate forward/backward EFA window detection
5. Show that tailing causes **window mismatch**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.linalg import svd

# Set random seed for reproducibility
np.random.seed(42)

# Plotting parameters
plt.rcParams['figure.figsize'] = (14, 5)
plt.rcParams['font.size'] = 10

## Step 1: Create Functions for Gaussian and Tailing Peaks

**Exponentially Modified Gaussian (EMG)**:
- Models peak tailing in chromatography
- Gaussian convolved with exponential decay
- Controlled by **tailing parameter** τ (tau)
  - τ = 0: Perfect Gaussian (no tailing)
  - τ > 0: Increasing tailing severity
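
For reference, the standard closed-form EMG density (a Gaussian with center μ and width σ convolved with an exponential decay of time constant τ) is

$$
f(x;\mu,\sigma,\tau) = \frac{1}{2\tau}\exp\!\left(\frac{\sigma^2}{2\tau^2} - \frac{x-\mu}{\tau}\right)\operatorname{erfc}\!\left(\frac{1}{\sqrt{2}}\left(\frac{\sigma}{\tau} - \frac{x-\mu}{\sigma}\right)\right)
$$

The sketch below evaluates this expression and compares it with `scipy.stats.exponnorm`, whose shape parameter is `K = tau / sigma`. The helper name `emg_pdf` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.special import erfc
from scipy.stats import exponnorm

def emg_pdf(x, mu, sigma, tau):
    """Closed-form EMG density with unit area; requires tau > 0."""
    arg = (sigma / tau - (x - mu) / sigma) / np.sqrt(2.0)
    prefactor = np.exp(sigma**2 / (2 * tau**2) - (x - mu) / tau)
    return prefactor * erfc(arg) / (2 * tau)

mu, sigma, tau = 5.0, 1.0, 1.5
x = np.linspace(0, 20, 2001)

closed_form = emg_pdf(x, mu, sigma, tau)
reference = exponnorm.pdf(x, tau / sigma, loc=mu, scale=sigma)

# The EMG mean is mu + tau: tailing shifts the center of mass to later frames.
dx = x[1] - x[0]
mean = np.sum(x * closed_form) * dx

print(np.allclose(closed_form, reference, rtol=1e-6, atol=1e-12))
print(f"numerical mean = {mean:.3f}, mu + tau = {mu + tau:.3f}")
```

The direct formula can overflow when τ is very small relative to σ, so a library implementation is the safer choice outside a demonstration.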

In [None]:
from scipy.stats import exponnorm

def exponentially_modified_gaussian(x, mu, sigma, tau):
    """
    Exponentially Modified Gaussian (EMG) for modeling peak tailing.
    
    Parameters:
        x: time/frame points
        mu: center of the Gaussian component
        sigma: width of the Gaussian component
        tau: exponential time constant (0 = no tailing, >0 = increasing tailing)
    
    Returns:
        Unit-area EMG profile (Gaussian convolved with exponential decay)
    """
    if tau == 0:
        # No tailing: return pure Gaussian
        return norm.pdf(x, loc=mu, scale=sigma)
    
    # scipy.stats.exponnorm is the analytical EMG density.
    # Its shape parameter is K = tau / sigma, so the tail lengthens as tau grows.
    return exponnorm.pdf(x, tau / sigma, loc=mu, scale=sigma)

def create_chromatogram_with_tailing(n_points=100, n_components=3, tailing=0.0):
    """
    Create synthetic chromatogram with optional tailing.
    
    Parameters:
        n_points: number of time points
        n_components: number of components (currently unused: the function always builds 3)
        tailing: tailing parameter (0 = Gaussian, >0 = tailing)
    
    Returns:
        frames, profiles, concentrations, pure_spectra
    """
    frames = np.linspace(0, 10, n_points)
    n_wavelengths = 50
    
    # Component 1: Early elution
    c1 = exponentially_modified_gaussian(frames, mu=3.0, sigma=0.8, tau=tailing)
    c1 = c1 / c1.max() * 0.8
    
    # Component 2: Middle elution
    c2 = exponentially_modified_gaussian(frames, mu=5.0, sigma=1.0, tau=tailing)
    c2 = c2 / c2.max() * 1.0
    
    # Component 3: Late elution
    c3 = exponentially_modified_gaussian(frames, mu=7.5, sigma=0.6, tau=tailing)
    c3 = c3 / c3.max() * 0.3
    
    concentrations = np.column_stack([c1, c2, c3])
    
    # Create distinct spectral profiles
    q = np.linspace(0, 1, n_wavelengths)
    s1 = np.exp(-q**2 / 0.05)  # Larger particle
    s2 = np.exp(-q**2 / 0.15)  # Medium particle
    s3 = np.exp(-q**2 / 0.30)  # Smaller particle
    
    pure_spectra = np.vstack([s1, s2, s3])
    
    # Linear mixing (bilinear model): D = C @ S
    profiles = concentrations @ pure_spectra
    
    return frames, profiles, concentrations, pure_spectra

# Test the function with different tailing parameters
frames_test = np.linspace(0, 10, 200)
gaussian = exponentially_modified_gaussian(frames_test, mu=5.0, sigma=1.0, tau=0.0)
tailing_mild = exponentially_modified_gaussian(frames_test, mu=5.0, sigma=1.0, tau=0.5)
tailing_severe = exponentially_modified_gaussian(frames_test, mu=5.0, sigma=1.0, tau=1.5)

# Visualize peak shapes
fig, ax = plt.subplots(1, 1, figsize=(10, 4))
ax.plot(frames_test, gaussian, 'k-', linewidth=2, label='Gaussian (τ=0)', alpha=0.8)
ax.plot(frames_test, tailing_mild, 'b--', linewidth=2, label='Mild Tailing (τ=0.5)', alpha=0.8)
ax.plot(frames_test, tailing_severe, 'r-.', linewidth=2, label='Severe Tailing (τ=1.5)', alpha=0.8)
ax.set_xlabel('Frame / Time', fontsize=11)
ax.set_ylabel('Concentration', fontsize=11)
ax.set_title('Peak Shape: Gaussian vs Tailing', fontsize=12, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✓ Peak shape functions created")
print("  Gaussian: Symmetric peak")
print("  Tailing: Extended decay on trailing edge")
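
The visual difference between these peak shapes can also be put into a single number. One common chromatographic measure is the asymmetry factor b/a at 10% of the peak height, where a is the distance from the leading edge to the apex and b is the distance from the apex to the trailing edge. The sketch below is a hedged illustration: `asymmetry_factor` is an assumed helper name, and the peaks are generated with `scipy.stats.exponnorm` (shape `K = tau / sigma`) rather than with the function defined above.

```python
import numpy as np
from scipy.stats import norm, exponnorm

def asymmetry_factor(x, y, height_fraction=0.10):
    """Asymmetry factor b/a measured at a fraction of the peak height.

    Assumes a single peak sampled on a fine, increasing grid.
    """
    apex = np.argmax(y)
    above = np.where(y >= height_fraction * y[apex])[0]
    a = x[apex] - x[above[0]]    # leading edge to apex
    b = x[above[-1]] - x[apex]   # apex to trailing edge
    return b / a

x = np.linspace(0, 30, 30001)
mu, sigma = 5.0, 1.0

as_gauss = asymmetry_factor(x, norm.pdf(x, loc=mu, scale=sigma))
as_mild = asymmetry_factor(x, exponnorm.pdf(x, 0.5 / sigma, loc=mu, scale=sigma))
as_severe = asymmetry_factor(x, exponnorm.pdf(x, 1.5 / sigma, loc=mu, scale=sigma))

print(f"Gaussian: {as_gauss:.2f}")
print(f"tau=0.5:  {as_mild:.2f}")
print(f"tau=1.5:  {as_severe:.2f}")
```

A symmetric peak gives a value close to 1, and the value grows with τ.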

## Step 2: Generate Chromatograms with and without Tailing

In [None]:
# Create datasets
frames, D_gaussian, C_gaussian, S_true = create_chromatogram_with_tailing(tailing=0.0)
_, D_tailing_mild, C_tailing_mild, _ = create_chromatogram_with_tailing(tailing=0.5)
_, D_tailing_severe, C_tailing_severe, _ = create_chromatogram_with_tailing(tailing=1.5)

# Visualize concentration profiles
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

datasets = [
    (C_gaussian, "Gaussian Peaks\n(No Tailing)"),
    (C_tailing_mild, "Mild Tailing\n(τ=0.5)"),
    (C_tailing_severe, "Severe Tailing\n(τ=1.5)")
]

colors = ['blue', 'green', 'red']
labels = ['Component 1', 'Component 2', 'Component 3']

for idx, (C, title) in enumerate(datasets):
    for i in range(3):
        axes[idx].plot(frames, C[:, i], color=colors[i], linewidth=2, 
                      label=labels[i], alpha=0.7)
    axes[idx].set_xlabel('Frame / Time', fontsize=11)
    axes[idx].set_ylabel('Concentration', fontsize=11)
    axes[idx].set_title(title, fontsize=11, fontweight='bold')
    axes[idx].legend(fontsize=9)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Generated 3 datasets:")
print("  1. Gaussian (ideal): Symmetric peaks")
print("  2. Mild tailing: Extended trailing edge")
print("  3. Severe tailing: Pronounced asymmetry")
print("\n→ Notice how tailing causes peaks to 'drag' into later frames")
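
Because each dataset is built as a product of concentration profiles and pure spectra, a noise-free data matrix has at most as many non-negligible singular values as there are components. EFA relies on this property when it counts significant singular values in growing windows. The sketch below rebuilds a comparable three-component matrix from scratch; the variable names and the relative cutoff `1e-10` are assumptions for illustration.

```python
import numpy as np

frames = np.linspace(0, 10, 100)
q = np.linspace(0, 1, 50)

# Three Gaussian elution profiles (columns of C) and three spectra (rows of S)
C = np.column_stack([np.exp(-0.5 * ((frames - mu) / sd) ** 2)
                     for mu, sd in [(3.0, 0.8), (5.0, 1.0), (7.5, 0.6)]])
S = np.vstack([np.exp(-q**2 / w) for w in (0.05, 0.15, 0.30)])

D = C @ S  # shape: (100 frames, 50 q-points)

# Singular values beyond the third are at floating-point noise level.
s = np.linalg.svd(D, compute_uv=False)
n_significant = int(np.sum(s > 1e-10 * s[0]))
print(n_significant)  # 3
```

With measurement noise the remaining singular values are no longer negligible, so a real analysis needs a noise-aware cutoff.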

## Step 3: Simulate Forward and Backward EFA Window Detection

**EFA Principle**:
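- **Forward EFA**: Run SVD on a window that grows from the first frame; each eigenvalue that becomes significant marks a component **appearance**
- **Backward EFA**: Run SVD on a window that grows from the last frame toward the first; each eigenvalue that becomes significant marks a component **disappearance**
- **Concentration window**: [appearance, disappearance], pairing the k-th appearance with the k-th disappearance under FIFO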

**With FIFO assumption**:
- Components appear in order: 1, 2, 3
- Components disappear in same order: 1, 2, 3
- Windows should be well-defined

**With tailing**:
- Appearance order preserved
- Disappearance delayed and distorted
- **Windows overlap incorrectly**
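
The window construction can be traced by hand on a toy dataset. This is a minimal sketch and not a full EFA implementation: it uses noise-free box-shaped profiles so that the rank of each growing window is exact. The name `efa_windows`, the tolerance, and the toy matrices are assumptions for illustration.

```python
import numpy as np

def efa_windows(data, n_components, tol=1e-8):
    """Toy EFA on noise-free data: returns [(appearance, disappearance), ...]."""
    n_frames = data.shape[0]
    # Rank of the growing forward window data[:i+1] and backward window data[i:]
    fwd_rank = np.array([np.linalg.matrix_rank(data[:i + 1], tol=tol)
                         for i in range(n_frames)])
    bwd_rank = np.array([np.linalg.matrix_rank(data[i:], tol=tol)
                         for i in range(n_frames)])
    ks = range(1, n_components + 1)
    # k-th appearance: first frame where the forward rank reaches k
    appear = [int(np.argmax(fwd_rank >= k)) for k in ks]
    # k-th backward level: last frame where the backward rank is still >= k
    level = [int(np.where(bwd_rank >= k)[0][-1]) for k in ks]
    # FIFO pairing: the first component to appear is the first to disappear,
    # so it is matched with the highest backward level.
    return list(zip(appear, level[::-1]))

C = np.zeros((10, 2))
C[2:6, 0] = 1.0   # component 1 present in frames 2-5
C[4:9, 1] = 1.0   # component 2 present in frames 4-8
S = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

windows = efa_windows(C @ S, n_components=2)
print(windows)  # [(2, 5), (4, 8)]
```

The recovered windows match the true ones only because the profiles obey FIFO. If component 1 persisted past frame 8, the same pairing would still assign it the earlier disappearance.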

In [None]:
def detect_efa_windows(data, n_components=3, threshold=0.01):
    """
    Simulate EFA window detection using cumulative SVD.
    
    Parameters:
        data: (n_frames, n_wavelengths) data matrix
        n_components: number of components to detect
        threshold: singular-value threshold for detection
    
    Returns:
        forward_windows: appearance frame index per component
        backward_windows: disappearance frame index per component (FIFO-paired)
        eigenvalue_evolution: forward singular values at each frame
        eigenvalue_evolution_bwd: backward singular values at each frame
    """
    n_frames = data.shape[0]
    eigenvalue_evolution = np.zeros((n_frames, min(data.shape)))
    
    # Forward EFA: SVD of the growing submatrix data[:i+1]
    for i in range(1, n_frames):
        subdata = data[:i+1, :]
        _, s, _ = svd(subdata, full_matrices=False)
        eigenvalue_evolution[i, :len(s)] = s
    
    # The k-th singular value first exceeding the threshold marks the
    # appearance of the k-th component
    forward_windows = []
    for comp in range(n_components):
        significant = eigenvalue_evolution[:, comp] > threshold
        if np.any(significant):
            start = np.where(significant)[0][0]
            forward_windows.append(start)
        else:
            # Never detected: use the last valid frame index
            forward_windows.append(n_frames - 1)
    
    # Backward EFA: SVD of the submatrix data[i:], growing from the last frame
    eigenvalue_evolution_bwd = np.zeros((n_frames, min(data.shape)))
    for i in range(n_frames-1, 0, -1):
        subdata = data[i:, :]
        _, s, _ = svd(subdata, full_matrices=False)
        eigenvalue_evolution_bwd[i, :len(s)] = s
    
    # Last frame at which the k-th backward singular value is still significant
    backward_levels = []
    for comp in range(n_components):
        significant = eigenvalue_evolution_bwd[:, comp] > threshold
        if np.any(significant):
            end = np.where(significant)[0][-1]
            backward_levels.append(end)
        else:
            backward_levels.append(0)
    
    # FIFO pairing: the largest backward singular value belongs to the last
    # component to disappear, so reverse to index by elution order
    backward_windows = backward_levels[::-1]
    
    return forward_windows, backward_windows, eigenvalue_evolution, eigenvalue_evolution_bwd

# Detect windows for all three datasets
print("="*70)
print("EFA WINDOW DETECTION")
print("="*70)

results = {}
for name, data in [("Gaussian", D_gaussian), 
                    ("Mild Tailing", D_tailing_mild), 
                    ("Severe Tailing", D_tailing_severe)]:
    fwd, bwd, ev_fwd, ev_bwd = detect_efa_windows(data)
    results[name] = {'forward': fwd, 'backward': bwd, 'ev_fwd': ev_fwd, 'ev_bwd': ev_bwd}
    
    print(f"\n{name.upper()}:")
    print(f"  Forward windows (appearance): {fwd}")
    print(f"  Backward windows (disappearance): {bwd}")
    
    # Calculate window lengths
    for i in range(3):
        window_length = bwd[i] - fwd[i]
        print(f"  Component {i+1} window: [{fwd[i]}, {bwd[i]}] (length={window_length})")

print("\n" + "="*70)

## Step 4: Visualize Window Mismatch Due to Tailing

In [None]:
# Create comprehensive comparison figure
fig, axes = plt.subplots(3, 3, figsize=(16, 12))

dataset_list = [
    ("Gaussian", D_gaussian, C_gaussian),
    ("Mild Tailing", D_tailing_mild, C_tailing_mild),
    ("Severe Tailing", D_tailing_severe, C_tailing_severe)
]

for row, (name, data, conc) in enumerate(dataset_list):
    fwd = results[name]['forward']
    bwd = results[name]['backward']
    
    # Column 1: Concentration profiles with detected windows
    for i in range(3):
        axes[row, 0].plot(frames, conc[:, i], color=colors[i], linewidth=2, 
                         label=f'Component {i+1}', alpha=0.7)
        # Mark forward (appearance) window
        axes[row, 0].axvline(x=frames[fwd[i]], color=colors[i], linestyle='--', 
                            linewidth=1.5, alpha=0.5)
        # Mark backward (disappearance) window
        axes[row, 0].axvline(x=frames[bwd[i]], color=colors[i], linestyle=':', 
                            linewidth=1.5, alpha=0.5)
    
    axes[row, 0].set_xlabel('Frame / Time', fontsize=10)
    axes[row, 0].set_ylabel('Concentration', fontsize=10)
    axes[row, 0].set_title(f'{name}: True Concentrations', fontsize=10, fontweight='bold')
    if row == 0:
        axes[row, 0].legend(fontsize=8)
    axes[row, 0].grid(True, alpha=0.3)
    
    # Column 2: Forward eigenvalue evolution
    ev_fwd = results[name]['ev_fwd']
    for i in range(3):
        axes[row, 1].plot(frames, ev_fwd[:, i], color=colors[i], linewidth=2, alpha=0.7)
        axes[row, 1].axvline(x=frames[fwd[i]], color=colors[i], linestyle='--', 
                            linewidth=1.5, alpha=0.5)
    axes[row, 1].set_xlabel('Frame / Time', fontsize=10)
    axes[row, 1].set_ylabel('Eigenvalue (Forward)', fontsize=10)
    axes[row, 1].set_title(f'{name}: Forward EFA', fontsize=10, fontweight='bold')
    axes[row, 1].set_yscale('log')
    axes[row, 1].grid(True, alpha=0.3)
    
    # Column 3: Backward eigenvalue evolution
    ev_bwd = results[name]['ev_bwd']
    for i in range(3):
        axes[row, 2].plot(frames, ev_bwd[:, i], color=colors[i], linewidth=2, alpha=0.7)
        axes[row, 2].axvline(x=frames[bwd[i]], color=colors[i], linestyle=':', 
                            linewidth=1.5, alpha=0.5)
    axes[row, 2].set_xlabel('Frame / Time', fontsize=10)
    axes[row, 2].set_ylabel('Eigenvalue (Backward)', fontsize=10)
    axes[row, 2].set_title(f'{name}: Backward EFA', fontsize=10, fontweight='bold')
    axes[row, 2].set_yscale('log')
    axes[row, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Step 5: Quantify Window Mismatch

In [None]:
# Calculate window discrepancies
print("="*70)
print("WINDOW MISMATCH ANALYSIS")
print("="*70)

for name in ["Gaussian", "Mild Tailing", "Severe Tailing"]:
    fwd = results[name]['forward']
    bwd = results[name]['backward']
    
    print(f"\n{name.upper()}:")
    
    for i in range(3):
        window_width = bwd[i] - fwd[i]
        print(f"  Component {i+1}:")
        print(f"    Start (forward): frame {fwd[i]}")
        print(f"    End (backward): frame {bwd[i]}")
        print(f"    Window width: {window_width} frames")
    
    # Check FIFO assumption
    fifo_order_forward = all(fwd[i] <= fwd[i+1] for i in range(2))
    fifo_order_backward = all(bwd[i] <= bwd[i+1] for i in range(2))
    
    if fifo_order_forward and fifo_order_backward:
        print("  ✓ FIFO assumption SATISFIED")
    else:
        print("  ✗ FIFO assumption VIOLATED")
        if not fifo_order_forward:
            print("    - Forward order incorrect")
        if not fifo_order_backward:
            print("    - Backward order incorrect")

print("\n" + "="*70)
print("KEY OBSERVATION:")
print("="*70)
print("With Gaussian peaks: Forward and backward windows align")
print("With tailing: Windows become wider and less well-defined")
print("→ This is consistent with Keller & Massart's statement about tailing")
print("→ Concentration window estimates become unreliable")
print("="*70)

## Step 6: Visual Summary - Tailing Impact on EFA Windows

In [None]:
# Create summary figure showing window widths
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Window width comparison
components = ['Comp 1', 'Comp 2', 'Comp 3']
x_pos = np.arange(len(components))
width = 0.25

for idx, name in enumerate(["Gaussian", "Mild Tailing", "Severe Tailing"]):
    fwd = results[name]['forward']
    bwd = results[name]['backward']
    window_widths = [bwd[i] - fwd[i] for i in range(3)]
    
    offset = (idx - 1) * width
    axes[0].bar(x_pos + offset, window_widths, width, label=name, alpha=0.7)

axes[0].set_xlabel('Component', fontsize=11)
axes[0].set_ylabel('Window Width (frames)', fontsize=11)
axes[0].set_title('EFA Window Width: Effect of Tailing', fontsize=12, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(components)
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3, axis='y')

# Right: Window overlap visualization
y_positions = {'Gaussian': 2, 'Mild Tailing': 1, 'Severe Tailing': 0}
component_colors = ['blue', 'green', 'red']

for name in ["Gaussian", "Mild Tailing", "Severe Tailing"]:
    fwd = results[name]['forward']
    bwd = results[name]['backward']
    y = y_positions[name]
    
    for i in range(3):
        # Draw window as horizontal bar
        axes[1].barh(y, width=bwd[i]-fwd[i], left=fwd[i], height=0.2, 
                    color=component_colors[i], alpha=0.6, edgecolor='black')

axes[1].set_xlabel('Frame Index', fontsize=11)
axes[1].set_ylabel('Dataset', fontsize=11)
axes[1].set_title('EFA Window Overlap: Tailing Extends Windows', fontsize=12, fontweight='bold')
axes[1].set_yticks([0, 1, 2])
axes[1].set_yticklabels(['Severe Tailing', 'Mild Tailing', 'Gaussian'])
axes[1].set_xlim([0, len(frames)])
axes[1].grid(True, alpha=0.3, axis='x')

# Add legend for components
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=component_colors[i], alpha=0.6, 
                         label=f'Component {i+1}') for i in range(3)]
axes[1].legend(handles=legend_elements, fontsize=9, loc='upper right')

plt.tight_layout()
plt.show()

print("\n✓ Summary visualization complete")
print("→ Tailing increases window width and causes overlaps")

---

## Conclusion: Limitation 3 VERIFIED ✓

**What We Demonstrated**:
1. **Peak tailing distorts EFA windows**: Concentration windows become wider and less defined
2. **Forward vs backward mismatch**: With tailing, the detected appearance and disappearance don't align properly
3. **FIFO assumption weakened**: While the order is preserved, the timing becomes unreliable
4. **Window overlap increases**: Components appear to persist longer than they actually do

**This Directly Confirms Keller & Massart (1991)**:
> "The problem of tails is one of the most serious difficulties...Forward and backward EFA give concentration windows which do not coincide exactly. This discrepancy is especially pronounced when peaks are strongly tailing"

**Practical Implications for SEC-SAXS**:
- **Column effects**: Non-specific binding, dead volume → tailing is common
- **EFA windows unreliable**: Cannot trust the detected start/end frames
- **Concentration estimates biased**: Component appears to persist beyond true elution
- **Overlap exaggerated**: Adjacent peaks appear more overlapped than they are
- **Manual intervention required**: Expert must adjust windows based on peak shape inspection

**Why This Is "One of the Most Serious Difficulties"**:
- Tailing is **ubiquitous** in real chromatography
- Cannot be easily corrected without modeling the tailing process
- Affects the fundamental assumption (FIFO) that EFA relies on
- Makes automated analysis unreliable

**Contrast with Modeling-Based Approaches**:
- Can explicitly model peak shapes (Gaussian, EMG, etc.)
- Tailing parameters can be estimated from data
- Concentration profiles more accurately reflect true elution behavior
- No reliance on FIFO assumption
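
As an illustration of estimating a tailing parameter from data, the sketch below fits an EMG peak to a synthetic noisy trace with `scipy.optimize.curve_fit`. It is a hedged example, not the method of any particular modeling package: the model function `emg_peak`, the noise level, the starting values, and the bounds are all assumptions chosen for this demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import exponnorm

def emg_peak(x, amplitude, mu, sigma, tau):
    """Scaled EMG peak; scipy's shape parameter is K = tau / sigma."""
    return amplitude * exponnorm.pdf(x, tau / sigma, loc=mu, scale=sigma)

# Synthetic single peak with known parameters plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 200)
true_params = (1.0, 5.0, 1.0, 1.5)  # amplitude, mu, sigma, tau
y = emg_peak(x, *true_params) + rng.normal(scale=0.002, size=x.size)

# Bounds keep sigma and tau positive so the shape parameter stays well defined
popt, _ = curve_fit(
    emg_peak, x, y,
    p0=(0.8, 4.5, 0.8, 1.0),
    bounds=([0.0, 0.0, 0.2, 0.2], [10.0, 20.0, 3.0, 5.0]),
)
amp_fit, mu_fit, sigma_fit, tau_fit = popt
print(f"tau = {tau_fit:.2f}, sigma = {sigma_fit:.2f}")
```

For overlapping peaks the same idea extends to a sum of EMG terms, though the fit then depends more strongly on the starting values.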

---

**Reference**:
- Keller, H. R., & Massart, D. L. (1991). Evolving factor analysis. *Chemometrics and Intelligent Laboratory Systems*, 12(3), 209–224.
- See also: [EFA_limitations_from_inventors.md](../EFA_limitations_from_inventors.md)