# Week 12: Functional Safety & Redundancy

### Topics Covered

- ISO 26262 framework
- Redundant systems (steering, braking, compute)
- Fail-Operational vs. Fail-Safe design
- Safety of the Intended Functionality (SOTIF)

---

## Learning Objectives

By the end of this notebook, you will be able to:

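1. Explain core functional safety concepts (hazard, risk, fault, failure) and the ISO 26262 development process
2. Determine ASIL ratings through a hazard analysis and risk assessment (HARA) and apply ASIL decomposition
3. Implement and compare redundancy architectures and voting schemes
4. Apply fail-safe, fail-operational and SOTIF concepts, including ODD definition, to autonomous vehicle design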

---

## Setup

Import required libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Circle, FancyBboxPatch, Arrow
from matplotlib.collections import PatchCollection
import itertools

# Set random seed for reproducibility
np.random.seed(42)

## 1. Functional Safety Fundamentals

Functional safety ensures that systems operate safely even in the presence of faults and failures.

### Why Functional Safety for Autonomous Vehicles?

- **Safety-Critical**: Failures can result in loss of life
- **Complex Systems**: Millions of lines of code, numerous sensors
- **Uncertain Environments**: Cannot predict all scenarios
- **Regulatory Requirements**: Legal compliance necessary for deployment

### Key Concepts

**1. Hazard**: Potential source of harm
- Example: Unintended acceleration, brake failure

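**2. Risk**: Combination of the probability of occurrence of harm and its severity
$$
\text{Risk} = \text{Probability} \times \text{Severity}
$$
The product is a common simplification. ISO 26262 instead splits the probability side into Exposure and Controllability and combines them with Severity in the ASIL table.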

**3. Safety Goal**: Top-level safety requirement to mitigate hazards
- Example: "Vehicle shall not accelerate unintentionally"

**4. Fault**: Abnormal condition that can cause failure
- **Systematic Fault**: Reproducible (e.g., software bug)
- **Random Hardware Fault**: Non-deterministic (e.g., sensor failure)

**5. Failure**: Deviation from intended behavior
- **Single Point Failure**: One fault causes safety goal violation
- **Latent Failure**: Fault that is present but neither detected nor perceived; it stays dormant until a second fault combines with it to violate a safety goal

---

## 2. ISO 26262 Standard

ISO 26262 is the international standard for functional safety of road vehicles.

### ASIL - Automotive Safety Integrity Level

ASIL defines the rigor required for safety measures. Four levels: A (lowest) to D (highest).

**ASIL Classification based on:**

1. **Severity (S)**: Consequence of hazardous event
   - S0: No injuries
   - S1: Light and moderate injuries
   - S2: Severe and life-threatening injuries (survival probable)
   - S3: Life-threatening injuries (survival uncertain) and fatal injuries

2. **Exposure (E)**: Probability of being in the operational situation
   - E0: Incredible
   - E1: Very low probability (occurs only in rare situations)
   - E2: Low probability (<1% of average operating time)
   - E3: Medium probability (1% to 10% of average operating time)
   - E4: High probability (>10% of average operating time)

3. **Controllability (C)**: Ability to avoid harm
   - C0: Controllable in general
   - C1: Simply controllable (≥99% of drivers)
   - C2: Normally controllable (≥90% of drivers)
   - C3: Difficult to control or uncontrollable (<90%)

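**ASIL Determination Table (ISO 26262-3):**

$$
\begin{array}{c|c|c|c|c}
\text{Severity} & \text{Exposure} & C1 & C2 & C3 \\
\hline
S1 & E1 & \text{QM} & \text{QM} & \text{QM} \\
   & E2 & \text{QM} & \text{QM} & \text{QM} \\
   & E3 & \text{QM} & \text{QM} & \text{A} \\
   & E4 & \text{QM} & \text{A} & \text{B} \\
\hline
S2 & E1 & \text{QM} & \text{QM} & \text{QM} \\
   & E2 & \text{QM} & \text{QM} & \text{A} \\
   & E3 & \text{QM} & \text{A} & \text{B} \\
   & E4 & \text{A} & \text{B} & \text{C} \\
\hline
S3 & E1 & \text{QM} & \text{QM} & \text{A} \\
   & E2 & \text{QM} & \text{A} & \text{B} \\
   & E3 & \text{A} & \text{B} & \text{C} \\
   & E4 & \text{B} & \text{C} & \text{D}
\end{array}
$$

Where QM = Quality Management (no ASIL required). Any hazardous event rated S0, E0 or C0 is also QM.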
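The ISO 26262-3 table has a compact structure that makes it easy to check: writing S, E and C as their class numbers, the ASIL depends only on the sum S + E + C (10 gives D, 9 gives C, 8 gives B, 7 gives A, anything lower gives QM). This is an observation about the table, not a rule stated in the standard. A minimal sketch that regenerates the table from that observation (`asil_from_classes` is a hypothetical helper, not part of the notebook's classes):

```python
# ASIL for each value of S + E + C (only valid when S, E and C are all >= 1)
ASIL_BY_SUM = {10: 'D', 9: 'C', 8: 'B', 7: 'A'}


def asil_from_classes(s: int, e: int, c: int) -> str:
    """Reproduce the ISO 26262-3 ASIL table from S, E and C class numbers."""
    if s == 0 or e == 0 or c == 0:
        return 'QM'          # S0, E0 or C0 never require an ASIL
    return ASIL_BY_SUM.get(s + e + c, 'QM')


# Print every Severity/Exposure row for C1, C2, C3
for s in (1, 2, 3):
    for e in (1, 2, 3, 4):
        row = [asil_from_classes(s, e, c) for c in (1, 2, 3)]
        print(f"S{s} E{e}: {row}")
```

A lookup table is still the safer choice in tooling that must be audited, because it matches the standard's presentation row for row; the sum rule is mainly useful as a cross-check.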

### ISO 26262 Development Process

**V-Model:**

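```
Concept (HARA, safety goals)                           Safety Validation
     ↘                                                      ↗
      System Design                          System Integration Testing
           ↘                                            ↗
            HW/SW Design                    HW/SW Integration Testing
                 ↘                                  ↗
                  Implementation  →  Unit Testing
```

Each design level on the left is verified by the test level opposite it on the right.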

**Key Phases:**
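1. **Hazard Analysis & Risk Assessment (HARA)**: Identify hazards, determine their ASIL, and define safety goals
2. **Functional Safety Concept**: Derive functional safety requirements from the safety goals and allocate them to architectural elements
3. **Technical Safety Concept**: Refine them into technical safety requirements and safety mechanisms in the system architecture
4. **Implementation**: Hardware and software development
5. **Verification**: Check each work product against its requirements (reviews, analyses, tests)
6. **Validation**: Provide evidence at vehicle level that the safety goals are achieved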

### Safety Mechanisms

**Detection:**
- Plausibility checks
- Redundancy comparison
- Watchdog timers
- Memory protection

**Mitigation:**
- Fail-safe states
- Graceful degradation
- Emergency procedures
- Driver warnings
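To make the detection mechanisms concrete, here is a minimal sketch of two of them: a plausibility check on a speed signal and a software watchdog. The class names, the 70 m/s range limit and the 12 m/s² rate limit are hypothetical values chosen for illustration; a real monitor would take its limits from the technical safety requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeedPlausibilityCheck:
    """Hypothetical plausibility check for a vehicle speed signal (m/s)."""
    max_speed: float = 70.0          # assumed physical upper bound
    max_accel: float = 12.0          # assumed largest plausible |dv/dt| in m/s^2
    last_speed: Optional[float] = None
    last_time: float = 0.0

    def check(self, speed: float, t: float) -> bool:
        """Return True if the sample taken at time t (s) is plausible."""
        in_range = 0.0 <= speed <= self.max_speed
        if self.last_speed is None:
            rate_ok = True
        else:
            rate = abs(speed - self.last_speed) / (t - self.last_time)
            rate_ok = rate <= self.max_accel
        if in_range and rate_ok:
            # Only plausible samples become the reference for the next check
            self.last_speed, self.last_time = speed, t
            return True
        return False


class Watchdog:
    """Software watchdog: the monitored task must kick it at least every `timeout` s."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_kick = 0.0

    def kick(self, now: float) -> None:
        self.last_kick = now

    def expired(self, now: float) -> bool:
        return now - self.last_kick > self.timeout


monitor = SpeedPlausibilityCheck()
samples = [(0.0, 20.0), (0.1, 20.1), (0.2, 35.0), (0.3, 20.3)]
results = [monitor.check(v, t) for t, v in samples]
print(results)  # [True, True, False, True]
```

On a failed check or an expired watchdog, the caller would trigger one of the mitigation measures listed above, such as a fail-safe state or a driver warning.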

---

## 2.1 ASIL Decomposition

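Meeting a high ASIL is costly. ASIL decomposition lets a safety requirement be implemented by redundant, sufficiently independent elements, each developed to a lower ASIL.

**Example:** An ASIL D requirement can be decomposed into:
- ASIL B(D) + ASIL B(D)
- ASIL C(D) + ASIL A(D)
- ASIL D(D) + QM(D)

**Rules:**
- Decomposed elements must be **sufficiently independent** (no common-cause or cascading failures), which is demonstrated by a dependent failure analysis
- Notation: ASIL X(Y) means "developed to ASIL X as part of the decomposition of an ASIL Y requirement"
- Only the permitted schemes may be used (e.g., ASIL C becomes B(C) + A(C) or C(C) + QM(C)); the targets for random hardware failures remain those of the original ASIL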
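The permitted ISO 26262-9 schemes follow a simple additive pattern: with weights QM = 0, A = 1, B = 2, C = 3, D = 4, every permitted pair has weights that sum to the weight of the original ASIL (so A(C) + A(C) is excluded because 1 + 1 < 3). This is a mnemonic, not the standard's wording, and it says nothing about the independence argument. A minimal sketch with a hypothetical `decomposition_schemes` helper:

```python
ASIL_WEIGHT = {'QM': 0, 'A': 1, 'B': 2, 'C': 3, 'D': 4}


def decomposition_schemes(asil):
    """List unordered ASIL pairs whose weights add up to the original ASIL's weight."""
    if asil not in ('A', 'B', 'C', 'D'):
        raise ValueError("Only ASIL A to D requirements can be decomposed")
    target = ASIL_WEIGHT[asil]
    levels = list(ASIL_WEIGHT)           # ordered QM, A, B, C, D
    schemes = []
    for i, a in enumerate(levels):
        for b in levels[: i + 1]:        # b never ranks above a: no duplicate pairs
            if ASIL_WEIGHT[a] + ASIL_WEIGHT[b] == target:
                schemes.append((f"{a}({asil})", f"{b}({asil})"))
    return schemes


for asil in "DCBA":
    print(asil, decomposition_schemes(asil))
```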

In [None]:
# Implementation: ASIL Calculator and HARA Tool

class ASILCalculator:
    """Calculate ASIL rating based on Severity, Exposure, Controllability"""
    
    def __init__(self):
        # ASIL determination table (ISO 26262-3)
        # Indexed as [Severity][Exposure] -> [C1, C2, C3]
        # S0, E0 and C0 always yield QM and are handled in calculate_asil()
        self.asil_table = {
            # S1: Light and moderate injuries
            'S1': {
                'E1': ['QM', 'QM', 'QM'],
                'E2': ['QM', 'QM', 'QM'],
                'E3': ['QM', 'QM', 'A'],
                'E4': ['QM', 'A', 'B']
            },
            # S2: Severe and life-threatening injuries (survival probable)
            'S2': {
                'E1': ['QM', 'QM', 'QM'],
                'E2': ['QM', 'QM', 'A'],
                'E3': ['QM', 'A', 'B'],
                'E4': ['A', 'B', 'C']
            },
            # S3: Life-threatening (survival uncertain) and fatal injuries
            'S3': {
                'E1': ['QM', 'QM', 'A'],
                'E2': ['QM', 'A', 'B'],
                'E3': ['A', 'B', 'C'],
                'E4': ['B', 'C', 'D']
            }
        }
    
    def calculate_asil(self, severity, exposure, controllability):
        """
        Calculate ASIL rating
        
        Parameters:
        - severity: 'S0', 'S1', 'S2', or 'S3'
        - exposure: 'E0', 'E1', 'E2', 'E3', or 'E4'
        - controllability: 'C0', 'C1', 'C2', or 'C3'
        
        Returns: ASIL rating ('QM', 'A', 'B', 'C', or 'D')
        """
        if severity == 'S0' or exposure == 'E0' or controllability == 'C0':
            return 'QM'
        
        c_index = int(controllability[1]) - 1  # C1->0, C2->1, C3->2
        return self.asil_table[severity][exposure][c_index]
    
    def decompose_asil(self, asil):
        """
        Show valid decompositions of ASIL rating
        
        Returns: List of valid decomposition pairs
        """
        # Decomposition schemes permitted by ISO 26262-9
        decompositions = {
            'D': [('C', 'A'), ('B', 'B'), ('D', 'QM')],
            'C': [('B', 'A'), ('C', 'QM')],
            'B': [('A', 'A'), ('B', 'QM')],
            'A': [('A', 'QM')]
        }
        
        if asil in decompositions:
            return [(f"ASIL {a}({asil})", f"ASIL {b}({asil})") 
                    for a, b in decompositions[asil]]
        return []


class HazardAnalysis:
    """Hazard Analysis and Risk Assessment (HARA) tool"""
    
    def __init__(self):
        self.calculator = ASILCalculator()
        self.hazards = []
    
    def add_hazard(self, name, severity, exposure, controllability, description=""):
        """Add a hazard to the analysis"""
        asil = self.calculator.calculate_asil(severity, exposure, controllability)
        
        hazard = {
            'name': name,
            'description': description,
            'severity': severity,
            'exposure': exposure,
            'controllability': controllability,
            'asil': asil
        }
        self.hazards.append(hazard)
        return hazard
    
    def generate_report(self):
        """Generate HARA report"""
        print("=" * 80)
        print("HAZARD ANALYSIS AND RISK ASSESSMENT REPORT")
        print("=" * 80)
        print(f"\nTotal Hazards Identified: {len(self.hazards)}\n")
        
        # Group by ASIL
        asil_groups = {}
        for hazard in self.hazards:
            asil = hazard['asil']
            if asil not in asil_groups:
                asil_groups[asil] = []
            asil_groups[asil].append(hazard)
        
        # Print by ASIL level (highest first)
        for asil in ['D', 'C', 'B', 'A', 'QM']:
            if asil in asil_groups:
                print(f"\n{'='*80}")
                print(f"ASIL {asil}: {len(asil_groups[asil])} hazard(s)")
                print(f"{'='*80}")
                
                for i, hazard in enumerate(asil_groups[asil], 1):
                    print(f"\n{i}. {hazard['name']}")
                    if hazard['description']:
                        print(f"   Description: {hazard['description']}")
                    print(f"   Severity: {hazard['severity']}, "
                          f"Exposure: {hazard['exposure']}, "
                          f"Controllability: {hazard['controllability']}")
                    print(f"   → ASIL: {hazard['asil']}")


# Demonstration: HARA for Autonomous Vehicle
def demonstrate_hara():
    hara = HazardAnalysis()
    
    # Add hazards
    print("Performing Hazard Analysis for Autonomous Vehicle...\n")
    
    hara.add_hazard(
        name="Unintended Acceleration",
        severity='S3',
        exposure='E4',
        controllability='C3',
        description="Vehicle accelerates without driver command at high speed"
    )
    
    hara.add_hazard(
        name="Brake System Failure",
        severity='S3',
        exposure='E4',
        controllability='C3',
        description="Complete loss of braking capability"
    )
    
    hara.add_hazard(
        name="Sensor Misdetection (Pedestrian)",
        severity='S3',
        exposure='E3',
        controllability='C3',
        description="Perception system fails to detect pedestrian in path"
    )
    
    hara.add_hazard(
        name="Lane Keeping Assist Malfunction",
        severity='S2',
        exposure='E4',
        controllability='C2',
        description="System provides incorrect steering input"
    )
    
    hara.add_hazard(
        name="Infotainment System Crash",
        severity='S1',
        exposure='E2',
        controllability='C1',
        description="Touchscreen becomes unresponsive"
    )
    
    hara.add_hazard(
        name="GPS Localization Error",
        severity='S2',
        exposure='E3',
        controllability='C2',
        description="Vehicle position estimate off by >5 meters"
    )
    
    # Generate report
    hara.generate_report()
    
    # Show ASIL distribution
    asil_counts = {}
    for hazard in hara.hazards:
        asil = hazard['asil']
        asil_counts[asil] = asil_counts.get(asil, 0) + 1
    
    # Visualize
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # ASIL distribution bar chart
    asils = ['D', 'C', 'B', 'A', 'QM']
    counts = [asil_counts.get(asil, 0) for asil in asils]
    colors = ['#d32f2f', '#f57c00', '#fbc02d', '#388e3c', '#1976d2']
    
    ax1.bar(asils, counts, color=colors, edgecolor='black', linewidth=1.5)
    ax1.set_xlabel('ASIL Level', fontsize=12, fontweight='bold')
    ax1.set_ylabel('Number of Hazards', fontsize=12, fontweight='bold')
    ax1.set_title('Hazard Distribution by ASIL', fontsize=14, fontweight='bold')
    ax1.grid(True, alpha=0.3, axis='y')
    
    # Add count labels on bars
    for i, (asil, count) in enumerate(zip(asils, counts)):
        if count > 0:
            ax1.text(i, count + 0.1, str(count), ha='center', fontweight='bold', fontsize=11)
    
    # ASIL decomposition example
    ax2.axis('off')
    ax2.set_xlim(0, 10)
    ax2.set_ylim(0, 10)
    
    ax2.text(5, 9, 'ASIL D Decomposition Example', fontsize=14, fontweight='bold', ha='center')
    
    # Original requirement
    rect = FancyBboxPatch((3, 7), 4, 1, boxstyle="round,pad=0.1", 
                          facecolor='#d32f2f', edgecolor='black', linewidth=2)
    ax2.add_patch(rect)
    ax2.text(5, 7.5, 'ASIL D Requirement', ha='center', va='center', 
            color='white', fontweight='bold', fontsize=11)
    
    # Decomposition options
    y_pos = 5
    decompositions = [
        ('ASIL B(D)', 'ASIL B(D)', '#f57c00'),
        ('ASIL C(D)', 'ASIL A(D)', '#fbc02d')
    ]
    
    for i, (asil1, asil2, color) in enumerate(decompositions):
        y = y_pos - i * 2.5
        
        # Left component
        rect1 = FancyBboxPatch((1, y), 3, 0.8, boxstyle="round,pad=0.05",
                              facecolor=color, edgecolor='black', linewidth=1.5, alpha=0.8)
        ax2.add_patch(rect1)
        ax2.text(2.5, y + 0.4, asil1, ha='center', va='center', fontweight='bold', fontsize=10)
        
        # Plus sign
        ax2.text(5, y + 0.4, '+', ha='center', va='center', fontsize=16, fontweight='bold')
        
        # Right component
        rect2 = FancyBboxPatch((6, y), 3, 0.8, boxstyle="round,pad=0.05",
                              facecolor=color, edgecolor='black', linewidth=1.5, alpha=0.8)
        ax2.add_patch(rect2)
        ax2.text(7.5, y + 0.4, asil2, ha='center', va='center', fontweight='bold', fontsize=10)
        
        # Arrow from original to decomposition
        ax2.annotate('', xy=(2.5, y + 0.8), xytext=(5, 7),
                    arrowprops=dict(arrowstyle='->', lw=1.5, color='gray', alpha=0.6))
        ax2.annotate('', xy=(7.5, y + 0.8), xytext=(5, 7),
                    arrowprops=dict(arrowstyle='->', lw=1.5, color='gray', alpha=0.6))
    
    ax2.text(5, 0.5, 'Note: Decomposed elements must be INDEPENDENT', 
            ha='center', fontsize=10, style='italic', color='red')
    
    plt.tight_layout()
    plt.show()
    
    print("\n" + "="*80)
    print("ASIL Decomposition Examples:")
    print("="*80)
    calc = ASILCalculator()
    for asil in ['D', 'C', 'B']:
        decomps = calc.decompose_asil(asil)
        print(f"\nASIL {asil} can be decomposed into:")
        for d1, d2 in decomps:
            print(f"  • {d1} + {d2}")

demonstrate_hara()

---

## 3. Redundancy and Fault Tolerance

Redundancy is the duplication of critical components to increase reliability and safety.

### Types of Redundancy

**1. Hardware Redundancy**
- **Active (Hot) Redundancy**: All components run simultaneously, voting determines output
- **Standby (Cold) Redundancy**: Backup activates only when primary fails
- **Hybrid Redundancy**: Combination of active and standby

**2. Information Redundancy**
- Checksums, parity bits
- Error-correcting codes (ECC)
- Cyclic Redundancy Check (CRC)
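Information redundancy is commonly combined with an alive (sequence) counter when protecting messages on a vehicle bus: the CRC detects corrupted data, and the counter detects lost, repeated or stale messages. The sketch below is loosely modeled on that idea; the frame layout and function names are hypothetical, and it uses the CRC-32 from Python's `zlib` rather than any specific automotive CRC.

```python
import struct
import zlib


def pack_message(counter: int, speed_kph: float) -> bytes:
    """Hypothetical frame: 1-byte alive counter, 4-byte float payload, 4-byte CRC-32."""
    body = struct.pack("<Bf", counter & 0xFF, speed_kph)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_message(frame: bytes, expected_counter: int):
    """Return the payload, or None if the CRC or the counter check fails."""
    body = frame[:-4]
    (crc,) = struct.unpack("<I", frame[-4:])
    if zlib.crc32(body) != crc:
        return None                       # data corrupted in transit
    counter, speed = struct.unpack("<Bf", body)
    if counter != expected_counter & 0xFF:
        return None                       # lost, repeated or stale message
    return speed


frame = pack_message(7, 88.5)
print(unpack_message(frame, 7))           # 88.5
corrupted = bytearray(frame)
corrupted[2] ^= 0x01                      # flip one bit in the payload
print(unpack_message(bytes(corrupted), 7))  # None
```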

**3. Time Redundancy**
- Execute operation multiple times
- Compare results to detect transient faults

**4. Analytical Redundancy**
- Use mathematical models to cross-check sensor readings
- Example: Estimate vehicle speed from wheel sensors vs. GPS vs. IMU
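The speed example can be sketched as a residual check against the median of the independent estimates. This assumes each source already delivers a speed in m/s (for the IMU, that means integrated acceleration), and the 1 m/s tolerance is a hypothetical value; `cross_check_speed` is an illustrative helper.

```python
import numpy as np


def cross_check_speed(estimates: dict, tolerance: float = 1.0):
    """Compare independent speed estimates (m/s) against their median.

    Returns the median and the names of sources whose residual exceeds tolerance.
    """
    values = np.array(list(estimates.values()))
    reference = float(np.median(values))
    suspects = [name for name, v in estimates.items() if abs(v - reference) > tolerance]
    return reference, suspects


ref, suspects = cross_check_speed({"wheel": 25.3, "gps": 25.1, "imu": 31.0})
print(ref, suspects)   # 25.3 ['imu']
```

The median keeps a single faulty source from dragging the reference toward it. With only two sources, a disagreement can be detected but not attributed, which is the same limitation as a 2-channel redundant system.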

### Redundancy Architectures

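Here MooN counts how many of the N components must still be working for the system to deliver its function.

**1-out-of-2 (1oo2)**: System works if at least 1 of 2 components works (parallel)
- Increases availability: tolerates the loss of one component
- Cannot tell which channel is correct if the two disagree, so safety use requires additional diagnostics

**2-out-of-2 (2oo2)**: System works only if both components work and agree (series)
- A disagreement reveals a fault, so the system can switch to a safe state (fail-safe)
- Lower availability: any single failure stops the function

**2-out-of-3 (2oo3)**: Majority voting among 3 components
- Tolerates a single component failure and identifies the faulty channel
- Common for safety-critical, fail-operational systems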
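For independent, identical components that each survive with probability r, the reliability of any M-out-of-N architecture (M working components needed) follows from the binomial distribution. This closed form is a useful check on the Monte Carlo estimates produced by the redundancy simulation in this notebook; `k_out_of_n_reliability` is a hypothetical helper name.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent, identical components
    (each surviving with probability r) are still working."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Per-component survival after 1000 steps with a 0.001 per-step failure
# probability, the same setting as the notebook's redundancy simulation
r = (1 - 0.001) ** 1000

for name, k, n in [("Single", 1, 1), ("1-out-of-2", 1, 2), ("2-out-of-2", 2, 2),
                   ("2-out-of-3", 2, 3), ("3-out-of-3", 3, 3)]:
    print(f"{name:12} R_sys = {k_out_of_n_reliability(k, n, r):.4f}")
```

For 2oo3 this reduces to $3r^2 - 2r^3$, which is lower than a single component once r < 0.5: majority voting buys fault tolerance early in life, not a longer mission time.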

### Reliability Metrics

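**Mean Time Between Failures (MTBF)** for a component with constant failure rate $\lambda$:
$$
MTBF = \frac{1}{\lambda}
$$
Where $\lambda$ is the failure rate. The component's reliability (probability of surviving to time $t$) is $R(t) = e^{-\lambda t}$.

**For a parallel system (1oo2) with two identical, non-repaired components**, the system survives as long as at least one component survives:
$$
R_{system}(t) = 1 - \left(1 - e^{-\lambda t}\right)^2 = 2e^{-\lambda t} - e^{-2\lambda t}
$$
The system no longer has a constant failure rate, so its mean time to (first) failure is obtained by integrating the reliability:
$$
MTTF_{system} = \int_0^\infty R_{system}(t)\,dt = \frac{2}{\lambda} - \frac{1}{2\lambda} = \frac{3}{2\lambda}
$$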

**Availability**:
$$
A = \frac{MTBF}{MTBF + MTTR}
$$
Where MTTR = Mean Time To Repair
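The 1oo2 result $3/(2\lambda)$ can be checked numerically: a 1oo2 system fails only when its second component fails, so its lifetime is the maximum of two exponential lifetimes. The sketch also evaluates the availability formula; the failure rate of $10^{-3}$ per hour and the 10-hour MTTR are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1e-3          # hypothetical component failure rate (per hour)
n = 200_000

# Exponential component lifetimes; the 1oo2 system lifetime is the maximum
lifetimes = rng.exponential(scale=1 / lam, size=(n, 2))
mttf_1oo2 = lifetimes.max(axis=1).mean()
print(f"Simulated MTTF: {mttf_1oo2:.0f} h, analytic 3/(2*lam): {3 / (2 * lam):.0f} h")

# Availability of a single repairable component with a hypothetical 10 h MTTR
mtbf, mttr = 1 / lam, 10.0
availability = mtbf / (mtbf + mttr)
print(f"Availability: {availability:.5f}")
```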

### Automotive Redundancy Examples

**Braking System:**
- Dual-circuit hydraulic brakes
- Electromechanical brake booster backed up by the ESC hydraulic unit (either can build brake pressure)
- Electric parking brake as backup

**Steering System:**
- Dual power steering motors
- Mechanical fallback column
- Independent angle sensors

**Compute:**
- Lockstep CPUs (execute same instructions, compare outputs)
- Diverse redundancy (different processors, different code)
- Safety co-processor (watchdog)

**Sensors:**
- Camera + Radar + Lidar (sensor fusion)
- Multiple cameras with overlapping FOV
- Redundant IMUs and GPS receivers

---

## 3.1 Fail-Safe vs. Fail-Operational

**Fail-Safe**: System enters safe state upon failure
- Example: Railway crossing gates **close** (safe) when power fails
- For AVs: Pull over safely, hand control to driver

**Fail-Operational**: System continues operating despite failure
- Required when no driver is available as an immediate fallback (Level 4/5; a Level 3 system must also keep operating until the driver has taken over)
- Requires redundant actuators and compute
- More expensive but necessary for driverless operation

**Graceful Degradation**: Partial functionality maintained
- Example: If one lidar fails, reduce speed but continue driving
- Transitions: Full Function → Degraded Operation → Minimal Risk Maneuver → Minimal Risk Condition (e.g., safe stop)
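The degradation transitions can be sketched as a small state machine, with a minimal risk maneuver (MRM) state for the maneuver that brings the vehicle to a minimal risk condition (MRC). The health inputs (working lidars out of an assumed two, a compute-health flag) and the thresholds are hypothetical, and the mode is latched so the vehicle never returns to a less degraded mode without an explicit reset.

```python
from enum import Enum


class Mode(Enum):
    FULL = "Full function"
    DEGRADED = "Degraded operation"
    MRM = "Minimal risk maneuver"
    MRC = "Minimal risk condition"


# Modes ordered from least to most degraded
ORDER = [Mode.FULL, Mode.DEGRADED, Mode.MRM, Mode.MRC]


def next_mode(mode: Mode, healthy_lidars: int, compute_ok: bool, stopped: bool) -> Mode:
    """Hypothetical degradation policy for a vehicle with two lidars."""
    if mode in (Mode.MRM, Mode.MRC):
        # Once a minimal risk maneuver has started, finish it
        target = Mode.MRC if stopped else Mode.MRM
    elif not compute_ok or healthy_lidars == 0:
        target = Mode.MRM          # not enough capability left to keep driving
    elif healthy_lidars < 2:
        target = Mode.DEGRADED     # e.g. keep driving at reduced speed
    else:
        target = Mode.FULL
    # Latch: never return to a less degraded mode without an explicit reset
    return max(mode, target, key=ORDER.index)


mode = Mode.FULL
for lidars, compute_ok, stopped in [(2, True, False), (1, True, False),
                                    (2, True, False), (0, True, False), (0, True, True)]:
    mode = next_mode(mode, lidars, compute_ok, stopped)
    print(mode.value)
```

Latching is a conservative design choice; a real system might allow recovery from degraded operation after a validated self-test.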

In [None]:
# Implementation: Redundancy System Simulation

class RedundantSystem:
    """Simulate redundant system with fault injection"""
    
    def __init__(self, n_components, voting_scheme='majority'):
        """
        Initialize redundant system
        
        Parameters:
        - n_components: Number of redundant components
        - voting_scheme: 'majority', 'unanimous', or 'any'
        """
        self.n_components = n_components
        self.voting_scheme = voting_scheme
        self.component_states = [True] * n_components  # True = working
        self.failure_rates = [0.001] * n_components  # Per time step
    
    def inject_fault(self, component_id):
        """Inject fault into specific component"""
        if 0 <= component_id < self.n_components:
            self.component_states[component_id] = False
    
    def repair(self, component_id):
        """Repair specific component"""
        if 0 <= component_id < self.n_components:
            self.component_states[component_id] = True
    
    def vote(self, readings):
        """
        Perform voting on component readings
        
        Parameters:
        - readings: List of sensor readings from each component
        
        Returns: (voted_value, is_valid)
        """
        # Filter out failed components
        valid_readings = [r for i, r in enumerate(readings) 
                         if self.component_states[i]]
        
        if len(valid_readings) == 0:
            return None, False
        
        if self.voting_scheme == 'majority':
            # Require majority agreement
            if len(valid_readings) < (self.n_components + 1) // 2:
                return None, False
            # Median of the valid readings: robust to a single outlier.
            # For discrete values, use the mode instead.
            return np.median(valid_readings), True
        
        elif self.voting_scheme == 'unanimous':
            # All must agree (within tolerance)
            if len(valid_readings) != self.n_components:
                return None, False
            if np.std(valid_readings) < 0.1:  # Agreement threshold
                return np.mean(valid_readings), True
            return None, False
        
        elif self.voting_scheme == 'any':
            # At least one working (1oo2, 1oo3, etc.)
            return valid_readings[0], True
    
    def compute_reliability(self, time_horizon=1000, n_trials=1000):
        """
        Estimate system reliability with Monte Carlo simulation.

        Each component fails independently with probability failure_rate per
        time step, so its failure step follows a geometric distribution.
        Sampling the failure steps directly is equivalent to stepping through
        time, but avoids a slow Python loop over every time step.

        Returns: Probability system is functional at time_horizon
        """
        rates = np.asarray(self.failure_rates)
        # First time step (1-indexed) at which each component fails
        failure_steps = np.random.geometric(rates, size=(n_trials, self.n_components))
        # A component still works if it has not failed within time_horizon steps
        working_count = (failure_steps > time_horizon).sum(axis=1)

        if self.voting_scheme == 'majority':
            functional = working_count >= (self.n_components + 1) // 2
        elif self.voting_scheme == 'unanimous':
            functional = working_count == self.n_components
        elif self.voting_scheme == 'any':
            functional = working_count >= 1
        else:
            raise ValueError(f"Unknown voting scheme: {self.voting_scheme}")

        return functional.mean()


# Demonstration: Compare redundancy architectures
def demonstrate_redundancy():
    print("=== Redundancy Architecture Comparison ===\n")
    
    architectures = [
        ('Single', 1, 'any'),
        ('1-out-of-2', 2, 'any'),
        ('2-out-of-2', 2, 'unanimous'),
        ('2-out-of-3', 3, 'majority'),
        ('3-out-of-3', 3, 'unanimous')
    ]
    
    time_horizon = 1000
    results = []
    
    for name, n_comp, scheme in architectures:
        system = RedundantSystem(n_comp, scheme)
        reliability = system.compute_reliability(time_horizon)
        results.append((name, reliability))
        print(f"{name:15} Reliability at t={time_horizon}: {reliability:.4f}")
    
    # Visualize
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Bar chart of reliabilities
    names, reliabilities = zip(*results)
    colors = ['#e57373', '#64b5f6', '#81c784', '#ffb74d', '#9575cd']
    
    bars = ax1.bar(range(len(names)), reliabilities, color=colors, 
                   edgecolor='black', linewidth=1.5, alpha=0.8)
    ax1.set_xticks(range(len(names)))
    ax1.set_xticklabels(names, rotation=15, ha='right')
    ax1.set_ylabel('Reliability', fontsize=12, fontweight='bold')
    ax1.set_title('System Reliability Comparison', fontsize=14, fontweight='bold')
    ax1.set_ylim([0, 1.1])
    ax1.axhline(y=0.9, color='r', linestyle='--', linewidth=1.5, alpha=0.7, label='Target: 0.9')
    ax1.legend()
    ax1.grid(True, alpha=0.3, axis='y')
    
    # Add value labels on bars
    for i, (bar, rel) in enumerate(zip(bars, reliabilities)):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.02,
                f'{rel:.3f}', ha='center', va='bottom', fontweight='bold', fontsize=10)
    
    # Reliability over time for different architectures
    time_steps = np.linspace(0, 5000, 50)
    
    for name, n_comp, scheme in architectures[:4]:  # Plot first 4
        system = RedundantSystem(n_comp, scheme)
        rel_over_time = []
        
        for t in time_steps:
            rel = system.compute_reliability(int(t))
            rel_over_time.append(rel)
        
        ax2.plot(time_steps, rel_over_time, linewidth=2.5, marker='o', 
                markevery=5, markersize=6, label=name, alpha=0.8)
    
    ax2.set_xlabel('Time Steps', fontsize=12, fontweight='bold')
    ax2.set_ylabel('Reliability', fontsize=12, fontweight='bold')
    ax2.set_title('Reliability Degradation Over Time', fontsize=14, fontweight='bold')
    ax2.legend(fontsize=10)
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim([0, 1.05])
    
    plt.tight_layout()
    plt.show()
    
    # Voting example
    print("\n=== Voting Mechanism Demo ===\n")
    
    system_2oo3 = RedundantSystem(3, 'majority')
    
    # Scenario 1: All working, slight disagreement
    readings = [100.2, 100.1, 100.3]
    result, valid = system_2oo3.vote(readings)
    print(f"Scenario 1 - All sensors working:")
    print(f"  Readings: {readings}")
    print(f"  Voted value: {result}, Valid: {valid}\n")
    
    # Scenario 2: One sensor fails
    system_2oo3.inject_fault(2)
    readings = [100.2, 100.1, 150.0]  # Third sensor faulty
    result, valid = system_2oo3.vote(readings)
    print(f"Scenario 2 - Sensor 3 failed:")
    print(f"  Readings: {readings}")
    print(f"  Component states: {system_2oo3.component_states}")
    print(f"  Voted value: {result}, Valid: {valid}\n")
    
    # Scenario 3: Two sensors fail
    system_2oo3.inject_fault(1)
    readings = [100.2, 150.0, 150.0]
    result, valid = system_2oo3.vote(readings)
    print(f"Scenario 3 - Sensors 2 and 3 failed:")
    print(f"  Readings: {readings}")
    print(f"  Component states: {system_2oo3.component_states}")
    print(f"  Voted value: {result}, Valid: {valid}")

demonstrate_redundancy()

---

## 4. Safety of the Intended Functionality (SOTIF)

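ISO 21448 (first published as ISO/PAS 21448 in 2019, then as the full standard ISO 21448:2022) addresses hazards caused by **functional insufficiencies** of the intended functionality and by **reasonably foreseeable misuse**, rather than by random or systematic faults.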

### ISO 26262 vs. SOTIF

| Aspect | ISO 26262 | SOTIF (ISO 21448) |
|--------|-----------|-------------------|
| **Focus** | Malfunctions due to faults | Insufficient performance & misuse |
| **Failure Type** | Random hardware, systematic software bugs | Limitations of algorithms, sensors |
| **Example** | Sensor fails (outputs garbage) | Sensor works but can't detect object in fog |
| **Root Cause** | Component failure | Design/specification insufficiency |

### SOTIF Scenarios

**1. Known Unsafe Scenarios**: Identified limitations
- Example: Camera-based detection fails in direct sunlight
- Mitigation: Use radar redundancy, avoid scenarios

**2. Unknown Unsafe Scenarios**: Not yet discovered edge cases
- Goal: Reduce through extensive testing and validation
- Methods: Simulation, on-road testing, formal verification

**3. Known Safe Scenarios**: Verified to work correctly
- Example: Lidar detection in clear weather, <100m range

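**4. Unknown Safe Scenarios**: Not yet identified, but handled correctly by the system
- Not hazardous in themselves, but they cannot be credited in the safety argument until identified and validated
- Risk: Treating unvalidated scenarios as safe gives false confidence; a conservative ODD (Operational Design Domain) limits that exposure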

### SOTIF Development Process

```
1. Specify Intended Functionality
   ↓
2. Identify Hazards from Limitations
   ↓
3. Define Triggering Conditions
   ↓
4. Evaluate & Reduce Risks
   ↓
5. Validation (expand "Known Safe")
   ↓
6. Field Monitoring
```

### Example: Pedestrian Detection Insufficiency

**Scenario**: AEB (Automatic Emergency Braking) fails to detect pedestrian

**Potential Causes (SOTIF, not faults):**
- Pedestrian occluded by parked car
- Low contrast (pedestrian wearing dark clothes at night)
- Unusual pose (person lying down)
- Sensor physics limitations (radar cross-section too small)

**Mitigation:**
- Multi-modal sensing (camera + radar + lidar)
- Conservative behavior (slow down in occluded areas)
- Driver warning when confidence is low
- Limit ODD (don't operate at night without streetlights)

### Operational Design Domain (ODD)

ODD defines the conditions under which the system is designed to operate safely.

**Parameters:**
- **Geographic**: Highways only, geofenced area
- **Environmental**: Daylight, no heavy rain, temperature 0-40°C
- **Road**: Marked lanes, paved, speed ≤ 65 mph
- **Traffic**: No construction zones, low density
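An ODD like the one above can be monitored at runtime as a set of explicit checks. A minimal sketch, assuming the current conditions are already estimated by perception and map data; the `DrivingConditions` fields and the limits simply mirror the example parameters listed above and are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrivingConditions:
    """Hypothetical snapshot of the current operating conditions."""
    road_type: str
    daylight: bool
    heavy_rain: bool
    temperature_c: float
    lanes_marked: bool
    speed_limit_mph: float
    construction_zone: bool


def odd_violations(c: DrivingConditions) -> List[str]:
    """Return the violated ODD parameters (an empty list means inside the ODD)."""
    violations = []
    if c.road_type != "highway":
        violations.append("road_type")
    if not c.daylight:
        violations.append("daylight")
    if c.heavy_rain:
        violations.append("heavy_rain")
    if not 0.0 <= c.temperature_c <= 40.0:
        violations.append("temperature")
    if not c.lanes_marked:
        violations.append("lanes_marked")
    if c.speed_limit_mph > 65:
        violations.append("speed_limit")
    if c.construction_zone:
        violations.append("construction_zone")
    return violations


now = DrivingConditions("highway", daylight=False, heavy_rain=False, temperature_c=12.0,
                        lanes_marked=True, speed_limit_mph=65, construction_zone=False)
print(odd_violations(now))  # ['daylight']
```

When the list becomes non-empty, a Level 3 system would request a takeover, while a Level 4 system would have to reach a minimal risk condition on its own.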

**Levels of Autonomy & ODD:**
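- **Level 2**: Restricted ODD; the driver supervises at all times
- **Level 3**: Restricted ODD; the driver must take over when the system requests it
- **Level 4**: Restricted ODD; the system handles all situations within the ODD, including reaching a minimal risk condition
- **Level 5**: Unlimited ODD (theoretical)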

---

## 4.1 Validation and Verification

**Verification**: "Are we building the system right?"
- Does implementation match specification?
- Methods: Unit tests, integration tests, code reviews

**Validation**: "Are we building the right system?"
- Does system meet user needs and safety goals?
- Methods: Field testing, simulation, user studies

### Testing Challenges

**Miles to Validate Safety:**
- Human baseline: roughly 1 fatality per 100 million miles
- To show that an AV's fatality rate is 20% lower than the human rate (95% confidence, 80% power): about **11 billion miles** needed
- Waymo: ~20 million autonomous miles driven on public roads (as of 2020)
- Solution: Scenario-based testing, not just mileage
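The simplest part of this argument can be worked out directly. Under a Poisson model, driving n miles with zero failures when the true rate is λ per mile has probability $e^{-\lambda n}$; requiring that to be at most $1 - C$ gives $n = -\ln(1 - C)/\lambda$. A sketch using the 1-per-100-million-miles baseline (`failure_free_miles` is a hypothetical helper):

```python
import math


def failure_free_miles(rate_per_mile: float, confidence: float) -> float:
    """Miles that must be driven with zero failures to show, at the given
    confidence, that the true failure rate is below rate_per_mile (Poisson model)."""
    return -math.log(1 - confidence) / rate_per_mile


print(f"{failure_free_miles(1e-8, 0.95):.3g} miles")  # 3e+08 miles
```

This only shows "no worse than the baseline" with a perfect record. Showing a margin of improvement requires observing and counting failures, which is why the required mileage grows into the billions.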

**Scenario-Based Testing:**
- Test specific situations (e.g., pedestrian crossing)
- Vary parameters (speed, lighting, occlusion)
- Track coverage systematically

**Simulation:**
- Faster than real-world testing
- Can test rare/dangerous scenarios
- Challenge: Simulation-to-reality gap

---

## Exercises

### Exercise 1: HARA for Adaptive Cruise Control

**Objective:** Perform hazard analysis and risk assessment for an Adaptive Cruise Control (ACC) system.

**Task:** Identify hazards, determine ASIL ratings, and propose safety mechanisms.

**Instructions:**
- List at least 5 hazards related to ACC functionality
- For each hazard, determine Severity, Exposure, and Controllability
- Calculate ASIL rating using the ASILCalculator class
- Propose at least 2 safety mechanisms for each ASIL C or D hazard
- Generate a HARA report

**Example hazards to consider:**
- ACC fails to detect stopped vehicle ahead
- ACC accelerates when driver intends to brake
- ACC sensor becomes blocked by dirt/snow
- ACC system disengages unexpectedly
- ACC maintains unsafe following distance

### Exercise 2: Fault Tree Analysis

**Objective:** Create and analyze a fault tree for a brake-by-wire system failure.

**Task:** Model the combinations of faults that lead to total braking failure.

**Instructions:**
- Define the top event: "Loss of braking capability"
- Identify intermediate events (subsystem failures)
- Define basic events (component failures)
- Use logic gates (AND, OR) to connect events
- Calculate probability of top event given component failure rates
- Identify critical single points of failure
- Propose redundancy to eliminate SPOFs

**Given component failure rates:**
- Primary brake ECU: $\lambda = 10^{-4}$ failures/hour
- Secondary brake ECU: $\lambda = 10^{-4}$ failures/hour
- Brake actuator: $\lambda = 10^{-5}$ failures/hour
- Power supply: $\lambda = 10^{-6}$ failures/hour
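The values above are rates (per hour), whereas the AND/OR gates of a fault tree combine probabilities. Assuming constant failure rates and no repair during an exposure time T, each rate converts to $P = 1 - e^{-\lambda T} \approx \lambda T$ for small $\lambda T$; using a rate directly as a probability therefore implies T = 1 hour. A minimal sketch of the conversion (the exposure time is a hypothetical choice):

```python
import math

FAILURE_RATES_PER_HOUR = {
    "Primary brake ECU": 1e-4,
    "Secondary brake ECU": 1e-4,
    "Brake actuator": 1e-5,
    "Power supply": 1e-6,
}


def failure_probability(rate_per_hour: float, hours: float) -> float:
    """P(at least one failure within `hours`) for a constant rate and no repair."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -math.expm1(-rate_per_hour * hours)


T = 1.0  # hypothetical exposure time in hours
for name, lam in FAILURE_RATES_PER_HOUR.items():
    p = failure_probability(lam, T)
    print(f"{name:20} P = {p:.3e} (lambda*T = {lam * T:.1e})")
```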

### Exercise 3: SOTIF Scenario Database

**Objective:** Build a scenario database for a lane-keeping assist system and categorize by SOTIF.

**Task:** Create scenarios and classify them into the four SOTIF categories.

**Instructions:**
- Define the intended functionality of the lane-keeping system
- Create at least 12 scenarios covering various conditions:
  - Road types (highway, urban, construction)
  - Weather (clear, rain, fog, snow)
  - Lighting (day, dusk, night)
  - Lane markings (clear, faded, missing)
- Classify each scenario as:
  - Known Safe
  - Known Unsafe
  - Unknown Safe
  - Unknown Unsafe
- For each "Known Unsafe" scenario, propose mitigation strategies
- Define an appropriate ODD based on your analysis

**Deliverables:**
- Scenario database (table format)
- SOTIF classification with justification
- ODD specification
- Risk mitigation plan

In [None]:
# Exercise Solutions

# Exercise 1: HARA for Adaptive Cruise Control
# TODO: Implement HARA for ACC system
#
# Example structure:
# hara_acc = HazardAnalysis()
#
# # Hazard 1: Fails to detect stopped vehicle
# hara_acc.add_hazard(
#     name="ACC fails to detect stopped vehicle ahead",
#     severity='S3',  # Fatal collision possible
#     exposure='E3',  # Medium probability (highway driving)
#     controllability='C2',  # Driver can brake but may not react in time
#     description="Sensor or processing failure prevents detection of stopped vehicle"
# )
#
# # S3 + E3 + C2 gives ASIL B. Candidate safety mechanisms:
# # 1. Sensor fusion (camera + radar + lidar) - redundant detection
# # 2. Forward Collision Warning (FCW) - alert driver before automatic intervention
# # 3. Plausibility check - verify sensor readings against multiple sources
# # 4. Safe state - If uncertainty high, disengage ACC and alert driver
#
# # Hazard 2: Unintended acceleration
# hara_acc.add_hazard(
#     name="ACC accelerates when driver presses brake",
#     severity='S3',
#     exposure='E4',  # Can happen whenever ACC is active
#     controllability='C1',  # Driver brake should override
#     description="Software error causes acceleration during brake press"
# )
#
# # Continue for all hazards...
# hara_acc.generate_report()


# Exercise 2: Fault Tree Analysis
# TODO: Implement fault tree for brake-by-wire
#
# Example structure:
# class FaultTreeNode:
#     def __init__(self, name, gate_type=None, probability=None):
#         self.name = name
#         self.gate_type = gate_type  # 'AND', 'OR', or None (basic event)
#         self.probability = probability
#         self.inputs = []
#     
#     def add_input(self, node):
#         self.inputs.append(node)
#     
#     def compute_probability(self):
#         if self.gate_type is None:
#             # Basic event
#             return self.probability
#         elif self.gate_type == 'OR':
#             # P(A ∪ B) = 1 - (1-P(A))(1-P(B)) (assuming independence)
#             prob = 1.0
#             for input_node in self.inputs:
#                 prob *= (1 - input_node.compute_probability())
#             return 1 - prob
#         elif self.gate_type == 'AND':
#             # P(A ∩ B) = P(A) * P(B) (assuming independence)
#             prob = 1.0
#             for input_node in self.inputs:
#                 prob *= input_node.compute_probability()
#             return prob
#         else:
#             raise ValueError(f"Unknown gate type: {self.gate_type}")
#
# # Build fault tree
# # Top event
# top = FaultTreeNode("Loss of Braking", gate_type='OR')
#
# # Intermediate: Primary brake path fails
# primary_path_fail = FaultTreeNode("Primary Path Fails", gate_type='OR')
# primary_ecu_fail = FaultTreeNode("Primary ECU Fails", probability=1e-4)
# primary_actuator_fail = FaultTreeNode("Primary Actuator Fails", probability=1e-5)
# primary_path_fail.add_input(primary_ecu_fail)
# primary_path_fail.add_input(primary_actuator_fail)
#
# # Intermediate: Secondary brake path fails
# secondary_path_fail = FaultTreeNode("Secondary Path Fails", gate_type='OR')
# secondary_ecu_fail = FaultTreeNode("Secondary ECU Fails", probability=1e-4)
# secondary_actuator_fail = FaultTreeNode("Secondary Actuator Fails", probability=1e-5)
# secondary_path_fail.add_input(secondary_ecu_fail)
# secondary_path_fail.add_input(secondary_actuator_fail)
#
# # Both paths must fail (AND gate)
# both_paths_fail = FaultTreeNode("Both Paths Fail", gate_type='AND')
# both_paths_fail.add_input(primary_path_fail)
# both_paths_fail.add_input(secondary_path_fail)
#
# # Power supply failure (single point of failure)
# power_fail = FaultTreeNode("Power Supply Fails", probability=1e-6)
#
# # Top event: Either both paths fail OR power fails
# top.add_input(both_paths_fail)
# top.add_input(power_fail)
#
# # Compute
# prob_failure = top.compute_probability()
# print(f"Probability of total brake failure: {prob_failure:.2e}")
#
# # Minimal cut sets (combinations that cause top event):
# # 1. {Power Fail} - SINGLE POINT OF FAILURE!
# # 2. {Primary ECU Fail, Secondary ECU Fail}
# # 3. {Primary ECU Fail, Secondary Actuator Fail}
# # 4. {Primary Actuator Fail, Secondary ECU Fail}
# # 5. {Primary Actuator Fail, Secondary Actuator Fail}
#
# # Mitigation: Add redundant power supply!


# Exercise 3: SOTIF Scenario Database
# TODO: Create scenario database for lane-keeping assist
#
# Example structure:
# scenarios = [
#     {
#         'id': 1,
#         'description': 'Highway, clear weather, daylight, clear lane markings',
#         'road_type': 'highway',
#         'weather': 'clear',
#         'lighting': 'day',
#         'lane_quality': 'clear',
#         'category': 'Known Safe',
#         'justification': 'Extensively tested, high confidence, within nominal ODD',
#         'mitigation': 'None required'
#     },
#     {
#         'id': 2,
#         'description': 'Highway, clear weather, night, faded lane markings',
#         'road_type': 'highway',
#         'weather': 'clear',
#         'lighting': 'night',
#         'lane_quality': 'faded',
#         'category': 'Known Unsafe',
#         'justification': 'Camera performance degrades at night with poor markings',
#         'mitigation': 'Disengage system, alert driver, use GPS/map-based lane estimation'
#     },
#     {
#         'id': 3,
#         'description': 'Highway, heavy fog, daylight, clear lane markings',
#         'road_type': 'highway',
#         'weather': 'fog',
#         'lighting': 'day',
#         'lane_quality': 'clear',
#         'category': 'Known Unsafe',
#         'justification': 'Camera visibility <50m in fog, insufficient for safe operation',
#         'mitigation': 'System detects low visibility, gracefully disengages'
#     },
#     {
#         'id': 4,
#         'description': 'Urban, clear, day, construction zone with temp markings',
#         'road_type': 'urban',
#         'weather': 'clear',
#         'lighting': 'day',
#         'lane_quality': 'conflicting',
#         'category': 'Unknown Unsafe',
#         'justification': 'Conflicting markings not extensively tested, edge case',
#         'mitigation': 'Add to test scenarios, validate performance, potentially exclude from ODD'
#     },
#     # ... add 8 more scenarios covering different combinations
# ]
#
# # Generate report
# import pandas as pd
# df = pd.DataFrame(scenarios)
# print(df[['id', 'description', 'category', 'mitigation']])
#
# # Define ODD based on Known Safe scenarios
# odd = {
#     'road_type': ['highway'],
#     'weather': ['clear', 'light_rain'],
#     'lighting': ['day'],
#     'lane_quality': ['clear'],
#     'speed_range': '50-75 mph',
#     'geographic': 'US highways with lane markings',
#     'exclusions': [
#         'Construction zones',
#         'Fog or heavy rain',
#         'Night operation',
#         'Unmarked roads',
#         'Urban streets with frequent turns'
#     ]
# }
# print("\nOperational Design Domain:")
# for key, value in odd.items():
#     print(f"  {key}: {value}")
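
The S/E/C ratings in Exercise 1 only become actionable once they are mapped to an ASIL. Below is a minimal sketch of that lookup. It relies on a commonly used observation about the ISO 26262-3 ASIL table: the table is equivalent to summing the numeric indices of S (1-3), E (1-4), and C (1-3). A sum of 10 gives ASIL D, 9 gives C, 8 gives B, 7 gives A, and anything lower gives QM. The function name `determine_asil` is hypothetical.

```python
def determine_asil(severity: str, exposure: str, controllability: str) -> str:
    """Map ISO 26262 S/E/C classes (e.g. 'S3', 'E4', 'C2') to an ASIL.

    Uses the equivalence between the ISO 26262-3 ASIL table and the sum of
    class indices: 10 -> D, 9 -> C, 8 -> B, 7 -> A, otherwise QM.
    Any class 0 (S0, E0, C0) means no ASIL is assigned (QM).
    """
    limits = {'S': 3, 'E': 4, 'C': 3}
    indices = []
    for label, prefix in zip((severity, exposure, controllability), 'SEC'):
        # Validate the label format, e.g. 'S3' for severity
        if len(label) != 2 or label[0] != prefix or not label[1].isdigit():
            raise ValueError(f"Expected {prefix}<n>, got {label!r}")
        index = int(label[1])
        if index > limits[prefix]:
            raise ValueError(f"{label} exceeds {prefix}{limits[prefix]}")
        indices.append(index)

    if 0 in indices:
        return 'QM'
    return {10: 'ASIL D', 9: 'ASIL C', 8: 'ASIL B', 7: 'ASIL A'}.get(sum(indices), 'QM')


# Ratings taken from the Exercise 1 template hazards
print(determine_asil('S3', 'E3', 'C2'))  # ACC misses stopped vehicle -> ASIL B
print(determine_asil('S3', 'E4', 'C1'))  # acceleration during braking -> ASIL B
print(determine_asil('S3', 'E4', 'C3'))  # worst-case combination -> ASIL D
```

ASIL D is reached only for S3/E4/C3. As a result, the exposure and controllability arguments in a HARA drive the result as much as the severity rating does.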

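The minimal cut sets listed at the end of Exercise 2 can also be derived mechanically. Below is a minimal sketch. It assumes the tree is given as a plain dict that maps each gate to its type and inputs. This is a hypothetical encoding, separate from the `FaultTreeNode` class in the template. OR gates pool their inputs' cut sets, AND gates take the cross product, and any superset of another cut set is discarded. Basic events are assumed independent. Under that assumption, the top-event probability is computed exactly by inclusion-exclusion over the cut sets and compared with the rare-event approximation (the sum of cut-set probabilities).

```python
from itertools import combinations, product
from math import prod

# Hypothetical encoding of the Exercise 2 tree: gate -> (gate type, inputs)
GATES = {
    'Loss of Braking': ('OR', ['Both Paths Fail', 'Power Supply Fails']),
    'Both Paths Fail': ('AND', ['Primary Path Fails', 'Secondary Path Fails']),
    'Primary Path Fails': ('OR', ['Primary ECU Fails', 'Primary Actuator Fails']),
    'Secondary Path Fails': ('OR', ['Secondary ECU Fails', 'Secondary Actuator Fails']),
}
# Basic-event probabilities copied from the template
BASIC = {
    'Primary ECU Fails': 1e-4, 'Primary Actuator Fails': 1e-5,
    'Secondary ECU Fails': 1e-4, 'Secondary Actuator Fails': 1e-5,
    'Power Supply Fails': 1e-6,
}


def cut_sets(node):
    """Return the minimal cut sets of `node` as a list of frozensets."""
    if node in BASIC:
        return [frozenset([node])]
    gate_type, inputs = GATES[node]
    child_sets = [cut_sets(child) for child in inputs]
    if gate_type == 'OR':
        # Any child's cut set alone triggers an OR gate
        candidates = [cs for sets in child_sets for cs in sets]
    elif gate_type == 'AND':
        # One cut set from every child is needed: take the cross product
        candidates = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"Unknown gate type: {gate_type}")
    # Keep only minimal sets: drop duplicates and strict supersets of another set
    unique = set(candidates)
    return [s for s in unique if not any(other < s for other in unique)]


def exact_probability(sets):
    """P(union of cut sets) by inclusion-exclusion, assuming independent basic events."""
    total = 0.0
    for k in range(1, len(sets) + 1):
        for group in combinations(sets, k):
            events = frozenset().union(*group)
            total += (-1) ** (k + 1) * prod(BASIC[e] for e in events)
    return total


mcs = cut_sets('Loss of Braking')
for s in sorted(mcs, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))

rare_event = sum(prod(BASIC[e] for e in s) for s in mcs)
print(f"Exact P(top):      {exact_probability(mcs):.4e}")
print(f"Rare-event approx: {rare_event:.4e}")
```

Both estimates come out at about 1.01e-6, dominated by the single-element cut set {Power Supply Fails}. The four redundant-path pairs together contribute only about 1.2e-8, which is why the template points to a redundant power supply as the mitigation.
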
---

## References

### Standards

1. **ISO 26262** - Road vehicles — Functional safety (2018)
   - Part 1: Vocabulary
   - Part 3: Concept phase (includes HARA)
   - Part 4: Product development at the system level
   - Part 6: Product development at the software level
   - Official: https://www.iso.org/standard/68383.html

2. **ISO/PAS 21448** - Road vehicles — Safety of the intended functionality (SOTIF) (2019)
   - Addresses performance limitations and misuse
   - Complements ISO 26262
   - Superseded by the full International Standard ISO 21448:2022
   - Official: https://www.iso.org/standard/70939.html

3. **IEC 61508** - Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems
   - Basis for ISO 26262
   - General industry standard

### Books

4. **Mader, R., et al.** (2018). *ISO 26262 - A Practical Guide*. Springer.
   - Comprehensive guide to implementing ISO 26262
   - Real-world examples and case studies

5. **Lemieux, J.** (2020). *Safety-Critical Systems Handbook: A Straightforward Guide to Functional Safety*. Independent.
   - Clear explanations of safety concepts
   - Covers multiple safety standards

6. **Storey, N.** (1996). *Safety-Critical Computer Systems*. Addison-Wesley.
   - Foundational textbook
   - Fault tolerance architectures

7. **Dunn, W. R.** (2006). *Practical Design of Safety-Critical Computer Systems*. Reliability Press.
   - Hardware and software safety techniques
   - Redundancy design patterns

### Papers - ISO 26262 & ASIL

8. **Joshi, A., & Heimdahl, M. P. E.** (2007). "Behavioral Fault Modeling for Model-based Safety Analysis." *10th IEEE High Assurance Systems Engineering Symposium*.
   - HARA methodology
   - Fault modeling techniques

9. **Armengaud, E., et al.** (2015). "Integrated tool chain for improving traceability during the development of automotive systems." *ACM SIGSOFT Software Engineering Notes*, 40(1), 1-5.
   - Tool support for ISO 26262 compliance
   - Traceability and documentation

10. **Schmittner, C., et al.** (2016). "Application of security and safety co-analysis for autonomous vehicles." *European Conference on Software Architecture Workshops*.
    - Combined safety and security analysis
    - Relevant for connected AVs

### Papers - Redundancy & Fault Tolerance

11. **Isermann, R.** (2006). *Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance*. Springer.
    - Comprehensive overview of fault diagnosis
    - Analytical redundancy techniques

12. **Narasimhan, S., & Brownston, L.** (2007). "HyDE - A General Framework for Stochastic and Hybrid Model-based Diagnosis." *International Workshop on Principles of Diagnosis (DX)*.
    - Hybrid diagnostics for automotive systems

13. **Blanke, M., et al.** (2006). *Diagnosis and Fault-Tolerant Control*. Springer.
    - Control-theoretic approach to fault tolerance
    - Redundancy management

### Papers - SOTIF

14. **Stolte, T., et al.** (2015). "Towards automated driving: Unmanned protective vehicle for highway hard shoulder road works." *18th International Conference on Intelligent Transportation Systems*.
    - Real-world SOTIF challenges
    - ODD definition

15. **Zendel, O., et al.** (2017). "How good is my test data? Introducing safety analysis for computer vision." *International Journal of Computer Vision*, 125(1-3), 95-109.
    - Test data quality for vision systems
    - Safety-critical ML validation

16. **Koopman, P., & Wagner, M.** (2016). "Challenges in Autonomous Vehicle Testing and Validation." *SAE International Journal of Transportation Safety*, 4(1), 15-24.
    - Validation challenges for AVs
    - Scenario-based testing

17. **Kalra, N., & Paddock, S. M.** (2016). "Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability?" *Transportation Research Part A: Policy and Practice*, 94, 182-193.
    - Statistical validation challenges
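    - Estimates that about 11 billion miles of driving would be needed to show, with statistical confidence, that AVs have a 20% lower fatality rate than human drivers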

### Papers - Verification & Validation

18. **Koopman, P., & Wagner, M.** (2017). "Autonomous vehicle safety: An interdisciplinary challenge." *IEEE Intelligent Transportation Systems Magazine*, 9(1), 90-96.
    - Comprehensive safety framework
    - Testing methodologies

19. **Tuncali, C. E., et al.** (2020). "Sim-ATAV: Simulation-Based Adversarial Testing Framework for Autonomous Vehicles." *ACM/IEEE International Conference on Cyber-Physical Systems*.
    - Adversarial testing for AVs
    - Finding edge cases systematically

20. **Mullins, G. E., et al.** (2018). "Adaptive Generation of Challenging Scenarios for Testing and Evaluation of Autonomous Vehicles." *Journal of Systems and Software*, 137, 197-215.
    - Scenario generation for testing
    - Coverage metrics

### Industry Reports & White Papers

21. **SAE J3016** - Taxonomy and Definitions for Terms Related to Driving Automation Systems
    - Levels of automation (0-5)
    - ODD definitions
    - Free: https://www.sae.org/standards/content/j3016_202104/

22. **Waymo Safety Report** (2020)
    - Real-world safety architecture
    - Redundancy implementation
    - https://waymo.com/safety/

23. **NHTSA - Automated Vehicles for Safety** (2020)
    - Regulatory perspective
    - Voluntary safety self-assessment
    - https://www.nhtsa.gov/vehicle-safety/automated-vehicles-safety

24. **UL 4600** - Standard for Safety for the Evaluation of Autonomous Products
    - First consensus standard for AV safety
    - https://ul.org/UL4600

### Tools & Resources

25. **FMEA-FMECA Software Tools**
    - ReliaSoft XFMEA
    - ISOTools FMEA
    - Useful for fault mode analysis

26. **Fault Tree Analysis Tools**
    - OpenFTA (open-source)
    - https://www.openfta.com/

27. **Safety Case Tools**
    - ASCE (Adelard Safety Case Editor)
    - GSN (Goal Structuring Notation) tools

28. **Simulation Platforms**
    - CARLA (open-source AV simulator)
    - LGSVL (LG Silicon Valley Lab simulator; active development discontinued by LG in 2022)
    - Useful for SOTIF scenario testing

### Courses

29. **Coursera - Introduction to Self-Driving Cars** (University of Toronto)
    - Module on safety assurance
    - SOTIF introduction

30. **edX - Autonomous Systems** (ETH Zurich)
    - Safety and reliability module
    - Fault-tolerant control