# 08 - Engineering & Physical Audit

**Philosophy:** Raw Data Only - Zero Scoring, Zero Decisions

**Purpose:** High-fidelity engineering documentation for each motion capture session. This notebook provides pure physical measurements, mathematical methodology, and biomechanical profiles WITHOUT synthetic quality scores or decision labels.

**Target Audience:** Researchers, biomechanists, and data scientists who need transparent access to:
- Raw capture quality metrics (SNR, missing data, jitter)
- Processing methodology (interpolation, filtering, differentiation formulas)
- Structural integrity (skeleton stability, calibration offsets)
- Kinematic extremes (peak velocities, accelerations)
- Per-joint noise profiles

**What This Report Does NOT Include:**
- Quality scores (0-100)
- Decision labels (ACCEPT/REVIEW/REJECT)
- Synthetic grades (GOLD/SILVER/BRONZE)
- Pass/Fail judgments

**References:**
- Cereatti et al. (2024) - Data lineage & provenance
- Winter (2009) - Residual analysis
- Rácz et al. (2025) - Calibration layer
- Shoemake (1985) - Quaternion interpolation
- Savitzky & Golay (1964) - Smoothing differentiation

---

## Table of Contents

1. [Setup & Data Loading](#setup)
2. [Methodology Passport](#methodology) - Mathematical documentation
3. [Data Lineage](#lineage) - Recording provenance
4. [Capture Baseline](#baseline) - Raw state before processing
5. [Structural Integrity](#structure) - Skeleton & calibration
6. [Signal Quality Profile](#signal) - Pre-processing SNR
7. [Processing Transparency](#processing) - What was done
8. [Kinematic Extremes](#kinematics) - Processed output
9. [Per-Joint Noise Profile](#noise) - Root cause analysis
10. [Outlier Distribution](#outliers) - Frame-level patterns
11. [Excel Export](#export) - Engineering audit log

---

<a id="setup"></a>
## 1. Setup & Data Loading

Load all JSON files **once** and reuse throughout the notebook.
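For readers who want to see what "load once" involves, here is one plausible way to index the derivatives tree by run and step. It is not the code of `load_all_runs` or `filter_complete_runs` from `utils_nb07`. It assumes the folder and file naming shown in the expected-structure printout below (`step_XX_*/` folders, files named `{run_id}__<suffix>.json`). The helper names `load_runs_sketch` and `complete_runs_sketch` are hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def load_runs_sketch(deriv_root):
    """Hypothetical loader returning {run_id: {step_key: parsed_json}}.

    Assumes step folders named like 'step_01_parse' and JSON files named
    '{run_id}__<suffix>.json'. If one step holds several JSON files for the
    same run, the last one in sorted order wins (the real loader may merge them).
    """
    runs = defaultdict(dict)
    for step_dir in sorted(Path(deriv_root).glob("step_*")):
        if not step_dir.is_dir():
            continue
        step_key = "_".join(step_dir.name.split("_")[:2])  # 'step_06_kinematics' -> 'step_06'
        for json_path in sorted(step_dir.rglob("*.json")):  # includes subfolders such as ultimate/
            if "__" not in json_path.name:
                continue  # not a per-run file
            run_id = json_path.name.split("__", 1)[0]
            with open(json_path, encoding="utf-8") as fh:
                runs[run_id][step_key] = json.load(fh)
    return dict(runs)


def complete_runs_sketch(runs, required_steps):
    """Keep only runs that contain every required step."""
    return {rid: steps for rid, steps in runs.items()
            if all(step in steps for step in required_steps)}


# Tiny self-contained demo on a temporary directory tree
with tempfile.TemporaryDirectory() as root:
    demo_files = [
        ("step_01_parse", "runA__step01_loader_report.json"),
        (os.path.join("step_06_kinematics", "ultimate"), "runA__outlier_validation.json"),
        ("step_01_parse", "runB__step01_loader_report.json"),
    ]
    for sub, name in demo_files:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        with open(os.path.join(root, sub, name), "w", encoding="utf-8") as fh:
            json.dump({"ok": True}, fh)
    runs = load_runs_sketch(root)
    complete = complete_runs_sketch(runs, ["step_01", "step_06"])

print(sorted(runs), sorted(complete))  # ['runA', 'runB'] ['runA']
```

Keying files by the text before `__` keeps run IDs that contain spaces and single underscores (such as `671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001`) intact.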

In [1]:
# ============================================================
# IMPORTS & PATH SETUP
# ============================================================
import os
import sys
import json
import pandas as pd
import numpy as np
from datetime import datetime
from IPython.display import display, HTML, Markdown

# Setup paths
if os.path.basename(os.getcwd()) == 'notebooks':
    PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), ".."))
else:
    PROJECT_ROOT = os.path.abspath(os.getcwd())
SRC_PATH = os.path.join(PROJECT_ROOT, "src")
if SRC_PATH not in sys.path:
    sys.path.insert(0, SRC_PATH)

# Import utility module (force reload for updates)
import importlib
if 'utils_nb07' in sys.modules:
    import utils_nb07
    importlib.reload(utils_nb07)

from utils_nb07 import (
    load_all_runs, 
    filter_complete_runs, 
    build_engineering_profile_row,
    build_subject_profile,
    extract_per_joint_noise_profile,
    extract_bone_stability_profile,
    extract_selected_segments,
    compute_noise_locality_index,
    get_git_hash,
    print_section_header,
    METHODOLOGY_PASSPORT
)

print(f"Project Root: {PROJECT_ROOT}")
print(f"Git Hash: {get_git_hash(PROJECT_ROOT)}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Project Root: c:\Users\drorh\OneDrive - Mobileye\Desktop\gaga
Git Hash: f83538a
Timestamp: 2026-02-15 13:09:04


In [2]:
# ============================================================
# LOAD ALL DATA (ONCE)
# ============================================================
DERIV_ROOT = os.path.join(PROJECT_ROOT, "derivatives")

# Load all JSON files
print("Loading JSON files...")
all_runs = load_all_runs(DERIV_ROOT)
print(f"Found {len(all_runs)} total runs")

# Filter to complete runs (require step_01 and step_06)
runs_data = filter_complete_runs(all_runs, required_steps=["step_01", "step_06"])
print(f"Complete runs: {len(runs_data)}")

# Show available steps per run
print("\nSteps available per run:")
expected_steps = ['step_01', 'step_02', 'step_03', 'step_04', 'step_05', 'step_06']

if len(all_runs) == 0:
    print("\n⚠️ WARNING: No runs found at all!")
    print(f"\nSearched in: {DERIV_ROOT}")
    print("\nExpected structure:")
    print("  derivatives/")
    print("    ├── step_01_parse/")
    print("    │   └── {run_id}__step01_loader_report.json")
    print("    ├── step_02_preprocess/")
    print("    │   └── {run_id}__preprocess_summary.json")
    print("    ├── step_03_resample/")
    print("    ├── step_04_filtering/")
    print("    ├── step_05_reference/")
    print("    └── step_06_kinematics/")
    print("        └── ultimate/")
    print("            └── {run_id}__outlier_validation.json")
    print("\nPlease run the full pipeline (notebooks 01-06) first.")
else:
    for run_id, steps in all_runs.items():
        steps_list = sorted(steps.keys())
        missing = [s for s in expected_steps if s not in steps_list]
        
        # Truncate run_id for display
        display_id = run_id[:60] + "..." if len(run_id) > 60 else run_id
        print(f"\n  {display_id}")
        print(f"    ✅ Found: {steps_list}")
        
        if missing:
            print(f"    ⚠️ Missing: {missing}")
            if run_id not in runs_data:
                print("    → This run is INCOMPLETE and will be skipped")
        else:
            print("    → This run is COMPLETE and will be processed")

Loading JSON files...
Found 3 total runs
Complete runs: 3

Steps available per run:

  671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001
    ✅ Found: ['step_01', 'step_02', 'step_03', 'step_04', 'step_05', 'step_06']
    → This run is COMPLETE and will be processed

  671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005
    ✅ Found: ['step_01', 'step_02', 'step_03', 'step_04', 'step_05', 'step_06']
    → This run is COMPLETE and will be processed

  671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000
    ✅ Found: ['step_01', 'step_02', 'step_03', 'step_04', 'step_05', 'step_06']
    → This run is COMPLETE and will be processed


In [3]:
# ============================================================
# BUILD ENGINEERING PROFILE DATAFRAME
# ============================================================

# Check if we have any complete runs
if len(runs_data) == 0:
    print("\n" + "="*80)
    print("⚠️ ERROR: NO COMPLETE RUNS FOUND")
    print("="*80)
    print("\nDiagnostics:")
    print(f"  Total runs discovered: {len(all_runs)}")
    print(f"  Runs with step_01 AND step_06: 0")
    print("\nShowing what's available for each run:\n")
    
    for run_id, steps in all_runs.items():
        print(f"  Run: {run_id[:70]}")
        print(f"    Steps found: {sorted(steps.keys())}")
        missing = [s for s in expected_steps if s not in steps]  # expected_steps defined in the loading cell
        if missing:
            print(f"    ⚠️ Missing: {missing}")
        print()
    
    print("\nTo fix this issue:")
    print("  1. Ensure all pipeline steps (01-06) have been run for your sessions")
    print("  2. Check that JSON files exist in derivatives/step_01_parse/ and derivatives/step_06_kinematics/ultimate/")
    print("  3. Verify the run_id prefixes match across all steps")
    print("\nCannot proceed without complete runs. Stopping execution.")
    raise ValueError("No complete runs found. Please run the full pipeline (steps 01-06) first.")

# Extract pure physical measurements (NO SCORING)
print(f"\nBuilding engineering profiles for {len(runs_data)} complete runs...")
engineering_rows = [build_engineering_profile_row(run_id, steps) for run_id, steps in runs_data.items()]
df_engineering = pd.DataFrame(engineering_rows)

# Sort by subject and session for easy review
if len(df_engineering) > 0:
    df_engineering = df_engineering.sort_values(["Subject_ID", "Session_ID"]).reset_index(drop=True)

print(f"\n✅ Engineering DataFrame: {len(df_engineering)} runs × {len(df_engineering.columns)} measurements")
print(f"\nColumn groups:")
print(f"  - Lineage: Run_ID, Subject_ID, Session_ID, Pipeline_Version")
print(f"  - Baseline: Duration, Sampling_Rate, Raw_Missing_%, SNR")
print(f"  - Structure: Bone_CV%, Skeleton_Segments, Height, Mass")
print(f"  - Processing: Interpolation, Filtering, Resampling")
print(f"  - Kinematics: Max_Velocity, Max_Acceleration, Path_Length")
print(f"  - Outliers: Total_Frames, Classification, Data_Retained")

# DEBUG: Check for anatomical region columns
path_cols = [c for c in df_engineering.columns if 'Path_' in c and '_m' in c]
intensity_cols = [c for c in df_engineering.columns if 'Intensity_' in c]
print(f"\n[DEBUG] Found {len(path_cols)} Path columns: {path_cols[:5]}...")
print(f"[DEBUG] Found {len(intensity_cols)} Intensity columns: {intensity_cols[:5]}...")
if len(path_cols) > 0:
    print(f"[DEBUG] Sample path values: {df_engineering[path_cols[0]].values}")
if len(intensity_cols) > 0:
    print(f"[DEBUG] Sample intensity values: {df_engineering[intensity_cols[0]].values}")


Building engineering profiles for 3 complete runs...
[DEBUG] extract_phase2_metrics: Found 'path_length' in s06
[DEBUG] Found by_region structure: {'Neck': 50.527266676734286, 'Shoulders': 47.292705996517235, 'Elbows': 60.685467516215276, 'Wrists': 117.42213285178761, 'Spine': 28.43047571703742, 'Hips': 12.197133030420384, 'Knees': 46.326368809313486, 'Ankles': 52.78297513463008}
[DEBUG] Found new intensity structure with by_region: {'Neck': 0.19812671964213033, 'Shoulders': 0.18544341141659537, 'Elbows': 0.23795889625023145, 'Wrists': 0.46043381179016807, 'Spine': 0.11148113211268472, 'Hips': 0.04782720529524707, 'Knees': 0.18165422530855205, 'Ankles': 0.2069717680016864}
[DEBUG] extract_phase2_metrics: Found 'path_length' in s06
[DEBUG] Found by_region structure: {'Neck': 49.90148006766413, 'Shoulders': 45.95326641925252, 'Elbows': 56.45029809817945, 'Wrists': 120.76143018035427, 'Spine': 26.3290052573553, 'Hips': 14.646004291906534, 'Knees': 48.881444417541104, 'Ankles': 57.1223795

---

<a id="methodology"></a>
## 2. Methodology Passport

**Purpose:** Document the mathematical methods used to derive all reported values.

This section provides explicit formulas, implementation details, and references for:
- Quaternion interpolation (SLERP)
- Angular velocity extraction
- Angular acceleration differentiation
- 3-stage signal cleaning pipeline
- Resampling strategy
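The SLERP entry in the passport can be checked by hand. The sketch below implements the printed formula directly with NumPy, including the sign flip that selects the shorter of the two arcs (q and −q encode the same rotation). The function name `slerp` and the scalar-last (x, y, z, w) ordering, which matches SciPy's `Rotation`, are assumptions of this illustration; the passport reports that the pipeline itself uses `scipy.spatial.transform.Slerp`.

```python
import numpy as np


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions.

    q(t) = sin((1-t)θ)/sin(θ) · q0 + sin(tθ)/sin(θ) · q1, with θ the angle
    between q0 and q1 on the unit 4-sphere.
    """
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:
        # q and -q are the same rotation: flip one so we take the shorter arc
        q1, dot = -q1, -dot
    theta = np.arccos(min(dot, 1.0))
    if theta < 1e-8:
        # (Almost) identical orientations: normalised linear interpolation suffices
        q = (1.0 - t) * q0 + t * q1
        return q / np.linalg.norm(q)
    s = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / s) * q0 + (np.sin(t * theta) / s) * q1


# Identity -> 90° about z, scalar-last (x, y, z, w) order
q_a = np.array([0.0, 0.0, 0.0, 1.0])
q_b = np.array([0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)])
q_mid = slerp(q_a, q_b, 0.5)
print(np.round(q_mid, 4))  # ≈ (0, 0, 0.3827, 0.9239): 45° about z
```

The halfway point lands exactly on the 45° rotation and stays unit-norm, which is the geodesic property the passport refers to.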

In [4]:
print_section_header("METHODOLOGY PASSPORT - MATHEMATICAL DOCUMENTATION")

print("\n" + "="*80)
print("INTERPOLATION METHODS")
print("="*80)

# Rotation Interpolation
rot_method = METHODOLOGY_PASSPORT["interpolation"]["rotations"]
print(f"\nüìê Rotation Interpolation: {rot_method['method']}")
print(f"   Formula: {rot_method['formula']}")
print(f"   Constraint: {rot_method['constraint']}")
print(f"   Geodesic: {rot_method['geodesic']}")
print(f"   Reference: {rot_method['reference']}")
print(f"   Implementation: {rot_method['implementation']}")

# Position Interpolation
pos_method = METHODOLOGY_PASSPORT["interpolation"]["positions"]
print(f"\nüìê Position Interpolation: {pos_method['method']}")
print(f"   Formula: {pos_method['formula']}")
print(f"   Continuity: {pos_method['continuity']}")
print(f"   Constraint: {pos_method['constraint']}")
print(f"   Implementation: {pos_method['implementation']}")

print("\n" + "="*80)
print("DIFFERENTIATION METHODS")
print("="*80)

# Angular Velocity
ang_vel = METHODOLOGY_PASSPORT["differentiation"]["angular_velocity"]
print(f"\nüîÑ Angular Velocity: {ang_vel['method']}")
print(f"   Formula: {ang_vel['formula']}")
print(f"   Derivation: {ang_vel['derivation']}")
print(f"   Units: {ang_vel['units']}")
print(f"   Note: {ang_vel['note']}")

# Angular Acceleration
ang_accel = METHODOLOGY_PASSPORT["differentiation"]["angular_acceleration"]
print(f"\nüîÑ Angular Acceleration: {ang_accel['method']}")
print(f"   Formula: {ang_accel['formula']}")
print(f"   Window: {ang_accel['window_sec']}s ({ang_accel['window_frames']} frames @ 120Hz)")
print(f"   Polynomial Order: {ang_accel['polynomial_order']}")
print(f"   Units: {ang_accel['units']}")
print(f"   Reference: {ang_accel['reference']}")
print(f"   Implementation: {ang_accel['implementation']}")

# Linear Derivatives
lin_vel = METHODOLOGY_PASSPORT["differentiation"]["linear_velocity"]
lin_accel = METHODOLOGY_PASSPORT["differentiation"]["linear_acceleration"]
print(f"\nüìè Linear Velocity: {lin_vel['method']}")
print(f"   Formula: {lin_vel['formula']}")
print(f"   Note: {lin_vel['note']}")
print(f"\nüìè Linear Acceleration: {lin_accel['method']}")
print(f"   Formula: {lin_accel['formula']}")
print(f"   Note: {lin_accel['note']}")

print("\n" + "="*80)
print("3-STAGE SIGNAL CLEANING PIPELINE (v3.0)")
print("="*80)

filtering = METHODOLOGY_PASSPORT["filtering"]
print(f"\nPhilosophy: {filtering['philosophy']}")
print(f"Version: {filtering['pipeline_version']}")

# Stage 1
stage1 = filtering["stage1_artifact_detection"]
print(f"\nüîç Stage 1: {stage1['method']}")
print(f"   Velocity Limit: {stage1['velocity_limit_mm_s']} mm/s")
print(f"   Z-Score Threshold: {stage1['zscore_threshold']}œÉ")
print(f"   Interpolation: {stage1['interpolation']}")
print(f"   Purpose: {stage1['purpose']}")

# Stage 2
stage2 = filtering["stage2_hampel"]
print(f"\nüîç Stage 2: {stage2['method']}")
print(f"   Window Size: {stage2['window_size']} frames")
print(f"   Sigma Threshold: {stage2['n_sigma']}œÉ")
print(f"   Purpose: {stage2['purpose']}")
print(f"   Note: {stage2['note']}")

# Stage 3
stage3 = filtering["stage3_adaptive_winter"]
print(f"\nüîç Stage 3: {stage3['method']}")
print(f"   Strategy: {stage3['strategy']}")
print(f"   Frequency Range: {stage3['fmin_hz']}-{stage3['fmax_hz']} Hz")
print(f"   Filter Type: {stage3['filter_type']}")
print(f"   Filter Order: {stage3['filter_order']}")
print(f"   Implementation: {stage3['implementation']}")
print(f"   Rationale: {stage3['rationale']}")
print(f"\n   Body Regions:")
for region, description in stage3["regions"].items():
    print(f"     - {region}: {description}")

print("\n" + "="*80)
print("RESAMPLING STRATEGY")
print("="*80)

resampling = METHODOLOGY_PASSPORT["resampling"]
print(f"\nTarget Frequency: {resampling['target_fs_hz']} Hz")
print(f"Purpose: {resampling['purpose']}")
print(f"Method: {resampling['method']}")
print(f"Positions: {resampling['positions_method']}")
print(f"Rotations: {resampling['rotations_method']}")
print(f"Validation: {resampling['validation']}")

print("\n" + "="*80)
print("REFERENCE ALIGNMENT")
print("="*80)

ref_align = METHODOLOGY_PASSPORT["reference_alignment"]
print(f"\nMethod: {ref_align['method']}")
print(f"Reference: {ref_align['reference']}")
print(f"Detection: {ref_align['detection']}")
print(f"Offset Correction: {ref_align['offset_correction']}")
print(f"Bilateral Correction: {ref_align['bilateral_correction']}")

print("\n" + "="*80)

METHODOLOGY PASSPORT - MATHEMATICAL DOCUMENTATION

INTERPOLATION METHODS

📐 Rotation Interpolation: SLERP (Spherical Linear Interpolation)
   Formula: q(t) = sin((1-t)θ)/sin(θ) · q₀ + sin(tθ)/sin(θ) · q₁
   Constraint: Maintains unit quaternion: ||q|| = 1
   Geodesic: Shortest path on SO(3) manifold
   Reference: Shoemake (1985)
   Implementation: scipy.spatial.transform.Slerp

📐 Position Interpolation: CubicSpline
   Formula: p(t) = a₀ + a₁t + a₂t² + a₃t³ (piecewise)
   Continuity: C² (smooth velocity and acceleration)
   Constraint: Natural boundary conditions
   Implementation: scipy.interpolate.CubicSpline

DIFFERENTIATION METHODS

🔄 Angular Velocity: Quaternion Derivative
   Formula: ω = 2 · (dq/dt) · q*
   Derivation: q̇ via finite differences, then ω = 2q̇q* (quaternion conjugate)
   Units: deg/s
   Note: Extracts instantaneous axis-angle velocity from quaternion time series

🔄 Angular Acceleration: Savitzky-Golay Filter
   Formula: α = d/
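A minimal sketch of the angular-velocity entry, ω = 2 q̇ q*, assuming scalar-first (w, x, y, z) Hamilton quaternions (unlike SciPy's scalar-last default) and finite differences via `np.gradient`. With this convention the product yields angular velocity expressed in the world frame (the body-frame version would be 2 q* q̇); which frame the pipeline reports is not stated here. `quat_multiply` and `angular_velocity_deg_s` are hypothetical helpers.

```python
import numpy as np


def quat_multiply(a, b):
    """Hamilton product of quaternion arrays shaped (..., 4), scalar-first (w, x, y, z)."""
    aw, ax, ay, az = np.moveaxis(np.asarray(a, dtype=float), -1, 0)
    bw, bx, by, bz = np.moveaxis(np.asarray(b, dtype=float), -1, 0)
    return np.stack([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ], axis=-1)


def angular_velocity_deg_s(q, fs_hz):
    """omega = 2 * dq/dt * conj(q) for an (N, 4) unit-quaternion series, in deg/s."""
    q = np.array(q, dtype=float)  # copy, so the sign fix below does not touch the input
    # q and -q are the same rotation; make consecutive samples agree in sign so the
    # finite difference does not see a spurious jump.
    for i in range(1, len(q)):
        if np.dot(q[i - 1], q[i]) < 0.0:
            q[i] = -q[i]
    # Central differences for interior samples, one-sided at the two ends
    q_dot = np.gradient(q, 1.0 / fs_hz, axis=0)
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    omega = 2.0 * quat_multiply(q_dot, q_conj)[:, 1:]  # scalar part is ~0; keep x, y, z
    return np.degrees(omega)


# Constant 90 deg/s spin about z, sampled at 120 Hz for one second
fs = 120.0
t = np.arange(120) / fs
half_angle = np.radians(90.0) * t / 2.0
q_series = np.stack(
    [np.cos(half_angle), np.zeros_like(t), np.zeros_like(t), np.sin(half_angle)], axis=1
)
omega = angular_velocity_deg_s(q_series, fs)
print(np.round(omega[60], 2))  # ≈ (0, 0, 90)
```

Finite differences on raw orientations amplify noise, which is one reason the passport lists a smoothing differentiator (Savitzky-Golay) for the next derivative.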

---

<a id="lineage"></a>
## 3. Data Lineage & Provenance

**Purpose:** Ensure recording traceability from raw file to final result (Cereatti et al., 2024)
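The lineage table lists `Subject_ID` and `Session_ID` next to each `Run_ID`. The sketch below shows how those fields, plus the take timestamp, could be recovered from the Run_ID string itself. It assumes every Run_ID follows the `<subject>_T<n>_P<n>_R<n>_Take <date> <time>_<take>` layout seen in this dataset. The meaning of the `P` and `R` tokens is not documented here, so they are kept as opaque strings, and `parse_run_id` is a hypothetical helper rather than the pipeline's own parser.

```python
import re
from datetime import datetime

# Assumed Run_ID layout, inferred from the three IDs in this dataset
RUN_ID_PATTERN = re.compile(
    r"^(?P<subject>\d+)_(?P<session>T\d+)_(?P<p>P\d+)_(?P<r>R\d+)_"
    r"Take (?P<stamp>\d{4}-\d{2}-\d{2} \d{2}\.\d{2}\.\d{2} [AP]M)_(?P<take>\d+)$"
)


def parse_run_id(run_id):
    """Split a Run_ID into lineage fields; raises ValueError if the layout differs."""
    m = RUN_ID_PATTERN.match(run_id)
    if m is None:
        raise ValueError(f"Unrecognised Run_ID format: {run_id!r}")
    return {
        "Subject_ID": m["subject"],
        "Session_ID": m["session"],
        "P_Token": m["p"],  # meaning not documented here; kept as-is
        "R_Token": m["r"],  # meaning not documented here; kept as-is
        "Take_Time": datetime.strptime(m["stamp"], "%Y-%m-%d %I.%M.%S %p"),
        "Take_Index": int(m["take"]),
    }


info = parse_run_id("671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001")
print(info["Subject_ID"], info["Session_ID"], info["Take_Time"])  # 671 T1 2026-01-06 15:57:12
```

The parsed `Take_Time` gives a chronological key that does not depend on how session labels sort as strings.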

In [5]:
print_section_header("DATA LINEAGE & PROVENANCE")

# Display provenance info
cols_lineage = ['Run_ID', 'Subject_ID', 'Session_ID', 'Processing_Timestamp', 'Pipeline_Version', 'CSV_Source']
display(df_engineering[cols_lineage])

print(f"\nDataset Summary:")
print(f"  Total Recordings: {len(df_engineering)}")
print(f"  Unique Subjects: {df_engineering['Subject_ID'].nunique()}")
print(f"  Unique Sessions: {df_engineering['Session_ID'].nunique()}")
print(f"  Pipeline Version: {df_engineering['Pipeline_Version'].iloc[0] if len(df_engineering) > 0 else 'N/A'}")

DATA LINEAGE & PROVENANCE


|   | Run_ID | Subject_ID | Session_ID | Processing_Timestamp | Pipeline_Version | CSV_Source |
|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 671 | T1 | 2026-02-15 12:01 | v2.6_calibration_enhanced | C:\Users\drorh\OneDrive - Mobileye\Desktop\gag... |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 671 | T2 | 2026-02-15 12:54 | v2.6_calibration_enhanced | C:\Users\drorh\OneDrive - Mobileye\Desktop\gag... |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 671 | T3 | 2026-02-15 12:58 | v2.6_calibration_enhanced | C:\Users\drorh\OneDrive - Mobileye\Desktop\gag... |



Dataset Summary:
  Total Recordings: 3
  Unique Subjects: 1
  Unique Sessions: 3
  Pipeline Version: v2.6_calibration_enhanced


---

<a id="baseline"></a>
## 4. Capture Baseline Profile

**Purpose:** Document the raw state of the data BEFORE any processing.

This section represents the "Ground Truth" capture quality:
- How much data was missing in the raw OptiTrack export?
- What was the inherent Signal-to-Noise Ratio (SNR) of the raw tracking?
- What was the native sampling rate and jitter?
- What was the OptiTrack system calibration error?
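Two of the baseline questions above have simple, common definitions. The sketch below assumes raw marker positions arrive as an `(n_frames, n_markers, 3)` array with NaN for occluded samples, and that per-frame timestamps in seconds are available; neither layout is confirmed by this notebook, and the SNR definition used by the pipeline is not reproduced. `raw_missing_percent` and `sampling_profile` are hypothetical names.

```python
import numpy as np


def raw_missing_percent(positions):
    """Share (%) of marker samples with any NaN coordinate.

    positions: array of shape (n_frames, n_markers, 3); NaN marks an occluded sample.
    """
    positions = np.asarray(positions, dtype=float)
    missing = np.isnan(positions).any(axis=-1)  # (n_frames, n_markers) booleans
    return 100.0 * missing.mean()


def sampling_profile(timestamps_s):
    """Native rate from the median frame interval; jitter as the std of intervals."""
    dt = np.diff(np.asarray(timestamps_s, dtype=float))
    return {
        "fs_hz": 1.0 / np.median(dt),
        "jitter_ms": 1000.0 * np.std(dt),
        "max_gap_ms": 1000.0 * np.max(dt),
    }


pos = np.zeros((4, 2, 3))
pos[1, 0, :] = np.nan                   # 1 of 8 marker samples missing
print(raw_missing_percent(pos))         # 12.5

ts = np.array([0, 1, 2, 3, 5]) / 120.0  # 120 Hz clock with one dropped frame
print(sampling_profile(ts))
```

Using the median interval keeps the nominal rate at 120 Hz despite the dropped frame, while the jitter and maximum-gap figures expose it.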

## SECTION 11.5: Anatomical Region View (Human-Readable)

Group path lengths by anatomical regions for easier interpretation.
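Path length is the cumulative distance a joint travels. The sketch below assumes it is computed as the sum of frame-to-frame Euclidean displacements of an `(n_frames, 3)` position track in metres, and mirrors the "(max)" rule in the region mapping printed in this section by taking the larger of the left and right values. Function names and joint labels are illustrative, not taken from the pipeline.

```python
import numpy as np


def path_length_m(positions_m):
    """Cumulative distance (m) of one joint: sum of frame-to-frame Euclidean steps.

    positions_m: (n_frames, 3) array in metres. Steps touching a NaN frame are skipped.
    """
    steps = np.linalg.norm(np.diff(np.asarray(positions_m, dtype=float), axis=0), axis=1)
    return float(np.nansum(steps))


def region_path_m(joint_paths, left, right):
    """Bilateral region value = max(left, right), as in the '(max)' mapping."""
    return max(joint_paths[left], joint_paths[right])


square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
print(path_length_m(square))  # 4.0 (perimeter of a 1 m square)

toy_paths = {"LeftHand": 110.0, "RightHand": 117.4}  # toy numbers, not measured data
print(region_path_m(toy_paths, "LeftHand", "RightHand"))  # 117.4
```

Because every frame-to-frame step adds to the total, residual high-frequency noise inflates path length; totals are only comparable between sessions processed with the same filter settings.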

In [6]:
# ============================================================
# ANATOMICAL REGION BREAKDOWN
# ============================================================

# Extract anatomical region columns
anatomical_cols = [
    "Path_Neck_m", "Path_Shoulders_m", "Path_Elbows_m", "Path_Wrists_m",
    "Path_Spine_m", "Path_Hips_m", "Path_Knees_m", "Path_Ankles_m"
]

if all(col in df_engineering.columns for col in anatomical_cols):
    # Create anatomical summary
    region_data = []
    
    for _, row in df_engineering.iterrows():
        regions = {
            "Run_ID": row["Run_ID"],
            "Neck": row["Path_Neck_m"],
            "Shoulders": row["Path_Shoulders_m"],
            "Elbows": row["Path_Elbows_m"],
            "Wrists": row["Path_Wrists_m"],
            "Spine": row["Path_Spine_m"],
            "Hips": row["Path_Hips_m"],
            "Knees": row["Path_Knees_m"],
            "Ankles": row["Path_Ankles_m"],
        }
        region_data.append(regions)
    
    df_anatomical = pd.DataFrame(region_data)
    
    print("="*80)
    print("ANATOMICAL REGION PATH LENGTHS (meters)")
    print("="*80)
    print("\nHuman-readable view of movement by body region:")
    print("\nMapping:")
    print("  • Neck       → Neck joint")
    print("  • Shoulders  → Left/Right shoulder joints (max)")
    print("  • Elbows     → Left/Right forearm segments (max)")
    print("  • Wrists     → Left/Right hand/wrist joints (max)")
    print("  • Spine      → Mid-back / thoracic region")
    print("  • Hips       → Pelvis + hip joints")
    print("  • Knees      → Left/Right shin segments (max)")
    print("  • Ankles     → Left/Right foot/ankle joints (max)")
    print("\n" + "="*80)
    
    # Display table
    with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
        print(df_anatomical.to_string(index=False))
    
    # Summary statistics
    print("\n" + "="*80)
    print("REGION SUMMARY (across all runs)")
    print("="*80)
    
    region_summary = df_anatomical.drop(columns=["Run_ID"]).describe().loc[["mean", "min", "max"]]
    print(region_summary.to_string())
    
    # Most active regions
    mean_by_region = df_anatomical.drop(columns=["Run_ID"]).mean().sort_values(ascending=False)
    print("\n" + "="*80)
    print("MOST ACTIVE REGIONS (ranked by average path length)")
    print("="*80)
    for i, (region, value) in enumerate(mean_by_region.items(), 1):
        print(f"  {i}. {region:12s}: {value:.2f}m")
else:
    print("⚠️ Anatomical region columns not found. Re-run notebook 06 with Phase 2 updates.")

ANATOMICAL REGION PATH LENGTHS (meters)

Human-readable view of movement by body region:

Mapping:
  • Neck       → Neck joint
  • Shoulders  → Left/Right shoulder joints (max)
  • Elbows     → Left/Right forearm segments (max)
  • Wrists     → Left/Right hand/wrist joints (max)
  • Spine      → Mid-back / thoracic region
  • Hips       → Pelvis + hip joints
  • Knees      → Left/Right shin segments (max)
  • Ankles     → Left/Right foot/ankle joints (max)

                                      Run_ID  Neck  Shoulders  Elbows  Wrists  Spine  Hips  Knees  Ankles
671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 50.53      47.29   60.69  117.42  28.43 12.20  46.33   52.78
671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 49.90      45.95   56.45  120.76  26.33 14.65  48.88   57.12
671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 52.69      47.98   59.60  114.49  40.40 12.61  44.46   54.46

REGION SUMMARY (across all runs)
       Neck  Shoulders     Elbows      Wrists  S

## SECTION 11.6: Intensity Index (Phase 3)

**Movement intensity normalized by duration** - allows fair comparison between sessions of different lengths.
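Since intensity is defined as path length divided by duration, it can be recomputed in one vectorised step from the `Path_<Region>_m` and `Duration_sec` columns used in this notebook, which doubles as a consistency check on the stored `Intensity_*` columns. `intensity_table` is a hypothetical helper. The one-row demo uses the T1 neck path length from the Section 1 debug output together with the rounded T1 duration (255.03 s) printed in this section.

```python
import pandas as pd

REGIONS = ["Neck", "Shoulders", "Elbows", "Wrists", "Spine", "Hips", "Knees", "Ankles"]


def intensity_table(df):
    """Intensity (m/s) = Path_<Region>_m / Duration_sec, for every region present."""
    path_cols = [f"Path_{r}_m" for r in REGIONS if f"Path_{r}_m" in df.columns]
    out = df[path_cols].div(df["Duration_sec"], axis=0)            # row-wise division
    out.columns = [c[len("Path_"):-len("_m")] for c in path_cols]  # 'Path_Neck_m' -> 'Neck'
    out.insert(0, "Run_ID", df["Run_ID"].to_numpy())
    return out


demo = pd.DataFrame({
    "Run_ID": ["671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001"],
    "Duration_sec": [255.03],             # T1 duration as printed (rounded)
    "Path_Neck_m": [50.527266676734286],  # T1 neck path from the Section 1 debug output
})
print(intensity_table(demo).round(4))  # Neck ≈ 0.1981 m/s
```

Comparing the result with the stored `Intensity_<Region>_m_per_s` columns (for example with `numpy.allclose`) would flag any drift between path, duration and intensity; it also avoids the row-by-row `iterrows` loop.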

In [7]:
# ============================================================
# INTENSITY INDEX (Phase 3)
# ============================================================

intensity_cols = [
    "Intensity_Neck_m_per_s", "Intensity_Shoulders_m_per_s", "Intensity_Elbows_m_per_s", "Intensity_Wrists_m_per_s",
    "Intensity_Spine_m_per_s", "Intensity_Hips_m_per_s", "Intensity_Knees_m_per_s", "Intensity_Ankles_m_per_s"
]

if all(col in df_engineering.columns for col in intensity_cols):
    print("="*80)
    print("INTENSITY INDEX (meters per second)")
    print("="*80)
    print("\nIntensity = Path Length / Duration")
    print("  → Normalized measure of movement activity")
    print("  → Allows comparison between sessions of different durations")
    print("  → Units: m/s (average velocity-like measure)")
    print("\n" + "="*80)
    
    # Display intensity by anatomical region
    intensity_data = []
    for _, row in df_engineering.iterrows():
        regions = {
            "Run_ID": row["Run_ID"],
            "Duration_sec": row.get("Duration_sec", 0),
            "Neck": row["Intensity_Neck_m_per_s"],
            "Shoulders": row["Intensity_Shoulders_m_per_s"],
            "Elbows": row["Intensity_Elbows_m_per_s"],
            "Wrists": row["Intensity_Wrists_m_per_s"],
            "Spine": row["Intensity_Spine_m_per_s"],
            "Hips": row["Intensity_Hips_m_per_s"],
            "Knees": row["Intensity_Knees_m_per_s"],
            "Ankles": row["Intensity_Ankles_m_per_s"],
        }
        intensity_data.append(regions)
    
    df_intensity = pd.DataFrame(intensity_data)
    
    print("\nIntensity by Anatomical Region (m/s):")
    with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None, 'display.precision', 4):
        print(df_intensity.to_string(index=False))
    
    # Summary statistics
    print("\n" + "="*80)
    print("INTENSITY SUMMARY (across all runs)")
    print("="*80)
    
    intensity_summary = df_intensity.drop(columns=["Run_ID", "Duration_sec"]).describe().loc[["mean", "min", "max"]]
    print(intensity_summary.to_string())
    
    # Most intense regions
    mean_intensity = df_intensity.drop(columns=["Run_ID", "Duration_sec"]).mean().sort_values(ascending=False)
    print("\n" + "="*80)
    print("MOST INTENSE REGIONS (ranked by average intensity)")
    print("="*80)
    for i, (region, value) in enumerate(mean_intensity.items(), 1):
        print(f"  {i}. {region:12s}: {value:.4f} m/s")
    
    # Interpretation guide
    print("\n" + "="*80)
    print("INTERPRETATION GUIDE")
    print("="*80)
    print("\nIntensity Index represents average speed of movement:")
    print("  • 0.10 - 0.30 m/s : Slow, controlled movements")
    print("  • 0.30 - 0.60 m/s : Moderate movement speed")
    print("  • 0.60 - 1.00 m/s : Fast, dynamic movements")
    print("  • >1.00 m/s       : Very fast movements (e.g., sports, rapid reaching)")
    print("\nHigher intensity = more active movement per unit time")
else:
    print("⚠️ Intensity Index columns not found. Re-run notebook 06 with Phase 3 updates.")

INTENSITY INDEX (meters per second)

Intensity = Path Length / Duration
  → Normalized measure of movement activity
  → Allows comparison between sessions of different durations
  → Units: m/s (average velocity-like measure)


Intensity by Anatomical Region (m/s):
                                      Run_ID  Duration_sec   Neck  Shoulders  Elbows  Wrists  Spine   Hips  Knees  Ankles
671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001        255.03 0.1981     0.1854  0.2380  0.4604 0.1115 0.0478 0.1817  0.2070
671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005        252.96 0.1973     0.1817  0.2232  0.4774 0.1041 0.0579 0.1932  0.2258
671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000        263.94 0.1996     0.1818  0.2258  0.4337 0.1530 0.0478 0.1684  0.2063

INTENSITY SUMMARY (across all runs)
          Neck  Shoulders  Elbows    Wrists     Spine      Hips   Knees    Ankles
mean  0.198333   0.182967  0.2290  0.457167  0.122867  0.051167  0.1811  0.213033
min   0.197300   0.181700  0.2232  0.433

## SECTION 12: Cross-Session Analysis (Phase 4)

**Multi-session comparison and subject-level insights**
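The anomaly check in the cell below standardises each metric with `values.std()`, which in NumPy is the population standard deviation (ddof=0). For such Z-scores, the largest possible |Z| among n values is √(n−1), reached when all values but one are equal. With three sessions that ceiling is √2 ≈ 1.41, so the `|Z| > 2` rule cannot flag anything until a subject has at least six sessions. The sketch below, using a hypothetical helper `session_stats`, reproduces the cell's mean/std/CV%/Z arithmetic and shows the ceiling numerically.

```python
import numpy as np


def session_stats(values, ddof=0):
    """Mean, std, CV% and Z-scores of one metric across a subject's sessions.

    ddof=0 reproduces values.std() as used in the cross-session cell;
    pass ddof=1 for the sample standard deviation instead.
    """
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std(ddof=ddof)
    cv_pct = 100.0 * std / mean if mean > 0 else 0.0
    z = (v - mean) / std if std > 0 else np.zeros_like(v)
    return mean, std, cv_pct, z


# Most extreme layout for n sessions: n - 1 identical values and one outlier
z3 = session_stats([0.0, 0.0, 1.0])[3]
z6 = session_stats([0.0] * 5 + [1.0])[3]
print(np.abs(z3).max())  # sqrt(2) ≈ 1.414: |Z| > 2 is unreachable with 3 sessions
print(np.abs(z6).max())  # sqrt(5) ≈ 2.236: first n at which |Z| > 2 can occur
```

With few sessions, the CV% and range lines are therefore the more informative cross-session summaries.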

In [8]:
# ============================================================
# PHASE 4: CROSS-SESSION ANALYSIS
# ============================================================

if len(df_engineering) > 1:
    print("="*80)
    print("CROSS-SESSION ANALYSIS")
    print("="*80)
    print(f"\nAnalyzing {len(df_engineering)} sessions across {df_engineering['Subject_ID'].nunique()} subject(s)")
    
    # Group by subject
    subjects = df_engineering.groupby('Subject_ID')
    
    print("\n" + "="*80)
    print("SUBJECT-LEVEL SUMMARY")
    print("="*80)
    
    for subject_id, subject_data in subjects:
        n_sessions = len(subject_data)
        print(f"\n{'='*80}")
        print(f"Subject: {subject_id} ({n_sessions} sessions)")
        print(f"{'='*80}")
        
        # Session list
        print("\nSessions:")
        for idx, row in subject_data.iterrows():
            print(f"  • {row['Session_ID']}: {row['Duration_sec']:.1f}s, {row['Total_Frames']} frames")
        
        # Key metrics comparison
        key_metrics = [
            "Path_Length_Total_m",
            "Intensity_Mean_m_per_s",
            "Bilateral_Symmetry_Mean",
            "Raw_Missing_Data_Percent",
            "Bone_Length_CV_Percent",
        ]
        
        print("\n" + "-"*80)
        print("Key Metrics Across Sessions:")
        print("-"*80)
        
        for metric in key_metrics:
            if metric in subject_data.columns:
                values = subject_data[metric].values
                mean_val = values.mean()
                std_val = values.std()
                min_val = values.min()
                max_val = values.max()
                
                # Coefficient of variation (CV%)
                cv_pct = (std_val / mean_val * 100) if mean_val > 0 else 0
                
                print(f"\n{metric}:")
                print(f"  Mean ± Std: {mean_val:.4f} ± {std_val:.4f}")
                print(f"  Range: [{min_val:.4f}, {max_val:.4f}]")
                print(f"  CV%: {cv_pct:.2f}%")
                
                # Flag high variability
                if cv_pct > 25:
                    print(f"  ⚠️ HIGH VARIABILITY (CV > 25%)")
        
        # Movement patterns
        print("\n" + "-"*80)
        print("Movement Patterns (Anatomical Regions):")
        print("-"*80)
        
        region_cols = [
            "Intensity_Wrists_m_per_s",
            "Intensity_Elbows_m_per_s",
            "Intensity_Knees_m_per_s",
            "Intensity_Ankles_m_per_s",
        ]
        
        for col in region_cols:
            if col in subject_data.columns:
                region_name = col.replace("Intensity_", "").replace("_m_per_s", "")
                values = subject_data[col].values
                print(f"  {region_name:12s}: {values.mean():.4f} m/s (σ={values.std():.4f})")
    
    # Anomaly Detection
    print("\n" + "="*80)
    print("ANOMALY DETECTION")
    print("="*80)
    print("\nIdentifying sessions that deviate from subject baseline:")
    
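    for subject_id, subject_data in subjects:
        # With population std (np.std, ddof=0) the largest possible |Z| among n sessions
        # is sqrt(n - 1), so the |Z| > 2 rule below cannot fire for fewer than 6 sessions.
        if len(subject_data) < 6:
            print(f"\n{subject_id}: Need ≥6 sessions for |Z| > 2 anomaly detection "
                  f"(max possible |Z| here = {np.sqrt(len(subject_data) - 1):.2f})")
            continue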
        
        print(f"\n{subject_id}:")
        
        # Check key metrics for outliers (Z-score method)
        anomaly_metrics = [
            "Path_Length_Total_m",
            "Intensity_Mean_m_per_s",
            "Raw_Missing_Data_Percent",
        ]
        
        anomalies_found = False
        
        for metric in anomaly_metrics:
            if metric in subject_data.columns:
                values = subject_data[metric].values
                mean = values.mean()
                std = values.std()
                
                if std > 0:
                    z_scores = (values - mean) / std
                    
                    # Flag outliers (|Z| > 2)
                    outlier_mask = np.abs(z_scores) > 2
                    
                    if outlier_mask.any():
                        anomalies_found = True
                        outlier_sessions = subject_data[outlier_mask]['Session_ID'].values
                        outlier_values = values[outlier_mask]
                        outlier_z = z_scores[outlier_mask]
                        
                        print(f"\n  ⚠️ {metric}:")
                        for session, value, z in zip(outlier_sessions, outlier_values, outlier_z):
                            direction = "above" if z > 0 else "below"
                            print(f"      {session}: {value:.4f} (Z={z:.2f}, {direction} baseline)")
        
        if not anomalies_found:
            print("  ✅ No significant anomalies detected")
    
    # Trend Analysis (if sessions are ordered by time)
    print("\n" + "="*80)
    print("TREND ANALYSIS")
    print("="*80)
    
    for subject_id, subject_data in subjects:
        if len(subject_data) < 3:
            print(f"\n{subject_id}: Need ≥3 sessions for trend analysis")
            continue
        
        print(f"\n{subject_id}:")
        
        # Order sessions by Session_ID (plain string sort: 'T10' would sort before 'T2')
        subject_data_sorted = subject_data.sort_values('Session_ID')
        
        trend_metrics = [
            ("Path_Length_Total_m", "Total Movement"),
            ("Intensity_Mean_m_per_s", "Movement Intensity"),
            ("Bilateral_Symmetry_Mean", "Symmetry"),
        ]
        
        for metric, label in trend_metrics:
            if metric in subject_data_sorted.columns:
                values = subject_data_sorted[metric].values
                
                # Simple linear trend (correlation with session order)
                session_order = np.arange(len(values))
                correlation = np.corrcoef(session_order, values)[0, 1]
                
                # Trend direction
                if correlation > 0.5:
                    trend = "üìà INCREASING"
                elif correlation < -0.5:
                    trend = "üìâ DECREASING"
                else:
                    trend = "‚û°Ô∏è STABLE"
                
                print(f"  {label:20s}: {trend} (r={correlation:.3f})")

else:
    print("="*80)
    print("CROSS-SESSION ANALYSIS")
    print("="*80)
    print(f"\n⚠️ Only {len(df_engineering)} session(s) available.")
    print("Cross-session analysis requires multiple sessions.")
    print("\nTo enable:")
    print("  1. Process multiple sessions through the pipeline (notebooks 01-06)")
    print("  2. Re-run this notebook")
    print(f"\nCurrent session: {df_engineering['Run_ID'].iloc[0] if len(df_engineering) > 0 else 'N/A'}")

CROSS-SESSION ANALYSIS

Analyzing 3 sessions across 1 subject(s)

SUBJECT-LEVEL SUMMARY

Subject: 671 (3 sessions)

Sessions:
  • T1: 255.0s, 30604 frames
  • T2: 253.0s, 30356 frames
  • T3: 263.9s, 31674 frames

--------------------------------------------------------------------------------
Key Metrics Across Sessions:
--------------------------------------------------------------------------------

Path_Length_Total_m:
  Mean ± Std: 1092.3367 ± 9.5772
  Range: [1082.4100, 1105.2800]
  CV%: 0.88%

Intensity_Mean_m_per_s:
  Mean ± Std: 0.2235 ± 0.0059
  Range: [0.2158, 0.2300]
  CV%: 2.62%

Bilateral_Symmetry_Mean:
  Mean ± Std: 0.9083 ± 0.0236
  Range: [0.8860, 0.9410]
  CV%: 2.60%

Raw_Missing_Data_Percent:
  Mean ± Std: 0.0000 ± 0.0000
  Range: [0.0000, 0.0000]
  CV%: 0.00%

Bone_Length_CV_Percent:
  Mean ± Std: 0.8603 ± 0.4822
  Range: [0.4630, 1.5390]
  CV%: 56.05%
  ⚠️ HIGH VARIABILITY (CV > 25%)

-------------------------------------------------------------
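The Std and CV% figures above match a population standard deviation (NumPy's default `ddof=0`): the three Path_Length_Total_m values (1089.32, 1105.28, 1082.41 m) give σ ≈ 9.577 m, whereas pandas' default `Series.std()` (`ddof=1`) would give ≈ 11.73 m. The sketch below reproduces those statistics and the |r| > 0.5 trend rule from the TREND ANALYSIS block. `cross_session_summary` and `session_trend` are hypothetical helper names, not functions from this pipeline. With only three sessions, r is descriptive only: the two-sided p < 0.05 critical value of Pearson's r for n = 3 is about 0.997.

```python
import numpy as np

def cross_session_summary(values):
    """Mean, population std (ddof=0) and CV% of one metric across sessions."""
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    std = v.std(ddof=0)  # population std; pandas Series.std() defaults to ddof=1
    if std == 0.0:
        cv_percent = 0.0            # identical values: no variability
    elif mean == 0.0:
        cv_percent = float("nan")   # CV is undefined for a zero mean
    else:
        cv_percent = 100.0 * std / abs(mean)
    return mean, std, cv_percent

def session_trend(values, threshold=0.5):
    """Label a metric by Pearson r between session index and value."""
    v = np.asarray(values, dtype=float)
    if v.size < 3 or v.max() == v.min():
        # Too few sessions, or a constant metric (np.corrcoef would return NaN)
        return "STABLE", 0.0
    r = float(np.corrcoef(np.arange(v.size), v)[0, 1])
    if r > threshold:
        return "INCREASING", r
    if r < -threshold:
        return "DECREASING", r
    return "STABLE", r

# Path_Length_Total_m for T1, T2, T3, copied from the Section 8 table
path_m = [1089.32, 1105.28, 1082.41]
mean, std, cv = cross_session_summary(path_m)
print(f"Mean ± Std: {mean:.4f} ± {std:.4f}, CV%: {cv:.2f}%")
print(session_trend(path_m))
```

Guarding the constant case matters for columns such as Raw_Missing_Data_Percent, which is 0.0 in every session here.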

## SECTION 13: Subject Profiles Export (Phase 4)

Export aggregated subject-level profiles for longitudinal analysis.

In [9]:
# ============================================================
# SUBJECT PROFILES EXPORT (Phase 4)
# ============================================================

if len(df_engineering) > 0:
    print("="*80)
    print("GENERATING SUBJECT PROFILES")
    print("="*80)
    
    # Build subject profiles
    subject_profiles = []
    
    for subject_id in df_engineering['Subject_ID'].unique():
        profile = build_subject_profile(df_engineering, subject_id)
        subject_profiles.append(profile)
    
    print(f"\nGenerated {len(subject_profiles)} subject profile(s)")
    
    # Create output directory and timestamp
    REPORTS_DIR = os.path.join(PROJECT_ROOT, "reports")
    os.makedirs(REPORTS_DIR, exist_ok=True)
    timestamp_str = datetime.now().strftime('%Y%m%d_%H%M%S')
    
    # Save to JSON
    subject_profiles_path = os.path.join(REPORTS_DIR, f"Subject_Profiles_{timestamp_str}.json")
    
    with open(subject_profiles_path, 'w', encoding='utf-8') as f:
        json.dump(subject_profiles, f, indent=2)
    
    print(f"✅ Saved: {subject_profiles_path}")
    
    # Display summary
    print("\n" + "="*80)
    print("SUBJECT PROFILES SUMMARY")
    print("="*80)
    
    for profile in subject_profiles:
        subject_id = profile.get('subject_id', 'Unknown')
        n_sessions = profile.get('n_sessions', 0)
        
        print(f"\n{'-'*80}")
        print(f"Subject: {subject_id} ({n_sessions} sessions)")
        print(f"{'-'*80}")
        
        # Key metrics
        if 'Path_Length_Total_m_mean' in profile:
            print(f"\nMovement:")
            print(f"  Path Length:  {profile['Path_Length_Total_m_mean']:.2f}m (σ={profile['Path_Length_Total_m_std']:.2f})")
            print(f"  Intensity:    {profile['Intensity_Mean_m_per_s_mean']:.4f} m/s (σ={profile['Intensity_Mean_m_per_s_std']:.4f})")
        
        if 'Bilateral_Symmetry_Mean_mean' in profile:
            print(f"\nSymmetry:")
            print(f"  Mean Index:   {profile['Bilateral_Symmetry_Mean_mean']:.3f} (σ={profile['Bilateral_Symmetry_Mean_std']:.3f})")
        
        if 'Raw_Missing_Data_Percent_mean' in profile:
            print(f"\nData Quality:")
            print(f"  Missing Data: {profile['Raw_Missing_Data_Percent_mean']:.2f}% (σ={profile['Raw_Missing_Data_Percent_std']:.2f}%)")
            print(f"  Bone CV%:     {profile['Bone_Length_CV_Percent_mean']:.3f}% (σ={profile['Bone_Length_CV_Percent_std']:.3f}%)")
        
        # Consistency assessment
        if 'consistency_assessment' in profile:
            print(f"\nConsistency:")
            for metric, assessment in profile['consistency_assessment'].items():
                metric_short = metric.replace("_m", "").replace("_Percent", "").replace("_", " ")
                print(f"  {metric_short:30s}: {assessment}")
        
        # Movement signature
        if 'movement_signature' in profile:
            print(f"\nMovement Pattern (Intensity by Region):")
            for region, stats in sorted(profile['movement_signature'].items(), key=lambda x: x[1]['mean'], reverse=True):
                print(f"  {region:12s}: {stats['mean']:.4f} m/s (σ={stats['std']:.4f})")
    
    print("\n" + "="*80)
    print(f"Subject profiles saved to: {subject_profiles_path}")
    print("="*80)

else:
    print("⚠️ No sessions available for subject profile generation")

GENERATING SUBJECT PROFILES

Generated 1 subject profile(s)
✅ Saved: c:\Users\drorh\OneDrive - Mobileye\Desktop\gaga\reports\Subject_Profiles_20260215_130904.json

SUBJECT PROFILES SUMMARY

--------------------------------------------------------------------------------
Subject: 671 (3 sessions)
--------------------------------------------------------------------------------

Movement:
  Path Length:  1092.34m (σ=9.58)
  Intensity:    0.2235 m/s (σ=0.0059)

Symmetry:
  Mean Index:   0.908 (σ=0.024)

Data Quality:
  Missing Data: 0.00% (σ=0.00%)
  Bone CV%:     0.860% (σ=0.482%)

Consistency:
  Duration sec                  : VERY_CONSISTENT
  Path Length Total             : VERY_CONSISTENT
  Intensity Mean per s          : VERY_CONSISTENT
  Bilateral Symmetry Mean       : VERY_CONSISTENT
  Raw Missing Data              : VERY_CONSISTENT
  Bone Length CV                : HIGHLY_VARIABLE

Movement Pattern (Intensity by Region):
  Wrists      : 0.4572 m/s (σ=0.0220)
  Elbows      
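`build_subject_profile` is defined elsewhere in the notebook. Judging from the keys printed above (`Path_Length_Total_m_mean`, `Path_Length_Total_m_std`, `n_sessions`), it stores a mean and a standard deviation per metric, and the σ values shown match a population standard deviation. A minimal stand-in for just that part, assuming `ddof=0`; `sketch_subject_profile` is a hypothetical name, and it skips `consistency_assessment` and `movement_signature`, whose rules are not shown here:

```python
import json
import pandas as pd

def sketch_subject_profile(df, subject_id, metrics):
    """Hypothetical stand-in for build_subject_profile: per-metric mean/std keys only."""
    rows = df[df["Subject_ID"] == subject_id]
    profile = {"subject_id": str(subject_id), "n_sessions": int(len(rows))}
    for metric in metrics:
        if metric not in rows.columns:
            continue
        values = rows[metric].to_numpy(dtype=float)
        # Cast to built-in float so json.dump never sees NumPy scalar types
        profile[f"{metric}_mean"] = float(values.mean())
        profile[f"{metric}_std"] = float(values.std(ddof=0))
    return profile

df_demo = pd.DataFrame({
    "Subject_ID": ["671", "671", "671"],
    "Path_Length_Total_m": [1089.32, 1105.28, 1082.41],
    "Intensity_Mean_m_per_s": [0.2248, 0.2300, 0.2158],
})
profile = sketch_subject_profile(
    df_demo, "671", ["Path_Length_Total_m", "Intensity_Mean_m_per_s"]
)
print(json.dumps(profile, indent=2))
```

Casting to built-in `float` and `int` keeps the profile JSON-serializable: `json.dump` rejects NumPy integer scalars such as `np.int64`.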

In [10]:
print_section_header("CAPTURE BASELINE PROFILE (RAW STATE)")

cols_baseline = [
    'Run_ID',
    'Total_Frames',
    'Duration_sec',
    'Native_Sampling_Rate_Hz',
    'Raw_Missing_Data_Percent',
    'OptiTrack_System_Error_mm',
    'True_Raw_SNR_Mean_dB',
    'True_Raw_SNR_Min_dB',
    'True_Raw_SNR_Max_dB',
    'SNR_Joints_Excellent_Count',
    'SNR_Joints_Failed_Count'
]

display(df_engineering[cols_baseline])

print("\n" + "="*80)
print("BASELINE SUMMARY STATISTICS")
print("="*80)

print(f"\nRecording Duration:")
print(f"  Total: {df_engineering['Duration_sec'].sum():.1f} seconds ({df_engineering['Duration_sec'].sum()/60:.1f} minutes)")
print(f"  Mean: {df_engineering['Duration_sec'].mean():.1f} seconds")
print(f"  Range: {df_engineering['Duration_sec'].min():.1f} - {df_engineering['Duration_sec'].max():.1f} seconds")

print(f"\nRaw Data Completeness:")
pristine_count = (df_engineering['Raw_Missing_Data_Percent'] == 0).sum()
print(f"  Pristine (0% missing): {pristine_count}/{len(df_engineering)} recordings")
print(f"  Mean Missing: {df_engineering['Raw_Missing_Data_Percent'].mean():.3f}%")
print(f"  Max Missing: {df_engineering['Raw_Missing_Data_Percent'].max():.3f}%")

print(f"\nInherent Signal Quality (Pre-Processing SNR):")
print(f"  Mean SNR: {df_engineering['True_Raw_SNR_Mean_dB'].mean():.1f} dB")
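# The Min/Max SNR columns appear to be per-joint extremes within each recording, so the
# next two lines report the best and worst single joint across all recordings.
print(f"  Best Recording: {df_engineering['True_Raw_SNR_Max_dB'].max():.1f} dB")
print(f"  Worst Recording: {df_engineering['True_Raw_SNR_Min_dB'].min():.1f} dB")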

# Interpret SNR levels
mean_snr = df_engineering['True_Raw_SNR_Mean_dB'].mean()
if mean_snr >= 30:
    snr_interpretation = "EXCELLENT - Publication-quality capture"
elif mean_snr >= 20:
    snr_interpretation = "GOOD - Acceptable for research"
elif mean_snr >= 15:
    snr_interpretation = "ACCEPTABLE - Review recommended"
else:
    snr_interpretation = "POOR - Check capture environment"

print(f"  Interpretation: {snr_interpretation}")

print(f"\nOptiTrack System Calibration:")
print(f"  Mean Error: {df_engineering['OptiTrack_System_Error_mm'].mean():.3f} mm")
print(f"  Max Error: {df_engineering['OptiTrack_System_Error_mm'].max():.3f} mm")

CAPTURE BASELINE PROFILE (RAW STATE)


| | Run_ID | Total_Frames | Duration_sec | Native_Sampling_Rate_Hz | Raw_Missing_Data_Percent | OptiTrack_System_Error_mm | True_Raw_SNR_Mean_dB | True_Raw_SNR_Min_dB | True_Raw_SNR_Max_dB | SNR_Joints_Excellent_Count | SNR_Joints_Failed_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 30604 | 255.03 | 120.005 | 0.0 | 0.0 | 44.1 | 38.7 | 52.6 | 19 | 0 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 30356 | 252.96 | 120.005 | 0.0 | 0.0 | 47.8 | 45.7 | 51.0 | 19 | 0 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 31674 | 263.94 | 120.005 | 0.0 | 0.0 | 47.3 | 43.7 | 51.8 | 19 | 0 |



BASELINE SUMMARY STATISTICS

Recording Duration:
  Total: 771.9 seconds (12.9 minutes)
  Mean: 257.3 seconds
  Range: 253.0 - 263.9 seconds

Raw Data Completeness:
  Pristine (0% missing): 3/3 recordings
  Mean Missing: 0.000%
  Max Missing: 0.000%

Inherent Signal Quality (Pre-Processing SNR):
  Mean SNR: 46.4 dB
  Best Recording: 52.6 dB
  Worst Recording: 38.7 dB
  Interpretation: EXCELLENT - Publication-quality capture

OptiTrack System Calibration:
  Mean Error: 0.000 mm
  Max Error: 0.000 mm


---

<a id="structure"></a>
## 5. Structural Integrity

**Purpose:** Verify skeleton hierarchy and biomechanical stability.

This section documents:
- Skeleton completeness (all expected segments present?)
- Bone length stability (CV% - rigid body assumption validity)
- Subject anthropometry (height, mass)
- Static pose calibration offsets

In [11]:
print_section_header("STRUCTURAL INTEGRITY PROFILE")

cols_structure = [
    'Run_ID',
    'Skeleton_Segments_Found',
    'Skeleton_Segments_Missing',
    'Bone_Length_CV_Percent',
    'Worst_Bone_Segment',
    'Subject_Height_cm',
    'Subject_Mass_kg',
    'Left_Arm_Offset_deg',
    'Right_Arm_Offset_deg'
]

display(df_engineering[cols_structure])

print("\n" + "="*80)
print("SKELETON STABILITY ANALYSIS")
print("="*80)

print(f"\nBone Length Coefficient of Variation:")
print(f"  Mean CV: {df_engineering['Bone_Length_CV_Percent'].mean():.4f}%")
print(f"  Range: {df_engineering['Bone_Length_CV_Percent'].min():.4f}% - {df_engineering['Bone_Length_CV_Percent'].max():.4f}%")
print(f"\n  Interpretation (Rácz et al., 2025):")
print(f"    CV < 0.5%:  Excellent rigidity (research-grade)")
print(f"    CV 0.5-1%:  Good (acceptable soft tissue artifact)")
print(f"    CV 1-2%:    Marginal (review recommended)")
print(f"    CV > 2%:    Poor (tracking or marker placement issue)")

print(f"\nWorst Bone Segments (Most Variable):")
worst_bones = df_engineering.groupby('Worst_Bone_Segment')['Bone_Length_CV_Percent'].agg(['count', 'mean']).sort_values('count', ascending=False)
for bone, stats in worst_bones.head(5).iterrows():
    print(f"  {bone}: {int(stats['count'])} recordings, Mean CV = {stats['mean']:.4f}%")

print(f"\nAnthropometry:")
print(f"  Mean Height: {df_engineering['Subject_Height_cm'].mean():.1f} cm")
print(f"  Height Range: {df_engineering['Subject_Height_cm'].min():.1f} - {df_engineering['Subject_Height_cm'].max():.1f} cm")
valid_mass = df_engineering['Subject_Mass_kg'][df_engineering['Subject_Mass_kg'] > 0]
if len(valid_mass) > 0:
    print(f"  Mean Mass: {valid_mass.mean():.1f} kg")

print(f"\nStatic Pose Calibration Offsets:")
print(f"  Left Arm Mean: {df_engineering['Left_Arm_Offset_deg'].mean():.2f}°")
print(f"  Right Arm Mean: {df_engineering['Right_Arm_Offset_deg'].mean():.2f}°")
print(f"  Max Bilateral Asymmetry: {abs(df_engineering['Left_Arm_Offset_deg'] - df_engineering['Right_Arm_Offset_deg']).max():.2f}¬∞")

STRUCTURAL INTEGRITY PROFILE


| | Run_ID | Skeleton_Segments_Found | Skeleton_Segments_Missing | Bone_Length_CV_Percent | Worst_Bone_Segment | Subject_Height_cm | Subject_Mass_kg | Left_Arm_Offset_deg | Right_Arm_Offset_deg |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 51 | 0 | 0.463 | Hips->Spine | 152.66 | 0.0 | 12.65 | 11.64 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 51 | 0 | 0.579 | Hips->Spine | 149.98 | 0.0 | 10.79 | 9.69 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 51 | 0 | 1.539 | Hips->Spine | 153.68 | 0.0 | 8.01 | 4.82 |



SKELETON STABILITY ANALYSIS

Bone Length Coefficient of Variation:
  Mean CV: 0.8603%
  Range: 0.4630% - 1.5390%

  Interpretation (Rácz et al., 2025):
    CV < 0.5%:  Excellent rigidity (research-grade)
    CV 0.5-1%:  Good (acceptable soft tissue artifact)
    CV 1-2%:    Marginal (review recommended)
    CV > 2%:    Poor (tracking or marker placement issue)

Worst Bone Segments (Most Variable):
  Hips->Spine: 3 recordings, Mean CV = 0.8603%

Anthropometry:
  Mean Height: 152.1 cm
  Height Range: 150.0 - 153.7 cm

Static Pose Calibration Offsets:
  Left Arm Mean: 10.48°
  Right Arm Mean: 8.72°
  Max Bilateral Asymmetry: 3.19°


In [12]:
# Display selected segments for one representative run
if len(runs_data) > 0:
    representative_run = list(runs_data.keys())[0]
    selected_segments = extract_selected_segments(runs_data[representative_run])
    
    print("\n" + "="*80)
    print(f"SELECTED KINEMATIC SEGMENTS ({len(selected_segments)} Joints)")
    print("="*80)
    print(f"\nThese segments are used for rotation analysis and kinematic computation:")
    print(f"\nRepresentative Run: {representative_run[:60]}...\n")
    
    # Group by body region
    regions = {
        "Trunk": [s for s in selected_segments if s in ['Hips', 'Spine', 'Spine1', 'Neck', 'Head']],
        "Left Upper Limb": [s for s in selected_segments if s in ['LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand']],
        "Right Upper Limb": [s for s in selected_segments if s in ['RightShoulder', 'RightArm', 'RightForeArm', 'RightHand']],
        "Left Lower Limb": [s for s in selected_segments if s in ['LeftUpLeg', 'LeftLeg', 'LeftFoot']],
        "Right Lower Limb": [s for s in selected_segments if s in ['RightUpLeg', 'RightLeg', 'RightFoot']]
    }
    
    for region, segments in regions.items():
        if segments:
            print(f"  {region}:")
            for seg in segments:
                print(f"    - {seg}")
    
    print(f"\nTotal: {len(selected_segments)} segments")


SELECTED KINEMATIC SEGMENTS (19 Joints)

These segments are used for rotation analysis and kinematic computation:

Representative Run: 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001...

  Trunk:
    - Head
    - Hips
    - Neck
    - Spine
    - Spine1
  Left Upper Limb:
    - LeftArm
    - LeftForeArm
    - LeftHand
    - LeftShoulder
  Right Upper Limb:
    - RightArm
    - RightForeArm
    - RightHand
    - RightShoulder
  Left Lower Limb:
    - LeftFoot
    - LeftLeg
    - LeftUpLeg
  Right Lower Limb:
    - RightFoot
    - RightLeg
    - RightUpLeg

Total: 19 segments


---

<a id="signal"></a>
## 6. Signal Quality Profile

**Purpose:** TRUE RAW SNR - Capture quality assessment BEFORE any filtering.

Method: Frequency analysis of the raw data (signal band: 0.5-10 Hz, noise band: 15-50 Hz)

This measures inherent capture quality, NOT filtering effectiveness.
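The band definitions above can be turned into a single number in several ways. One plausible reading, sketched below, integrates a Welch power spectral density over each band and reports 10·log10(P_signal / P_noise). The Welch settings (`nperseg=256`), the use of one coordinate trace at a time, and the name `band_snr_db` are assumptions; the pipeline's actual estimator is not shown in this notebook. At the 120 Hz rate used here the Nyquist frequency is 60 Hz, so the 15-50 Hz noise band fits inside the spectrum.

```python
import numpy as np
from scipy.signal import welch

def band_snr_db(x, fs, signal_band=(0.5, 10.0), noise_band=(15.0, 50.0), nperseg=256):
    """SNR in dB from Welch band powers of one raw (unfiltered) coordinate trace."""
    freqs, psd = welch(np.asarray(x, dtype=float), fs=fs, nperseg=nperseg)  # units^2 / Hz
    df = freqs[1] - freqs[0]

    def band_power(lo, hi):
        in_band = (freqs >= lo) & (freqs <= hi)
        return psd[in_band].sum() * df               # rectangle-rule integral of the PSD

    return 10.0 * np.log10(band_power(*signal_band) / band_power(*noise_band))

fs = 120.0
t = np.arange(0, 30, 1 / fs)
# 2 Hz "movement" (amplitude 1) plus a 30 Hz tone (amplitude 0.01) standing in for noise:
# power ratio (1**2 / 2) / (0.01**2 / 2) = 1e4, i.e. about 40 dB
x = np.sin(2 * np.pi * 2.0 * t) + 0.01 * np.sin(2 * np.pi * 30.0 * t)
print(f"band SNR: {band_snr_db(x, fs):.1f} dB")
```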

In [13]:
print_section_header("SIGNAL QUALITY PROFILE (TRUE RAW SNR)")

cols_signal = [
    'Run_ID',
    'True_Raw_SNR_Mean_dB',
    'True_Raw_SNR_Min_dB',
    'True_Raw_SNR_Max_dB',
    'SNR_Joints_Excellent_Count',
    'SNR_Joints_Failed_Count',
    'SNR_Failed_Joint_List'
]

display(df_engineering[cols_signal])

print("\n" + "="*80)
print("SNR INTERPRETATION GUIDE")
print("="*80)
print(f"\nSNR > 30 dB:  EXCELLENT - Publication-quality, minimal noise")
print(f"SNR 20-30 dB: GOOD - Acceptable for research, moderate noise")
print(f"SNR 15-20 dB: ACCEPTABLE - Usable but review recommended")
print(f"SNR < 15 dB:  POOR - High noise, check capture environment")

# Histogram of SNR distribution
print("\n" + "="*80)
print("SNR DISTRIBUTION ACROSS RECORDINGS")
print("="*80)

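# right=False gives [15, 20), [20, 30), [30, inf), matching the >= thresholds used in the
# baseline interpretation; infinite outer edges keep values below 0 dB or above 100 dB binned.
snr_bins = pd.cut(
    df_engineering['True_Raw_SNR_Mean_dB'],
    bins=[-np.inf, 15, 20, 30, np.inf],
    labels=['POOR', 'ACCEPTABLE', 'GOOD', 'EXCELLENT'],
    right=False,
)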
snr_counts = snr_bins.value_counts().sort_index()
print(f"\n{snr_counts.to_string()}")

SIGNAL QUALITY PROFILE (TRUE RAW SNR)


| | Run_ID | True_Raw_SNR_Mean_dB | True_Raw_SNR_Min_dB | True_Raw_SNR_Max_dB | SNR_Joints_Excellent_Count | SNR_Joints_Failed_Count | SNR_Failed_Joint_List |
|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 44.1 | 38.7 | 52.6 | 19 | 0 | [] |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 47.8 | 45.7 | 51.0 | 19 | 0 | [] |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 47.3 | 43.7 | 51.8 | 19 | 0 | [] |



SNR INTERPRETATION GUIDE

SNR > 30 dB:  EXCELLENT - Publication-quality, minimal noise
SNR 20-30 dB: GOOD - Acceptable for research, moderate noise
SNR 15-20 dB: ACCEPTABLE - Usable but review recommended
SNR < 15 dB:  POOR - High noise, check capture environment

SNR DISTRIBUTION ACROSS RECORDINGS

True_Raw_SNR_Mean_dB
POOR          0
ACCEPTABLE    0
GOOD          0
EXCELLENT     3


---

<a id="processing"></a>
## 7. Processing Transparency

**Purpose:** Document exactly what was done to the raw data.

This section provides:
- Interpolation methods used
- Resampling parameters
- Filtering strategy
- 3-stage cleaning metrics
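This section's processing table reports CubicSpline for positions and SLERP for rotations. SLERP (Shoemake, 1985) interpolates along the great-circle arc between two unit quaternions, so intermediate orientations turn at constant angular speed. A minimal NumPy sketch, assuming scalar-first `(w, x, y, z)` quaternions; the pipeline may well use a library implementation (for example SciPy's `scipy.spatial.transform.Slerp`) instead.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0 = np.asarray(q0, dtype=float) / np.linalg.norm(q0)
    q1 = np.asarray(q1, dtype=float) / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:
        # q and -q encode the same rotation; flip one to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:
        # Nearly identical orientations: normalized lerp avoids dividing by sin(~0)
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))   # angle between the two quaternions
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Halfway between identity and a 90° rotation about z is a 45° rotation about z
q_id = [1.0, 0.0, 0.0, 0.0]
q_90z = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
print(np.round(slerp(q_id, q_90z, 0.5), 4))
```

Linear interpolation of raw quaternion components would also pass through the same orientations after normalization, but at a non-uniform rate, which is why SLERP is the usual choice when resampled rotations are later differentiated.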

In [14]:
print_section_header("PROCESSING TRANSPARENCY")

cols_processing = [
    'Run_ID',
    'Interpolation_Method_Positions',
    'Interpolation_Method_Rotations',
    'Resampling_Target_Hz',
    'Temporal_Grid_Std_ms',
    'Filtering_Mode',
    'Filter_Cutoff_Weighted_Avg_Hz',
    'Filter_Residual_RMS_mm'
]

display(df_engineering[cols_processing])

print("\n" + "="*80)
print("3-STAGE CLEANING METRICS")
print("="*80)

cols_cleaning = [
    'Run_ID',
    'Stage1_Total_Artifacts_Detected',
    'Stage1_Artifact_Percent',
    'Stage2_Hampel_Outliers',
    'Stage2_Hampel_Percent',
    'Stage3_Winter_Cutoff_Min_Hz',
    'Stage3_Winter_Cutoff_Max_Hz',
    'Stage3_Winter_Cutoff_Mean_Hz'
]

display(df_engineering[cols_cleaning])

print("\nProcessing Summary:")
print(f"  Stage 1 (Artifact Detection):")
print(f"    Total artifacts: {df_engineering['Stage1_Total_Artifacts_Detected'].sum()} frames")
print(f"    Mean rate: {df_engineering['Stage1_Artifact_Percent'].mean():.3f}%")

print(f"\n  Stage 2 (Hampel Filter):")
print(f"    Total outliers: {df_engineering['Stage2_Hampel_Outliers'].sum()} frames")
print(f"    Mean rate: {df_engineering['Stage2_Hampel_Percent'].mean():.3f}%")

print(f"\n  Stage 3 (Adaptive Winter):")
print(f"    Cutoff range: {df_engineering['Stage3_Winter_Cutoff_Min_Hz'].min():.1f} - {df_engineering['Stage3_Winter_Cutoff_Max_Hz'].max():.1f} Hz")
print(f"    Mean cutoff: {df_engineering['Stage3_Winter_Cutoff_Mean_Hz'].mean():.1f} Hz")

print(f"\n  Filter Residual (Price of Smoothing):")
print(f"    Mean RMS: {df_engineering['Filter_Residual_RMS_mm'].mean():.2f} mm")
print(f"    Range: {df_engineering['Filter_Residual_RMS_mm'].min():.2f} - {df_engineering['Filter_Residual_RMS_mm'].max():.2f} mm")

PROCESSING TRANSPARENCY


| | Run_ID | Interpolation_Method_Positions | Interpolation_Method_Rotations | Resampling_Target_Hz | Temporal_Grid_Std_ms | Filtering_Mode | Filter_Cutoff_Weighted_Avg_Hz | Filter_Residual_RMS_mm |
|---|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | CubicSpline | SLERP | 120.0 | 0.0 | 3_stage_pipeline | 0.0 | 0.0 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | CubicSpline | SLERP | 120.0 | 0.0 | 3_stage_pipeline | 0.0 | 0.0 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | CubicSpline | SLERP | 120.0 | 0.0 | 3_stage_pipeline | 0.0 | 0.0 |



3-STAGE CLEANING METRICS


| | Run_ID | Stage1_Total_Artifacts_Detected | Stage1_Artifact_Percent | Stage2_Hampel_Outliers | Stage2_Hampel_Percent | Stage3_Winter_Cutoff_Min_Hz | Stage3_Winter_Cutoff_Max_Hz | Stage3_Winter_Cutoff_Mean_Hz |
|---|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 7453 | 0.427 | 5416 | 0.31 | 14.5 | 15.2 | 14.69 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 8676 | 0.501 | 4270 | 0.247 | 14.5 | 15.2 | 14.66 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 9880 | 0.547 | 6156 | 0.341 | 14.5 | 15.2 | 14.87 |



Processing Summary:
  Stage 1 (Artifact Detection):
    Total artifacts: 26009 frames
    Mean rate: 0.492%

  Stage 2 (Hampel Filter):
    Total outliers: 15842 frames
    Mean rate: 0.299%

  Stage 3 (Adaptive Winter):
    Cutoff range: 14.5 - 15.2 Hz
    Mean cutoff: 14.7 Hz

  Filter Residual (Price of Smoothing):
    Mean RMS: 0.00 mm
    Range: 0.00 - 0.00 mm
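Stages 2 and 3 are named after standard techniques. The sketch below shows textbook forms of both: a Hampel identifier (rolling median ± k·1.4826·MAD) and Winter's residual analysis, which picks the lowest low-pass cutoff whose residual RMS falls to the noise level extrapolated from the high-cutoff end of the residual curve. The window size, threshold, 2nd-order Butterworth run forward and backward with `filtfilt`, and the 1-30 Hz candidate grid are all assumptions; the pipeline's "adaptive" Winter variant and its per-joint weighting are not shown in this notebook.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def hampel(x, half_window=5, n_sigmas=3.0):
    """Hampel identifier: replace points far from the rolling median with that median."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        window = x[max(0, i - half_window): i + half_window + 1]
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))   # MAD scaled to ~1 SD for Gaussian noise
        if abs(x[i] - med) > n_sigmas * mad:
            y[i] = med
            flags[i] = True
    return y, flags

def winter_cutoff(x, fs, candidates=None, order=2, tail_fraction=1 / 3):
    """Winter-style residual analysis for a zero-lag Butterworth low-pass filter."""
    x = np.asarray(x, dtype=float)
    if candidates is None:
        candidates = np.arange(1.0, 30.5, 0.5)           # assumed search grid (Hz)
    residuals = np.empty(candidates.size)
    for k, fc in enumerate(candidates):
        b, a = butter(order, fc / (fs / 2.0))            # cutoff normalized to Nyquist
        residuals[k] = np.sqrt(np.mean((x - filtfilt(b, a, x)) ** 2))
    # Fit a line to the high-cutoff tail, where the residual is mostly noise;
    # its intercept at 0 Hz estimates the noise RMS
    n_tail = max(2, int(candidates.size * tail_fraction))
    _, intercept = np.polyfit(candidates[-n_tail:], residuals[-n_tail:], 1)
    below = np.flatnonzero(residuals <= intercept)
    fc_opt = candidates[below[0]] if below.size else candidates[-1]
    return float(fc_opt), residuals

fs = 120.0
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 2.0 * t)                      # 2 Hz "movement"
x = clean + rng.normal(0.0, 0.05, t.size)                # white measurement noise
x[315] += 2.0                                            # one tracking spike near a peak
x_hampel, flags = hampel(x)
fc, residuals = winter_cutoff(x_hampel, fs)
print(f"spike flagged: {flags[315]}, Winter cutoff: {fc:.1f} Hz")
```

Running the Hampel step before the residual analysis keeps a single spike from inflating the residual curve and biasing the chosen cutoff upward.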


---

<a id="kinematics"></a>
## 8. Kinematic Extremes (Processed Output)

**Purpose:** Report the final kinematic values after all processing.

These are the peak velocities and accelerations extracted from the cleaned data.
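Peak angular velocity depends on how orientations are differentiated. One common approach, sketched here as an assumption about how the table's angular velocities could be derived, takes the relative rotation between consecutive frames, q_rel = q_t⁻¹ · q_{t+1}, converts it to an angle 2·atan2(‖v‖, |w|), and multiplies by the sampling rate. The scalar-first quaternion layout and the helper names are hypothetical, and the pipeline may instead differentiate with a Savitzky-Golay filter (cited in the references); this sketch uses plain first differences.

```python
import numpy as np

def quat_conj(q):
    """Conjugate, which is the inverse for unit quaternions; scalar-first (w, x, y, z)."""
    return np.asarray(q, dtype=float) * np.array([1.0, -1.0, -1.0, -1.0])

def quat_mul(a, b):
    """Hamilton product of quaternion arrays with shape (..., 4)."""
    aw, ax, ay, az = np.moveaxis(np.asarray(a, dtype=float), -1, 0)
    bw, bx, by, bz = np.moveaxis(np.asarray(b, dtype=float), -1, 0)
    return np.stack([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ], axis=-1)

def angular_speed_deg_s(quats, fs):
    """Angular speed (deg/s) between consecutive frames of an (n_frames, 4) series."""
    q = np.asarray(quats, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    rel = quat_mul(quat_conj(q[:-1]), q[1:])             # rotation from frame t to t+1
    vec_norm = np.linalg.norm(rel[:, 1:], axis=1)
    # atan2 is better conditioned than arccos for tiny per-frame angles;
    # abs(w) picks the shorter of the two equivalent rotations (q vs -q)
    angle_rad = 2.0 * np.arctan2(vec_norm, np.abs(rel[:, 0]))
    return np.degrees(angle_rad) * fs

fs = 120.0
n = 121
angles = np.radians(90.0) * np.arange(n) / fs            # joint spinning about z at 90 deg/s
quats = np.column_stack([np.cos(angles / 2), np.zeros(n), np.zeros(n), np.sin(angles / 2)])
speeds = angular_speed_deg_s(quats, fs)
print(f"max angular speed: {speeds.max():.2f} deg/s")
```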

In [15]:
print_section_header("KINEMATIC EXTREMES (PROCESSED OUTPUT)")

cols_kinematics = [
    'Run_ID',
    'Max_Angular_Velocity_deg_s',
    'Max_Angular_Acceleration_deg_s2',
    'Max_Linear_Velocity_mm_s',
    'Max_Linear_Acceleration_mm_s2',
    'Path_Length_Total_m',
    'Intensity_Mean_m_per_s'
]

# Only display columns that exist
cols_to_display = [col for col in cols_kinematics if col in df_engineering.columns]
display(df_engineering[cols_to_display])

print("\n" + "="*80)
print("KINEMATIC SUMMARY STATISTICS")
print("="*80)

print(f"\nAngular Velocity:")
print(f"  Peak (across all recordings): {df_engineering['Max_Angular_Velocity_deg_s'].max():.2f} deg/s")
print(f"  Mean of maxima: {df_engineering['Max_Angular_Velocity_deg_s'].mean():.2f} deg/s")
print(f"\n  Reference Values:")
print(f"    Normal movement: < 800 deg/s")
print(f"    Athletic: 800-1500 deg/s")
print(f"    Gaga dance (distal): up to 2250 deg/s")
print(f"    Tracking artifact threshold: > 2500 deg/s")

print(f"\nAngular Acceleration:")
print(f"  Peak: {df_engineering['Max_Angular_Acceleration_deg_s2'].max():.0f} deg/s²")
print(f"  Mean of maxima: {df_engineering['Max_Angular_Acceleration_deg_s2'].mean():.0f} deg/s²")
print(f"\n  Reference Values:")
print(f"    Smooth movement: < 30,000 deg/s²")
print(f"    Rapid transitions: 30,000-50,000 deg/s²")
print(f"    Extreme/impact: > 50,000 deg/s²")

print(f"\nLinear Acceleration:")
print(f"  Peak: {df_engineering['Max_Linear_Acceleration_mm_s2'].max():.0f} mm/s² ({df_engineering['Max_Linear_Acceleration_mm_s2'].max()/1000:.1f} m/s²)")
print(f"  Mean of maxima: {df_engineering['Max_Linear_Acceleration_mm_s2'].mean():.0f} mm/s² ({df_engineering['Max_Linear_Acceleration_mm_s2'].mean()/1000:.1f} m/s²)")

# Path Length Summary (using new anatomical region columns)
if 'Path_Length_Total_m' in df_engineering.columns:
    print(f"\nPath Length (Total Movement):")
    print(f"  Total: {df_engineering['Path_Length_Total_m'].sum():.1f} meters")
    print(f"  Mean per recording: {df_engineering['Path_Length_Total_m'].mean():.1f} meters")
    print(f"  Range: {df_engineering['Path_Length_Total_m'].min():.1f} - {df_engineering['Path_Length_Total_m'].max():.1f} meters")
    
    # Show most active region
    region_cols = ['Path_Neck_m', 'Path_Shoulders_m', 'Path_Elbows_m', 'Path_Wrists_m', 
                   'Path_Spine_m', 'Path_Hips_m', 'Path_Knees_m', 'Path_Ankles_m']
    if all(col in df_engineering.columns for col in region_cols):
        region_means = {col.replace('Path_', '').replace('_m', ''): df_engineering[col].mean() 
                       for col in region_cols}
        most_active = max(region_means.items(), key=lambda x: x[1])
        print(f"  Most active region: {most_active[0]} ({most_active[1]:.2f}m average)")
else:
    print(f"\n⚠️ Path Length: Not computed (missing Path_Length_Total_m column)")
    print(f"   → Re-run notebook 06_ultimate_kinematics.ipynb")

KINEMATIC EXTREMES (PROCESSED OUTPUT)


| | Run_ID | Max_Angular_Velocity_deg_s | Max_Angular_Acceleration_deg_s2 | Max_Linear_Velocity_mm_s | Max_Linear_Acceleration_mm_s2 | Path_Length_Total_m | Intensity_Mean_m_per_s |
|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 1456.67 | 34366.89 | 3400.89 | 47780.1 | 1089.32 | 0.2248 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 1311.73 | 28406.33 | 3853.08 | 46353.4 | 1105.28 | 0.23 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 1693.7 | 40472.58 | 4299.8 | 57911.68 | 1082.41 | 0.2158 |



KINEMATIC SUMMARY STATISTICS

Angular Velocity:
  Peak (across all recordings): 1693.70 deg/s
  Mean of maxima: 1487.37 deg/s

  Reference Values:
    Normal movement: < 800 deg/s
    Athletic: 800-1500 deg/s
    Gaga dance (distal): up to 2250 deg/s
    Tracking artifact threshold: > 2500 deg/s

Angular Acceleration:
  Peak: 40473 deg/s²
  Mean of maxima: 34415 deg/s²

  Reference Values:
    Smooth movement: < 30,000 deg/s²
    Rapid transitions: 30,000-50,000 deg/s²
    Extreme/impact: > 50,000 deg/s²

Linear Acceleration:
  Peak: 57912 mm/s² (57.9 m/s²)
  Mean of maxima: 50682 mm/s² (50.7 m/s²)

Path Length (Total Movement):
  Total: 3277.0 meters
  Mean per recording: 1092.3 meters
  Range: 1082.4 - 1105.3 meters
  Most active region: Wrists (117.56m average)
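The reported numbers are consistent with Intensity_Mean_m_per_s being the mean per-joint path length divided by duration, i.e. Path_Length_Total_m / (19 × Duration_sec): for T1, 1089.32 / (19 × 255.03) ≈ 0.2248 m/s, and the same holds for T2 and T3. That relationship is inferred from the tables rather than documented here, so treat the sketch below as an assumption. It sums frame-to-frame Euclidean displacements per joint, assuming positions in millimetres; `path_length_and_intensity` is a hypothetical name.

```python
import numpy as np

def path_length_and_intensity(positions_mm, duration_s):
    """Total path length (m) over all joints and mean per-joint speed (m/s).

    positions_mm: array of shape (n_frames, n_joints, 3), in millimetres.
    """
    pos = np.asarray(positions_mm, dtype=float)
    step_m = np.linalg.norm(np.diff(pos, axis=0), axis=2) / 1000.0   # (n_frames - 1, n_joints)
    per_joint_m = np.nansum(step_m, axis=0)          # NaN steps (gap frames) are skipped
    total_m = float(per_joint_m.sum())
    n_joints = pos.shape[1]
    intensity_m_per_s = total_m / (n_joints * duration_s)
    return total_m, per_joint_m, intensity_m_per_s

# Two hypothetical joints over 121 frames at 120 Hz (1 s): one moves 1 mm per frame along x
pos = np.zeros((121, 2, 3))
pos[:, 0, 0] = np.arange(121)
total_m, per_joint_m, intensity = path_length_and_intensity(pos, duration_s=1.0)
print(total_m, per_joint_m, intensity)
```

Summing raw frame-to-frame steps makes path length sensitive to residual jitter, so it is only meaningful after the filtering described in Section 7.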


---

<a id="noise"></a>
## 9. Per-Joint Noise Profile

**Purpose:** Root cause analysis for noisy segments.

This section identifies:
- Which joints have the most outlier frames?
- Is the noise localized (one joint) or systemic (whole skeleton)?
- Are the outliers sporadic glitches or sustained high-intensity movement?

In [16]:
print_section_header("PER-JOINT NOISE PROFILE")

# Extract per-joint profiles for one representative run
if len(runs_data) > 0:
    # Pick the run with highest outlier rate for demonstration
    worst_run_idx = df_engineering['Outlier_Frames_Percent'].idxmax()
    worst_run_id = df_engineering.loc[worst_run_idx, 'Run_ID']
    
    print(f"\nAnalyzing: {worst_run_id}")
    print(f"(Run with highest outlier rate: {df_engineering.loc[worst_run_idx, 'Outlier_Frames_Percent']:.3f}%)\n")
    
    # Extract per-joint profile
    joint_profile = extract_per_joint_noise_profile(runs_data[worst_run_id])
    
    if not joint_profile.empty:
        # Sort by outlier percentage
        joint_profile_sorted = joint_profile.sort_values('Outlier_Percent', ascending=False)
        
        print("Per-Joint Outlier Profile:")
        print("="*80)
        display(joint_profile_sorted)
        
        # Compute noise locality index
        locality_index = compute_noise_locality_index(joint_profile)
        
        print(f"\nNoise Locality Index: {locality_index:.2f}")
        print(f"\nInterpretation:")
        if locality_index > 5:
            print(f"  HIGH - Localized tracking issue in specific joint(s)")
            print(f"  → Check marker placement and occlusion for worst joints")
        elif locality_index > 2:
            print(f"  MEDIUM - Regional problem (e.g., one limb)")
            print(f"  → Review calibration for affected body region")
        else:
            print(f"  LOW - Systemic noise across skeleton")
            print(f"  → Check capture environment (lighting, camera calibration)")
        
        # Classification summary
        print(f"\nClassification Summary:")
        for classification, count in joint_profile['Classification'].value_counts().items():
            print(f"  {classification}: {count} joints")
    else:
        print("⚠️ No per-joint outlier data available for this run.")
else:
    print("⚠️ No runs available for analysis.")

PER-JOINT NOISE PROFILE

Analyzing: 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001
(Run with highest outlier rate: 2.872%)

Per-Joint Outlier Profile:


| | Joint | WARNING_Frames | ALERT_Frames | CRITICAL_Frames | Outlier_Percent | Classification |
|---|---|---|---|---|---|---|
| 14 | LeftHand | 830 | 49 | 0 | 2.872 | Systemic_Noise |
| 18 | RightHand | 371 | 23 | 0 | 1.287 | Systemic_Noise |
| 13 | LeftForeArm | 345 | 0 | 0 | 1.127 | Systemic_Noise |
| 17 | RightForeArm | 262 | 13 | 0 | 0.899 | Sporadic_Glitches |
| 10 | RightFoot | 13 | 0 | 0 | 0.042 | Clean |
| 6 | LeftLeg | 9 | 0 | 0 | 0.029 | Clean |
| 16 | RightArm | 0 | 0 | 0 | 0.0 | Clean |
| 15 | RightShoulder | 0 | 0 | 0 | 0.0 | Clean |
| 12 | LeftArm | 0 | 0 | 0 | 0.0 | Clean |
| 11 | LeftShoulder | 0 | 0 | 0 | 0.0 | Clean |



Noise Locality Index: 2.75

Interpretation:
  MEDIUM - Regional problem (e.g., one limb)
  → Review calibration for affected body region

Classification Summary:
  Clean: 15 joints
  Systemic_Noise: 3 joints
  Sporadic_Glitches: 1 joints
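`compute_noise_locality_index` is not shown in this notebook. One definition that reproduces the 2.75 above is the worst joint's outlier percentage divided by the mean over joints that have any outliers: 2.872 / mean(2.872, 1.287, 1.127, 0.899, 0.042, 0.029) ≈ 2.75. This is a reverse-engineered guess that matches one data point, not the function's documented formula; `noise_locality_index` below is a hypothetical name used only to illustrate it.

```python
import numpy as np

def noise_locality_index(outlier_percent):
    """Hypothetical locality index: worst joint / mean of joints with any outliers.

    High values mean one joint dominates the outliers (localized); values near 1
    mean the affected joints are about equally noisy (spread out).
    """
    p = np.asarray(outlier_percent, dtype=float)
    affected = p[p > 0]
    if affected.size == 0:
        return 0.0
    return float(affected.max() / affected.mean())

# Outlier_Percent values from the per-joint table above, padded with zeros to 19 joints
pcts = [2.872, 1.287, 1.127, 0.899, 0.042, 0.029] + [0.0] * 13
print(f"{noise_locality_index(pcts):.2f}")
```

Because zero-outlier joints are excluded from the mean, this form measures how concentrated the noise is among affected joints rather than across the whole skeleton.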


---

<a id="outliers"></a>
## 10. Outlier Distribution

**Purpose:** Frame-level outlier patterns and event classification.

This section documents:
- Total outlier frames and percentage
- Maximum consecutive outlier runs
- Tier 1/2/3 event classification (Artifact/Burst/Flow)
- Data retention after artifact exclusion
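The tier boundaries used below (1-3, 4-7 and 8+ consecutive frames) amount to run-length classification of a per-frame outlier mask. A minimal sketch, assuming one boolean mask per joint; how the pipeline builds that mask and whether it pools joints is not shown, and `outlier_runs` / `classify_events` are hypothetical names. Under this definition any Tier 3 event implies a run of at least 8 frames, so the Max_Consecutive_Outlier_Frames value of 0 reported alongside nonzero Flow_Events_Tier3 counts suggests that column is not being populated upstream.

```python
import numpy as np

def outlier_runs(mask):
    """Lengths of consecutive True runs in a 1-D boolean mask."""
    m = np.concatenate([[False], np.asarray(mask, dtype=bool), [False]])
    edges = np.flatnonzero(np.diff(m.astype(int)))   # alternating run starts / ends
    starts, ends = edges[::2], edges[1::2]
    return ends - starts

def classify_events(mask):
    """Count Tier 1 (1-3 frames), Tier 2 (4-7) and Tier 3 (8+) runs."""
    runs = outlier_runs(mask)
    return {
        "Artifact_Events_Tier1": int(np.sum(runs <= 3)),
        "Burst_Events_Tier2": int(np.sum((runs >= 4) & (runs <= 7))),
        "Flow_Events_Tier3": int(np.sum(runs >= 8)),
        "Max_Consecutive": int(runs.max()) if runs.size else 0,
    }

mask = np.zeros(40, dtype=bool)
mask[2:4] = True      # 2-frame glitch  -> Tier 1
mask[10:15] = True    # 5-frame burst   -> Tier 2
mask[20:29] = True    # 9-frame flow    -> Tier 3
print(classify_events(mask))
```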

In [17]:
print_section_header("OUTLIER DISTRIBUTION ANALYSIS")

cols_outliers = [
    'Run_ID',
    'Total_Outlier_Frames',
    'Outlier_Frames_Percent',
    'Max_Consecutive_Outlier_Frames',
    'Artifact_Events_Tier1',
    'Burst_Events_Tier2',
    'Flow_Events_Tier3',
    'Artifact_Frame_Rate_Percent'
]

display(df_engineering[cols_outliers])

print("\n" + "="*80)
print("EVENT CLASSIFICATION (3-TIER SYSTEM)")
print("="*80)

print(f"\nTier 1 - Artifacts (1-3 consecutive frames):")
print(f"  Total events: {df_engineering['Artifact_Events_Tier1'].sum()}")
print(f"  Mean per recording: {df_engineering['Artifact_Events_Tier1'].mean():.1f}")
print(f"  Frame rate: {df_engineering['Artifact_Frame_Rate_Percent'].mean():.4f}%")
print(f"  Interpretation: Sporadic tracking glitches (excluded from analysis)")

print(f"\nTier 2 - Bursts (4-7 consecutive frames):")
print(f"  Total events: {df_engineering['Burst_Events_Tier2'].sum()}")
print(f"  Mean per recording: {df_engineering['Burst_Events_Tier2'].mean():.1f}")
print(f"  Interpretation: Rapid movement transitions (preserved for Gaga analysis)")

print(f"\nTier 3 - Flows (8+ consecutive frames):")
print(f"  Total events: {df_engineering['Flow_Events_Tier3'].sum()}")
print(f"  Mean per recording: {df_engineering['Flow_Events_Tier3'].mean():.1f}")
print(f"  Interpretation: Sustained high-intensity movement (legitimate dance)")

print("\n" + "="*80)
print("DATA RETENTION AFTER ARTIFACT EXCLUSION")
print("="*80)

cols_retention = [
    'Run_ID',
    'Clean_Max_Velocity_deg_s',
    'Clean_Mean_Velocity_deg_s',
    'Velocity_Reduction_Percent',
    'Data_Retained_Percent',
    'Excluded_Frame_Count'
]

display(df_engineering[cols_retention])

print(f"\nRetention Summary:")
print(f"  Mean data retained: {df_engineering['Data_Retained_Percent'].mean():.4f}%")
print(f"  Total frames excluded: {df_engineering['Excluded_Frame_Count'].sum()}")
print(f"  Mean velocity reduction: {df_engineering['Velocity_Reduction_Percent'].mean():.2f}%")

OUTLIER DISTRIBUTION ANALYSIS


| | Run_ID | Total_Outlier_Frames | Outlier_Frames_Percent | Max_Consecutive_Outlier_Frames | Artifact_Events_Tier1 | Burst_Events_Tier2 | Flow_Events_Tier3 | Artifact_Frame_Rate_Percent |
|---|---|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 879 | 2.872 | 0 | 37 | 119 | 67 | 0.2614 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 732 | 2.411 | 0 | 40 | 83 | 34 | 0.3228 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 871 | 2.75 | 0 | 30 | 64 | 61 | 0.2021 |



EVENT CLASSIFICATION (3-TIER SYSTEM)

Tier 1 - Artifacts (1-3 consecutive frames):
  Total events: 107
  Mean per recording: 35.7
  Frame rate: 0.2621%
  Interpretation: Sporadic tracking glitches (excluded from analysis)

Tier 2 - Bursts (4-7 consecutive frames):
  Total events: 266
  Mean per recording: 88.7
  Interpretation: Rapid movement transitions (preserved for Gaga analysis)

Tier 3 - Flows (8+ consecutive frames):
  Total events: 162
  Mean per recording: 54.0
  Interpretation: Sustained high-intensity movement (legitimate dance)

DATA RETENTION AFTER ARTIFACT EXCLUSION


| | Run_ID | Clean_Max_Velocity_deg_s | Clean_Mean_Velocity_deg_s | Velocity_Reduction_Percent | Data_Retained_Percent | Excluded_Frame_Count |
|---|---|---|---|---|---|---|
| 0 | 671_T1_P1_R1_Take 2026-01-06 03.57.12 PM_001 | 799.89 | 44.47 | 45.09 | 99.7386 | 80 |
| 1 | 671_T2_P1_R1_Take 2026-01-15 04.35.25 PM_005 | 799.98 | 48.01 | 39.01 | 99.6772 | 98 |
| 2 | 671_T3_P1_R1_Take 2026-02-03 08.05.01 PM_000 | 799.66 | 40.14 | 52.79 | 99.7979 | 64 |



Retention Summary:
  Mean data retained: 99.7379%
  Total frames excluded: 242
  Mean velocity reduction: 45.63%
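The retention columns are consistent with two simple formulas: Data_Retained_Percent = 100 × (1 - Excluded_Frame_Count / Total_Frames) and Velocity_Reduction_Percent = 100 × (Max - Clean_Max) / Max, with Max taken from Max_Angular_Velocity_deg_s in Section 8. For T1 that gives 100 × (1 - 80/30604) ≈ 99.7386% and 100 × (1456.67 - 799.89) / 1456.67 ≈ 45.09%. Data_Retained_Percent also equals 100 - Artifact_Frame_Rate_Percent in every row, so the excluded frames appear to be exactly the Tier 1 artifact frames. The sketch encodes these inferred formulas; they were checked against the tables, not taken from the pipeline source, and `retention_metrics` is a hypothetical name.

```python
def retention_metrics(total_frames, excluded_frames, max_velocity, clean_max_velocity):
    """Inferred formulas for the retention table (both results in percent)."""
    retained_pct = 100.0 * (1.0 - excluded_frames / total_frames)
    reduction_pct = 100.0 * (max_velocity - clean_max_velocity) / max_velocity
    return retained_pct, reduction_pct

# T1 values from the tables in this notebook
retained, reduction = retention_metrics(30604, 80, 1456.67, 799.89)
print(f"Data retained: {retained:.4f}%  Velocity reduction: {reduction:.2f}%")
```

Because the reduction compares peak values, a large Velocity_Reduction_Percent means the excluded frames contained the recording's highest angular velocities, not that typical motion was slowed.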


---

<a id="export"></a>
## 11. Excel Export

**Output:** `reports/Engineering_Audit_YYYYMMDD_HHMMSS.xlsx`

**Sheets:**
1. Engineering_Profile - All physical measurements (no scores)
2. Methodology_Passport - Mathematical documentation

In [18]:
print_section_header("EXPORT TO EXCEL")

# Create output path
REPORTS_DIR = os.path.join(PROJECT_ROOT, "reports")
os.makedirs(REPORTS_DIR, exist_ok=True)

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
excel_path = os.path.join(REPORTS_DIR, f"Engineering_Audit_{timestamp}.xlsx")

# Export to Excel
with pd.ExcelWriter(excel_path, engine='xlsxwriter') as writer:
    workbook = writer.book
    
    # Formats
    title_fmt = workbook.add_format({
        'bold': True, 'font_size': 16, 
        'bg_color': '#2E75B6', 'font_color': 'white'
    })
    header_fmt = workbook.add_format({
        'bold': True, 'bg_color': '#4472C4', 
        'font_color': 'white', 'text_wrap': True
    })
    
    # ============================================================
    # SHEET 1: ENGINEERING PROFILE
    # ============================================================
    df_engineering.to_excel(writer, index=False, sheet_name='Engineering_Profile')
    
    ws_profile = writer.sheets['Engineering_Profile']
    # Overwrite the header cells written by to_excel so they use header_fmt
    for col_num, value in enumerate(df_engineering.columns):
        ws_profile.write(0, col_num, value, header_fmt)
    
    # Approximate auto-fit: longest cell text (or header) + 2 characters, capped at 50
    for i, col in enumerate(df_engineering.columns):
        max_len = max(df_engineering[col].astype(str).str.len().max(), len(str(col)))
        ws_profile.set_column(i, i, min(max_len + 2, 50))
    
    # ============================================================
    # SHEET 2: METHODOLOGY PASSPORT
    # ============================================================
    methodology_sheet = workbook.add_worksheet('Methodology_Passport')
    
    methodology_sheet.merge_range('A1:E1', 'METHODOLOGY PASSPORT - MATHEMATICAL DOCUMENTATION', title_fmt)
    methodology_sheet.write('A2', f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    methodology_sheet.write('A3', "Pipeline Version: v3.0_3stage_signal_cleaning")
    
    row = 5
    
    # Write the interpolation methods as structured text:
    # method name in column A, field/value pairs in columns B and C
    methodology_sheet.write(row, 0, 'INTERPOLATION METHODS', header_fmt)
    row += 1
    
    for key, method in METHODOLOGY_PASSPORT["interpolation"].items():
        methodology_sheet.write(row, 0, key.title())
        row += 1
        for field, value in method.items():
            methodology_sheet.write(row, 1, field)
            methodology_sheet.write(row, 2, str(value))
            row += 1
        row += 1
    
    methodology_sheet.set_column('A:A', 20)
    methodology_sheet.set_column('B:B', 25)
    methodology_sheet.set_column('C:E', 60)

print("\n✅ Engineering Audit Created:")
print(f"   {excel_path}")
print("\n   Sheets:")
print(f"   1. Engineering_Profile - {len(df_engineering)} recordings × {len(df_engineering.columns)} measurements")
print("   2. Methodology_Passport - Mathematical documentation")

print("\n" + "=" * 80)
print("NOTEBOOK COMPLETE")
print("=" * 80)
print(f"\nRecordings Processed: {len(df_engineering)}")
print(f"Excel Output: {excel_path}")
print("\nThis report contains ZERO synthetic scores and ZERO decision labels.")
print("All values are pure physical measurements for researcher interpretation.")

EXPORT TO EXCEL

✅ Engineering Audit Created:
   c:\Users\drorh\OneDrive - Mobileye\Desktop\gaga\reports\Engineering_Audit_20260215_130905.xlsx

   Sheets:
   1. Engineering_Profile - 3 recordings × 94 measurements
   2. Methodology_Passport - Mathematical documentation

NOTEBOOK COMPLETE

Recordings Processed: 3
Excel Output: c:\Users\drorh\OneDrive - Mobileye\Desktop\gaga\reports\Engineering_Audit_20260215_130905.xlsx

This report contains ZERO synthetic scores and ZERO decision labels.
All values are pure physical measurements for researcher interpretation.
