# Experiment 1: Baseline Transfer Learning

**Goal:** Benchmark 7 pre-trained CNN backbones using standard 2-phase transfer learning to establish baseline performance for exercise recognition from Gait Energy Images (GEIs).

## Experiment Design

### Dataset
- **Modalities:** Front-view + Side-view GEIs (merged)
- **Classes:** 15 different exercises
- **Split:** 70% training / 30% testing (subject-independent)
- **Input:** 224×224 grayscale images (converted to RGB for pretrained models)

### Training Strategy (2-Phase Transfer Learning)
**Phase 1 (Frozen Backbone):**
- Freeze all backbone layers
- Train only custom classification head
- 10 epochs, learning rate: 0.001
- Purpose: Adapt the head to our specific task

**Phase 2 (Fine-tuning):**
- Unfreeze last few layers of backbone
- Train with lower learning rate: 0.0001
- 10 epochs
- Purpose: Fine-tune features for exercise recognition
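
The two phases above can be sketched as a small schedule plus a helper that toggles the `trainable` flag that Keras layers expose. This is a minimal sketch, not the code inside `build_model_for_backbone` or `train_one_run`: the names `Phase`, `PHASES`, and `apply_phase` are hypothetical, and `n_unfrozen=20` is a placeholder because the notebook only says "last few layers".

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    learning_rate: float
    n_unfrozen: int  # number of trailing backbone layers left trainable


# Epochs and learning rates come from the strategy above;
# n_unfrozen=20 is an assumed placeholder, not a value from the notebook.
PHASES = [
    Phase("frozen_backbone", epochs=10, learning_rate=1e-3, n_unfrozen=0),
    Phase("fine_tuning", epochs=10, learning_rate=1e-4, n_unfrozen=20),
]


def apply_phase(backbone_layers, phase):
    """Freeze every backbone layer except the last `phase.n_unfrozen`."""
    first_trainable = len(backbone_layers) - phase.n_unfrozen
    for index, layer in enumerate(backbone_layers):
        layer.trainable = index >= first_trainable


# Stand-in objects for demonstration; any object with `.trainable` works.
demo_layers = [SimpleNamespace(trainable=True) for _ in range(5)]
apply_phase(demo_layers, Phase("demo", epochs=1, learning_rate=1e-4, n_unfrozen=2))
print([layer.trainable for layer in demo_layers])
# -> [False, False, False, True, True]
```

In Keras, a model has to be compiled again after `trainable` changes for the new setting to take effect, so each phase would call `compile` with its own learning rate before `fit`.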

### Backbones Tested
1. **EfficientNet-B0** - Lightweight, efficient architecture
2. **EfficientNet-B2** - Moderate capacity
3. **EfficientNet-B3** - Higher capacity
4. **ResNet50** - Classic residual network
5. **VGG16** - Deep but simple architecture
6. **MobileNet-V2** - Mobile-optimized, efficient
7. **MobileNet-V3-Large** - Latest mobile architecture

### Evaluation Protocol
- **Runs per backbone:** 10 independent runs (different random splits)
- **Metrics:** Training accuracy, test accuracy, confusion matrix
- **Statistical analysis:** Mean ± standard deviation across runs
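
As a rough illustration of the "mean ± standard deviation" summary, the sketch below computes both the population and the sample standard deviation for one backbone's test accuracies. The function name `summarize_runs` is hypothetical. The distinction matters with only 10 runs: `np.std` with default arguments (as used in the training cell) returns the population value, which is slightly smaller than the sample value.

```python
import statistics


def summarize_runs(test_accuracies):
    """Summarize per-run test accuracies for one backbone."""
    return {
        "mean": statistics.mean(test_accuracies),
        # Population std (divides by n); matches np.std(x) with default ddof=0.
        "std_population": statistics.pstdev(test_accuracies),
        # Sample std (divides by n - 1); matches np.std(x, ddof=1).
        "std_sample": statistics.stdev(test_accuracies),
        "num_runs": len(test_accuracies),
    }
```

Whichever variant is reported, it helps to state it next to the results so that numbers from different experiments stay comparable.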

---

## Setup: TensorFlow Configuration

This cell configures TensorFlow to suppress warnings and enable GPU memory growth.

**Key configurations:**
- Suppress TensorFlow/CUDA warnings for cleaner output
- Enable GPU memory growth so TensorFlow allocates GPU memory on demand instead of reserving it all at startup
- Restrict Python-side TensorFlow logging to ERROR and above

In [1]:
# CRITICAL: Run this cell FIRST before any other imports
# Suppress TensorFlow warnings at the OS level before TensorFlow loads
import os
import sys
import warnings
import io

# Set environment variables BEFORE TensorFlow is imported anywhere
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # 0=all, 1=hide INFO, 2=also hide WARNING, 3=also hide ERROR
os.environ['AUTOGRAPH_VERBOSITY'] = '0'   # Disable AutoGraph conversion warnings

# Filter Python warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)

# Suppress absl logging (used by TensorFlow internally)
try:
    from absl import logging as absl_logging
    absl_logging.set_verbosity(absl_logging.ERROR)
except ImportError:
    pass

# Redirect Python-level stderr while TensorFlow is imported to suppress remaining warnings
stderr_backup = sys.stderr
sys.stderr = io.StringIO()
try:
    # Import TensorFlow only now, after the environment variables above are set
    import tensorflow as tf
finally:
    # Restore stderr even if the import fails
    sys.stderr = stderr_backup

# Final TensorFlow logging configuration
try:
    tf.get_logger().setLevel('ERROR')
    tf.autograph.set_verbosity(0)
except Exception:
    pass

# Enable GPU memory growth
try:
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
except Exception:
    pass

print("✅ TensorFlow imported with all warnings suppressed")
print("   TF_CPP_MIN_LOG_LEVEL:", os.environ.get('TF_CPP_MIN_LOG_LEVEL'))
print("   AUTOGRAPH_VERBOSITY:", os.environ.get('AUTOGRAPH_VERBOSITY'))
print("   TensorFlow version:", tf.__version__)
print("   GPUs detected:", len(tf.config.list_physical_devices('GPU')))

   TF_CPP_MIN_LOG_LEVEL: 3
   AUTOGRAPH_VERBOSITY: 0
   TensorFlow version: 2.10.0
   GPUs detected: 1


## Import Modules

**Data modules (`src.data`):**
- `load_data` - Load GEI images from folder structure
- `split_training_testing_by_subject` - Subject-independent data splitting
- `get_subjects_identities` - List the unique subject IDs in a dataset

**Model modules (`src.models`):**
- `build_model_for_backbone` - Build transfer learning models

**Training modules (`src.scripts`):**
- `train_one_run` - Execute one complete training run (Phase 1 + Phase 2)

**Utility modules (`src.utils`):**
- `setup_results_folder_for_backbone` - Organize results by backbone
- `save_experiment_summary` - Save metrics and statistics
- `get_all_model_parameters` - Count model parameters
- `load_backbone_results_with_config` - Load saved results
- `create_comprehensive_comparison` - Generate comparison reports
- `generate_statistical_comparison` - Statistical analysis

In [2]:
# Import refactored modules
import sys
import os
# Notebooks do not define __file__; resolve the project root two levels above the working directory
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '../..')))

# Data loading and preprocessing
from src.data import load_data, split_training_testing_by_subject, get_subjects_identities

# Model building
from src.models import build_model_for_backbone

# Training experiments
from src.scripts import train_one_run

# Utilities
from src.utils import (
    set_global_seed,
    setup_results_folder_for_backbone,
    save_experiment_summary,
    get_all_model_parameters,
    load_backbone_results_with_config,
    create_comprehensive_comparison,
    generate_statistical_comparison
)

# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import logging
from tqdm import tqdm

# Access the logger
logger = logging.getLogger(__name__)

print("✅ All modules imported successfully from refactored structure")

✅ All modules imported successfully from refactored structure


## Data Loading (Front + Side Views)

**Merging front + side views**
- Provides complementary perspectives of each exercise
- Improves model generalization

**Data structure:** Each sample is a tuple `(exercise_label, gei_image, subject_id)`
- `exercise_label` (str): Exercise name (e.g., "Dumbbell shoulder press")
- `gei_image` (np.ndarray): 2D grayscale image (H×W)
- `subject_id` (str): Volunteer identifier for subject-independent splitting
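
The `subject_id` field is what makes a subject-independent split possible: subjects, not individual samples, are assigned to train or test. The sketch below shows the idea on `(label, image, subject_id)` tuples. It is an assumption about how such a split can work, not the implementation of `split_training_testing_by_subject`; the name `split_by_subject` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def split_by_subject(samples, test_ratio=0.3, seed=0):
    """Split samples so that no subject appears in both train and test."""
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample[2]].append(sample)  # index 2 holds subject_id

    subjects = sorted(by_subject)  # fixed order before shuffling, for reproducibility
    random.Random(seed).shuffle(subjects)

    # The ratio is applied to the number of subjects, not the number of samples.
    n_test = max(1, round(len(subjects) * test_ratio))
    test_subjects, train_subjects = subjects[:n_test], subjects[n_test:]

    train = [s for subj in train_subjects for s in by_subject[subj]]
    test = [s for subj in test_subjects for s in by_subject[subj]]
    return train, test
```

Because subjects contribute different numbers of samples, the resulting sample-level ratio will only approximate 70/30. Passing a different `seed` per run gives a different but reproducible split.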

In [6]:
# Folder paths - using the datasets folder in the project
front_base_folder = "D:/Graduation_Project/ai-virtual-coach/datasets/GEIs_of_rgb_front/GEIs"
side_base_folder = "D:/Graduation_Project/ai-virtual-coach/datasets/GEIs_of_rgb_side/GEIs"

# Load both datasets using refactored module
front_dataset = load_data(front_base_folder)
side_dataset = load_data(side_base_folder)

# Merge and shuffle
dataset = front_dataset + side_dataset
random.seed(42)
random.shuffle(dataset)

# Summary
print(f"Merged dataset size: {len(dataset)} (front: {len(front_dataset)}, side: {len(side_dataset)})")

if len(dataset) > 0:
    sample = dataset[0]
    print(f"Sample tuple structure: (label:str, image:np.ndarray[H,W], subject:str) -> {type(sample[0]).__name__} {sample[1].shape} {type(sample[2]).__name__}")

subjects = get_subjects_identities(dataset)
subject_count = len(subjects)
print(f'Total unique subjects: {subject_count}')
print(f'Subject preview: {subjects}')

Merged dataset size: 3142 (front: 1574, side: 1568)
Sample tuple structure: (label:str, image:np.ndarray[H,W], subject:str) -> str (1280, 720) str
Total unique subjects: 70
Subject preview: ['V3', 'V31', 'V39', 'V4', 'V46', 'V47', 'V48', 'V5', 'V50', 'Volunteer #1', 'Volunteer #10', 'Volunteer #2', 'Volunteer #3', 'Volunteer #31', 'Volunteer #38', 'Volunteer #39', 'Volunteer #4', 'Volunteer #40', 'Volunteer #41', 'Volunteer #42', 'Volunteer #43', 'Volunteer #44', 'Volunteer #45', 'Volunteer #46', 'Volunteer #5', 'Volunteer #50', 'Volunteer #6', 'Volunteer #7', 'Volunteer #8', 'Volunteer #9', 'v1', 'v10', 'v11', 'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18', 'v19', 'v2', 'v20', 'v21', 'v22', 'v23', 'v24', 'v25', 'v26', 'v27', 'v28', 'v29', 'v3', 'v30', 'v31', 'v32', 'v33', 'v34', 'v35', 'v36', 'v39', 'v4', 'v46', 'v49', 'v5', 'v50', 'v6', 'v7', 'v8', 'volunteer #9']
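
The preview above mixes several naming styles (`V3`, `v3`, `Volunteer #3`, `volunteer #9`). If IDs with the same number refer to the same person, which the source does not confirm, they would count as different subjects and the same person could land in both train and test. Under that assumption, a normalization step like the following could be applied before splitting. The function name `normalize_subject_id` and the canonical `v<number>` form are hypothetical.

```python
import re


def normalize_subject_id(raw_id):
    """Map variants such as 'Volunteer #3', 'V3', and 'v3' to 'v3'.

    Assumes the trailing number alone identifies a volunteer.
    """
    match = re.search(r"(\d+)\s*$", raw_id.strip())
    if match is None:
        raise ValueError(f"No trailing number in subject id: {raw_id!r}")
    return f"v{int(match.group(1))}"


raw_ids = ["V3", "v3", "Volunteer #3", "volunteer #9", "v10"]
print(sorted({normalize_subject_id(r) for r in raw_ids}))
# -> ['v10', 'v3', 'v9']
```

If the front-view and side-view recordings actually use separate volunteer pools, this mapping would wrongly merge distinct people, so it is worth checking against the recording metadata first.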


## Train Baseline Models (7 Backbones × 10 Runs)

## Training Execution

**What happens in each run:**
1. **Data split:** Subject-independent train/test split (70/30 ratio)
2. **Model creation:** Build transfer learning model with frozen backbone
3. **Phase 1 training:** Train classification head (10 epochs, lr=0.001)
4. **Phase 2 training:** Fine-tune backbone layers (10 epochs, lr=0.0001)
5. **Evaluation:** Test on held-out subjects
6. **Save results:** Confusion matrix, learning curves, metrics

**Expected duration:** ~10-15 minutes per backbone (depends on GPU)

**Progress tracking:**
- Outer progress bar: Backbones (7 total)
- Inner progress bar: Runs per backbone (10 runs each)
- Logs: Real-time accuracy updates

⚠️ **Note:** This cell will take several hours to complete all 7 backbones × 10 runs = 70 training runs!

In [None]:
BACKBONES_TO_TEST = [
    'efficientnet_b0',
    'efficientnet_b2',
    'efficientnet_b3',
    'resnet50',
    'vgg16',
    'mobilenet_v2',
    'mobilenet_v3_large',
]

N_RUNS = 10
TEST_RATIO = 0.3

all_backbone_summaries = {}

# Outer progress bar for backbones
for bb in tqdm(BACKBONES_TO_TEST, desc="Backbones", position=0):
    logger.info("\n" + "#"*72)
    logger.info(f"Benchmarking backbone: {bb}")
    logger.info("#"*72)
    
    # Create results folder for this backbone (using refactored utility)
    RESULTS_FOLDER, RUN_INDEX = setup_results_folder_for_backbone(bb, base_results_dir='experiments/exer_recog/results/exp_01_baseline')
    
    all_results = []
    
    # Inner progress bar for runs
    for run_idx in tqdm(range(N_RUNS), desc=f"{bb} runs", position=1, leave=False):
        logger.info(f"\n▶ [{bb}] Run {run_idx+1}/{N_RUNS} starting...")
        
        try:
            # Using refactored train_one_run from src.scripts.experiment_1
            res = train_one_run(
                run_idx=run_idx,
                dataset=dataset,
                test_ratio=TEST_RATIO,
                img_size=224,
                num_classes=15,
                results_folder=RESULTS_FOLDER,
                backbone=bb
            )
            all_results.append(res)
            logger.info(f"  ✓ Completed: train_acc={res['train_acc']:.4f}  test_acc={res['test_acc']:.4f}")
            
        except Exception as e:
            logger.error(f"  ✗ Run {run_idx+1}/{N_RUNS} failed: {e}")
            continue
    
    # Save summary for this backbone (using refactored utility)
    if all_results:
        summary_path, json_path = save_experiment_summary(all_results, RESULTS_FOLDER, RUN_INDEX, TEST_RATIO)
        
        # Compute statistics
        test_accs = [r['test_acc'] for r in all_results]
        all_backbone_summaries[bb] = {
            'mean_test_acc': np.mean(test_accs),
            'std_test_acc': np.std(test_accs),
            'min_test_acc': np.min(test_accs),
            'max_test_acc': np.max(test_accs),
            'num_runs': len(all_results)
        }
        
        logger.info(f"\n{bb} Summary:")
        logger.info(f"  Mean Test Acc: {all_backbone_summaries[bb]['mean_test_acc']:.4f} ± {all_backbone_summaries[bb]['std_test_acc']:.4f}")

# Display final comparison
logger.info("\n" + "="*80)
logger.info("FINAL BACKBONE COMPARISON")
logger.info("="*80)

comparison_df = pd.DataFrame(all_backbone_summaries).T
comparison_df = comparison_df.sort_values('mean_test_acc', ascending=False)
logger.info("\n" + comparison_df.to_string())

# Save comparison
os.makedirs('experiments/exer_recog/results/exp_01_baseline', exist_ok=True)
comparison_df.to_csv('experiments/exer_recog/results/exp_01_baseline/backbone_comparison.csv')
logger.info("\nComparison saved to: experiments/exer_recog/results/exp_01_baseline/backbone_comparison.csv")

## Comprehensive Analysis

In [None]:
print("="*80)
print("COMPREHENSIVE BACKBONE COMPARISON")
print("="*80)

# Step 1: Count model parameters (NO training needed!)
backbones_to_analyze = [
    'efficientnet_b0',
    'efficientnet_b2',
    'efficientnet_b3',
    'resnet50',
    'vgg16',
    'mobilenet_v2',
    'mobilenet_v3_large',
]

print("\nStep 1: Counting model parameters...")
params_df = get_all_model_parameters(backbones_to_analyze, img_size=224, num_classes=15)

print("\nModel Parameter Comparison:")
print("="*80)
print(params_df.to_string(index=False))
print("="*80)

# Step 2: Load training results (using refactored utility)
print("\nStep 2: Loading training results...")
backbone_results = load_backbone_results_with_config(results_base_dir='experiments/exer_recog/results/exp_01_baseline')

# Step 3: Generate comparisons (using refactored utility)
print("\nStep 3: Generating comprehensive comparison...")
os.makedirs('experiments/exer_recog/results/exp_01_baseline/comparisons', exist_ok=True)
comparison_csv = create_comprehensive_comparison(
    all_backbone_results=backbone_results,
    model_params_df=params_df,
    output_dir='experiments/exer_recog/results/exp_01_baseline/comparisons'
)

# Step 4: Statistical analysis (using refactored utility)
print("\nStep 4: Performing statistical analysis...")
stats_txt = generate_statistical_comparison(
    all_backbone_results=backbone_results,
    output_dir='experiments/exer_recog/results/exp_01_baseline/comparisons'
)

print("\n✓ Analysis complete! Results saved to: experiments/exer_recog/results/exp_01_baseline/comparisons/")

## Post-Training Analysis

This comprehensive analysis includes:

### 1. Model Parameters Count
- Total parameters (trainable + non-trainable)
- Helps understand model complexity
- No training required - just builds models and counts

### 2. Load Training Results
- Reads saved JSON files from all runs
- Aggregates metrics across runs
- Prepares data for statistical analysis

### 3. Comprehensive Comparison
- Accuracy statistics (mean, std, min, max)
- Model parameters comparison
- Performance vs complexity analysis
- Saves CSV report

### 4. Statistical Analysis
- Paired comparisons between backbones
- Confidence intervals
- Statistical significance testing (t-tests)
- Generates detailed text report
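
To make the paired comparison concrete, the sketch below computes the paired t statistic for two backbones from their per-run test accuracies. It is an illustration under two assumptions: that run `i` uses the same train/test split for both backbones (otherwise the runs are not paired), and that a plain paired t-test is what is wanted. Whether `generate_statistical_comparison` works this way is not stated in the notebook; `paired_t_statistic` is a hypothetical name.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for paired per-run accuracies."""
    if len(acc_a) != len(acc_b):
        raise ValueError("Both backbones need the same number of runs")
    if len(acc_a) < 2:
        raise ValueError("At least two paired runs are required")

    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if std_err == 0:
        raise ValueError("All differences are identical; t is undefined")

    return statistics.mean(diffs) / std_err, n - 1
```

The statistic is then compared against a t distribution with `n - 1` degrees of freedom. `scipy.stats.ttest_rel` computes the same statistic and also returns the p-value.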

**Output location:** `experiments/exer_recog/results/exp_01_baseline/comparisons/`

---

## Experiment 1 Complete! 🎉

### Results Summary
All results saved to: `experiments/exer_recog/results/exp_01_baseline/`

**Generated files:**
- `backbone_comparison.csv` - Quick comparison table
- `comparisons/comprehensive_comparison.csv` - Detailed metrics + parameters
- `comparisons/statistical_analysis.txt` - Statistical tests
- Individual backbone folders with run-specific results

### Key Findings to Look For

**Best Backbone:**
- Highest mean test accuracy
- Lowest standard deviation (most reliable)
- Balance between performance and parameter count

**Training Stability:**
- Which backbones converge fastest?
- Which show most consistent results across runs?

**Model Efficiency:**
- Accuracy per million parameters
- Inference speed (can measure separately)
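
One way to compute the "accuracy per million parameters" figure is sketched below. The function names and the input layout (a mapping from backbone name to a `(mean_test_acc, n_params)` pair) are assumptions for illustration; the real values would come from the comparison CSV and the parameter table.

```python
def accuracy_per_million_params(mean_test_acc, n_params):
    """Mean test accuracy divided by the parameter count in millions."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return mean_test_acc / (n_params / 1_000_000)


def rank_by_efficiency(results):
    """Sort backbone names by accuracy per million parameters, best first.

    `results` maps backbone name -> (mean_test_acc, n_params).
    """
    return sorted(
        results,
        key=lambda name: accuracy_per_million_params(*results[name]),
        reverse=True,
    )
```

This ratio favors very small models, so it is best read alongside the absolute accuracy rather than as a ranking on its own.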

### Next Steps
1. **Review comprehensive comparison** in `comparisons/` folder
2. **Run Experiment 2** (`02_progressive.ipynb`) to test improved training strategy
3. **Compare experiments** using `99_comparison.ipynb`

This baseline establishes the performance floor. Experiment 2 is expected to exceed these results with progressive training.