# Full Normalization Experiment - RunningNorm + Post-activation LayerNorm

**Hypothesis:** The recommended RL normalization strategy combines:
- **RunningNorm (input)**: Normalizes inputs with a running mean and std tracked across samples (similar to BatchNorm's running statistics)
- **Post-activation LayerNorm**: Applied after each hidden ReLU layer to stabilize gradients
- **NO LayerNorm before tanh**: Avoids forcing saturation
- **NO LayerNorm on critic output**: Preserves TD error signal

## Why RunningNorm instead of LayerNorm for input?
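- LayerNorm normalizes across the features of a single sample, so the sample's overall scale is removed
- RunningNorm normalizes each feature with population statistics tracked across samples (more stable for RL)
- Running statistics adapt gradually as the policy shifts the data distribution, which suits the non-stationary nature of RL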
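
The difference between the two schemes can be illustrated with a dependency-free sketch. The `RunningNormSketch` class below is hypothetical and is not the `RunningNorm` from `Model.py`: it assumes an exponential moving average controlled by `momentum`, which may differ from the real update rule. `layer_norm` normalizes one sample across its own features.

```python
import math

def layer_norm(sample, eps=1e-5):
    """Normalize ONE sample across its features (no state, no history)."""
    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / len(sample)
    return [(x - mean) / math.sqrt(var + eps) for x in sample]

class RunningNormSketch:
    """Per-feature normalization with statistics tracked ACROSS samples.

    Assumption: exponential-moving-average update, as in BatchNorm's
    running statistics. The real RunningNorm may be implemented differently.
    """
    def __init__(self, num_features, momentum=0.01, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features

    def update(self, batch):
        """Blend the batch's per-feature mean/variance into the running stats."""
        n = len(batch)
        m = self.momentum
        for j in range(len(self.running_mean)):
            col = [row[j] for row in batch]
            mean = sum(col) / n
            var = sum((x - mean) ** 2 for x in col) / n
            self.running_mean[j] = (1 - m) * self.running_mean[j] + m * mean
            self.running_var[j] = (1 - m) * self.running_var[j] + m * var

    def normalize(self, sample):
        return [(x - mu) / math.sqrt(var + self.eps)
                for x, mu, var in zip(sample, self.running_mean, self.running_var)]

# Two observations that differ only in overall scale.
small = [1.0, 2.0, 3.0]
large = [10.0, 20.0, 30.0]

# LayerNorm maps both to (almost) the same vector: the scale is lost.
ln_small, ln_large = layer_norm(small), layer_norm(large)

# RunningNorm uses shared statistics, so the 10x scale difference survives.
norm = RunningNormSketch(num_features=3, momentum=0.5)
norm.update([small, large])
rn_small, rn_large = norm.normalize(small), norm.normalize(large)

print("LayerNorm  :", ln_small, ln_large)
print("RunningNorm:", rn_small, rn_large)
```

After LayerNorm the two observations are nearly indistinguishable, while the running-statistics version keeps them apart.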

## Architecture Comparison

| Experiment | Actor Architecture | Critic Architecture |
|------------|-------------------|---------------------|
| Original | 7->64->ReLU->32->ReLU->tanh->3 | 510->512->ReLU->128->ReLU->1 |
| LayerNorm | 7->64->ReLU->32->ReLU->**LN**->tanh->3 | (unchanged) |
| **FullNorm** | **RunningNorm**->64->ReLU->**LN**->32->ReLU->**LN**->tanh->3 | **RunningNorm**->512->ReLU->**LN**->128->ReLU->**LN**->1 |
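
As a sanity check for the table, the expected parameter counts can be worked out by hand. The sketch below assumes that every layer width denotes a fully connected layer with bias, that each LN carries one learnable scale and one shift per feature, and that RunningNorm keeps its statistics as non-trainable buffers (so it adds no parameters). None of this is confirmed by `Model.py`; if the counts printed in Cell 4 differ, these assumptions are the first thing to check.

```python
def linear_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out

def layernorm_params(n_features):
    # one learnable scale and one shift per feature
    return 2 * n_features

def mlp_params(sizes, layernorm=False):
    """Count parameters of an MLP with layer widths `sizes`.

    With layernorm=True, a LayerNorm follows every hidden layer.
    """
    total = sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))
    if layernorm:
        total += sum(layernorm_params(h) for h in sizes[1:-1])
    return total

actor_sizes = [7, 64, 32, 3]         # from the Actor column
critic_sizes = [510, 512, 128, 1]    # from the Critic column

counts = {
    "original_actor": mlp_params(actor_sizes),
    "fullnorm_actor": mlp_params(actor_sizes, layernorm=True),
    "original_critic": mlp_params(critic_sizes),
    "fullnorm_critic": mlp_params(critic_sizes, layernorm=True),
}
for name, n in counts.items():
    print(f"{name}: {n:,}")
```

Under these assumptions the LayerNorm layers add 192 parameters to the actor and 1,280 to the critic, a negligible increase in model size.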

## Key Insight
In RL, **magnitude matters as much as direction**. Post-activation LN preserves relative magnitudes while stabilizing gradients. By contrast, LN placed directly before tanh forces the pre-activations to roughly N(0,1), which pushes many values toward saturation.
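
To put a rough number on the saturation claim: if the pre-tanh values were exactly standard normal (an idealization of what LN produces), the share of outputs with |tanh(z)| above a threshold follows from the Gaussian tail. The 0.9 threshold and the comparison value `std=0.5` below are arbitrary illustrative choices, not the saturation definition used in the tracking data.

```python
import math

def saturated_fraction(threshold=0.9, std=1.0):
    """P(|tanh(z)| > threshold) for z ~ N(0, std^2)."""
    z_cut = math.atanh(threshold)  # |z| beyond this value counts as saturated
    # Two-sided Gaussian tail: P(|z| > z_cut) = erfc(z_cut / (std * sqrt(2)))
    return math.erfc(z_cut / (std * math.sqrt(2.0)))

frac_unit = saturated_fraction(0.9, std=1.0)   # pre-activations forced to ~N(0, 1)
frac_small = saturated_fraction(0.9, std=0.5)  # a network free to keep them smaller

print(f"std=1.0: {frac_unit:.1%} of outputs beyond |tanh| = 0.9")
print(f"std=0.5: {frac_small:.1%} of outputs beyond |tanh| = 0.9")
```

With unit variance roughly one output in seven lands beyond the threshold, whereas halving the standard deviation makes such outputs rare.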

## Learning Rate Order (High to Low)
Experiments run from **HIGH to LOW** learning rates, so the configurations most likely to stop early are finished first:
- Actor LRs: `[0.1, 0.01, 0.001, 0.0001]`
- Critic LRs: `[0.1, 0.01, 0.001, 0.0001]`
- Total: **16 experiments**
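
The 16 configurations are the Cartesian product of the two lists. A minimal sketch of the enumeration follows; the actor-major ordering is an assumption (it matches the nested loops in Cell 8), since the loop inside `run_fullnorm_experiment` is not shown in this notebook.

```python
from itertools import product

ACTOR_LRS = [0.1, 0.01, 0.001, 0.0001]
CRITIC_LRS = [0.1, 0.01, 0.001, 0.0001]

# product() varies its last argument fastest, so every critic LR is
# visited for the highest actor LR before moving to the next actor LR.
grid = list(product(ACTOR_LRS, CRITIC_LRS))

for i, (actor_lr, critic_lr) in enumerate(grid, start=1):
    print(f"{i:2d}/{len(grid)}: actor_lr={actor_lr}, critic_lr={critic_lr}")
```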

## Cell 1: Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted!")

## Cell 2: Upload and Extract Experiment Files

**Option A:** Upload `fullnorm_experiment.zip` when prompted

**Option B:** If `fullnorm_experiment.zip` is already at the top level of your Drive (`MyDrive/`), the cell finds it there and skips the upload prompt

In [None]:
import os
import zipfile

# Check if files already exist
if os.path.exists('/content/fullnorm_experiment/mec_env.py'):
    print("Experiment files already present!")
else:
    # Try to find zip in Drive first
    drive_zip = '/content/drive/MyDrive/fullnorm_experiment.zip'
    
    if os.path.exists(drive_zip):
        print(f"Found zip in Drive: {drive_zip}")
        zip_path = drive_zip
    else:
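        # Upload zip file (files.upload() returns a dict keyed by filename)
        print("Upload fullnorm_experiment.zip:")
        from google.colab import files
        uploaded = files.upload()
        if not uploaded:
            raise RuntimeError("No file uploaded; re-run this cell and select fullnorm_experiment.zip")
        zip_path = next(iter(uploaded))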
    
    # Extract
    print(f"Extracting {zip_path}...")
    with zipfile.ZipFile(zip_path, 'r') as z:
        z.extractall('/content')
    
    print("\nExtracted files:")
    for item in os.listdir('/content'):
        if not item.startswith('.'):
            print(f"  {item}")

## Cell 3: Check GPU and Environment

In [None]:
import torch
import os

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("")
    print("WARNING: No GPU detected!")
    print("Go to: Runtime -> Change runtime type -> GPU")

# Set working directory
os.chdir('/content')
print(f"\nWorking directory: {os.getcwd()}")

## Cell 4: Verify FullNorm Architecture

In [None]:
import sys
sys.path.insert(0, '/content/fullnorm_experiment')

from Model import FullNormActorNetwork, FullNormCriticNetwork, ActorNetwork, CriticNetwork, RunningNorm
import torch

# Compare architectures
original_actor = ActorNetwork(7, 3, torch.tanh)
fullnorm_actor = FullNormActorNetwork(7, 3, torch.tanh)
original_critic = CriticNetwork(350, 150, 7, 3)
fullnorm_critic = FullNormCriticNetwork(350, 150, 7, 3)

print("=" * 70)
print("ARCHITECTURE COMPARISON")
print("=" * 70)
print("\nOriginal Actor: 7 -> 64 -> ReLU -> 32 -> ReLU -> tanh -> 3")
print(f"  Parameters: {sum(p.numel() for p in original_actor.parameters()):,}")
print("\nFullNorm Actor: RunningNorm -> 64 -> ReLU -> LN -> 32 -> ReLU -> LN -> tanh -> 3")
print(f"  Parameters: {sum(p.numel() for p in fullnorm_actor.parameters()):,}")
print("\nOriginal Critic: 510 -> 512 -> ReLU -> 128 -> ReLU -> 1")
print(f"  Parameters: {sum(p.numel() for p in original_critic.parameters()):,}")
print("\nFullNorm Critic: RunningNorm -> 512 -> ReLU -> LN -> 128 -> ReLU -> LN -> 1")
print(f"  Parameters: {sum(p.numel() for p in fullnorm_critic.parameters()):,}")

# Show the FullNorm modules
print("\n" + "=" * 70)
print("FullNorm Actor Modules:")
print("=" * 70)
for name, module in fullnorm_actor.named_modules():
    if name:
        print(f"  {name}: {module}")

print("\n" + "=" * 70)
print("FullNorm Critic Modules:")
print("=" * 70)
for name, module in fullnorm_critic.named_modules():
    if name:
        print(f"  {name}: {module}")

# Show RunningNorm details
print("\n" + "=" * 70)
print("RunningNorm Details (Actor Input):")
print("=" * 70)
print(f"  Features: {fullnorm_actor.input_norm.num_features}")
print(f"  Momentum: {fullnorm_actor.input_norm.momentum}")
print(f"  Initial running_mean: {fullnorm_actor.input_norm.running_mean}")
print(f"  Initial running_var: {fullnorm_actor.input_norm.running_var}")

## Cell 5: Run All 16 Experiments

This will run all 16 experiments with **HIGH learning rates FIRST**:
- Actor LRs: `[0.1, 0.01, 0.001, 0.0001]` (high to low)
- Critic LRs: `[0.1, 0.01, 0.001, 0.0001]` (high to low)

**Progress is auto-saved to Google Drive every 100 episodes.**
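
The saving logic lives in `run_fullnorm_experiment`, which is not shown in this notebook. As a rough sketch of how periodic backups of this kind are commonly done, the hypothetical helpers below write the status file atomically and mirror the results folder to a backup location every `SAVE_EVERY` episodes. The function names, the experiment name `actor0.1_critic0.1`, and the use of `shutil.copytree` are assumptions; only the `completed` and `in_progress` keys come from the status file read in Cell 6. Temporary directories stand in for the Colab and Drive paths so the snippet runs anywhere (Python 3.8+).

```python
import json
import os
import shutil
import tempfile

SAVE_EVERY = 100  # episodes between backups, matching the interval stated above

def write_status(results_dir, completed, in_progress):
    """Write experiment_status.json atomically (temp file, then rename)."""
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, "experiment_status.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"completed": completed, "in_progress": in_progress}, f)
    os.replace(tmp_path, path)  # readers never see a half-written file
    return path

def backup(results_dir, backup_dir):
    """Mirror results_dir into backup_dir, overwriting older copies."""
    shutil.copytree(results_dir, backup_dir, dirs_exist_ok=True)

# Demo with temporary directories instead of /content and Drive.
root = tempfile.mkdtemp()
results_dir = os.path.join(root, "results")
backup_dir = os.path.join(root, "drive_backup")

backups_made = 0
for episode in range(1, 251):
    # ... one training episode would run here ...
    if episode % SAVE_EVERY == 0:
        # "actor0.1_critic0.1" is a made-up experiment name for illustration
        write_status(results_dir, completed=[], in_progress="actor0.1_critic0.1")
        backup(results_dir, backup_dir)
        backups_made += 1

with open(os.path.join(backup_dir, "experiment_status.json")) as f:
    restored = json.load(f)
print(f"Backups made: {backups_made}, restored status: {restored}")

shutil.rmtree(root)  # clean up the demo directories
```

Writing to a temporary file and renaming it means an interrupted Colab session cannot leave a truncated status file behind.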

In [None]:
import sys
sys.path.insert(0, '/content')
sys.path.insert(0, '/content/fullnorm_experiment')

from run_fullnorm_experiment import run_all_experiments

# Run all experiments
run_all_experiments()

## Cell 6: Check Experiment Status

In [None]:
import json
import os

results_dir = '/content/results/fullnorm_experiment'
status_file = os.path.join(results_dir, 'experiment_status.json')

if os.path.exists(status_file):
    with open(status_file) as f:
        status = json.load(f)
    print("Experiment Status:")
    print(f"  Completed: {len(status['completed'])}/16")
    if status['in_progress']:
        print(f"  In progress: {status['in_progress']}")
    print("\nCompleted experiments:")
    for exp in sorted(status['completed']):
        print(f"  - {exp}")
else:
    print("No status file found yet.")

# Also check Drive backup
drive_dir = '/content/drive/MyDrive/fullnorm_results'
if os.path.exists(drive_dir):
    print(f"\nDrive backup exists: {drive_dir}")
    contents = os.listdir(drive_dir)
    # Show at most the first five entries
    preview = f"{contents[:5]}..." if len(contents) > 5 else f"{contents}"
    print(f"  Contents: {preview}")

## Cell 7: View Results Summary

In [None]:
import json
import os

results_dir = '/content/results/fullnorm_experiment'

if os.path.exists(results_dir):
    print("FullNorm Experiment Results Summary:")
    print("="*70)
    print(f"{'Actor LR':<12} {'Critic LR':<12} {'Stop Ep.':<12} {'Final Reward':<15}")
    print("-"*70)
    
    results = []
    for exp_dir in sorted(os.listdir(results_dir)):
        result_file = os.path.join(results_dir, exp_dir, 'results.json')
        if os.path.exists(result_file):
            with open(result_file) as f:
                data = json.load(f)
            results.append(data)
    
    for data in sorted(results, key=lambda x: (-x['actor_lr'], -x['critic_lr'])):
        print(f"{data['actor_lr']:<12} {data['critic_lr']:<12} {data['stopping_episode']:<12} {data['final_reward']:<15.4f}")
    print("="*70)
else:
    print("No results directory found yet.")

## Cell 8: Compare All Three Experiments (Original vs LayerNorm vs FullNorm)

In [None]:
import json
import os

def load_results(results_dir):
    results = []
    if os.path.exists(results_dir):
        for exp_dir in os.listdir(results_dir):
            result_file = os.path.join(results_dir, exp_dir, 'results.json')
            if os.path.exists(result_file):
                with open(result_file) as f:
                    results.append(json.load(f))
    return results

# Load all results
fullnorm_results = load_results('/content/results/fullnorm_experiment')
layernorm_results = load_results('/content/results/layernorm_experiment')
original_results = load_results('/content/results/stopping_experiment')

# Try Drive backups
if not fullnorm_results:
    fullnorm_results = load_results('/content/drive/MyDrive/fullnorm_results')
if not layernorm_results:
    layernorm_results = load_results('/content/drive/MyDrive/layernorm_results')
if not original_results:
    original_results = load_results('/content/drive/MyDrive/gradient_asymmetry_results')

print("COMPARISON: Original vs LayerNorm vs FullNorm")
print("="*100)
print(f"{'Actor LR':<10} {'Critic LR':<10} {'Original':<12} {'LayerNorm':<12} {'FullNorm':<12} {'Best':<15}")
print("-"*100)

for actor_lr in [0.1, 0.01, 0.001, 0.0001]:
    for critic_lr in [0.1, 0.01, 0.001, 0.0001]:
        orig = next((r for r in original_results if r['actor_lr'] == actor_lr and r['critic_lr'] == critic_lr), None)
        ln = next((r for r in layernorm_results if r['actor_lr'] == actor_lr and r['critic_lr'] == critic_lr), None)
        fn = next((r for r in fullnorm_results if r['actor_lr'] == actor_lr and r['critic_lr'] == critic_lr), None)
        
        orig_ep = orig['stopping_episode'] if orig else '-'
        ln_ep = ln['stopping_episode'] if ln else '-'
        fn_ep = fn['stopping_episode'] if fn else '-'
        
        # Determine best (highest stopping episode = longest training)
        values = []
        if orig: values.append(('Original', orig['stopping_episode']))
        if ln: values.append(('LayerNorm', ln['stopping_episode']))
        if fn: values.append(('FullNorm', fn['stopping_episode']))
        
        best = max(values, key=lambda x: x[1])[0] if values else '-'
        
        print(f"{actor_lr:<10} {critic_lr:<10} {str(orig_ep):<12} {str(ln_ep):<12} {str(fn_ep):<12} {best:<15}")

print("="*100)
print("\nNote: Higher stopping episode = longer training before actor gradients vanish")

## Cell 9: Analyze Gradient Asymmetry Comparison

In [None]:
import json
import os
import numpy as np

def get_asymmetry_stats(results_dir):
    stats = {}
    if os.path.exists(results_dir):
        for exp_dir in os.listdir(results_dir):
            tracking_file = os.path.join(results_dir, exp_dir, 'tracking_data.json')
            result_file = os.path.join(results_dir, exp_dir, 'results.json')
            if os.path.exists(tracking_file) and os.path.exists(result_file):
                with open(tracking_file) as f:
                    tracking = json.load(f)
                with open(result_file) as f:
                    result = json.load(f)
                
                asym = tracking.get('asymmetry_history', [])
                # Keep only finite ratios: inf or NaN entries would corrupt the mean
                ratios = [a['ratio'] for a in asym if np.isfinite(a['ratio'])]
                if ratios:
                    key = (result['actor_lr'], result['critic_lr'])
                    stats[key] = {
                        'mean_ratio': float(np.mean(ratios)),
                        'final_ratio': ratios[-1],
                        'stopping_episode': result['stopping_episode']
                    }
    return stats

# Load asymmetry stats
fullnorm_stats = get_asymmetry_stats('/content/results/fullnorm_experiment')
if not fullnorm_stats:
    fullnorm_stats = get_asymmetry_stats('/content/drive/MyDrive/fullnorm_results')

layernorm_stats = get_asymmetry_stats('/content/results/layernorm_experiment')
if not layernorm_stats:
    layernorm_stats = get_asymmetry_stats('/content/drive/MyDrive/layernorm_results')

if fullnorm_stats or layernorm_stats:
    print("Gradient Asymmetry Analysis (Actor/Critic Gradient Ratio):")
    print("="*90)
    print(f"{'Actor LR':<10} {'Critic LR':<10} {'LayerNorm Ratio':<18} {'FullNorm Ratio':<18} {'Improvement':<15}")
    print("-"*90)
    
    for actor_lr in [0.1, 0.01, 0.001, 0.0001]:
        for critic_lr in [0.1, 0.01, 0.001, 0.0001]:
            key = (actor_lr, critic_lr)
            ln_ratio = layernorm_stats.get(key, {}).get('mean_ratio', '-')
            fn_ratio = fullnorm_stats.get(key, {}).get('mean_ratio', '-')
            
            if isinstance(ln_ratio, float) and isinstance(fn_ratio, float):
                # Closer to 1.0 is better (balanced gradients)
                ln_dist = abs(1.0 - ln_ratio)
                fn_dist = abs(1.0 - fn_ratio)
                improvement = "FullNorm" if fn_dist < ln_dist else "LayerNorm"
            else:
                improvement = "-"
            
            ln_str = f"{ln_ratio:.4f}" if isinstance(ln_ratio, float) else str(ln_ratio)
            fn_str = f"{fn_ratio:.4f}" if isinstance(fn_ratio, float) else str(fn_ratio)
            
            print(f"{actor_lr:<10} {critic_lr:<10} {ln_str:<18} {fn_str:<18} {improvement:<15}")
    
    print("="*90)
    print("\nNote: Ratio closer to 1.0 = more balanced actor/critic gradients")
else:
    print("No asymmetry data found yet. Run experiments first.")

## Cell 10: Analyze Pre-activation Statistics

In [None]:
import json
import os
import numpy as np

results_dir = '/content/results/fullnorm_experiment'
if not os.path.exists(results_dir):
    results_dir = '/content/drive/MyDrive/fullnorm_results'

if os.path.exists(results_dir):
    print("Pre-activation Statistics (before tanh - should be moderate without forcing N(0,1)):")
    print("="*85)
    print(f"{'Actor LR':<10} {'Critic LR':<10} {'Min Preact':<15} {'Max Preact':<15} {'Saturation':<15}")
    print("-"*85)
    
    for exp_dir in sorted(os.listdir(results_dir)):
        tracking_file = os.path.join(results_dir, exp_dir, 'tracking_data.json')
        result_file = os.path.join(results_dir, exp_dir, 'results.json')
        if os.path.exists(tracking_file) and os.path.exists(result_file):
            with open(tracking_file) as f:
                tracking = json.load(f)
            with open(result_file) as f:
                result = json.load(f)
            
            act_hist = tracking.get('activation_history', [])
            if act_hist:
                last = act_hist[-1]
                print(f"{result['actor_lr']:<10} {result['critic_lr']:<10} {last['min_preact']:<15.2f} {last['max_preact']:<15.2f} {last['avg_actor_output_saturation']:<15.2%}")
    print("="*85)
    print("\nNote: FullNorm should keep pre-activations moderate without forcing exact N(0,1)")
    print("This allows the network to learn appropriate output scales naturally.")
else:
    print("No results found yet.")

## Cell 11: Download Results

In [None]:
import shutil
import os

results_dir = '/content/results/fullnorm_experiment'
output_zip = '/content/fullnorm_results.zip'

if os.path.exists(results_dir):
    shutil.make_archive('/content/fullnorm_results', 'zip', results_dir)
    print(f"Created: {output_zip}")
    print(f"Size: {os.path.getsize(output_zip) / 1024:.1f} KB")
    
    # Download
    from google.colab import files
    files.download(output_zip)
else:
    print("No results directory found.")