# SOLAS Results Analysis Notebook

This notebook analyzes and visualizes evaluation results from completed SOLAS experiments.

**Prerequisites:** Run the experiments in the Evaluation notebook first and make sure the results are saved.

**Note:** This notebook does NOT require a GPU; the analysis runs entirely on CPU.

## Analyses Available

| Analysis | Description |
|----------|-------------|
| ASR Model Comparison | Compare Whisper tiny/small/large-v3 transcription quality |
| Quantization Impact | Compare 4-bit quantization vs full precision |
| Repetition Penalty Impact | Compare repetition penalty (None vs 1.2) for degeneration prevention |
| Summary Mode Impact | Compare greedy vs sampled summarization |
| Chunk Size Impact | Compare 2000 vs 4000 character text chunks |
| Temperature Impact | Compare temperature 0.2 vs 0.5 for podcast generation |
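
Two of the analyses above rest on simple text-level ideas: splitting text into fixed-size character chunks, and spotting degenerate output that loops on the same phrase. The sketch below illustrates both under stated assumptions. It is not the SOLAS implementation: the helper names `chunk_text`, `repeated_ngram_ratio`, and `looks_degenerate`, the 4-word n-gram window, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars (hypothetical helper)."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Prefer the last space inside the window; fall back to a hard cut.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def looks_degenerate(text: str, n: int = 4, threshold: float = 0.5) -> bool:
    """Flag text whose repeated n-gram ratio reaches the assumed threshold."""
    return repeated_ngram_ratio(text, n) >= threshold
```

With these helpers, `chunk_text(transcript, 2000)` and `chunk_text(transcript, 4000)` would produce the two chunkings compared in the Chunk Size analysis, and `looks_degenerate(output)` gives a rough pass/fail signal for repetition loops. The actual thresholds and splitting rules used by SOLAS may differ.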

In [None]:
# @title ### Setup
# @markdown Clone/update SOLAS repository and load evaluation results.
# @markdown **No GPU required** - this notebook only analyzes existing results.

import sys
import subprocess
from pathlib import Path

# Clone/update SOLAS repository
if Path('SOLAS').exists():
    subprocess.run(['git', 'pull'], check=True, cwd='SOLAS')
else:
    subprocess.run(['git', 'clone', 'https://github.com/andrecarini/SOLAS.git'], check=True)

sys.path.insert(0, 'SOLAS')

# Create evaluation interface (handles Google Drive mounting automatically)
from library import EvaluationNotebook
evaluation = EvaluationNotebook(
    solas_dir=None,                      # Auto-detect: /content/SOLAS (Colab) or ./SOLAS (local)
    use_gdrive=None,                     # Auto-detect: True in Colab, False otherwise
    gdrive_mount_point='/gdrive',        # Where to mount Google Drive
    gdrive_folder='SOLAS',               # Folder name in Google Drive MyDrive
    gdrive_symlink='/content/gdrive',    # Symlink path for easy access
    local_dir='./evaluation_results'     # Local directory when not using Google Drive
)
evaluation.print_setup_info()

# Load and display results summary
results = evaluation.load_results()
if not results.get('experiments'):
    print('\n\u26a0\ufe0f No evaluation results found.')
    print('Run the Evaluation notebook first to generate results.')
else:
    print(f'\n\u2705 Loaded {len(results["experiments"])} experiments')

In [None]:
# @title ### ASR Model Comparison
# @markdown Compare Whisper tiny/small/large-v3 transcription results in a summary table with scrollable transcript views.
# @markdown Includes degeneration detection: models with severe repetition loops are marked as FAIL.

evaluation.asr_analysis()

In [None]:
# @title ### Quantization Impact
# @markdown Compare 4-bit quantization vs full precision across all LLMs.
# @markdown Shows text outputs, metrics tables, and comparison summary.

evaluation.quantization_analysis()

In [None]:
# @title ### Repetition Penalty Impact
# @markdown Compare repetition penalty (None vs 1.2) on Qwen2-0.5B and Mistral-7B.
# @markdown Shows text outputs, metrics tables, and overall results.

evaluation.repetition_penalty_analysis()

In [None]:
# @title ### Summary Mode Impact
# @markdown Compare greedy and sampled summary modes on Phi-3-mini (no quantization).
# @markdown Note: Summary mode only affects the summary stage.

evaluation.summary_mode_analysis()

In [None]:
# @title ### Chunk Size Impact
# @markdown Compare 2000 vs 4000 character chunks on Phi-3-mini (no quantization).
# @markdown Shows how chunk size affects translation, summary, and podcast generation.

evaluation.chunk_size_analysis()

In [None]:
# @title ### Temperature Impact
# @markdown Compare temperatures 0.2 and 0.5 on Mistral-7B podcast generation.
# @markdown Temperature only affects the podcast stage (translation and summary use greedy decoding).

evaluation.temperature_analysis()