# Indic ASR Benchmarking

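This notebook benchmarks the Indic Conformer ASR model on the `ai4bharat/indicvoices_r` dataset for Nepali, Hindi, and Maithili (50 samples per language), then combines the per-language results into a single CSV.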

**NOTE**: Enable a GPU before running any cell: Runtime -> Change runtime type -> T4 GPU (or better).
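
To confirm that the runtime actually has a GPU attached, a quick check along these lines may help. This is a minimal sketch that assumes an NVIDIA runtime exposing the `nvidia-smi` tool; the helper name `gpu_visible` is hypothetical and not part of the repository.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools found, so likely a CPU-only runtime
    try:
        # `nvidia-smi -L` prints one line per device, e.g. "GPU 0: ..."
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU runtime detected.")
else:
    print("No GPU detected; change the runtime type before benchmarking.")
```

If the check reports no GPU, the benchmark may still run on CPU, but probably much more slowly.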

In [None]:
# 1. Setup Environment
# OPTION A: Clone Repository (if public)
# !git clone https://github.com/YourUser/Indic_conformer_asr_testing.git
# %cd Indic_conformer_asr_testing

# OPTION B: Upload Files Directly
# 1. Zip your local 'Indic_conformer_asr_testing' folder.
# 2. Upload it to the Colab 'Files' sidebar.
# 3. Unzip it:
# !unzip Indic_conformer_asr_testing.zip
# %cd Indic_conformer_asr_testing

# Install system dependencies for audio
!apt-get update -qq && apt-get install -y libsndfile1 ffmpeg

# Install Python dependencies
!pip install -q transformers torch torchaudio librosa soundfile datasets jiwer pandas python-dotenv onnx onnxruntime-gpu
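
After the install step, it can be worth checking that every package is importable before starting a long benchmark run. The sketch below uses only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list assumes the usual import names (`python-dotenv` is imported as `dotenv`, `onnxruntime-gpu` as `onnxruntime`).

```python
import importlib.util

# Import names corresponding to the pip packages installed for this notebook.
REQUIRED_MODULES = [
    "transformers", "torch", "torchaudio", "librosa", "soundfile",
    "datasets", "jiwer", "pandas", "dotenv", "onnx", "onnxruntime",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All dependencies are importable.")
```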

In [None]:
# 2. Run Benchmark - Nepali
!python scripts/benchmark.py --language ne --subset Nepali --samples 50 --output results_ne.csv

In [None]:
# 3. Run Benchmark - Hindi
!python scripts/benchmark.py --language hi --subset Hindi --samples 50 --output results_hi.csv

In [None]:
# 4. Run Benchmark - Maithili
!python scripts/benchmark.py --language mai --subset Maithili --samples 50 --output results_mai.csv
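
The three benchmark cells differ only in language code and subset name, so they could also be driven from one loop. The following is a sketch, not the repository's own runner: `build_command` and `run_all` are hypothetical helpers, and it assumes `scripts/benchmark.py` accepts exactly the flags used in the cells above.

```python
import subprocess
import sys

# (language code, dataset subset) pairs taken from the benchmark cells.
RUNS = [("ne", "Nepali"), ("hi", "Hindi"), ("mai", "Maithili")]


def build_command(language, subset, samples=50):
    """Build the argument list for one benchmark run."""
    return [
        sys.executable, "scripts/benchmark.py",
        "--language", language,
        "--subset", subset,
        "--samples", str(samples),
        "--output", f"results_{language}.csv",
    ]


def run_all(runs=RUNS, samples=50):
    """Run each benchmark in turn and return the language codes that failed."""
    failed = []
    for language, subset in runs:
        completed = subprocess.run(build_command(language, subset, samples))
        if completed.returncode != 0:
            failed.append(language)
    return failed


# Uncomment to execute all three runs from a single cell:
# failed = run_all()
# print("Failed runs:", failed or "none")
```

Keeping the cells separate, as the notebook does, makes it easier to rerun a single language; the loop is mainly useful for unattended runs.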

In [None]:
# 5. View & Combine Results
import pandas as pd
import glob

# Sort so the combined file has a deterministic row order (glob order is arbitrary)
csv_files = sorted(glob.glob('results_*.csv'))
dfs = []

for filename in csv_files:
    df_temp = pd.read_csv(filename)
    df_temp['source_file'] = filename
    dfs.append(df_temp)

if dfs:
    final_df = pd.concat(dfs, ignore_index=True)
    final_df.to_csv('benchmark_results_combined.csv', index=False)
    print(f"Combined {len(final_df)} samples from {len(csv_files)} files.")
    display(final_df.head())
else:
    print("No result files found.")
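
Because the combine step merges whatever `results_*.csv` files happen to exist, a failed benchmark run can go unnoticed. A small check like the one below can flag missing outputs before combining. It is a sketch: `missing_results` and `EXPECTED_RESULTS` are hypothetical names, and the expected filenames are the `--output` values used in the benchmark cells.

```python
from pathlib import Path

# Output files written by the three benchmark cells.
EXPECTED_RESULTS = ["results_ne.csv", "results_hi.csv", "results_mai.csv"]


def missing_results(directory=".", expected=EXPECTED_RESULTS):
    """Return the expected result files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]


missing = missing_results()
if missing:
    print(f"Missing result files: {missing}")
else:
    print("All expected result files are present.")
```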