This notebook quantizes the Indic Conformer 600M model by AI4Bharat to int8.

[![Open in Kaggle](https://img.shields.io/badge/Open%20in-Kaggle-blue?logo=kaggle)](https://www.kaggle.com/code/haposeiz/quantize-indic-conformer-600m)

Additional Links: 

[![Hugging Face](https://img.shields.io/badge/-Hugging%20Face-181717?logo=huggingface&logoColor=FFD21E)](https://huggingface.co/atharva-again/indic-conformer-600m-quantized)

[![GitHub](https://img.shields.io/badge/-GitHub-181717?logo=github&logoColor=white)](https://github.com/atharva-again/indic-asr-onnx)


# 1. Initial Download and Setup

## 1a. Downloading Libraries

#### - I use uv for package management here because it is faster than pip (it is written in Rust).
#### - Quantization runs entirely on the CPU, so we install only the CPU builds of torch and its related libraries.
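
To confirm that the CPU-only wheels actually replaced a CUDA build, one option is to look at the local version label (the part after `+`). This is a minimal sketch: it assumes the wheels from the PyTorch CPU index carry a `+cpu` label, as the install log in this notebook shows, and the helper names `local_build_tag` and `is_cpu_build` are hypothetical.

```python
def local_build_tag(version: str) -> str:
    """Return the local version label after '+', or '' if there is none."""
    _, _, tag = version.partition("+")
    return tag


def is_cpu_build(version: str) -> bool:
    """True when the version string carries the 'cpu' local label."""
    return local_build_tag(version) == "cpu"


# Version strings taken from the install log of this notebook.
for version in ("2.6.0+cu124", "2.9.1+cpu"):
    label = "CPU-only" if is_cpu_build(version) else "not CPU-only"
    print(f"{version} -> {label}")
```

In the notebook itself you would pass `torch.__version__` after the install cell has run. A build without any local label (for example from a different index) is reported as not CPU-only by this check, so treat a negative result as a prompt to look closer rather than as proof of a GPU build.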

In [1]:
# Installing uv
!curl -LsSf https://astral.sh/uv/install.sh | sh

# Install other dependencies (onnx is imported in the setup cell below)
!uv pip install torch torchaudio torchcodec --index-url https://download.pytorch.org/whl/cpu
!uv pip install datasets pandas huggingface_hub soundfile librosa onnx onnxruntime

# Verify CPU environment
import onnxruntime as ort
print(f"ONNX Runtime version: {ort.__version__}")
print(f"Available providers: {ort.get_available_providers()}")

downloading uv 0.9.15 x86_64-unknown-linux-gnu
no checksums to verify
installing to /usr/local/bin
  uv
  uvx
everything's installed!
Using Python 3.11.13 environment at: /usr
Resolved 11 packages in 867ms
Prepared 4 packages in 5.47s
Uninstalled 3 packages in 1.61s
Installed 4 packages in 290ms
 - sympy==1.13.1
 + sympy==1.14.0
 - torch==2.6.0+cu124 (from https://download.pytorch.org/whl/cu124/torch-2.6.0%2Bcu124-cp311-cp311-linux_x86_64.whl)
 + torch==2.9.1+cpu
 - torchaudio==2.6.0+cu124 (from https://download.pytorch.org/whl/cu124/torchaudio-2.6.0%2Bcu124-cp311-cp311-linux_x86_64.whl)
 + torchaudio

## 1b. Setting Up The Environment

In [2]:
import os
import gc
import shutil
import numpy as np
import pandas as pd
import torch
import torchaudio
import onnx
import onnxruntime as ort
from onnxruntime.quantization import quantize_static, CalibrationDataReader, QuantType, QuantFormat, CalibrationMethod
from datasets import load_dataset
from huggingface_hub import snapshot_download

# Configuration
MODEL_REPO = "ai4bharat/indic-conformer-600m-multilingual"
LOCAL_MODEL_DIR = "indic-conformer-600m-onnx"
CALIBRATION_FILE = "/kaggle/input/indicvoices-calibration-1408/indicvoices_calibration_1408.parquet"
QUANTIZED_MODEL_DIR = "indic-conformer-600m-quantized-int8"

os.makedirs(LOCAL_MODEL_DIR, exist_ok=True)
os.makedirs(QUANTIZED_MODEL_DIR, exist_ok=True)

print("Done.")

Done.


## 1c. Downloading The Model From HuggingFace

#### The model is gated, so request access on its Hugging Face page first. Then add your read-only HF access token to Kaggle Secrets so the notebook can download the model.
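
If you also want to run this notebook outside Kaggle, the token lookup can fall back to an environment variable. The sketch below is an assumption about how you might structure that, not part of the original workflow: the helper name `get_hf_token` is hypothetical, and it assumes the secret and the environment variable share the same name.

```python
import os


def get_hf_token(secret_name: str = "HF_TOKEN"):
    """Return the token from Kaggle Secrets, else from the environment, else None."""
    try:
        # kaggle_secrets is only importable inside a Kaggle notebook.
        from kaggle_secrets import UserSecretsClient

        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Not on Kaggle, or the secret is not attached to this notebook.
        return os.environ.get(secret_name)


token = get_hf_token()
# Report only whether a token exists; never print the token itself.
print("Token found." if token else "No token found; add HF_TOKEN to Kaggle Secrets.")
```

The returned value can be passed to `login(token=...)` exactly as in the cell below. Catching a broad `Exception` keeps the sketch short; in a longer script you may prefer to catch the specific import and lookup errors.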

In [3]:
# Downloading the complete model

from huggingface_hub import login

from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()

# Change "HF_TOKEN" to the name of your Kaggle secret
hf_token = user_secrets.get_secret("HF_TOKEN")

login(token=hf_token)
# 1. Download the FULL model (including all external weight files with no extension)
print(f"Downloading FULL model from {MODEL_REPO}...")

# The external weight files have NO extension (e.g., "Constant_1970_attr__value", "layers.0.conv.pointwise_conv1.weight")
# We need to download EVERYTHING to get them
snapshot_download(
    repo_id=MODEL_REPO, 
    local_dir=LOCAL_MODEL_DIR,
    # No allow_patterns = download everything
    ignore_patterns=[".git*", "*.md", "*.txt"]  # Ignore only non-essential files
)
print("Download complete.")

Downloading FULL model from ai4bharat/indic-conformer-600m-multilingual...


Fetching 402 files:   0%|          | 0/402 [00:00<?, ?it/s]
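
The download cell relies on `ignore_patterns` leaving the extensionless weight files alone. The sketch below illustrates why, under the assumption that the patterns are matched as shell-style globs (which is what Python's `fnmatch` implements). It does not call `huggingface_hub`. The helper `is_ignored` is hypothetical, and `README.md` and `.gitattributes` are made-up examples of files that would be skipped; the other names come from the download log.

```python
from fnmatch import fnmatch

# Same patterns as in the snapshot_download call above.
IGNORE_PATTERNS = [".git*", "*.md", "*.txt"]


def is_ignored(path: str, patterns=IGNORE_PATTERNS) -> bool:
    """True when the path matches at least one glob-style ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


sample_files = [
    "assets/encoder.onnx",
    "assets/Constant_1970_attr__value",  # external weights, no extension
    "assets/onnx__MatMul_8067",          # external weights, no extension
    "vocab.json",
    "README.md",        # hypothetical file, skipped by "*.md"
    ".gitattributes",   # hypothetical file, skipped by ".git*"
]

kept = [path for path in sample_files if not is_ignored(path)]
for path in kept:
    print("keep:", path)
```

Because none of the patterns can match a name without an extension (unless it starts with `.git`), the external weight files are downloaded alongside the `.onnx` graphs. An `allow_patterns` list such as `["*.onnx"]` would have the opposite effect and silently drop them.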


config.json:   0%|          | 0.00/241 [00:00<?, ?B/s]

model_onnx_1b_batched_rnnt.py:   0%|          | 0.00/15.3k [00:00<?, ?B/s]

model_onnx.py:   0%|          | 0.00/9.64k [00:00<?, ?B/s]

model_ts.py:   0%|          | 0.00/6.17k [00:00<?, ?B/s]

Download complete.


In [None]:
# Check directory structure and verify external weight files

print("Directory structure:")
for root, dirs, files in os.walk(LOCAL_MODEL_DIR):
    level = root.replace(LOCAL_MODEL_DIR, '').count(os.sep)
    indent = ' ' * 2 * level
    print(f'{indent}{os.path.basename(root)}/')
    subindent = ' ' * 2 * (level + 1)
    # Show only ONNX files and important configs
    onnx_files = [f for f in files if f.endswith('.onnx') or f.endswith('.json')]
    for file in onnx_files[:15]:
        print(f'{subindent}{file}')
    if len(onnx_files) > 15:
        print(f'{subindent}... and {len(onnx_files) - 15} more ONNX/JSON files')

# List all ONNX files specifically
print("\n" + "="*50)
print("All ONNX files found:")
assets_dir = os.path.join(LOCAL_MODEL_DIR, "assets")
if os.path.exists(assets_dir):
    onnx_files = [f for f in os.listdir(assets_dir) if f.endswith('.onnx')]
    for f in sorted(onnx_files):
        size = os.path.getsize(os.path.join(assets_dir, f))
        print(f"  {f}: {size / 1024 / 1024:.2f} MB")
else:
    print("  assets/ directory not found!")

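# Check for external weight files: everything in assets/ that is not a model graph, config, or script.
# (A plain "no dot in the name" test would miss weights such as pre_encode.conv.0.weight.)
print("\n" + "="*50)
print("External weight files:")
if os.path.exists(assets_dir):
    all_files = os.listdir(assets_dir)
    non_weight_suffixes = ('.onnx', '.json', '.ts', '.py')
    weight_files = [f for f in all_files if not f.endswith(non_weight_suffixes)]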
    print(f"  Found {len(weight_files)} external weight files")
    if weight_files:
        # Show first few
        for f in sorted(weight_files)[:10]:
            size = os.path.getsize(os.path.join(assets_dir, f))
            print(f"    {f}: {size / 1024 / 1024:.2f} MB")
        if len(weight_files) > 10:
            print(f"    ... and {len(weight_files) - 10} more weight files")
        # Total size
        total_size = sum(os.path.getsize(os.path.join(assets_dir, f)) for f in weight_files)
        print(f"  Total external weights size: {total_size / 1024 / 1024 / 1024:.2f} GB")
    else:
        print("  WARNING: No external weight files found! Download may have failed.")
else:
    print("  assets/ directory not found!")

# 2. Helper Functions for Consolidation and Quantization

#### - The original model is split across more than 400 files, with the weights stored as external data outside the `.onnx` graphs. Before quantizing, we first consolidate each model's external data into that model.

#### - We also define a simple `quantize_model_helper` function that wraps pre-processing followed by either static or dynamic quantization.
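
The consolidation helper takes a `save_external_data` flag because a single protobuf file cannot hold 2 GB or more. As a rough sketch of how that choice could be made from file sizes (the names `total_size_bytes` and `needs_external_data`, the exact limit constant, and the safety margin are illustrative assumptions, not part of this notebook):

```python
import os

# Assumed limit for this sketch: protobuf cannot serialize a message of 2 GiB or more.
PROTOBUF_LIMIT_BYTES = 2**31

def total_size_bytes(paths):
    """Sum the on-disk size of the given files, ignoring paths that do not exist."""
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))

def needs_external_data(paths, limit=PROTOBUF_LIMIT_BYTES, safety_margin=0.9):
    """Return True when the combined files are too large for one .onnx file.

    safety_margin leaves headroom for the graph structure stored next to the weights.
    """
    return total_size_bytes(paths) >= limit * safety_margin
```

With the model files in hand, something like `needs_external_data([...paths of encoder.onnx and its weight files...])` would suggest `save_external_data=True` for the encoder and `False` for the much smaller CTC decoder.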

In [7]:
from onnx.external_data_helper import load_external_data_for_model
from onnxruntime.quantization import quant_pre_process, quantize_static, quantize_dynamic, QuantType, QuantFormat, CalibrationMethod
import onnxruntime.quantization.calibrate as calibrate_module
import onnx
import os
import gc
import functools

def consolidate_model(input_filename, output_path, save_external_data=False):
    """
    Consolidates external weights into the model file.
    Args:
        input_filename: Name of the file in LOCAL_MODEL_DIR or assets/
        output_path: Full path for the output file
        save_external_data: If True, saves weights to a separate .data file (for >2GB models)
    """
    # Locate input file
    input_path = os.path.join(LOCAL_MODEL_DIR, input_filename)
    if not os.path.exists(input_path):
        input_path = os.path.join(LOCAL_MODEL_DIR, "assets", input_filename)
    
    if not os.path.exists(input_path):
        print(f"Skipping {input_filename} (not found)")
        return None

    print(f"Consolidating {input_filename}...")
    model = onnx.load(input_path, load_external_data=False)
    base_dir = os.path.dirname(input_path)
    load_external_data_for_model(model, base_dir)
    
    if os.path.exists(output_path): os.remove(output_path)
    
    if save_external_data:
        external_data_file = os.path.basename(output_path) + ".data"
        ext_data_path = os.path.join(os.path.dirname(output_path), external_data_file)
        if os.path.exists(ext_data_path): os.remove(ext_data_path)
        
        onnx.save(model, output_path, 
                  save_as_external_data=True, 
                  all_tensors_to_one_file=True, 
                  location=external_data_file, 
                  size_threshold=1024, 
                  convert_attribute=False)
    else:
        onnx.save(model, output_path)
        
    print(f"Saved consolidated model to {output_path}")
    del model
    gc.collect()
    return output_path

def quantize_model_helper(input_path, output_path, data_reader=None, quant_type='static'):
    """
    Handles preprocessing and quantization.
    Args:
        quant_type: 'static' or 'dynamic'
    """
    print(f"Quantizing {os.path.basename(input_path)} -> {os.path.basename(output_path)} ({quant_type})...")
    
    # 1. Preprocess
    # We use a temporary path for the preprocessed model
    pre_path = input_path + ".pre.onnx"
    
    try:
        # Always use external data for intermediate preprocessed file to be safe with large models
        quant_pre_process(
            input_path,
            pre_path,
            skip_symbolic_shape=True,
            save_as_external_data=True,
            all_tensors_to_one_file=True,
            external_data_location=os.path.basename(pre_path) + ".data"
        )
    except Exception as e:
        print(f"Preprocessing failed: {e}. Proceeding without preprocessing...")
        pre_path = input_path

    # 2. Quantize
    if quant_type == 'static':
        if data_reader is None:
            raise ValueError("Data reader required for static quantization")
        
        # Monkey-patch onnx.save_model to force external data for the intermediate augmented model
        # This fixes InvalidProtobuf error when the augmented model > 2GB
        original_save_model = onnx.save_model
        original_save = onnx.save
        
        def patched_save_model(*args, **kwargs):
            # Check if this is likely the augmented model (usually in a temp dir or named 'augmented_model.onnx')
            # We force external data for ALL saves during this block to be safe
            print(f"DEBUG: Patched save_model called for {args[1] if len(args)>1 else kwargs.get('f')}")
            kwargs['save_as_external_data'] = True
            kwargs['all_tensors_to_one_file'] = True
            kwargs['size_threshold'] = 1024
            # If location isn't provided, generate one based on the filename
            if 'location' not in kwargs:
                model_path = args[1] if len(args) > 1 else kwargs.get('f')
                if isinstance(model_path, str):
                    kwargs['location'] = os.path.basename(model_path) + ".data"
            return original_save_model(*args, **kwargs)

        # Apply patch to onnx module
        onnx.save_model = patched_save_model
        onnx.save = patched_save_model
        
        # Apply patch to calibrate module if it imported save_model directly
        original_calibrate_save_model = None
        if hasattr(calibrate_module, 'save_model'):
            print("DEBUG: Patching onnxruntime.quantization.calibrate.save_model")
            original_calibrate_save_model = calibrate_module.save_model
            calibrate_module.save_model = patched_save_model
        
        try:
            quantize_static(
                model_input=pre_path,
                model_output=output_path,
                calibration_data_reader=data_reader,
                quant_format=QuantFormat.QDQ,
                per_channel=True,
                reduce_range=False,
                activation_type=QuantType.QInt8,
                weight_type=QuantType.QInt8,
                calibrate_method=CalibrationMethod.MinMax,
                use_external_data_format=False, # Output is int8 < 2GB, so single file is fine
                extra_options={'ActivationSymmetric': True, 'WeightSymmetric': True}
            )
        finally:
            # Restore original function
            onnx.save_model = original_save_model
            onnx.save = original_save
            if original_calibrate_save_model:
                calibrate_module.save_model = original_calibrate_save_model
        
        # Post-processing: The monkey patch forced the output to be split. 
        # We now load it and save it again as a single file.
        if os.path.exists(output_path):
            print("DEBUG: Consolidating quantized model into a single file...")
            try:
                # Load the model (this loads the external data automatically)
                q_model = onnx.load(output_path)
                
                # Save to a temporary file without external data
                temp_output = output_path + ".temp"
                onnx.save_model(q_model, temp_output, save_as_external_data=False)
                
                # Replace the original file
                os.replace(temp_output, output_path)
                
                # Try to find and remove the .data file created by the patch
                # The patch named it os.path.basename(output_path) + ".data"
                data_filename = os.path.basename(output_path) + ".data"
                data_path = os.path.join(os.path.dirname(output_path), data_filename)
                if os.path.exists(data_path):
                    os.remove(data_path)
                    print(f"DEBUG: Removed external data file: {data_filename}")
                    
            except Exception as e:
                print(f"WARNING: Could not consolidate model: {e}")
            
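    elif quant_type == 'dynamic':
        quantize_dynamic(
            model_input=pre_path,
            model_output=output_path,
            per_channel=True,
            weight_type=QuantType.QInt8,
            use_external_data_format=False # Output is int8 < 2GB
        )
    else:
        # Fail loudly on a typo instead of silently falling back to dynamic quantization
        raise ValueError(f"Unknown quant_type: {quant_type!r} (expected 'static' or 'dynamic')")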
        
    # Cleanup intermediate files
    if pre_path != input_path and os.path.exists(pre_path):
        os.remove(pre_path)
        if os.path.exists(pre_path + ".data"):
            os.remove(pre_path + ".data")
            
    print(f"Quantization complete: {output_path}")
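
The save-and-restore pattern used for the monkey patch above (keep the original function, swap in a wrapper, restore it in `finally`) can also be written with the standard library's `unittest.mock.patch.object`, which restores the attribute automatically. The sketch below is a minimal illustration under assumptions: `fake_onnx` is a hypothetical stand-in object rather than the real `onnx` module, and `force_kwargs` is a made-up helper name.

```python
import types
from unittest import mock

def force_kwargs(func, **forced):
    """Wrap func so the given keyword arguments always override the caller's."""
    def wrapper(*args, **kwargs):
        kwargs.update(forced)
        return func(*args, **kwargs)
    return wrapper

# Hypothetical stand-in for a module such as onnx; it only echoes its arguments.
fake_onnx = types.SimpleNamespace(
    save_model=lambda model, path, **kwargs: (path, kwargs)
)

patched = force_kwargs(fake_onnx.save_model, save_as_external_data=True)

# patch.object swaps the attribute in and restores it on exit, even if the body raises.
with mock.patch.object(fake_onnx, "save_model", patched):
    inside = fake_onnx.save_model("model", "out.onnx", save_as_external_data=False)

after = fake_onnx.save_model("model", "out.onnx")
print(inside)  # the forced keyword wins inside the block
print(after)   # the original behaviour is back afterwards
```

As in the helper above, every name that holds its own reference to the function (for example a module that did `from onnx import save_model`) would still need its own patch.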

# **3. Quantizing**

We use static PTQ for the encoder and the CTC decoder, and dynamic PTQ for the RNNT decoder and its related layers / components.
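
To make the difference concrete, here is a small conceptual sketch of symmetric int8 quantization in plain Python. It is not ONNX Runtime's exact arithmetic; the function names and the clipping range of [-127, 127] are assumptions chosen for illustration. With static PTQ the activation scale is fixed ahead of time from calibration data, while with dynamic PTQ it is recomputed from each tensor at inference time.

```python
def symmetric_scale(values, qmax=127):
    """Scale that maps the largest magnitude in values onto qmax."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0

def quantize(values, scale, qmax=127):
    """Round to the nearest integer step and clip to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(quantized, scale):
    """Map integer steps back to approximate real values."""
    return [q * scale for q in quantized]

calibration_activations = [0.5, -2.0, 1.0]   # seen offline, before deployment
runtime_activations = [4.0, -0.25]           # seen later, at inference time

# Static: the scale comes from calibration data, so larger runtime values saturate.
static_scale = symmetric_scale(calibration_activations)
static_q = quantize(runtime_activations, static_scale)

# Dynamic: the scale is recomputed from each runtime tensor, at some extra cost per call.
dynamic_scale = symmetric_scale(runtime_activations)
dynamic_q = quantize(runtime_activations, dynamic_scale)

print(dequantize(static_q, static_scale))    # first value is clipped to about 2.0
print(dequantize(dynamic_q, dynamic_scale))  # first value stays at about 4.0
```

The clipping in the static case is why the calibration set should resemble the audio the model will see in practice.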

## *3a. Defining The Calibration Data Reader*

We define the reader class that loads the calibration audio, converts it into model inputs, and feeds those inputs to the static quantizer one sample at a time.
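
During static quantization, ONNX Runtime calls the reader's `get_next()` repeatedly; each call returns a dict mapping input names to arrays, and `None` marks the end of the data. The sketch below shows only that contract, using a duck-typed class with plain lists instead of audio features. `ListCalibrationReader` and `to_inputs` are hypothetical names, and the class does not inherit from the real ONNX Runtime base class.

```python
class ListCalibrationReader:
    """Duck-typed stand-in for a calibration data reader (hypothetical name).

    make_inputs turns one row into a dict of model inputs and may raise.
    """

    def __init__(self, rows, make_inputs):
        self._rows = iter(rows)
        self._make_inputs = make_inputs
        self.skipped = 0

    def get_next(self):
        # Keep pulling rows until one converts cleanly or the data runs out.
        for row in self._rows:
            try:
                return self._make_inputs(row)
            except Exception as exc:
                self.skipped += 1
                print(f"Skipping calibration row: {exc}")
        return None  # signals the end of the calibration data

def to_inputs(row):
    """Toy conversion: reject empty rows, otherwise wrap the row as a model input."""
    if row is None:
        raise ValueError("empty row")
    return {"audio_signal": row}

reader = ListCalibrationReader([[0.1, 0.2], None, [0.3]], to_inputs)
samples = []
while (sample := reader.get_next()) is not None:
    samples.append(sample)
```

Bad rows are skipped inside a loop, so a long run of failures cannot exhaust the call stack.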

In [5]:
# Set up execution providers
print(f"ONNX Runtime available providers: {ort.get_available_providers()}")
EXECUTION_PROVIDERS = ['CPUExecutionProvider']
print("Using CPU for ONNX Runtime")

# Mapping from ISO language codes to IndicVoices config names
LANG_CODE_MAP = {
    'as': 'assamese',
    'bn': 'bengali',
    'bodo': 'bodo',
    'doi': 'dogri',
    'gu': 'gujarati',
    'hi': 'hindi',
    'kn': 'kannada',
    'ks': 'kashmiri',
    'kok': 'konkani',
    'mai': 'maithili',
    'ml': 'malayalam',
    'mni': 'manipuri',
    'mr': 'marathi',
    'ne': 'nepali',
    'or': 'odia',
    'pa': 'punjabi',
    'sa': 'sanskrit',
    'sat': 'santali',
    'sd': 'sindhi',
    'ta': 'tamil',
    'te': 'telugu',
    'ur': 'urdu'
}

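# Import the base class under an alias so this class does not shadow it
# (and does not end up inheriting from itself when the cell is re-run)
from onnxruntime.quantization import CalibrationDataReader as BaseCalibrationDataReader

class CalibrationDataReader(BaseCalibrationDataReader):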
    def __init__(self, parquet_file, model_path, encoder_path=None, batch_size=1, limit=None):
        self.df = pd.read_parquet(parquet_file)
        if limit:
            self.df = self.df.head(limit)
        self.data = self.df.to_dict('records')
        self.batch_size = batch_size
        self.index = 0
        
        # Preprocessor setup
        self.device = 'cpu'
        self.mel_transform = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=512, win_length=400, hop_length=160, 
            f_min=0.0, f_max=8000.0, n_mels=80, window_fn=torch.hann_window, power=2.0
        )
        
        # Encoder setup (if provided, we run inference on it)
        self.encoder_sess = None
        if encoder_path:
            print(f"Loading encoder for calibration inference: {encoder_path}")
            self.encoder_sess = ort.InferenceSession(encoder_path, providers=EXECUTION_PROVIDERS)
            self.enc_input_name = self.encoder_sess.get_inputs()[0].name
            self.enc_len_name = self.encoder_sess.get_inputs()[1].name if len(self.encoder_sess.get_inputs()) > 1 else None

        # Session for the model being quantized (to get input names)
        session = ort.InferenceSession(model_path, providers=EXECUTION_PROVIDERS)
        self.input_name = session.get_inputs()[0].name
        self.len_input_name = session.get_inputs()[1].name if len(session.get_inputs()) > 1 else None
        print(f"Model inputs: {self.input_name}, {self.len_input_name}")

    def preprocess_audio(self, calibration_row):
        # Use speaker_id, duration, lang for robust matching
        speaker_id = calibration_row.get('speaker_id')
        duration = calibration_row.get('duration')
        lang_code = calibration_row.get('lang')
        config_name = LANG_CODE_MAP.get(lang_code, lang_code)
        
        # Note: In a real scenario, loading the dataset streaming for every row is inefficient.
        # But keeping logic similar to original for correctness.
        ds = load_dataset("ai4bharat/IndicVoices", config_name, split="valid", streaming=True)
        
        audio_array = None
        sample_rate = 16000
        
        for example in ds:
            if (example.get('speaker_id') == speaker_id and
                abs(float(example.get('duration', 0)) - float(duration)) < 0.01 and
                example.get('lang') == lang_code):
                
                audio_feat = example["audio_filepath"]
                if hasattr(audio_feat, "get_all_samples"):
                    decoded = audio_feat.get_all_samples()
                    audio_array = decoded.data
                    sample_rate = decoded.sample_rate
                elif hasattr(audio_feat, "decode"):
                    audio_array, sample_rate = audio_feat.decode()
                elif isinstance(audio_feat, dict) and "array" in audio_feat:
                    audio_array = audio_feat["array"]
                    sample_rate = audio_feat["sampling_rate"]
                break
        
        if audio_array is None:
            raise FileNotFoundError(f"Audio not found for {speaker_id}")

        # Convert to torch tensor
        if isinstance(audio_array, torch.Tensor):
            waveform = audio_array.float()
        else:
            waveform = torch.from_numpy(audio_array).float()
            
        if waveform.ndim > 1: waveform = waveform.mean(dim=0)
        if sample_rate != 16000:
            resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
            waveform = resampler(waveform)
            
        features = self.mel_transform(waveform.unsqueeze(0))
        features = torch.log(features + 1e-9)
        mean = features.mean(dim=2, keepdims=True)
        stddev = features.std(dim=2, keepdims=True) + 1e-5
        features = (features - mean) / stddev
        features = features.squeeze(0)
        return features.numpy(), features.shape[1]

    def get_next(self):
        # Loop instead of recursing so a long run of bad rows cannot exhaust the call stack
        while self.index < len(self.data):
            # Samples are fed one at a time: only the first row of each batch is used
            row = self.data[self.index]
            self.index += self.batch_size
            try:
                features, length = self.preprocess_audio(row)
                features = np.expand_dims(features, axis=0) # (1, 80, T)
                length_arr = np.array([length], dtype=np.int64)

                # If we have an encoder session, run it first (for CTC/Decoder calibration)
                if self.encoder_sess:
                    enc_inputs = {self.enc_input_name: features}
                    if self.enc_len_name:
                        enc_inputs[self.enc_len_name] = length_arr
                    encoder_outputs = self.encoder_sess.run(None, enc_inputs)
                    # Return encoder output
                    return {self.input_name: encoder_outputs[0]}

                # Return raw audio features (for Encoder calibration)
                inputs = {self.input_name: features}
                if self.len_input_name:
                    inputs[self.len_input_name] = length_arr
                return inputs

            except Exception as e:
                print(f"Error processing calibration row: {e}")
        return None

ONNX Runtime available providers: ['AzureExecutionProvider', 'CPUExecutionProvider']
Using CPU for ONNX Runtime
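
As the comment in `preprocess_audio` notes, re-opening the streaming dataset for every calibration row is inefficient. One possible alternative, sketched here under assumptions, is to group the calibration rows by language and scan each language's stream only once, using the same speaker and duration match as above. `open_stream` is a hypothetical callable that returns an iterable of example dicts for a language code (for instance a thin wrapper around the streaming dataset); `group_rows_by_lang` and `match_rows` are illustrative names.

```python
from collections import defaultdict

def group_rows_by_lang(rows):
    """Map each language code to {speaker_id: [row indices]}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for idx, row in enumerate(rows):
        grouped[row["lang"]][row["speaker_id"]].append(idx)
    return grouped

def match_rows(rows, open_stream, tolerance=0.01):
    """Return {row index: example} after scanning each language's stream once."""
    matches = {}
    for lang, by_speaker in group_rows_by_lang(rows).items():
        pending = sum(len(indices) for indices in by_speaker.values())
        for example in open_stream(lang):
            # Only rows from the same speaker are candidates for this example.
            for idx in by_speaker.get(example.get("speaker_id"), []):
                if idx in matches:
                    continue
                gap = abs(float(example.get("duration", 0)) - float(rows[idx]["duration"]))
                if gap < tolerance:
                    matches[idx] = example
                    pending -= 1
                    break
            if pending == 0:
                break  # stop streaming once every row for this language is found
    return matches
```

The returned dict could then be consulted by row index during calibration instead of opening a new stream for each sample.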


## *3b. Quantizing the Encoder.*

In [8]:
print("Processing Encoder...")

# 1. Consolidate (Large model -> keep external data)
# encoder_fp32 is the consolidated model path
encoder_fp32 = os.path.join(LOCAL_MODEL_DIR, "encoder_consolidated.onnx")
consolidate_model("encoder.onnx", encoder_fp32, save_external_data=True)

# 2. Setup Reader
# We pass the FP32 model path to reader so it knows input names
data_reader = CalibrationDataReader(CALIBRATION_FILE, encoder_fp32, limit=100)

# 3. Quantize (Static PTQ)
encoder_int8 = os.path.join(QUANTIZED_MODEL_DIR, "encoder_quantized_int8.onnx")
quantize_model_helper(encoder_fp32, encoder_int8, data_reader, quant_type='static')

Processing Encoder...
Consolidating encoder.onnx...
Saved consolidated model to indic-conformer-600m-onnx/encoder_consolidated.onnx
Model inputs: audio_signal, length
Quantizing encoder_consolidated.onnx -> encoder_quantized_int8.onnx (static)...
DEBUG: Patched save_model called for /tmp/ort.quant.bg3w6rl1/augmented_model.onnx


DEBUG: Patched save_model called for indic-conformer-600m-quantized-int8/encoder_quantized_int8.onnx
DEBUG: Consolidating quantized model into a single file...
DEBUG: Removed external data file: encoder_quantized_int8.onnx.data
Quantization complete: indic-conformer-600m-quantized-int8/encoder_quantized_int8.onnx
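
Static post-training quantization relies on the calibration pass to observe activation ranges. The sketch below is a simplified, pure-Python illustration of how an observed range could be turned into an int8 scale and zero point. It is not ONNX Runtime's exact implementation, and the names `affine_params`, `quantize`, and `dequantize` are hypothetical helpers introduced only for this example.

```python
def affine_params(x_min, x_max, qmin=-128, qmax=127):
    """Derive (scale, zero_point) for asymmetric int8 quantization."""
    # Widen the range to include 0.0 so that zero stays exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, 0  # degenerate range: any scale works
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the int8 grid, clipping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


# Toy "calibration" activations (assumed values, not taken from the model).
calibration_values = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zp = affine_params(min(calibration_values), max(calibration_values))
roundtrip = [dequantize(quantize(v, scale, zp), scale, zp) for v in calibration_values]
max_error = max(abs(a - b) for a, b in zip(calibration_values, roundtrip))
print(f"scale={scale:.6f}, zero_point={zp}, max round-trip error={max_error:.6f}")
```

The round-trip error stays below one quantization step, which is why a representative calibration set matters: ranges that are too wide waste resolution, and ranges that are too narrow clip real activations.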


## *3c. Quantizing the CTC Decoder.*

In [9]:
print("Processing CTC Decoder...")

# 1. Consolidate (Small model -> single file)
# ctc_fp32 is the consolidated model path
ctc_fp32 = os.path.join(LOCAL_MODEL_DIR, "ctc_decoder_consolidated.onnx")
consolidate_model("ctc_decoder.onnx", ctc_fp32, save_external_data=False)

# 2. Setup Reader (Needs encoder output)
# We reuse the encoder_fp32 from previous step for inference to generate inputs for CTC
# Note: We use the FP32 encoder for calibration accuracy
encoder_fp32 = os.path.join(LOCAL_MODEL_DIR, "encoder_consolidated.onnx")
ctc_reader = CalibrationDataReader(CALIBRATION_FILE, ctc_fp32, encoder_path=encoder_fp32, limit=100)

# 3. Quantize (Static PTQ)
ctc_int8 = os.path.join(QUANTIZED_MODEL_DIR, "ctc_decoder_quantized_int8.onnx")
quantize_model_helper(ctc_fp32, ctc_int8, ctc_reader, quant_type='static')

Processing CTC Decoder...
Consolidating ctc_decoder.onnx...
Saved consolidated model to indic-conformer-600m-onnx/ctc_decoder_consolidated.onnx
Loading encoder for calibration inference: indic-conformer-600m-onnx/encoder_consolidated.onnx
Model inputs: encoder_output, None
Quantizing ctc_decoder_consolidated.onnx -> ctc_decoder_quantized_int8.onnx (static)...
DEBUG: Patched save_model called for /tmp/ort.quant.mltlrd37/augmented_model.onnx


DEBUG: Patched save_model called for indic-conformer-600m-quantized-int8/ctc_decoder_quantized_int8.onnx
DEBUG: Consolidating quantized model into a single file...
DEBUG: Removed external data file: ctc_decoder_quantized_int8.onnx.data
Quantization complete: indic-conformer-600m-quantized-int8/ctc_decoder_quantized_int8.onnx
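
The CTC decoder consumes encoder output rather than raw features, so its calibration inputs have to be produced by running the FP32 encoder first. The notebook's own `CalibrationDataReader` is defined in an earlier cell; the class below is only a minimal, hypothetical sketch of that chaining idea. It follows the `get_next()` convention of ONNX Runtime calibration readers (return a feed dict, or `None` once the data is exhausted), and `fake_encoder` is a stand-in for a real encoder session.

```python
class ChainedCalibrationReader:
    """Feeds a downstream model with outputs of an upstream model (sketch)."""

    def __init__(self, samples, encode_fn, input_name, limit=None):
        selected = samples[:limit] if limit is not None else samples
        self._samples = iter(selected)
        self._encode_fn = encode_fn      # assumed: features -> encoder output
        self._input_name = input_name    # name of the downstream model's input

    def get_next(self):
        sample = next(self._samples, None)
        if sample is None:
            return None  # signals the end of calibration data
        return {self._input_name: self._encode_fn(sample)}


def fake_encoder(features):
    # Stand-in for encoder inference: simply doubles every value.
    return [2.0 * v for v in features]


reader = ChainedCalibrationReader(
    samples=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    encode_fn=fake_encoder,
    input_name="encoder_output",
    limit=2,
)

feeds = []
while (feed := reader.get_next()) is not None:
    feeds.append(feed)
print(f"Collected {len(feeds)} calibration feeds")
```

Running the encoder lazily inside `get_next()` keeps memory usage flat, at the cost of repeating encoder inference if the reader is iterated more than once.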


## *3d. Quantizing the RNNT & Joint Modules.*

In [10]:
print("Processing RNNT & Joint Modules...")

# List of models to process
# (Input Filename, Output Filename)
models_to_quantize = [
    ("rnnt_decoder.onnx", "rnnt_decoder_quantized_int8.onnx"),
    ("joint_enc.onnx", "joint_enc_quantized_int8.onnx"),
    ("joint_pred.onnx", "joint_pred_quantized_int8.onnx"),
    ("joint_pre_net.onnx", "joint_pre_net_quantized_int8.onnx")
]

# Add Language Adapters
assets_dir = os.path.join(LOCAL_MODEL_DIR, "assets")
if os.path.exists(assets_dir):
    for f in os.listdir(assets_dir):
        if f.startswith("joint_post_net_") and f.endswith(".onnx"):
            lang = f.replace("joint_post_net_", "").replace(".onnx", "")
            models_to_quantize.append((f, f"joint_post_net_{lang}_quantized_int8.onnx"))

for input_name, output_name in models_to_quantize:
    # 1. Consolidate (Small -> single file)
    # We use a temp path for the consolidated FP32 model
    fp32_path = os.path.join(LOCAL_MODEL_DIR, f"temp_{input_name}")
    
    # Consolidate directly to single file (save_external_data=False)
    if consolidate_model(input_name, fp32_path, save_external_data=False):
        # 2. Quantize (Dynamic PTQ)
        int8_path = os.path.join(QUANTIZED_MODEL_DIR, output_name)
        quantize_model_helper(fp32_path, int8_path, quant_type='dynamic')
        
        # Cleanup temp FP32
        if os.path.exists(fp32_path): os.remove(fp32_path)

Processing RNNT & Joint Modules...
Consolidating rnnt_decoder.onnx...
Saved consolidated model to indic-conformer-600m-onnx/temp_rnnt_decoder.onnx
Quantizing temp_rnnt_decoder.onnx -> rnnt_decoder_quantized_int8.onnx (dynamic)...
Quantization complete: indic-conformer-600m-quantized-int8/rnnt_decoder_quantized_int8.onnx
Consolidating joint_enc.onnx...
Saved consolidated model to indic-conformer-600m-onnx/temp_joint_enc.onnx
Quantizing temp_joint_enc.onnx -> joint_enc_quantized_int8.onnx (dynamic)...
Quantization complete: indic-conformer-600m-quantized-int8/joint_enc_quantized_int8.onnx
Consolidating joint_pred.onnx...
Saved consolidated model to indic-conformer-600m-onnx/temp_joint_pred.onnx
Quantizing temp_joint_pred.onnx -> joint_pred_quantized_int8.onnx (dynamic)...
Quantization complete: indic-conformer-600m-quantized-int8/joint_pred_quantized_int8.onnx
Consolidating joint_pre_net.onnx...
Saved consolidated model to indic-conformer-600m-onnx/temp_joint_pre_net.onnx
Quantizing temp
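
The adapter loop above derives each language code from the file name with chained `replace` calls. As a hedged alternative, the sketch below slices off the known prefix and suffix explicitly and sorts the listing so the processing order is reproducible. `adapter_output_name` is a hypothetical helper and the file list is made up for illustration.

```python
def adapter_output_name(filename, prefix="joint_post_net_", suffix=".onnx"):
    """Return the quantized file name for a language adapter, or None."""
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        return None
    lang = filename[len(prefix):-len(suffix)]
    if not lang:
        return None  # e.g. "joint_post_net_.onnx" has no language code
    return f"{prefix}{lang}_quantized_int8{suffix}"


# Illustrative directory listing (assumed, not read from disk).
files = ["joint_post_net_ta.onnx", "vocab.json", "joint_post_net_hi.onnx", "joint_pre_net.onnx"]

pairs = []
for name in sorted(files):
    output_name = adapter_output_name(name)
    if output_name is not None:
        pairs.append((name, output_name))

for src, dst in pairs:
    print(f"{src} -> {dst}")
```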

# **4. Inference**

## *4a. Inference Using CTC Head*

In [11]:
# for ctc head

import os
import numpy as np
import torch
import torchaudio
import onnxruntime as ort
import json
import glob

# ---- CONFIG ----
AUDIO_PATH = "/kaggle/input/sample-voices/mooli.wav"  # Change to your test file
# Update paths to match your local environment if needed
ENCODER_ONNX = "/kaggle/working/indic-conformer-600m-quantized-int8/encoder_quantized_int8.onnx"
CTC_ONNX = "/kaggle/working/indic-conformer-600m-quantized-int8/ctc_decoder_quantized_int8.onnx"
# Assuming assets are in the local model dir we created earlier
ASSETS_DIR = "indic-conformer-600m-onnx/assets" 
VOCAB_JSON = os.path.join(ASSETS_DIR, "vocab.json")
MASKS_JSON = os.path.join(ASSETS_DIR, "language_masks.json")
LANGUAGE_CODE = "hi" # 'hi' for Hindi

# Helper to find files if paths are wrong (e.g. in Kaggle working dir)
def find_file(filename, search_path="."):
    if os.path.exists(filename):
        return filename
    candidates = glob.glob(os.path.join(search_path, "**", os.path.basename(filename)), recursive=True)
    if candidates:
        print(f"Found {os.path.basename(filename)} at {candidates[0]}")
        return candidates[0]
    return None

VOCAB_JSON = find_file(VOCAB_JSON) or VOCAB_JSON
MASKS_JSON = find_file(MASKS_JSON) or MASKS_JSON

# ---- LOAD RESOURCES ----
if not os.path.exists(VOCAB_JSON) or not os.path.exists(MASKS_JSON):
    raise FileNotFoundError(f"Missing vocab or masks file. Vocab: {VOCAB_JSON}, Masks: {MASKS_JSON}")

with open(VOCAB_JSON, "r", encoding="utf-8") as f:
    vocab_raw = json.load(f)

with open(MASKS_JSON, "r", encoding="utf-8") as f:
    language_masks = json.load(f)

print(f"Loaded vocab for {len(vocab_raw)} languages.")
print(f"Loaded masks for {len(language_masks)} languages.")

# ---- AUDIO PREPROCESS ----
# Ensure audio path exists
if not os.path.exists(AUDIO_PATH):
    print(f"Audio file not found at {AUDIO_PATH}. Creating a dummy sine wave for testing...")
    sample_rate = 16000
    waveform = torch.sin(torch.linspace(0, 100, 16000*3)).unsqueeze(0) # 3 seconds
else:
    waveform, sample_rate = torchaudio.load(AUDIO_PATH)

if waveform.ndim > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # Convert to mono (1, T)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

# Mel Spectrogram (match calibration: n_mels=80, n_fft=512, win_length=400, hop_length=160)
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, win_length=400, hop_length=160,
    f_min=0.0, f_max=8000.0, n_mels=80, window_fn=torch.hann_window, power=2.0
)
features = mel_transform(waveform)  # (1, 80, T)
features = torch.log(features + 1e-9)
# Normalize (per feature)
mean = features.mean(dim=2, keepdim=True)
stddev = features.std(dim=2, keepdim=True) + 1e-5
features = (features - mean) / stddev
features = features.squeeze(0)  # (80, T)

# ---- ONNX INFERENCE ----
features_np = features.numpy().astype(np.float32)
length_np = np.array([features_np.shape[1]], dtype=np.int64)
encoder_input = {
    "input": np.expand_dims(features_np, axis=0),  # (1, 80, T)
    "length": length_np
}

print(f"Running inference on {ENCODER_ONNX}...")
encoder_sess = ort.InferenceSession(ENCODER_ONNX, providers=["CPUExecutionProvider"])
encoder_inputs = encoder_sess.get_inputs()
encoder_input_dict = {encoder_inputs[0].name: encoder_input["input"],}
if len(encoder_inputs) > 1:
    encoder_input_dict[encoder_inputs[1].name] = encoder_input["length"]
encoder_outputs = encoder_sess.run(None, encoder_input_dict)

print(f"Running inference on {CTC_ONNX}...")
ctc_sess = ort.InferenceSession(CTC_ONNX, providers=["CPUExecutionProvider"])
ctc_inputs = ctc_sess.get_inputs()
ctc_input_dict = {ctc_inputs[0].name: encoder_outputs[0],}
if len(ctc_inputs) > 1:
    ctc_input_dict[ctc_inputs[1].name] = encoder_input["length"]
ctc_outputs = ctc_sess.run(None, ctc_input_dict)

# ---- DECODE WITH MASKS ----
logits = ctc_outputs[0]  # (batch, T, 5633)
print(f"Global Logits shape: {logits.shape}")

if LANGUAGE_CODE not in language_masks:
    raise ValueError(f"Language {LANGUAGE_CODE} not found in masks.")

# 1. Get boolean mask
mask = np.array(language_masks[LANGUAGE_CODE], dtype=bool)
print(f"Mask length: {len(mask)}, True count: {mask.sum()}")

# 2. Slice logits to get language-specific logits
# logits is (1, T, 5633), mask is (5633,)
# We want to select columns where mask is True
logits_sliced = logits[:, :, mask] # (1, T, 257)
print(f"Sliced Logits shape: {logits_sliced.shape}")

# 3. Argmax to get local indices
pred_ids = np.argmax(logits_sliced, axis=-1)[0] # (T,)

# 4. Decode using local vocab
vocab_list = vocab_raw[LANGUAGE_CODE]
BLANK_ID = 256 # Local blank index for Indic Conformer

tokens = []
prev = None
for idx in pred_ids:
    # CTC rule: collapse consecutive repeats, then drop blanks
    if idx != prev and idx != BLANK_ID and idx < len(vocab_list):
        tokens.append(vocab_list[idx])
    prev = idx

transcription = "".join(tokens).replace("▁", " ").strip()
print("-" * 30)
print("Transcription:", transcription)
print("-" * 30)

Loaded vocab for 22 languages.
Loaded masks for 22 languages.
Running inference on /kaggle/working/indic-conformer-600m-quantized-int8/encoder_quantized_int8.onnx...
Running inference on /kaggle/working/indic-conformer-600m-quantized-int8/ctc_decoder_quantized_int8.onnx...
Global Logits shape: (1, 67, 5633)
Mask length: 5633, True count: 257
Sliced Logits shape: (1, 67, 257)
------------------------------
Transcription: हाँ हाँोली भीू बी रुपय है हाँ ले लीजिएगा तो पंद्रह पड़ेगा
------------------------------
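
The decoding step in the cell above follows the standard greedy CTC rule: take the argmax per frame, collapse consecutive repeats, then drop blanks. A minimal, self-contained sketch of that rule is shown here on a toy vocabulary. `ctc_greedy_decode`, the three-entry vocabulary, and the blank id of 3 are assumptions for illustration only; the notebook itself uses a 257-entry local vocabulary with blank id 256.

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id):
    """Collapse repeated frame ids, drop blanks, and join the tokens."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # A token is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank_id and idx < len(vocab):
            tokens.append(vocab[idx])
        prev = idx
    # "\u2581" is the SentencePiece word-boundary marker.
    return "".join(tokens).replace("\u2581", " ").strip()


toy_vocab = ["\u2581", "a", "b"]
toy_blank = 3
frame_ids = [3, 1, 1, 3, 1, 2, 2, 3, 0, 2]

text = ctc_greedy_decode(frame_ids, toy_vocab, toy_blank)
print(repr(text))
```

Note how the blank between the two runs of `1` is what allows the same token to be emitted twice in a row; without it the repeats would be merged.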


## *4b. Inference Using RNNT Head*

In [13]:
# for rnnt head decoding

import os
import numpy as np
import torch
import torchaudio
import onnxruntime as ort
import json
import glob

# ---- CONFIG ----
AUDIO_PATH = "/kaggle/input/sample-voices/mooli.wav"  # Change to your test file
# Update paths to match your local environment if needed
ENCODER_ONNX = "/kaggle/working/indic-conformer-600m-quantized-int8/encoder_quantized_int8.onnx"
RNNT_DECODER_ONNX = os.path.join(QUANTIZED_MODEL_DIR, "rnnt_decoder_quantized_int8.onnx")
JOINT_ENC_ONNX = os.path.join(QUANTIZED_MODEL_DIR, "joint_enc_quantized_int8.onnx")
JOINT_PRED_ONNX = os.path.join(QUANTIZED_MODEL_DIR, "joint_pred_quantized_int8.onnx")
JOINT_PRE_NET_ONNX = os.path.join(QUANTIZED_MODEL_DIR, "joint_pre_net_quantized_int8.onnx")
# Language-specific adapter
LANGUAGE_CODE = "hi"  # 'hi' for Hindi
JOINT_POST_NET_ONNX = os.path.join(QUANTIZED_MODEL_DIR, f"joint_post_net_{LANGUAGE_CODE}_quantized_int8.onnx")

ASSETS_DIR = "indic-conformer-600m-onnx/assets" 
VOCAB_JSON = os.path.join(ASSETS_DIR, "vocab.json")
MASKS_JSON = os.path.join(ASSETS_DIR, "language_masks.json")

# Helper to find files if paths are wrong (e.g. in Kaggle working dir)
def find_file(filename, search_path="."):
    if os.path.exists(filename):
        return filename
    candidates = glob.glob(os.path.join(search_path, "**", os.path.basename(filename)), recursive=True)
    if candidates:
        print(f"Found {os.path.basename(filename)} at {candidates[0]}")
        return candidates[0]
    return None

VOCAB_JSON = find_file(VOCAB_JSON) or VOCAB_JSON
MASKS_JSON = find_file(MASKS_JSON) or MASKS_JSON

# ---- LOAD RESOURCES ----
if not os.path.exists(VOCAB_JSON) or not os.path.exists(MASKS_JSON):
    raise FileNotFoundError(f"Missing vocab or masks file. Vocab: {VOCAB_JSON}, Masks: {MASKS_JSON}")

with open(VOCAB_JSON, "r", encoding="utf-8") as f:
    vocab_raw = json.load(f)

with open(MASKS_JSON, "r", encoding="utf-8") as f:
    language_masks = json.load(f)

print(f"Loaded vocab for {len(vocab_raw)} languages.")
print(f"Loaded masks for {len(language_masks)} languages.")

# ---- LOAD ONNX MODELS ----
print("Loading ONNX models...")
encoder_sess = ort.InferenceSession(ENCODER_ONNX, providers=["CPUExecutionProvider"])
rnnt_decoder_sess = ort.InferenceSession(RNNT_DECODER_ONNX, providers=["CPUExecutionProvider"])
joint_enc_sess = ort.InferenceSession(JOINT_ENC_ONNX, providers=["CPUExecutionProvider"])
joint_pred_sess = ort.InferenceSession(JOINT_PRED_ONNX, providers=["CPUExecutionProvider"])
joint_pre_net_sess = ort.InferenceSession(JOINT_PRE_NET_ONNX, providers=["CPUExecutionProvider"])
joint_post_net_sess = ort.InferenceSession(JOINT_POST_NET_ONNX, providers=["CPUExecutionProvider"])

print("Models loaded successfully.")

# Print input shapes for debugging
print("Joint Enc inputs:")
for inp in joint_enc_sess.get_inputs():
    print(f"  {inp.name}: {inp.shape}")
print("Joint Pred inputs:")
for inp in joint_pred_sess.get_inputs():
    print(f"  {inp.name}: {inp.shape}")
print("Joint Pre Net inputs:")
for inp in joint_pre_net_sess.get_inputs():
    print(f"  {inp.name}: {inp.shape}")
print("Joint Post Net inputs:")
for inp in joint_post_net_sess.get_inputs():
    print(f"  {inp.name}: {inp.shape}")

# ---- AUDIO PREPROCESS ----
# Ensure audio path exists
if not os.path.exists(AUDIO_PATH):
    print(f"Audio file not found at {AUDIO_PATH}. Creating a dummy sine wave for testing...")
    sample_rate = 16000
    waveform = torch.sin(torch.linspace(0, 100, 16000*3)).unsqueeze(0) # 3 seconds
else:
    waveform, sample_rate = torchaudio.load(AUDIO_PATH)

if waveform.ndim > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # Convert to mono (1, T)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

# Mel Spectrogram (match calibration: n_mels=80, n_fft=512, win_length=400, hop_length=160)
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, win_length=400, hop_length=160,
    f_min=0.0, f_max=8000.0, n_mels=80, window_fn=torch.hann_window, power=2.0
)
features = mel_transform(waveform)  # (1, 80, T)
features = torch.log(features + 1e-9)
# Normalize (per feature)
mean = features.mean(dim=2, keepdim=True)
stddev = features.std(dim=2, keepdim=True) + 1e-5
features = (features - mean) / stddev
features = features.squeeze(0)  # (80, T)

# ---- ENCODER INFERENCE ----
features_np = features.numpy().astype(np.float32)
length_np = np.array([features_np.shape[1]], dtype=np.int64)
encoder_input = {
    "input": np.expand_dims(features_np, axis=0),  # (1, 80, T)
    "length": length_np
}

print(f"Running inference on {ENCODER_ONNX}...")
encoder_inputs = encoder_sess.get_inputs()
encoder_input_dict = {encoder_inputs[0].name: encoder_input["input"],}
if len(encoder_inputs) > 1:
    encoder_input_dict[encoder_inputs[1].name] = encoder_input["length"]
encoder_outputs = encoder_sess.run(None, encoder_input_dict)
encoder_output = encoder_outputs[0]  # (1, D_enc, T), e.g. (1, 1024, 67) for this clip

print(f"Encoder output shape: {encoder_output.shape}")

# Transpose encoder output from (1, D_enc, T) to (1, T, D_enc) for joint_enc
encoder_output_transposed = encoder_output.transpose(0, 2, 1)  # (1, T, 1024)

# ---- RNNT DECODING (Greedy) ----
print("Starting RNNT decoding (Greedy Search)...")

# 1. Run Joint Encoder once
# encoder_output_transposed: (1, T, 1024)
joint_enc_input_dict = {joint_enc_sess.get_inputs()[0].name: encoder_output_transposed}
enc_outputs = joint_enc_sess.run(None, joint_enc_input_dict)
enc_output = enc_outputs[0]  # (1, T, 640)
T = enc_output.shape[1]

print(f"Encoder output processed. Time steps: {T}")

# 2. Decoding Loop
BLANK_ID = 256
predicted_tokens = [BLANK_ID]  # Start with BLANK/SOS token
t = 0
max_symbols = 100

# Initial Prediction Network run
decoder_input = np.array([predicted_tokens], dtype=np.int32)
target_length = np.array([len(predicted_tokens)], dtype=np.int32)
rnnt_input_dict = {
    'targets': decoder_input,
    'target_length': target_length,
    'states.1': np.zeros((2, 1, 640), dtype=np.float32),
    'onnx::Slice_3': np.zeros((2, 1, 640), dtype=np.float32)
}
decoder_outputs = rnnt_decoder_sess.run(None, rnnt_input_dict)
# (1, 640, 1) -> (1, 1, 640)
decoder_output = decoder_outputs[0].transpose(0, 2, 1)
last_token_embedding = decoder_output[:, -1:, :]

joint_pred_input_dict = {joint_pred_sess.get_inputs()[0].name: last_token_embedding}
pred_outputs = joint_pred_sess.run(None, joint_pred_input_dict)
pred_current = pred_outputs[0] # (1, 1, 640)

while t < T and len(predicted_tokens) < max_symbols:
    # Get current encoder frame
    enc_current = enc_output[:, t:t+1, :] # (1, 1, 640)
    
    # Combine
    joint_input = enc_current + pred_current # (1, 1, 640)
    
    # Joint Net
    joint_pre_input_dict = {joint_pre_net_sess.get_inputs()[0].name: joint_input}
    pre_net_output = joint_pre_net_sess.run(None, joint_pre_input_dict)[0]
    
    joint_post_input_dict = {joint_post_net_sess.get_inputs()[0].name: pre_net_output}
    logits = joint_post_net_sess.run(None, joint_post_input_dict)[0] # (1, 1, vocab)
    
    # Argmax
    k = np.argmax(logits[0, 0, :])
    
    if k == BLANK_ID:
        t += 1
    else:
        predicted_tokens.append(int(k))
        # Update Prediction Network
        decoder_input = np.array([predicted_tokens], dtype=np.int32)
        target_length = np.array([len(predicted_tokens)], dtype=np.int32)
        rnnt_input_dict['targets'] = decoder_input
        rnnt_input_dict['target_length'] = target_length
        
        decoder_outputs = rnnt_decoder_sess.run(None, rnnt_input_dict)
        decoder_output = decoder_outputs[0].transpose(0, 2, 1)
        last_token_embedding = decoder_output[:, -1:, :]
        
        joint_pred_input_dict = {joint_pred_sess.get_inputs()[0].name: last_token_embedding}
        pred_outputs = joint_pred_sess.run(None, joint_pred_input_dict)
        pred_current = pred_outputs[0]

print(f"Decoding complete. Tokens: {predicted_tokens}")

# ---- DECODE TOKENS ----
vocab_list = vocab_raw[LANGUAGE_CODE]

# Unlike CTC, RNNT emits each token exactly once, so consecutive repeats are
# genuine and must not be collapsed. Only the blank/SOS id is dropped.
tokens = []
for idx in predicted_tokens[1:]:  # Skip SOS token
    if idx == BLANK_ID:
        continue
    if idx < len(vocab_list):
        tokens.append(vocab_list[idx])

transcription = "".join(tokens).replace("▁", " ").strip()
print("-" * 30)
print("RNNT Transcription:", transcription)
print("-" * 30)

Loaded vocab for 22 languages.
Loaded masks for 22 languages.
Loading ONNX models...
Models loaded successfully.
Joint Enc inputs:
  input: ['B', 'T', 1024]
Joint Pred inputs:
  input: ['B', 'T', 640]
Joint Pre Net inputs:
  input: ['B', 'T', 640]
Joint Post Net inputs:
  input: ['B', 'T', 640]
Running inference on /kaggle/working/indic-conformer-600m-quantized-int8/encoder_quantized_int8.onnx...
Encoder output shape: (1, 1024, 67)
Starting RNNT decoding (Greedy Search)...
Encoder output processed. Time steps: 67
Decoding complete. Tokens: [256, 3, 173, 228, 3, 173, 228, 4, 204, 186, 39, 4, 204, 186, 181, 64, 9, 179, 180, 16, 196, 188, 189, 176, 12, 3, 173, 228, 100, 19, 179, 192, 68, 96, 136, 5, 116, 11, 183, 10, 5, 57, 176, 96]
------------------------------
RNNT Transcription: हाँ हाँ मूल्य मूलि भी बीस रुपये है हाँ ले लीजिएगा तो पंद्रह के पड़ेगा
------------------------------
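
The token-to-text step can also be wrapped in a small helper so it is reusable outside this cell. The sketch below is only an illustration: `tokens_to_text` and `toy_vocab` are hypothetical names that do not exist elsewhere in this notebook, and it assumes the hypothesis starts with the SOS token (id 256 here) and that "▁" marks a word boundary, as in the cell above. It keeps repeated ids, since each id in an RNNT hypothesis is a separate emission.

```python
from typing import Sequence


def tokens_to_text(token_ids: Sequence[int], vocab: Sequence[str], blank_id: int = 256) -> str:
    """Map RNNT token ids to text (hypothetical helper, not part of the model repo)."""
    pieces = [
        vocab[i]
        for i in token_ids[1:]  # assumption: the first entry is the SOS token
        if i != blank_id and 0 <= i < len(vocab)  # drop blanks and out-of-range ids
    ]
    # "▁" marks the start of a word; turn it into a space and trim the edges
    return "".join(pieces).replace("▁", " ").strip()


# Toy vocabulary for demonstration only; the real one comes from vocab_raw[LANGUAGE_CODE]
toy_vocab = ["▁he", "llo", "▁wor", "ld"]
print(tokens_to_text([256, 0, 1, 2, 3], toy_vocab))  # prints "hello world"
```

In the notebook itself this would be called as `tokens_to_text(predicted_tokens, vocab_raw[LANGUAGE_CODE], BLANK_ID)`.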


# **5. Preparing For Download**

In [16]:
# Organize files into a clean folder structure before zipping.

print("Packaging quantized model...")

# Create subdirectories
onnx_dir = os.path.join(QUANTIZED_MODEL_DIR, "onnx")
adapters_dir = os.path.join(onnx_dir, "adapters")
config_dir = os.path.join(QUANTIZED_MODEL_DIR, "config")
os.makedirs(onnx_dir, exist_ok=True)
os.makedirs(adapters_dir, exist_ok=True)
os.makedirs(config_dir, exist_ok=True)

# Move ONNX files to onnx/ (excluding adapters)
for f in os.listdir(QUANTIZED_MODEL_DIR):
    if f.endswith(".onnx") and not f.startswith("joint_post_net_"):
        shutil.move(os.path.join(QUANTIZED_MODEL_DIR, f), os.path.join(onnx_dir, f))

# Move adapter files to onnx/adapters/
for f in os.listdir(QUANTIZED_MODEL_DIR):
    if f.startswith("joint_post_net_") and f.endswith(".onnx"):
        shutil.move(os.path.join(QUANTIZED_MODEL_DIR, f), os.path.join(adapters_dir, f))

# Copy config files (.json, .ts) into config/; the originals stay in place.
assets_dir = os.path.join(LOCAL_MODEL_DIR, "assets")
for src_dir in (assets_dir, LOCAL_MODEL_DIR):
    for f in os.listdir(src_dir):
        if f.endswith((".json", ".ts")):
            shutil.copy(os.path.join(src_dir, f), config_dir)


print(f"Organized files in {QUANTIZED_MODEL_DIR}")
print("Creating zip archive...")
# make_archive returns the path of the archive it wrote; print that name
# so the message always matches the file on disk.
archive_path = shutil.make_archive("indic_conformer_600m_quantized_int8", 'zip', QUANTIZED_MODEL_DIR)
print(f"Created {os.path.basename(archive_path)} for HuggingFace upload.")

print("Zip file is ready in the working directory. Download it from the 'Output' tab in Kaggle.")

Packaging quantized model...
Organized files in indic-conformer-600m-quantized-int8
Creating zip archive...
Created indic_conformer_600m_quantized_int8.zip for HuggingFace upload.
Zip file is ready in the working directory. Download it from the 'Output' tab in Kaggle.
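
Before uploading, it can be worth checking that the archive really has the `onnx/`, `onnx/adapters/` and `config/` layout created above. The sketch below is one possible check using only the standard library. `summarize_zip` is a hypothetical helper, and the demo builds a throwaway directory with placeholder files rather than touching the real model, so the file names in it are illustrative only.

```python
import os
import shutil
import tempfile
import zipfile
from collections import Counter


def summarize_zip(zip_path: str) -> Counter:
    """Count archived files per top-level folder (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries, which end with "/"
        names = [n for n in zf.namelist() if not n.endswith("/")]
    # Files at the archive root are counted under "."
    return Counter(n.split("/")[0] if "/" in n else "." for n in names)


# Demo on a temporary directory that mimics the packaged layout
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "model")
    os.makedirs(os.path.join(src, "onnx", "adapters"))
    os.makedirs(os.path.join(src, "config"))
    for rel in ("onnx/encoder.onnx", "onnx/adapters/joint_post_net_hi.onnx", "config/vocab.json"):
        with open(os.path.join(src, *rel.split("/")), "w") as fh:
            fh.write("placeholder")
    archive = shutil.make_archive(os.path.join(tmp, "demo"), "zip", src)
    summary = summarize_zip(archive)
    print(sorted(summary.items()))
```

For the real archive, `summarize_zip("indic_conformer_600m_quantized_int8.zip")` should report entries only under `onnx` and `config`.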
