# AI4Bharat ASR to HF Compatible Format

The objective of this notebook is to convert the AI4Bharat ASR models to the Hugging Face `transformers` format. This allows us to use the `automatic-speech-recognition` pipeline from `transformers` to transcribe speech with the AI4Bharat models.

This notebook focuses on converting the [indicwav2vec-tamil](https://github.com/AI4Bharat/IndicWav2Vec?tab=readme-ov-file#download-models) model to the Hugging Face compatible format. The same steps can be followed for other AI4Bharat ASR models.

You can run this on Google Colab or any other environment with a GPU.

## Installation and Setup

In [None]:
! apt-get install -y build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev
! add-apt-repository ppa:savoury1/ffmpeg4 -y && apt-get update && apt-get install -y ffmpeg

In [None]:
!apt install -y liblzma-dev libbz2-dev libzstd-dev libsndfile1-dev libopenblas-dev libfftw3-dev libgflags-dev libgoogle-glog-dev build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev

In [None]:
! pip install transformers datasets pyctcdecode soundfile gradio;
! pip install https://github.com/kpu/kenlm/archive/master.zip;

In [None]:
%cd /content

In [None]:
!rm -rf IndicWav2Vec fairseq kenlm flashlight
!git clone https://github.com/AI4Bharat/IndicWav2Vec.git
!git clone https://github.com/pytorch/fairseq.git
!git clone https://github.com/kpu/kenlm.git
!git clone https://github.com/flashlight/flashlight.git

Install packages

In [None]:
!pip install "numpy<1.24"
!pip install transformers==4.29.2

In [None]:
%cd /content/IndicWav2Vec
!pip install packaging soundfile swifter -r w2v_inference/requirements.txt
%cd ..

Build fairseq

In [None]:
%cd /content/fairseq
!git checkout cf8ff8c3c5242e6e71e8feb40de45dd699f3cc08
!pip install ./
%cd /content

Build KenLM

In [None]:
%cd /content/kenlm
!mkdir -p build
%cd build
!cmake ..
!make -j 16
%cd /content

Build Flashlight

In [None]:
%cd /content/flashlight/bindings/python
!git checkout 06ddb51857ab1780d793c52948a0759f0ccc6ddb
!export USE_MKL=0 && export KENLM_ROOT="/content/kenlm/" && python setup.py install
%cd /content

## Build Model

Download model

In [None]:
!wget https://indic-asr-public.objectstore.e2enetworks.net/aaai_ckpts/models/ta/ta.pt -O /content/ta.pt
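
Before loading the checkpoint, it can help to confirm that the download actually produced a usable file, since `wget -O` may leave an empty file behind when the request fails. The helper below is a minimal sketch; `require_file` and its `min_bytes` parameter are hypothetical names introduced here, not part of IndicWav2Vec or fairseq.

```python
from pathlib import Path


def require_file(path, min_bytes=1):
    """Return the file size in bytes, or raise if the file is missing or too small."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"checkpoint not found: {p}")
    size = p.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{p} is only {size} bytes; the download may have failed")
    return size
```

In this notebook it could be called as `require_file("/content/ta.pt")` right after the download cell.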

Load Model

In [None]:
import torch

DEVICE_ID = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_PATH = '/content/ta.pt'  # same location the download cell wrote the checkpoint to

In [None]:
%cd /content/IndicWav2Vec

from inference.support import load_model

In [None]:
model, char_dict = load_model(MODEL_PATH)
model.to(DEVICE_ID)
%cd /content

Install git-lfs

In [None]:
!curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
!apt-get install -y git-lfs
!git lfs install

# Put in your details
!git config --global user.email "...."
!git config --global user.name "...."

Log in to the Hugging Face Hub

In [None]:
!huggingface-cli login

In [None]:
from transformers import Wav2Vec2Config
from huggingface_hub import create_repo, Repository

from transformers import pipeline, AutoModelForCTC, Wav2Vec2Processor, Wav2Vec2ProcessorWithLM

### Export models to HuggingFace

Create and Initialize Repo

In [None]:
repo_url = create_repo("indicwav2vec-tamil", private=True)

In [None]:
repo = Repository(local_dir="indicwav2vec-tamil", clone_from=repo_url)

Save config.json from a "similar" architecture in Hugging Face

In [None]:
config = Wav2Vec2Config.from_pretrained('facebook/wav2vec2-large-960h-lv60-self')
config.save_pretrained('indicwav2vec-tamil')

In [None]:
# using the indicwav2vec-hindi config.json for indicwav2vec-tamil
import json

data = {
  "_name_or_path": "facebook/wav2vec2-large-960h-lv60-self",
  "activation_dropout": 0.1,
  "adapter_kernel_size": 3,
  "adapter_stride": 2,
  "add_adapter": False,
  "apply_spec_augment": True,
  "architectures": [
    "Wav2Vec2ForCTC"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 1,
  "classifier_proj_size": 256,
  "codevector_dim": 256,
  "contrastive_logits_temperature": 0.1,
  "conv_bias": True,
  "conv_dim": [
    512,
    512,
    512,
    512,
    512,
    512,
    512
  ],
  "conv_kernel": [
    10,
    3,
    3,
    3,
    3,
    2,
    2
  ],
  "conv_stride": [
    5,
    2,
    2,
    2,
    2,
    2,
    2
  ],
  "ctc_loss_reduction": "sum",
  "ctc_zero_infinity": False,
  "diversity_loss_weight": 0.1,
  "do_stable_layer_norm": True,
  "eos_token_id": 2,
  "feat_extract_activation": "gelu",
  "feat_extract_dropout": 0.0,
  "feat_extract_norm": "layer",
  "feat_proj_dropout": 0.1,
  "feat_quantizer_dropout": 0.0,
  "final_dropout": 0.1,
  "gradient_checkpointing": False,
  "hidden_act": "gelu",
  "hidden_dropout": 0.1,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "layerdrop": 0.1,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "model_type": "wav2vec2",
  "num_adapter_layers": 3,
  "num_attention_heads": 16,
  "num_codevector_groups": 2,
  "num_codevectors_per_group": 320,
  "num_conv_pos_embedding_groups": 16,
  "num_conv_pos_embeddings": 128,
  "num_feat_extract_layers": 7,
  "num_hidden_layers": 24,
  "num_negatives": 100,
  "output_hidden_size": 1024,
  "pad_token_id": 0,
  "proj_codevector_dim": 256,
  "tdnn_dilation": [
    1,
    2,
    3,
    1,
    1
  ],
  "tdnn_dim": [
    512,
    512,
    512,
    512,
    1500
  ],
  "tdnn_kernel": [
    5,
    3,
    3,
    1,
    1
  ],
  "torch_dtype": "float32",
  "transformers_version": "4.19.2",
  "use_weighted_layer_sum": False,
  "vocab_size": 68,
  "xvector_output_dim": 512
}

with open('/content/config.json', 'w') as f:
    json.dump(data, f, indent=2)
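
The `conv_kernel` and `conv_stride` entries in the config determine how many audio samples each output frame of the feature extractor covers. The sketch below works through that arithmetic with the values copied from the config; the 16 kHz sample rate is an assumption (it is not stated in this notebook), and `frame_geometry` is a hypothetical helper name.

```python
# Values copied from the config above.
conv_kernel = [10, 3, 3, 3, 3, 2, 2]
conv_stride = [5, 2, 2, 2, 2, 2, 2]
SAMPLE_RATE = 16_000  # assumption: 16 kHz input audio


def frame_geometry(kernels, strides):
    """Return (hop, receptive_field) in samples for a stack of 1-D convolutions."""
    hop, field = 1, 1
    for k, s in zip(kernels, strides):
        field += (k - 1) * hop  # each layer widens the field by (k - 1) input hops
        hop *= s                # strides multiply through the stack
    return hop, field


hop, field = frame_geometry(conv_kernel, conv_stride)
print(hop, field)  # 320 400
print(1000 * hop / SAMPLE_RATE, 1000 * field / SAMPLE_RATE)  # 20.0 25.0
```

Under the 16 kHz assumption, the model emits one frame every 20 ms, and each frame sees 25 ms of audio.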

In [None]:
# downloading the dictionary as per the github repo
!wget https://indic-asr-public.objectstore.e2enetworks.net/aaai_ckpts/models/ta/dict.ltr.txt -O /content/dict.ltr.txt
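
Because the config above was borrowed from the Hindi model, its `vocab_size` of 68 may not match the Tamil dictionary. A rough way to estimate the expected vocabulary size is sketched below. It assumes the fairseq dictionary format (one `symbol count` pair per line) and that fairseq prepends four special symbols (`<s>`, `<pad>`, `</s>`, `<unk>`); `ctc_vocab_size` and `num_special` are hypothetical names used only for illustration.

```python
def ctc_vocab_size(dict_path, num_special=4):
    """Count entries in a fairseq-style dictionary file plus the special symbols."""
    with open(dict_path, encoding="utf-8") as f:
        # Each non-empty line is expected to look like "<symbol> <count>".
        entries = [line.split()[0] for line in f if line.strip()]
    return len(entries) + num_special
```

Comparing `ctc_vocab_size("/content/dict.ltr.txt")` with `data["vocab_size"]` shows whether the borrowed value needs adjusting before conversion.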

Convert ASR model to Huggingface's format

In [None]:
import transformers

transformers.__version__

In [None]:
%cd "/content/IndicWav2Vec"
# Write the converted model into the Hub repo cloned earlier at /content/indicwav2vec-tamil
!python workshop-2022/utils/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py \
    --pytorch_dump_folder /content/indicwav2vec-tamil \
    --checkpoint_path /content/ta.pt \
    --config_path /content/config.json \
    --dict_path /content/dict.ltr.txt
%cd /content

Push to Hugging Face Model Hub

In [None]:
# Commit and push from the cloned Hub repo, not from the IndicWav2Vec checkout
%cd "/content/indicwav2vec-tamil"
!huggingface-cli lfs-enable-largefiles .
!git lfs track "*.binary"
!git add .
!git commit -m "added language model"
!git push origin main
%cd /content
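
Once the conversion has finished, the converted model can be tried with the `automatic-speech-recognition` pipeline mentioned at the start of this notebook. The function below is a minimal sketch, not a tested recipe: `transcribe` is a hypothetical helper, `model_dir` stands for the folder the conversion script wrote to, and it assumes that folder contains the tokenizer and feature-extractor files alongside the weights.

```python
def transcribe(audio_path, model_dir):
    """Transcribe one audio file with a converted wav2vec2 CTC model."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_dir)
    # For a CTC model the pipeline returns a dict with a "text" entry.
    return asr(audio_path)["text"]
```

A call would look like `transcribe("sample.wav", model_dir)`, where `sample.wav` is a placeholder for your own recording.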