<a href="https://colab.research.google.com/github/dbstj1231/2023_AI_Academy_ASR/blob/main/6_nemo_finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6. NeMo fine-tuning

## NeMo (NVIDIA, 2019)
- https://developer.nvidia.com/nvidia-nemo (homepage)
- https://github.com/NVIDIA/NeMo (source)
- Open-source framework for developers to build and train state-of-the-art (SOTA) conversational AI models
- End-to-end (E2E) toolkit built on PyTorch and PyTorch Lightning
- Provides pretrained checkpoints of SOTA models
- https://catalog.ngc.nvidia.com/ (ASR pretrained models)

## Installing NeMo

### Installing with pip
- pip install nemo_toolkit['all']

### Installing from source
- apt-get update && apt-get install -y libsndfile1 ffmpeg
- git clone https://github.com/NVIDIA/NeMo
- cd NeMo
- ./reinstall.sh

### NeMo Docker containers
- https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
- docker pull nvcr.io/nvidia/nemo:22.11 (build container)
- docker run --runtime=nvidia -it --rm -v HOST_DIR:CONTAINER_DIR --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:22.11 (docker run; `-v` needs a host-to-container volume mapping, shown here as a placeholder)

In [None]:
# install NeMo
!pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all]

- `omegaconf`: library for reading and writing configuration files such as YAML and JSON; NeMo uses it by default
- `nemo.collections.asr`: NeMo ASR classes
- `nemo.utils.exp_manager`: library used for training logs, configuration, and related experiment bookkeeping

In [None]:
from omegaconf import OmegaConf, open_dict

In [None]:
import nemo
import nemo.collections.asr as nemo_asr
from nemo.utils import exp_manager

## Loading the pre-trained model
- Uses the QuartzNet model
- https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#quartznet
- Not a SOTA-level model, but its small parameter count makes it easy to run in low-resource computing environments
- Runs with a checkpoint of only 78 MB, compared with 500 MB to 1,000 MB for SOTA-level models
- 4.19% word error rate (WER) on the LibriSpeech test set
- https://arxiv.org/abs/1910.10261 (paper)


<img src = "https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/_images/quartz_vertical.png" height=700>

- https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_quartznet15x5 (pretrained model)


In [None]:
# pretrained model load (model : "stt_en_quartznet15x5")
model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_en_quartznet15x5")

## Checking the model on a small dataset (English)

- `datasets.load_dataset`: loader from the Hugging Face `datasets` library, used to manage training and test data

In [None]:
! pip install datasets

In [None]:
from datasets import load_dataset

In [None]:
# LibriSpeech dataset load
english_ds = load_dataset("kresnik/librispeech_asr_test", "clean")

In [None]:
english_ds

In [None]:
# check file list
english_ds = english_ds['test']

In [None]:
# check sample data
sample = english_ds[0]

In [None]:
from pprint import pprint

In [None]:
pprint(sample)

- `IPython.display`: library that provides IPython display widgets (such as the audio player)

In [None]:
import IPython.display as ipd

In [None]:
# listen audio file using ipd.Audio
ipd.Audio(sample['file'])

In [None]:
# transcribe using pretrained model (transcribe expects a list of audio file paths)
result = model.transcribe([sample['file']])

In [None]:
# check result
print(result)

In [None]:
# compare with reference
print("hypothesis: " + result[0])
print("reference: " + sample['text'].lower())

- `jiwer`: library for evaluating speech recognition output (CER, WER, and related metrics)
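
To make the metrics concrete, here is a minimal pure-Python sketch of what CER and WER measure: the Levenshtein edit distance between reference and hypothesis, divided by the reference length (in characters or in words). The helper names `edit_distance`, `char_error_rate`, and `word_error_rate` are hypothetical and are not part of `jiwer`, which may also apply its own text transforms before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Character-level edits divided by the number of reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(char_error_rate("hello world", "hallo world"))
print(word_error_rate("the cat sat", "the cat sat down"))
```

Character-level scoring is usually the more informative of the two for syllable-based Korean output, since a single wrong syllable otherwise counts as a whole wrong word.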

In [None]:
! pip install jiwer

In [None]:
from jiwer import cer

In [None]:
# calculate cer (reference first, then hypothesis)
cer(sample['text'].lower(), result[0])

In [None]:
results = model.transcribe(english_ds['file'][:10])

## Loading and inspecting the training/test data (Korean)

- dataset: file, audio, text, and speaker information

In [None]:
# Zeroth-Korean dataset load
ds = load_dataset("kresnik/zeroth_korean", "clean")

In [None]:
# check file list
ds

In [None]:
total_train_ds = ds['train']
test_ds = ds['test']

In [None]:
# check sample data
total_train_ds

In [None]:
# listen audio file using ipd.Audio
ipd.Audio(total_train_ds[0]['file'])

In [None]:
# split train, validation, test set
total_train_ds = total_train_ds.train_test_split(test_size=0.1)  # hold out 10% for validation (assumed ratio)

In [None]:
total_train_ds

In [None]:
train_ds = total_train_ds['train']
val_ds = total_train_ds['test']

In [None]:
print(len(train_ds))
print(len(val_ds))
print(len(test_ds))

In [None]:
train_ds.column_names

NeMo's ASR data preparation requires three pieces of information per utterance
- audio_filepath, duration, text
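
As a rough illustration of one such record, the sketch below builds a manifest entry with the three keys and computes `duration` as number of frames divided by sampling rate. It uses only the standard-library `wave` module, so it assumes an uncompressed PCM WAV file; the notebook itself reads audio with `soundfile`. The helper names `wav_duration_seconds` and `manifest_entry`, the generated silent file, and the sample transcript are all hypothetical.

```python
import json
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration = number of frames / sampling rate (PCM WAV files only)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def manifest_entry(path, text):
    """Build one manifest record with the three required keys."""
    return {
        "audio_filepath": path,
        "duration": wav_duration_seconds(path),
        "text": text,
    }


# Write two seconds of 16 kHz mono silence as a stand-in for a real recording.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)       # 16-bit samples
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 32000)

entry = manifest_entry(demo_path, "안녕하세요")
print(json.dumps(entry, ensure_ascii=False))
```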

In [None]:
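# remove columns ["audio", "speaker_id", "chapter_id", "id"]
train_ds = train_ds.remove_columns(["audio", "speaker_id", "chapter_id", "id"])
val_ds = val_ds.remove_columns(["audio", "speaker_id", "chapter_id", "id"])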

In [None]:
train_ds[0].keys()

In [None]:
# rename column ("file" -> "audio_filepath", the key NeMo expects)
train_ds = train_ds.rename_column(original_column_name="file", new_column_name="audio_filepath")
val_ds = val_ds.rename_column(original_column_name="file", new_column_name="audio_filepath")

In [None]:
train_ds[0].keys()

In [None]:
train_ds[0]

In [None]:
import soundfile as sf

In [None]:
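# get duration (length in seconds = number of samples / sampling rate)
def get_duration(batch):
  data, sample_rate = sf.read(batch['audio_filepath'])
  batch['duration'] = len(data) / sample_rate
  return batch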

In [None]:
get_duration(train_ds[0])

In [None]:
train_ds = train_ds.map(get_duration)
val_ds = val_ds.map(get_duration)

In [None]:
train_ds[0]

In [None]:
ipd.Audio(train_ds[0]['audio_filepath'])

In [None]:
import os

In [None]:
train_json_path = os.path.abspath("train.json")
print(train_json_path)
val_json_path = os.path.abspath("validation.json")
print(val_json_path)

- `force_ascii = False`: write Korean characters as-is instead of ASCII escape sequences, so the text stays readable
- `orient = "records"`: a list whose elements are dictionaries of the form {column: value}
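
The effect of these two options can be sketched with the standard-library `json` module, whose `ensure_ascii` flag plays a role analogous to `force_ascii`. The records below are hypothetical, and the paths do not point to real files.

```python
import json

# Hypothetical records in the "records" orientation: one dict per utterance.
records = [
    {"audio_filepath": "/data/a.wav", "duration": 1.5, "text": "안녕하세요"},
    {"audio_filepath": "/data/b.wav", "duration": 2.0, "text": "감사합니다"},
]

# One JSON object per line, as produced with lines=True.
readable = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
escaped = "\n".join(json.dumps(r, ensure_ascii=True) for r in records)

print(readable.splitlines()[0])  # Hangul is written as-is
print(escaped.splitlines()[0])   # Hangul is written as \uXXXX escapes

# Both forms decode to the same data; only the on-disk text differs.
decoded = [json.loads(line) for line in escaped.splitlines()]
```

Escaped output is still valid JSON, but writing the characters directly keeps the manifest easy to inspect with tools such as `head`.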

In [None]:
# make json file
train_ds.to_json(train_json_path, lines=True, force_ascii=False, orient="records")

In [None]:
# check json file
! head -5 train.json

In [None]:
# make validation json file
val_ds.to_json(val_json_path, lines=True, force_ascii=False, orient="records")

In [None]:
# check validation json file
! head -5 validation.json

In [None]:
hug_ds = load_dataset("json", data_files=train_json_path)

In [None]:
hug_ds = hug_ds['train']

In [None]:
hug_ds[0]

In [None]:
import pandas as pd

In [None]:
pd.read_json(train_json_path, lines=True)

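## Model setup for Korean fine-tuning
- To tune a model trained on English with a small amount of Korean data, it is efficient to  
keep the model's encoder weights as they are and  
retrain only the model's decoder (which produces the sequence of output unit representations).
- This approach is used when not enough data is available or when computing resources are limited.
- If the encoder is not trained as a whole, a normalization problem can arise.  
Ex> The audio the original model was trained on and the new training data differ greatly in volume (loudness).  
To prevent this, the batch normalization layers are not frozen.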
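
The normalization issue can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions: plain standardization stands in for batch normalization (no learnable scale or shift, no running averages), and the feature values and the factor of four are hypothetical.

```python
import statistics


def normalize(values, mean, std):
    """Standardize values with the given statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical feature values from the original (source) training data.
source = [-2.0, -1.0, 0.0, 1.0, 2.0]
source_mean = statistics.fmean(source)
source_std = statistics.pstdev(source)

# The same signal recorded four times louder in the new (target) data.
target = [4.0 * v for v in source]

# Frozen statistics: the target features come out four times too large.
frozen = normalize(target, source_mean, source_std)

# Re-estimated statistics: the target features return to unit scale.
adapted = normalize(target, statistics.fmean(target), statistics.pstdev(target))

print(statistics.pstdev(frozen))
print(statistics.pstdev(adapted))
```

With frozen statistics every later layer sees inputs on a scale it was never trained on, which is the motivation for letting the batch normalization layers keep adapting while the rest of the encoder stays fixed.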




In [None]:
import torch
import torch.nn as nn

In [None]:
list(model.modules())

In [None]:
# batch normalization unfreeze
def enable_bn(m):
  if type(m) == nn.BatchNorm1d:
    m.train()
    for param in m.parameters():
      param.requires_grad_(True)

In [None]:
# encoder freeze
model.encoder.freeze()

In [None]:
for name, param in model.encoder.named_parameters():
  print(name, param.requires_grad)

In [None]:
# unfreeze apply
model.encoder.apply(enable_bn)

In [None]:
# unfreeze check (requires_grad -> True : the weights will be trained)
for name, param in model.encoder.named_parameters():
  print(name, param.requires_grad)

## Korean output units and training setup
- Change the model's output units from the English alphabet to Korean syllables
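
A minimal sketch of how a syllable-level vocabulary can be collected: join the transcripts and take the set of distinct characters. It assumes the text is in precomposed (NFC) form, so that each Hangul syllable is a single character. The function name `extract_vocabulary` and the sample transcripts are hypothetical.

```python
def extract_vocabulary(transcripts):
    """Return the sorted set of characters that occur in the transcripts."""
    all_text = " ".join(transcripts)
    return sorted(set(all_text))


# Hypothetical transcripts; the real ones come from the `text` column.
transcripts = ["안녕 하세요", "하세요 안녕"]
vocab = extract_vocabulary(transcripts)

print(vocab)
print(len(vocab))
```

The space character ends up in the vocabulary as well, which the model needs in order to emit word boundaries.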

In [None]:
# check model config
print(OmegaConf.to_yaml(model.cfg))

In [None]:
train_ds[0]['text']

In [None]:
# join: concatenate several transcripts into one string
" ".join(train_ds[:3]['text'])

In [None]:
# extract characters
def all_extract_characters(batch):
  all_text = " ".join(batch["text"])
  vocab = list(set(all_text))
  return {"vocab": [vocab]}

In [None]:
len(train_ds)

In [None]:
# make vocab list
train_vocab_list = train_ds.map(all_extract_characters, batched= True, batch_size=-1, remove_columns=train_ds.column_names)

In [None]:
len(train_vocab_list["vocab"][0])

In [None]:
val_vocab_list = val_ds.map(all_extract_characters, batched= True, batch_size=-1, remove_columns=val_ds.column_names)

In [None]:
vocab_list = list(set(train_vocab_list["vocab"][0]) | set(val_vocab_list["vocab"][0]))

In [None]:
print(len(vocab_list))

In [None]:
# check vocab list
print(sorted(vocab_list))

In [None]:
# change model output unit
model.change_vocabulary(vocab_list)

In [None]:
print(OmegaConf.to_yaml(model.cfg))

In [None]:
import copy

In [None]:
# copy configuration
print(OmegaConf.to_yaml(model.cfg.train_ds))

In [None]:
cfg = copy.deepcopy(model.cfg)

In [None]:
# setup train, validation configuration (manifest_filepath, labels, normalize_transcripts (case normalization), batch_size, num_workers, pin_memory, trim_silence)
with open_dict(cfg) as f:
  f.train_ds.manifest_filepath = train_json_path
  f.train_ds.normalize_transcripts = False
  f.train_ds.batch_size = 8
  f.train_ds.num_workers = 2
  f.train_ds.pin_memory = True
  f.train_ds.trim_silence = True

  f.validation_ds.manifest_filepath = val_json_path
  f.validation_ds.normalize_transcripts = False
  f.validation_ds.batch_size = 8
  f.validation_ds.num_workers = 2
  f.validation_ds.pin_memory = True
  f.validation_ds.trim_silence = True

In [None]:
# setup data loader with new configs
model.setup_training_data(cfg.train_ds)
model.setup_validation_data(cfg.validation_ds)

In [None]:
# print original optimizer + scheduler
print(OmegaConf.to_yaml(model.cfg.optim))

In [None]:
??torch.optim.Adam

- lr = 0.01
- betas = [0.95, 0.25]
- weight_decay = 0.001 (original weight decay)
- sched.warmup_steps = None (remove default number of steps of warmup)
- sched.warmup_ratio = 0.05 (5% warmup)
- sched.min_lr = 1e-5
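
To show how these values interact, the sketch below converts a warmup ratio into a number of warmup steps and evaluates a learning rate that warms up linearly and then decays along a cosine curve to `min_lr`. The cosine shape is an assumption for illustration; the scheduler actually configured under `model.cfg.optim.sched` may differ. `warmup_steps_from_ratio`, `lr_at_step`, and the value of `max_steps` are hypothetical.

```python
import math


def warmup_steps_from_ratio(max_steps, warmup_ratio):
    """Number of warmup steps when warmup is given as a fraction of training."""
    return int(warmup_ratio * max_steps)


def lr_at_step(step, max_steps, base_lr=0.01, min_lr=1e-5, warmup_ratio=0.05):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    warmup_steps = warmup_steps_from_ratio(max_steps, warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / (warmup_steps + 1)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


max_steps = 1000  # hypothetical total number of optimizer steps
for step in (0, 49, 50, 500, 1000):
    print(step, lr_at_step(step, max_steps))
```

Setting `warmup_steps` to `None` matters because a fixed step count tuned for a long training run could exceed the total number of steps in a short fine-tuning run, whereas a ratio scales with the run length.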

In [None]:
# setup optimizer
with open_dict(model.cfg.optim) as f:
  f.lr = 0.01
  f.betas = [0.95, 0.25]
  f.weight_decay = 0.001
  f.sched.warmup_steps = None
  f.sched.warmup_ratio = 0.05
  f.sched.min_lr = 1e-5

In [None]:
print(OmegaConf.to_yaml(model.cfg.optim))

In [None]:
# print original spec_augment
print(OmegaConf.to_yaml(model.cfg.spec_augment))

In [None]:
model.spec_augmentation = model.from_config_dict(model.cfg.spec_augment)

In [None]:
print(OmegaConf.to_yaml(model.cfg.train_ds))

## Training
- Configure the metric that reports training progress

In [None]:
# use_cer & log_prediction
model._wer.use_cer = True
model._wer.log_prediction = True

In [None]:
import torch
import pytorch_lightning as ptl

In [None]:
if torch.cuda.is_available():
  gpus = 1
else:
  gpus = 0

In [None]:
epochs = 1

In [None]:
??ptl.Trainer

- gpus, max_epochs, accumulate_grad_batches, enable_checkpointing, logger, log_every_n_steps, check_val_every_n_epoch
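
As a quick sanity check on how these arguments combine with the data loader batch size, the sketch below computes the effective batch size and the number of optimizer steps. It assumes that a final, incomplete accumulation group still triggers an optimizer step, which may depend on the PyTorch Lightning version. The function `training_steps` and the example numbers are hypothetical.

```python
import math


def training_steps(num_examples, batch_size, accumulate_grad_batches, max_epochs):
    """Return (effective batch size, optimizer steps per epoch, total optimizer steps)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / accumulate_grad_batches)
    effective_batch = batch_size * accumulate_grad_batches
    return effective_batch, steps_per_epoch, steps_per_epoch * max_epochs


# Hypothetical dataset of 1,000 utterances with the notebook's batch size of 8.
print(training_steps(1000, 8, 1, 1))
print(training_steps(1000, 8, 4, 2))
```

Raising `accumulate_grad_batches` is a way to emulate a larger batch on a memory-limited GPU, at the cost of fewer optimizer steps per epoch.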

In [None]:
# set trainer
trainer = ptl.Trainer(
    gpus = gpus,
    max_epochs = epochs,
    accumulate_grad_batches = 1,
    enable_checkpointing = False,  # checkpointing is handled by exp_manager
    logger = False,                # logging is handled by exp_manager
    log_every_n_steps = 50,
    check_val_every_n_epoch = 1
)

In [None]:
# setup model with the trainer
model.set_trainer(trainer)

In [None]:
# update the model's internal config
model.cfg = model._cfg

In [None]:
import os

os.environ.pop('NEMO_EXPM_VERSION', None)

In [18]:
??exp_manager.ExpManagerConfig

In [None]:
# set log path
config = exp_manager.ExpManagerConfig(
    exp_dir = f"experiment/lang/",
    name=f"ASR-Char-Model-Korean",
    checkpoint_callback_params=exp_manager.CallbackParams(
        monitor="val_wer",
        mode="min",
        always_save_nemo=True,
        save_best_model=True
    ),
)

In [None]:
config

In [None]:
# set exp_manager (convert the dataclass into an OmegaConf config)
config = OmegaConf.structured(config)

In [None]:
config

In [None]:
logdir = exp_manager.exp_manager(trainer, config)

In [None]:
# set tensorboard
%load_ext tensorboard
%tensorboard --logdir /content/experiment/lang/ASR-Char-Model-Korean/

In [None]:
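# start train
trainer.fit(model)

In [None]:
# save model (the file name is an assumed example)
model.save_to("quartznet_korean.nemo")

## Test

In [None]:
# load trained model (same assumed file name as above)
trained_model = nemo_asr.models.EncDecCTCModel.restore_from("quartznet_korean.nemo")

In [None]:
# transcribe using trained model
test_sample = test_ds[0]
test_result = trained_model.transcribe([test_sample['file']])

In [None]:
# compare with reference
print("hypothesis: " + test_result[0])
print("reference: " + test_sample['text'])

In [None]:
# calculate cer
cer(test_sample['text'], test_result[0])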

## Training with a full recipe in NeMo
- Trains the same model from scratch on arbitrary data
- Difficult to run on Colab
- Takes a long time
- https://github.com/NVIDIA/NeMo/tree/main/examples/asr/asr_ctc