# Elpis + Hugging Face inference

This notebook provides two methods for running inference on audio with models published on the Hugging Face Hub.

1) First up, install the packages we need.

In [None]:
!pip install datasets
!pip install transformers
!pip install tqdm
!pip install json-to-elan
!pip install gdown

2) Next, import them.

In [None]:
import datasets
import json
import string
import gdown
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm
from pathlib import Path
from json_to_elan import make_elan

3) Specify the model that you want to use for inference.

In [None]:
model_name = "FYTM_fb_GUN"
my_model = f"elpis/{model_name}"

4) Create the pipeline. An access token is required for private models, and it must belong to the account that published the model.

In [None]:
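# Placeholder: paste your own token here, and never share or commit a real one
access_token = "hf_xxx"
# Recent transformers releases take `token`; `use_auth_token` is deprecated
pipe = pipeline("automatic-speech-recognition", model=my_model, token=access_token)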
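
To keep the token out of the notebook itself, one option is to read it from an environment variable. This is only a minimal sketch: the variable name `HF_TOKEN` is an assumption, and the helper `read_hf_token` is a hypothetical name, not part of any library.

```python
import os


def read_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Return the access token stored in an environment variable.

    Both the helper name and the default variable name are illustrative
    assumptions, not something defined by this notebook or by a library.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your access token."
        )
    return token
```

The returned value could then be used in place of the hard-coded string when creating the pipeline.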

5) Let's put the inference code into a function so it is easy to reuse.

In [None]:
# Build some directories
working_dir = Path("/content")
audio_dir_path = working_dir / "audio"

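# Keep inference output under /content: Google Drive is not mounted yet,
# and the later cells read from and copy out of /content/inf
inf_dir_path = working_dir / "inf"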

audio_dir_path.mkdir(parents=True, exist_ok=True)
inf_dir_path.mkdir(parents=True, exist_ok=True)

In [None]:
def infer_me(audio_file_path: Path, inf_dir_path: Path, inf_name: str = ""):
    """Transcribe one audio file and save the text and word timestamps."""

    # Get the inference, with word-level timestamps
    infer = pipe(str(audio_file_path), chunk_length_s=10, return_timestamps="word")

    # Write the lower-cased inference text to a file
    inf_text_file_path = inf_dir_path / f"{inf_name}_inf.txt"
    with open(inf_text_file_path, "w", encoding="utf-8") as infer_text_file:
        infer_text_file.write(infer["text"].lower())

    # Write the word chunks to a JSON file for later conversion to Elan format
    inf_json_file_path = inf_dir_path / f"{inf_name}_inf.json"
    with open(inf_json_file_path, "w", encoding="utf-8") as infer_json_file:
        json.dump(infer["chunks"], infer_json_file)
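
For reference, with `return_timestamps="word"` the pipeline result is a dict with a `text` string and a `chunks` list, where each chunk holds one word and its `(start, end)` timestamp in seconds. The sketch below uses invented sample values (not real model output) to show what ends up in the JSON file. Note that JSON has no tuple type, so the timestamps are read back as lists.

```python
import json

# Hypothetical sample shaped like the pipeline output; the values are invented.
sample = {
    "text": "HELLO WORLD",
    "chunks": [
        {"text": "HELLO", "timestamp": (0.0, 0.4)},
        {"text": "WORLD", "timestamp": (0.5, 0.9)},
    ],
}

# Same serialisation step as in infer_me, but to a string instead of a file
serialised = json.dumps(sample["chunks"])
restored = json.loads(serialised)

print(sample["text"].lower())  # hello world
print(restored[0])  # {'text': 'HELLO', 'timestamp': [0.0, 0.4]}
```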

## Download and process files


In [None]:
# Mount your Google Drive. Can be handy for moving results around...
from google.colab import drive
drive.mount('/content/drive')

In [None]:

source_files = [
    ["1X83crFz0yEc5_AKgDTsjjc-vltf1HrlY", "ZMS_EIP_010_Pronoun.wav"],
    # ["1ugkbrI3Yfqzs_1gvhZxRZwhm3Zq9vPJP", "ZMS_EIP_011_Millions.wav"],
    # ["1RtmZi_zg4WfB8ViVvRZIvWMh_8b5D9mN", "ZMS_EIP_013_Transaction.wav"],
    # ["1SzPct1m5a1O7LANAuZ2APJW5PlbsXdXe", "ZMS_GUN_004_Vocab4001.wav"],
    # ["1lIVSesIWngp_rMHy4UFVtRUB8lYqgUHP", "ZMS_GUN_004_Vocab4010.wav"],
    # ["1YxrJt0G5gjA03TWHWbl5-9_xbKLcSQSZ", "ZMS_JER_019_FIWS2-6.wav"],
    # ["1B7tNM5pr2qnmHA5qbmezXg3Rg0gzRowX", "ZMS_JER_079_M-GrammarFartsID.wav"],
]


In [None]:
# Use this to test a single file
# source_files = [
#     ["1X83crFz0yEc5_AKgDTsjjc-vltf1HrlY", "ZMS_EIP_010_Pronoun.wav"],
# ]

In [None]:
# Download each audio file and save it in the audio dir
for source_file in source_files:
    audio_file_path = audio_dir_path / source_file[1]
    gdown.download(id=source_file[0], output=audio_file_path.as_posix(), quiet=False)


In [None]:
# Run inference on each downloaded file, naming the output after the file and model
for source_file in source_files:
    audio_file_path = audio_dir_path / source_file[1]
    name_stem = audio_file_path.stem
    inf_file_name = f"{name_stem}_{model_name}"
    infer_me(audio_file_path=audio_file_path, inf_dir_path=inf_dir_path, inf_name=inf_file_name)
    print(f"{source_file[1]} done \n")
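
When re-running the loop, it can be handy to skip files whose outputs already exist. A minimal sketch, assuming the `_inf.txt` and `_inf.json` naming used by `infer_me`; the helper names `expected_outputs` and `already_done` are hypothetical.

```python
from pathlib import Path
from typing import List


def expected_outputs(audio_file_path: Path, inf_dir_path: Path, model_name: str) -> List[Path]:
    """Return the text and JSON paths that infer_me would write for this file."""
    inf_name = f"{audio_file_path.stem}_{model_name}"
    return [
        inf_dir_path / f"{inf_name}_inf.txt",
        inf_dir_path / f"{inf_name}_inf.json",
    ]


def already_done(audio_file_path: Path, inf_dir_path: Path, model_name: str) -> bool:
    """True only if both output files are already present."""
    outputs = expected_outputs(audio_file_path, inf_dir_path, model_name)
    return all(path.exists() for path in outputs)
```

Inside the loop, a check such as `if already_done(audio_file_path, inf_dir_path, model_name): continue` would then skip finished files.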

Build Elan (`.eaf`) files from the JSON files in the inference directory.

In [None]:
make_elan(data_dir="/content/inf")

Copy the files to Google Drive for use in other notebooks

In [None]:
!cp /content/inf/* /content/drive/MyDrive/Zara/eaf_ref_inf

In [None]:
!cp /content/audio/* /content/drive/MyDrive/Zara/eaf_ref_inf

You might then like to upload Elan reference files to that `eaf_ref_inf` folder and run the following notebook, which aligns the inferred words to the reference utterances based on their timing information.

https://colab.research.google.com/drive/1bQZn318tUGSTujWON5moqlvaoyuohUas?usp=sharing
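
The idea of aligning words to utterances by timing can be sketched as follows. This is not the linked notebook's code. It is a minimal illustration that assigns each inferred word to the reference utterance whose time span contains the word's midpoint. The data shapes and the function name `align_words` are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed shapes (hypothetical):
#   word:      {"text": str, "timestamp": [start, end]} as stored in the JSON file
#   utterance: (start, end, reference_text), with times in seconds
Utterance = Tuple[float, float, str]


def align_words(words: List[dict], utterances: List[Utterance]) -> Dict[int, List[str]]:
    """Group word texts by the index of the utterance containing their midpoint.

    Words whose midpoint falls outside every utterance are left out.
    """
    aligned: Dict[int, List[str]] = {i: [] for i in range(len(utterances))}
    for word in words:
        start, end = word["timestamp"]
        midpoint = (start + end) / 2
        for i, (utt_start, utt_end, _text) in enumerate(utterances):
            if utt_start <= midpoint < utt_end:
                aligned[i].append(word["text"])
                break
    return aligned


# Invented example data
words = [
    {"text": "a", "timestamp": [0.1, 0.3]},
    {"text": "b", "timestamp": [0.4, 0.6]},
    {"text": "c", "timestamp": [1.2, 1.5]},
]
utterances = [(0.0, 1.0, "a b"), (1.0, 2.0, "c")]
print(align_words(words, utterances))  # {0: ['a', 'b'], 1: ['c']}
```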