# Upload experimental models to Hugging Face Model repositories
This notebook is a helper for uploading pre-trained models to Hugging Face. It allows you to add README info for experiments at upload time for better documentation. 

*First*: Make sure your Hugging Face Hub token is available (for example through the `HF_TOKEN` environment variable) or that you have logged in on the command line via `huggingface-cli login`.
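As a quick pre-flight check, the token lookup can be sketched as below. This is a minimal sketch, assuming the token is supplied through the `HF_TOKEN` environment variable; `resolve_hf_token` and `current_hub_user` are hypothetical helper names, not part of this notebook or of `huggingface_hub`.

```python
import os


def resolve_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or None if it is unset or blank.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def current_hub_user():
    """Ask the Hub which account the active credentials belong to.

    Needs network access. When HF_TOKEN is unset, token=None lets
    huggingface_hub fall back to the credentials saved by `huggingface-cli login`.
    """
    from huggingface_hub import HfApi  # imported lazily so the helper above stays dependency-free

    return HfApi(token=resolve_hf_token()).whoami()["name"]
```

Calling `current_hub_user()` before the upload cell should fail early if no valid credentials are found, rather than partway through a multi-gigabyte upload.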

In [1]:
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError
import transformers

from pathlib import Path



In [2]:
# Root directory containing one sub-folder per trained model
MODEL_ROOT = Path("../data/models/")
ALL_MODELS_README = """
---
license: mit
language:
- en
pipeline_tag: automatic-speech-recognition
---
# About 
This model was created to support experiments for evaluating phonetic transcription 
with the Buckeye corpus as part of https://github.com/ginic/multipa. 
This is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on a specific subset of the Buckeye corpus.
For details about specific model parameters, please view the config.json here or 
training scripts in the scripts/buckeye_experiments folder of the GitHub repository. 

# Experiment Details
"""

TIMIT_FINED_TUNED_README = """
---
license: mit
language:
- en
pipeline_tag: automatic-speech-recognition
---
# About 
This model was created to support experiments for evaluating phonetic transcription 
with the Buckeye and TIMIT corpora as part of https://github.com/ginic/multipa. 
This is a version of excalibur12/wav2vec2-large-lv60_phoneme-timit_english_timit-4k that was further fine-tuned on a subset of the Buckeye corpus.
For details about specific model parameters, please view the config.json here or 
training scripts in the scripts/fine_tuning_experiments folder of the GitHub repository. 

# Experiment Details
"""

# Specific sets of experiments have more details. I just copied these from the EXPERIMENT_LOG.md 
README_MAPPINGS = {
#     # This was the best hyperparam tuned model & these model parameters were used for all other experiments
#     "hyperparam_tuning_1":"""The best performing model from hyperparameter tuning experiments (batch size, learning rate, base model to fine tune). Vary the random seed to select training data while keeping an even 50/50 gender split to measure statistical significance of changing training data selection. Retrain with the same model parameters, but different data seeding to measure statistical significance of data seed, keeping 50/50 gender split. 

# Goals: 
# - Choose initial hyperparameters (batch size, learning rate, base model to fine tune) based on validation set performance
# - Establish whether data variation with the same gender makeup is statistically significant in changing performance on the test set (first data_seed experiment)
# """,
#     "data_seed_bs64": """Vary the random seed to select training data while keeping an even 50/50 gender split to measure statistical significance of changing training data selection. Retrain with the same model parameters, but different data seeding to measure statistical significance of data seed, keeping 50/50 gender split. 

# Goals: 
# - Establish whether data variation with the same gender makeup is statistically significant in changing performance on the test set

# Params to vary:
# - training data seed (--train_seed): [91, 114, 771, 503]
# """,

#     "gender_split": """Still training with a total amount of data equal to half the full training data (4000 examples), vary the gender split 30/70, but draw examples from all individuals. Do 5 models for each gender split with the same model parameters but different data seeds. 

# Goals: 
# - Determine how differences in the gender split of the training data affect performance

# Params to vary: 
# - percent female (--percent_female) [0.0, 0.3, 0.7, 1.0]
# - training seed (--train_seed)
# """, 

#     "vary_individuals": """These experiments keep the total amount of data equal to half the training data with the gender split 50/50, but further exclude certain speakers completely using the --speaker_restriction argument. This allows us to restrict speakers included in training data in any way. For the purposes of these experiments, we are focussed on the age demographic of the speaker.

# For reference, the speakers and their demographics included in the training data are as follows where the speaker age range 'y' means under 30 and 'o' means over 40: 

# | speaker_id | speaker_gender | speaker_age_range | 
# | ---------- | -------------- | ----------------- |
# | S01 | f | y |
# | S04 | f | y | 
# | S08 | f | y | 
# | S09 | f | y | 
# | S12 | f | y | 
# | S21 | f | y | 
# | S02 | f | o |
# | S05 | f | o | 
# | S07 | f | o | 
# | S14 | f | o | 
# | S16 | f | o |
# | S17 | f | o | 
# | S06 | m | y | 
# | S11 | m | y | 
# | S13 | m | y | 
# | S15 | m | y | 
# | S28 | m | y | 
# | S30 | m | y |
# | S03 | m | o | 
# | S10 | m | o | 
# | S19 | m | o |
# | S22 | m | o |
# | S24 | m | o | 


# Goals: 
# - Determine how variety of speakers in the training data affects performance

# Params to vary: 
# - training seed (--train_seed)
# - demographic make up of training data by age, using --speaker_restriction 
#     - Experiments `young_only`: only individuals under 30, S01 S04 S08 S09 S12 S21 S06 S11 S13 S15 S28 S30
#     - Experiments `old_only`: only individuals over 40, S02 S05 S07 S14 S16 S17 S03 S10 S19 S22 S24
# """,
#     "full_dataset": """The entire train split of the Buckeye corpus was used to train this model. 
# The only data excluded are samples in the train split that are too short (< 0.1 seconds) or too long (>12 seconds) to be used to train the model 

# Goals: 
# - Include the largest amount of training data possible. 
# - Can be used with a different corpus (e.g. TIMIT, Speech Accent Archive) for evaluation to test generalization to other dialects and language varieties. 
# """,

    "fine_tune_data_seed": """These experiments take a wav2vec2.0 model originally fine-tuned on TIMIT (excalibur12/wav2vec2-large-lv60_phoneme-timit_english_timit-4k) and further fine-tune it on the Buckeye corpus.
The random seed is varied to select training data while keeping an even 50/50 gender split to measure significance of changing training data selection.

Goals:
- Determine how additional fine-tuning on different corpora affects performance on test sets for both corpora
- Establish whether data variation with the same gender makeup is statistically significant in changing performance on the test set

Params to vary:
- training data seed (--train_seed)
- batch size: [64, 32] will be indicated at the end of the model name following "bs"
"""

}
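The upload cell below joins one of the shared headers with the matching experiment description. A minimal sketch of that assembly step as a standalone function follows; `build_readme` is a hypothetical helper, and stripping the leading newline so the YAML front matter starts on the first line of the file is a conservative assumption, not something this notebook requires.

```python
def build_readme(header: str, details: str) -> str:
    """Join a shared README header with experiment-specific details.

    Leading whitespace is stripped so the YAML front matter ("---") opens the
    file, and the result ends with exactly one trailing newline.
    """
    return header.lstrip() + details.rstrip() + "\n"


# Tiny stand-ins for ALL_MODELS_README and a README_MAPPINGS value.
example_header = """
---
license: mit
---
# Experiment Details
"""
example_details = "Vary the training data seed.\n\n"

readme = build_readme(example_header, example_details)
print(readme)
```

Keeping the assembly in one function makes it easy to inspect the final README text before anything is written to disk or uploaded.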

In [3]:
api = HfApi()
for model_folder in MODEL_ROOT.iterdir():
    if model_folder.is_dir(): 
        for prefix in README_MAPPINGS.keys(): 
            if model_folder.name.startswith(prefix):
                print(f"Model {model_folder} matches prefix '{prefix}'.")
                if prefix == "fine_tune_data_seed":
                    # Models further fine-tuned from the TIMIT checkpoint; the last six
                    # characters of the folder name (e.g. "bs32_1") become the suffix
                    hub_name = f"ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_{model_folder.name[-6:]}"
                    full_readme = "".join([TIMIT_FINED_TUNED_README, README_MAPPINGS[prefix]])
                    model_to_upload = model_folder / "wav2vec2-large-lv60_phoneme-timit_english_timit-4k-buckeye-ipa"
                else:
                    # Models fine-tuned directly from facebook/wav2vec2-large-xlsr-53
                    hub_name = f"ginic/{model_folder.name}_wav2vec2-large-xlsr-53-buckeye-ipa"
                    full_readme = "".join([ALL_MODELS_README, README_MAPPINGS[prefix]])
                    model_to_upload = model_folder / "wav2vec2-large-xlsr-53-buckeye-ipa"
                readme_path = model_to_upload / "README.md"
                readme_path.write_text(full_readme)

                model_pipeline = transformers.pipeline("automatic-speech-recognition", model=model_to_upload)
                print("Uploading to hub as:", hub_name)
                model_pipeline.push_to_hub(hub_name)
                print("Uploading README for", hub_name)
                api.upload_file(
                    path_or_fileobj=readme_path,
                    path_in_repo="README.md",
                    repo_id=hub_name,
                    repo_type="model",
                )

                # Don't look at other prefix keys, the model is already uploaded
                break



Model ../data/models/fine_tune_data_seed_bs32_1 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_1



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_1
Model ../data/models/fine_tune_data_seed_bs32_2 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_2



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_2
Model ../data/models/fine_tune_data_seed_bs32_3 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_3



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_3
Model ../data/models/fine_tune_data_seed_bs32_4 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_4



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_4
Model ../data/models/fine_tune_data_seed_bs32_5 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_5



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_5
Model ../data/models/fine_tune_data_seed_bs64_1 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_1



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_1
Model ../data/models/fine_tune_data_seed_bs64_2 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_2



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_2
Model ../data/models/fine_tune_data_seed_bs64_3 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_3



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_3
Model ../data/models/fine_tune_data_seed_bs64_4 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_4



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_4
Model ../data/models/fine_tune_data_seed_bs64_5 matches prefix 'fine_tune_data_seed'.
Uploading to hub as: ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_5



Uploading README for ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs64_5
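The loop above takes the last six characters of the folder name (for example `bs32_1`) as the Hub name suffix, so a folder ending in `bs32_10` would yield `s32_10`. A sketch of a more explicit alternative follows, assuming folder names always end in `bs<batch size>_<run index>`; `hub_name_for`, `SUFFIX_PATTERN`, and `HUB_PREFIX` are hypothetical names, not part of the notebook.

```python
import re

# Assumed naming scheme: "<experiment prefix>_bs<batch size>_<run index>"
SUFFIX_PATTERN = re.compile(r"(bs\d+_\d+)$")
HUB_PREFIX = "ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_"


def hub_name_for(folder_name: str) -> str:
    """Build the Hub repo id for a fine_tune_data_seed model folder."""
    match = SUFFIX_PATTERN.search(folder_name)
    if match is None:
        # Fail loudly instead of uploading under a mangled name
        raise ValueError(f"Unexpected model folder name: {folder_name!r}")
    return HUB_PREFIX + match.group(1)


print(hub_name_for("fine_tune_data_seed_bs32_1"))
print(hub_name_for("fine_tune_data_seed_bs64_10"))
```

For the ten folders uploaded above, this produces the same repository names as the slice-based version.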


In [4]:
# Sanity check that the upload worked and the model from the hub can be used for inference
import datasets

# Stream two examples so the full corpus is never downloaded
dataset = datasets.load_dataset("MLCommons/peoples_speech", "clean", split="train", streaming=True).take(2)
# Resample to 16 kHz, the rate of the audio passed to the pipeline below
dataset = dataset.cast_column("audio", datasets.Audio(sampling_rate=16_000))
samples = list(dataset)  # materialize once so the stream is not read twice
print(samples)
pipe = transformers.pipeline("automatic-speech-recognition", model="ginic/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_buckeye-4k_bs32_5")
for sample in samples:
    pred = pipe(sample["audio"])
    print("actual text:", sample["text"])
    print("prediction:", pred)





[{'id': '07282016HFUUforum_SLASH_07-28-2016_HFUUforum_DOT_mp3_00000.flac', 'audio': {'path': '07282016HFUUforum_SLASH_07-28-2016_HFUUforum_DOT_mp3_00000.flac', 'array': array([ 0.14205933,  0.20620728,  0.27151489, ...,  0.00402832,
       -0.00628662, -0.01422119], shape=(238720,)), 'sampling_rate': 16000}, 'duration_ms': 14920, 'text': "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"}, {'id': '07282016HFUUforum_SLASH_07-28-2016_HFUUforum_DOT_mp3_00001.flac', 'audio': {'path': '07282016HFUUforum_SLASH_07-28-2016_HFUUforum_DOT_mp3_00001.flac', 'array': array([-0.01480103,  0.05319214, -0.0105896 , ..., -0.02996826,
        0.06680298,  0.0071106 ], shape=(232480,)), 'sampling_rate': 16000}, 'duration_ms': 14530, 'text': "state we support agriculture to the tune of point four percent no way i made a mistake


actual text: i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to
prediction: {'text': 'ɑwɑɾ̃ɪdtɪdʒɪʃʃɛɹfjuθɪŋzbʌɾɑmɡʌɾ̃ʌnɑtʃɛɹʌzmʌtʃɪzɑwɑnɪdɪʃɛɹbikʌzwiɑɹstɑɹɾɪŋleɪʔaɪdlaɪktɪɡɪtðɪsθɪŋɡoʊɪnsʌwiɡɔlɡɪɾhoʊmɛɾɪdisʌnaʊɹ̩ʌmðɪsðɪsʌlɛkʃɪnɪzʌmvɛɹiɪmpɔɹʔn̩tuʌ'}
actual text: state we support agriculture to the tune of point four percent no way i made a mistake this year they lowered it from point four percent to point three eight percent and in the same breath they're saying food
prediction: {'text': 'steɪʔwisʌpɔɹɾæɡɹɹ̩kʌltʃɹ̩tɪðɪtunʌvpɔnfɔɹpɹ̩sɛnʔoʊnoʊweɪʔaɪmeɪɾʌmʌsteɪkðɪʃjɪɹðeɪloʊɹ̩ɾɪtfɹ̩mpɔntfɔɹpɹ̩sɛnttʌpɔntθɹieɪpɹ̩sɛnʔæɾ̃ɪnnɪseɪmbɹɛθðɹ̩seɪmfuds'}
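The printed samples also allow a quick check that resampling to 16 kHz took effect: the array length should equal the clip duration multiplied by the sampling rate. A minimal sketch using the values from the output above follows; `expected_num_samples` is a hypothetical helper, not part of the notebook.

```python
def expected_num_samples(duration_ms: int, sampling_rate: int) -> int:
    """Number of audio samples in a clip of the given duration."""
    return duration_ms * sampling_rate // 1000


# (duration_ms, array length) pairs copied from the printed dataset samples
observed = [(14920, 238720), (14530, 232480)]
for duration_ms, num_samples in observed:
    assert expected_num_samples(duration_ms, 16_000) == num_samples
print("sample counts match a 16 kHz sampling rate")
```

Because the predictions are IPA strings and the reference text is orthographic, the two cannot be compared directly, so a structural check like this is about as far as this sanity cell can go.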
