TTS SpeechLLM models #8364

Closed
wants to merge 206 commits

Commits (206)
381f3a5
Init dataloader changes
subhankar-ghosh May 30, 2023
2cf5567
Changes to T5 dataloader to accept text and speech
subhankar-ghosh Jun 13, 2023
e114f75
Expand vocabulary of T5 model to include speech tokens, dataset chang…
subhankar-ghosh Jun 14, 2023
4c0145f
Add todo statement.
subhankar-ghosh Jun 14, 2023
f5b4393
Removing print statements.
subhankar-ghosh Jun 14, 2023
2ca3eca
wip: add pseudocode for speech layers
blisc Jun 14, 2023
e38dcd7
WIP
blisc Jun 27, 2023
e7e5de7
WIP2
blisc Jun 28, 2023
fbec910
working?
blisc Jun 28, 2023
4eb5a19
merge
blisc Jun 28, 2023
f3b9e3c
working fp32
blisc Jun 28, 2023
979baa5
update to bf16; version 1 of decoder-only code
blisc Jul 7, 2023
485a5f8
first working version
blisc Jul 12, 2023
4eb8ae4
WIP decoder+bug fixes
blisc Jul 15, 2023
bc3b9a0
added masking for speech llm pretraining
paarthneekhara Jul 15, 2023
28654b3
bug fix
paarthneekhara Jul 15, 2023
51357dd
added span length to masking procedure
paarthneekhara Jul 17, 2023
47b36ea
WIP decoder+bug fixes
blisc Jul 15, 2023
935ee9d
wip
paarthneekhara Jul 24, 2023
ceaa55a
wip
paarthneekhara Jul 24, 2023
6bdef90
some bug fixes
paarthneekhara Jul 24, 2023
c39a7ad
WIP
blisc Jul 25, 2023
2f53d19
WIP
blisc Jul 25, 2023
0e5e4e2
working v1
blisc Jul 27, 2023
d25d16b
debug
blisc Jul 27, 2023
f48daf9
remove encodec dep for now
blisc Jul 28, 2023
0d35d09
pretraining code setup
paarthneekhara Jul 31, 2023
01792be
hacky speech inference working
blisc Aug 1, 2023
3f71646
Add vocabulary expansion and also changes to model related to this.
subhankar-ghosh Aug 2, 2023
25167a0
bugfix for labels; minor cleanup/refactor
blisc Aug 2, 2023
71be7fd
remove print
blisc Aug 2, 2023
7ef3a10
wip, pretraining seems to be running
paarthneekhara Aug 3, 2023
893da81
wip
paarthneekhara Aug 4, 2023
52651e4
Loss not decreasing during SFT bug fix.
subhankar-ghosh Aug 7, 2023
4b71b1e
Change save and load logic to save entire model, expand positional em…
subhankar-ghosh Aug 8, 2023
0b40aae
pretraining on both text and speech
paarthneekhara Aug 8, 2023
045e263
merging
paarthneekhara Aug 8, 2023
c5778ba
Merge branch 'NVIDIA-subhankarg/speechlm' into speechlm_t5_pretraining
paarthneekhara Aug 8, 2023
8e460b7
merge in progress
paarthneekhara Aug 8, 2023
62cbefa
merging continued
paarthneekhara Aug 9, 2023
6d6fe39
reverting unnecessary changes
paarthneekhara Aug 9, 2023
a703282
pretraining running
paarthneekhara Aug 9, 2023
bf62163
bug fixes
paarthneekhara Aug 9, 2023
4082141
removed hardcoded speech offset
paarthneekhara Aug 9, 2023
a5763de
separated speech token output embeddings
paarthneekhara Aug 10, 2023
193256b
SFT changes on merged branch
shehzeen Aug 10, 2023
2ba8253
converting labels to correct range
shehzeen Aug 10, 2023
7c1c7d8
added comments
shehzeen Aug 10, 2023
2727058
enabled tensorboard audio logging when precision is 16
shehzeen Aug 13, 2023
e17fec3
add option to remove the conv layer and only use the final linear layer
blisc Aug 14, 2023
546080a
Merge branch 'speechllm_tts' into speechlm_merge_main
blisc Aug 14, 2023
49421b3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 14, 2023
aba8f75
add teacher forcing inference
blisc Aug 15, 2023
27feb8f
remove debug
blisc Aug 15, 2023
c72c6c5
merge post style fix
blisc Aug 15, 2023
d6776b5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2023
ad4c337
update config
blisc Aug 15, 2023
f47764d
Merge working code into main branch (#7226)
blisc Aug 21, 2023
7ce7937
Shehzeen speechlm (#7283)
shehzeen Aug 22, 2023
14d5449
save work
blisc Aug 25, 2023
4a26895
Speechlm delayparallel (#7324)
paarthneekhara Aug 28, 2023
95c6a98
merge
blisc Sep 7, 2023
b531d53
update
blisc Sep 8, 2023
244aba2
Speechlm 3b shehzeen (#7411)
shehzeen Sep 14, 2023
1f194e1
wip
blisc Sep 19, 2023
1423df8
merge
blisc Sep 19, 2023
f188d49
fix
blisc Sep 19, 2023
b89d3f3
more cleanup
blisc Sep 20, 2023
983373c
merge with main
blisc Sep 21, 2023
2df19ce
merge WIP
blisc Sep 21, 2023
0217715
update
blisc Sep 21, 2023
3768775
fix some merge issues
blisc Sep 21, 2023
1076141
[TTS][SpeechLLM] Merge ASR SFT changes to SpeechLLM T5 (#7491)
subhankar-ghosh Sep 26, 2023
34447cc
WIP
blisc Sep 29, 2023
242c230
cleanup
blisc Sep 29, 2023
5e2735b
push working version
blisc Sep 29, 2023
9d64543
remove encodec in dataset
blisc Oct 3, 2023
1f2daf5
merge
blisc Oct 3, 2023
5d68912
WIP
blisc Oct 3, 2023
0966dcb
update reqs
blisc Oct 4, 2023
574cfd7
add back some infer changes
blisc Oct 4, 2023
ff08a9f
update hash
blisc Oct 4, 2023
18bda77
update
blisc Oct 5, 2023
9ba0959
install megatron into less used folder
blisc Oct 6, 2023
4a0715d
add train logging
blisc Oct 11, 2023
756f345
add initial tarred
blisc Oct 11, 2023
3caeb11
update code
blisc Oct 11, 2023
3f80067
update logging for sft
blisc Oct 11, 2023
fab96c2
add missing file
blisc Oct 11, 2023
d50bbca
initial commit of json tokenizer
blisc Oct 13, 2023
db97121
tokenzier bug fix; and improve logging
blisc Oct 13, 2023
94b3b0b
speed up preproc; add debug message
blisc Oct 16, 2023
515a718
speedup
blisc Oct 17, 2023
a9a965e
bugfix for tokenized dataset
blisc Oct 18, 2023
97ac61b
add tar file shuffling
blisc Oct 18, 2023
92cf532
Paarth gpt sft (#7)
paarthneekhara Oct 20, 2023
822aec4
add validation logic
blisc Oct 24, 2023
b3c25ad
first working commit of attention logging
blisc Oct 25, 2023
7bdc5a2
some logging fixes
blisc Oct 25, 2023
608fff1
increase check interval
blisc Oct 25, 2023
7c10445
add second config
blisc Oct 26, 2023
023853f
add new var for train logging; add return logs to attention
blisc Oct 27, 2023
1abfcc8
update logging
blisc Oct 27, 2023
3ad6f6b
update pruning
blisc Oct 27, 2023
18b4fdb
update pruning
blisc Oct 27, 2023
f72ece7
from scratch
blisc Oct 27, 2023
64ca135
update attention logging
blisc Oct 30, 2023
93315b3
finalize
blisc Oct 30, 2023
0182c8f
fix double attention call
blisc Oct 31, 2023
1ca53ab
Add sliced attention logging; add option to set context length
blisc Oct 31, 2023
1bb8ab4
inference eval code (#8)
paarthneekhara Oct 31, 2023
43115d9
update attention logging, start attention prior
blisc Nov 2, 2023
fb59c97
cleanup and add prior annealing; add scratch config
blisc Nov 3, 2023
1f436b2
YA prior change
blisc Nov 6, 2023
8a5780a
add back attention masks, add spec aug options; update prior again
blisc Nov 9, 2023
08f4029
update exp manager
blisc Nov 21, 2023
ece8322
Speechllm 2310 rebased (#9)
paarthneekhara Nov 21, 2023
7d92f33
add comment
blisc Nov 24, 2023
072683a
merge
blisc Nov 24, 2023
43fb29b
break prior for now, update alibi, remove val layer from prog bar, ma…
blisc Nov 28, 2023
49db9ce
typo
blisc Nov 28, 2023
fe8771d
Add debug msg in model; update dataset logic re context
blisc Nov 29, 2023
b256c52
update tarred to match non-tarred setup
blisc Nov 30, 2023
8c30c24
Attention Prior, Tarred MLS Dataset, Top K Sampling, Multitask audio …
shehzeen Dec 5, 2023
68d9c41
inference updates, top k sampling, edit speech task switched to use p…
paarthneekhara Dec 6, 2023
6e7c043
merge with main
blisc Dec 6, 2023
5ddca71
working gpt code; have to update megatron-core in docker
blisc Dec 6, 2023
68bd820
merge with t5 branch; ATTN PRIOR is currently BROKEN
blisc Dec 6, 2023
e90d513
model is running
blisc Dec 7, 2023
b966e65
enable attn prior; update logging and printing
blisc Dec 7, 2023
2381f95
fix validation loop; change logging a bit
blisc Dec 8, 2023
1632400
undo commit of tts tut
blisc Dec 8, 2023
1a85963
unpin apex in Docker; and fix Docker for our usecases
blisc Dec 12, 2023
db0756b
finalized some more logging
blisc Dec 12, 2023
6e0d89f
fix loading issue
blisc Dec 13, 2023
982fc85
Speechllm 2312 paarth (#12)
paarthneekhara Dec 15, 2023
0ee68e2
some final inference fixes for custom number of codebooks
paarthneekhara Dec 15, 2023
212420c
set dataloader seed randomly
paarthneekhara Dec 16, 2023
cde896d
some changes for multilingual model
paarthneekhara Dec 19, 2023
8da385f
bug
paarthneekhara Dec 19, 2023
d90acb9
bug fix
paarthneekhara Dec 19, 2023
d673a27
bring back cfg.seed
paarthneekhara Dec 20, 2023
007e8e4
allow delay pattern in context, allow context to by just text (for sp…
paarthneekhara Dec 21, 2023
0568a71
ctc loss
paarthneekhara Dec 27, 2023
86d748d
remove log softmax since it is being handled
paarthneekhara Dec 27, 2023
f276b9c
always return attention prob (for forward sum loss)
paarthneekhara Dec 27, 2023
899f0fd
added alignment loss scale
paarthneekhara Dec 28, 2023
9bd60dd
phoneme/tts eer logging and continuous eval script
paarthneekhara Dec 29, 2023
21db991
eval script update
paarthneekhara Dec 30, 2023
19f403f
some final inference fixes for custom number of codebooks (#14)
paarthneekhara Jan 2, 2024
e382a1b
merge with latest code
blisc Jan 2, 2024
3316149
nemo audio codec related changes in the new branch
shehzeen Jan 3, 2024
57e7103
Merge pull request #21 from shehzeen/speechllm_2312_nemocodec
paarthneekhara Jan 4, 2024
2c5522b
update for phones
blisc Jan 4, 2024
5786c1a
add ctc loss
blisc Jan 5, 2024
d6c697c
remove print
blisc Jan 5, 2024
2429ca7
remove gradients
blisc Jan 8, 2024
36605c3
run encoder once
blisc Jan 8, 2024
973f611
finish caching kv for both self and cross
blisc Jan 9, 2024
a39931a
fix positional embedding
blisc Jan 9, 2024
ef58ce4
switch to multilingual ASR setup
blisc Jan 9, 2024
8658c4a
working inference
blisc Jan 11, 2024
6a27793
Speechllm 2312 phones paarth (#16)
paarthneekhara Jan 24, 2024
4c854a1
add some changes for new langs
blisc Jan 26, 2024
3295186
remove check in getittem
blisc Jan 26, 2024
4c8ba20
add g2p update to infer config
blisc Jan 30, 2024
a108972
add tokenizer script
blisc Jan 31, 2024
41d1b18
Speechllm 2312 phones paarth feb24 (#17)
paarthneekhara Feb 7, 2024
be0f30a
add saving context and change some statements
blisc Feb 7, 2024
c563dba
Merge branch 'speechllm_2312_phones' of github.com:blisc/NeMo into sp…
blisc Feb 7, 2024
6efe75d
merge with main
blisc Feb 7, 2024
cb81db4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2024
4b00f67
the easy clean up bits
blisc Feb 7, 2024
dca4a84
move confs
blisc Feb 8, 2024
dc72731
move examples
blisc Feb 8, 2024
17c1d20
move datasets
blisc Feb 8, 2024
ad9978b
remove changes to codec; remove gpt pretraining parts; cleanup asr+co…
blisc Feb 9, 2024
df9ab97
initial cleanup of gpt inside nlp/models
blisc Feb 9, 2024
51af97f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 9, 2024
c6426c7
initial cleanup of t5 in nlp/models
blisc Feb 9, 2024
462488d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 9, 2024
13dc877
bubble up vocab_parallel_cross_entropy changes
blisc Feb 13, 2024
9a70acc
bubble up get_ltor_masks_and_position_ids changes
blisc Feb 13, 2024
42d27de
undo changes in build_position_id
blisc Feb 13, 2024
8152e92
undo changes in build_position_id
blisc Feb 13, 2024
a49c02b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 13, 2024
235c65c
general cleanup of nlp sections
blisc Feb 14, 2024
eddec89
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 14, 2024
4680ba3
changes before merge
blisc Feb 14, 2024
f7b07dd
Merge remote-tracking branch 'nvidia/main' into speechllm_tts_2402
blisc Feb 14, 2024
f509996
clean up yaml configs; working t5 inference script
blisc Feb 14, 2024
eb23f17
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 14, 2024
aa751fe
remove gpt model from PR
blisc Feb 14, 2024
f291b58
final cleanup
blisc Feb 15, 2024
119d46a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 15, 2024
fe4c0e9
remove some files
blisc Feb 16, 2024
d24b924
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 16, 2024
37b4831
add examples folder
blisc Mar 7, 2024
9201716
add more missing files
blisc Mar 7, 2024
d20705f
update with latest code
blisc Mar 20, 2024
f8a4f51
add interpolator option and bug fixes for EN models
blisc Mar 21, 2024
9e75654
allow for different attention setups in decoder
blisc Apr 2, 2024
1945810
update data paths
blisc Apr 5, 2024
53027fd
update inference; add from scratch yamls
blisc Apr 19, 2024
c4ce85c
fix inference
blisc May 6, 2024
2d838eb
added autoregressive inference to validation. tested changes. trainin…
shehzeen May 7, 2024
149 changes: 149 additions & 0 deletions examples/tts/speechllm/conf/megatron_t5_speechllm_inference.yaml
@@ -0,0 +1,149 @@
name: megatron_t5_speechllm_tts_inference
checkpoint_path: ???

trainer:
  devices: 1
  accelerator: gpu
  num_nodes: 1
  precision: 32
  logger: False
  enable_checkpointing: False
  use_distributed_sampler: False
  max_epochs: 10000
  max_steps: -1
  log_every_n_steps: 10
  val_check_interval: null
  check_val_every_n_epoch: 3
  gradient_clip_val: 1.0

exp_manager:
  exp_dir: null
  name: ${name}
  create_wandb_logger: False
  resume_if_exists: False
  resume_ignore_no_checkpoint: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 2
    mode: min
    save_nemo_on_train_end: False # Should be false, correct prompt learning model file is saved at model.nemo_path set below
    filename: "megatron_t5_speechllm_tts--{${exp_manager.checkpoint_callback_params.monitor}:.3f}-{step}"
    model_parallel_size: ${model.tensor_model_parallel_size}
    save_best_model: True
  create_early_stopping_callback: False
  early_stopping_callback_params:
    monitor: "val_loss"
    mode: "min"
    min_delta: 0.001
    patience: 10
    verbose: True

model:
  seed: 1234
  nemo_path: ${name}.nemo # .nemo filename/absolute path to where the virtual prompt model parameters will be saved
  virtual_prompt_style: "p-tuning" # one of 'prompt-tuning', 'p-tuning', or 'inference'
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
  global_batch_size: 16
  micro_batch_size: 16 # micro batch size should equal global batch size when pipeline parallel = 1
  validation_global_batch_size: ${model.global_batch_size}
  validation_micro_batch_size: ${model.micro_batch_size}
  validation_drop_last: False
  report_validation_metric: False
  validation_metric: accuracy
  num_speech_tokens: 10112 # Vocabulary size pertaining to speech
  seq_pattern: "parallel" # parallel, delay_parallel, flatten
  temperature: 0.7 # Temperature to be used for inference
  top_k: 80 # Top k to be used for inference
  max_inference_timesteps: 1000 # Maximum number of timesteps to run inference for

  restore_path: null # Path to an existing p-tuned/prompt tuned .nemo model you wish to add new tasks to or run inference with
  language_model_path: ??? # Path to the pretrained T5 language model .nemo file, always required
  save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training.
  existing_tasks: []
  new_tasks: ["squad"]

  task_templates:
  - taskname: "squad"
    prompt_template: "<|VIRTUAL_PROMPT_0|> {context} {question} {answer}"
    total_virtual_tokens: 3
    virtual_token_splits: [3]
    truncate_field: context
    answer_field: answer

  p_tuning: # P-tuning specific params
    encoder_type: "mlp" # Either "mlp" or "lstm", mlp is default
    num_layers: 2 # 2 recommended for MLP, 1 recommended for LSTM, must be at least 2 for mlp
    dropout: 0.0

  prompt_tuning: # Prompt-tuning specific params
    new_prompt_init_methods: ['text'] # List of 'text' or 'random', should correspond to tasks listed in new tasks
    new_prompt_init_text: ['some init text goes here'] # some init text if init method is text, or None if init method is random

  data:
    grapheme_prefix: null
    train_ds: null
    validation_ds: null
    test_ds: ???
    max_seq_length: 1536
    sample_rate: 24000
    add_eos: true
    add_bos: false
    decoder_starts_with_pad: False
    add_eos_to_decoder_output: True
    add_sentinel_to_input: True
    ul2_prompt_token: null # <extra_id_s>, <extra_id_r>, <extra_id_x>
    shuffle: true
    num_workers: 4
    pin_memory: true
    speech_offset: 30000
    train_task: asr
    sup_data_path: None
    g2p:
      english:
        _target_: nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p
        phoneme_dict: "scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt"
        heteronyms: "scripts/tts_dataset_files/heteronyms-052722"
        phoneme_probability: 0.8
        ignore_ambiguous_words: False
        use_chars: True
        use_stresses: True
        grapheme_prefix: ${model.data.grapheme_prefix}
      spanish:
        _target_: nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p
        phoneme_dict: "scripts/tts_dataset_files/es_ES/es_ES_nv230301.dict"
        phoneme_probability: 0.8
        use_chars: True
        use_stresses: True
        ignore_ambiguous_words: False
        grapheme_prefix: ${model.data.grapheme_prefix}
        locale: "es-ES"
      mandarin:
        _target_: nemo.collections.tts.g2p.models.zh_cn_pinyin.ChineseG2p
        phoneme_dict: "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt"
        word_segmenter: "jieba"
        phoneme_prefix: ""
        phoneme_case: "lower"
        tone_prefix: "#"
        ascii_letter_prefix: ${model.data.grapheme_prefix}
        ascii_letter_case: "upper"
      german:
        _target_: nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p
        phoneme_dict: "scripts/tts_dataset_files/de/de_nv230119.dict"
        heteronyms: "scripts/tts_dataset_files/de/de_nv230119.heteronym"
        phoneme_probability: 0.8
        ignore_ambiguous_words: False
        use_chars: True
        use_stresses: True
        grapheme_case: mixed
        grapheme_prefix: ${model.data.grapheme_prefix}
        locale: "de-DE"

  optim:
    name: fused_adam
    lr: 5e-5
    weight_decay: 0.01
    betas:
      - 0.9
      - 0.98
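
Note on the config above: checkpoint_path, model.language_model_path, and model.data.test_ds are set to "???", OmegaConf's marker for mandatory values, so they must be supplied at launch time. The following is a minimal sketch (not part of this PR) of how the file could be loaded and those fields overridden with OmegaConf before being handed to an inference script; all paths and values are placeholders.

# Minimal sketch, not from this PR: fill the mandatory "???" fields of
# megatron_t5_speechllm_inference.yaml with OmegaConf. Paths are placeholders.
from omegaconf import OmegaConf

cfg = OmegaConf.load("examples/tts/speechllm/conf/megatron_t5_speechllm_inference.yaml")

overrides = OmegaConf.create(
    {
        "checkpoint_path": "/models/speechllm_t5.ckpt",          # placeholder
        "model": {
            "language_model_path": "/models/megatron_t5.nemo",   # placeholder
            "data": {"test_ds": "/data/test_manifest.json"},     # placeholder
        },
    }
)
cfg = OmegaConf.merge(cfg, overrides)

# Interpolations such as ${model.global_batch_size} stay lazy until resolved,
# so printing without resolution keeps them visible for inspection.
print(OmegaConf.to_yaml(cfg.model, resolve=False))

The same overrides can equivalently be passed on the command line as Hydra-style dotted arguments (for example model.data.test_ds=...), which is the usual way NeMo example scripts are launched.
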
221 changes: 221 additions & 0 deletions examples/tts/speechllm/conf/megatron_t5_speechllm_inference_model.yaml
@@ -0,0 +1,221 @@
name: megatron_t5_speechllm_tts_inference
checkpoint_path: ???

trainer:
  devices: 1
  accelerator: gpu
  num_nodes: 1
  precision: 32
  logger: False
  enable_checkpointing: False
  use_distributed_sampler: False
  max_epochs: 10000
  max_steps: -1
  log_every_n_steps: 10
  val_check_interval: null
  check_val_every_n_epoch: 3
  gradient_clip_val: 1.0

exp_manager:
  exp_dir: null
  name: ${name}
  create_wandb_logger: False
  resume_if_exists: False
  resume_ignore_no_checkpoint: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 2
    mode: min
    save_nemo_on_train_end: False # Should be false, correct prompt learning model file is saved at model.nemo_path set below
    filename: "megatron_t5_speechllm_tts--{${exp_manager.checkpoint_callback_params.monitor}:.3f}-{step}"
    model_parallel_size: ${model.tensor_model_parallel_size}
    save_best_model: True
  create_early_stopping_callback: False
  early_stopping_callback_params:
    monitor: "val_loss"
    mode: "min"
    min_delta: 0.001
    patience: 10
    verbose: True

model:
  seed: 1234
  nemo_path: ${name}.nemo # .nemo filename/absolute path to where the virtual prompt model parameters will be saved
  virtual_prompt_style: "p-tuning" # one of 'prompt-tuning', 'p-tuning', or 'inference'
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
  global_batch_size: 16
  micro_batch_size: 16 # micro batch size should equal global batch size when pipeline parallel = 1
  validation_global_batch_size: ${model.global_batch_size}
  validation_micro_batch_size: ${model.micro_batch_size}
  validation_drop_last: False
  report_validation_metric: False
  validation_metric: accuracy
  num_speech_tokens: 10112 # Vocabulary size pertaining to speech
  seq_pattern: "parallel" # parallel, delay_parallel, flatten
  speech_head_type: "linear" # token_level, linear
  cross_entropy_type: "vocab_parallel" # regular, vocab_parallel
  temperature: 0.7 # Temperature to be used for inference
  top_k: 80 # Top k to be used for inference
  max_inference_timesteps: 1000 # Maximum number of timesteps to run inference for

  restore_path: null # Path to an existing p-tuned/prompt tuned .nemo model you wish to add new tasks to or run inference with
  save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training.
  existing_tasks: []
  new_tasks: ["squad"]
  codecmodel_type: nemo_codec
  codecmodel_path: ???
  english_only_model: true
  context_conditioning: decoder
  train_from_scratch: true
  override_tokenizer_vocab_file: ???
  use_flash_attention: true
  lm_vocab_size: 30000

  frozen_model:
    # micro_batch_size: null
    # global_batch_size: null
    # megatron_amp_O2: true
    # seq_length: 512
    # max_position_embeddings: 512
    # precision: bf16
    # Above is overridden in code
    tensor_model_parallel_size: 1
    pipeline_model_parallel_size: 1
    pipeline_model_parallel_split_rank: 0
    make_vocab_size_divisible_by: 128
    pre_process: true
    post_process: true
    gradient_as_bucket_view: true
    native_amp_init_scale: 4294967296
    native_amp_growth_interval: 1000
    fp16_lm_cross_entropy: false
    seed: 1234
    use_cpu_initialization: false
    apex_transformer_log_level: 30
    tokenizer:
      library: megatron
      type: BertWordPieceCase
      model: null
      vocab_file: null
      merge_file: null
      # num_sentinel_tokens: 100
    optim:
      name: null
    data:
      dataset_type: t5
    encoder:
      arch: transformer
      bias_activation_fusion: false
      use_flash_attention: ${model.use_flash_attention}
      num_layers: 12
      hidden_size: 768
      ffn_hidden_size: 2048
      num_attention_heads: 12
      init_method_std: 0.015
      hidden_dropout: 0.1
      attention_dropout: 0.1
      kv_channels: 64
      activation: geglu
    decoder:
      arch: transformer
      bias_activation_fusion: false
      use_flash_attention: ${model.use_flash_attention}
      num_layers: 12
      hidden_size: 768
      ffn_hidden_size: 2048
      num_attention_heads: 12
      init_method_std: 0.015
      hidden_dropout: 0.1
      attention_dropout: 0.1
      kv_channels: 64
      activation: geglu

  task_templates:
  - taskname: "squad"
    prompt_template: "<|VIRTUAL_PROMPT_0|> {context} {question} {answer}"
    total_virtual_tokens: 3
    virtual_token_splits: [3]
    truncate_field: context
    answer_field: answer

  p_tuning: # P-tuning specific params
    encoder_type: "mlp" # Either "mlp" or "lstm", mlp is default
    num_layers: 2 # 2 recommended for MLP, 1 recommended for LSTM, must be at least 2 for mlp
    dropout: 0.0

  prompt_tuning: # Prompt-tuning specific params
    new_prompt_init_methods: ['text'] # List of 'text' or 'random', should correspond to tasks listed in new tasks
    new_prompt_init_text: ['some init text goes here'] # some init text if init method is text, or None if init method is random

  data:
    grapheme_prefix: null
    train_ds: null
    validation_ds: null
    test_ds: ???
    max_seq_length: 1536
    sample_rate: 24000
    add_eos: true
    add_bos: false
    decoder_starts_with_pad: False
    add_eos_to_decoder_output: True
    add_sentinel_to_input: True
    ul2_prompt_token: null # <extra_id_s>, <extra_id_r>, <extra_id_x>
    shuffle: true
    num_workers: 4
    pin_memory: true
    speech_offset: 30000
    train_task: asr
    sup_data_path: None
    num_speech_codebooks: 8
    codebook_fps: 86
    context_duration_min: 2.9
    context_duration_max: 2.9
    g2p:
      english:
        _target_: nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p
        phoneme_dict: "scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt"
        heteronyms: "scripts/tts_dataset_files/heteronyms-052722"
        phoneme_probability: 0.8
        ignore_ambiguous_words: False
        use_chars: True
        use_stresses: True
        grapheme_prefix: ${model.data.grapheme_prefix}
      spanish:
        _target_: nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p
        phoneme_dict: "scripts/tts_dataset_files/es_ES/es_ES_nv230301.dict"
        phoneme_probability: 0.8
        use_chars: True
        use_stresses: True
        ignore_ambiguous_words: False
        grapheme_prefix: ${model.data.grapheme_prefix}
        locale: "es-ES"
      mandarin:
        _target_: nemo.collections.tts.g2p.models.zh_cn_pinyin.ChineseG2p
        phoneme_dict: "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt"
        word_segmenter: "jieba"
        phoneme_prefix: ""
        phoneme_case: "lower"
        tone_prefix: "#"
        ascii_letter_prefix: ${model.data.grapheme_prefix}
        ascii_letter_case: "upper"
      german:
        _target_: nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p
        phoneme_dict: "scripts/tts_dataset_files/de/de_nv230119.dict"
        heteronyms: "scripts/tts_dataset_files/de/de_nv230119.heteronym"
        phoneme_probability: 0.8
        ignore_ambiguous_words: False
        use_chars: True
        use_stresses: True
        grapheme_case: mixed
        grapheme_prefix: ${model.data.grapheme_prefix}
        locale: "de-DE"

  optim:
    name: fused_adam
    lr: 5e-5
    weight_decay: 0.01
    betas:
      - 0.9
      - 0.98
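
Note on seq_pattern and num_speech_codebooks in the configs above: the model consumes 8 parallel codec codebook streams, and seq_pattern selects how they are arranged (parallel, delay_parallel, or flatten). The sketch below only illustrates the delay idea, under the assumption that delay_parallel shifts codebook k right by k frames so earlier codebooks are already available when later ones are predicted; it is not the PR's implementation, and pad_id is a hypothetical placeholder token.

# Illustrative sketch of a delay_parallel layout (assumption: codebook k is
# delayed by k frames). Not the PR's implementation; pad_id is a placeholder.
import torch

def to_delay_parallel(codes: torch.Tensor, pad_id: int) -> torch.Tensor:
    """codes: (num_codebooks, T) -> (num_codebooks, T + num_codebooks - 1)."""
    num_codebooks, num_frames = codes.shape
    out = torch.full(
        (num_codebooks, num_frames + num_codebooks - 1), pad_id, dtype=codes.dtype
    )
    for k in range(num_codebooks):
        out[k, k : k + num_frames] = codes[k]  # codebook k shifted right by k frames
    return out

def from_delay_parallel(delayed: torch.Tensor) -> torch.Tensor:
    """Undo the shift so every codebook is re-aligned to the same frame index."""
    num_codebooks, total = delayed.shape
    num_frames = total - (num_codebooks - 1)
    return torch.stack([delayed[k, k : k + num_frames] for k in range(num_codebooks)])

codes = torch.arange(8 * 4).reshape(8, 4)  # toy example: 8 codebooks, 4 frames
delayed = to_delay_parallel(codes, pad_id=-1)
assert torch.equal(from_delay_parallel(delayed), codes)
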