Merge branch 'main' into replicate
yidong72 committed Jan 27, 2023
2 parents a6a529b + e1b3f5e commit d8a53a1
Showing 130 changed files with 159,451 additions and 862 deletions.
1 change: 1 addition & 0 deletions Dockerfile
@@ -24,6 +24,7 @@ FROM ${BASE_IMAGE} as nemo-deps
# Ensure apt-get won't prompt for selecting options
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
    libsndfile1 sox \
    libfreetype6 \
78 changes: 1 addition & 77 deletions Jenkinsfile
@@ -2546,82 +2546,6 @@ pipeline {
}
}

stage('L2: NMT with HuggingFace') {
when {
anyOf {
branch 'main'
changeRequest target: 'main'
}
}
failFast true
parallel {
stage('L2: NMT Pretrained HF Encoder') {
steps {
sh 'cd examples/nlp/machine_translation && \
python enc_dec_nmt.py \
--config-path=conf \
--config-name=huggingface \
model.shared_tokenizer=False \
model.encoder_tokenizer.library=huggingface \
model.encoder.library=huggingface \
model.encoder.model_name=distilbert-base-cased \
model.encoder.pretrained=true \
model.train_ds.src_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.train_ds.tgt_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref \
model.validation_ds.src_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.validation_ds.tgt_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.test_ds.src_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.test_ds.tgt_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.train_ds.tokens_in_batch=128 \
model.validation_ds.tokens_in_batch=128 \
model.test_ds.tokens_in_batch=128 \
model.decoder_tokenizer.tokenizer_model=/home/TestData/nlp/nmt/toy_data/tt_tokenizer.BPE.4096.model \
model.decoder.hidden_size=768 \
model.decoder.inner_size=256 \
trainer.devices=[0] \
trainer.accelerator="gpu" \
+trainer.fast_dev_run=true \
exp_manager=null \
'
}
}

stage('L2: NMT Custom HF Encoder') {
steps {
sh 'cd examples/nlp/machine_translation && \
python enc_dec_nmt.py \
--config-path=conf \
--config-name=huggingface \
model.shared_tokenizer=True \
model.encoder_tokenizer.library=yttm \
model.encoder_tokenizer.tokenizer_model=/home/TestData/nlp/nmt/toy_data/tt_tokenizer.BPE.4096.model \
model.encoder.library=huggingface \
model.encoder.model_name=null \
model.encoder.pretrained=false \
+model.encoder._target_=transformers.BertConfig \
+model.encoder.hidden_size=48 \
model.train_ds.src_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.train_ds.tgt_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref \
model.validation_ds.src_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.validation_ds.tgt_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.test_ds.src_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.test_ds.tgt_file_name=/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src \
model.train_ds.tokens_in_batch=128 \
model.validation_ds.tokens_in_batch=128 \
model.test_ds.tokens_in_batch=128 \
model.decoder_tokenizer.tokenizer_model=/home/TestData/nlp/nmt/toy_data/tt_tokenizer.BPE.4096.model \
model.decoder.hidden_size=48 \
model.decoder.inner_size=256 \
trainer.devices=[1] \
trainer.accelerator="gpu" \
+trainer.fast_dev_run=true \
exp_manager=null \
'
}
}
}
}


stage('L2: NMT Tarred Dataset Creation') {
when {
@@ -4585,4 +4509,4 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
cleanWs()
}
}
}
}
10 changes: 9 additions & 1 deletion README.rst
@@ -70,7 +70,7 @@ Key Features
* `Language Modelling for ASR <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html>`_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer
* Streaming and Buffered ASR (CTC/Transducer) - `Chunked Inference Examples <https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_chunked_inference>`_
* `Support of long audios for Conformer with memory efficient local attention <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/results.html#inference-on-long-audio>`_
* `Speech Classification and Speech Command Recognition <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html>`_: MatchboxNet (Command Recognition)
* `Speech Classification, Speech Command Recognition and Language Identification <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html>`_: MatchboxNet (Command Recognition), AmberNet (LangID)
* `Voice activity Detection (VAD) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/models.html#marblenet-vad>`_: MarbleNet
* ASR with VAD Inference - `Example <https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_vad>`_
* `Speaker Recognition <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speaker_recognition/intro.html>`_: TitaNet, ECAPA_TDNN, SpeakerNet
@@ -251,6 +251,14 @@ NeMo Text Processing, specifically (Inverse) Text Normalization, requires `Pynin
Docker containers:
~~~~~~~~~~~~~~~~~~
We release NeMo containers alongside NeMo releases. For example, NeMo ``r1.14.0`` comes with container ``nemo:22.11``, you may find more details about released containers in `releases page <https://github.com/NVIDIA/NeMo/releases>`_.

To use built container, please run

.. code-block:: bash

    docker pull nvcr.io/nvidia/nemo:22.11

To build a nemo container with Dockerfile from a branch, please run

.. code-block:: bash