51 changes: 15 additions & 36 deletions examples/deepspeech2/README.md
@@ -6,22 +6,19 @@ References: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)

 ```yaml
 model_config:
-  conv_conf:
-    conv_type: 2
-    conv_kernels: [[11, 41], [11, 21], [11, 11]]
-    conv_strides: [[2, 2], [1, 2], [1, 2]]
-    conv_filters: [32, 32, 96]
-    conv_dropout: 0
-  rnn_conf:
-    rnn_layers: 5
-    rnn_type: lstm
-    rnn_units: 512
-    rnn_bidirectional: True
-    rnn_rowconv: False
-    rnn_dropout: 0
-  fc_conf:
-    fc_units: [1024]
-    fc_dropout: 0
+  conv_type: conv2d
+  conv_kernels: [[11, 41], [11, 21], [11, 11]]
+  conv_strides: [[2, 2], [1, 2], [1, 2]]
+  conv_filters: [32, 32, 96]
+  conv_dropout: 0.1
+  rnn_nlayers: 5
+  rnn_type: lstm
+  rnn_units: 512
+  rnn_bidirectional: True
+  rnn_rowconv: 0
+  rnn_dropout: 0.1
+  fc_nlayers: 0
+  fc_units: 1024
 ```

 ## Architecture
@@ -30,24 +27,6 @@ model_config:

 ## Training and Testing

-See `python examples/deepspeech2/run_ds2.py --help`
+See `python examples/deepspeech2/train_ds2.py --help`

-## Results on VIVOS Dataset
-
-* Features: Spectrogram with `80` frequency channels
-* KenLM: `alpha = 2.0` and `beta = 1.0`
-* Epochs: `20`
-* Train set split ratio: `90:10`
-* Augmentation: `None`
-* Model architecture: same as [vivos.yaml](./configs/vivos.yml)
-
-**CTC Loss**
-
-<img src="./figs/ds2_vivos_ctc_loss.svg" alt="ds2_vivos_ctc_loss" width="300px" />
-
-**Error rates**
-
-| | WER (%) | CER (%) |
-| :-------------- | :------------: | :------------: |
-| *BeamSearch* | 43.75243 | 17.991581 |
-| *BeamSearch LM* | **20.7561836** | **11.0304441** |
+See `python examples/deepspeech2/test_ds2.py --help`
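In the flattened `model_config`, `conv_kernels`, `conv_strides` and `conv_filters` are parallel lists with one entry per convolution layer. The sketch below works out how much the three strided layers shrink a spectrogram input. It assumes each pair is ordered (time, frequency) and that the layers use "same" padding; neither is stated in this diff. `conv_output_shape` is a hypothetical helper, not part of `tensorflow_asr`.

```python
import math

# Per-layer conv settings copied from the new model_config above.
conv_kernels = [[11, 41], [11, 21], [11, 11]]
conv_strides = [[2, 2], [1, 2], [1, 2]]
conv_filters = [32, 32, 96]

# Parallel lists: each layer needs a kernel, a stride and a filter count.
assert len(conv_kernels) == len(conv_strides) == len(conv_filters)


def conv_output_shape(time_steps, freq_bins, strides):
    """Hypothetical helper: (time, freq) left after the strided conv stack.

    Assumes "same" padding, so each layer outputs ceil(input / stride), and
    assumes every stride pair is ordered (time, frequency).
    """
    for stride_t, stride_f in strides:
        time_steps = math.ceil(time_steps / stride_t)
        freq_bins = math.ceil(freq_bins / stride_f)
    return time_steps, freq_bins


# Example input: 1000 frames with 80 frequency channels.
time_out, freq_out = conv_output_shape(1000, 80, conv_strides)
print(time_out, freq_out)  # 500 10
# If frequency and channels are merged before the RNN (an assumption),
# each frame then carries this many features:
print(freq_out * conv_filters[-1])  # 960
```

Under these assumptions the time axis is downsampled only 2x, while the frequency axis is downsampled 8x. The time reduction matters for CTC, whose output sequence must be at least as long as the label sequence.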
@@ -24,7 +24,7 @@ speech_config:
   normalize_per_feature: False

 decoder_config:
-  vocabulary: /mnt/Projects/asrk16/TiramisuASR/vocabularies/vietnamese.txt
+  vocabulary: ./vocabularies/vietnamese.characters
   blank_at_zero: False
   beam_width: 500
   lm_config:
@@ -33,21 +33,20 @@ decoder_config:
     beta: 1.0

 model_config:
-  conv_conf:
-    conv_type: 2
-    conv_kernels: [[11, 41], [11, 21], [11, 11]]
-    conv_strides: [[2, 2], [1, 2], [1, 2]]
-    conv_filters: [32, 32, 96]
-    conv_dropout: 0
-  rnn_conf:
-    rnn_layers: 5
-    rnn_type: lstm
-    rnn_units: 512
-    rnn_bidirectional: True
-    rnn_rowconv: False
-    rnn_dropout: 0
-  fc_conf:
-    fc_units: null
+  name: deepspeech2
+  conv_type: conv2d
+  conv_kernels: [[11, 41], [11, 21], [11, 11]]
+  conv_strides: [[2, 2], [1, 2], [1, 2]]
+  conv_filters: [32, 32, 96]
+  conv_dropout: 0.1
+  rnn_nlayers: 5
+  rnn_type: lstm
+  rnn_units: 512
+  rnn_bidirectional: True
+  rnn_rowconv: 0
+  rnn_dropout: 0.1
+  fc_nlayers: 0
+  fc_units: 1024

 learning_config:
   augmentations: null
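The new `vocabulary` entry is a relative path, while the updated `DEFAULT_YAML` in both scripts is anchored to the script's own directory through `__file__`. Assuming the featurizer hands the vocabulary path straight to a file open (not verified in this diff), it resolves against the current working directory instead. The sketch below compares the two rules; both helpers are hypothetical, use only `os.path`, and assume POSIX paths. The directory names are illustrative.

```python
import os


def resolve_like_open(path, cwd):
    """Relative paths passed to open() are resolved against the working directory."""
    return os.path.normpath(os.path.join(cwd, path))


def resolve_like_default_yaml(script_path, filename):
    """Mirrors the DEFAULT_YAML pattern: anchor the file next to the script itself."""
    return os.path.join(os.path.abspath(os.path.dirname(script_path)), filename)


vocab = "./vocabularies/vietnamese.characters"

# Same config, launched from two different working directories.
print(resolve_like_open(vocab, "/work/repo"))
print(resolve_like_open(vocab, "/work/repo/examples/deepspeech2"))

# The config file itself is found regardless of the working directory.
print(resolve_like_default_yaml("/work/repo/examples/deepspeech2/train_ds2.py", "config.yml"))
```

Under that assumption, the scripts would need to be launched from a directory that contains `vocabularies/`, or the path would need to be made absolute.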
1 change: 0 additions & 1 deletion examples/deepspeech2/figs/ds2_vivos_ctc_loss.svg

This file was deleted.

148 changes: 0 additions & 148 deletions examples/deepspeech2/model.py

This file was deleted.

11 changes: 4 additions & 7 deletions examples/deepspeech2/test_ds2.py
@@ -19,7 +19,7 @@
 setup_environment()
 import tensorflow as tf

-DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "configs", "vivos.yml")
+DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "config.yml")

 tf.keras.backend.clear_session()

@@ -54,7 +54,7 @@
 from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
 from tensorflow_asr.featurizers.text_featurizers import CharFeaturizer
 from tensorflow_asr.runners.base_runners import BaseTester
-from model import DeepSpeech2
+from tensorflow_asr.models.deepspeech2 import DeepSpeech2

 tf.random.set_seed(0)
 assert args.export
@@ -63,13 +63,10 @@
 speech_featurizer = TFSpeechFeaturizer(config["speech_config"])
 text_featurizer = CharFeaturizer(config["decoder_config"])
 # Build DS2 model
-ds2_model = DeepSpeech2(input_shape=speech_featurizer.shape,
-                        arch_config=config["model_config"],
-                        num_classes=text_featurizer.num_classes,
-                        name="deepspeech2")
+ds2_model = DeepSpeech2(**config["model_config"], vocabulary_size=text_featurizer.num_classes)
 ds2_model._build(speech_featurizer.shape)
 ds2_model.load_weights(args.saved, by_name=True)
-ds2_model.summary(line_length=150)
+ds2_model.summary(line_length=120)
 ds2_model.add_featurizers(speech_featurizer, text_featurizer)

 if args.tfrecords:
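Both scripts now build the model with `DeepSpeech2(**config["model_config"], vocabulary_size=...)`, so every key of the flattened `model_config` becomes a keyword argument. This includes the new `name: deepspeech2`, which replaces the hard-coded `name="deepspeech2"`. The sketch below uses a hypothetical `DeepSpeech2Stub`, not the real `tensorflow_asr.models.deepspeech2.DeepSpeech2`, whose signature is not shown in this diff. It illustrates the consequence: an old nested config passes unknown keywords such as `conv_conf`, which a constructor without `**kwargs` rejects.

```python
class DeepSpeech2Stub:
    """Hypothetical stand-in; only the keyword names come from the config in this diff."""

    def __init__(self, vocabulary_size, name="deepspeech2", conv_type="conv2d",
                 conv_kernels=(), conv_strides=(), conv_filters=(), conv_dropout=0.0,
                 rnn_nlayers=1, rnn_type="lstm", rnn_units=512, rnn_bidirectional=True,
                 rnn_rowconv=0, rnn_dropout=0.0, fc_nlayers=0, fc_units=1024):
        self.vocabulary_size = vocabulary_size
        self.name = name
        self.rnn_nlayers = rnn_nlayers
        self.conv_filters = list(conv_filters)


# Flattened model_config from the new config file, as a YAML loader would return it.
model_config = {
    "name": "deepspeech2",
    "conv_type": "conv2d",
    "conv_kernels": [[11, 41], [11, 21], [11, 11]],
    "conv_strides": [[2, 2], [1, 2], [1, 2]],
    "conv_filters": [32, 32, 96],
    "conv_dropout": 0.1,
    "rnn_nlayers": 5,
    "rnn_type": "lstm",
    "rnn_units": 512,
    "rnn_bidirectional": True,
    "rnn_rowconv": 0,
    "rnn_dropout": 0.1,
    "fc_nlayers": 0,
    "fc_units": 1024,
}

# 93 is an arbitrary example vocabulary size.
model = DeepSpeech2Stub(**model_config, vocabulary_size=93)
print(model.name, model.rnn_nlayers)  # deepspeech2 5

# An old-style nested config has no matching keyword, so the call raises TypeError.
old_config = {"conv_conf": {"conv_type": 2}}
try:
    DeepSpeech2Stub(**old_config, vocabulary_size=93)
except TypeError as err:
    print("rejected:", err)
```

`vocabulary_size` must also stay out of `model_config`. Passing the same keyword both through `**` unpacking and explicitly raises a `TypeError` for multiple values.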
11 changes: 4 additions & 7 deletions examples/deepspeech2/train_ds2.py
@@ -19,7 +19,7 @@
 setup_environment()
 import tensorflow as tf

-DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "configs", "vivos.yml")
+DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "config.yml")

 tf.keras.backend.clear_session()

@@ -60,7 +60,7 @@
 from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
 from tensorflow_asr.featurizers.text_featurizers import CharFeaturizer
 from tensorflow_asr.runners.ctc_runners import CTCTrainer
-from model import DeepSpeech2
+from tensorflow_asr.models.deepspeech2 import DeepSpeech2

 config = UserConfig(DEFAULT_YAML, args.config, learning=True)
 speech_featurizer = TFSpeechFeaturizer(config["speech_config"])
@@ -100,12 +100,9 @@
 ctc_trainer = CTCTrainer(text_featurizer, config["learning_config"]["running_config"])
 # Build DS2 model
 with ctc_trainer.strategy.scope():
-    ds2_model = DeepSpeech2(input_shape=speech_featurizer.shape,
-                            arch_config=config["model_config"],
-                            num_classes=text_featurizer.num_classes,
-                            name="deepspeech2")
+    ds2_model = DeepSpeech2(**config["model_config"], vocabulary_size=text_featurizer.num_classes)
     ds2_model._build(speech_featurizer.shape)
-    ds2_model.summary(line_length=150)
+    ds2_model.summary(line_length=120)
 # Compile
 ctc_trainer.compile(ds2_model, config["learning_config"]["optimizer_config"],
                     max_to_keep=args.max_ckpts)
20 changes: 20 additions & 0 deletions examples/jasper/README.md
@@ -0,0 +1,20 @@
+# Jasper
+
+References: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)
+
+## Model YAML Config Structure
+
+```yaml
+model_config:
+
+```
+
+## Architecture
+
+<img src="./figs/jasper_arch.png" alt="jasper_arch" width="500px" />
+
+## Training and Testing
+
+See `python examples/jasper/train_jasper.py --help`
+
+See `python examples/jasper/test_jasper.py --help`