
Sat Training Config

AI-TOOLKIT edited this page Feb 17, 2018 · 1 revision

SAT Model Training Configuration Options (tri3b.conf, tri3c.conf)

This configuration file is passed to the TrainSat() function in VoiceBridge. All parameters that can appear in this file are documented below, together with their default values. In most cases the default values work well and you will not need to set these parameters.

scale-opts : Scale options for gmm-align-compiled.

type: string, default: --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1,
usage example: --scale-opts=--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1

NOTE: The scale-opts parameter bundles three nested options. Take a look at the usage example and:

  1. Notice that there is no space on either side of any equal sign!
  2. Notice that the value after the first equal sign is not wrapped in " or ', and that the nested options are separated by spaces on a single line without a line break!
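The two formatting rules can be illustrated with a small Python sketch. This is not VoiceBridge's own parser; the function name split_nested_option is hypothetical, and the code only mirrors the convention described in the note: the outer name ends at the first equal sign, and everything after it is a space-separated list of nested options.

```python
def split_nested_option(line):
    """Split '--outer=--a=1 --b=2' into ('outer', {'a': '1', 'b': '2'}).

    Hypothetical helper for illustration; not part of VoiceBridge or Kaldi.
    """
    line = line.strip()
    if not line.startswith("--"):
        raise ValueError("option must start with '--'")
    # Only the FIRST equal sign separates the outer name from its value.
    outer, _, value = line[2:].partition("=")
    if value[:1] in ("'", '"'):
        raise ValueError("the value must not be quoted")
    inner = {}
    for token in value.split():
        name, sep, val = token.partition("=")
        # A space next to an equal sign leaves a token without '=' and fails here.
        if not name.startswith("--") or not sep:
            raise ValueError(f"malformed nested option: {token!r}")
        inner[name[2:]] = val
    return outer, inner


name, opts = split_nested_option(
    "--scale-opts=--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
)
print(name, opts)
```

A line with a stray space, such as --scale-opts=--transition-scale =1.0, is rejected by this sketch because the nested token no longer contains its own equal sign.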

context-opts : Options for accumulating statistics for phonetic-context tree building (AccTreeStats). E.g. use '--context-width=5 --central-position=2' for quinphone.

type: string, default: (empty),
usage example: --context-opts=--context-width=5 --central-position=2

Possible options:

--var-floor, Variance floor for tree clustering, type: BaseFloat, default: 0.01
--context-width, Context window size, type: int, default: 3
--central-position, Central context-window position (zero-based), type: int, default: 1

NOTE: The context-opts parameter may bundle several nested options. Take a look at the usage example and:

  1. Notice that there is no space on either side of any equal sign!
  2. Notice that the value after the first equal sign is not wrapped in " or ', and that the nested options are separated by spaces on a single line without a line break!
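To make --context-width and --central-position concrete, here is a minimal Python sketch of how a context window is cut out of a phone sequence. The function context_window and the example phone sequence are hypothetical and serve only as illustration; the actual statistics accumulation is done by Kaldi's tree-building code.

```python
def context_window(phones, index, context_width=3, central_position=1):
    """Return the phones surrounding phones[index].

    central_position is the zero-based slot of the central phone inside the
    window. Slots that fall outside the sequence are filled with None.
    """
    if not 0 <= central_position < context_width:
        raise ValueError("central-position must lie in [0, context-width)")
    start = index - central_position
    return [
        phones[i] if 0 <= i < len(phones) else None
        for i in range(start, start + context_width)
    ]


phones = ["sil", "k", "ae", "t", "sil"]  # made-up example sequence
triphone = context_window(phones, 2)          # width 3, centre in slot 1
quinphone = context_window(phones, 2, 5, 2)   # width 5, centre in slot 2
print(triphone)
print(quinphone)
```

With the defaults (3 and 1) each phone is modelled with one neighbour on either side; the quinphone setting (5 and 2) uses two neighbours on either side.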

tree-stats-opts : Options for accumulating statistics for phonetic-context tree building.

type: string, default: see below,
usage example: --tree-stats-opts=--context-width=3 --central-position=1

Possible options:

--var-floor, Variance floor for tree clustering, type: double, default: 0.01
--context-width, Context window size, type: int, default: 3
--central-position, Central context-window position (zero-based), type: int, default: 1

NOTE: The tree-stats-opts parameter may bundle several nested options. Take a look at the usage example and:

  1. Notice that there is no space on either side of any equal sign!
  2. Notice that the value after the first equal sign is not wrapped in " or ', and that the nested options are separated by spaces on a single line without a line break!

cluster-phones-opts : Options for clustering phones (or sets of phones) into sets for various purposes.

type: string, default: see below,
usage example: --cluster-phones-opts=--central-position=1 --pdf-class-list=1

Possible options:

--central-position, Central context-window position (zero-based)[must match acc-tree-stats], type: int, default: 1
--pdf-class-list, Colon-separated list of HMM positions to consider [Default = 1: just central position for 3-state models], type: string, default: 1

NOTE: The cluster-phones-opts parameter may bundle several nested options. Take a look at the usage example and:

  1. Notice that there is no space on either side of any equal sign!
  2. Notice that the value after the first equal sign is not wrapped in " or ', and that the nested options are separated by spaces on a single line without a line break!

compile-questions-opts : Options for compiling questions.

type: string, default: see below,
usage example: --compile-questions-opts=--context-width=3 --central-position=1

Possible options:

--context-width, Context window size, type: int, default: 3
--central-position, Central context-window position (zero-based), type: int, default: 1
--num-iters-refine, Number of iters of refining questions at each node.  0 -> questions not refined, type: int, default: 0

NOTE: The compile-questions-opts parameter may bundle several nested options. Take a look at the usage example and:

  1. Notice that there is no space on either side of any equal sign!
  2. Notice that the value after the first equal sign is not wrapped in " or ', and that the nested options are separated by spaces on a single line without a line break!

num-iters : Number of iterations of training.

type: int, default: 35, usage example: --num-iters=35

max-iter-inc : Last iteration to increase #Gauss on.

type: int, default: 25, usage example: --max-iter-inc=25

totgauss : Target #Gaussians.

type: int, default: -, usage example: --totgauss=15000
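The interaction between num-iters, max-iter-inc and totgauss can be sketched in Python. The sketch assumes that VoiceBridge follows the schedule of Kaldi's train_sat.sh recipe (training starts with one Gaussian per tree leaf and adds a fixed increment after each iteration up to max-iter-inc). The num_leaves argument and the value 2500 are assumptions for illustration and are not part of this configuration file.

```python
def gauss_schedule(num_leaves, totgauss, num_iters=35, max_iter_inc=25):
    """Return {iteration: Gaussian target used in that iteration}.

    Assumes the Kaldi train_sat.sh schedule; num_leaves is a hypothetical
    input (the number of tree leaves), not a parameter of this config file.
    """
    numgauss = num_leaves
    # Fixed per-iteration increment (integer division, as in the shell recipe).
    incgauss = (totgauss - numgauss) // max_iter_inc
    schedule = {}
    for x in range(1, num_iters):  # iterations 1 .. num_iters - 1
        schedule[x] = numgauss
        if x <= max_iter_inc:
            numgauss += incgauss
    return schedule


schedule = gauss_schedule(num_leaves=2500, totgauss=15000)
print(schedule[1], schedule[26], schedule[34])
```

Under these assumptions the target grows linearly until iteration max-iter-inc and then stays constant, so the remaining iterations refine a model of fixed size.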

boost-silence : Factor by which to boost silence likelihoods in alignment.

type: double, default: -, usage example: --boost-silence=1.0

realign-iters : Realign data in these iteration steps.

type: string, default: 10 20 30, usage example: --realign-iters=10 20 30

fmllr-iters : Estimate fMLLR transforms in these iteration steps.

type: string, default: 2 4 6 12, usage example: --fmllr-iters=2 4 6 12
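How the space-separated iteration lists of realign-iters and fmllr-iters drive the training loop can be sketched as follows. The sketch assumes a loop shaped like Kaldi's train_sat.sh (iterations 1 to num-iters minus 1, with a model update in every iteration); the function plan_iterations and the step labels are hypothetical.

```python
def plan_iterations(num_iters=35, realign_iters="10 20 30", fmllr_iters="2 4 6 12"):
    """List which steps would run in each training iteration.

    Illustrative only: the step names are labels, not VoiceBridge functions.
    """
    realign = {int(tok) for tok in realign_iters.split()}
    fmllr = {int(tok) for tok in fmllr_iters.split()}
    plan = []
    for x in range(1, num_iters):
        steps = []
        if x in realign:
            steps.append("realign")
        if x in fmllr:
            steps.append("estimate-fmllr")
        steps.append("update-model")  # assumed to run in every iteration
        plan.append((x, steps))
    return plan


plan = dict(plan_iterations())
print(plan[2], plan[10], plan[11])
```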

stage : Start from this stage. Can be used to skip some of the steps which have been done already before.

type: int, default: -5, usage example: --stage=-5

NOTE: The stage parameter takes a value <= 0. It is intended for the testing phase only and should not be used in production-ready systems. Make sure that you remove it from the config file once you no longer need it! This parameter can also be set in the source code.

power : Exponent used to determine the number of Gaussians from occurrence counts.

type: double, default: 0.2, usage example: --power=0.2
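The effect of the exponent can be illustrated with a simplified Python sketch in which each state receives a share of the Gaussian budget proportional to its occurrence count raised to power. This is a deliberate simplification for illustration and not the allocation routine Kaldi actually uses; the function name and the counts are made up.

```python
def allocate_gaussians(occupancies, total_gauss, power=0.2):
    """Distribute total_gauss across states in proportion to count ** power.

    Simplified illustration; every state keeps at least one Gaussian, so the
    rounded results need not sum exactly to total_gauss.
    """
    weights = [occ ** power for occ in occupancies]
    scale = total_gauss / sum(weights)
    return [max(1, round(w * scale)) for w in weights]


counts = [100.0, 1000.0, 10000.0]  # made-up occurrence counts for three states
flat = allocate_gaussians(counts, 30, power=0.0)     # counts are ignored
default = allocate_gaussians(counts, 30)             # power = 0.2
linear = allocate_gaussians(counts, 30, power=1.0)   # proportional to counts
print(flat, default, linear)
```

A small exponent such as the default 0.2 spreads Gaussians fairly evenly, while values closer to 1.0 concentrate them on frequently observed states.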

beam : Decoding beam used in alignment.

type: int, default: 10, usage example: --beam=10

retry-beam : Decoding beam for second try at alignment.

type: int, default: 40, usage example: --retry-beam=40

careful : If true, do 'careful' alignment, which is better at detecting alignment failure (involves loop to start of decoding graph).

type: bool, default: false, usage example: --careful=false

cluster-thresh : Passed to build-tree; controls the final bottom-up clustering of leaves.

type: int, default: -1, usage example: --cluster-thresh=-1

silence-weight : Weight on silence in fMLLR estimation.

type: double, default: 0.0, usage example: --silence-weight=0.0

fmllr-update-type : Update type used when estimating global fMLLR transforms. Possible values: full, diag, offset, none.

type: string, default: full, usage example: --fmllr-update-type=full

NOTE: this information is based on Kaldi (http://kaldi-asr.org).

For more information, you may also visit the official VoiceBridge website.