Demo

https://jeremycchsu.github.io/vc-vawgan/

Hyper-parameters and other Experimental Settings

z_dim: 64 the dimension of the phonetic content space
y_dim: 10 the number of speakers
merge_dim: 171 the dimension of speaker space
clamping: 0.01 the clamping bound used to enforce the K-Lipschitz constraint
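
As a rough illustration of how the `clamping` value might be applied, the sketch below assumes it plays the role of the weight-clipping bound from the original Wasserstein GAN recipe, where every critic weight is clamped into [-c, c] after each update. The `HPARAMS` dictionary, the `clip_weights` helper, and the random weights are hypothetical names and data introduced only for this example; they are not taken from this repository's code.

```python
import random

# Values copied from the settings listed above; the dict itself is illustrative.
HPARAMS = {"z_dim": 64, "y_dim": 10, "merge_dim": 171, "clamping": 0.01}


def clip_weights(weights, c):
    """Clamp every weight into [-c, c] (WGAN-style weight clipping)."""
    return [max(-c, min(c, w)) for w in weights]


# Hypothetical critic weights, standing in for real network parameters.
random.seed(0)
critic_weights = [random.uniform(-0.05, 0.05) for _ in range(8)]

# Assumption: clipping is applied after each critic update.
clipped = clip_weights(critic_weights, HPARAMS["clamping"])
```

In a real training loop the same operation would be applied to each parameter tensor of the critic rather than to a flat Python list.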

For the CNN architecture, please see the architecture.json file.

Mean Opinion Scores (MOS)

Scale:
5: Excellent
4: Good
3: Fair
2: Poor
1: Bad

In the intra-gender experiment (SF1 to TF2), the evaluators had access to two extra audio files for reference (per pair): a GMM baseline, whose mean MOS was 1.53, and the true target, whose mean MOS was 4.72.

Note:

The error bars in Fig. 2 indicate the standard deviation of the sample, not the confidence interval of the mean.
The ANOVA tests on the MOS scores returned very small p-values, so we used the word "significant" in our paper (intra-gender: VAW-GAN against VAE; inter-gender: VAW-GAN against VAE).
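
To make the distinction between the two kinds of error bar concrete, here is a minimal sketch that computes both for a set of ratings. The ratings are made-up numbers, not data from the listening test, and the `sd_and_ci` helper is a hypothetical name. The interval uses a normal approximation (z = 1.96), which is itself an assumption; a t-based interval would be somewhat wider for small samples.

```python
import math
import statistics


def sd_and_ci(scores, z=1.96):
    """Return (mean, sample SD, approximate 95% CI of the mean)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, (mean - z * sem, mean + z * sem)


# Hypothetical 1-5 ratings from ten evaluators, for illustration only.
ratings = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
mean, sd, (ci_low, ci_high) = sd_and_ci(ratings)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

The standard deviation describes how much individual ratings spread out, while the confidence interval shrinks with the square root of the number of ratings, so SD error bars are usually much wider than CI error bars for the same data.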
