Arpagen: A Corpus and Baseline for Phoneme-Level Text Generation

We explore the performance of a phoneme-based text generation model. Character-based models have a small set of possible inputs and, as a result, incur high computational cost to model long-term dependencies. Word-based models are accurate and computationally cheaper, but in contrast to character-based models they have an overwhelming input size, with tens of thousands of possible unique words. A phoneme-based model attempts to bridge this gap by offering more unique inputs than a character-based model but substantially fewer than a word-based model. We evaluate the performance of this phoneme-based model against character-based and word-based models using BLEU, ROUGE, and human-based metrics.
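To make the three input granularities concrete, here is a minimal sketch that tokenizes one sentence at the character, phoneme, and word level. It is an illustration under stated assumptions, not the project's pipeline: `TOY_LEXICON` is a small hand-written ARPAbet-style lexicon, and the `<W>` word-boundary token and all function names are hypothetical.

```python
from typing import Dict, List

# Toy ARPAbet-style pronunciation lexicon (assumption: hand-written for
# illustration; the project's actual corpus and lexicon are not shown here).
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH0"],
    "cat": ["K", "AE1", "T"],
    "sat": ["S", "AE1", "T"],
    "on": ["AA1", "N"],
    "mat": ["M", "AE1", "T"],
}

# Hypothetical separator so word boundaries survive phoneme tokenization.
WORD_BOUNDARY = "<W>"


def char_tokens(text: str) -> List[str]:
    """Split text into individual characters (spaces included)."""
    return list(text.lower())


def word_tokens(text: str) -> List[str]:
    """Split text into whitespace-delimited words."""
    return text.lower().split()


def phoneme_tokens(text: str, lexicon: Dict[str, List[str]] = TOY_LEXICON) -> List[str]:
    """Map each word to its phonemes, inserting a boundary token between words.

    Raises KeyError for words missing from the lexicon.
    """
    tokens: List[str] = []
    for i, word in enumerate(word_tokens(text)):
        if i > 0:
            tokens.append(WORD_BOUNDARY)
        tokens.extend(lexicon[word])
    return tokens


if __name__ == "__main__":
    sentence = "the cat sat on the mat"
    for name, tokenize in [
        ("char", char_tokens),
        ("phoneme", phoneme_tokens),
        ("word", word_tokens),
    ]:
        toks = tokenize(sentence)
        # Sequence length vs. number of distinct symbols at each granularity.
        print(f"{name:8s} length={len(toks):3d} unique={len(set(toks)):3d}")
```

A single toy sentence is too small to show the vocabulary trade-off described above. The sketch only shows how the same text becomes a sequence of different units at each level, with the phoneme sequence sitting between the character and word sequences in length.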

Final project for LIGN 167 Deep Learning for Natural Language Understanding, UCSD.


camille-004/arpagen