Codebase for Arpagen: A Corpus and Baseline for Phoneme-Level Text Generation.

camille-004/arpagen

Arpagen: A Corpus and Baseline for Phoneme-Level Text Generation

We explore the performance of a phoneme-based text generation model. Character-based models have a small set of possible inputs and therefore incur high computational costs to model long-term dependencies. Word-based models are accurate and computationally cheaper, but, in contrast to character-based models, they have an overwhelming input size, with tens of thousands of possible unique words. A phoneme-based model attempts to bridge this gap by offering more unique inputs than a character-based model but substantially fewer than a word-based model. We evaluate this phoneme-based model against character-based and word-based models using BLEU, ROUGE, and human-based metrics.
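To make the three granularities concrete, here is a minimal sketch that tokenizes the same sentence at the character, word, and phoneme level. It is not taken from this codebase: the function names, the `<W>` word-boundary token, and the three-entry `TOY_LEXICON` (hand-written ARPAbet transcriptions in the style of a pronouncing dictionary) are all assumptions made for illustration.

```python
# Illustrative only: a tiny hand-written ARPAbet lexicon (assumption, not the
# project's corpus). A real setup would use a full pronouncing dictionary.
TOY_LEXICON = {
    "the": ["DH", "AH0"],
    "cat": ["K", "AE1", "T"],
    "sat": ["S", "AE1", "T"],
}


def char_tokens(text):
    """Character-level tokens: every character, including spaces."""
    return list(text)


def word_tokens(text):
    """Word-level tokens: whitespace-separated words."""
    return text.split()


def phoneme_tokens(text, lexicon=TOY_LEXICON, word_sep="<W>"):
    """Phoneme-level tokens, with a hypothetical separator between words."""
    tokens = []
    for word in text.lower().split():
        if tokens:
            tokens.append(word_sep)  # mark the boundary before each later word
        tokens.extend(lexicon[word])  # raises KeyError for unknown words
    return tokens


if __name__ == "__main__":
    sentence = "the cat sat"
    for name, fn in [("char", char_tokens),
                     ("word", word_tokens),
                     ("phoneme", phoneme_tokens)]:
        toks = fn(sentence)
        print(f"{name:8s} length={len(toks):2d} unique={len(set(toks)):2d} {toks}")
```

On a sentence this short the counts are not representative; the point is only that the phoneme inventory is a fixed, modest set of symbols, while the word vocabulary grows with the corpus.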

Final project for LIGN 167 Deep Learning for Natural Language Understanding, UCSD.
