JWJPaton/GAN-Speech_engine_and_research_tools

GAN-Speech_engine_and_research_tools

A GAN (Generative Adversarial Network) can now be trained into a very good speech vocoder from paired voice and text data. Such paired data is available license-free on the web. The knowledge, data and tools therefore exist to create a license-free speech engine and to enable further work.

To do

  • Investigate the feasibility of using TensorFlow.js for the GAN
  • Investigate the feasibility of using the Web Audio API for audio input and output

Datasets

The "LJ Speech Dataset"

LibriVox, for other voices and for prosody learning

Any voice data that is now out of copyright. (Ethically, this should be used for prosody learning rather than for anything that would enable mimicking the voice: the speakers have not consented to having their voices stolen.)
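As an illustration of what "linked voice and text data" looks like in practice, here is a minimal Python sketch for reading an LJ Speech style transcript index. It assumes the layout the dataset is usually distributed with: a pipe-delimited `metadata.csv` whose rows are `clip id|transcription|normalized transcription`, with audio stored as `wavs/<clip id>.wav`. The `Utterance` type, the `read_metadata` name and the sample rows are hypothetical and made up for this example; check the layout against the copy of the dataset you download.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Utterance(NamedTuple):
    clip_id: str          # assumed to match the audio file wavs/<clip_id>.wav
    raw_text: str         # transcription as spoken
    normalized_text: str  # transcription with numbers etc. written out


def read_metadata(lines: Iterable[str]) -> Iterator[Utterance]:
    """Yield one Utterance per well-formed 'id|text|normalized text' row."""
    # QUOTE_NONE: transcriptions may contain literal quote characters,
    # which must not be interpreted as CSV quoting.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip rows that do not match the assumed layout
        yield Utterance(*row)


# Made-up sample rows, not taken from the real dataset.
sample = io.StringIO(
    'DEMO-0001|He counted 16 men.|He counted sixteen men.\n'
    'DEMO-0002|"Stop," she said.|"Stop," she said.\n'
)
for utterance in read_metadata(sample):
    print(utterance.clip_id, "->", utterance.normalized_text)
```

With the real dataset, the same function could be fed an open file handle, for example `open("metadata.csv", encoding="utf-8", newline="")`, assuming that is where the index lives.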

Research possibilities

  • Basic speech recognition tool
  • Text/Speech synchronisation tool (SMIL generator?)
  • Prosody generator based on:
    • grammar guessing from common words (the, at, in, a, and, or ...)
    • position in document (start/end of paragraph, start/end of document)
    • common rhetorical patterns (repetitions of words or phrases, alliteration, etc.)
  • Emphasis recognition/generation for better natural language voice interfaces.
  • What is the effect of using IPA-phoneme generation rules for different languages? Does that produce accents? (anecdotal evidence says yes)
  • Accent generator/guesser (Does it help speech understanding/worldview for children to hear many accents? Should speech generators assign random (gentle) accents?)
  • Noise reduction
    1. Apply speech recognition to the audio.
    2. Generate the probabilities of the frequencies expected for that text.
    3. Attenuate frequencies that don't meet a probability threshold.
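
The third noise-reduction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation from this repository. It assumes that steps 1 and 2 have already produced, for every time frame and frequency bin, a probability that speech energy is expected there. Those steps are not shown. The function name `attenuate_unlikely_bins`, the nested-list spectrogram layout, and the default `threshold` and `attenuation` values are all hypothetical choices made for the example.

```python
from typing import List

# Assumed layout: spectrogram[frame][frequency_bin]
Spectrogram = List[List[float]]


def attenuate_unlikely_bins(
    magnitudes: Spectrogram,
    probabilities: Spectrogram,
    threshold: float = 0.5,    # hypothetical default
    attenuation: float = 0.1,  # hypothetical default; 0.0 would mute the bin
) -> Spectrogram:
    """Scale down every bin whose expected-speech probability is below threshold."""
    if len(magnitudes) != len(probabilities):
        raise ValueError("magnitudes and probabilities differ in frame count")
    cleaned: Spectrogram = []
    for mag_frame, prob_frame in zip(magnitudes, probabilities):
        if len(mag_frame) != len(prob_frame):
            raise ValueError("a frame differs in frequency-bin count")
        cleaned.append([
            # Keep bins the text makes likely; attenuate the rest.
            mag if prob >= threshold else mag * attenuation
            for mag, prob in zip(mag_frame, prob_frame)
        ])
    return cleaned


# Toy values: two frames, three frequency bins each.
noisy = [[1.0, 2.0, 4.0], [8.0, 2.0, 1.0]]
expected = [[0.9, 0.1, 0.6], [0.2, 0.8, 0.5]]
print(attenuate_unlikely_bins(noisy, expected, threshold=0.5, attenuation=0.5))
```

Attenuating rather than zeroing the unlikely bins is a design choice in this sketch: a hard cut-off tends to produce audible artefacts, and a recognition error in step 1 would otherwise remove real speech entirely.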
