
PLEASE NOTE: This project has now transformed into the work being done here. Please head over there for current status.

Unsupervised Audio-to-Audio Translation

This is the main repository for the "Implementing and extending unsupervised human-human text/audio translation" project.

How does this fit into the ESP roadmap towards translating animal communication? Unsupervised audio-to-audio translation requires learning how to create useful semantic embeddings directly from audio, which allows for correlation with other behavioral models or comparison across species.

Project overview

Goal: achieve unsupervised audio-to-audio translation

  1. Build text embeddings and demonstrate translation without a rosetta stone
    • Good opportunity to test and demonstrate embedding alignment (this is the technique we will want to leverage once we obtain audio embeddings); a minimal alignment sketch follows this list.
    • Findings can help clarify our approach and communicate the efficacy of embeddings to a broader public
  2. Implement Audio Word2Vec - train acoustic embeddings using a denoising autoencoder (AE) architecture (a minimal sketch follows this list)
    • Good opportunity to get acquainted with the LibriSpeech dataset
    • These embeddings could lend themselves well to comparison against semantic embeddings
  3. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech - train semantic embeddings using an RNN encoder-decoder architecture (sketched after this list)
    • This architecture, or one of similar capability, is one we want to leverage for unsupervised audio-to-audio translation
  4. Reproduce #3 with a transformer architecture (tentative)
    • Transformers offer ease of training: they can be trained efficiently on vast amounts of data
  5. Obtain or synthesize a bilingual dataset of speech, train semantic word embeddings (using #3 or #4), and perform unsupervised translation (using #1).
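
To make step 1 concrete, here is a minimal, illustrative sketch of embedding-space alignment via orthogonal Procrustes in numpy. It assumes a small seed dictionary of paired words; a fully unsupervised ("no rosetta stone") setup would replace that seed with an unsupervised initialisation (e.g. adversarial training as in MUSE) before Procrustes refinement. All names and data below are hypothetical, not this project's code.

```python
# Illustrative sketch only: align two monolingual embedding spaces with an
# orthogonal map (Procrustes), then translate by nearest neighbour.
import numpy as np

def procrustes_alignment(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F for paired rows of X and Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def translate(src_vec, target_matrix, W):
    """Map a source vector into target space and return the nearest target row."""
    mapped = src_vec @ W
    sims = target_matrix @ mapped / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(mapped) + 1e-9
    )
    return int(np.argmax(sims))

# Toy usage with random, hypothetical embeddings: the "target" space is an
# exact rotation of the "source" space, so alignment should recover it.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))                # source-language vectors
true_W = np.linalg.qr(rng.normal(size=(300, 300)))[0]
Y = X @ true_W                                  # target-language vectors
W = procrustes_alignment(X[:500], Y[:500])      # fit on a seed dictionary
print(translate(X[700], Y, W))                  # expected to print 700
```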
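
For step 2, the sketch below shows one possible shape of an Audio Word2Vec-style denoising sequence autoencoder in PyTorch. The features (MFCC frames), layer sizes, and additive-noise corruption are assumptions for illustration: an RNN encoder compresses a variable-length segment into a fixed-length acoustic embedding, and the decoder is trained to reconstruct the clean frames from a corrupted input.

```python
# Illustrative sketch only: a denoising sequence autoencoder that maps an
# audio segment (e.g. a spoken word) to a fixed-length acoustic embedding.
import torch
import torch.nn as nn

class DenoisingSeqAE(nn.Module):
    def __init__(self, n_mfcc=13, hidden=128, emb=64):
        super().__init__()
        self.encoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.to_emb = nn.Linear(hidden, emb)       # fixed-length acoustic embedding
        self.from_emb = nn.Linear(emb, hidden)
        self.decoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mfcc)

    def embed(self, x):                            # x: (batch, frames, n_mfcc)
        _, h = self.encoder(x)
        return self.to_emb(h[-1])                  # (batch, emb)

    def forward(self, noisy, clean):
        h0 = self.from_emb(self.embed(noisy)).unsqueeze(0).contiguous()
        # Teacher forcing: feed clean frames shifted by one, predict the next frame.
        dec_in = torch.cat([torch.zeros_like(clean[:, :1]), clean[:, :-1]], dim=1)
        out, _ = self.decoder(dec_in, h0)
        return self.out(out)

# Toy training step on random stand-ins for MFCC segments.
model = DenoisingSeqAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.randn(8, 50, 13)                     # 8 segments, 50 frames each
noisy = clean + 0.1 * torch.randn_like(clean)      # simple additive-noise corruption
loss = nn.functional.mse_loss(model(noisy, clean), clean)
loss.backward()
opt.step()
```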
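
Step 3 mostly changes the training target rather than the architecture: in the skipgram variant of Speech2Vec, the decoder reconstructs the frames of a neighbouring word from the embedding of the centre word, pushing the embedding toward semantic rather than purely acoustic content. The sketch below is again an assumption-laden illustration (it presumes word segments from forced alignment, and the sizes are arbitrary), not this project's implementation.

```python
# Illustrative sketch only: a Speech2Vec-style skipgram objective in which the
# decoder reconstructs a context word's frames from the centre word's embedding.
import torch
import torch.nn as nn

class Speech2VecSkipgram(nn.Module):
    def __init__(self, n_mfcc=13, hidden=128, emb=50):
        super().__init__()
        self.encoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.to_emb = nn.Linear(hidden, emb)
        self.from_emb = nn.Linear(emb, hidden)
        self.decoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mfcc)

    def embed(self, segment):                      # (batch, frames, n_mfcc)
        _, h = self.encoder(segment)
        return self.to_emb(h[-1])                  # semantic word embedding

    def forward(self, center, context):
        # Unlike the autoencoder above, the target is the *context* word's frames.
        h0 = self.from_emb(self.embed(center)).unsqueeze(0).contiguous()
        dec_in = torch.cat([torch.zeros_like(context[:, :1]), context[:, :-1]], dim=1)
        out, _ = self.decoder(dec_in, h0)
        return self.out(out)

# Toy step: centre/context segments would come from forced alignment in practice.
model = Speech2VecSkipgram()
center = torch.randn(4, 40, 13)                    # 4 centre-word segments
context = torch.randn(4, 35, 13)                   # one neighbouring word each
loss = nn.functional.mse_loss(model(center, context), context)
loss.backward()
```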

For an overview of papers and other resources that inspire us and that we feel are instrumental to this work, please take a look at our bookshelf for this project here.
