
PLEASE NOTE: This project has now transformed into the work being done here. Please head over there for current status.

Unsupervised Audio-to-Audio Translation

This is the main repository for the "Implementing and extending unsupervised human-human text/audio translation" project.

How does this fit into the ESP roadmap towards translating animal communication? Unsupervised audio-to-audio translation requires learning how to create useful semantic embeddings directly from audio, which allows for correlation with other behavioral models or comparison across species.

Project overview

Goal: achieve unsupervised audio-to-audio translation

  1. Build text embeddings and demonstrate translation without a rosetta stone
    • Good opportunity to test and demonstrate embedding alignment (this is the technique we will want to leverage once we obtain audio embeddings); a minimal alignment sketch follows this list.
    • Findings can help clarify our approach and communicate the efficacy of embeddings to a broader public
  2. Implement Audio Word2Vec - train acoustic embeddings using a denoising autoencoder (AE) architecture (a minimal sketch follows this list)
    • Good opportunity to get acquainted with the LibriSpeech dataset
    • These embeddings could lend themselves well to comparison against semantic embeddings
  3. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech - train semantic embeddings using an RNN encoder-decoder architecture (sketched after this list)
    • This architecture, or one of similar capability, is one we want to leverage for unsupervised audio-to-audio translation
  4. Reproduce #3 with a transformer architecture (tentative)
    • Transformers offer ease of training: they can be trained efficiently on vast amounts of data
  5. Obtain or synthesize a bilingual dataset of speech, train semantic word embeddings (using #3 or #4), and perform unsupervised translation (using #1).
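
To make step 1 concrete, here is a minimal, illustrative sketch of embedding-space alignment via orthogonal Procrustes in numpy. It assumes a small seed dictionary of paired words; a fully unsupervised ("no rosetta stone") setup would replace that seed with an unsupervised initialisation (e.g. adversarial training as in MUSE) before Procrustes refinement. All names and data below are hypothetical, not this project's code.

```python
# Illustrative sketch only: align two monolingual embedding spaces with an
# orthogonal map (Procrustes), then translate by nearest neighbour.
import numpy as np

def procrustes_alignment(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F for paired rows of X and Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def translate(src_vec, target_matrix, W):
    """Map a source vector into target space and return the nearest target row."""
    mapped = src_vec @ W
    sims = target_matrix @ mapped / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(mapped) + 1e-9
    )
    return int(np.argmax(sims))

# Toy usage with random, hypothetical embeddings: the "target" space is an
# exact rotation of the "source" space, so alignment should recover it.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))                # source-language vectors
true_W = np.linalg.qr(rng.normal(size=(300, 300)))[0]
Y = X @ true_W                                  # target-language vectors
W = procrustes_alignment(X[:500], Y[:500])      # fit on a seed dictionary
print(translate(X[700], Y, W))                  # expected to print 700
```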
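
For step 2, the sketch below shows one possible shape of an Audio Word2Vec-style denoising sequence autoencoder in PyTorch. The features (MFCC frames), layer sizes, and additive-noise corruption are assumptions for illustration: an RNN encoder compresses a variable-length segment into a fixed-length acoustic embedding, and the decoder is trained to reconstruct the clean frames from a corrupted input.

```python
# Illustrative sketch only: a denoising sequence autoencoder that maps an
# audio segment (e.g. a spoken word) to a fixed-length acoustic embedding.
import torch
import torch.nn as nn

class DenoisingSeqAE(nn.Module):
    def __init__(self, n_mfcc=13, hidden=128, emb=64):
        super().__init__()
        self.encoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.to_emb = nn.Linear(hidden, emb)       # fixed-length acoustic embedding
        self.from_emb = nn.Linear(emb, hidden)
        self.decoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mfcc)

    def embed(self, x):                            # x: (batch, frames, n_mfcc)
        _, h = self.encoder(x)
        return self.to_emb(h[-1])                  # (batch, emb)

    def forward(self, noisy, clean):
        h0 = self.from_emb(self.embed(noisy)).unsqueeze(0).contiguous()
        # Teacher forcing: feed clean frames shifted by one, predict the next frame.
        dec_in = torch.cat([torch.zeros_like(clean[:, :1]), clean[:, :-1]], dim=1)
        out, _ = self.decoder(dec_in, h0)
        return self.out(out)

# Toy training step on random stand-ins for MFCC segments.
model = DenoisingSeqAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.randn(8, 50, 13)                     # 8 segments, 50 frames each
noisy = clean + 0.1 * torch.randn_like(clean)      # simple additive-noise corruption
loss = nn.functional.mse_loss(model(noisy, clean), clean)
loss.backward()
opt.step()
```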
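
Step 3 mostly changes the training target rather than the architecture: in the skipgram variant of Speech2Vec, the decoder reconstructs the frames of a neighbouring word from the embedding of the centre word, pushing the embedding toward semantic rather than purely acoustic content. The sketch below is again an assumption-laden illustration (it presumes word segments from forced alignment, and the sizes are arbitrary), not this project's implementation.

```python
# Illustrative sketch only: a Speech2Vec-style skipgram objective in which the
# decoder reconstructs a context word's frames from the centre word's embedding.
import torch
import torch.nn as nn

class Speech2VecSkipgram(nn.Module):
    def __init__(self, n_mfcc=13, hidden=128, emb=50):
        super().__init__()
        self.encoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.to_emb = nn.Linear(hidden, emb)
        self.from_emb = nn.Linear(emb, hidden)
        self.decoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mfcc)

    def embed(self, segment):                      # (batch, frames, n_mfcc)
        _, h = self.encoder(segment)
        return self.to_emb(h[-1])                  # semantic word embedding

    def forward(self, center, context):
        # Unlike the autoencoder above, the target is the *context* word's frames.
        h0 = self.from_emb(self.embed(center)).unsqueeze(0).contiguous()
        dec_in = torch.cat([torch.zeros_like(context[:, :1]), context[:, :-1]], dim=1)
        out, _ = self.decoder(dec_in, h0)
        return self.out(out)

# Toy step: centre/context segments would come from forced alignment in practice.
model = Speech2VecSkipgram()
center = torch.randn(4, 40, 13)                    # 4 centre-word segments
context = torch.randn(4, 35, 13)                   # one neighbouring word each
loss = nn.functional.mse_loss(model(center, context), context)
loss.backward()
```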

For an overview of papers and other resources that inspire us and that we feel are instrumental to this work, please take a look at our bookshelf for this project here.
