Visually Indicated Sounds

Implementation and extension of the paper "Visually Indicated Sounds" by Owens et al., which proposes the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene.

Brief Description of the paper

The authors present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. The authors show that the sounds predicted by their model are realistic enough to fool participants in a “real or fake” psychophysical experiment and that they convey significant information about material properties and physical interactions.
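A minimal sketch of this pipeline, assuming PyTorch: per-frame visual features pass through an LSTM that regresses cochleagram-style sound features, and a waveform is then retrieved by nearest-neighbour matching against training examples. Class and parameter names here (SoundPredictor, n_bands, example_based_synthesis) are illustrative, not the paper's or this repository's API.

```python
# Illustrative sketch of the prediction + example-based synthesis pipeline.
# Names and dimensions are assumptions, not taken from the repo.
import torch
import torch.nn as nn


class SoundPredictor(nn.Module):
    """Per-frame CNN features -> LSTM -> per-frame sound (cochleagram) features."""

    def __init__(self, feat_dim=512, hidden_dim=256, n_bands=42):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_bands)

    def forward(self, frame_feats):           # (batch, time, feat_dim)
        hidden, _ = self.rnn(frame_feats)     # (batch, time, hidden_dim)
        return self.head(hidden)              # (batch, time, n_bands)


def example_based_synthesis(pred_feats, train_feats, train_waves):
    """Retrieve, for each prediction, the training waveform whose sound
    features are closest (L2 distance) to the predicted features."""
    dists = torch.cdist(pred_feats.flatten(1), train_feats.flatten(1))
    idx = dists.argmin(dim=1)
    return train_waves[idx]
```

In such a setup, frame_feats would come from a pretrained CNN applied to each video frame, while train_feats and train_waves would hold the training set's precomputed sound features and their corresponding audio clips.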

Implementations

All implementation code can be found inside /src/experiments, where each experiment is contained within its own directory.

  • PaperModel: The model architecture used in the paper.
  • BiLSTMModel: A modification of the paper's architecture that replaces the LSTM with a bidirectional LSTM (see the sketch after this list).
  • VMAEModel: Uses a modern transformer-based architecture for feature extraction.
  • LatentVMAEModel: Replaces the cochleagrams with a learned latent-space representation of the waveforms, produced by an autoencoder and fed into the VMAEModel.
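As a rough illustration of the BiLSTMModel change (not the repository's exact code), the swap amounts to setting bidirectional=True on the LSTM and widening the output projection, since the forward and backward hidden states are concatenated:

```python
# Hedged sketch of the bidirectional-LSTM variant; names are illustrative.
import torch.nn as nn


class BiLSTMSoundPredictor(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, n_bands=42):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        # A bidirectional LSTM concatenates forward and backward states,
        # so the projection layer takes 2 * hidden_dim inputs.
        self.head = nn.Linear(2 * hidden_dim, n_bands)

    def forward(self, frame_feats):           # (batch, time, feat_dim)
        hidden, _ = self.rnn(frame_feats)     # (batch, time, 2 * hidden_dim)
        return self.head(hidden)              # (batch, time, n_bands)
```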