# bootphon/crossitlearn

> This repository has been archived by the owner on Oct 10, 2019. It is now read-only.

Cross-situational word learning from raw images and speech


## Datasets

- Pascal1K
- Flickr8K (ImagesData)

## Code

TODO: one line on the code to transform LUCID into our dataset.

TODO: one line on the code to extract R-CNN features from images.

To train the multi-modal embedding net, run:

```sh
THEANO_FLAGS="device=gpu1" python run_exp_AB.py \
    --dataset-path=/fhgfs/bootphon/scratch/gsynnaeve/learning_semantics2014/pascal_full/ \
    --prefix-output-fname="maxnorm" \
    --debug-print=1 \
    --debug-time
```

## Results

TODO: scores && plots

## "Say"-based corpus

(Mac OS X only.) Run `bash say_words.sh` with a `words.txt` file in the same directory. Then convert the produced `*.aif` files to `*.wav` with `sox` (you may need a `for` loop), transform them into filterbanks with `python mfsc.py *.wav` (a `for` loop may be needed here too), and finally gather them into one big `*.npz`-formatted dictionary with `python npz_fbanks.py FOLDER`. Optionally, train a DNN on the result with `python simple_dnn.py` (look inside the script for details).
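The steps above can be collected into one script. This is a hedged sketch, not part of the repository: it assumes you are on macOS with `sox` installed, that `say_words.sh`, `mfsc.py`, and `npz_fbanks.py` sit in the current directory, and it uses `.` as the `FOLDER` argument purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical end-to-end sketch of the "Say"-based corpus pipeline.
set -euo pipefail

# Derive the .wav name for a given .aif file (pure string manipulation).
wav_name() {
    local f="$1"
    printf '%s\n' "${f%.aif}.wav"
}

run_pipeline() {
    # 1. Synthesize one .aif per word in words.txt (uses macOS `say`).
    bash say_words.sh
    # 2. Convert each .aif to .wav with sox.
    for f in *.aif; do
        sox "$f" "$(wav_name "$f")"
    done
    # 3. Extract filterbank (mfsc) features from every .wav.
    python mfsc.py *.wav
    # 4. Bundle the per-file filterbanks into one .npz dictionary
    #    (using the current directory as FOLDER is an assumption).
    python npz_fbanks.py .
}

# Only run when the required external tools are actually present.
if command -v say >/dev/null && command -v sox >/dev/null; then
    run_pipeline
fi
```

The `command -v` guard keeps the script a no-op on systems without macOS's `say`, matching the "Mac OS X only" caveat above.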
