BushMinusZero/deep-learning-skunk-works

Deep Learning Skunk Works Projects

Contents

  1. Implementation of Word2Vec using Continuous Bag of Words (CBOW)
  2. Intrinsic evaluation metric using analogy labels from "Efficient Estimation of Word Representations in Vector Space"
  3. Implementation of Word2Vec using Skip-gram
  4. t-SNE for visualizing embeddings of analogy pairs
  5. Nearest Neighbors analysis for finding similar words
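
Items 1 and 3 differ mainly in how training examples are framed from the same sliding window. The sketch below illustrates that difference on a toy sentence. It is not the code in `src/`; the function names (`context_windows`, `cbow_examples`, `skipgram_examples`) and the `window` parameter are assumptions made for illustration.

```python
from typing import List, Tuple


def context_windows(tokens: List[str], window: int = 2) -> List[Tuple[str, List[str]]]:
    """Return (center, context) for every position; the context excludes the center."""
    out = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        out.append((center, left + right))
    return out


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the whole context is the input and the center word is the target."""
    return [
        (context, center)
        for center, context in context_windows(tokens, window)
        if context  # skip positions with no neighbors (single-token input)
    ]


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: the center word is the input and each context word is a separate target."""
    return [
        (center, ctx)
        for center, context in context_windows(tokens, window)
        for ctx in context
    ]


if __name__ == "__main__":
    toy = "the quick brown fox".split()
    print(cbow_examples(toy, window=1))
    print(skipgram_examples(toy, window=1))
```

CBOW yields one example per position, while skip-gram yields one example per (center, neighbor) pair, so skip-gram produces more training examples from the same text.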

Next up

  1. TODO: filter to the N most common words in the training corpus and mark the rest as OOV
  2. TODO: download a larger dataset (GloVe paper uses Gigaword5, Wikipedia2014, and Common Crawl)
  3. TODO: Train GloVe embeddings
  4. TODO: increase the size of the context vector to 300 depending on training speed
  5. TODO: Evaluate on word similarity task WordSim-353 used in GloVe paper
  6. TODO: Extrinsic model evaluation (NER)
  7. TODO: Write unit tests for model training and inference on small data
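
The first TODO (keep the N most common words and map the rest to OOV) could look roughly like the sketch below. It is one possible approach, not something already in the repository: the `<OOV>` marker, the helper names `build_vocab` and `encode`, and the choice to reserve id 0 for OOV are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List

OOV_TOKEN = "<OOV>"  # hypothetical marker; the project may choose a different one


def build_vocab(tokens: Iterable[str], max_size: int) -> Dict[str, int]:
    """Map the max_size most frequent tokens to ids; id 0 is reserved for OOV."""
    counts = Counter(tokens)
    vocab = {OOV_TOKEN: 0}
    # most_common orders by count; ties keep first-seen order
    for word, _ in counts.most_common(max_size):
        vocab[word] = len(vocab)
    return vocab


def encode(tokens: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Replace every token with its id, falling back to the OOV id."""
    oov_id = vocab[OOV_TOKEN]
    return [vocab.get(token, oov_id) for token in tokens]


if __name__ == "__main__":
    corpus = "a b a c a b d".split()
    vocab = build_vocab(corpus, max_size=2)
    print(vocab)
    print(encode(corpus, vocab))
```

Reserving a single OOV id keeps the embedding matrix at `max_size + 1` rows regardless of corpus size.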

Setup

  • Developed using Python 3.9; it probably works on other Python 3 versions
cd deep-learning-skunk-works/
export PYTHONPATH=`pwd`
export PROJECT_ROOT=`pwd`
pip install -r requirements.txt

Training models

  • Set model name (e.g. cbow, skipgram, ...)
    export MODEL='cbow'
  • Launch tensorboard
    tensorboard --logdir=data/$MODEL/models/
  • Train model
    python src/main.py --train --model $MODEL
  • Evaluate model
    python src/main.py --eval --model $MODEL