dickchanym/tfkld (forked from smujjiga/tfkld)

Fast scalable implementation of Term Frequency Kullback Leibler Divergence

Fast and scalable implementation of Term Frequency Kullback Leibler Divergence (TFKLD)

This reimplementation is based on https://github.com/jiyfeng/tfkld and aims to drastically speed up the weight calculation. TFKLD was proposed by Ji and Eisenstein in their 2013 EMNLP paper, "Discriminative Improvements to Distributional Sentence Similarity".
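As a rough illustration of what the weight calculation computes, here is a minimal NumPy sketch (not the code in tfkld.py). It assumes binary term presence, question pairs stored as two aligned term-count matrices, and a small smoothing constant; `tfkld_weights`, `apply_tfkld` and `smooth` are hypothetical names. For each term, p estimates the probability that the term appears in one sentence given that it appears in the other, using paraphrase pairs only; q is the same estimate on non-paraphrase pairs; the term's weight is the KL divergence between Bernoulli(p) and Bernoulli(q).

```python
import numpy as np


def tfkld_weights(A, B, y, smooth=0.05):
    """Hypothetical helper: one TF-KLD weight per vocabulary term.

    A, B:   (n_pairs, vocab) term-count matrices for the first and second
            sentence of each pair (assumed layout).
    y:      (n_pairs,) labels, 1 = paraphrase, 0 = not.
    smooth: assumed smoothing constant that keeps probabilities in (0, 1).
    """
    A = (np.asarray(A) > 0).astype(float)  # binary term presence
    B = (np.asarray(B) > 0).astype(float)
    y = np.asarray(y).astype(bool)

    def cond_prob(mask):
        # Estimate P(term in one sentence | term in the other sentence),
        # counting both directions of every pair selected by `mask`.
        both = (A[mask] * B[mask]).sum(axis=0)
        present = A[mask].sum(axis=0) + B[mask].sum(axis=0)
        return (2 * both + smooth) / (present + 2 * smooth)

    p = cond_prob(y)   # paraphrase pairs
    q = cond_prob(~y)  # non-paraphrase pairs
    # Per-term KL divergence KL(Bernoulli(p) || Bernoulli(q))
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


def apply_tfkld(X, w):
    # Scale each term-frequency column by that term's TF-KLD weight
    return np.asarray(X, dtype=float) * w
```

Every step is a column-wise array operation rather than a per-pair Python loop, which is one way to keep this calculation fast. On a real vocabulary the matrices would be sparse (for example `scipy.sparse.csr_matrix`), which this sketch does not handle.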

The repository also includes a test script, fe_quora.py, that extracts TFKLD features from the Quora dataset hosted on Kaggle as part of a competition.
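Before any weighting, each question pair has to become two aligned term-count rows over one shared vocabulary. A minimal sketch, assuming simple lowercase word tokenization; `tokenize` and `build_count_matrices` are hypothetical helpers, and fe_quora.py may preprocess the text differently:

```python
import re

import numpy as np


def tokenize(text):
    # Assumed preprocessing: lowercase alphanumeric word tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def build_count_matrices(pairs):
    """Hypothetical helper: map (question1, question2) pairs to two aligned
    (n_pairs, vocab) count matrices that share one vocabulary."""
    vocab = {}
    for q1, q2 in pairs:
        for tok in tokenize(q1) + tokenize(q2):
            vocab.setdefault(tok, len(vocab))  # assign ids in first-seen order

    A = np.zeros((len(pairs), len(vocab)))
    B = np.zeros((len(pairs), len(vocab)))
    for i, (q1, q2) in enumerate(pairs):
        for tok in tokenize(q1):
            A[i, vocab[tok]] += 1
        for tok in tokenize(q2):
            B[i, vocab[tok]] += 1
    return A, B, vocab
```

Dense arrays keep the sketch short; for the full Quora dataset the matrices would need to be sparse.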

Steps to run the test script

  • Download the dataset from here.
  • Extract the zip file and place its contents in the same directory as tfkld.py and fe_quora.py.
  • Execute fe_quora.py.
  • This may take some time. When it finishes, you should have the pickle files train-tfkld-dr.pkl, dev-tfkld-dr.pkl and test-tfkld-dr.pkl, corresponding to the training, development and test data respectively.
  • TFKLD features are reduced to 200 dimensions using SVD.
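The reduction to 200 dimensions can be sketched as a truncated SVD: the TF-KLD-weighted rows are projected onto the top right singular vectors. `reduce_dim`, `pair_features`, `save_split` and the pickled dictionary layout are assumptions for illustration, not the actual contents of the .pkl files. The pair representation `[v1 + v2, |v1 - v2|]` is also an assumption here, chosen as one common way to combine two sentence vectors.

```python
import pickle

import numpy as np


def reduce_dim(X, k=200):
    """Project the rows of X onto its top-k right singular vectors.

    Returns the (n, k') projection and the (vocab, k') basis, where
    k' = min(k, n, vocab), so the sketch also works on tiny inputs.
    """
    X = np.asarray(X, dtype=float)
    # Full SVD for clarity; a sparse or randomized solver would be used at scale
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T
    return X @ basis, basis


def pair_features(v1, v2):
    # Assumed pair representation: concatenate the sum and absolute difference
    return np.hstack([v1 + v2, np.abs(v1 - v2)])


def save_split(path, features, labels):
    # Hypothetical pickle layout; the real *-tfkld-dr.pkl files may differ
    with open(path, "wb") as f:
        pickle.dump({"X": features, "y": labels}, f)
```

In use, the basis would be fitted once on the training pairs, e.g. `Z, basis = reduce_dim(np.vstack([A * w, B * w]), k=200)` with `v1, v2 = Z[:n], Z[n:]` for `n` pairs, and the same `basis` reused to project the development and test matrices (`(A_dev * w) @ basis`), so all three splits share one 200-dimensional space.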

Languages

  • Python 100.0%