GitHub - dickchanym/tfkld: Fast scalable implementation of Term Frequency Kullback Leibler Divergence

Fast and scalable implementation of Term Frequency Kullback Leibler Divergence (TFKLD)

This reimplementation is based on https://github.com/jiyfeng/tfkld and aims to drastically speed up the weight calculation. TFKLD was proposed in this 2013 EMNLP paper.
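For orientation, the weight calculation can be sketched in vectorized NumPy as below. This is a minimal illustration, not the code in tfkld.py: the function name `tfkld_weights`, the binary input matrices, the "both sentences vs. exactly one sentence" counting scheme and the smoothing constant are all assumptions made for the example. Each term's weight is the KL divergence between its co-occurrence distribution in paraphrase pairs and in non-paraphrase pairs.

```python
import numpy as np

def tfkld_weights(X1, X2, labels, smooth=0.05):
    """Hypothetical helper: one KL-divergence weight per term.

    X1, X2 : binary (n_pairs, n_terms) arrays marking which terms occur
             in the first and second sentence of each pair.
    labels : 1 for paraphrase pairs, 0 for non-paraphrase pairs.
    smooth : additive smoothing constant (an assumed value).
    """
    X1 = np.asarray(X1, dtype=bool)
    X2 = np.asarray(X2, dtype=bool)
    y = np.asarray(labels).astype(bool)

    both = X1 & X2  # term occurs in both sentences of the pair
    one = X1 ^ X2   # term occurs in exactly one sentence of the pair

    # Per-term counts of the two outcomes, split by label, then smoothed.
    p = np.stack([one[y].sum(0), both[y].sum(0)]).astype(float) + smooth
    q = np.stack([one[~y].sum(0), both[~y].sum(0)]).astype(float) + smooth

    # Normalize each column into a two-outcome distribution.
    p /= p.sum(0)
    q /= q.sum(0)

    # KL(p || q), computed for all terms at once.
    return (p * np.log(p / q)).sum(0)
```

Under these assumptions, the resulting vector would be used to rescale the columns of a term-count matrix, so that terms whose co-occurrence pattern differs between paraphrase and non-paraphrase pairs count for more.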

Also included is a test script (fe_quora.py) that extracts TFKLD features from the Quora dataset hosted on Kaggle as part of a competition.

Download the dataset from here.
Extract the zip file and place its contents in the same directory as tfkld.py and fe_quora.py.
Execute fe_quora.py.
This takes some time. When it finishes, you should have the pickle files train-tfkld-dr.pkl, dev-tfkld-dr.pkl and test-tfkld-dr.pkl, corresponding to the training, development and test data respectively.
TFKLD features are reduced to 200 dimensions using SVD.
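The dimensionality reduction step can be illustrated with a plain NumPy truncated SVD. This is a sketch under assumptions: the helper name `reduce_dims` is hypothetical, and fe_quora.py may use a different SVD routine (for example a sparse solver) and a different projection convention.

```python
import numpy as np

def reduce_dims(features, k=200):
    """Hypothetical helper: project rows onto the top-k singular directions.

    features : (n_samples, n_terms) dense array of weighted features.
    k        : target dimensionality (200 in this README).
    """
    features = np.asarray(features, dtype=float)
    # k cannot exceed the number of singular values available.
    k = min(k, min(features.shape))
    # Thin SVD: features = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(features, full_matrices=False)
    # Coordinates of each row in the basis of the top-k right singular vectors.
    return U[:, :k] * s[:k]
```

A dense SVD like this only suits small matrices. For a corpus the size of the Quora dataset, a sparse or randomized solver would be the more practical choice.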
