This reimplementation is based on https://github.com/jiyfeng/tfkld and aims to drastically speed up the weight calculation. TFKLD was proposed by Ji and Eisenstein in a 2013 EMNLP paper.
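For reference, the weighting itself can be sketched in plain Python. This is a minimal, hypothetical sketch and not the code in tfkld.py. It makes no attempt at the speedups this reimplementation targets. Several details are assumptions: the estimator for p_k and q_k (among pairs where feature k occurs in at least one sentence, the fraction where it occurs in both, taken over paraphrase and non-paraphrase pairs respectively), the additive smoothing constant alpha, and the names kld_weights and tfkld_vector. Each feature's weight is the KL divergence between Bernoulli(p_k) and Bernoulli(q_k). A sentence's TFKLD vector then scales its raw term frequencies by those weights.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# (features of sentence 1, features of sentence 2, label: 1 = paraphrase, 0 = not)
Pair = Tuple[Set[str], Set[str], int]


def kld_weights(pairs: Iterable[Pair], alpha: float = 0.05) -> Dict[str, float]:
    """Compute a KL-divergence weight for every feature seen in the pairs.

    Hypothetical sketch: for each label we count how often feature k occurs
    in both sentences versus in exactly one of them. alpha is an assumed
    additive smoothing constant that keeps p and q strictly inside (0, 1).
    """
    both = {0: Counter(), 1: Counter()}
    one = {0: Counter(), 1: Counter()}
    for s1, s2, label in pairs:
        both[label].update(s1 & s2)  # features shared by the two sentences
        one[label].update(s1 ^ s2)   # features present in only one sentence

    vocab = set()
    for label in (0, 1):
        vocab.update(both[label], one[label])

    weights = {}
    for k in vocab:
        # p: P(k in both | k in at least one, paraphrase), smoothed
        p = (both[1][k] + alpha) / (both[1][k] + one[1][k] + 2 * alpha)
        # q: the same quantity estimated over non-paraphrase pairs
        q = (both[0][k] + alpha) / (both[0][k] + one[0][k] + 2 * alpha)
        # KL divergence between Bernoulli(p) and Bernoulli(q)
        weights[k] = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return weights


def tfkld_vector(tokens: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Scale raw term frequencies by the KLD weights (unseen features get 0)."""
    tf = Counter(tokens)
    return {k: count * weights.get(k, 0.0) for k, count in tf.items()}
```

Features whose sharing behaviour is the same in paraphrase and non-paraphrase pairs end up with a weight near zero, so they contribute little to the reweighted vectors.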
The repository also includes a test script, fe_quora.py, which extracts TFKLD features from the Quora dataset hosted on Kaggle as part of a competition.
- Download the dataset from the Kaggle competition page.
- Extract the zip file and place its contents in the same directory as tfkld.py and fe_quora.py.
- Execute fe_quora.py.
- Processing takes some time. When it finishes, you should have three pickle files, train-tfkld-dr.pkl, dev-tfkld-dr.pkl and test-tfkld-dr.pkl, holding the features for the training, development and test data respectively.
- TFKLD features are reduced to 200 dimensions using SVD.
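The SVD reduction step can be sketched with NumPy as below. This is an illustrative sketch under assumptions, not what fe_quora.py does. It assumes a dense matrix whose rows are TFKLD-weighted sentence vectors, and it fits the projection on one matrix (for example the training data) and reuses it for the others. The helper names fit_svd and project are hypothetical. For a realistically sized vocabulary, a sparse truncated solver such as scikit-learn's TruncatedSVD would be the practical choice.

```python
import numpy as np


def fit_svd(X: np.ndarray, k: int = 200) -> np.ndarray:
    """Return the top-k right singular vectors of X as a (vocab_size x k) matrix.

    X: dense (n_sentences x vocab_size) matrix of TFKLD-weighted term
    frequencies (dense input is an assumption of this sketch).
    """
    # Thin SVD: X = U @ diag(S) @ Vt, singular values in descending order
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep at most k directions; slicing simply keeps all rows if k exceeds the rank
    return Vt[:k].T


def project(X: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map rows of X (same vocabulary as the fitted matrix) to k dimensions."""
    return X @ components
```

Fitting once and reusing the components keeps the training, development and test features in the same 200-dimensional space.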