LibShortText: A Library for Short-text Classification and Analysis, Ported to Pure C++.
- Only the prediction part is implemented
- Model files from LibShortText (Python) are supported after conversion
First make sure that you have CMake and a C++ compiler environment installed.
Then open a terminal, go to the source directory and type the following commands:
$ mkdir build
$ cd build
$ cmake ..
$ make
Convert the model file from LibShortText (Python) into the format used by cpp-libshorttext.
# test/stub/train_file.model => test/stub/train_file.model_converted
$ python model_converter.py test/stub/train_file.model
$ tree test/stub/train_file.model_converted
test/stub/train_file.model_converted
├── class_map.txt
├── feat_gen.txt
├── liblinear_model
├── options.txt
└── text_prep.txt
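#include <string>
#include <vector>
#include "libshorttext.hpp"
using namespace std;
using namespace libshorttext;
int main()
{
    // init LibShortText with the converted model directory
    string model_path = "../../test/stub/train_file.model_converted";
    lst_load_model(model_path);
    // init LibLinear with the model stored inside that directory
    liblinear::ll_load_model(model_path + "/liblinear_model");
    // ************
    // predict: split the text into tokens, then classify the tokens
    string text = "multicolor inlay sterling silver post earrings jewelry";
    char sep = ' ';
    vector<string> tokens = lst_text2tok(text, sep);
    // the result type is deduced from lst_predict (requires C++11)
    auto predict_label = lst_predict(tokens);
    // ************
    // free allocated memory
    liblinear::ll_destroy_model();
    return 0;
}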
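The example above splits the input text on a separator character before prediction. As a rough illustration of that step only, the sketch below shows one way such a split could be written. The function name split_tokens is hypothetical and is not part of cpp-libshorttext; dropping empty tokens is an assumption of this sketch, and lst_text2tok may behave differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the library's lst_text2tok): split `text` on `sep`.
// Assumption of this sketch: empty pieces produced by repeated separators are dropped.
std::vector<std::string> split_tokens(const std::string& text, char sep) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    while (std::getline(stream, token, sep)) {
        if (!token.empty()) {
            tokens.push_back(token);
        }
    }
    return tokens;
}
```

Calling split_tokens("sterling silver earrings", ' ') yields the three tokens in their original order.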
After building this project you may run its unit tests by using these commands:
$ make test # To run all tests via CTest
$ make catch # Run all tests directly, with more detailed output
Download the LibShortText zip file and cd into the demo directory.
Execute the following commands, and you will obtain the benchmark data.
python ../text-train.py -P 0 -G 1 -F 1 -N 0 -L 3 -f train_file
python ../text-predict.py -f test_file train_file.model predict_result
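Once the Python predictions are available, they can be compared with the labels produced by the C++ port. The sketch below is only an illustration of such a comparison: agreement_rate is a hypothetical helper, not part of either library, and it assumes both label lists have already been read into memory in the same order. Parsing predict_result is left out because its layout is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: fraction of positions where two label lists agree.
// Assumption: both lists refer to the same test instances, in the same order.
double agreement_rate(const std::vector<std::string>& reference,
                      const std::vector<std::string>& candidate) {
    if (reference.size() != candidate.size()) {
        throw std::invalid_argument("label lists must have the same length");
    }
    if (reference.empty()) {
        return 0.0;  // nothing to compare
    }
    std::size_t same = 0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (reference[i] == candidate[i]) {
            ++same;
        }
    }
    return static_cast<double>(same) / static_cast<double>(reference.size());
}
```

A rate of 1.0 means the C++ port reproduced every label from the Python run on that test file.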
- The extra files in the model file, converter/extra_file_ids.pickle and converter/extra_nr_feats.pickle, are ignored.
- The -P and -G options of LibShortText are ignored, i.e., unigram and bigram features are used.
- The IDF feature is ignored.
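To make the unigram and bigram features mentioned in the list above concrete, here is a minimal sketch of how they can be generated from a token sequence. The function unigrams_and_bigrams is hypothetical, and joining the two tokens of a bigram with a space is an assumption made for readability; the library's internal feature representation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return every token (unigram) followed by every pair of
// adjacent tokens (bigram). Bigrams are joined with a space in this sketch.
std::vector<std::string> unigrams_and_bigrams(const std::vector<std::string>& tokens) {
    std::vector<std::string> features(tokens);  // start with all unigrams
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        features.push_back(tokens[i] + " " + tokens[i + 1]);
    }
    return features;
}
```

For n tokens this produces n unigrams and n - 1 bigrams, so a single-token input yields only that token.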
- TextGrocery, for a better understanding of the logic of LibShortText
- PicklingTools did not work, so I have to convert the model file with model_converter.py
This program is Free Software: you can use, study, share, and improve it at will. Specifically, you can redistribute and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.