Guangyi-Z/cpp-libshorttext

cpp-libshorttext

LibShortText: A Library for Short-text Classification and Analysis, ported to pure C++.

  • Prediction only: training is not implemented
  • Works with model files trained by LibShortText (Python), after conversion with the model converter

Building From Source

First make sure that you have CMake and a C++ compiler installed.

Then open a terminal, go to the source directory and type the following commands:

$ mkdir build
$ cd build
$ cmake ..
$ make

Usage

Model Converter

Convert a model file produced by LibShortText (Python) into the format used by cpp-libshorttext.

# test/stub/train_file.model => test/stub/train_file.model_converted
$ python model_converter.py test/stub/train_file.model

$ tree test/stub/train_file.model_converted
test/stub/train_file.model_converted
├── class_map.txt
├── feat_gen.txt
├── liblinear_model
├── options.txt
└── text_prep.txt

Predicting

#include <string>
#include <vector>

#include "libshorttext.hpp"

using namespace libshorttext;

int main()
{
    // init LibShortText: load the converted model directory
    std::string model_path = "../../test/stub/train_file.model_converted";
    lst_load_model(model_path);

    // init LibLinear: load the classifier stored in the converted model
    liblinear::ll_load_model(model_path + "/liblinear_model");

    // ************
    // predict
    std::string text = "multicolor inlay sterling silver post earrings jewelry";
    char sep = ' ';
    std::vector<std::string> tokens = lst_text2tok(text, sep);
    auto predict_label = lst_predict(tokens);  // predicted class label
    // ************

    // free memory allocated for the LibLinear model
    liblinear::ll_destroy_model();
    return 0;
}

Running unit tests

After building the project, you can run its unit tests with these commands:

$ make test  # Run all tests via CTest
$ make catch # Run all tests directly, with more detailed output

About the test stub

Download the LibShortText zip file and cd into its demo directory. Running the following commands produces the benchmark data:

python ../text-train.py -P 0 -G 1 -F 1 -N 0 -L 3 -f train_file
python ../text-predict.py -f test_file train_file.model predict_result

Dependencies

What is missing

  • Extra files in the model directory are ignored: converter/extra_file_ids.pickle and converter/extra_nr_feats.pickle
  • The LibShortText -P and -G options are ignored; unigram and bigram features are always used.
  • IDF features are ignored.

Trial and error

License

GNU GPLv3

This program is Free Software: you can use, study, share, and improve it at will. Specifically, you can redistribute and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.