GENIA Sentence Splitter
Sentence Splitter using SS MaxEnt

* How to use

1) make
2) ./geniass arg1 arg2

   arg1 is the target file to split.
   arg2 is the output file name.

   You need to run geniass in the directory that contains
   EventExtracter.rb, Classifying2Splitting.rb, and model1-1.0.

If you want a stand-off format file, run

3) ruby sentence2standOff.rb arg1 arg2 arg3

   arg1 and arg2 are the same as in 2).
   arg3 is the output stand-off file name.

------------

SS MaxEnt

This is a simple C++ class library for maximum entropy classifiers.
If you are familiar with C++ and the STL, you will easily understand how
to use the library by looking at the sample code.

The main features of this library are:

- fast parameter estimation using the BLMVM algorithm (Benson and More, 2001)
- smoothing with a Gaussian prior (Chen and Rosenfeld, 1999)
- modelling with inequality constraints (Kazama and Tsujii, 2003)
- saving/loading the model to/from a file
- integrating the model data into your source code

* How to use

1) make
   - if you encounter errors with hash, try commenting out
     #define USE_HASH_MAP in "maxent.h".
2) ./a.out
3) see sample.cpp and maxent.h

* Tips

1) If you have many training samples, set aside a portion of the data as
   held-out data to check whether overfitting is happening.
   ex.) model.set_heldout(1000);

2) If you see overfitting, try one of the following:
   - feature cut-off           ex.) model.train(3);
   - Gaussian prior            ex.) model.train(0, 1000, 0);
   - inequality constraints    ex.) model.train(0, 0, 1.0);

   * I like the third one because it produces a compact model and
     performs as well as the Gaussian prior.

3) If you want to integrate the generated model file into your code,
   see model2c.cpp.

* References

- Jun'ichi Kazama and Jun'ichi Tsujii, Evaluation and Extension of
  Maximum Entropy Models with Inequality Constraints, In the Proceedings
  of EMNLP 2003, pp. 137-144.

- Steven J. Benson and Jorge J. More, A Limited-Memory Variable-Metric
  Method for Bound-Constrained Minimization, Preprint ANL/MCS-P909-0901
  http://www-unix.mcs.anl.gov/~benson/blmvm/

- Stanley F. Chen and Ronald Rosenfeld, A Gaussian Prior for Smoothing
  Maximum Entropy Models, Technical Report CMU-CS-99-108, Computer Science
  Department, Carnegie Mellon University, 1999.

* History

2005 Jul.  8  version 1.2.2  - initial public release
2005 Sep. 13  version 1.3    - requires less memory in training
2005 Sep. 13  version 1.3.1  - update README
2005 Oct. 28  version 1.3.2  - fix for overflow (thanks to Ming Li)

-------------------------------------------------------------------------
Yoshimasa Tsuruoka (email@example.com)
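
* Sketch: what a feature cut-off does

The feature cut-off mentioned in the Tips can be pictured with a small
standalone sketch. It does not use maxent.h and is not the library's
implementation. The Sample struct and the functions count_features and
apply_cutoff are hypothetical names introduced only for illustration, and
the sketch assumes that a cut-off of N discards features occurring N times
or fewer; the library's exact comparison may differ.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sample type for this sketch; not the library's own class.
struct Sample {
    std::string label;
    std::vector<std::string> features;
};

// Count how many times each feature occurs across all training samples.
std::map<std::string, int> count_features(const std::vector<Sample>& samples) {
    std::map<std::string, int> counts;
    for (const Sample& s : samples)
        for (const std::string& f : s.features)
            ++counts[f];
    return counts;
}

// Return a copy of the samples that keeps only features seen more than
// `cutoff` times. Assumption: a cut-off of N drops features with N or
// fewer occurrences.
std::vector<Sample> apply_cutoff(const std::vector<Sample>& samples, int cutoff) {
    const std::map<std::string, int> counts = count_features(samples);
    std::vector<Sample> pruned;
    pruned.reserve(samples.size());
    for (const Sample& s : samples) {
        Sample kept;
        kept.label = s.label;
        for (const std::string& f : s.features)
            if (counts.at(f) > cutoff)
                kept.features.push_back(f);
        pruned.push_back(kept);
    }
    return pruned;
}
```

Rare features tend to receive large, poorly supported weights, so dropping
them before training is one simple way to reduce overfitting. It also
shrinks the model, at the cost of ignoring evidence that appears only a
few times.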