-
Notifications
You must be signed in to change notification settings - Fork 1
a fork of Chong Wang's Supervised Latent Dirichlet Allocation
License
danstowell/slda
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
********************************************************** SUPERVISED LATENT DIRICHLET ALLOCATION FOR CLASSIFICATION ********************************************************** (C) Copyright 2009, Chong Wang, David Blei and Li Fei-Fei written by Chong Wang, chongw@cs.princeton.edu, part of code is from http://www.cs.princeton.edu/~blei/lda-c/index.html. This file is part of slda. slda is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. slda is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ------------------------------------------------------------------------ This is a C++ implementation of supervised latent Dirichlet allocation (sLDA) for classification. Note that this code requires the Gnu Scientific Library, http://www.gnu.org/software/gsl/ ------------------------------------------------------------------------ TABLE OF CONTENTS A. COMPILING B. ESTIMATION C. INFERENCE ------------------------------------------------------------------------ A. COMPILING Type "make" in a shell. Make sure the GSL is installed. ------------------------------------------------------------------------ B. ESTIMATION Estimate the model by executing: slda [est] [data_path] [label_path] [settings_path] [alpha] [k] ["seeded"/"random"/model_path] [out_path] For example, this is a specific run of the program, included to illustrate ./slda est ../data/train-data.dat ../data/train-label.dat settings.txt 1 20 random tmp The saved models are in two files: <iteration>.model is the model saved in the binary format, which is easy and fast to use for inference. <iteration>.model.txt is the model saved in the text format, which is convenient for printing topics or analysis using python. The variational posterior Dirichlets are in: <iteration>.gamma Data format (1) [data] is a file where each line is of the form: [M] [term_1]:[count] [term_2]:[count] ... [term_N]:[count] where [M] is the number of unique terms in the document, and the [count] associated with each term is how many times that term appeared in the document. (2) [label] is a file where each line is the corresponding label for [data]. The labels must be 0, 1, ..., C-1, if we have C classes. ------------------------------------------------------------------------ C. INFERENCE To perform inference on a different set of data (in the same format as for estimation), execute: slda [inf] [data] [label] [settings] [model] [directory] where [model] is the binary file from the estimation. The predictive labels are in: inf-labels.dat The variational posterior Dirichlets are in: inf-gamma.dat
About
a fork of Chong Wang's Supervised Latent Dirichlet Allocation
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published