# gibbslda
This is a Java implementation of LDA (latent Dirichlet allocation) trained with Gibbs sampling, written by David2 Dai. You can use or modify it as you wish.
Prebuilt jar files are included in the `lib` folder, so you can use them without recompiling.

If you modify the code and need to recompile the project, install Maven first and then run the `build.sh` script in the project root directory. This rebuilds the jars in `./lib`.

There is a script called `gibbslda` in the project root folder. In a Linux terminal, you can run it in one of two modes:

- Estimation (training): `gibbslda -est [-ntopics <int>] [-alpha <float>] [-beta <float>] [-niters <int>] [-file <string>]`
- Inference on new data: `gibbslda -inf [-niters <int>] [-model <string>] [-file <string>]`

Options:

- `-est`: estimate (train) a model from a data file.
- `-inf`: infer topics for a new, previously unseen data file.
- `-ntopics`: the number of topics.
- `-niters`: the number of Gibbs sampling iterations.
- `-alpha`: the hyper-parameter alpha (the Dirichlet prior on each document's topic distribution).
- `-beta`: the hyper-parameter beta (the Dirichlet prior on each topic's word distribution).
- `-file`: the data file to use for estimation or inference.
- `-model`: the trained model to use for inference on unseen data.
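
The options above map onto the standard collapsed Gibbs sampler for LDA. The sketch below is not taken from this project's source; the class `GibbsLdaSketch`, its fields, and the use of integer word ids as input are hypothetical. It shows where `-ntopics` (K), `-alpha`, `-beta`, and `-niters` enter: each sweep resamples every word's topic from p(z = k | rest) ∝ (n_mk + alpha)(n_kv + beta) / (n_k + V·beta), and theta and phi are then estimated from the final counts.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of collapsed Gibbs sampling for LDA; not this project's actual code.
class GibbsLdaSketch {
    final int numTopics;        // K, as set by -ntopics
    final int vocabSize;        // V, number of distinct word ids
    final double alpha, beta;   // Dirichlet hyper-parameters (-alpha, -beta)
    final int[][] docs;         // docs[m][n]: word id of the n-th word in document m
    final int[][] z;            // z[m][n]: topic currently assigned to that word
    final int[][] nmk;          // nmk[m][k]: words in document m assigned to topic k
    final int[][] nkv;          // nkv[k][v]: occurrences of word v assigned to topic k
    final int[] nk;             // nk[k]: total words assigned to topic k
    final Random rng;

    GibbsLdaSketch(int[][] docs, int vocabSize, int numTopics, double alpha, double beta, long seed) {
        this.docs = docs; this.vocabSize = vocabSize; this.numTopics = numTopics;
        this.alpha = alpha; this.beta = beta; this.rng = new Random(seed);
        z = new int[docs.length][];
        nmk = new int[docs.length][numTopics];
        nkv = new int[numTopics][vocabSize];
        nk = new int[numTopics];
        // Start from a uniformly random topic for every word.
        for (int m = 0; m < docs.length; m++) {
            z[m] = new int[docs[m].length];
            for (int n = 0; n < docs[m].length; n++) {
                int k = rng.nextInt(numTopics);
                z[m][n] = k;
                nmk[m][k]++; nkv[k][docs[m][n]]++; nk[k]++;
            }
        }
    }

    // One Gibbs sweep over every word; run it niters times (-niters).
    void sweep() {
        double[] cumulative = new double[numTopics];
        for (int m = 0; m < docs.length; m++) {
            for (int n = 0; n < docs[m].length; n++) {
                int v = docs[m][n], old = z[m][n];
                // Exclude the current word from the counts before resampling it.
                nmk[m][old]--; nkv[old][v]--; nk[old]--;
                // Unnormalized full conditional p(z = k | all other assignments), as a running sum.
                double total = 0;
                for (int k = 0; k < numTopics; k++) {
                    total += (nmk[m][k] + alpha) * (nkv[k][v] + beta) / (nk[k] + vocabSize * beta);
                    cumulative[k] = total;
                }
                // Draw a topic with probability proportional to its weight.
                double u = rng.nextDouble() * total;
                int k = 0;
                while (k < numTopics - 1 && cumulative[k] <= u) k++;
                z[m][n] = k;
                nmk[m][k]++; nkv[k][v]++; nk[k]++;
            }
        }
    }

    // theta[m][k] = (n_mk + alpha) / (N_m + K * alpha)
    double[][] theta() {
        double[][] t = new double[docs.length][numTopics];
        for (int m = 0; m < docs.length; m++)
            for (int k = 0; k < numTopics; k++)
                t[m][k] = (nmk[m][k] + alpha) / (docs[m].length + numTopics * alpha);
        return t;
    }

    // phi[k][v] = (n_kv + beta) / (n_k + V * beta)
    double[][] phi() {
        double[][] p = new double[numTopics][vocabSize];
        for (int k = 0; k < numTopics; k++)
            for (int v = 0; v < vocabSize; v++)
                p[k][v] = (nkv[k][v] + beta) / (nk[k] + vocabSize * beta);
        return p;
    }

    public static void main(String[] args) {
        // Toy corpus of word ids (vocabulary size 5) with 2 topics.
        int[][] docs = {{0, 1, 2, 0}, {3, 4, 3, 4}, {0, 2, 1}};
        GibbsLdaSketch lda = new GibbsLdaSketch(docs, 5, 2, 0.5, 0.1, 42L);
        for (int iter = 0; iter < 200; iter++) lda.sweep();
        for (double[] row : lda.theta()) System.out.println(Arrays.toString(row));
    }
}
```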
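
Both the data used to train (estimate) a model and new, previously unseen data use the same format:

- `[M]`: the first line gives M, the total number of documents.
- `[document_1]` through `[document_M]`: each following line is one document, written as its words separated by spaces.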
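
As a sketch of producing and reading this format, the hypothetical `DataFileSketch` class below (not part of this project) writes M on the first line followed by one space-separated document per line, and parses such text back. It assumes Java 11+, UTF-8 text, and whitespace-delimited words; the file name `train.dat` is only an example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of this project) for the input data format:
// line 1 holds M, then each of the next M lines is one document of space-separated words.
class DataFileSketch {
    static String format(List<List<String>> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append(docs.size()).append('\n');                 // [M]
        for (List<String> doc : docs) {
            sb.append(String.join(" ", doc)).append('\n');   // [document_i]
        }
        return sb.toString();
    }

    static List<List<String>> parse(String text) {
        String[] lines = text.split("\n", -1);               // limit -1 keeps trailing empty lines
        int m = Integer.parseInt(lines[0].trim());
        List<List<String>> docs = new ArrayList<>();
        for (int i = 1; i <= m; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                docs.add(new ArrayList<>());                  // an empty document
            } else {
                docs.add(Arrays.asList(line.split("\\s+")));
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> docs = List.of(
                List.of("gibbs", "sampling", "for", "topic", "models"),
                List.of("latent", "dirichlet", "allocation"));
        // "train.dat" is an example name; the file would then be passed to gibbslda with -file.
        Path out = Path.of(args.length > 0 ? args[0] : "train.dat");
        Files.writeString(out, format(docs), StandardCharsets.UTF_8);
        System.out.print(format(docs));
    }
}
```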
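
A trained model is stored using the following formats.

Topic assignments: one line per document, in which every word is followed by the index z of the topic assigned to it, as in `[word]:[z] [word]:[z] ...`.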
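
The `[word]:[z]` lines can be read with a small parser. This is a minimal sketch; the class `TopicAssignmentSketch` is hypothetical, and it assumes that z is an integer topic index and that the topic follows the last `:` in each token, so a word containing `:` is still split correctly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser (not part of this project) for one topic-assignment line
// of the form "[word]:[z] [word]:[z] ...", assuming z is an integer topic index.
class TopicAssignmentSketch {
    static final class Assignment {
        final String word;
        final int topic;
        Assignment(String word, int topic) { this.word = word; this.topic = topic; }
    }

    static List<Assignment> parseLine(String line) {
        List<Assignment> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue;                 // a blank line yields no tokens
            int colon = token.lastIndexOf(':');            // the topic follows the last ':'
            String word = token.substring(0, colon);
            int topic = Integer.parseInt(token.substring(colon + 1));
            result.add(new Assignment(word, topic));
        }
        return result;
    }

    // Number of words in one document assigned to each of numTopics topics.
    static int[] topicCounts(List<Assignment> doc, int numTopics) {
        int[] counts = new int[numTopics];
        for (Assignment a : doc) counts[a.topic]++;
        return counts;
    }

    public static void main(String[] args) {
        List<Assignment> doc = parseLine("topic:0 model:0 java:1");
        System.out.println(Arrays.toString(topicCounts(doc, 2)));
    }
}
```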
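
Document-topic distributions (theta): M lines, one per document, each holding that document's K topic probabilities, from `[theta_1_0] [theta_1_1] ... [theta_1_K-1]` for the first document through `[theta_M_0] [theta_M_1] ... [theta_M_K-1]` for the last.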
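
A common use of the theta lines is finding each document's dominant topic. The hypothetical `ThetaSketch` class below (not part of this project) parses whitespace-separated rows of probabilities and takes the index of the largest value in each row; it assumes the theta section is available as plain text with one document per line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader (not part of this project) for the theta section:
// M lines, each with K whitespace-separated probabilities p(topic k | document m).
class ThetaSketch {
    static double[][] parseMatrix(String text) {
        List<double[]> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;               // skip blank lines
            String[] parts = trimmed.split("\\s+");
            double[] row = new double[parts.length];
            for (int k = 0; k < parts.length; k++) row[k] = Double.parseDouble(parts[k]);
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    // Index of the most probable topic in one row (the first one wins on ties).
    static int argmax(double[] row) {
        int best = 0;
        for (int k = 1; k < row.length; k++) if (row[k] > row[best]) best = k;
        return best;
    }

    public static void main(String[] args) {
        double[][] theta = parseMatrix("0.1 0.7 0.2\n0.5 0.25 0.25\n");
        for (int m = 0; m < theta.length; m++) {
            // Documents are numbered from 1, matching [theta_1_*] ... [theta_M_*].
            System.out.println("document " + (m + 1) + " -> topic " + argmax(theta[m]));
        }
    }
}
```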
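
Topic-word distributions (phi): K lines, one per topic, each holding one probability per vocabulary word, from `[phi_0_0] [phi_0_1] ... [phi_0_V-1]` for topic 0 through `[phi_K-1_0] [phi_K-1_1] ... [phi_K-1_V-1]` for topic K-1, where V is the vocabulary size.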
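
Each phi line can be turned into a short list of the topic's most probable words. The hypothetical `PhiSketch` class below (not part of this project) returns the word ids of the n largest entries in one row; mapping those ids back to actual words assumes access to the model's vocabulary, whose format is not described here.

```java
import java.util.Arrays;

// Hypothetical helper (not part of this project): given one row of phi
// (the probability of each vocabulary word under topic k), return the ids
// of the n most probable words.
class PhiSketch {
    static int[] topWordIds(double[] phiRow, int n) {
        Integer[] ids = new Integer[phiRow.length];
        for (int v = 0; v < ids.length; v++) ids[v] = v;
        // Sort ids by descending probability; Arrays.sort on objects is stable,
        // so tied words keep the lower id first.
        Arrays.sort(ids, (a, b) -> Double.compare(phiRow[b], phiRow[a]));
        int[] top = new int[Math.min(n, ids.length)];
        for (int i = 0; i < top.length; i++) top[i] = ids[i];
        return top;
    }

    public static void main(String[] args) {
        double[][] phi = {
            {0.10, 0.40, 0.20, 0.30},   // topic 0
            {0.05, 0.05, 0.60, 0.30}    // topic 1
        };
        for (int k = 0; k < phi.length; k++) {
            System.out.println("topic " + k + ": " + Arrays.toString(topWordIds(phi[k], 2)));
        }
    }
}
```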

Per-topic word lists: one section per topic, each introduced by a header such as `[Topic0]`.