ckling/mgtm

***************************
Geographical Topic Model Using multi-Dirichlet process mixtures
***************************

(C) Copyright 2012, Christoph Carl Kling

Based on "Knoceans" by Gregor Heinrich (gregor :: arbylon : net)
and JGibbsLDA by Xuan-Hieu Phan and Cam-Tu Nguyen (ncamtu :: gmail : com)
published under GNU GPL.

Tartarus Snowball stemmer by Martin Porter and Richard Boulton published under 
BSD License (see http://www.opensource.org/licenses/bsd-license.html ), with Copyright 
(c) 2001, Dr Martin Porter, and (for the Java developments) Copyright (c) 2002, 
Richard Boulton. 

Java Delaunay Triangulation (JDT) by boaz88 :: gmail : com published under Apache License 2.0 
(http://www.apache.org/licenses/LICENSE-2.0)

MGTM is free software; you can redistribute it and/or modify it 
under the terms of the GNU General Public License as published by the Free 
Software Foundation; either version 3 of the License, or (at your option) 
any later version.

MGTM is distributed in the hope that it will be useful, but WITHOUT 
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS 
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA

***************************
Notes
***************************

This is the implementation of MGTM, a geographical topic model using multi-Dirichlet processes.

The source code of MGTM is located in /sourcecode/nhdp3/.

The topic sampler is in Estimator.java.
The parameter samplers are in RandomSamplers.java.
The model options are in Model.java.
The von Mises-Fisher clustering and the call to the Delaunay triangulation are in MF_Delaunay.java.
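Von Mises-Fisher distributions are defined on the unit sphere rather than on the latitude/longitude plane, so document locations have to be mapped onto the sphere before they can be clustered this way. The following is a minimal, hypothetical Java sketch of that idea, assuming locations are given in decimal degrees: it maps a latitude/longitude pair to a 3D unit vector and evaluates the von Mises-Fisher log-density on the 2-sphere. The names VmfSketch, toUnitVector and logDensity are illustrative only and are not taken from MF_Delaunay.java.

```java
/**
 * Hypothetical helper (not part of MGTM): maps latitude/longitude in degrees
 * to a point on the unit sphere and evaluates the von Mises-Fisher density
 * on S^2. Illustration only; this is not the code in MF_Delaunay.java.
 */
public final class VmfSketch {

    /** Converts latitude/longitude in degrees to a 3D unit vector (x, y, z). */
    static double[] toUnitVector(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),
            Math.cos(lat) * Math.sin(lon),
            Math.sin(lat)
        };
    }

    /** Dot product of two 3D vectors. */
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /**
     * Log-density of the von Mises-Fisher distribution on the 2-sphere:
     * f(x; mu, kappa) = kappa / (4 pi sinh kappa) * exp(kappa * mu^T x), kappa > 0.
     */
    static double logDensity(double[] x, double[] mu, double kappa) {
        // log(sinh k) = k + log(1 - exp(-2k)) - log 2, numerically stable for large kappa
        double logSinh = kappa + Math.log1p(-Math.exp(-2.0 * kappa)) - Math.log(2.0);
        return Math.log(kappa) - Math.log(4.0 * Math.PI) - logSinh + kappa * dot(mu, x);
    }

    public static void main(String[] args) {
        double[] mu = toUnitVector(56.3, 6.4);    // location of the first example document
        double[] near = toUnitVector(56.0, 6.0);
        double[] far = toUnitVector(-33.9, 151.2);
        double kappa = 50.0;                       // arbitrary concentration for illustration
        System.out.printf("near: %.3f%n", logDensity(near, mu, kappa));
        System.out.printf("far:  %.3f%n", logDensity(far, mu, kappa));
    }
}
```

Points close to the mean direction receive a higher density than distant ones; the concentration kappa controls how quickly the density falls off with great-circle distance.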

A list of variables used in the model is given in variables.html

Example call of MGTM for the car dataset (available on request from MGTM[at]c-kling.de):

java -Xmx3000M -jar MGTD.jar -dir ./example/ -dfile car.txt -est -L 500 -beta 0.5 -gamma 1.0 -alpha0 1.0 -Alpha 1.0 -sampleHyper true -gammaa 1.0 -gammab 0.1 -alpha0a 1.0 -alpha0b 0.1 -Alphaa 0.1 -Alphab 0.1 -delta 10.0 -savestep 5 -twords 20 -niters 200

dir is the directory containing the dataset. The output is also stored in this directory.
dfile is the name of the data file within that directory.
L gives the number of geographical regions (clusters) used for the initial clustering.
gamma, beta, alpha0, Alpha and delta are the initial values of the Dirichlet parameters.
Alphaa and Alphab (and likewise gammaa/gammab and alpha0a/alpha0b) are the parameters of the Gamma priors placed on the corresponding Dirichlet parameters.
The number of topics is not fixed in advance; it is inferred from the data.
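The Gamma hyper-parameters describe prior distributions over the Dirichlet parameters. This README does not say whether the second parameter (for example gammab) is a rate or a scale, so the following Java sketch simply assumes shape a and rate b, which gives a prior mean of a / b (10.0 for -gammaa 1.0 -gammab 0.1). It draws samples with the Marsaglia-Tsang method; it is not the code in RandomSamplers.java, and the names GammaPriorSketch, sampleStandardGamma and sampleGamma are hypothetical.

```java
import java.util.Random;

/**
 * Hedged sketch (not MGTM's RandomSamplers.java): draws from Gamma(a, b),
 * ASSUMING a is the shape and b the rate, so that the mean is a / b.
 * Uses the Marsaglia-Tsang (2000) method.
 */
public final class GammaPriorSketch {

    /** Draws Gamma(shape, 1) via Marsaglia-Tsang; shape < 1 is handled by boosting. */
    static double sampleStandardGamma(double shape, Random rng) {
        if (shape < 1.0) {
            // Gamma(a) = Gamma(a + 1) * U^(1/a) for 0 < a < 1 (e.g. Alphaa = 0.1)
            double u = rng.nextDouble();
            return sampleStandardGamma(shape + 1.0, rng) * Math.pow(u, 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue;                 // reject: v must be positive
            }
            v = v * v * v;
            double u = rng.nextDouble();
            // Cheap squeeze test first, then the exact log acceptance test
            if (u < 1.0 - 0.0331 * x * x * x * x) {
                return d * v;
            }
            if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) {
                return d * v;
            }
        }
    }

    /** Draws Gamma(shape a, rate b) by scaling a standard draw by 1 / b. */
    static double sampleGamma(double a, double b, Random rng) {
        return sampleStandardGamma(a, rng) / b;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 200_000;
        double sum = 0.0;
        // gammaa = 1.0, gammab = 0.1 as in the example call above
        for (int i = 0; i < n; i++) {
            sum += sampleGamma(1.0, 0.1, rng);
        }
        System.out.printf("empirical mean %.2f, assumed a/b = %.2f%n", sum / n, 1.0 / 0.1);
    }
}
```

If MGTM in fact treats b as a scale, the prior mean would be a * b instead, so check RandomSamplers.java before relying on this interpretation when choosing values.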

Data format:
The first line gives the number of documents in the file.
Every following line corresponds to a document, using the format: 
latitude longitude word1 word2 ... 

Example file format for three documents:

3
56.3 6.4 this is a test
46.2 5.2 words are separated by spaces
65.3 12.3 that is all you need
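A small Java reader and writer for this format may help when preparing input files. It is a sketch, not MGTM's own loader: it assumes coordinates are decimal degrees, splits words on any whitespace, and rejects files that contain fewer document lines than the count declared on the first line. MgtmData, Doc, read and write are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hedged sketch of a reader/writer for the MGTM input format (not MGTM's
 * own loader): first line = number of documents, then one document per
 * line as "latitude longitude word1 word2 ...".
 */
public final class MgtmData {

    /** One geo-located document; this record is hypothetical. */
    record Doc(double lat, double lon, List<String> words) {}

    /** Serialises documents in the format described above. */
    static String write(List<Doc> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append(docs.size()).append('\n');
        for (Doc d : docs) {
            sb.append(d.lat()).append(' ').append(d.lon());
            for (String w : d.words()) {
                sb.append(' ').append(w);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Parses the format described above, checking the declared document count. */
    static List<Doc> read(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) {
            throw new IOException("empty input");
        }
        int n = Integer.parseInt(header.trim());
        List<Doc> docs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String line = in.readLine();
            if (line == null) {
                throw new IOException("expected " + n + " documents, found " + i);
            }
            String[] tok = line.trim().split("\\s+");   // words separated by whitespace
            if (tok.length < 2) {
                throw new IOException("line " + (i + 2) + ": missing coordinates");
            }
            double lat = Double.parseDouble(tok[0]);
            double lon = Double.parseDouble(tok[1]);
            docs.add(new Doc(lat, lon, Arrays.asList(tok).subList(2, tok.length)));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        String example = "3\n56.3 6.4 this is a test\n"
                + "46.2 5.2 words are separated by spaces\n"
                + "65.3 12.3 that is all you need\n";
        List<Doc> docs = read(new BufferedReader(new StringReader(example)));
        for (Doc d : docs) {
            System.out.println(d.lat() + " " + d.lon() + " -> " + d.words().size() + " words");
        }
    }
}
```

Passing the three-document example through read and then write reproduces the same text, which is a quick way to check that a generated file has the expected layout. The record syntax requires Java 16 or later.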
