Bayesian Naive Bayes (BNB)

This is an implementation of unsupervised Bayesian Naive Bayes with Gibbs sampling. Although it is efficient enough to be applied to real data, it should not be viewed as a stable tool.

Background

This type of unsupervised model was first used in

Ted Pedersen. 1997. Knowledge lean word sense disambiguation. In Proceedings of AAAI’97/IAAI’97.

For a simple introduction to Gibbs sampling methods, please refer to

Philip Resnik and Eric Hardisty. 2010. Gibbs sampling for the uninitiated. Technical report, University of Maryland.

Requirements

NumPy must be installed. The code was tested with Python 2.7.3 and NumPy 1.6.1.

Data

The data must be converted into the LDA-C format (http://www.cs.princeton.edu/~blei/lda-c/). We supply a very small toy dataset that is hopefully self-explanatory. Basically, each word is replaced by an integer index, which yields two files:

A .dat file with one line per document, listing each word index and its frequency. This is similar to the SVMlight format, except that the first entry in each line is the number of tokens in total.
A .vocab file that contains the words in order of their index, one per line. This means that there are no unassigned indexes.
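
As a rough illustration of the two files described above, the following sketch reads them into plain Python structures. It is not taken from corpus.py. The function names `parse_dat_line` and `load_corpus` and the sample data are hypothetical, and the sketch assumes that entries have the form `index:count`, that indexes start at 0, and that the .vocab file holds one word per line. The leading number on each line is read but not relied upon.

```python
def parse_dat_line(line):
    """Parse one document line of the form 'N index:count index:count ...'.

    Returns the leading number N and a dict mapping word index to count.
    (Hypothetical helper; the index:count layout is an assumption.)
    """
    fields = line.split()
    leading = int(fields[0])
    counts = {}
    for field in fields[1:]:
        index, count = field.split(":")
        counts[int(index)] = int(count)
    return leading, counts


def load_corpus(dat_lines, vocab_lines):
    """Build a vocabulary list and a list of per-document count dicts.

    Both arguments are iterables of strings, for example open file objects.
    Assumes 0-based indexes into the vocabulary.
    """
    vocab = [word.strip() for word in vocab_lines if word.strip()]
    docs = []
    for line in dat_lines:
        if not line.strip():
            continue  # skip blank lines
        _, counts = parse_dat_line(line)
        for index in counts:
            if not 0 <= index < len(vocab):
                raise ValueError("word index %d is not in the vocabulary" % index)
        docs.append(counts)
    return vocab, docs


# Made-up sample data, not the contents of the toy directory.
sample_vocab = ["apple", "bank", "river"]
sample_dat = ["2 0:1 2:1", "1 1:1"]
vocab, docs = load_corpus(sample_dat, sample_vocab)
```

Because `load_corpus` accepts any iterable of lines, it can be called with two open file handles for a .dat and .vocab pair as well as with in-memory lists like the ones shown here.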
