naive_bayes

This is a python implementation of Naive Bayes Classifier. This implementation is now supporting:

	- Multivariate Bernoulli Model
	- Multinomial Model
	- Transformed Weight-normalized Complement Naive Bayes Model (TWCNB)

If you want to know more about TWCNB, see the following paper:

	Tackling the Poor Assumptions of Naive Bayes Text Classifiers,
	Rennie+, ICML'03
	http://machinelearning.wustl.edu/mlpapers/paper_files/icml2003_RennieSTK03.pdf

Finally, you can use some feature selection methods as follows:

feature selection using word frequency (FS_FREQ)
feature selection using PMI (FS_PMI)

Requirements

pip -r requirements.txt

Install

cd ./calc_score
python setup.py build_ext --inplace

Usage

Training

from .naive_bayes import (
	MultivariateBernoulli as MB
	Multinomial as MNMAL
	TWCNB
)
from .const import (
	NBParam,
	LEARNING_MLE,
	LEARNING_MAP,
	FS_NO,
	FS_FREQ,
	FS_PMI
)

# hyper parameters
SNUM = 100  # # of stopwords
VSIZE = 10000  # # of vocabulary
alpha = 2  # hyper parameter for dirichlet distribution for MAP

param = NBParam(
	LEARNING_MAP,  # choose MLE or MAP
	FS_FREQ,  # choose method for feature selection (NO, FREQ or PMI)
	SNUM,
	VSIZE,
	alpha
)

# init model (MB, MNMAL or TWCNB)
model = TWCNB(param, use_cython=True)

# get training data
"""
The format for training data is following:
	{class1: [[word_111, word112, ..., word_cij, ..., word_11N], ...]
	 class2: [[word_211, word212, ..., word_cij, ..., word_21N], ...]
	 ...
	}
where word_cij is j-th word for document i of class c.

If you want to use MB, bag of words for docs should be unique.
"""
train = make_data()

# training
model.learn(train)
model.save("/path/to/save")

Testing

from .naive_bayes import (
	MultivariateBernoulli as MB
	Multinomial as MNMAL
	TWCNB
)

# load model (MB, MNMAL or TWCNB)
model = TWCNB.load("/path/to/load")

# get test data
"""
The format for test data is follwing:
	[
	 [word_11, word12, ..., word_ij, ..., word_1N],
	 [word_21, word22, ..., word_ij, ..., word_2N],
	 ...
	]
where word_ij is j-th word for test document j.
"""
test = make_data()

# testing
labels = model.predict(test)

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
lib		lib
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lib

lib

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

naive_bayes

Requirements

Install

Usage

Training

Testing

About

Releases

Packages

Languages

License

AkihikoWatanabe/naive_bayes

Folders and files

Latest commit

History

Repository files navigation

naive_bayes

Requirements

Install

Usage

Training

Testing

About

Resources

License

Stars

Watchers

Forks

Languages