
Fix for #4 - no prior used in Bayes Classifier #5

Merged (1 commit into cardmagic:master on Dec 31, 2013)

Conversation


@bmuller bmuller commented Dec 8, 2010

Without taking the prior probability of each class into consideration, you have a classifier that uses only the likelihood (more specifically, the log likelihood) rather than a classifier that applies Bayes' theorem.
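As a minimal sketch of the point above (hypothetical code, not the gem's actual API or data layout), a naive Bayes score combines a log-prior term with the summed log likelihoods; dropping the log-prior term leaves a likelihood-only classifier:

```ruby
# Hypothetical sketch of naive Bayes scoring with add-one smoothing;
# none of these names come from the classifier gem itself.
def classify(word_counts, categories)
  total_docs = categories.sum { |c| c[:doc_count] }.to_f
  best = categories.max_by do |cat|
    # log P(class): the prior term this pull request adds
    log_prior = Math.log(cat[:doc_count] / total_docs)
    # sum over words of count * log P(word | class)
    log_likelihood = word_counts.sum do |word, n|
      n * Math.log((cat[:word_counts].fetch(word, 0) + 1.0) /
                   (cat[:total_words] + 1.0))
    end
    log_prior + log_likelihood # drop log_prior and you score by likelihood only
  end
  best[:name]
end

categories = [
  { name: :spam, doc_count: 9, total_words: 20,
    word_counts: { "buy" => 10, "now" => 10 } },
  { name: :ham, doc_count: 1, total_words: 20,
    word_counts: { "hi" => 10, "there" => 10 } }
]

classify({ "buy" => 1 }, categories)    # => :spam (likelihood and prior agree)
classify({ "unseen" => 1 }, categories) # => :spam (only the prior can decide)
```

When a word is equally (un)likely under every class, the prior alone decides the label — that is exactly the term the original code dropped.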


dabble commented Oct 8, 2011

Thank you.

I spent a while reviewing http://en.wikipedia.org/wiki/Naive_Bayes_classifier
to better understand this change.

My feedback would be to update your change to replace spaces with tabs as
the original file is mostly tabbed. [Note: I don't have rights to this repository.]

The only downside I see to accepting this change is if someone called train_
multiple times per document instead of once per document.


bmuller commented Oct 9, 2011

Calling train more than once per document would result in bias even for a classifier just using log likelihoods.

Right now, the classifier uses only the likelihood to estimate the posterior, which means it's not a naive Bayes classifier at all. The prior must be taken into account.

Since it looks like this rather egregious bug is not going to be fixed, I suggest using https://github.com/livingsocial/ankusa


dabble commented Oct 9, 2011

Actually, I was thinking of calling train_ once each for the title, author, and body of a document.
Since, without this fix, the classifier only counted words, this would not affect the outcome.

The only reason I had for doing it (and I didn't) was to avoid concatenating strings only
to break them apart again.

As you point out, without this, it isn't really naive Bayes. (I'm not sure counting words
multiple times per document is really naive Bayes either...)

I hadn't run across the ankusa classifier. Thanks.

cardmagic pushed a commit that referenced this pull request Dec 31, 2013
Fix for #4 - no prior used in Bayes Classifier
@cardmagic cardmagic merged commit 8eefec5 into cardmagic:master Dec 31, 2013
@cardmagic
Owner

I am sorry it took so long to merge this in.
