This repository has been archived by the owner on Dec 3, 2018. It is now read-only.
semisupervised naive bayes
matpalm/semisupervised_naive_bayes
an experiment in semi supervised naive bayes text classification

a walk through of the project is available at http://matpalm.com/semi_supervised_naive_bayes

in general run things up with

bash> bzcat perez.bz2 thereg.bz2 | shuf | head -300 | ./shorten_urls.rb | ./main.rb

git tag v1_diy_fractions
nominal naive bayes implementation with diy rational arithmetic
fails due to numerical overflow

git tag v2_fractions_using_rational
rewrite using ruby's native Rational object
fails for same reason as v1

git tag v3_multinominal_rewrite
rewrite using multinomial naive bayes and explicit bucketing of articles into a class rather than retained distributions
(quick test showed that the unlabelled articles almost ALWAYS followed a distribution along the lines of 0.99/0.01 anyway (??))

TODO
convergence of unlabelled set is always in a single iteration; this rings some warning bells for me.
even though it works something isn't right; perhaps should introduce unlabelled values incrementally?
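
a rough sketch of the v3 idea (multinomial naive bayes, with each unlabelled article bucketed into a single class and the model retrained until the buckets stop changing). this is not the code in main.rb: every method name here (train, log_score, classify, semi_supervised) is hypothetical, the articles are toy data made up for illustration, and only the class names are borrowed from the perez.bz2 / thereg.bz2 inputs. it also sums log probabilities rather than multiplying fractions, which is a common way to sidestep the numeric trouble v1 and v2 hit; whether v3 itself does that is an assumption.

```ruby
# hypothetical sketch; none of these names come from the repository.

# train multinomial naive bayes from { label => [[token, ...], ...] }
def train(docs_by_class)
  vocab = docs_by_class.values.flatten.uniq
  total_docs = docs_by_class.values.sum(&:size).to_f
  docs_by_class.each_with_object({}) do |(label, docs), model|
    counts = Hash.new(0)
    docs.each { |tokens| tokens.each { |t| counts[t] += 1 } }
    model[label] = {
      log_prior: Math.log(docs.size / total_docs),
      counts: counts,
      # laplace smoothing: tokens seen in this class + vocabulary size
      denom: (counts.values.sum + vocab.size).to_f
    }
  end
end

# log P(class) + sum of log P(token | class); sums of logs stay in float range
def log_score(class_model, tokens)
  tokens.sum(class_model[:log_prior]) do |t|
    Math.log((class_model[:counts][t] + 1) / class_model[:denom])
  end
end

# pick the single most likely class (hard bucketing, no retained distribution)
def classify(model, tokens)
  model.keys.max_by { |label| log_score(model[label], tokens) }
end

# retrain on labelled + bucketed unlabelled articles until buckets stop changing
def semi_supervised(labelled, unlabelled, max_iterations: 10)
  model = train(labelled)
  assignments = nil
  max_iterations.times do
    new_assignments = unlabelled.map { |tokens| classify(model, tokens) }
    break if new_assignments == assignments # converged
    assignments = new_assignments
    combined = labelled.transform_values(&:dup)
    unlabelled.zip(assignments) { |tokens, label| combined[label] << tokens }
    model = train(combined)
  end
  [model, assignments]
end

# toy data, invented for this example
labelled = {
  perez:  [%w[celebrity gossip party], %w[celebrity dress party]],
  thereg: [%w[linux kernel patch], %w[server linux security]]
}
unlabelled = [%w[party gossip], %w[kernel security], %w[dress]]

model, assignments = semi_supervised(labelled, unlabelled)
p assignments
```

on data this small and this cleanly separated the buckets are already stable when the loop checks them a second time, which is the same "converges straight away" behaviour the TODO is suspicious of. feeding the unlabelled articles in a few at a time, as the TODO suggests, would mean calling the loop on growing slices of the unlabelled set instead of all of it at once.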