Implement proper Laplace smoothing in Bayes classifier #64
Closed
Labels
area: bayes, bug, priority: high
Description
Summary
The current smoothing in the Bayes classifier uses an ad-hoc magic number instead of proper probabilistic smoothing.
Problem Location
File: lib/classifier/bayes.rb:75
s = category_words.key?(word) ? category_words[word] : 0.1 # Magic number
Why This Matters
- The hardcoded 0.1 violates proper probabilistic foundations
- The smoothing factor should be proportional to vocabulary size (standard Laplace smoothing uses α = 1)
- Classification accuracy degrades on sparse vocabularies
- Can produce pathological results when vocabulary size varies significantly
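A quick sketch of why add-one smoothing is the sound choice: with α = 1, the smoothed estimates P(w|c) = (count + α) / (total + α·V) form a proper distribution over the vocabulary, which a fixed 0.1 fallback does not guarantee. The toy counts below are illustrative, not taken from the library.

```ruby
# Toy word counts for one category; "hello" is in the vocabulary but unseen.
counts = { "spam" => 3, "offer" => 1 }
vocab  = ["spam", "offer", "hello"] # V = 3
alpha  = Rational(1)                # Laplace smoothing, exact arithmetic
total  = counts.values.sum          # 4

# P(w|c) = (count(w, c) + alpha) / (total + alpha * V)
probs = vocab.map { |w| ((counts[w] || 0) + alpha) / (total + alpha * vocab.size) }
# probs == [4/7, 2/7, 1/7], which sums to exactly 1
```

With the 0.1 fallback, the implied "probabilities" over the vocabulary would not sum to 1, and the distortion grows as vocabulary size varies.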
Proposed Fix
Implement proper add-k (Laplace) smoothing:
def classifications(text)
  score = Hash.new(0.0)
  # word_hash is the tokenized input built by the existing code (unchanged, elided here)
  vocab_size = @categories.values.flat_map(&:keys).uniq.size
  alpha = 1.0 # Laplace smoothing parameter
  @categories.each do |category, category_words|
    total = @category_word_count[category] + (alpha * vocab_size)
    word_hash.each_key do |word|
      # P(word|category) = (count + α) / (total + α * vocab_size)
      s = (category_words[word] || 0) + alpha
      score[category.to_s] += Math.log(s / total)
    end
  end
  score
end
Benefits
- Mathematically sound probabilistic model
- Better accuracy on small training sets
- Configurable smoothing parameter for tuning
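To make the scoring concrete, here is a self-contained sketch of the proposed log-space scoring with a tunable alpha. The names (`score_for`, the toy `ham`/`spam` counts) are illustrative, not the gem's API:

```ruby
# Log-probability score of a document under one category's word counts,
# using add-k (Laplace) smoothing with smoothing parameter alpha.
def score_for(word_counts, words, vocab_size:, alpha: 1.0)
  total = word_counts.values.sum + alpha * vocab_size
  words.sum { |w| Math.log(((word_counts[w] || 0) + alpha) / total) }
end

ham  = { "hello" => 4, "friend" => 2 }
spam = { "offer" => 5, "free" => 3 }
vocab_size = (ham.keys | spam.keys).size # 4

doc = ["hello", "friend"]
ham_score  = score_for(ham,  doc, vocab_size: vocab_size)
spam_score = score_for(spam, doc, vocab_size: vocab_size)
# ham_score > spam_score: the unseen words in the spam category get the
# small smoothed mass alpha / (total + alpha * V) instead of a fixed 0.1.
```

Raising alpha toward 1 flattens the estimates (more smoothing); values below 1 trust the observed counts more, which is the tuning knob the fix exposes.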
Impact
Severity: High - affects classification correctness