Chapter 4: Naive Bayes - ApacheCN #427

Closed
jiangzhonglian opened this issue Aug 24, 2018 · 3 comments

Comments

@jiangzhonglian
Member

http://ailearning.apachecn.org/ml/4.NaiveBayesian/

ApacheCN: an open-source organization dedicated to maintaining excellent projects

@woaiios
woaiios commented May 16, 2019

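We now use p1(x,y) to denote the probability that data point (x,y) belongs to class 1 (the class shown as circles in the figure), and p2(x,y) to denote the probability that data point (x,y) belongs to class 2 (the class shown as triangles in the figure). For a new data point (x,y), the following rule then decides its class:

If p1(x,y) > p2(x,y), the class is 1
If p2(x,y) > p1(x,y), the class is 2

Is this a mistake? Shouldn't it be:

If newPoint(x,y) > p2(x,y), the class is 1
If newPoint(x,y) > p1(x,y), the class is 2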

@jiangzhonglian
Member Author

jiangzhonglian commented May 20, 2019

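It's not a mistake. We compute each point's probability under both classes, and the point is assigned to whichever class has the larger probability.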
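
The rule described here can be sketched in a few lines of Python. The function name `classify` and the two probability values are hypothetical placeholders for illustration, not output from the tutorial's code, and the tie-breaking choice is an assumption.

```python
def classify(p1, p2):
    """Return the class whose probability is larger for one data point.

    p1, p2: probabilities that the point belongs to class 1 and class 2.
    A tie falls through to class 2 here; that choice is arbitrary.
    """
    return 1 if p1 > p2 else 2


# Made-up probabilities for a single new point (x, y).
label = classify(p1=0.7, p2=0.3)
print(label)
```

The new point itself is never compared against a probability; only the two probabilities computed for that point are compared with each other.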

@ksufer
ksufer commented Sep 4, 2019

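At the end of the spamTest() function in the first example, the set-of-words model is used to build the vectors:

    for docIndex in testSet:
        wordVector = setOfWords2Vec(vocabList, docList[docIndex])

But when the probabilities are computed, doesn't the denominator follow the bag-of-words model? It adds up the occurrence counts of all words:

    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            # accumulate the per-word counts for abusive documents
            p1Num += trainMatrix[i]
            # add this document's total word count to the abusive-class denominator
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])

With the set-of-words model, shouldn't the denominators be p1Denom += 1 and p0Denom += 1 instead?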
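
To make the two denominators concrete, here is a minimal sketch on a toy set-of-words matrix. The data, the helper `word_probs`, and its `per_document` flag are all hypothetical, and smoothing is left out for brevity; this only contrasts the two estimates and does not claim which one the tutorial intends.

```python
import numpy as np

# Toy set-of-words training matrix: rows are documents, columns are
# vocabulary words, entries are 1 if the word occurs in the document.
train_matrix = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
])
train_category = np.array([1, 1, 0])  # 1 = abusive, 0 = normal


def word_probs(train_matrix, train_category, label, per_document):
    """Estimate P(word | label) with one of two denominators (no smoothing)."""
    rows = train_matrix[train_category == label]
    numerator = rows.sum(axis=0)      # per-word counts within this class
    if per_document:
        denominator = rows.shape[0]   # like: p1Denom += 1 for each document
    else:
        denominator = rows.sum()      # like: p1Denom += sum(trainMatrix[i])
    return numerator / denominator


by_words = word_probs(train_matrix, train_category, 1, per_document=False)
by_docs = word_probs(train_matrix, train_category, 1, per_document=True)
print(by_words)
print(by_docs)
```

With the summed denominator, the estimates add up to 1 across the vocabulary, which is the multinomial-style normalization. With the per-document denominator, each entry is the fraction of documents in the class that contain the word, which is the Bernoulli-style estimate.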
