## Computing with Bayes

Please read <http://blog.genesino.com/2016/09/bayes/> first.

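The code below is my record of working through *Think Bayes*. To make it easier to follow, I have taken the code apart and rebuild the whole class step by step, starting from scratch.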

### Defining a new class Bayes

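This class is essentially a dictionary whose key-value pairs are hypothesis-probability (hypo-prob) pairs. It can be initialized with a list of all hypotheses, such as `['H1','H2','H3']`, in which case every hypothesis starts with the same probability. Alternatively, it can be initialized with a dict such as `{'H1':1, 'H2':5, 'H3':4}`, or with another Bayes object. In every case the probabilities are normalized to sum to 1 after initialization.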

The class also provides a `Set` method for setting the probability of a single hypothesis.
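Before the full class, here is a minimal Python 3 sketch of the same idea, assuming a plain `dict` is enough to hold the hypothesis-probability pairs. The function name `make_pmf` is hypothetical and not part of the class below; it uses exact fractions so the normalization is easy to check.

```python
from fractions import Fraction


def make_pmf(hypos):
    """Build a normalized hypothesis -> probability dict.

    hypos may be a sequence of hypotheses (equal priors) or a mapping
    from hypothesis to an unnormalized weight.
    """
    if hasattr(hypos, "items"):
        # Mapping: copy the given weights
        pmf = {hypo: Fraction(weight) for hypo, weight in hypos.items()}
    else:
        # Sequence: every hypothesis gets the same weight
        pmf = {hypo: Fraction(1) for hypo in hypos}
    total = sum(pmf.values())
    if total == 0:
        raise ValueError("total probability is zero")
    # Divide every weight by the total so the probabilities sum to 1
    return {hypo: weight / total for hypo, weight in pmf.items()}


print(make_pmf(["H1", "H2", "H3"]))
print(make_pmf({"H1": 1, "H2": 5, "H3": 4}))
```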

```python
from __future__ import division, unicode_literals

import logging

class Bayes(object):
    """A Bayes class; essentially a dictionary of hypo -> prob"""
    def __init__(self, hypos=None, name=''):
        """
        Initialize the distribution.
        
        hypos: a sequence of hypotheses, a dict of hypo -> prob,
               or another Bayes object
        """
        self.name = name
        self.pmf = {}
        if hypos is None:
            return
        
        # Try each initialization method in turn. A method that does not
        # fit the input type raises AttributeError or TypeError and we move
        # on to the next one; InitFailure is the fallback.
        init_methods = [
            self.InitPmf,       # another Bayes object
            self.InitMapping,   # a dict
            self.InitSequence,  # a sequence: equal probability for all hypos
            self.InitFailure,
        ]
        
        for method in init_methods:
            try:
                method(hypos)
                break
            except (AttributeError, TypeError):
                continue
        
        if len(self):
            self.Normalize()
    
    def __str__(self):
        '''
        Stringify self.pmf
        '''
        tmpL = ["Probability table"]
        for hypo, prob in sorted(self.pmf.iteritems()):
            tmpL.append('\t'.join([str(hypo), str(prob)]))
        return '\n'.join(tmpL)
    
    def InitSequence(self, hypos):
        """
        Initialize with a sequence of hypos with equal probabilities.
        
        hypos: ['H1','H2','H3',...]
        """
        for hypo in hypos:
            self.Set(hypo, 1)
    
    def InitMapping(self, hypos):
        """
        Initialize with a map from value to probability (a dict).
        
        hypos = {'H1':1, 'H2':5, 'H3':4}
        """
        for hypo, prob in hypos.iteritems():
            self.Set(hypo, prob)
    
    def InitPmf(self, hypos):
        """
        Initialize with a Bayes object.
        
        hypos = Bayes()
        """
        # A Bayes object keeps its table in .pmf; a dict or list has no
        # .pmf attribute, so it falls through to the next method
        for hypo, prob in hypos.pmf.iteritems():
            self.Set(hypo, prob)
    
    def InitFailure(self, hypos):
        """Raise an error"""
        raise ValueError("None of the initialization methods works.")
    
    def __len__(self):
        return len(self.pmf)
    
    def Set(self, hypo, prob=0):
        """
        Set hypo-prob pair
        """
        self.pmf[hypo] = prob
    
    def Print(self):
        """Print the hypos and probs in ascending order."""
        for hypo, prob in sorted(self.pmf.iteritems()):
            print hypo, prob
    
    def Normalize(self):
        """
        Normalize probabilities so that they sum to 1
        """
        total = float(sum(self.pmf.values()))
        if total == 0.0:
            logging.warning('Normalize: total probability is zero.')
            raise ValueError('total probability is zero.')
        
        factor = 1 / total
        
        for hypo in self.pmf:
            self.pmf[hypo] *= factor
```

### Initializing the Bayes class

#### Initializing the Bayes class with a list of hypotheses

Suppose we have a fair die. Its initial probabilities are set up as follows:

```python
hypos = [1,2,3,4,5,6]

die = Bayes(hypos)

#die.Print()
print die
```

```
Probability table
1	0.166666666667
2	0.166666666667
3	0.166666666667
4	0.166666666667
5	0.166666666667
6	0.166666666667
```


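The same result can be obtained by hand: start from an empty `Bayes` object, set each hypothesis with `Set`, and then call `Normalize`:

```python
hypos = [1,2,3,4]

die2 = Bayes()

for hypo in hypos:
    die2.Set(hypo, 1)

print "Before normalize"
die2.Print()

die2.Normalize()
print "\nAfter normalize"
print die2
```

```
Before normalize
1 1
2 1
3 1
4 1

After normalize
Probability table
1	0.25
2	0.25
3	0.25
4	0.25
```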


#### Initializing the Bayes class with a hypothesis-probability dict

Suppose we have a loaded (unfair) die. Its initial probabilities are set up as follows:

```python
hypos = {1:0.1,2:0.2,3:0.3,4:0.4}
die3 = Bayes(hypos)
print die3
```

```
Probability table
1	0.1
2	0.2
3	0.3
4	0.4
```


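The weights passed in do not have to sum to 1; they are normalized during initialization:

```python
hypos = {1:1,2:2,3:3,4:4}
die3 = Bayes(hypos)
print die3
```

```
Probability table
1	0.1
2	0.2
3	0.3
4	0.4
```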


### Using the Bayes class to solve the problems from <http://blog.genesino.com/2016/09/bayes/>

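Before we start, we need to add a `Mult` method to the Bayes class. It multiplies the prior probability by the likelihood to obtain the posterior (strictly speaking, a value proportional to the posterior; `Normalize` turns it into the posterior by dividing by the total).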

We also define a `Prob` method that returns the probability of a given hypothesis.
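In symbols, with hypotheses $H_i$ and observed data $D$: `Mult` computes the numerator $P(H_i)\,P(D \mid H_i)$ for each hypothesis, and `Normalize` divides by the sum over all hypotheses. This is valid when the hypotheses are mutually exclusive and exhaustive.

$$
P(H_i \mid D) = \frac{P(H_i)\,P(D \mid H_i)}{\sum_j P(H_j)\,P(D \mid H_j)}
$$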

```python
from __future__ import division, unicode_literals

import logging

class Bayes(object):
    """A Bayes class; essentially a dictionary of hypo -> prob"""
    def __init__(self, hypos=None, name=''):
        """
        Initialize the distribution.
        
        hypos: a sequence of hypotheses, a dict of hypo -> prob,
               or another Bayes object
        """
        self.name = name
        self.pmf = {}
        if hypos is None:
            return
        
        # Try each initialization method in turn. A method that does not
        # fit the input type raises AttributeError or TypeError and we move
        # on to the next one; InitFailure is the fallback.
        init_methods = [
            self.InitPmf,       # another Bayes object
            self.InitMapping,   # a dict
            self.InitSequence,  # a sequence: equal probability for all hypos
            self.InitFailure,
        ]
        
        for method in init_methods:
            try:
                method(hypos)
                break
            except (AttributeError, TypeError):
                continue
        
        if len(self):
            self.Normalize()
    
    def __str__(self):
        '''
        Stringify self.pmf
        '''
        tmpL = ["Probability table"]
        for hypo, prob in sorted(self.pmf.iteritems()):
            tmpL.append('\t'.join([str(hypo), str(prob)]))
        return '\n'.join(tmpL)
    
    def InitSequence(self, hypos):
        """
        Initialize with a sequence of hypos with equal probabilities.
        
        hypos: ['H1','H2','H3',...]
        """
        for hypo in hypos:
            self.Set(hypo, 1)
    
    def InitMapping(self, hypos):
        """
        Initialize with a map from value to probability (a dict).
        
        hypos = {'H1':1, 'H2':5, 'H3':4}
        """
        for hypo, prob in hypos.iteritems():
            self.Set(hypo, prob)
    
    def InitPmf(self, hypos):
        """
        Initialize with a Bayes object.
        
        hypos = Bayes()
        """
        # A Bayes object keeps its table in .pmf; a dict or list has no
        # .pmf attribute, so it falls through to the next method
        for hypo, prob in hypos.pmf.iteritems():
            self.Set(hypo, prob)
    
    def InitFailure(self, hypos):
        """Raise an error"""
        raise ValueError("None of the initialization methods works.")
    
    def __len__(self):
        return len(self.pmf)
    
    def Set(self, hypo, prob=0):
        """
        Set hypo-prob pair
        """
        self.pmf[hypo] = prob
    
    def Print(self):
        """Print the hypos and probs in ascending order."""
        for hypo, prob in sorted(self.pmf.iteritems()):
            print hypo, prob
    
    def Normalize(self):
        """
        Normalize probabilities so that they sum to 1
        """
        total = float(sum(self.pmf.values()))
        if total == 0.0:
            logging.warning('Normalize: total probability is zero.')
            raise ValueError('total probability is zero.')
        
        factor = 1 / total
        
        for hypo in self.pmf:
            self.pmf[hypo] *= factor
    
    def Mult(self, hypo, likelihood):
        '''
        Multiply the probability of hypo by the given likelihood
        '''
        self.pmf[hypo] = self.pmf.get(hypo,0) * likelihood
    
    def Prob(self, hypo, default=0):
        """
        Get the probability of given hypo.
        """
        return self.pmf.get(hypo, default)
```

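#### Suppose there are two bowls of balls: Bowl 1 holds 30 black balls and 10 white balls, Bowl 2 holds 20 of each. Someone picks one bowl at random, draws a ball from it, and finds that it is black. What is the probability that the black ball came from Bowl 1?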

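```python
# Hypotheses:
#    Bowl 1: the black ball came from Bowl 1
#    Bowl 2: the black ball came from Bowl 2

hypos = ["Bowl 1", "Bowl 2"]

bowl = Bayes(hypos)

print "\nPrior probability\n"
print bowl

# Likelihood of drawing a black ball: 30/40 from Bowl 1, 20/40 from Bowl 2
bowl.Mult("Bowl 1", 0.75)
bowl.Mult("Bowl 2", 0.5)

# Note this Normalize: the hypotheses are mutually exclusive and cover
# every possibility, so their probabilities must sum to 1.
# We normalize on that basis.
bowl.Normalize()

print "\nPosterior probability\n"
print bowl

print "\nProbability that the black ball came from Bowl 1:", bowl.Prob('Bowl 1')
```

```
Prior probability

Probability table
Bowl 1	0.5
Bowl 2	0.5

Posterior probability

Probability table
Bowl 1	0.6
Bowl 2	0.4

Probability that the black ball came from Bowl 1: 0.6
```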
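The same numbers worked by hand, using the prior of 1/2 for each bowl and the likelihoods 30/40 and 20/40 that `Mult` applied:

$$
P(\text{Bowl 1} \mid \text{black})
= \frac{\frac{1}{2}\cdot\frac{30}{40}}{\frac{1}{2}\cdot\frac{30}{40} + \frac{1}{2}\cdot\frac{20}{40}}
= \frac{0.375}{0.625} = 0.6
$$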


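In the code above, the posterior is computed separately for each hypothesis. We now define an `Update` method that updates the posterior probabilities of all hypotheses at once.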
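As a standalone illustration of "update every hypothesis at once", here is a Python 3 sketch. It assumes the likelihood is supplied as a callable `likelihood(hypo, data)`, a hypothetical signature chosen to mirror the `Likelihood` method used later; `update` and `BLACK` are illustrative names, not part of the class.

```python
def update(pmf, likelihood, data):
    """Return a new normalized PMF after observing one datum.

    pmf: dict mapping hypothesis -> probability
    likelihood: callable(hypo, data) -> P(data | hypo)
    """
    # Multiply every prior by its likelihood
    unnorm = {hypo: prob * likelihood(hypo, data) for hypo, prob in pmf.items()}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("data has zero probability under every hypothesis")
    # Renormalize so the posteriors sum to 1
    return {hypo: p / total for hypo, p in unnorm.items()}


# Likelihood of a black ball under each hypothesis
BLACK = {"Bowl 1": 0.75, "Bowl 2": 0.5}

posterior = update({"Bowl 1": 0.5, "Bowl 2": 0.5},
                   lambda hypo, data: BLACK[hypo], "black")
print(posterior)
```

Returning a new dict instead of modifying `pmf` in place sidesteps any question of changing a dict while iterating over it.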

```python
from __future__ import division, unicode_literals

import logging

class Bayes(object):
    """A Bayes class; essentially a dictionary of hypo -> prob"""
    def __init__(self, hypos=None, name=''):
        """
        Initialize the distribution.
        
        hypos: a sequence of hypotheses, a dict of hypo -> prob,
               or another Bayes object
        """
        self.name = name
        self.pmf = {}
        if hypos is None:
            return
        
        # Try each initialization method in turn. A method that does not
        # fit the input type raises AttributeError or TypeError and we move
        # on to the next one; InitFailure is the fallback.
        init_methods = [
            self.InitPmf,       # another Bayes object
            self.InitMapping,   # a dict
            self.InitSequence,  # a sequence: equal probability for all hypos
            self.InitFailure,
        ]
        
        for method in init_methods:
            try:
                method(hypos)
                break
            except (AttributeError, TypeError):
                continue
        
        if len(self):
            self.Normalize()
    
    def __str__(self):
        '''
        Stringify self.pmf
        '''
        tmpL = ["Probability table"]
        for hypo, prob in sorted(self.pmf.iteritems()):
            tmpL.append('\t'.join([str(hypo), str(prob)]))
        return '\n'.join(tmpL)
    
    def InitSequence(self, hypos):
        """
        Initialize with a sequence of hypos with equal probabilities.
        
        hypos: ['H1','H2','H3',...]
        """
        for hypo in hypos:
            self.Set(hypo, 1)
    
    def InitMapping(self, hypos):
        """
        Initialize with a map from value to probability (a dict).
        
        hypos = {'H1':1, 'H2':5, 'H3':4}
        """
        for hypo, prob in hypos.iteritems():
            self.Set(hypo, prob)
    
    def InitPmf(self, hypos):
        """
        Initialize with a Bayes object.
        
        hypos = Bayes()
        """
        # A Bayes object keeps its table in .pmf; a dict or list has no
        # .pmf attribute, so it falls through to the next method
        for hypo, prob in hypos.pmf.iteritems():
            self.Set(hypo, prob)
    
    def InitFailure(self, hypos):
        """Raise an error"""
        raise ValueError("None of the initialization methods works.")
    
    def __len__(self):
        return len(self.pmf)
    
    def Set(self, hypo, prob=0):
        """
        Set hypo-prob pair
        """
        self.pmf[hypo] = prob
    
    def Print(self):
        """Print the hypos and probs in ascending order."""
        for hypo, prob in sorted(self.pmf.iteritems()):
            print hypo, prob
    
    def Normalize(self):
        """
        Normalize probabilities so that they sum to 1
        """
        total = float(sum(self.pmf.values()))
        if total == 0.0:
            logging.warning('Normalize: total probability is zero.')
            raise ValueError('total probability is zero.')
        
        factor = 1 / total
        
        for hypo in self.pmf:
            self.pmf[hypo] *= factor
    
    def Mult(self, hypo, likelihood):
        '''
        Multiply the probability of the given hypo by the given likelihood
        '''
        self.pmf[hypo] = self.pmf.get(hypo,0) * likelihood
    
    def Prob(self, hypo, default=0):
        """
        Get the probability of given hypo.
        """
        return self.pmf.get(hypo, default)
    
    def Update(self):
        '''
        Update the probabilities of all hypos by their likelihoods.
        Subclasses must define Likelihood(hypo).
        '''
        for hypo, prob in self.pmf.items():
            self.pmf[hypo] = prob * self.Likelihood(hypo)
        self.Normalize()
```

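The `Update` method of the `Bayes` class above relies on another method, `Likelihood`. Both the prior probability and the likelihood differ from problem to problem. The prior can be supplied at initialization. The likelihood could also be computed by a function defined outside the class and passed in at initialization, but with many observations we would have to re-initialize again and again, which is tedious. Here we use inheritance instead: we subclass `Bayes` and define the `Likelihood` method in the new class.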
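The design described above is essentially the template-method pattern: the base class owns the update loop, and each problem overrides only the likelihood. A Python 3 sketch of that structure, with hypothetical class names `Suite` and `Bowls` (not the classes used in this notebook):

```python
class Suite:
    """Base class: holds a PMF and owns the generic update loop."""

    def __init__(self, hypos):
        # Uniform prior over the given hypotheses
        self.pmf = {hypo: 1.0 / len(hypos) for hypo in hypos}

    def likelihood(self, hypo, data):
        # Each concrete problem overrides this method
        raise NotImplementedError

    def update(self, data):
        # Multiply each prior by its likelihood, then renormalize
        for hypo in self.pmf:
            self.pmf[hypo] *= self.likelihood(hypo, data)
        total = sum(self.pmf.values())
        for hypo in self.pmf:
            self.pmf[hypo] /= total


class Bowls(Suite):
    # P(color | bowl) for the two-bowl problem
    table = {"Bowl 1": {"black": 0.75, "white": 0.25},
             "Bowl 2": {"black": 0.5, "white": 0.5}}

    def likelihood(self, hypo, data):
        return self.table[hypo][data]


suite = Bowls(["Bowl 1", "Bowl 2"])
suite.update("black")
print(suite.pmf)
```

Adding a new problem then means writing one small subclass; the update loop itself is never copied.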

```python
from __future__ import division, unicode_literals

import logging

class Bowl(Bayes):
    # Likelihood of drawing a black ball from each bowl
    likelihood = {"Bowl 1": 0.75, "Bowl 2": 0.5}
    def Likelihood(self, hypo):
        return self.likelihood[hypo]

hypos = ["Bowl 1", "Bowl 2"]

bowl = Bowl(hypos)
print "\nPrior probability\n"
print bowl

bowl.Update()
print "\nPosterior probability\n"
print bowl
```

```
Prior probability

Probability table
Bowl 1	0.5
Bowl 2	0.5

Posterior probability

Probability table
Bowl 1	0.6
Bowl 2	0.4
```


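In the code above the likelihoods are hard-coded and cover only a black ball. Let's generalize further so that the program can handle several draws of different colors (each ball is put back after it is drawn) and compute the probability that the balls came from Bowl 1. As before, all draws are assumed to come from the same bowl.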
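With replacement, the draws are conditionally independent given the bowl, so the posterior after a batch of draws is the prior times the product of the per-draw likelihoods, and the order of the draws does not matter. A Python 3 sketch with exact fractions; the helper name `posterior_after` is hypothetical:

```python
from fractions import Fraction
from itertools import permutations

# P(color | bowl) for draws with replacement
LIKELIHOOD = {
    "Bowl 1": {"black": Fraction(3, 4), "white": Fraction(1, 4)},
    "Bowl 2": {"black": Fraction(1, 2), "white": Fraction(1, 2)},
}


def posterior_after(draws):
    """Posterior over bowls after the given draws, from a uniform prior."""
    unnorm = {}
    for bowl, table in LIKELIHOOD.items():
        weight = Fraction(1, 2)  # uniform prior
        for color in draws:
            weight *= table[color]
        unnorm[bowl] = weight
    total = sum(unnorm.values())
    return {bowl: w / total for bowl, w in unnorm.items()}


draws = ["black", "white", "black"]
print(posterior_after(draws)["Bowl 1"])  # 9/17, about 0.5294
# Every ordering of the same draws gives the same posterior
assert all(posterior_after(list(p)) == posterior_after(draws)
           for p in permutations(draws))
```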

```python
from __future__ import division, unicode_literals

import logging

class Bayes(object):
    """A Bayes class; essentially a dictionary of hypo -> prob"""
    def __init__(self, hypos=None, name=''):
        """
        Initialize the distribution.
        
        hypos: a sequence of hypotheses, a dict of hypo -> prob,
               or another Bayes object
        """
        self.name = name
        self.pmf = {}
        if hypos is None:
            return
        
        # Try each initialization method in turn. A method that does not
        # fit the input type raises AttributeError or TypeError and we move
        # on to the next one; InitFailure is the fallback.
        init_methods = [
            self.InitPmf,       # another Bayes object
            self.InitMapping,   # a dict
            self.InitSequence,  # a sequence: equal probability for all hypos
            self.InitFailure,
        ]
        
        for method in init_methods:
            try:
                method(hypos)
                break
            except (AttributeError, TypeError):
                continue
        
        if len(self):
            self.Normalize()
    
    def __str__(self):
        '''
        Stringify self.pmf
        '''
        tmpL = ["Probability table"]
        for hypo, prob in sorted(self.pmf.iteritems()):
            tmpL.append('\t'.join([str(hypo), str(prob)]))
        return '\n'.join(tmpL)
    
    def InitSequence(self, hypos):
        """
        Initialize with a sequence of hypos with equal probabilities.
        
        hypos: ['H1','H2','H3',...]
        """
        for hypo in hypos:
            self.Set(hypo, 1)
    
    def InitMapping(self, hypos):
        """
        Initialize with a map from value to probability (a dict).
        
        hypos = {'H1':1, 'H2':5, 'H3':4}
        """
        for hypo, prob in hypos.iteritems():
            self.Set(hypo, prob)
    
    def InitPmf(self, hypos):
        """
        Initialize with a Bayes object.
        
        hypos = Bayes()
        """
        # A Bayes object keeps its table in .pmf; a dict or list has no
        # .pmf attribute, so it falls through to the next method
        for hypo, prob in hypos.pmf.iteritems():
            self.Set(hypo, prob)
    
    def InitFailure(self, hypos):
        """Raise an error"""
        raise ValueError("None of the initialization methods works.")
    
    def __len__(self):
        return len(self.pmf)
    
    def Set(self, hypo, prob=0):
        """
        Set hypo-prob pair
        """
        self.pmf[hypo] = prob
    
    def Print(self):
        """Print the hypos and probs in ascending order."""
        for hypo, prob in sorted(self.pmf.iteritems()):
            print hypo, prob
    
    def Normalize(self):
        """
        Normalize probabilities so that they sum to 1
        """
        total = float(sum(self.pmf.values()))
        if total == 0.0:
            logging.warning('Normalize: total probability is zero.')
            raise ValueError('total probability is zero.')
        
        factor = 1 / total
        
        for hypo in self.pmf:
            self.pmf[hypo] *= factor
    
    def Mult(self, hypo, likelihood):
        '''
        Multiply the probability of the given hypo by the given likelihood
        '''
        self.pmf[hypo] = self.pmf.get(hypo,0) * likelihood
    
    def Prob(self, hypo, default=0):
        """
        Get the probability of given hypo.
        """
        return self.pmf.get(hypo, default)
    
    def Update(self, dataL):
        '''
        Update the probabilities of all hypos given the observations.
        Subclasses must define Likelihood(hypo, data).
        
        dataL: A list of observations.
        '''
        for data in dataL:
            for hypo, prob in self.pmf.items():
                self.pmf[hypo] = prob * self.Likelihood(hypo, data)
        self.Normalize()
```

```python
# Black and white balls, drawn with replacement

from __future__ import division, unicode_literals

import logging

class Bowl(Bayes):
    # P(color | bowl)
    state = {'Bowl 1': {"black": 0.75, "white": 0.25}, 'Bowl 2': {"black":0.5, "white":0.5}}
    def Likelihood(self, hypo, data):
        return self.state[hypo][data]

hypos = ["Bowl 1", "Bowl 2"]

bowl = Bowl(hypos)
print "\nPrior probability\n"
print bowl

# Update one draw at a time; each posterior becomes the prior for the next draw
#dataL = ["black",'white','black']
dataL = ['black']

bowl.Update(dataL)
print "\nPosterior probability after drawing <%s>.\n" % ','.join(dataL)
print bowl

dataL = ['white']

bowl.Update(dataL)
print "\nPosterior probability after drawing <%s>.\n" % ','.join(dataL)
print bowl

dataL = ['black']

bowl.Update(dataL)
print "\nPosterior probability after drawing <%s>.\n" % ','.join(dataL)
print bowl
```

```
Prior probability

Probability table
Bowl 1	0.5
Bowl 2	0.5

Posterior probability after drawing <black>.

Probability table
Bowl 1	0.6
Bowl 2	0.4

Posterior probability after drawing <white>.

Probability table
Bowl 1	0.428571428571
Bowl 2	0.571428571429

Posterior probability after drawing <black>.

Probability table
Bowl 1	0.529411764706
Bowl 2	0.470588235294
```


```python
# Black and white balls, drawn with replacement (all draws in one batch)

from __future__ import division, unicode_literals

import logging

class Bowl(Bayes):
    # P(color | bowl)
    state = {'Bowl 1': {"black": 0.75, "white": 0.25}, 'Bowl 2': {"black":0.5, "white":0.5}}
    def Likelihood(self, hypo, data):
        return self.state[hypo][data]

hypos = ["Bowl 1", "Bowl 2"]

bowl = Bowl(hypos)
print "\nPrior probability\n"
print bowl

dataL = ["black",'white','black']

bowl.Update(dataL)
print "\nPosterior probability after drawing <%s>.\n" % ','.join(dataL)
print bowl
```

```
Prior probability

Probability table
Bowl 1	0.5
Bowl 2	0.5

Posterior probability after drawing <black,white,black>.

Probability table
Bowl 1	0.529411764706
Bowl 2	0.470588235294
```


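How should we compute the black and white ball problem if the balls are not put back? Here we make a simplifying assumption: after each draw, the ball is assumed to have come from whichever bowl has the larger posterior probability at that point, and it is removed from that bowl.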
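The rule above is a heuristic. If we instead assume, as the with-replacement version implicitly does, that every ball comes from the same unknown bowl, no heuristic is needed. Under each hypothesis the bowl's contents are known, so the likelihood of the whole sequence is a product of conditional probabilities in which that bowl's counts shrink after each draw. A Python 3 sketch under that assumption; the names `BOWLS` and `sequence_likelihood` are hypothetical. It answers a slightly different question from the heuristic in the next cell, so its result (87/163, about 0.534) differs.

```python
from fractions import Fraction

# Initial contents of each bowl
BOWLS = {"Bowl 1": {"black": 30, "white": 10},
         "Bowl 2": {"black": 20, "white": 20}}


def sequence_likelihood(counts, draws):
    """P(draws | bowl) when drawing without replacement from one bowl."""
    counts = dict(counts)  # work on a copy so BOWLS is not modified
    prob = Fraction(1)
    for color in draws:
        remaining = sum(counts.values())
        prob *= Fraction(counts[color], remaining)
        counts[color] -= 1  # the ball is not put back
    return prob


draws = ["black", "white", "black"]
unnorm = {bowl: Fraction(1, 2) * sequence_likelihood(counts, draws)
          for bowl, counts in BOWLS.items()}
total = sum(unnorm.values())
posterior = {bowl: w / total for bowl, w in unnorm.items()}
print(posterior["Bowl 1"], float(posterior["Bowl 1"]))
```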

```python
# Black and white balls, drawn WITHOUT replacement

from __future__ import division, unicode_literals

import logging

class Bowl(Bayes):
    def __init__(self, hypos=None, name=''):
        # Per-instance ball counts, so draws removed from one Bowl object
        # do not leak into another (a class-level dict would be shared)
        self.state = {'Bowl 1': {"black": 30, "white": 10}, 'Bowl 2': {"black":20, "white":20}}
        Bayes.__init__(self, hypos, name)
    
    def Likelihood(self, hypo, data):
        # Fraction of balls of color <data> currently in bowl <hypo>
        return self.state[hypo][data]/sum(self.state[hypo].values())
    
    def Update_state(self, data):
        # Heuristic: assume the ball came from the bowl with the highest
        # current posterior, and remove it from that bowl
        choose = max(self.pmf, key=self.pmf.get)
        self.state[choose][data] -= 1
        print "Assume the <%s> ball came from <%s>." % (data, choose)
    
    def Update(self, dataL):
        '''
        Update probability using observed data <dataL>
        '''
        for data in dataL:
            print "Current state:", self.state
            print "Drew a <%s> ball" % data
            for hypo, prob in self.pmf.items():
                self.pmf[hypo] = prob * self.Likelihood(hypo,data)
            # Normalize after each draw so the printed values are probabilities
            self.Normalize()
            for hypo, prob in self.pmf.items():
                print hypo, prob
                
            self.Update_state(data)
            print
            
        
hypos = ["Bowl 1", "Bowl 2"]

bowl = Bowl(hypos)
print "\nPrior probability\n"
print bowl
print 

dataL = ["black",'white','black']

bowl.Update(dataL)
print "\nPosterior probability after drawing <%s>.\n" % ','.join(dataL)
print bowl
```

```
Prior probability

Probability table
Bowl 1	0.5
Bowl 2	0.5

Current state: {u'Bowl 2': {u'white': 20, u'black': 20}, u'Bowl 1': {u'white': 10, u'black': 30}}
Drew a <black> ball
Bowl 2 0.4
Bowl 1 0.6
Assume the <black> ball came from <Bowl 1>.

Current state: {u'Bowl 2': {u'white': 20, u'black': 20}, u'Bowl 1': {u'white': 10, u'black': 29}}
Drew a <white> ball
Bowl 2 0.565217391304
Bowl 1 0.434782608696
Assume the <white> ball came from <Bowl 2>.

Current state: {u'Bowl 2': {u'white': 19, u'black': 20}, u'Bowl 1': {u'white': 10, u'black': 29}}
Drew a <black> ball
Bowl 2 0.472727272727
Bowl 1 0.527272727273
Assume the <black> ball came from <Bowl 1>.


Posterior probability after drawing <black,white,black>.

Probability table
Bowl 1	0.527272727273
Bowl 2	0.472727272727
```


#### Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

```python
from __future__ import division, unicode_literals

import logging

class Monty(Bayes):
    # Hypotheses: the car is behind Door 1, 2 or 3.
    # We picked Door 1; data is the door the host opens.
    def Likelihood(self, hypo, data):
        if hypo == "Door 1" and data == "Door 3":
            # Car behind our door: the host opens Door 2 or Door 3 at random
            return 0.5
        elif hypo == "Door 2" and data == "Door 3":
            # Car behind Door 2: the host must open Door 3
            return 1
        elif hypo == "Door 3" and data == "Door 3":
            # The host never opens the door with the car
            return 0
        else:
            logging.warning('Likelihood: Unknown hypos or data')
            raise ValueError('Unknown hypos or data')
            

hypos = ["Door 1", "Door 2", "Door 3"]

game = Monty(hypos)
print "\nPrior probability\n"
print game

dataL = ["Door 3"]

game.Update(dataL)
print "\nPosterior probability for each door after the host opens <%s>.\n" % ','.join(dataL)
print game
```

```
Prior probability

Probability table
Door 1	0.333333333333
Door 2	0.333333333333
Door 3	0.333333333333

Posterior probability for each door after the host opens <Door 3>.

Probability table
Door 1	0.333333333333
Door 2	0.666666666667
Door 3	0.0
```
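A quick Monte Carlo check of the 2/3 result, assuming the standard rules: the host always opens a door that is neither the contestant's pick nor the car, choosing uniformly at random when two such doors exist. Python 3, standard library only; the function name `switch_win_rate` is hypothetical.

```python
import random


def switch_win_rate(trials, seed=0):
    """Fraction of simulated games won by always switching."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a goat door that the contestant did not pick
        options = [d for d in range(3) if d != pick and d != car]
        opened = rng.choice(options)
        # Switch to the one remaining closed door
        switched = next(d for d in range(3) if d != pick and d != opened)
        wins += switched == car
    return wins / trials


print(switch_win_rate(100_000))  # close to 2/3
```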


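#### M&M's produced in different years contain different proportions of colors. A 1994 bag contains 30% brown, 20% yellow, 20% red, 10% green, 10% orange and 10% tan; a 1996 bag contains 13% brown, 14% yellow, 13% red, 20% green, 16% orange and 24% blue. Suppose I have two M&M's, one orange and one green; one came from a 1994 bag and the other from a 1996 bag. What is the probability that the orange one came from the 1994 bag?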

```python
# One solution; a bit messy, the relationships are not laid out cleanly yet
from __future__ import division, unicode_literals

import logging

class MM(Bayes):
    
    # Color mix (percent) of each year's bag
    mm_state = {'1994':dict(brown=30,yellow=20,red=20,green=10,orange=10,tan=10),
                '1996':dict(blue=24,green=20,orange=16,yellow=14,red=13,brown=13)}
    
    # For each hypothesis: which color came from which year's bag
    hyposD = {"A": {'1994':'orange', '1996':'green'}, "B":{'1994':'green','1996':'orange'}}
    #hyposD = {"A": {'1994':'yellow', '1996':'green'}, "B":{'1994':'green','1996':'yellow'}}
    
    def Likelihood(self, hypo, data):
        # Note: <data> is not used; the observation (one orange, one green)
        # is already encoded in hyposD
        likelihood = 1
        for hypo_cur, data_cur in self.hyposD[hypo].items():
            likelihood *= self.mm_state[hypo_cur][data_cur] / sum(self.mm_state[hypo_cur].values())
            #print hypo_cur, data_cur,likelihood
        return likelihood
            
            
## hypo A: orange from 1994, green from 1996
## hypo B: orange from 1996, green from 1994

hypos = ["A", "B"]

mm = MM(hypos)
print "\nPrior probability\n"
print mm

data = ['green-orange']

mm.Update(data)
print "\nPosterior probability\n"
print mm
```

```
Prior probability

Probability table
A	0.5
B	0.5

Posterior probability

Probability table
A	0.555555555556
B	0.444444444444
```

The probability that the orange M&M came from the 1994 bag is P(A) ≈ 0.556 (exactly 5/9).


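A second attempt introduces the two bags, bag1 and bag2, as intermediate variables. This cell uses a green M&M from bag1 and a yellow one from bag2 (the version of the problem in *Think Bayes*) instead of orange and green.

```python
from __future__ import division, unicode_literals

import logging

class MM(Bayes):
    
    # Color mix (percent) of the 1994 and 1996 bags
    mix1994 = dict(brown=30,yellow=20,red=20,green=10,orange=10,tan=10)
    mix1996 = dict(blue=24,green=20,orange=16,yellow=14,red=13,brown=13)
    
    # For each hypothesis: which mix each bag contains
    hyposD = {'A':{'bag1':mix1994,'bag2':mix1996}, 'B':{'bag1':mix1996,'bag2':mix1994}}
    #hyposD = {"A": {'1994':'yellow', '1996':'green'}, "B":{'1994':'green','1996':'yellow'}}
    
    def Likelihood(self, hypo, data):
        # data is a (color, bag) pair; a color missing from a mix
        # (e.g. blue in 1994) has probability 0
        color,bag = data
        mix = self.hyposD[hypo][bag]
        likelihood = mix.get(color, 0)/sum(mix.values())
        return likelihood
        
## Introduce two variables, bag1 and bag2, for the two bags
## Observed data: the green M&M came from one bag (bag1), the yellow one from the other (bag2)

dataL = [("green","bag1"),("yellow","bag2")]

## Hypotheses
## hypo A: bag1 is from 1994, bag2 is from 1996
## hypo B: bag1 is from 1996, bag2 is from 1994

## Here the hypotheses are stated in terms of the intermediate variables bag1
## and bag2 (is bag1 the 1994 bag or the 1996 bag?) rather than in terms of colors.
## This makes the observed data easier to handle.

hypos = ['A','B']

mm = MM(hypos)

mm.Update(dataL)

print mm
```

```
Probability table
A	0.259259259259
B	0.740740740741
```

Under B, bag2 (which held the yellow M&M) is the 1994 bag, so the yellow M&M came from the 1994 bag with probability about 0.741 (exactly 20/27).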


```python
from __future__ import division, unicode_literals

import logging

class MM(Bayes):
    
    # Color mix (percent) of the 1994 and 1996 bags
    mix1994 = dict(brown=30,yellow=20,red=20,green=10,orange=10,tan=10)
    mix1996 = dict(blue=24,green=20,orange=16,yellow=14,red=13,brown=13)
    
    # For each hypothesis: which mix each bag contains
    hyposD = {'A':{'bag1':mix1994,'bag2':mix1996}, 'B':{'bag1':mix1996,'bag2':mix1994}}
    #hyposD = {"A": {'1994':'yellow', '1996':'green'}, "B":{'1994':'green','1996':'yellow'}}
    
    def Likelihood(self, hypo, data):
        # data is a dict mapping color -> bag; multiply the probabilities
        # of all observed M&M's under this hypothesis
        likelihood = 1
        for color,bag in data.items():
            mix = self.hyposD[hypo][bag]
            likelihood *= mix.get(color, 0)/sum(mix.values())
        return likelihood

#mix1994 = dict(brown=30,yellow=20,red=20,green=10,orange=10,tan=10)
#mix1996 = dict(blue=24,green=20,orange=16,yellow=14,red=13,brown=13)
    
## Introduce two variables, bag1 and bag2, for the two bags
## Observed data: the green M&M came from one bag (bag1), the yellow one from the other (bag2)

dataL = [{"green":"bag1","yellow":"bag2"}]

## Hypotheses
## hypo A: bag1 is from 1994, bag2 is from 1996
## hypo B: bag1 is from 1996, bag2 is from 1994
hyposD = {'A':{'bag1':"mix1994",'bag2':"mix1996"}, 'B':{'bag1':"mix1996",'bag2':"mix1994"}}

## Here the hypotheses are stated in terms of the intermediate variables bag1
## and bag2 (is bag1 the 1994 bag or the 1996 bag?) rather than in terms of colors.
## This makes the observed data easier to handle.

hypos = ['A','B']

mm = MM(hypos)

mm.Update(dataL)

for hypo, describD in hyposD.items():
    print "%s: %s from %s, %s from %s" % (hypo, 'green',describD[dataL[0]['green']],'yellow',describD[dataL[0]['yellow']])

print mm
```

```
A: green from mix1994, yellow from mix1996
B: green from mix1996, yellow from mix1994
Probability table
A	0.259259259259
B	0.740740740741
```


**The M&M problem deserves another look.**
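One possible way to tidy up the M&M problem, sketched in Python 3 under the same two-hypothesis setup: the unknown is which year each bag came from, and the likelihood is the product of the probabilities of the observed (color, bag) pairs under each assignment. The names `MIX`, `HYPOS` and `mm_posterior` are hypothetical. With exact fractions, the two cases above give P(A) = 5/9 and P(A) = 7/27.

```python
from fractions import Fraction

# Color mix (percent) of each year's bag
MIX = {
    "1994": dict(brown=30, yellow=20, red=20, green=10, orange=10, tan=10),
    "1996": dict(blue=24, green=20, orange=16, yellow=14, red=13, brown=13),
}

# Each hypothesis says which year bag1 and bag2 come from
HYPOS = {"A": {"bag1": "1994", "bag2": "1996"},
         "B": {"bag1": "1996", "bag2": "1994"}}


def mm_posterior(observations):
    """observations: list of (color, bag) pairs, one M&M per bag."""
    unnorm = {}
    for hypo, years in HYPOS.items():
        weight = Fraction(1, 2)  # uniform prior over A and B
        for color, bag in observations:
            mix = MIX[years[bag]]
            # A color absent from this year's mix has probability 0
            weight *= Fraction(mix.get(color, 0), sum(mix.values()))
        unnorm[hypo] = weight
    total = sum(unnorm.values())
    return {hypo: w / total for hypo, w in unnorm.items()}


# Orange from bag1, green from bag2: P(orange came from 1994) = P(A)
print(mm_posterior([("orange", "bag1"), ("green", "bag2")]))
# Green from bag1, yellow from bag2
print(mm_posterior([("green", "bag1"), ("yellow", "bag2")]))
```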

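#### Suppose a drug test has a false-positive rate (i.e. 1 - specificity) and a false-negative rate (i.e. 1 - sensitivity) of 1% each, and about 0.5% of the population has taken the drug. If a randomly chosen person tests positive, what is the probability that they took the drug?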
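Written out with Bayes' theorem, using sensitivity 0.99, false-positive rate 0.01 and prevalence 0.005 from the problem statement:

$$
P(\text{drug} \mid +)
= \frac{P(+ \mid \text{drug})\,P(\text{drug})}{P(+ \mid \text{drug})\,P(\text{drug}) + P(+ \mid \text{no drug})\,P(\text{no drug})}
= \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995}
= \frac{0.00495}{0.0149} \approx 0.332
$$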

```python
class Test(Bayes):
    
    # Hypotheses: 'positive' = the person took the drug,
    #             'negative' = the person did not.
    # P(test result | hypothesis): sensitivity 0.99, false-positive rate 0.01
    testSta = {'positive':{'test_positive':0.99,'test_negative':0.01},
               'negative':{'test_positive':0.01,'test_negative':0.99}}
    
    def Likelihood(self, hypo, data):
        return self.testSta[hypo][data]

# Observed data
data = ["test_positive"]

# Hypotheses with their prior probabilities (prevalence 0.5%)
hyposD = {'positive':0.005, 'negative':0.995}

test = Test(hyposD)
print "Prior probability:\n"
print test

test.Update(data)
print "Posterior probability:\n"
print test
```

```
Prior probability:

Probability table
negative	0.995
positive	0.005
Posterior probability:

Probability table
negative	0.667785234899
positive	0.332214765101
```


```python
class Test(Bayes):
    
    # Raise sensitivity to 100%, i.e. lower the false-negative rate to 0
    testSta = {'positive':{'test_positive':1,'test_negative':0},
               'negative':{'test_positive':0.01,'test_negative':0.99}}
    
    def Likelihood(self, hypo, data):
        return self.testSta[hypo][data]

# Observed data
data = ["test_positive"]

# Hypotheses with their prior probabilities (prevalence 0.5%)
hyposD = {'positive':0.005, 'negative':0.995}

test = Test(hyposD)
print "Prior probability:\n"
print test

test.Update(data)
print "Posterior probability:\n"
print test
```

```
Prior probability:

Probability table
negative	0.995
positive	0.005
Posterior probability:

Probability table
negative	0.665551839465
positive	0.334448160535
```


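Raising sensitivity to 100% barely changed the posterior (from about 0.332 to about 0.334). Improving specificity instead, i.e. lowering the false-positive rate, makes a positive result much more informative: with only 0.5% of people taking the drug, most positive results come from the 99.5% who did not.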

```python
class Test(Bayes):
    
    # Raise specificity, i.e. lower the false-positive rate to 0.5%
    testSta = {'positive':{'test_positive':0.99,'test_negative':0.01},
               'negative':{'test_positive':0.005,'test_negative':0.995}}
    
    def Likelihood(self, hypo, data):
        return self.testSta[hypo][data]

# Observed data
data = ["test_positive"]

# Hypotheses with their prior probabilities (prevalence 0.5%)
hyposD = {'positive':0.005, 'negative':0.995}

test = Test(hyposD)
print "Prior probability:\n"
print test

test.Update(data)
print "Posterior probability:\n"
print test
```

```
Prior probability:

Probability table
negative	0.995
positive	0.005
Posterior probability:

Probability table
negative	0.501259445844
positive	0.498740554156
```
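A short Python 3 function that reproduces the three posteriors above from sensitivity, false-positive rate and prevalence, which makes it easy to try other values. The function name `p_user_given_positive` is hypothetical.

```python
def p_user_given_positive(sensitivity, false_positive_rate, prevalence):
    """P(took the drug | test positive) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                  # users who test positive
    false_pos = false_positive_rate * (1 - prevalence)   # non-users who test positive
    return true_pos / (true_pos + false_pos)


for label, sens, fpr in [("baseline", 0.99, 0.01),
                         ("sensitivity 100%", 1.0, 0.01),
                         ("false-positive rate 0.5%", 0.99, 0.005)]:
    print(label, round(p_user_given_positive(sens, fpr, 0.005), 4))
```

Changing the `prevalence` argument shows how strongly the answer depends on the base rate.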
