### Chapter 2. Computational Statistics

#### 2.1 Distributions

* `Pmf`: probability mass function

In [1]:
from python_code.thinkbayes import Pmf
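
The `Pmf` class comes from the local `python_code.thinkbayes` module, whose source is not shown here. As a rough mental model only, the calls used in this chapter (`Set`, `Incr`, `Mult`, `Normalize`, `Prob`) can be mimicked with a dictionary. `TinyPmf` below is a hypothetical stand-in written for illustration, not the library's implementation.

```python
class TinyPmf:
    """Hypothetical dict-backed stand-in for Pmf (illustration only)."""

    def __init__(self):
        self.d = {}  # maps each value to its (possibly unnormalized) weight

    def Set(self, value, prob):
        self.d[value] = prob

    def Incr(self, value, amount=1):
        self.d[value] = self.d.get(value, 0) + amount

    def Mult(self, value, factor):
        self.d[value] = self.d.get(value, 0) * factor

    def Normalize(self):
        # Rescale the weights so they sum to 1; return the old total.
        total = sum(self.d.values())
        if total == 0:
            raise ValueError("total weight is zero, cannot normalize")
        for value in self.d:
            self.d[value] /= total
        return total

    def Prob(self, value):
        # Values that were never stored have probability 0.
        return self.d.get(value, 0)


pmf = TinyPmf()
for face in range(1, 7):
    pmf.Incr(face)          # one count per face
total = pmf.Normalize()     # total is the sum of counts before rescaling
print(total, pmf.Prob(3))
```

Returning the pre-normalization total from `Normalize` is an assumption chosen to match the `276` and `0.625` outputs that appear later in this chapter.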

- Distribution of a six-sided die

In [2]:
pmf_cube = Pmf()
for x in range(1,7):
    pmf_cube.Set(x, 1/6)

total = 0
for x in range(1,7):
    print(x,':',pmf_cube.Prob(x))
    total += pmf_cube.Prob(x)
print('Total :', total)

1 : 0.16666666666666666
2 : 0.16666666666666666
3 : 0.16666666666666666
4 : 0.16666666666666666
5 : 0.16666666666666666
6 : 0.16666666666666666
Total : 0.9999999999999999
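
The total prints as `0.9999999999999999` rather than `1` because `1/6` has no exact binary floating-point representation, so the six additions accumulate rounding error. A small sketch using only the standard library shows the same sum with exact rationals, plus a tolerant comparison for the float version. The variable names are illustrative.

```python
import math
from fractions import Fraction

# Six equally likely faces, stored as exact rationals instead of floats.
probs = {face: Fraction(1, 6) for face in range(1, 7)}

exact_total = sum(probs.values())      # exact arithmetic, no rounding

float_total = 0.0
for p in probs.values():
    float_total += float(p)            # same loop as above, but in floats

print(exact_total)                     # 1
print(float_total)
print(math.isclose(float_total, 1.0))  # compare floats with a tolerance
```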


* Next, build a `Pmf` over the words in the lyrics of `Let it go`.

In [3]:
let_it_go = '''
The snow glows white on the mountain tonight,
not a footprint to be seen
A kingdom of isolation and it looks like I'm the Queen
The wind is howling like this swirling storm inside
Couldn't keep it in, Heaven knows I tried
Don't let them in, don't let them see
Be the good girl you always have to be
Conceal, don't feel, don't let them know
Well, now they know

Let it go, let it go
Can't hold it back anymore
Let it go, let it go
Turn away and slam the door
I don't care what they're going to say
Let the storm rage on
The cold never bothered me anyway

It's funny how some distance
makes everything seem small
And the fears that once controlled me
Can't get to me at all

It's time to see what I can do
To test the limits and break through
No right, no wrong, no rules for me,
I'm free!

Let it go, let it go
I am one with the wind and sky
Let it go, let it go
You'll never see me cry
Here I stand
And here I'll stay
Let the storm rage on

My power flurries through the air into the ground
My soul is spiraling in frozen fractals all around
And one thought crystallizes like an icy blast
I'm never going back, the past is in the past

Let it go, let it go
And I'll rise like the break of dawn
Let it go, let it go
That perfect girl is gone
Here I stand
In the light of day
Let the storm rage on
The cold never bothered me anyway
'''

In [4]:
lines = let_it_go.splitlines()
words = [word for line in lines if len(line) != 0 for word in line.split(' ') if len(word) != 0]

In [5]:
pmf_let_it_go = Pmf()
for word in words:
    pmf_let_it_go.Incr(word, 1)
pmf_let_it_go.Normalize()

276

In [6]:
pmf_let_it_go.Prob('Let')

0.03260869565217391
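
The value above is just the count of the token `Let` divided by the total number of tokens (`276`, the value returned by `Normalize`). Because the text is split on spaces only, `Let`, `let`, and `go,` are all distinct tokens. The sketch below uses `collections.Counter` on a two-line sample (not the full lyrics) to show the raw counting and one possible cleanup; the lowercasing and punctuation stripping are an assumption about what a cleaner tokenization might look like, not something the notebook does.

```python
import string
from collections import Counter

sample = "Let it go, let it go\nLet the storm rage on"

# Raw tokens, as in the notebook: case and punctuation are kept.
words = sample.split()
counts = Counter(words)
total = sum(counts.values())
probs = {word: count / total for word, count in counts.items()}
print(counts["Let"], total, probs["Let"])

# Optional cleanup: strip surrounding punctuation and lowercase each token.
clean_words = [w.strip(string.punctuation).lower() for w in words]
clean_counts = Counter(clean_words)
print(clean_counts["let"], clean_counts["go"])
```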

#### 2.2 The Cookie Problem

Solve the cookie problem from the previous chapter using `Pmf`.

- The prior probability of picking each of the two bowls is `0.5`.

In [21]:
pmf_cookie = Pmf()

pmf_cookie.Set('Bowl 1', 0.5)
pmf_cookie.Set('Bowl 2', 0.5)

* The probability of drawing a vanilla cookie is `3/4` from Bowl 1 and `1/2` from Bowl 2, so multiply each prior by that likelihood and normalize.

In [22]:
pmf_cookie.Mult('Bowl 1', 0.75)
pmf_cookie.Mult('Bowl 2', 0.5)

pmf_cookie.Normalize()

0.625

- Given that a vanilla cookie was drawn, what is the probability that it came from Bowl 1?
  - The result computed in Chapter 1 was `3/5`.

In [23]:
pmf_cookie.Prob('Bowl 1')

0.6000000000000001
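
The same number follows from Bayes's theorem written out by hand. Here $B_1$, $B_2$ denote the two bowls and $V$ the event of drawing a vanilla cookie; these symbols are introduced only for this derivation.

```latex
P(B_1 \mid V)
  = \frac{P(B_1)\,P(V \mid B_1)}{P(B_1)\,P(V \mid B_1) + P(B_2)\,P(V \mid B_2)}
  = \frac{0.5 \times 0.75}{0.5 \times 0.75 + 0.5 \times 0.5}
  = \frac{0.375}{0.625}
  = \frac{3}{5}
```

The denominator `0.625` is the value returned by `Normalize()` above, and `0.6000000000000001` differs from `3/5` only by floating-point rounding.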

#### 2.3 The Bayesian Framework for the Cookie Problem

In [29]:
class Cookie(Pmf):
    
    def __init__(self, hypos):
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
        
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
        
    def Likelihood(self, data, hypo):
        mix = self.mixes[hypo]
        like = mix[data]
        return like
    
    mixes = {
        'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
        'Bowl 2':dict(vanilla=0.5, chocolate=0.5)
    }

In [35]:
hypos = ['Bowl 1', 'Bowl 2']
cookie = Cookie(hypos)

In [36]:
cookie.Update('vanilla')

for hypo, prob in cookie.Items():
    print(hypo, prob)

Bowl 1 0.6000000000000001
Bowl 2 0.4


In [37]:
cookie.Update('chocolate')

for hypo, prob in cookie.Items():
    print(hypo, prob)

Bowl 1 0.4285714285714286
Bowl 2 0.5714285714285714
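
The two updates can be checked without `Pmf` at all. The sketch below repeats the multiply-then-normalize step on plain dictionaries; `bayes_update` is a hypothetical helper named for this example. As in the `Cookie` class, the bowl mixes are treated as unchanged between draws.

```python
def bayes_update(prior, likelihood):
    """Multiply each prior by its likelihood, then rescale to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


mixes = {
    "Bowl 1": {"vanilla": 0.75, "chocolate": 0.25},
    "Bowl 2": {"vanilla": 0.5, "chocolate": 0.5},
}
prior = {"Bowl 1": 0.5, "Bowl 2": 0.5}

# First draw: vanilla. Second draw: chocolate.
after_vanilla = bayes_update(prior, {h: mixes[h]["vanilla"] for h in prior})
after_both = bayes_update(after_vanilla, {h: mixes[h]["chocolate"] for h in prior})

print(after_vanilla)
print(after_both)  # Bowl 1 is about 3/7, Bowl 2 about 4/7
```

Since each update is a multiplication followed by rescaling, applying the chocolate update first and the vanilla update second gives the same posterior.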


#### 2.4 The Monty Hall Problem

- `__init__` and `Update` are identical to those in `Cookie`.
- Only `Likelihood` differs.

In [39]:
class Monty(Pmf):
    
    def __init__(self, hypos):
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
        
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
        
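    def Likelihood(self, data, hypo):
        # data: the door Monty opened; hypo: the door hiding the car.
        # The contestant is assumed to have picked door A.
        if hypo == data:
            # Monty never opens the door that hides the car.
            return 0
        elif hypo == 'A':
            # Car is behind the contestant's door: Monty picks B or C at random.
            return 0.5
        else:
            # Car is behind the other unpicked door: Monty has only one choice.
            return 1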

In [40]:
hypos = 'ABC'
monty = Monty(hypos)

monty.Update('B')

for hypo, prob in monty.Items():
    print(hypo, prob)

A 0.3333333333333333
C 0.6666666666666666
B 0.0
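
One way to build intuition for these numbers is a simulation. The sketch below assumes the standard rules encoded in `Likelihood`: the contestant always picks door A, and Monty opens a door that is neither the pick nor the car, choosing at random when he has two options. It then looks only at the games where Monty opened door B. The function name, seed, and trial count are arbitrary choices for this example.

```python
import random


def play(rng):
    """Play one game; return (door hiding the car, door Monty opens)."""
    doors = "ABC"
    car = rng.choice(doors)
    pick = "A"  # the contestant's fixed choice
    options = [d for d in doors if d != pick and d != car]
    opened = rng.choice(options)
    return car, opened


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
car_counts = {"A": 0, "B": 0, "C": 0}
opened_b = 0

for _ in range(trials):
    car, opened = play(rng)
    if opened == "B":  # condition on the observed data
        opened_b += 1
        car_counts[car] += 1

freq = {door: n / opened_b for door, n in car_counts.items()}
print(freq)  # roughly A: 1/3, B: 0, C: 2/3
```

Among the games where Monty opens B, the car is behind C about twice as often as behind A, which matches the posterior printed above.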


In [41]:
monty.Update('A')

for hypo, prob in monty.Items():
    print(hypo, prob)

A 0.0
C 1.0
B 0.0


Honestly, I do not fully understand this code yet... I will revisit it after finishing the book.

#### 2.5 Encapsulating the Framework

Factor the parts shared by the classes above into a `Suite` class.

In [42]:
class Suite(Pmf):
    
    def __init__(self, hypos):
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
        
    def Update(self, data):
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
        
    def Print(self):
        for hypo, prob in self.Items():
            print(hypo, prob)

Reimplement `Monty` on top of `Suite`.

In [43]:
class Monty(Suite):
    
    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1

In [44]:
monty = Monty('ABC')
monty.Update('B')
monty.Print()

A 0.3333333333333333
C 0.6666666666666666
B 0.0


#### 2.6 The M&M Problem

- Hypothesis A: bag 1 is from 1994 and bag 2 is from 1996.
- Hypothesis B: the reverse.

In [46]:
mix94 = dict(brown=30, yellow=20, red=20, green=10, orange=10, tan=10)
mix96 = dict(blue=24, green=20, orange=16, yellow=14, red=13, brown=13)

hypoA = dict(bag1=mix94, bag2=mix96)
hypoB = dict(bag1=mix96, bag2=mix94)
hypotheses = dict(A=hypoA, B=hypoB)

In [47]:
class M_and_M(Suite):
    
    def Likelihood(self, data, hypo):
        bag, color = data
        mix = hypotheses[hypo][bag]
        like = mix[color]
        return like

In [48]:
mm = M_and_M('AB')
mm.Update(('bag1','yellow'))
mm.Update(('bag2','green'))

mm.Print()

A 0.7407407407407407
B 0.2592592592592592
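
The result can be reproduced by hand. Under hypothesis A, the yellow M&M from bag 1 comes from the 1994 mix (`20`) and the green one from bag 2 comes from the 1996 mix (`20`); under hypothesis B the mixes are swapped (`14` and `10`). Here $D$ stands for both observations together, a symbol introduced only for this derivation.

```latex
P(A \mid D)
  = \frac{\tfrac{1}{2} \times (20 \times 20)}
         {\tfrac{1}{2} \times (20 \times 20) + \tfrac{1}{2} \times (14 \times 10)}
  = \frac{400}{400 + 140}
  = \frac{20}{27}
  \approx 0.7407
```

The mix values are percentages rather than probabilities, but the common scale factor, like the equal priors, cancels when the result is normalized.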
