# Source: http://allendowney.github.io/ThinkBayes2/chap04.html

Whenever you survey people about sensitive issues, you have to deal with social desirability bias, which is the tendency of people to adjust their answers to show themselves in the most positive light. One way to improve the accuracy of the results is randomized response.

As an example, suppose you want to know how many people cheat on their taxes. If you ask them directly, it is likely that some of the cheaters will lie. You can get a more accurate estimate if you ask them indirectly, like this: Ask each person to flip a coin and, without revealing the outcome,

If they get heads, they report YES.

If they get tails, they honestly answer the question “Do you cheat on your taxes?”

If someone says YES, we don’t know whether they actually cheat on their taxes; they might have flipped heads. Knowing this, people might be more willing to answer honestly.
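
To make the mechanism concrete, here is a small simulation sketch of the coin-flip survey. The function name `simulate_survey` and the value `true_fraction = 0.6` are hypothetical choices for illustration only, not part of the original exercise. It checks that the observed YES rate is close to 0.5 + x/2, where x is the true fraction of cheaters: half of the respondents flip heads and say YES regardless, and the other half say YES only if they cheat.

```python
import numpy as np

def simulate_survey(true_fraction, n, rng):
    """Simulate n randomized responses and return the number of YES answers."""
    cheats = rng.random(n) < true_fraction   # who actually cheats
    heads = rng.random(n) < 0.5              # one fair coin flip per respondent
    # Heads always reports YES; tails reports the honest answer.
    says_yes = heads | cheats
    return int(says_yes.sum())

rng = np.random.default_rng(0)
true_fraction = 0.6          # hypothetical value, chosen only for illustration
n = 100_000
yes_rate = simulate_survey(true_fraction, n, rng) / n
expected = 0.5 + true_fraction / 2
print(yes_rate, expected)    # the two numbers should be close
```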

Suppose you survey 100 people this way and get 80 YESes and 20 NOs. Based on this data, what is the posterior distribution for the fraction of people who cheat on their taxes? What is the most likely quantity in the posterior distribution?

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import beta

In [None]:
!pip install empiricaldist
from empiricaldist import Pmf

In [None]:
a=1;b=1 #beta(1,1)
hypos=np.linspace(0,1,101)
x=beta.pdf(hypos,a,b)
prior=Pmf(x,hypos)

In [None]:
plt.plot(hypos,x) #prior plot

In [None]:
likelihood={'Y':0.5+hypos/2, 'N':(1-hypos)/2} #Q1 WHY?? IS THIS BINOMIAL MODEL?

# Q1: Explain why the likelihood function takes this form. Is this sampling model a binomial model? Why or why not?

In [None]:
dataset='Y'*80+'N'*20 #DATA 80 YES, 20 NO
posterior1=prior.copy()
for data in dataset:
    posterior1 *= likelihood[data]
posterior1.normalize()
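
The loop above multiplies the prior by one likelihood per response. As a sketch, the same update can be written in one step with plain NumPy, because the order of the responses does not matter and only the counts do. This version assumes a flat prior (matching beta(1,1)) and uses arrays instead of `Pmf`; the names `like_yes`, `like_no`, and `unnorm` are introduced here for illustration.

```python
import numpy as np

hypos = np.linspace(0, 1, 101)
prior = np.ones_like(hypos)          # flat prior, matching beta(1, 1)
like_yes = 0.5 + hypos / 2           # P(YES | fraction of cheaters)
like_no = (1 - hypos) / 2            # P(NO | fraction of cheaters)

n_yes, n_no = 80, 20
# Multiply the prior by each likelihood raised to its count, then normalize.
unnorm = prior * like_yes**n_yes * like_no**n_no
posterior = unnorm / unnorm.sum()

mode = hypos[np.argmax(posterior)]   # grid point with the highest probability
print(mode)
```

With `empiricaldist`, the mode of `posterior1` should be available through `posterior1.max_prob()`.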

In [None]:
posterior1.plot(label='80 YES, 20 NO',xlabel='Proportion of cheaters',ylabel='PMF')
plt.legend() #Q2 WHY MODE ON 0.6? DATA IS 80 YES & 20 NO. WHY NOT 0.8?

# Q2: Why does the posterior distribution have this shape? Explain in terms of the shape of the prior and the data. Also, the data has a YES-to-NO ratio of 4:1, so why is the posterior mode 0.6 rather than 0.8?

In [None]:
dataset='Y'*800+'N'*200 #DATA 800 YES, 200 NO
posterior2=prior.copy()
for data in dataset:
    posterior2 *= likelihood[data]
posterior2.normalize()
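
With 1000 responses the unnormalized products become very small (on the order of 0.5**1000 at some grid points), which is still representable but close to the limits of floating point. For larger datasets, one common workaround is to do the update in log space. The following is a minimal sketch under the same flat-prior assumption, using NumPy arrays rather than `Pmf`; it is not the method used in the original notebook.

```python
import numpy as np

hypos = np.linspace(0, 1, 101)
log_prior = np.zeros_like(hypos)     # log of a flat prior

n_yes, n_no = 800, 200
# log(0) at hypos == 1 gives -inf, which is the intended "impossible" value.
with np.errstate(divide="ignore"):
    log_like = n_yes * np.log(0.5 + hypos / 2) + n_no * np.log((1 - hypos) / 2)

log_post = log_prior + log_like
log_post -= log_post.max()           # shift so the largest term becomes exp(0) = 1
posterior = np.exp(log_post)
posterior /= posterior.sum()         # normalize so the probabilities sum to 1
```

Note that this sketch assumes `n_no > 0`; with `n_no = 0` the term `0 * log(0)` would produce NaN and would need special handling.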

In [None]:
posterior2.plot(label='800 YES, 200 NO',xlabel='Proportion of cheaters',ylabel='PMF')
plt.legend()

In [None]:
posterior1.plot(label='80 YES, 20 NO')
posterior2.plot(label='800 YES, 200 NO',xlabel='Proportion of cheaters',ylabel='PMF')
plt.legend() #Q3 DIFFERENCE? WHY?

# Q3: Why do the two posterior distributions have such different shapes?

# Q4: Experiment by changing the prior and the data as you like, and observe how the posterior distribution changes.

In [None]:
#Q4 Experiment by changing the prior and the data as you like, and observe how the posterior distribution changes.
a=;b= #a,b SET YOUR OWN PRIOR beta(a,b)
hypos=np.linspace(0,1,101)
x=beta.pdf(hypos,a,b)
prior=Pmf(x,hypos)

In [None]:
plt.plot(hypos,x) # YOUR OWN PRIOR

In [None]:
dataset='Y'* +'N'* #GIVE THE DATA
posterior3=prior.copy()
for data in dataset:
    posterior3 *= likelihood[data]
posterior3.normalize()

In [None]:
posterior3.plot(label=' YES,  NO',xlabel='Proportion of cheaters',ylabel='PMF')
plt.legend()

In [None]:
a=;b= #a,b SET YOUR OWN PRIOR 
hypos=np.linspace(0,1,101)
x=beta.pdf(hypos,a,b)
prior=Pmf(x,hypos)
plt.plot(hypos,x) # YOUR OWN PRIOR

In [None]:
dataset='Y'* +'N'* #GIVE THE DATA
posterior4=prior.copy()
for data in dataset:
    posterior4 *= likelihood[data]
posterior4.normalize()

In [None]:
posterior4.plot(label=' YES,  NO',xlabel='Proportion of cheaters',ylabel='PMF')
plt.legend()

In [None]:
posterior3.plot(label=' YES,  NO')
posterior4.plot(label=' YES,  NO',xlabel='Proportion of cheaters',ylabel='PMF')
plt.legend()

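%%%As some of you may have noticed, the area under the posterior curve is not 1. We did not multiply a continuous prior density by the likelihood; instead, we used the pdf(x) values at the points x given by linspace(0,1,101) as the prior, so strictly speaking the prior is not a proper distribution. Still, multiplying it by the likelihood gives the shape of the posterior, and the point of the exercise is to compare posterior shapes across different priors and datasets. (What we have is the posterior in discrete form, and because it has been normalized, sum(posterior) does come out to 1.)%%%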
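
As a sketch of the point in the note above: a normalized PMF on a grid sums to 1, while a curve whose area is 1 is obtained by dividing the PMF by the grid spacing. The example below assumes a flat prior and the 80 YES / 20 NO data, and uses plain NumPy arrays; the names `pmf`, `density`, and `area` are introduced here for illustration.

```python
import numpy as np

hypos = np.linspace(0, 1, 101)
dx = hypos[1] - hypos[0]             # grid spacing, 0.01

unnorm = (0.5 + hypos / 2)**80 * ((1 - hypos) / 2)**20
pmf = unnorm / unnorm.sum()          # sums to 1, like a normalized Pmf

density = pmf / dx                   # approximate posterior density on the grid
# Trapezoidal rule for the area under the density curve
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(hypos))
print(pmf.sum(), area)               # both should be very close to 1
```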

In [None]:
sum(posterior1)