In [2]:
import pandas as pd

gss = pd.read_csv('gss_bayes.csv', index_col=0)
gss.head()

        year   age  sex  polviews  partyid  indus10
caseid
1       1974  21.0    1       4.0      2.0   4970.0
2       1974  41.0    1       5.0      0.0   9160.0
5       1974  58.0    2       6.0      1.0   2670.0
6       1974  30.0    1       5.0      4.0   6870.0
7       1974  48.0    1       5.0      4.0   7860.0


Once the dataset is imported, we can focus only on bankers:

In [3]:
bankers = (gss['indus10'] == 6870)  # 6870 is the indus10 code for banking
bankers.head()

numOfBankers = bankers.sum()  # sum() counts True as 1 and False as 0
fractionOfBankers = bankers.mean()  # the mean of a boolean Series is the fraction of True values
print("Probability of a banker: ", round(fractionOfBankers*100, 2), "%")

Probability of a banker:  1.48 %
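The reason `mean()` gives a fraction can be seen on a tiny hand-made Series. The values below are made up and only stand in for `gss['indus10']`. Comparing with `==` produces a boolean Series. `sum()` counts the `True` values and `mean()` divides that count by the number of rows.

```python
import pandas as pd

# Hypothetical toy column standing in for gss['indus10'] (not GSS data)
indus10 = pd.Series([6870, 4970, 6870, 9160, 2670])

is_banker = indus10 == 6870               # True, False, True, False, False
print(is_banker.sum())                    # True counts as 1 -> 2
print(is_banker.mean())                   # 2 / 5 -> 0.4
print(is_banker.sum() / len(is_banker))   # the same fraction computed by hand
```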


In [4]:
def probability(A: pd.Series) -> float:
    """Fraction of rows where the boolean Series A is True."""
    return A.mean()

probability(bankers)

0.014769730168391155

In [5]:
female = (gss['sex'] == 2)
probability(female)

0.5378575776019476

Now let's analyze the political views:

In [6]:
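liberal = (gss['polviews'] <= 3)  # polviews 1-3: extremely liberal, liberal, slightly liberal
democrat = (gss['partyid'] <= 1)  # partyid 0-1: strong or not strong Democrat

probability(liberal)
probability(democrat)  # a cell only displays its last expression, so the output below is P(democrat)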

0.3662609048488537

Let's analyze the conjunction of the events:

In [7]:
probability(bankers & democrat)  # & is the element-wise AND of two boolean Series

0.004686548995739501
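The sketch below uses made-up toy Series to show why the notebook uses `&` rather than Python's `and`. `&` combines two boolean Series row by row. `and` tries to reduce a whole Series to a single `True`/`False`, and pandas refuses to do that. `&` also binds more tightly than `==`, which is why each comparison has to sit inside its own parentheses.

```python
import pandas as pd

# Hypothetical toy columns (not GSS data), aligned on the same index
banker = pd.Series([True, True, False, False])
democrat = pd.Series([True, False, True, False])
sex = pd.Series([2, 1, 2, 1])

both = banker & democrat            # element-wise AND, row by row
print(both.tolist())                # [True, False, False, False]
print(both.mean())                  # 1 of 4 rows -> 0.25

# & binds more tightly than ==, so the comparison needs parentheses
female_banker = (sex == 2) & banker
print(female_banker.tolist())       # [True, False, False, False]

# Python's `and` asks for the truth value of a whole Series, which pandas refuses
try:
    banker and democrat
except ValueError as err:
    print("ValueError:", err)
```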

Now let's compute the probability that a respondent is a Democrat, given that they are liberal:

In [8]:
selected = democrat[liberal]  # boolean indexing: keep only the rows where liberal is True
probability(selected)

0.5206403320240125
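Here is a small illustration of what `democrat[liberal]` does, using made-up toy Series. Indexing with a boolean Series keeps only the rows where the mask is `True`. The mean of what remains is the fraction of liberals who are Democrats. This equals the count of "Democrat and liberal" divided by the count of "liberal".

```python
import pandas as pd

# Hypothetical toy columns sharing the same index (not GSS data)
liberal = pd.Series([True, True, True, False, False])
democrat = pd.Series([True, False, True, True, False])

selected = democrat[liberal]       # keeps rows 0, 1, 2, where liberal is True
print(selected.tolist())           # [True, False, True]
print(selected.mean())             # 2 of 3 liberals are Democrats

# The same fraction from counts: n(democrat and liberal) / n(liberal)
print((democrat & liberal).sum() / liberal.sum())
```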

Fraction of bankers that are female:

In [9]:
selected = female[bankers]
probability(selected)

0.7706043956043956

In [10]:
def conditionalProbability(proposition: pd.Series, given: pd.Series) -> float:
    # keep only the rows where `given` is True, then measure how many of them satisfy `proposition`
    return probability(proposition[given])

print(conditionalProbability(female, bankers))  # fraction of bankers who are female
print(conditionalProbability(bankers, female))  # fraction of females who are bankers

0.7706043956043956
0.02116102749801969


We can see that conditional probability is not symmetric:
- about 77% of bankers are female;
- about 2% of females are bankers.
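Both directions of a conditional probability can also be read from one contingency table. This sketch uses pandas `crosstab` on made-up toy data; it is an alternative to `conditionalProbability`, not the notebook's own method. With `normalize='columns'` each column sums to 1, so a cell is P(row value | column value). With `normalize='index'` each row sums to 1, so a cell is P(column value | row value).

```python
import pandas as pd

# Hypothetical toy data (not GSS): 8 respondents
toy = pd.DataFrame({
    'sex': ['male', 'female', 'female', 'male', 'female', 'female', 'male', 'female'],
    'job': ['bank', 'bank', 'bank', 'other', 'other', 'bank', 'other', 'other'],
})

# Each column sums to 1: cell (s, j) = P(sex = s | job = j)
by_job = pd.crosstab(toy['sex'], toy['job'], normalize='columns')
print(by_job.loc['female', 'bank'])   # P(female | banker): 3 of 4 bankers

# Each row sums to 1: cell (s, j) = P(job = j | sex = s)
by_sex = pd.crosstab(toy['sex'], toy['job'], normalize='index')
print(by_sex.loc['female', 'bank'])   # P(banker | female): 3 of 5 women
```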

In [13]:
print(conditionalProbability(female,given = liberal & democrat))
print(conditionalProbability(liberal & female, given=bankers))

0.576085409252669
0.17307692307692307


Here we can see that:
- about 58% of liberal Democrats are female;
- about 17% of bankers are liberal women.

# Theorem 1 of probability
Conditional probability can also be computed from the probability of a conjunction. The fraction of females among bankers equals the probability of being both female and a banker, divided by the probability of being a banker:
$$P(Female|Bankers) = \frac{P(\text{Female and Bankers})}{P(Bankers)}$$

In [14]:
print(female[bankers].mean()) # equal to conditionalProbability(female,bankers)
print(probability(female & bankers) / probability(bankers))

0.7706043956043956
0.7706043956043956
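Why the two outputs agree: `probability` gives every respondent equal weight, so each probability is a count divided by the total. The notation here is introduced only for this step: $n(\cdot)$ is the number of respondents satisfying a proposition and $N$ is the total number of respondents. The $N$ cancels:

$$P(Female|Bankers) = \frac{n(\text{Female and Bankers})}{n(Bankers)} = \frac{n(\text{Female and Bankers})/N}{n(Bankers)/N} = \frac{P(\text{Female and Bankers})}{P(Bankers)}$$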


# Theorem 2 of probability
To calculate the probability of the conjunction $P(\text{liberal and democrats})$, we can rewrite Theorem 1 with liberal in place of Female and democrats in place of Bankers. Multiplying both sides by $P(democrats)$ then gives:
$$P(liberal|democrats)\times P(democrats) = P(\text{liberal and democrats})$$

In [17]:
print(probability(liberal & democrat))
print(probability(democrat) * conditionalProbability(liberal,democrat))

0.1425238385067965
0.1425238385067965
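The same check can be run on made-up toy data with self-contained copies of the notebook's helpers; `conditional` is a shortened stand-in for `conditionalProbability`. The two sides can differ in the last few digits of floating-point arithmetic, so the sketch compares them with `math.isclose` rather than `==`.

```python
import math
import pandas as pd

def probability(A: pd.Series) -> float:
    return A.mean()

def conditional(proposition: pd.Series, given: pd.Series) -> float:
    # hypothetical short name for the notebook's conditionalProbability
    return probability(proposition[given])

# Hypothetical toy columns (not GSS data)
liberal = pd.Series([True, False, True, True, False, False])
democrat = pd.Series([True, True, True, False, False, True])

lhs = probability(liberal & democrat)                          # 2 of 6 rows
rhs = probability(democrat) * conditional(liberal, democrat)   # (4/6) * (2/4)
print(lhs, rhs)
print(math.isclose(lhs, rhs))                                  # True
```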


# Theorem 3 of probability (Bayes's Theorem)
Since conjunction is commutative, $P(\text{A and B}) = P(\text{B and A})$. Applying Theorem 2 to each side gives:
$$P(A|B)\times P(B) = P(B|A)\times P(A)$$
Dividing both sides by $P(B)$ gives Bayes's Theorem:
$$P(A|B) = \frac{P(B|A)\times P(A)}{P(B)}$$

In [18]:
print(conditionalProbability(liberal, given=bankers))
print(conditionalProbability(bankers,given=liberal)*probability(liberal)/probability(bankers))

0.2239010989010989
0.2239010989010989
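Bayes's Theorem can also be applied to plain numbers without any DataFrame. This is a minimal sketch with a hypothetical helper `bayes` and made-up inputs. With A = "banker" and B = "female", it turns P(female | banker) into P(banker | female).

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Hypothetical helper: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b

# Made-up inputs (not GSS results):
# P(female | banker) = 0.75, P(banker) = 0.5, P(female) = 0.625
p_banker_given_female = bayes(0.75, 0.5, 0.625)
print(p_banker_given_female)   # 0.6
```

The check on `p_b` is a design choice: dividing by zero would otherwise raise a less informative `ZeroDivisionError`.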


# Law of total probability
The law says:
$$P(A) = P(B_{1} \text{ and } A) + P(B_{2} \text{ and } A)$$
This holds only if $B_{1}$ and $B_{2}$ are:
- mutually exclusive, which means that at most one of them can be true;
- collectively exhaustive, which means that at least one of them must be true.

In [20]:
male = (gss['sex'] == 1)

print(probability(bankers))
print(probability(bankers & male) + probability(bankers & female))

# Using theorem 2
print(conditionalProbability(bankers, given=male) * probability(male)
      + conditionalProbability(bankers, given=female) * probability(female))

0.014769730168391155
0.014769730168391155
0.014769730168391153
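The law only holds when the conditions above are met, and these can be tested directly on boolean Series. This sketch uses a hypothetical helper `is_partition` and made-up toy data. It checks that exactly one event is `True` on every row, which means the events are both mutually exclusive and collectively exhaustive. The last digits of the outputs above differ, so the totals are compared with `math.isclose`.

```python
import math
import pandas as pd

def is_partition(*events: pd.Series) -> bool:
    """Hypothetical helper: True if exactly one of the boolean Series holds on every row."""
    counts = sum(e.astype(int) for e in events)   # number of events true on each row
    return bool((counts == 1).all())

# Hypothetical toy data (not GSS)
sex = pd.Series([1, 2, 2, 1, 2, 2, 1, 2])
banker = pd.Series([True, True, True, False, False, True, False, False])
male, female = sex == 1, sex == 2

print(is_partition(male, female))    # True: a valid partition
print(is_partition(male, banker))    # False: these overlap and miss some rows

total = banker.mean()
split = (banker & male).mean() + (banker & female).mean()
print(math.isclose(total, split))    # True
```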


When there are more than two mutually exclusive and collectively exhaustive propositions $B_i$, combining the law with Theorem 2 gives a summation:
$$P(A) = \sum_{i}P(B_i)\times P(A|B_i)$$

In [22]:
politicalViews = gss['polviews']
politicalViews.value_counts().sort_index()

1.0     1442
2.0     5808
3.0     6243
4.0    18943
5.0     7940
6.0     7319
7.0     1595
Name: polviews, dtype: int64

In [23]:
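# Law of total probability over the 7 polviews categories.
# The mean of (indicator of view i) * P(banker | view i) equals P(view i) * P(banker | view i)
sum([probability((politicalViews == i) * conditionalProbability(bankers, (politicalViews == i)))
     for i in range(1, 8)])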

0.014769730168391157
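The same sum can be written without a loop; this is an alternative formulation, not the notebook's code. `value_counts(normalize=True)` gives P(view = i), and `groupby(...).mean()` on a boolean column gives P(banker | view = i). Multiplying the two Series aligns them on the category labels. The sketch runs on made-up toy data. Note that `value_counts` drops missing values by default, so the categories form a partition only if no row has a missing view.

```python
import math
import pandas as pd

# Hypothetical toy data (not GSS): political view on a 1-7 scale and a banker flag
toy = pd.DataFrame({
    'polviews': [1, 2, 2, 4, 4, 4, 6, 7],
    'banker':   [False, True, False, True, False, False, True, False],
})

p_view = toy['polviews'].value_counts(normalize=True)            # P(view = i)
p_banker_given_view = toy.groupby('polviews')['banker'].mean()   # P(banker | view = i)

# sum over i of P(view = i) * P(banker | view = i), aligned on the view labels
total = (p_view * p_banker_given_view).sum()
print(total, toy['banker'].mean())
print(math.isclose(total, toy['banker'].mean()))   # True
```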