A and B each play a coin-flip game: on heads the player gets back 3 times the bet (a net gain of twice the bet); on tails the bet is lost.

A bets 1/3 of her wealth every time, B bets 1/2 of his wealth every time. Starting from a wealth of 1, what is each player's expected wealth after 100 games?

In [34]:
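import random

N_PATHS, N_GAMES = 10000, 100

a, b = [], []
for _ in range(N_PATHS):
    A, B = 1, 1  # both players start with wealth 1
    for _ in range(N_GAMES):
        if random.randint(0, 1) == 1:  # heads: both win
            A = A * 5/3  # stake 1/3, get back 3 * 1/3
            B = B * 2    # stake 1/2, get back 3 * 1/2
        else:            # tails: both lose their stake
            A = A * 2/3
            B = B * 1/2
    a.append(A)
    b.append(B)

sum(a)/N_PATHS, sum(b)/N_PATHS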


(5795620.188276811, 43002225.614707455)

In [33]:
(5/4)**100

4909093465.297727
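
As a cross-check on the simulation, the expectation can also be computed exactly by summing over the number of heads, which is binomial. This is only a minimal sketch: the helper name `expected_wealth` is made up for illustration, and it assumes the per-game multipliers used in the simulation (5/3 and 2/3 for A, 2 and 1/2 for B).

```python
from math import comb

def expected_wealth(up, down, n_games):
    """Exact E[wealth] after n_games fair coin flips, starting from wealth 1.

    `up` and `down` are the wealth multipliers on a win and on a loss.
    """
    total = sum(comb(n_games, k) * up**k * down**(n_games - k)
                for k in range(n_games + 1))
    return total / 2**n_games

exp_a = expected_wealth(5/3, 2/3, 100)  # equals (7/6)**100 up to rounding
exp_b = expected_wealth(2, 1/2, 100)    # equals (5/4)**100 up to rounding
print(exp_a, exp_b)
```

The simulated mean for B (about 4.3e7) is far below (5/4)^100 (about 4.9e9) because B's wealth is extremely right-skewed: the expectation is driven by rare paths with many more heads than tails, which 10000 sample paths almost never contain.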

Throw a die until a 6 appears. If the first six comes on the nth roll, the payoff is n/6.

What is the fair value?

1/6 * 1/6 + 5/6 * 1/6 * 2/6 + (5/6)^2 * 1/6 * 3/6 + ... = E[n]/6 = 6/6 = 1, since n is geometric with mean 6.

In [39]:
pays = []
N = 100000
for _ in range(N):
    count = 1  # number of rolls, including the final six
    while random.randint(1, 6) != 6:
        count += 1
    pays.append(count / 6)
sum(pays)/N


1.0005883333334071
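
The series for the fair value can also be summed numerically instead of simulated. A small sketch, with a hypothetical helper `fair_value` and an arbitrary cutoff of 1000 terms (the tail beyond that is negligible):

```python
def fair_value(n_terms=1000):
    """Partial sum of sum_n P(first six on roll n) * n/6."""
    return sum((5/6)**(n - 1) * (1/6) * (n / 6) for n in range(1, n_terms + 1))

print(fair_value())
```

The result should agree with the closed-form answer of 1 up to floating-point rounding.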

The chance of rain is 0.2 on Saturday and 0.2 on Sunday.

The correlation between rain on Saturday and rain on Sunday is 0.6.

What is the chance that it rains on at least one day?


Let X and Y be the indicators of rain on Saturday and Sunday.

Corr(X, Y) = (E(XY) - E(X)E(Y)) / sqrt(Var(X) * Var(Y)) = 0.6

Note Var(X) = Var(Y) = 0.2 * 0.8 = 0.16

so E(XY) - E(X)E(Y) = 0.6 * 0.16 = 0.096

Note E(X)E(Y) = 0.2^2 = 0.04, so E(XY) = 0.136

E(XY) is the chance that it rains on both days, so the chance that it rains on at least one day is 0.2 + 0.2 - 0.136 = 0.264

In [44]:
import random

p = 0.2  # Probability of rain on each day (Saturday and Sunday)
correlation = 0.6  # Correlation between raining on Saturday and Sunday
N = 1000000  # Number of simulations

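def simulate_rain(p, correlation):
    """Draw (rain_saturday, rain_sunday) with marginal p and the given correlation."""
    rain_saturday = random.random() < p
    
    # Conditional Sunday probabilities chosen so that P(Sunday) = p
    # and Corr(Saturday, Sunday) = correlation
    if rain_saturday:
        adjusted_p_sunday = p + correlation * (1 - p)
    else:
        adjusted_p_sunday = p - correlation * p
    
    rain_sunday = random.random() < adjusted_p_sunday
    return rain_saturday, rain_sunday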

at_least_one_day_rain = 0

for _ in range(N):
    rain_saturday, rain_sunday = simulate_rain(p, correlation)
    if rain_saturday or rain_sunday:
        at_least_one_day_rain += 1

expected_probability = at_least_one_day_rain / N
expected_probability


0.264858
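
The simulated value can be compared with the closed-form derivation above. The following is a minimal sketch of that derivation as a function; the name `prob_at_least_one` and its signature are illustrative, not taken from any library.

```python
from math import sqrt

def prob_at_least_one(p_x, p_y, corr):
    """P(X or Y) for two Bernoulli events with given marginals and correlation."""
    cov = corr * sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    p_both = cov + p_x * p_y      # E[XY] = Cov(X, Y) + E[X]E[Y]
    return p_x + p_y - p_both     # inclusion-exclusion

print(prob_at_least_one(0.2, 0.2, 0.6))
```

With correlation 0 this reduces to the independent case, 0.2 + 0.2 - 0.04 = 0.36.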

# Doubling the data points in linear regression

If we duplicate every data point in a linear regression, what happens to beta, R^2 and the t-statistic?

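beta = cov(x, y) / var(x)

Doubling the data means every observation appears twice, so the sample means are unchanged and every term in the sums appears twice. The new covariance is

$$
\text{Cov}_{\text{new}}(x, y) = \frac{1}{2n} \sum_{i=1}^{2n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{2n} \cdot 2 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \text{Cov}(x, y)
$$

Similarly the new var stays the same, so beta stays the same.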


R^2 = 1 - RSS/TSS, where TSS = sum (y_i - ybar)^2. Both TSS and RSS double, so R^2 stays the same.

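The t-statistic and the standard error are

$$
t = \frac{\hat{\beta}_1}{\text{SE}(\hat{\beta}_1)},
\qquad
\text{SE}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum (x_i - \bar{x})^2}}
$$

The sum in the denominator doubles, while $\hat{\sigma}$ is almost unchanged ($\hat{\sigma}^2 = \text{RSS}/(n-2)$ becomes $2\,\text{RSS}/(2n-2)$). So the SE shrinks by a factor of about $1/\sqrt{2}$, and the t-stat grows by a factor of about $\sqrt{2}$.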
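
A quick numerical check of these three claims, as a sketch only. The helper `ols_summary` is a hand-rolled simple regression written for this example, and the synthetic data (slope 2, unit noise, n = 50) is an arbitrary assumption. With the usual residual variance RSS/(n-2), the exact t-stat ratio works out to sqrt(2(n-1)/(n-2)), which is close to sqrt(2).

```python
import math
import random

def ols_summary(x, y):
    """Return (slope, R^2, t-statistic of the slope) for the regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta, 1 - rss / tss, beta / se

random.seed(0)
x = [random.gauss(0, 1) for _ in range(50)]
y = [2 * xi + random.gauss(0, 1) for xi in x]

beta_1, r2_1, t_1 = ols_summary(x, y)
beta_2, r2_2, t_2 = ols_summary(x + x, y + y)  # every point duplicated

n = len(x)
print("beta:", beta_1, beta_2)
print("R^2: ", r2_1, r2_2)
print("t ratio:", t_2 / t_1, "expected:", math.sqrt(2 * (n - 1) / (n - 2)))
```

The inflated t-statistic is an artifact: the duplicated rows carry no new information, so the apparent gain in significance is spurious.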

# y ~ x vs x ~ y

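Let beta_1 be the slope coefficient of y ~ x, and alpha_1 the slope coefficient of x ~ y.

$$
\begin{aligned}
\beta_1 &= \frac{\text{Cov}(x, y)}{\text{Var}(x)} \\
\alpha_1 &= \frac{\text{Cov}(x, y)}{\text{Var}(y)} \\
\beta_1 \alpha_1 &= \frac{\text{Cov}(x, y)^2}{\text{Var}(x)\,\text{Var}(y)} = \text{Corr}(x, y)^2
\end{aligned}
$$

So their product always lies between 0 and 1.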
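
The identity is easy to confirm numerically. A minimal sketch, where `slope` and `corr_squared` are helpers written for this example and the synthetic data is an arbitrary assumption:

```python
import random

def slope(x, y):
    """OLS slope of the regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def corr_squared(x, y):
    """Squared sample correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

product = slope(x, y) * slope(y, x)  # beta_1 * alpha_1
print(product, corr_squared(x, y))
```

In particular, the slope of x ~ y is not 1/beta_1 unless the correlation is exactly +1 or -1.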


# y~x1, y~x2 vs y~x1+x2
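Given R^2 for y ~ x1 and for y ~ x2, what is R^2 for y ~ x1 + x2?

The two individual values are not enough on their own: the answer also depends on corr(x1, x2). See the [coefficient of multiple correlation](https://en.wikipedia.org/wiki/Coefficient_of_multiple_correlation).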
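
For two predictors, the coefficient of multiple correlation has a closed form in terms of the three pairwise correlations. The sketch below implements that formula; the function name `r2_two_predictors` is made up here, and the inputs are assumed to form a valid correlation matrix. Recall that for a simple regression, R^2 is the squared correlation, so r_y1^2 and r_y2^2 are the two individual R^2 values.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of y ~ x1 + x2 from corr(y, x1), corr(y, x2) and corr(x1, x2)."""
    if abs(r_12) >= 1:
        raise ValueError("x1 and x2 are perfectly collinear")
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

r2_uncorrelated = r2_two_predictors(0.6, 0.5, 0.0)  # individual R^2 values simply add
r2_correlated = r2_two_predictors(0.6, 0.5, 0.5)    # overlapping information, smaller total
print(r2_uncorrelated, r2_correlated)
```

When x1 and x2 are uncorrelated the individual R^2 values add; in this example, positively correlated predictors share information and the combined R^2 is smaller than the sum.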


# Interpretation of R^2 and P-val

R^2 is the proportion of the variance in y explained by x.

A small p-value is evidence against the null hypothesis beta1 = 0, i.e. evidence of a statistically significant association between x and y.

Neither number tells us whether the true relationship is linear!
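
Two toy datasets illustrate why a fit statistic says nothing about linearity. This is a sketch with a hand-written `r_squared` helper; the choice of y = x^2 on two different ranges is just an illustrative assumption.

```python
def r_squared(x, y):
    """R^2 of the simple regression y ~ x (the squared sample correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

x_pos = list(range(0, 11))   # 0, 1, ..., 10
x_sym = list(range(-5, 6))   # -5, ..., 5

r2_pos = r_squared(x_pos, [v ** 2 for v in x_pos])  # high, yet y = x^2 is not linear
r2_sym = r_squared(x_sym, [v ** 2 for v in x_sym])  # zero, yet y is fully determined by x
print(r2_pos, r2_sym)
```

The same quadratic relationship gives a high R^2 on one range and zero on the other, so checking linearity needs a residual plot or a comparison against a richer model.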

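# Interpretation of heteroskedasticity and multicollinearity

If the residuals have constant variance (homoskedasticity), that assumption of the model is met.

Otherwise (heteroskedasticity) the coefficient estimates are still unbiased, but the model's standard errors may be biased, leading to unreliable t-statistics and p-values.

Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with each other. It inflates the standard errors of the affected coefficients, so the individual estimates become unstable.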

# ATM query problem

Given a list `amount` = [2,5,3,10], where `amount[i]` is how much person i wants to withdraw, and `k`, the maximum a person can withdraw from the ATM in a single turn, return the order in which people finish withdrawing all their money.

A person who can't withdraw their full amount in one turn withdraws `k` and automatically rejoins the end of the queue.

In [45]:
'''Brute force, worst time complexity O(max(amount)/k * n)'''
from collections import deque

def atm_withdrawal_order(amount, k):
    remaining = list(amount)  # work on a copy so the caller's list is not modified
    queue = deque(range(len(remaining)))  # queue holding indices of people
    order = []  # final order of completion

    while queue:
        person = queue.popleft()  # take the person at the front of the queue
        if remaining[person] <= k:
            order.append(person + 1)  # done; record 1-based index
        else:
            remaining[person] -= k  # withdraw k and rejoin the end of the queue
            queue.append(person)

    return order

amount = [2,5,3,10]
k = 2
atm_withdrawal_order(amount, k)


[1, 3, 2, 4]

In [51]:
'''Faster method, O(n log n): sort people by the number of turns they need'''
def atm_withdrawal_order(amount, k):
    # number of turns each person needs to withdraw everything
    # (integer ceiling of a/k, at least 1 turn)
    times = [max(1, -(-a // k)) for a in amount]
    # sorted() is stable, so people finishing in the same round keep queue order
    t = sorted(enumerate(times), key=lambda x: x[1])
    return [i + 1 for i, _ in t]  # convert 0-based index to 1-based person number

amount = [2,5,3,10]
k = 2
atm_withdrawal_order(amount, k)

[1, 3, 2, 4]
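
To gain some confidence that the sorting shortcut matches the queue simulation, the two can be compared on random inputs. This is a self-contained sketch: `simulate_queue` and `sort_by_turns` are renamed copies of the two approaches, and the input ranges are arbitrary.

```python
import random
from collections import deque

def simulate_queue(amount, k):
    """Reference answer: literally simulate the ATM queue."""
    remaining = list(amount)  # copy so the caller's list is untouched
    queue = deque(range(len(amount)))
    order = []
    while queue:
        person = queue.popleft()
        if remaining[person] <= k:
            order.append(person + 1)
        else:
            remaining[person] -= k
            queue.append(person)
    return order

def sort_by_turns(amount, k):
    """Stable sort by the number of turns needed, ceil(amount / k)."""
    turns = [max(1, -(-a // k)) for a in amount]
    return [i + 1 for i, _ in sorted(enumerate(turns), key=lambda t: t[1])]

random.seed(1)
for _ in range(1000):
    amount = [random.randint(1, 50) for _ in range(random.randint(1, 8))]
    k = random.randint(1, 10)
    assert simulate_queue(amount, k) == sort_by_turns(amount, k)
print("both methods agree")
```

The shortcut works because the queue preserves the relative order of everyone still waiting, so people finish in order of (round number, original position), which is exactly what a stable sort on the round number produces.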

# Max of 3 dice throws

Let M = max(X1, X2, X3), so P(M <= m) = P(X1 <= m)^3 = (m/6)^3.

Then P(M=1) = P(X1 <= 1)^3 = 1/216 = 0.005

P(M=2) = P(X1 <= 2)^3 - P(M=1) = 1/27 - 1/216 = 7/216 = 0.032

Similarly we get P(M=3) = 19/216 = 0.088

P(M=4) = 37/216 = 0.171, and so on. In general P(M=m) = (m^3 - (m-1)^3)/216.

In [13]:
import random
from collections import Counter

N = 1000
xi = []
for _ in range(N):
    x = max(random.randint(1, 6) for _ in range(3))  # max of three dice
    xi.append(x)

print("the mean", sum(xi)/N)
frequency = Counter(xi)
frequency


the mean 4.923


Counter({6: 409, 5: 280, 4: 185, 3: 83, 2: 37, 1: 6})
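
For comparison with the simulated counts, the exact distribution can be tabulated with rational arithmetic. A short sketch using only the standard library:

```python
from fractions import Fraction

# P(M <= m) = (m/6)^3, so P(M = m) is a difference of consecutive cubes over 216
pmf = {m: Fraction(m**3 - (m - 1)**3, 216) for m in range(1, 7)}
mean = sum(m * p for m, p in pmf.items())

for m, p in pmf.items():
    print(m, p, round(float(p), 3))
print("exact mean:", mean, float(mean))
```

The exact mean is 119/24, roughly 4.958, which is consistent with the simulated mean of 4.923 from 1000 samples.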

# Bayesian posterior on fairness of coins

Consider 7 coins: 3 fair coins, 2 coins with heads on both sides, and 2 coins with tails on both sides.

We draw 3 coins, flip each once, and observe 3 heads. What is the chance we drew 3 fair coins?

Let A, B, C denote fair, only-heads and only-tails coins.

We are interested in p(AAA | x=HHH), which we can get from Bayes' rule.

p(x=HHH | AAA) * p(AAA) = 1/8 * 1/(7 choose 3) = 1/280

Any draw containing a C coin cannot produce HHH, and BBB is impossible with only 2 B coins, so

p(x=HHH) = p(x | AAA) * p(AAA) + p(x | AAB) * p(AAB) + p(x | ABB) * p(ABB)

= 1/280 + 1/4 * 6/(7 choose 3) + 1/2 * 3/(7 choose 3) = 1/280 + 3/70 + 3/70 = 25/280

So p(AAA | x=HHH) = (1/280) / (25/280) = 1/25
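
The posterior can be verified by brute-force enumeration of all (7 choose 3) = 35 equally likely draws. A minimal sketch, assuming each drawn coin is flipped once; the labels `"fair"`, `"heads"` and `"tails"` are just names chosen for this example.

```python
from fractions import Fraction
from itertools import combinations

coins = ["fair"] * 3 + ["heads"] * 2 + ["tails"] * 2
p_head = {"fair": Fraction(1, 2), "heads": Fraction(1), "tails": Fraction(0)}

draws = list(combinations(range(len(coins)), 3))  # all 35 possible draws
prior = Fraction(1, len(draws))                   # each draw is equally likely

evidence = Fraction(0)        # P(x = HHH)
joint_all_fair = Fraction(0)  # P(x = HHH and all three coins fair)
for draw in draws:
    likelihood = Fraction(1)
    for i in draw:
        likelihood *= p_head[coins[i]]  # P(this coin shows heads)
    evidence += prior * likelihood
    if all(coins[i] == "fair" for i in draw):
        joint_all_fair += prior * likelihood

posterior = joint_all_fair / evidence
print(posterior)
```

Using `Fraction` keeps every intermediate value exact, so the result can be compared directly with the hand calculation.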

# Parents want equal number of boys and girls

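Parents keep having children until they have an equal number of boys and girls. What is the expected number of children?

The expected value is infinite: the series below diverges.

E[T] = sum over n >= 1 of (2n) * P(first tie at child 2n) = sum over n >= 1 of (2n) * (2n choose n) / ((2n - 1) * 2^(2n))

The terms decay like 1/sqrt(pi * n), which is too slow for the sum to converge.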
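
The divergence can be seen numerically by accumulating partial sums. The sketch below uses the standard first-return probability of a simple random walk, P(first tie at child 2n) = (2n choose n) / ((2n - 1) * 4^n); the function name `first_tie_partial_sums` and the cutoffs are illustrative choices.

```python
def first_tie_partial_sums(n_max):
    """Return (total probability, partial expectation) over first ties at child 2n, n <= n_max."""
    u = 1.0  # u_n = C(2n, n) / 4^n, the probability of a tie at child 2n
    prob = 0.0
    expectation = 0.0
    for n in range(1, n_max + 1):
        u *= (2 * n - 1) / (2 * n)  # recurrence u_n = u_{n-1} * (2n - 1) / (2n)
        f = u / (2 * n - 1)         # probability that the FIRST tie is at child 2n
        prob += f
        expectation += 2 * n * f
    return prob, expectation

for n_max in (100, 10_000, 1_000_000):
    print(n_max, first_tie_partial_sums(n_max))
```

The total probability creeps towards 1, so the parents do stop eventually with probability 1, while the partial expectation keeps growing roughly like the square root of `n_max`.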