# Exercise 2.1: Posterior inference

With a Beta(4,4) prior for $\theta$:
\begin{equation*}
p(\theta) \propto \theta^3\cdot (1-\theta)^3
\end{equation*}

Now we are told that $n=10$ and $Y<3$, therefore:
\begin{equation*}
Pr(Y<3 \mid \theta) = \sum\limits_{y=0}^{2} p(Y=y \mid \theta) = {10 \choose 0} (1-\theta)^{10} + {10 \choose 1} \theta (1-\theta)^9 + {10 \choose 2} \theta^2 (1-\theta)^8 
\end{equation*}
Hence by Bayes' theorem:
\begin{equation*}
p(\theta \mid Y<3) \propto Pr(Y<3 \mid \theta) \cdot p(\theta) = \theta^3 (1-\theta)^{13} + 10 \cdot \theta^4 (1-\theta)^{12} + 45\cdot \theta^5 (1-\theta)^{11} 
\end{equation*}
Sketch of the distribution:
<img src="figures/fig2.1.png">

---
Python code to generate the distribution graph:
```python
import numpy as np
import plotly.graph_objects as go

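theta = np.linspace(0, 1, 100)
# Unnormalized posterior from Bayes' theorem above
get_p_theta = lambda t: t**3 * (1-t)**13 + 10 * t**4 * (1-t)**12 + 45 * t**5 * (1-t)**11
p_theta = get_p_theta(theta)
# Divide by (sum * grid spacing) so the curve approximates a density
p_theta = p_theta / (np.sum(p_theta) * (theta[1] - theta[0]))

fig = go.Figure(
        go.Scatter(
            x=theta,
            y=p_theta
        )
)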
fig.update_layout(
    title={'text': 'Fig 2.1 - Posterior distribution density of θ',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'θ'},
    yaxis={'title': 'p(θ | y)'})

fig
```
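Each term of the unnormalized posterior is a Beta kernel, so the posterior is a three-component mixture of Beta(4,14), Beta(5,13) and Beta(6,12) densities, with weights proportional to each coefficient times the Beta function $B(a,b)$. The sketch below (assuming SciPy is available) builds that mixture and checks it against a grid normalization; `posterior_pdf` is a helper introduced here for illustration.

```python
import numpy as np
import scipy.stats as stats
from scipy.special import beta as beta_fn

# Terms of the unnormalized posterior:
# 1 * theta^3 (1-theta)^13 + 10 * theta^4 (1-theta)^12 + 45 * theta^5 (1-theta)^11
coefs = np.array([1.0, 10.0, 45.0])
a = np.array([4, 5, 6])     # Beta shape a = (power of theta) + 1
b = np.array([14, 13, 12])  # Beta shape b = (power of 1 - theta) + 1

# Each kernel theta^(a-1) (1-theta)^(b-1) integrates to B(a, b)
weights = coefs * beta_fn(a, b)
weights /= weights.sum()

def posterior_pdf(t):
    """Normalized posterior density of theta as a three-component Beta mixture."""
    t = np.asarray(t, dtype=float)
    return sum(w * stats.beta.pdf(t, ai, bi) for w, ai, bi in zip(weights, a, b))

# Posterior mean = weighted average of the component means a / (a + b)
post_mean = np.sum(weights * a / (a + b))

# Cross-check against a grid normalization of the unnormalized density
theta = np.linspace(0, 1, 2001)
dx = theta[1] - theta[0]
unnorm = theta**3 * (1-theta)**13 + 10 * theta**4 * (1-theta)**12 + 45 * theta**5 * (1-theta)**11
grid_pdf = unnorm / (unnorm.sum() * dx)

print("weights:", weights.round(3), "posterior mean:", round(float(post_mean), 3))
```

The mixture form gives exact moments and quantiles without any grid.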

# Exercise 2.2: Predictive distributions

**Notation.** Let $T$ denote the event that a spin lands tails, let $N$ be the random variable counting the additional spins until a head shows up, and let $p_i = Pr(H \mid C_i)$ for $i=1,2$, where $p_1 = 0.6$, $p_2 = 0.4$ and each coin is chosen with probability $\frac{1}{2}$. We want to compute $E\left[ N \mid TT \right]$, where $TT$ denotes the event that the first two spins of the chosen coin are tails.

By the law of total probability we have:

\begin{equation}
p(N=n\mid TT) = \sum\limits_{i=1,2} p(N=n, C=C_i \mid TT) = \sum\limits_{i=1,2} p(N=n \mid TT, C=C_i)\cdot p(C=C_i \mid TT)
\end{equation}

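Let's compute both factors of equation (1). Since the spins are independent given the coin, the first factor does not depend on $TT$ and is geometric: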

\begin{equation*}
p(N=n \mid TT, C=C_i) = p(N=n \mid C=C_i) = p_i\cdot(1-p_i)^{n-1} \:\: \text{for } i=1,2
\end{equation*}
and 

\begin{equation*}
\begin{split}
p(C=C_i  \mid TT ) &  = \frac{ p(TT\mid C=C_i) \cdot p(C=C_i) }{\sum\limits_{j=1,2} p(TT\mid C=C_j) \cdot p(C=C_j)} = \frac{p(T\mid C=C_i)^2 \cdot p(C=C_i)}{\sum\limits_{j=1,2} p(T\mid C=C_j)^2 \cdot p(C=C_j)} \\[5pt]
&  =\begin{cases} \frac{\left(\frac{2}{5}\right)^2 \cdot \frac{1}{2}}{ \left(\frac{2}{5}\right)^2 \cdot \frac{1}{2} + \left(\frac{3}{5}\right)^2 \cdot \frac{1}{2} }\\
    \frac{\left(\frac{3}{5}\right)^2 \cdot \frac{1}{2}}{ \left(\frac{2}{5}\right)^2 \cdot \frac{1}{2} + \left(\frac{3}{5}\right)^2 \cdot \frac{1}{2} }
    \end{cases}
=\begin{cases} \frac{4}{13} \text{ if } i=1 \\
    \frac{9}{13} \text{ if } i=2
    \end{cases}
\end{split}
\end{equation*}
Combining the results and recalling the identity $\sum_{n=1}^{\infty}nq^{n-1}=\dfrac{1}{(1-q)^2}$ for $q\in (0,1)$ (the derivative of the geometric series), we obtain:


\begin{equation}
\begin{split}
E[N\mid TT] & = \sum_{n=1}^{\infty} n \cdot p(N=n\mid TT) \\[5pt]
& = \sum_{n=1}^{\infty} n \cdot \left( \frac{4}{13} p_1 (1-p_1)^{n-1} + \frac{9}{13} p_2 (1-p_2)^{n-1}  \right)\\[5pt]
& = \frac{4 p_1}{13} \cdot \sum_{n=1}^{\infty} n (1-p_1)^{n-1} + \frac{9p_2}{13} \cdot \sum_{n=1}^{\infty} n (1-p_2)^{n-1}\\[5pt]
& = \frac{4 p_1}{13} \cdot \frac{1}{p_1^2} + \frac{9 p_2}{13} \cdot \frac{1}{p_2^2}\\[5pt]
& = \frac{4}{13} \cdot \frac{1}{p_1} + \frac{9}{13}\cdot \frac{1}{p_2} = \frac{4}{13} \cdot \frac{5}{3} + \frac{9}{13}\cdot \frac{5}{2} = \frac{175}{78} \approx 2.24
\end{split}
\end{equation}
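The arithmetic can be checked exactly with Python's `fractions` module; a minimal sketch, assuming $p_1 = 0.6$, $p_2 = 0.4$ and equal prior probabilities for the two coins as above:

```python
from fractions import Fraction

p = {1: Fraction(3, 5), 2: Fraction(2, 5)}        # Pr(H | C_i)
prior = {1: Fraction(1, 2), 2: Fraction(1, 2)}    # Pr(C = C_i)

# Posterior over the coins after two tails: Pr(T | C_i)^2 * Pr(C_i), normalized
unnorm = {i: (1 - p[i]) ** 2 * prior[i] for i in p}
total = sum(unnorm.values())
post = {i: unnorm[i] / total for i in p}

# Given the coin, N is geometric with E[N | C_i] = 1 / p_i
expected_N = sum(post[i] / p[i] for i in p)
print(post, expected_N, float(expected_N))
```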

# Exercise 2.3: Predictive distributions

(a) We have $y \sim Bin\left(n=1000, p=\frac{1}{6}\right)$, hence 
\begin{equation*}
\begin{split}
& E[y] = np = 166.7 \\
& \text{var}(y) = np(1-p) = 138.9 \approx 11.8^2
\end{split}
\end{equation*}

Therefore the normal approximation is $y \approx N(166.7, 11.8^2)$, with density:

<img src="figures/fig2.2.png">

(b) Using the `ppf` (inverse CDF) method of [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html):
```python
import numpy as np
import scipy.stats as stats

# Normal approximation N(166.7, 11.8^2) from part (a)
v = stats.norm(loc=166.7, scale=11.8)

# Quantiles via the inverse CDF, rounded to the nearest integer
print(np.rint(v.ppf([0.05, 0.25, 0.5, 0.75, 0.95])))
# [147. 159. 167. 175. 186.]
```
We round to the nearest integer since $y$ is an integer-valued random variable.
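For comparison, the exact binomial quantiles are available from `scipy.stats.binom`. The sketch below uses the unrounded mean $1000/6$ and standard deviation $\sqrt{1000 \cdot \frac{1}{6}\cdot\frac{5}{6}}$ for the normal approximation; the two sets of quantiles should differ by at most a unit or so.

```python
import numpy as np
import scipy.stats as stats

qs = [0.05, 0.25, 0.5, 0.75, 0.95]
n, p = 1000, 1/6

# Exact binomial quantiles (smallest k with CDF >= q)
exact = stats.binom(n=n, p=p).ppf(qs)

# Normal approximation with unrounded mean and standard deviation, rounded to integers
approx = np.rint(stats.norm(loc=n * p, scale=np.sqrt(n * p * (1 - p))).ppf(qs))

print("exact: ", exact)
print("normal:", approx)
```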

---

Python code to generate the distribution in point (a):
```python
import numpy as np
import scipy.stats as stats
import plotly.graph_objects as go

v = stats.norm(loc=166.7, scale=11.8)
x = np.linspace(100, 250, 300)
y = v.pdf(x)

fig = go.Figure(
        go.Scatter(
            x=x,
            y=y
        )
)
fig.update_layout(
    title={'text': 'Fig 2.2 - Approx distribution for y',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'y'}, yaxis={'title': 'p(y)'})

fig

```

# Exercise 2.4: Predictive distributions

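(a) We approximate $y\mid \theta$ by $N(\mu, \sigma^2)$ for each of the three possible values $\theta \in \{\frac{1}{12}, \frac{1}{6}, \frac{1}{4}\}$, which have prior probabilities $0.25$, $0.5$ and $0.25$.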

We have: 
\begin{equation*}
\mu = E[y\mid \theta] = n \theta = 
\begin{cases}
    1000\frac{1}{12} \\
    1000\frac{1}{4} \\
    1000\frac{1}{6} \\
\end{cases}
= \begin{cases}
    83.3 \:\text{ for } \theta=\frac{1}{12}   \\
    250.0 \text{ for } \theta= \frac{1}{4} \\
    166.7 \text{ for } \theta=\frac{1}{6} \\
\end{cases}
\end{equation*}

and 
\begin{equation*}
\sigma^2 = n \theta (1-\theta)= 
\begin{cases}
    1000\frac{1}{12} \frac{11}{12} \\
    1000\frac{1}{4} \frac{3}{4} \\
    1000\frac{1}{6} \frac{5}{6}\\
\end{cases}
= \begin{cases}
    8.7^2 \text{ for } \theta=\frac{1}{12}   \\
    13.7^2 \text{ for } \theta= \frac{1}{4} \\
    11.8^2 \text{ for } \theta=\frac{1}{6} \\
\end{cases}
\end{equation*}

Therefore, using the conditional normal approximations:

\begin{equation*}
\begin{split}
p(y) & = \sum_\theta p(y, \theta) = \sum_\theta p(y \mid \theta) p(\theta)\\[5pt]
& \approx 0.25 \cdot N(83.3, 8.7^2) + 0.5\cdot N(166.7, 11.8^2) + 0.25\cdot N(250, 13.7^2)
\end{split}
\end{equation*}

This gives a multimodal (three-peaked) distribution:

<img src="figures/fig2.3.png">


(b) To approximate quantiles:
```python
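import numpy as np
import scipy.stats as stats

# One normal component per value of theta (1/12, 1/6, 1/4)
v = stats.norm(loc=[83.3, 166.7, 250], scale=[8.7, 11.8, 13.7])
x = np.linspace(0, 350, 10000)[:, None]

# Discretize each component on the grid and mix with the prior weights
weights = np.array([0.25, 0.5, 0.25]).reshape(1,3)
y = np.sum(v.pdf(x)/v.pdf(x).sum(axis=0) * weights, axis=1)

# For each q, average the grid points whose cumulative probability lies
# within a relative tolerance of q (the tolerance used is the grid step, ~0.035)
quantiles = np.array([.05, .25, .50, .75, .95])
for q in quantiles:
    print(q, np.rint(np.mean(x[np.isclose(y.cumsum(), q, rtol=x[1, 0])])))

# 0.05 76.0
# 0.25 120.0
# 0.5 167.0
# 0.75 209.0
# 0.95 263.0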
```
As before, we round to the nearest integer since $y$ is discrete.
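An alternative sketch inverts the mixture CDF directly: the CDF of the approximate $p(y)$ is the prior-weighted sum of the three normal CDFs, and `np.interp` inverts it on a fine grid. The tolerance-based search above averages every grid point whose cumulative sum is within a relative tolerance of $q$, so its answers can differ from this one by a unit or two, most visibly at the 25% and 75% points, which fall in low-density valleys between the modes where the quantile is poorly pinned down.

```python
import numpy as np
import scipy.stats as stats

weights = np.array([0.25, 0.5, 0.25])
comps = stats.norm(loc=[83.3, 166.7, 250.0], scale=[8.7, 11.8, 13.7])

x = np.linspace(0, 350, 100_001)
# Mixture CDF: weighted sum of the component CDFs, shape (len(x),)
cdf = (comps.cdf(x[:, None]) * weights).sum(axis=1)

qs = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
# Invert the (non-decreasing) CDF by linear interpolation
q_values = np.interp(qs, cdf, x)
print(np.column_stack([qs, q_values.round(1)]))
```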

---

Python code to generate the distribution in point (a):
```python
import numpy as np
import scipy.stats as stats
import plotly.graph_objects as go

v = stats.norm(loc=[83.3, 166.7, 250], scale=[8.7, 11.8, 13.7])
x = np.linspace(0, 350, 10000)[:, None]

weights = np.array([0.25, 0.5, 0.25]).reshape(1,3)
# Mixture density: prior-weighted sum of the three component densities
y = np.sum(v.pdf(x) * weights, axis=1)

fig = go.Figure(
        go.Scatter(
            x=x.squeeze(),
            y=y
        )
)
fig.update_layout(
    title={'text': 'Fig 2.3 - Approx distribution for y',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'y'}, yaxis={'title': 'p(y)'})

fig
```

# Exercise 2.5: Posterior distribution as a compromise between prior information and data

(a) With the uniform prior $p(\theta) = 1$ on $[0,1]$, for each $k = 0, 1,\dots , n$ we have
\begin{equation*}
\begin{split}
Pr(y = k) & = \int^1_0 Pr(y = k\mid \theta)d\theta \\[5pt]
& = \int^1_0 {n \choose k} \theta^k (1-\theta)^{n-k} d\theta \\[5pt]
& = {n \choose k} \cdot \frac{\Gamma(k+1) \Gamma(n-k+1)}{\Gamma(n+2)}\\[5pt]
& = \frac{n! k! (n-k)!}{k!(n-k)!(n+1)!} = \frac{1}{n+1}\\[5pt]
\end{split}
\end{equation*}

(b) Choosing a $Beta(\alpha, \beta)$ prior for $\theta$ leads to the following beta posterior distribution: 

\begin{equation*}
    p(\theta \mid y) = \text{Beta}(\theta \mid \alpha + y, \beta + n -y) 
\end{equation*}

which has (posterior) mean 
\begin{equation*}
    E[\theta \mid y]= \frac{\alpha + y}{\alpha + \beta + n}
\end{equation*}

Now notice that 
\begin{equation*}
\begin{split}
E[\theta \mid y] & =  \frac{\alpha + \beta}{\alpha + \beta + n}\cdot E[\theta] + \frac{n}{\alpha + \beta + n}  \cdot \frac{y}{n}\\[5pt]
& = w_1 \cdot E[\theta]  + (1-w_1)  \cdot \frac{y}{n}  \\
\end{split}
\end{equation*}

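where $w_1 = \frac{\alpha + \beta}{\alpha + \beta + n} \in (0,1)$ and $E[\theta] = \frac{\alpha}{\alpha + \beta}$. That is, $E[\theta \mid y]$ is a convex combination of the prior mean $\frac{\alpha}{\alpha + \beta}$ and the sample proportion $\frac{y}{n}$, so it lies between these two values.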
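A quick numerical check of this decomposition; `posterior_mean_decomposition` is a hypothetical helper and the $(\alpha, \beta, n, y)$ values are arbitrary:

```python
def posterior_mean_decomposition(alpha, beta, n, y):
    """Return the Beta-binomial posterior mean, its convex-combination form, and the two endpoints."""
    post_mean = (alpha + y) / (alpha + beta + n)
    w1 = (alpha + beta) / (alpha + beta + n)   # weight on the prior mean
    prior_mean = alpha / (alpha + beta)
    sample_prop = y / n
    combo = w1 * prior_mean + (1 - w1) * sample_prop
    return post_mean, combo, prior_mean, sample_prop

# Hypothetical (alpha, beta, n, y) combinations
for alpha, beta, n, y in [(2, 3, 10, 7), (0.5, 0.5, 4, 0), (20, 5, 100, 30)]:
    pm, combo, m0, ybar = posterior_mean_decomposition(alpha, beta, n, y)
    lo, hi = min(m0, ybar), max(m0, ybar)
    print(f"E[theta|y]={pm:.4f} (convex form {combo:.4f}) lies in [{lo:.4f}, {hi:.4f}]")
```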

(c) Let $\theta \sim \text{Unif}[0,1] = \text{Beta}(1,1)$, which has variance $\text{var}(\theta) = \frac{1}{12}$. 
Given $y$ successes in $n$ trials, the posterior is $\text{Beta}(1+y, 1+n-y)$, with variance

\begin{equation}
\begin{split}
var(\theta \mid y) & = \frac{(1+y)(1+n-y)}{(1+y+1+n-y)^2(1+y+1+n-y+1)}\\[5pt]
& = \frac{(1+y)(1+n-y)}{(2+n)^2 (3+n)}\\[5pt]
& = \left(\frac{1+y}{2+n}\right) \cdot \left(\frac{1+n-y}{2+n}\right) \cdot \left(\frac{1}{3+n}\right)\\[5pt]
& < \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{12}
\end{split}
\end{equation}

The first two factors $\left(\frac{1+y}{2+n}\right), \left(\frac{1+n-y}{2+n}\right)$ lie in $(0,1)$ and sum to 1, so their product is at most $\frac{1}{4}$ (the maximum of $u(1-u)$ on $[0,1]$). The third factor is less than $\frac{1}{3}$ for all $n>0$. Hence the posterior variance is always smaller than the prior variance.

(d) Let $\theta \sim Beta(\alpha, \beta)$ for some $\alpha, \beta >0$; its prior and posterior variances are 

\begin{equation*}
\begin{split}
& var(\theta) = \frac{\alpha \beta}{(\alpha+\beta)^2(\alpha + \beta + 1)}\\[5pt]
& var(\theta\mid y) = \frac{(\alpha+y)(\beta+n-y)}{(\alpha+\beta+n)^2(\alpha + \beta + n +1)}
\end{split}
\end{equation*}

A brute-force (and admittedly inefficient) Python loop over small integer values finds several cases where the posterior variance is larger than the prior variance:

|    |   a |   b |   n |   y |
|---:|----:|----:|----:|----:|
|  0 |   1 |   5 |   2 |   1 |
|  1 |   1 |   5 |   3 |   2 |
|  2 |   1 |   5 |   4 |   3 |
|  3 |   1 |   5 |   5 |   4 |
|  4 |   3 |   1 |   1 |   0 |
|  5 |   4 |   1 |   1 |   0 |

and so on.

---

Python code:

```python
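import pandas as pd  # DataFrame.to_markdown requires the tabulate package

# Prior variance of Beta(a, b) and posterior variance after y successes in n trials
prior_var = lambda a,b: a*b/((a+b)**2 * (a+b+1))
post_var = lambda a,b, n,y: (a+y)*(b+n-y)/((a+b+n)**2 *(a+b+n+1))

df = pd.DataFrame(columns=list('abny'))
cnt = 0
for a in range(1, 6):
    for b in range(1, 6):
        prior = prior_var(a,b)
        for n in range(1, 6):
            for y in range(n):  # y = 0, ..., n-1 (the case y = n is not checked)
                post = post_var(a,b,n,y)
                if prior < post:
                    df.loc[cnt] = (a,b,n,y)
                    cnt+=1
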
print(df.to_markdown())
```

# Exercise 2.7: Noninformative prior densities

(a) First, let's write $p(y\mid \theta)$ in exponential-family form to identify the natural parameter:

\begin{equation*}
\begin{split}
p(y \mid \theta) & = {n \choose y} \theta^y (1-\theta)^{n-y} \\[5pt]
& = {n \choose y} \text{exp}\Big( y \text{log}(\theta) + (n-y) \text{log}(1-\theta)\Big) \\[5pt]
& = {n \choose y} \text{exp}\Big( \text{log}\left(\frac{\theta}{1-\theta}\right) \cdot y \Big) \cdot \text{exp} \Big( n \text{log}(1-\theta)\Big) \\[5pt]
& = {n \choose y} (1-\theta)^n \text{exp}\Big( \text{log}\left(\frac{\theta}{1-\theta}\right) \cdot y \Big) \\[5pt]
\end{split}
\end{equation*}

Hence $f(y) = {n \choose y}$, $g(\theta) = 1-\theta$ (raised to the power $n$), $t(y) = y$, and the natural parameter is $\phi(\theta) = \text{log}\left( \frac{\theta}{1-\theta}\right)$.

Now assume a uniform prior on the natural parameter, $p(\phi) \propto 1$. Changing variables to $\theta$ gives:

\begin{equation*}
\begin{split}
p(\theta) & = p(\phi) \Big| \frac{d\phi}{d\theta}\Big| \propto \frac{d \text{log}\left( \frac{\theta}{1-\theta}\right)}{ d\theta}\\[5pt]
& = \frac{d \Big(\text{log}(\theta) - \text{log}(1-\theta) \Big)}{ d\theta} \\[5pt]
& = \frac{1}{\theta} + \frac{1}{1-\theta} = \theta^{-1} \cdot (1-\theta)^{-1}
\end{split}
\end{equation*}

(b) For the posterior of $\theta$ we have:

\begin{equation*}
p(\theta\mid y) \propto p(y \mid \theta) p(\theta) = {n \choose y} \theta^{y-1} (1-\theta)^{n-y-1}
\end{equation*}

Therefore for the two cases we have:

\begin{equation*}
p(\theta\mid y) \propto 
\begin{cases}
\theta^{-1} (1-\theta)^{n-1} \text{ for } y=0 \\ 
\theta^{n-1} (1-\theta)^{-1} \text{ for } y=n 
\end{cases}
\end{equation*}

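These kernels are not integrable near $0$ and $1$, respectively (the integrand behaves like $\theta^{-1}$ or $(1-\theta)^{-1}$ there), so in both cases the posterior is improper.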
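To see the impropriety numerically, the sketch below integrates the $y = 0$ kernel from $\varepsilon$ to 1 with `scipy.integrate.quad` for shrinking $\varepsilon$ ($n = 5$ is an arbitrary choice). The mass grows by roughly $\log 10 \approx 2.30$ per tenfold decrease of $\varepsilon$, i.e. like $-\log \varepsilon$, so it diverges as $\varepsilon \to 0$.

```python
import numpy as np
from scipy.integrate import quad

n = 5  # arbitrary sample size for the case y = 0

def kernel(theta):
    """Unnormalized posterior theta^(-1) (1 - theta)^(n - 1) for y = 0."""
    return (1 - theta) ** (n - 1) / theta

eps_values = [1e-2, 1e-3, 1e-4, 1e-5]
masses = [quad(kernel, eps, 1, limit=200)[0] for eps in eps_values]
print(np.round(masses, 3))
# Increments between successive eps values, close to log(10)
print(np.round(np.diff(masses), 3))
```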

# Exercise 2.8: Normal distribution with unknown mean

We have $\bar{y} = 150$, $y_i \sim N(\theta, \sigma^2 = 20^2)$ and $\theta \sim N(\mu_0 = 180, \tau^2_0 = 40^2)$ 

(a) As derived in the chapter, $\theta \mid y \sim N(\mu_1, \tau^2_1)$ with:

\begin{equation*}
\begin{split}
\mu_1 & = \frac{ \frac{1}{\tau^2_0} \mu_0 + \frac{n}{\sigma^2} \bar{y}}{\frac{1}{\tau^2_0} + \frac{n}{\sigma^2}} =  \frac{ \mu_0\sigma^2 + n\tau^2_0\bar{y}}{\sigma^2 +  \tau^2_0 n } = \frac{180\cdot 20^2 + 150 \cdot 40^2 n}{20^2 + 40^2 n}\\[5pt]
\tau^2_1 & = \frac{1}{\frac{1}{\tau^2_0} + \frac{n}{\sigma^2}} = \frac{ \sigma^2 \tau^2_0}{\sigma^2 +  \tau^2_0 n} = \frac{20^2 \cdot 40^2}{20^2 + 40^2 n}
\end{split}
\end{equation*}

(b) Let $\tilde{y}$ be a new observation; as derived in the chapter, its posterior predictive distribution is:

\begin{equation*}
\tilde{y} \mid y \sim N(\mu_1, \sigma^2+\tau^2_1) 
\end{equation*}

with $\mu_1$, $\sigma^2$ and $\tau^2_1$ as above.

(c) & (d) Let's use Python to get both answers:

```python
import scipy.stats as stats
import numpy as np

def get_params(n):
    """
    Return the posterior mean mu_1 and variance tau_1^2 of theta for sample size n
    """
    d = (20**2 + 40**2 * n)
    mu = (180 * 20**2 + 150 * 40**2 * n)/d
    tau2 = (20*40)**2/d
    return mu, tau2

for n in [10, 100]:
    mu, t2 = get_params(n=n)
    theta_dist = stats.norm(loc=mu, scale=np.sqrt(t2))          # p(theta | y)
    y_dist = stats.norm(loc=mu, scale=np.sqrt(t2 + 20**2))      # p(y_tilde | y)

    print(f'n={n}', '-'*5,
          f'95% posterior interval for θ: {theta_dist.ppf([0.025, 0.975]).round(2).tolist()}',
          f'95% posterior predictive interval for ỹ: {y_dist.ppf([0.025, 0.975]).round(2).tolist()}',
          '\n',
          sep='\n')

# n=10
# -----
# 95% posterior interval for θ: [138.49, 162.98]
# 95% posterior predictive interval for ỹ: [109.66, 191.8]
#
# n=100
# -----
# 95% posterior interval for θ: [146.16, 153.99]
# 95% posterior predictive interval for ỹ: [110.68, 189.47]
```

The second interval in each case is the posterior predictive interval for a new observation $\tilde{y}$.
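The predictive interval can also be checked by simulation: draw $\theta$ from its posterior and then $\tilde{y} \sim N(\theta, 20^2)$. A minimal Monte Carlo sketch (the seed and number of draws are arbitrary choices); the simulated intervals should agree with the ones above to within Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, mu0, tau0, ybar = 20.0, 180.0, 40.0, 150.0

def posterior_params(n):
    """Posterior mean mu_1 and variance tau_1^2 of theta, as in part (a)."""
    precision = 1 / tau0**2 + n / sigma**2
    mu1 = (mu0 / tau0**2 + n * ybar / sigma**2) / precision
    return mu1, 1 / precision

intervals = {}
for n in [10, 100]:
    mu1, tau1_sq = posterior_params(n)
    theta = rng.normal(mu1, np.sqrt(tau1_sq), size=200_000)  # theta ~ p(theta | y)
    y_tilde = rng.normal(theta, sigma)                         # y_tilde ~ N(theta, sigma^2)
    intervals[n] = np.percentile(y_tilde, [2.5, 97.5])
    print(f"n={n}: simulated 95% predictive interval {intervals[n].round(1)}")
```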

# Exercise 2.9: Setting parameters for a beta prior distribution

Let $\theta \sim \text{Beta}(\alpha, \beta)$ be such that $E[\theta] = \frac{6}{10}$ and $\text{std}(\theta) = \frac{3}{10}$.

(a) To determine $\alpha$ and $\beta$ we solve the system of equations:

\begin{equation*}
\begin{cases}
\alpha, \beta >0 \\
\frac{\alpha}{\alpha + \beta} = \frac{6}{10}\\
\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = \big(\frac{3}{10} \big)^2
\end{cases}
\Longrightarrow
\begin{cases}
\alpha, \beta >0 \\
\beta = \frac{2}{3} \alpha\\
\frac{2}{3}\alpha^2 = \big(\frac{3}{10} \big)^2 \big(\frac{5}{3} \big)^2\alpha^2 \big(\frac{5}{3}\alpha + 1 \big)
\end{cases}
\Longrightarrow
\begin{cases}
\alpha = 1\\
\beta = \frac{2}{3}
\end{cases}
\end{equation*}

Hence $\theta \sim \text{Beta}(1, 0.67)$
<img src="figures/fig2.4.png">

---

Python code:
```python
import scipy.stats as stats 
import numpy as np
import plotly.graph_objects as go

t = np.linspace(0, 1, 100)
pt = stats.beta(a=1, b=0.67)

fig = go.Figure(
        go.Scatter(
            x=t,
            y=pt.pdf(t)
        )
)

fig.update_layout(
    title={'text': 'Fig 2.4 - Beta(1,0.67) distribution',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'θ'}, yaxis={'title': 'p(θ)'})

fig
```
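The system in (a) also has a closed-form solution: since $\text{var}(\theta) = \frac{m(1-m)}{\alpha+\beta+1}$ with $m = E[\theta]$, we get $\alpha + \beta = \frac{m(1-m)}{s^2} - 1$, $\alpha = m(\alpha+\beta)$ and $\beta = (1-m)(\alpha+\beta)$. A small sketch; `beta_params_from_moments` is a hypothetical helper name:

```python
def beta_params_from_moments(mean, sd):
    """Solve E[theta] = mean and std(theta) = sd for Beta(alpha, beta)."""
    # var = m(1 - m) / (alpha + beta + 1)  =>  alpha + beta = m(1 - m) / var - 1
    total = mean * (1 - mean) / sd**2 - 1
    if total <= 0:
        raise ValueError("sd is too large for a Beta distribution with this mean")
    return mean * total, (1 - mean) * total

alpha, beta = beta_params_from_moments(0.6, 0.3)
print(alpha, beta)
```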

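(b) Assume $y\mid \theta \sim \text{Bin}(n, \theta)$ with $n = 1000$ and $y = 650$. Then $p(\theta \mid y) \propto \theta^{\alpha + y - 1}(1-\theta)^{\beta + n - y - 1}$, i.e.: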

\begin{equation*}
p(\theta \mid y) \propto \theta^{650} (1-\theta)^{349.67}
\end{equation*}

Hence $\theta \mid y \sim \text{Beta}(651, 350.67)$ which has mean and variance:

\begin{equation*}
\begin{split}
& E\big[ \theta \mid y \big] = \frac{651}{651 + 350.67} \approx 0.6499 \\
& \text{var}(\theta \mid y) = \frac{651 \cdot 350.67}{(651 + 350.67)^2(651 + 350.67 + 1)} \approx 0.00022 = 0.015^2
\end{split}
\end{equation*}
<img src="figures/fig2.5.png">

```python
import scipy.stats as stats 
import numpy as np
import plotly.graph_objects as go

t = np.linspace(0.5, 0.8, 500)
pt = stats.beta(a=651, b=350.67)

fig = go.Figure(
        go.Scatter(
            x=t,
            y=pt.pdf(t)
        )
)

fig.update_layout(
    title={'text': 'Fig 2.5 - Beta(651, 350.67) distribution',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'title': 'θ'}, yaxis={'title': 'p(θ|y)'})

fig
```

(c) Keeping the prior mean at $0.6$ and trying increasingly strong priors (larger $\alpha + \beta$):

```python
import pandas as pd
import numpy as np
import scipy.stats as stats

alpha_beta = np.array([10, 20, 100, 200, 500, 1000, 2000])
df = (pd.DataFrame({'prior_mean': [0.6]*alpha_beta.size,
                    'prior_alpha+beta': alpha_beta})
      .assign(prior_alpha = lambda x: x[['prior_alpha+beta', 'prior_mean']].prod(axis=1).round().astype(int))
      .assign(prior_beta = lambda x: x['prior_alpha+beta'] - x['prior_alpha'])
      .assign(prior_std = lambda x: stats.beta(a=x['prior_alpha'], b=x['prior_beta']).std())
      .assign(post_alpha = lambda x: x['prior_alpha'] + 650)
      .assign(post_beta = lambda x: x['prior_beta'] + 350)
      .assign(post_mean = lambda x: x['post_alpha']/(x[['post_alpha', 'post_beta']].sum(axis=1)))
      .assign(post_std = lambda x: stats.beta(a=x['post_alpha'], b=x['post_beta']).std())
     )

print(df.to_markdown())
```

|    | prior_mean | prior_alpha+beta | prior_alpha | prior_beta | prior_std | post_alpha | post_beta | post_mean | post_std |
|---:|-----------:|-----------------:|------------:|-----------:|----------:|-----------:|----------:|----------:|---------:|
|  0 |        0.6 |               10 |           6 |          4 |    0.1477 |        656 |       354 |   0.64950 |  0.01501 |
|  1 |        0.6 |               20 |          12 |          8 |    0.1069 |        662 |       358 |   0.64902 |  0.01494 |
|  2 |        0.6 |              100 |          60 |         40 |    0.0487 |        710 |       390 |   0.64545 |  0.01442 |
|  3 |        0.6 |              200 |         120 |         80 |    0.0346 |        770 |       430 |   0.64167 |  0.01384 |
|  4 |        0.6 |              500 |         300 |        200 |    0.0219 |        950 |       550 |   0.63333 |  0.01244 |
|  5 |        0.6 |             1000 |         600 |        400 |    0.0155 |       1250 |       750 |   0.62500 |  0.01082 |
|  6 |        0.6 |             2000 |        1200 |        800 |    0.0110 |       1850 |      1150 |   0.61667 |  0.00888 |

# Exercise 2.10: Discrete sample spaces

(a) We have $p(N) = \big(\frac{1}{100}\big)\big(\frac{99}{100}\big)^{N-1}$, for $N=1,2,\dots$. Since we observe $y = 203$, it follows that $N\geq 203$. 

Assume that the phrase "You see a cable car at random" means "uniformly at random" among the $N$ cars (we are given no further information). Therefore:

\begin{equation*}
p(y\mid N) = \begin{cases}
\frac{1}{N} & \text{for } N\geq 203\\
0 & \text{otherwise} 
\end{cases}
\end{equation*}

Hence for the posterior distribution of $N$ we get:

\begin{equation*}
p(N \mid y) \propto p(y \mid N) \cdot p(N) = \begin{cases}
\frac{1}{N} \big(\frac{1}{100}\big)\big(\frac{99}{100}\big)^{N-1} \propto \frac{1}{N} \big(\frac{99}{100}\big)^{N} & \text{for } N\geq 203\\
0 & \text{otherwise} 
\end{cases}
\end{equation*}


(b) From the previous point we have that 
\begin{equation*}
p(N \mid y) = \begin{cases} c \cdot \frac{1}{N} \big(\frac{99}{100}\big)^{N} & \text{for } N\geq 203\\
0 & \text{otherwise} 
\end{cases}
\end{equation*}
for some normalizing constant $c > 0$.

We compute this constant by imposing $1 = \sum_{N=1}^\infty p(N\mid y)$. The right-hand side is:

\begin{equation*}
\begin{split}
\sum_{N=1}^\infty p(N\mid y) & =  c \cdot \sum_{N=203}^\infty \frac{1}{N} \left(\frac{99}{100}\right)^{N} \overset{(*)}{\approx} c\cdot 0.0466
\end{split}
\end{equation*}

Therefore $c \approx \frac{1}{0.04658} \approx 21.47$, and 
\begin{equation*}
p(N \mid y) = \begin{cases} 21.47 \cdot \frac{1}{N} \big(\frac{99}{100}\big)^{N} & \text{for } N\geq 203\\
0 & \text{otherwise} 
\end{cases}
\end{equation*}

Now 

\begin{equation*}
\begin{split}
E[N \mid y] & = 21.47 \cdot \sum_{N=203}^\infty N \cdot \frac{1}{N} \left(\frac{99}{100}\right)^{N}\\
& = 21.47 \cdot \sum_{N=203}^\infty \left(\frac{99}{100}\right)^{N} \\
& = 21.47 \cdot \sum_{N=0}^\infty \left(\frac{99}{100}\right)^{N+203} \\
& = 21.47 \cdot \left(\frac{99}{100}\right)^{203} \cdot \frac{1}{1-0.99} = 279.11\\[10pt]
\text{std}(N\mid y) & = \sqrt{ 21.47 \cdot \sum_{N=203}^\infty \frac{(N - 279.11)^2}{N} \left(\frac{99}{100}\right)^{N}} \overset{(*)}{\approx} 79.96
\end{split}
\end{equation*}


(*) Numerical approximation:
```python
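import numpy as np

n = np.arange(203, 10000)  # truncate the infinite sums; 0.99**10000 is negligible
q = .99**n

s = np.sum(q/n)            # sum_{N >= 203} (99/100)^N / N
print(s, 1/s)
# approximately 0.046580 21.468290

print(np.sqrt(21.47 * np.sum(np.square(n-279.11)/n * q)))
# approximately 79.96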
```
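As a cross-check of the constants above, the sketch below normalizes the posterior directly on the truncated support $N = 203, \dots, 19999$ (the cutoff is an arbitrary choice; the neglected tail is of order $0.99^{20000}$) and computes the mean and standard deviation without rounding $c$ first. The results should agree with the values above up to rounding.

```python
import numpy as np

y = 203
N = np.arange(y, 20_000)      # support N >= y, truncated far in the tail
unnorm = 0.99**N / N          # (1/N) (99/100)^N, up to a constant
post = unnorm / unnorm.sum()  # normalized posterior p(N | y)

mean = np.sum(N * post)
sd = np.sqrt(np.sum((N - mean) ** 2 * post))
print(round(float(mean), 2), round(float(sd), 2))
```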


# Exercise 2.11: Computing with a nonconjugate single-parameter model

(a) We have $y_1, \dots, y_5 \mid \theta \sim t_1(\theta, 1)$, i.e. a Cauchy distribution with location $\theta$ and scale $1$, hence 

\begin{equation*}
p(y_i \mid \theta) \propto \frac{1}{1 + (y_i -\theta)^2}
\end{equation*}

and $\theta \sim \text{Unif}[0,100]$. Therefore, for $\theta \in [0, 100]$:

\begin{equation*}
\begin{split}
p(\theta \mid y) & \propto p(y\mid \theta) p(\theta) \\
& = p(\theta) \prod_{i=1}^5 p(y_i \mid \theta) \\
& \propto \frac{1}{100} \prod_{i=1}^5 \frac{1}{1 + (y_i -\theta)^2} \\
& \propto \prod_{i=1}^5 \frac{1}{1 + (y_i -\theta)^2}
\end{split}
\end{equation*}

```python
import numpy as np
import plotly.graph_objects as go

def get_posterior(theta, y):
    """Unnormalized posterior on a grid of theta values (flat prior on [0, 100])."""
    return np.prod(1/(1 + np.square(y - theta[:,None])), axis=1)

theta = np.linspace(0, 100, 100_000)
y = np.array([43, 44, 45, 46.5, 47.5])
posterior = get_posterior(theta, y)

fig = go.Figure(
        go.Scatter(
            x=theta,
            y=posterior/(theta[1] * posterior.sum())  # normalize: theta[1] is the grid step
        )
)

fig.update_layout(
    title={'text': 'Fig 2.6 - Posterior distribution for θ',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'range':[30, 60], 'title': 'θ'},
    yaxis={'title': 'p(θ|y)'}
)
fig
```
<img src="figures/fig2.6.png">
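Numerical summaries follow directly from the grid. The sketch below works with the log posterior (a common safeguard against underflow, not strictly needed here) and reports the posterior mean, mode and a central 95% interval obtained by interpolating the grid CDF. The names `theta_grid` and `y_obs` are chosen so they do not overwrite `theta` and `posterior`, which are used in (b).

```python
import numpy as np

y_obs = np.array([43, 44, 45, 46.5, 47.5])
theta_grid = np.linspace(0, 100, 100_001)
dtheta = theta_grid[1] - theta_grid[0]

# Log of the unnormalized posterior: sum of Cauchy log-kernels (flat prior on [0, 100])
log_post = -np.log1p((y_obs - theta_grid[:, None]) ** 2).sum(axis=1)
dens = np.exp(log_post - log_post.max())  # subtract the max before exponentiating
dens /= dens.sum() * dtheta               # normalize so the density integrates to ~1

post_mean = np.sum(theta_grid * dens) * dtheta
post_mode = theta_grid[np.argmax(dens)]
cdf = np.cumsum(dens) * dtheta
interval = np.interp([0.025, 0.975], cdf, theta_grid)
print(f"mean {post_mean:.2f}, mode {post_mode:.2f}, 95% interval {interval.round(2)}")
```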

(b)
```python
# Draw 1000 grid values of theta with probabilities proportional to the posterior
sample = np.random.choice(theta, p=posterior/posterior.sum(), size=1000)

fig = go.Figure(
        go.Histogram(
            x=sample))

fig.update_layout(
     title={'text': 'Fig 2.7 - Histogram of samples from p(θ|y)',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'range':[30, 60], 'title': 'θ'},
    yaxis={'title': 'Count'}
)
fig
```
<img src="figures/fig2.7.png">

(c)
```python
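import scipy.stats as stats

# Posterior predictive draws: one y6 ~ Cauchy(theta, 1) per posterior sample of theta
y6 = stats.cauchy.rvs(loc=sample, scale=1, size=sample.size)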

fig = go.Figure(
        go.Histogram(
            x=y6,
            xbins={'start':0, 'end':100, 'size':2}))

fig.update_layout(
     title={'text': 'Fig 2.8 - Histogram of samples from p(y6|y)',
           'y':0.9, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
    xaxis={'range':[0, 100], 'title': 'y6'},
    yaxis={'title': 'Count'}
)
fig
```
<img src="figures/fig2.8.png">

# Exercise 2.12: Jeffreys’ prior distributions

We have $y \mid \theta \sim \text{Poisson}(\theta)$, hence $\text{log} \left(p(y\mid \theta) \right) = y \text{log}(\theta) - \theta - \text{log}(y!)$ and 

\begin{equation*}
\begin{split}
J(\theta) & = - E\left[ \frac{d^2 \text{log}\, p(y\mid \theta) }{d\theta^2} \Big| \theta\right]\\
& = - E\left[ \frac{d \left(\frac{y}{\theta} - 1 \right)}{d\theta} \Big| \theta\right]\\
& = - E\left[- \frac{y}{\theta^2} \Big| \theta\right]\\
& = \frac{E\left[y \mid \theta\right]}{\theta^2}\\
& = \frac{\theta}{\theta^2} = \theta^{-1}
\end{split}
\end{equation*}

Therefore Jeffreys' prior density is $p(\theta) \propto \sqrt{J(\theta)} = \theta^{-\frac{1}{2}}$, i.e. the improper $\text{Gamma}(\frac{1}{2}, 0)$ density.
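The expectation can be checked numerically by summing $\frac{y}{\theta^2}\, p(y\mid\theta)$ over the Poisson pmf, truncated at an arbitrary `y_max`; `fisher_information_poisson` is a hypothetical helper name:

```python
import numpy as np
import scipy.stats as stats

def fisher_information_poisson(theta, y_max=1000):
    """J(theta) = E[y / theta^2 | theta], summing over the (truncated) Poisson pmf."""
    y = np.arange(y_max + 1)
    return np.sum(stats.poisson.pmf(y, theta) * y / theta**2)

for theta in [0.5, 2.0, 10.0]:
    print(theta, fisher_information_poisson(theta), 1 / theta)
```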

# Exercise 2.13: Discrete data

| Year | Fatal accidents | Passenger deaths | Death rate |
|-----:|----------------:|-----------------:|-----------:|
| 1976 |              24 |              734 |       0.19 |
| 1977 |              25 |              516 |       0.12 |
| 1978 |              31 |              754 |       0.15 |
| 1979 |              31 |              877 |       0.16 |
| 1980 |              22 |              814 |       0.14 |
| 1981 |              21 |              362 |       0.06 |
| 1982 |              26 |              764 |       0.13 |
| 1983 |              20 |              809 |       0.13 |
| 1984 |              16 |              223 |       0.03 |
| 1985 |              22 |             1066 |       0.15 |

Death rate is passenger deaths per 100 million passenger miles.


# Exercise 2.14: Algebra of the normal model

(a) 
\begin{equation*}
\begin{split}
p(\theta \mid y ) & \propto \text{exp}\left( -\frac{1}{2} \left( \frac{(y-\theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau^2_0} \right) \right) \\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{(\tau^2_0 + \sigma^2) \theta^2  - 2\theta (\tau^2_0 y + \sigma^2 \mu_0)}{\tau^2_0\sigma^2} \right) \right) \\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{\tau^2_0 + \sigma^2}{\tau^2_0\sigma^2} \right)  \left( \theta - \left(\frac{\tau^2_0 y  + \sigma^2 \mu_0}{\tau^2_0 + \sigma^2} \right) \right)^2 \right) \\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{1}{\tau^2_0} + \frac{1}{\sigma^2} \right)  \left( \theta - \left(\frac{\mu_0/\tau^2_0 + y/\sigma^2}{1/\tau^2_0 + 1/\sigma^2} \right) \right)^2 \right) \\[5pt]
& = \text{exp}\left( -\frac{1}{2\tau^2_1} \left( \theta - \mu_1\right)^2\right) \\[5pt]
\end{split}
\end{equation*}

where $\mu_1 = \left(\frac{\mu_0/\tau^2_0 + y/\sigma^2}{1/\tau^2_0 + 1/\sigma^2} \right)$ and $\frac{1}{\tau^2_1} = \left( \frac{1}{\tau^2_0} + \frac{1}{\sigma^2} \right)$ 

For multiple observations $y = (y_1, \dots, y_n)$, we have

\begin{equation*}
\begin{split}
p(\theta \mid y) & \propto p(y \mid \theta) p(\theta) = p(\theta) \prod_{i=1}^n p(y_i \mid \theta) \\[5pt]
& \propto \text{exp}\left( -\frac{1}{2\tau^2_0}(\theta - \mu_0)^2 \right) \prod_{i=1}^n \text{exp}\left( -\frac{1}{2\sigma^2}(y_i - \theta)^2 \right) \\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{1}{\tau^2_0}(\theta - \mu_0)^2 + \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \theta)^2 \right) \right)\\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{1}{\tau^2_0}(\theta^2 - 2\theta\mu_0) + \frac{1}{\sigma^2} \sum_{i=1}^n (\theta^2 - 2\theta y_i) \right) \right)\\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{\sigma^2 (\theta^2 - 2\theta\mu_0) + \tau^2_0 \sum_{i=1}^n (\theta^2 - 2\theta y_i)}{\tau^2_0\sigma^2}  \right) \right)\\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{\theta^2(\sigma^2 + n\tau^2_0) - 2\theta(\sigma^2 \mu_0 + \tau^2_0 n\bar{y})}{\tau^2_0\sigma^2}  \right) \right)\\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{\sigma^2 + n\tau^2_0}{\sigma^2\tau^2_0} \right) \left( \theta^2 - 2\theta \left( \frac{\mu_0\sigma^2 + \tau^2_0 n \bar{y}}{\sigma^2 + n\tau^2_0}\right) \right) \right)\\[5pt]
& \propto \text{exp}\left( -\frac{1}{2} \left( \frac{1}{\tau^2_0} + \frac{n}{\sigma^2} \right) \left( \theta - \left( \frac{\mu_0/\tau^2_0 + \bar{y} n/\sigma^2}{1/\tau^2_0 + n/\sigma^2 }\right) \right)^2 \right)\\[5pt]
\end{split}
\end{equation*}

which proves equation (2.12).

(b) Given one data point $y_1$, we saw that 

\begin{equation*}
\begin{split}
& \mu_1 = \left(\frac{\mu_0/\tau^2_0 + y_1/\sigma^2}{1/\tau^2_0 + 1/\sigma^2} \right)\\
& \frac{1}{\tau^2_1} = \left( \frac{1}{\tau^2_0} + \frac{1}{\sigma^2} \right)
\end{split}
\end{equation*}

Now let's use induction on the number of observations, adding one data point at a time and using the previous posterior as the new prior. The case $n=1$ is the result above.

**First step ($n=2$).** Let $y_2$ be a new data point, then:

\begin{equation*}
\begin{split}
\mu_2 & = \left(\frac{\mu_1/\tau^2_1 + y_2/\sigma^2}{1/\tau^2_1 + 1/\sigma^2} \right)\\[5pt]
& = \frac{\left( 1/\tau^2_0 + 1/\sigma^2\right) \frac{\mu_0/\tau^2_0 + y_1/\sigma^2}{1/\tau^2_0 + 1/\sigma^2} + y_2/\sigma^2}{1/\tau^2_0 + 1/\sigma^2 + 1/\sigma^2} \\[5pt]
& = \frac{ \frac{\mu_0}{\tau^2_0} + \frac{y_1 + y_2}{\sigma^2}}{ \frac{1}{\tau^2_0} + \frac{2}{\sigma^2}} = \frac{ \frac{\mu_0}{\tau^2_0} + \frac{2 \bar{y}}{\sigma^2}}{ \frac{1}{\tau^2_0} + \frac{2}{\sigma^2}}
\end{split}
\end{equation*}

and 

\begin{equation*}
\begin{split}
\frac{1}{\tau^2_2} & = \frac{1}{\tau^2_1} + \frac{1}{\sigma^2} = \frac{1}{\tau^2_0} + \frac{1}{\sigma^2} + \frac{1}{\sigma^2} =  \frac{1}{\tau^2_0} + \frac{2}{\sigma^2}
\end{split}
\end{equation*}

**Inductive step.** Suppose the result holds for $n-1$ observations, i.e. $\frac{1}{\tau^2_{n-1}} = \frac{1}{\tau^2_0} + \frac{n-1}{\sigma^2}$ and $\frac{\mu_{n-1}}{\tau^2_{n-1}} = \frac{\mu_0}{\tau^2_0} + \frac{1}{\sigma^2}\sum_{i=1}^{n-1} y_i$. Then:

\begin{equation*}
\begin{split}
\mu_n & = \left(\frac{\mu_{n-1}/\tau^2_{n-1} + y_n/\sigma^2}{1/\tau^2_{n-1} + 1/\sigma^2} \right)\\[5pt]
& = \frac{\frac{\mu_0}{\tau^2_0} + \frac{1}{\sigma^2}\sum_{i=1}^{n-1} y_i + \frac{y_n}{\sigma^2}}{\frac{1}{\tau^2_0} + \frac{n-1}{\sigma^2} + \frac{1}{\sigma^2}}\\[5pt]
& = \frac{\frac{\mu_0}{\tau^2_0} + \frac{\sum_i y_i}{\sigma^2}}{\frac{1}{\tau^2_0} + \frac{n}{\sigma^2}} = \frac{\frac{\mu_0}{\tau^2_0} + \frac{n \bar{y}}{\sigma^2}}{\frac{1}{\tau^2_0} + \frac{n}{\sigma^2}}
\end{split}
\end{equation*}

and 

\begin{equation*}
\begin{split}
\frac{1}{\tau^2_n} & = \frac{1}{\tau^2_{n-1}} + \frac{1}{\sigma^2} = \frac{1}{\tau^2_0} + \frac{n-1}{\sigma^2} + \frac{1}{\sigma^2} =  \frac{1}{\tau^2_0} + \frac{n}{\sigma^2}
\end{split}
\end{equation*}
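The induction says that sequential updating, one observation at a time with the previous posterior as the new prior, gives the same posterior as the batch formula with $\bar{y}$. A numerical sketch with hypothetical values of $\mu_0$, $\tau^2_0$, $\sigma^2$ and simulated data:

```python
import numpy as np

def update(mu, tau_sq, y, sigma_sq):
    """One normal-normal update with known sigma^2 (the one-observation formulas)."""
    precision = 1 / tau_sq + 1 / sigma_sq
    mu_new = (mu / tau_sq + y / sigma_sq) / precision
    return mu_new, 1 / precision

rng = np.random.default_rng(1)
mu0, tau0_sq, sigma_sq = 0.0, 4.0, 1.5   # hypothetical prior and data variance
ys = rng.normal(2.0, np.sqrt(sigma_sq), size=25)

# Sequential updating, one data point at a time
mu, tau_sq = mu0, tau0_sq
for y in ys:
    mu, tau_sq = update(mu, tau_sq, y, sigma_sq)

# Batch formula with the sample mean
n = ys.size
batch_precision = 1 / tau0_sq + n / sigma_sq
batch_mu = (mu0 / tau0_sq + n * ys.mean() / sigma_sq) / batch_precision
print(mu, batch_mu, tau_sq, 1 / batch_precision)
```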

# Exercise 2.15: Beta distribution

Let $Z\sim \text{Beta}(\alpha, \beta)$.

\begin{equation*}
\begin{split}
E[Z^m(1-Z)^n] & = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \int^1_0 z^m(1-z)^n z^{\alpha-1}(1-z)^{\beta -1} dz\\
& = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \int^1_0z^{\alpha + m-1}(1-z)^{\beta+n -1} dz\\
& = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + m) \Gamma(\beta + n)}{\Gamma(\alpha + \beta + m +n)}
\end{split}
\end{equation*}

For the mean, using $\Gamma(x+1) = x\,\Gamma(x)$ (valid for any real $x > 0$, not only integers):

\begin{equation*}
\begin{split}
E[Z] & = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + 1) \Gamma(\beta)}{\Gamma(\alpha + \beta + 1)}\\
& = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\alpha\,\Gamma(\alpha)\Gamma(\beta)}{(\alpha + \beta)\,\Gamma(\alpha + \beta)} \\
& = \frac{\alpha}{\alpha + \beta}
\end{split}
\end{equation*}

For the variance, let's first compute:

\begin{equation*}
\begin{split}
E[Z^2] & = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + 2) \Gamma(\beta)}{\Gamma(\alpha + \beta + 2)}\\
& = \frac{\alpha(\alpha +1) }{(\alpha + \beta)(\alpha + \beta +1)}
\end{split}
\end{equation*}

and then we get:

\begin{equation*}
\begin{split}
\text{var}(Z) & = E[Z^2] - E[Z]^2 \\
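& = \frac{\alpha(\alpha +1) }{(\alpha + \beta)(\alpha + \beta +1)} - \left(\frac{\alpha}{\alpha + \beta}\right)^2\\
& = \frac{\alpha\left[(\alpha+1)(\alpha+\beta) - \alpha(\alpha+\beta+1)\right]}{(\alpha + \beta)^2(\alpha + \beta + 1)}\\
& = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}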
\end{split}
\end{equation*}
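The general moment formula can be checked numerically for non-integer parameters, which the Gamma-function form handles without factorials. A sketch using `scipy.integrate.quad` and log-Gamma values (`beta_moment` is a hypothetical helper; $\alpha = 2.5$, $\beta = 3.7$ are arbitrary):

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad
from scipy.special import gammaln

def beta_moment(alpha, beta, m, n):
    """Closed form E[Z^m (1-Z)^n] for Z ~ Beta(alpha, beta), computed via log-Gamma."""
    return np.exp(gammaln(alpha + beta) - gammaln(alpha) - gammaln(beta)
                  + gammaln(alpha + m) + gammaln(beta + n) - gammaln(alpha + beta + m + n))

alpha, beta = 2.5, 3.7  # arbitrary non-integer parameters
numeric, _ = quad(lambda z: z**2 * (1 - z) * stats.beta.pdf(z, alpha, beta), 0, 1)
closed = beta_moment(alpha, beta, 2, 1)

mean = beta_moment(alpha, beta, 1, 0)
var = beta_moment(alpha, beta, 2, 0) - mean**2
print(numeric, closed, mean, var)
```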


# Exercise 2.16: Beta-binomial distribution and Bayes' prior distribution

Let $y \mid \theta \sim \text{Bin}(n, \theta)$ and $\theta \sim \text{Beta}(\alpha, \beta)$.

(a) 
\begin{equation*}
\begin{split}
p(y) & = \int^1_0 p(y, \theta) d\theta = \int^1_0 p(y\mid \theta) p(\theta) d\theta \\[5pt]
& = \int^1_0 \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} {n \choose y} \theta^y (1-\theta)^{n-y} \theta^{\alpha-1}(1-\theta)^{\beta-1} d\theta \\[5pt]
& = {n \choose y} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha + y) \Gamma(\beta + n -y)}{\Gamma(\alpha + \beta + n)} \\[5pt]
& = \frac{\Gamma(n+1)}{\Gamma(y+1)\Gamma(n-y+1)} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha + y) \Gamma(\beta + n -y)}{\Gamma(\alpha + \beta + n)} \\[5pt]
\end{split}
\end{equation*}
which is the beta-binomial distribution (see Appendix A).

(b) Suppose that $p(y)$ is constant for $y = 0, 1, \dots, n$; then, in particular:

\begin{equation*}
\begin{cases}
p(0) = p(n) \\
p(0) = p(1)
\end{cases}
\Leftrightarrow \begin{cases}
\Gamma(\alpha) \Gamma(\beta + n) = \Gamma(\alpha + n) \Gamma(\beta) \\
\Gamma(\alpha) \Gamma(\beta + n) = n \Gamma(\alpha + 1) \Gamma(\beta + n -1)
\end{cases}
\Leftrightarrow \begin{cases}
\alpha = \beta\\
\beta + n -1 = n \alpha
\end{cases}
\Leftrightarrow \begin{cases}
\alpha = \beta\\
(n -1) = (n -1) \alpha
\end{cases}
\end{equation*}

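For the first equivalence, note that $\frac{\Gamma(x+n)}{\Gamma(x)} = x(x+1)\cdots(x+n-1)$ is strictly increasing in $x > 0$, so $\frac{\Gamma(\beta+n)}{\Gamma(\beta)} = \frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}$ forces $\alpha = \beta$; for the second, write $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ and $\Gamma(\beta+n) = (\beta+n-1)\Gamma(\beta+n-1)$ and cancel. Therefore, if $n>1$, $\alpha = \beta = 1$, i.e. the prior is uniform.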
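SciPy (version 1.4 or later) provides the beta-binomial distribution as `scipy.stats.betabinom`, so the result can be checked numerically: with $\alpha = \beta = 1$ the pmf is flat at $\frac{1}{n+1}$ (matching Exercise 2.5(a)), while another symmetric choice such as $\alpha = \beta = 2$ is not. A short sketch with an arbitrary $n = 10$:

```python
import numpy as np
import scipy.stats as stats

n = 10
ks = np.arange(n + 1)
uniform_prior = stats.betabinom.pmf(ks, n, 1, 1)  # alpha = beta = 1
other_prior = stats.betabinom.pmf(ks, n, 2, 2)    # symmetric but non-uniform prior
print(uniform_prior.round(4))
print(other_prior.round(4))
```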