Tomoki Okuno has solutions to some of the questions, and the solutions manual covers some others. I have tried to solve as many of the remaining ones (i.e., those whose solutions are not in either of these solution manuals) as possible.

In [1]:
import numpy as np
import pandas as pd

import statsmodels.api as sm
import statsmodels.stats.api as sms
import pylab as py
import scipy.linalg as la
import statistics
import scipy.stats as stats
import scipy

from math import gamma as tma
import itertools
from scipy.stats import laplace
from scipy.stats import logistic
from scipy.stats import cauchy
from scipy.stats import binom
from scipy.stats import weibull_min as weibull
from scipy.stats import poisson
from scipy.stats import gamma
from scipy.stats import beta
from scipy.stats import norm
from scipy.stats import multivariate_normal as mnorm
from scipy.stats import t as studt
from scipy.stats import f as fdist
from scipy.stats import chisquare as chisq
from scipy.stats import chi2
from scipy.stats import gaussian_kde as gkde
from sklearn.neighbors import KernelDensity
import math
import sympy as sym
import random
import seaborn as sns

import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import matplotlib.patches as mpatches
from matplotlib.cbook import boxplot_stats

import warnings
warnings.filterwarnings('ignore')

#### Exercise 7.6.2. 

Let $X_1, X_2,\cdots, X_n$ denote a random sample from a distribution that is $N(0,\theta)$. Then $Y = \sum X_i^2$ is a complete sufficient statistic for $\theta$. Find the MVUE of $\theta^2$.


#### Solution:

Tomoki and the solution manual writer take the approach where they use the fact that $Y/\theta \sim \chi^2(n).$ There is another approach.

Since $Y$ is a complete sufficient statistic for $\theta$, it is a sufficient statistic for $\theta^2$ as well (see https://stats.stackexchange.com/q/654163/183497). A similar result holds for completeness. So if there is a function of our complete sufficient statistic $Y$ that is unbiased for $\theta^2$, then we have our MVUE.

We know that the MLE of $\theta$ (with a known mean of zero) is $Y/n = \frac{1}{n} \sum X_i^2$ (this is proved in the text, and you can also see it on SE, e.g. https://math.stackexchange.com/a/2187053/145325). So the MLE of $\theta^2$ is $Y^2/n^2$. But we need the MVUE, so perhaps it has the same form up to a correction for bias. So we need $E(Y^2/n^2)$. Since $Y^2 = \left ( \sum X_i^2 \right )^2$, and the $X_i$ are independent with $E(X_i^2) = \theta$ and $E(X_i^4) = 3\theta^2$, 

\begin{align} E(Y^2) &= \sum E(X_i^4) + \sum_{i \neq j} E(X_i^2)E(X_j^2) \\
&= nE(X_1^4)+n(n-1)\left ( E(X_1^2) \right )^2 \\
&= 3n\theta^2 + n(n-1)\theta^2 \\
&= n(n+2)\theta^2
\end{align}

Hence we have $$E\left[ \frac{Y^2}{n(n+2)} \right] = \theta^2$$ and as $Y$ is a complete sufficient statistic for $\theta^2$ as well, the MVUE of $\theta^2$ is $Y^2/[n(n+2)]$.
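The unbiasedness claim can be sanity-checked by simulation. The following is only a sketch: the values of `n`, `theta`, `reps` and the seed are arbitrary choices made for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed (assumption)
n, theta, reps = 5, 2.0, 200_000  # illustrative values (assumption)

# X_i ~ N(0, theta): theta is the variance, so the scale is sqrt(theta)
x = rng.normal(loc=0.0, scale=np.sqrt(theta), size=(reps, n))
y = np.sum(x**2, axis=1)          # complete sufficient statistic Y

mle_sq = y**2 / n**2              # MLE of theta^2, biased upward
mvue = y**2 / (n * (n + 2))       # bias-corrected estimator

print("theta^2      :", theta**2)
print("mean of MLE  :", mle_sq.mean())
print("mean of MVUE :", mvue.mean())
```

With these settings the average of the MLE should sit near $(n+2)\theta^2/n$, while the average of the corrected estimator should sit near $\theta^2$.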

In [2]:
# As usual, the R code was converted to Python using the online code converter in
# https://www.codeconvert.ai/r-to-python-converter

In [3]:
def bootse1(x, nb=3000):
    n = len(x)
    coll = []
    for i in range(nb):
        xstar = np.random.choice(x, n, replace=True)
        coll.append(np.mean(xstar))
    se1 = np.std(coll,ddof=1)
    return coll,se1

In [4]:
def bootse2(x, nb=3000):
    n = len(x)
    xb = np.mean(x)
    s = np.std(x,ddof=1)
    coll = []
    for i in range(nb):
        xstar = np.random.normal(loc=xb,scale=s,size=n)
        coll.append(np.mean(xstar))
    se2 = np.std(coll,ddof=1)
    return coll,se2

In [5]:
xexmp7p6p4 = [27.5,50.9,71.1,43.1,40.4,44.8,36.6,53.5,65.2,47.7]
xexmp7p6p4.extend([75.7,55.4,61.1,39.8,33.4,57.6,47.9,60.7,27.8,65.2])

coll1,se1 = bootse1(xexmp7p6p4)
coll2,se2 = bootse2(xexmp7p6p4)
print(se1)
print(se2)

2.9984932512927625
3.054082092831336
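Since the two standard errors above change from run to run, a seeded and vectorized variant of the nonparametric bootstrap may be handy for reproducibility. This is a sketch only; the name `bootse1_vec` and its `seed` argument are hypothetical and not part of the original R code.

```python
import numpy as np

def bootse1_vec(x, nb=3000, seed=None):
    """Nonparametric bootstrap SE of the sample mean (vectorized sketch)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx indexes one resample of size n drawn with replacement
    idx = rng.integers(0, len(x), size=(nb, len(x)))
    means = x[idx].mean(axis=1)
    return means, means.std(ddof=1)

x_demo = [27.5, 50.9, 71.1, 43.1, 40.4, 44.8, 36.6, 53.5, 65.2, 47.7,
          75.7, 55.4, 61.1, 39.8, 33.4, 57.6, 47.9, 60.7, 27.8, 65.2]

means_demo, se_demo = bootse1_vec(x_demo, seed=123)
# For comparison: the usual analytic standard error s / sqrt(n)
se_analytic = np.std(x_demo, ddof=1) / np.sqrt(len(x_demo))
print(se_demo, se_analytic)
```

The bootstrap value should land close to the analytic $s/\sqrt{n}$, and rerunning with the same `seed` gives the same number.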


#### Exercise 7.6.3. 

Consider Example $7.6.3$ where the parameter of interest is $P (X < c)$ for $X$ distributed $N (\theta, 1)$. Modify the R function bootse1.R so that for a specified value of $c$ it returns the MVUE of $P(X < c)$ and the bootstrap standard error of the estimate. Run your function on the data in ex763data.rda with $c = 11$ and $3000$ bootstraps. These data are generated from a $N(10,1)$ distribution. Report (a) the true parameter, (b) the MVUE, and (c) the bootstrap standard error.

#### Answer (from the back of the book)
(a) $0.8413$; 

(b) $0.7702$

(c) Our run $0.0584$.


#### Solution:

In [6]:
def bootcdfse1(x,c,nb=3000):
    n = len(x)
    coll = []
    variable = (c-np.mean(x))*np.sqrt(n/(n-1))
    est = norm.cdf(variable,loc=0,scale=1)
    for i in range(nb):
        xstar = np.random.choice(x, n, replace=True)
        thisxb = np.mean(xstar)
        thisn = len(xstar)
        thisvariable = (c-thisxb)*np.sqrt(thisn/(thisn-1))
        thiscdf = norm.cdf(thisvariable,loc=0,scale=1)
        coll.append(thiscdf)
    cdfse1 = np.std(coll,ddof=1)
    return [est,cdfse1]

In [7]:
def bootcdfse2(x,c,nb=3000):
    n = len(x)
    xb = np.mean(x)
    s = np.std(x,ddof=1)
    coll = []
    variable = (c-np.mean(x))*np.sqrt(n/(n-1))
    est = norm.cdf(variable,loc=0,scale=1)
    for i in range(nb):
        xstar = np.random.normal(loc=xb,scale=s,size=n)
        thisxb = np.mean(xstar)
        thisn = len(xstar)
        thisvariable = (c-thisxb)*np.sqrt(thisn/(thisn-1))
        thiscdf = norm.cdf(thisvariable,loc=0,scale=1)
        coll.append(thiscdf)
    cdfse2 = np.std(coll,ddof=1)
    return [est,cdfse2]

In [8]:
# (a) We have the result using the fact that this data is a realization of N(10,1)

c = 11
norm.cdf(c,loc=10,scale=1)

0.8413447460685429

In [9]:
# (b) We can get the MVUE from the result in example 7.6.3

data = pd.read_csv('data/ex763data.csv')
xexrc7p6p3 = np.array(data['x'].to_list())

thisxb = np.mean(xexrc7p6p3)
thisn = len(xexrc7p6p3)
thisvariable = (c-thisxb)*np.sqrt(thisn/(thisn-1))
norm.cdf(thisvariable,loc=0,scale=1)

0.7701595009361227

In [10]:
# (c) We recompute the MVUE on each bootstrap resample and report the std of those values
# (bootcdfse1 resamples the data; bootcdfse2 draws from the fitted normal)

[est1,cdfse1] = bootcdfse1(xexrc7p6p3,c)
print(cdfse1,est1)

[est2,cdfse2] = bootcdfse2(xexrc7p6p3,c)
print(cdfse2,est2)

0.05782486515325088 0.7701595009361227
0.06024018884285876 0.7701595009361227


#### Exercise 7.6.4. 

For Example $7.6.4$, modify the R function bootse1.R so that the estimate is the median not the mean. Using $3000$ bootstraps, run your function on the data set discussed in the example and report (a) the estimate and (b) the bootstrap standard error.

#### Solution:

Answer from the back of the book is 

(a) $49.4$; 

(b) Our run: $4.405.$

In [11]:
def bootse1_median(x, nb=3000):
    n = len(x)
    coll = []
    est = np.median(x)
    for i in range(nb):
        xstar = np.random.choice(x, n, replace=True)
        coll.append(np.median(xstar))
    semd = np.std(coll,ddof=1)
    return [est,semd]

In [12]:
# (a)

np.median(xexmp7p6p4)

49.4

In [13]:
# (b)

[est,semd] = bootse1_median(xexmp7p6p4)
print(est,semd)

49.4 4.516307898279994


#### Exercise 7.6.5. 

Let $X_1, X_2, \cdots , X_n$ be a random sample from a uniform $(0, \theta)$ distribution. Continuing with Example $7.6.2$, find the MVUEs for the following functions of $\theta$.

(a) $g(\theta) = \theta^2/12$ , i.e., the variance of the distribution.

(b) $g(\theta) = 1/\theta$ , i.e., the pdf of the distribution.

(c) For $t$ real, $g(\theta) = \frac{e^{t\theta}-1}{t\theta}$, i.e., the mgf of the distribution.

#### Solution:

All of these are straightforward applications of equation $(7.6.2)$, where $Y_n = \max\{X_1,\ldots,X_n\}$ is the complete sufficient statistic, namely

$$u(Y_n) = g(Y_n)+\frac{Y_n}{n}g'(Y_n).$$

(a) MVUE of $g(\theta) = \theta^2/12$ is $$u(Y_n) = \frac{n+2}{n}\frac{Y_n^2}{12}.$$

(b) MVUE of $g(\theta) = 1/\theta$ is $$u(Y_n) = \frac{n-1}{nY_n}.$$

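(c) Here $g'(\theta) = \frac{t\theta e^{t\theta}-e^{t\theta}+1}{t\theta^2}$, so the MVUE of $g(\theta) = \frac{e^{t\theta}-1}{t\theta}$ is $$u(Y_n) = \frac{e^{tY_n}-1}{tY_n}+\frac{tY_ne^{tY_n}-e^{tY_n}+1}{ntY_n} = \frac{(n-1)(e^{tY_n}-1)+tY_ne^{tY_n}}{ntY_n}.$$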
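Equation $(7.6.2)$ can be checked by simulation for all three choices of $g$. The sketch below applies the formula directly with hand-coded derivatives; the helper name `mvue_uniform` and the values of `n`, `theta`, `t`, `reps` are assumptions made only for this illustration.

```python
import numpy as np

def mvue_uniform(g, gprime, y, n):
    """Equation (7.6.2): u(y) = g(y) + (y / n) * g'(y)."""
    return g(y) + (y / n) * gprime(y)

rng = np.random.default_rng(1)             # arbitrary seed (assumption)
n, theta, t, reps = 4, 3.0, 0.5, 400_000   # illustrative values (assumption)

# Y_n is the maximum of n Uniform(0, theta) draws
yn = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)

# Each entry holds g and its derivative g'
cases = {
    "variance": (lambda th: th**2 / 12, lambda th: th / 6),
    "pdf": (lambda th: 1 / th, lambda th: -1 / th**2),
    "mgf": (lambda th: (np.exp(t * th) - 1) / (t * th),
            lambda th: (t * th * np.exp(t * th) - np.exp(t * th) + 1) / (t * th**2)),
}

results = {}
for name, (g, gp) in cases.items():
    # (target g(theta), Monte Carlo mean of u(Y_n))
    results[name] = (g(theta), mvue_uniform(g, gp, yn, n).mean())
    print(name, results[name])
```

For each case the two printed numbers should agree up to Monte Carlo error, which is what unbiasedness of $u(Y_n)$ for $g(\theta)$ predicts.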

#### Exercise 7.6.8. 

As in Example $7.6.3$, let $X_1,X_2,\cdots,X_n$ be a random sample of size $n > 1$ from a distribution that is $N(\theta,1)$. Show that the joint distribution of $X_1$ and $\overline{X}$ is bivariate normal with mean vector $(\theta,\theta)$, variances $\sigma_1^2 = 1$ and $\sigma_2^2 = 1/n$, and correlation coefficient $\rho = 1/\sqrt{n}.$

#### Solution:

This is a direct application of Theorem $3.5.2$. Here, 

\begin{equation}
\textbf{A} = 
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\frac{1}{n} & \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}
\end{bmatrix}, ~~
\textbf{X} = 
\begin{bmatrix}
X_1 \\
X_2 \\
\vdots \\
X_n
\end{bmatrix},~~
\textbf{b} = 
\begin{bmatrix}
0 \\
0
\end{bmatrix},~~
\textbf{Y} = 
\begin{bmatrix}
X_1 \\
\overline{X}
\end{bmatrix},~~
\pmb{\mu} = 
\begin{bmatrix}
\theta \\
\theta \\
\vdots \\
\theta
\end{bmatrix},~~
\pmb{\Sigma} = \textbf{I}_n.
\end{equation}

Here $\pmb{\mu}$ is the $n \times 1$ mean vector of $\textbf{X}$ and $\pmb{\Sigma} = \textbf{I}_n$ is its $n \times n$ covariance matrix, since the $X_i$ are independent with unit variance.

We know from that theorem that $\textbf{AX}+\textbf{b} \sim N_2(\textbf{A}\pmb{\mu}+\textbf{b},\textbf{A}\pmb{\Sigma}\textbf{A}')$ as $m=2$ (with reference to the matrix sizes in Theorem $3.5.2$). Carrying out those calculations using the matrices above, we get the mean vector and covariance matrix of $\textbf{Y}$ as 

\begin{equation}
\textbf{A}\pmb{\mu}+\textbf{b} = 
\begin{bmatrix}
\theta \\
\theta
\end{bmatrix},~~ \textbf{A}\pmb{\Sigma}\textbf{A}' = 
\begin{bmatrix}
1 & 1/n \\
1/n & 1/n
\end{bmatrix}.
\end{equation}

Hence, the means of $X_1$ and $\overline{X}$ are both $\theta$, and the variances are $\sigma_1^2 = 1$ and $\sigma_2^2 = 1/n$ respectively. The covariance is the off-diagonal entry $1/n$, so the correlation coefficient is $\rho = \frac{1/n}{\sqrt{1 \cdot 1/n}} = 1/\sqrt{n}.$
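The matrix product is easy to confirm numerically. This is a small sketch for one illustrative sample size (`n = 5` is an arbitrary choice), taking the covariance matrix of $\textbf{X}$ to be the $n \times n$ identity, which is what conformability with $\textbf{A}$ requires.

```python
import numpy as np

n = 5  # illustrative sample size (assumption)

# Row 1 of A picks out X_1, row 2 forms the sample mean
A = np.vstack([np.eye(n)[0], np.full(n, 1.0 / n)])
Sigma = np.eye(n)  # independent X_i, each with variance 1

cov_Y = A @ Sigma @ A.T
rho = cov_Y[0, 1] / np.sqrt(cov_Y[0, 0] * cov_Y[1, 1])

print(cov_Y)
print(rho, 1 / np.sqrt(n))
```

The printed covariance matrix should have $1$ in the top-left entry and $1/n$ elsewhere, and the two numbers on the last line should coincide.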

#### Exercise 7.6.9. 

Let a random sample of size $n$ be taken from a distribution that has the pdf $f (x; \theta) = (1/\theta) e^{-x/\theta}I_{(0,\infty)} (x).$ Find the mle and MVUE of $P (X \leq 2).$

#### Solution:

The solution manual has the solution for this problem under exercise number $7.6.8.$ I decided to derive the joint pdf of $X_1$ and $Y = \sum X_i$ myself, as the derivation given in the manual is not clear (even though the final answer is correct).

Let $Z = X_1$, and let $V = \sum_2^n X_i$. Then $Z$ and $V$ are independent. Let $H_1 = Z$ and $H_2 = Z+V$. Then we need the joint pdf of $H_1$ (which is $X_1$) and $H_2$ (which is the same as $Y = Z+V$).

So we are looking at the transformation $H_1 = Z$ and $H_2 = Z+V$. We know the pdf of $Z$ (i.e. the pdf of $X_1$, which is $\Gamma(1,\theta)$) and that of $V$ (i.e. that of $\sum_2^nX_i$, which is $\Gamma(n-1,\theta)$ since it is a sum of $n-1$ i.i.d. $\Gamma(1,\theta)$ random variables). The inverse of this transformation is $Z = H_1$ and $V = H_2-H_1$, which has a Jacobian of $1$. So the joint pdf of $H_1$ and $H_2$ is (refer to Section $2.7$, page $144$ of the text)

$$g_{H_1,H_2}(h_1,h_2) = f_{Z,V}(h_1,h_2-h_1)|J| = f_Z(h_1)f_{V}(h_2-h_1) =\frac{(h_2-h_1)^{n-2}}{\theta^n(n-2)!}e^{-h_2/\theta}, \quad 0 < h_1 < h_2 < \infty,$$ and setting $x_1 = h_1$ and $y = h_2$, this expression matches the one in the solutions manual.
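One way to gain confidence in this joint pdf is to integrate out $h_1$ and compare with the known $\Gamma(n,\theta)$ pdf of $Y = \sum X_i$. The sketch below does that numerically; the values of `n` and `theta`, the grid of `h2` points, and the helper name `joint_pdf` are illustrative assumptions.

```python
import numpy as np
from math import factorial
from scipy.integrate import quad
from scipy.stats import gamma

n, theta = 6, 1.5  # illustrative values (assumption)

def joint_pdf(h1, h2):
    """Joint pdf of (H1, H2) on 0 < h1 < h2, as derived above."""
    return (h2 - h1) ** (n - 2) * np.exp(-h2 / theta) / (theta**n * factorial(n - 2))

checks = []
for h2 in (0.5, 3.0, 10.0):
    # Marginal of H2: integrate the joint pdf over 0 < h1 < h2
    marginal, _ = quad(joint_pdf, 0.0, h2, args=(h2,))
    reference = gamma.pdf(h2, a=n, scale=theta)  # Gamma(n, theta) pdf
    checks.append((h2, marginal, reference))
    print(h2, marginal, reference)
```

At each `h2` the numerically integrated marginal should match the gamma pdf, consistent with $Y \sim \Gamma(n,\theta)$.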

One error in the manual: the expectation integral is with respect to the variable $z$, so the differential should be $dz$, but the manual has $dy$.