From 8e9acfb2595d0262d546fab6589a47a9ce0485c5 Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Sun, 26 Feb 2023 15:19:07 +1100 Subject: [PATCH] misc --- lectures/lln_clt.md | 444 ++++++++++++++++++++++++-------------------- 1 file changed, 244 insertions(+), 200 deletions(-) diff --git a/lectures/lln_clt.md b/lectures/lln_clt.md index a43cdbac9..d3b18627b 100644 --- a/lectures/lln_clt.md +++ b/lectures/lln_clt.md @@ -3,27 +3,32 @@ jupytext: text_representation: extension: .md format_name: myst + format_version: 0.13 + jupytext_version: 1.14.1 kernelspec: - display_name: Python 3 + display_name: Python 3 (ipykernel) language: python name: python3 --- - # LLN and CLT ## Overview -This lecture illustrates two of the most important theorems of probability and statistics: The -law of large numbers (LLN) and the central limit theorem (CLT). +This lecture illustrates two of the most important results in probability and statistics: + +1. the law of large numbers (LLN) and +2. the central limit theorem (CLT). -These beautiful theorems lie behind many of the most fundamental results in econometrics and quantitative economic modeling. +These beautiful theorems lie behind many of the most fundamental results in +econometrics and quantitative economic modeling. The lecture is based around simulations that show the LLN and CLT in action. -We also demonstrate how the LLN and CLT break down when the assumptions they are based on do not hold. +We also demonstrate how the LLN and CLT break down when the assumptions they +are based on do not hold. -This lecture will focus on the univariable case to provide the intuitions for proofs and the generalization to multivariate case [later](https://python.quantecon.org/lln_clt.html#the-multivariate-case). +This lecture will focus on the univariate case (the multivariate case is treated [in a more advanced lecture](https://python.quantecon.org/lln_clt.html#the-multivariate-case)). We'll need the following imports: @@ -34,14 +39,8 @@ import numpy as np import scipy.stats as st ``` -## Relationships - -The LLN gives conditions under which sample moments converge to population moments as sample size increases. - -The CLT provides information about the rate at which sample moments converge to population moments as sample size increases. - (lln_mr)= -## LLN +## The Law of Large Numbers ```{index} single: Law of Large Numbers ``` @@ -60,12 +59,15 @@ This means that $X$ takes values in $\{0,1\}$ and $\mathbb P\{X=1\} = p$. We can think of drawing $X$ as tossing a biased coin where * the coin falls on "heads" with probability $p$ and -* we set $X=1$ if the coin is "heads" and zero otherwise. +* the coin falls on "tails" with probability $1-p$ -The mean of $X$ is +We set $X=1$ if the coin is "heads" and zero otherwise. + +The (population) mean of $X$ is $$ -\mathbb E X = 0 \cdot \mathbb P\{X=0\} + 1 \cdot \mathbb P\{X=1\} = \mathbb P\{X=1\} = p + \mathbb E X + = 0 \cdot \mathbb P\{X=0\} + 1 \cdot \mathbb P\{X=1\} = \mathbb P\{X=1\} = p $$ We can generate a draw of $X$ with `scipy.stats` (imported as `st`) as follows: @@ -76,7 +78,8 @@ X = st.bernoulli.rvs(p) print(X) ``` -In this setting, the LLN tells us if we flip the coin many times, the fraction of heads that we see will be close to $p$. +In this setting, the LLN tells us if we flip the coin many times, the fraction +of heads that we see will be close to the mean $p$. 
Let's check this: @@ -94,64 +97,77 @@ X_draws = st.bernoulli.rvs(p, size=n) print(X_draws.mean()) ``` -Let's connect this to the discussion above, where we said the sample average converges to the "population mean". +Let's connect this to the discussion above, where we said the sample average +converges to the "population mean". + +Think of $X_1, \ldots, X_n$ as independent flips of the coin. -The population mean is the mean in an infinite sample, which equals the true mean, or $\mathbb E X$. +The population mean is the mean in an infinite sample, which equals the +expectation $\mathbb E X$. The sample mean of the draws $X_1, \ldots, X_n$ is $$ -\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i + \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i $$ -which, in this case, is the fraction of draws that equal one (the number of heads divided by $n$). +In this case, it is the fraction of draws that equal one (the number of heads divided by $n$). Thus, the LLN tells us that for the Bernoulli trials above ```{math} :label: exp - -\bar X_n \to \mathbb E X = p -\qquad (n \to \infty) + \bar X_n \to \mathbb E X = p + \qquad (n \to \infty) ``` This is exactly what we illustrated in the code. + (lln_ksl)= ### Statement of the LLN Let's state the LLN more carefully. -The traditional version of the law of large numbers concerns independent and identically distributed (IID) random variables. - -Let $X_1, \ldots, X_n$ be independent and identically distributed random variables. +Let $X_1, \ldots, X_n$ be random variables, all of which have the same +distribution. These random variables can be continuous or discrete. -For simplicity we will assume they are continuous and we let $f$ denote their density function, so that, for any $i$ in $\{1, \ldots, n\}$ +For simplicity we will + +* assume they are continuous and +* let $f$ denote their common density function + +The last statement means that for any $i$ in $\{1, \ldots, n\}$ and any +numbers $a, b$, $$ \mathbb P\{a \leq X_i \leq b\} = \int_a^b f(x) dx $$ -(For the discrete case, we need to replace densities with probability mass functions and integrals with sums.) +(For the discrete case, we need to replace densities with probability mass +functions and integrals with sums.) + +Let $\mu$ denote the common mean of this sample. -Let $\mu$ denote the common mean of this sample: +Thus, for each $i$, $$ - \mu := \mathbb E X = \int_{-\infty}^{\infty} x f(x) dx + \mu := \mathbb E X_i = \int_{-\infty}^{\infty} x f(x) dx $$ -In addition, let +The sample mean is $$ -\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i + \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i $$ +The next theorem is called Kolmogorov's strong law of large numbers. ````{prf:theorem} -The law of large numbers (specifically, Kolmogorov's strong law) states that, if $\mathbb E |X|$ is finite, then +If $X_1, \ldots, X_n$ are IID and $\mathbb E |X|$ is finite, then ```{math} :label: lln_as @@ -160,40 +176,53 @@ The law of large numbers (specifically, Kolmogorov's strong law) states that, if ``` ```` +Here + +* IID means independent and identically distributed and +* $\mathbb E |X| = \int_{-\infty}^\infty |x| f(x) dx$ + + + + ### Comments on the Theorem -What does this last expression mean? +What does the probability one statement in the theorem mean? Let's think about it from a simulation perspective, imagining for a moment that -our computer can generate perfect random samples (which of course [it can't](https://en.wikipedia.org/wiki/Pseudorandom_number_generator)). 
+our computer can generate perfect random samples (although this [isn't strictly true](https://en.wikipedia.org/wiki/Pseudorandom_number_generator)). -Let's also imagine that we can generate infinite sequences so that the statement $\bar X_n \to \mu$ can be evaluated. +Let's also imagine that we can generate infinite sequences so that the +statement $\bar X_n \to \mu$ can be evaluated. -In this setting, {eq}`lln_as` should be interpreted as meaning that the probability of the computer producing a sequence where $\bar X_n \to \mu$ fails to occur -is zero. +In this setting, {eq}`lln_as` should be interpreted as meaning that the +probability of the computer producing a sequence where $\bar X_n \to \mu$ +fails to occur is zero. ### Illustration ```{index} single: Law of Large Numbers; Illustration ``` -Let's now illustrate the LLN using simulation. - -When we illustrate it, we will use a key idea: the sample mean $\bar X$ is itself a random variable. +Let's illustrate the LLN using simulation. -In a sense this is obvious but it can be easy to forget. +When we illustrate it, we will use a key idea: the sample mean $\bar X_n$ is +itself a random variable. -The reason $\bar X_n$ is a random variable is that it's a function of the random variables $X_1, \ldots, X_n$. +The reason $\bar X_n$ is a random variable is that it's a function of the +random variables $X_1, \ldots, X_n$. What we are going to do now is -1. Pick some distribution to draw each $X_i$ from -1. Set $n$ to some large number -1. Generate the draws $X_1, \ldots, X_n$ -1. Calculate the sample mean $\bar X_n$ and record its value in an array `sample_means` -1. Go to step 3 +1. pick some fixed distribution to draw each $X_i$ from +1. set $n$ to some large number -We will continue the loop over steps 3-4 a total of $m$ times, where $m$ is some large integer. +and then repeat the following three instructions. + +1. generate the draws $X_1, \ldots, X_n$ +1. calculate the sample mean $\bar X_n$ and record its value in an array `sample_means` +1. go to step 1. + +We will loop over these three steps $m$ times, where $m$ is some large integer. The array `sample_means` will now contain $m$ draws of the random variable $\bar X_n$. @@ -203,186 +232,185 @@ Moreover, if we repeat the exercise with a larger value of $n$, we should see th This is, in essence, what the LLN is telling us. -Let's run some simulations to visualize LLN +To implement these steps, we will use functions. -```{code-cell} ipython3 -def generate_histogram(X_distribution, n, m): - fig, ax = plt.subplots(figsize=(10, 6)) +Our first function generates a sample mean of size $n$ given a distribution. - def draw_means(X_distribution, n): +```{code-cell} ipython3 +def draw_means(X_distribution, # The distribution of each X_i + n): # The size of the sample mean - # Step 3: Generate n draws: X_1, ..., X_n + # Generate n draws: X_1, ..., X_n X_samples = X_distribution.rvs(size=n) - # Step 4: Calculate the sample mean + # Return the sample mean return np.mean(X_samples) - - # Step 5: Loop m times - sample_means = [draw_means(X_distribution, n) for i in range(m)] - print(f'The mean of sample mean is {round(np.mean(sample_means),2)}') - - # Generate a histogram - ax.hist(sample_means, bins=30, alpha=0.5, density=True) - mu = X_distribution.mean() - if not np.isnan(mu): - ax.axvline(x=mu, ls="--", lw=3, label=fr"$\mu = {mu}$") +``` + +Now we write a function to generate $m$ sample means and histogram them. 
+ +```{code-cell} ipython3 +def generate_histogram(X_distribution, n, m): + + # Compute m sample means + + sample_means = np.empty(m) + for j in range(m): + sample_means[j] = draw_means(X_distribution, n) + + # Generate a histogram + + fig, ax = plt.subplots() + ax.hist(sample_means, bins=30, alpha=0.5, density=True) + μ = X_distribution.mean() # Get the population mean + σ = X_distribution.std() # and the standard deviation + ax.axvline(x=μ, ls="--", c="k", label=fr"$\mu = {μ}$") - ax.set_xlim(min(sample_means), max(sample_means)) - ax.set_xlabel(r'$\bar X_n$', size=12) - ax.set_ylabel('density', size=12) - ax.legend() - plt.show() + ax.set_xlim(μ - σ, μ + σ) + ax.set_xlabel(r'$\bar X_n$', size=12) + ax.set_ylabel('density', size=12) + ax.legend() + plt.show() ``` +Now we call the function. + ```{code-cell} ipython3 -#Step 1: Pick some distribution to draw each $X_i$ from -#Step 2: Set $n$ to some large number -generate_histogram(st.norm(loc=5, scale=2), n=50_000, m=1000) +# pick a distribution to draw each $X_i$ from +X_distribution = st.norm(loc=5, scale=2) +# Call the function +generate_histogram(X_distribution, n=1_000, m=1000) ``` -We can see that the distribution of $\bar X$ is clustered around $\mathbb E X$ as expected. +We can see that the distribution of $\bar X$ is clustered around $\mathbb E X$ +as expected. + +Let's vary `n` to see how the distribution of the sample mean changes. + +We will use a violin plot to show the different distributions. -We can vary values for `n` to see how the distribution changes +Each distribution in the violin plot represents the distribution of $X_n$ for some $n$, calculated by simulation. ```{code-cell} ipython3 -def generate_multiple_hist(X_distribution, ns, m, log_scale=False): - _, ax = plt.subplots(figsize=(10, 6)) +def means_violin_plot(distribution, + ns = [1_000, 10_000, 100_000], + m = 10_000): - def draw_means(X_distribution, n): - X_samples = X_distribution.rvs(size=n) - return np.mean(X_samples) - + data = [] for n in ns: - sample_means = [draw_means(X_distribution, n) for i in range(m)] - if log_scale: - plt.xscale('symlog') - ax.hist(sample_means, bins=40, alpha=0.4, density=True, label=fr'$n = {n}$') + sample_means = [draw_means(distribution, n) for i in range(m)] + data.append(sample_means) - mu = X_distribution.mean() - if not np.isnan(mu): - ax.axvline(x=mu, ls="--", lw=3, label=fr"$\mu = {mu}$") + fig, ax = plt.subplots() + + ax.violinplot(data) + μ = distribution.mean() + ax.axhline(y=μ, ls="--", c="k", label=fr"$\mu = {μ}$") + + labels=[fr'$n = {n}$' for n in ns] + + ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels) + ax.set_xlim(0.25, len(labels) + 0.75) + + + plt.subplots_adjust(bottom=0.15, wspace=0.05) - ax.set_xlim(min(sample_means), max(sample_means)) - ax.set_xlabel(r'$\bar X_n$', size=12) ax.set_ylabel('density', size=12) ax.legend() plt.show() ``` +Let's try with a normal distribution. + ```{code-cell} ipython3 -generate_multiple_hist(st.norm(loc=5, scale=2), - ns=[20_000, 50_000, 100_000], - m=10_000) +means_violin_plot(st.norm(loc=5, scale=2)) ``` -The histogram gradually converges to $\mu$ as the sample size n increases. +As $n$ gets large, more probability mass clusters around the population mean $\mu$. + +Now let's try with a Beta distribution. -You can imagine the result when extrapolating this trend for $n \to \infty$. +```{code-cell} ipython3 +means_violin_plot(st.beta(6, 6)) +``` +We get a similar result. 
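As one more check, any distribution with a finite mean should produce the same pattern. (The exponential distribution and the scale parameter used below are our own choice of example, not part of the original text.)

```{code-cell} ipython3
# One more check, this time with an exponential distribution
# (our choice of example; any distribution with a finite mean should work)
means_violin_plot(st.expon(scale=2))
```

Once again, probability mass concentrates around the population mean (equal to $2$ for this choice of scale) as $n$ gets large.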
+
++++

## Breaking the LLN

-We have to pay attention to the assumptions in the statement of the LLN when we apply it.
+We have to pay attention to the assumptions in the statement of the LLN.

-As indicated by {eq}`lln_as`, LLN can break when $\mathbb E |X|$ is not finite or is not well defined.
+If these assumptions do not hold, then the LLN might fail.

-We can demonstrate this using a simple simulation using a [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution) for which it does not have a well-defined $\mu$.
+### Infinite First Moment

+As indicated by the theorem, the LLN can break when $\mathbb E |X|$ is not finite.

-We lost the convergence we have seen before with normal distribution
+We can demonstrate this using the [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution).

-```{code-cell} ipython3
-fig, axes = plt.subplots(1, 2, figsize=(15, 6))
-
-def scattered_mean(distribution, burn_in, n, jump, ax, title, color, ylog=False):
-
-    #Set a jump to reduce simulation complexity
-    sample_means = [np.mean(distribution.rvs(size=i))
-                    for i in range(burn_in, n+1, jump)]
-
-    ax.scatter(range(burn_in, n+1, jump), sample_means, s=10, c=color)
-
-    #Change the y-axis to log scale if necessary
-    if ylog:
-        ax.set_yscale("symlog")
-    ax.set_title(title, size=10)
-    ax.set_xlabel(r"$n$", size=12)
-    ax.set_ylabel(r"$\bar X_n$", size=12)
-    yabs_max = max(ax.get_ylim())
-    ax.set_ylim(ymin=-yabs_max, ymax=yabs_max)
-    return ax
-
-scattered_mean(distribution=st.cauchy(),
-               burn_in=1000,
-               n=1_000_000,
-               ax=axes[0],
-               jump=2000,
-               title="Cauchy Distribution",
-               color='#1f77b4',
-               ylog=True)
-
-scattered_mean(distribution=st.norm(),
-               burn_in=1000,
-               n=1_000_000,
-               ax=axes[1],
-               jump=2000,
-               title="Normal Distribution",
-               color='#ff7f0e')
-
-fig.suptitle('Sample Mean with Different Sample Sizes')
-plt.show()
-```
+The Cauchy distribution has the following property:

-We find that unlike normal distribution, Cauchy distribution does not have the convergence that LLN implies.
+If $X_1, \ldots, X_n$ are IID and Cauchy, then so is $\bar X_n$.

-It is also not hard to conjecture that LLN can be broken when the independence assumption is violated.
+This means that the distribution of $\bar X_n$ does not eventually concentrate on a single number.

-Let's go through a very simple example where LLN fails with IID violated:
+Hence the LLN does not hold.

-Assume
+The LLN fails to hold here because the Cauchy distribution violates the
+assumption that $\mathbb E |X|$ is finite (for the Cauchy distribution,
+$\mathbb E |X| = \infty$).

-$$
-X_0 \sim N(0,1)
-$$
++++

+### Failure of the IID Condition
+
+The LLN can also fail to hold when the IID assumption is violated.

-In addition, assume
+For example, suppose that

$$
-X_t = X_{t-1} \quad \text{for} \quad t = 1, ..., n
+    X_0 \sim N(0,1)
+    \quad \text{and} \quad
+    X_i = X_{i-1} \quad \text{for} \quad i = 1, ..., n
$$

-We can then see that
+In this case,

$$
-\bar X_n := \frac{1}{n} \sum_{t=1}^n X_i = X_0 \sim N(0,1)
+    \bar X_n = \frac{1}{n} \sum_{i=1}^n X_i = X_0 \sim N(0,1)
$$

-Therefore, the distribution of the mean of $X$ follows $N(0,1)$.
+Therefore, the distribution of $\bar X_n$ is $N(0,1)$ for all $n$!

-However,
+Does this contradict the LLN, which says that the distribution of $\bar X_n$
+collapses to the single point $\mu$?

-$$
-\mathbb E X_t = \mathbb E X_0 = 0
-$$
+No, the LLN is correct --- the issue is that its assumptions are not
+satisfied.

-Since the distribution of $\bar X$ follows a standard normal distribution, but the expectation $\mathbb E X_t$ is a single number.
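We can confirm this numerically; the sketch below (the values of `m` and `n` are our own choices and play no special role) repeatedly draws the perfectly correlated sample and histograms the resulting sample means.

```{code-cell} ipython3
# Sketch: with X_1 = X_2 = ... = X_n = X_0, the sample mean is just X_0,
# so its distribution remains N(0, 1) no matter how large n is.
m = 10_000     # number of replications (our choice)
n = 1_000      # sample size (our choice; the histogram looks the same for any n)

sample_means = np.empty(m)
for j in range(m):
    X_0 = np.random.randn()      # X_0 ~ N(0, 1)
    X = np.full(n, X_0)          # X_i = X_{i-1} = ... = X_0
    sample_means[j] = X.mean()   # equals X_0 exactly

fig, ax = plt.subplots()
ax.hist(sample_means, bins=30, alpha=0.5, density=True)
ax.set_xlabel(r'$\bar X_n$', size=12)
ax.set_ylabel('density', size=12)
plt.show()
```

Increasing `n` does not change the picture: the histogram keeps tracing out the $N(0,1)$ density instead of collapsing to a point.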
+In particular, the sequence $X_1, \ldots, X_n$ is not independent. -This violates {eq}`exp`, and thus breaks LLN. ```{note} :name: iid_violation -Although in this case, the violation of IID breaks LLN, it is not always the case for correlated data. +Although in this case the violation of IID breaks the LLN, there *are* situations +where IID fails but the LLN still holds. We will show an example in the [exercise](lln_ex3). ``` -## CLT ++++ + +## Central Limit Theorem ```{index} single: Central Limit Theorem ``` -Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means. +Next, we turn to the central limit theorem (CLT), which tells us about the +distribution of the deviation between sample averages and population means. + ### Statement of the Theorem @@ -394,8 +422,8 @@ In the IID setting, it tells us the following: ````{prf:theorem} :label: statement_clt -If the sequence $X_1, \ldots, X_n$ is IID, with common mean -$\mu$ and common variance $\sigma^2 \in (0, \infty)$, then +If $X_1, \ldots, X_n$ is IID with common mean $\mu$ and common variance +$\sigma^2 \in (0, \infty)$, then ```{math} :label: lln_clt @@ -408,18 +436,17 @@ n \to \infty Here $\stackrel { d } {\to} N(0, \sigma^2)$ indicates [convergence in distribution](https://en.wikipedia.org/wiki/Convergence_of_random_variables#Convergence_in_distribution) to a centered (i.e., zero mean) normal with standard deviation $\sigma$. -### Intuition - -```{index} single: Central Limit Theorem; Intuition -``` The striking implication of the CLT is that for **any** distribution with finite [second moment](https://en.wikipedia.org/wiki/Moment_(mathematics)), the simple operation of adding independent copies **always** leads to a Gaussian curve. + + + ### Simulation 1 -Since the CLT seems almost magical, running simulations that verify its implications is one good way to build intuition. +Since the CLT seems almost magical, running simulations that verify its implications is one good way to build understanding. To this end, we now perform the following simulation @@ -434,6 +461,7 @@ $F(x) = 1 - e^{- \lambda x}$. (Please experiment with other choices of $F$, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment.) (sim_one)= + ```{code-cell} ipython3 # Set parameters n = 250 # Choice of n @@ -464,7 +492,7 @@ ax.legend() plt.show() ``` -(Notice the absence of for loops --- every operation is vectorized, meaning that the major calculations are all shifted to optimized C code.) +(Notice the absence of for loops --- every operation is vectorized, meaning that the major calculations are all shifted to fast C code.) The fit to the normal density is already tight and can be further improved by increasing `n`. @@ -476,7 +504,7 @@ The fit to the normal density is already tight and can be further improved by in ```{exercise} :label: lln_ex1 -Repeat the simulation in [simulation 1](sim_one) with [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution). +Repeat the simulation [above1](sim_one) with the [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution). You can choose any $\alpha > 0$ and $\beta > 0$. ``` @@ -519,7 +547,11 @@ plt.show() ````{exercise} :label: lln_ex2 -Although NumPy doesn't give us a `bernoulli` function, we can generate a draw of $X$ using NumPy via +At the start of this lecture we discussed Bernoulli random variables. 
+NumPy doesn't provide a `bernoulli` function that we can sample from.
+
+However, we can generate a draw of Bernoulli $X$ using NumPy via

```python3
U = np.random.rand()
@@ -534,7 +566,9 @@ Explain why this provides a random variable $X$ with the right distribution.
:class: dropdown
```

-We can write $X$ as $X = \mathbf 1\{U < p\}$ where $\mathbf 1$ is the [indicator function](https://en.wikipedia.org/wiki/Indicator_function) (i.e., 1 if the statement is true and zero otherwise).
+We can write $X$ as $X = \mathbf 1\{U < p\}$ where $\mathbf 1$ is the
+[indicator function](https://en.wikipedia.org/wiki/Indicator_function) (i.e.,
+1 if the statement is true and zero otherwise).

Here we generated a uniform draw $U$ on $[0,1]$ and then used the fact that
@@ -556,22 +590,29 @@ We mentioned above that LLN can still hold sometimes when IID is violated.

Let's investigate this claim further.

-Assume we have a AR(1) process as below:
+Consider the AR(1) process

$$
-X_{t+1} = \alpha + \beta X_t + \sigma \epsilon _{t+1}
+    X_{t+1} = \alpha + \beta X_t + \sigma \epsilon _{t+1}
$$

-and
+where $\alpha, \beta, \sigma$ are constants and $\epsilon_1, \epsilon_2,
+\ldots$ are IID and standard normal.
+
+Suppose that

$$
-X_0 \sim N \left(\frac{\alpha}{1-\beta}, \frac{\sigma^2}{1-\beta^2}\right)
+    X_0 \sim N \left(\frac{\alpha}{1-\beta}, \frac{\sigma^2}{1-\beta^2}\right)
$$

-where $\epsilon_t \sim N(0,1)$
+This process violates the independence assumption of the LLN
+(since $X_{t+1}$ depends on the value of $X_t$).
+
+However, the rest of this exercise shows that LLN-type convergence of the
+sample mean to the population mean still occurs.

-1. Prove this process violated the independence assumption but not the identically distributed assumption;
-2. Show LLN holds using simulations with $\alpha = 0.8$, $\beta = 0.2$.
+1. Prove that the sequence $X_1, X_2, \ldots$ is identically distributed.
+2. Show that LLN convergence holds using simulations with $\alpha = 0.8$, $\beta = 0.2$.

```
@@ -581,43 +622,46 @@ where $\epsilon_t \sim N(0,1)$

**Q1 Solution**

-Given $X_{t+1}$ is dependent on the value of $X_t$, this process is not independent.
+Regarding part 1, we claim that $X_t$ has the same distribution as $X_0$ for
+all $t$.

-To check whether it is identically distributed, we need to check whether the distribution in $T={0...n}$
+To construct a proof, we suppose that the claim is true for $X_t$.

-Let's verify the expectation and variance of this AR(1) process using pen and paper first.
+Now we claim it is also true for $X_{t+1}$.
+
+Observe that we have the correct mean:

$$
\begin{aligned}
-\mathbb E X_{t+1} &= \alpha + \beta \mathbb E X_t \\
-&= \alpha + \beta \frac{\alpha}{1-\beta} \\
-&= \frac{\alpha}{1-\beta}
+    \mathbb E X_{t+1} &= \alpha + \beta \mathbb E X_t \\
+    &= \alpha + \beta \frac{\alpha}{1-\beta} \\
+    &= \frac{\alpha}{1-\beta}
\end{aligned}
$$

+We also have the correct variance:

$$
\begin{aligned}
-\mathrm{Var}(X_{t+1}) &= \beta^2 \mathrm{Var}(X_{t}) + \sigma^2\\
-&= \frac{\beta^2\sigma^2}{1-\beta^2} + \sigma^2 \\
-&= \frac{\sigma^2}{1-\beta^2}
+    \mathrm{Var}(X_{t+1}) &= \beta^2 \mathrm{Var}(X_{t}) + \sigma^2\\
+    &= \frac{\beta^2\sigma^2}{1-\beta^2} + \sigma^2 \\
+    &= \frac{\sigma^2}{1-\beta^2}
\end{aligned}
$$

-We find that expectation and variance are the same $t = 0, ..., n$.
-
-Given both $X_0$ and $\epsilon _{0}$ are normally distributed and independent from each other, the weighted sum is also normally distributed.
+Finally, since both $X_t$ and $\epsilon_0$ are normally distributed and +independent from each other, any linear combinary of these two variables is +also normally distributed. -This holds true for all $X_t$ and $\epsilon _{t}$ where $t = 0, ..., n$ - -Therefore, +We have now shown that $$ -X_t \sim N \left(\frac{\alpha}{1-\beta}, \frac{\sigma^2}{1-\beta^2}\right) \quad t = 0, ..., n + X_{t+1} \sim + N \left(\frac{\alpha}{1-\beta}, \frac{\sigma^2}{1-\beta^2}\right) $$ - -We can conclude this AR(1) process violates the independence assumption but is identically distributed. +We can conclude this AR(1) process violates the independence assumption but is +identically distributed. **Q2 Solution**