
Commit 02f9731

Merge branch 'main' into fix-codefromfile
2 parents: e65ab10 + 609a775

File tree

5 files changed: +40 −41 lines

lectures/exchangeable.md

Lines changed: 8 additions & 10 deletions

@@ -140,16 +140,14 @@ and partial history $W_{t-1}, \ldots, W_0$ contains no information about the pro
 
 So in the IID case, there is **nothing to learn** about the densities of future random variables from past data.
 
-In the general case, there is something go learn from past data.
+In the general case, there is something to learn from past data.
 
-We turn next to an instance of this general case in which there is something to learn from past data.
+We turn next to an instance of this general case.
 
 Please keep your eye out for **what** there is to learn from past data.
 
 ## A Setting in Which Past Observations Are Informative
 
-We now turn to a setting in which there **is** something to learn.
-
 Let $\{W_t\}_{t=0}^\infty$ be a sequence of nonnegative
 scalar random variables with a joint probability distribution
 constructed as follows.
@@ -174,7 +172,7 @@ of them once and for all and then drew an IID sequence of draws from that distri
 
 But our decision maker does not know which of the two distributions nature selected.
 
-The decision maker summarizes his ignorance about this by picking a **subjective probability**
+The decision maker summarizes his ignorance with a **subjective probability**
 $\tilde \pi$ and reasons as if nature had selected $F$ with probability
 $\tilde \pi \in (0,1)$ and
 $G$ with probability $1 - \tilde \pi$.
@@ -276,7 +274,7 @@ as a **prior probability** that nature selected probability distribution $F$.
 DeFinetti {cite}`definetti` established a related representation of an exchangeable process created by mixing
 sequences of IID Bernoulli random variables with parameters $\theta$ and mixing probability $\pi(\theta)$
 for a density $\pi(\theta)$ that a Bayesian statistician would interpret as a prior over the unknown
-Bernoulli paramter $\theta$.
+Bernoulli parameter $\theta$.
 
 ## Bayes' Law
 
@@ -287,7 +285,7 @@ But how can we learn?
 
 And about what?
 
-The answer to the *about what* question is about $\tilde pi$.
+The answer to the *about what* question is about $\tilde \pi$.
 
 The answer to the *how* question is to use Bayes' Law.
 
@@ -302,7 +300,7 @@
 \pi = \mathbb{P}\{q = f \}
 $$
 
-where we regard $\pi$ as the decision maker's **subjective probability** (also called a **personal probability**.
+where we regard $\pi$ as the decision maker's **subjective probability** (also called a **personal probability**).
 
 Suppose that at $t \geq 0$, the decision maker has observed a history
 $w^t \equiv [w_t, w_{t-1}, \ldots, w_0]$.
@@ -486,12 +484,12 @@ learning_example()
 Please look at the three graphs above created for an instance in which $f$ is a uniform distribution on $[0,1]$
 (i.e., a Beta distribution with parameters $F_a=1, F_b=1$), while $g$ is a Beta distribution with the default parameter values $G_a=3, G_b=1.2$.
 
-The graph in the left plots the likehood ratio $l(w)$ on the coordinate axis against $w$ on the coordinate axis.
+The graph on the left plots the likelihood ratio $l(w)$ on the ordinate axis against $w$ on the coordinate axis.
 
 The middle graph plots both $f(w)$ and $g(w)$ against $w$, with the horizontal dotted lines showing values
 of $w$ at which the likelihood ratio equals $1$.
 
-The graph on the right side plots arrows to the right that show when Bayes' Law makes $\pi$ increase and arrows
+The graph on the right plots arrows to the right that show when Bayes' Law makes $\pi$ increase and arrows
 to the left that show when Bayes' Law makes $\pi$ decrease.
 
 Notice how the length of the arrows, which show the magnitude of the force from Bayes' Law impelling $\pi$ to change,
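The Bayes' Law update of $\pi$ that these `exchangeable.md` edits revolve around can be sketched in a few lines (an editorial sketch, not part of the commit; it assumes the lecture's example Beta parameters $F_a=1, F_b=1$ and $G_a=3, G_b=1.2$, and the helper name `bayes_update` is hypothetical):

```python
from scipy.stats import beta

# Beta densities from the lecture's example: f is uniform on [0, 1]
f = lambda w: beta.pdf(w, 1, 1)        # F_a = 1, F_b = 1
g = lambda w: beta.pdf(w, 3, 1.2)      # G_a = 3, G_b = 1.2

def bayes_update(pi, w):
    """One step of Bayes' Law: posterior probability that q = f
    after observing w, starting from prior pi."""
    num = pi * f(w)
    return num / (num + (1 - pi) * g(w))

pi = 0.5
print(bayes_update(pi, 0.9))   # < 0.5, since g(0.9) > f(0.9)
print(bayes_update(pi, 0.1))   # > 0.5, since f(0.1) > g(0.1)
```

The direction of the update matches the arrows in the right-hand graph: draws where $g$ puts more mass than $f$ push $\pi$ down, and conversely.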

lectures/likelihood_bayes.md

Lines changed: 2 additions & 3 deletions

@@ -59,15 +59,14 @@ We begin by reviewing the setting in {doc}`this lecture <likelihood_ratio_proces
 A nonnegative random variable $W$ has one of two probability density functions, either
 $f$ or $g$.
 
-Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from either
-$f$ or $g$.
+Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from $f$ or from $g$.
 
 We will sometimes let $q$ be the density that nature chose once and for all, so
 that $q$ is either $f$ or $g$, permanently.
 
 Nature knows which density it permanently draws from, but we the observers do not.
 
-We do know both $f$ and $g$ but we don’t know which density nature
+We do know both $f$ and $g$, but we don’t know which density nature
 chose.
 
 But we want to know.
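The setting this hunk rewrites — nature choosing $q$ once and for all, then drawing IID from it forever — can be simulated directly (a minimal sketch, not part of the commit; the fifty-fifty choice probability and the Beta parameters are illustrative assumptions):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(42)

# Nature decides once and for all (a hypothetical 1/2 probability each)
q_is_f = rng.random() < 0.5

# Then nature draws an IID sequence from the chosen density, permanently
if q_is_f:
    draws = beta.rvs(1, 1, size=100, random_state=rng)    # q = f (uniform)
else:
    draws = beta.rvs(3, 1.2, size=100, random_state=rng)  # q = g
```

The observer sees only `draws`, never `q_is_f` — which is exactly the inference problem the lecture sets up.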

lectures/likelihood_ratio_process.md

Lines changed: 12 additions & 11 deletions

@@ -38,10 +38,10 @@ This lecture describes likelihood ratio processes and some of their uses.
 
 We'll use a setting described in {doc}`this lecture <exchangeable>`.
 
-Among the things that we'll learn about are
+Among the things we'll learn are
 
 * A peculiar property of likelihood ratio processes
-* How a likelihood ratio process is the key ingredient in frequentist hypothesis testing
+* How a likelihood ratio process is a key ingredient in frequentist hypothesis testing
 * How a **receiver operating characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing
 * How during World War II the United States Navy devised a decision rule that Captain Garret L. Schyler challenged and asked Milton Friedman to justify to him, a topic to be studied in {doc}`this lecture <wald_friedman>`
 
@@ -111,8 +111,8 @@ Pearson {cite}`Neyman_Pearson`.
 
 To help us appreciate how things work, the following Python code evaluates $f$ and $g$ as two different
 beta distributions, then computes and simulates an associated likelihood
-ratio process by generating a sequence $w^t$ from *some*
-probability distribution, for example, a sequence of IID draws from $g$.
+ratio process by generating a sequence $w^t$ from one of the two
+probability distributions, for example, a sequence of IID draws from $g$.
 
 ```{code-cell} python3
 # Parameters in the two beta distributions.
@@ -322,7 +322,7 @@ Denote $q$ as the data generating process, so that
 $q=f \text{ or } g$.
 
 Upon observing a sample $\{W_i\}_{i=1}^t$, we want to decide
-which one is the data generating process by performing a (frequentist)
+whether nature is drawing from $g$ or from $f$ by performing a (frequentist)
 hypothesis test.
 
 We specify
@@ -341,7 +341,7 @@ where $c$ is a given discrimination threshold, to be chosen in a way we'll soon
 This test is *best* in the sense that it is a **uniformly most powerful** test.
 
 To understand what this means, we have to define probabilities of two important events that
-allow us to characterize a test associated with given
+allow us to characterize a test associated with a given
 threshold $c$.
 
 The two probabilities are:
@@ -370,7 +370,7 @@ alarm.
 Another way to say the same thing is that among all possible tests, a likelihood ratio test
 maximizes **power** for a given **significance level**.
 
-To have made a confident inference, we want a small probability of
+To have made a good inference, we want a small probability of
 false alarm and a large probability of detection.
 
 With sample size $t$ fixed, we can change our two probabilities by
@@ -412,7 +412,8 @@ moves toward $-\infty$ when $g$ is the data generating
 process; while log$(L(w^t))$ goes to
 $\infty$ when data are generated by $f$.
 
-This diverse behavior is what makes it possible to distinguish
+That disparate behavior of log$(L(w^t))$ under $f$ and $g$
+is what makes it possible to distinguish
 $q=f$ from $q=g$.
 
 ```{code-cell} python3
@@ -499,9 +500,9 @@ of detection and a smaller probability of false alarm associated with
 a given discrimination threshold $c$.
 
 As $t \rightarrow + \infty$, we approach the perfect detection
-curve that is indicated by a right angle hinging on the green dot.
+curve that is indicated by a right angle hinging on the blue dot.
 
-For a given sample size $t$, a value discrimination threshold $c$ determines a point on the receiver operating
+For a given sample size $t$, the discrimination threshold $c$ determines a point on the receiver operating
 characteristic curve.
 
 It is up to the test designer to trade off probabilities of
@@ -540,7 +541,7 @@ plt.show()
 The United States Navy evidently used a procedure like this to select a sample size $t$ for doing quality
 control tests during World War II.
 
-A Navy Captain who had been ordered to perform tests of this kind had second thoughts about it that he
+A Navy Captain who had been ordered to perform tests of this kind had doubts about it that he
 presented to Milton Friedman, as we describe in {doc}`this lecture <wald_friedman>`.
 
 ## Sequels
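The divergence of log $L(w^t)$ discussed in this file's changes — toward $-\infty$ under $g$ and toward $+\infty$ under $f$ — can be checked with a short simulation (an editorial sketch, not part of the commit; the Beta parameters mirror the lecture's example values):

```python
import numpy as np
from scipy.stats import beta

np.random.seed(0)
F_a, F_b, G_a, G_b = 1, 1, 3, 1.2   # parameters as in the lecture's example

# Likelihood ratio l(w) = f(w) / g(w)
l = lambda w: beta.pdf(w, F_a, F_b) / beta.pdf(w, G_a, G_b)

T = 5000
w_from_g = beta.rvs(G_a, G_b, size=T)   # nature permanently draws from g
w_from_f = beta.rvs(F_a, F_b, size=T)   # nature permanently draws from f

log_L_g = np.cumsum(np.log(l(w_from_g)))  # log L(w^t) when q = g
log_L_f = np.cumsum(np.log(l(w_from_f)))  # log L(w^t) when q = f

print(log_L_g[-1])   # large and negative: L(w^t) -> 0 under g
print(log_L_f[-1])   # large and positive: L(w^t) -> +infinity under f
```

The opposite drifts reflect the two Kullback–Leibler divergences: the mean of log $l(w)$ is negative under $g$ and positive under $f$, which is what lets a threshold test separate $q=f$ from $q=g$.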

lectures/navy_captain.md

Lines changed: 12 additions & 11 deletions

@@ -106,8 +106,8 @@ impose on him.
 The decision maker pays a cost $c$ for drawing
 another $z$.
 
-We mainly borrow parameters from the quantecon lecture “A Problem that
-Stumped Milton Friedman except that we increase both $\bar L_{0}$
+We mainly borrow parameters from the quantecon lecture
+{doc}`A Problem that Stumped Milton Friedman <wald_friedman>` except that we increase both $\bar L_{0}$
 and $\bar L_{1}$ from $25$ to $100$ to encourage the
 frequentist Navy Captain to take more draws before deciding.
 
@@ -270,7 +270,7 @@ Here
 not rejecting $H_0$ when $H_1$ is true
 
 For a given sample size $t$, the pairs $\left(PFA,PD\right)$
-lie on a receiver operating characteristic curve and can be uniquely
+lie on a **receiver operating characteristic curve** and can be uniquely
 pinned down by choosing $d$.
 
 To see some receiver operating characteristic curves, please see this
@@ -297,7 +297,7 @@ plt.legend()
 plt.show()
 ```
 
-We can compute sequneces of likelihood ratios using simulated samples.
+We can compute sequences of likelihood ratios using simulated samples.
 
 ```{code-cell} python3
 l = lambda z: wf.f0(z) / wf.f1(z)
@@ -312,7 +312,7 @@ L1_arr = np.cumprod(l1_arr, 1)
 ```
 
 With an empirical distribution of likelihood ratios in hand, we can draw
-receiver operating characteristic curves by enumerating
+**receiver operating characteristic curves** by enumerating
 $\left(PFA,PD\right)$ pairs given each sample size $t$.
 
 ```{code-cell} python3
@@ -450,7 +450,7 @@ plt.title('$\overline{V}_{fre}$')
 plt.show()
 ```
 
-The following shows how do optimal sample size $t$ and targeted
+The following shows how optimal sample size $t$ and targeted
 $\left(PFA,PD\right)$ change as $\pi^{*}$ varies.
 
 ```{code-cell} python3
@@ -471,7 +471,7 @@ plt.show()
 
 ## Bayesian Decision Rule
 
-In this lecture {doc}`A Problem that Stumped Milton Friedman <wald_friedman>`,
+In {doc}`A Problem that Stumped Milton Friedman <wald_friedman>`,
 we learned how Abraham Wald confirmed the Navy
 Captain’s hunch that there is a better decision rule.
 
@@ -603,7 +603,7 @@ plt.legend(borderpad=1.1)
 plt.show()
 ```
 
-The above figure portrays the value function plotted against decision
+The above figure portrays the value function plotted against the decision
 maker’s Bayesian posterior.
 
 It also shows the probabilities $\alpha$ and $\beta$.
@@ -641,6 +641,7 @@
 
 where
 $\pi^{\prime}=\frac{\pi f_{0}\left(z^{\prime}\right)}{\pi f_{0}\left(z^{\prime}\right)+\left(1-\pi\right)f_{1}\left(z^{\prime}\right)}$.
+
 Given a prior probability $\pi_{0}$, the expected loss for the
 Bayesian is
 
@@ -843,7 +844,7 @@ It is always positive.
 
 ## More details
 
-We can provide more insights by focusing soley the case in which
+We can provide more insights by focusing on the case in which
 $\pi^{*}=0.5=\pi_{0}$.
 
 ```{code-cell} python3
@@ -853,7 +854,7 @@
 Recall that when $\pi^*=0.5$, the frequentist decision rule sets a
 sample size `t_optimal` **ex ante**.
 
-For our parameter settings, we can compute it’s value:
+For our parameter settings, we can compute its value:
 
 ```{code-cell} python3
 t_optimal
@@ -870,7 +871,7 @@ t_idx = t_optimal - 1
 
 By using simulations, we compute the frequency distribution of time to
 deciding for the Bayesian decision rule and compare that time to the
-frequentist rule’sfixed $t$.
+frequentist rule’s fixed $t$.
 
 The following Python code creates a graph that shows the frequency
 distribution of times to decide of the Bayesian decision maker,
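The $(PFA, PD)$ enumeration that this file's changes describe can be sketched without the lecture's `wf` object (an editorial sketch, not part of the commit; the two Beta densities standing in for `f0` and `f1`, and the threshold values `d`, are hypothetical):

```python
import numpy as np
from scipy.stats import beta

np.random.seed(1)
f0 = lambda z: beta.pdf(z, 1, 1)     # hypothetical null density
f1 = lambda z: beta.pdf(z, 3, 1.2)   # hypothetical alternative density

N, t = 10000, 10
z0 = beta.rvs(1, 1, size=(N, t))     # simulated samples when H0 (f0) is true
z1 = beta.rvs(3, 1.2, size=(N, t))   # simulated samples when H1 (f1) is true

l = lambda z: f0(z) / f1(z)
L0 = np.cumprod(l(z0), axis=1)[:, -1]   # likelihood ratios L(z^t) under H0
L1 = np.cumprod(l(z1), axis=1)[:, -1]   # likelihood ratios L(z^t) under H1

# Reject H0 when L(z^t) < d; sweeping d traces out the ROC curve
for d in [0.5, 1.0, 2.0]:
    PFA = np.mean(L0 < d)   # false alarm: reject H0 when H0 is true
    PD = np.mean(L1 < d)    # detection: reject H0 when H1 is true
    print(f"d={d}: PFA={PFA:.3f}, PD={PD:.3f}")
```

Each threshold `d` pins down one $(PFA, PD)$ point, and a good test keeps detection well above false alarm at every threshold.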

lectures/wald_friedman.md

Lines changed: 6 additions & 6 deletions

@@ -376,12 +376,12 @@ c + \int \min \{ (1 - \kappa(z', \pi) ) L_0, \kappa(z', \pi) L_1, h(\kappa(z',
 can be understood as a functional equation, where $h$ is the unknown.
 
 Using the functional equation, {eq}`funceq`, for the continuation value, we can back out
-optimal choices using the RHS of {eq}`optdec`.
+optimal choices using the right side of {eq}`optdec`.
 
 This functional equation can be solved by taking an initial guess and iterating
-to find the fixed point.
+to find a fixed point.
 
-In other words, we iterate with an operator $Q$, where
+Thus, we iterate with an operator $Q$, where
 
 $$
 Q h(\pi) =
@@ -529,7 +529,7 @@ def solve_model(wf, tol=1e-4, max_iter=1000):
 
 ## Analysis
 
-Let's inspect the model's solutions.
+Let's inspect outcomes.
 
 We will be using the default parameterization with distributions like so
 
@@ -747,7 +747,7 @@ simulation_plot(wf)
 
 Increased cost per draw has induced the decision-maker to take fewer draws before deciding.
 
-Because he decides with less, the percentage of time he is correct drops.
+Because he decides with fewer draws, the percentage of time he is correct drops.
 
 This leads to him having a higher expected loss when he puts equal weight on both models.
 
@@ -939,4 +939,4 @@ We'll dig deeper into some of the ideas used here in the following lectures:
 * {doc}`this lecture <likelihood_ratio_process>` describes **likelihood ratio processes** and their role in frequentist and Bayesian statistical theories
 * {doc}`this lecture <likelihood_bayes>` discusses the role of likelihood ratio processes in **Bayesian learning**
 * {doc}`this lecture <navy_captain>` returns to the subject of this lecture and studies whether the Captain's hunch that the (frequentist) decision rule
-that the Navy had ordered him to use can be expected to be better or worse than the rule sequential rule that Abraham Wald designed
+that the Navy had ordered him to use can be expected to be better or worse than the sequential rule that Abraham Wald designed
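The fixed-point iteration with the operator $Q$ mentioned in this file's changes can be sketched on a grid (an editorial sketch, not the lecture's `solve_model`; the draw cost `c`, losses `L0`, `L1`, and Beta densities are hypothetical stand-ins):

```python
import numpy as np
from scipy.stats import beta

# Hypothetical stand-ins for the lecture's primitives
c, L0, L1 = 1.2, 25.0, 25.0
f0 = lambda z: beta.pdf(z, 1, 1)     # density under one hypothesis
f1 = lambda z: beta.pdf(z, 3, 1.2)   # density under the other

pi_grid = np.linspace(0, 1, 101)
z_grid = np.linspace(1e-4, 1 - 1e-4, 400)   # quadrature nodes on (0, 1)
dz = z_grid[1] - z_grid[0]

def kappa(z, pi):
    """Bayes' Law update of pi after observing z."""
    num = pi * f0(z)
    return num / (num + (1 - pi) * f1(z))

def Q(h):
    """One application of the operator Q to a continuation-value guess h."""
    h_new = np.empty_like(h)
    for i, pi in enumerate(pi_grid):
        pz = pi * f0(z_grid) + (1 - pi) * f1(z_grid)   # predictive density of z'
        kap = kappa(z_grid, pi)                        # updated belief pi'
        h_interp = np.interp(kap, pi_grid, h)
        # min over: accept one hypothesis, accept the other, or continue
        integrand = np.minimum.reduce([(1 - kap) * L0, kap * L1, h_interp]) * pz
        h_new[i] = c + integrand.sum() * dz
    return h_new

# Iterate h <- Q h until (approximately) a fixed point
h = np.zeros_like(pi_grid)
for _ in range(200):
    h_new = Q(h)
    err = np.max(np.abs(h_new - h))
    h = h_new
    if err < 1e-6:
        break
```

At the endpoints of the belief grid the continuation value collapses to the draw cost `c`, since an extreme belief makes one of the stopping losses zero; in the interior, continuing is worth more than one draw's cost.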
