<h1 align="center">An Abridged History and Philosophy of Probability</h1>

<img src="Images/HistoryBanner.png"
     alt="Probability Historical Figures"
     width="800"
     style="display:block; margin-left:auto; margin-right:auto;">


<div style="max-width: 800px; margin: 0 auto; text-align: justify;">

## Part 1: The Philosophy of Probability

<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
The true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind.
    <br><span style="font-style: normal; font-size: 0.9em;">— James Clerk Maxwell</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


What is probability? This sounds like a pretty silly way to start this document, and feels like a cheap literary device, but this question is worth asking solely because more than one answer exists. What is meant by probability has been, and partly still is, a contested idea. There are two schools of thought: a correct one and an incorrect one. That is a bit harsh—it is more fair to say that one school of thought is correct, and the other is sometimes a pretty good approximation of the correct school of thought. I am of course talking about Bayesianism (the correct school) and Frequentism (the approximately correct school).

Before continuing, a note on notation. $P(A \mid B)$ denotes the probability of $A$ given $B$—that is, the probability of $A$ *conditional* on $B$ being true or known. The vertical bar "$\mid$" should be read as "given" or "assuming." A comma denotes "and," so $P(A, B \mid C)$ is the probability that both $A$ and $B$ are true, given $C$. You may also see this written with the intersection symbol: $P(A \cap B \mid C)$.

The distinction that matters for everything that follows is this:

$$\underbrace{P(\theta \mid \text{data})}_{\substack{\text{Bayesian} \\ \text{What we want}}} \quad \neq \quad \underbrace{P(\text{data} \mid \theta)}_{\substack{\text{Frequentist} \\ \text{ }}}$$

- The first asks: what is the probability of the parameter given what we observed?
- The second asks: what is the probability of the data given some assumed parameter value?

These are not the same question. Both schools use the same notation, but frequentism—as we shall see—denies that $P(\theta \mid \text{data})$ is even a meaningful quantity.

The Frequentist views probability as a limiting long-run frequency and provides tools to calculate $P(\text{data} \mid \theta)$. The Bayesian views probability as a degree of plausibility and provides tools to compute $P(\theta \mid \text{data})$. Confusing these two is known as "transposing the conditional" or "the prosecutor's fallacy." As an example, consider a corpse. The probability of being dead given that you were guillotined is close to 1. The probability of having been guillotined given that you are dead is close to 0. Most dead people were not guillotined. 
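To make the asymmetry concrete, here is a minimal sketch that applies Bayes' theorem to the guillotine example. All three input numbers are made up for illustration; they are assumptions, not historical estimates.

```python
# Illustrative (made-up) inputs; none of these numbers come from real data.
p_guillotined = 1e-6             # P(guillotined): assumed fraction of people ever guillotined
p_dead_given_guillotined = 1.0   # P(dead | guillotined): taken to be certain
p_dead = 0.9                     # P(dead): assumed fraction of the population considered that is dead

# Bayes' theorem: P(guillotined | dead) = P(dead | guillotined) * P(guillotined) / P(dead)
p_guillotined_given_dead = p_dead_given_guillotined * p_guillotined / p_dead

print(f"P(dead | guillotined) = {p_dead_given_guillotined:.3f}")
print(f"P(guillotined | dead) = {p_guillotined_given_dead:.2e}")
```

Whatever plausible numbers you substitute, the two conditionals stay many orders of magnitude apart, which is the whole point.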

As scientists, we are in the position of the detective. We have the data (the corpse). We want to know the cause (the model, the parameter, the hypothesis). We want $P(\theta \mid \text{data})$. 

These different views are not merely different interpretations of the same mathematics. They set up different frameworks, which determine which questions are even meaningful to ask. For an astronomer, many questions about the probability of hypotheses and parameters cannot be coherently posed under frequentism.

### Part 1.1: The Frequentist Interpretation
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
The occult mysteries of this curious doctrine, regardless how fascinating they are to the adept, should remain hidden from the outlander because we know [through numerous disastrous experiences] they are apt to misuse and misinterpret them <br><span style="font-style: normal; font-size: 0.9em;">— William Briggs, in It is Time to Stop Teaching Frequentism to Non-statisticians</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
The frequentist definition of probability is as follows:

*The probability of an event is the limiting frequency with which that event occurs in an infinite sequence of identical, independent trials.*

This is to say that if you were to flip a fair coin infinitely many times, the proportion of heads approaches 1/2. That limit *is* the probability. This feels like a natural definition, in part because you have been taught it throughout your schooling. It also seems objective: it describes something that is independent of any observer. Objectivity is something that scientists ought to strive for.$^1$ In principle, you could actually measure the limiting frequency (or at least approximate it arbitrarily well).
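As a rough illustration of the limiting-frequency idea, the sketch below simulates coin flips and reports the fraction of heads for increasing numbers of flips. The function name, seed, and flip counts are arbitrary choices of mine. Note that the simulation has the value 1/2 built in, so it illustrates the definition and does not measure anything.

```python
import random

def heads_frequency(n_flips, seed=0):
    """Fraction of heads in n_flips simulated flips of a fair coin."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The fraction wanders for small n and settles near 1/2 as n grows.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```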

The frequentist framework solidified in the first half of the 20th century through the work of people like R. A. Fisher, Jerzy Neyman, and Egon Pearson. Maximum likelihood estimation, significance testing, null and alternative hypotheses, Type I and II errors, statistical power, and so on are all tools from the frequentist framework. While these tools are occasionally useful, they do not derive from a single mathematical framework.

Suppose we measure the eccentricity of an orbit and report it as $e=0.3\pm0.1$ at 68% confidence. What does this mean? The intuitive (and incorrect) interpretation is that there is a 68% chance that the true eccentricity lies in \[0.2, 0.4\]. This alluring interpretation is **forbidden** under frequentism.

Stepping back for a second, we ought to consider how frequentism understands parameters and data. Parameters, such as $e$, are fixed. They do not have a probability distribution. Only data can be random. Let's return to the (poorly named) confidence interval of \[0.2, 0.4\]. What does it mean? It means that if you were to repeat the experiment an infinite number of times (good luck doing that), 68% of the confidence intervals constructed would contain the true value. Given the single experiment, what can we say about our parameter? The **only** thing that we can say is that either this particular interval contains $e$ or it doesn't. That is a tautology! To quote William Briggs: *"why bother?"*
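The "68% of intervals" statement can be checked by simulation. The sketch below assumes a fixed true value of 0.3 and Gaussian measurement noise with a known standard deviation of 0.1 (both assumptions chosen to match the running example), builds a 1-sigma interval around each simulated measurement, and counts how often the interval contains the true value.

```python
import random

def coverage(true_value=0.3, sigma=0.1, n_experiments=20_000, seed=1):
    """Fraction of 1-sigma intervals that contain the fixed true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        estimate = rng.gauss(true_value, sigma)        # one simulated measurement
        lower, upper = estimate - sigma, estimate + sigma
        hits += lower <= true_value <= upper           # True counts as 1
    return hits / n_experiments

print(coverage())   # close to 0.68 for a large number of repetitions
```

The 68% is a property of the procedure across the repetitions. Any single interval in the loop either contains 0.3 or it does not.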

Frequentism commits you to interpreting $e$ as a fixed constant, not as a quantity with a probability distribution. Probability, under frequentism, describes the behaviour of random variables. A constant cannot vary. As such, asking "what is the probability that the interval \[0.2, 0.4\] contains $e$?" is not a valid question. It would be akin to asking "what is the probability that $13$ is prime?" $13$ is either prime or it isn't; there isn't an ensemble over which to define a frequency.

### Part 1.2: The Astronomical Problem
Astronomy is often a science of single instances.

You cannot rerun the formation of the Galaxy$^2$. You cannot repeat observations of a supernova. You cannot resample the cosmic microwave background. The universe happened once. To suggest otherwise is to step outside the bounds of science and into pseudoscience$^3$. Most astronomical samples are not samples in the frequentist sense: draws from a well-defined population that could, in principle, be redrawn. They are singular events, unique systems, one-off configurations of matter observed at a particular moment in cosmic history.

Hopefully the problem is becoming apparent. Frequentism requires an ensemble: a hypothetical infinite sequence of identical, independent trials. When measuring a planet's eccentricity, what is the ensemble? Other measurements of this same planet? Simulations of the formation of this planet? The most defensible answer is repeated measurements of the same system. But this only characterises the distribution of your *estimator* (the behaviour of your measurement procedure across hypothetical repetitions) not the probability that the parameter lies within a given range. With enough measurements your confidence interval might shrink to \[0.29,0.31\], but you still cannot say there is a 68% probability that this interval contains $e$. Frequentism tells us about $\hat{e}$ (the estimate), not the actual parameter $e$. And $e$ is what we care about.

When sample sizes are large and models are correct, the sampling distribution of $\hat{e}$ concentrates tightly around $e$. The distinction can feel academic: a philosophical argument with no practical consequence. But this comfort relies on asymptotic guarantees that astronomy often lacks. We do not have thousands of independent galaxies drawn from identical conditions. We have *this* galaxy, *these* observations, *this* spectrum. The large-$N$ limit that lets frequentist and Bayesian methods converge is often not possible.

No amount of repeated measurement can answer the questions we often actually ask: 
- What is the probability that this transit signal is planetary rather than stellar?
- What is the probability that this gravitational wave event was a binary black hole merger?
- What is the probability that dark matter is composed of WIMPs?

These questions have no ensemble. There is no infinite sequence of universes to sample from.

Furthermore, frequentist inference depends not only on the data you observed, but on data you didn't observe! Consider what a $p$-value actually computes: It is the probability of obtaining a result *as extreme or more extreme* than what you observed, assuming the null hypothesis is true. To calculate this, you must sum over all hypothetical datasets that could have occurred but didn't. Your inference about the data in hand depends on data that was never measured. This same spectre haunts confidence intervals. You are asked to interpret a single interval by reference to repetitions that will never occur. Sir Harold Jeffreys identified this as a fundamental violation: significance testing requires "the consideration of data that have actually *not* been observed." He argued that inference should depend only on what actually occurred. William Briggs compared the null hypothesis to a strawman: it is constructed in order to knock it down, and to calibrate the inferential tools against this spectre. 
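To see where the unobserved data enter, here is a minimal sketch of a one-sided $p$-value for a hypothetical coin experiment: 9 heads in 12 flips, with a fair coin as the null hypothesis. The numbers and the function name are made up for illustration.

```python
from math import comb

def binomial_p_value(k_observed, n, theta_null=0.5):
    """One-sided p-value P(K >= k_observed | theta_null) for n fixed trials."""
    return sum(
        comb(n, k) * theta_null**k * (1 - theta_null) ** (n - k)
        for k in range(k_observed, n + 1)   # includes outcomes that never happened
    )

p = binomial_p_value(9, 12)
print(p)
```

Only the `k = 9` term corresponds to what was actually seen. The terms for 10, 11, and 12 heads describe datasets that were never observed, yet they contribute to the reported number.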

### Part 1.3: The Bayesian Interpretation
<img src="Images/BattleOfWits.png"
     alt="The Battle of Wits scene from The Princess Bride"
     width="600"
     style="display:block; margin-left:auto; margin-right:auto;">

In the 1987 film *The Princess Bride*, the Dread Pirate Roberts (right) is chasing Vizzini (left), who has kidnapped the Princess (behind Vizzini). Roberts has already bested one of Vizzini's henchmen in a sword fight and beaten another in hand-to-hand combat. As Vizzini has no hope of physically besting Roberts, they come to an agreement: a battle of wits, to the death. Roberts apparently poisons one of the goblets, and Vizzini must choose which goblet to drink from, with Roberts to drink from the other. Vizzini states:

<p style="text-align: center; font-style: italic; margin: 10px 60px;">
But it's so simple. All I have to do is divine from what I know of you: are you the sort of man who would put the poison into his own goblet or his enemy's?
</p>

Vizzini is reasoning under uncertainty. He has some *prior knowledge* (the Dread Pirate Roberts's behaviour, reputation, and apparent cleverness). He has hypotheses (the poison is in his own goblet, or in his enemy's). He is attempting to assign plausibilities based on what he knows. This is Bayesian reasoning, and it is what we naturally do as humans.

The Bayesian definition of probability is as follows:

<p style="text-align: center; font-style: italic; margin: 10px 60px;">
Probability is a measure of plausibility: the degree to which a proposition is supported by the available evidence.
</p>

To put it simply: to a Bayesian, probability is a measure of belief. Probability is not a frequency. It is not a property of the physical world. It is a property of your state of knowledge. If this is your first time encountering this, it will likely feel uncomfortable: where is the objectivity? But this is how we talk about probability in the real world. If you say "I am 90% sure that I left my keys at home", that is a statement of belief. You are not stating that "if this day were somehow repeated over and over again, 9 times out of 10 I would have left my keys at home and 1 time out of 10 I would have taken them with me." If you were then to pat all of your pockets and not feel your keys, you might state "I am 99% sure that I left my keys at home". Your belief has changed, but the location of the keys has not.
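The jump from 90% to 99% is what Bayes' theorem produces once you say how reliable the pocket check is. The sketch below uses the 90% prior from the text; the two likelihood values are assumptions I have picked for illustration.

```python
prior_home = 0.90               # belief that the keys are at home, before checking
p_empty_given_home = 1.0        # assumed: pockets always feel empty if the keys are at home
p_empty_given_with_me = 0.09    # assumed: chance of missing keys that are actually on you

# Total probability of feeling nothing in your pockets.
evidence = (p_empty_given_home * prior_home
            + p_empty_given_with_me * (1 - prior_home))

# Bayes' theorem: updated belief that the keys are at home.
posterior_home = p_empty_given_home * prior_home / evidence
print(round(posterior_home, 3))
```

Nothing about the keys changed between the two lines of arithmetic. Only the information did.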

Perhaps the main difference between Frequentism and Bayesianism is this: in Bayesian inference, the data are fixed and the parameters have probability distributions. You observe what you observe; there is nothing random about it anymore. The uncertainty lies in what you *don't* know: the parameters, the models, and the hypotheses.

Return to the questions frequentism could not answer:

- What is the probability that the true eccentricity lies between 0.2 and 0.4? 
- What is the probability that this transit signal is planetary? 
- What is the probability that Model A is correct rather than Model B?

Under Bayesianism, these are not only meaningful: they are the *native outputs* of the framework. You compute the posterior distribution $P(\theta \mid \text{data})$ and read off the answers directly.

When a Bayesian reports $e = 0.3 \pm 0.1$, it means exactly what you always think a confidence interval means: the posterior probability that $e$ lies in $[0.2, 0.4]$ is 68%. No hypothetical repetitions. No ensemble of experiments. No spectres of data that never materialised. Just: given what was observed, this is the probability.

This is the credible interval. And it is *credible* in the ordinary English sense. You can believe it. It answers the question you asked. Details on the nuts-and-bolts usage of Bayes' theorem are in a separate document.
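As a rough sketch of how such a statement is computed, the code below evaluates a posterior for $e$ on a grid and sums the probability between 0.2 and 0.4. It assumes a flat prior on $[0, 1)$ and a Gaussian likelihood centred on a measured value of 0.3 with standard deviation 0.1. Those modelling choices, the grid size, and the function name are mine, chosen only to match the running example.

```python
import math

def posterior_mass(lo, hi, measured=0.3, sigma=0.1, n_grid=10_000):
    """Posterior probability that e lies in [lo, hi], using a grid over [0, 1)."""
    step = 1.0 / n_grid
    grid = [(i + 0.5) * step for i in range(n_grid)]     # midpoints of the grid cells
    # Unnormalised posterior: flat prior on [0, 1) times a Gaussian likelihood.
    weights = [math.exp(-0.5 * ((e - measured) / sigma) ** 2) for e in grid]
    total = sum(weights)                                  # normalising constant
    inside = sum(w for e, w in zip(grid, weights) if lo <= e <= hi)
    return inside / total

print(posterior_mass(0.2, 0.4))   # roughly 0.68 under these assumptions
```

The output is a direct probability statement about $e$ given the one dataset in hand. No repetitions of the experiment appear anywhere in the calculation.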

Unfortunately for Vizzini, although his reasoning was sound, he had not considered all hypotheses. The Dread Pirate Roberts had poisoned both goblets of wine. In retrospect, we see Roberts's priors in action (I have built up a tolerance to this poison; most people have not). While the tools of Bayesianism are the only coherent probability tools, using them effectively still requires the user to consider all possibilities.

### Part 1.4: The Likelihood Principle
The likelihood function is defined as:

$$\mathcal{L}(\theta) = P(\text{data} \mid \theta)$$

This is the probability of the data you observed, evaluated as a function of the parameter $\theta$. The Likelihood Principle states:
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
All the information that the data provide about the parameter is contained in the likelihood function
</p>
This seems straightforward enough, and it should. Probability should be simple. It means that inference should depend *only* on what you observed, and not on what you *could* have observed. We will see that Bayes' theorem, of course, respects the Likelihood Principle. This section is here only to remind you that under frequentism, how you *intend* to run an experiment affects the conclusions drawn from it. See Part 4.3 for an example of such frequentist insanity.
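A standard textbook-style illustration, sketched here with hypothetical data (9 heads and 3 tails), compares two experimental plans that end with the same observations: "flip exactly 12 times" and "flip until the third tail appears". The function and variable names are mine.

```python
from math import comb

HEADS, TAILS = 9, 3          # hypothetical observed data
N = HEADS + TAILS

def lik_fixed_n(theta):
    """Binomial likelihood: the plan was to flip exactly N times."""
    return comb(N, HEADS) * theta**HEADS * (1 - theta) ** TAILS

def lik_stop_at_tails(theta):
    """Negative binomial likelihood: the plan was to flip until TAILS tails appeared."""
    return comb(N - 1, TAILS - 1) * theta**HEADS * (1 - theta) ** TAILS

# The likelihoods differ only by a constant factor, whatever the value of theta.
ratios = [lik_fixed_n(t) / lik_stop_at_tails(t) for t in (0.2, 0.5, 0.8)]

# One-sided p-values under the null theta = 0.5 depend on the plan.
p_fixed_n = sum(comb(N, k) for k in range(HEADS, N + 1)) / 2**N
# At least HEADS heads before the last tail means fewer than TAILS tails in the first N - 1 flips.
p_stop_at_tails = sum(comb(N - 1, k) for k in range(TAILS)) / 2 ** (N - 1)

print(ratios, p_fixed_n, p_stop_at_tails)
```

Because the likelihood functions are proportional, any analysis that respects the Likelihood Principle gives the same answer under both plans. The two $p$-values, by contrast, fall on opposite sides of the conventional 0.05 threshold.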

### Part 1.5: All Roads Lead to Bayesianism
What do we want from a probability system anyway? Mathematically, how can we represent beliefs probabilistically? Many people have thought about this, but Cox came up with a nice set of axiomatic statements:

1) Comparability: Degrees of plausibility should be represented by real numbers, ordered so that more plausible propositions receive higher values.
2) Consistency: Equivalent states of knowledge must lead to equivalent plausibilities, and conclusions must not depend on the order or path of reasoning.
3) Continuity: Small changes in information should lead to small changes in plausibility.

It should be noted that in this section we have not said anything about Bayes' theorem; the above desiderata are stated in generic terms. Cox proved that any system of plausible reasoning that follows these desiderata **is always** either Bayesian or reduces to Bayesianism. Any other system will contain internal contradictions. Frequentism has no comparable foundation. There is no theorem that says "if you want properties X, Y, Z, you must use p-values".
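For reference, the rules that Cox's argument arrives at are the familiar product and sum rules. They are written here in the notation introduced earlier, with $A$, $B$, and $C$ standing for arbitrary propositions and $\bar{A}$ (a symbol introduced here for illustration) meaning "not $A$":

$$P(A, B \mid C) = P(A \mid B, C)\,P(B \mid C), \qquad P(A \mid C) + P(\bar{A} \mid C) = 1$$

Because $P(A, B \mid C)$ is symmetric in $A$ and $B$, writing the product rule both ways round and equating the results gives Bayes' theorem.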

### Part 1.6: On Pseudoscience Allegations
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
A previous acquaintance with probability and statistics is not necessary; indeed, a certain amount of innocence in this area may be desirable, because there will be less to unlearn.
    <br><span style="font-style: normal; font-size: 0.9em;">— E.T. Jaynes, in Probability Theory: The Logic of Science</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<img src="Images/RedditBayesPseudoScience.png"
     alt="A Reddit post expressing doubt on the validity of Bayesian Statistics"
     width="600"
     style="display:block; margin-left:auto; margin-right:auto;">

Above is a screenshot of a Reddit post by someone who is near the end of their statistics degree. To their misfortune, they have not been taught Bayesian statistics. They came across the idea on their own, but they express doubt about its validity, going so far as to suggest it could be pseudoscientific. This Redditor would not be the first to suggest such a thing. Such objections were levelled for much of the 20th century, most notably by R. A. Fisher. The worry goes something like this:

*Science ought to be objective. Two scientists examining the same data should reach the same conclusions. But as Bayesian inference requires scientists to specify priors, different scientists could get different results. If Alice uses one prior and Brendan uses another, they may reach different posteriors from identical observations. Doesn't this make Bayesian inference fundamentally subjective? And isn't subjectivity a sign of pseudoscience?*

Does this make Bayesian inference subjective? Yes.

Does this make Bayesian inference pseudoscience? Not really.

The logical positivists$^4$ of the early 20th century would have held this worry. For them, scientific statements had to be empirically verifiable (you likely hold similar beliefs). A prior probability seems to fail this test. How do you verify a prior? It apparently exists only in the mind of the beholder. To the positivists, this made Bayesian priors metaphysical nonsense, no better than claims about the soul. R. A. Fisher held this suspicion. In fact, he is on record stating:

*"The theory of inverse probability \[Bayesian statistics\] is founded upon an error, and must be wholly rejected."*

His objections were philosophical. Priors seemed to add personal beliefs into what ought to be an objective mathematical procedure. Frequentism promised probabilities grounded in physical frequencies.

This question is not stupid. Bayesian inference does depend on priors, and priors can differ between people. What matters is whether this is a problem or not. I (obviously) lean towards the latter.

#### Are different priors a problem?
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
The greater our knowledge increases, the greater our ignorance unfolds. <br><span style="font-style: normal; font-size: 0.9em;">— John F. Kennedy</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">

No.

Let's again consider two scientists, Alice and Brendan. They are analysing the same dataset to estimate some parameter $\theta$. Alice knows a lot about this subject, and thus has a narrow prior. Brendan is new to the field and unaware of some recent results. As such, he has a broader prior. After observing the data, their posteriors are different. Is this a problem?

No. Their posteriors should differ, as their states of knowledge differ. Alice knows things that Brendan doesn't. It would be weird if this weren't the case. It would be a failure of rationality if Alice were forced to ignore her expertise and reason as if she were as ignorant as Brendan.

Bayesianism does not claim to produce an "objective" answer (whatever that means). It produces the best answer given the state of knowledge. If two people have different knowledge, they ought to have different beliefs. Moreover, significantly different priors only produce significantly different posteriors when the data are uninformative. If Alice and Brendan still disagree after the experiment, then the data gathered were uninformative. As data accumulate, the likelihood dominates the prior. With sufficient data, Alice and Brendan will find their posteriors converging.
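The convergence can be seen in a toy model. The sketch below assumes, purely for illustration, that $\theta$ is a success probability estimated from binomial data, that Alice holds a narrow Beta(30, 70) prior, and that Brendan holds a flat Beta(1, 1) prior. It then tracks the gap between their posterior means as the sample grows. The priors, the observed rate, and the function names are all my assumptions.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Mean of the Beta posterior for a Beta(prior_a, prior_b) prior and binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

ALICE = (30.0, 70.0)     # hypothetical narrow prior centred on 0.3
BRENDAN = (1.0, 1.0)     # hypothetical flat prior

def disagreement(n, observed_rate=0.4):
    """Absolute gap between the two posterior means after n trials."""
    successes = round(observed_rate * n)
    failures = n - successes
    return abs(posterior_mean(*ALICE, successes, failures)
               - posterior_mean(*BRENDAN, successes, failures))

# The gap shrinks as the data start to dominate both priors.
for n in (10, 100, 10_000):
    print(n, round(disagreement(n), 4))
```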

Contrast this with the frequentist fantasy of "objectivity." A frequentist confidence interval does not depend on the analyst's prior beliefs, but it does depend on the analyst's choice of test statistic, significance threshold, stopping rule, and definition of "as extreme or more extreme." These are all subjective choices. They are simply not called priors. The subjectivity is hidden rather than eliminated. All statistical frameworks require assumptions; the question is how you incorporate these assumptions and how you display them.

#### The Demarcation Problem

How do we distinguish science from pseudoscience? I am not entirely sure. I cannot give a single rule that clearly distinguishes the two. On the axis of astronomy --> SETI --> "UFOlogy" where does one draw the line? Probably between SETI and UFOlogy; that just feels right. 

While no single criterion cleanly separates science from pseudoscience, several diagnostic criteria have been proposed that, in tandem, offer a decent heuristic.

- Is it falsifiable? Could a conceivable observation contradict it?
- Is it immune to evidence? Do practitioners update their beliefs when confronted with negative data?
- Does it make novel predictions? Or are the predictions so vague that any outcome can be accommodated?
- Does it rely on authority or anecdote?
- Is there a plausible mechanism connecting cause and effect?

This list is not perfect. String theory, while heavily criticised, can hardly be labelled pseudoscience. Some pseudoscience satisfies some of these criteria: astrology makes predictions which are, occasionally, correct. General relativity passes all of them. So does Bayesian inference pass or fail?

**Is Bayesian inference falsifiable?** Yes. The likelihood function can produce a posterior very different to the prior.

**Does Bayesian inference update on evidence?** By construction, yes.

**Does it make predictions?** The posterior should predict future observations.

**Does it rely on authority?** No. The prior states its assumptions explicitly. There are no "standard practices" with baked-in assumptions.

Any criticism of Bayesianism for subjectivity that does not also criticise Frequentism for subjectivity is pointless. All inference requires assumptions.

---------

Priors are not arbitrary. The worry about priors seems to imply that they are plucked from thin air. This is not the case. In practice, priors encode genuine scientific information. For example:

- Stellar masses cannot be negative. They also probably aren't 1e9 solar masses.
- Fluxes cannot be negative.
- Stars probably are not 90% Cobalt.
- Orbital eccentricities of bound orbits lie between 0 and 1.
- Stellar effective temperatures are bounded by physics.
- Parallaxes of distant stars are small and positive.

These are not subjective in any pejorative sense. They are facts. Priors that assign probability to negative mass (something that is sometimes implied in some frequentist models) are not different opinions; they are wrong. Priors encode what we know as scientists. Failing to include that knowledge would often lead to model errors.
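In code, constraints like these often appear as a log-prior that returns minus infinity wherever the physics forbids a value. The sketch below is a hypothetical example: the function name, the parameter names, and the upper mass bound of 1000 solar masses are my own assumptions, and the prior is left flat and unnormalised inside the allowed region.

```python
import math

def log_prior(mass_msun, eccentricity, flux):
    """Log of a simple bounded prior; -inf marks physically forbidden values."""
    if not (0.0 < mass_msun < 1e3):        # assumed generous upper bound on stellar mass
        return -math.inf
    if not (0.0 <= eccentricity < 1.0):    # bound orbits only
        return -math.inf
    if flux < 0.0:                         # fluxes cannot be negative
        return -math.inf
    return 0.0                             # flat (unnormalised) inside the allowed region

print(log_prior(1.0, 0.3, 2.0))     # allowed combination
print(log_prior(-1.0, 0.3, 2.0))    # negative mass is ruled out
```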

The interesting cases arise when genuine uncertainty exists. What prior should we place on the fraction of stars hosting habitable planets? Here, reasonable people might disagree. But this disagreement does not arise because one person is correct and the other is wrong; it arises from genuine uncertainty.

#### Subjective and Objective Bayesianism

With all of this being said, there are two schools of thought within Bayesianism: Subjective and Objective Bayes.

Subjective Bayesians hold that priors represent personal degrees of belief. Any prior consistent with the axioms of probability is valid. Different scientists may hold different priors legitimately.

Objective Bayesians seek priors that are in some sense objective: priors that any rational agent should adopt given the same background information. Several approaches exist:

- Maximum entropy priors: Choose the distribution with maximum entropy subject to known constraints. This is the least informative prior consistent with those constraints. For example, if you know only that a parameter is positive and has a given mean, the maximum entropy prior is an exponential.
- Jeffreys prior: Priors ought to be invariant under reparameterisation.
- Reference priors: Priors that maximise the expected information gain from the data.
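As a small worked illustration of the maximum entropy approach (a sketch; the symbol $\mu$ for an assumed known mean is introduced here for illustration): among all densities $p(x)$ on $x \ge 0$ with mean $\mu$, the entropy is maximised by the exponential distribution.

$$p(x) = \frac{1}{\mu}\,e^{-x/\mu} \quad \text{maximises} \quad H[p] = -\int_0^\infty p(x)\,\ln p(x)\,dx \quad \text{subject to} \quad \int_0^\infty p(x)\,dx = 1, \quad \int_0^\infty x\,p(x)\,dx = \mu$$

Positivity alone is not enough to pin down a proper distribution. It is the extra constraint on the mean that selects the exponential.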

E.T. Jaynes was an objective Bayesian. He imagined a robot (Jaynes' robot) that must assign probabilities based only on the information given. No hunches, no intuitions, no human intervention. The robot's reasoning must be consistent. If the same problem is posed in different ways, the robot must give the same answer. Jaynes showed that such a robot *must* use Bayesian probability, and that maximum entropy provides the unique consistent method for assigning priors from incomplete information.

In practice, most Bayesians are in neither camp. They use objective priors for nuisance parameters that they have no strong knowledge of, and informative (subjective) priors based on domain knowledge for parameters of interest. 


TODO: add a table demonstrating the differences between  how the schools see certain things
TODO: add citations and further reading
### Footnotes:

- $1$ TODO: Say something about the lack of objectivity
- $2$ This is not to say that cosmological simulations are pointless or wrong. Far from it. 
- $3$ Yes, I am a many worlds interpretation hater.
- $4$ Define logical positivists


<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
A scientist who has learned how to use probability theory directly as extended logic, has a great advantage in power and versatility over one who has learned only a collection of unrelated ad hoc devices.    <br><span style="font-style: normal; font-size: 0.9em;">— E.T. Jaynes</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">

</div>

<div style="max-width: 800px; margin: 0 auto; text-align: center;">

## Part 2: Dramatis Personae

### Part 2.1: Reverend Thomas Bayes
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
Bayes' theorem is to the theory of probability what Pythagoras' theorem is to geometry.    <br><span style="font-style: normal; font-size: 0.9em;">— Sir Harold Jeffreys</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


### Part 2.2: Pierre-Simon Laplace

### Part 2.3: Sir Ronald Fisher

### Part 2.4: Sir Harold Jeffreys
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
Dedicated to the memory of Sir Harold Jeffreys, who saw the truth and preserved it.  <br><span style="font-style: normal; font-size: 0.9em;">— E.T. Jaynes, dedication in Probability Theory: The Logic of Science</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


### Part 2.5: E.T. Jaynes

### Part 2.6: Alan Turing

</div>

<div style="max-width: 800px; margin: 0 auto; text-align: center;">

## Part 3: History

### Part 3.1: The Rise of Frequentism

### Part 3.2: The Jeffreys-Fisher Saga

### Part 3.3: Bayes at Bletchley Park
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
I won't say what we did made us win the war, but I daresay we might have lost it without it.    <br><span style="font-style: normal; font-size: 0.9em;">— I.J. Good, on Bletchley Park</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">

### Part 3.4: The Bayesian Renaissance
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
To paraphrase Seneca, they will be incredulous that such clear truths could have escaped us throughout the 20th (and into the 21st) century    <br><span style="font-style: normal; font-size: 0.9em;">—  Tommaso Toffoli</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


</div>

<div style="max-width: 800px; margin: 0 auto; text-align: center;">

## Part 4: Anecdotes and Examples

### Part 4.1: Casinos are Bayesian

### Part 4.2: The Bem Affair
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
There are three kinds of lies: lies, damned lies, and statistics.    <br><span style="font-style: normal; font-size: 0.9em;">— Disraeli</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


### Part 4.3: The Stopping Problem


</div>