<h1 align="center">An Abridged History and Philosophy of Probability</h1>

<img src="Images/HistoryBanner.png"
     alt="Probability Historical Figures"
     width="800"
     style="display:block; margin-left:auto; margin-right:auto;">


<div style="max-width: 800px; margin: 0 auto; text-align: justify;">

## Part 1: The Philosophy of Probability

<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
The true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind.
    <br><span style="font-style: normal; font-size: 0.9em;">— James Clerk Maxwell</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


What is probability? This sounds like a pretty silly way to start this document, and feels like a cheap literary device, but the question is worth asking solely because more than one answer exists. What is meant by probability has been, and partly still is, a contested idea. There are two schools of thought: a correct one and an incorrect one. That is a bit harsh; it is fairer to say that one school of thought is correct, and the other is sometimes a pretty good approximation of it. I am of course talking about Bayesianism (the correct school) and Frequentism (the approximately correct school).

The Frequentist views probability as the limiting long-run frequency of an event. The Bayesian views it as a degree of plausibility assigned to a proposition given the available information. These views are not just different interpretations of the same thing; they set up different frameworks, which determine which questions are even meaningful to ask. For an astronomer this matters: many questions about the probability of observing something in the universe cannot be coherently asked under frequentism.

### Part 1.1: The Frequentist Interpretation
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
The occult mysteries of this curious doctrine, regardless how fascinating they are to the adept, should remain hidden from the outlander because we know [through numerous disastrous experiences] they are apt to misuse and misinterpret them <br><span style="font-style: normal; font-size: 0.9em;">— William Briggs, in It is Time to Stop Teaching Frequentism to Non-statisticians</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
The frequentist definition of probability is as follows:

*The probability of an event is the limiting frequency with which that event occurs in an infinite sequence of identical, independent trials.*

This is to say that if you were to flip a fair coin infinitely many times, the proportion of heads would approach 1/2. That limit *is* the probability. This feels like a natural definition, in part because you have been taught it throughout your schooling. It also seems objective: it describes something that is independent of any observer. Objectivity is something that scientists ought to strive for.$^1$ In principle, you could actually measure the limiting frequency (or at least approximate it arbitrarily well).
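To make the limiting-frequency idea concrete, here is a minimal sketch that simulates a fair coin and tracks the running fraction of heads. The function name `running_heads_fraction`, the seed, and the number of flips are all illustrative choices of mine, and a pseudo-random generator is of course only standing in for a real coin.

```python
import random

def running_heads_fraction(n_flips, seed=0):
    """Return the fraction of heads after each flip of a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is reproducible
    heads = 0
    fractions = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        fractions.append(heads / i)
    return fractions

fractions = running_heads_fraction(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: fraction of heads = {fractions[n - 1]:.4f}")
```

The early fractions wander noticeably and the later ones settle near 1/2. Note that any finite run only ever approximates the limit; the frequentist definition refers to the infinite sequence itself.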

The frequentist framework solidified in the first half of the 20th century through the work of people like R. A. Fisher, Jerzy Neyman, and Egon Pearson. Maximum likelihood estimation, significance testing, null and alternative hypotheses, Type I and II errors, statistical power, etc. are all tools of the frequentist framework. While these tools are occasionally useful, they are a collection of ad hoc procedures rather than the consequences of a single mathematical framework.

Suppose we measure the eccentricity of an orbit and report it as $e=0.3\pm0.1$ at 68% confidence. What does this mean? The intuitive (and incorrect) interpretation is that there is a 68% chance that the true eccentricity lies in \[0.2, 0.4\]. This alluring interpretation is **forbidden** under frequentism.

Stepping back for a second, we ought to consider how frequentism understands parameters and data. Parameters, such as $e$, are fixed. They do not have a probability distribution. Only data can be random. Let's return to the (poorly named) confidence interval of \[0.2, 0.4\]. What does it mean? It means that if you were to repeat the experiment an infinite number of times (good luck doing that), 68% of the confidence intervals constructed would contain the true value. Given the single experiment, what can we say about our parameter? The **only** thing that we can say is that either this particular interval contains $e$ or it doesn't. That is a tautology! To quote William Briggs: *"why bother?"*
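The repeated-experiment reading of a confidence interval can be checked numerically. The sketch below assumes, purely for illustration, that each repetition returns an estimate $\hat{e}$ drawn from a Gaussian centred on a true value of 0.3 with a known uncertainty of 0.1 (it ignores the fact that a real eccentricity is bounded to $[0, 1)$). The function name `interval_coverage` and all of its defaults are hypothetical.

```python
import random

def interval_coverage(true_e=0.3, sigma=0.1, n_experiments=20_000, seed=1):
    """Fraction of intervals [e_hat - sigma, e_hat + sigma] that contain true_e."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    hits = 0
    for _ in range(n_experiments):
        # One hypothetical repetition of the experiment (assumed Gaussian noise).
        e_hat = rng.gauss(true_e, sigma)
        if e_hat - sigma <= true_e <= e_hat + sigma:
            hits += 1
    return hits / n_experiments

coverage = interval_coverage()
print(f"fraction of intervals containing the true value: {coverage:.3f}")
```

Under these assumptions the printed fraction should land close to 0.68. The 68% is a property of the procedure across the whole ensemble of repetitions; each individual interval in the loop either contains `true_e` or it does not.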

Frequentism commits you to interpreting $e$ as a fixed constant, not a probability distribution. Probability, under frequentism, describes the behaviour of random variables. A constant cannot vary. As such, asking "what is the probability that the interval \[0.2, 0.4\] contains $e$?" is incorrect. It would be akin to asking "what is the probability that $13$ is prime?" $13$ is either prime or it isn't; there isn't an ensemble over which to define a frequency.

### Part 1.2: The Astronomical Problem
Astronomy is often a science of single instances.

You cannot rerun the formation of the Galaxy$^2$. You cannot repeat observations of a supernova. You cannot resample the cosmic microwave background. The universe happened once. To suggest otherwise is to step outside the bounds of science and into pseudoscience$^3$. Most astronomical samples are not samples in the frequentist sense: draws from a well-defined population that could, in principle, be redrawn. They are singular events, unique systems, one-off configurations of matter observed at a particular moment in cosmic history.

Hopefully the problem is becoming apparent. Frequentism requires an ensemble: a hypothetical infinite sequence of identical, independent trials. When measuring a planet's eccentricity, what is the ensemble? Other measurements of this same planet? Simulations of the formation of this planet? The most defensible answer is repeated measurements of the same system. But this only characterises the distribution of your *estimator* (the behaviour of your measurement procedure across hypothetical repetitions), not the probability that the parameter lies within a given range. With enough measurements your confidence interval might shrink to \[0.29,0.31\], but you still cannot say there is a 68% probability that this interval contains $e$. Frequentism tells us about $\hat{e}$ (the estimate), not the actual parameter $e$. And $e$ is what we care about.
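As a rough illustration of how much repetition that shrinkage takes, assume $N$ independent measurements that share the same Gaussian uncertainty $\sigma$ (an assumption made only for this sketch). The half-width of the 68% interval on the averaged estimate then scales as

$$\sigma_{\hat{e}} = \frac{\sigma}{\sqrt{N}},$$

so going from $\pm 0.1$ to $\pm 0.01$ takes about $N = 100$ such measurements. The interval narrows, but it remains a statement about the spread of $\hat{e}$ across repetitions.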

When sample sizes are large and models are correct, the sampling distribution of $\hat{e}$ concentrates tightly around $e$. The distinction can feel academic: a philosophical argument with no practical consequence. But this comfort relies on asymptotic guarantees that astronomy often lacks. We do not have thousands of independent galaxies drawn from identical conditions. We have *this* galaxy, *these* observations, *this* spectrum. The large-$N$ limit that lets frequentist and Bayesian methods converge is often not possible.

No amount of repeated measurement can answer the questions we often actually ask: 
- What is the probability that this transit signal is planetary rather than stellar?
- What is the probability that this gravitational wave event was a binary black hole merger?
- What is the probability that dark matter is composed of WIMPs?

These questions have no ensemble. There is no infinite sequence of universes to sample from.

Furthermore, frequentist inference depends not only on the data you observed, but on data you didn't observe! Consider what a $p$-value actually computes: it is the probability of obtaining a result *as extreme or more extreme* than what you observed, assuming the null hypothesis is true. To calculate this, you must sum over all hypothetical datasets that could have occurred but didn't. Your inference about the data in hand depends on data that were never measured. This same spectre haunts confidence intervals. You are asked to interpret a single interval by reference to repetitions that will never occur. Sir Harold Jeffreys identified this as a fundamental violation: significance testing requires "the consideration of data that have actually *not* been observed." He argued that inference should depend only on what actually occurred. William Briggs compared the null hypothesis to a strawman: it is constructed only to be knocked down, and the inferential tools are calibrated against this spectre.
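The sum over unobserved datasets is easy to see in a toy case. The sketch below uses a made-up experiment of my own choosing (15 heads in 20 coin flips, tested against the null hypothesis of a fair coin) and splits the one-sided $p$-value into the part that comes from the observed outcome and the part that comes from outcomes that never happened. The helper names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def one_sided_p_value(k_obs, n, p_null=0.5):
    """Probability, under the null, of k_obs or more heads in n flips."""
    return sum(binomial_pmf(k, n, p_null) for k in range(k_obs, n + 1))

# Hypothetical experiment: 15 heads observed in 20 flips, null hypothesis of a fair coin.
n, k_obs = 20, 15
p_observed = binomial_pmf(k_obs, n, 0.5)  # the outcome that actually happened
p_value = one_sided_p_value(k_obs, n)     # observed outcome plus every more extreme one
p_unobserved = p_value - p_observed       # contribution from outcomes that never happened

print(f"P(exactly {k_obs} heads | fair coin) = {p_observed:.4f}")
print(f"one-sided p-value                  = {p_value:.4f}")
print(f"share from unobserved outcomes     = {p_unobserved / p_value:.1%}")
```

In this toy case a bit over a quarter of the $p$-value comes from results with 16 or more heads, none of which were observed. Change the definition of "more extreme" (for example, to a two-sided test) and the number changes again, even though the data in hand are the same.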

### Part 1.3: The Bayesian Interpretation
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
But it's so simple. All I have to do is divine from what I know of you: are you the sort of man who would put the poison into his own goblet or his enemy's?    <br><span style="font-style: normal; font-size: 0.9em;">— Vizzini, The Princess Bride</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
A scientist who has learned how to use probability theory directly as extended logic, has a great advantage in power and versatility over one who has learned only a collection of unrelated ad hoc devices.    <br><span style="font-style: normal; font-size: 0.9em;">— E.T. Jaynes</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">

### Part 1.4: The Likelihood Principle

### Part 1.5: All Roads Lead to Bayesianism
Cox's Theorem

### Part 1.6: Popper, X, and the Logic Of Confirmation

### Part 1.7: On Pseudoscience allegations


### Footnotes:

- $1$ TODO: Say something about the lack of objectivity
- $2$ This is not to say that cosmological simulations are pointless or wrong. Far from it. 
- $3$ Yes, I am a many worlds interpretation hater. 
</div>

<div style="max-width: 800px; margin: 0 auto; text-align: center;">

## Part 2: Dramatis Personae

### Part 2.1: Reverend Thomas Bayes
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
Bayes' theorem is to the theory of probability what Pythagoras' theorem is to geometry.    <br><span style="font-style: normal; font-size: 0.9em;">— Sir Harold Jeffreys</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


### Part 2.2: Pierre-Simon Laplace

### Part 2.3: Sir Ronald Fisher

### Part 2.4: Sir Harold Jeffreys
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
Dedicated to the memory of Sir Harold Jeffreys, who saw the truth and preserved it.  <br><span style="font-style: normal; font-size: 0.9em;">— E.T. Jaynes, dedication in Probability Theory: The Logic of Science</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


### Part 2.5: E.T. Jaynes

### Part 2.6: Alan Turing

</div>

<div style="max-width: 800px; margin: 0 auto; text-align: center;">

## Part 3: History

### Part 3.1: The Rise of Frequentism

### Part 3.2: The Jeffreys-Fisher Saga

### Part 3.3: Bayes at Bletchley Park
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
I won't say what we did made us win the war, but I daresay we might have lost it without it.    <br><span style="font-style: normal; font-size: 0.9em;">— I.J. Good, on Bletchley Park
</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">

### Part 3.4: The Bayesian Renaissance
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
To paraphrase Seneca, they will be incredulous that such clear truths could have escaped us throughout the 20th (and into the 21st) century    <br><span style="font-style: normal; font-size: 0.9em;">—  Tommaso Toffoli</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


</div>

<div style="max-width: 800px; margin: 0 auto; text-align: center;">

## Part 4: Anecdotes and Examples

### Part 4.1: Casinos are Bayesian

### Part 4.2: The Bem Affair
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">
<p style="text-align: center; font-style: italic; margin: 10px 60px;">
There are three kinds of lies: lies, damned lies, and statistics.    <br><span style="font-style: normal; font-size: 0.9em;">— Disraeli</span>
</p>
<hr style="border: none; border-top: 1px solid #ccc; margin: 20px 100px;">


### Part 4.3: The Stopping Problem


</div>