# Small Worlds vs Large Worlds

The **Small World** represents the scientific model itself, and the **Large World**
represents the broader context in which one deploys a model.

**Bayesian inference** is just counting and comparing possibilities. Consider,
by analogy, Jorge Luis Borges’ short story “The Garden of Forking Paths.”
To make good inferences about what actually happened, it helps to consider
everything that could have happened. A Bayesian analysis is a garden of forking data,
in which alternative sequences of events are cultivated.

**The approach cannot guarantee a correct answer**, on large world terms. But it can
guarantee the best possible answer, on small world terms, that could be derived
from the information fed into it.

The goal of the Bayesian approach is to figure out which of the conjectures for a
certain context is **the most plausible**, given some evidence (data).

For each conjecture, we count the number of ways it could have produced the observed data.
By comparing these counts, we have part of a solution
for rating the relative plausibility of each conjecture.
But it is only part of a solution, because to compare the counts
we first have to decide how many ways each conjecture could itself be realized.
We might argue that, when we have no reason to assume otherwise, we can consider
each conjecture equally plausible and compare the counts directly (the **Principle of Indifference**).
But often we do have reason to assume otherwise.

> ***Principle of indifference***: When there is no reason to say that one conjecture is more plausible
> than another, weigh all of the conjectures equally.

To work toward a solution, suppose we are willing to say that each conjecture is equally plausible
at the start. Then we just compare the number of ways
in which each conjecture is compatible with the observed data; the comparison suggests
that some conjectures are more plausible than others. Since these are our initial counts, and
they will probably be updated later, they are labeled **prior**.

When we get more evidence or observations, we can update each conjecture's plausibility.
Provided the new data is independent of the previous data, the update is a simple multiplication:

> For a conjecture ***C*** that can produce the previous data ***D<sub>prior</sub>*** in
> ***W<sub>prior</sub>*** ways and the new evidence ***D<sub>new</sub>*** in ***W<sub>new</sub>*** ways,
> the updated plausibility ***P<sub>C</sub>*** is:
>
> $\Large P_C \propto W_{prior} \times W_{new} $

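Why multiplication? Because it is a shortcut for counting all possible paths: each path
consistent with the previous data can be extended by every path consistent with the new data.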
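
To check the shortcut, here is a minimal sketch that counts paths by brute force and compares the result with the product of the separate counts. It assumes the setup of the marble example in Exercise 2.1 (a bag of four marbles, each blue `B` or white `W`, drawn with replacement). The extra blue draw used as new data, the helper `count_paths`, and the variable names are illustrative assumptions, not part of the original notes.

```python
from itertools import product

def count_paths(bag, observed):
    """Count the draw sequences (with replacement) from `bag` that match `observed`."""
    return sum(
        1
        # Each path picks one marble index per draw.
        for path in product(range(len(bag)), repeat=len(observed))
        if all(bag[i] == colour for i, colour in zip(path, observed))
    )

# The five conjectures about the bag's composition.
bags = ["WWWW", "BWWW", "BBWW", "BBBW", "BBBB"]

prior_data = "BWB"  # draws already observed
new_data = "B"      # hypothetical extra draw, assumed for illustration

w_prior = [count_paths(bag, prior_data) for bag in bags]
w_new = [count_paths(bag, new_data) for bag in bags]

# Shortcut: multiply the two counts for each conjecture.
updated = [a * b for a, b in zip(w_prior, w_new)]

# Long way: enumerate every path through all four draws at once.
w_all = [count_paths(bag, prior_data + new_data) for bag in bags]

print(w_prior)  # [0, 3, 8, 9, 0]
print(w_new)    # [0, 1, 2, 3, 4]
print(updated)  # [0, 3, 16, 27, 0]
print(w_all)    # [0, 3, 16, 27, 0]
```

Both routes give the same counts, which is why the update rule can multiply instead of re-enumerating the whole garden after every new observation.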



## From counting to probability

These counts are hard to work with, though, so they are almost always standardized in a way that
transforms them into probabilities, for two reasons.

First, only the relative values matter, so rescaling the counts does not change their meaning. Second,
as the amount of data grows, the counts very quickly grow very large and become difficult
to manipulate.

Let ***p*** be the proportion of some feature. For any value ***p*** can take, we judge the plausibility
of that value as proportional to the number of ways it can get through the garden of forking data.
We then construct probabilities by standardizing the plausibilities so that they sum to one
over all possible conjectures. To standardize, compute one product for each value ***p*** can take,
add up all of the products, and divide each product by that sum:

\begin{align*}
\Large P_p = \frac{W_{p,new} \times P_{p,prior}}{\sum_{p'} W_{p',new} \times P_{p',prior}}
\end{align*}

## Exercise 2.1
There is a bag with four marbles, and we only know that each one is either <span style="color:blue">blue [B]</span> or
<span style="color:grey">white [W]</span>. A marble is drawn from the bag and put back afterwards; after
doing this three times, we got the sequence [<span style="color:blue">B</span> <span style="color:grey">W</span> <span style="color:blue">B</span>].

So if ***p*** is defined as the proportion of marbles that are blue (for example, ***p*** = 0.25 for the bag [<span style="color:blue">B </span><span style="color:grey">W W W</span>]),
then with ***D<sub>new</sub>*** = [<span style="color:blue">B</span> <span style="color:grey">W</span> <span style="color:blue">B</span>]
we can say that:

> plausibility of ***p*** after ***D<sub>new</sub>*** $\propto$ ways ***p*** can produce
> ***D<sub>new</sub>*** $\times$ prior plausibility of ***p***

| Composition | p (prop.) | Ways (W) | Plausibility (P) |
| --- | --- | --- | --- |
| [<span style="color:grey">W W W W</span>] | 0 | 0 | 0 |
| [<span style="color:blue">B </span><span style="color:grey">W W W</span>] | 0.25 | 3 | 0.15 |
| [<span style="color:blue">B B</span><span style="color:grey"> W W</span>] | 0.5 | 8 | 0.4 |
| [<span style="color:blue">B B B </span><span style="color:grey">W</span>] | 0.75 | 9 | 0.45 |
| [<span style="color:blue">B B B B</span>] | 1 | 0 | 0 |
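
The table can be reproduced with a short script. This is a minimal sketch that assumes every composition is equally plausible before the draws (the principle of indifference), so the plausibility is just each count divided by the total of 20. The names `N_MARBLES`, `rows`, and `table` are illustrative.

```python
N_MARBLES = 4
data = "BWB"  # observed draws, with replacement

rows = []
for n_blue in range(N_MARBLES + 1):
    # Each draw multiplies the count by the number of marbles of the observed colour.
    ways = 1
    for draw in data:
        ways *= n_blue if draw == "B" else N_MARBLES - n_blue
    rows.append((n_blue / N_MARBLES, ways))

# Equal prior for every composition, so standardizing is dividing by the total count.
total = sum(ways for _, ways in rows)
table = [(p, ways, ways / total) for p, ways in rows]

for p, ways, plausibility in table:
    print(f"p={p:.2f}  ways={ways}  plausibility={plausibility:.2f}")
```

The composition with three blue marbles comes out as the most plausible, but the two-blue composition is close behind, so three draws are not enough to separate them clearly.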