## Fish in a pond estimate

Assume we have a pond with an unknown number of fish. We mark a number of the fish (say 100) and release them back into the pond. Later, we catch 22 fish from the pond and find that 9 of the 22 carry the mark. What can we deduce about the total number of fish in the pond?

In [1]:
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
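M = 100  # number of fish marked and released
n = 22   # number of fish caught later
X = 9    # number of marked fish among the n caught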

In [3]:
print(M, n, X)
print(X/n)      # fraction of the catch that is marked
print(M/(X/n))  # point estimate of the total number of fish

100 22 9
0.4090909090909091
244.44444444444443
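
The last printed value can be read as a point estimate of the pond size. A short derivation, assuming the marked fraction in the catch mirrors the marked fraction in the pond (this is the standard mark-recapture argument, often called the Lincoln-Petersen estimator; the symbol $\hat{N}$ is notation introduced here, not in the original cells):

$$\frac{X}{n} \approx \frac{M}{N} \quad\Longrightarrow\quad \hat{N} = \frac{M \cdot n}{X} = \frac{100 \cdot 22}{9} \approx 244.4$$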


## Bootstrap method to find the CI

In this method, we take the $n = 22$ fish in our catch and resample from them with replacement. We repeat this many times, each time counting how many of the 22 resampled fish are marked, and then plot the distribution of these counts. This bootstrap distribution approximates the sampling distribution of the number of marked fish in a catch of 22.

We then find the 95% interval of this distribution, that is, the range between its 2.5th and 97.5th percentiles.

This is the estimated confidence interval (CI) using the bootstrap method.
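
One way the resampling described above could be sketched is shown below. It is only an illustration under assumptions: the seed, the number of resamples `n_boot`, and the step that converts the interval for the marked count into an interval for the pond size via $N = M \cdot n / X$ are choices made here, not part of the original notebook.

```python
import numpy as np

# Values from the notebook cells above.
M, n, X = 100, 22, 9

# Assumed settings for this sketch: fixed seed and number of resamples.
rng = np.random.default_rng(0)
n_boot = 10_000

# The observed catch as a 0/1 array: 1 = marked, 0 = unmarked.
catch = np.array([1] * X + [0] * (n - X))

# Resample the catch with replacement and count marked fish in each resample.
resamples = rng.choice(catch, size=(n_boot, n), replace=True)
marked_counts = resamples.sum(axis=1)

# 95% percentile interval for the number of marked fish out of n.
x_lo, x_hi = np.percentile(marked_counts, [2.5, 97.5])

# Assumed conversion to pond size: fewer marked fish imply a larger pond,
# so the upper count bound gives the lower bound on N and vice versa.
N_lo = M * n / x_hi
N_hi = M * n / x_lo if x_lo > 0 else np.inf

print(f"marked fish, 95% interval: [{x_lo:.0f}, {x_hi:.0f}]")
print(f"pond size,   95% interval: [{N_lo:.0f}, {N_hi:.0f}]")
```

A histogram of `marked_counts` (for example with `plt.hist`) would show the bootstrap distribution that the text refers to.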

In [None]:
# Your code goes here

## Full simulation

This time, we consider a range of plausible values for $N$, and for each value we run many iterations of a simulation. Each iteration assumes that value of $N$, draws $n$ fish from the pond, and counts how many of them are marked. The fraction of iterations in which exactly 9 marked fish appear gives an estimate $p$ of the probability of our observation under that $N$. We can then plot $p$ over all values of $N$ and find the confidence interval.
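
A minimal sketch of such a simulation follows. Several pieces are assumptions made for illustration: the upper limit of 1000 for $N$, the number of iterations `n_iter`, the seed, drawing without replacement via NumPy's hypergeometric sampler, and reading the interval off the normalized curve (which amounts to weighting every candidate $N$ equally).

```python
import numpy as np

# Values from the notebook cells above.
M, n, X = 100, 22, 9

# Assumed settings for this sketch.
rng = np.random.default_rng(0)
n_iter = 5_000

# Candidate pond sizes. The pond must hold at least the M marked fish
# plus the n - X unmarked fish we caught; the upper limit is an assumption.
N_values = np.arange(M + (n - X), 1001)

# For each N, estimate the probability of seeing exactly X marked fish.
p_hat = np.empty(len(N_values))
for i, N in enumerate(N_values):
    # Draw n fish without replacement from M marked and N - M unmarked.
    marked = rng.hypergeometric(ngood=M, nbad=N - M, nsample=n, size=n_iter)
    p_hat[i] = np.mean(marked == X)

# Normalize the curve over N and read off the central 95% range.
weights = p_hat / p_hat.sum()
cdf = np.cumsum(weights)
N_lo = N_values[np.searchsorted(cdf, 0.025)]
N_hi = N_values[np.searchsorted(cdf, 0.975)]

print(f"95% interval for N: [{N_lo}, {N_hi}]")
```

Plotting `p_hat` against `N_values` (for example with `plt.plot`) gives the curve described in the text. The curve is noisy because each point is a Monte Carlo estimate; a larger `n_iter` smooths it at the cost of run time.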

In [None]:
# Your code goes here

## Analytic method based on Bayes' theorem

Instead of simulation, we can use an analytical approach to derive an exact solution.

Bayes' theorem states: $P(A \mid B) = P(B \mid A) \cdot P(A) / P(B)$.

We want to find $P(N \mid \text{9 out of 22})$, while we have a closed form for $P(\text{9 out of 22} \mid N)$, namely $C(22, 9) \cdot p^9 \cdot (1-p)^{13}$, where $p = 100/N$ is the fraction of marked fish in the pond. This binomial form treats the 22 draws as independent.

Using Bayes, we write: $P(N \mid \text{9 out of 22}) = P(\text{9 out of 22} \mid N) \cdot P(N) / P(\text{9 out of 22})$.

The $P(N)$ term may need more insight, as a subject matter expert might already have an *a priori* opinion about it. The $P(\text{9 out of 22})$ term, however, does not depend on $N$; it is only a normalizing constant and can be dropped until the end.
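
The calculation above can be sketched numerically on a grid of candidate values of $N$. This is an illustration under assumptions: the flat prior over $N$ and the upper limit of 1000 are choices made here, since the text leaves $P(N)$ open.

```python
from math import comb

import numpy as np

# Values from the notebook cells above.
M, n, X = 100, 22, 9

# Candidate pond sizes; the upper limit is an assumption of this sketch.
N_values = np.arange(M + (n - X), 1001)

# Likelihood P(9 out of 22 | N) using the binomial form from the text.
p = M / N_values
likelihood = comb(n, X) * p**X * (1 - p) ** (n - X)

# Assumed flat prior P(N): every candidate N is equally plausible beforehand.
prior = np.ones_like(likelihood)

# Posterior up to a constant, then normalized. Normalizing plays the role
# of dividing by P(9 out of 22), which does not depend on N.
posterior = likelihood * prior
posterior /= posterior.sum()

# Most probable N and the central 95% range of the posterior.
N_map = N_values[np.argmax(posterior)]
cdf = np.cumsum(posterior)
N_lo = N_values[np.searchsorted(cdf, 0.025)]
N_hi = N_values[np.searchsorted(cdf, 0.975)]

print(f"most probable N: {N_map}")
print(f"95% interval for N: [{N_lo}, {N_hi}]")
```

To model draws without replacement instead, the likelihood line could be swapped for `scipy.stats.hypergeom.pmf(X, N_values, M, n)`. A subject matter expert's opinion would enter by replacing the flat `prior` array with other weights.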

In [None]:
# Your code goes here