## Ecological Survey

This exercise asks you to build a probabilistic model to help an ecologist with their analysis.

In [None]:
pip install pymc3

In [None]:
import math
import pymc3 as pm
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

An ecologist is running a survey to estimate what proportion of a bat population belongs to species A and what proportion belongs to species B. The two species are identical in appearance apart from their size and weight. Species A is typically smaller, with a mean weight of 10 grams. The weights of individuals within the species vary, with a standard deviation of 5 grams. Species B is slightly larger and heavier, with a mean weight of 15 grams and a standard deviation of 2.5 grams.
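
To make the overlap between the two weight distributions concrete, the sketch below evaluates the density of a single weight under each species. It assumes, as the model later in this notebook does, that weights within a species are normally distributed. The helper name `normal_pdf` and the three example weights are illustrative choices, not part of the exercise.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a Normal(mu, sd) distribution evaluated at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Species parameters from the problem statement (grams).
MU_A, SD_A = 10.0, 5.0
MU_B, SD_B = 15.0, 2.5

for weight in (8.0, 12.9, 15.4):  # illustrative weights
    like_a = normal_pdf(weight, MU_A, SD_A)
    like_b = normal_pdf(weight, MU_B, SD_B)
    print(f"{weight:5.1f} g  A: {like_a:.4f}  B: {like_b:.4f}")
```

A light bat (8 g) is far more plausible under species A, while a weight close to 15 g is more plausible under species B. Weights in between are ambiguous, which is why a probabilistic model is useful here.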

Over a night-time survey, the ecologist traps and weighs 20 bats. Their weights (in grams) are given below.

In [None]:
data = np.array([9.4, 11.0, 19.3, 8.4, 12.9, 18.8, 8.0, 17.3, 15.4, 7.5, 14.9, 21.2, 16.6, 15.1, 19.1, 12.3, 11.7, 13.9, 16.9, 18.4])

Build a probabilistic model that describes this setting and use the model to estimate the proportion of the overall bat population that comes from species A.
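
One way to write down such a model, matching the PyMC3 code in the next cell, is sketched below. The symbols $z_i$ (species indicator, 1 for species A) and $x_i$ (observed weight in grams) are notation introduced here. The uniform Beta(1, 1) prior and the normal likelihood are modelling assumptions made by that code, not facts given in the problem statement.

$$
\begin{aligned}
p &\sim \mathrm{Beta}(1, 1) \\
z_i \mid p &\sim \mathrm{Bernoulli}(p), \quad i = 1, \dots, 20 \\
x_i \mid z_i &\sim
\begin{cases}
\mathcal{N}(10,\ 5^2) & \text{if } z_i = 1 \text{ (species A)} \\
\mathcal{N}(15,\ 2.5^2) & \text{if } z_i = 0 \text{ (species B)}
\end{cases}
\end{aligned}
$$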

In [None]:
N = len(data)

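model = pm.Model()

with model:

    # Uniform prior on the proportion of the population that is species A
    p = pm.Beta('p', alpha=1, beta=1)

    # Latent indicator for each bat: 1 = species A, 0 = species B
    species = pm.Bernoulli('species', p=p, shape=N)

    # Mean and standard deviation of the weight (grams) depend on the species
    mu = pm.math.switch(species, 10, 15)
    sd = pm.math.switch(species, 5, 2.5)

    # Observed weights, normally distributed around the species mean
    obs = pm.Normal('obs', mu=mu, sd=sd, observed=pm.Data('data', data))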

In [None]:
with model:
    
    trace = pm.sample(draws=2000)

In [None]:
with model:

    pm.traceplot(trace, var_names=['p']);

In [None]:
p = trace['p']

plt.figure(figsize=(6, 4))
plt.hist(p, bins=100, density=True)
plt.xlim(0, 1)

plt.title("Posterior density of p")
plt.xlabel("Fraction of population that is species A");

The mean of the posterior samples of `p` gives a point estimate of the proportion of the bat population that belongs to species A.

In [None]:
np.mean(trace['p'])

The uncertainty in this estimate can be summarized by the standard deviation of the posterior samples.

In [None]:
np.std(trace['p'])
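
As a rough cross-check that does not rely on MCMC, the posterior of `p` can also be approximated on a grid after summing the species indicator out of the likelihood. This is a different technique from the sampling used above (a marginalized mixture evaluated on a grid). The grid size and variable names are choices made for this sketch, and it assumes the same uniform prior and normal likelihoods as the model.

```python
import numpy as np

data = np.array([9.4, 11.0, 19.3, 8.4, 12.9, 18.8, 8.0, 17.3, 15.4, 7.5,
                 14.9, 21.2, 16.6, 15.1, 19.1, 12.3, 11.7, 13.9, 16.9, 18.4])

def normal_pdf(x, mu, sd):
    """Normal density, vectorized over x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

like_a = normal_pdf(data, 10.0, 5.0)  # density of each weight under species A
like_b = normal_pdf(data, 15.0, 2.5)  # density of each weight under species B

# Candidate values of p. With a uniform Beta(1, 1) prior, the posterior
# is proportional to the likelihood.
p_grid = np.linspace(0.0, 1.0, 1001)

# Per-bat mixture likelihood p * like_a + (1 - p) * like_b, for every grid value.
mix = p_grid[:, None] * like_a[None, :] + (1.0 - p_grid[:, None]) * like_b[None, :]
log_post = np.log(mix).sum(axis=1)

# Normalize into probability weights over the grid (subtract the max for stability).
weights = np.exp(log_post - log_post.max())
weights /= weights.sum()

grid_mean = np.sum(weights * p_grid)
grid_sd = np.sqrt(np.sum(weights * (p_grid - grid_mean) ** 2))
print(f"grid posterior mean of p: {grid_mean:.3f}, sd: {grid_sd:.3f}")
```

If the sampler has mixed well, these two numbers should be close to `np.mean(trace['p'])` and `np.std(trace['p'])`, up to Monte Carlo error and grid resolution. A large gap would suggest inspecting the trace plot more closely.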

We can also use the samples of `species` to estimate, for each bat caught, the probability that it is from species A.

In [None]:
# Fraction of posterior draws in which each bat is assigned to species A
probability_species_A = np.mean(trace['species'], axis=0)

In [None]:
probability_species_A[0]
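
The per-bat probabilities computed above are column averages of `trace['species']`, which has one row per posterior draw and one column per bat, with a 1 wherever that draw assigned the bat to species A. The toy array below is made up purely for illustration (4 draws, 3 bats) and stands in for the real trace.

```python
import numpy as np

# Hypothetical stand-in for trace['species']: rows are draws, columns are bats.
toy_species = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])

# Fraction of draws in which each bat was assigned to species A.
toy_probability_A = toy_species.mean(axis=0)
print(toy_probability_A.tolist())  # [0.75, 0.25, 0.5]
```

The first toy bat is labelled species A in three of four draws, so its estimated probability is 0.75. The real calculation does the same thing over all posterior draws.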

In [None]:
x = np.arange(1, N + 1)  # bat IDs 1..N

plt.figure(figsize=(6, 4))
plt.bar(x=x, height=probability_species_A)

plt.xticks(x)
plt.title("Probability that the bat is from Species A")
plt.xlabel("ID of bat");