# Probabilistic Model Evaluation

When deep learning models run in production,
we do not have the luxury of ground-truth labels.
However, it may be feasible to integrate a human domain expert
into production workflows.

To maximize throughput,
we must be selective about when we alert the human expert.
If we understand how reliable the model's predictions are
as a function of its self-reported confidence,
we can use that relationship to guide business logic in our application.

## Setup Environment

In [2]:
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp

## Probabilistic Model

Whether our model makes a valid prediction
can be modeled as a
[Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution)
random variable $V_i$, one per prediction:

$$V_i \sim \text{Ber}(p(c_i)), \quad i = 1, \dots, N$$

where $p(c_i)$ is a
[logistic function](https://en.wikipedia.org/wiki/Logistic_function)
of the prediction confidence $c_i$:

$$p(c) = \frac{1}{1+e^{\beta c + \alpha}}$$
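
To make the sign convention concrete, here is a minimal standard-library sketch of this logistic function. The function name `valid_probability` and the values of `alpha` and `beta` are illustrative assumptions, not taken from the notebook. Because the exponent is $+(\beta c + \alpha)$, a negative $\beta$ makes $p(c)$ increase with confidence, and $p(c) = 0.5$ at $c = -\alpha / \beta$.

```python
import math


def valid_probability(confidence, alpha, beta):
    """p(c) = 1 / (1 + exp(beta * c + alpha)), using the parameterization above."""
    return 1.0 / (1.0 + math.exp(beta * confidence + alpha))


# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 2.0, -6.0

for c in (0.1, 0.5, 0.9):
    p = valid_probability(c, alpha, beta)
    print(f"confidence={c:.1f} -> p(valid)={p:.3f}")
```

With these assumed values the curve crosses 0.5 at a confidence of about 0.33.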

We can use our ground-truth validations to evaluate
the likelihood that a given *logistic function*
with parameters $\alpha$ and $\beta$
"explains" the data.

Together with priors on $\alpha$ and $\beta$,
this likelihood defines the following
joint log-probability function (the unnormalized log posterior):

In [8]:
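def validation_joint_log_prob(validation, confidence, alpha, beta):
    # Cast inputs (e.g. pandas columns) to float32 to match alpha and beta
    validation = tf.cast(validation, tf.float32)
    confidence = tf.cast(confidence, tf.float32)

    # Wide Normal priors on the logistic parameters
    prior_alpha = tfp.distributions.Normal(loc=0., scale=10.)
    prior_beta = tfp.distributions.Normal(loc=0., scale=100.)

    # Probability of a valid prediction at each confidence level
    logistic_p = 1. / (1. + tf.exp(beta * confidence + alpha))
    expected = tfp.distributions.Bernoulli(probs=logistic_p)

    # Log priors plus the Bernoulli log-likelihood of the observed validations
    return (
        prior_alpha.log_prob(alpha)
        + prior_beta.log_prob(beta)
        + tf.reduce_sum(expected.log_prob(validation))
    )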
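
Written out, and assuming the validations are independent given their confidences, the quantity this function returns is

$$\log \mathcal{N}(\alpha; 0, 10) + \log \mathcal{N}(\beta; 0, 100) + \sum_{i=1}^{N} \big[ v_i \log p(c_i) + (1 - v_i) \log(1 - p(c_i)) \big]$$

where $v_i \in \{0, 1\}$ is the observed validation (the symbol $v_i$ is introduced here for illustration). The sketch below evaluates the same expression in plain Python, which can be handy for sanity-checking the TensorFlow version on a few rows. The helper names and the toy data are hypothetical.

```python
import math


def normal_log_prob(x, loc, scale):
    """Log density of a Normal(loc, scale) distribution at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def reference_joint_log_prob(validation, confidence, alpha, beta):
    """Log priors plus Bernoulli log-likelihood, mirroring the function above."""
    total = normal_log_prob(alpha, 0.0, 10.0) + normal_log_prob(beta, 0.0, 100.0)
    for v, c in zip(validation, confidence):
        p = 1.0 / (1.0 + math.exp(beta * c + alpha))
        total += math.log(p) if v == 1 else math.log(1.0 - p)
    return total


# Hypothetical toy data, not taken from validations/reasonable.csv.
validation = [0, 1, 1, 0, 1]
confidence = [0.1, 0.6, 0.9, 0.2, 0.7]

print(reference_joint_log_prob(validation, confidence, alpha=2.0, beta=-6.0))
print(reference_joint_log_prob(validation, confidence, alpha=0.0, beta=0.0))
```

On this toy data the first parameter pair scores higher than the flat $\alpha = \beta = 0$ curve, because it assigns low probability to the low-confidence failures and high probability to the high-confidence successes. This reference version is not numerically hardened: `math.log` will fail if `p` rounds to exactly 0 or 1.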

## Hamiltonian Monte Carlo

In [24]:
def hamiltonian_monte_carlo(
    validation,
    confidence,
    num_steps=10000,
    num_leapfrog_steps=4,
    burnin=2000,
):
    # Initialize the HMC
    initial_chain_state = [
        0. * tf.ones([], name="init_alpha"),
        0. * tf.ones([], name="init_beta"),
    ]
    
    # Rescale "beta" so it is sampled at 10x the magnitude of "alpha"
    unconstraining_bijectors = [
        tfp.bijectors.Identity(),
        tfp.bijectors.Scale(scale=10.),
    ]
    
    # Create a closure with our input data
    unnormalized_posterior_log_prob = (
        lambda *args: validation_joint_log_prob(
            validation,
            confidence,
            *args,
        )
    )
    
    # Define the HMC
    hmc = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=tfp.mcmc.SimpleStepSizeAdaptation(
            tfp.mcmc.HamiltonianMonteCarlo(
                target_log_prob_fn=unnormalized_posterior_log_prob,
                num_leapfrog_steps=num_leapfrog_steps,
                step_size=1.,
            ),
            num_adaptation_steps=int(burnin * 0.8),
        ),
        bijector=unconstraining_bijectors,
    )
    
    # Sample from the chain
    [
        posterior_alpha,
        posterior_beta,
    ], kernel_results = tfp.mcmc.sample_chain(
        num_results=num_steps,
        num_burnin_steps=burnin,
        current_state=initial_chain_state,
        kernel=hmc,
    )
    
    return posterior_alpha, posterior_beta, kernel_results

## Example: Reasonable Data

In [25]:
reasonable_data = pd.read_csv("validations/reasonable.csv")
reasonable_data

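| Unnamed: 0 | confidence | validation |
| --- | --- | --- |
| 0 | 0.039997 | 0 |
| 1 | 0.603689 | 1 |
| 2 | 0.868116 | 1 |
| 3 | 0.133295 | 1 |
| 4 | 0.364439 | 1 |
| ... | ... | ... |
| 995 | 0.767571 | 1 |
| 996 | 0.313861 | 1 |
| 997 | 0.196307 | 0 |
| 998 | 0.718977 | 1 |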


In [None]:
%%time

reasonable_alpha, reasonable_beta, kernel_results = (
    hamiltonian_monte_carlo(
        reasonable_data["validation"],
        reasonable_data["confidence"],
    )
)
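
Once posterior samples are available, one possible way to turn them into the business logic described in the introduction is to average $p(c)$ over the samples and alert the human expert whenever that average falls below a chosen reliability target. The sketch below is an assumption about how this could be done, not the notebook's method: the function names, the `min_reliability` threshold, and the hard-coded sample lists are hypothetical stand-ins for `reasonable_alpha` and `reasonable_beta` (which would first need converting to plain Python floats).

```python
import math
import statistics


def valid_probability(confidence, alpha, beta):
    """p(c) = 1 / (1 + exp(beta * c + alpha)), as defined earlier."""
    return 1.0 / (1.0 + math.exp(beta * confidence + alpha))


def posterior_mean_probability(confidence, alpha_samples, beta_samples):
    """Average p(c) over paired posterior draws of alpha and beta."""
    return statistics.fmean(
        [valid_probability(confidence, a, b)
         for a, b in zip(alpha_samples, beta_samples)]
    )


def should_alert_expert(confidence, alpha_samples, beta_samples, min_reliability=0.9):
    """Alert when the estimated chance of a valid prediction is below the target."""
    estimate = posterior_mean_probability(confidence, alpha_samples, beta_samples)
    return estimate < min_reliability


# Hypothetical posterior draws, for illustration only.
alpha_samples = [1.9, 2.0, 2.1, 2.05, 1.95]
beta_samples = [-5.8, -6.0, -6.2, -6.1, -5.9]

for c in (0.2, 0.6, 0.95):
    estimate = posterior_mean_probability(c, alpha_samples, beta_samples)
    alert = should_alert_expert(c, alpha_samples, beta_samples)
    print(f"confidence={c:.2f} p(valid)~{estimate:.3f} alert_expert={alert}")
```

Averaging over samples, rather than plugging in a single point estimate, lets uncertainty in $\alpha$ and $\beta$ flow into the decision. A more conservative rule could compare a lower posterior quantile of $p(c)$ against the target instead of the mean.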



## References

* https://colab.research.google.com/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_TFP.ipynb
* https://www.tensorflow.org/probability/api_docs/python/tfp/mcmc/HamiltonianMonteCarlo