In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline 

# Bayesian Adjustment

### Background and Motivation 

Since we are segmenting judges' grant/appeal/appeal-grant rates within various appeal segments, we need a principled way to estimate these rates for judge-segment groups that have very small or zero sample sizes. To this end, we use the beta-binomial Bayesian model: for each segment, we assign a Beta prior whose mean approximates the overall grant rate in that segment, aggregated across judges. We then update this prior with each judge's data to obtain a posterior distribution, whose mean is the judge's 'Bayesian-adjusted' rate. We define this formally below: 

### Formal Definition  

Let $n_{i,j}$ be the total number of cases belonging to segment $i$ (e.g. Chinese Nationality is an example of a segment) and judge $j$

and $g_{i,j}$ be the number of such cases that were granted (or appealed, or appeal-granted) 

and $\theta_{i,j}$ be the probability that judge $j$ grants a given case from segment $i$

We then assume the prior $\theta_{i,j} \sim Beta(\alpha_i, \beta_i)$ and choose the prior parameters $\alpha_i$ and $\beta_i$ such that: 
- the effective sample size is $\alpha_i + \beta_i = 10$ 
- the prior mean $\frac{\alpha_i}{\alpha_i + \beta_i}$ approximates the aggregate grant rate, rounded to the nearest 10% (e.g. if aggregate grant rate is 30%, set $\alpha_i=3$ and $\beta_i=7$) 

After observing judge $j$'s data (i.e. granting $g_{i,j}$ out of $n_{i,j}$ cases), we update to the posterior distribution $\theta_{i,j} \sim Beta(\alpha_i + g_{i,j}, \beta_i + n_{i,j} - g_{i,j})$. Its mean is what we use as the Bayesian-adjusted rate for judge $j$ in segment $i$: $\frac{\alpha_i + g_{i,j}}{(\alpha_i + \beta_i) + n_{i,j}}$. 
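
The posterior mean can also be read as a weighted average of the prior mean and the judge's raw rate, which makes the shrinkage toward the prior explicit. The identity below follows directly from the definitions above and holds for $n_{i,j} > 0$:

$$\frac{\alpha_i + g_{i,j}}{(\alpha_i + \beta_i) + n_{i,j}} = \frac{\alpha_i + \beta_i}{(\alpha_i + \beta_i) + n_{i,j}} \cdot \frac{\alpha_i}{\alpha_i + \beta_i} + \frac{n_{i,j}}{(\alpha_i + \beta_i) + n_{i,j}} \cdot \frac{g_{i,j}}{n_{i,j}}$$

With $\alpha_i + \beta_i = 10$, the weight on the prior mean is $\frac{10}{10 + n_{i,j}}$: 50% for a judge with 10 cases, 20% for a judge with 40 cases, and 100% when $n_{i,j} = 0$ (in which case the adjusted rate is simply the prior mean). 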

### Worked Example 

Suppose there are 40000 cases in which respondents are of Chinese Nationality in total, of which 11000 were granted. 

This means the aggregate grant rate is $\frac{11000}{40000} = 0.275 \approx 0.3$ (rounded to the nearest 10%). 

We would then choose Beta prior parameters $\alpha_i=3$ and $\beta_i=7$ such that $\frac{\alpha_i}{\alpha_i + \beta_i} = \frac{3}{3 + 7} = 0.3$ to reflect this judge-agnostic aggregate grant rate for Chinese Nationality, while maintaining an effective sample size of $\alpha_i + \beta_i = 3 + 7 = 10$. Effectively, this means that our prior beliefs are given weight equivalent to that of 10 observed cases. 

Now suppose Judge 1 saw $n_{i,1}=40$ cases in which respondents are of Chinese nationality, of which he granted $g_{i,1}=28$ (or 70%). Then his 'Bayesian-adjusted' grant rate (i.e. posterior mean) is $\frac{\alpha_i + g_{i,1}}{(\alpha_i + \beta_i) + n_{i,1}} = \frac{3 + 28}{10 + 40} = 0.62$, or 62%. 

Contrast this with Judge 2, who saw $n_{i,2}=10$ cases of Chinese nationality, of which he granted $g_{i,2}=7$ (also 70%). Empirically, Judge 2 granted the same percentage of cases as Judge 1, but we have much less data and thus lower certainty that Judge 2 is indeed predisposed to granting more Chinese nationals than average. This is reflected in his 'Bayesian-adjusted' grant rate, which works out to $\frac{3 + 7}{10 + 10} = 0.50$, or 50%, which deviates less from the prior mean of 30% than Judge 1's 62% does. 
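
The arithmetic in this worked example can be checked with a few lines of Python. This is a minimal sketch assuming Python 3 division; the helper name `posterior_mean` and the extra "no data" case are illustrative additions, not part of the original analysis.

```python
def posterior_mean(alpha, beta, num_positives, num_total):
    """Posterior mean of a Beta(alpha, beta) prior after observing
    num_positives grants out of num_total cases."""
    return (alpha + num_positives) / (alpha + beta + num_total)


# Prior from the worked example: prior mean 0.3, effective sample size 10
alpha, beta = 3, 7

judge_1 = posterior_mean(alpha, beta, num_positives=28, num_total=40)
judge_2 = posterior_mean(alpha, beta, num_positives=7, num_total=10)
# Illustrative extra case: a judge with no cases falls back to the prior mean
no_data = posterior_mean(alpha, beta, num_positives=0, num_total=0)

print(judge_1, judge_2, no_data)  # 0.62 0.5 0.3
```

Both judges have a raw rate of 70%, but the judge with fewer cases is pulled further toward the 30% prior mean. 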

### Code

Below are methods that can be used to: 
- calibrate Beta priors ($\alpha, \beta$) based on the aggregate grant rate for a given segment 
- compute 'Bayesian-adjusted' grant rate (or posterior mean) based on calibrated priors and observed data (num_total, num_positives)
- tie the above into a single function  

In [32]:
# calibrate beta priors 

def calibrate_beta_priors(aggregate_mean): 
    """
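    Takes the aggregate rate and returns Beta prior parameters (alpha, beta) whose prior mean 
    approximates the aggregate rate (rounded to the nearest 10%), with an effective sample size of 10 
    """
    
    # scale to a count out of 10 before rounding, so int() never truncates a float like 2.999... 
    alpha = int(np.round(aggregate_mean * 10)) 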
    beta = 10 - alpha 
    
    return alpha, beta 

In [33]:
calibrate_beta_priors(aggregate_mean=0.275)

(3, 7)
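
One caveat with rounding to the nearest 10%: an aggregate rate below about 5% gives $\alpha = 0$, and one above about 95% gives $\beta = 0$. A Beta distribution requires both parameters to be strictly positive, and with $\alpha = 0$ a judge with zero grants gets an adjusted rate of exactly 0 regardless of sample size. The sketch below shows one possible guard, clamping $\alpha$ to the range 1 to 9. The function name `calibrate_beta_priors_clamped` and the clamping rule are assumptions for illustration, not part of the original method.

```python
def calibrate_beta_priors_clamped(aggregate_mean, effective_sample_size=10):
    """Hypothetical variant of calibrate_beta_priors that keeps both
    Beta parameters strictly positive."""
    # Nearest whole count out of the effective sample size
    alpha = int(round(aggregate_mean * effective_sample_size))
    # Clamp so that neither alpha nor beta can be zero
    alpha = min(max(alpha, 1), effective_sample_size - 1)
    beta = effective_sample_size - alpha
    return alpha, beta


print(calibrate_beta_priors_clamped(0.275))  # (3, 7)
print(calibrate_beta_priors_clamped(0.02))   # (1, 9)
print(calibrate_beta_priors_clamped(0.99))   # (9, 1)
```

The trade-off is that the prior mean for very rare or very common outcomes is then 10% or 90% rather than the rounded aggregate rate. 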

In [34]:
# compute posterior mean given beta priors and observed data 

def compute_posterior_mean(alpha_prior, beta_prior, num_positives, num_total): 
    """ 
    Takes Beta priors (alpha, beta) along with observed data (num_total, num_positives) 
    and returns posterior mean 
    """
    
    updated_alpha = alpha_prior + num_positives 
    updated_beta = beta_prior + num_total - num_positives 
    
    posterior_mean = float(updated_alpha) / (updated_alpha + updated_beta)
    
    return posterior_mean 

In [35]:
compute_posterior_mean(alpha_prior=3, beta_prior=7, num_positives=28, num_total=40)

0.62

In [37]:
# combine prior calibration and posterior mean computation 

def get_beta_adj_rate(aggregate_mean, num_positives, num_total): 
    """ 
    Takes aggregate mean as a float (from 0 to 1), num_total (integer), and num_positives (integer) 
    and returns the 'Beta-adjusted' rate. 
    Example: if in total 30% of Chinese nationality cases were granted, and a specific judge saw 20 cases 
    and granted 14 of them, input aggregate_mean=0.3, num_total=20, and num_positives=14 
    """
    
    # accept NumPy floats too (e.g. a mean computed with pandas), not only the built-in float 
    if not isinstance(aggregate_mean, (float, np.floating)): 
        raise ValueError("Please enter a float for aggregate mean!")
        
    if aggregate_mean < 0 or aggregate_mean > 1: 
        raise ValueError("Aggregate mean must be between 0 and 1!")
            
    alpha_prior, beta_prior = calibrate_beta_priors(aggregate_mean)
    posterior_mean = compute_posterior_mean(alpha_prior, beta_prior, num_positives, num_total)
    
    return posterior_mean

In [39]:
get_beta_adj_rate(aggregate_mean=0.3, num_positives=28, num_total=40)

0.62
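
To apply this across many judges, the aggregate rate for each segment can be computed from the per-judge counts and then used as the prior for every judge in that segment. The following is a minimal, self-contained sketch in plain Python. The `records` data, the segment and judge labels, and the function name `adjusted_rates` are all hypothetical, and the sketch assumes every segment has at least one case.

```python
from collections import defaultdict

# Hypothetical per-judge counts: (segment, judge, num_positives, num_total)
records = [
    ("China", "Judge 1", 28, 40),
    ("China", "Judge 2", 7, 10),
    ("China", "Judge 3", 10, 100),
]


def adjusted_rates(records, effective_sample_size=10):
    """Return {(segment, judge): Bayesian-adjusted rate}, using each
    segment's aggregate rate to set the Beta prior."""
    # Sum positives and totals per segment across all judges
    totals = defaultdict(lambda: [0, 0])
    for segment, _judge, num_positives, num_total in records:
        totals[segment][0] += num_positives
        totals[segment][1] += num_total

    rates = {}
    for segment, judge, num_positives, num_total in records:
        seg_positives, seg_total = totals[segment]
        # Prior mean: aggregate rate rounded to a whole count out of 10
        alpha = int(round(effective_sample_size * seg_positives / seg_total))
        # alpha + beta equals effective_sample_size, so the posterior mean is:
        rates[(segment, judge)] = (alpha + num_positives) / (
            effective_sample_size + num_total
        )
    return rates


rates = adjusted_rates(records)
for key, rate in rates.items():
    print(key, f"{rate:.3f}")
# ('China', 'Judge 1') 0.620
# ('China', 'Judge 2') 0.500
# ('China', 'Judge 3') 0.118
```

Here the segment's aggregate rate is 45/150 = 30%, so the prior matches the worked example. Judge 3, with 100 cases, stays close to his raw rate of 10%. 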