# Binomial on 2D data

Our model will be a beta-binomial distribution per bin on a 2D grid. One dimension will be the distance to the sea (i.e. longitude if the seashore runs North-South) and the second will be the distance to the river (i.e. latitude, following the previous example).

The probability of success of the binomial distribution will come from a $\mathrm{Beta}(a,b)$ distribution. Spatial information is relevant because $a$ will only vary with the river distance and $b$ will only vary with the sea distance.
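To make the spatial structure concrete, here is a minimal sketch of how the expected success probability varies over the grid. It relies only on the standard fact that a Beta distribution with shape parameters $a$ and $b$ has mean $a/(a+b)$. The shape values are made up for illustration and are not taken from the data.

```python
import numpy as np

# Hypothetical shape parameters, chosen only for illustration.
a = np.array([1.0, 2.0, 4.0])  # one value per river-distance bin
b = np.array([2.0, 2.0])       # one value per sea-distance bin

# The mean of Beta(a, b) is a / (a + b); broadcasting builds the full grid,
# with rows indexed by river distance and columns by sea distance.
mean_p = a[:, None] / (a[:, None] + b[None, :])

print(mean_p.shape)
print(mean_p.round(3))
```

Each row shares one $a$ and each column shares one $b$, so the whole grid of expected probabilities is determined by two short vectors.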

Therefore, we have a grid $\{(x_i, y_j)\}$ for $i=1,\dots,N$ and $j=1,\dots,M$, where each $(x_i, y_j)$ pair (district) has 2 data values: the total number of votes and the number of votes for the right-wing party (it is a two-party political system, so the left-wing votes are the total minus the right-wing votes).

Here, the total number of votes will be considered as known data. This does not make much sense in practice, but we know that $right(x,y) \sim BetaBinomial\Big(votes(x,y), \alpha(x),\beta(y)\Big)$. Therefore, our model has $N$ _plus_ $M$ parameters, instead of the product $N \times M$ that would be needed if each district were independent.
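The generative process described above can be sketched with NumPy by drawing a beta-binomial variable in two steps: first a success probability from a Beta distribution, then a binomial count. This is only an illustrative simulation. The grid size matches the one reported further down in this notebook, while the parameter ranges and vote totals are assumptions and not the values behind the CSV file.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 13, 8                                  # grid size reported in this notebook
alpha = rng.uniform(1.0, 5.0, size=N)         # hypothetical alpha(x), one per row
beta = rng.uniform(1.0, 5.0, size=M)          # hypothetical beta(y), one per column
votes = rng.integers(100, 400, size=(N, M))   # hypothetical known vote totals

# Beta-binomial draw in two steps:
#   p(x, y) ~ Beta(alpha(x), beta(y)),  right(x, y) ~ Binomial(votes(x, y), p(x, y))
p = rng.beta(alpha[:, None], beta[None, :])
right = rng.binomial(votes, p)

# Parameter count of the structured model versus one parameter per district.
n_params_structured = N + M
n_params_independent = N * M
print(n_params_structured, n_params_independent)
```

With this grid the structured model needs 21 parameters where the per-district alternative mentioned in the text would need 104.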

## Load data

In [1]:
import pystan
import pandas as pd
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

In [2]:
nchains = 4
ndraws = 1000

In [3]:
N_inhabitants = 26000
data = pd.read_csv("2D_data_N_inhabitants_{}.csv".format(N_inhabitants)).set_index(["category","number"])
Total = data.loc["total"].values
Right = data.loc["right"].values
N, M = Total.shape

In [4]:
N,M

(13, 8)

In [5]:
binomial_on_2D_dat = {
    'N': N,
    'M': M,
    'Total': Total,
    'Right': Right,
}
coords = {"river_distance":range(N), "sea_distance": range(M)}
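Since the Stan programs below declare `Total` and `Right` as integer arrays of size `[N,M]`, it can be worth validating the arrays before sampling. The helper below is a hypothetical sketch (`check_counts` is not part of PyStan or ArviZ), and the toy arrays are made up. In the notebook it would be called as `check_counts(Total, Right)`.

```python
import numpy as np

def check_counts(total, right):
    """Hypothetical helper: validate vote counts before passing them to Stan."""
    total = np.asarray(total)
    right = np.asarray(right)
    if total.shape != right.shape:
        raise ValueError("Total and Right must have the same shape")
    if not (np.issubdtype(total.dtype, np.integer)
            and np.issubdtype(right.dtype, np.integer)):
        raise TypeError("counts must be integers to match the Stan data block")
    if (right < 0).any() or (right > total).any():
        raise ValueError("need 0 <= Right <= Total in every district")
    return total.shape

# Toy 2x3 grid with made-up numbers.
toy_total = np.array([[10, 20, 30], [40, 50, 60]])
toy_right = np.array([[3, 9, 12], [25, 20, 60]])
shape = check_counts(toy_total, toy_right)
print(shape)
```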

## PyStan code

In [6]:
binomial_on_2D_code = """
data {
    int<lower=1> N;     // num of x, or num of river_distance values
    int<lower=1> M;     // num of y, or num of sea_distance values

    int Total[N,M];
    int Right[N,M];
}

parameters {
    vector<lower=0>[N] alphas;     
    vector<lower=0>[M] betas;
}

model {

    for (n in 1:N){
        for (m in 1:M){
            Right[n,m] ~ beta_binomial(Total[n,m], alphas[n], betas[m]);
        }
    }
}

generated quantities {
    real log_lik[N,M];
    real Right_hat[N,M];
    
    for (n in 1:N){
        for (m in 1:M){
            log_lik[n,m] = beta_binomial_lpmf(Right[n,m] | Total[n,m], alphas[n], betas[m]);
            Right_hat[n,m] = beta_binomial_rng(Total[n,m], alphas[n], betas[m]);
        }
    }
}
"""

In [7]:
sm = pystan.StanModel(model_code=binomial_on_2D_code)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_e8837c52e06dfb51a3042c4705a0427e NOW.


In [8]:
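# iter counts warmup plus sampling iterations; with PyStan's default warmup
# of iter // 2 this leaves ndraws posterior draws per chain.
fit = sm.sampling(
    data=binomial_on_2D_dat, 
    iter=2*ndraws, 
    chains=nchains)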

In [9]:
dims = {"alphas":["river_distance"], 
        "betas":["sea_distance"], 
        "Total": ["river_distance", "sea_distance"], 
        "Right": ["river_distance", "sea_distance"], 
        "Right_hat": ["river_distance", "sea_distance"], 
        "log_lik": ["river_distance", "sea_distance"]}
idata = az.from_pystan(
    posterior=fit,
    observed_data=['Total', 'Right'],
    posterior_predictive=['Right_hat'],
    log_likelihood="log_lik",
    coords=coords,
    dims=dims
)

In [10]:
az.loo(idata)

  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "


loo           2008.09
loo_se        18.9396
p_loo         29.3179
loo_scale    deviance
dtype: object

In [11]:
idata.to_netcdf("binomial_on_2D_intention_pystan.nc")

'binomial_on_2D_intention_pystan.nc'

## Constant success probability model
Now the model will be $right(x,y) \sim Binomial\Big(votes(x,y), p_{intention}\Big)$, with $p_{intention}$ a single constant shared by all districts.
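For a single shared probability the result can be checked by hand. The pooled estimate is the total number of right-wing votes divided by the total number of votes, and under a uniform prior on $[0, 1]$ (which is what a bounded Stan parameter with no explicit prior statement receives) the posterior is a Beta distribution by conjugacy. The sketch below uses made-up counts, not the notebook's data.

```python
import numpy as np

# Toy counts, made up for illustration.
toy_total = np.array([[100, 200], [150, 50]])
toy_right = np.array([[40, 90], [60, 10]])

# Pooled maximum-likelihood estimate of the shared probability.
p_hat = toy_right.sum() / toy_total.sum()

# With a uniform prior, the posterior is
# Beta(1 + right-wing votes, 1 + left-wing votes).
post_a = 1 + toy_right.sum()
post_b = 1 + (toy_total - toy_right).sum()
post_mean = post_a / (post_a + post_b)

print(p_hat, post_mean)
```

Comparing the posterior mean of `p_intention` from the sampler against this closed form is a quick way to confirm that the fit behaves as expected.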

In [12]:
binomial_on_2D_code_constant = """
data {
    int<lower=1> N;     // num of x, or num of river_distance values
    int<lower=1> M;     // num of y, or num of sea_distance values

    int Total[N,M];
    int Right[N,M];
}

parameters {
    real<lower=0, upper=1> p_intention;
}

model {

    for (n in 1:N){
        for (m in 1:M){
            Right[n,m] ~ binomial(Total[n,m], p_intention);
        }
    }
}

generated quantities {
    real log_lik[N,M];
    real Right_hat[N,M];
    
    for (n in 1:N){
        for (m in 1:M){
            log_lik[n,m] = binomial_lpmf(Right[n,m] | Total[n,m], p_intention);
            Right_hat[n,m] = binomial_rng(Total[n,m], p_intention);
        }
    }
}
"""

In [13]:
sm_constant = pystan.StanModel(model_code=binomial_on_2D_code_constant)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_d65ac6ef092d8f308c1f4925defd1625 NOW.


In [14]:
fit_constant = sm_constant.sampling(
    data=binomial_on_2D_dat, 
    iter=2*ndraws, 
    chains=nchains)

In [15]:
dims = {"Total": ["river_distance", "sea_distance"], 
        "Right": ["river_distance", "sea_distance"], 
        "Right_hat": ["river_distance", "sea_distance"], 
        "log_lik": ["river_distance", "sea_distance"]}
idata_constant = az.from_pystan(
    posterior=fit_constant,
    observed_data=['Total', 'Right'],
    posterior_predictive=['Right_hat'],
    log_likelihood="log_lik",
    coords=coords,
    dims=dims
)

In [16]:
az.loo(idata_constant)

  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "


loo            433725
loo_se        45726.2
p_loo         2564.94
loo_scale    deviance
dtype: object

In [17]:
idata_constant.to_netcdf("binomial_on_2D_intention_pystan_p_constant.nc")

'binomial_on_2D_intention_pystan_p_constant.nc'

## Only 1D variation in $N$ dimension (variation in $\alpha$ with betabinomial)
The third modelling option is a variation on the first model, with a single $\beta$ shared across the whole grid: $right(x,y) \sim BetaBinomial\Big(votes(x,y), \alpha(x),\beta\Big)$.

In [18]:
binomial_on_2D_code_1D_a = """
data {
    int<lower=1> N;     // num of x, or num of river_distance values
    int<lower=1> M;     // num of y, or num of sea_distance values

    int Total[N,M];
    int Right[N,M];
}

parameters {
    vector<lower=0>[N] alphas;     
    real<lower=0> beta;
}

model {

    for (n in 1:N){
        for (m in 1:M){
            Right[n,m] ~ beta_binomial(Total[n,m], alphas[n], beta);
        }
    }
}

generated quantities {
    real log_lik[N,M];
    real Right_hat[N,M];
    
    for (n in 1:N){
        for (m in 1:M){
            log_lik[n,m] = beta_binomial_lpmf(Right[n,m] | Total[n,m], alphas[n], beta);
            Right_hat[n,m] = beta_binomial_rng(Total[n,m], alphas[n], beta);
        }
    }
}
"""

In [19]:
sm_1D_a = pystan.StanModel(model_code=binomial_on_2D_code_1D_a)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_3d9409b1a82d2ac0803e0c4030d637bc NOW.


In [20]:
fit_1D_a = sm_1D_a.sampling(
    data=binomial_on_2D_dat, 
    iter=2*ndraws, 
    chains=nchains)

In [21]:
dims = {"alphas":["river_distance"], 
        "Total": ["river_distance", "sea_distance"], 
        "Right": ["river_distance", "sea_distance"], 
        "Right_hat": ["river_distance", "sea_distance"], 
        "log_lik": ["river_distance", "sea_distance"]}
idata_1D_a = az.from_pystan(
    posterior=fit_1D_a,
    observed_data=['Total', 'Right'],
    posterior_predictive=['Right_hat'],
    log_likelihood="log_lik",
    coords=coords,
    dims=dims
)

In [22]:
az.loo(idata_1D_a)

loo           2016.11
loo_se         12.116
p_loo         15.2194
loo_scale    deviance
dtype: object

In [23]:
idata_1D_a.to_netcdf("binomial_on_2D_intention_pystan_1D_a.nc")

'binomial_on_2D_intention_pystan_1D_a.nc'

## Only 1D variation in $M$ dimension (variation in $\beta$ with betabinomial)
The fourth modelling option is also a variation on the first model, this time with a single $\alpha$ shared across the whole grid: $right(x,y) \sim BetaBinomial\Big(votes(x,y), \alpha, \beta(y)\Big)$.

In [24]:
binomial_on_2D_code_1D_b = """
data {
    int<lower=1> N;     // num of x, or num of river_distance values
    int<lower=1> M;     // num of y, or num of sea_distance values

    int Total[N,M];
    int Right[N,M];
}

parameters {
    real<lower=0> alpha;     
    vector<lower=0>[M] betas;
}

model {

    for (n in 1:N){
        for (m in 1:M){
            Right[n,m] ~ beta_binomial(Total[n,m], alpha, betas[m]);
        }
    }
}

generated quantities {
    real log_lik[N,M];
    real Right_hat[N,M];
    
    for (n in 1:N){
        for (m in 1:M){
            log_lik[n,m] = beta_binomial_lpmf(Right[n,m] | Total[n,m], alpha, betas[m]);
            Right_hat[n,m] = beta_binomial_rng(Total[n,m], alpha, betas[m]);
        }
    }
}
"""

In [25]:
sm_1D_b = pystan.StanModel(model_code=binomial_on_2D_code_1D_b)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b0daf9aaca2c157c6840a11b724d4f91 NOW.


In [26]:
fit_1D_b = sm_1D_b.sampling(
    data=binomial_on_2D_dat, 
    iter=2*ndraws, 
    chains=nchains)

In [27]:
dims = {"betas":["sea_distance"], 
        "Total": ["river_distance", "sea_distance"], 
        "Right": ["river_distance", "sea_distance"], 
        "Right_hat": ["river_distance", "sea_distance"], 
        "log_lik": ["river_distance", "sea_distance"]}
idata_1D_b = az.from_pystan(
    posterior=fit_1D_b,
    observed_data=['Total', 'Right'],
    posterior_predictive=['Right_hat'],
    log_likelihood="log_lik",
    coords=coords,
    dims=dims
)

In [28]:
az.loo(idata_1D_b)

  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "


loo           1995.26
loo_se        12.8462
p_loo         10.2907
loo_scale    deviance
dtype: object

In [30]:
idata_1D_b.to_netcdf("binomial_on_2D_intention_pystan_1D_b.nc")

'binomial_on_2D_intention_pystan_1D_b.nc'
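The four `az.loo` results above can be put side by side. They are reported on the deviance scale, that is, $-2$ times the expected log pointwise predictive density (elpd), so lower values indicate better estimated predictive performance. The sketch below only re-uses the numbers printed in this notebook; the dictionary keys are informal labels chosen here.

```python
# LOO values on the deviance scale, copied from the az.loo outputs above.
loo_deviance = {
    "alpha(x), beta(y)": 2008.09,
    "constant p": 433725.0,
    "alpha(x), beta": 2016.11,
    "alpha, beta(y)": 1995.26,
}

# Deviance scale is -2 * elpd, so elpd = -deviance / 2.
elpd = {name: -0.5 * value for name, value in loo_deviance.items()}

# Rank the models from best (lowest deviance) to worst.
ranking = sorted(loo_deviance, key=loo_deviance.get)
for name in ranking:
    print(f"{name:20s} loo={loo_deviance[name]:>10.2f} elpd={elpd[name]:>12.2f}")
```

The constant-probability model is far worse than the three beta-binomial models. The gaps among those three are of the same order as their reported `loo_se` values, and several fits raised the Pareto shape warning, so their relative order should be read with caution. ArviZ also provides `az.compare`, which takes a dictionary of InferenceData objects and reports the differences with their standard errors.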