# Binomial on 2D data

Our model is a binomial distribution per bin on a 2D grid. One dimension is the distance to the sea (i.e. longitude if the seashore runs North-South) and the other is the distance to the river (i.e. latitude, following the previous example).

The probability of success of the binomial distribution comes from a $\text{Beta}(\alpha,\beta)$ distribution. Spatial information is relevant because $\alpha$ varies only with the river distance and $\beta$ varies only with the sea distance.

Therefore, we have a grid $\{(x_i, y_j)\}$ with $i=1,\dots,N$ and $j=1,\dots,M$, where each $(x_i, y_j)$ pair (a district) has two data values: the total number of votes and the number of votes for the right-wing party. It is a two-party political system, so the left-wing votes are the total minus the right-wing votes.

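This does not make much sense as a realistic model, but we assume that both variables follow a binomial distribution: $votes(x,y) \sim B\Big(N_{inhabitants}, \frac{0.9}{1+\text{Beta}\big(\alpha(x),\beta(y)\big)}\Big)$ and $right(x,y) \sim B\Big(votes(x,y), \text{Beta}\big(\alpha(x),\beta(y)\big)\Big)$. Therefore, our model has only $N$ _plus_ $M$ shape parameters, instead of the product $N \times M$ that it would need if each district were independent. In the Stan code below, the two Beta draws are separate latent variables, `p_aux` and `p_intention`, that share the same $\alpha(x)$ and $\beta(y)$.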
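
To make the generative story concrete, here is a minimal NumPy sketch that draws one synthetic grid from the two binomials above. The function name `simulate_votes`, the seed and the shape values are assumptions made for illustration only; they are not the values behind the CSV file used later.

```python
import numpy as np


def simulate_votes(alphas, betas, n_inhabitants, rng):
    """Draw one synthetic (total, right) grid from the model described above."""
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    # Broadcast so that cell (n, m) uses alphas[n] and betas[m].
    a = np.broadcast_to(alphas[:, None], (alphas.size, betas.size))
    b = np.broadcast_to(betas[None, :], (alphas.size, betas.size))
    # Two separate Beta draws per district, one for turnout and one for vote share.
    p_aux = rng.beta(a, b)
    p_intention = rng.beta(a, b)
    p_participation = 0.9 / (1.0 + p_aux)  # always between 0.45 and 0.9
    total = rng.binomial(n_inhabitants, p_participation)
    right = rng.binomial(total, p_intention)
    return total, right


rng = np.random.default_rng(0)  # arbitrary seed
sim_alphas = rng.uniform(0.5, 2.0, size=13)  # hypothetical, one per river distance
sim_betas = rng.uniform(0.5, 2.0, size=8)    # hypothetical, one per sea distance
sim_total, sim_right = simulate_votes(sim_alphas, sim_betas, 26000, rng)
print(sim_total.shape)
```

Simulating from known shape parameters like this is one way to check later whether the fitted posterior recovers them.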

## PyStan code

In [1]:
import pystan
import pandas as pd
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

In [2]:
N_inhabitants = 26000
data = pd.read_csv("2D_data_N_inhabitants_{}.csv".format(N_inhabitants)).set_index(["category","number"])
Total = data.loc["total"].values
Right = data.loc["right"].values
N, M = Total.shape

In [3]:
N,M

(13, 8)
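
The loading cell relies on the CSV having a `category` column (`total` or `right`) and a `number` column (the row of the grid), with one further column per grid column. The sketch below builds a toy file with that layout in memory and reads it back the same way. The layout is inferred from the `set_index` and `.loc` calls, so treat it as an assumption about the real file, and the toy values are made up.

```python
import io

import numpy as np
import pandas as pd

n_rows, n_cols = 3, 2  # toy grid, much smaller than the real 13 x 8 one
toy_total = np.arange(10, 10 + n_rows * n_cols).reshape(n_rows, n_cols)
toy_right = toy_total // 2

# Stack the two matrices and label each row with (category, number).
index = pd.MultiIndex.from_product(
    [["total", "right"], range(n_rows)], names=["category", "number"]
)
toy = pd.DataFrame(np.vstack([toy_total, toy_right]), index=index)

# Round-trip through CSV text, then read it back like the loading cell does.
csv_text = toy.to_csv()
loaded = pd.read_csv(io.StringIO(csv_text)).set_index(["category", "number"])
loaded_total = loaded.loc["total"].values
loaded_right = loaded.loc["right"].values
print(loaded_total.shape)
```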

In [4]:
# restricted model
binomial_on_2D_code = """
data {
    int<lower=1> N;     // num of x, or num of river_distance values
    int<lower=1> M;     // num of y, or num of sea_distance values
    int<lower=1> N_inhabitants;

    int Total[N,M];
    int Right[N,M];
}

parameters {
    vector<lower=0, upper=2>[N] alphas;     
    vector<lower=0, upper=2>[M] betas;
    real<lower=0, upper=1> p_intention[N,M];
    real<lower=0, upper=1> p_aux[N,M];
}

transformed parameters {
    // participation probability; p_aux is in [0, 1], so this lies in [0.45, 0.9]
    real<lower=0, upper=1> p_participation[N,M];
    
    for (n in 1:N){
        for (m in 1:M){
            p_participation[n,m] = .9/(1+p_aux[n,m]);
        }
    }
    
}



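model {
    // alphas and betas have no explicit prior, so they are uniform on their (0, 2) bounds
    for (n in 1:N){
        for (m in 1:M){
            // both latent probabilities share the shapes alphas[n] and betas[m]
            p_intention[n,m] ~ beta(alphas[n], betas[m]);
            p_aux[n,m] ~ beta(alphas[n], betas[m]);
            // turnout out of all inhabitants, then right-wing votes out of the turnout
            Total[n,m] ~ binomial(N_inhabitants, p_participation[n,m]);
            Right[n,m] ~ binomial(Total[n,m], p_intention[n,m]);
        }
    }
}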

generated quantities {
    real log_lik[N,M];
    real Total_hat[N,M];
    real Right_hat[N,M];
    
    for (n in 1:N){
        for (m in 1:M){
            log_lik[n,m] = binomial_lpmf(Total[n,m] | N_inhabitants, p_participation[n,m]) + 
                           binomial_lpmf(Right[n,m] | Total[n,m], p_intention[n,m]);
            Total_hat[n,m] = binomial_rng(N_inhabitants, p_participation[n,m]);
            Right_hat[n,m] = binomial_rng(Total[n,m], p_intention[n,m]);
        }
    }
}
"""
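
The `log_lik` entry in `generated quantities` is the sum of two binomial log-probabilities per district. As a rough check of what that quantity is, the following pure-Python sketch computes it for a single district. The helper names and the example numbers are hypothetical, and the sketch assumes both probabilities are strictly between 0 and 1.

```python
import math


def binomial_lpmf(k, n, p):
    """Log binomial pmf: log C(n, k) + k log p + (n - k) log(1 - p)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def district_log_lik(total, right, n_inhabitants, p_participation, p_intention):
    """Pointwise log-likelihood of one district: turnout term plus vote-share term."""
    return (binomial_lpmf(total, n_inhabitants, p_participation)
            + binomial_lpmf(right, total, p_intention))


# Hypothetical numbers for one district and one posterior draw.
example = district_log_lik(total=15000, right=7000, n_inhabitants=26000,
                           p_participation=0.6, p_intention=0.45)
print(example)
```

Treating each district as one observation with a single combined log-likelihood is the choice made in the Stan code; keeping the two terms separate would instead give two pointwise values per district.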

In [5]:
binomial_on_2D_dat = {
    'N': N,
    'M': M,
    "N_inhabitants": N_inhabitants,
    'Total': Total,
    'Right': Right,
}

sm = pystan.StanModel(model_code=binomial_on_2D_code)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_0518c242153eb4cce33b8e6bff5e3fbf NOW.


In [6]:
fit = sm.sampling(
    data=binomial_on_2D_dat, 
    iter=1000, 
    chains=4, 
    pars=[
        'alphas', 
        'betas',
        'p_intention',
        'p_participation',
        'Total_hat',
        'Right_hat',
        'log_lik'
    ]
)

In [7]:
dims = {"alphas":["river_distance"], 
        "betas":["sea_distance"], 
        "p_intention": ["river_distance", "sea_distance"],  
        "p_participation": ["river_distance", "sea_distance"], 
        "Total": ["river_distance", "sea_distance"], 
        "Right": ["river_distance", "sea_distance"], 
        "Total_hat": ["river_distance", "sea_distance"], 
        "Right_hat": ["river_distance", "sea_distance"], 
        "log_lik": ["river_distance", "sea_distance"]}
coords = {"river_distance":range(N), "sea_distance": range(M)}
idata = az.from_pystan(
    posterior=fit,
    observed_data=['Total', 'Right'],
    posterior_predictive=['Total_hat', 'Right_hat'],
    log_likelihood="log_lik",
    coords=coords,
    dims=dims
)

In [8]:
idata.to_netcdf("binomial_on_2D_pystan.nc")

'binomial_on_2D_pystan.nc'