# Imports

In [2]:
import arviz as az
import numpy as np
import pymc3 as pm

# Data Extraction

In [3]:
csv_data = np.loadtxt('problem2.csv', delimiter=',', skiprows=1)
X1 = csv_data[:, 0]
X2 = csv_data[:, 1]
Y = csv_data[:, 2]

# Model

In [5]:
with pm.Model() as model:
    # Data
    x1_data = pm.Data("x1_data", X1)
    x2_data = pm.Data("x2_data", X2)
    y_data = pm.Data("y_data", Y)

    # Priors
    beta0 = pm.Normal('beta0', mu=0, sigma=100)
    beta1 = pm.Normal('beta1', mu=0, sigma=100)
    beta2 = pm.Normal('beta2', mu=0, sigma=100)
    alpha0 = pm.Normal('alpha0', mu=0, sigma=100)
    alpha1 = pm.Normal('alpha1', mu=0, sigma=100)
    alpha2 = pm.Normal('alpha2', mu=0, sigma=100)
    
    # Linear Model
    mu = beta0 + (beta1 * x1_data) + (beta2 * x2_data)

    # Variance
    sigma_squared = pm.math.exp(alpha0 + (alpha1 * x1_data) + (alpha2 * x2_data))

    # Likelihood
    y_likelihood = pm.Normal('y', mu=mu, sigma=pm.math.sqrt(sigma_squared), observed=y_data)

    # Posterior Sampling
    trace = pm.sample(11000, tune=1000, return_inferencedata=True)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha2, alpha1, alpha0, beta2, beta1, beta0]


Sampling 4 chains for 1_000 tune and 11_000 draw iterations (4_000 + 44_000 draws total) took 16 seconds.
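
For reference, the model cell above can be restated in mathematical notation. This is only a transcription of the code; the observation index $i$ and the subscripted symbols are notational choices made here, not names from the notebook.

$$
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i,\ \sigma_i^2), \\
\mu_i &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}, \\
\log \sigma_i^2 &= \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i}, \\
\beta_j,\ \alpha_j &\sim \mathcal{N}(0,\ 100^2), \qquad j = 0, 1, 2.
\end{aligned}
$$

Modeling the log of the variance as linear in the predictors keeps $\sigma_i^2$ positive for any values of the `alpha` parameters.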


# Show Statistics

In [6]:
az.summary(trace, hdi_prob=0.95)

| | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| beta0 | 8.499 | 0.011 | 8.478 | 8.52 | 0.0 | 0.0 | 27288.0 | 25935.0 | 1.0 |
| beta1 | 2.99 | 0.016 | 2.957 | 3.022 | 0.0 | 0.0 | 29100.0 | 26848.0 | 1.0 |
| beta2 | 0.019 | 0.05 | -0.081 | 0.114 | 0.0 | 0.0 | 36528.0 | 30710.0 | 1.0 |
| alpha0 | -5.908 | 0.123 | -6.146 | -5.665 | 0.001 | 0.001 | 25263.0 | 26154.0 | 1.0 |
| alpha1 | -0.165 | 0.159 | -0.477 | 0.148 | 0.001 | 0.001 | 30318.0 | 28327.0 | 1.0 |
| alpha2 | 9.965 | 0.154 | 9.664 | 10.264 | 0.001 | 0.001 | 31467.0 | 28730.0 | 1.0 |


# Variable Significance

## Significance for the Mean
Judging by the 95% credible intervals in the summary, `beta0` and `beta1` have intervals of `[8.478, 8.520]` and `[2.957, 3.022]`, respectively. Both parameters are significant in the sense that their 95% credible intervals do not include 0. Both intervals are also narrow, indicating that the parameters are estimated precisely. By contrast, the interval for `beta2`, `[-0.081, 0.114]`, includes 0, so `beta2` is not significant for the mean.

## Significance for the Variance
Judging by the 95% credible intervals in the summary, `alpha0` and `alpha2` have intervals of `[-6.146, -5.665]` and `[9.664, 10.264]`, respectively. Both parameters are significant in the sense that their 95% credible intervals do not include 0. Both intervals are also narrow relative to the size of the estimates, indicating that the parameters are estimated precisely. By contrast, the interval for `alpha1`, `[-0.477, 0.148]`, includes 0, so `alpha1` is not significant for the variance.
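
The interval check used in both subsections can be sketched in a few lines. The bounds below are copied by hand from the `az.summary` output above, and `excludes_zero` is a hypothetical helper name introduced only for this illustration.

```python
# 95% HDI bounds copied by hand from the az.summary output above.
hdi_95 = {
    "beta0": (8.478, 8.520),
    "beta1": (2.957, 3.022),
    "beta2": (-0.081, 0.114),
    "alpha0": (-6.146, -5.665),
    "alpha1": (-0.477, 0.148),
    "alpha2": (9.664, 10.264),
}


def excludes_zero(interval):
    """Return True when zero lies strictly outside the (low, high) interval."""
    low, high = interval
    return low > 0 or high < 0


# Flag each parameter whose 95% interval does not contain zero.
excluded = {name: excludes_zero(bounds) for name, bounds in hdi_95.items()}

for name, flag in excluded.items():
    print(f"{name}: 95% HDI {hdi_95[name]} excludes zero -> {flag}")
```

With a live `trace`, the same bounds could presumably be read from the summary DataFrame instead of being typed in, which avoids transcription errors.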

In [7]:
# Obtain the posterior mean of alpha0, alpha1, and alpha2
posterior_mean_alpha0 = np.mean(trace.posterior.alpha0.values)
posterior_mean_alpha1 = np.mean(trace.posterior.alpha1.values)
posterior_mean_alpha2 = np.mean(trace.posterior.alpha2.values)

# Obtain the max x1 value and min x2 value
max_x1 = np.max(X1)
min_x2 = np.min(X2)

# Calculate the minimized variance (the inputs are plain floats, so NumPy suffices)
minimized_variance = np.exp(posterior_mean_alpha0 + (posterior_mean_alpha1 * max_x1) + (posterior_mean_alpha2 * min_x2))

# Print the results
print("Posterior Mean Alpha0: ", posterior_mean_alpha0)
print("Posterior Mean Alpha1: ", posterior_mean_alpha1)
print("Posterior Mean Alpha2: ", posterior_mean_alpha2)
print("Max X1: ", max_x1)
print("Min X2: ", min_x2)
print("Minimized Variance in Data: ", minimized_variance)

Posterior Mean Alpha0:  -5.90780793584607
Posterior Mean Alpha1:  -0.16520765681015462
Posterior Mean Alpha2:  9.965230118902843
Max X1:  1.0
Min X2:  0.0
Minimized Variance in Data:  0.0023042141517580373


# Minimize the Variance

Using the posterior means obtained above for `alpha0`, `alpha1`, and `alpha2`, the equation for variance is:

`variance = exp(-5.908 - 0.165 * x1 + 9.965 * x2)`

## Theoretical Answer
Based on this, the more negative the exponent, the lower the variance. In other words, as the exponent tends toward negative infinity (`-∞`), the variance tends toward `0`. Since the `x1` coefficient is negative, the value of `x1` that reduces the variance the most is the largest possible `x1` value (the larger the value subtracted, the more negative the exponent). Since the `x2` coefficient is positive, the value of `x2` that reduces the variance the most is the smallest possible `x2` value. Based on this, the `(x1, x2)` pair that (theoretically) minimizes variance is `(∞, -∞)`.
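
The same argument can be written with derivatives. This is a sketch that plugs in the posterior means as fixed values and ignores posterior uncertainty; $\hat\alpha_j$ is notation introduced here for those posterior means.

$$
\begin{aligned}
\sigma^2(x_1, x_2) &= \exp\left(\hat\alpha_0 + \hat\alpha_1 x_1 + \hat\alpha_2 x_2\right), \\
\frac{\partial \sigma^2}{\partial x_1} &= \hat\alpha_1\, \sigma^2 < 0 \quad \text{since } \hat\alpha_1 \approx -0.165, \\
\frac{\partial \sigma^2}{\partial x_2} &= \hat\alpha_2\, \sigma^2 > 0 \quad \text{since } \hat\alpha_2 \approx 9.965.
\end{aligned}
$$

Because $\sigma^2 > 0$ everywhere, neither partial derivative ever vanishes. The function therefore has no interior minimum: it keeps decreasing as $x_1$ grows and as $x_2$ shrinks, approaching but never reaching $0$.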

## If Bounded By Values in Data (Just for Fun)
Just for fun, the maximum `x1` value in the data is `1.0` and the minimum `x2` value is `0.0`. If constrained by these values (which we are theoretically not constrained by), the variance for `[1.0, 0.0]` would be `0.002304`.
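
The arithmetic behind this value can be checked without PyMC3. The sketch below uses only the standard library and the posterior means printed above; `plug_in_variance` is a hypothetical helper name, and plugging in posterior means is a simplification that ignores posterior uncertainty.

```python
import math

# Posterior means copied from the printed output above.
alpha0 = -5.90780793584607
alpha1 = -0.16520765681015462
alpha2 = 9.965230118902843


def plug_in_variance(x1, x2):
    """Variance implied by the posterior-mean coefficients at (x1, x2)."""
    return math.exp(alpha0 + alpha1 * x1 + alpha2 * x2)


# Variance at the observed extremes: max x1 = 1.0, min x2 = 0.0.
var_at_extremes = plug_in_variance(1.0, 0.0)
sd_at_extremes = math.sqrt(var_at_extremes)

print(f"variance at (1.0, 0.0): {var_at_extremes:.6f}")
print(f"std. dev. at (1.0, 0.0): {sd_at_extremes:.6f}")
```

Evaluating `plug_in_variance` at a smaller `x1` or a larger `x2` gives a larger value, in line with the signs of the coefficients.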
