In [2]:
import pymc as pm
import numpy as np
import arviz as az

%load_ext lab_black
%load_ext watermark

# Multinomial Regression

Adapted from [Unit 7: NHANESmulti.odc](https://raw.githubusercontent.com/areding/6420-pymc/main/original_examples/Codes4Unit7/NHANESmulti.odc).

Data can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/paraguay.csv).

## Associated lecture video: Unit 7 Lesson 17

In [1]:
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed?v=xomK4tcePmc&list=PLv0FeK5oXK4l-RdT6DWJj0_upJOG2WKNO&index=79" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

## Problem statement

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. 

Assume that N subjects each select one choice from K categories. The i-th subject is characterized by three covariates, x[i,1], x[i,2], and x[i,3]. Given the covariates, model the probability that a subject selects category k, for k = 1, ..., K. In the data below, N = 10 and K = 5.
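One common way to write such a model is the multinomial logit (softmax) form sketched below. The symbols $\eta_{ik}$, $\beta_{jk}$, and $p_{ik}$ are notation introduced here for illustration, with $x_{i1}, x_{i2}, x_{i3}$ standing for x[i,1], x[i,2], x[i,3]. The source does not say whether a reference category is fixed for identifiability, so none is shown.

$$
\eta_{ik} = \beta_{0k} + \beta_{1k} x_{i1} + \beta_{2k} x_{i2} + \beta_{3k} x_{i3},
\qquad
p_{ik} = \frac{\exp(\eta_{ik})}{\sum_{l=1}^{K} \exp(\eta_{il})},
\qquad k = 1, \dots, K.
$$

Under this form each subject has its own probability vector $(p_{i1}, \dots, p_{iK})$ that sums to 1, so the normalization runs over categories, not over subjects.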


In [21]:
# data
# fmt: off
y = np.array([[1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]])

X = np.array([[2,  1,  1,  2,  2,  2,  3,  3,  3,  4],
              [4,  5,  6,  4,  4,  6,  3,  2,  1,  1],
              [9, 10, 14, 21, 22, 30, 33, 36, 40, 44]])
# fmt: on

In [22]:
X = X.T
X

array([[ 2,  4,  9],
       [ 1,  5, 10],
       [ 1,  6, 14],
       [ 2,  4, 21],
       [ 2,  4, 22],
       [ 2,  6, 30],
       [ 3,  3, 33],
       [ 3,  2, 36],
       [ 3,  1, 40],
       [ 4,  1, 44]])

In [24]:
y = y.T
y

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0]])
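Since each subject selects exactly one category, every row of the transposed `y` should contain a single 1. The check below is a minimal sketch of that sanity test; the names `y_kn`, `row_totals`, and `labels` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

# Outcomes as first entered in the notebook: one row per category (K x N).
y_kn = np.array(
    [
        [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    ]
)

# Transpose to one row per subject (N x K), matching the cell above.
y = y_kn.T

# Each subject should have exactly one selected category.
row_totals = y.sum(axis=1)

# Zero-based index of the selected category for each subject.
labels = y.argmax(axis=1)

print(y.shape)
print(row_totals)
print(labels)
```

The integer `labels` are only a compact view of the same information; the one-hot layout is the form a multinomial likelihood with `n=1` takes as observations.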

In [25]:
X_aug = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
X_aug

array([[ 1.,  2.,  4.,  9.],
       [ 1.,  1.,  5., 10.],
       [ 1.,  1.,  6., 14.],
       [ 1.,  2.,  4., 21.],
       [ 1.,  2.,  4., 22.],
       [ 1.,  2.,  6., 30.],
       [ 1.,  3.,  3., 33.],
       [ 1.,  3.,  2., 36.],
       [ 1.,  3.,  1., 40.],
       [ 1.,  4.,  1., 44.]])

In [33]:
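with pm.Model() as m:
    y_data = pm.Data("y", y, mutable=False)
    X_data = pm.Data("X", X_aug, mutable=True)

    # NOTE: shape=(4, 10) yields 10 coefficient columns, while y has only
    # 5 category columns; this is the mismatch reported in the error below.
    beta = pm.Normal("beta", 0, tau=0.1, shape=(4, 10))
    eta = pm.math.exp(pm.math.dot(X_data, beta))
    # NOTE: sum() without an axis normalizes over the whole matrix,
    # not separately for each subject (row).
    p = eta / pm.math.sum(eta)

    pm.Multinomial("likelihood", n=1, p=p, observed=y)

    trace = pm.sample(3000)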

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...


ValueError: Input dimension mismatch. (input[0].shape[1] = 10, input[2].shape[1] = 5)
Apply node that caused the error: Elemwise{Composite{Switch(EQ(i0, i1), i2, (i3 * i4))}}[(0, 0)](Softmax{axis=None}.0, TensorConstant{(1, 1) of 0}, TensorConstant{[[-inf   0..inf   0.]]}, TensorConstant{[[1. 0. 0...0. 1. 0.]]}, LogSoftmax{axis=None}.0)
Toposort index: 7
Inputs types: [TensorType(float64, (None, None)), TensorType(int8, (1, 1)), TensorType(float32, (10, 5)), TensorType(float64, (10, 5)), TensorType(float64, (None, None))]
Inputs shapes: [(10, 10), (1, 1), (10, 5), (10, 5), (10, 10)]
Inputs strides: [(80, 8), (1, 1), (4, 40), (8, 80), (80, 8)]
Inputs values: ['not shown', array([[0]], dtype=int8), 'not shown', 'not shown', 'not shown']
Outputs clients: [[Sum{axis=[1], acc_dtype=float64}(Elemwise{Composite{Switch(EQ(i0, i1), i2, (i3 * i4))}}[(0, 0)].0)]]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
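
The traceback above reports a mismatch between a 10-column array and the 5-column observations, and it shows `Softmax{axis=None}`. A plausible reading, sketched below in plain NumPy rather than PyMC, is that `beta` with `shape=(4, 10)` gives one column per subject instead of one per category, and that summing `eta` without an axis normalizes over the whole matrix instead of per row. The names `softmax`, `beta_wrong`, and `beta_ok` are hypothetical, and the random draws only stand in for sampled coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix from the notebook: intercept column plus three covariates.
X = np.array(
    [
        [2, 1, 1, 2, 2, 2, 3, 3, 3, 4],
        [4, 5, 6, 4, 4, 6, 3, 2, 1, 1],
        [9, 10, 14, 21, 22, 30, 33, 36, 40, 44],
    ]
).T
X_aug = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)  # shape (10, 4)

N, K = 10, 5  # subjects and categories, as in the transposed y


def softmax(eta, axis):
    """Exponentiate and normalize along `axis` (None means the whole array)."""
    shifted = eta - eta.max(axis=axis, keepdims=True)  # avoid overflow in exp
    w = np.exp(shifted)
    return w / w.sum(axis=axis, keepdims=True)


# Stand-in coefficient draws (assumption: any real values show the shapes).
beta_wrong = rng.normal(scale=0.1, size=(4, 10))  # shape used in the failing cell
beta_ok = rng.normal(scale=0.1, size=(4, K))  # one column per category

p_wrong = softmax(X_aug @ beta_wrong, axis=None)  # normalized over all entries
p_ok = softmax(X_aug @ beta_ok, axis=1)  # normalized per subject

print(p_wrong.shape)  # does not match y's shape of (N, K)
print(p_ok.shape)  # matches y's shape of (N, K)
print(np.allclose(p_ok.sum(axis=1), 1.0))
```

In the PyMC model, the corresponding changes would presumably be `shape=(4, 5)` for `beta` and `pm.math.sum(eta, axis=1, keepdims=True)` in the denominator. This has not been run against the notebook's PyMC version, and it does not address identifiability of the coefficients.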