# Multinomial and Categorical Models

In [2]:
import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

from scipy import stats
from scipy.special import expit as logistic
from scipy.special import softmax

%config InlineBackend.figure_format = 'retina'
warnings.simplefilter(action="ignore", category=FutureWarning)
RANDOM_SEED = 8927
np.random.seed(286)

In [3]:
az.style.use("arviz-darkgrid")
az.rcParams["stats.credible_interval"] = 0.89


def standardize(series):
    """Standardize a pandas series"""
    return (series - series.mean()) / series.std()

The binomial distribution is relevant when only two things can happen and we count them. In general, more than two things can happen. For example, recall the bag of marbles from way back in Chapter 2. It contained only blue and white marbles, but suppose we introduced red marbles as well. Now each draw from the bag can be one of three categories, and the count that accumulates is across all three categories.

When more than two types of unordered events are possible, and the probability of each type of event is constant across trials, then the maximum entropy distribution is the MULTINOMIAL DISTRIBUTION.

A model built on a multinomial distribution may also be called a CATEGORICAL regression, usually when each event is isolated on a single row, as with logistic regression. In machine learning this model type is sometimes known as the MAXIMUM ENTROPY CLASSIFIER.

The conventional and natural link in this context is the MULTINOMIAL LOGIT, also known as the SOFTMAX function. This link function takes a vector of scores, one for each of the K event types, and computes the probability of a particular type of event k.
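
To make the link concrete: for scores $s_1, \ldots, s_K$, softmax sets $\Pr(k) = \exp(s_k) / \sum_{i=1}^{K} \exp(s_i)$. The sketch below computes this by hand for a made-up score vector, so it can be compared with `scipy.special.softmax`. The helper name `softmax_by_hand` and the score values are illustrative assumptions, not part of any library or dataset.

```python
import numpy as np
from scipy.special import softmax


def softmax_by_hand(scores):
    """Exponentiate each score and normalize so the result sums to one."""
    scores = np.asarray(scores, dtype=float)
    # subtracting the max leaves the result unchanged but avoids overflow
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()


scores = np.array([-1.0, 0.0, 2.0])  # made-up scores for K = 3 event types
probs = softmax_by_hand(scores)

print(probs, probs.sum())
print(np.allclose(probs, softmax(scores)))
```

The event type with the largest score receives the largest probability, and the probabilities always sum to one.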

Combined with this conventional link, this type of GLM may be called MULTINOMIAL LOGISTIC REGRESSION.

## Example - Predictors matched to outcomes

For example, suppose you are modelling choice of career for a number of young adults. One of the relevant predictor variables is expected income. In that case, the same parameter B_income appears in each linear model, in order to estimate the impact of the income trait on the probability that a career is chosen. But a different income value multiplies the parameter in each linear model.

The code below simulates career choice among three different careers, each with its own income trait. These traits are used to assign a score to each type of event. Then, when the model is fit to data, one of these scores is held constant, and the other two scores are estimated, using the known income traits.

In [4]:
# simulate career choices among 500 individuals
N = 500  # number of individuals
income = np.array([1, 2, 5])  # expected income of each career
score = 0.5 * income  # score for each career, based on income
# converts scores to probabilities:
p = softmax(score)

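# now simulate choice
# outcome career holds event type values, not counts
# each row of the multinomial draw is one-hot (a single 1 among K columns)
career = np.random.multinomial(1, p, size=N)
# the column index of the 1 in each row is the chosen career
career = np.where(career == 1)[1]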
career[:11], score, p

(array([2, 2, 2, 2, 0, 2, 1, 2, 2, 2, 2]),
 array([0.5, 1. , 2.5]),
 array([0.09962365, 0.16425163, 0.73612472]))
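
As a sanity check on the simulation, the sketch below re-draws the choices and compares the empirical career frequencies with `p`. It uses a seeded NumPy `Generator`, an assumption made so the block is self-contained; the individual draws will therefore differ from the cell above. It also illustrates why one score can be held constant when fitting: softmax depends only on differences between scores, so subtracting one career's score from all three leaves `p` unchanged. Using the first career as the reference here is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.special import softmax

# assumption: a fresh Generator, not the notebook's global seed
rng = np.random.default_rng(286)

N = 500
income = np.array([1, 2, 5])
score = 0.5 * income
p = softmax(score)

# draw one career index per individual directly
career = rng.choice(len(p), size=N, p=p)

# empirical share of each career; should be close to p for N = 500
freq = np.bincount(career, minlength=len(p)) / N
print("simulated:", freq.round(3))
print("true p:   ", p.round(3))

# softmax is unchanged when a constant is subtracted from every score,
# so one score can be pinned at zero (here the first career)
pivot_score = score - score[0]
print("pivoted scores:", pivot_score)
print(np.allclose(softmax(pivot_score), p))
```

Because only score differences are identified, a model that tried to estimate all three scores freely would have infinitely many equally good solutions; pinning one score removes that ambiguity.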