# Chapter 15: Overview of the Generalized Linear Model (GLM)

## Types of Variables

The predicted variable is modeled as a function of the predictor variables. Each variable can be classified by its scale type into one of four groups.

1. Nominal - unordered categorical data, such as country or political affiliation
2. Ordinal - ordered categorical data, such as rankings (first, second, third place)
3. Metric - data with a defined distance between values, such as temperature
4. Count - frequency data, such as population

## Linear Combinations

The key idea of GLMs is that the predicted variable is a function of a weighted sum of the predictor variables, as shown below.

$$ y = \beta_0 + \beta_1 x $$

where $y$ is the predicted variable, $x$ is the predictor variable, and $\beta_0$ and $\beta_1$ are the weights (the intercept and the slope). This can be applied to more than one predictor variable, as shown below.

$$ y = \beta_0 + \sum_{k=1}^{N} \beta_k x_k $$

Some predictor variables do not combine additively but interact, for example multiplicatively:

$$ y = \beta_0 + \beta_1 x_1 x_2$$

While this is not strictly linear in $x_1$ and $x_2$, viewing $x_1 x_2$ as a single variable keeps the linear form.
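As a minimal sketch of the weighted sum and the interaction trick, the snippet below computes a linear combination in plain Python. The helper name `lin` and all weights and predictor values are hypothetical, chosen only for illustration.

```python
def lin(x, beta0, betas):
    """Weighted sum of predictors: beta0 + sum_k betas[k] * x[k]."""
    return beta0 + sum(b * xk for b, xk in zip(betas, x))

# Hypothetical predictor values and weights (illustration only).
x1, x2 = 2.0, 3.0

# Purely additive combination of two predictors.
additive = lin([x1, x2], beta0=1.0, betas=[0.5, -1.0])

# Treat the product x1 * x2 as a third predictor with its own weight.
# The model is still linear in the weights.
with_interaction = lin([x1, x2, x1 * x2], beta0=1.0, betas=[0.5, -1.0, 0.25])

print(additive, with_interaction)
```

The interaction weight shifts the prediction by an amount that depends on both predictors at once, yet the function that combines weights and inputs is unchanged.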

Scalar values cannot be used to represent nominal data because a scalar implies some sort of distance between values. Instead, a vector can be used. For example, sex can be coded as a vector with two components, with a 1 in the component for the observed level and 0 elsewhere. The example below uses a dot product to match the weight vector with the variable vector. The weights are constrained to sum to zero so that $\beta_0$ represents a baseline value and each weight is a deflection from that baseline.

$$ y = \beta_0 + \beta_{[1]} x_{[1]} + \cdots + \beta_{[J]} x_{[J]} = \beta_0 + \vec{\beta} \cdot \vec{x}$$
$$ \sum_{j=1}^{J} \beta_{[j]} = 0 $$

This can also be extended to interactions between nominal predictors by adding an interaction term together with its own sum-to-zero constraint.
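The vector coding of a nominal predictor can be sketched as follows. The level names, the baseline, and the raw weights are hypothetical. Centering the raw weights is just one simple way to satisfy the sum-to-zero constraint, not necessarily how a given GLM implementation does it.

```python
def one_hot(level, levels):
    """Vector with a 1 in the position of the observed level, 0 elsewhere."""
    return [1.0 if level == candidate else 0.0 for candidate in levels]

def center(raw_weights):
    """Subtract the mean so that the weights sum to zero."""
    mean = sum(raw_weights) / len(raw_weights)
    return [w - mean for w in raw_weights]

# Hypothetical nominal predictor with three levels.
levels = ["A", "B", "C"]
beta0 = 10.0                             # baseline value
deflections = center([2.0, -1.0, 2.0])   # becomes [1.0, -2.0, 1.0]

def predict(level):
    """beta0 plus the dot product of the deflections and the one-hot vector."""
    x = one_hot(level, levels)
    return beta0 + sum(b * xj for b, xj in zip(deflections, x))

for level in levels:
    print(level, predict(level))
```

Because the deflections sum to zero, the average of the predictions across levels equals the baseline `beta0`.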

## Linking from Combined Predictors to Noisy Predicted Data

### From predictors to predicted central tendency

The mapping from the combined predictors to the predicted variable is called the inverse link function, denoted by $f()$:

$$ y = f(lin(x)) $$

where $lin(x)$ is the linear combination of the predictors. Typically the identity function is used, but many others exist.

#### Logistic Function

$$ y = logistic(x) =  \frac{1}{1 + e^{-x}}$$

Its argument can be a linear combination of predictors:

$$ y = logistic(x;\beta_0,\beta_1) =  \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

and also as a function of the gain $\gamma$ and the threshold $\theta$

$$ y = logistic(x;\gamma,\theta) =  \frac{1}{1 + e^{-\gamma(x-\theta)}}$$

The threshold is the x-value for which $y = 0.5$, and the gain controls how steeply the function rises through that point (the slope there is $\gamma/4$).

The inverse of the logistic function is the logit function. Given $0 < p < 1$, $logit(p) = \log\frac{p}{1-p}$. Thus a model with a logistic inverse link can be written in either of two equivalent ways:

$$ y = logistic(lin(x)) $$
$$ logit(y) = lin(x) $$
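The relationships in this subsection can be checked numerically with a short sketch. The weights `beta0` and `beta1` are hypothetical. The conversion $\gamma = \beta_1$, $\theta = -\beta_0/\beta_1$ follows from equating $\beta_0 + \beta_1 x$ with $\gamma(x - \theta)$.

```python
import math

def logistic(x):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic_gain_threshold(x, gain, threshold):
    """Logistic function parameterized by gain and threshold."""
    return logistic(gain * (x - threshold))

# Hypothetical weights for illustration.
beta0, beta1 = -3.0, 1.5

# Equivalent gain/threshold parameterization.
gain = beta1
threshold = -beta0 / beta1

x = 2.7
print(logistic(beta0 + beta1 * x))
print(logistic_gain_threshold(x, gain, threshold))
print(logit(logistic(beta0 + beta1 * x)), beta0 + beta1 * x)
```

The first two printed values agree because both forms describe the same curve, and the last line shows that applying the logit recovers the linear combination.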

#### Cumulative Normal Function

This function behaves much like the logistic function. It is denoted $\Phi(x; \mu, \sigma)$, where $\mu$ is the mean and $\sigma$ is the standard deviation of the underlying normal distribution. $\mu$ plays the same role as the threshold $\theta$ of the logistic function: it is the x-value at which the function equals 0.5. $\sigma$ is inversely related to the gain $\gamma$: the smaller $\sigma$ is, the steeper the curve. The inverse of the standard cumulative normal function is called the probit function, which maps probabilities to numbers on the real line.
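A minimal sketch of the cumulative normal and probit functions, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The wrapper names and the particular values of $\mu$ and $\sigma$ are hypothetical.

```python
from statistics import NormalDist

def cumulative_normal(x, mu=0.0, sigma=1.0):
    """Phi(x; mu, sigma): probability that a normal variable is <= x."""
    return NormalDist(mu, sigma).cdf(x)

def probit(p):
    """Inverse of the standard cumulative normal, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)

# The function equals 0.5 at x = mu, whatever sigma is.
print(cumulative_normal(3.0, mu=3.0, sigma=2.0))

# A smaller sigma gives a steeper rise just above mu.
shallow = cumulative_normal(1.0, mu=0.0, sigma=2.0)
steep = cumulative_normal(1.0, mu=0.0, sigma=0.5)
print(shallow, steep)

# probit undoes the standard cumulative normal.
print(probit(cumulative_normal(0.8)))
```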

### From predicted central tendency to noisy data

In real-world data, the predicted variable can never be calculated deterministically from the predictors. The best that can be done is to predict its central tendency and describe the variation of the data around that tendency with a probability distribution.

$$ y \sim pdf(\mu, [\text{parameters}]) $$

The appropriate probability distribution, and hence its parameters, depends on the scale type of the predicted variable. Some typical choices are shown below.

| Scale type of y |                                     Inverse Link Function                                     |         Noise Distribution         |
|:-------:|:---------------------------------------------------------------------------------------------:|:----------------------------------:|
|  Metric |                                         $\mu$ = lin(x)                                        |    y $\sim$ normal($\mu,\sigma$)   |
|  Binary |                                    $\mu$ = logistic(lin(x))                                   |      y $\sim$ bernoulli($\mu$)     |
| Nominal |                       $\mu_k = \frac{e^{lin_k(x)}}{\sum_c e^{lin_c(x)}}$                      |  y $\sim$ nominal($...,\mu_k,...$) |
| Ordinal | $\mu_k$ = $\Phi(\frac{\theta_k - lin(x)}{\sigma}) - \Phi(\frac{\theta_{k-1}-lin(x)}{\sigma})$ |  y $\sim$ nominal($...,\mu_k,...$) |
|  Count  |                                       $\mu = e^{lin(x)}$                                      |       y $\sim$ poisson($\mu$)      |
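The nominal, ordinal, and count rows of the table can be sketched as small functions. All function names, thresholds, and input values are hypothetical. The ordinal sketch assumes $K - 1$ finite thresholds, with the outermost cumulative probabilities fixed at 0 and 1.

```python
import math
from statistics import NormalDist

def softmax(lins):
    """Nominal: mu_k = exp(lin_k) / sum_c exp(lin_c)."""
    largest = max(lins)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(value - largest) for value in lins]
    total = sum(exps)
    return [e / total for e in exps]

def ordinal_probs(lin_value, thresholds, sigma=1.0):
    """Ordinal: mu_k is the normal probability mass between adjacent thresholds."""
    phi = NormalDist().cdf
    cumulative = [0.0] + [phi((t - lin_value) / sigma) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]

def poisson_mean(lin_value):
    """Count: mu = exp(lin), which is always positive."""
    return math.exp(lin_value)

nominal_mu = softmax([0.0, 1.0, -1.0])
ordinal_mu = ordinal_probs(0.3, thresholds=[-1.0, 0.0, 1.5])
count_mu = poisson_mean(0.0)

print(nominal_mu, sum(nominal_mu))
print(ordinal_mu, sum(ordinal_mu))
print(count_mu)
```

In the nominal and ordinal cases the inverse link returns a full set of category probabilities that sum to 1, which is what the categorical noise distribution requires.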

### Formal Expression

The Generalized Linear Model can finally be written as:

$$ \mu = f(lin(x),[\text{parameters}]) $$
$$ y \sim pdf(\mu,[\text{parameters}]) $$
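The two-step structure (central tendency first, then noise) can be illustrated by simulating data. This is a sketch under assumed values: the weights, the noise standard deviation, the seed, and the function names are all hypothetical, and only the metric and binary cases are shown.

```python
import math
import random

def lin(x, beta0, beta1):
    """Linear combination of a single predictor."""
    return beta0 + beta1 * x

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_metric(xs, beta0, beta1, sigma, rng):
    """mu = lin(x); y ~ normal(mu, sigma)."""
    return [rng.gauss(lin(x, beta0, beta1), sigma) for x in xs]

def simulate_binary(xs, beta0, beta1, rng):
    """mu = logistic(lin(x)); y ~ bernoulli(mu)."""
    return [1 if rng.random() < logistic(lin(x, beta0, beta1)) else 0 for x in xs]

rng = random.Random(0)                      # fixed seed for reproducibility
xs = [i / 10 for i in range(-20, 21)]       # predictor values from -2.0 to 2.0

y_metric = simulate_metric(xs, beta0=1.0, beta1=2.0, sigma=0.5, rng=rng)
y_binary = simulate_binary(xs, beta0=0.0, beta1=3.0, rng=rng)

print(y_metric[:3])
print(y_binary[:10], y_binary[-10:])
```

The same predictor values produce scattered real numbers in the metric case and 0/1 outcomes in the binary case. Only the inverse link and the noise distribution differ.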

# Chapter 15: Exercises

## Exercise 15.1

For each of the examples below, (1) identify the predicted variable and its scale type, (2) identify the predictor variable(s) and their scale type(s), and (3) identify the type of GLM.

A. Guber (1999) examined average performance by public high school students on the SAT as a function of how much money was spent per pupil by the state, and what percentage of eligible students actually took the exam.

Answer: 

1. average SAT score (metric)
2. amount of money spent per pupil (metric), percentage of students that took the exam (metric)
3. metric y scale and two metric x scales

B. Hahn, Chater, and Richardson (2003) were interested in perceived similarity of simple geometric patterns. Human observers rated pairs of patterns for how similar the patterns appeared, by circling one of the digits 1–7 printed on the page, where 1 meant “very dissimilar” and 7 meant “very similar.” The authors presented a theory of perceived similarity, in which patterns are perceived to be dissimilar to the extent that it takes more geometric transformations to produce one pattern from the other. The theory specified the exact number of transformations needed to get from one pattern to the other.

Answer:

1. Similarity rating (ordinal)
2. Number of transformations between patterns (metric)
3. ordinal y scale and metric x scale

C. R. L. Berger, Boos, and Guess (1988) were interested in the longevity of rats, measured in days, as a function of the rat’s diet. One group of rats fed freely, another group of rats had a very low calorie diet.

Answer:

1. Number of living days (metric)
2. Diet (nominal)
3. metric y scale and nominal x scale with two levels

D. McIntyre (1994) was interested in predicting the tar content of a cigarette (measured in milligrams) from the weight of the cigarette.

Answer:

1. Tar content in mg in a cigarette (metric)
2. Weight of a cigarette (metric)
3. metric y scale and metric x scale

E. You are interested in predicting the gender of a person, based on the person’s height and weight.

Answer:

1. Gender (nominal)
2. Height (metric), weight (metric)
3. nominal y scale and two metric x scales

F. You are interested in predicting whether a respondent will agree or disagree with the statement, “The United States needs a federal health care plan with a public option,” on the basis of the respondent’s political party affiliation.

Answer:

1. Agree or Disagree (dichotomous)
2. Political affiliation (nominal)
3. dichotomous y scale and nominal x scale
