# Chapter 14. Overview of the GLM

## -----------------------------------------------------------------------------------------------------------------------------
## Contents
### 14.1 The GLM
### 14.2 Cases of the GLM
## -----------------------------------------------------------------------------------------------------------------------------

### 14.1 The Generalized Linear Model (GLM)

### 14.1.1 Predictor and predicted variables

Suppose we want to predict someone’s weight from their height.
> Predicted variable (dependent variable): weight

> Predictor (explanatory variable): height

Suppose we want to predict high school grade point average (GPA) from Scholastic Aptitude Test (SAT) score and family income.
> Predicted variable (dependent variable): GPA

> Predictors (explanatory variables): SAT and income

The value of the predictor variable comes from
“outside” the system being modeled, whereas the value of the predicted variable depends
on the value of the predictor variable.

#### The key mathematical difference between predictor and predicted variables is that the likelihood function expresses the probability of values of the predicted variable as a function of values of the predictor variable.
The likelihood function does not describe the probabilities of values of the predictor variable.
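
In symbols, the distinction can be sketched as follows. The symbol θ is an assumption used here as a generic placeholder for the model's parameters; it is not notation taken from the text above.

$$
\text{likelihood: } p(y \mid x, \theta), \qquad \text{with no model of } p(x).
$$

For the weight-from-height example, this reads as p(weight | height, θ): the model assigns probabilities to weights at each given height, and says nothing about how probable any particular height is.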

In experimental settings, the variables that are actually manipulated and set by the
experimenter are the independent variables. In this context of experimental manipulation, the values of the independent variables truly are (in principle, at least) independent of the values of other variables, because the experimenter has intervened to arbitrarily set the values of the independent variables. But sometimes a non-manipulated variable is also referred to as “independent”, merely as a way to indicate that it is being used as a predictor variable.

Among non-manipulated variables, the roles of predicted and predictor are arbitrary, determined only by the interpretation of the analysis. 

> Consider, for example, people’s weights and heights. We could be interested in predicting a person’s weight from his/her height, or we could be interested in predicting a person’s height from his/her weight.

Prediction is merely a mathematical dependency, not necessarily a description of underlying causal relationship.

Just as “prediction” does not imply causation, “prediction” also does not imply any temporal relation between the variables. 

> For example, we may want to predict a person’s
sex, male or female, from his/her height. Because males tend to be taller than females, this
prediction can be made with better than chance accuracy. But a person’s sex is not caused
by his/her height, nor does a person’s sex occur only after their height is measured. 

Thus,
we can “predict” a person’s sex from his/her height, but this does not mean that the person’s
sex occurred later in time than his/her height.

### In summary: 
#### All manipulated independent variables are predictor variables, not predicted.
#### Some dependent variables can take on the role of predictor variables, if desired. All predicted variables are dependent variables.



### Why we care.
We care about these distinctions between predicted and predictor variables <B>because the likelihood function is a mathematical description of the dependency of the predicted variable on the predictor variable.</B>

The first thing we have to do in statistical inference is identify what variables we are interested in predicting, on the basis of what
predictors.

### 14.1.2 Scale types: metric, ordinal, nominal

Items can be measured on different scales. For example, the participants in a foot race can
be measured either by the time they took to run the race, or by their placing in the race (1st,
2nd, 3rd, etc.), or by the name of the team they represent. These three measurements are
examples of metric, ordinal, and nominal scales, respectively (Stevens, 1946).

Examples of metric scales include response time (i.e., latency or duration), temperature,
height, and weight. Those are actually cases of a specific type of metric scale, called a ratio
scale, because they have a natural zero point on the scale. The zero point on the scale
corresponds to there being a complete absence of the stuff being measured. For example,
when the duration is zero, there has been no time elapsed, and when the weight is zero,
there is no downward force. Because these scales have a natural zero point, it is meaningful
to talk about ratios of amounts being measured, and that is why they are called ratio scales.
For example, it is meaningful to say that taking 2 minutes to solve a problem is twice as
long as taking 1 minute to solve the problem. On the other hand, the scale of historical
time has no known absolute zero. We cannot say, for example, that there is twice as much
time in January 2nd as there is time in January 1st. We can refer to the duration since some
arbitrary reference point, but we cannot talk about the absolute amount of time in any given
moment. Scales that have no natural zero are called interval scales because all we know
about them is the amount of stuff in an interval on the scale, not the amount of stuff at a
point on the scale. Despite the conceptual difference between ratio and interval scales, we
will lump them together into the category of metric scales.

A special case of metric-scaled data is count data, also called frequency data. For example, the number of cars that pass through an intersection during an hour is a count. The
number of poll respondents who say they belong to a particular political party is a count.
Count data can only have values that are non-negative integers. Distances between counts
have meaning, and therefore the data are metric, but because the data cannot be negative and are not continuous, they are treated with different mathematical forms than continuous,
real-valued metric data.

Examples of ordinal scales include placing in a race, or rating of degree of agreement.
When we are told that, in a race, Jane came in first, Jill came in second, and Jasmine came
in third, we only know the order. We do not know whether Jane beat Jill by a nose or by a
mile. There is no distance or metric information in an ordinal scale. As another example,
many polls have ordinal response scales: Indicate how much you agree with this statement:
“Bayesian statistical inference is better than null hypothesis significance testing”, with 5 =
strongly agree, 4 = mildly agree, 3 = neither agree nor disagree, 2 = mildly disagree, and
1 = strongly disagree. Notice that there is no metric information in the response scale,
because we cannot say the difference between ratings of 5 and 4 is the same amount of
difference as between ratings of 4 and 3.

Examples of nominal, a.k.a. categorical, scales include political party affiliation, the
face of a rolled die, and the result of a flipped coin. For nominal scales, there is neither
distance between categories nor order between categories. For example, suppose we measure the political party affiliation of a person. The categories of the scale might be Green,
Democrat, Republican, Libertarian, and Other. While some political theories might infer
that the parties fall on some underlying liberal-conservative scale, there is no such scale
in the actual categorical values themselves. In the actual categorical labels there is neither
distance nor ordering.

In summary, if two items have different nominal values, all we know is that the two
items are different (and what categories they are in). On the other hand, if two items have
different ordinal values, we know that the two items are different and we know which one is
“larger” than the other, but not how much larger. If two items have different metric values,
then we know that they are different, which one is larger, and how much larger.
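
The foot-race example can be sketched in a few lines of Python. The runners' names come from the text; the finishing times and team labels are hypothetical values chosen only to illustrate the three scale types.

```python
# Hypothetical finishing times in seconds (metric scale).
times = {"Jane": 61.2, "Jill": 61.3, "Jasmine": 75.0}
# Hypothetical team labels (nominal scale).
teams = {"Jane": "Red", "Jill": "Blue", "Jasmine": "Red"}

# Ordinal scale: placing derived from the metric times (1 = fastest).
ordered = sorted(times, key=times.get)
placing = {name: rank for rank, name in enumerate(ordered, start=1)}

# Metric: differences between values are meaningful and can be unequal.
gap_1st_2nd = times["Jill"] - times["Jane"]
gap_2nd_3rd = times["Jasmine"] - times["Jill"]

# Ordinal: adjacent placings always differ by 1, so the sizes of the gaps are lost.
# Nominal: only "same or different" is meaningful.
same_team = teams["Jane"] == teams["Jasmine"]

print(placing)
print(round(gap_1st_2nd, 1), round(gap_2nd_3rd, 1))
print(same_team)
```

Converting the times to placings keeps the order but discards the fact that the second gap is far larger than the first, which is exactly the information an ordinal scale lacks.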

### Why we care.
We care about the scale type because the likelihood function must specify a probability distribution on the appropriate scale. If the scale has two nominal values,
then a Bernoulli likelihood function may be appropriate. If the scale is metric, then a normal distribution may be appropriate as a likelihood function. Whenever we are choosing a
model for data, we must answer the question: What kind of scale are we dealing with?

In the following sections, we will first consider the case of a metric predicted variable
with metric predictors. In that context of all metric variables, we will develop the concepts of linear functions and interactions. Once those concepts are established for metric
predictors, the notions will be extended to nominal predictors.
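
As a rough illustration of matching the likelihood to the scale type, the sketch below evaluates a Bernoulli likelihood for a two-valued nominal outcome and a normal likelihood for metric values. The data and all parameter values (`theta`, `mu`, `sigma`) are assumptions made up for the example, and the function names are hypothetical.

```python
import math

def bernoulli_likelihood(y, theta):
    """Probability of a two-valued outcome y (coded 0 or 1) when P(y = 1) = theta."""
    return theta if y == 1 else 1.0 - theta

def normal_density(y, mu, sigma):
    """Probability density of a metric value y under a normal distribution."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Two nominal values coded 1 (heads) and 0 (tails); theta = 0.7 is an assumed value.
flips = [1, 0, 1, 1]
lik_flips = math.prod(bernoulli_likelihood(y, 0.7) for y in flips)

# Hypothetical metric data (weights in kg); mu and sigma are assumed values.
weights = [68.0, 72.5, 80.1]
lik_weights = math.prod(normal_density(y, 73.0, 6.0) for y in weights)

print(lik_flips, lik_weights)
```

The two functions are not interchangeable: the Bernoulli form only makes sense for an outcome with two categories, while the normal density assumes that distances between values are meaningful.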

### 14.1.3 Linear function of a single metric predictor

Suppose we have identified one variable to be predicted, which we’ll call y, and one variable
to be the predictor, which we’ll call x. Suppose we have determined that both variables are
metric. The next issue we need to address is how to model a relationship between x and y.
There are many possible dependencies of y on x, and the particular form of the dependency
is determined by the specific meanings and nature of the variables. But in general, across all
possible domains, what is the most basic or simplistic dependency of y on x that we might
consider? The usual answer to this question is, a linear relationship. A linear function is the
generic, “vanilla”, off-the-shelf dependency that is used in statistical models. The methods
can be generalized to other models when needed.

Linear functions preserve proportionality. If you double the input, then you double the
output. If the cost of a book is a linear function of the number of pages, then when the number of pages is reduced 10%, the cost should be reduced 10%. If automobile speed is a linear
function of gas delivery to the engine, then when you press the pedal 20% further, the car
should go 20% faster. Non-linear functions do not preserve proportionality. For example,
in actuality, car speed is not a linear function of gas delivery. At higher and higher speeds,
it takes proportionally more and more gas to make the car go faster. Despite the fact that
many real-world dependencies are non-linear, most are at least approximately linear over
moderate ranges of the variables. For example, if you have twice the wall area, it takes
approximately twice the amount of paint. It is also the case that linear relationships are
intuitively prominent (Brehmer, 1974; Hoffman, Earle, & Slovic, 1981; Kalish, Griffiths, &
Lewandowsky, 2007). Linear relationships are the easiest to think about: Turn the steering
wheel twice as far, and the car should turn twice as sharp. Turn the volume knob 50%
higher, the loudness should increase 50%.

The general mathematical form for a linear function of a single variable is


$$
y = \beta_0 + \beta_1 x \tag{14.1}
$$

When values of x and y that satisfy Equation 14.1 are plotted, they form a line. Examples
are shown in Figure 14.1. The value of parameter β0 is called the y-intercept because it is
where the line intersects the y-axis when x = 0. The left panel of Figure 14.1 shows two
lines with different y-intercepts. The value of parameter β1 is called the slope because it
indicates how much y increases when x increases by 1. The right panel of Figure 14.1 shows
two lines with the same intercept but different slopes.

In strict mathematical terminology, the type of transformation in Equation 14.1 is called
affine. When β0 ≠ 0, the transformation does not preserve proportionality. For example,
consider y = 10 + 2x. When x is doubled from x = 1 to x = 2, y increases from y = 12 to
y = 14, which is not doubling y. Nevertheless, the rate of increase in y is the same for all
values of x: Whenever x increases by 1, y increases by 2. Equation 14.1 can be algebraically
re-arranged so that it does preserve proportionality, as will be shown next.
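
The arithmetic in this passage can be checked with a short Python sketch. The function name `linear` is a hypothetical helper; the numbers are the ones used in the y = 10 + 2x example above.

```python
def linear(x, beta0, beta1):
    """Return y = beta0 + beta1 * x (the form of Equation 14.1)."""
    return beta0 + beta1 * x

# The example from the text: y = 10 + 2x.
y1 = linear(1, beta0=10, beta1=2)   # 12
y2 = linear(2, beta0=10, beta1=2)   # 14

# Slope: a unit increase in x changes y by beta1, wherever we start.
steps = [linear(x + 1, 10, 2) - linear(x, 10, 2) for x in (0, 1, 5, 100)]

# Proportionality (doubling x doubles y) holds only when beta0 is zero.
doubles_with_intercept = (y2 == 2 * y1)
doubles_without_intercept = (linear(2, 0, 2) == 2 * linear(1, 0, 2))

print(y1, y2, steps, doubles_with_intercept, doubles_without_intercept)
```

With a non-zero intercept the step size stays constant at β1 even though doubling x no longer doubles y, which is the sense in which the function is affine rather than strictly linear.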

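<figure id="fig.redline0" style="float: none"><img src="1.png"><figcaption>Figure 14.1: Examples of linear functions of a single variable. Left panel: two lines with different y-intercepts. Right panel: two lines with the same intercept but different slopes.
</figcaption></figure>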

### 14.1.4 Additive combination of metric predictors

> ### 14.1.4.1 Reparameterization to x threshold form

### 14.1.5 Nonadditive interaction of metric predictors

### 14.1.6 Nominal predictors

> ### 14.1.6.1 Linear model for a single nominal predictor

> ### 14.1.6.2 Additive combination of nominal predictors

> ### 14.1.6.3 Nonadditive interaction of nominal predictors

### 14.1.7 Linking combined predictors to the predicted

> ### 14.1.7.1 The sigmoid (a.k.a. logistic) function

> ### 14.1.7.2 The cumulative normal (a.k.a. Phi) function

### 14.1.8 Probabilistic prediction

### 14.1.9 Formal expression of the GLM

### 14.2 Cases of the GLM