# Tuesday: Linear Regression 

**Assumptions**

* Linear relationship between X and Y
* Independence of the error terms i.e. the error of one observation should not predict the error of another.
* Homoscedasticity: variance of the error terms should be constant across the predictors
* Normality of errors: the errors should follow a normal distribution.
* No (multi)collinearity: predictors should not be strongly correlated with each other
* No outliers or high-leverage points
* No autocorrelation: the residuals of the regression model at different time points should not be correlated with each other.
  


![image.png](attachment:d5c90931-cad1-4e92-a0b9-fde3ed6930f1.png)

Simple linear regression estimates $\beta_0$ and $\beta_1$, corresponding to the intercept and slope of the regression line, respectively.
These parameters are chosen to minimize the residual sum of squares (RSS), which gives the best least-squares fit to the data.

![image.png](attachment:769895e8-f960-47fc-b168-d80c2989318d.png)

***TYPICAL TEST PROBLEM: Calculate $\beta_0$ and $\beta_1$***

![image.png](attachment:29532da9-9860-4342-9f22-37f7c123717d.png)
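For practice, the closed-form least squares estimates can be computed by hand or with a few lines of code. The sketch below assumes the usual formulas: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept follows from the fact that the fitted line passes through the point of means. The function name and the data are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimize the residual sum of squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data, not taken from the lecture
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope = fit_simple_linear_regression(x, y)
print(round(intercept, 3), round(slope, 3))  # 2.2 0.6
```

Here the means are 3 and 4, the cross-deviation sum is 6 and the squared x-deviation sum is 10, so the slope is 0.6 and the intercept is 4 - 0.6 * 3 = 2.2.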

The residual standard error (RSE) is used to assess the quality of a linear regression and is an estimate of the standard deviation of the error terms. 
It can be described as a measure of the **lack of fit**. 

&rarr; The problem with the RSE is that it is measured in the units of Y and has no upper bound, so it is hard to say whether a given value is acceptable or not. 

Therefore the $R^2$ statistic is used, as seen below:

![image.png](attachment:8913bd71-a353-4dba-bcc6-3b2cda948e1d.png)

The $R^2$ takes a value between 0 and 1 and gives the proportion of the variance in Y that the model explains, which makes it much more interpretable than the RSE.
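A minimal sketch of both measures, assuming the standard definitions RSE = sqrt(RSS / (n - p - 1)) and R² = 1 - RSS / TSS. The function name, the observed values, and the fitted values are hypothetical; the fitted values are what the line 2.2 + 0.6x would give at x = 1, ..., 5.

```python
import math


def rse_and_r2(y, y_hat, p=1):
    """Return (RSE, R^2) for observed y and fitted y_hat from a model with p predictors."""
    n = len(y)
    y_mean = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    tss = sum((yi - y_mean) ** 2 for yi in y)              # total variation
    rse = math.sqrt(rss / (n - p - 1))
    r2 = 1 - rss / tss
    return rse, r2


# Made-up observations and fitted values for illustration
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
rse, r2 = rse_and_r2(y, y_hat)
print(round(rse, 3), round(r2, 3))  # 0.894 0.6
```

With RSS = 2.4 and TSS = 6, the model explains 60% of the variance, while the RSE of about 0.89 only has meaning relative to the scale of y.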

In multiple linear regression a hyperplane is fitted to the data, with one parameter $\beta_j$ per predictor, where j runs from 1 to p (the number of predictors). The effect of each predictor (the value of $\beta_j$) is estimated while holding all other predictors fixed.
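As a sketch of how such a fit can be computed, the example below solves the least squares problem with `numpy.linalg.lstsq` on a design matrix that has a leading column of ones for the intercept. The data are made up and generated without noise from known coefficients, so the fit should recover them.

```python
import numpy as np

# Made-up predictors (two columns) and a noise-free response
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Add a column of ones so the first coefficient is the intercept
design = np.column_stack([np.ones(len(X)), X])

# Least squares solution: intercept first, then one slope per predictor
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta_hat, 6))
```

Each slope is the change in the response per unit change in its predictor with the other predictor held fixed, which is why the recovered values match the coefficients used to generate `y`.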

&rarr; With the inclusion of multiple predictors, the problem of collinearity can occur. This makes it harder to determine the effects of the variables accurately, as the predictors are not independent of each other. Inference and interpretation become more difficult. 

**Key questions**:

1. Is at least one of the predictors X1, X2,...,Xp useful in predicting
the response?
2. Do all the predictors help to explain Y, or is only a subset of the
predictors useful?
3. How well does the model fit the data?
4. Given a set of predictor values, what response value should we predict,
and how accurate is our prediction?

The F-statistic is used to test the null hypothesis that none of the predictors has an effect on the response (all slope coefficients are zero); here n is the sample size and p is the total number of predictors. 

![image.png](attachment:423d0a98-709c-483c-b4ab-31b3d1edaae3.png)
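A small sketch of the computation, assuming the usual form F = ((TSS - RSS) / p) / (RSS / (n - p - 1)). The function name and the input numbers are made up for illustration.

```python
def f_statistic(tss, rss, n, p):
    """F-statistic for the null hypothesis that all p slope coefficients are zero."""
    explained_per_predictor = (tss - rss) / p
    unexplained_per_df = rss / (n - p - 1)
    return explained_per_predictor / unexplained_per_df


# Made-up values: 5 observations, 1 predictor
f_value = f_statistic(tss=6.0, rss=2.4, n=5, p=1)
print(round(f_value, 3))  # 4.5
```

A value close to 1 is what we expect when no predictor is related to the response; how far above 1 it must be to reject the null hypothesis depends on n and p, via the F distribution with p and n - p - 1 degrees of freedom.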

Predictor selection can be divided into three common approaches, described below. 

![image.png](attachment:c64c54a8-4eef-42ef-a92b-513e7296caaf.png)

Tricks:

&rarr; Qualitative (categorical) predictors can be included by using dummy variables: each level of the predictor, except a baseline level, is encoded as a 0/1 indicator, while the response stays continuous. 

&rarr; We can build a model with an interaction term, where the combined effect of two predictors is estimated. If there is a significant interaction, the effect of one predictor depends on the value of the other, so the individual (main) effects cannot be interpreted in isolation.

&rarr; We can transform a predictor, for example by squaring it or taking its log, and use the result as input for the linear model, i.e. the model is still linear in its coefficients, only the predictor space has changed. 
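The three tricks above all amount to adding columns to the design matrix. The sketch below builds such a matrix from made-up data (one continuous predictor and one two-level categorical predictor) and fits it with `numpy.linalg.lstsq`. The variable names and coefficients are hypothetical, and the response is generated without noise so the fit recovers the coefficients.

```python
import numpy as np

# Made-up data: a continuous predictor and a two-level categorical predictor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = ["a", "b", "a", "b", "a", "b"]

# Dummy variable: 1 for level "b", 0 for the baseline level "a"
dummy = np.array([1.0 if g == "b" else 0.0 for g in group])
# Interaction: lets the slope of x differ between the two groups
interaction = x * dummy
# Transformed predictor: the model stays linear in its coefficients
squared = x ** 2

design = np.column_stack([np.ones_like(x), x, dummy, interaction, squared])

# Noise-free response from known (made-up) coefficients
true_beta = np.array([1.0, 0.5, 2.0, -0.3, 0.1])
y = design @ true_beta

beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta_hat, 4))
```

In this setup the slope of `x` is 0.5 for group "a" and 0.5 - 0.3 = 0.2 for group "b" (ignoring the squared term), which is what the interaction coefficient encodes.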

In the top plot the residuals are uncorrelated: they switch between negative and positive with no pattern. In the bottom plot the residuals are correlated: we see runs of positive points followed by runs of negative points. 

![image.png](attachment:271f1366-e1ae-4e38-a833-82399aa04da6.png)
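Besides inspecting the plot, one common numeric check is the Durbin-Watson statistic, which these notes do not mention and is added here only as an illustration. It assumes the residuals are in time order and divides the sum of squared successive differences by the sum of squared residuals: values near 2 suggest no lag-1 autocorrelation, values toward 0 suggest positive autocorrelation (tracking). The residual sequences are made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residuals in time order
no_pattern = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]  # signs change irregularly
tracking = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # a run of positives, then negatives

print(durbin_watson(no_pattern))          # 2.0
print(round(durbin_watson(tracking), 3))  # 0.667
```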

Non-constant error variance (heteroscedasticity):

![image.png](attachment:53f4b740-012c-40e1-857a-51d267e964d4.png)

Outliers and high-leverage points can have a big influence on the model and thus need to be detected and, where justified, removed. 

![image.png](attachment:a281f456-95c9-4f2b-8d57-f89ddeaa4948.png)

When trying to identify outliers, one problem that can arise is when there is a potential outlier that influences the regression model to such an extent that the estimated regression function is "pulled" towards the potential outlier, so that it isn't flagged as an outlier using the standardized residual criterion. To address this issue, studentized residuals offer an alternative criterion for identifying outliers. The basic idea is to delete the observations one at a time, each time refitting the regression model on the remaining n–1 observations. Then, we compare the observed response values to their fitted values based on the models with the ith observation deleted. This produces deleted residuals. Standardizing the deleted residuals produces studentized residuals.

An observation could be unusual with respect to its y-value or x-value. However, rather than calling them x- or y-unusual observations, they are categorized as outlier, leverage, and influential points according to their impact on the regression model. High-leverage points can be identified using the leverage statistic, outliers using studentized residuals, and influential points using Cook's distance. 
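The three diagnostics can be sketched directly from the hat matrix, without refitting the model n times. This assumes the standard formulas: leverage is the diagonal of H = X(XᵀX)⁻¹Xᵀ, the studentized (deleted) residual uses the error variance estimated without observation i, and Cook's distance combines the standardized residual with the leverage. The function name and the data are made up.

```python
import numpy as np


def influence_measures(X, y):
    """Return (leverage, studentized residuals, Cook's distance).

    X is the design matrix including the intercept column.
    """
    n, k = X.shape  # k counts the intercept column too
    hat = X @ np.linalg.inv(X.T @ X) @ X.T
    leverage = np.diag(hat)
    resid = y - hat @ y
    rss = resid @ resid
    s2 = rss / (n - k)  # error variance from the full fit
    standardized = resid / np.sqrt(s2 * (1 - leverage))
    # Error variance with observation i deleted, via the leave-one-out identity
    s2_deleted = (rss - resid ** 2 / (1 - leverage)) / (n - k - 1)
    studentized = resid / np.sqrt(s2_deleted * (1 - leverage))
    cooks = standardized ** 2 * leverage / (k * (1 - leverage))
    return leverage, studentized, cooks


# Made-up data: the last point has an unusual x-value
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 25.0])
X = np.column_stack([np.ones_like(x), x])

leverage, studentized, cooks = influence_measures(X, y)
print(np.round(leverage, 3))
```

The leverages always sum to the number of columns of the design matrix, so values far above the average k / n stand out; here the point at x = 20 has by far the largest leverage.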

To detect collinearity one can use the variance inflation factor (VIF). 
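A minimal sketch of the VIF, assuming the usual definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The function name and the data are made up; the first two columns are nearly identical on purpose, the third is unrelated to both.

```python
import numpy as np


def vif(X):
    """VIF for each column of the predictor matrix X (without an intercept column)."""
    n, p = X.shape
    values = []
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        values.append(1 / (1 - r2))
    return np.array(values)


# Made-up predictors: b is a plus a small wiggle, c is unrelated to both
a = np.arange(1.0, 9.0)
b = a + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
c = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])

vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 1))
```

A VIF of 1 means no collinearity; a common rule of thumb treats values above 5 or 10 as problematic, which is the case for `a` and `b` here.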


# Questions 

1. Describe the null hypotheses to which the p-values given in Table 3.4
correspond. Explain what conclusions you can draw based on these
p-values. Your explanation should be phrased in terms of sales, TV,
radio, and newspaper, rather than in terms of the coefficients of the
linear model.

2. Carefully explain the differences between the KNN classifier and KNN
regression methods.

3. Suppose we have a data set with five predictors, X1 = GPA, X2 =
IQ, X3 = Level (1 for College and 0 for High School), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and
Level. The response is starting salary after graduation (in thousands
of dollars). Suppose we use least squares to fit the model, and get
$\hat\beta_0 = 50$, $\hat\beta_1 = 20$, $\hat\beta_2 = 0.07$, $\hat\beta_3 = 35$, $\hat\beta_4 = 0.01$, $\hat\beta_5 = -10$.

(a) Which answer is correct, and why?

i. For a fixed value of IQ and GPA, high school graduates earn
more, on average, than college graduates. **FALSE**

ii. For a fixed value of IQ and GPA, college graduates earn
more, on average, than high school graduates. **FALSE**

iii. For a fixed value of IQ and GPA, high school graduates earn
more, on average, than college graduates provided that the
GPA is high enough. **TRUE**

iv. For a fixed value of IQ and GPA, college graduates earn
more, on average, than high school graduates provided that
the GPA is high enough. **FALSE**
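The "why" follows from the fitted coefficients given in the question. For fixed GPA and IQ, all terms that do not involve Level cancel, so the predicted difference between a college and a high school graduate is

$$
\hat{y}_{\text{college}} - \hat{y}_{\text{high school}} = \hat\beta_3 + \hat\beta_5 \cdot \text{GPA} = 35 - 10 \cdot \text{GPA}
$$

This difference is positive when GPA < 3.5 and negative when GPA > 3.5. So neither group earns more for every GPA (i and ii are false), and high school graduates earn more once the GPA is above 3.5 (iii is true, iv is false).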

(b) Predict the salary of a college graduate with IQ of 110 and a
GPA of 4.0.

In [7]:
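def salary(gpa, iq, college=0):
    """Predicted starting salary in thousands of dollars (college: 1 = College, 0 = High School)."""
    # Fitted coefficients from the question, in the order beta0 to beta5
    return 50 + 20 * gpa + 0.07 * iq + 35 * college + 0.01 * gpa * iq - 10 * gpa * college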


In [8]:
salary(4.0, 110, 1)

137.1

**(c) True or false: Since the coefficient for the GPA/IQ interaction
term is very small, there is very little evidence of an interaction
effect. Justify your answer.**

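False. This cannot be concluded from the size of the coefficient, because that size depends on the scale of the predictors (GPA × IQ takes large values, so even a small coefficient can matter). Evidence for an interaction has to be judged from the significance of the coefficient, i.e. the p-value of its t-statistic or of the equivalent F-test. 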