# Variance Inflation Factors (VIF)

**Goal**

Variance inflation factors measure how much the variance of each estimated regression coefficient is inflated relative to the case in which the predictors are not linearly related.

**Main Idea**

Measure how well a predictor $X_j$ can be explained by the other predictors $X_1, \dots , X_{j-1}, X_{j+1}, \dots , X_{p-1}$.

**Definition**  

The variance inflation factor (VIF) is typically used to measure collinearity in a multiple regression model, that is, the situation in which the predictors are highly correlated with one another. The VIF of the $j$-th predictor is defined as:

$\text{VIF}_j = \frac{1}{1 - R_j^2}$

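where $R_j^2$ is the coefficient of determination obtained by regressing $X_j$ on all of the other predictors: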

$R^2_j = R^2_{X_j | X_1, \dots , X_{j-1}, X_{j+1}, \dots , X_{p-1}}$
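
To see where the word "inflation" comes from, it may help to recall the standard expression for the sampling variance of a least-squares slope in a model with an intercept. It is stated here as background; the symbols $\sigma^2$ (error variance), $n$ (number of observations), and $\bar{X}_j$ (sample mean of $X_j$) are not defined elsewhere in this notebook and are introduced only for this illustration:

$\text{Var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2} \cdot \frac{1}{1 - R_j^2} = \frac{\sigma^2}{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2} \cdot \text{VIF}_j$

When $X_j$ is uncorrelated with the other predictors, $R_j^2 = 0$ and $\text{VIF}_j = 1$, so nothing is inflated. As a quick arithmetic check, $R_j^2 = 0.9$ gives $\text{VIF}_j = 1 / (1 - 0.9) = 10$, and $R_j^2 = 0.99$ gives $\text{VIF}_j = 100$.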

# Set up environment

In [36]:
library(tidyverse)
library(car)

# Data import

In [17]:
start_posns  <- c(1, 6, 11, 16)
end_posns    <- c(4, 9, 14, 19)
# Full variable names: skinfold thickness, thigh circumference, midarm circumference, body fat
col_names    <- c("skinfold", "Thigh", "Midarm", "body_fat")
fwf_col_info <- fwf_positions(start_posns, end_posns, col_names)

In [19]:
dat_bodyfat <- read_fwf("
X1   X2   X3   Y
19.5 43.1 29.1 11.9
24.7 49.8 28.2 22.8
30.7 51.9 37.0 18.7
29.8 54.3 31.1 20.1
19.1 42.2 30.9 12.9
25.6 53.9 23.7 21.7
31.4 58.5 27.6 27.1
27.9 52.1 30.6 25.4
22.1 49.9 23.2 21.3
25.5 53.5 24.8 19.3
31.1 56.6 30.0 25.4
30.4 56.7 28.3 27.2
18.7 46.5 23.0 11.7
19.7 44.2 28.6 17.8
14.6 42.7 21.3 12.8
29.5 54.4 30.1 23.9
27.7 55.3 25.7 22.6
30.2 58.6 24.6 25.4
22.7 48.2 27.1 14.8
25.2 51.0 27.5 21.1",
    fwf_col_info, 
    skip = 2)
dat_bodyfat

skinfold,Thigh,Midarm,body_fat
19.5,43.1,29.1,11.9
24.7,49.8,28.2,22.8
30.7,51.9,37.0,18.7
29.8,54.3,31.1,20.1
19.1,42.2,30.9,12.9
25.6,53.9,23.7,21.7
31.4,58.5,27.6,27.1
27.9,52.1,30.6,25.4
22.1,49.9,23.2,21.3
25.5,53.5,24.8,19.3


In [20]:
fit <- lm(body_fat ~ skinfold + Thigh + Midarm, data = dat_bodyfat)

# Calculate VIF from package

The VIF of every predictor in a fitted model can be calculated with the function [vif](https://www.rdocumentation.org/packages/car/versions/3.0-0/topics/vif) from the R package [car](https://www.rdocumentation.org/packages/car/versions/3.0-0).

In [37]:
vif(fit)

# Calculate from definition

In [40]:
# Regress each predictor on the other two predictors
fit_skin  <- lm(skinfold ~            Thigh + Midarm, data = dat_bodyfat)
fit_thigh <- lm(Thigh    ~ skinfold +         Midarm, data = dat_bodyfat)
fit_mid   <- lm(Midarm   ~ skinfold + Thigh         , data = dat_bodyfat)
# Extract R^2 from each auxiliary regression
R2_skin  <- summary(fit_skin)$r.squared
R2_thigh <- summary(fit_thigh)$r.squared
R2_mid   <- summary(fit_mid)$r.squared
# Apply the definition VIF_j = 1 / (1 - R_j^2)
vif_skin  <- 1 / (1 - R2_skin)
vif_thigh <- 1 / (1 - R2_thigh)
vif_mid   <- 1 / (1 - R2_mid)

In [41]:
print(vif_skin)
print(vif_thigh)
print(vif_mid)

[1] 708.8429
[1] 564.3434
[1] 104.606
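
The three hand-written regressions above follow the same pattern, so they can be wrapped in a loop. The sketch below is one possible way to do that in base R. The helper name `vif_by_definition` is hypothetical (it is not part of `car`), and the predictor columns are retyped from the data-import cell so that the block runs on its own. As a cross-check, it also uses the fact that the VIFs equal the diagonal elements of the inverse of the predictors' correlation matrix.

```r
# Predictor columns, copied from the data-import cell above so the sketch is self-contained.
dat_predictors <- data.frame(
  skinfold = c(19.5, 24.7, 30.7, 29.8, 19.1, 25.6, 31.4, 27.9, 22.1, 25.5,
               31.1, 30.4, 18.7, 19.7, 14.6, 29.5, 27.7, 30.2, 22.7, 25.2),
  Thigh    = c(43.1, 49.8, 51.9, 54.3, 42.2, 53.9, 58.5, 52.1, 49.9, 53.5,
               56.6, 56.7, 46.5, 44.2, 42.7, 54.4, 55.3, 58.6, 48.2, 51.0),
  Midarm   = c(29.1, 28.2, 37.0, 31.1, 30.9, 23.7, 27.6, 30.6, 23.2, 24.8,
               30.0, 28.3, 23.0, 28.6, 21.3, 30.1, 25.7, 24.6, 27.1, 27.5)
)

# Hypothetical helper (not from any package): regress each column on all the
# other columns and apply VIF_j = 1 / (1 - R_j^2).
vif_by_definition <- function(predictors) {
  sapply(names(predictors), function(name) {
    others  <- setdiff(names(predictors), name)
    formula <- reformulate(others, response = name)
    fit_j   <- lm(formula, data = predictors)
    1 / (1 - summary(fit_j)$r.squared)
  })
}

vif_def <- vif_by_definition(dat_predictors)

# Cross-check: diagonal of the inverse correlation matrix of the predictors.
vif_cor <- diag(solve(cor(dat_predictors)))

print(vif_def)
print(vif_cor)
```

Both vectors should agree with each other up to numerical rounding. The loop version scales to any number of predictors, while the correlation-matrix version avoids fitting the auxiliary regressions at all.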


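# Notes

`fwf_positions()` stores `begin` as a zero-based offset, so the start positions 1, 2, 4, 8 passed in below appear as 0, 1, 3, 7 in the resulting table.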

In [2]:
start_posns  <- c(1, 2, 4,  8)
end_posns    <- c(1, 3, 7, 15)
col_names    <- factor(letters[1:4])
fwf_col_info <- fwf_positions(start_posns, end_posns, col_names)

In [3]:
glimpse(fwf_col_info)

Observations: 4
Variables: 3
$ begin     <dbl> 0, 1, 3, 7
$ end       <dbl> 1, 3, 7, 15
$ col_names <fct> a, b, c, d
