# Model an Ordinal Logistic Regression in R

This notebook performs ordinal logistic regression on sample data.  We use a new dataset for this analysis, one better suited to the technique because its outcome variable has a ranked ordering.

In addition to the `tidyverse` and `caret` packages, we'll use the `foreign` package to load a Stata dataset and `MASS` to perform the ordinal logistic regression.

In [None]:
library(tidyverse)
library(caret)
library(foreign)
library(MASS)

This dataset is in Stata format.  Stata is a commercial data analysis product that we don't have available here, but the `read.dta()` function from the `foreign` package can read its files directly.

In [None]:
students <- read.dta("../data/ologit.dta")
head(students)
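
Because the analysis depends on ranked ordering, it is worth confirming that the outcome column's factor levels are in the intended order after loading. The following is a minimal sketch using a made-up vector; the labels `low`, `medium`, and `high` are placeholders, not values taken from the dataset.

```r
# Hypothetical ratings standing in for an ordinal outcome column
ratings <- c("low", "high", "medium", "low", "high")

# factor() sorts levels alphabetically by default: high, low, medium
alphabetical <- factor(ratings)

# Spelling out the levels gives the intended ranking
ranked <- factor(ratings, levels = c("low", "medium", "high"), ordered = TRUE)

levels(alphabetical)
levels(ranked)
is.ordered(ranked)
```

Running `levels(students$apply)` on the real data is the equivalent check.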

We can use the `table()` function to get a breakdown of the values in a single column.  `lapply()` applies it to each of the selected columns in turn.

In [None]:
lapply(students[, c("apply", "pared", "public")], table)
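
Counts are often easier to compare as proportions. As a small sketch on a made-up data frame (the column names `rating` and `flag` are hypothetical), `prop.table()` can be applied to each table that `lapply()` returns:

```r
# Hypothetical data frame with two categorical columns
toy <- data.frame(
  rating = factor(c("low", "low", "mid", "high", "low", "mid")),
  flag   = c(0, 1, 0, 0, 1, 0)
)

# One frequency table per column, then the same tables as proportions
counts <- lapply(toy[, c("rating", "flag")], table)
shares <- lapply(counts, prop.table)

counts$flag
round(shares$flag, 2)
```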

We can also cross-tabulate the categorical variables with `xtabs()` and flatten the result into a single table with `ftable()`.  We can see that relatively few students have a parent who attended graduate school, so we can expect some risk from imbalance.

In [None]:
ftable(xtabs(~ public + apply + pared, data=students))

We also have a continuous variable, GPA.  We can see that students' GPAs range from 1.9 to 4.0.

In [None]:
summary(students$gpa)

## Training a Model

We can use the `polr()` function to perform an ordinal logistic regression on our dataset.

Setting `Hess=TRUE` tells `polr()` to return the Hessian (the observed information matrix), which `summary()` uses to compute standard errors.

In [None]:
model <- polr(apply ~ pared + public + gpa, data=students, Hess=TRUE)
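
For reference, the proportional odds model that `polr()` fits can be written as follows, using the parameterization described in the `MASS` documentation. Here $K$ is the number of outcome categories, the $\zeta_k$ are the intercepts (cutpoints) between adjacent categories, and the $\beta$ coefficients are shared across all cutpoints:

$$
\operatorname{logit} P(\text{apply} \le k \mid x) = \zeta_k - \left(\beta_{\text{pared}}\,\text{pared} + \beta_{\text{public}}\,\text{public} + \beta_{\text{gpa}}\,\text{gpa}\right), \qquad k = 1, \dots, K-1
$$

Because of the minus sign, a positive coefficient shifts probability toward the higher categories.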

Because we stored the Hessian, `summary()` can report standard errors alongside the coefficients.

In [None]:
summary(model)
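
The summary of a `polr` model lists estimates, standard errors, and t values but no p-values. One common approach, sketched here on simulated data rather than the notebook's dataset, is to compare each t value against a standard normal distribution. The data frame `toy` and its columns are invented for illustration, and the normal approximation is an assumption that is more reasonable for larger samples.

```r
library(MASS)

# Simulated ordinal outcome driven by one continuous and one binary predictor
set.seed(1)
n <- 300
toy <- data.frame(x = rnorm(n), g = rbinom(n, 1, 0.3))
latent <- 0.8 * toy$x + 1.0 * toy$g + rlogis(n)
toy$y <- cut(latent, breaks = c(-Inf, 0, 1.5, Inf),
             labels = c("low", "mid", "high"), ordered_result = TRUE)

fit <- polr(y ~ x + g, data = toy, Hess = TRUE)

# Coefficient table: slopes first, then the intercepts between categories
ctable <- coef(summary(fit))

# Two-sided p-values from the normal approximation
p_value <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
cbind(ctable, "p value" = p_value)
```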

We can also exponentiate the coefficients to obtain odds ratios.

Here is how we can interpret these results (all other things being equal):

1. If at least one parent attended graduate school, a student's odds of falling into a higher category of `apply` (that is, of being more likely to apply to graduate school) are 2.85 times those of a student whose parents did not.
2. Students who attended a public university for their undergraduate studies have 0.94 times the odds of students who attended a private school, a slight decrease.
3. For every one-point increase in GPA (e.g., 2.0 to 3.0 or 2.9 to 3.9), a student's odds of falling into a higher category of `apply` are multiplied by 1.85.
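
Odds ratios scale multiplicatively, so the GPA figure above can be converted to other step sizes by working on the log-odds scale. A short worked example using the 1.85 value quoted in the list:

```r
or_gpa <- 1.85           # odds ratio per 1.0 GPA point, from the list above
log_or <- log(or_gpa)    # back to the coefficient (log-odds) scale

# Odds ratios for a half-point and a two-point change in GPA
or_half_point <- exp(0.5 * log_or)
or_two_points <- exp(2 * log_or)

round(c(half = or_half_point, two = or_two_points), 2)
# half is about 1.36, two is about 3.42
```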

In [None]:
exp(coef(model))
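
Point estimates of odds ratios are easier to judge alongside confidence intervals. The sketch below uses simulated data (the `toy` data frame is invented for illustration) and `confint.default()`, which gives Wald intervals that assume approximate normality of the estimates. Exponentiating the interval endpoints puts them on the odds-ratio scale.

```r
library(MASS)

# Simulated ordinal outcome; not the notebook's dataset
set.seed(1)
n <- 300
toy <- data.frame(x = rnorm(n), g = rbinom(n, 1, 0.3))
latent <- 0.8 * toy$x + 1.0 * toy$g + rlogis(n)
toy$y <- cut(latent, breaks = c(-Inf, 0, 1.5, Inf),
             labels = c("low", "mid", "high"), ordered_result = TRUE)

fit <- polr(y ~ x + g, data = toy, Hess = TRUE)

# Wald intervals on the log-odds scale, then exponentiated
ci <- confint.default(fit)
or_table <- exp(cbind(OR = coef(fit), ci))
or_table
```

An interval that excludes 1 suggests the predictor's effect on the odds is distinguishable from no effect at the chosen confidence level.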

## Evaluating a Model

Now let's split into training and test datasets and see how the model fares.

In [None]:
set.seed(106842)
rand_students <- students[sample(nrow(students)), ]
trainIndex <- caret::createDataPartition(rand_students$apply, p=0.7, list=FALSE, times=1)
train_data <- rand_students[trainIndex,]
test_data <- rand_students[-trainIndex,]
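
`createDataPartition()` samples within each level of the outcome, so the training and test sets should have similar class proportions. The following sketch checks this on a made-up, imbalanced factor; the level names and counts are hypothetical.

```r
library(caret)

# Hypothetical imbalanced outcome: 60 low, 30 mid, 10 high
set.seed(42)
outcome <- factor(rep(c("low", "mid", "high"), times = c(60, 30, 10)),
                  levels = c("low", "mid", "high"))

idx <- createDataPartition(outcome, p = 0.7, list = FALSE, times = 1)

# Class shares in the full data, the training split, and the test split
full_share  <- prop.table(table(outcome))
train_share <- prop.table(table(outcome[idx]))
test_share  <- prop.table(table(outcome[-idx]))
round(rbind(full = full_share, train = train_share, test = test_share), 2)
```

The same comparison on the real data would use `train_data$apply` and `test_data$apply`.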

In [None]:
model <- polr(apply ~ pared + public + gpa, data=train_data, Hess=TRUE)

In [None]:
model_pred <- predict(model, test_data)
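
By default, `predict()` on a `polr` model returns the most probable class. Passing `type = "probs"` returns the estimated probability of each category instead, which shows how confident each prediction is. A minimal sketch on simulated data (the `toy` data frame and its split are invented for illustration):

```r
library(MASS)

# Simulated ordinal outcome; not the notebook's dataset
set.seed(1)
n <- 300
toy <- data.frame(x = rnorm(n), g = rbinom(n, 1, 0.3))
latent <- 0.8 * toy$x + 1.0 * toy$g + rlogis(n)
toy$y <- cut(latent, breaks = c(-Inf, 0, 1.5, Inf),
             labels = c("low", "mid", "high"), ordered_result = TRUE)

# Simple split: first 200 rows for training, the rest for testing
train <- toy[1:200, ]
test  <- toy[201:300, ]
fit <- polr(y ~ x + g, data = train, Hess = TRUE)

probs   <- predict(fit, test, type = "probs")  # one column per category
classes <- predict(fit, test)                  # default: most probable class

head(round(probs, 3))
head(classes)
```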

In [None]:
outcomes <- cbind(as.data.frame(model_pred), test_data)
head(outcomes, 15)

In [None]:
caret::confusionMatrix(as.factor(outcomes$model_pred), as.factor(outcomes$apply))
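
When reading the confusion matrix, it helps to compare accuracy against the share of the most common class, which `confusionMatrix()` reports as the No Information Rate. On imbalanced data, a model that always predicts the majority class can look deceptively accurate. The calculation can be sketched by hand with made-up vectors (all values below are hypothetical):

```r
lv <- c("low", "mid", "high")
actual    <- factor(c("low", "low", "low", "mid", "mid", "high", "low", "mid"),
                    levels = lv)
predicted <- factor(c("low", "low", "mid", "mid", "low", "mid", "low", "mid"),
                    levels = lv)

# Rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)

accuracy <- sum(diag(cm)) / sum(cm)               # share of correct predictions
baseline <- max(table(actual)) / length(actual)   # always guess the majority class

c(accuracy = accuracy, baseline = baseline)
```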