---
title: "Logistic Regression"
output: html_document
---
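```{r setup, include=FALSE}
# Install arm once from the console if it is missing, not on every knit:
# install.packages("arm")
library(arm)                  # binnedplot()
library(statisticalModeling)
library(mosaicModel)          # mod_plot(), mod_cv()
library(tidyverse)            # already attaches ggplot2 and dplyr
library(broom)                # augment()

# Read the film data after the packages are loaded
filmData <- read.table("filmData.txt", header = TRUE)
```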
```{r boxoffice-model}
# Scatterplot matrix of the data (see lab 6-1 if you need help)
pairs(filmData)

# Next, fit a logistic regression model on the BoxOffice variable.
# This asks whether the chances of winning an Oscar are
# related to the amount of money the film makes.
boxOfficeModel <- glm(Oscar ~ BoxOffice, data = filmData,
                      family = binomial(link = "logit"))

# Use mod_plot on the model: it does the logit-to-probability
# conversion for us and plots the predicted probability.
mod_plot(boxOfficeModel, data = filmData)
```
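The logit-to-probability conversion mentioned in the chunk above is the inverse logit. Here is a minimal sketch in base R, assuming made-up coefficients: `b0`, `b1`, and the `box_office` values are hypothetical and are not the fitted values from `boxOfficeModel`.

```r
# Hypothetical intercept and slope on the logit scale (NOT fitted values)
b0 <- -2
b1 <- 0.03

# Hypothetical predictor values (illustrative units only)
box_office <- c(0, 50, 100, 150)

# Linear predictor = log-odds of the outcome
logit <- b0 + b1 * box_office

# Inverse logit by hand, and with base R's plogis()
prob_by_hand <- 1 / (1 + exp(-logit))
prob_plogis  <- plogis(logit)

data.frame(box_office, logit, prob_by_hand, prob_plogis)
```

Both columns of probabilities should agree, and a logit of 0 maps to a probability of 0.5.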
```{r}
# Now fit the full model, i.e., predicting a film's Oscar
# winning chances based on all five of the available
# predictor variables.
fullModel <- glm(Oscar ~ BoxOffice + Budget + Country + Critics + Length,
                 data = filmData, family = binomial(link = "logit"))
summary(fullModel)

# Exponentiate the coefficients to read them as odds ratios
exp(coef(fullModel))
mod_plot(fullModel)

# Use mod_cv to compare the models
mod_cv(fullModel, boxOfficeModel)

# Here's another dataset with a logistic model.
# For a factor response, glm() treats the first level as "failure",
# so check levels(mosaicData::CPS85$married) before interpreting signs.
mod2 <- glm(married ~ age + sex * sector,
            data = mosaicData::CPS85, family = binomial(link = "logit"))

# Plot the model and interpret it: what do the visualizations
# say about the relationships?
mod_plot(mod2)
```
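Exponentiated logistic coefficients are odds ratios: the factor by which the odds change for a one-unit increase in a predictor. The sketch below checks this by hand on simulated stand-in data (not `filmData`); the names `sim_fit`, `x`, and `y` and the true coefficients are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in data (NOT filmData): one numeric predictor x
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))
sim_fit <- glm(y ~ x, family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; exponentiate for odds ratios
odds_ratios <- exp(coef(sim_fit))

# Check by hand: odds at x = 1 divided by odds at x = 0
p <- predict(sim_fit, newdata = data.frame(x = c(0, 1)), type = "response")
odds <- p / (1 - p)
ratio_by_hand <- unname(odds[2] / odds[1])

c(from_coef = unname(odds_ratios["x"]), by_hand = ratio_by_hand)
```

The two numbers should match, which is why `exp()` of a coefficient can be read directly as a multiplicative change in the odds.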
```{r}
# How do we go from a predicted probability to actually
# predicting Oscar or no Oscar?
# To assess the success of our model, let's look at how well
# it predicts Oscar success. First we pull the predicted logit
# scores out of our favoured model (predict() returns values on
# the link scale by default for a glm).
oscarLogits <- predict(fullModel)

# Now we can use R's handy ifelse command to set a new variable
# "oscarPredictions" to 1 if the logit score is greater than 0,
# and to 0 otherwise. Recall that a logit score of 0 corresponds
# to a probability of 0.5.
oscarPredictions <- ifelse(oscarLogits > 0, 1, 0)

# Making a table of actual v. predicted Oscar success shows us
# that our model gets 215 films right and 85 films wrong.
# Better than guessing though!
# Oscar is a column of filmData, so refer to it explicitly.
table(predicted = oscarPredictions, actual = filmData$Oscar)

# Try different cutoffs for the logit score, search over them
# to find one with better performance
```
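One way to approach the cutoff search suggested above is a simple grid search over logit thresholds. This is a sketch on simulated stand-in data (not `filmData`); `sim_fit`, the cutoff grid, and the use of plain accuracy as the score are all assumptions chosen for illustration.

```r
set.seed(2)

# Simulated stand-in data (NOT filmData)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.2 * x))
sim_fit <- glm(y ~ x, family = binomial(link = "logit"))

# Predicted logit scores (the default scale of predict() for a glm)
logits <- predict(sim_fit)

# Candidate cutoffs on the logit scale (0 corresponds to probability 0.5)
cutoffs <- seq(-2, 2, by = 0.1)

# Proportion of correct classifications at each cutoff
accuracy <- sapply(cutoffs, function(cut) {
  mean(ifelse(logits > cut, 1, 0) == y)
})

best <- which.max(accuracy)
zero_idx <- which.min(abs(cutoffs))
c(best_cutoff = cutoffs[best],
  best_accuracy = accuracy[best],
  accuracy_at_0 = accuracy[zero_idx])
```

Because the cutoff is chosen on the same data the model was fitted to, the best accuracy is optimistic; a held-out set or cross-validation gives a fairer comparison.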
# Residuals for logistic regression
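```{r}
# Add logistic regression predictions to the data.
# type.predict = "response" puts .fitted on the probability scale;
# augment()'s default for a glm is the logit (link) scale.
fit <- fullModel
data_fitted <- augment(fit, data = filmData, type.predict = "response")

# Binned residual plot: expected values are fitted probabilities and
# residuals are on the response scale (observed minus fitted).
binnedplot(data_fitted$.fitted, residuals(fit, type = "response"),
           xlab = "Expected Values", ylab = "Average residual",
           main = "Binned residual plot",
           cex.pts = 0.8, col.pts = 1, col.int = "gray")
mod_plot(fit)
```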
# What do residuals look like when model assumptions are met?
```{r}
# generate data using logistic regression model
# read off the coefficient values from the oscars model
# make grid of variable values
# generate data
# plot data
# generate data with residual
# plot data
```
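The outline above can be sketched in base R as follows. This is only an illustration under stated assumptions: the coefficients `b0` and `b1` are made up rather than read off the Oscars model, there is a single predictor `x`, and the binning is done by hand instead of with `binnedplot`.

```r
set.seed(3)

# Hypothetical "true" coefficients (NOT read off the Oscars model)
b0 <- -1
b1 <- 0.5

# Grid of predictor values, then simulate 0/1 outcomes from the model
x <- seq(-4, 6, length.out = 1000)
p_true <- plogis(b0 + b1 * x)
y <- rbinom(length(x), size = 1, prob = p_true)

# Refit the correctly specified model
sim_fit <- glm(y ~ x, family = binomial(link = "logit"))
fitted_p <- fitted(sim_fit)
raw_resid <- y - fitted_p   # response-scale residuals

# Bin by fitted probability and average the residuals within each bin
n_bins <- 20
bins <- cut(rank(fitted_p, ties.method = "first"),
            breaks = n_bins, labels = FALSE)
bin_mean_fitted <- tapply(fitted_p, bins, mean)
bin_mean_resid  <- tapply(raw_resid, bins, mean)

plot(bin_mean_fitted, bin_mean_resid,
     xlab = "Expected values", ylab = "Average residual",
     main = "Binned residuals, simulated data")
abline(h = 0, lty = 2)
```

When the model matches the data-generating process, the binned averages should scatter around zero with no visible trend; a curve or drift in the real plot is a hint that the model is missing something.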