
Performance for logistic regression #38

Closed
DominiqueMakowski opened this issue Apr 16, 2019 · 28 comments

@DominiqueMakowski
Member

Related to #1

they say (on wikipedia 😀):

"The Hosmer-Lemeshow test is for overall calibration error, not for any particular lack of fit such as quadratic effects. It does not properly take overfitting into account, is arbitrary to choice of bins and method of computing quantiles, and often has power that is too low."

"For these reasons the Hosmer-Lemeshow test is no longer recommended. Hosmer et al have a better one d.f. omnibus test of fit, implemented in the R rms package residuals.lrm function."

Other alternatives have been developed to address the limitations of the Hosmer-Lemeshow test. These include the Osius-Rojek test and the Stukel test, available in the R script AllGOFTests.R: www.chrisbilder.com/categorical/Chapter5/AllGOFTests.R

I don't really have any opinion, never used any of those...
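(For reference, the one-d.f. omnibus test mentioned above appears to be available via rms; an untested sketch, assuming a simple binomial model on mtcars:)

library(rms)

m <- lrm(vs ~ mpg, data = mtcars, x = TRUE, y = TRUE)
residuals(m, type = "gof")  # le Cessie-van Houwelingen global goodness-of-fit test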

@strengejacke
Member

strengejacke commented Apr 16, 2019

Ok, maybe we can implement some of those tests, though I hesitate to include too many tests specifically for logistic regression, because the package could become "biased" towards this regression type.

But maybe that's not a real concern. Since these tests are rarely available elsewhere in R, I think it's good to have them in performance.

@DominiqueMakowski
Member Author

I agree, this could add real value to the package.

And I think it fits quite nicely with the scope of the package as a general toolbox, offering users an easy way to discover, try, select and use the tools they want ☺️

@strengejacke
Member

Should we find a common prefix for the goodness of fit functions? Like check_*() etc.?

@DominiqueMakowski
Member Author

Do you consider r2 and all of the other indices as goodness-of-fit metrics? Initially, we thought about prefixing all "performance" functions (in the very broad sense, including goodness of fit) with performance_*, but it was a bit verbose.

However, if the number of indices we provide increases, we could indeed re-prefix the functions and provide "shortcuts" for the most popular ones (like r2).

@strengejacke
Member

No, I was not thinking of R2, but rather of Hosmer-Lemeshow, chisq-gof, maybe error rate - those GOF tests that do some significance testing. These might get a prefix like gof_ or so.

@DominiqueMakowski
Member Author

oh right (although r2 for lm also has some significance testing and is considered a goodness-of-fit index 😁)

tbh I don't find gof_ very intuitive (looks like "game of frones" but I am too biased haha).

I am not sure of the alternatives tho, fitquality_, goodfit_...

@strengejacke
Member

strengejacke commented Apr 17, 2019

"game of frones"

:-D

but I am too biased

Indeed, I haven't even seen one episode yet, though I'm at least a bit addicted to fantasy.

@DominiqueMakowski
Member Author

😮

@DominiqueMakowski
Member Author

^^ But to come back to the main matter: how would you separate indices like r2 from the other goodness-of-fit indices that would get a prefix? Because they have some level of "testing" (with p-values and all)?

@strengejacke
Member

strengejacke commented Apr 17, 2019

Because they have some level of "testing" (with p-values and all)?

Yes, it's not an "index" (like rmse or r2), but a significance test (except error rate, actually) - at least that could be a rough way to draw the line.

@DominiqueMakowski
Member Author

DominiqueMakowski commented Apr 17, 2019

Right, I see your point!

Then maybe something like test_goodfit_ (I know you don't like double prefixes, but we could imagine different kinds of test_* in the future). Or test_performance() (which would fit with the model_performance() and compare_performance() that we have).

@pdwaggoner
Member

Hi - sorry to be late to the party here, but I have used logistic regressions quite a bit, and I have a standard battery of goodness of fit tests I run for these models. For example:

  1. heatmapFit: we can plot a heat map fit plot to assess goodness of fit. When reading a heat map plot, we are comparing model predictions with smoothed empirical predictions. The 45-degree line marks a perfect fit; any deviation from that line suggests a loss in goodness of fit. The "p-value" legend shows whether any deviation is statistically significant (dark color means statistical significance).

  2. plot.roc from pROC: ROC curves, which plot the true positive rate (sensitivity) against the false positive rate (1 - specificity). A model with excellent fit produces a curve that hugs the top-left corner, with an area under the curve that is large (near 1). A model with poor fit produces a curve close to the 45-degree diagonal line (correct and false positive calls are about equally likely), with an area under the curve near 0.5. (A quick sketch follows at the end of this comment.)

And then of course to compare between models (i.e., model selection):

  1. Likelihood ratio tests

  2. Extracting/comparing AIC (e.g., using extractAIC)

  3. ePCP from OOmisc: comparing expected proportion correctly predicted between models. I have some great code for clean histograms that we could standardize for this one.

Let me know your thoughts here or if I am way off base.
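For point 2, a quick sketch of what I mean with pROC (the model and data are just for illustration):

library(pROC)

m <- glm(vs ~ mpg, data = mtcars, family = binomial)
roc_obj <- roc(mtcars$vs, predict(m, type = "response"))

plot(roc_obj)  # a curve hugging the top-left corner indicates good fit
auc(roc_obj)   # area under the curve: near 1 is good, near 0.5 is poor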

@DominiqueMakowski
Member Author

Or test_performance() (which would fit with the model_performance() and compare_performance() that we have)

Just to clarify what I meant: we could have the individual tests available directly by their name, and all of them "wrapped" together within a test_performance() function that would perform all available tests for the given class (rough sketch below).

@pdwaggoner What does ePCP do? Although I think the current aim is to see how we can implement (i.e., name and organise) a few tests that are hardly available elsewhere - it is not yet about building the whole performance-check pipeline :)
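Very rough sketch of the wrapper idea (everything here is hypothetical, not an existing API): the individual tests keep their own names, and test_performance() just collects whichever apply to the model class.

test_performance <- function(model) {
  tests <- list()
  if (inherits(model, "glm") && family(model)$family == "binomial") {
    # e.g. tests$hosmer_lemeshow <- test_hosmer_lemeshow(model)  # hypothetical individual test
    # e.g. tests$stukel <- test_stukel(model)                    # hypothetical individual test
  }
  tests
}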

@pdwaggoner
Member

That makes sense. ePCP calculates the expected proportion of correct prediction and the related (1-α)% (approximate) confidence intervals for a given set of predicted success probabilities and the observed (binary) values (from the documentation). E.g.:

library(OOmisc)

fit <- runif(100)         # simulated predicted probabilities
y <- rbinom(100, 1, 0.5)  # simulated binary outcomes

ePCP(fit, y, alpha = 0.05)

@strengejacke
Member

ePCP() looks similar to sjstats::pred_accuracy(), though the difference is that sjstats::pred_accuracy() actually uses cross-validation to test how useful the model is for prediction (rough sketch of the idea below).
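For illustration, a minimal hand-rolled sketch of that cross-validation idea (not the actual sjstats implementation; names are placeholders): out-of-fold predictions are correlated with the observed response.

cv_accuracy_sketch <- function(model, k = 5) {
  data <- stats::model.frame(model)
  folds <- sample(rep_len(seq_len(k), nrow(data)))
  preds <- rep(NA_real_, nrow(data))

  for (i in seq_len(k)) {
    # refit on the training folds, predict on the held-out fold
    fit <- stats::update(model, data = data[folds != i, , drop = FALSE])
    preds[folds == i] <- stats::predict(fit, newdata = data[folds == i, , drop = FALSE], type = "response")
  }

  stats::cor(preds, stats::model.response(data), use = "pairwise.complete.obs")
}

# e.g. cv_accuracy_sketch(glm(vs ~ mpg, data = mtcars, family = binomial))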

@pdwaggoner
Member

Oh brilliant - cross-validation is much more useful, so sjstats::pred_accuracy() would likely be a better route here. Another general fit/validation method that would be a core, useful addition is ROC curves. They're used frequently and are super intuitive. Could be valuable to at least consider for future iterations.

@DominiqueMakowski
Member Author

@pdwaggoner you've just been sjstats'ed 😅😅 (the phenomenon of coming up with something and realising sjstats already does it)

@pdwaggoner
Member

Ha! It feels... great (?) ;)

@strengejacke
Member

I have drafted a function predictive_accuracy(), which basically works like sjstats::pred_accuracy(), except that it does not have any other package dependencies (https://github.com/easystats/performance/blob/master/R/predictive_accuracy.R).

@pdwaggoner @DominiqueMakowski have either of you worked with the boot package before? I haven't, and I'm not sure how to use it here.

In sjstats, I used my own bootstrapping method, the sjstats::bootstrap() function. How could this code be rewritten using the boot package and without other packages (see link above)?

      cv <- data %>%
        bootstrap(n) %>%
        dplyr::mutate(
          models = purrr::map(.data$strap, ~ stats::lm(formula, data = .x)),
          predictions = purrr::map(.data$models, ~ stats::predict(.x, type = "response")),
          response = purrr::map(.data$models, ~ resp_val(.x)),
          accuracy = purrr::map2_dbl(.data$predictions, .data$response, ~ stats::cor(.x, .y, use = "pairwise.complete.obs"))
        )

Feel free to open a PR or commit to master...
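For what it's worth, maybe something along these lines would work with boot::boot() and no other dependencies (completely untested sketch; boot_accuracy and its arguments are just placeholders):

library(boot)

boot_accuracy <- function(model, n = 1000) {
  data <- stats::model.frame(model)

  # statistic for boot(): refit the model on the resampled rows and correlate
  # the predicted values with the observed response
  accuracy_stat <- function(d, indices) {
    refit <- stats::update(model, data = d[indices, , drop = FALSE])
    stats::cor(
      stats::predict(refit, type = "response"),
      stats::model.response(stats::model.frame(refit)),
      use = "pairwise.complete.obs"
    )
  }

  boot::boot(data = data, statistic = accuracy_stat, R = n)$t[, 1]
}

# e.g. mean(boot_accuracy(lm(mpg ~ wt + hp, data = mtcars), n = 200))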

@DominiqueMakowski
Member Author

I also find the native boot package quite unintuitive. Maybe it would be worth reimplementing a bootstrap method, as it would also be used in parameters (see current implementation here).

@strengejacke
Member

heatmapFit: we can plot a heat map fit plot to assess goodness of fit.

ePCP from OOmisc: comparing expected proportion correctly predicted between models. I have some great code for clean histograms that we could standardize for this one.

@pdwaggoner Do you have examples for these two? Although you mentioned that cross-validation might be the better approach compared to ePCP, we could still think about visualization, so I'm interested in the plots you mentioned here.

@pdwaggoner
Member

Hi @strengejacke - apologies for my delay here. Yeah, I have an MWE for each. First, for the ePCP (expected proportion correctly predicted), we are looking for a higher value, bounded between 0 and 1, where 1 means the model perfectly predicted all observed outcomes. Here is some sample code using mtcars:

library(OOmisc) # for ePCP
library(ggplot2)

logitmod <- glm(vs ~ mpg, data = mtcars, family = binomial); summary(logitmod)
logitmod2 <- glm(vs ~ mpg + cyl, data = mtcars, family = binomial); summary(logitmod2)

y <- mtcars$vs
pred1 <- predict(logitmod, type="response")
pred2 <- predict(logitmod2, type="response")

# ePCP, expected proportion of correct prediction
epcp1 <- ePCP(pred1, y, alpha = 0.05) # define observed values and obtain predicted values
epcp2 <- ePCP(pred2, y, alpha = 0.05)

epcpdata <- data.frame(rbind(epcp1, epcp2))
epcpdata$model <- c(1,2)
epcpdata$count <- factor(c(1, 2), labels = c("Model1", "Model2"))

# Now the plot
ggplot(epcpdata, aes(x = model, y = ePCP, colour = count)) +
  geom_bar(position = position_dodge(), stat = "identity", fill = "darkgray") +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = .2,
                position = position_dodge(.9)) +
  labs(x = "Model Specification",
       y = "Expected Proportion of Correct Prediction",
       title = "Comparing ePCP between Model 1 and Model 2",
       colour = "Model Specification") +
  theme_bw()

And next, for the heatmapFit, which is a little less flexible, the goal is to compare model predictions with smoothed empirical predictions. The 45-degree line marks a perfect fit; any deviation from that line suggests a loss in goodness of fit. The "p-value" legend shows whether any deviation is statistically significant (dark color means statistical significance). Using the same mtcars models as an example, it would look something like:

## heatmapfit
library(heatmapFit) # for heat maps

# quick rescale to bound predictions between 0 and 1
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

pred1 <- rescale01(pred1)
pred2 <- rescale01(pred2)

heatmap.fit(y, pred1, reps = 1000, legend = TRUE)
heatmap.fit(y, pred2, reps = 1000, legend = TRUE)

** You can copy and paste this code to see the figures. Pretty clean, efficient renderings of model fit.

** And of course, ROC curves are always great, simple model fit checks. Let me know thoughts if you have them. Hope this helps a bit!

@strengejacke
Member

strengejacke commented Apr 28, 2019

The ePCP function looks nice and has the beauty of a clear interpretation. I implemented a function that calculates the ePCP (05243ee), currently named correct_predictions(). Would epcp() be a better name, or something else? What would you say?
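For reference, the core of the calculation is just the average predicted probability assigned to the observed outcome; a minimal sketch (names purely illustrative, not the actual implementation):

epcp_sketch <- function(model) {
  y <- model$y                                  # observed 0/1 response
  p <- stats::predict(model, type = "response") # predicted probabilities
  mean(ifelse(y == 1, p, 1 - p))
}

# epcp_sketch(glm(vs ~ mpg, data = mtcars, family = binomial))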

@strengejacke
Member

library(performance)

m1 <- glm(vs ~ mpg, data = mtcars, family = binomial)
m2 <- glm(vs ~ mpg + cyl, data = mtcars, family = binomial)
m3 <- glm(vs ~ mpg + cyl * hp, data = mtcars, family = binomial)

compare_performance(m1, m2, m3)
#>   name class      AIC      BIC   R2_Tjur      RMSE   LOGLOSS      EPCP
#> 1   m1   glm 23.49109 27.88830 0.6666308 0.7393217 0.2732983 0.8359198
#> 2   m2   glm 24.84308 32.17176 0.7063532 0.6810625 0.2319231 0.8554707
#> 3   m3   glm 29.53334 32.46481 0.4738108 0.8932618 0.3989584 0.7410163

Created on 2019-04-28 by the reprex package (v0.2.1)

@strengejacke
Member

btw, the referenced paper (https://www.cambridge.org/core/services/aop-cambridge-core/content/view/92B052AADD9756C8BCC00527749E029D/S1047198700008524a.pdf/postestimation_uncertainty_in_limited_dependent_variable_models.pdf) also shows a way to implement this for multinomial models; we should keep this in mind... I don't fully understand yet how to do it, but it doesn't look that complicated.

@DominiqueMakowski
Member Author

As you might expect, I prefer correct_predictions over epcp ☺️

I wonder if this could fit into a larger check_predictions() (or check_predicted()) function that would check predicted responses against the observed response. I reckon it could also work for a continuous variable, by returning a correlation or some other very simple index (see the sketch below).
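Quick sketch of that idea (function name and details are hypothetical): binary outcomes could return the ePCP, continuous outcomes a simple predicted-vs-observed correlation.

check_predicted_sketch <- function(model) {
  obs <- stats::model.response(stats::model.frame(model))
  pred <- stats::predict(model, type = "response")

  if (all(obs %in% c(0, 1))) {
    mean(ifelse(obs == 1, pred, 1 - pred))  # ePCP for binary responses
  } else {
    stats::cor(pred, obs, use = "pairwise.complete.obs")
  }
}

# check_predicted_sketch(lm(mpg ~ wt, data = mtcars))
# check_predicted_sketch(glm(vs ~ mpg, data = mtcars, family = binomial))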

@pdwaggoner
Member

I only prefer epcp because that's how I learned it, and that's how it's derived (see the original measurement paper, specifically starting at the bottom of p. 88: https://www.jstor.org/stable/pdf/25791597.pdf). However, in line with the easystats mission, I think correct_predictions is more consistent and would be more intuitive for the average user. So I am perfectly happy to go with correct_predictions. And it looks really good! I will take a closer look after sending this. But glad you like the metric - as you noted, it's clean and crisp and has the benefit of being quite straightforward to interpret. Great work!

@strengejacke
Member

I think this can be closed. I have opened separate issues for the two remaining features.
