Including number of observations used to build the model #82

Closed
mschubert opened this issue Oct 26, 2015 · 11 comments
@mschubert

I think it would be useful if some version of the data.frame that represents the result also included a column with the number of observations that were used to build the model.

An easy way to access this would be, e.g., nobs() for stats::lm(), but I'm sure other models report it in a similar way.
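For illustration, a minimal sketch of what the request amounts to for stats::lm(). The `model_summary` data frame is a made-up stand-in for a one-row model summary, not broom output.

```r
# Fit a simple linear model on a built-in dataset (32 rows, no missing values).
fit <- lm(mpg ~ wt, data = mtcars)

# stats::nobs() returns the number of observations used in the fit.
n <- stats::nobs(fit)

# Hypothetical one-row summary with the requested column added.
model_summary <- data.frame(r.squared = summary(fit)$r.squared, n = n)
model_summary
```

Because nobs() counts only the rows that actually entered the fit, it can differ from nrow() of the input data when rows with missing values are dropped.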

It would be even more useful if it could include the actual number of observations for logicalTRUE or factorLEVEL.
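A rough sketch of how per-term counts could be obtained for an lm fit, assuming the terms of interest are dummy columns of the design matrix. The `manual` variable and the `counts` vector are made up for this example and are not part of broom.

```r
# Build a dataset with one logical and one factor predictor.
d <- transform(mtcars, manual = am == 1, cyl = factor(cyl))
fit <- lm(mpg ~ manual + cyl, data = d)

# For dummy-coded terms, the number of non-zero entries in a design-matrix
# column is the number of observations at that level.
counts <- colSums(model.matrix(fit) != 0)
counts
```

The names of `counts` match the coefficient names (here `manualTRUE`, `cyl6`, `cyl8`), so the counts could be joined to a coefficient table by term. For continuous predictors the count is just the number of non-zero values, which is usually not meaningful.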

@grasshoppermouse

I also think it would be useful if glance output included nobs. It looks like glance.stanreg already does:

https://github.com/dgrtwo/broom/blob/master/R/rstanarm_tidiers.R

@hughjonesd
Contributor

Just adding my vote to this feature. I think in most fields, it is standard to report the N as one of the summary statistics.

@randomgambit

hello there! is there a fix for that very important feature?
thanks!!

@alexpghayes
Collaborator

I'd be willing to add this in a tryCatch in finish_glance to pick up an n column for relevant models. The question then is how many models actually implement nobs() methods.

I'm hesitant to report counts for each factor level in glance() because this is moving more into data summarizing than properties of a model. If a tidy() method doesn't already inherit this information from summary() or whatnot, I don't think it's worth the effort to try to implement this consistently across tidiers. Also, skimr::skim() is fantastic for this sort of thing.
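The tryCatch idea could look roughly like this. `safe_nobs` is a hypothetical helper written for this example; it is not broom's finish_glance.

```r
# Hypothetical helper: return the observation count when a nobs() method
# exists for the model, and NA otherwise instead of raising an error.
safe_nobs <- function(model) {
  tryCatch(
    as.integer(stats::nobs(model)),
    error = function(e) NA_integer_
  )
}

fit <- lm(mpg ~ wt, data = mtcars)
n_lm <- safe_nobs(fit)             # lm objects have a nobs() method
n_none <- safe_nobs(list(a = 1))   # no usable method, so this yields NA
```

Returning NA keeps the output shape the same for every model, at the cost of hiding which models lack a method.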

@alexpghayes alexpghayes added this to the 0.7.0 milestone Jul 13, 2018
@vincentarelbundock
Contributor

I'm building a regression summary package built on broom and users are requesting this feature. It's pretty important to me, and I would be willing to do it if you tell me about your implementation preferences.

To answer your question, a lot of models actually implement the nobs method. I went through every extract method in the texreg package. I may have missed a couple, but the models not listed below should work with nobs:

#default
stats::nobs(model)

#felm
summary(model)$N

#censReg
summary(model)$nobs

#btergm
#mbtergm
model@nobs

#betaor
#betamfx
model$fit$nobs

#averaging
#model.selection (MuMIn)
as.numeric(attr(model, 'nobs'))

#sienaFit
model$n

#zeroinfl
summary(model)$n

#fGARCH
length(model@data) 

# gel
NROW(model$gt) 

# lme4
dim(model.frame(model))[1] 

#lmrob
#systemfit
#lmRob
length(model$residuals) 

#logitmfx
#probitmfx
#negbinirr
#negbinmfx
nrow(model$fit$model) 

#lrm
model$stats[1]

#mlogit
#plm
#pmg
#rq
#summary.lm
nrow(summary(model)$residuals)

#mnlogit
summary(model)$model.size$N

#multinom
#sarlm
nrow(summary(model$fitted.values))

#pgmm
attr(summary(model), 'pdim')$nT$N

#simex
length(model$model$residuals)

#survreg
length(model$linear.predictors)

#zelig
nrow(model$data)

#pglm
length(model$gradientObs[, 1])

@vincentarelbundock
Contributor

@alexpghayes One possible design that I would be willing to implement:

For each model in the list above that is not compatible with nobs, extract the value explicitly in the model-specific glance function.

Modify the finish_glance function: if 'n' is not already in names(ret), then fall back to tryCatch(stats::nobs(...)).

This is a bit more work (which I am willing to do), but it's explicit, and would avoid unexpected side effects from having a bunch of ifelse statements.
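A sketch of that design, assuming the intent is that nobs() serves only as a fallback when the model-specific code has not already set n. `finish_glance_sketch` is a hypothetical stand-in; broom's real finish_glance differs.

```r
# Hypothetical finishing step: keep an explicitly extracted n, otherwise
# try stats::nobs() and fall back to NA if no method is available.
finish_glance_sketch <- function(ret, model) {
  if (!("n" %in% names(ret))) {
    ret$n <- tryCatch(stats::nobs(model), error = function(e) NA_integer_)
  }
  ret
}

fit <- lm(mpg ~ wt, data = mtcars)

# Case 1: the glance method did not set n, so nobs() fills it in.
out_fallback <- finish_glance_sketch(
  data.frame(r.squared = summary(fit)$r.squared), fit
)

# Case 2: model-specific code already extracted n (an arbitrary value here),
# so the finishing step leaves it untouched.
out_explicit <- finish_glance_sketch(data.frame(sigma = 1, n = 10L), fit)
```

Checking for an existing column first means the explicit extraction always wins, so the generic fallback cannot silently overwrite a value that a tidier computed on purpose.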

@gavinsimpson
Contributor

Sounds simpler to just implement nobs() methods for these internally, where they don't exist. Ideally these would be offered upstream to the respective package maintainers, but there's nothing stopping these being in broom if they are not wanted or maintainers are unresponsive. Using nobs() makes it simpler/safer to implement return of number of observations in glance.
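As a sketch of what such a method involves: the class `toy_model` below is made up for the example and stands in for a model class that ships without a nobs() method.

```r
# A toy fitted object of a made-up class with no nobs() method.
toy_fit <- structure(
  list(residuals = c(0.3, -1.2, 0.5, 0.4)),
  class = "toy_model"
)

# S3 method for the stats::nobs() generic. Inside a package this would
# also need to be registered, e.g. with S3method(nobs, toy_model) in NAMESPACE.
nobs.toy_model <- function(object, ...) {
  length(object$residuals)
}

n_toy <- stats::nobs(toy_fit)
```

With a method in place, any code that calls nobs() works for the class without special cases, which is the simplification suggested here.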

@vincentarelbundock
Contributor

vincentarelbundock commented Jan 24, 2019

Sure. I'm happy to write a bunch of nobs methods if people find those useful. There's also a discussion about how these methods would be used in my WIP PR: #594

Basically,

  1. Each glance function calls its own nobs()
  2. finish_glance calls nobs() for all of them, as it currently does for AIC, BIC, etc.

@vincentarelbundock
Contributor

vincentarelbundock commented Jan 25, 2019

I'm in the process of checking every model object for which broom offers a glance function to see if they work with stats::nobs. I'm also writing new methods for those that don't. The results are collected in this Gist:

https://gist.github.com/vincentarelbundock/24bedac98499181790aab230cc5b74bc

@alexpghayes
Collaborator

Closed in #597! Thanks @vincentarelbundock!

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 11, 2021