-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store additional model fit statistics #114
Comments
A variant of this that's probably nicer is to generate the dictionary inside On the UrbanSim side, the template would save the entire dict into its yaml. The advantage here is that a user could then recreate the The interface would look something like this: m # choicemodels.MultinomialLogit
results = m.fit() # choicemodels.MultinomialLogitResults
print(results) # summary table
results.raw_results # NEW, this would provide a tidy dict
choicemodels.MultinomialLogitResults(model_expression, results) # rebuild from dict (Historical note: The current "raw results" are already a dictionary, but with a confusing key structure that we don't want to retain. It's generated by the core estimation code, and we can't change it directly because other code relies on it. So i think we'd implement the new dict as a translation layer inside |
Per potential questions:
|
Regarding the |
This is a proposal to update the statistical templates to store additional model fit statistics as individually accessible values, rather than just embedded in the summary table. This would make downstream analysis easier.
How it works now
Using Large MNL as an example, these are the current class attributes related to model fit:
fitted_parameters
: list, saved to yamlsummary_table
: str, saved to yamlmodel
:choicemodels.MultinomialLogitResults
, not persisted beyond the sessionProposed changes
I'm thinking we should add a class attribute called
fit_statistics
and use it to store a flexible dictionary of values -- standard errors, t-stats, p-values, and model-level statistics like r^2 or log-likelihood.For large MNL, we can include everything we have access to from ChoiceModels: mnl.py#L344-L360. We'd build the dict at the end of the
fit()
method (large_multinomial_logit.py#L490), and include it in the constructor and dict i/o methods alongsidesummary_table
andfitted_parameters
.It looks like we can do this without any changes to
choicemodels
, because these statistics are part of the results object that's available immediately after fitting the model.Large MNL is the first priority here -- the other templates could probably wait until we refactor them for issue #54.
Potential questions
Should we store these as top-level attributes of the template class, rather than inside a dictionary? That would be more Pythonic, and would make it more obvious if info was missing or formats changed. But it would make the class definitions a lot longer, and it would vary between model types -- especially once we expand into multi-stage models and machine learning approaches. So i think better to just have a dictionary whose name is the same across models but whose content varies.
Should we keep
fitted_parameters
as its own attribute, or include it in the dict? I'd lean toward keeping it separate. It's the most important piece of info because it feeds directly into the simulations. And leaving it where it is means less downstream code to change.Should we stop saving summary tables and generate them on the fly instead? This would make the yaml files tidier. But i'm thinking we should keep them for now -- they come from multiple sources (ChoiceModels, Statsmodels, PyLogit) and we don't have a standard procedure for generating them yet. It would be nice to store them in a tidier way, though, per issue #48.
Tagging @alephcero @cvanoli for feedback
The text was updated successfully, but these errors were encountered: