Readability improvement #482

Merged: 23 commits, Jul 13, 2022
Changes from 13 commits
43 changes: 43 additions & 0 deletions docs/src/examples.md
@@ -42,6 +42,49 @@ julia> round.(predict(ols), digits=5)
1.83333
4.33333
6.83333

julia> round.(confint(ols); digits=5)
2×2 Matrix{Float64}:
-8.59038 7.25704
-1.16797 6.16797

julia> round(r2(ols); digits=5)
0.98684

julia> round(adjr2(ols); digits=5)
0.97368

julia> round(deviance(ols); digits=5)
0.16667

julia> dof(ols)
3

julia> dof_residual(ols)
1.0

julia> round(aic(ols); digits=5)
5.84252

julia> round(aicc(ols); digits=5)
-18.15748

julia> round(bic(ols); digits=5)
3.13835

julia> round(dispersion(ols.model); digits=5)
0.40825

julia> round(loglikelihood(ols); digits=5)
0.07874

julia> round(nullloglikelihood(ols); digits=5)
-6.41736

julia> round.(vcov(ols); digits=5)
2×2 Matrix{Float64}:
0.38889 -0.16667
-0.16667 0.08333
```
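
As a quick cross-check (a minimal sketch, not part of the doctest above, and assuming the same `ols` fit): the standard errors returned by `stderror` are the square roots of the diagonal of the matrix returned by `vcov`.

```julia
using LinearAlgebra  # for diag

# standard errors are the square roots of the diagonal of the
# coefficient variance-covariance matrix
sqrt.(diag(vcov(ols))) ≈ stderror(ols)  # expected to hold for the fit above
```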

## Probit regression
66 changes: 60 additions & 6 deletions docs/src/index.md
@@ -35,7 +35,7 @@ functions are
Binomial (LogitLink)
Gamma (InverseLink)
InverseGaussian (InverseSquareLink)
NegativeBinomial (LogLink)
NegativeBinomial (NegativeBinomialLink, often used with LogLink)
Normal (IdentityLink)
Poisson (LogLink)

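As an illustration of these pairings, a Poisson regression can be fitted by passing the distribution to `glm`; the canonical `LogLink` is then used by default. A minimal sketch with made-up count data (the column names are hypothetical):

```julia
using GLM, DataFrames

# hypothetical count data with a three-level categorical predictor
df = DataFrame(counts = [18, 17, 15, 20, 10, 20, 25, 13, 12],
               outcome = repeat(["a", "b", "c"], outer = 3))

# Poisson regression; equivalent to passing LogLink() explicitly
poisson_fit = glm(@formula(counts ~ outcome), df, Poisson())
```
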
@@ -147,20 +147,74 @@ F-test: 2 models fitted on 50 observations
## Methods applied to fitted models

Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org); a short sketch using a few of them follows the list.
- `coef`: extract the estimates of the coefficients in the model
- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
- `aic`: Akaike's Information Criterion
- `aicc`: corrected Akaike's Information Criterion for small sample sizes
- `bic`: Bayesian Information Criterion
- `coef`: estimates of the coefficients in the model
- `confint`: confidence intervals for coefficients
- `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
- `deviance`: measure of the model fit, the weighted residual sum of squares for linear models
- `dispersion`: dispersion (or scale) parameter for a model's distribution
- `dof`: the number of degrees of freedom consumed in the model
- `dof_residual`: degrees of freedom for residuals, when meaningful
- `fitted`: the fitted values of the model
- `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
- `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
- `r2`: R² of a linear model or pseudo-R² of a generalized linear model
- `loglikelihood`: log-likelihood of the model
- `modelmatrix`: design matrix
- `nobs`: number of rows, or sum of the weights when prior weights are specified
- `nulldeviance`: deviance of the linear model which includes the intercept only
- `nullloglikelihood`: log-likelihood of the linear model which includes the intercept only
- `predict`: obtain predicted values of the dependent variable from the fitted model
- `r2`: R² of a linear model (an alias for `r²`)
- `residuals`: vector of residuals from the fitted model
- `response`: model response (a.k.a. the dependent variable)
- `stderror`: standard errors of the coefficients
- `vcov`: estimated variance-covariance matrix of the coefficient estimates
- `predict` : obtain predicted values of the dependent variable from the fitted model
- `residuals`: get the vector of residuals from the fitted model
- `vcov`: variance-covariance matrix of the coefficient estimates
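
A brief sketch of a few of these accessors on a toy linear model (the data below are hypothetical and chosen only for illustration):

```julia
using GLM, DataFrames

df = DataFrame(x = [1.0, 2.0, 3.0, 4.0], y = [2.1, 3.9, 6.2, 7.8])
m = lm(@formula(y ~ x), df)

coef(m)       # coefficient estimates
stderror(m)   # standard errors of the coefficients
confint(m)    # 95% confidence intervals
nobs(m)       # number of observations
deviance(m)   # residual sum of squares for a linear model
```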


Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but
in practice one typically uses `LogLink`.
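
For example (a minimal sketch with hypothetical count data): either a fixed dispersion parameter θ can be supplied via `NegativeBinomial(θ)`, or `negbin` can be used to estimate θ by maximum likelihood.

```julia
using GLM, DataFrames

# hypothetical overdispersed count data
df = DataFrame(y = [2, 0, 3, 5, 1, 7, 4, 9, 2, 6], x = 1.0:10.0)

# negative binomial regression with a fixed θ = 2.0 and a LogLink
nb_fixed = glm(@formula(y ~ x), df, NegativeBinomial(2.0), LogLink())

# negbin estimates θ by maximum likelihood
nb_ml = negbin(@formula(y ~ x), df, LogLink())
```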

Review discussion on this example:

Member: This is quite a long example to show on the home page. Also, given how simple most of these functions are, I'm not sure it's super useful to show all of them. How about adding some of these to the "Linear Model" example section instead?

Collaborator (author): Okay.

Collaborator (author): Except for coef, r2, aic and prediction, the others are moved to the Linear Regression example.

Member: I'd rather move everything there to keep the home page simple. Actually, we should probably also rework the existing examples, as it's not super logical to illustrate passing contrasts before even showing how to fit a model... We could move content to other pages and improve it.

Collaborator (author): Shall we start a different PR for something like "Reorganise GLM documentation", or continue updating this PR only? My reasoning for keeping r2, aic and prediction along with model fitting at the beginning is that these are the functions most linear-model users look for. Looking for your thoughts.

Member: Yeah, I think it's reasonable to go ahead adding the example here and reorganize the documentation separately.

```jldoctest methods
julia> using GLM, DataFrames;

julia> data = DataFrame(X=[1,2,3], y=[2,4,7]);

julia> test_data = DataFrame(X=[4]);

julia> mdl = lm(@formula(y ~ X), data);

julia> round.(coef(mdl); digits=8)
2-element Vector{Float64}:
-0.66666667
2.5

julia> round(r2(mdl); digits=8)
0.98684211

julia> round(aic(mdl); digits=8)
5.84251593
```
The `predict` method returns predicted values of the response variable for the covariate values in `newX`.
If you omit `newX`, it returns the fitted response values. You will find more about [predict](https://juliastats.org/GLM.jl/stable/api/#StatsBase.predict) in the API documentation.

```jldoctest methods
julia> round.(predict(mdl, test_data); digits=8)
1-element Vector{Float64}:
9.33333333
```
The `cooksdistance` method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in a linear model, giving an estimate of the influence of each data point. It is currently only implemented for linear models without weights.

```jldoctest methods
julia> round.(cooksdistance(mdl); digits=8)
3-element Vector{Float64}:
2.5
0.25
2.5
```

## Separation of response object and predictor object

The general approach in this code is to separate functionality related
5 changes: 3 additions & 2 deletions src/GLM.jl
@@ -12,14 +12,15 @@ module GLM
import Statistics: cor
import StatsBase: coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue
fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue,
aic, aicc, bic
import StatsFuns: xlogy
import SpecialFunctions: erfc, erfcinv, digamma, trigamma
import StatsModels: hasintercept
export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
cooksdistance, hasintercept
cooksdistance, hasintercept, aic, aicc, bic, dispersion

export
# types