Attempt to improve the documentation
Also known as spending the day figuring out how to let Documenter insert
the docstrings from StatsAPI in some cases and use package-specific
methods in others.
ararslan committed Jul 9, 2023
1 parent e816a74 commit 6b556db
Showing 4 changed files with 265 additions and 51 deletions.
7 changes: 3 additions & 4 deletions docs/make.jl
@@ -3,11 +3,10 @@ using Documenter

using Documenter: HTML

makedocs(; modules=[BetaRegression],
sitename="BetaRegression.jl",
makedocs(; sitename="BetaRegression.jl",
pages=["Home" => "index.md",
"Details" => "details.md",
"API" => "api.md"],
"API" => "api.md",
"Details" => "details.md"],
format=HTML(; prettyurls=(get(ENV, "CI", "false") == "true")))

deploydocs(; repo="github.com/ararslan/BetaRegression.jl.git")
125 changes: 85 additions & 40 deletions docs/src/api.md
@@ -14,57 +14,102 @@ change.

```@docs
BetaRegressionModel
fit
fit!
BetaRegressionModel(::AbstractMatrix, ::AbstractVector)
fit(::Type{BetaRegressionModel}, ::AbstractMatrix, ::AbstractVector)
fit!(::BetaRegressionModel)
```

## Properties of a model

The following common functions are extended for beta regression models:
- `Link`: The model's link function
- `aic`: Akaike's information criterion for the model
- `aicc`: Model AIC corrected for small sample sizes
- `bic`: Bayesian information criterion for the model
- `coef`: The vector ``\boldsymbol{\beta}`` of regression coefficients
- `coefnames`: Names of the coefficients (for models fit with a formula and table)
- `coeftable`: Table of coefficient names, values, standard errors, z-values, and p-values
- `confint`: Confidence intervals for the coefficient estimates
- `deviance`: Model deviance
- `devresid`: Vector of deviance residuals
- `dof`: Degrees of freedom
- `dof_residual`: Residual degrees of freedom
- `fitted`: The vector ``\hat{\mathbf{y}}`` of fitted values from the model
- `informationmatrix`: Expected or observed Fisher information
- `linearpredictor`: The linear predictor vector ``\boldsymbol{\eta}``
- `Link`: Link function ``g`` used for the mean
- `loglikelihood`: Model log likelihood
- `modelmatrix`: The model matrix ``\mathbf{X}``
- `nobs`: Number of observations used to fit the model
- `offset`: Model offset, empty if the model was not fit with an offset
- `params`: All parameters from the model, including both ``\boldsymbol{\beta}`` and ``\phi``
- `precision`: The estimated precision parameter ``\phi`` on its natural scale
- `precisionlink`: Link function ``h`` used for the precision parameter
- `predict`: Predict new response values given new observations
- `r2`/`r²`: Pseudo ``R^2``
- `residuals`: Vector of residuals
- `response`: The response vector ``\boldsymbol{y}``
- `responsename`: Name of the response variable (for models fit with a formula and table)
- `score`: Score vector
- `stderror`: Standard errors of the coefficient and precision parameter estimates
- `vcov`: Variance-covariance matrix
- `weights`: Model weights, empty if the model was not fit with weights

Note that for a model with ``p`` independent variables, the information and
variance-covariance matrices will have ``p + 1`` rows and columns, the last of which
corresponds to the precision term.
However, `coef` does _not_ include the precision term and will have length ``p``.
```@docs
aic
aicc
bic
coef(::BetaRegressionModel)
coefnames(::TableRegressionModel{<:BetaRegressionModel})
coeftable(::BetaRegressionModel)
confint(::BetaRegressionModel)
deviance(::BetaRegressionModel)
devresid(::BetaRegressionModel)
dof(::BetaRegressionModel)
dof_residual(::BetaRegressionModel)
fitted
informationmatrix(::BetaRegressionModel)
linearpredictor
Link(::BetaRegressionModel)
loglikelihood
modelmatrix
nobs(::BetaRegressionModel)
offset
params(::BetaRegressionModel)
precision(::BetaRegressionModel)
precisionlink
predict
r2(::BetaRegressionModel)
residuals
response
responsename(::TableRegressionModel{<:BetaRegressionModel})
score(::BetaRegressionModel)
stderror(::BetaRegressionModel)
vcov(::BetaRegressionModel)
weights
```

There is a subtlety here that bears repeating.
The function `coef` does _not_ include the precision term, only the regression
coefficients, so for a model with ``p`` independent variables, `coef` will return a vector
of length ``p``.
A number of other functions, such as `informationmatrix`, `vcov`, `stderror`, etc., _do_
include the precision term, and thus will return an array with (non-singleton) dimension
``p + 1``.
While this difference may seem strange at first blush, the design was chosen intentionally
to ensure that the model matrix and regression coefficient vector are conformable for
multiplication.
Use `params` to retrieve the full parameter vector with length ``p + 1``.
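
The conformability argument can be illustrated without the package at all.
The following is a minimal sketch in plain Julia with made-up numbers; `X`, `β`, `ϕ`, and
`θ` are hypothetical stand-ins for what `modelmatrix`, `coef`, `precision`, and `params`
would return, not output from a fitted model.

```julia
# Hypothetical values for illustration only; nothing here calls BetaRegression.jl.
X = [1.0 0.2;
     1.0 0.5;
     1.0 0.9]          # model matrix with n = 3 rows and p = 2 columns
β = [0.3, -1.2]        # regression coefficients, length p (the role of `coef`)
ϕ = 4.0                # precision parameter (the role of `precision`)
θ = vcat(β, ϕ)         # full parameter vector, length p + 1 (the role of `params`)

η = X * β              # conformable: (n × p) times (p) gives a length-n vector

# Multiplying by the full parameter vector fails because the lengths disagree.
conformable = try
    X * θ
    true
catch err
    err isa DimensionMismatch ? false : rethrow()
end
```

Here `conformable` ends up `false`, which is the situation the design avoids by keeping
the precision term out of `coef`.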

## Link functions

This package employs the system for link functions defined by the GLM.jl package.
In short, each link function has its own concrete type which subtypes `Link`.
Some may actually subtype `Link01`, which is itself a subtype of `Link`; this denotes
that the function's domain is the open unit interval, ``(0, 1)``.
Link functions are applied with `linkfun` and their inverses with `linkinv`.
Relevant docstrings from GLM.jl are reproduced below.

Any mention of "the" link function for a `BetaRegressionModel` refers to that applied to
the mean (at least in this document).
However, despite only having one linear predictor, `BetaRegressionModel`s actually have
two link functions: one for the mean and one for the precision.
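
As a rough numerical illustration of applying a link and its inverse, here is a sketch in
plain Julia that does not use GLM.jl.
The names `logit` and `invlogit`, and the choice of a log link for the precision, are
assumptions made for the example; in the package these roles are played by `linkfun` and
`linkinv` applied to a link type.

```julia
# Hand-written stand-ins for a mean link and its inverse (logit), for illustration only.
logit(μ) = log(μ / (1 - μ))        # maps (0, 1) to the real line
invlogit(η) = 1 / (1 + exp(-η))    # maps the real line back to (0, 1)

μ = 0.25
η = logit(μ)                       # value on the linear predictor scale
μ_back = invlogit(η)               # round trip recovers μ up to floating point error

# A log link is one possible choice for a positive precision parameter.
ϕ = 4.0
ζ = log(ϕ)                         # unconstrained scale
ϕ_back = exp(ζ)                    # back to the natural scale
```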

### Mean

```@docs
Link01
LogitLink
CauchitLink
CloglogLink
ProbitLink
```

### Precision

```@docs
IdentityLink
InverseLink
InverseSquareLink
LogLink
PowerLink
SqrtLink
```

## Developer documentation

This section documents some functions that are _not_ user facing (and are thus not
exported) and may be removed at any time.
They're included here for the benefit of anyone looking to contribute to the package
and wondering how certain internals work.
Other internal functions may be documented with comments in the source code rather
than with docstrings; read the source directly for more information on those.

```@docs
dmueta
24 changes: 23 additions & 1 deletion docs/src/details.md
@@ -114,14 +114,36 @@ By analogy, what is implemented here is an intercept-only sub-model for the prec
We don't have to resort to anything fancy in order to fit beta regression models; we can
simply use [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation)
on the full parameter vector for the model, which we define to be
``\theta = [\beta_0, \ldots, \beta_n, \phi]``.
``\theta = [\beta_1, \ldots, \beta_p, \phi]``.

## Fitting a model

In BetaRegression.jl, the maximum likelihood estimation is carried out via
[Fisher scoring](https://en.wikipedia.org/wiki/Scoring_algorithm) using closed-form
expressions for the score vector and expected information matrix.
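
To make the iteration concrete, here is a minimal, generic sketch of Fisher scoring in
plain Julia.
It is not the package's implementation: the function `fisher_scoring` and its keyword
arguments are hypothetical, and the toy problem is an intercept-only logistic model rather
than a beta regression, chosen because its maximum likelihood estimate has a closed form
to check against.

```julia
# Generic Fisher scoring: θ ← θ + I(θ)⁻¹ U(θ), where U is the score vector and
# I is the expected information matrix. Hypothetical helper, not package code.
function fisher_scoring(score, information, θ0; maxiter=100, atol=1e-10)
    θ = float.(θ0)
    for _ in 1:maxiter
        step = information(θ) \ score(θ)       # solve I(θ) * step = U(θ)
        θ = θ + step
        maximum(abs, step) < atol && break      # stop once the update is negligible
    end
    return θ
end

# Toy problem: Bernoulli responses with a single intercept on the logit scale.
y = [1, 0, 0, 1, 1, 0, 1, 1]
n = length(y)
invlogit(η) = 1 / (1 + exp(-η))

# Score (first derivative of the log likelihood) as a length-1 vector.
score(θ) = [sum(y) - n * invlogit(θ[1])]
# Expected information as a 1×1 matrix.
information(θ) = fill(n * invlogit(θ[1]) * (1 - invlogit(θ[1])), 1, 1)

θ_hat = fisher_scoring(score, information, [0.0])
ybar = sum(y) / n
```

For this toy model the iteration should converge to `log(ybar / (1 - ybar))`, the logit of
the sample mean.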

The information matrix is symmetric with the following block structure:
```math
\left[
\begin{array}{ccc|c}
\frac{\partial^2 \ell}{\partial \beta_1^2} & \cdots &
\frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_p} &
\frac{\partial^2 \ell}{\partial \beta_1 \partial \phi} \\
\vdots & \ddots & \vdots & \vdots \\
\frac{\partial^2 \ell}{\partial \beta_p \partial \beta_1} & \cdots &
\frac{\partial^2 \ell}{\partial \beta_p^2} &
\frac{\partial^2 \ell}{\partial \beta_p \partial \phi} \\
\hline \\
\frac{\partial^2 \ell}{\partial \phi \partial \beta_1} & \cdots &
\frac{\partial^2 \ell}{\partial \phi \partial \beta_p} &
\frac{\partial^2 \ell}{\partial \phi^2}
\end{array}
\right]
```
Since ``\mu`` depends on ``\phi``, we have that
``\mathbb{E}\left(\frac{\partial^2 \ell}{\partial \beta_i \partial \phi}\right) \neq 0``,
so the matrix is not block diagonal.

There is no canonical link function for the beta regression model in this parameterization
in the same manner as for GLMs (anything that constrains ``\mu`` within ``(0, 1)`` will
do just fine) but for simplicity and interpretability the default link function is