Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move method descriptions to docstrings #106

Merged
merged 8 commits into from
Aug 26, 2017

Conversation

BenjaminBorn
Copy link
Contributor

In this pull request, I have copied all available documentation on tests, pvalues, and confidence intervals to docstrings. Please note that I have only done some reformatting. I have not checked the documentation for correctness. This is the first step in the transition to Documenter.jl as discussed in #103.

When reviewing, please consider the following questions.

  1. I have attached the general description of pvalue() and confint() somewhat arbitrarily to two of the tests (Binomial.jl and t.jl). Is there a better place to put them?
  2. Sometimes a test has different methods, e.g. OneSampleTTest(). If possible, I have combined them in one docstring, but sometimes, for clarity reasons, I have included different docstrings for the methods. Is that ok, or should we combine them somehow into one docstring? The disadvantage of having them in different docstrings is that the help output in the REPL can get convoluted, especially as there is not much spacing between the output for the different methods.
  3. pvalue and confint also have different methods with a number of separate docstrings. Is there a way to ensure that the general method, e.g. pvalue(x::HypothesisTests), is the first shown when typing ?pvalue? As some of the more specialised methods have long docstrings, it could get lost otherwise.
  4. The admonition !!! note doesn't work in the Jupyter notebook (at least not for me using Safari). It works in the REPL and Juno. Maybe someone can try it. If it's a general problem, we can file an issue with IJulia (or at the appropriate place).
  5. There was a copy/paste mistake in the docstring for the binomial confidence intervals. After fixing it, I realised that there is also an open PR (Updated docstring for binomial confidence intervals #55) for it. Is there any way of giving credit for the fix to @juliangehring ?

Thanks in advance for your input!

@ararslan ararslan self-requested a review August 21, 2017 16:57
@BenjaminBorn
Copy link
Contributor Author

One idea for question (1) would be to put a "fake" general method in HypothesisTests.jl à la

"""
    confint(test::HypothesisTest, alpha = 0.05; tail = :both)

Compute a confidence interval C with coverage 1-`alpha`.

...

"""
confint(test::HypothesisTest, alpha = 0.05; tail = :both)

And similarly for pvalue. Would that be preferable to attaching it to a random method?

@ararslan
Copy link
Member

A way to generally attach docstrings to functions is to either forward-declare the function with no methods, e.g.

"""
    confint(...)

...
"""
function confint end

or attach it to the function after methods have been defined for it, e.g.

"""
    confint(...)

...
"""
confint

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, that's very useful! I've taken this opportunity to review the docstrings, so I've noted many small issues which were already present in the docs. Feel free to fix only those which are introduced by your PR, but if you feel like it it would be great to fix the others too. At any rate it's good to have a commit just to copy the existing docstrings, and one or more commits improving them, so that we can easily track the changes.

`xs` comes from the same distribution against the alternative hypothesis that the samples
comes from different distributions.

`modified` paramater enables a modified test calculation for samples whose observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"parameter". Do you have more details about what's that modified calculation?

src/t.jl Outdated
OneSampleTTest(xbar::Real, stdev::Real, n::Int, mu0::Real = 0)

Perform a one sample t-test of the null hypothesis that `n` values with mean `xbar` and
sample standard deviation `stdev` come from a distribution with `mu0` against the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with mean mu0". Also better use μ0 and stddev as in the code.

@@ -30,6 +30,15 @@ immutable OneSampleADTest <: ADTest
A²::Float64 # Anderson-Darling test statistic
end

"""
OneSampleADTest{T<:Real}(x::AbstractVector{T}, d::UnivariateDistribution)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general better do ::AbstractVector{<:Real} and remove T. Same elsewhere.

Copy link
Contributor Author

@BenjaminBorn BenjaminBorn Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in the docstrings or should I also go through the function definitions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do both if T isn't used in the body of functions, but to keep this PR focused better only touch docstrings.

"""
KSampleADTest{T<:Real}(xs::AbstractVector{T}...; modified=true)

Perform a k-sample Anderson–Darling test of the null hypothesis that the data in vectors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in the k vectors" would be clearer. You can use double backticks around k for each of its uses.


Perform a k-sample Anderson–Darling test of the null hypothesis that the data in vectors
`xs` comes from the same distribution against the alternative hypothesis that the samples
comes from different distributions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"come". Same for previous occurrence (I guess).

src/t.jl Outdated
Most of the implemented confidence intervals are *strongly consistent*, that is, the
confidence interval with coverage 1-`alpha` does not contain the test statistic under
``h_0`` if and only if the corresponding test rejects the null hypothesis
``h_0: \\theta=\\theta_0``:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write alpha and theta directly using greek letters.

src/t.jl Outdated
EqualVarianceTTest(x::AbstractVector{T<:Real}, y::AbstractVector{T<:Real})

Perform a two-sample t-test of the null hypothesis that `x` and `y` come from a
distributions with the same mean and equal variances against the alternative hypothesis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a distributions with the same mean and equal variances" -> "distributions with equal means and variances". Same below.

src/wilcoxon.jl Outdated
When there are no tied ranks and ≤50 samples, or tied ranks and ≤15 samples,
`SignedRankTest` performs an exact signed rank test. In all other cases, `SignedRankTest`
performs an approximate signed rank test. Behavior may be further controlled by using
`ExactSignedRankTest` or `ApproximateSignedRankTest` directly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use @ref.


The Mann-Whitney U test is sometimes known as the Wilcoxon rank sum test.

When there are no tied ranks and ≤50 samples, or tied ranks and ≤10 samples,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use @ref.

src/wilcoxon.jl Outdated
"""
ExactSignedRankTest(x::AbstractVector{T<:Real}[, y::AbstractVector{T<:Real}])

Perform an exact signed rank U test.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeat the definition of the test. Same below.

@nalimilan
Copy link
Member

I have attached the general description of pvalue() and confint() somewhat arbitrarily to two of the tests (Binomial.jl and t.jl). Is there a better place to put them?

The usual solution to that is to define function pvalue end somewhere and attach the generic docstring to that.

Sometimes a test has different methods, e.g. OneSampleTTest(). If possible, I have combined them in one docstring, but sometimes, for clarity reasons, I have included different docstrings for the methods. Is that ok, or should we combine them somehow into one docstring? The disadvantage of having them in different docstrings is that the help output in the REPL can get convoluted, especially as there is not much spacing between the output for the different methods.

I'd say that's OK as long as the repeated content is not too long. If the REPL output isn't clear, it should be improved, rather than working around it in packages (the HTML manual looks better in general).

pvalue and confint also have different methods with a number of separate docstrings. Is there a way to ensure that the general method, e.g. pvalue(x::HypothesisTests), is the first shown when typing ?pvalue? As some of the more specialised methods have long docstrings, it could get lost otherwise.

IIRC it depends on the order in which the methods are defined. Using the trick I gave above, you should be able to define it in the right place.

The admonition !!! note doesn't work in the Jupyter notebook (at least not for me using Safari). It works in the REPL and Juno. Maybe someone can try it. If it's a general problem, we can file an issue with IJulia (or at the appropriate place).

Agreed. Also check that it works in the HTML manual though.

There was a copy/paste mistake in the docstring for the binomial confidence intervals. After fixing it, I realised that there is also an open PR (#55) for it. Is there any way of giving credit for the fix to @juliangehring ?

Yes, the best way to give her credit is to merge here PR, which I just did. Thanks for pointing it out!

@BenjaminBorn
Copy link
Contributor Author

BenjaminBorn commented Aug 23, 2017

Ok, I have (hopefully) addressed all points directly related to the PR. I have also updated the documentation more generally along the lines suggested by @nalimilan in a separate commit. However, I'm not able to answer the specific questions about the Power Divergence test so there, I only addressed the formatting issues. Could we deal with these specific questions in separate PR by someone more qualified?

The p-value is computed using a ``χ^2`` approximation to the distribution of the test
statistic ``H_c=\\frac{H}{C}``:
```math
\\begin{align}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget, does align number the lines and align* doesn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, align numbers (although the Jupyter notebook doesn't print them). I'll change it to align* in the next iteration.

Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We can always improve docstrings later; that doesn't have to all happen at once here.

@BenjaminBorn
Copy link
Contributor Author

I have attached the general description of pvalue() and confint() somewhat arbitrarily to two of the tests (Binomial.jl and t.jl). Is there a better place to put them?

The usual solution to that is to define function pvalue end somewhere and attach the generic docstring to that.

Thanks for the hint about zero-method functions. One related question. In Documenter.jl, it's possible to include a certain method via

```@docs
length(::T)
```

So if I now define a general docstring for pvalue(::HypothesisTest) using a zero-method function, there is no way of selecting this general definition via a method for the construction of the documentation, right? I tried

```@docs
pvalue(::HypothesisTest)
```

but that didn't work.

@ararslan
Copy link
Member

```@docs
pvalue
```

🙂

@BenjaminBorn
Copy link
Contributor Author

```@docs
pvalue
```

But that prints the docstrings for all defined methods of pvalue. What if I only want the general one? For example so that I can structure the documentation and put pvalue(::Binomial) somewhere else.

@ararslan
Copy link
Member

ararslan commented Aug 23, 2017

Ah, I see. I think you should be able to do this:

"""
    pvalue(...)

Some general description
"""
pvalue(::HypothesisTest; kwargs...)

after methods have been defined for pvalue, rather than the 0-method forward declaration, then add pvalue(::HypothesisTest) or whatever to the @docs block.

@BenjaminBorn
Copy link
Contributor Author

Thanks, but even if I put the definition at the end of HypothesisTests.jl after everything has been defined (and the docstring can be found, e.g. in the REPL), I still get

ERROR: LoadError: UndefVarError: HypothesisTest not defined
while loading /Users/bborn/.julia/v0.6/HypothesisTests/docs/make.jl, in expression starting on line 3

when I add pvalue(::HypothesisTest) to the @docs block.

@ararslan
Copy link
Member

Hmmmmm, maybe

```@meta
CurrentModule = HypothesisTests
```

```@docs
pvalue(::HypothesisTest)
```

src/t.jl Outdated
OneSampleTTest(xbar::Real, stddev::Real, n::Int, μ0::Real = 0)

Perform a one sample t-test of the null hypothesis that `n` values with mean `xbar` and
sample standard deviation `stddev` come from a distribution with `μ0` against the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to fix "with mean μ0".

src/binomial.jl Outdated
- Agresti Coull interval `:agresti_coull`: Simplified version of the Wilson interval;
they are centered around the same value. The Agresti Coull interval has higher or
equal coverage.
- Arcsine transformation `:arcsine`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's leave this for somebody else then. ;-)

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks almost ready, I've just noted a few remaining details.


If `x` is a matrix with at least two rows and columns, it is taken as a two-dimensional
contingency table. Otherwise, `x` and `y` must be vectors of the same length. The contingency
table is calculated using `counts` from [`Statsbase`](@ref). Then the power divergence test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"StatsBase". Also, I meant writing[`StatsBase.counts`](@ref) (I'm not sure a link to a module works).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Documenter.jl documentation states

Note that depending on what the CurrentModule is set to, a docstring @ref may need to be prefixed by the module which defines it.

However, at least in my local build, links like [`StatsBase.counts`](@ref) don't seem to work. But maybe we can figure this out in the next PR that will actually set up Documenter.jl.

/\\hat{n}_{ij})^λ -1\\right]
```
where ``n_{ij}`` is the cell count in the ``i`` th row and ``j`` th column and ``λ`` is a
real number determing the nature of the test to be performed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"determining"

"""
MultinomialLRT(x [,y] [,theta0])

Convenience function for power divergence test with ``λ=0``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please at least copy the relevant docs from PowerDivergenceTest and fix the signature.

@BenjaminBorn
Copy link
Contributor Author

Thanks @ararslan and @nalimilan for the thorough review. I have hopefully addressed all points now. Once this PR is merged, I will open a new PR to build the docs.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more issue. Can you rebase and fix the conflicts so that I can merge?

src/fisher.jl Outdated
@@ -71,7 +71,8 @@ immutable FisherExactTest <: HypothesisTest
end

testname(::FisherExactTest) = "Fisher's exact test"
population_param_of_interest(x::FisherExactTest) = ("Odds ratio", 1.0, x.ω) # parameter of interest: name, value under h0, point estimate
population_param_of_interest(x::FisherExactTest) = ("Odds ratio", 1.0, x.ω)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better not change this since that's unrelated. Or at least put the comment before the function rather than after. Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I have reverted it.

@BenjaminBorn
Copy link
Contributor Author

BenjaminBorn commented Aug 24, 2017

Done. Let me know if I should do anything else.

@bjarthur
Copy link
Contributor

this is great! but... why is this package not being tested on julia 0.6?

@ararslan
Copy link
Member

Will be. Once this is merged we should drop 0.5 support, test on 0.6, and have FemtoCleaner do a deprecation fixing pass.

src/fisher.jl Outdated
The one-sided p-values are based on Fisher's non-central hypergeometric distribution
``f_ω(i)`` with odds ratio ``ω``:
```math
\\begin{align}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you're moving everything to align* this should be as well? Aside from that, this all LGTM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

src/fisher.jl Outdated
|*Y2*| c | d |

!!! note
The [`Base.show`](@ref) output contains the conditional maximum likelihood estimate of the odds ratio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you build the docs, what does this end up linking to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above

The Documenter.jl documentation states
"Note that depending on what the CurrentModule is set to, a docstring @ref may need to be prefixed by the module which defines it."
However, at least in my local build, links like StatsBase.counts don't seem to work. But maybe we can figure this out in the next PR that will actually set up Documenter.jl.

So, at least for now the links to outside modules don't work. Those are:

  • [`StatsBase.counts`](@ref) and [`Base.show`](@ref) for which the error is that !! No doc found for reference
  • [`Rmath.pwilcox`](@ref) and [`Rmath.psignrank`](@ref) for which it says ERROR: UndefVarError(:Rmath), so this is maybe not the correct module call.

The Documenter.jl documentation unfortunately doesn't give an example how to link to other modules and I also couldn't find an example in other packages.

Besides these dead links, the docs are still built correctly and everything looks ok. If you have an idea on how to fix it, I can push the fix here. If it takes us longer to figure it out, we might want to merge here and then fix the links in the follow-up PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just remove the dead ref links for now, e.g. in this case just say "The show output contains..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. But you explain this to @nalimilan ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did Milan tell you to link to things that can't be properly linked...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no, of course not. I was joking. He told me to include the links but wasn't sure whether it would work. Let's leave them out and we can figure the linking out later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it seems it's not possible at the moment: JuliaDocs/Documenter.jl#425. Let's go without links for now.

Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go as far as I'm concerned. Thanks as always!

@BenjaminBorn
Copy link
Contributor Author

Thanks Alex. Sorry for the back and forth.

@ararslan
Copy link
Member

No problem at all! That's just how PR reviews go. You've done awesome work here, as usual.

@ararslan
Copy link
Member

@nalimilan Good to merge? Any further cleanup that's necessary can always happen later.

@nalimilan
Copy link
Member

Sorry for the delay. It's fine with me! Though conflicts have to be fixed first.

@ararslan
Copy link
Member

Though conflicts have to be fixed first.

Weird, I'm not seeing any merge conflicts?

@ararslan ararslan merged commit 6cc7d93 into JuliaStats:master Aug 26, 2017
@BenjaminBorn BenjaminBorn deleted the create_docstrings branch August 26, 2017 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants