Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to statistical recipes #1563

Merged
merged 15 commits into from
Jan 25, 2022
Merged

Fixes to statistical recipes #1563

merged 15 commits into from
Jan 25, 2022

Conversation

piever
Copy link
Contributor

@piever piever commented Jan 9, 2022

Description

Fixes qqnorm, qqplot, which were broken as they were converted to an odd Scatter plot type and that broke with the new internals. They now have a more sensible "standard" implementation as a new plot type.

It also fixed plotting StatsBase.Histogram objects. They accidentally ended up having a gap between bars after ,#1223.

Type of change

Delete options that do not apply:

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • Added or changed relevant sections in the documentation

I have not added tests as it seems tests exist already (see https://github.com/JuliaPlots/Makie.jl/blob/master/test/statistical_tests.jl) but somehow they stopped being run. It is probably best to update those in a separate PR (it seems all of them need fixing, due to changes in the internals).

Copy link
Member

@SimonDanisch SimonDanisch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really the right person to review this ;) If you have opinions on this @jkrumbiegel, please comment as well..Otherwise, I'd just leave this up to @piever to merge!

- a list of samples,
- an abstract distribution, e.g. `Normal(0, 1)`,
- a distribution type, e.g. `Normal`.
In the last case, the Q-Q plot by fitting that distribution type to the data `y`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence misses a word?

The attribute `qqline` (defaults to `:R`) determines how to compute a fit line for the Q-Q plot.
Possible values are the following.
- `:identity` draws the identity line (useful to see `x` and `y` follow the same distribution).
- `:fit` fits the line to the quantile pairs (useful to see if the distribution of `y` can be obtained from the distribution of `x` via an affine transformation).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe :fit and :quantile are not the best names. Both fit and both use quantiles it seems to me?

Copy link
Contributor Author

@piever piever Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm renaming :quantile to :robust (that's how Base R calls it, see here under qq.line.type). They say "least squares" instead of :fit, but I think that's a bit too technical.

- `:identity` draws the identity line (useful to see `x` and `y` follow the same distribution).
- `:fit` fits the line to the quantile pairs (useful to see if the distribution of `y` can be obtained from the distribution of `x` via an affine transformation).
- `:quantile` is analogous to `:fit` but uses a quantile-based fitting method.
- `:R` is an alias for `:quantile`, as that is the default behavior in `:R`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm not sure if that's necessary to have two names for the same thing where one is R. We don't do that in other places either, so maybe a comment that quantile is Rs default method is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it looks like Base R (see here) and ggplot2 (see here) have different defaults, so I'm just omitting that part.


Graphical attributes are
- `color` to control color of both line and markers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If markercolor is not also specified I guess?


convert_arguments(::Type{<:QQPlot}, args...; kwargs...) =
convert_arguments(Scatter, qqbuild(loc(args...)...); kwargs...)
Shorthand for `qqplot(Normal, y)`. See [`qqplot`](@ref) for more details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this @ref stuff hasn't worked for a while, maybe since the switch to Franklin? At least I remember having to remove it in a bunch of places

@piever
Copy link
Contributor Author

piever commented Jan 10, 2022

Thanks for the review! I've addressed the feedback and added a NEWS entry on the renaming of the options for qqline . Should be good to go now.

@piever
Copy link
Contributor Author

piever commented Jan 10, 2022

Actually, I do have one doubt regarding the default value for qqline. I realize that no single choice is ideal, because qqline=:robust makes a lot of sense if you are doing the standard qqnorm (comparing with a standard Gaussian, or more generally a distribution with default parameters), whereas qqline=:identity is IMO a more reasonable default if you are comparing two empirical distributions (from vectors of samples) or data versus a fitted distribution.

OTOH, I don' really like magically switching from one default to the other depending on the argument types. Should we just default to qqline=:none, or is qqline=:robust a good enough default?

UPDATE: I'm becoming convinced that, for the sake of clarity, one should do the following two things.

  • Default to qqline=:none, otherwise the behavior (figuring out when it's a fit and when it's the identity line) is too subtle.
  • Rename :robust (the old :quantile) to :fitrobust. It is conceptually not different from :fit, but it simply uses only the [0.25, 0.75] quantiles rather than the range(0, 1, length=N) (which is used to draw the plot).
  • Error for invalid values of qqline and list the valid options in the error message.

(I've updated the PR to reflect this approach.)

@palday
Copy link
Contributor

palday commented Jan 15, 2022

@piever I like the idea of omitting the line by default and forcing the user to take an explicit stand on which type they want.

src/stats/distributions.jl Outdated Show resolved Hide resolved
Pietro Vertechi and others added 4 commits January 15, 2022 15:30
@piever
Copy link
Contributor Author

piever commented Jan 21, 2022

Planning to merge this in a few days if there are no additional concerns. To summarize the main API changes

  • old qqline=:quantile or qqline=:R are now qqline=:fitrobust
  • :none is the default (as in general it is unclear exactly which line should be drawn)
  • a value of qqline not among the valid options (:none, :fit, :fitrobust, :identity) errors rather than being ignored silently (which allows us to add options in a non-breaking way in the future)

@piever piever merged commit 584271a into master Jan 25, 2022
@piever piever deleted the pv/qqplot branch January 25, 2022 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants