@KronosTheLate

In the statistics course I took last semester, I found early on that I disliked normalizing distributions and computing normalized test statistics, only to rescale the test statistic afterwards.

To that end, I looked for a way to change the location and scale of a TDist. I even went as far as trying to define my own generalized GTDist, based on https://juliastats.org/Distributions.jl/stable/extends/. But I never succeeded.

Lo and behold, there is a function that does exactly what I wanted all along. Or, more precisely, a type: LocationScale.

This PR aims to make LocationScale more discoverable, by

  1. Adding a note about LocationScale in the "Create New Samplers and Distributions" section of the docs.
  2. Adding a reference to LocationScale in the docstring for TDist.

I feel that 2) might be overkill. But at the same time, it was the only distribution that I found the need to scale, and I quickly encountered that need. I also think that overdocumenting is better than underdocumenting, as one cannot expect every user to read the entire documentation.

I do not know which (if any) other distributions LocationScale is particularly relevant for, but the same pointer should be added if there are.

These are my suggestions on how to make LocationScale more discoverable, but others are of course very welcome.

@devmotion
Member

LocationScale is deprecated so we shouldn't add any new references for it: #1453

Instead one should use * (scaling) and + (shifting) for affine transformations.
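
For readers following along, here is a minimal sketch of what that looks like in practice. It assumes a Distributions.jl version in which `+` and `*` are defined for univariate distributions; the values of `ν`, `μ`, and `σ` are illustrative and not taken from this PR.

```julia
using Distributions

ν, μ, σ = 10, 1.0, 2.0        # illustrative values only
d = TDist(ν)                  # standard Student's t distribution
y = μ + σ * d                 # scaled by σ, then shifted by μ

# The CDF of the transformed distribution equals the standard CDF
# evaluated at the standardized point (x - μ) / σ.
x = 3.0
cdf(y, x) ≈ cdf(d, (x - μ) / σ)

# Quantiles transform in the opposite direction.
quantile(y, 0.975) ≈ μ + σ * quantile(d, 0.975)
```

Both comparisons should hold up to floating-point error, which is why the sketch uses `≈` rather than `==`.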


Whereas this package already provides a large collection of common distributions out of box, there are still occasions where you want to create new distributions (*e.g* your application requires a special kind of distributions, or you want to contribute to this package).

**Note:** if you only want to change the location and scale of a univariate distribution, see [`LocationScale`](@ref).
@rikhuijzer commented Jan 17, 2022

Suggested change
- **Note:** if you only want to change the location and scale of a univariate distribution, see [`LocationScale`](@ref).
+ !!! note
+     to change the location and scale of a univariate distribution, use `+` and `*`, see [`AffineDistribution`](@ref) for details.

Please check the syntax locally to verify that everything looks good

Member

This remark is too specific. One can define new distributions, but there are several ways to create “derived distributions”, such as truncated, location-scale, and mixture distributions, so there is no need to single one out here.

Author

So perhaps a new subsection in "Create New Samplers and Distributions", called "Derived distributions"? With an overview of all the ways to create new derived distributions.

Member

Author

Sounds like a good solution to me. I hate to simply leave the work to others, but I am unfortunately not one to write such a section.

Anyone is welcome to use my suggested section on scaling and shifting, posted in a comment below.

Member

> So perhaps a new subsection in "Create New Samplers and Distributions"

This section is (at least currently) for documenting how one can implement completely new distributions and samplers according to the interface.

I think a separate section that talks about such derived distributions would be better.

Member

There are already separate sections for some derived distributions, e.g. reshaped distributions, mixture models, and convolutions, and some are part of other sections such as truncated and product distributions. These could all be moved to or at least linked from such a section of derived distributions.

Member

Yes, it shouldn't be a subsection of "Create New Samplers and Distributions"

dof(d) # Get the degrees of freedom, i.e. ν
```
To create a TDist with a different location and scale, see `LocationScale`.

Suggested change
- To create a TDist with a different location and scale, see `LocationScale`.
+ To create a TDist with a different location and scale, use `+` and `*`, see `AffineDistribution` for details.

Author

I agree with your changes. Two problems however:

  1. The suggested reference is not good enough:
    [image not preserved]

  2. If there is a new section added (see my comment from 3 minutes ago), the reference should probably be to it, and not AffineDistribution. Right?

People who read the source code probably have a terminal open and can do ?AffineDistribution

Author

Would not help me much...

julia> using Distributions

(@v1.7) pkg> st Distributions
      Status `C:\Users\densb\.julia\environments\v1.7\Project.toml`
  [31c24e10] Distributions v0.25.37

help?> AffineDistribution
search:

Couldn't find AffineDistribution
Perhaps you meant MatrixDistribution
  No documentation found.

  Binding AffineDistribution does not exist.

Member

AffineDistribution is not exported because you're not supposed to use the constructor. It's just a fallback, similar to Truncated or Product which one also should not use.

Author

Right. So referring to it for documentation is a bad idea, right?

@KronosTheLate
Author

KronosTheLate commented Jan 17, 2022

Cool! That is pretty neat.
Then perhaps there should be a separate section in the documentation for scaling and shifting? With a reference to it early on in the Univariate Distributions section. I do not really feel like it belongs anywhere else, and this should make it discoverable enough for anyone looking to scale and shift a distribution.

I am not well versed in the technical details, but here is my stab at writing such a section:

Scaling and shifting a distribution

Some distributions have parameters that allow scaling and shifting them. Examples include the mean and standard deviation of a Normal distribution. However, other distributions do not have such parameters. Luckily, it is possible to shift and scale all univariate distributions. To do so, simply multiply by the scaling factor, and add the shift. In the example below, a Student's T distribution is scaled by a factor of 2, and shifted by 2:

julia> d = TDist(10);

julia> scaled_d = d * 2;

julia> shifted_and_scaled_d = scaled_d + 2;

julia> shifted_and_scaled_d == d * 2 + 2
true

Note that the scaling occurs along the x-axis, just like the shift. This means that the scaling factor and shift play exactly the same role as the standard deviation and mean for a normal distribution, respectively. Additionally, distributions that have scaling and shifting parameters can still be scaled and shifted:

julia> d1 = Normal(10, 10);

julia> d2 = Normal()*10 + 10;  #Normal() defaults to mean=0 and stddev=1

julia> d1 == d2
true
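
The analogy with the Normal distribution can also be checked numerically. The sketch below is a hedged illustration rather than part of the proposed docs text, and it assumes that `mean` and `std` are defined for the transformed distribution. It shows that the shift moves the mean and the factor multiplies the standard deviation of the original distribution, which for `TDist(10)` is not 1.

```julia
using Distributions

d = TDist(10)
y = d * 2 + 2                 # scale by 2, then shift by 2

mean(y) ≈ 2 * mean(d) + 2     # the shift moves the mean
std(y) ≈ 2 * std(d)           # the factor multiplies the standard deviation
std(d) ≈ sqrt(10 / 8)         # TDist(ν) has variance ν / (ν - 2) for ν > 2
```

So the scaling factor equals the resulting standard deviation only when the starting distribution has unit standard deviation, as `Normal()` does.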

@rikhuijzer

rikhuijzer commented Jan 17, 2022

julia> d1 = Normal(10, 10);

julia> d2 = Normal()*10 + 10;

julia> d1 == d2
true

This requires the reader to know the default arguments for the Normal distribution by heart which is a bit weird.

Overall, I'm not sure whether such a section adds much. The whole idea seems pretty intuitive

EDIT: I do very much agree that adding a few more references would be a good idea. I've read large parts of the documentation about scaling recently and never spotted the + and * syntax.

@KronosTheLate
Author

> This requires the reader to know the default arguments for the Normal distribution by heart which is a bit weird.

I added a comment, which should make things clear. The point is that you can simply scale and shift "Standardized distributions", which makes writing it this way make sense. If you disagree, I am fine with setting the parameters explicitly.

> Overall, I'm not sure whether such a section adds much. The whole idea seems pretty intuitive to me

The idea is very intuitive - really cool functionality! However, how should a user know that it is possible, and what should they look for to discover that functionality?

Also, since I am suggesting a new section, it is very easy to skip it entirely if you already understand the idea. Overdocumentation is IMHO much better than underdocumentation, because not having needed docs hurts so much more than having redundant docs. As I see it, the docs are for the dullest and least competent users imaginable. I do not feel that I am that user, but I would still very much appreciate such a section. I therefore see a need for it.

> EDIT: I do very much agree that adding a few more references would be a good idea. I've read large parts of the documentation about scaling recently and never spotted the + and * syntax.

Where would you feel that it is best to add such references? The alternative to a new section, as I see it, would be to add instructions on how to scale and shift every univariate distribution in their docstring. This strikes me as a less elegant solution...

@mattiasvillani

I second changing the documentation. The use of * and + for scaling and location changes is compact, but far from obvious.
Let me also add that I find it inconsistent that TDist() does not have location and scale parameters, but its special case Cauchy (TDist with ν = 1) does.
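
To make the comparison concrete, here is a small sketch under the same assumption that `+` and `*` work on univariate distributions. The values of `μ` and `σ` are illustrative. It compares the density of `Cauchy(μ, σ)` with that of a scaled and shifted `TDist(1)` on a grid of points.

```julia
using Distributions

μ, σ = 2.0, 3.0               # illustrative values only
c = Cauchy(μ, σ)              # location and scale are built-in parameters
t = TDist(1) * σ + μ          # the same distribution, built via * and +

# Compare the two densities on a grid of evaluation points.
xs = -5.0:0.5:5.0
all(pdf(c, x) ≈ pdf(t, x) for x in xs)
```

The two densities should agree up to floating-point error, since a t distribution with one degree of freedom is the standard Cauchy distribution.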
