
Normalising the y axis of the 1D histogram PDF diagonals to unity! #86

sultan-hassan opened this issue Sep 27, 2016 · 25 comments

@sultan-hassan

1 - Is there any quick way to do this with corner? If not, it's unclear how to do it with weights: what is the shape of weights? The documentation says [nsamples,], but shouldn't it be [nsamples, ndim], the same shape as the sample array x?

2 - Whenever I try using weights, I get: TypeError: hist() got multiple values for keyword argument 'weights'

Thanks in advance for your help.

@kbarbary
Contributor

Regarding weights, a single weight applies to a single sample (a single sample being a position in an n-dimensional space). So, weights[i] is the weight for the sample samples[i, :]. It wouldn't make sense to have different weights for different dimensions of a single sample.
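
For concreteness, here is a minimal sketch of how the shapes line up (the arrays are placeholders):

```python
import numpy as np
import corner

# 1000 samples drawn from a 3-dimensional parameter space (placeholder data)
samples = np.random.randn(1000, 3)

# one weight per sample: shape (1000,), not (1000, 3)
weights = np.ones(len(samples))

fig = corner.corner(samples, weights=weights)
```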

@sultan-hassan
Author

Thank you very much for that. So how do I construct such a weight array to normalize the y axis of the diagonal 1D PDFs for my sample?
Let's assume my sample has a shape of (1000, 3), so the corresponding weights should have shape (1000,), right? I am confused about how to construct such a weight array to do the normalization.

@kbarbary
Contributor

I'm a bit confused: I thought the y axes of the diagonals are not labelled anyway, so I don't see what normalizing these 1-D PDFs would do (other than perhaps changing the y-axis tick locations, which don't mean much).

Maybe you're using an option to get y labels on the diagonals, or maybe defaults have changed since I last looked at it?

@sultan-hassan
Author

sultan-hassan commented Sep 27, 2016

Let me explain more. I am overplotting two sets of samples on top of each other, so without normalizing you can't clearly see the PDFs of the two samples in the 1D histogram diagonals.
In the attached plot, I want to normalize the red and blue histograms so that they have the same height.

[Attached plot: corner plot overlaying the red and blue sample sets]

@kbarbary
Contributor

Ah, I didn't understand that you were plotting two sets of samples.

Dan will know better, but I think passing a weight array will indeed affect the relative scaling of the two sets of samples. Assuming you have 1000 samples, try passing 2.0 * np.ones(1000) for weights for one set and np.ones(1000) for the other set and see if it changes the relative scaling.
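
Roughly like this (a sketch; samples_a and samples_b are placeholders for your two sample sets):

```python
import numpy as np
import corner

samples_a = np.random.randn(1000, 3)        # placeholder sample sets
samples_b = np.random.randn(1000, 3) + 0.5

# overlay the second set on the first via the fig argument and
# see whether the constant weights change the relative scaling
fig = corner.corner(samples_a, color="b", weights=np.ones(1000))
corner.corner(samples_b, color="r", weights=2.0 * np.ones(1000), fig=fig)
```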

@sultan-hassan
Author

Yes, it does change the scaling, but it would be great to find something consistent to use for all samples, such that the area under each PDF equals one, rather than playing around randomly with 2.0 * np.ones(1000), then 3.0 * np.ones(1000), etc. Thanks for helping out.

@kbarbary
Contributor

Is the problem that the two sets have different numbers of samples? If so, setting weights=np.ones(nsamples)/nsamples for each set should make the areas under the PDF the same regardless of the value of nsamples.
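
In code, the idea would be something like this (a sketch with placeholder sample sets of different sizes):

```python
import numpy as np
import corner

samples_a = np.random.randn(1000, 3)   # placeholder sample sets with
samples_b = np.random.randn(500, 3)    # different numbers of samples

# weight each set by 1/nsamples so the 1-D histograms of both sets
# enclose the same total weight, regardless of nsamples
w_a = np.ones(len(samples_a)) / len(samples_a)
w_b = np.ones(len(samples_b)) / len(samples_b)

fig = corner.corner(samples_a, color="b", weights=w_a)
corner.corner(samples_b, color="r", weights=w_b, fig=fig)
```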

@sultan-hassan
Author

nsamples is the same! And I tried this, but it doesn't help. There must be a way :( I will keep playing around; let me know if you find something.

@dfm
Owner

dfm commented Sep 28, 2016

If nsamples is the same then it is normalized! The integral under each of those histograms is the same.

@drphilmarshall

Agreed - the blue and red histograms look like good approximations to normalized PDFs to me!


@sultan-hassan
Author

So if I want the heights of the 1D histogram diagonals (blue and red) to be the same, should I increase the number of bins or use the same bin width? It should be possible to make them the same height! The question is how? Still playing around...

@drphilmarshall

drphilmarshall commented Sep 28, 2016 via email

@sultan-hassan
Author

Great, thanks for that :). Could you point me to the documentation for those options, 'equalize_peak_heights=True' and 'normalization=1512'? Are these kwargs for hist or hist2d?

@drphilmarshall

drphilmarshall commented Sep 28, 2016 via email

@sultan-hassan
Author

Cool, I would be very happy to contribute and modify the code, as this routine + emcee have already been a great help in my research. Many thanks to the owner.

@dfm
Owner

dfm commented Sep 28, 2016

The first thing that we need is a convincing argument for why you want this feature. I'm currently skeptical that it's actually something that we want, and I'm hesitant to add features that might be misleading, so I'd love to hear the specific use case and the story that you're trying to tell.

@sultan-hassan
Author

Well, the only reason is that it gives a much better visualisation for comparing different samples in terms of the shape and width of their distributions. However, if this doesn't seem useful enough to be part of the routine, that's fine with me. But I would still like to know how to do such a thing for myself. Any ideas?

@dfm
Owner

dfm commented Sep 28, 2016

Let's get some more details – what are the samples that you're comparing and why is your suggestion better for comparison? If the histograms are properly normalized then a wider distribution will also be "shorter". I think that this actually makes the visualization clearer! I expect that this is also the same reason why neither matplotlib nor numpy has native support for this.

If you want to mock up a change, it will be easiest to do with weights = np.ones(n) and modifying this line:

y0 = np.array(list(zip(n, n))).flatten()

to

y0 = np.array(list(zip(n, n))).flatten() / np.max(n)

Note that I still stand by my opinion that this would lead to a misleading result!
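
To preview what that change would do without patching corner, here is a rough standalone mock-up in plain numpy/matplotlib (the sample arrays are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

x_a = np.random.randn(1000)              # placeholder 1-D samples
x_b = 0.5 * np.random.randn(1000) + 1.0

for x, color in [(x_a, "b"), (x_b, "r")]:
    n, edges = np.histogram(x, bins=20)
    # divide by the peak so every histogram tops out at 1,
    # mimicking the suggested / np.max(n) change
    plt.step(edges[:-1], n / n.max(), where="post", color=color)

plt.xlabel("x")
plt.ylabel("peak-normalized count")
plt.show()
```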

@drphilmarshall

I think if we are talking about samples from a PDF, then you always want to normalize each histogram to 1. However, if you want to use corner to visualize number density rather than probability density, then I can see how one might want to specify the relative normalizations of the datasets being overlaid. I have never needed to normalize to equal peak height...


@dfm
Owner

dfm commented Sep 28, 2016

Totally! The current default behavior is actually to normalize to the number density and you can add hist_kwargs=dict(normed=True) to get the PDF behavior.
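
For example (a sketch; note that newer matplotlib versions have replaced normed with density, as pointed out at the end of this thread):

```python
import numpy as np
import corner

samples = np.random.randn(1000, 3)

# default: the diagonal histograms show raw counts (number density)
fig_counts = corner.corner(samples)

# normalize the diagonal histograms to unit area (PDF behavior);
# density=True is the modern spelling of normed=True
fig_pdf = corner.corner(samples, hist_kwargs=dict(density=True))
```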

@drphilmarshall

That is good to know indeed! I went back and checked the API docs: this default behavior is not explained anywhere, but perhaps it should be. Also, to get the PDF behavior, do you need to specify hist2d_args=dict(normed=True) as well? I can't think of any reason you would want to normalize the 2D and 1D histograms differently, can you? Maybe we need a normed=True kwarg in corner.corner that turns on both the 1D and 2D normalizations?

In the meantime, Sultan, it sounds as though you can get the behavior you want through judicious choice of sample sizes...

@dfm
Owner

dfm commented Sep 28, 2016

I agree that it's worth saying something about the default behavior in the docs – I am actually inclined to change the default to density normalization!

I'm not sure what you mean about the "normalization" of the 2D histograms. The contours are always at percentiles of the sample mass. I guess you could choose to have the contours defined in terms of numbers but that's some craziness! I don't want to go there.

I also don't think that you'll ever be able to get the requested behavior by changing the sample size because the peak height in each panel actually depends on the bin sizes and the shape of the distribution. That's the whole reason why it's meaningless to give the "peaks" equal heights!

@drphilmarshall

Yeah - the word "judicious" can cover a lot of fiddling around... :-)

I'd support a move to probability density normalization by default, especially since the contour levels are defined in terms of probability mass! However, if someone really was trying to visualize number density, I guess they might want contours in absolute number density, but I agree it's better to wait for that to be requested... In the meantime they could still have the 2D grayscale, 2D scatter plot, and 1D histograms all represent absolute number density (which I think is the current default). I bet they would still find it useful to be able to switch easily from "number density" to "probability density" and back, though.


@sultan-hassan
Author

Well, here's a plot taken from the Greig & Mesinger (2015) 21CMMC paper, where different PDFs are shown with equal heights. I thought this was a good representation for comparing different PDFs and that I might be able to do the same with corner...
[Attached screenshot: PDFs from Greig & Mesinger (2015) plotted with equal peak heights]

@jtlz2

jtlz2 commented Jul 12, 2021

normed is now deprecated in matplotlib; use density instead.

dfm closed this as completed Jul 12, 2021