Separate bandwidth in x and y directions? + NRD formula #27

Open
jrus opened this issue May 23, 2018 · 3 comments

Comments

jrus commented May 23, 2018

As discussed over at the Observable forum, it might be nice for the bandwidth to accept a 2-entry list or an object with x and y attributes or the like, especially since internally the implementation is already blurring separately in x and y directions.
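One way the proposed input shapes could be accepted is sketched below. `normalizeBandwidth` is a hypothetical helper, not part of d3-contour; it only illustrates turning a single number, a 2-entry list, or an object with `x` and `y` attributes into one `[bx, by]` pair that separate x and y blur passes could consume.

```javascript
// Hypothetical helper (not a d3-contour API): normalize a bandwidth given as
// a number, a [x, y] pair, or an {x, y} object into a [bx, by] pair.
function normalizeBandwidth(bandwidth) {
  let bx, by;
  if (Array.isArray(bandwidth)) {
    [bx, by] = bandwidth;
  } else if (bandwidth !== null && typeof bandwidth === "object") {
    ({x: bx, y: by} = bandwidth);
  } else {
    // A single value applies to both directions, as today.
    bx = by = bandwidth;
  }
  bx = +bx;
  by = +by;
  // Reject negative, missing, or non-numeric entries.
  if (!(bx >= 0) || !(by >= 0)) throw new Error("invalid bandwidth");
  return [bx, by];
}
```

With this shape, `normalizeBandwidth(20)`, `normalizeBandwidth([10, 30])` and `normalizeBandwidth({x: 10, y: 30})` would all be valid, and existing single-number calls would keep their meaning.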

@mbostock mbostock changed the title separate bandwidth in x and y directions Separate bandwidth in x and y directions? Aug 10, 2019
Member

Fil commented Jun 15, 2020

I've now implemented this through various test notebooks that are not yet fully ready (coming soon). I'm enthusiastic about the idea of selecting the bandwidth relative to the variance of each dimension, but there are already a few observations I can share:

First, there is an obvious case where it doesn't work: when the data has no variance (maybe it's a single point, or points that show very small variation in y, for some reason). In those cases we wouldn't want the density contours to be flattened to a line. So there's going to be a sort of minimum, which can be set at the (arbitrary) default value of 20 pixels that exists already.
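A minimal sketch of that floor, under stated assumptions: the points are already in pixel coordinates, "nrd" means the normal reference rule in Scott's variant (1.06 × min(sd, IQR / 1.34) × n^(-1/5)), and the minimum is applied independently to each axis. The function names are illustrative, not taken from the notebooks mentioned above.

```javascript
// Sample standard deviation (n - 1 denominator); 0 for fewer than 2 values.
function deviation(values) {
  const n = values.length;
  if (n < 2) return 0;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const ss = values.reduce((s, v) => s + (v - mean) ** 2, 0);
  return Math.sqrt(ss / (n - 1));
}

// Linearly interpolated quantile of an ascending-sorted array.
function quantileSorted(sorted, p) {
  const i = (sorted.length - 1) * p;
  const lo = Math.floor(i);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (i - lo);
}

// Normal reference rule (assumed variant): 1.06 * min(sd, IQR / 1.34) * n^(-1/5).
function nrd(values) {
  if (values.length < 2) return 0;
  const sorted = Float64Array.from(values).sort();
  const iqr = quantileSorted(sorted, 0.75) - quantileSorted(sorted, 0.25);
  return 1.06 * Math.min(deviation(values), iqr / 1.34) * values.length ** -0.2;
}

// Assumption: each axis is floored at minBandwidth (20 mirrors the existing default).
function axisBandwidths(points, minBandwidth = 20) {
  const bx = nrd(points.map(d => d[0]));
  const by = nrd(points.map(d => d[1]));
  return [Math.max(minBandwidth, bx), Math.max(minBandwidth, by)];
}
```

For points that all share the same y, the y bandwidth falls back to the floor instead of collapsing to zero, so the contours would not be flattened to a line.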

Second, the way it looks is a bit underwhelming. The current strategy creates "circles" around the data, whereas the x/y aspect ratio creates "ellipses" (on purpose). Certainly nicer for statistics, but not as nice on the eye. So, I would not want to have a different aspect ratio with the default bandwidth generator.

Third, the nrd formula returns values that don't coincide with the way we use the given bandwidth. (Currently bandwidth represents, let's say, the radius of 1 iteration of blurring on a 4x grid, whereas in the literature it's something like the std dev of the Gaussian.) In my experiments, the scale factor between these values is about 5.
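To make the radius versus standard deviation distinction concrete, here is a small sketch using a general fact about box blurs: a box kernel of radius r (2r + 1 cells wide) has variance r(r + 1) / 3, and variances add when the blur is repeated. The function names, the default of 3 passes, and the `cell` parameter are assumptions for illustration. This is not taken from the d3-contour source and does not attempt to reproduce the factor of about 5 reported above.

```javascript
// Standard deviation, in grid cells, of `passes` successive box blurs of the
// given radius (each box is 2 * radius + 1 cells wide).
function boxBlurDeviation(radius, passes = 3) {
  // Variance of a discrete uniform kernel on {-radius, ..., radius} is
  // radius * (radius + 1) / 3; variances add under repeated convolution.
  return Math.sqrt(passes * radius * (radius + 1) / 3);
}

// Same quantity in pixels when the blur runs on a grid where each cell
// covers `cell` pixels (assumption: 4, as in "a 4x grid").
function boxBlurDeviationPixels(radius, passes = 3, cell = 4) {
  return cell * boxBlurDeviation(radius, passes);
}
```

The point is only that a blur radius and a Gaussian standard deviation are different quantities related by a factor that depends on the number of passes and the grid resolution, so a statistically derived bandwidth needs a conversion before it can drive the blur.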

As a consequence, either we change, and users will have to rescale their hand-tuned bandwidths (my experience with this is that it's always hand-tuned to give a "nice" graph), or we continue with the same "bandwidth" and scale nrd to match what it's supposed to deliver, but then its statistical properties are incorrect. Maybe a solution could be to deprecate bandwidth() and replace it with a new name like blur() or something.

@Fil Fil mentioned this issue Jun 15, 2020
7 tasks
Member

Fil commented Jun 19, 2020

Here's an implementation that seems to work, based on the new d3.blur proposal.
https://observablehq.com/@fil/x-y-bandwidth-for-density-contours

The remarks above still stand.

[Image: before]

[Image: after]

Member

Fil commented Jul 9, 2020

I figure that as a first step we should ship a version that accepts x/y bandwidths as inputs and allows experimentation (this depends in turn on d3.blur, d3/d3-array#151).

For the nrd stuff, I'd wait for serious statisticians to test and validate the approach.

@Fil Fil changed the title Separate bandwidth in x and y directions? Separate bandwidth in x and y directions? + NRD formula Jul 9, 2020
@Fil Fil added the idea label Jul 10, 2020
@mbostock mbostock removed the idea label Jun 28, 2022