Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computing contours from discrete points. #7

Closed
mbostock opened this issue Apr 8, 2017 · 5 comments
Closed

Computing contours from discrete points. #7

mbostock opened this issue Apr 8, 2017 · 5 comments
Assignees

Comments

@mbostock
Copy link
Member

mbostock commented Apr 8, 2017

Like here:

image

A quick guess on how you could do it: create a coarse grid, bin the points into the grid, blur the grid, and then compute the contours. The grid size and the blur radius should presumably be configurable.

@mbostock mbostock self-assigned this Apr 8, 2017
@mbostock
Copy link
Member Author

mbostock commented Apr 8, 2017

Seems like geom_density2d defaults to a grid size of 100×100 (n defaults to 100) and by default uses bandwidth.nrd as the Gaussian kernel. I’m guessing that should map pretty closely to a stack blur radius.

@mbostock
Copy link
Member Author

Released in 1.1!

@curran
Copy link

curran commented Apr 11, 2017

Random thought - A 1D version of this Gaussian kernel would be a really nice addition to D3, possibly as a sibling of d3-array#histogram. I see this was worked on long ago in d3/Add kernel density estimation. PR #143.

@mbostock
Copy link
Member Author

mbostock commented Apr 11, 2017

Here’s a related example (just updated): https://bl.ocks.org/mbostock/4341954

That example computes the kernel density estimation the brute force way: it evaluates the kernel for each data point (n) for each grid point (m) = O(nm). This tends to be prohibitive for estimating two-dimensional density since the number of grid points in two dimensions is much larger (e.g., m = 480×250 = 120,000 instead of m = 100).

One way of making the brute force method faster would be to sort the data, and then keep a moving window of data points whose kernel value is non-zero. Although if your bandwidth is large, then this wouldn’t be a big improvement, since you’d still be evaluating the kernel on a large number of data points for each grid point.

Or you could use the same fast blur technique d3-contour uses for two-dimensional density estimation. This is less accurate because the data points are first rounded to the resolution of the grid before blurring, but often this is acceptable.

@curran
Copy link

curran commented Apr 11, 2017

I really like the idea of blur(histogram(data)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants