
Example of plotting points with associated probabilities #102

Open
jbednar opened this issue Mar 7, 2016 · 12 comments

@jbednar
Member

jbednar commented Mar 7, 2016

Currently, datashader's scatterplot/heatmap approach for points data partitions the set of points, allocating each one into non-overlapping pixel-shaped bins. Some types of data come with associated probabilities, such as a known measurement error bound or an estimated uncertainty per point.

It would be good to have an example of how to aggregate such data, such that the value of each datapoint is assigned to multiple bins in the aggregate array, according to some kernel function (e.g. a 2D Gaussian, where errors are specified as stddevs).

For the special case of a square error kernel, this approach is equivalent to implementing support for raster data (see #86), where each raster datapoint represents a specified area of the X,Y plane with equal probability or weighting within that square.

We'll need a suitable dataset of this type, preferably one with widely varying error estimates across the datapoints, such that some points have tight bounds and others are less constrained.
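
For concreteness, here is a minimal numpy sketch of this kind of kernel aggregation. The gaussian_aggregate function and its signature are purely hypothetical, not datashader API; errors are treated as per-point standard deviations in data units, and x/y scaling is assumed to be roughly isotropic:

import numpy as np

def gaussian_aggregate(xs, ys, sigmas, weights, width, height, x_range, y_range):
    """Spread each point's weight over nearby bins with a truncated 2D Gaussian."""
    agg = np.zeros((height, width))
    xscale = width / (x_range[1] - x_range[0])
    yscale = height / (y_range[1] - y_range[0])
    for x, y, sigma, w in zip(xs, ys, sigmas, weights):
        # Point center and stddev, converted from data units to pixels
        px, py = (x - x_range[0]) * xscale, (y - y_range[0]) * yscale
        ps = sigma * xscale                   # assumes isotropic x/y scaling
        r = max(1, int(np.ceil(3 * ps)))      # truncate the kernel at 3 stddevs
        x0, x1 = max(0, int(px) - r), min(width, int(px) + r + 1)
        y0, y1 = max(0, int(py) - r), min(height, int(py) + r + 1)
        gx, gy = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
        kernel = np.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * ps ** 2))
        agg[y0:y1, x0:x1] += w * kernel / kernel.sum()  # each point contributes w in total
    return agg

# Synthetic example: points with widely varying per-point uncertainties
rng = np.random.default_rng(0)
n = 1000
agg = gaussian_aggregate(rng.uniform(0, 100, n), rng.uniform(0, 100, n),
                         sigmas=rng.uniform(0.5, 5.0, n), weights=np.ones(n),
                         width=200, height=200, x_range=(0, 100), y_range=(0, 100))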

@thoth291

Thank you, @jbednar.
Two questions.

First:
Will this feature help to crossplot data like this:
X Y VAL
1 1 0.2
2 1 0.3
...
1 2 0.3
2 2 0.4
...
5 5 1.0

where for each pair (X, Y) there is a unique value VAL,
and the result is a scatter plot of these points colored by some mapping of VAL to RGB?

Basically equivalent of

df.plot(kind='scatter', x='X', y='Y', c='VAL', s=50);

Second:
Is there (or will there be) any way to define the size of points in datashader?

Thanks!

@jbednar
Member Author

jbednar commented Mar 10, 2016

We're working on making point sizing more flexible and automatic, and on properly documenting how to do it, but in the meantime you can apply the tf.spread function to your final image, as shown in this notebook:
https://gist.github.com/jcrist/62b366727886561356d8

The code is already available for the application you describe above; just pass the field you want to the appropriate aggregation function:

cvs = ds.Canvas(plot_width=800, plot_height=500, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'X', 'Y', ds.mean('VAL'))
img = tf.interpolate(agg, low="white", high='darkblue', how='linear')

where mean tells datashader that you want to average the VAL of all points falling into that pixel; you could instead take the max, median, etc.
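
For the spread step, a minimal sketch continuing from the snippet above (the px and shape values here are just illustrative):

import datashader.transfer_functions as tf

# Grow each rendered point by 2 pixels using a circular footprint,
# so that isolated points stay visible.
img = tf.spread(img, px=2, shape='circle')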

@thoth291

Thanks, @jbednar.
I was able to colorize my plot with your example; it was quite easy, and my understanding of datashader is more solid now!
But it looks like tf.spread is not available in the version from conda; I guess I need to use the GitHub version instead...

@jbednar
Member Author

jbednar commented Mar 10, 2016

Oops, yes -- spread requires the GitHub master version.

@thoth291

Thanks,

When I run

import datashader as ds

I get this error:

OSError: [Errno 13] Permission denied: '/opt/dist/anaconda/lib/python2.7/site-packages/datashader-0.1.0-py2.7.egg/datashader/__pycache__'

(attachment: DatashaderImportError.txt)
The reason is that I installed this package as a system admin, but I run it as my regular user.
Is there any way to prevent file creation like that in your library? Or at least isolate it, so that one user doesn't affect another?

The version from conda never had this problem.

For now, I've given all users rwx permissions on the datashader directory, and it seems to work.
Other than that, all the features are perfect! Thank you!

P.S. I'm curious: by the design of the spread API, shape + px = mask. So why wouldn't you just generalize the shape parameter to accept numpy masks and ignore px in that case? Or even better, somehow scale the mask based on px? Just curious, not demanding :-)

@jbednar
Member Author

jbednar commented Mar 10, 2016

I don't think that issues with __pycache__ would be due to datashader per se, as we don't access that directly ourselves (though it looks like the separate numba library that we use does access it). So I'd assume that there's a different way to install it that would avoid permissions errors, but I don't know how you originally installed it, and thus what change to suggest.

For the shape, we often want to specify a circular mask at different radius values, which the px argument makes easy; it would be painful to make a new mask for every px value we wanted to try. Yes, scaling the mask based on the px value would be handy, but there are lots of ways to scale matrices, so we'd rather leave that up to the user, using any of the many libraries available for that.
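
(For reference, later datashader releases do accept a custom footprint via tf.spread's mask argument, which may postdate this discussion; a minimal sketch, continuing from the earlier snippet:)

import numpy as np
import datashader.transfer_functions as tf

# A plus-shaped footprint; when mask= is given, px and shape are not needed.
# Masks are expected to be square 2D arrays with odd dimensions.
mask = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]], dtype=bool)
img = tf.spread(img, mask=mask)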

@jcrist
Collaborator

jcrist commented Mar 10, 2016

The reason is that I installed this package as a system admin, but I run it as my regular user.
Is there any way to prevent file creation like that in your library? Or at least isolate it, so that one user doesn't affect another?

We started caching code compilation in numba, which writes a cache file on first import. I've filed an issue, see numba/numba#1771.

For now, try running python -c "import datashader" with admin privileges after install. This should cause the compilation to happen once (while you have permission to write those files). Subsequent imports should only read the cache, which should be fine.

@thoth291

That all makes sense!
Thank you for the ticket at numba - I'll watch it.

@Nithanaroy

tf.interpolate, used in the comments above, is now deprecated. The updated code would be:

cvs = ds.Canvas(plot_width=800, plot_height=500, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'X', 'Y', ds.mean('VAL'))
img = tf.shade(agg, cmap=["white", 'darkblue'], how='linear')

jbednar added this to the wishlist milestone Jun 7, 2021
jbednar removed the wishlist label Jun 7, 2021
@naavis

naavis commented Jul 28, 2022

Hi! I have been trying to use this method for plotting data points with associated probabilities/weights, but I bumped into something I do not understand. If I pass all zero values in the column used as the weighting factor, I expect the image to become empty. Yet it does not! Is this a bug, or am I misunderstanding something?

Below is minimal code to reproduce it with datashader 0.13.0:

import datashader as ds
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

num_datapoints = 1000
xs = 200 * np.random.rand(num_datapoints)
ys = 200 * np.random.rand(num_datapoints)
weights = np.random.rand(num_datapoints)
# Uncommenting the line below should probably
# result in a black image, yet it doesn't?
# weights = np.zeros((num_datapoints,))

df = pd.DataFrame(np.array([xs, ys, weights]).T, columns=['x', 'y', 'weight'])
cvs = ds.Canvas(plot_width=200, plot_height=200, x_range=(0, 200), y_range=(0, 200))
agg = cvs.points(df, 'x', 'y', ds.sum('weight'))
img = ds.tf.shade(agg, cmap='white')

plt.imshow(img, origin='lower', cmap='gray')
plt.show()

And below is what I see if I uncomment the line that sets all the weights to zero.

[Figure_1: screenshot of the resulting plot, which is not empty]

In my other work the outputs of cvs.points(df, 'x', 'y', ds.sum('weight')) and a Matplotlib scatter plot with the weights used as colors or sizes look very different at the moment, so maybe I'm misunderstanding how it is supposed to work in Datashader. I assume using the ds.sum('weight') aggregator would make the brightness of each bin/pixel equal to the sum of the weights for data points that land in that bin.

@ianthomas23
Member

@naavis If you look at the contents of agg when you are using your zero weights, you will see that it contains two values, 0 and np.nan: zeros where you have data points with a weight of zero, np.nan where there are no data points. If there is only a single finite data value in agg, it is mapped to the top end of the cmap, hence white.

Secondly, your combination of ds.tf.shade() and plt.imshow() is almost certainly not doing what you want. ds.tf.shade() outputs a 200x200 array of RGBA values encoded into uint32, and if you pass an MxN array to imshow it will treat it as scalar data and apply a colormap, so you are applying a colormap twice. For debugging purposes, I recommend replacing your matplotlib code with a call to ds.utils.export_image(); it should all be easier to understand.
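
For example, a minimal sketch continuing from the reproduction code above (export_image lives in datashader.utils in recent releases):

import numpy as np
from datashader.utils import export_image

# With all-zero weights, the aggregate holds only 0 (pixels containing
# points) and NaN (empty pixels):
print(np.unique(agg.data[np.isfinite(agg.data)]))

# Write the shaded image straight to a PNG instead of passing it back
# through imshow's colormapping:
export_image(img, "weights_debug", background="black")  # writes weights_debug.png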

Anyway, this is really a usage question and should have been posted to https://discourse.holoviz.org/ rather than appended to a 6-year-old GitHub issue. If you have further questions about this, could you please ask on the Discourse instead? Thanks!

@naavis

naavis commented Jul 29, 2022

Thanks, and sorry. This GitHub issue was the only place I found that mentions using per-datapoint weights/probabilities with Datashader. The documentation isn't exactly abundant on this:
https://datashader.org/user_guide/Points.html
https://datashader.org/api.html#definitions

I was not aware of the Discourse page. I'll post any further thoughts there.
