Example of plotting points with associated probabilities #102
Thank you, @jbednar. First: where for each pair (X, Y) there is a unique value VAL; basically the equivalent of `df.plot(kind='scatter', x='X', y='Y', c='VAL', s=50)`. Second: Thanks!
We're working on making point sizing more flexible and automatic, and on properly documenting how to do it, but in the meantime you can apply the `tf.spread` function. The code is already available for the application you describe above; just pass the field you want to the appropriate aggregation function:

```python
cvs = ds.Canvas(plot_width=800, plot_height=500, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'X', 'Y', ds.mean('VAL'))
img = tf.interpolate(agg, low="white", high='darkblue', how='linear')
```

where `ds.mean('VAL')` computes the mean of the VAL column over the points falling into each pixel.
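For reference, a minimal sketch of applying spread on top of the shaded image (assuming a datashader version that includes `tf.spread`; the 2-pixel radius is an arbitrary choice for illustration):

```python
import datashader.transfer_functions as tf

# Spread each non-empty pixel into a circle of radius 2 pixels,
# so isolated points remain visible at high resolutions.
spread_img = tf.spread(img, px=2, shape='circle')
```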
Thanks, @jbednar.
Oops, yes -- spread requires the GitHub master version.
Thanks! When I run `import datashader as ds` I get this error:

```
OSError: [Errno 13] Permission denied: '/opt/dist/anaconda/lib/python2.7/site-packages/datashader-0.1.0-py2.7.egg/datashader/__pycache__'
```

DatashaderImportError.txt

The version installed from conda never had this problem. For now I have just given rwx permissions on the datashader directory to all users, and it seems to work.

P.S. I'm curious whether it is by design that the spread API takes a mask.
I don't think that issues with `__pycache__` permissions are caused by datashader itself.

For the shape, we often want to specify a circular mask at different radius values, which the `mask` argument of spread makes possible.
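As an illustration, a circular mask can be built with numpy and passed in directly (a sketch, assuming `tf.spread`'s optional `mask` parameter, which expects an odd-sized 2D boolean array):

```python
import numpy as np
import datashader.transfer_functions as tf

def circular_mask(radius):
    # Boolean disk of shape (2*radius+1, 2*radius+1), True inside the circle
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx ** 2 + yy ** 2 <= radius ** 2

# Spread with a custom radius-3 circular footprint instead of px/shape
spread_img = tf.spread(img, mask=circular_mask(3))
```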
We started caching code compilation in numba, which writes a cache file on first import. I've filed an issue; see numba/numba#1771. For now, try running the import once as a user with write permissions, so that the cache can be created.
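A possible present-day workaround (assuming a numba version that honors the `NUMBA_CACHE_DIR` environment variable) is to redirect the cache to a user-writable directory before the first import:

```python
import os

# Point numba's compilation cache at a writable location; this must be
# set before numba (and therefore datashader) is imported.
os.environ['NUMBA_CACHE_DIR'] = os.path.expanduser('~/.numba_cache')

import datashader as ds
```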
That all makes sense!
In the comment above, `tf.interpolate` has since been renamed to `tf.shade`.
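For example, the earlier call would now be written roughly as follows (a sketch; the two-color `cmap` list replaces the old `low`/`high` arguments):

```python
import datashader.transfer_functions as tf

# Modern equivalent of tf.interpolate(agg, low='white', high='darkblue', how='linear')
img = tf.shade(agg, cmap=['white', 'darkblue'], how='linear')
```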
Hi! I have been trying to use this method for plotting data points with associated probabilities/weights, but bumped into something I do not understand. If I pass all-zero values in the column used as the weighting factor, I expect the image to become empty. Yet it does not! Is it a bug, or am I misunderstanding something? Below is minimal code to reproduce it with datashader 0.13.0:

```python
import datashader as ds
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

num_datapoints = 1000
xs = 200 * np.random.rand(num_datapoints)
ys = 200 * np.random.rand(num_datapoints)
weights = np.random.rand(num_datapoints)

# Uncommenting the line below should probably
# result in a black image, yet it doesn't?
# weights = np.zeros((num_datapoints,))

df = pd.DataFrame(np.array([xs, ys, weights]).T, columns=['x', 'y', 'weight'])

cvs = ds.Canvas(plot_width=200, plot_height=200, x_range=(0, 200), y_range=(0, 200))
agg = cvs.points(df, 'x', 'y', ds.sum('weight'))
img = ds.tf.shade(agg, cmap='white')

plt.imshow(img, origin='lower', cmap='gray')
plt.show()
```

And below is what I see if I uncomment the line that sets all the weights to zero. In my other work the outputs of `tf.shade` have otherwise matched my expectations.
@naavis If you look at the contents of `agg` you will see a mix of NaNs and zeros: pixels that contain no points at all are NaN, whereas pixels containing one or more points have a weight sum of 0.0. `tf.shade` only leaves the NaN pixels transparent, so the zero-valued pixels are still colored and the image is not empty.

Secondly, your combination of `cmap='white'` in `tf.shade` with a separate `cmap='gray'` in `plt.imshow` makes the result hard to interpret; it is clearer to let datashader do the colormapping and display its output directly.

Anyway, this is really a usage question and should have been posted to https://discourse.holoviz.org/ rather than being appended to a six-year-old GitHub issue. If you have further questions about this, please could you ask on the Discourse instead. Thanks!
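A quick way to verify the NaN-versus-zero explanation on the example above (using the `agg` variable from the reproduction script, with the all-zero weights uncommented):

```python
import numpy as np

vals = agg.data  # underlying numpy array of the xarray aggregate
# Finite entries are pixels that contain points; all of them sum to zero
print(np.unique(vals[~np.isnan(vals)]))            # -> [0.]
print(np.isnan(vals).sum(), 'of', vals.size, 'pixels are empty (NaN)')
```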
Thanks, and sorry. This GitHub issue was the only place I found that mentions using datapoint-specific weights/probabilities with Datashader; the documentation isn't exactly abundant on this. I was not aware of the Discourse page. I'll post any further thoughts there.
Currently, datashader's scatterplot/heatmap approach for point data partitions the set of points, allocating each one to a single pixel-shaped bin, with no overlap between bins. Some types of data come with associated probabilities, such as a known measurement error bound or an estimated uncertainty per point.
It would be good to have an example of how to aggregate such data, such that the value of each datapoint is assigned to multiple bins in the aggregate array, according to some kernel function (e.g. a 2D Gaussian, where errors are specified as stddevs).
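As a rough illustration of what such a kernel-based aggregation could look like, here is a plain-numpy sketch (not datashader API; the function name, parameters, and the 3-sigma cutoff are all hypothetical choices, and per-point sigmas are assumed positive):

```python
import numpy as np

def gaussian_splat(xs, ys, vals, sigmas, width, height, x_range, y_range):
    """Spread each point's value over nearby bins with a 2D Gaussian kernel."""
    agg = np.zeros((height, width))
    xmin, xmax = x_range
    ymin, ymax = y_range
    dx = (xmax - xmin) / width   # bin width in data units
    dy = (ymax - ymin) / height  # bin height in data units
    for x, y, val, sigma in zip(xs, ys, vals, sigmas):
        r = 3 * sigma  # truncate the kernel at +/- 3 stddevs
        i0 = max(int((x - r - xmin) / dx), 0)
        i1 = min(int((x + r - xmin) / dx) + 1, width)
        j0 = max(int((y - r - ymin) / dy), 0)
        j1 = min(int((y + r - ymin) / dy) + 1, height)
        if i0 >= i1 or j0 >= j1:
            continue  # kernel footprint lies entirely outside the canvas
        # Bin-center coordinates of the affected region
        gx = xmin + (np.arange(i0, i1) + 0.5) * dx
        gy = ymin + (np.arange(j0, j1) + 0.5) * dy
        k = np.exp(-((gx[None, :] - x) ** 2 + (gy[:, None] - y) ** 2)
                   / (2 * sigma ** 2))
        k /= k.sum()  # each point contributes exactly its full value
        agg[j0:j1, i0:i1] += val * k
    return agg
```

With this kind of scheme, points with tight error bounds land almost entirely in one bin, while uncertain points smear across many, which is the behavior described above.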
For the special case of a square error kernel, this approach is equivalent to implementing support for raster data (see #86), where each raster datapoint represents a specified area of the X,Y plane with equal probability or weighting within that square.
We'll need a suitable dataset of this type, preferably one with widely varying error estimates across the datapoints, such that some points have tight bounds and others are less constrained.