dask array usage #10
When dask/dask#1838 gets in, we'll actually have access to almost all of dask.array: arithmetic, elemwise, reductions, transpose, tensordot, etc. The only things that definitely won't work are things like slicing.
This should now be resolved if you work from dask/master:
In [1]: import dask.dataframe as dd
In [2]: df = dd.demo.make_timeseries('2000', '2001',
...: {'a': int, 'b': float},
...: freq='10s', partition_freq='7D', seed=1)
In [3]: df.head()
Out[3]:
a b
2000-01-01 00:00:00 1018 -0.023142
2000-01-01 00:00:10 1029 -0.911436
2000-01-01 00:00:20 992 0.898917
2000-01-01 00:00:30 999 -0.579152
2000-01-01 00:00:40 1001 -0.712870
In [4]: df
Out[4]: dd.DataFrame<make-ti..., npartitions=52, divisions=(Timestamp('2000-01-01 00:00:00', freq='7D'), Timestamp('2000-01-08 00:00:00', freq='7D'), Timestamp('2000-01-15 00:00:00', freq='7D'), ..., Timestamp('2000-12-23 00:00:00', freq='7D'), Timestamp('2000-12-30 00:00:00', freq='7D'))>
In [5]: x = df.values
In [6]: x
Out[6]: dask.array<values-..., shape=(nan, 2), dtype=float64, chunksize=(nan, 2)>
In [7]: x.dtype
Out[7]: dtype('float64')
In [8]: y = x.T.dot(x)
In [9]: y
Out[9]: dask.array<sum-a15..., shape=(2, 2), dtype=float64, chunksize=(2, 2)>
In [10]: y.compute()
Out[10]:
array([[ 3.14803663e+12, -5.60514928e+05],
[ -5.60514928e+05, 1.04724887e+06]])
Or if you want a numpy record array:
In [13]: df.to_records()
Out[13]: dask.array<to-reco..., shape=(nan,), dtype=(numpy.record, [('index', 'O'), ('a', '<i8'), ('b', '<f8')]), chunksize=(nan,)>
In [14]: df.to_records().dtype
Out[14]: dtype((numpy.record, [('index', 'O'), ('a', '<i8'), ('b', '<f8')]))
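One way to see why `x.T.dot(x)` can have a known `(2, 2)` shape even though `x` has an unknown number of rows: the product is a sum of per-block products, and each block contributes a `(2, 2)` term whatever its length. The sketch below illustrates that identity with plain NumPy as a stand-in. The block boundaries and the variable names are assumptions made for illustration, and this is not a description of how dask schedules the computation internally.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))

# Split the rows into blocks of uneven length (rows 0-2, 3-6, 7-9),
# standing in for partitions whose sizes are not known up front.
blocks = np.array_split(X, [3, 7])

# Each block yields a (2, 2) partial product; the row count drops out.
gram = sum(b.T.dot(b) for b in blocks)

# The summed partial products match the product on the full array.
matches_full = np.allclose(gram, X.T.dot(X))
```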
Resolved with https://github.com/moody-marlin/dask-glm/commit/b5f5033d3d2104b346869a7aa37ec8cd3608cdd1. dask Series still throw a few errors (due to shape issues in the algorithms), which will be fixed shortly.
What issues did you run into?
@mrocklin
Here, It's an edge case that can only occur whenever there's a single input variable, but my plan was just to "thicken" it up and add another dimension to
Ah, I see. Right, I just checked and it looks like
In [22]: s.values.shape
Out[22]: (nan,)
In [23]: x.shape
Out[23]: (1,)
In [24]: s.values[:, None].dot(x)
Out[24]: dask.array<sum-d05..., shape=(nan,), dtype=float64, chunksize=(nan,)>
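The shape mismatch behind the single-variable edge case can be reproduced with plain NumPy, used here only as a stand-in for the dask objects in the session above. This is a minimal sketch: the length 5 and the coefficient value are made up, and `s_values` and `x` simply mirror the names `s.values` and `x` from the transcript.

```python
import numpy as np

s_values = np.arange(5, dtype=float)  # one input variable, shape (5,)
x = np.array([2.0])                   # one coefficient, shape (1,)

# A 1-D by 1-D dot needs equal lengths, so (5,) against (1,) is rejected.
try:
    s_values.dot(x)
    failed = False
except ValueError:
    failed = True

# Adding a trailing axis "thickens" the input to shape (5, 1),
# after which the product with a (1,) vector is well defined.
thick = s_values[:, None]
pred = thick.dot(x)  # shape (5,)
```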
Nice, fixed with 2d10f73. Closing this issue!
Performance improvements; the only CI checks that were failing were flake8 - all tests passed.
Ensure the only methods needed for inputs X, y are .dot() and .T.
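As a rough illustration of what that constraint looks like in practice, the sketch below writes a gradient that touches `X` only through `.dot()` and `.T`, and only ever passes `y` as an argument to `.dot()`. It uses a plain least-squares objective and NumPy inputs as assumptions; `lstsq_gradient`, the step size, and the toy data are hypothetical and are not taken from dask-glm.

```python
import numpy as np

def lstsq_gradient(X, y, beta):
    """Gradient of 0.5 * ||X beta - y||^2.

    X is used only via .dot() and .T; y is only an argument to .dot().
    """
    return X.T.dot(X.dot(beta)) - X.T.dot(y)

# Toy data (hypothetical): y is generated exactly from true_beta.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_beta = np.array([2.0, -1.0])
y = X.dot(true_beta)

# Plain gradient descent with a fixed step size.
beta = np.zeros(2)
for _ in range(500):
    beta = beta - 0.1 * lstsq_gradient(X, y, beta)
```

Because the function never slices or indexes its inputs, any array-like object exposing `.dot()` and `.T` with NumPy-compatible semantics should be usable in place of the NumPy arrays here.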