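Resetting the index works as expected on DataFrames built from plain pandas DataFrames, but with the following code `reset_index()` produces the same index as the original one: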
import dask
import dask.dataframe as dd
import numpy as np
import pandas as pd

foo = lambda a, m: pd.DataFrame(np.array([[a**m, a**(2*m), a**(3*m)]] * 3), columns=['m', '2m', '3m'])
tasks = [dask.delayed(foo)(2, m) for m in range(3)]
ddf = dd.from_delayed(tasks)
ddf = ddf.reset_index()  # drop=True does not help either; drop=False keeps the old index visible in the result
ddf.compute()
yields:
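|   | index | m | 2m | 3m |
|---|-------|---|----|----|
| 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 0 | 0 | 2 | 4 | 8 |
| 1 | 1 | 2 | 4 | 8 |
| 2 | 2 | 2 | 4 | 8 |
| 0 | 0 | 4 | 16 | 64 |
| 1 | 1 | 4 | 16 | 64 |
| 2 | 2 | 4 | 16 | 64 |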
The same happens with `read_csv` when reading from multiple files:
import dask.dataframe as dd

fpaths: list = [...]
ddf = dd.read_csv(fpaths)
ddf = ddf.reset_index(drop=True)  # also no effect
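The `read_csv` case can be mimicked in plain pandas. This is only an illustration, not dask's code: two in-memory CSV strings (hypothetical data) stand in for the files in `fpaths`, and each is read separately, roughly the way each file becomes at least one dask partition. Every piece gets its own index starting at 0, so simply stacking them repeats labels; pandas can renumber globally with `ignore_index=True` because it has all the pieces in memory.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for the paths in `fpaths`.
csv_texts = [
    "x,y\n1,a\n2,b\n3,c\n",
    "x,y\n4,d\n5,e\n",
]

# Read each "file" on its own, roughly as each file becomes its own partition.
parts = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Concatenating keeps each piece's own 0-based index, so labels repeat.
stacked = pd.concat(parts)
print(stacked.index.tolist())  # [0, 1, 2, 0, 1]

# Renumbering after concatenation gives one global 0..n-1 index.
renumbered = pd.concat(parts, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```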
CermakM changed the title from "DataFrame reset_index() not working with DataFrames created with from_delayed method" to "DataFrame reset_index() not working with DataFrames created with from_delayed or read_csv methods" on Jul 23, 2018
@CermakM I think this is working correctly. From the docstring:
> Note that unlike in pandas, the reset dask.dataframe index will
> not be monotonically increasing from 0. Instead, it will restart at 0
> for each partition (e.g. index1 = [0, ..., 10], index2 = [0, ...]).
> This is due to the inability to statically know the full length of the
> index.
`ddf` has three partitions, so `reset_index` builds a separate `[0, 1, 2]` index for each of them.
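A plain-pandas sketch of that explanation, under the assumption that a list of pandas DataFrames is a fair stand-in for dask partitions. The three frames below are built exactly like `foo(2, m)` in the original example. Resetting each one independently reproduces the repeated `0, 1, 2` index column from the output above. If the partition lengths are known, adding each partition's starting offset (the running total of the lengths before it) yields one increasing index. The helper `global_reset_index` is hypothetical and is not part of pandas or dask.

```python
import numpy as np
import pandas as pd


def foo(a, m):
    # Same construction as in the issue: three identical rows per partition.
    return pd.DataFrame(
        np.array([[a**m, a**(2 * m), a**(3 * m)]] * 3),
        columns=["m", "2m", "3m"],
    )


# Stand-ins for the three dask partitions.
partitions = [foo(2, m) for m in range(3)]

# Per-partition reset, as the docstring describes: each index restarts at 0.
per_partition = pd.concat([p.reset_index() for p in partitions])
print(per_partition["index"].tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]


def global_reset_index(parts):
    """Hypothetical helper: renumber rows across partitions using their lengths."""
    lengths = [len(p) for p in parts]
    # Offset of each partition = total length of all partitions before it.
    offsets = np.cumsum([0] + lengths[:-1])
    renumbered = []
    for part, offset in zip(parts, offsets):
        part = part.reset_index(drop=True)
        part.index = part.index + int(offset)
        renumbered.append(part)
    return pd.concat(renumbered)


print(global_reset_index(partitions).index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The key input is `lengths`. In dask those are not known until the data is computed (one possible way is `ddf.map_partitions(len).compute()`), which is the extra pass the docstring says `reset_index` avoids.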
I'm going to close this for now since it's quite old, but feel free to reopen if you are still having trouble.