First of all, great thanks to the authors and contributors.
I have been looking for a fast data load/save solution between Python and Julia for a long time, and have now moved away from CSV and database formats.
I tried many options, but finally chose Feather/Arrow; it is currently the fastest format for me.
Here is the issue, with code to reproduce it:
Here is the Python code that creates several Feather files:
import os

import numpy as np
import pandas as pd
from pyarrow.feather import write_feather

# Write five 100000x1000 random frames, once per compression codec.
for i in range(5):
    for comp_type in ['lz4', 'zstd']:
        fout = 'pydata.{}.feather.{}'.format(i + 1, comp_type)
        if os.path.isfile(fout):
            continue
        write_feather(pd.DataFrame(np.random.rand(100000, 1000)), fout, compression=comp_type)
And here is the Julia code that reads them repeatedly:
using Arrow
using DataFrames
using ProgressMeter

@showprogress 1 "read repeatedly" for ii in 1:10
    i = mod(ii, 5) + 1
    @time df = copy(DataFrame(Arrow.Table("pydata.$i.feather.zstd"), copycols=true))
    finalize(df)  # attempt to release the underlying buffers
    sleep(1)
end
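To test whether the memory is reclaimable at all, here is a variant of the same loop that forces a full GC sweep after each read and tracks peak RSS from inside Julia (GC.gc and Sys.maxrss are Base/stdlib calls; the rest is the loop above):

using Arrow
using DataFrames
using ProgressMeter

@showprogress 1 "read with forced GC" for ii in 1:10
    i = mod(ii, 5) + 1
    df = DataFrame(Arrow.Table("pydata.$i.feather.zstd"); copycols=true)
    df = nothing
    GC.gc(true)  # force a full collection
    println("maxrss = ", Sys.maxrss() ÷ 2^20, " MiB")  # peak resident set size so far
    sleep(1)
end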
And here is the memory usage after the loop, with RSS growing to nearly 16 GB:
| PID | VSZ (KB) | RSS (KB) |
|---|---|---|
| 55623 | 19133564 | 17207488 |