
memory not freed reading python exported compressed feather format  #93

@JesseTsing

Description

First of all, great thanks to the authors and contributors.
I've been looking for a fast data load/save solution between Python and Julia for a long time, and I've now moved away from CSV and database formats.
I've tried a lot of options, but I finally chose Feather/Arrow; it's now the fastest format for me.

Here comes the issue, with reproducible code.

Here is the Python code creating several Feather files:

import os

import numpy as np
import pandas as pd
from pyarrow.feather import write_feather

# Write five 100000x1000 random DataFrames, once per compression codec.
for i in range(5):
    for comp_type in ['lz4', 'zstd']:
        fout = 'pydata.{}.feather.{}'.format(i + 1, comp_type)
        if os.path.isfile(fout):
            continue
        write_feather(pd.DataFrame(np.random.rand(100000, 1000)), fout, compression=comp_type)

And here is the Julia code reading them repeatedly:

using Arrow
using DataFrames
using ProgressMeter  # provides @showprogress

@showprogress 1 "read repeatedly" for ii in 1:10
    i = mod(ii, 5) + 1  # cycle through the five files
    @time df = copy(DataFrame(Arrow.Table("pydata.$i.feather.zstd"), copycols=true))
    finalize(df)
    sleep(1)
end

And here is the memory usage of the Julia process afterwards (from ps, sizes in KB), nearly up to 16 GB:

  PID      VSZ      RSS
55623 19133564 17207488
