Memory Consumption of CSV.Rows with ZipFile #997

@thedacheng

Hi, I am trying to use CSV.Rows to iterate through a zipped text file that is about 3.4GB uncompressed. CSV.Rows seems to hold on to a large chunk of memory (about double the size of the file), which defeats the purpose of iterating row by row.
If I load the unzipped text file instead, I don't see this problem. It seems that this line of code in the buffer_to_tempfile function in utils.jl allocates memory that isn't freed. I tried setting stream and output to nothing there, and the program then holds memory of about the size of the file (instead of double).

I am not sure whether this is a problem with how I am using ZipFile or CSV.

using CSV, ZipFile

z = ZipFile.Reader("test.zip")
r = z.files[1]  # first entry in the archive
a = CSV.Rows(r; types=[Int64, String, Float64, Float64, Int8, Float64, Float64, Int64])
for c in a
    # custom aggregation code on c
end
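Since the plain unzipped file does not show the problem, one possible workaround is to copy the zip entry to a temporary file myself in small chunks and pass the file path to CSV.Rows. The sketch below is only an assumption about what might help, not a confirmed fix: `extract_to_tempfile` and its `chunk_size` keyword are hypothetical names I made up, not part of CSV.jl or ZipFile.jl, and I have not measured whether it actually keeps memory low.

```julia
using CSV, ZipFile

# Hypothetical helper (not part of CSV.jl or ZipFile.jl): copy one zip entry
# to a temporary file in fixed-size chunks, so the whole decompressed file
# is never held in memory by this step.
function extract_to_tempfile(entry::IO; chunk_size::Integer = 2^20)
    path, out = mktemp()
    buf = Vector{UInt8}(undef, chunk_size)
    try
        while !eof(entry)
            n = readbytes!(entry, buf)   # reads at most length(buf) bytes
            write(out, view(buf, 1:n))   # write only the bytes actually read
        end
    finally
        close(out)
    end
    return path
end

z = ZipFile.Reader("test.zip")
path = extract_to_tempfile(z.files[1])
close(z)

rows = CSV.Rows(path; types=[Int64, String, Float64, Float64, Int8, Float64, Float64, Int64])
for row in rows
    # custom aggregation code on row
end

# Clean up the temporary file; this may fail on some platforms while the
# file is still memory-mapped by CSV.
rm(path)
```

This costs extra disk space equal to the uncompressed file, so it is a workaround rather than a fix for the memory held when reading from the ZipFile stream directly.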
