
[C++][Parquet][Hadoop] Memory leak when reading Parquet file from Hadoop #26606

@asfimport

Description


When I use the HDFS interface under the arrow/io folder to access and read Parquet files, it leads to a memory leak. Some example code is below:

 

std::shared_ptr<arrow::io::HadoopFileSystem> hdfs = nullptr;
... // set the hdfs conf var
msg = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
... // check whether the connection to hdfs succeeded
std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
... // set the file_path var
arrow::Status ret = hdfs->OpenReadable(file_path, &file_reader);
... // check whether the open succeeded
std::unique_ptr<parquet::ParquetFileReader> parquet_reader;
parquet_reader = parquet::ParquetFileReader::Open(file_reader);
... // read the data through parquet_reader

 
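For completeness, here is a self-contained sketch of the read path described above, with explicit cleanup added for illustration; the connection settings and file path are placeholders, and this assumes the legacy arrow::io::HadoopFileSystem API:

#include <arrow/io/hdfs.h>
#include <arrow/status.h>
#include <parquet/file_reader.h>

#include <iostream>
#include <memory>

int main() {
  // Placeholder connection settings; in practice these come from configuration.
  arrow::io::HdfsConnectionConfig conf;
  conf.host = "default";
  conf.port = 0;
  conf.user = "hdfs";

  std::shared_ptr<arrow::io::HadoopFileSystem> hdfs;
  arrow::Status st = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
  if (!st.ok()) {
    std::cerr << "Connect failed: " << st.ToString() << std::endl;
    return 1;
  }

  std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
  st = hdfs->OpenReadable("/path/to/file.parquet", &file_reader);  // placeholder path
  if (!st.ok()) {
    std::cerr << "OpenReadable failed: " << st.ToString() << std::endl;
    return 1;
  }

  {
    // The ParquetFileReader holds a reference to file_reader while in scope.
    std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
        parquet::ParquetFileReader::Open(file_reader);
    std::cout << "rows: " << parquet_reader->metadata()->num_rows() << std::endl;
  }  // parquet_reader is destroyed here

  // Explicit cleanup, added for illustration.
  st = file_reader->Close();
  st = hdfs->Disconnect();
  return 0;
}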

I also checked reading a Parquet file from a local path with the OpenFile API:

static std::unique_ptr<ParquetFileReader> OpenFile(
    const std::string& path, bool memory_map = true,
    const ReaderProperties& props = default_reader_properties(),
    std::shared_ptr<FileMetaData> metadata = NULLPTR);

When I use this OpenFile API, everything is OK and there is no memory leak.
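For comparison, a minimal sketch of the local read via OpenFile (the path is a placeholder):

#include <parquet/file_reader.h>

#include <iostream>
#include <memory>

int main() {
  // Local file read through the OpenFile API; this variant does not leak for me.
  std::unique_ptr<parquet::ParquetFileReader> reader =
      parquet::ParquetFileReader::OpenFile("/local/path/file.parquet",
                                           /*memory_map=*/true);
  std::cout << "row groups: " << reader->metadata()->num_row_groups()
            << std::endl;
  return 0;
}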

 

Is there any wrong step in how I use the HDFS API, or does an issue exist?

If you have any questions, please reply to me. Thanks.

Environment: linux
Reporter: yzr

Note: This issue was originally created as ARROW-10650. Please see the migration documentation for further details.
