Description
When I use the HDFS interface under the arrow/io folder to open and read Parquet files, it leads to a memory leak. Some example code below:
std::shared_ptr<arrow::io::HadoopFileSystem> hdfs = nullptr;
... // set the hdfs conf variable;
msg = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
... // check whether the HDFS connection succeeded;
std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
... // set the file_path variable;
arrow::Status ret = hdfs->OpenReadable(file_path, &file_reader);
... // check whether the open succeeded;
std::unique_ptr<parquet::ParquetFileReader> parquet_reader;
parquet_reader = parquet::ParquetFileReader::Open(file_reader);
... // read data through parquet_reader;
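For reference, here is a minimal, self-contained sketch of the same read path, with the Parquet reader destroyed first and the file handle and HDFS connection released explicitly afterwards. The namenode host, port, user, and file path are placeholders, not values from my environment.

// Self-contained sketch of the HDFS read path above, with explicit cleanup.
// The namenode host/port, user, and path below are placeholders.
#include <iostream>
#include <memory>

#include "arrow/io/hdfs.h"
#include "arrow/status.h"
#include "parquet/file_reader.h"

int main() {
  arrow::io::HdfsConnectionConfig conf;
  conf.host = "namenode";   // placeholder namenode host
  conf.port = 8020;         // placeholder namenode port
  conf.user = "hdfs";       // placeholder user name

  std::shared_ptr<arrow::io::HadoopFileSystem> hdfs;
  arrow::Status st = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
  if (!st.ok()) {
    std::cerr << "Connect failed: " << st.ToString() << std::endl;
    return 1;
  }

  std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
  st = hdfs->OpenReadable("/tmp/example.parquet", &file_reader);  // placeholder path
  if (!st.ok()) {
    std::cerr << "OpenReadable failed: " << st.ToString() << std::endl;
    return 1;
  }

  {
    // Scope the Parquet reader so it is destroyed before the file is closed.
    std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
        parquet::ParquetFileReader::Open(file_reader);
    std::cout << "row groups: " << parquet_reader->metadata()->num_row_groups()
              << std::endl;
  }

  // Release the file handle and the HDFS connection explicitly.
  st = file_reader->Close();
  if (!st.ok()) {
    std::cerr << "Close failed: " << st.ToString() << std::endl;
  }
  st = hdfs->Disconnect();
  if (!st.ok()) {
    std::cerr << "Disconnect failed: " << st.ToString() << std::endl;
  }
  return 0;
}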
For comparison, when I read the same Parquet file from a local path with the OpenFile API,

static std::unique_ptr<ParquetFileReader> OpenFile(
    const std::string& path, bool memory_map = true,
    const ReaderProperties& props = default_reader_properties(),
    std::shared_ptr<FileMetaData> metadata = NULLPTR);

it works fine and does not leak memory.
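For completeness, this is roughly how I call the local-path API (the file path is a placeholder):

// Sketch of the local-path read using parquet::ParquetFileReader::OpenFile.
// "/tmp/example.parquet" is a placeholder path.
#include <iostream>
#include <memory>

#include "parquet/exception.h"
#include "parquet/file_reader.h"

int main() {
  try {
    std::unique_ptr<parquet::ParquetFileReader> reader =
        parquet::ParquetFileReader::OpenFile("/tmp/example.parquet",
                                             /*memory_map=*/true);
    std::cout << "rows: " << reader->metadata()->num_rows() << std::endl;
  } catch (const parquet::ParquetException& e) {
    // OpenFile throws ParquetException if the file cannot be opened or parsed.
    std::cerr << "read failed: " << e.what() << std::endl;
    return 1;
  }
  return 0;
}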
Am I using the HDFS API incorrectly, or is this a known issue?
Please reply if you have any questions. Thanks.
Environment: linux
Reporter: yzr
Note: This issue was originally created as ARROW-10650. Please see the migration documentation for further details.