[C++][Parquet] Performance reading S3 based files won't match localfilesystem even with large prebuffering. #39899

Closed
mderoy opened this issue Feb 1, 2024 · 5 comments

mderoy commented Feb 1, 2024

Describe the usage question you have. Please include as many useful details as possible.

I'm writing a simple program that uses the low-level Parquet parser APIs (parquet::ParquetFileReader). The parser calls PreBuffer with the row groups and columns I want to read (along with CacheOptions::Defaults()). I get a parquet::ColumnReader for each column, then loop through those and copy into a buffer so that the data is formed into a row-based format (as opposed to Parquet's columnar format).
In this example I'm only parsing booleans, using

bool_reader->ReadBatch(1, nullptr, nullptr, (bool*)buf, &values_read);

to write directly to buf.
My total test data is 284K and consists of Parquet files containing 3 boolean columns, so it is very simple and should be fast.
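
Condensed, the read path looks roughly like this (a sketch with error handling elided; it assumes a single boolean column at index 0 and an already-opened arrow::io::RandomAccessFile, not my exact code):

```cpp
#include <memory>
#include <numeric>
#include <vector>

#include <arrow/io/caching.h>
#include <arrow/io/interfaces.h>
#include <parquet/column_reader.h>
#include <parquet/file_reader.h>

// Read every value of boolean column 0 into `out`, row group by row group.
void ReadBoolColumn(std::shared_ptr<arrow::io::RandomAccessFile> file,
                    bool* out) {
  std::unique_ptr<parquet::ParquetFileReader> reader =
      parquet::ParquetFileReader::Open(file);
  std::shared_ptr<parquet::FileMetaData> metadata = reader->metadata();

  std::vector<int> row_groups(metadata->num_row_groups());
  std::iota(row_groups.begin(), row_groups.end(), 0);
  std::vector<int> columns = {0};

  // Kick off the coalesced range reads in the background.
  reader->PreBuffer(row_groups, columns, arrow::io::IOContext(),
                    arrow::io::CacheOptions::Defaults());

  int64_t values_read = 0;
  for (int rg : row_groups) {
    auto bool_reader = std::static_pointer_cast<parquet::BoolReader>(
        reader->RowGroup(rg)->Column(0));
    while (bool_reader->HasNext()) {
      bool_reader->ReadBatch(1, nullptr, nullptr, out++, &values_read);
    }
  }
}
```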

I find that when I benchmark my code against files on my local filesystem, it takes about 1.7 s to parse the data this way. But when I give it an S3 file handle (created from Arrow's S3 filesystem class), it takes significantly longer, EVEN WITH PREBUFFER SETTINGS.

I'm not including the prebuffer in my timings, but here is what I'm seeing:

- local filesystem: 1.7 s
- S3 without prebuffer: 8.3 s
- S3 with prebuffer: 3.9 s (prebuffer time not benchmarked)

I don't understand why parsing from the local filesystem would be faster than from S3 if I've prebuffered the data into memory (remember, I'm not including the prebuffer in these timings). I've tried changing the parquet::ReaderProperties buffer size to 20 MB (which should fit the whole file in memory), but I can't seem to match the local filesystem performance.
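
For reference, the ReaderProperties change I tried looks roughly like this (a sketch; `file` is the input handle from the snippet above):

```cpp
// Sketch of the ReaderProperties tweak mentioned above: enable a buffered
// input stream with a 20 MB buffer, which should hold the whole file.
parquet::ReaderProperties props(arrow::default_memory_pool());
props.enable_buffered_stream();
props.set_buffer_size(20 * 1024 * 1024);  // 20 MB
auto reader = parquet::ParquetFileReader::Open(file, props);
```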

Looking for some guidance... I'd really like to be as close to the local filesystem performance as possible. I want to avoid downloading the whole file, but I want to be able to read these prebuffered sections of the file efficiently.

Component(s)

C++, Parquet

mderoy commented Feb 1, 2024

One add-on comment: I still find the local filesystem performance to be pretty slow for such a small amount of data. I see other tickets referencing rates of 150 MB/s to 1 GB/s (#38389) and I'm orders of magnitude away from that :(

mderoy commented Feb 2, 2024

I found that I could add a call to WhenBuffered to wait for the data to fully buffer, and now I match the performance of the local filesystem! I'd still welcome advice from anyone who can tell me how to increase the performance even further 👍
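
For anyone who hits this later, the fix looks roughly like this (a sketch reusing the `reader`, `row_groups`, and `columns` names from the snippet in the issue description):

```cpp
// Issue the coalesced reads, then block until the requested ranges are
// actually resident in memory before creating any ColumnReaders.
reader->PreBuffer(row_groups, columns, arrow::io::IOContext(),
                  arrow::io::CacheOptions::Defaults());
reader->WhenBuffered(row_groups, columns).Wait();
// From here on, ReadBatch calls are served from the in-memory cache.
```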

mapleFU commented Feb 2, 2024

> I found that I could add a call to WhenBuffered to wait for the data to fully buffer, and now I match the performance of the local filesystem!

First, I think bool_reader->ReadBatch is a bit dangerous for nullable values. This is unrelated to this issue, but I think the caller should understand the concepts of rep-levels and def-levels when using the ColumnReader API.
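
For illustration, null-aware reading of a flat nullable column looks roughly like this (a sketch, assuming `bool_reader` as in the issue description and max_repetition_level == 0):

```cpp
// Pass a def_levels buffer to ReadBatch and compare each level against the
// column's max definition level to detect nulls.
const int16_t max_def = bool_reader->descr()->max_definition_level();

constexpr int64_t kBatch = 1024;
int16_t def_levels[kBatch];
bool values[kBatch];
while (bool_reader->HasNext()) {
  int64_t values_read = 0;
  int64_t levels_read = bool_reader->ReadBatch(
      kBatch, def_levels, /*rep_levels=*/nullptr, values, &values_read);
  // `values` holds only the `values_read` non-null entries, densely packed;
  // def_levels[i] < max_def marks level i as a null row.
  for (int64_t i = 0, v = 0; i < levels_read; ++i) {
    if (def_levels[i] < max_def) {
      // null row
    } else {
      bool value = values[v++];
      (void)value;  // consume the row's value here
    }
  }
}
```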

3 s is quite slow; would you mind telling us the I/O pattern you're using? The best pattern is to send all the I/O requests up front (if memory is sufficient), wait for them to finish, and then read the file (or split the requests by row group).

mderoy commented Feb 2, 2024

> First, I think bool_reader->ReadBatch is a bit dangerous for nullable values.

I made the assumption that if values_read == 0 then I've processed a null value for that batch, but I will look into those rep-level and def-level concepts you mention. I've not really tested with nulls yet. I'm not dealing with any complex types like struct/list/map in my parser, mostly the simple primitive types.

> 3 s is quite slow; would you mind telling us the I/O pattern you're using? The best pattern is to send all the I/O requests up front (if memory is sufficient), wait for them to finish, and then read the file (or split the requests by row group).

I got the best performance (the same as a local file) when I prebuffered all the row groups and columns I wanted to read and then called WhenBuffered. We have a good amount of memory available to us. Splitting the request by row group would certainly help control memory, provided the writer of the file did not make the row groups too large; a sketch of that variant is below. In my use case I have many processes each processing their own files, so I do not want to parallelize reading each column with an individual thread. I want one CPU thread to process the parsing of that one file (I know the prebuffering happens on background threads, but ideally this would be done serially as well).
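
The row-group-splitting variant would look roughly like this (a sketch, not my actual code; `reader`, `metadata`, and `columns` as in the earlier snippets):

```cpp
// Prebuffer and drain one row group at a time to bound peak memory,
// still parsing on a single thread.
for (int rg = 0; rg < metadata->num_row_groups(); ++rg) {
  std::vector<int> one_group = {rg};
  reader->PreBuffer(one_group, columns, arrow::io::IOContext(),
                    arrow::io::CacheOptions::Defaults());
  reader->WhenBuffered(one_group, columns).Wait();
  // ... ReadBatch through the columns of row group `rg` here ...
}
```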

@kou kou changed the title Performance reading S3 based files won't match localfilesystem even with large prebuffering. [C++][Parquet] Performance reading S3 based files won't match localfilesystem even with large prebuffering. Feb 2, 2024
mderoy commented Feb 6, 2024

I'm going to close this, as I was able to get equivalent performance in the parsing once I called WhenBuffered.

mderoy closed this as completed Feb 6, 2024