[C++][Parquet] Performance reading S3 based files won't match localfilesystem even with large prebuffering. #39899

Closed
mderoy opened this issue Feb 1, 2024 · 5 comments

mderoy commented Feb 1, 2024

Describe the usage question you have. Please include as many useful details as possible.

I'm writing a simple program that uses the low-level Parquet parser APIs (parquet::ParquetFileReader). The parser calls PreBuffer with the row groups and columns I want to read (along with CacheOptions::Defaults()). I get a parquet::ColumnReader for each column, then loop through those and copy into a buffer so that the data is formed into a row-based format (as opposed to Parquet's columnar format).
In this example I'm only parsing booleans, using

bool_reader->ReadBatch(1, nullptr, nullptr, (bool*)buf, &values_read);

to write directly to buf.
My total test data is 284K and consists of Parquet files containing 3 boolean columns, so it is very simple and should be fast.
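
Condensed, the read path looks roughly like this (a sketch with error handling elided; it assumes a single boolean column at index 0 and an already-opened arrow::io::RandomAccessFile, not my exact code):

```cpp
#include <memory>
#include <numeric>
#include <vector>

#include <arrow/io/caching.h>
#include <arrow/io/interfaces.h>
#include <parquet/column_reader.h>
#include <parquet/file_reader.h>

// Read every value of boolean column 0 into `out`, row group by row group.
void ReadBoolColumn(std::shared_ptr<arrow::io::RandomAccessFile> file,
                    bool* out) {
  std::unique_ptr<parquet::ParquetFileReader> reader =
      parquet::ParquetFileReader::Open(file);
  std::shared_ptr<parquet::FileMetaData> metadata = reader->metadata();

  std::vector<int> row_groups(metadata->num_row_groups());
  std::iota(row_groups.begin(), row_groups.end(), 0);
  std::vector<int> columns = {0};

  // Kick off the coalesced range reads in the background.
  reader->PreBuffer(row_groups, columns, arrow::io::IOContext(),
                    arrow::io::CacheOptions::Defaults());

  int64_t values_read = 0;
  for (int rg : row_groups) {
    auto bool_reader = std::static_pointer_cast<parquet::BoolReader>(
        reader->RowGroup(rg)->Column(0));
    while (bool_reader->HasNext()) {
      bool_reader->ReadBatch(1, nullptr, nullptr, out++, &values_read);
    }
  }
}
```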

I find that when I benchmark my code against files on my local filesystem, it takes about 1.7 s to parse the data this way. But when I give it an S3 file handle (created from Arrow's S3 filesystem class), it takes significantly longer, EVEN WITH PREBUFFER SETTINGS.

I'm not including the prebuffer in my timings, but here is what I'm seeing:

- local filesystem: 1.7 s
- S3 without prebuffer: 8.3 s
- S3 with prebuffer: 3.9 s (prebuffer time not benchmarked)

I don't understand why parsing from the local filesystem would be faster than from S3 if I've prebuffered the data into memory (remember, I'm not including the prebuffer in these timings). I've tried changing the parquet::ReaderProperties buffer size to 20 MB (which should fit the whole file in memory), but I can't seem to match the local filesystem performance.
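
For reference, the ReaderProperties change I tried looks roughly like this (a sketch; `file` is the input handle from the snippet above):

```cpp
// Sketch of the ReaderProperties tweak mentioned above: enable a buffered
// input stream with a 20 MB buffer, which should hold the whole file.
parquet::ReaderProperties props(arrow::default_memory_pool());
props.enable_buffered_stream();
props.set_buffer_size(20 * 1024 * 1024);  // 20 MB
auto reader = parquet::ParquetFileReader::Open(file, props);
```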

Looking for some guidance... I'd really like to be as close to the local filesystem performance as possible. I want to avoid downloading the whole file, but I want to be able to read these prebuffered sections of the file efficiently.

Component(s)

C++, Parquet

mderoy commented Feb 1, 2024

One add-on comment: I still find the local filesystem performance to be pretty slow for such a small amount of data. I see other tickets referencing rates of 150 MB/s to 1 GB/s (#38389) and I'm orders of magnitude away from that :(

mderoy commented Feb 2, 2024

I found that I could add a call to WhenBuffered to wait for the data to fully buffer, and now I match the performance of the local filesystem! I'd still welcome advice from anyone who can tell me how to increase the performance even further 👍
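
For anyone who hits this later, the fix looks roughly like this (a sketch reusing the `reader`, `row_groups`, and `columns` names from the snippet in the issue description):

```cpp
// Issue the coalesced reads, then block until the requested ranges are
// actually resident in memory before creating any ColumnReaders.
reader->PreBuffer(row_groups, columns, arrow::io::IOContext(),
                  arrow::io::CacheOptions::Defaults());
reader->WhenBuffered(row_groups, columns).Wait();
// From here on, ReadBatch calls are served from the in-memory cache.
```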

mapleFU commented Feb 2, 2024

> I found that I could add a call to WhenBuffered to wait for the data to fully buffer, and now I match the performance of the local filesystem!

First, I think bool_reader->ReadBatch is a bit dangerous for nullable values. This is unrelated to this issue, but I think the caller should understand the concepts of rep-levels and def-levels when using the ColumnReader API.
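
For illustration, null-aware reading of a flat nullable column looks roughly like this (a sketch, assuming `bool_reader` as in the issue description and max_repetition_level == 0):

```cpp
// Pass a def_levels buffer to ReadBatch and compare each level against the
// column's max definition level to detect nulls.
const int16_t max_def = bool_reader->descr()->max_definition_level();

constexpr int64_t kBatch = 1024;
int16_t def_levels[kBatch];
bool values[kBatch];
while (bool_reader->HasNext()) {
  int64_t values_read = 0;
  int64_t levels_read = bool_reader->ReadBatch(
      kBatch, def_levels, /*rep_levels=*/nullptr, values, &values_read);
  // `values` holds only the `values_read` non-null entries, densely packed;
  // def_levels[i] < max_def marks level i as a null row.
  for (int64_t i = 0, v = 0; i < levels_read; ++i) {
    if (def_levels[i] < max_def) {
      // null row
    } else {
      bool value = values[v++];
      (void)value;  // consume the row's value here
    }
  }
}
```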

3 s is quite slow; would you mind telling us the I/O pattern you're using? The best pattern is to send all the I/O requests up front (if memory is sufficient), wait for them to finish, and then read the file (or split the requests by row group).

mderoy commented Feb 2, 2024

> First, I think bool_reader->ReadBatch is a bit dangerous for nullable values.

I made the assumption that if values_read == 0 then I've processed a null value for that batch, but I will look into those rep-level and def-level concepts you mention. I've not really tested with nulls yet. I'm not dealing with any complex types like struct/list/map in my parser, mostly the simple primitive types.

> 3 s is quite slow; would you mind telling us the I/O pattern you're using? The best pattern is to send all the I/O requests up front (if memory is sufficient), wait for them to finish, and then read the file (or split the requests by row group).

I got the best performance (the same as a local file) when I prebuffered all the row groups and columns I wanted to read and then called WhenBuffered. We have a good amount of memory available to us. Splitting the request by row group would certainly help control memory, provided the writer of the file did not make the row groups too large; a sketch of that variant is below. In my use case I have many processes each processing their own files, so I do not want to parallelize reading each column with an individual thread. I want one CPU thread to process the parsing of that one file (I know the prebuffering happens on background threads, but ideally this would be done serially as well).
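
The row-group-splitting variant would look roughly like this (a sketch, not my actual code; `reader`, `metadata`, and `columns` as in the earlier snippets):

```cpp
// Prebuffer and drain one row group at a time to bound peak memory,
// still parsing on a single thread.
for (int rg = 0; rg < metadata->num_row_groups(); ++rg) {
  std::vector<int> one_group = {rg};
  reader->PreBuffer(one_group, columns, arrow::io::IOContext(),
                    arrow::io::CacheOptions::Defaults());
  reader->WhenBuffered(one_group, columns).Wait();
  // ... ReadBatch through the columns of row group `rg` here ...
}
```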

@kou kou changed the title Performance reading S3 based files won't match localfilesystem even with large prebuffering. [C++][Parquet] Performance reading S3 based files won't match localfilesystem even with large prebuffering. Feb 2, 2024
mderoy commented Feb 6, 2024

I'm going to close this, as I was able to get equivalent performance in the parsing once I called WhenBuffered.

mderoy closed this as completed Feb 6, 2024