
GH-41760: [C++][Parquet] Add file metadata read/write benchmark #41761

Merged
merged 3 commits into from
May 22, 2024

Conversation

pitrou
Member

@pitrou pitrou commented May 21, 2024

Following the discussions on the Parquet mailing list (see this thread and this thread), and the various complaints about poor Parquet metadata performance on wide schemas, this adds a benchmark to measure the overhead of Parquet file metadata parsing or serialization for different numbers of row groups and columns.

Sample output:

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
WriteFileMetadataAndData/num_columns:1/num_row_groups:1           11743 ns        11741 ns        59930 data_size=54 file_size=290 items_per_second=85.1726k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:100        843137 ns       842920 ns          832 data_size=5.4k file_size=20.486k items_per_second=1.18635k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:1000      8232304 ns      8230294 ns           85 data_size=54k file_size=207.687k items_per_second=121.502/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1         101214 ns       101190 ns         6910 data_size=540 file_size=2.11k items_per_second=9.8824k/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:100      8026185 ns      8024361 ns           87 data_size=54k file_size=193.673k items_per_second=124.621/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1000    81370293 ns     81343455 ns            8 data_size=540k file_size=1.94392M items_per_second=12.2936/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1        955862 ns       955528 ns          733 data_size=5.4k file_size=20.694k items_per_second=1.04654k/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:100    80115516 ns     80086117 ns            9 data_size=540k file_size=1.94729M items_per_second=12.4866/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1000  856428565 ns    856065370 ns            1 data_size=5.4M file_size=19.7673M items_per_second=1.16814/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:1      9330003 ns      9327439 ns           75 data_size=54k file_size=211.499k items_per_second=107.211/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:100  834609159 ns    834354590 ns            1 data_size=5.4M file_size=19.9623M items_per_second=1.19853/s

ReadFileMetadata/num_columns:1/num_row_groups:1                    3824 ns         3824 ns       182381 data_size=54 file_size=290 items_per_second=261.518k/s
ReadFileMetadata/num_columns:1/num_row_groups:100                 88519 ns        88504 ns         7879 data_size=5.4k file_size=20.486k items_per_second=11.299k/s
ReadFileMetadata/num_columns:1/num_row_groups:1000               849558 ns       849391 ns          825 data_size=54k file_size=207.687k items_per_second=1.17731k/s
ReadFileMetadata/num_columns:10/num_row_groups:1                  19918 ns        19915 ns        35449 data_size=540 file_size=2.11k items_per_second=50.2138k/s
ReadFileMetadata/num_columns:10/num_row_groups:100               715822 ns       715667 ns          975 data_size=54k file_size=193.673k items_per_second=1.3973k/s
ReadFileMetadata/num_columns:10/num_row_groups:1000             7017008 ns      7015432 ns          100 data_size=540k file_size=1.94392M items_per_second=142.543/s
ReadFileMetadata/num_columns:100/num_row_groups:1                175988 ns       175944 ns         3958 data_size=5.4k file_size=20.694k items_per_second=5.68363k/s
ReadFileMetadata/num_columns:100/num_row_groups:100             6814382 ns      6812781 ns          103 data_size=540k file_size=1.94729M items_per_second=146.783/s
ReadFileMetadata/num_columns:100/num_row_groups:1000           77858645 ns     77822157 ns            9 data_size=5.4M file_size=19.7673M items_per_second=12.8498/s
ReadFileMetadata/num_columns:1000/num_row_groups:1              1670001 ns      1669563 ns          419 data_size=54k file_size=211.499k items_per_second=598.959/s
ReadFileMetadata/num_columns:1000/num_row_groups:100           77339599 ns     77292924 ns            9 data_size=5.4M file_size=19.9623M items_per_second=12.9378/s

@pitrou pitrou requested a review from wgtmac as a code owner May 21, 2024 15:27
@pitrou pitrou requested a review from mapleFU May 21, 2024 15:27
@pitrou
Member Author

pitrou commented May 21, 2024

@mapleFU @emkornfield FYI

@pitrou
Member Author

pitrou commented May 21, 2024

Benchmark results here. We see that performance is O(num_columns * num_row_groups).

-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
WriteMetadata/num_columns:1/num_row_groups:1            11493 ns        11491 ns        61331 file_size=459 items_per_second=87.0256k/s
WriteMetadata/num_columns:1/num_row_groups:100         820026 ns       819815 ns          854 file_size=37.383k items_per_second=1.21979k/s
WriteMetadata/num_columns:1/num_row_groups:1000       8024208 ns      8022519 ns           87 file_size=374.885k items_per_second=124.649/s
WriteMetadata/num_columns:10/num_row_groups:1           98586 ns        98558 ns         7083 file_size=3.762k items_per_second=10.1463k/s
WriteMetadata/num_columns:10/num_row_groups:100       7816090 ns      7814200 ns           89 file_size=358.835k items_per_second=127.972/s
WriteMetadata/num_columns:10/num_row_groups:1000     79490918 ns     79462535 ns            8 file_size=3.614M items_per_second=12.5845/s
WriteMetadata/num_columns:100/num_row_groups:1         932833 ns       932560 ns          759 file_size=37.352k items_per_second=1.07232k/s
WriteMetadata/num_columns:100/num_row_groups:100     78799934 ns     78771226 ns            9 file_size=3.61693M items_per_second=12.695/s
WriteMetadata/num_columns:100/num_row_groups:1000   857600506 ns    857330657 ns            1 file_size=36.2887M items_per_second=1.16641/s
WriteMetadata/num_columns:1000/num_row_groups:1       9051274 ns      9049407 ns           77 file_size=376.655k items_per_second=110.504/s
WriteMetadata/num_columns:1000/num_row_groups:100   827747343 ns    827468643 ns            1 file_size=36.4815M items_per_second=1.20851/s
WriteMetadata/num_columns:10000/num_row_groups:1     95165920 ns     95125167 ns            7 file_size=3.82213M items_per_second=10.5125/s
WriteMetadata/num_columns:10000/num_row_groups:100 8698273757 ns   8693696946 ns            1 file_size=369.089M items_per_second=0.115026/s

ReadMetadata/num_columns:1/num_row_groups:1              3767 ns         3766 ns       185550 file_size=459 items_per_second=265.553k/s
ReadMetadata/num_columns:1/num_row_groups:100           87250 ns        87235 ns         8004 file_size=37.383k items_per_second=11.4633k/s
ReadMetadata/num_columns:1/num_row_groups:1000         831546 ns       831380 ns          842 file_size=374.885k items_per_second=1.20282k/s
ReadMetadata/num_columns:10/num_row_groups:1            19477 ns        19474 ns        35220 file_size=3.762k items_per_second=51.3513k/s
ReadMetadata/num_columns:10/num_row_groups:100         698405 ns       698268 ns          994 file_size=358.835k items_per_second=1.43211k/s
ReadMetadata/num_columns:10/num_row_groups:1000       6841245 ns      6839685 ns          102 file_size=3.614M items_per_second=146.206/s
ReadMetadata/num_columns:100/num_row_groups:1          174932 ns       174898 ns         3979 file_size=37.352k items_per_second=5.71763k/s
ReadMetadata/num_columns:100/num_row_groups:100       6640500 ns      6638581 ns          105 file_size=3.61693M items_per_second=150.635/s
ReadMetadata/num_columns:100/num_row_groups:1000     75471970 ns     75433100 ns            9 file_size=36.2887M items_per_second=13.2568/s
ReadMetadata/num_columns:1000/num_row_groups:1        1671059 ns      1670522 ns          421 file_size=376.655k items_per_second=598.615/s
ReadMetadata/num_columns:1000/num_row_groups:100     74756894 ns     74713295 ns            9 file_size=36.4815M items_per_second=13.3845/s
ReadMetadata/num_columns:10000/num_row_groups:1      17139819 ns     17132091 ns           41 file_size=3.82213M items_per_second=58.37/s
ReadMetadata/num_columns:10000/num_row_groups:100   774363229 ns    773118912 ns            1 file_size=369.089M items_per_second=1.29346/s

Comment on lines 61 to 62
writer_properties_ =
prop_builder.version(ParquetVersion::PARQUET_2_6)->disable_dictionary()->build();
Member

Should we later add a benchmark for dropping statistics? (This is useful in some cases.)

Member

(Besides, should we later add a benchmark for byte-array statistics decoding?)

Member Author

What do you mean with "drop statistics"?

Member

(Decryption can also be added later.)

Member Author

(besides, should we later add benchmark for byte-array statistics decoding ?)

We can add benchmarks for whatever may be a performance bottleneck :-)

Member

What do you mean with "drop statistics"?

By default, Parquet enables "statistics" for each column, which adds a Statistics struct to every column chunk. We could add a benchmark for dropping the "statistics". And for ByteArray columns, these can be quite long.

Member

And we can add that in further patches; this is a good start.

Member

The stats are already encoded, so I think it is pretty trivial. But it may be different for a BYTE_ARRAY type with large strings.

That'd be valuable, but here we want to benchmark metadata cost in isolation from other factors?


It's a good idea to also try this with the statistics dropped, since these have a big impact and have been heavily discussed. That is to say, run the numbers with builder.disable_statistics(). Based on previous profiling I did in #39676, this should cut the metadata read time by about a third. Still, the main conclusion that the scaling is O(num_columns * num_row_groups) should hold. @pitrou


Also, thanks a bunch for going to the effort to create these benchmarks! I think they are very helpful for the discussion!

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 21, 2024
benchmark.ReadFile(contents);
}
state.SetItemsProcessed(state.iterations());
state.counters["file_size"] = static_cast<double>(contents->size());
Member
@mapleFU mapleFU May 21, 2024

Add metadata()->SerializeToString().size() as the metadata size here?

Member Author

What additional information would it give?

Member
@mapleFU mapleFU May 21, 2024

The file is written in different batches with multiple columns. The file_size might also change if the encoding changes. Rather than batch size, shouldn't we focus on the file metadata Thrift size / round-trip, since no "data" is being read and we only benchmark reading metadata here?

Member Author

The file is written in different batches with multiple columns.

Varying the number of columns is important here.

The file_size might also change if the encoding changes.

Did you check the data? It's tiny (one INT32 value per column chunk). The entire file will consist of metadata overhead.

Member
@mapleFU mapleFU May 21, 2024

Did you check the data? It's tiny (one INT32 value per column chunk). The entire file will consist of metadata overhead.

I'll download one. I think we'd better get the "concrete" size of the footer, but we can keep the current impl if you insist.

(Besides, the footer read defaults to 64 KiB; if the footer is greater than 64 KiB, an extra round of read-and-parse might be run here.)

Member Author
@pitrou pitrou May 21, 2024

"Metadata" is not only the footer; it's all the Thrift overhead plus other signaling: page headers, statistics, column chunk metadata...

Member

You're right

    auto reader = ParquetFileReader::Open(source, props);
    auto metadata = reader->metadata();

Anyway, does this only read from the footer?

Member

I think the concern from @mapleFU is that the write and read paths in this benchmark are not identical. On the read path, only the file footer is loaded. However, more work is done on the write path (e.g. encoding and writing page headers).

Member Author

Ah, I see, that's a good point. I can try to rewrite the write benchmark so as to only measure the metadata write part of writing a file, though that's dependent on where exactly it happens in the Parquet API calls.

Member Author

Ok, I'm not sure it makes sense, as metadata and data are quite intertwined in practice (why would we not measure page header serialization?).

The reason the benchmarks are not symmetrical is that reading and writing are not symmetrical. You have to write an entire Parquet file at once (there's no practical way to avoid that), while you can read a Parquet file piecewise or partially.

I can rename the write benchmark to something else if that's less misleading...

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 21, 2024
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels May 21, 2024
@pitrou
Member Author

pitrou commented May 22, 2024

Besides changing the benchmark names, I've fixed a bug where the benchmark was writing too many rows. Updated benchmark numbers:

------------------------------------------------------------------------------------------------------------------------
Benchmark                                                              Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------
WriteFileMetadataAndData/num_columns:1/num_row_groups:1            11753 ns        11751 ns        60000 data_size=54 file_size=290 items_per_second=85.0979k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:100         826599 ns       826369 ns          845 data_size=5.4k file_size=20.486k items_per_second=1.21011k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:1000       8125279 ns      8123486 ns           86 data_size=54k file_size=207.687k items_per_second=123.1/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1          100357 ns       100329 ns         6982 data_size=540 file_size=2.11k items_per_second=9.96719k/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:100       7913763 ns      7911870 ns           89 data_size=54k file_size=193.673k items_per_second=126.392/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1000     80283750 ns     80259596 ns            8 data_size=540k file_size=1.94392M items_per_second=12.4596/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1         937374 ns       937047 ns          745 data_size=5.4k file_size=20.694k items_per_second=1.06718k/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:100     78617577 ns     78590896 ns            9 data_size=540k file_size=1.94729M items_per_second=12.7241/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1000   847927859 ns    847245720 ns            1 data_size=5.4M file_size=19.7673M items_per_second=1.1803/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:1       9169263 ns      9167280 ns           76 data_size=54k file_size=211.499k items_per_second=109.084/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:100   826783518 ns    826541970 ns            1 data_size=5.4M file_size=19.9623M items_per_second=1.20986/s
WriteFileMetadataAndData/num_columns:10000/num_row_groups:1     97516452 ns     97475073 ns            7 data_size=540k file_size=2.15317M items_per_second=10.259/s
WriteFileMetadataAndData/num_columns:10000/num_row_groups:100 8294650136 ns   8292024551 ns            1 data_size=54M file_size=201.916M items_per_second=0.120598/s

ReadFileMetadata/num_columns:1/num_row_groups:1                     3669 ns         3668 ns       190349 data_size=54 file_size=290 items_per_second=272.625k/s
ReadFileMetadata/num_columns:1/num_row_groups:100                  84829 ns        84814 ns         8322 data_size=5.4k file_size=20.486k items_per_second=11.7905k/s
ReadFileMetadata/num_columns:1/num_row_groups:1000                814708 ns       814559 ns          858 data_size=54k file_size=207.687k items_per_second=1.22766k/s
ReadFileMetadata/num_columns:10/num_row_groups:1                   19247 ns        19243 ns        36787 data_size=540 file_size=2.11k items_per_second=51.9679k/s
ReadFileMetadata/num_columns:10/num_row_groups:100                690304 ns       690181 ns         1006 data_size=54k file_size=193.673k items_per_second=1.4489k/s
ReadFileMetadata/num_columns:10/num_row_groups:1000              6723370 ns      6721934 ns          105 data_size=540k file_size=1.94392M items_per_second=148.767/s
ReadFileMetadata/num_columns:100/num_row_groups:1                 171196 ns       171161 ns         4077 data_size=5.4k file_size=20.694k items_per_second=5.84244k/s
ReadFileMetadata/num_columns:100/num_row_groups:100              6585073 ns      6583501 ns          106 data_size=540k file_size=1.94729M items_per_second=151.895/s
ReadFileMetadata/num_columns:100/num_row_groups:1000            75581479 ns     75544415 ns            9 data_size=5.4M file_size=19.7673M items_per_second=13.2372/s
ReadFileMetadata/num_columns:1000/num_row_groups:1               1631872 ns      1631517 ns          435 data_size=54k file_size=211.499k items_per_second=612.926/s
ReadFileMetadata/num_columns:1000/num_row_groups:100            75144953 ns     75104316 ns            9 data_size=5.4M file_size=19.9623M items_per_second=13.3148/s
ReadFileMetadata/num_columns:10000/num_row_groups:1             16866177 ns     16857406 ns           41 data_size=540k file_size=2.15317M items_per_second=59.3211/s
ReadFileMetadata/num_columns:10000/num_row_groups:100          780744719 ns    779370723 ns            1 data_size=54M file_size=201.916M items_per_second=1.28309/s

@m4rs-mt

m4rs-mt commented May 22, 2024

@pitrou, thank you very much for your work on these benchmarks. They actually confirm the observation we made months ago. Great job! 👍

@pitrou
Member Author

pitrou commented May 22, 2024

I'll merge once CI is green.

@pitrou pitrou merged commit f3d4639 into apache:main May 22, 2024
32 of 33 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label May 22, 2024
@pitrou pitrou deleted the gh41760-pq-md-benchmark branch May 22, 2024 13:09

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit f3d4639.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 12 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024