@alamb alamb commented Nov 19, 2025

Which issue does this PR close?

Rationale for this change

Get the latest and greatest code from Arrow.

What changes are included in this PR?

  1. Update to Arrow 57.1.0
  2. Update for API changes (comments inline)

TODO:

  • Add parquet setting to control the filter representation (to allow backwards compat)
  • Add a note to the upgrade guide about the force_filter_selections setting

Are these changes tested?

Yes, by CI

Are there any user-facing changes?

No

| alltypes_plain.parquet | 1851 | 6957 | 2 | page_index=false |
| alltypes_tiny_pages.parquet | 454233 | 267014 | 2 | page_index=true |
| lz4_raw_compressed_larger.parquet | 380836 | 996 | 2 | page_index=false |
| alltypes_plain.parquet | 1851 | 8882 | 2 | page_index=false |

The metadata didn't actually get bigger; we now just include the encryption information (better reporting).

Actually I looked into it more and I think the size growth is a bug. See

alamb commented Nov 20, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/upgrade_arrow_57.1.0 (840487e) to 6d9ab45 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb alamb mentioned this pull request Nov 20, 2025
13 tasks
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 20, 2025

alamb commented Nov 20, 2025

🤖: Benchmark completed

Comparing HEAD and alamb_upgrade_arrow_57.1.0
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_upgrade_arrow_57.1 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  2669.04 ms │               2708.77 ms │    no change │
│ QQuery 1     │  1235.58 ms │               1311.33 ms │ 1.06x slower │
│ QQuery 2     │  2388.97 ms │               2475.47 ms │    no change │
│ QQuery 3     │  1206.27 ms │               1200.33 ms │    no change │
│ QQuery 4     │  2326.86 ms │               2244.02 ms │    no change │
│ QQuery 5     │ 28556.87 ms │              28558.86 ms │    no change │
│ QQuery 6     │  4095.23 ms │               3958.01 ms │    no change │
│ QQuery 7     │  3903.82 ms │               3868.19 ms │    no change │
└──────────────┴─────────────┴──────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 46382.65ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 46324.97ms │
│ Average Time (HEAD)                     │  5797.83ms │
│ Average Time (alamb_upgrade_arrow_57.1) │  5790.62ms │
│ Queries Faster                          │          0 │
│ Queries Slower                          │          1 │
│ Queries with No Change                  │          7 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_upgrade_arrow_57.1 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.11 ms │                  2.51 ms │  1.19x slower │
│ QQuery 1     │    49.63 ms │                 49.65 ms │     no change │
│ QQuery 2     │   137.94 ms │                134.20 ms │     no change │
│ QQuery 3     │   163.23 ms │                154.87 ms │ +1.05x faster │
│ QQuery 4     │  1087.29 ms │               1111.76 ms │     no change │
│ QQuery 5     │  1490.99 ms │               1535.73 ms │     no change │
│ QQuery 6     │     2.22 ms │                  2.19 ms │     no change │
│ QQuery 7     │    54.53 ms │                 54.27 ms │     no change │
│ QQuery 8     │  1489.20 ms │               1499.13 ms │     no change │
│ QQuery 9     │  1864.60 ms │               1878.47 ms │     no change │
│ QQuery 10    │   375.25 ms │                386.99 ms │     no change │
│ QQuery 11    │   428.15 ms │                440.85 ms │     no change │
│ QQuery 12    │  1369.91 ms │               1379.43 ms │     no change │
│ QQuery 13    │  2122.32 ms │               2132.63 ms │     no change │
│ QQuery 14    │  1291.97 ms │               1313.81 ms │     no change │
│ QQuery 15    │  1261.86 ms │               1267.39 ms │     no change │
│ QQuery 16    │  2719.83 ms │               2737.71 ms │     no change │
│ QQuery 17    │  2710.92 ms │               2742.06 ms │     no change │
│ QQuery 18    │  5919.25 ms │               5077.25 ms │ +1.17x faster │
│ QQuery 19    │   126.87 ms │                120.96 ms │     no change │
│ QQuery 20    │  2104.85 ms │               1933.69 ms │ +1.09x faster │
│ QQuery 21    │  2406.78 ms │               2211.99 ms │ +1.09x faster │
│ QQuery 22    │  4076.54 ms │               3818.94 ms │ +1.07x faster │
│ QQuery 23    │ 12929.23 ms │              12607.37 ms │     no change │
│ QQuery 24    │   211.40 ms │                207.20 ms │     no change │
│ QQuery 25    │   484.28 ms │                477.07 ms │     no change │
│ QQuery 26    │   222.10 ms │                205.52 ms │ +1.08x faster │
│ QQuery 27    │  2839.44 ms │               2746.85 ms │     no change │
│ QQuery 28    │ 23650.44 ms │              23486.62 ms │     no change │
│ QQuery 29    │   970.57 ms │                986.64 ms │     no change │
│ QQuery 30    │  1358.29 ms │               1355.65 ms │     no change │
│ QQuery 31    │  1399.94 ms │               1375.33 ms │     no change │
│ QQuery 32    │  5469.96 ms │               4955.15 ms │ +1.10x faster │
│ QQuery 33    │  6330.13 ms │               5881.95 ms │ +1.08x faster │
│ QQuery 34    │  6616.67 ms │               6409.89 ms │     no change │
│ QQuery 35    │  2074.80 ms │               2052.74 ms │     no change │
│ QQuery 36    │   119.25 ms │                116.62 ms │     no change │
│ QQuery 37    │    51.77 ms │                 51.71 ms │     no change │
│ QQuery 38    │   119.58 ms │                115.50 ms │     no change │
│ QQuery 39    │   199.66 ms │                189.24 ms │ +1.06x faster │
│ QQuery 40    │    44.56 ms │                 42.01 ms │ +1.06x faster │
│ QQuery 41    │    38.38 ms │                 38.30 ms │     no change │
│ QQuery 42    │    31.94 ms │                 31.93 ms │     no change │
└──────────────┴─────────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 98418.61ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 95319.78ms │
│ Average Time (HEAD)                     │  2288.80ms │
│ Average Time (alamb_upgrade_arrow_57.1) │  2216.74ms │
│ Queries Faster                          │         10 │
│ Queries Slower                          │          1 │
│ Queries with No Change                  │         32 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ alamb_upgrade_arrow_57.1 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 139.21 ms │                130.93 ms │ +1.06x faster │
│ QQuery 2     │  29.26 ms │                 29.13 ms │     no change │
│ QQuery 3     │  38.71 ms │                 34.79 ms │ +1.11x faster │
│ QQuery 4     │  29.41 ms │                 28.65 ms │     no change │
│ QQuery 5     │  87.42 ms │                 88.35 ms │     no change │
│ QQuery 6     │  19.55 ms │                 20.18 ms │     no change │
│ QQuery 7     │ 228.31 ms │                227.00 ms │     no change │
│ QQuery 8     │  34.20 ms │                 32.54 ms │     no change │
│ QQuery 9     │  97.79 ms │                110.99 ms │  1.14x slower │
│ QQuery 10    │  64.27 ms │                 63.16 ms │     no change │
│ QQuery 11    │  17.23 ms │                 17.26 ms │     no change │
│ QQuery 12    │  52.89 ms │                 51.90 ms │     no change │
│ QQuery 13    │  46.75 ms │                 46.20 ms │     no change │
│ QQuery 14    │  14.19 ms │                 13.74 ms │     no change │
│ QQuery 15    │  25.06 ms │                 24.65 ms │     no change │
│ QQuery 16    │  25.08 ms │                 25.22 ms │     no change │
│ QQuery 17    │ 147.82 ms │                153.69 ms │     no change │
│ QQuery 18    │ 307.83 ms │                284.87 ms │ +1.08x faster │
│ QQuery 19    │  37.51 ms │                 38.95 ms │     no change │
│ QQuery 20    │  49.58 ms │                 49.73 ms │     no change │
│ QQuery 21    │ 334.74 ms │                321.67 ms │     no change │
│ QQuery 22    │  20.67 ms │                 20.62 ms │     no change │
└──────────────┴───────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 1847.47ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 1814.23ms │
│ Average Time (HEAD)                     │   83.98ms │
│ Average Time (alamb_upgrade_arrow_57.1) │   82.46ms │
│ Queries Faster                          │         3 │
│ Queries Slower                          │         1 │
│ Queries with No Change                  │        18 │
│ Queries with Failure                    │         0 │
└─────────────────────────────────────────┴───────────┘
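
The Change column in the tables above looks consistent with a noise threshold of about 5% on the ratio of branch time to HEAD time. The following is a minimal sketch of that labelling, assuming the 5% threshold (inferred from the rows, not confirmed against the benchmark script); `classify_change` is a hypothetical name used only for illustration.

```python
def classify_change(baseline_ms: float, comparison_ms: float,
                    noise_threshold: float = 0.05) -> str:
    """Label a (HEAD, branch) timing pair in the style of the Change column.

    Assumption: differences within +/- noise_threshold count as "no change".
    """
    ratio = comparison_ms / baseline_ms
    if (1.0 - noise_threshold) <= ratio <= (1.0 + noise_threshold):
        return "no change"
    if ratio < 1.0:
        # Branch is quicker: report how many times faster it ran.
        return f"+{1.0 / ratio:.2f}x faster"
    # Branch is slower: report the slowdown factor directly.
    return f"{ratio:.2f}x slower"


# Rows taken from the tables above (HEAD ms, branch ms).
print(classify_change(2669.04, 2708.77))  # clickbench_extended Q0
print(classify_change(1235.58, 1311.33))  # clickbench_extended Q1
print(classify_change(163.23, 154.87))    # clickbench_partitioned Q3
```

Because the tables show rounded times, recomputing a ratio from the displayed values can differ from the reported label in the last digit (for example, tpch_mem Q9).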

alamb commented Nov 20, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/upgrade_arrow_57.1.0 (fafb102) to 7d8b860 diff using: clickbench_pushdown
Results will be posted here when complete

query TTT
select arrow_typeof(column1), arrow_typeof(column2), arrow_typeof(column3) from arrays;
----
List(nullable List(nullable Int64)) List(nullable Float64) List(nullable Utf8)

Previously, the DataType parsing code did not handle this syntax (it only supported List(Float64)). We have now made the display and parsing consistent; see apache/arrow-rs#8649 (comment) for background and details.
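
To illustrate what "consistent display and parsing" means here, the following is a toy model in Python, not the arrow-rs implementation. It assumes a tiny grammar covering only primitive names and `List(...)` with an optional `nullable` marker. The names `Primitive`, `ListType`, `display`, and `parse` are hypothetical. The property it demonstrates is that parsing the displayed string gives back the original type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "Int64", "Float64", "Utf8"


@dataclass(frozen=True)
class ListType:
    item: "ToyType"
    nullable: bool  # whether the list's item field is nullable


ToyType = Union[Primitive, ListType]


def display(t: ToyType) -> str:
    """Render a toy type using the syntax shown in the test output above."""
    if isinstance(t, Primitive):
        return t.name
    prefix = "nullable " if t.nullable else ""
    return f"List({prefix}{display(t.item)})"


def parse(text: str) -> ToyType:
    """Parse both List(Float64) and List(nullable Float64) style strings."""
    text = text.strip()
    if text.startswith("List(") and text.endswith(")"):
        inner = text[len("List("):-1]
        nullable = inner.startswith("nullable ")
        if nullable:
            inner = inner[len("nullable "):]
        return ListType(parse(inner), nullable)
    if not text.isalnum():
        raise ValueError(f"unsupported type string: {text!r}")
    return Primitive(text)


# Round trip: parsing the displayed form yields the same type.
nested = ListType(ListType(Primitive("Int64"), nullable=True), nullable=True)
assert parse(display(nested)) == nested
```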

alamb commented Nov 20, 2025

🤖: Benchmark completed

Comparing HEAD and alamb_upgrade_arrow_57.1.0
--------------------
Benchmark clickbench_pushdown.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_upgrade_arrow_57.1 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.19 ms │                  2.67 ms │  1.22x slower │
│ QQuery 1     │    53.32 ms │                 51.49 ms │     no change │
│ QQuery 2     │   141.71 ms │                134.25 ms │ +1.06x faster │
│ QQuery 3     │   167.28 ms │                156.44 ms │ +1.07x faster │
│ QQuery 4     │  1084.35 ms │               1106.65 ms │     no change │
│ QQuery 5     │  1556.95 ms │               1490.85 ms │     no change │
│ QQuery 6     │     2.16 ms │                  2.32 ms │  1.07x slower │
│ QQuery 7     │    74.61 ms │                 66.99 ms │ +1.11x faster │
│ QQuery 8     │  1486.06 ms │               1416.15 ms │     no change │
│ QQuery 9     │  1893.82 ms │               1877.58 ms │     no change │
│ QQuery 10    │   480.16 ms │                496.91 ms │     no change │
│ QQuery 11    │   557.20 ms │                549.42 ms │     no change │
│ QQuery 12    │  1608.14 ms │               1537.86 ms │     no change │
│ QQuery 13    │  2579.34 ms │               2322.88 ms │ +1.11x faster │
│ QQuery 14    │  1693.59 ms │               1457.10 ms │ +1.16x faster │
│ QQuery 15    │  1298.92 ms │               1255.58 ms │     no change │
│ QQuery 16    │  2729.63 ms │               2662.03 ms │     no change │
│ QQuery 17    │  2739.17 ms │               2653.09 ms │     no change │
│ QQuery 18    │  5301.07 ms │               4998.57 ms │ +1.06x faster │
│ QQuery 19    │   149.56 ms │                139.48 ms │ +1.07x faster │
│ QQuery 20    │  2047.96 ms │               1894.37 ms │ +1.08x faster │
│ QQuery 21    │  2453.34 ms │               2307.63 ms │ +1.06x faster │
│ QQuery 22    │  4124.78 ms │               3992.67 ms │     no change │
│ QQuery 23    │  1144.82 ms │               1083.87 ms │ +1.06x faster │
│ QQuery 24    │   258.15 ms │                248.11 ms │     no change │
│ QQuery 25    │   676.36 ms │                648.55 ms │     no change │
│ QQuery 26    │   358.14 ms │                343.48 ms │     no change │
│ QQuery 27    │  3125.24 ms │               3006.62 ms │     no change │
│ QQuery 28    │ 23975.91 ms │              23762.45 ms │     no change │
│ QQuery 29    │   961.14 ms │                989.91 ms │     no change │
│ QQuery 30    │  2163.07 ms │               1380.73 ms │ +1.57x faster │
│ QQuery 31    │  2089.33 ms │               1351.52 ms │ +1.55x faster │
│ QQuery 32    │  4853.46 ms │               4935.63 ms │     no change │
│ QQuery 33    │  6051.64 ms │               5677.56 ms │ +1.07x faster │
│ QQuery 34    │  6305.89 ms │               5969.09 ms │ +1.06x faster │
│ QQuery 35    │  1936.33 ms │               1863.02 ms │     no change │
│ QQuery 36    │    26.21 ms │                 26.35 ms │     no change │
│ QQuery 37    │    26.08 ms │                 25.69 ms │     no change │
│ QQuery 38    │    25.20 ms │                 25.25 ms │     no change │
│ QQuery 39    │    25.50 ms │                 26.15 ms │     no change │
│ QQuery 40    │    26.74 ms │                 27.07 ms │     no change │
│ QQuery 41    │    25.65 ms │                 26.27 ms │     no change │
│ QQuery 42    │    25.35 ms │                 25.87 ms │     no change │
└──────────────┴─────────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 88305.52ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 84016.17ms │
│ Average Time (HEAD)                     │  2053.62ms │
│ Average Time (alamb_upgrade_arrow_57.1) │  1953.86ms │
│ Queries Faster                          │         14 │
│ Queries Slower                          │          2 │
│ Queries with No Change                  │         27 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘

@alamb alamb force-pushed the alamb/upgrade_arrow_57.1.0 branch from fafb102 to 191db07 Compare November 21, 2025 13:43
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate common Related to common crate proto Related to proto crate datasource Changes to the datasource crate labels Nov 21, 2025