Comparing changes

base repository: apache/iceberg-python
base: main
head repository: apache/iceberg-python
compare: fd-infer-types
  • 9 commits
  • 6 files changed
  • 1 contributor

Commits on Feb 16, 2025

  1. Arrow: Infer the types when reading

    When reading a Parquet file using PyArrow, metadata stored in the
    Parquet file marks each column as either a large type (e.g.
    `large_string`) or a normal type (`string`). The difference is that
    the large types use a 64-bit offset to encode their arrays.
    This is not always needed, and we could first check the types in
    which the data is stored, and let PyArrow decide here:
    
    https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/io/pyarrow.py#L1579
    
    In PyArrow today we just bump everything to a large type, which
    might lead to additional memory consumption because it allocates
    an int64 array to store the offsets, instead of an int32 one.
    
    I thought we would be good to go for this now with the new lower
    bound of PyArrow 17. But it looks like we still have to wait
    for Arrow 18 to fix the issue with the `date` types:
    
    apache/arrow#43183
    
    Fixes: #1049
    Fokko committed Feb 16, 2025
    fa9b3ca
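
The memory argument in the commit message above can be illustrated with a small sketch. It does not use PyArrow or PyIceberg; it only mimics the offsets buffer of Arrow's variable-length layout (one offset per value plus one) with the standard library. The helper name `encode_offsets` is hypothetical and exists only for this illustration.

```python
import struct


def encode_offsets(values: list[str], large: bool) -> bytes:
    """Build an Arrow-style offsets buffer for a list of strings.

    Hypothetical helper for illustration only. `large=False` mimics
    `string` (32-bit offsets); `large=True` mimics `large_string`
    (64-bit offsets).
    """
    fmt = "<q" if large else "<i"  # little-endian int64 vs int32
    offsets = [0]
    for value in values:
        # Each offset marks where the next value starts in the data buffer.
        offsets.append(offsets[-1] + len(value.encode("utf-8")))
    return b"".join(struct.pack(fmt, offset) for offset in offsets)


values = ["iceberg", "py", "arrow"]
small = encode_offsets(values, large=False)
big = encode_offsets(values, large=True)

# Same data, same offsets, but the large variant needs twice the bytes.
print(len(small), len(big))  # 16 32
```

The offsets themselves are identical in both cases; only their width differs, which is why bumping every column to a large type doubles the offsets buffer even when no value comes close to the 32-bit limit.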

Commits on Feb 18, 2025

  1. Less is more 😍

    Fokko committed Feb 18, 2025
    0384b4e
  2. Reinstate the table property

    Fokko committed Feb 18, 2025
    6dd9308
  3. Cleanup

    Fokko committed Feb 18, 2025
    2817c61

Commits on Mar 4, 2025

  1. d6fbca9
  2. Fix import

    Fokko committed Mar 4, 2025
    fff7414
  3. Add warning

    Fokko committed Mar 4, 2025
    0d19987
  4. MOAR deprecation

    Fokko committed Mar 4, 2025
    7382112

Commits on Mar 26, 2025

  1. 6526cc2

This comparison is taking too long to generate.

Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.

You can try running this command locally to see the comparison on your machine:
git diff main...fd-infer-types