fix(azure): fall back to client-side filtering for list_with_offset on OneLake #696

Open
kevinjqliu wants to merge 5 commits into apache:main from kevinjqliu:kevinjqliu/onelake-startFrom


@kevinjqliu
Contributor

@kevinjqliu kevinjqliu commented Apr 23, 2026

Which issue does this PR close?

Closes #695.

Rationale for this change

list_with_offset returns empty results on OneLake when both of the following conditions are met:

  1. Friendly-name URLs — the endpoint addresses the workspace and lakehouse by display name (e.g. onelake.blob.fabric.microsoft.com/MyWorkspace/lakehouse.Lakehouse/...) rather than by GUID
  2. startFrom query parameter — the List Blobs request includes startFrom, which OneLake silently ignores in this case, returning 200 OK with zero results rather than rejecting the request

Note that startFrom works correctly with GUID-based URLs (onelake.blob.fabric.microsoft.com/{workspace-guid}).
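To make the distinction between the two URL styles concrete, here is a minimal sketch that classifies a OneLake path by its workspace segment. The helper names `looks_like_guid` and `uses_friendly_name` are hypothetical and not part of object_store, and as a simplification the sketch inspects only the first path segment.

```rust
/// Returns true if `s` has the canonical 8-4-4-4-12 hexadecimal GUID shape.
fn looks_like_guid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let lens = [8, 4, 4, 4, 12];
    groups.len() == lens.len()
        && groups
            .iter()
            .zip(lens)
            .all(|(g, n)| g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit()))
}

/// Hypothetical helper: true when the workspace (first path segment) is
/// addressed by display name rather than by GUID.
fn uses_friendly_name(path: &str) -> bool {
    path.trim_start_matches('/')
        .split('/')
        .next()
        .map_or(false, |ws| !ws.is_empty() && !looks_like_guid(ws))
}

fn main() {
    let paths = [
        "/MyWorkspace/lakehouse.Lakehouse/Files",
        "/45a753bc-c074-42cf-8b30-5dfa920b241f/3e56af9e-3832-47ed-b18c-dcc58b562e87/Files",
    ];
    for p in paths {
        println!("{p}: friendly name = {}", uses_friendly_name(p));
    }
}
```

Per the description above, only the friendly-name form is affected by the startFrom problem.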

This regression was introduced in #623, which implemented optimizations using startFrom pushdown.

We (the OneLake team) are actively fixing startFrom with friendly-name URLs on our side.
In the meantime, object_store can fall back to client-side filtering for OneLake endpoints.
I created a tracking issue (#697) to re-enable this optimization for OneLake once the fix is complete.
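For readers unfamiliar with the fallback, the following is a rough sketch of what client-side offset filtering amounts to: list everything under the prefix, then keep only keys that sort strictly after the offset. It uses plain strings rather than the crate's real types, and `filter_after_offset` is a hypothetical name, not the function changed in this PR.

```rust
/// Hypothetical helper: keep only keys that sort strictly after `offset`,
/// mirroring the exclusive semantics expected of `list_with_offset`.
/// Comparison is byte-wise lexicographic, as for Rust `str` ordering.
fn filter_after_offset<'a>(keys: &[&'a str], offset: &str) -> Vec<&'a str> {
    keys.iter().copied().filter(|k| *k > offset).collect()
}

fn main() {
    // Pretend this is the full, unfiltered listing returned by the service.
    let listed = ["data/a.parquet", "data/b.parquet", "data/c.parquet"];
    let kept = filter_after_offset(&listed, "data/b.parquet");
    println!("{kept:?}");
}
```

The trade-off is that the full listing is transferred and filtered locally, which is why the pushdown was an optimization in the first place.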

What changes are included in this PR?

  • Auto-detect Fabric/OneLake endpoints (*.fabric.microsoft.com) and skip startFrom, falling back to client-side filtering (the same approach already used for Azurite)
  • Add an #[ignore] integration test (test_onelake_list_with_offset) that verifies offset exclusivity, boundary cases, and ordering against a live OneLake endpoint
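The endpoint detection in the first bullet could look roughly like the sketch below. This is an illustration only: `is_fabric_host`, `use_start_from_pushdown`, and the `is_emulator` flag are hypothetical names standing in for whatever the crate actually uses, and the suffix match is an assumption based on the `*.fabric.microsoft.com` pattern stated above.

```rust
/// Hypothetical helper: true when the host matches `*.fabric.microsoft.com`.
fn is_fabric_host(host: &str) -> bool {
    host.to_ascii_lowercase().ends_with(".fabric.microsoft.com")
}

/// Hypothetical decision function: push `startFrom` down to the service only
/// when the endpoint is expected to honor it. `is_emulator` stands in for the
/// existing Azurite case.
fn use_start_from_pushdown(host: &str, is_emulator: bool) -> bool {
    !is_emulator && !is_fabric_host(host)
}

fn main() {
    let hosts = [
        "onelake.blob.fabric.microsoft.com",
        "myaccount.blob.core.windows.net",
    ];
    for host in hosts {
        println!("{host}: pushdown = {}", use_start_from_pushdown(host, false));
    }
}
```

Matching on the leading dot keeps look-alike hosts that merely contain the string, such as `fabric.microsoft.com.example`, from being treated as Fabric endpoints.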

Tested locally

Friendly-name URI fails:

export AZURE_STORAGE_TOKEN=$(az account get-access-token --resource https://storage.azure.com/ --query accessToken -o tsv) && \
ONELAKE_URL="https://msit-onelake.blob.fabric.microsoft.com/OLSTeamWorkspace/lh.Lakehouse/Files/test_startfrom" \
cargo test --features azure test_onelake_list_with_offset -- --ignored --no-capture

GUID-based URI works:

export AZURE_STORAGE_TOKEN=$(az account get-access-token --resource https://storage.azure.com/ --query accessToken -o tsv) && \
ONELAKE_URL="https://msit-onelake.blob.fabric.microsoft.com/45a753bc-c074-42cf-8b30-5dfa920b241f/3e56af9e-3832-47ed-b18c-dcc58b562e87/Files/test_startfrom" \
cargo test --features azure test_onelake_list_with_offset -- --ignored --no-capture

Are there any user-facing changes?

No API changes. The only behavioral change is that list_with_offset against OneLake endpoints now filters client-side instead of pushing down startFrom.

@kevinjqliu
Contributor Author

Need to rebase to pull in #694 for CI to pass

@kevinjqliu
Contributor Author

@tustvold could you take a look?

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/onelake-startFrom branch from 9b2d474 to c014f56 Compare April 23, 2026 21:06
@kevinjqliu
Contributor Author

rebased to pull in #694

Contributor

@rtyler rtyler left a comment

My only suggestion is to provide an actual listing of the OneLake location used for the manual testing, in case anybody motivated enough comes along later and wants to try that manual test 😄

Is there a publicly linkable reference to the bug with the OneLake service that can be referenced as well?

I am cautiously optimistic that this could get into a 0.13.3 so some downstream breaks can be resolved ahead of 0.14


Development

Successfully merging this pull request may close these issues.

MicrosoftAzure::list_with_offset returns empty on OneLake since 0.13.0 (regression from #623)

2 participants