Skip to content

Fix KeyError in S3BasedDocs when prefix path has no common prefixes#190

Merged
acarbonetto merged 1 commit intoawslabs:mainfrom
mykola-pereyma:fix/s3-based-docs-common-prefixes
Apr 9, 2026
Merged

Fix KeyError in S3BasedDocs when prefix path has no common prefixes#190
acarbonetto merged 1 commit intoawslabs:mainfrom
mykola-pereyma:fix/s3-based-docs-common-prefixes

Conversation

@mykola-pereyma
Copy link
Copy Markdown
Contributor

Description

Fixes a KeyError: 'CommonPrefixes' crash in both S3DocDownloader.download() and S3ChunkDownloader.download() when the S3 prefix path is empty or has no sub-prefixes.

Problem

The S3 list_objects_v2 paginator omits the CommonPrefixes key entirely from the response when there are no common prefixes. Both downloaders accessed this key directly:

for source_doc_obj in source_doc_page['CommonPrefixes']  # KeyError if key absent

This crashes when:

  • The S3 bucket/prefix is empty (no data uploaded yet)
  • The collection path doesn't exist
  • A paginator page has no sub-prefixes

Fix

Replace direct dict access with .get() and empty list default at both locations:

for source_doc_obj in source_doc_page.get('CommonPrefixes', [])

Tests

Added 4 unit tests covering:

  • S3DocDownloader with empty S3 response (no CommonPrefixes key)
  • S3ChunkDownloader with empty S3 response
  • S3DocDownloader with mixed pages (some with prefixes, some without)
  • S3ChunkDownloader with mixed pages

All 21 tests in test_s3_based_docs.py pass.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

S3 list_objects_v2 paginator omits the CommonPrefixes key entirely
when there are no sub-prefixes in the response. Both S3DocDownloader
and S3ChunkDownloader accessed this key directly, causing a KeyError
when querying an empty or non-existent S3 path.

Replace dict key access with .get() and empty list default at both
locations (lines 82 and 290).
Copy link
Copy Markdown
Collaborator

@oussamahansal oussamahansal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@acarbonetto acarbonetto merged commit 0f70b22 into awslabs:main Apr 9, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants