Skip to content

Fix GcsPinotFS null safety and PinotFS contract compliance#18360

Open
Akanksha-kedia wants to merge 8 commits into
apache:masterfrom
Akanksha-kedia:fix-gcs-pinot-fs-null-safety
Open

Fix GcsPinotFS null safety and PinotFS contract compliance#18360
Akanksha-kedia wants to merge 8 commits into
apache:masterfrom
Akanksha-kedia:fix-gcs-pinot-fs-null-safety

Conversation

@Akanksha-kedia
Copy link
Copy Markdown
Contributor

Description

Fix several null-safety issues and PinotFS contract violations in GcsPinotFS. The GCS SDK's Storage.get(BlobId) returns null when a blob does not exist (unlike the S3 SDK which throws NoSuchKeyException), so callers must handle null explicitly.

Fixes #17714

Changes Made

1. lastModified() — return 0L for missing files

The PinotFS contract specifies that lastModified() should return 0L if the file does not exist or if an I/O error occurs. Previously this method called getBlob().getUpdateTime() without a null check, resulting in an NPE. Now returns 0L for missing blobs, consistent with LocalPinotFS.

2. touch() — create an empty file when the blob does not exist

The PinotFS contract specifies that touch() should create an empty file if the file does not exist. Previously this method assumed the blob exists and called blob.getUpdateTime() / blob.toBuilder(), resulting in an NPE. Now creates an empty blob for missing files, consistent with S3PinotFS and LocalPinotFS.

3. open() — throw a descriptive IOException for missing files

Previously called blob.reader() on a potentially null blob, resulting in an NPE with no context. Now throws a descriptive IOException that includes the URI.

4. copyFile() — null check on source blob and cleanup on failure

Previously called blob.copyTo() on a potentially null source blob. Also, a zero-byte destination blob is created before the copy starts — if the copy fails, this empty blob was left behind. Now guards against null source and cleans up the destination blob on failure.

5. Guarded all getUpdateTime() calls

BlobInfo.getUpdateTime() can return null for directory entries or newly created blobs. Added null guards on all call sites.

Testing Done

  • mvn compile passes for the pinot-gcs module
  • mvn spotless:apply — no formatting changes needed
  • mvn license:check passes

The existing GcsPinotFSTest is an integration test that requires GCS credentials (env vars GOOGLE_APPLICATION_CREDENTIALS, GCP_PROJECT, GCS_BUCKET). The null-safety fixes address runtime paths that are triggered when blobs don't exist — these are inherently guarded by the GCS SDK's behavior and are consistent with how S3PinotFS and LocalPinotFS handle the same cases.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.69%. Comparing base (1881884) to head (c8ffd40).
⚠️ Report is 112 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18360      +/-   ##
============================================
+ Coverage     63.40%   63.69%   +0.29%     
- Complexity     1679     1684       +5     
============================================
  Files          3253     3266      +13     
  Lines        198767   199836    +1069     
  Branches      30791    31023     +232     
============================================
+ Hits         126034   127293    +1259     
+ Misses        62659    62399     -260     
- Partials      10074    10144      +70     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (?)
java-21 63.69% <ø> (+0.29%) ⬆️
temurin 63.69% <ø> (+0.29%) ⬆️
unittests 63.69% <ø> (+0.29%) ⬆️
unittests1 55.84% <ø> (+0.48%) ⬆️
unittests2 34.94% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Akanksha-kedia
Copy link
Copy Markdown
Contributor Author

@xiangfu0 please review

Copy link
Copy Markdown
Contributor

@noob-se7en noob-se7en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add tests catching these exceptions.

@noob-se7en noob-se7en added pinot-filesystem Related to PinotFS abstraction (S3, GCS, ADLS, HDFS) bug Something is not working as expected GCP Related to Google Cloud-specific features or deployment labels Apr 29, 2026
@Akanksha-kedia
Copy link
Copy Markdown
Contributor Author

@xiangfu0 @noob-se7en please review

@Akanksha-kedia Akanksha-kedia force-pushed the fix-gcs-pinot-fs-null-safety branch from 0b5b8f7 to 4de1ffd Compare May 5, 2026 11:11
@Akanksha-kedia
Copy link
Copy Markdown
Contributor Author

@xiangfu0 @noob-se7en please review

@Akanksha-kedia
Copy link
Copy Markdown
Contributor Author

@noob-se7en please review and merge

Akanksha-kedia and others added 7 commits May 10, 2026 19:06
- lastModified(): return 0L when blob does not exist instead of throwing
  NPE, matching the PinotFS contract and LocalPinotFS behavior
- touch(): create an empty file when the blob does not exist instead of
  throwing NPE, matching the PinotFS contract (S3PinotFS, LocalPinotFS)
- open(): throw a descriptive IOException when the blob does not exist
  instead of throwing NPE from blob.reader()
- copyFile(): guard against null source blob and clean up the empty
  destination blob on copy failure to prevent leaked zero-byte objects
- Guard all getUpdateTime() calls against null return values

Fixes apache#17714
…copy, fix deprecated getUpdateTime

- Use FileNotFoundException (not IOException) in open() and copyFile() when a blob
  is missing, as FileNotFoundException is the standard Java exception for missing files
  and is a subtype of IOException so callers need not change
- Drop String.format in exception messages (ref: apache#14404); use string concatenation
- copyFile: use blob.copyTo(BlobId) directly instead of pre-creating an empty
  destination blob; this removes the redundant _storage.create() call, the cleanup
  try-catch, and the unnecessary blob.exists() call on the return path
- Replace deprecated Blob.getUpdateTime() (returns Long) with
  Blob.getUpdateTimeOffsetDateTime() (returns OffsetDateTime) in lastModified(),
  touch(), and both listFilesWithMetadata() overloads
- Add GcsPinotFSNullSafetyTest with unit tests verifying that open() and copyDir()
  throw FileNotFoundException (not NullPointerException) when blobs are missing
- Move java.time.OffsetDateTime import after java.net.URI to fix spotless
  import ordering violation
- Replace TestNG assertThrows (returns void) with try-catch blocks so
  the caught FileNotFoundException message can be verified

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t mocks

The production code calls blob.getUpdateTimeOffsetDateTime() to populate
FileMetadata.lastModifiedTime, but mockBlob() only stubbed getUpdateTime()
(Long). Mockito returned null for the unstubbed OffsetDateTime getter,
causing lastModifiedTime to always be 0. Stub both methods so the
testFileMetadataAttributes assertion passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… timestamp comparison

The previous comparison `newUpdateTime > updateTime` incorrectly returned false
when either the original or post-update blob had a null updateTime (both fell back
to 0L, making 0 > 0 always false). As suggested in review, return true directly
after _storage.update() succeeds since any GCS objects.patch call advances
updateTime server-side.
@Akanksha-kedia Akanksha-kedia force-pushed the fix-gcs-pinot-fs-null-safety branch from 29091ee to 3bb323d Compare May 10, 2026 13:37
@Akanksha-kedia
Copy link
Copy Markdown
Contributor Author

@noob-se7en please review

Copy link
Copy Markdown
Contributor

@noob-se7en noob-se7en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 medium scope creep.

Comment thread pinot-compatibility-verifier/src/main/java/org/apache/pinot/compat/StreamOp.java Outdated
…exists() in mkdir

- copy(): replace String.format with string concatenation per project style (issue apache#14404)
- mkdir(): _storage.create() throws on failure, so returning blob.exists() after a
  successful create is redundant; return true directly
@Akanksha-kedia Akanksha-kedia force-pushed the fix-gcs-pinot-fs-null-safety branch from 7aec6fc to c8ffd40 Compare May 14, 2026 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is not working as expected GCP Related to Google Cloud-specific features or deployment pinot-filesystem Related to PinotFS abstraction (S3, GCS, ADLS, HDFS)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve GcsPinotFS null safety and contract compliance

3 participants