Fix GcsPinotFS null safety and PinotFS contract compliance#18360
Open
Akanksha-kedia wants to merge 8 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #18360 +/- ##
============================================
+ Coverage 63.40% 63.69% +0.29%
- Complexity 1679 1684 +5
============================================
Files 3253 3266 +13
Lines 198767 199836 +1069
Branches 30791 31023 +232
============================================
+ Hits 126034 127293 +1259
+ Misses 62659 62399 -260
- Partials 10074 10144 +70
Contributor
Author
@xiangfu0 please review
noob-se7en
reviewed
Apr 29, 2026
Contributor
noob-se7en
left a comment
Let's add tests catching these exceptions.
Contributor
Author
@xiangfu0 @noob-se7en please review
0b5b8f7 to
4de1ffd
Compare
Contributor
Author
@xiangfu0 @noob-se7en please review
noob-se7en
reviewed
May 5, 2026
Contributor
Author
@noob-se7en please review and merge
- lastModified(): return 0L when blob does not exist instead of throwing NPE, matching the PinotFS contract and LocalPinotFS behavior
- touch(): create an empty file when the blob does not exist instead of throwing NPE, matching the PinotFS contract (S3PinotFS, LocalPinotFS)
- open(): throw a descriptive IOException when the blob does not exist instead of throwing NPE from blob.reader()
- copyFile(): guard against null source blob and clean up the empty destination blob on copy failure to prevent leaked zero-byte objects
- Guard all getUpdateTime() calls against null return values

Fixes apache#17714
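The null guard in the first and last bullets can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `LastModifiedSketch`, `BlobHandle`, and `STORE` are hypothetical stand-ins, with `Map.get` playing the role of a lookup that returns null for a missing blob.

```java
import java.time.OffsetDateTime;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these names come from GcsPinotFS or the GCS SDK.
class LastModifiedSketch {
  // Stand-in for a blob whose update time may be null.
  static final class BlobHandle {
    final OffsetDateTime updateTime;

    BlobHandle(OffsetDateTime updateTime) {
      this.updateTime = updateTime;
    }
  }

  // Stand-in for the bucket: Map.get returns null for a missing key,
  // mirroring a storage lookup that returns null for a missing blob.
  static final Map<String, BlobHandle> STORE = new HashMap<>();

  // Returns 0L when the blob is missing or has no update time,
  // otherwise the update time in epoch milliseconds.
  static long lastModified(String path) {
    BlobHandle blob = STORE.get(path);
    if (blob == null || blob.updateTime == null) {
      return 0L;
    }
    return blob.updateTime.toInstant().toEpochMilli();
  }
}
```

Both the missing-blob case and the null-timestamp case collapse to the same `0L` result, so callers never see an NPE.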
…copy, fix deprecated getUpdateTime

- Use FileNotFoundException (not IOException) in open() and copyFile() when a blob is missing, as FileNotFoundException is the standard Java exception for missing files and is a subtype of IOException so callers need not change
- Drop String.format in exception messages (ref: apache#14404); use string concatenation
- copyFile: use blob.copyTo(BlobId) directly instead of pre-creating an empty destination blob; this removes the redundant _storage.create() call, the cleanup try-catch, and the unnecessary blob.exists() call on the return path
- Replace deprecated Blob.getUpdateTime() (returns Long) with Blob.getUpdateTimeOffsetDateTime() (returns OffsetDateTime) in lastModified(), touch(), and both listFilesWithMetadata() overloads
- Add GcsPinotFSNullSafetyTest with unit tests verifying that open() and copyDir() throw FileNotFoundException (not NullPointerException) when blobs are missing
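The point that callers need not change can be illustrated with a small sketch. The names here (`OpenSketch`, `STORE`, `describe`) and the exact message text are hypothetical; only the `FileNotFoundException`/`IOException` relationship is standard Java.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; this is not the GcsPinotFS implementation.
class OpenSketch {
  // Stand-in for the bucket; a missing key yields null, like a missing blob.
  static final Map<String, byte[]> STORE = new HashMap<>();

  // Throws FileNotFoundException for a missing entry instead of
  // dereferencing null. The message is built by string concatenation.
  static byte[] open(String uri) throws IOException {
    byte[] content = STORE.get(uri);
    if (content == null) {
      throw new FileNotFoundException("File not found: " + uri);
    }
    return content;
  }

  // A caller written against IOException keeps working unchanged,
  // because FileNotFoundException is a subtype of IOException.
  static String describe(String uri) {
    try {
      return "read " + open(uri).length + " bytes";
    } catch (IOException e) {
      return e.getClass().getSimpleName() + ": " + e.getMessage();
    }
  }
}
```

A caller that wants to distinguish the missing-file case can add a narrower `catch (FileNotFoundException e)` clause ahead of the `IOException` one.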
- Move java.time.OffsetDateTime import after java.net.URI to fix spotless import ordering violation
- Replace TestNG assertThrows (returns void) with try-catch blocks so the caught FileNotFoundException message can be verified

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t mocks

The production code calls blob.getUpdateTimeOffsetDateTime() to populate FileMetadata.lastModifiedTime, but mockBlob() only stubbed getUpdateTime() (Long). Mockito returned null for the unstubbed OffsetDateTime getter, causing lastModifiedTime to always be 0. Stub both methods so the testFileMetadataAttributes assertion passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… timestamp comparison

The previous comparison `newUpdateTime > updateTime` incorrectly returned false when either the original or post-update blob had a null updateTime (both fell back to 0L, making 0 > 0 always false). As suggested in review, return true directly after _storage.update() succeeds since any GCS objects.patch call advances updateTime server-side.
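The failure mode is easy to reproduce in isolation. The sketch below uses hypothetical helper names (`TouchComparisonSketch`, `orZero`, `touchedByComparison`, `touchedByUpdateSuccess`) and shows the case where both timestamps are null and fall back to 0L, next to the "return true once the update call succeeds" approach.

```java
// Hypothetical stand-ins only; these helpers are not part of GcsPinotFS.
class TouchComparisonSketch {
  // A null update time falls back to 0L.
  static long orZero(Long updateTime) {
    return updateTime == null ? 0L : updateTime;
  }

  // Comparison-based result: when both timestamps are null this
  // evaluates 0 > 0 and reports failure even though the update ran.
  static boolean touchedByComparison(Long before, Long after) {
    return orZero(after) > orZero(before);
  }

  // Success-based result: if the update call returns without throwing,
  // report true. A failing update propagates its exception instead.
  static boolean touchedByUpdateSuccess(Runnable update) {
    update.run();
    return true;
  }
}
```

The success-based variant relies on the update call throwing on failure, which is the assumption stated in the commit message.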
29091ee to
3bb323d
Compare
Contributor
Author
@noob-se7en please review
noob-se7en
reviewed
May 12, 2026
Contributor
noob-se7en
left a comment
1 medium scope creep.
6 tasks
…exists() in mkdir

- copy(): replace String.format with string concatenation per project style (issue apache#14404)
- mkdir(): _storage.create() throws on failure, so returning blob.exists() after a successful create is redundant; return true directly
7aec6fc to
c8ffd40
Compare
Description
Fix several null-safety issues and PinotFS contract violations in `GcsPinotFS`. The GCS SDK's `Storage.get(BlobId)` returns `null` when a blob does not exist (unlike the S3 SDK, which throws `NoSuchKeyException`), so callers must handle `null` explicitly.

Fixes #17714

Changes Made

1. `lastModified()`: return 0L for missing files

The PinotFS contract specifies that `lastModified()` should return `0L` if the file does not exist or if an I/O error occurs. Previously this method called `getBlob().getUpdateTime()` without a null check, resulting in an NPE. It now returns `0L` for missing blobs, consistent with `LocalPinotFS`.

2. `touch()`: create an empty file when the blob does not exist

The PinotFS contract specifies that `touch()` should create an empty file if the file does not exist. Previously this method assumed the blob exists and called `blob.getUpdateTime()` / `blob.toBuilder()`, resulting in an NPE. It now creates an empty blob for missing files, consistent with `S3PinotFS` and `LocalPinotFS`.

3. `open()`: throw a descriptive IOException for missing files

Previously called `blob.reader()` on a potentially null blob, resulting in an NPE with no context. It now throws a descriptive `IOException` that includes the URI.

4. `copyFile()`: null check on source blob and cleanup on failure

Previously called `blob.copyTo()` on a potentially null source blob. Also, a zero-byte destination blob is created before the copy starts; if the copy failed, this empty blob was left behind. It now guards against a null source and cleans up the destination blob on failure.

5. Guarded all `getUpdateTime()` calls

`BlobInfo.getUpdateTime()` can return `null` for directory entries or newly created blobs. Added null guards on all call sites.

Testing Done

- `mvn compile` passes for the pinot-gcs module
- `mvn spotless:apply`: no formatting changes needed
- `mvn license:check` passes

The existing `GcsPinotFSTest` is an integration test that requires GCS credentials (env vars `GOOGLE_APPLICATION_CREDENTIALS`, `GCP_PROJECT`, `GCS_BUCKET`). The null-safety fixes address runtime paths that are triggered when blobs don't exist; these are inherently guarded by the GCS SDK's behavior and are consistent with how `S3PinotFS` and `LocalPinotFS` handle the same cases.
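
The `touch()` contract described under Changes Made (create an empty file when none exists, otherwise refresh the timestamp) can be sketched without GCS credentials. This is a minimal in-memory illustration; `TouchSketch`, `Entry`, and `STORE` are hypothetical names, and the clock value is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in; not the GcsPinotFS implementation.
class TouchSketch {
  // Stand-in for a stored object with content and an update timestamp.
  static final class Entry {
    final byte[] content;
    long updateTimeMs;

    Entry(byte[] content, long updateTimeMs) {
      this.content = content;
      this.updateTimeMs = updateTimeMs;
    }
  }

  // Stand-in for the bucket; a missing key yields null, like a missing blob.
  static final Map<String, Entry> STORE = new HashMap<>();

  // Creates a zero-byte entry when the path is missing; otherwise only
  // advances the timestamp and leaves the content untouched.
  static boolean touch(String path, long nowMs) {
    Entry existing = STORE.get(path);
    if (existing == null) {
      STORE.put(path, new Entry(new byte[0], nowMs));
    } else {
      existing.updateTimeMs = nowMs;
    }
    return true;
  }
}
```

The null check on the lookup result is what replaces the NPE: the missing-blob branch is handled before any method is called on the entry.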