PinotFS consistency tests #16416

Open
shounakmk219 wants to merge 1 commit into apache:master from shounakmk219:pinot-fs-consistency-tests

@shounakmk219
Collaborator

Description

This PR adds a base test suite that covers every operation exposed by PinotFS and validates that the expected interface behaviour is followed.
The suite can be extended for any PinotFS implementation to check that the implementation works as expected.

This PR onboards the Local, ADLS and S3 FS implementations.

  • A few tests are explicitly disabled for each FS because they currently fail (the reason is given in a code comment on each disabled test).
  • These tests are not enabled by default because they need to connect to actual file stores. Users can provide the right credentials as environment variables to run them.
  • Only the LocalPinotFS test is enabled, as it only needs the local FS.

Configuration

Below are the required env vars for respective FS tests.

| ADLS | S3 |
| --- | --- |
| ADLS_ACCOUNT_NAME | S3_ACCESS_KEY |
| ADLS_ACCESS_KEY | S3_SECRET_KEY |
| ADLS_FILE_SYSTEM_NAME | S3_REGION |
| ADLS_FS_URI | S3_FS_URI |
| ADLS_ENABLE_FS_TESTS | S3_ENABLE_FS_TESTS |

Set ADLS_ENABLE_FS_TESTS and S3_ENABLE_FS_TESTS to true to enable the respective tests.
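
As a rough illustration of this opt-in gating, the sketch below treats a test group as enabled only when its flag is set to true. `FsTestGate` and `isEnabled` are hypothetical names, not taken from the PR (the PR's own hook is `disableTests()`, whose body is not shown on this page), and the case-insensitive match is an assumption.

```java
import java.util.Map;

// Hypothetical helper, not the PR's code: models "enabled only when the flag is set to true".
class FsTestGate {
  // Boolean.parseBoolean returns false for null, so an unset flag keeps the tests disabled.
  // Assumption: "true" is matched case-insensitively.
  static boolean isEnabled(Map<String, String> env, String flagName) {
    return Boolean.parseBoolean(env.get(flagName));
  }

  public static void main(String[] args) {
    System.out.println("S3 tests enabled: " + isEnabled(System.getenv(), "S3_ENABLE_FS_TESTS"));
  }
}
```

Taking the environment as a `Map` parameter keeps the check itself testable without touching the real process environment.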

Test Results

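| Sr. No. | Test Case | LocalFS | S3 | ADLS |
| --- | --- | --- | --- | --- |
| 1 | testInit | | | |
| 2 | testMkdir | | | |
| 3 | testDelete | | | |
| 4 | testDeleteBatch | | | |
| 5 | testMove | | | |
| 6 | testCopy | | | |
| 7 | testExists | | | |
| 8 | testLength | | | |
| 9 | testListFiles | | | |
| 10 | testListFilesWithMetadata | | | |
| 11 | testCopyToFromLocalFile | | | |
| 12 | testIsDirectory | | | |
| 13 | testLastModified | | | |
| 14 | testTouch | | | |
| 15 | testOpen | | | |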

Note: Check code comments for details on failures

@shounakmk219 added the testing and pinot-filesystem labels Jul 24, 2025
Contributor

@yashmayya left a comment

Thanks @shounakmk219. Have you verified the S3 / ADLS test results locally considering that these won't be running on CI by default?

Comment on lines +30 to +32
private static final String ACCOUNT_NAME = "accountName";
private static final String ACCESS_KEY = "accessKey";
private static final String FILE_SYSTEM_NAME = "fileSystemName";
Contributor

Aren't these already defined elsewhere?

Collaborator Author

Thanks for bringing it up. These are private constants in the respective FS implementations; we can consider making them public instead of duplicating them.

Comment on lines +65 to +67
protected static String getEnvVar(String varName) {
  return System.getenv(varName);
}
Contributor

Why do we need a separate method for this and why is it protected static?

Collaborator Author

Pulled it out as more of a utility. It would be the single place to update if we decide to support passing test creds through a file as well.
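
To make the "single place to update" idea concrete, here is a minimal sketch of what such a lookup could look like if file-based credentials were added later. Everything in it is hypothetical: `TestCredLookup`, the `Properties` fallback, and the env-first precedence are assumptions for illustration, not something this PR implements.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: one lookup point that checks the environment first
// and then falls back to properties loaded from a file.
class TestCredLookup {
  static String lookup(String name, Map<String, String> env, Properties fileProps) {
    String fromEnv = env.get(name);
    // Assumption: environment variables take precedence over file entries.
    return fromEnv != null ? fromEnv : fileProps.getProperty(name);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("S3_REGION", "from-file");
    System.out.println(lookup("S3_REGION", System.getenv(), props));
  }
}
```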

Comment on lines +76 to +78
// test fails as interface expects the FS client to throw an IOException when
// PinotFS.open() is called on non existent file,
// while the ADLS client throws a BlobStorageException which is a RuntimeException.
Contributor

Is this a bug that needs fixing?

Collaborator Author

Yes, these are existing inconsistencies that we are surfacing and auditing through these tests. Fixes for these cases can follow after this PR.
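
For context on what a follow-up fix could look like, the sketch below shows one generic way to translate a client's unchecked exception into the `IOException` the interface expects from `open()`. It is only an illustration: `OpenExceptionAdapter` and `openTranslating` are hypothetical names, the cloud client call is modelled as a plain `Supplier`, and catching every `RuntimeException` is a simplification (a real fix would likely catch only the client's not-found exception).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical sketch, not the actual ADLS or S3 fix.
class OpenExceptionAdapter {
  // Runs the client call and rethrows unchecked failures as IOException,
  // keeping the original exception as the cause.
  static InputStream openTranslating(Supplier<InputStream> clientOpen) throws IOException {
    try {
      return clientOpen.get();
    } catch (RuntimeException e) {
      throw new IOException("Failed to open stream", e);
    }
  }

  public static void main(String[] args) {
    try {
      openTranslating(() -> {
        throw new IllegalStateException("simulated missing blob");
      });
    } catch (IOException e) {
      System.out.println("Caught IOException caused by " + e.getCause().getClass().getSimpleName());
    }
  }
}
```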

Comment on lines +112 to +120
// Clean up test files in the filesystem
try {
  _pinotFS.delete(_baseDirectoryUri, true);
} catch (Exception e) {
  // Ignore cleanup errors
}

// Clean up local temp directory
FileUtils.deleteDirectory(_localTempDir);
Contributor

Why do we need these steps in both AfterMethod and AfterClass?

Collaborator Author

Agreed, having it here is sufficient. I will remove it from AfterClass.

Comment on lines +41 to +51
@Override
@Test(enabled = false)
public void testListFiles() {
  // test fails when local FS location is passed without the scheme
}

@Override
@Test(enabled = false)
public void testListFilesWithMetadata() {
  // test fails when local FS location is passed without the scheme
}
Contributor

Why does the test need to use an FS location without the scheme?

Collaborator Author

For the local FS, the other operations work even if you don't provide the scheme, so the tests also exercise scheme-less locations.
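
To show what "without the scheme" means at the `java.net.URI` level, here is a small sketch. A URI built from a bare path has a null scheme, and a caller could normalize it to a `file` URI. `LocalUriNormalizer` and `withFileScheme` are hypothetical names; this is not how LocalPinotFS handles such input, only an illustration of the two forms.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: add the "file" scheme to a scheme-less local URI.
class LocalUriNormalizer {
  static URI withFileScheme(URI uri) throws URISyntaxException {
    if (uri.getScheme() != null) {
      return uri; // already has a scheme, leave untouched
    }
    // The path must be absolute here, otherwise this constructor throws URISyntaxException.
    return new URI("file", null, uri.getPath(), null);
  }

  public static void main(String[] args) throws URISyntaxException {
    URI bare = new URI("/tmp/fsTest");
    System.out.println(bare.getScheme());     // prints "null"
    System.out.println(withFileScheme(bare)); // prints "file:/tmp/fsTest"
  }
}
```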


@Override
protected boolean disableTests() {
// only run the tests when ADLS_ENABLE_FS_TESTS is specifically set to true
Contributor

Suggested change
- // only run the tests when ADLS_ENABLE_FS_TESTS is specifically set to true
+ // only run the tests when S3_ENABLE_FS_TESTS is specifically set to true

Comment on lines +46 to +47
String adlsUri = getEnvVar(S3_FS_URI);
return new URI(adlsUri + (adlsUri.endsWith("/") ? "" : "/") + "fsTest/" + _uuid);
Contributor

adls -> S3
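
As a side note on the snippet under review, the per-run base directory logic does not depend on the store. A sketch with neutral names might look like the following; `BaseDirUri`, `baseDirectory`, and the parameter names are hypothetical and only mirror the trailing-slash handling and the `fsTest/` prefix from the quoted lines.

```java
import java.net.URI;
import java.util.UUID;

// Hypothetical sketch of the base-directory construction with store-neutral names.
class BaseDirUri {
  static URI baseDirectory(String fsUri, String runId) {
    // Append a separator only when the configured URI does not already end with one.
    String root = fsUri.endsWith("/") ? fsUri : fsUri + "/";
    return URI.create(root + "fsTest/" + runId);
  }

  public static void main(String[] args) {
    System.out.println(baseDirectory("s3://my-bucket", UUID.randomUUID().toString()));
  }
}
```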

Comment on lines +65 to +108
@Override
@Test(enabled = false)
public void testCopy()
    throws Exception {
  // test fails as S3PinotFS.sanitizePath() trims the leading delimiter due to which
  // URI object creation fails as it expects an absolute path (starting with '/')
}

@Override
@Test(enabled = false)
public void testDelete() {
  // test fails as interface expects the FS client to return false when
  // PinotFS.delete() is called on a non-empty directory and forceDelete is not set to true,
  // while the FS implementation has a check on it which throws IllegalStateException
}

@Override
@Test(enabled = false)
public void testListFiles() {
  // test fails as PinotFS.listFiles() is expected to list all the files as well as directories while
  // the S3 client only lists files and skips listing directories
}

@Override
@Test(enabled = false)
public void testListFilesWithMetadata() {
  // test fails as PinotFS.listFiles() is expected to list all the files as well as directories while
  // the S3 client only lists files and skips listing directories
}

@Override
@Test(enabled = false)
public void testOpen() {
  // test fails as interface expects the FS client to throw an IOException when
  // PinotFS.open() is called on non existent file,
  // while the S3 client throws a NoSuchKeyException which is a RuntimeException.
}

@Override
@Test(enabled = false)
public void testMove() {
  // test fails as S3PinotFS.sanitizePath() trims the leading delimiter due to which
  // URI object creation fails as it expects an absolute path (starting with '/')
}
Contributor

These are a lot of disabled / skipped tests - what's the plan here? Do we aim to eventually unify the behavior across all FS implementations so that all these tests pass for all the clients? Or do you plan to update the tests to take into account different behavior across FS client implementations? As is, this looks pretty odd to add so many new tests that don't work with one implementation or another.

Collaborator Author

Ideally, behaviour should be consistent across all FS implementations, so that we can use PinotFS without worrying about the underlying implementation. That avoids cases where code works on local (LocalFS) or on one particular cloud (say Azure) but starts breaking when the same code is deployed to another cloud (say AWS).
We are effectively doing TDD here: first find all the failure cases, then decide whether to fix the implementation or update the interface contract, based on what makes more sense.
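
The testCopy and testMove comments in the reviewed snippet mention that URI creation fails once the leading delimiter is trimmed. The sketch below reproduces that class of failure with plain `java.net.URI`: the multi-argument constructors reject a relative path when a scheme is given. Which constructor S3PinotFS actually uses is not shown on this page, so treat this as an assumption-laden illustration; `RelativePathDemo` and `canBuild` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo of "relative path in absolute URI" with the JDK's URI class.
class RelativePathDemo {
  // Returns true when a URI can be built from the given components.
  static boolean canBuild(String scheme, String host, String path) {
    try {
      new URI(scheme, host, path, null);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(canBuild("s3", "bucket", "/dir/file")); // prints "true"
    System.out.println(canBuild("s3", "bucket", "dir/file"));  // prints "false"
  }
}
```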

Comment on lines +607 to +614
// Test non-existent path
try {
  _pinotFS.isDirectory(nonExistentUri);
  // Some implementations might return false instead of throwing an exception
  // so we don't assert here
} catch (IOException e) {
  // Expected exception in some implementations
}
Contributor

Isn't this essentially doing nothing?

Collaborator Author

The interface has kind of an ambiguous contract

@return true if uri is a directory, false otherwise.
@throws IOException on IO failure, e.g uri is not valid or not present

hence allowing false (as per false otherwise) as well as IOException (as per the example).
The test is just ensuring nothing other than IOException is thrown on non-existent path
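
One way to make that intent more visible in the test would be an explicit helper that accepts either outcome allowed by the contract and rejects everything else, including a `true` result. This is a sketch under assumptions: `NonExistentPathCheck`, `IoBooleanCall`, and `assertFalseOrIOException` are hypothetical names, and the `PinotFS.isDirectory` call is modelled as a lambda so the example runs on the JDK alone.

```java
import java.io.IOException;

// Hypothetical helper: "false" and IOException are both acceptable, nothing else is.
class NonExistentPathCheck {
  @FunctionalInterface
  interface IoBooleanCall {
    boolean call() throws IOException;
  }

  static void assertFalseOrIOException(IoBooleanCall isDirectoryCall) {
    try {
      if (isDirectoryCall.call()) {
        throw new AssertionError("non-existent path was reported as a directory");
      }
    } catch (IOException e) {
      // Allowed by the "@throws IOException" clause of the contract.
    }
    // Any RuntimeException propagates and fails the test.
  }

  public static void main(String[] args) {
    assertFalseOrIOException(() -> false);
    assertFalseOrIOException(() -> {
      throw new IOException("not present");
    });
    System.out.println("both allowed outcomes accepted");
  }
}
```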

Comment on lines +728 to +733
try {
  _pinotFS.open(dirUri);
  // Some implementations might not throw an exception, but return an empty stream
} catch (IOException e) {
  // Expected exception in some implementations
}
Contributor

Same question as above - is this really testing anything?

Collaborator Author

Good catch! This should always throw an IOException when called on a directory.


@Override
@Test(enabled = false)
public void testLength() {
Contributor

Are there any plans to make the implementations of this interface uniform?
Do we aim to change the ADLS implementation to also throw an exception when we get 0 in the case of a directory?

I recall making a change to fs.delete for GCS for the case where the directory or file is not present.
That divergent behavior led to issues in the segment deletion manager.

I understand it might be beyond the scope of this PR, though.

Collaborator Author

Yeah this PR is mostly aimed to surface the inconsistencies in FS implementations. We can have follow-up PRs to address these inconsistencies as they would be more involved based on special handling done at places like the example you gave above.
If any fs is unable to satisfy the interface requirement efficiently, it should call it out.


@Override
@Test(enabled = false)
public void testTouch() {
Contributor

Same as above. Why can't the implementation be tweaked a bit to check whether the path exists and, if not, create an empty file?

I guess the interface follows the default S3 behavior.

Collaborator Author

As explained above, these tests are just auditing the existing gaps. Fixes can be in separate PRs with more involved discussions around the underlying cloud limitations and impact.

*
* @return PinotFS implementation to test
*/
protected abstract PinotFS getPinotFS();
Contributor

@9aman Jul 28, 2025

Isn't init() already being done in setup()?
https://github.com/apache/pinot/pull/16416/files#diff-fc0f4c1c873c4cdfd225ff79f7b89085223928168ed2287d940ee7c1c0a641ebR92

The implementations don't seem to initialize the FS either.

Collaborator Author

Good catch. I changed the code but forgot to update the comment. Will fix the comment.

// Clean up test files in the filesystem
try {
_pinotFS.delete(_baseDirectoryUri, true);
_pinotFS.mkdir(_baseDirectoryUri);
Contributor

Could this be part of BeforeMethod instead?

Collaborator Author

Sure

}
// Clean up local temp directory
FileUtils.deleteDirectory(_localTempDir);
FileUtils.forceMkdir(_localTempDir);
Contributor

Same.
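
The reset steps discussed in this thread (delete the directory, then recreate it) can be sketched with JDK-only APIs as below. Note the swap: the PR uses commons-io `FileUtils`, while this sketch uses `java.nio.file` so it runs without extra dependencies. `TempDirReset` and `reset` are hypothetical names, and whether such a reset lives in a before-method or after-method hook is the open question in the review, not something this sketch decides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical JDK-only equivalent of "deleteDirectory then forceMkdir".
class TempDirReset {
  static void reset(Path dir) throws IOException {
    if (Files.exists(dir)) {
      List<Path> toDelete;
      try (Stream<Path> paths = Files.walk(dir)) {
        // Reverse order puts children before their parent directories.
        toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : toDelete) {
        Files.delete(p);
      }
    }
    Files.createDirectories(dir);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("fsTest");
    Files.write(dir.resolve("leftover.txt"), new byte[] {1});
    reset(dir);
    System.out.println("reset " + dir);
  }
}
```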

@9aman
Contributor

9aman commented Jul 28, 2025

@shounakmk219 was there any testing done for non-local FS?
Why is GCS skipped?

@xiangfu0 added the AZURE and AWS labels Mar 20, 2026