PinotFS consistency testsPinotFS consistency tests#16416
PinotFS consistency testsPinotFS consistency tests#16416shounakmk219 wants to merge 1 commit intoapache:masterfrom
Conversation
yashmayya
left a comment
There was a problem hiding this comment.
Thanks @shounakmk219. Have you verified the S3 / ADLS test results locally considering that these won't be running on CI by default?
| private static final String ACCOUNT_NAME = "accountName"; | ||
| private static final String ACCESS_KEY = "accessKey"; | ||
| private static final String FILE_SYSTEM_NAME = "fileSystemName"; |
There was a problem hiding this comment.
Aren't these already defined elsewhere?
There was a problem hiding this comment.
Thanks for bringing it up. These are private constants in the respective FS implementations, we can think of making them public instead of duplicating it.
| protected static String getEnvVar(String varName) { | ||
| return System.getenv(varName); | ||
| } |
There was a problem hiding this comment.
Why do we need a separate method for this and why is it protected static?
There was a problem hiding this comment.
Pulled it out as more of a utility. Would be the single place to update if we decide to support passing test creds through file as well.
| // test fails as interface expects the FS client to throw an IOException when | ||
| // PinotFS.open() is called on non existent file, | ||
| // while the ADLS client throws a BlobStorageException which is a RuntimeException. |
There was a problem hiding this comment.
Is this a bug that needs fixing?
There was a problem hiding this comment.
yes these are the existing inconsistencies that we are surfacing/auditing through these tests. Fixes for these cases can follow after this.
| // Clean up test files in the filesystem | ||
| try { | ||
| _pinotFS.delete(_baseDirectoryUri, true); | ||
| } catch (Exception e) { | ||
| // Ignore cleanup errors | ||
| } | ||
|
|
||
| // Clean up local temp directory | ||
| FileUtils.deleteDirectory(_localTempDir); |
There was a problem hiding this comment.
Why do we need these steps in both AfterMethod and AfterClass?
There was a problem hiding this comment.
Agree, having it here is sufficient, will remove it from AfterClass.
| @Override | ||
| @Test(enabled = false) | ||
| public void testListFiles() { | ||
| // test fails when local FS location is passed without the scheme | ||
| } | ||
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testListFilesWithMetadata() { | ||
| // test fails when local FS location is passed without the scheme | ||
| } |
There was a problem hiding this comment.
Why does the test need to use an FS location without the scheme?
There was a problem hiding this comment.
For local FS other operations work even if you don't provide the scheme
|
|
||
| @Override | ||
| protected boolean disableTests() { | ||
| // only run the tests when ADLS_ENABLE_FS_TESTS is specifically set to true |
There was a problem hiding this comment.
| // only run the tests when ADLS_ENABLE_FS_TESTS is specifically set to true | |
| // only run the tests when S3_ENABLE_FS_TESTS is specifically set to true |
| String adlsUri = getEnvVar(S3_FS_URI); | ||
| return new URI(adlsUri + (adlsUri.endsWith("/") ? "" : "/") + "fsTest/" + _uuid); |
| @Override | ||
| @Test(enabled = false) | ||
| public void testCopy() | ||
| throws Exception { | ||
| // test fails as S3PinotFS.sanitizePath() trims the leading delimiter due to which | ||
| // URI object creation fails as it expects an absolute path (starting with '/') | ||
| } | ||
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testDelete() { | ||
| // test fails as interface expects the FS client to return false when | ||
| // PinotFS.delete() is called on a non-empty directory and forceDelete is not set to true, | ||
| // while the FS implementation has a check on it which throws IllegalStateException | ||
| } | ||
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testListFiles() { | ||
| // test fails as PinotFS.listFiles() is expected to list all the files as well as directories while | ||
| // the S3 client only lists files and skips listing directories | ||
| } | ||
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testListFilesWithMetadata() { | ||
| // test fails as PinotFS.listFiles() is expected to list all the files as well as directories while | ||
| // the S3 client only lists files and skips listing directories | ||
| } | ||
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testOpen() { | ||
| // test fails as interface expects the FS client to throw an IOException when | ||
| // PinotFS.open() is called on non existent file, | ||
| // while the S3 client throws a NoSuchKeyException which is a RuntimeException. | ||
| } | ||
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testMove() { | ||
| // test fails as S3PinotFS.sanitizePath() trims the leading delimiter due to which | ||
| // URI object creation fails as it expects an absolute path (starting with '/') | ||
| } |
There was a problem hiding this comment.
These are a lot of disabled / skipped tests - what's the plan here? Do we aim to eventually unify the behavior across all FS implementations so that all these tests pass for all the clients? Or do you plan to update the tests to take into account different behavior across FS client implementations? As is, this looks pretty odd to add so many new tests that don't work with one implementation or another.
There was a problem hiding this comment.
Ideally the behaviour should be consistent across all FS implementations as we want to use FS without worrying about the underlying implementations and avoid running into issues like things working on local (LocalFs) or any particular cloud (say azure) but then starts breaking when same code is deployed to another cloud (say aws).
We are kind of doing TDD here where we first find all the failure cases and then think about fixing it or updating the interface contract based on what makes more sense.
| // Test non-existent path | ||
| try { | ||
| _pinotFS.isDirectory(nonExistentUri); | ||
| // Some implementations might return false instead of throwing an exception | ||
| // so we don't assert here | ||
| } catch (IOException e) { | ||
| // Expected exception in some implementations | ||
| } |
There was a problem hiding this comment.
Isn't this essentially doing nothing?
There was a problem hiding this comment.
The interface has kind of an ambiguous contract
@return true if uri is a directory, false otherwise.
@throws IOException on IO failure, e.g uri is not valid or not present
hence allowing false (as per false otherwise) as well as IOException (as per the example).
The test is just ensuring nothing other than IOException is thrown on non-existent path
| try { | ||
| _pinotFS.open(dirUri); | ||
| // Some implementations might not throw an exception, but return an empty stream | ||
| } catch (IOException e) { | ||
| // Expected exception in some implementations | ||
| } |
There was a problem hiding this comment.
Same question as above - is this really testing anything?
There was a problem hiding this comment.
good catch! this should always throw IOException on directory
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testLength() { |
There was a problem hiding this comment.
Are there any plans to have some uniformity in the way this interface has been implemented ?
Do we aim to change the ADLS implementation to also throw an exception when we get 0 in case of directory ?
I recall making a change for fs.delete in case of GCS when the directory or file is not present.
This divergent behavior lead to issues in segment deletion manager.
I understand it might be beyond of the scope of this PR though.
There was a problem hiding this comment.
Yeah this PR is mostly aimed to surface the inconsistencies in FS implementations. We can have follow-up PRs to address these inconsistencies as they would be more involved based on special handling done at places like the example you gave above.
If any fs is unable to satisfy the interface requirement efficiently, it should call it out.
|
|
||
| @Override | ||
| @Test(enabled = false) | ||
| public void testTouch() { |
There was a problem hiding this comment.
Same as above. Why can't the implementation be tweaked a bit to check whether the path exists or not and if not, create an empty file.
I guess the interface follows the default s3 behavior.
There was a problem hiding this comment.
As explained above, these tests are just auditing the existing gaps. Fixes can be in separate PRs with more involved discussions around the underlying cloud limitations and impact.
| * | ||
| * @return PinotFS implementation to test | ||
| */ | ||
| protected abstract PinotFS getPinotFS(); |
There was a problem hiding this comment.
Isn't the init() being done in the setup()
https://github.com/apache/pinot/pull/16416/files#diff-fc0f4c1c873c4cdfd225ff79f7b89085223928168ed2287d940ee7c1c0a641ebR92
. The implementation too don't seem to initialize the FS.
There was a problem hiding this comment.
good catch, changed the code but forgot to update the comment. Will fix the comment.
| // Clean up test files in the filesystem | ||
| try { | ||
| _pinotFS.delete(_baseDirectoryUri, true); | ||
| _pinotFS.mkdir(_baseDirectoryUri); |
There was a problem hiding this comment.
Could this be part of before Method instead ?
| } | ||
| // Clean up local temp directory | ||
| FileUtils.deleteDirectory(_localTempDir); | ||
| FileUtils.forceMkdir(_localTempDir); |
|
@shounakmk219 was there any testing done for non local FS ? |
Description
This PR adds a base test suite that covers all the possible operations which can be done with PinotFS and validates if expected interface behaviour is followed.
We can extend this suite for any PinotFS implementation to check if the implementation is working as expected.
This PR onboards the Local, ADLS and S3 FS implementations.
Configuration
Below are the required env vars for respective FS tests.
Set
ADLS_ENABLE_FS_TESTSandS3_ENABLE_FS_TESTSto enable these tests.Test Results
Note: Check code comments for details on failures