[CARBONDATA-2996] CarbonSchemaReader support read schema from folder path #2804
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/746/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9012/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/944/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/757/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/758/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/956/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9024/
@KanakaKumar @kunal642 @jackylk Please review it.
if (carbonFiles == null || carbonFiles.length < 1) {
  throw new RuntimeException("Carbon data file not exists.");
}
dataFilePath = carbonFiles[0].getAbsolutePath();
Is this taking only one data file (the first file)?
What if the folder has multiple files with different schemas? What if the user wants schema info from another file as well?
Supporting schema read from a folder is not required, because this API is exposed to the user and the user already has the list of files.
a) To read one file, the user passes a single file to this API. -- already supported
b) To read multiple files, the user can list the files, pick the ones whose schema is needed, and call our API for each of them. -- already supported
Just reading the first file from a folder doesn't make sense. This PR is not required, as the existing API already supports all user scenarios.

Yes, it takes only one data file.
It is more convenient for the user to give a path and read the schema from it. The folder may also have sub-folders, in which case the user has to list files iteratively. Some customers have this problem.
We can compare the schemas of the different files if necessary. The SDK can throw an exception if multiple files have different schemas.
In that case you can implement
String getFirstCarbonFile(path, ExtensionType)
and pass its result to the existing method. ReadSchemaFromFile() must only read the schema. It should not do any extra work.

OK, I added ReadSchemaFromFirstDataFile and ReadSchemaFromFirstIndexFile.
@xubo245 : |
@ajantha-bhat Some users already have this problem. Between different services, only the path is handed over, so the user has to list the index/data files, and may even have to list sub-folders iteratively to find the carbon index/data file, which is not convenient. We can make this a public function for all users.
@ajantha-bhat @KanakaKumar Please review again.
FileUtils.deleteDirectory(new File(path));

Field[] fields = new Field[11];
fields[0] = new Field("stringField", DataTypes.STRING);
You can move the write into the setup() step.

OK, done.
assert (strings[1].equalsIgnoreCase("shortField"));
assert (strings[2].equalsIgnoreCase("intField"));
assert (strings[3].equalsIgnoreCase("longField"));
assert (strings[4].equalsIgnoreCase("doubleField"));
These assertions can be moved to a method and used by both test cases.

OK, done.
Force-pushed from a1f9629 to 5046e76
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1148/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1359/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9413/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1365/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1158/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9420/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1374/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1161/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9425/
@ajantha-bhat CI passes, please check again.
 * @param path carbondata file path
 * @return first carbondata file name
 */
public static String getFirstCarbonDataFile(String path) {
I have already suggested keeping getFirstCarbonFile(path, extension), which returns either a data file or an index file based on the extension.
There is no need to duplicate code for the index and data files.

OK, I misunderstood, sorry.
Updated.
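The single-helper shape the reviewer asks for could look roughly like the sketch below. It is only an illustration: the class name `CarbonFileLocator`, the sorting by name, and the error message are assumptions, not the code that was merged. It also throws `IOException` rather than `RuntimeException` when nothing matches, which is a choice made for this sketch.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper for illustration; not the CarbonData implementation. */
final class CarbonFileLocator {
    private CarbonFileLocator() {}

    /**
     * Returns the absolute path of the first file directly inside {@code folder}
     * whose name ends with {@code extension}. One method serves both data and
     * index files; only the extension argument differs.
     */
    static String getFirstCarbonFile(String folder, String extension) throws IOException {
        File dir = new File(folder);
        // listFiles returns null when dir is not a directory or cannot be read.
        File[] matches = dir.listFiles((parent, name) -> name.endsWith(extension));
        if (matches == null || matches.length == 0) {
            throw new IOException("No file ending with " + extension + " found in " + folder);
        }
        // Sort by path so that "first" does not depend on file-system listing order.
        Arrays.sort(matches);
        return matches[0].getAbsolutePath();
    }
}
```

A caller would pass the returned path to the existing file-based schema reader, so the reader itself keeps doing nothing but reading.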
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9448/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1187/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1242/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9507/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1458/
@KanakaKumar @kunal642 @ajantha-bhat CI passes, please check.
docs/sdk-guide.md
Outdated
 * @return schema
 * @throws IOException
 */
public static Schema readSchema(String path, boolean checkFilesSchema);
checkFilesSchema should be renamed to validateSchema.

OK, done.
    .asOriginOrder();

  assertEquals(schema.getFieldsLength(), 12);
  checkSchema(schema);
} catch (Throwable e) {
  e.printStackTrace();
The test should fail here instead of only printing the stack trace.

OK, done; added Assert.fail().
 * @return schema
 * @throws IOException
 */
public static Schema readSchema(String path, boolean checkFilesSchema) throws IOException {
readSchema(String path, boolean checkFilesSchema)
-- Is this schema validation method required? If there is no use case, we can skip it; the schema is validated during query execution anyway.

When users only want to check the schema and do not need to query data, they can use readSchema.
  }
});
if (carbonFiles == null || carbonFiles.length < 1) {
  throw new RuntimeException("Carbon file not exists.");
Why RuntimeException? IO-related failures should throw IOException.

OK, done.
Force-pushed from c61c6a6 to a27280e
…path 1.Deprecated readSchemaInIndexFile and readSchemaInDataFile, unify them to readSchema in SDK 2.delete readSchemaInIndexFile and readSchemaInDataFile, unify them to readSchema in CSDK 3.readSchema support read schema from folder path,carbonindex file, and carbondata file. and user can decide whether check all files schema
Force-pushed from a27280e to e853036
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1297/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1509/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9558/
@KanakaKumar @kunal642 CI passes, please check it.
schema = readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath());
for (int i = 1; i < carbonIndexFiles.length; i++) {
  Schema schema2 = readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath());
  if (schema != schema2) {
Use equals here: schema.equals(schema2).

OK, done.
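The reason behind this comment is that `!=` on objects compares references, so two schemas read from two different files would always look "different" even when their fields match. A minimal sketch of the difference follows; `SimpleSchema` and `SchemaConsistency` are hypothetical stand-ins, not the CarbonData `Schema` class, and the approach only works if the real class overrides `equals` (and `hashCode`) with a field-by-field comparison.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a schema type, used only to illustrate equals vs ==. */
final class SimpleSchema {
    private final String[] fieldNames;

    SimpleSchema(String... fieldNames) {
        this.fieldNames = fieldNames.clone();
    }

    // Value equality: two schemas are equal when their field names match in order.
    @Override
    public boolean equals(Object other) {
        return other instanceof SimpleSchema
            && Arrays.equals(fieldNames, ((SimpleSchema) other).fieldNames);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(fieldNames);
    }
}

final class SchemaConsistency {
    private SchemaConsistency() {}

    /** Returns true when every schema equals the first one (or the list is empty). */
    static boolean allSame(List<SimpleSchema> schemas) {
        for (int i = 1; i < schemas.size(); i++) {
            // equals compares content; != would compare object identity.
            if (!schemas.get(0).equals(schemas.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With two separately constructed `SimpleSchema("stringField", "intField")` instances, `a != b` is true while `a.equals(b)` is also true, which is why the reference check in the loop above would reject every multi-file folder.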
if (indexFilePath != null) {
  return readSchemaFromIndexFile(indexFilePath);
} else {
  String dataFilePath = getCarbonFile(path, CARBON_DATA_EXT)[0].getAbsolutePath();
As per the getCarbonFile(...) implementation, it throws an exception if no INDEX file is found. So is this else branch needed at all?

Yeah, removed the else branch.
  }
  return carbonFiles;
}
return null;
We should stick to one contract for this method: either return the list or throw an exception. Generally, listing APIs should not return null. If this case is not expected, we can throw an exception so that callers do not need null checks.

OK, it throws an exception now.
 * @return
 * @throws IOException
 */
public static String getVersionDetails(String dataFilePath) throws IOException {
This complete method is displayed as removed and added again. Is it possible to avoid that?

No, it can't be avoided.
  }
} else {
  String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT)[0].getAbsolutePath();
  if (indexFilePath != null) {
I think this null check is not required. Is there any chance the absolute path can be null?

Removed.
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1531/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9578/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1320/
@KanakaKumar CI passes, please check it.
LGTM
LGTM
…path 1.Deprecated readSchemaInIndexFile and readSchemaInDataFile, unify them to readSchema in SDK 2.delete readSchemaInIndexFile and readSchemaInDataFile, unify them to readSchema in CSDK 3.readSchema support read schema from folder path,carbonindex file, and carbondata file. and user can decide whether check all files schema This closes #2804
[CARBONDATA-2996] CarbonSchemaReader support read schema from folder path
1. Deprecated readSchemaInIndexFile and readSchemaInDataFile, unifying them into readSchema.
2. Deprecated readSchemaInSchemaFile.
3. readSchema supports reading the schema from a folder path, a carbonindex file, or a carbondata file, and the user can decide whether to check the schema of all files.
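To make the third point concrete, a unified entry point has to decide first what kind of path it was given. The sketch below shows one way to do that dispatch step; it is an assumption-laden illustration, not the merged code. `SchemaPathKind`, its `Kind` enum, and the literal `.carbonindex` / `.carbondata` suffixes are all assumed here (the PR refers to them through the constants INDEX_FILE_EXT and CARBON_DATA_EXT).

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical dispatch helper for illustration; not part of the CarbonData SDK. */
final class SchemaPathKind {
    enum Kind { FOLDER, INDEX_FILE, DATA_FILE }

    private SchemaPathKind() {}

    /**
     * Classifies a path so that a unified readSchema-style method can choose
     * between folder handling, index-file reading, and data-file reading.
     */
    static Kind classify(String path) throws IOException {
        if (new File(path).isDirectory()) {
            return Kind.FOLDER;
        }
        // Assumed file suffixes for carbon index and data files.
        if (path.endsWith(".carbonindex")) {
            return Kind.INDEX_FILE;
        }
        if (path.endsWith(".carbondata")) {
            return Kind.DATA_FILE;
        }
        // An IO-style failure is reported as IOException, as requested in review.
        throw new IOException("Unsupported path for schema reading: " + path);
    }
}
```

For a folder, the caller would then either read one file or, when validation is requested, read every file and compare the schemas with equals.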
add test case
https://issues.apache.org/jira/browse/CARBONDATA-2951