[CARBONDATA-2997] Support read schema from index file and data file in CSDK #2807
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/759/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/761/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/959/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9027/
@KanakaKumar @kunal642 @jackylk Please review it.
docs/CSDK-guide.md
Outdated
@@ -126,12 +126,39 @@ bool readFromS3(JNIEnv *env, char *argv[]) {
reader.close();
}

// 3. destory JVM
// 3. read schema
/**
This can be removed.
After #2792, we just link to main.cpp, so there is no need to duplicate the code.
yes, removed
store/CSDK/CarbonSchemaReader.cpp
Outdated
#include "CarbonSchemaReader.h"

CarbonSchemaReader::CarbonSchemaReader(JNIEnv *env) {
this->carbonSchemaReaderClass = env->FindClass("org/apache/carbondata/sdk/file/CarbonSchemaReader");
Add validation; env->FindClass can return null.
ok, added.
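As an illustration of the validation requested above, here is a minimal sketch, not the code that was merged. `findClassChecked` is a hypothetical helper name. `Env` is a template parameter only so that the snippet compiles without the JDK headers; in the CSDK it would be `JNIEnv` from `<jni.h>`, whose C++ binding provides `FindClass`, `ExceptionCheck` and `ExceptionClear` as member functions.

```cpp
#include <stdexcept>
#include <string>

// Sketch only: Env stands in for JNIEnv so this compiles without <jni.h>.
// findClassChecked is a hypothetical helper, not part of the PR.
template <typename Env>
auto findClassChecked(Env *env, const char *name) -> decltype(env->FindClass(name)) {
    if (env == nullptr) {
        throw std::invalid_argument("JNIEnv pointer is null");
    }
    auto clazz = env->FindClass(name);
    if (clazz == nullptr) {
        // A failed FindClass leaves a Java exception pending; clear it
        // before reporting the failure on the C++ side.
        if (env->ExceptionCheck()) {
            env->ExceptionClear();
        }
        throw std::runtime_error(std::string("class not found: ") + name);
    }
    return clazz;
}
```

With a real `JNIEnv *env`, the constructor could then call `findClassChecked(env, "org/apache/carbondata/sdk/file/CarbonSchemaReader")` instead of using the raw `FindClass` result directly.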
store/CSDK/CarbonSchemaReader.cpp
Outdated
}

jobject CarbonSchemaReader::readSchemaInDataFile(char *path) {
jmethodID buildID = jniEnv->GetStaticMethodID(carbonSchemaReaderClass, "readSchemaInDataFile",
Add validation here and, for the calls below, get the exception from Java using JNI.
Do this for all the newly added APIs.
ok, done
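One way to read the request above is sketched here, assuming C++14 and assuming that Java exceptions are rethrown as `jthrowable` values (the test code later in this review catches `jthrowable`). `callStaticObjectChecked` is a hypothetical helper name, and `Env` and `Class` are template parameters only so that the snippet compiles without `<jni.h>`; with the real headers they would be `JNIEnv` and `jclass`. The JNI method signature string is passed in by the caller, since it is not shown in the diff above.

```cpp
#include <stdexcept>
#include <string>

// Sketch only: Env/Class stand in for JNIEnv/jclass so this compiles
// without <jni.h>. callStaticObjectChecked is a hypothetical helper.
template <typename Env, typename Class, typename... Args>
auto callStaticObjectChecked(Env *env, Class clazz, const char *method,
                             const char *signature, Args... args) {
    auto methodID = env->GetStaticMethodID(clazz, method, signature);
    if (methodID == nullptr) {
        // A failed lookup leaves a Java exception pending; clear it and
        // report the problem as a C++ exception.
        if (env->ExceptionCheck()) {
            env->ExceptionClear();
        }
        throw std::runtime_error(std::string("method not found: ") + method);
    }
    auto result = env->CallStaticObjectMethod(clazz, methodID, args...);
    if (env->ExceptionCheck()) {
        // Fetch the Java exception, clear the pending state so later JNI
        // calls are safe, then hand the exception object to the caller.
        auto pending = env->ExceptionOccurred();
        env->ExceptionClear();
        throw pending;
    }
    return result;
}
```

Clearing the pending exception before throwing is a design choice in this sketch: most JNI functions must not be called while an exception is pending, so the caller's `catch` block would otherwise be limited in what it can do.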
store/CSDK/main.cpp
Outdated
bool readSchemaInDataFile(JNIEnv *env) {
printf("\nread Schema from Data File:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInDataFile(
Do we have this binary (part-0-510199997055746_batchno0-0-null-510199277323454.carbondata) in the repo? I think we should not keep any binaries in the repo.
No binary files are in the repo; they exist only on my local machine.
store/CSDK/main.cpp
Outdated
printf("\nread Schema from Index File from S3:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInIndexFile(
"s3a://sdk/WriterOutput/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
Do we have this binary (part-0-510199997055746_batchno0-0-null-510199277323454.carbonindex) in the repo? I think we should not keep any binaries in the repo.
No binary files are in the repo; they exist only on my local machine.
store/CSDK/main.cpp
Outdated
printf("\nread Schema from Index File:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInIndexFile(
"s3a://sdk/WriterOutput/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
same as above.
No binary files are in the repo; they exist only on my local machine.
store/CSDK/main.cpp
Outdated
* @param env jni env
* @return whether it is success
*/
bool readSchemaInIndexFileFromS3(JNIEnv *env) {
In main.cpp this test case is commented out, so remove it here as well.
OK, removed. I will implement it in the future.
@@ -0,0 +1,233 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
a. There is no need to generate data for each test case; generate it once in beforeAll so that all test cases use the same data.
b. CarbonReaderTest already covers the scenarios in the test below (reading both the data file and the index file),
so this test case is not required. Please remove it.
This test case tests the new classes, such as Schema.java, and we should move the read-schema test cases from CarbonReaderTest to CarbonSchemaReaderTest.
store/CSDK/Schema.cpp
Outdated
#include <jni.h>
#include "Schema.h"

Schema::Schema(JNIEnv *env, jobject schema) {
Same as in the other file.
Add validation and, for the calls below, get the exception from Java using JNI.
Do this for all the newly added APIs.
ok, done
store/CSDK/main.cpp
Outdated
printf("\nread Schema from Index File:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInIndexFile(
"../resources/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
We should not have a binary (the index file) in the repo.
It is not in the repo.
51d0a75 to bdc025f
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1061/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1275/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9327/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1085/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1297/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9349/
store/CSDK/test/main.cpp
Outdated
jobject schema;
try {
schema = carbonSchemaReader.readSchemaInDataFile(
"../../../../resources/carbondata/part-0-510199997055746_batchno0-0-null-510199277323454.carbondata");
We cannot keep your local test case in the repo. Make a general test case without a binary dependency.
OK, removed. After the write function is merged, I will optimize it again.
try {
schema = carbonSchemaReader.readSchemaInIndexFile(
"../../../../resources/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
} catch (jthrowable e) {
We cannot keep your local test case in the repo. Make a general test case without a binary dependency.
OK, removed. After the write function is merged, I will optimize it again.
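A test without a checked-in binary could generate its input at run time and clean up afterwards. The following is a rough sketch of that shape, assuming C++17 (`std::filesystem`). `ScratchDir` and `writePlaceholderData` are hypothetical names; the placeholder writer only creates a dummy file and stands in for the CSDK write function, which was not merged at the time of this review.

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical RAII helper: creates a scratch directory under the system
// temp directory and removes it, with its contents, on destruction.
class ScratchDir {
public:
    explicit ScratchDir(const std::string &prefix) {
        auto stamp = std::chrono::steady_clock::now().time_since_epoch().count();
        path_ = fs::temp_directory_path() / (prefix + "-" + std::to_string(stamp));
        fs::create_directories(path_);
    }
    ~ScratchDir() {
        std::error_code ec;  // ignore cleanup errors instead of throwing
        fs::remove_all(path_, ec);
    }
    ScratchDir(const ScratchDir &) = delete;
    ScratchDir &operator=(const ScratchDir &) = delete;
    const fs::path &path() const { return path_; }

private:
    fs::path path_;
};

// Hypothetical stand-in for the future CSDK writer: it only writes a
// dummy file, not a valid carbondata file.
fs::path writePlaceholderData(const fs::path &dir) {
    fs::path file = dir / "placeholder.carbondata";
    std::ofstream out(file, std::ios::binary);
    out << "placeholder";
    return file;
}
```

Once a real writer exists, the test could write data into `ScratchDir::path()`, pass the resulting file path to `readSchemaInDataFile`, and rely on the destructor to remove the generated files.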
bdc025f to 4e6a8ef
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1133/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9397/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1345/
@ajantha-bhat updated, please check again.
…n CSDK 1. support readSchemaInIndexFile 2. support readSchemaInDataFile 3. support getting field name and data type name 4. support getting array child element data type name 5. can read schema when CarbonReader has set ak, sk, endpoint 6. TODO: need to support reading schema from S3 in the future
4e6a8ef to 48fd308
rebase
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1171/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9435/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1384/
@ajantha-bhat updated, please check again.
LGTM. Let one more reviewer review.
I suggest we hold this PR until 2804 is concluded and merged, to avoid duplicate work.
@KanakaKumar Can we provide the C++ interface to read schema in this version? As for PR2804, we can discuss how to enhance/optimize it in the next version; that is OK. Providing a C++ interface to read schema is more important.
Ok Xubo, fine.
LGTM
ok, merge into master
…n CSDK 1. support readSchemaInIndexFile 2. support readSchemaInDataFile 3. support getting field name and data type name 4. support getting array child element data type name 5. can read schema when CarbonReader has set ak, sk, endpoint 6. TODO: need to support reading schema from S3 in the future. This closes #2807
[CARBONDATA-2997] Support read schema from index file and data file in CSDK
1. Support readSchemaInIndexFile
2. Support readSchemaInDataFile
3. Support getting field name and data type name
4. Support getting array child element data type name
5. Can read schema when CarbonReader has set ak, sk, endpoint
6. TODO: need to support reading schema from S3 in the future
read Schema from Index File:
read Schema from Data File:
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
add interface
No
Yes
add test case
https://issues.apache.org/jira/browse/CARBONDATA-2951