[CARBONDATA-2997] Support read schema from index file and data file in CSDK #2807

xubo245 · 2018-10-10T03:53:53Z

[CARBONDATA-2997] Support read schema from index file and data file in CSDK

1. Support readSchemaInIndexFile
2. Support readSchemaInDataFile
3. Support getting the field name and data type name
4. Support getting the data type name of an array's child element
5. The schema can be read when CarbonReader has set ak, sk, and endpoint
6. TODO: support reading the schema from S3 in the future

read Schema from Index File:

schema length is:12
0	stringfield	STRING
1	datefield	DATE
2	timefield	TIMESTAMP
3	varcharfield	VARCHAR
4	arrayfield	ARRAY
STRING
5	shortfield	SHORT
6	intfield	INT
7	longfield	LONG
8	doublefield	DOUBLE
9	boolfield	BOOLEAN
10	decimalfield	DECIMAL
11	floatfield	FLOAT

read Schema from Data File:

schema length is:12
0	stringfield	STRING
1	datefield	DATE
2	timefield	TIMESTAMP
3	varcharfield	VARCHAR
4	arrayfield	ARRAY
STRING
5	shortfield	SHORT
6	intfield	INT
7	longfield	LONG
8	doublefield	DOUBLE
9	boolfield	BOOLEAN
10	decimalfield	DECIMAL
11	floatfield	FLOAT

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

Any interfaces changed?
add interface
Any backward compatibility impacted?
No
Document update required?
Yes
Testing done
add test case
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
https://issues.apache.org/jira/browse/CARBONDATA-2951

CarbonDataQA · 2018-10-10T04:04:28Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/759/

CarbonDataQA · 2018-10-10T04:19:57Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/761/

CarbonDataQA · 2018-10-10T05:27:46Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/959/

CarbonDataQA · 2018-10-10T05:37:04Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9027/

xubo245 · 2018-10-10T06:29:47Z

@KanakaKumar @kunal642 @jackylk Please review it.

ajantha-bhat · 2018-10-26T05:09:57Z

docs/CSDK-guide.md

@@ -126,12 +126,39 @@ bool readFromS3(JNIEnv *env, char *argv[]) {
    reader.close();
 }

-// 3. destory JVM
+// 3. read schema
+/**


This can be removed.

After #2792, we just link to main.cpp, so there is no need to duplicate the code.

Yes, removed.

ajantha-bhat · 2018-10-26T05:11:46Z

store/CSDK/CarbonSchemaReader.cpp

+#include "CarbonSchemaReader.h"
+
+CarbonSchemaReader::CarbonSchemaReader(JNIEnv *env) {
+    this->carbonSchemaReaderClass = env->FindClass("org/apache/carbondata/sdk/file/CarbonSchemaReader");


Add validation; env->FindClass can return null.
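
One way the requested check could look is sketched below. This is an illustration under assumptions, not the code that was merged: the helper name `findClassOrThrow` is hypothetical, and it is templated on the environment type so that it compiles without `<jni.h>` (in the CSDK, `Env` would be `JNIEnv`). It relies only on the JNI calls `FindClass`, `ExceptionCheck`, and `ExceptionClear`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: look up a class and fail loudly instead of
// storing a null handle. Env is expected to be JNIEnv in real use.
template <typename Env>
auto findClassOrThrow(Env *env, const char *className)
        -> decltype(env->FindClass(className)) {
    if (env == nullptr) {
        throw std::invalid_argument("JNIEnv pointer is null");
    }
    auto clazz = env->FindClass(className);
    if (clazz == nullptr) {
        // A failed FindClass leaves a Java exception pending (for example
        // NoClassDefFoundError); clear it before any further JNI calls.
        if (env->ExceptionCheck()) {
            env->ExceptionClear();
        }
        throw std::runtime_error(std::string("class not found: ") + className);
    }
    return clazz;
}
```

The constructor could then assign `carbonSchemaReaderClass = findClassOrThrow(env, "org/apache/carbondata/sdk/file/CarbonSchemaReader");` so that a missing class surfaces at construction time rather than on the first method call.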

ajantha-bhat · 2018-10-26T05:12:44Z

store/CSDK/CarbonSchemaReader.cpp

+}
+
+jobject CarbonSchemaReader::readSchemaInDataFile(char *path) {
+    jmethodID buildID = jniEnv->GetStaticMethodID(carbonSchemaReaderClass, "readSchemaInDataFile",


Add validation here, and for the call below, get the exception from Java using JNI.
Do this for all the newly added APIs.
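
A rough sketch of both requests follows: validating the method ID, and surfacing a pending Java exception after a call. The helper names `getStaticMethodOrThrow` and `throwIfJavaException` are hypothetical, and the helpers are templated on the environment type so that they compile without `<jni.h>` (in the CSDK, `Env` would be `JNIEnv`). Test code later in this thread catches `jthrowable`, so the PR itself appears to rethrow the Java throwable; this sketch uses `std::runtime_error` instead, purely for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a pending Java exception into a C++ one.
template <typename Env>
void throwIfJavaException(Env *env, const std::string &context) {
    if (env->ExceptionCheck()) {
        env->ExceptionDescribe();  // prints the Java stack trace to stderr
        env->ExceptionClear();     // make sure nothing is left pending
        throw std::runtime_error("Java exception during " + context);
    }
}

// Hypothetical helper: resolve a static method ID or fail with a clear message.
template <typename Env, typename Class>
auto getStaticMethodOrThrow(Env *env, Class clazz, const char *name,
                            const char *signature)
        -> decltype(env->GetStaticMethodID(clazz, name, signature)) {
    if (clazz == nullptr) {
        throw std::invalid_argument("class handle is null");
    }
    auto id = env->GetStaticMethodID(clazz, name, signature);
    if (id == nullptr) {
        // A failed lookup leaves NoSuchMethodError pending; clear it.
        if (env->ExceptionCheck()) {
            env->ExceptionClear();
        }
        throw std::runtime_error(std::string("static method not found: ") + name);
    }
    return id;
}
```

In `readSchemaInDataFile`, the method ID would come from `getStaticMethodOrThrow`, and `throwIfJavaException(jniEnv, "readSchemaInDataFile")` would run immediately after the static call returns, before the result is used.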

ajantha-bhat · 2018-10-26T05:30:34Z

store/CSDK/main.cpp

+bool readSchemaInDataFile(JNIEnv *env) {
+    printf("\nread Schema from Data File:\n");
+    CarbonSchemaReader carbonSchemaReader(env);
+    jobject schema = carbonSchemaReader.readSchemaInDataFile(


Do we have the binary part-0-510199997055746_batchno0-0-null-510199277323454.carbondata in the repo? I think we should not keep any binaries in the repo.

No binary files are in the repo; they exist only in my local environment.

ajantha-bhat · 2018-10-26T05:30:55Z

store/CSDK/main.cpp

+    printf("\nread Schema from Index File from S3:\n");
+    CarbonSchemaReader carbonSchemaReader(env);
+    jobject schema = carbonSchemaReader.readSchemaInIndexFile(
+            "s3a://sdk/WriterOutput/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");


Do we have the binary part-0-510199997055746_batchno0-0-null-510199277323454.carbonindex in the repo? I think we should not keep any binaries in the repo.

No binary files are in the repo; they exist only in my local environment.

ajantha-bhat · 2018-10-26T05:31:44Z

store/CSDK/main.cpp

+    printf("\nread Schema from Index File:\n");
+    CarbonSchemaReader carbonSchemaReader(env);
+    jobject schema = carbonSchemaReader.readSchemaInIndexFile(
+            "s3a://sdk/WriterOutput/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");


Same as above.

No binary files are in the repo; they exist only in my local environment.

ajantha-bhat · 2018-10-26T05:32:42Z

store/CSDK/main.cpp

+ * @param env jni env
+ * @return whether it is success
+ */
+bool readSchemaInIndexFileFromS3(JNIEnv *env) {


In main.cpp this test case is commented out, so remove it here as well.

OK, removed. I will implement it in the future.

ajantha-bhat · 2018-10-26T05:41:29Z

store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonSchemaReaderTest.java

@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more


a. There is no need to generate data for each test case; generate it once in beforeAll and let all test cases use the same data.
b. CarbonReaderTest already covers the scenarios in the test below (reading both the data file and the index file).

So this test case is not required. Please remove it.

This test case covers new classes such as Schema.java, and we should move the read-schema test cases from CarbonReaderTest to CarbonSchemaReaderTest.

ajantha-bhat · 2018-10-26T05:42:27Z

store/CSDK/Schema.cpp

+#include <jni.h>
+#include "Schema.h"
+
+Schema::Schema(JNIEnv *env, jobject schema) {


Same as in the other file.

Add validation, and for the calls below, get the exception from Java using JNI.
Do this for all the newly added APIs.

ajantha-bhat · 2018-10-26T05:46:02Z

store/CSDK/main.cpp

+    printf("\nread Schema from Index File:\n");
+    CarbonSchemaReader carbonSchemaReader(env);
+    jobject schema = carbonSchemaReader.readSchemaInIndexFile(
+            "../resources/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");


We should not have a binary (the index file) in the repo.

CarbonDataQA · 2018-10-26T16:24:38Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1061/

CarbonDataQA · 2018-10-26T17:23:32Z

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1275/

CarbonDataQA · 2018-10-26T17:23:55Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9327/

xubo245 · 2018-10-27T08:28:03Z

retest this please

CarbonDataQA · 2018-10-27T08:42:19Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1085/

CarbonDataQA · 2018-10-27T09:43:06Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1297/

CarbonDataQA · 2018-10-27T09:46:00Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9349/

ajantha-bhat · 2018-10-29T13:22:03Z

store/CSDK/test/main.cpp

+    jobject schema;
+    try {
+        schema = carbonSchemaReader.readSchemaInDataFile(
+                "../../../../resources/carbondata/part-0-510199997055746_batchno0-0-null-510199277323454.carbondata");


We cannot keep your local test case in the repo. Make a general test case without a binary dependency.

OK, removed. After the write function is merged, I will optimize this again.

ajantha-bhat · 2018-10-29T13:22:10Z

store/CSDK/test/main.cpp

+    try {
+        schema = carbonSchemaReader.readSchemaInIndexFile(
+                "../../../../resources/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
+    } catch (jthrowable e) {


We cannot keep your local test case in the repo. Make a general test case without a binary dependency.

OK, removed. After the write function is merged, I will optimize this again.

CarbonDataQA · 2018-10-30T03:54:48Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1133/

CarbonDataQA · 2018-10-30T04:56:55Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9397/

CarbonDataQA · 2018-10-30T05:15:54Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1345/

xubo245 · 2018-10-30T07:54:41Z

@ajantha-bhat updated, please check again.

xubo245 · 2018-10-31T02:04:14Z

rebase

CarbonDataQA · 2018-10-31T02:45:32Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1171/

CarbonDataQA · 2018-10-31T03:48:34Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9435/

CarbonDataQA · 2018-10-31T04:32:22Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1384/

xubo245 · 2018-10-31T06:47:26Z

@ajantha-bhat updated, please check again.

ajantha-bhat · 2018-10-31T09:56:02Z

LGTM.

Let one more reviewer review.

KanakaKumar · 2018-10-31T15:33:31Z

I suggest we hold this PR until 2804 is concluded and merged, to avoid duplicate work.

xubo245 · 2018-10-31T15:48:41Z

@KanakaKumar Can we provide the C++ interface to read the schema in this version? For PR 2804, we can discuss how to enhance and optimize it in the next version. Providing a C++ interface to read the schema is more important.

KanakaKumar · 2018-11-01T10:52:53Z

Ok Xubo, fine.

KanakaKumar · 2018-11-01T11:36:54Z

LGTM

QiangCai · 2018-11-01T12:06:38Z

ok, merge into master

ajantha-bhat reviewed Oct 26, 2018

xubo245 force-pushed the CARBONDATA-2997_supportReadSchema branch 3 times, most recently from 51d0a75 to bdc025f Compare October 26, 2018 16:10

ajantha-bhat reviewed Oct 29, 2018

xubo245 force-pushed the CARBONDATA-2997_supportReadSchema branch from bdc025f to 4e6a8ef Compare October 30, 2018 03:41

xubo245 force-pushed the CARBONDATA-2997_supportReadSchema branch from 4e6a8ef to 48fd308 Compare October 31, 2018 02:03

asfgit closed this in ac94dba Nov 1, 2018
