Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-2997] Support read schema from index file and data file in CSDK #2807

Closed

Conversation

xubo245
Copy link
Contributor

@xubo245 xubo245 commented Oct 10, 2018

[CARBONDATA-2997] Support read schema from index file and data file in CSDK

1.support readSchemaInIndexFile
2.support readSchemaInDataFile
3.support get field name and data type name
4.suppport get array child element data type name
5.can read schema when carbonreader has set ak,sk,endpoint
6.TODO: need support read scehma from S3 in the future

read Schema from Index File:

schema length is:12
0	stringfield	STRING
1	datefield	DATE
2	timefield	TIMESTAMP
3	varcharfield	VARCHAR
4	arrayfield	ARRAY
STRING
5	shortfield	SHORT
6	intfield	INT
7	longfield	LONG
8	doublefield	DOUBLE
9	boolfield	BOOLEAN
10	decimalfield	DECIMAL
11	floatfield	FLOAT

read Schema from Data File:

schema length is:12
0	stringfield	STRING
1	datefield	DATE
2	timefield	TIMESTAMP
3	varcharfield	VARCHAR
4	arrayfield	ARRAY
STRING
5	shortfield	SHORT
6	intfield	INT
7	longfield	LONG
8	doublefield	DOUBLE
9	boolfield	BOOLEAN
10	decimalfield	DECIMAL
11	floatfield	FLOAT

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    add interface
  • Any backward compatibility impacted?
    No
  • Document update required?
    Yes
  • Testing done
    add test case
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    https://issues.apache.org/jira/browse/CARBONDATA-2951

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/759/

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/761/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/959/

@CarbonDataQA
Copy link

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9027/

@xubo245
Copy link
Contributor Author

xubo245 commented Oct 10, 2018

@KanakaKumar @kunal642 @jackylk Please review it.

@@ -126,12 +126,39 @@ bool readFromS3(JNIEnv *env, char *argv[]) {
reader.close();
}

// 3. destory JVM
// 3. read schema
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed.

As after #2792. we just put link to main.cpp. No need to duplicate the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, removed

#include "CarbonSchemaReader.h"

CarbonSchemaReader::CarbonSchemaReader(JNIEnv *env) {
this->carbonSchemaReaderClass = env->FindClass("org/apache/carbondata/sdk/file/CarbonSchemaReader");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add validation, env->FindClass can return null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, added.

}

jobject CarbonSchemaReader::readSchemaInDataFile(char *path) {
jmethodID buildID = jniEnv->GetStaticMethodID(carbonSchemaReaderClass, "readSchemaInDataFile",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add validation and for below get the exception from java using jni.
Do this for all the newly added API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

bool readSchemaInDataFile(JNIEnv *env) {
printf("\nread Schema from Data File:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInDataFile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have these binaries part-0-510199997055746_batchno0-0-null-510199277323454.carbondata? I think we should not keep any binaries in repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no binaries files in repo, only in my local

printf("\nread Schema from Index File from S3:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInIndexFile(
"s3a://sdk/WriterOutput/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have these binaries part-0-510199997055746_batchno0-0-null-510199277323454.carbonindex? I think we should not keep any binaries in repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no binaries files in repo, only in my local

printf("\nread Schema from Index File:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInIndexFile(
"s3a://sdk/WriterOutput/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no binaries files in repo, only in my local

* @param env jni env
* @return whether it is success
*/
bool readSchemaInIndexFileFromS3(JNIEnv *env) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in main.cpp this testcase is commented. so remove here also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed, I will implement in the future

@@ -0,0 +1,233 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. No need to generate data for each testcase, can generate in beforeAll and all test case use same data.
b. CarbonReaderTest has already covered scenarios mentioned in the below test. (both data and index file read)

so, this test case is not required. Please remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is test for new class, like Schema. java and we should move read schema test case from CarbonReaderTest to CarbonSchemaReaderTest

#include <jni.h>
#include "Schema.h"

Schema::Schema(JNIEnv *env, jobject schema) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as other file.

Add validation and for below get the exception from java using jni.
Do this for all the newly added API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

printf("\nread Schema from Index File:\n");
CarbonSchemaReader carbonSchemaReader(env);
jobject schema = carbonSchemaReader.readSchemaInIndexFile(
"../resources/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not have binary in the repo. (index file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no in repo

@xubo245 xubo245 force-pushed the CARBONDATA-2997_supportReadSchema branch 3 times, most recently from 51d0a75 to bdc025f Compare October 26, 2018 16:10
@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1061/

@CarbonDataQA
Copy link

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1275/

@CarbonDataQA
Copy link

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9327/

@xubo245
Copy link
Contributor Author

xubo245 commented Oct 27, 2018

retest this please

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1085/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1297/

@CarbonDataQA
Copy link

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9349/

jobject schema;
try {
schema = carbonSchemaReader.readSchemaInDataFile(
"../../../../resources/carbondata/part-0-510199997055746_batchno0-0-null-510199277323454.carbondata");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot keep your local test case in repo. Make a general test case without binary dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed, after write function merged, I will optimized again.

try {
schema = carbonSchemaReader.readSchemaInIndexFile(
"../../../../resources/carbondata/510199997055746_batchno0-0-null-510199277323454.carbonindex");
} catch (jthrowable e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot keep your local test case in repo. Make a general test case without binary dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed, after write function merged, I will optimized again.

@xubo245 xubo245 force-pushed the CARBONDATA-2997_supportReadSchema branch from bdc025f to 4e6a8ef Compare October 30, 2018 03:41
@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1133/

@CarbonDataQA
Copy link

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9397/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1345/

@xubo245
Copy link
Contributor Author

xubo245 commented Oct 30, 2018

@ajantha-bhat updated, please check again.

…n CSDK

1.support readSchemaInIndexFile
2.support readSchemaInDataFile
3.support get field name and data type name
4.suppport get array child element data type name
5.can read schema when carbonreader has set ak,sk,endpoint
6.TODO: need support read scehma from S3 in the future
@xubo245 xubo245 force-pushed the CARBONDATA-2997_supportReadSchema branch from 4e6a8ef to 48fd308 Compare October 31, 2018 02:03
@xubo245
Copy link
Contributor Author

xubo245 commented Oct 31, 2018

rebase

@CarbonDataQA
Copy link

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1171/

@CarbonDataQA
Copy link

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9435/

@CarbonDataQA
Copy link

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1384/

@xubo245
Copy link
Contributor Author

xubo245 commented Oct 31, 2018

@ajantha-bhat updated, please check again.

@ajantha-bhat
Copy link
Member

LGTM.

Let one more reviewer review.

@KanakaKumar
Copy link
Contributor

I suggest let's hold this PR till 2804 is concluded & merged to avoid duplicate work.

@xubo245
Copy link
Contributor Author

xubo245 commented Oct 31, 2018

@KanakaKumar Can we provide C++ interface to read schema in this version? PR2804 we can discuss how to enhance/optimize in next version, it's ok. provide C++ interface to read schema is more important

@KanakaKumar
Copy link
Contributor

Ok Xubo, fine.

@KanakaKumar
Copy link
Contributor

LGTM

@QiangCai
Copy link
Contributor

QiangCai commented Nov 1, 2018

ok, merge into master

@asfgit asfgit closed this in ac94dba Nov 1, 2018
asfgit pushed a commit that referenced this pull request Nov 21, 2018
…n CSDK

1.support readSchemaInIndexFile
2.support readSchemaInDataFile
3.support get field name and data type name
4.suppport get array child element data type name
5.can read schema when carbonreader has set ak,sk,endpoint
6.TODO: need support read scehma from S3 in the future

This closes #2807
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants