
[CARBONDATA-1400] Fix bug of array column out of bound when writing carbondata file #1273

Closed · wants to merge 7 commits

Conversation

@jackylk (Contributor) commented Aug 19, 2017

If there is a big array in the input CSV file, loading it into a CarbonData table may throw ArrayIndexOutOfBoundsException because the data exceeds the page size (32,000 rows).

This PR fixes it by changing the complex column encoding to DirectCompressionEncoding.
This PR adds a test case covering input data with a big array.
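For context, a minimal Java sketch of the failure mode; the class and field names here are hypothetical, not CarbonData's actual implementation:

```java
// Hypothetical sketch of the bug: a fixed-capacity page indexed per
// flattened element. A CarbonData page holds up to 32,000 entries, and a
// complex (array) column is flattened so that a single row can contribute
// many entries. A big array therefore overflows the page even when far
// fewer than 32,000 rows have been loaded.
public class FixedCapacityPage {
  private static final int PAGE_CAPACITY = 32000;
  private final byte[][] entries = new byte[PAGE_CAPACITY][];
  private int count = 0;

  public void putArrayRow(byte[][] flattenedElements) {
    for (byte[] element : flattenedElements) {
      // Throws ArrayIndexOutOfBoundsException once count reaches 32,000.
      entries[count++] = element;
    }
  }
}
```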

@CarbonDataQA

SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/265/

@CarbonDataQA

SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/268/

@CarbonDataQA

SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/273/

@jackylk force-pushed the complex branch 3 times, most recently from 75bfdcc to 7f956b1 on August 29, 2017 15:14
@CarbonDataQA

SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/310/

@jackylk force-pushed the complex branch 2 times, most recently from aaefd63 to 174814f on September 11, 2017 17:02
@ravipesala (Contributor)

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/677/

@jackylk (Contributor, Author) commented Sep 11, 2017

retest this please

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/680/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/684/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/685/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/686/

@zzcclp (Contributor) commented Sep 12, 2017

retest this please

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/688/

On this code in the diff (excerpt):

```java
ByteArrayOutputStream stream = new ByteArrayOutputStream();
DataOutputStream out = new DataOutputStream(stream);
for (byte[] byteArrayDatum : byteArrayData) {
  out.writeInt(byteArrayDatum.length);
```
A reviewer (Contributor) commented:

can't we use short here?

@jackylk (Contributor, Author) replied Sep 12, 2017:

This is for backward compatibility; the old code uses 4 bytes to store the length. I use the same format for both writing and reading the variable-length column page.
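To make the trade-off concrete, a minimal sketch of the 4-byte length-value (LV) layout under discussion; the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal sketch (hypothetical names) of the length-value (LV) layout:
// every value is prefixed with a 4-byte int length for compatibility with
// the legacy format, rather than a 2-byte short.
public final class LvEncoderSketch {
  static byte[] encodeLV(byte[][] byteArrayData) throws IOException {
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(stream);
    for (byte[] datum : byteArrayData) {
      out.writeInt(datum.length);  // 4-byte length prefix (not writeShort)
      out.write(datum);            // followed by the value bytes
    }
    out.flush();
    return stream.toByteArray();
  }
}
```

Switching to writeShort would save 2 bytes per value but would break readers of the existing on-disk format.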

@ravipesala (Contributor)

@jackylk tests are failing in 1.6, please check

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/711/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/713/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/717/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/22/

On this signature change in the diff:

```java
// before
static ColumnPage newDecimalColumnPage(byte[] lvEncodedBytes, int scale, int precision)
    throws MemoryException {

// after
static ColumnPage newDecimalColumnPage(TableSpec.ColumnSpec columnSpec, byte[] lvEncodedBytes,
    int scale, int precision) throws MemoryException {
```
A reviewer (Contributor) commented:

Since columnSpec is passed, can we remove scale and precision from these methods and get them from columnSpec?

@jackylk (Contributor, Author) replied:

I will try, but it is a temporary solution.
I think the correct way is to make DataType a class instead of an enum, and keep precision and scale in it.
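For illustration, a minimal sketch of the suggested simplification; the getScale()/getPrecision() accessors on TableSpec.ColumnSpec are assumptions for this sketch, not confirmed CarbonData API:

```java
// Sketch of the reviewer's suggestion: derive scale and precision from the
// ColumnSpec instead of passing them as extra parameters. The accessors
// getScale()/getPrecision() are assumed here and may not match the real
// TableSpec.ColumnSpec API.
static ColumnPage newDecimalColumnPage(TableSpec.ColumnSpec columnSpec, byte[] lvEncodedBytes)
    throws MemoryException {
  int scale = columnSpec.getScale();          // assumed accessor
  int precision = columnSpec.getPrecision();  // assumed accessor
  // delegate to the existing overload shown in the diff above
  return newDecimalColumnPage(columnSpec, lvEncodedBytes, scale, precision);
}
```

This would keep a single source of truth for decimal metadata until DataType itself can carry precision and scale.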

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/724/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/98/

@ravipesala (Contributor)

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/99/

@ravipesala (Contributor)

@jackylk it seems compilation fails after rebasing to master. Please check once.

@ravipesala (Contributor)

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder1/4/

@ravipesala (Contributor)

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder1/5/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/103/

@ravipesala (Contributor)

LGTM

@asfgit closed this in 8c1ddbf on Sep 13, 2017
@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/725/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/731/

@ravipesala (Contributor)

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/730/

xubo245 pushed a commit to xubo245/carbondata that referenced this pull request on Sep 17, 2017:

[CARBONDATA-1400] Fix bug of array column out of bound when writing carbondata file

If there is a big array in the input CSV file, loading it into a CarbonData table may throw ArrayIndexOutOfBoundsException because the data exceeds the page size (32,000 rows).

This PR fixes it by changing the complex column encoding to DirectCompressionEncoding.
This PR adds a test case covering input data with a big array.

This closes apache#1273