[CARBONDATA-3127] Fix the HiveExample & TestCarbonSerde exception #2969

Closed
wants to merge 38 commits into from

Conversation

SteNicholas (Member) commented Dec 2, 2018

[CARBONDATA-3127] The Hive module test cases have been commented out and cannot run.
This pull request fixes the TestCarbonSerde test case failure under the Maven build. The deserializeAndSerializeLazySimple method of the TestCarbonSerde class calls serializeStartKey, which does not exist in the CarbonHiveSerDe class; the fix replaces serializeStartKey with serialize and removes the comment markers from TestCarbonSerde.
The previous hive-integration HiveExample also has an obvious bug: selecting from the HIVE_CARBON_EXAMPLE table returns no data. That is fixed in this pull request as well. The location was configured incorrectly after creating the Hive table, and the block lacked the detail info and footer offset needed when querying the Hive table, so the block's detail info, including the footer offset, is now set through the input split.
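For illustration, a minimal sketch of the corrected round-trip follows; the class wrapper and helper name are illustrative, and only serialize, deserialize, and getObjectInspector come from the Hive SerDe contract that CarbonHiveSerDe implements:

import org.apache.carbondata.hive.CarbonHiveSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Writable;

public class SerDeRoundTripSketch {
  // Round-trips a row through the SerDe. The old test called a
  // serializeStartKey() method that no longer exists on CarbonHiveSerDe;
  // the fix calls the standard serialize() instead.
  static Object roundTrip(CarbonHiveSerDe serDe, ArrayWritable row)
      throws SerDeException {
    ObjectInspector oi = serDe.getObjectInspector();
    Writable serialized = serDe.serialize(row, oi); // was: serDe.serializeStartKey(row, oi)
    return serDe.deserialize(serialized);
  }
}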

  • Any interfaces changed?
    No
  • Any backward compatibility impacted?
    No
  • Document update required?
    No
  • Testing done
    Modified the TestCarbonSerde test.
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    No

SteNicholas (Member, Author)

@xubo245 @zzcclp Please review the small update.

@CarbonDataQA
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1609/

@CarbonDataQA
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9869/

@CarbonDataQA
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1820/

final Properties tbl = new Properties();

// Set the configuration parameters
tbl.setProperty("columns", "ashort,aint,along,adouble,adecimal,astring,alist");
xubo245 (Contributor):

Please take care with the spelling format, including case sensitivity; for example: aShort

SteNicholas (Member, Author):

@xubo245 I have already fixed the spelling format you referred to.
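For reference, the corrected property setup in consistent camelCase would look like this (a sketch of the fix, not the merged diff):

final Properties tbl = new Properties();
// Column names now use consistent camelCase, as requested in review
tbl.setProperty("columns", "aShort,aInt,aLong,aDouble,aDecimal,aString,aList");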


import java.util.Properties;

public class TestCarbonSerde extends TestCase {
xubo245 (Contributor):

Please take care with the spelling format, including case sensitivity; for example: TestCarbonSerDe

Please check the other code as well.

SteNicholas (Member, Author):

@xubo245 I have already fixed the spelling format you referred to.

arr[0] = new ShortWritable((short) 456);
arr[1] = new IntWritable(789);
arr[2] = new LongWritable(1000l);
arr[3] = new DoubleWritable((double) 5.3);
xubo245 (Contributor):

No need for the (double) cast; please remove it.

SteNicholas (Member, Author):

@xubo245 I have already updated this conversion.
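The resulting initialization, with the redundant cast dropped:

arr[3] = new DoubleWritable(5.3); // the literal is already a double, no cast needed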

SteNicholas (Member, Author)

@zzcclp Please review this update.

@CarbonDataQA
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1610/

@CarbonDataQA
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9870/

@CarbonDataQA
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1821/

zzcclp (Contributor) commented Dec 3, 2018:

LGTM

arr[2] = new LongWritable(1000l);
arr[3] = new DoubleWritable(5.3);
arr[4] = new HiveDecimalWritable(HiveDecimal.create(1));
arr[5] = new Text("carbonSerde binary".getBytes("UTF-8"));
xubo245 (Contributor):

Please take care with the spelling format, including case sensitivity; for example: CarbonSerDe

SteNicholas (Member, Author):

@xubo245 Sorry, I forgot the string spelling format; please review it. Sorry for the inconvenience.

@CarbonDataQA
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1640/

@CarbonDataQA
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9900/

@CarbonDataQA
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1851/

SteNicholas (Member, Author)

@zzcclp Please review this update.

xubo245 (Contributor) commented Dec 5, 2018:

LGTM
zzcclp (Contributor) commented Dec 5, 2018:

LGTM


import java.util.Properties;

public class TestCarbonSerDe extends TestCase {
xuchuanyin (Contributor):

Please refer to the other tests in Carbon, such as those in the carbon-core module. Actually, we do not need to extend TestCase here.

SteNicholas (Member, Author):

I have already removed "extends TestCase" here and updated the method's asserts.
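A minimal sketch of the reworked shape, in the JUnit 4 style used elsewhere in Carbon (the test body is elided):

import org.junit.Test;

public class TestCarbonSerDe { // no longer extends junit.framework.TestCase
  @Test
  public void testCarbonHiveSerDe() throws Exception {
    // build the SerDe and the test row here, then verify with
    // org.junit.Assert.assertEquals(...) instead of TestCase's inherited asserts
  }
}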

public void testCarbonHiveSerDe() throws Throwable {
try {
// Create the SerDe
System.out.println("test: testCarbonHiveSerDe");
xuchuanyin (Contributor):

It's not recommended to use stdout in test code; currently we use stdout only in example code, not test code. If you want to print something, use the Carbon Logger instead.

SteNicholas (Member, Author):

Yeah, I have already replaced stdout with the Logger, in the same way as the other Carbon test code.
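A sketch of the replacement, assuming Carbon's LogServiceFactory (the exact logger type it returns differs between Carbon versions):

import org.apache.carbondata.common.logging.LogServiceFactory;

public class TestCarbonSerDe {
  // Carbon's factory wraps the logging backend used across the code base
  private static final org.apache.log4j.Logger LOGGER =
      LogServiceFactory.getLogService(TestCarbonSerDe.class.getName());

  // inside the test, instead of System.out.println(...):
  // LOGGER.info("test: testCarbonHiveSerDe");
}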

System.out.println("test: testCarbonHiveSerDe - OK");

} catch (final Throwable e) {
e.printStackTrace();
xuchuanyin (Contributor):

Use the Logger instead of printing the stack trace in test code.

SteNicholas (Member, Author):

I have already removed this catch-rethrow.

deserializeAndSerializeLazySimple(serDe, arrWritable);
System.out.println("test: testCarbonHiveSerDe - OK");

} catch (final Throwable e) {
xuchuanyin (Contributor):

After observing the following procedure, I think there is no need to catch and rethrow the exception here; the test framework will handle it.

SteNicholas (Member, Author):

Yeah, I think so. I have already removed this pointless catch-rethrow from the method.
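With the wrapper gone, the test can simply declare the exception; a sketch follows (createInitializedSerDe and createArrayWritable are hypothetical helpers standing in for the test's setup code):

@Test
public void testCarbonHiveSerDe() throws Exception {
  // No try/catch wrapper: an uncaught exception fails the test and the
  // framework prints the full stack trace itself
  final CarbonHiveSerDe serDe = createInitializedSerDe();          // hypothetical helper
  deserializeAndSerializeLazySimple(serDe, createArrayWritable()); // hypothetical helper
}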

xuchuanyin (Contributor) commented Dec 5, 2018:

@SteNicholas Nice to see your work on these existing problems. It seems the previous code has some problems that your code inherits, so I suggest fixing them at the same time. Please check the above comments.

@CarbonDataQA
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1670/

@CarbonDataQA
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9930/

@CarbonDataQA
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1881/

@CarbonDataQA
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1672/

ajantha-bhat and others added 21 commits December 21, 2018 01:43
Problem:
Currently, carbon SDK output files (files without a metadata folder and its contents) are read by Spark using an external table with a Carbon session, but the Presto-Carbon integration doesn't support that; it can read only transactional table output files.

Solution:
Enhance Presto to read SDK output files, which widens the use cases for the Presto-Carbon integration. This is achieved by inferring the schema when the metadata folder does not exist, and by setting the read committed scope to LatestFilesReadCommittedScope when non-transactional table output files are present.

This closes #2982
Changed the two Complex Delimiters used to '\001' and '\002'.

This closes #2979
… Describe Formatted

Updated the documentation and added the column compressor used to the output of the Describe Formatted command.

This closes #2986
Support Create DDL for Map type.

This closes #2980
Fix some spelling errors

This closes #2890
…beeline to read/write data from/to S3

This PR fixes a NoClassDefFoundError when using Thrift Server and Beeline with a table on cloud storage.

This closes #2925
Changed the complex delimiter in test cases

This closes #2989
…maps

Changes in this PR:
BlockDatamap and BlockletDatamap can store information for multiple files; each file is one row in that datamap. Non-default datamaps are not organized that way, so default datamaps are distributed across threads based on the number of entries in the datamaps, while for non-default datamaps distribution is based on the number of datamaps (one datamap is treated as one record).

This PR therefore supports block pruning in multiple threads for non-default datamaps.

This closes #2949
Support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold, enableLocalDictionary, and sortBy in the carbon writer interface of the C++ SDK.

This closes #2899
…s Empty

What was the issue?
For SDK write, passing an empty array for a complex type returned null, sending the row to bad records.

What has been changed?
Added a check for an empty array; it now returns an empty array.

This closes #2983
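A hedged sketch of the kind of guard described; the method and its surroundings are illustrative, not the actual SDK converter code:

final class EmptyComplexArraySketch {
  // Previously an empty complex-type array produced null, which routed the
  // whole row to bad records; an explicit empty result keeps the row valid.
  static Object[] convertComplexArray(Object[] input) {
    if (input != null && input.length == 0) {
      return new Object[0]; // empty array is valid data, not a bad record
    }
    return input; // stand-in for the real per-element conversion
  }
}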
Problem
When some pages yield 0 rows after filtering, BlockletScanResult was still uncompressing all pages. When compression is high and a blocklet contains many pages, this page decompression takes significant time and hurts query performance.

Solution
Added a check: if the number of records after filtering is 0, the page is not decompressed.

This closes #2985
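A hedged sketch of the guard described (names are illustrative, not the actual scanned-result code):

final class PageSkipSketch {
  // filteredRowCount[i] holds the rows surviving the filter for page i
  static void scanPages(int[] filteredRowCount) {
    for (int pageId = 0; pageId < filteredRowCount.length; pageId++) {
      if (filteredRowCount[pageId] == 0) {
        continue; // nothing survived filtering: skip the costly decompression
      }
      decompressAndScan(pageId); // stand-in for the real decompress + scan work
    }
  }

  private static void decompressAndScan(int pageId) { }
}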
What was the issue?
After an SDK write, SELECT * failed for 'long_string_columns' with a trailing space.

What has been changed?
Removed the trailing space in the column name.

This closes #2988
…ult sort_scope

What are the changes proposed in this PR?
This PR is based on the discussion in the mail thread below:

http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Make-no-sort-as-default-sort-scope-and-keep-sort-columns-as-empty-by-default-td70233.html

Changes are made to make 'no_sort' the default sort_scope and to keep sort_columns empty by default.
A few issues in the 'no_sort' and empty sort_columns flow are also fixed.

This closes #2966
SDV test cases were failing because the delimiter and the complex delimiter were the same,
so the complex delimiter in the load option was changed.

This closes #3002
Replace Apache Commons Logging with carbondata log4j

This closes #2999
Added datasource test cases for Spark File Format.

This closes #2951
…bloom filter

Problem
java.lang.IllegalAccessError is thrown when querying with a bloom filter without compress on CarbonThriftServer.

Analysis
A similar problem occurred when getting/setting the BitSet in CarbonBloomFilter, and reflection was used to solve it; we could do the same here. But since we have already set the BitSet, an easier way is to call the super class method, avoiding access to it from CarbonBloomFilter.

Solution
If the bloom filter is not compressed, call the super method to test membership.

This closes #3000
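A hedged sketch of the approach; only membershipTest and the super call reflect hadoop's real BloomFilter API, the rest is illustrative:

import java.util.BitSet;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

class CompressAwareBloomFilter extends BloomFilter {
  private boolean compress;      // illustrative flag
  private BitSet compressedBits; // illustrative storage for the compressed case

  @Override
  public boolean membershipTest(Key key) {
    if (!compress) {
      // Delegate to the parent instead of reaching into its private BitSet,
      // which is what raised java.lang.IllegalAccessError on the Thrift Server
      return super.membershipTest(key);
    }
    return testCompressed(key);
  }

  private boolean testCompressed(Key key) {
    // stand-in for the real membership test against the compressed bits
    return compressedBits != null && !compressedBits.isEmpty();
  }
}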
Problem: The global dictionary was not working for the Map data type and was giving null values.

Solution: Added a case so the global dictionary is created when the data type is a complex Map.

This closes #3006
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1887/

@SteNicholas SteNicholas deleted the CARBONDATA-3127 branch December 20, 2018 18:25
@SteNicholas SteNicholas restored the CARBONDATA-3127 branch December 20, 2018 18:26
@CarbonDataQA
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1888/

@CarbonDataQA
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10142/

@CarbonDataQA
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2097/
