[CARBONDATA-1305] Add limit for external dictionary values #1166
Conversation
Can one of the admins verify this patch?
1 similar comment
Can one of the admins verify this patch?
Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/462/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3050/
Force-pushed from d3768a8 to 5f90235
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3051/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/463/
/**
 * Max number of dictionary values that can be given with external dictionary
 */
public static final int MAX_EXTERNAL_DICTIONARY_SIZE = 10000000;
Move this constant into the specific file that uses it, as it is not configurable.
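As a hedged illustration of this review suggestion (the holder class name here is hypothetical, not CarbonData's actual code), the non-configurable limit could live next to its only usage instead of in the shared constants class:

```java
// Hypothetical sketch: keep the non-configurable limit next to its usage
// instead of in the shared CarbonCommonConstants class.
public final class ExternalDictionaryUtil {  // hypothetical holder class

    private ExternalDictionaryUtil() {
        // utility holder, not meant to be instantiated
    }

    /**
     * Max number of dictionary values that can be given with external dictionary
     */
    public static final int MAX_EXTERNAL_DICTIONARY_SIZE = 10000000;
}
```

Keeping a hard-coded, non-configurable value near its usage avoids suggesting (via the central constants file) that it is a tunable property.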
@@ -83,7 +84,12 @@ case class PrimitiveParser(dimension: CarbonDimension,

  def parseString(input: String): Unit = {
    if (hasDictEncoding && input != null) {
      set.add(input)
      if (set.size < CarbonCommonConstants.MAX_EXTERNAL_DICTIONARY_SIZE) {
This check will be done for each value of each column. Please ensure there is no performance degradation in dictionary generation due to this check; try with around 100 columns, with and without this change.
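A minimal, hypothetical sketch of the bounded per-value check under discussion (class and method names are illustrative, not CarbonData's actual Scala code; the exception type stands in for the one the PR later settles on). It mirrors the one-comparison-per-value cost the reviewer is asking about:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the bounded dictionary-value collection.
// The limit is a constructor parameter here only so the behavior is
// easy to exercise; the PR uses a fixed constant.
class BoundedDictionaryCollector {
    private final int maxSize;
    private final Set<String> set = new HashSet<>();

    BoundedDictionaryCollector(int maxSize) {
        this.maxSize = maxSize;
    }

    // One size comparison per value, as in the reviewed parseString.
    void parseString(String input) {
        if (input == null) {
            return; // mirrors the null guard in the original code
        }
        if (set.size() < maxSize) {
            set.add(input);
        } else {
            throw new IllegalStateException(
                "Cannot provide more than " + maxSize + " dictionary values");
        }
    }

    int size() {
        return set.size();
    }
}
```

Note one subtlety of this ordering: once the set is full, even a value already present in the set triggers the exception, since the size check happens before any membership lookup. The per-value overhead is a single `int` comparison on top of the existing `HashSet.add`, which is what the reviewer's 100-column benchmark request would measure.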
Force-pushed from 5f90235 to ecb490d
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/471/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3060/
package org.apache.carbondata.processing.newflow.exception;

public class DictionaryLimitExceededException extends CarbonDataLoadingException {
Use a generic no-retry exception instead of a specific exception.
@@ -460,6 +467,10 @@ class CarbonGlobalDictionaryGenerateRDD(
        s"\n write sort info: $sortIndexWriteTime")
      }
    } catch {
      case dictionaryException: DictionaryLimitExceededException =>
Use a generic NoRetryException class instead of a specific exception class.
Force-pushed from ecb490d to fd4bec9
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3078/
Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/489/
Force-pushed from fd4bec9 to fe3c28f
Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/490/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3079/
Force-pushed from fe3c28f to 5ab9eff
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3081/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/492/
Force-pushed from 15d32db to 1dab411
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/590/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3185/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/591/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3186/
Can one of the admins verify this patch?
Force-pushed from 1dab411 to 49f808f
SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/327/
Force-pushed from 49f808f to b47ba2b
SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/336/
Force-pushed from b47ba2b to 9a68491
retest this please
1 similar comment
retest this please
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/489/
if (set.size < CarbonLoadOptionConstants.MAX_EXTERNAL_DICTIONARY_SIZE) {
  set.add(input)
} else {
  throw new NoRetryException(s"Cannot provide more than ${
Also handle NoRetryException for update and insert as well, in the case of bad records.
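A hedged sketch of what this review comment asks for (all names here are hypothetical stand-ins, not CarbonData's actual load path): a NoRetryException caught during a load attempt should fail the operation outright rather than trigger another retry.

```java
// Hypothetical stand-in for the generic no-retry exception the PR adopts.
class NoRetryException extends RuntimeException {
    NoRetryException(String message) {
        super(message);
    }
}

// Illustrative load/update/insert driver: a NoRetryException is treated
// as a terminal failure, not something to re-attempt.
class LoadRunner {
    static String run(Runnable loadBody) {
        try {
            loadBody.run();
            return "SUCCESS";
        } catch (NoRetryException e) {
            // Fail the load immediately; do not schedule a retry.
            return "FAILED: " + e.getMessage();
        }
    }
}
```

The point of the generic exception type is that every code path that can abort a load (dictionary limit, bad records during update or insert) signals "do not retry" through one class, so each caller needs only one catch clause.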
retest this please
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/501/
retest this please
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/508/
The one SDV failure is common across other PRs.
LGTM |
Analysis: During dictionary creation, the dictionary values are kept in a HashSet. When the size of the HashSet exceeds 500000000, this exception is thrown.
Solution: Limit the dictionary values to 10000000.
This closes apache#1166