-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KYLIN-3926 set sourceRecordCount when updating statistics #579
Conversation
Can one of the admins verify this patch? |
1 similar comment
Can one of the admins verify this patch? |
Codecov Report
@@ Coverage Diff @@
## master #579 +/- ##
============================================
- Coverage 25.38% 25.38% -0.01%
- Complexity 5803 5864 +61
============================================
Files 1372 1379 +7
Lines 81137 81855 +718
Branches 11377 11475 +98
============================================
+ Hits 20599 20777 +178
- Misses 58525 59057 +532
- Partials 2013 2021 +8
Continue to review full report at Codecov.
|
@@ -43,6 +43,11 @@ public static void writeCuboidStatistics(Configuration conf, Path outputPath, // | |||
writeCuboidStatistics(conf, outputPath, cuboidHLLMap, samplingPercentage, 0, 0, 0); | |||
} | |||
|
|||
public static void writeCuboidStatistics(Configuration conf, Path outputPath, // | |||
Map<Long, HLLCounter> cuboidHLLMap, int samplingPercentage, long sourceRecordCoun) throws IOException { | |||
writeCuboidStatistics(conf, outputPath, cuboidHLLMap, samplingPercentage, 0, 0, sourceRecordCoun); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: should be sourceRecordCount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just copied from the existing parameter name. However, it's better to change it from sourceRecordCoun to sourceRecordCount.
@@ -120,8 +129,11 @@ protected ExecuteResult doWork(ExecutableContext context) throws ExecuteExceptio | |||
tempFile.delete(); | |||
} | |||
} | |||
sourceRecordCount *= effectiveTimeRange == 0 ? 0 : newSegment.getTSRange().duration() / effectiveTimeRange; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not convert to double to calculate the accurate data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary to do so. Row count itself is an approximate one. And this data is set to long type at other place and we shouldn't break the compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For details, please check http://issues.apache.org/jira/browse/KYLIN-3453
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the division
Should SparkMergingDictionary MergeDictionaryMapper also be changed? |
@kyotoYaho Hi, would you please update the pr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine to me
@@ -120,8 +129,11 @@ protected ExecuteResult doWork(ExecutableContext context) throws ExecuteExceptio | |||
tempFile.delete(); | |||
} | |||
} | |||
sourceRecordCount *= effectiveTimeRange == 0 ? 0 : newSegment.getTSRange().duration() / effectiveTimeRange; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the division
No description provided.