KYLIN-4035 Calculate column cardinality by using spark engine #680

Merged
merged 1 commit into from Jun 24, 2019


@majic31 majic31 commented Jun 10, 2019

Link to https://issues.apache.org/jira/browse/KYLIN-4035
Support calculating column cardinality using the Spark engine.


asfgit commented Jun 10, 2019

Can one of the admins verify this patch?



codecov-io commented Jun 10, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@5f5895d).
The diff coverage is 0%.


@@           Coverage Diff            @@
##             master    #680   +/-   ##
========================================
  Coverage          ?   25.7%           
  Complexity        ?    6011           
========================================
  Files             ?    1386           
  Lines             ?   82510           
  Branches          ?   11568           
========================================
  Hits              ?   21207           
  Misses            ?   59258           
  Partials          ?    2045
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...org/apache/kylin/engine/spark/SparkExecutable.java | 0% <0%> (ø) | 0 <0> (?) |
| .../java/org/apache/kylin/common/KylinConfigBase.java | 12.92% <0%> (ø) | 42 <0> (?) |
| ...va/org/apache/kylin/rest/service/TableService.java | 13.95% <0%> (ø) | 9 <0> (?) |
| ...che/kylin/engine/spark/SparkColumnCardinality.java | 0% <0%> (ø) | 0 <0> (?) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5f5895d...2148fec.

})
.sortByKey(true, 1);

if (resultRdd.count() == 0) {
Member

@hit-lacus hit-lacus Jun 16, 2019


Both count and saveAsNewAPIHadoopFile are RDD actions, so I think resultRdd should be cached here to avoid recomputing it. Am I right?

Author


Yes, that's a good point. I forgot to cache it.
I will add the cache call. Thank you!
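The recomputation concern raised above can be illustrated with a small Spark-free sketch. `LazyDataset` below is a hypothetical stand-in for a lazily evaluated RDD, not a Spark class; its `save()` method merely stands in for an action such as saveAsNewAPIHadoopFile. It counts how often the upstream computation runs when two actions are invoked, with and without caching. In the actual patch, the likely equivalent would be calling `cache()` on resultRdd before `count()`, which is an assumption about how the fix was applied.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CacheSketch {
    /** Hypothetical stand-in for a lazily evaluated dataset; not a Spark class. */
    static final class LazyDataset<T> {
        private final Supplier<List<T>> compute; // the "lineage" to evaluate
        private List<T> cached;                  // filled by the first action after cache()
        private boolean cacheEnabled;

        LazyDataset(Supplier<List<T>> compute) {
            this.compute = compute;
        }

        /** Marks the dataset for reuse; nothing is computed until an action runs. */
        LazyDataset<T> cache() {
            cacheEnabled = true;
            return this;
        }

        private List<T> materialize() {
            if (!cacheEnabled) {
                return compute.get(); // every action re-evaluates the lineage
            }
            if (cached == null) {
                cached = compute.get(); // evaluated once, then reused
            }
            return cached;
        }

        /** Action 1: analogous to count(). */
        long count() {
            return materialize().size();
        }

        /** Action 2: stands in for a save action. */
        List<T> save() {
            return materialize();
        }
    }

    /** Runs count() followed by save() and returns how many times the lineage was evaluated. */
    static int evaluations(boolean useCache) {
        AtomicInteger runs = new AtomicInteger();
        LazyDataset<String> ds = new LazyDataset<>(() -> {
            runs.incrementAndGet();
            return Arrays.asList("col_a", "col_b");
        });
        if (useCache) {
            ds.cache();
        }
        if (ds.count() > 0) {
            ds.save();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("evaluations without cache: " + evaluations(false));
        System.out.println("evaluations with cache: " + evaluations(true));
    }
}
```

Without the cache the lineage is evaluated once per action; with it, the second action reuses the first result. In Spark the cost of the extra evaluation is a full re-run of the upstream stages, which is why caching before two actions is usually worthwhile.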

@hit-lacus
Member

In my Hadoop cluster (JDK 8, hadoop2.6-cdh5.6 with spark-2.3.3-hadoop2.6), I have verified that this patch passes the happy path with the correct result.

By MR

image

By Spark

image

Yarn Successful Jobs

image

@@ -1430,6 +1430,10 @@ public boolean isSparkFactDistinctEnable() {
return Boolean.parseBoolean(getOptional("kylin.engine.spark-fact-distinct", "false"));
}

public boolean isSparkCardinalityEnabled(){
return Boolean.parseBoolean(getOptional("kylin.engin.spark-cardinality", "false"));
Member


"engin" should be "engine"
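To show why this typo matters, here is a minimal sketch of how a misspelled key silently falls back to its default. `getOptional` below is a hypothetical helper backed by `java.util.Properties`, modeled only on the `getOptional(key, default)` call shape visible in the diff; it is not Kylin's implementation, and the corrected key name is inferred from the review comment.

```java
import java.util.Properties;

public class ConfigKeySketch {
    // Hypothetical helper: returns the configured value or the default when the key is absent.
    static String getOptional(Properties props, String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    // Reads the flag with the misspelled key from the diff under review.
    static boolean readWithTypo(Properties props) {
        return Boolean.parseBoolean(getOptional(props, "kylin.engin.spark-cardinality", "false"));
    }

    // Reads the flag with the key spelled as the reviewer suggests.
    static boolean readCorrected(Properties props) {
        return Boolean.parseBoolean(getOptional(props, "kylin.engine.spark-cardinality", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // A user enables the feature using the naturally spelled key.
        props.setProperty("kylin.engine.spark-cardinality", "true");

        System.out.println("read with typo:     " + readWithTypo(props));
        System.out.println("read with fixed key: " + readCorrected(props));
    }
}
```

Because the lookup has a default, the misspelled key raises no error: the feature simply stays disabled even though the user set the property they expected to work.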

@coveralls

Pull Request Test Coverage Report for Build 4615

  • 0 of 98 (0.0%) changed or added relevant lines in 4 files are covered.
  • 155 unchanged lines in 8 files lost coverage.
  • Overall coverage decreased (-0.04%) to 28.193%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java | 0 | 1 | 0.0% |
| engine-spark/src/main/java/org/apache/kylin/engine/spark/SparkExecutable.java | 0 | 14 | 0.0% |
| server-base/src/main/java/org/apache/kylin/rest/service/TableService.java | 0 | 14 | 0.0% |
| engine-spark/src/main/java/org/apache/kylin/engine/spark/SparkColumnCardinality.java | 0 | 69 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| core-job/src/main/java/org/apache/kylin/job/impl/curator/CuratorScheduler.java | 1 | 68.64% |
| core-dictionary/src/main/java/org/apache/kylin/dict/lookup/cache/RocksDBLookupTable.java | 1 | 81.08% |
| core-cube/src/main/java/org/apache/kylin/cube/cuboid/TreeCuboidScheduler.java | 2 | 68.46% |
| core-job/src/main/java/org/apache/kylin/job/impl/threadpool/DefaultScheduler.java | 2 | 80.23% |
| source-kafka/src/main/java/org/apache/kylin/source/kafka/util/KafkaClient.java | 26 | 0.0% |
| engine-mr/src/main/java/org/apache/kylin/engine/mr/steps/UHCDictionaryJob.java | 29 | 0.0% |
| query/src/main/java/org/apache/kylin/query/adhoc/PushDownRunnerJdbcImpl.java | 46 | 0.0% |
| core-job/src/main/java/org/apache/kylin/job/lock/zookeeper/ZookeeperDistributedLock.java | 48 | 0.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 4593: | -0.04% |
| Covered Lines: | 23254 |
| Relevant Lines: | 82481 |

💛 - Coveralls


@nichunen nichunen left a comment


Looks fine to me.

@nichunen nichunen merged commit 760aefd into apache:master Jun 24, 2019