@watermen
Contributor

@watermen watermen commented Dec 7, 2015

Bug

I found this bug when using CACHE TABLE:

spark-sql> create table src_p(key int, value int) stored as parquet;
OK
Time taken: 3.144 seconds
spark-sql> cache table src_p;
Time taken: 1.452 seconds
spark-sql> explain extended select count(*) from src_p;

I got the wrong physical plan: it scans the Parquet files directly instead of the cached table.

== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#28L])
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#33L])
   Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]

The correct physical plan, which scans the in-memory cached relation, is:

== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#47L])
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#62L])
   InMemoryColumnarTableScan (InMemoryRelation [key#45,value#46], true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][key#9,value#10]), Some(src_p))

Reason

When implementations of MultiInstanceRelation (e.g. LogicalRelation, LocalRelation) are wrapped in Subquery nodes, the sameResult function defined in their own class is never invoked. So we need to eliminate the Subquery nodes first and then invoke the sameResult function of the relation itself.
For example:
When the plan is Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1)))), we first eliminate the Subquery wrapper, so that the sameResult function in LogicalRelation is invoked instead of the one in LogicalPlan.
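The dispatch problem described above can be sketched with a toy model. This is not Spark's actual code: Plan, Relation, Subquery, eliminateSubqueries, naiveLookup and fixedLookup are hypothetical stand-ins invented for illustration, the default comparison is simplified to plain structural equality, and the path is a placeholder. It only shows why a wrapper node hides the relation's own sameResult and how stripping the wrapper first restores it.

```scala
object SameResultSketch {
  // Toy stand-in for a logical plan node (hypothetical, not Catalyst's LogicalPlan).
  sealed trait Plan {
    def children: Seq[Plan]
    // Simplified default: structural equality, which is sensitive to
    // wrapper nodes and to per-instance attribute ids.
    def sameResult(other: Plan): Boolean = this == other
  }

  // Toy relation: two instances over the same path return the same rows
  // even though their output attribute ids differ.
  final case class Relation(path: String, outputIds: Seq[Int]) extends Plan {
    val children: Seq[Plan] = Nil
    override def sameResult(other: Plan): Boolean = other match {
      case Relation(p, _) => p == path
      case _              => false
    }
  }

  // Toy alias wrapper; it inherits the default sameResult.
  final case class Subquery(alias: String, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }

  // Strip alias wrappers at the root of the plan (enough for this sketch).
  def eliminateSubqueries(plan: Plan): Plan = plan match {
    case Subquery(_, child) => eliminateSubqueries(child)
    case other              => other
  }

  // Compares the wrapped plans directly, so Relation.sameResult is never reached.
  def naiveLookup(cached: Plan, query: Plan): Boolean =
    cached.sameResult(query)

  // Strips the wrappers first, so the relation's own override decides.
  def fixedLookup(cached: Plan, query: Plan): Boolean =
    eliminateSubqueries(cached).sameResult(eliminateSubqueries(query))

  def main(args: Array[String]): Unit = {
    val path   = "/placeholder/warehouse/src_p" // placeholder path
    val cached = Subquery("src_p", Relation(path, Seq(0, 1)))
    val query  = Subquery("src_p", Relation(path, Seq(9, 10)))
    println(naiveLookup(cached, query)) // false: cache entry is missed
    println(fixedLookup(cached, query)) // true: cache entry is found
  }
}
```

In this sketch the naive lookup misses the cache only because the two relation instances carry different attribute ids, which mirrors the key#0/value#1 versus key#9/value#10 ids seen in the plans above.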

@dilipbiswal
Contributor

@watermen I believe SPARK-11246 was a very similar defect and was fixed in 1.5. Is this a different scenario than what was addressed in that defect?
@yhuai had suggested the fix.

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47250 has finished for PR 10169 at commit f1ef856.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@watermen
Contributor Author

watermen commented Dec 7, 2015

@dilipbiswal Yes, SPARK-11246 has already fixed it. Thanks, I'll close this PR.

@watermen watermen closed this Dec 7, 2015
3 participants