
Do filter push down when reading from views #160

Merged: 2 commits into GoogleCloudDataproc:master on Apr 24, 2020

Conversation

@diggerk (Contributor) commented Apr 23, 2020

Here's a fix for filter push down when reading from views. I've moved temp table creation to the moment buildScan is invoked. The type of defaultTableDefinition is changed to TableDefinition, and a separate method, getNumBytes, is introduced to calculate the size of the table. getNumBytes returns zero bytes for views, which should be fine: what matters is that the connector calculates the number of partitions after the temp table is generated, and by that time getNumBytes is called on the temp table, not the view.
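
The shape of that change, as a rough sketch (illustrative only; the trait LazyViewScan and the helper materialize are hypothetical names, not identifiers from this PR):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}

// Illustrative: defer view materialization until buildScan, when the
// pushed-down filters are known, instead of creating the temp table
// when the relation is first constructed.
trait LazyViewScan extends PrunedFilteredScan { self: BaseRelation =>
  // Hypothetical helper: materializes a view into a temp table with the
  // filters applied, then scans the temp table rather than the view.
  def materialize(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row]

  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] =
    materialize(requiredColumns, filters) // temp table is created here
}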

@davidrabinowitz (Member)

/gcbrun

def getNumBytes(tableDefinition: TableDefinition): Long = {
  val tableType = tableDefinition.getType
  if (options.viewsEnabled && TableDefinition.Type.VIEW == tableType) {
    0 // views have no storage size; see the review comment below
  } else {
    tableDefinition.asInstanceOf[StandardTableDefinition].getNumBytes
  }
}

@davidrabinowitz (Member) commented:

Please replace the value with the default: sqlContext.conf.defaultSizeInBytes (taken from here). Based on the documentation, having a size of 0 may have side effects.
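
For context (background, not from the thread itself): Spark compares a relation's estimated sizeInBytes against spark.sql.autoBroadcastJoinThreshold when planning joins, so an estimate of 0 can make any view-backed relation eligible for a broadcast join regardless of its real size. A quick way to check the threshold:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Relations estimated below this threshold (10485760 bytes, i.e. 10 MB,
// by default) may be broadcast to every executor during a join.
println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))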

@diggerk replied:

sqlContext.conf is not accessible outside of org.apache.spark.sql, so I had to use sqlContext.sparkSession.sessionState.conf.defaultSizeInBytes instead.
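
For reference, a minimal sketch of what the revised method plausibly looks like after this change, assuming the surrounding relation exposes options and sqlContext as in the snippet above:

import com.google.cloud.bigquery.{StandardTableDefinition, TableDefinition}

def getNumBytes(tableDefinition: TableDefinition): Long = {
  if (options.viewsEnabled && TableDefinition.Type.VIEW == tableDefinition.getType) {
    // Spark's generic default size estimate (Long.MaxValue by default),
    // which errs on the side of never broadcasting the unmaterialized view.
    sqlContext.sparkSession.sessionState.conf.defaultSizeInBytes
  } else {
    tableDefinition.asInstanceOf[StandardTableDefinition].getNumBytes
  }
}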

@davidrabinowitz (Member)

Hi @diggerk, thanks for the contribution! Can you please have a look at the comment above?

Commit: Using 0 as the Spark relation size estimate may lead to Spark broadcasting a relation that potentially represents a large table

@ghost commented Apr 24, 2020

Hi @davidrabinowitz, thanks for the catch; I fixed that.

@davidrabinowitz (Member)

/gcbrun

@davidrabinowitz merged commit 1095dfd into GoogleCloudDataproc:master on Apr 24, 2020