[CARBONDATA-4166] Geo spatial Query Enhancements #4127

Closed
wants to merge 2 commits into from

@Indhumathi27 (Contributor) commented Apr 29, 2021

Why is this PR needed?

  1. Currently, for the IN_POLYGON_LIST and IN_POLYLINE_LIST UDFs, the polygons must be
    specified inline in the SQL. As the polygon list grows, the SQL statement becomes very long,
    which may affect query performance because the cost of analysing the SQL increases.
  2. If polygons are defined as a column in a new dimension table, a spatial dimension
    table join can be supported, enabling aggregation on spatial table columns
    based on polygons.

What changes were proposed in this PR?

  1. Support IN_POLYGON_LIST and IN_POLYLINE_LIST with a SELECT query on the
    polygon table.
  2. Support the IN_POLYGON filter as a join condition for spatial JOIN queries.
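
To illustrate the second change, the semantics of using polygon containment as a join condition, followed by an aggregation per polygon, can be sketched in plain Scala. This is only a conceptual illustration: the names `Point`, `Poi`, `inPolygon`, and `sumByPolygon` are hypothetical, and the sketch assumes a simple ray-casting test on in-memory collections rather than CarbonData's spatial index or its actual join implementation.

```scala
// Conceptual sketch only: all names here are hypothetical, not CarbonData APIs.
final case class Point(lon: Double, lat: Double, value: Long)
final case class Poi(id: String, polygon: Vector[(Double, Double)])

object SpatialJoinSketch {
  // Ray-casting test: toggle `inside` each time a horizontal ray
  // starting at the point crosses a polygon edge.
  def inPolygon(lon: Double, lat: Double, polygon: Vector[(Double, Double)]): Boolean = {
    var inside = false
    var j = polygon.length - 1
    for (i <- polygon.indices) {
      val (xi, yi) = polygon(i)
      val (xj, yj) = polygon(j)
      val crosses = (yi > lat) != (yj > lat) &&
        lon < (xj - xi) * (lat - yi) / (yj - yi) + xi
      if (crosses) inside = !inside
      j = i
    }
    inside
  }

  // Inner join: pair each polygon with the points it contains,
  // then sum the point values per polygon id.
  def sumByPolygon(points: Seq[Point], pois: Seq[Poi]): Map[String, Long] = {
    val joined = for {
      poi <- pois
      p <- points
      if inPolygon(p.lon, p.lat, poi.polygon)
    } yield poi.id -> p.value
    joined.groupBy(_._1).map { case (id, rows) => id -> rows.map(_._2).sum }
  }
}
```

Because the sketch behaves like an inner join, polygons that contain no points do not appear in the result map.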

Does this PR introduce any user interface change?

  • Yes.

Is any new testcase added?

  • Yes

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3542/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5289/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3545/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5290/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3549/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5294/

@Indhumathi27 changed the title from "[WIP] Geo spatial improvements" to "[CARBONDATA-4166] Geo spatial Query Enhancements" May 5, 2021
docs/spatial-index-guide.md (outdated, resolved)
docs/spatial-index-guide.md (resolved)
docs/spatial-index-guide.md (outdated, resolved)
@@ -30,6 +32,7 @@ object GeoUtilUDFs {
sparkSession.udf.register("LatLngToGeoId", new LatLngToGeoIdUDF)
sparkSession.udf.register("ToUpperLayerGeoId", new ToUpperLayerGeoIdUDF)
sparkSession.udf.register("ToRangeList", new ToRangeListUDF)
sparkSession.udf.register("ToRangeListAsString", new ToRangeListAsStringUDF)
Member

These UDFs are exposed to the user, right? Can we update the documentation?

Contributor Author

This UDF is not exposed to users. It is for internal use only.

val matchedStr = matcher.group
range = matchedStr
}
val ranges = PolygonRangeListExpression.getRangeListFromString(range)
Member

Do we need a null check on ranges here? Some places check and some don't (for example, ToRangeListAsStringUDF). Can we make it uniform? If the check is not required, remove it from the other places as well. Also, if possible, extract a common method for the matcher and find logic.

Contributor Author

The null check is already handled here at line 49. I also did some refactoring to extract the common code into a new method.
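
A single helper that combines the null check with the matcher lookup, as discussed in this thread, might look roughly like the following. This is a sketch under assumptions: the `rangelist (...)` string format, the regular expression, and the names `RangeListParsing`, `extractRangeList`, and `parseRanges` are hypothetical and are not taken from the PR.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not the PR's code: one place for null handling and matching.
object RangeListParsing {
  // Assumed input format: "rangelist (1 5, 8 8, 10 20)".
  private val RangeListPattern = Pattern.compile("(?i)rangelist\\s*\\(([^)]*)\\)")

  // Null-safe lookup: None for null input or when no range list is found.
  def extractRangeList(input: String): Option[String] =
    Option(input).flatMap { text =>
      val matcher = RangeListPattern.matcher(text)
      if (matcher.find()) Some(matcher.group(1).trim) else None
    }

  // Parses comma-separated "min max" pairs into (min, max) tuples.
  // A malformed pair fails with MatchError or NumberFormatException.
  def parseRanges(input: String): Seq[(Long, Long)] =
    extractRangeList(input).toSeq.flatMap { body =>
      body.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { pair =>
        val Array(min, max) = pair.split("\\s+")
        (min.toLong, max.toLong)
      }
    }
}
```

Routing every caller through one such helper means the null check lives in a single place instead of being repeated, or forgotten, at each call site.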

import org.apache.carbondata.geo.scan.expression.PolygonRangeListExpression
import org.apache.carbondata.spark.rdd.CarbonScanRDD

case class BroadCastPolygonFilterPushJoin(
Member

Many things are common with BroadCastSIFilterPushJoin. Can't we extend that class?

Contributor Author

The BroadCastPolygonFilterPushJoin implementation is not the same as BroadCastSIFilterPushJoin in terms of filter and method definitions, so it is better to keep it separate.

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5308/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3563/

@ajantha-bhat
Member

LGTM

@asfgit closed this in c825730 May 10, 2021