[SEDONA-30] Add raster data support in Sedona SQL #523

Merged
merged 81 commits into from
Jun 17, 2021
Changes from 27 commits
913ad3c
ST_GeomFromRaster
shantanuaggarwal2695 Apr 28, 2021
d29eaa2
testcases for ST_GeomFromRaster
shantanuaggarwal2695 Apr 28, 2021
42ce86a
Test for raster constructor
shantanuaggarwal2695 Apr 28, 2021
54d189c
final changes for raster test
shantanuaggarwal2695 Apr 28, 2021
6153acd
ST_BandsFromRaster
shantanuaggarwal2695 May 2, 2021
1e9d54f
bands from raster
shantanuaggarwal2695 May 2, 2021
6b35f0b
ST_BandsFromRaster
shantanuaggarwal2695 May 3, 2021
d7ceb97
ST_RasterBands
shantanuaggarwal2695 May 3, 2021
7d285e2
combined constructor
shantanuaggarwal2695 May 4, 2021
5c436dd
Geotiff dataframe Loader
shantanuaggarwal2695 May 4, 2021
bcc72ca
small changes
shantanuaggarwal2695 May 7, 2021
57a44be
cleaning code
shantanuaggarwal2695 May 7, 2021
767bd80
ST_FeomFromRaster and ST_DataframeFromRaster
shantanuaggarwal2695 May 13, 2021
44e3bea
ST_GeomFromRaster and ST_DataframeFromRaster
shantanuaggarwal2695 May 13, 2021
7433c28
remove raster.csv
shantanuaggarwal2695 May 13, 2021
8d41782
spaces removal
shantanuaggarwal2695 May 13, 2021
c6c9b23
cleaning
shantanuaggarwal2695 May 13, 2021
bf821cc
.iml files
shantanuaggarwal2695 May 13, 2021
84c36c2
gitignore
shantanuaggarwal2695 May 13, 2021
1c8e1a7
main pom
shantanuaggarwal2695 May 13, 2021
6630188
geometry and dataframe
shantanuaggarwal2695 May 14, 2021
068ca92
final changes
shantanuaggarwal2695 May 14, 2021
9630d74
ST_DataframeFromRaster
shantanuaggarwal2695 May 14, 2021
f5c6a00
final changes
shantanuaggarwal2695 May 14, 2021
53663cb
final change
shantanuaggarwal2695 May 14, 2021
264b829
final changes
shantanuaggarwal2695 May 14, 2021
02afc7f
spaces
shantanuaggarwal2695 May 14, 2021
fa7d8b4
comments by Jia
shantanuaggarwal2695 May 14, 2021
9aad497
comment changes
shantanuaggarwal2695 May 14, 2021
7c67294
clean code
shantanuaggarwal2695 May 15, 2021
275dfa7
Merge branch 'apache:master' into main_thesis
shantanuaggarwal2695 May 15, 2021
bcee4e5
comment changes
shantanuaggarwal2695 May 15, 2021
2ee8363
Merge branch 'main_thesis' of https://github.com/asu-cse578-f2020/inc…
shantanuaggarwal2695 May 15, 2021
0d28f47
changes
shantanuaggarwal2695 May 15, 2021
2cd4e80
spaces and logic
shantanuaggarwal2695 May 15, 2021
25bffd1
functions.scala
shantanuaggarwal2695 May 15, 2021
de420a0
final changes
shantanuaggarwal2695 May 16, 2021
ec3ccc2
final_commit with documentation
shantanuaggarwal2695 May 17, 2021
931e2d7
final commit Documentation 2
shantanuaggarwal2695 May 17, 2021
47073ed
documentation code
shantanuaggarwal2695 May 17, 2021
c0fb98e
final commit
shantanuaggarwal2695 May 17, 2021
93fc155
documentation
shantanuaggarwal2695 May 17, 2021
629d764
raster algebra
shantanuaggarwal2695 May 19, 2021
523a16a
raster algebra functions
shantanuaggarwal2695 May 20, 2021
56f6117
raster algebra
shantanuaggarwal2695 May 20, 2021
af5cb92
Map algebra function testing
shantanuaggarwal2695 May 20, 2021
0218fb0
Fix bugs in raster.scala and speed up the performance
jiayuasu May 21, 2021
363f27b
Change the GeoTiff image to a more meaningful image
jiayuasu May 21, 2021
8f51c26
Change the doc name
jiayuasu May 21, 2021
84285a9
Merge branch 'master' into main_thesis
jiayuasu May 21, 2021
64e85cd
Jia changes
shantanuaggarwal2695 May 22, 2021
6d6a8af
Raster algebra testing
shantanuaggarwal2695 May 23, 2021
8983548
raster algebra test
shantanuaggarwal2695 May 23, 2021
4e5441f
adding testing for raster algebra operations
shantanuaggarwal2695 May 23, 2021
7cc6584
Adding some testcases for RS_FetchRegion
shantanuaggarwal2695 May 23, 2021
80dc4ca
Adding some extra testcases for map algebra operations
shantanuaggarwal2695 May 24, 2021
cfc8e57
Merge branch 'master' into main_thesis
jiayuasu May 26, 2021
996f3a3
Adding RS_Width and RS_Height accessors for a geotiff image
shantanuaggarwal2695 May 26, 2021
2a80aab
Reading geotiff images
shantanuaggarwal2695 May 26, 2021
bc2f8e2
Adding geotiff loader in sedona
shantanuaggarwal2695 May 27, 2021
10f98b6
Adding Data-Source register for geotiff loader
shantanuaggarwal2695 May 27, 2021
1262428
Final changes for Geotiff loader
shantanuaggarwal2695 May 27, 2021
908e7d6
Tests for raster algebra
shantanuaggarwal2695 May 27, 2021
2cb0a5a
Raster algebra changes
shantanuaggarwal2695 May 28, 2021
2b0f7ee
final commit for geotiff loader and map algebra operators
shantanuaggarwal2695 May 28, 2021
6f066c2
RS_Base64 and Adding anchor to GeotiffFileFormat.scala
shantanuaggarwal2695 May 28, 2021
9068d12
Adding GeotiffFileFormat to DataSource
shantanuaggarwal2695 May 28, 2021
d4f4853
Documentation for map algebra functions
shantanuaggarwal2695 May 29, 2021
356aa26
Documentation of map algebra operators
shantanuaggarwal2695 May 29, 2021
0bd7f8d
Refactor the code structure a little bit
jiayuasu May 29, 2021
85b0e2e
Update docs, change nChannel to nBands
jiayuasu May 29, 2021
28f3262
Improve the test
jiayuasu May 29, 2021
a5a5c49
RS_Encode and RS_HTML for displaying geotiff images in a dataframe
shantanuaggarwal2695 May 31, 2021
d3836c2
Add GeoTools dependency in Python adapter
jiayuasu May 31, 2021
2e9e54c
Fix pom
jiayuasu May 31, 2021
32e7025
RS_Normalize
shantanuaggarwal2695 Jun 2, 2021
54ae903
Adding Jupyter notebook for Geotiff Loader and map algebra operations
shantanuaggarwal2695 Jun 2, 2021
ee4fb46
RS_Normalize() and Jupyter notebook UDF
shantanuaggarwal2695 Jun 2, 2021
700fbd4
Final Additions to binder
shantanuaggarwal2695 Jun 2, 2021
aae478b
Fix the docs
jiayuasu Jun 17, 2021
b1e0496
Remove hadoop test dependency
jiayuasu Jun 17, 2021
2 changes: 1 addition & 1 deletion .gitignore
@@ -12,4 +12,4 @@
/site/
/.bloop/
/.metals/
/.vscode/
/.vscode/
5 changes: 5 additions & 0 deletions core/pom.xml
@@ -60,6 +60,11 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
Member

Please move all GeoTools dependencies to the parent pom. See here: https://github.com/apache/incubator-sedona/blob/master/pom.xml#L120

Make sure you use the geotools scope variable for the scope.

Contributor Author

Fixed

<groupId>org.geotools</groupId>
<artifactId>gt-coverage</artifactId>
<version>${geotools.version}</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/java</sourceDirectory>
Binary file added core/src/test/resources/raster/image.tif
Binary file not shown.
7 changes: 7 additions & 0 deletions pom.xml
@@ -112,6 +112,13 @@
<version>0.1.0</version>
<scope>${dependency.scope}</scope>
</dependency>
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-geotiff</artifactId>
<version>${geotools.version}</version>
jiayuasu marked this conversation as resolved.
</dependency>


<!--The following GeoTools dependencies use GNU Lesser General Public License and thus are excluded from the binary distribution-->
<!-- Users have to include them by themselves manually -->
<!-- See https://www.apache.org/legal/resolved.html#category-x -->
3 changes: 2 additions & 1 deletion python-adapter/.gitignore
@@ -2,4 +2,5 @@
bin
/.settings
/.classpath
/.project
/.project
*.iml
6 changes: 6 additions & 0 deletions sql/pom.xml
@@ -43,6 +43,12 @@
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-minicluster</artifactId>
<version>2.7.4</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
@@ -81,7 +81,10 @@ object Catalog {
ST_IsRing,
ST_FlipCoordinates,
ST_LineSubstring,
ST_LineInterpolatePoint
ST_LineInterpolatePoint,
ST_GeomFromRaster,
Member

Please change the name to ST_GeomFromGeoTiff

ST_DataframeFromRaster,
ST_getBand
)

val aggregateExpressions: Seq[Aggregator[Geometry, Geometry, Geometry]] = Seq(
@@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.catalyst.util.GenericArrayData
import org.apache.spark.sql.sedona_sql.UDT.GeometryUDT
import org.apache.spark.sql.types.{DataType, Decimal}
import org.apache.spark.sql.types._
Member

If this change is not necessary, do not commit it. You can replace the "_" with the original content.

Contributor Author

Fixed

import org.apache.spark.unsafe.types.UTF8String
import org.locationtech.jts.geom.{Coordinate, GeometryFactory}

@@ -303,5 +303,4 @@ trait UserDataGeneratator {
}
return userData
}
}

}
Member

Please don't commit any change to this file

@@ -34,14 +34,13 @@ import org.geotools.geometry.jts.JTS
import org.geotools.referencing.CRS
import org.locationtech.jts.algorithm.MinimumBoundingCircle
import org.locationtech.jts.geom.{PrecisionModel, _}
import org.locationtech.jts.io.WKBWriter
import org.locationtech.jts.linearref.LengthIndexedLine
Member

Same here. If this change is not necessary, do not commit it.

Member

Please don't commit any change to this file

import org.locationtech.jts.operation.IsSimpleOp
import org.locationtech.jts.operation.buffer.BufferParameters
import org.locationtech.jts.operation.linemerge.LineMerger
import org.locationtech.jts.operation.valid.IsValidOp
import org.locationtech.jts.precision.GeometryPrecisionReducer
import org.locationtech.jts.simplify.TopologyPreservingSimplifier
import org.locationtech.jts.linearref.LengthIndexedLine
import org.opengis.referencing.operation.MathTransform

import java.util
@@ -1122,4 +1121,4 @@ case class ST_FlipCoordinates(inputExpressions: Seq[Expression])
override def dataType: DataType = GeometryUDT

override def children: Seq[Expression] = inputExpressions
}
}
@@ -0,0 +1,245 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.spark.sql.sedona_sql.expressions

import org.apache.sedona.sql.utils.GeometrySerializer
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.catalyst.util.GenericArrayData
import org.apache.spark.sql.sedona_sql.UDT.GeometryUDT
import org.apache.spark.sql.sedona_sql.expressions.implicits.GeometryEnhancer
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String
import org.geotools.coverage.grid.io.{AbstractGridFormat, GridCoverage2DReader, GridFormatFinder, OverviewPolicy}
import org.geotools.coverage.grid.{GridCoordinates2D, GridCoverage2D}
import org.geotools.gce.geotiff.GeoTiffReader
import org.geotools.geometry.jts.JTS
import org.geotools.referencing.CRS
import org.geotools.util.factory.Hints
import org.locationtech.jts.geom.{Coordinate, Geometry, GeometryFactory}
import org.opengis.coverage.grid.{GridCoordinates, GridEnvelope}
import org.opengis.parameter.{GeneralParameterValue, ParameterValue}

import java.io.IOException
import java.util
import scala.collection.convert.ImplicitConversions.`collection AsScalaIterable`

// Fetches polygonal coordinates from a raster image

case class ST_GeomFromRaster(inputExpressions: Seq[Expression])
extends Expression with CodegenFallback with UserDataGeneratator {
override def nullable: Boolean = false

override def eval(inputRow: InternalRow): Any = {
// This expression takes one input expression: the path of the raster file
assert(inputExpressions.length == 1)
val geomString = inputExpressions(0).eval(inputRow).asInstanceOf[UTF8String].toString
val geometry = readGeometry(geomString)
new GenericArrayData(GeometrySerializer.serialize(geometry))
}

private def readGeometry(url: String): Geometry = {

val format = GridFormatFinder.findFormat(url)
val hints = new Hints(Hints.FORCE_LONGITUDE_FIRST_AXIS_ORDER, true)
val reader = format.getReader(url, hints)
var coverage:GridCoverage2D = null

try coverage = reader.read(null)
catch {
case giveUp: IOException =>
throw new RuntimeException(giveUp)
}
reader.dispose()
val source = coverage.getCoordinateReferenceSystem
val target = CRS.decode("EPSG:4326", true)
jiayuasu marked this conversation as resolved.
val targetCRS = CRS.findMathTransform(source, target)
val gridRange2D = coverage.getGridGeometry.getGridRange
val cords = Array(Array(gridRange2D.getLow(0), gridRange2D.getLow(1)), Array(gridRange2D.getLow(0), gridRange2D.getHigh(1)), Array(gridRange2D.getHigh(0), gridRange2D.getHigh(1)), Array(gridRange2D.getHigh(0), gridRange2D.getLow(1)))
val polyCoordinates = new Array[Coordinate](5)
var index = 0

for (point <- cords) {
val coordinate2D = new GridCoordinates2D(point(0), point(1))
val result = coverage.getGridGeometry.gridToWorld(coordinate2D)
polyCoordinates({
index += 1; index - 1
}) = new Coordinate(result.getOrdinate(0), result.getOrdinate(1))
}

polyCoordinates(index) = polyCoordinates(0)
val factory = new GeometryFactory
val polygon = JTS.transform(factory.createPolygon(polyCoordinates), targetCRS)

polygon

}
override def dataType: DataType = GeometryUDT

override def children: Seq[Expression] = inputExpressions
}
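
Note on the corner loop in readGeometry: the polygon is built from the four corners of the grid range, converted to world coordinates, with the first corner repeated at the end so that the ring is closed. A minimal sketch of that step without the mutable index is shown below. Point2D and the gridToWorld function parameter are hypothetical stand-ins for the JTS Coordinate and the GeoTools grid-to-world conversion used in the diff; this is an illustration, not the code that was merged.

```scala
// Hypothetical stand-in for a world coordinate; the diff itself uses
// org.locationtech.jts.geom.Coordinate.
final case class Point2D(x: Double, y: Double)

object RingSketch {
  // Converts the four grid corners to world coordinates, in the same order
  // as the diff (low/low, low/high, high/high, high/low), and appends the
  // first corner again so that the ring is closed (5 points in total).
  // `gridToWorld` is an assumed callback, not a GeoTools API.
  def closedRing(lowX: Int, lowY: Int, highX: Int, highY: Int,
                 gridToWorld: (Int, Int) => Point2D): Array[Point2D] = {
    val corners = Array((lowX, lowY), (lowX, highY), (highX, highY), (highX, lowY))
    val world = corners.map { case (x, y) => gridToWorld(x, y) }
    world :+ world.head
  }
}
```

Calling closedRing with any grid range returns five points whose first and last elements are equal, which is the shape a polygon shell requires.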


// Constructs a raster dataframe from a raster image which contains multiple columns such as Geometry, Band values etc
case class ST_DataframeFromRaster(inputExpressions: Seq[Expression])
extends Expression with CodegenFallback with UserDataGeneratator {
override def nullable: Boolean = false

override def eval(inputRow: InternalRow): Any = {
// This expression takes two input expressions: the path of the raster file and the number of bands
assert(inputExpressions.length == 2)
val geomString = inputExpressions(0).eval(inputRow).asInstanceOf[UTF8String].toString
val totalBands = inputExpressions(1).eval(inputRow).asInstanceOf[Int]
val geometry = readGeometry(geomString)
val bandvalues = getBands(geomString, totalBands).toArray
returnValue(geometry.toGenericArrayData,bandvalues, 2)
}

private def readGeometry(url: String): Geometry = {
jiayuasu marked this conversation as resolved.

val format = GridFormatFinder.findFormat(url)
val hints = new Hints(Hints.FORCE_LONGITUDE_FIRST_AXIS_ORDER, true)
val reader = format.getReader(url, hints)
var coverage: GridCoverage2D = null

try coverage = reader.read(null)
catch {
case giveUp: IOException =>
throw new RuntimeException(giveUp)
}
reader.dispose()
val source = coverage.getCoordinateReferenceSystem
val target = CRS.decode("EPSG:4326", true)
val targetCRS = CRS.findMathTransform(source, target)
jiayuasu marked this conversation as resolved.
val gridRange2D = coverage.getGridGeometry.getGridRange
val cords = Array(Array(gridRange2D.getLow(0), gridRange2D.getLow(1)), Array(gridRange2D.getLow(0), gridRange2D.getHigh(1)), Array(gridRange2D.getHigh(0), gridRange2D.getHigh(1)), Array(gridRange2D.getHigh(0), gridRange2D.getLow(1)))
val polyCoordinates = new Array[Coordinate](5)
var index = 0

for (point <- cords) {
val coordinate2D = new GridCoordinates2D(point(0), point(1))
val result = coverage.getGridGeometry.gridToWorld(coordinate2D)
polyCoordinates({
index += 1;
index - 1
}) = new Coordinate(result.getOrdinate(0), result.getOrdinate(1))
}

polyCoordinates(index) = polyCoordinates(0)
val factory = new GeometryFactory
val polygon = JTS.transform(factory.createPolygon(polyCoordinates), targetCRS)

polygon
}

private def getBands(url: String, bands:Int): List[Double] = {
val policy: ParameterValue[OverviewPolicy] = AbstractGridFormat.OVERVIEW_POLICY.createValue
policy.setValue(OverviewPolicy.IGNORE)

val gridsize: ParameterValue[String] = AbstractGridFormat.SUGGESTED_TILE_SIZE.createValue

val useJaiRead: ParameterValue[Boolean] = AbstractGridFormat.USE_JAI_IMAGEREAD.createValue.asInstanceOf[ParameterValue[Boolean]]
useJaiRead.setValue(true)


val reader: GridCoverage2DReader = new GeoTiffReader(url)
val coverage: GridCoverage2D = reader.read(Array[GeneralParameterValue](policy, gridsize, useJaiRead))

val dimensions: GridEnvelope = reader.getOriginalGridRange
val maxDimensions: GridCoordinates = dimensions.getHigh
val w: Int = maxDimensions.getCoordinateValue(0) + 1
val h: Int = maxDimensions.getCoordinateValue(1) + 1
val numBands: Int = bands

val bandValues: util.List[util.List[Double]] = new util.ArrayList[util.List[Double]](numBands)

for (i <- 0 until numBands) {
bandValues.add(new util.ArrayList[Double])
}

for (i <- 0 until w) {
Member

I believe this loop and the next loop can be merged

Contributor Author

@jiayuasu The current loop returns, for every pixel, an array whose size equals the number of bands, and I use it to construct a 2D array of size (number of bands) * (number of pixels); the second loop then flattens this array. If I need to merge both loops, I have to keep a separate starting index for every band, which may vary.

for (j <- 0 until h) {
val vals: Array[Double] = new Array[Double](numBands)
coverage.evaluate(new GridCoordinates2D(i, j), vals)
var band: Int = 0
for (pixel <- vals) {
bandValues.get({
band += 1; band - 1
}).add(pixel)
}
}
}

bandValues.flatten.toList

}

// Dynamic results based on number of columns and type of structure
private def returnValue(geometry:GenericArrayData, bands:Array[Double], count:Int): InternalRow = {

val genData = new Array[GenericArrayData](count)
genData(0) = geometry
genData(1) = new GenericArrayData(bands)
val result = InternalRow(genData.toList : _*)
result
}

// Dynamic Schema generation using Number of Bands
private def getSchema():DataType = {
val mySchema = StructType(Array(StructField("Polygon", GeometryUDT, false),StructField("bands", ArrayType(DoubleType))))
mySchema
}

override def dataType: DataType = getSchema()

override def children: Seq[Expression] = inputExpressions
}
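
Note on the review exchange about merging the two loops in getBands: if every band is assumed to hold exactly w * h values, the starting offset of each band is fixed at band * w * h, so the per-band lists and the later flatten can be replaced by writes into one preallocated array. The sketch below illustrates that idea only. The evaluate function parameter is a hypothetical stand-in for coverage.evaluate, and nothing here is claimed to be what the PR finally adopted.

```scala
object BandLayoutSketch {
  // Builds a band-major array (all values of band 0, then band 1, ...)
  // in a single pass over the pixels.
  // `evaluate` is an assumed callback returning one value per band for
  // the pixel at (x, y); it stands in for coverage.evaluate in the diff.
  def flattenBandMajor(w: Int, h: Int, numBands: Int,
                       evaluate: (Int, Int) => Array[Double]): Array[Double] = {
    val pixelsPerBand = w * h
    val out = new Array[Double](numBands * pixelsPerBand)
    for (x <- 0 until w; y <- 0 until h) {
      val vals = evaluate(x, y)
      // Same visiting order as the nested loops in the diff: x outer, y inner.
      val pixelIndex = x * h + y
      for (band <- 0 until numBands) {
        out(band * pixelsPerBand + pixelIndex) = vals(band)
      }
    }
    out
  }
}
```

The result has the same ordering as appending to one list per band and flattening afterwards, but avoids boxing each value into a java.util.List.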

// get a particular band from a raster dataframe
case class ST_getBand(inputExpressions: Seq[Expression])
jiayuasu marked this conversation as resolved.
extends Expression with CodegenFallback with UserDataGeneratator {
override def nullable: Boolean = false

override def eval(inputRow: InternalRow): Any = {
// This expression takes three input expressions: the band array, the target band and the total number of bands
assert(inputExpressions.length == 3)
val bandInfo = inputExpressions(0).eval(inputRow).asInstanceOf[GenericArrayData].toDoubleArray()
val targetBand = inputExpressions(1).eval(inputRow).asInstanceOf[Int]
val totalBands = inputExpressions(2).eval(inputRow).asInstanceOf[Int]
val result = gettargetband(bandInfo, targetBand, totalBands)
new GenericArrayData(result)
}

// fetch target band from the given array of bands
private def gettargetband(bandinfo: Array[Double], targetband:Int, totalbands:Int): Array[Double] = {
val sizeOfBand = bandinfo.length/totalbands
val lowerBound = (targetband - 1)*sizeOfBand
val upperBound = targetband*sizeOfBand-1
assert(bandinfo.slice(lowerBound,upperBound).length + 1==sizeOfBand)
bandinfo.slice(lowerBound, upperBound)

}

override def dataType: DataType = ArrayType(DoubleType)

override def children: Seq[Expression] = inputExpressions
}
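
Note on the bounds in gettargetband: Scala's slice(from, until) excludes the element at index until. With upperBound computed as targetband * sizeOfBand - 1, the slice in the diff therefore returns sizeOfBand - 1 values, which matches the "+ 1" in its assertion and appears to drop the last value of each band. A minimal sketch of a slice that returns the whole band is shown below, assuming a band-major array and a 1-based band number as in the diff. The name bandSlice and the require checks are illustrative additions, not part of the PR.

```scala
object BandSliceSketch {
  // Returns all values of the 1-based `targetBand` from a band-major array
  // that holds `totalBands` bands of equal length.
  def bandSlice(bandInfo: Array[Double], targetBand: Int, totalBands: Int): Array[Double] = {
    require(totalBands > 0 && bandInfo.length % totalBands == 0,
      "array length must be a multiple of the number of bands")
    require(targetBand >= 1 && targetBand <= totalBands,
      "target band must be between 1 and totalBands")
    val sizeOfBand = bandInfo.length / totalBands
    val from = (targetBand - 1) * sizeOfBand
    // The second argument of slice is exclusive, so no "- 1" is needed.
    bandInfo.slice(from, from + sizeOfBand)
  }
}
```

For an array of six values split into two bands, band 1 yields the first three values and band 2 the last three.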
