
[SEDONA-21] Add extension classes for auto registration of UDFs/UDTs #513

Merged
merged 1 commit on Mar 8, 2021

Conversation

@alexott (Contributor) commented on Mar 7, 2021

Is this PR related to a proposed Issue?

SEDONA-21

What changes were proposed in this PR?

With this change we can use Sedona UDFs/UDTs from Spark SQL, for example from `spark-sql`
or via the Thrift server. You just need to add the following to the command line:

```
--conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator
```

How was this patch tested?

Manual test using `spark-sql`.

Did this PR include necessary documentation updates?

yes

Start `spark-sql` as follows (replace `<VERSION>` with the actual version, e.g. `1.0.1-incubating`):

```sh
spark-sql --packages org.apache.sedona:sedona-sql-3.0_2.12:<VERSION>,org.apache.sedona:sedona-viz-3.0_2.12:<VERSION>,org.locationtech.jts:jts-core:1.18.0,org.apache.sedona:sedona-core-3.0_2.12:<VERSION>,org.geotools:gt-referencing:24.0 \
```
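
The snippet above is cut off in the diff view; combining it with the `--conf` flags from the PR description, a complete invocation with a quick smoke-test query might look like this (the query itself is illustrative, not part of the original docs):

```sh
# Illustrative smoke test: ST_GeomFromWKT and ST_Distance are Sedona SQL functions
# that become available once SedonaSqlExtensions is injected.
spark-sql \
  --packages org.apache.sedona:sedona-sql-3.0_2.12:<VERSION>,org.apache.sedona:sedona-viz-3.0_2.12:<VERSION>,org.locationtech.jts:jts-core:1.18.0,org.apache.sedona:sedona-core-3.0_2.12:<VERSION>,org.geotools:gt-referencing:24.0 \
  --conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator \
  -e "SELECT ST_Distance(ST_GeomFromWKT('POINT (0 0)'), ST_GeomFromWKT('POINT (3 4)'))"
# Should print 5.0 if the UDFs were registered correctly.
```
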
@jiayuasu (Member) commented:
Could you please change the packages to those listed here: http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/#spark-30-scala-212

  1. sedona-python-adapter: a fat jar for Scala/Java/Python users that includes all dependencies except geotools
  2. sedona-viz
  3. org.datasyslab.geotools-wrapper: the geotools packages on Maven Central. The original Geotools jars are in OSGEO repo.

The individual Sedona jars are only for advanced users who know how to handle dependency conflicts. In addition, using the OSGEO GeoTools jars will lead to a 'jai-core missing' issue on some platforms.
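
For concreteness, a launch command using those coordinates might look like the following (the versions are taken from the pom.xml later in this thread and are an assumption, not official documentation):

```sh
# Sketch: spark-sql launch using the recommended fat-jar coordinates
# (sedona-python-adapter + sedona-viz + the datasyslab geotools-wrapper).
spark-sql \
  --packages org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.1-incubating,org.apache.sedona:sedona-viz-3.0_2.12:1.0.1-incubating,org.datasyslab:geotools-wrapper:geotools-24.0 \
  --conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator
```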

@alexott (Contributor, Author) replied:

@jiayuasu done - you're right, it's much easier to use those coordinates...

@jiayuasu (Member) commented on Mar 8, 2021

@alexott Thank you very much for your contribution. Please see my comment above.

With this change we can use Sedona UDFs/UDTs from Spark SQL, for example from `spark-sql`
or via the Thrift server. You just need to add the following to the command line:

```
--conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator
```
@jiayuasu (Member) commented:

@alexott Thank you again for your contribution to Sedona.

After the release of Sedona 1.0.1, a user reported that they cannot use this PR in an Azure Databricks workspace. Their question is almost identical to this SO post: https://stackoverflow.com/questions/66721168/sparksessionextensions-injectfunction-in-databricks-environment

I wonder if you have any idea about how to fix it.

@alexott deleted the add-sql-extensions branch on May 28, 2021, 06:58
@alexott (Contributor, Author) commented on May 28, 2021

@jiayuasu The main problem is that if you add the library via the UI, it's loaded only after Spark has started, so all Spark session extensions have already been executed. That's an existing limitation of the platform. There is a workaround: copy all necessary jars to DBFS (/tmp/sedona-jars/ in my example) and use an init script to copy the jar files into place before Spark starts. Something like this:

```sh
cp /dbfs/tmp/sedona-jars/*.jar /databricks/jars
```
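
For reference, a minimal init script wrapping that copy might look like the following (the script name and DBFS location are illustrative; it would be registered as a Databricks cluster init script):

```sh
#!/bin/bash
# Illustrative init script, e.g. stored at dbfs:/tmp/sedona-init.sh.
# Runs on each node before Spark starts, so the Sedona jars are on the
# classpath early enough for spark.sql.extensions to be applied.
set -e
cp /dbfs/tmp/sedona-jars/*.jar /databricks/jars/
```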

After that, Spark extensions are picked up, and you can use SQL commands:

(Screenshot, 2021-05-28: SQL commands using Sedona functions running in a Databricks notebook.)

P.S. Another problem is that you need to pull in many jars to make it work. I used the following pom.xml to generate a jar with dependencies and used it with the init script:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>net.alexott.demos.spark</groupId>
  <artifactId>sedona-all_3.0_2.12</artifactId>
  <version>1.0.1-incubating</version>
  <packaging>jar</packaging>

  <name>sedona-all</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- java.version is referenced by the compiler plugin below; 1.8 is an assumed value -->
    <java.version>1.8</java.version>
    <sedona.version>1.0.1-incubating</sedona.version>
    <spark.version>3.0_2.12</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.sedona</groupId>
      <artifactId>sedona-viz-${spark.version}</artifactId>
      <version>${sedona.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.sedona</groupId>
      <artifactId>sedona-python-adapter-${spark.version}</artifactId>
      <version>${sedona.version}</version>
    </dependency>
    <dependency>
      <groupId>org.datasyslab</groupId>
      <artifactId>geotools-wrapper</artifactId>
      <version>geotools-24.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
          <source>${java.version}</source>
          <target>${java.version}</target>
          <optimize>true</optimize>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.2.0</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  
</project>
```
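
For completeness, one possible way to build that assembly and stage it on DBFS for the init script (the jar file name follows the assembly plugin's `*-jar-with-dependencies.jar` convention; the Databricks CLI call is just one option for uploading):

```sh
# Build the fat jar defined by the pom.xml above.
mvn -q clean package

# Stage it on DBFS so the init script can copy it into /databricks/jars.
databricks fs cp \
  target/sedona-all_3.0_2.12-1.0.1-incubating-jar-with-dependencies.jar \
  dbfs:/tmp/sedona-jars/
```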
