
java.lang.Exception: spark.databricks.labs.mosaic.raster.api when executing SQL functions #297

Closed
giohappy opened this issue Feb 6, 2023 · 4 comments · Fixed by #352

giohappy commented Feb 6, 2023

Describe the bug
spark.sql("""show functions""").where("startswith(function, 'st_')").display() throws the following execption:

java.lang.Exception: spark.databricks.labs.mosaic.raster.api
	at org.apache.spark.sql.errors.QueryExecutionErrors$.noSuchElementExceptionError(QueryExecutionErrors.scala:1910)
	at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$4(SQLConf.scala:5096)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$1(SQLConf.scala:5096)
	at com.databricks.spark.DatabricksSparkConf$AioaLazyConfigConf$.recordStringConfigAccessDuringWarmUp(DatabricksSparkConf.scala:1092)
	at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:5096)
	at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:88)
	at com.databricks.labs.mosaic.sql.extensions.MosaicSQL.$anonfun$apply$1(MosaicSQL.scala:32)
	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildCheckRules$1(SparkSessionExtensions.scala:208)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.SparkSessionExtensions.buildCheckRules(SparkSessionExtensions.scala:208)
	at com.databricks.sql.HiveDatabricksEdge.customCheckRules(HiveDatabricksEdge.scala:44)
	at com.databricks.sql.HiveDatabricksEdge.customCheckRules$(HiveDatabricksEdge.scala:37)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.customCheckRules(HiveSessionStateBuilder.scala:56)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:141)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:100)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$7(BaseSessionStateBuilder.scala:415)
	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:104)
	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:104)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:159)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:319)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$3(QueryExecution.scala:349)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:777)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:349)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:346)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:140)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:140)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:132)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:106)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:104)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:820)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:815)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:695)
	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$6(DriverLocal.scala:279)
	at org.apache.spark.SafeAddJarOrFile$.safe(SafeAddJarOrFile.scala:31)
	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$5(DriverLocal.scala:279)
	at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1758)
	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$4(DriverLocal.scala:278)
	at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$3(DriverLocal.scala:271)
	at scala.util.Using$.resource(Using.scala:269)
	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$2(DriverLocal.scala:270)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at com.databricks.backend.daemon.driver.DriverLocal.<init>(DriverLocal.scala:257)
	at com.databricks.backend.daemon.driver.PythonDriverLocalBase.<init>(PythonDriverLocalBase.scala:168)
	at com.databricks.backend.daemon.driver.JupyterDriverLocal.<init>(JupyterDriverLocal.scala:370)
	at com.databricks.backend.daemon.driver.PythonDriverWrapper.instantiateDriver(DriverWrapper.scala:723)
	at com.databricks.backend.daemon.driver.DriverWrapper.setupRepl(DriverWrapper.scala:342)
	at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:231)
	at java.lang.Thread.run(Thread.java:750)

To Reproduce

  1. Upload mosaic-0.3.6-jar-with-dependencies.jar and mosaic-0.3.6.jar to /dbfs/FileStore/mosaic/jars
  2. Create a mosaic-init.sh init script with the following content:

#!/bin/bash
#
# File: mosaic-init.sh
# On cluster startup, this script will copy the Mosaic jars to the cluster's default jar directory.

cp /dbfs/FileStore/mosaic/jars/*.jar /databricks/jars

  3. Configure the Spark conf:

spark.databricks.cluster.profile singleNode
spark.sql.extensions com.databricks.labs.mosaic.sql.extensions.MosaicSQL
spark.databricks.labs.mosaic.index.system H3
spark.master local[*, 4]
spark.databricks.labs.mosaic.geometry.api JTS

  4. Execute the following Notebook cell:
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

spark.sql("""show functions""").where("startswith(function, 'st_')").display()

Expected behavior
The SQL command should return the list of autoconfigured functions.
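
For reference, a quick notebook-cell assertion of the expected behavior (an illustrative sketch; the named functions are examples from Mosaic's ST_ family, not output from the original report):

# With Mosaic registered, the catalog should list its ST_ functions,
# e.g. st_area, st_buffer, st_centroid. SHOW FUNCTIONS exposes them
# in a column named `function`.
st_fns = (
    spark.sql("show functions")
    .where("startswith(function, 'st_')")
    .collect()
)
assert len(st_fns) > 0, "No Mosaic ST_ functions were registered"
print([row.function for row in st_fns][:5])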

Additional context

  • Installing from the Notebook with %pip install databricks-mosaic works fine (see the sketch after this list)
  • Databricks Runtime 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)
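
For comparison, a minimal sketch of the pip-install path that does work, assuming the documented databricks-mosaic Python API (mosaic.enable_mosaic); this cell is an editorial illustration, not part of the original report:

# %pip install databricks-mosaic   (run in its own cell first)

import mosaic as mos

# enable_mosaic registers the ST_ functions against the current session,
# so no init script or spark.sql.extensions entry is needed.
mos.enable_mosaic(spark, dbutils)

spark.sql("show functions").where("startswith(function, 'st_')").display()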
edurdevic (Contributor) commented

Thank you for reporting this @giohappy.
I noticed you are installing both jars (with and without dependencies).
You only need one of the two; adding the one with dependencies means you don't need to install any additional libraries.

Does the issue also occur when only one of the two is installed?

giohappy (Author) commented Feb 7, 2023

Thanks @edurdevic for your reply.

I forgot to mention that my first attempt was with the "fat" jar only. It was unsuccessful, so I then tried with both.

edobrynin-dodo commented
I have a similar issue.
I created a new cluster from scratch on 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12).

I uploaded the jar-with-dependencies (0.3.8) into /dbfs/FileStore/mosaic/jars, created the init script, and added the Spark configs as described above.

spark.sql("""show functions""").where("startswith(function, 'st_')").display()

: java.util.NoSuchElementException: spark.databricks.labs.mosaic.raster.api
	at org.apache.spark.sql.errors.QueryExecutionErrors$.noSuchElementExceptionError(QueryExecutionErrors.scala:1918)
	at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$4(SQLConf.scala:5108)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$1(SQLConf.scala:5108)
	at com.databricks.spark.DatabricksSparkConf$AioaLazyConfigConf$.recordStringConfigAccessDuringWarmUp(DatabricksSparkConf.scala:1100)
	at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:5108)
	at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:88)
	at com.databricks.labs.mosaic.sql.extensions.MosaicSQL.$anonfun$apply$1(MosaicSQL.scala:32)
	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildCheckRules$1(SparkSessionExtensions.scala:208)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.SparkSessionExtensions.buildCheckRules(SparkSessionExtensions.scala:208)
	at com.databricks.sql.HiveDatabricksEdge.customCheckRules(HiveDatabricksEdge.scala:44)
	at com.databricks.sql.HiveDatabricksEdge.customCheckRules$(HiveDatabricksEdge.scala:37)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.customCheckRules(HiveSessionStateBuilder.scala:56)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:141)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:100)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$7(BaseSessionStateBuilder.scala:415)
	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:104)
	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:104)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:147)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:319)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$3(QueryExecution.scala:337)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:763)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:337)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:334)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:141)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:141)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:133)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:106)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:104)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:820)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:815)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)

milos-colic (Contributor) commented

@giohappy the issue occurs because of a call to spark.conf.get without a default value for the key spark.databricks.labs.mosaic.raster.api. I will create a fix for this in v0.3.10 (the next release).
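
To illustrate the failure mode, a minimal PySpark sketch (an editorial illustration; Mosaic performs the equivalent lookup in Scala, inside MosaicSQL.apply per the traces above):

key = "spark.databricks.labs.mosaic.raster.api"

# Without a default, looking up an unset key raises (a wrapped
# java.util.NoSuchElementException), which is what surfaces in the traces.
try:
    spark.conf.get(key)
except Exception as err:
    print(f"lookup failed: {err}")

# Supplying a default makes the lookup degrade gracefully; a default like
# this is the likely shape of the v0.3.10 fix.
print(spark.conf.get(key, "GDAL"))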

A temporary workaround is to add `spark.conf.set("spark.databricks.labs.mosaic.raster.api", "GDAL")`.
Please note: if you didn't install GDAL, that is fine for the ST_ functions even when you set this key manually.
However, setting this key alone isn't enough for the RST_ functions to work; for those you'd need to follow the GDAL installation steps.

I will add a docs page describing this workaround for versions below 0.3.10.
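
Put together, the workaround as a notebook cell might look like this (a sketch built from the one-liner above; as noted, GDAL itself need not be installed for the ST_ functions):

# Workaround for Mosaic < 0.3.10: provide the missing raster-api key
# before the first query triggers the analyzer's check rules.
spark.conf.set("spark.databricks.labs.mosaic.raster.api", "GDAL")

spark.sql("show functions").where("startswith(function, 'st_')").display()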

[Screenshot attached: 2023-04-20 at 21:08:57]

milos-colic linked pull request #352 on Apr 21, 2023 that will close this issue