Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] DeltaTableBuilder.execute() sometimes throws "Could not find active SparkSession" #1475

Closed
1 of 3 tasks
moredatapls opened this issue Nov 3, 2022 · 6 comments
Closed
1 of 3 tasks
Assignees
Labels
bug Something isn't working
Milestone

Comments

@moredatapls
Copy link
Contributor

moredatapls commented Nov 3, 2022

Bug

Describe the problem

Steps to reproduce

import io.delta.tables.DeltaTable
import org.apache.spark.sql.{Encoders, SparkSession}

import java.nio.file.{Files, Path, Paths}

val tempDir: Path = Files.createTempDirectory("scratch")
val warehouseDir: Path = Paths.get(tempDir.toString, "warehouse")
val tableDir: Path = Paths.get(tempDir.toString, "table")

val spark = SparkSession.builder()
  .appName("scratch")
  .master("local[*]")
  .config("spark.local.ip", "127.0.0.1")
  .config("spark.driver.host", "127.0.0.1")
  .config("spark.driver.bindAddress", "127.0.0.1")
  .config("spark.sql.warehouse.dir", warehouseDir.toString)
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

case class Table(id: Long, text: String)

val schema = Encoders.product[Table].schema

val tableBuilder = DeltaTable.create(spark)
  .tableName("Table")
  .addColumns(schema)
  .location(tableDir.toString)

// sometimes throws org.apache.spark.sql.delta.DeltaIllegalArgumentException: Could not find active SparkSession
// even though a session has been passed into the DeltaTable.create(spark) statement
tableBuilder.execute()

Observed results

We are running some integration tests of our Spark jobs against some Delta tables which are created locally, on the fly, in the integration tests. The tables are stored on disk. The code above shows a (simplified) snippet of how we setup the Delta tables on disk for testing.

While running those test we sometimes (not always) observe that calling DeltaTableBuilder.execute() for the Delta table that we create in the tests sometimes throws an exception: org.apache.spark.sql.delta.DeltaIllegalArgumentException: Could not find active SparkSession.

This happens despite us passing the SparkSession into the DeltaTable.create(spark) method.

Expected results

If a SparkSession is passed into DeltaTable.create(spark), the exception I described above should never be thrown, since an active session exists and has been passed into the builder.

Further details

Stack trace:

org.apache.spark.sql.delta.DeltaIllegalArgumentException: Could not find active SparkSession
at org.apache.spark.sql.delta.DeltaErrorsBase.activeSparkSessionNotFound(DeltaErrors.scala:2038)
at org.apache.spark.sql.delta.DeltaErrorsBase.activeSparkSessionNotFound$(DeltaErrors.scala:2037)
at org.apache.spark.sql.delta.DeltaErrors$.activeSparkSessionNotFound(DeltaErrors.scala:2293)
at io.delta.tables.DeltaTable$.$anonfun$forName$1(DeltaTable.scala:680)
at scala.Option.getOrElse(Option.scala:189)
at io.delta.tables.DeltaTable$.forName(DeltaTable.scala:680)
at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:369)

From the stack trace we can see that the problem is here:

if (DeltaTableUtils.isValidPath(tableId)) {
DeltaTable.forPath(location.get)
} else {
DeltaTable.forName(this.identifier)
}

A possible solution could be to always forward the SparkSession object that is present in the DeltaTableBuilder to DeltaTable.forPath() and DeltaTable.forName() - they both have an optional parameter that we could use. Maybe it could be changed to something like this:

if (DeltaTableUtils.isValidPath(tableId)) {
  DeltaTable.forPath(spark, location.get)
} else {
  DeltaTable.forName(spark, this.identifier)
}

Environment information

  • Delta Lake version: 2.1.1
  • Spark version: 3.3.0
  • Scala version: 2.12.14

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
@moredatapls moredatapls added the bug Something isn't working label Nov 3, 2022
@zsxwing
Copy link
Member

zsxwing commented Nov 3, 2022

Good catch. Yep we should just use the SparkSession object in DeltaTableBuilder to create DeltaTable. Feel free to submit a PR to fix that.

By the way, do you know why active SparkSession is not available in your tests?

@moredatapls
Copy link
Contributor Author

By the way, do you know why active SparkSession is not available in your tests?

No, I don't. It's very weird, when we are running the integration tests locally on our computers the tests work just fine. However, in CI (we're using Azure DevOps), the tests fail a lot, but sometimes succeed. I feel like it could be some race condition somewhere since SparkSession.getActiveSession() returns the active session per thread? Not sure.

@moredatapls
Copy link
Contributor Author

@zsxwing I published a draft PR here: #1476. It was easy to fix but maybe someone can provide me some guidance on the testing, is there any test I could add anywhere? It's my first PR here

@scottsand-db scottsand-db self-assigned this Nov 8, 2022
@pgrandjean
Copy link
Contributor

Same issue during tests. Looking forward to having this fixed!

@vkorukanti vkorukanti added this to the 2.2.0 milestone Dec 5, 2022
@moredatapls
Copy link
Contributor Author

@pgrandjean fyi Delta 2.2.0 was released which includes the fix. I upgraded the dependency and the tests now work in CI 🥳

@pgrandjean
Copy link
Contributor

awesome! thank you 👍

vkorukanti pushed a commit to vkorukanti/delta that referenced this issue Jan 5, 2023
(Cherry-pick 68c8e18 to branch-2.0)

Signed-off-by: Helge Bruegner <helge@bruegner.de>

- Forward the SparkSession when resolving the tables in the DeltaTableBuilder
- Remove some unused imports

Resolves delta-io#1475

/

No.

Closes delta-io#1476

Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
GitOrigin-RevId: 880de0f55ee79289cecd19d74c4c177c76d30aeb
vkorukanti pushed a commit to vkorukanti/delta that referenced this issue Jan 6, 2023
(Cherry-pick 68c8e18 to branch-2.0)

Signed-off-by: Helge Bruegner <helge@bruegner.de>

- Forward the SparkSession when resolving the tables in the DeltaTableBuilder
- Remove some unused imports

Resolves delta-io#1475

/

No.

Closes delta-io#1476

Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
GitOrigin-RevId: 880de0f55ee79289cecd19d74c4c177c76d30aeb
vkorukanti pushed a commit that referenced this issue Jan 24, 2023
(Cherry-pick 68c8e18 to branch-2.1)

Signed-off-by: Helge Bruegner <helge@bruegner.de>

- Forward the SparkSession when resolving the tables in the DeltaTableBuilder
- Remove some unused imports

Resolves #1475

/

No.

Closes #1476

Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
GitOrigin-RevId: 880de0f55ee79289cecd19d74c4c177c76d30aeb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants