Make create_temporary_dir work with pyspark-extension only (#222)
create_temporary_dir and install_pip_package (which uses the former) can now be called
in Python without the spark-extension jar on the classpath. This allows using
those methods with only pyspark-extension installed.
EnricoMi committed Jan 4, 2024
1 parent 663aef4 commit 64dfc1e
Showing 2 changed files with 6 additions and 3 deletions.
3 changes: 3 additions & 0 deletions PYSPARK-DEPS.md
@@ -6,6 +6,9 @@ Such a deployment can be cumbersome, especially when running in an interactive n
 The `spark-extension` package allows installing Python packages programmatically by the PySpark application itself (PySpark ≥ 3.1.0).
 These packages are only accessible by that PySpark application, and they are removed on calling `spark.stop()`.

+Either install the `spark-extension` Maven package, or the `pyspark-extension` PyPi package (on the driver only),
+as described [here](README.md#using-spark-extension).
+
 ## Installing packages with `pip`

 Python packages can be installed with `pip` as follows:
6 changes: 3 additions & 3 deletions python/gresearch/spark/__init__.py
@@ -17,6 +17,7 @@
 import shutil
 import subprocess
 import sys
+import tempfile
 import time
 from contextlib import contextmanager
 from pathlib import Path
@@ -426,9 +427,8 @@ def create_temporary_dir(spark: Union[SparkSession, SparkContext], prefix: str)
     if isinstance(spark, SparkSession):
         spark = spark.sparkContext

-    package = spark._jvm.uk.co.gresearch.spark.__getattr__("package$").__getattr__("MODULE$")
-    mktempdir = package.createTemporaryDir
-    return mktempdir(prefix)
+    root_dir = spark._jvm.org.apache.spark.SparkFiles.getRootDirectory()
+    return tempfile.mkdtemp(prefix=prefix, dir=root_dir)


 SparkSession.create_temporary_dir = create_temporary_dir
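
The new implementation relies on `tempfile.mkdtemp` from the Python standard library instead of the Scala helper. The sketch below shows only that part, without Spark: `create_temporary_dir_under` is a hypothetical helper, and `root_dir` is a locally created stand-in for the value that `SparkFiles.getRootDirectory()` returns in the actual change.

```python
import tempfile
from pathlib import Path


def create_temporary_dir_under(root_dir: str, prefix: str) -> str:
    """Hypothetical stand-in for create_temporary_dir.

    root_dir plays the role of SparkFiles.getRootDirectory() in the change
    above; here it is just an ordinary local directory.
    """
    # mkdtemp creates a new, uniquely named directory inside root_dir
    # and returns its path as a string.
    return tempfile.mkdtemp(prefix=prefix, dir=root_dir)


# Assumption for this sketch: a throwaway directory instead of Spark's root dir.
root_dir = tempfile.mkdtemp(prefix="spark-files-root-")

first = create_temporary_dir_under(root_dir, "pip-pkgs-")
second = create_temporary_dir_under(root_dir, "pip-pkgs-")

print(Path(first).parent == Path(root_dir))  # both directories live under root_dir
print(first != second)                       # each call yields a distinct directory
```

Because the directory is created under the Spark files root, it would presumably be cleaned up together with that root when the application stops; the sketch itself does not remove anything.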