---
title: Apache Spark job definition
description: An Apache Spark job definition is a Fabric code item that allows you to submit batch or streaming jobs to a Spark cluster.
ms.reviewer: snehagunda
ms.author: qixwang
author: qixwang
ms.topic: overview
ms.custom:
  - build-2023
  - build-2023-dataai
  - build-2023-fabric
  - ignite-2023
ms.date: 10/20/2023
ms.search.form: spark_job_definition
---

# What is an Apache Spark job definition?

An Apache Spark job definition is a Microsoft Fabric code item that allows you to submit batch or streaming jobs to Spark clusters. By uploading the binary files from the compilation output of different languages (for example, .jar from Java), you can apply different transformation logic to the data hosted on a lakehouse. Besides the binary file, you can further customize the behavior of the job by uploading additional libraries and specifying command-line arguments.
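
For example, a batch job's main definition file might look like the following minimal PySpark sketch. The file name, paths, column name, and command-line arguments here are illustrative assumptions, not part of the product:

```python
# main.py - a hypothetical main definition file for a Spark job definition.
# The job expects two command-line arguments: an input path and an output path.
import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

if __name__ == "__main__":
    input_path, output_path = sys.argv[1], sys.argv[2]

    spark = SparkSession.builder.appName("SampleBatchJob").getOrCreate()

    # Apply a simple transformation to data hosted on the lakehouse.
    df = spark.read.format("delta").load(input_path)
    result = df.filter(col("amount") > 0)

    result.write.mode("overwrite").format("delta").save(output_path)
    spark.stop()
```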

To run a Spark job definition, you must have at least one lakehouse associated with it. The default lakehouse serves as the default file system for the Spark runtime. Any Spark code that reads or writes data with a relative path resolves that path against the default lakehouse, as the sketch below illustrates.
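
The following sketch shows how relative paths resolve against the default lakehouse; the specific folder and table names are hypothetical:

```python
# Relative paths resolve against the default lakehouse associated with the job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a file from the default lakehouse's Files section via a relative path.
df = spark.read.option("header", True).csv("Files/raw/orders.csv")

# Write back to the default lakehouse, again with a relative path.
df.write.mode("overwrite").format("delta").save("Tables/orders")
```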

> [!TIP]
> To run a Spark job definition item, you must have a main definition file and default lakehouse context. If you don't have a lakehouse, create one by following the steps in Create a lakehouse.

## Related content