
Conversation

@sryza (Contributor) commented Nov 25, 2025

### What changes were proposed in this pull request?

Fixes a bug that causes executing `spark-pipelines` with a spec file in a subdirectory to fail:

```
spark-pipelines run --spec subdir/spark-pipelines.yml
```

results in

```
pyspark.errors.exceptions.connect.AnalysisException: [RUN_EMPTY_PIPELINE] Pipelines are expected to have at least one non-temporary dataset defined (tables, persisted views) but no non-temporary datasets were found in your pipeline.
```
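
To illustrate the likely failure mode: the error indicates the CLI found no dataset definitions at all, which is what happens when relative paths from the spec are anchored to the wrong directory. Below is a minimal, hypothetical sketch (not the actual `pyspark.pipelines` code; the helper names and glob pattern are invented) contrasting a fragile and a robust way to anchor that lookup:

```python
from pathlib import Path

# Hypothetical sketch, not the actual pyspark.pipelines implementation:
# the Pipelines CLI locates dataset definition files via glob patterns
# anchored at the spec file's directory.

def find_source_files_fragile(spec_path: str, pattern: str) -> list:
    # Keeps the spec path relative, so the anchor directory depends on the
    # process's current working directory. If anything later changes the cwd
    # (or re-anchors relative paths), the glob matches nothing, the pipeline
    # defines no datasets, and RUN_EMPTY_PIPELINE is raised.
    return sorted(Path(spec_path).parent.glob(pattern))

def find_source_files_robust(spec_path: str, pattern: str) -> list:
    # Resolves the spec path to an absolute path up front, so the anchor is
    # stable no matter where `spark-pipelines` was launched from.
    return sorted(Path(spec_path).resolve().parent.glob(pattern))
```

With `--spec subdir/spark-pipelines.yml`, the fragile variant anchors at the relative path `subdir/`, while the robust variant anchors at the spec file's absolute directory, consistent with the "resolve spec file path more robustly" framing of the fix.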

### Why are the changes needed?

Fixes a bug in how the Pipelines CLI resolves the spec file path.

### Does this PR introduce _any_ user-facing change?

Fixes a user-facing bug.

### How was this patch tested?

Ran

```
spark-pipelines run --spec subdir/spark-pipelines.yml
```

Observed it fail before the change and succeed after.

With our current Pipelines CLI unit testing setup, there isn't a straightforward way to write a test for this, but I'm investigating whether we can augment it.
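
As a starting point, a regression test could take roughly this shape; this is a hypothetical pytest-style sketch reusing the invented `find_source_files_robust` helper from the sketch above, not the project's actual CLI test harness:

```python
import os
from pathlib import Path

def test_spec_file_in_subdirectory(tmp_path: Path) -> None:
    # Hypothetical sketch: lay out a spec in a subdirectory, then resolve its
    # source files while the working directory is the parent, mirroring
    # `spark-pipelines run --spec subdir/spark-pipelines.yml`.
    subdir = tmp_path / "subdir"
    (subdir / "transformations").mkdir(parents=True)
    (subdir / "spark-pipelines.yml").write_text("name: demo\n")
    (subdir / "transformations" / "defs.py").write_text("# dataset definitions\n")

    old_cwd = os.getcwd()
    os.chdir(tmp_path)  # run from the parent directory, as in the bug report
    try:
        files = find_source_files_robust(
            "subdir/spark-pipelines.yml", "transformations/*.py"
        )
        # The definition file should be found, and via an absolute anchor.
        assert [f.name for f in files] == ["defs.py"]
        assert all(f.is_absolute() for f in files)
    finally:
        os.chdir(old_cwd)
```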

### Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun changed the title from "[SPARK-54508] spark-pipelines fails when spec file is in different directory" to "[SPARK-54508][PYTHON] spark-pipelines fails when spec file is in different directory" Nov 25, 2025
@dongjoon-hyun changed the title from "[SPARK-54508][PYTHON] spark-pipelines fails when spec file is in different directory" to "[SPARK-54508][PYTHON] Fix spark-pipelines to resolve spec file path more robustly" Nov 25, 2025
@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @sryza .
Merged to master/4.1 for Apache Spark 4.1.0.

dongjoon-hyun pushed a commit that referenced this pull request Nov 25, 2025
[SPARK-54508][PYTHON] Fix spark-pipelines to resolve spec file path more robustly


Closes #53219 from sryza/cli-relative-path.

Authored-by: Sandy Ryza <sandy.ryza@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c61d40c)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>