PipesDataBricksClient not accepting a task definition with an existing cluster ID #22021

Open
mikolak-net opened this issue May 22, 2024 · 0 comments
Labels
area: dagster-pipes Related to Dagster Pipes type: bug Something isn't working

Comments

@mikolak-net

Dagster version

1.7.6

What's the issue?

When using PipesDatabricksClient to run a pre-defined task (one with only a task key, e.g. jobs.SubmitTask(task_key="task_to_run")), the following error occurs:

dagster_databricks/pipes.py", line 144, in run
    **submit_task_dict["new_cluster"].get("spark_env_vars", {}),
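The failing line indexes submit_task_dict["new_cluster"] directly. Below is a minimal sketch of that failure mode using plain dicts instead of the real client. It assumes the serialized task omits unset fields, so a task built from only a task key has no new_cluster entry. The names merge_env_vars_unguarded and EXAMPLE_VAR are hypothetical and are not part of dagster-databricks.

```python
from typing import Any, Dict


def merge_env_vars_unguarded(task: Dict[str, Any], pipes_env: Dict[str, str]) -> Dict[str, str]:
    # Same shape as the failing line: "new_cluster" is indexed unconditionally
    return {**task["new_cluster"].get("spark_env_vars", {}), **pipes_env}


# Hypothetical stand-in for jobs.SubmitTask(task_key="task_to_run") as a dict;
# unset fields such as new_cluster are assumed to be omitted.
submit_task_dict = {"task_key": "task_to_run"}

missing_key = None
try:
    merge_env_vars_unguarded(submit_task_dict, {"EXAMPLE_VAR": "1"})
except KeyError as exc:
    # Record which key was missing
    missing_key = exc.args[0]

print(missing_key)  # prints: new_cluster
```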

What did you expect to happen?

The task should run even when no definition for a new cluster is provided.

How to reproduce?

  1. Create an asset or op that is given a PipesDatabricksClient and the relevant context.
  2. For that asset/op, run code similar to:
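from databricks.sdk.service import jobs

# A pre-defined task: only a task key, no new_cluster or existing_cluster_id
task = jobs.SubmitTask(task_key="task_to_run")
pipes_databricks_client.run(
    task=task,
    context=context,
    extras={},
)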

Deployment type

None

Deployment details

No response

Additional information

  1. In general, the client appears unable to run a pre-defined task (one that already exists on a Databricks cluster). I wonder whether this is intentional; if so, it does not integrate well with Databricks-native deployment options.
  2. Even if that is intentional and PipesDatabricksClient expects a "fresh" task definition, it should still accept an existing cluster ID. As the code in dagster_databricks/pipes.py is currently written, the client's run method always expects the new_cluster key to be present.
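One way the run method could tolerate tasks without a new cluster is to merge environment variables only when new_cluster is present. The sketch below illustrates that idea with a hypothetical helper, merge_env_vars_guarded, over plain dicts. It is not the actual dagster-databricks code. It only avoids indexing a missing key and does not address how Pipes bootstrap parameters would reach a task running on an existing cluster.

```python
from typing import Any, Dict


def merge_env_vars_guarded(task: Dict[str, Any], pipes_env: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical guarded variant: only touch new_cluster when it exists."""
    new_cluster = task.get("new_cluster")
    if new_cluster is None:
        # existing_cluster_id or a bare task key: leave the cluster spec alone.
        # Passing Pipes bootstrap params to such a task would need another channel.
        return dict(task)
    # Build new dicts so the caller's task definition is not mutated
    merged_cluster = {
        **new_cluster,
        "spark_env_vars": {**new_cluster.get("spark_env_vars", {}), **pipes_env},
    }
    return {**task, "new_cluster": merged_cluster}


# "placeholder-cluster-id" is a made-up value, not a real cluster ID format
existing = {"task_key": "task_to_run", "existing_cluster_id": "placeholder-cluster-id"}
fresh = {"task_key": "task_to_run", "new_cluster": {"spark_env_vars": {"A": "x"}}}

print(merge_env_vars_guarded(existing, {"B": "y"}))
print(merge_env_vars_guarded(fresh, {"B": "y"}))
```

Returning a copy rather than mutating the input keeps the user's task definition untouched, which matters if the same SubmitTask is reused across runs.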


@mikolak-net mikolak-net added the type: bug Something isn't working label May 22, 2024
@garethbrickman garethbrickman added the integration: databricks Related to dagster-databricks label May 22, 2024
@garethbrickman garethbrickman changed the title PipesDataBricksClient not accepting a task definition without a provided PipesDataBricksClient not accepting a task definition without a provided existing cluster ID May 23, 2024
@garethbrickman garethbrickman changed the title PipesDataBricksClient not accepting a task definition without a provided existing cluster ID PipesDataBricksClient not accepting a task definition with an existing cluster ID May 23, 2024
@garethbrickman garethbrickman added area: dagster-pipes Related to Dagster Pipes and removed integration: databricks Related to dagster-databricks labels May 23, 2024