Asset bundle run_job_task fails #812

Closed
meretri opened this issue Sep 28, 2023 · 19 comments
Labels: Bug (Something isn't working), DABs (DABs related issues)

Comments

@meretri

meretri commented Sep 28, 2023

I get an error when trying to deploy a workflow job which contains a "Run Job" task:

databricks.yml

# yaml-language-server: $schema=bundle-settings-schema.json
bundle:
  name: run_job_example
targets:
  development:
    workspace:
      host: https://xxx.azuredatabricks.net
      profile: xyz
resources:
  jobs:
    job1:
      name: job1
      tasks:       
      - task_key: STEP_1
        run_job_task:
          job_id: 12345

The validation step is successful, but when trying to deploy, I get the following error:

Starting resource deployment
Error: terraform apply: exit status 1

Error: cannot create job: Job 0 does not exist.

  with databricks_job.job1,
  on bundle.tf.json line 25, in resource.databricks_job.job1:
  25:       }

The job ID is missing in the bundle.tf.json:

{
  "terraform": {
    "required_providers": {
      "databricks": {
        "source": "databricks/databricks",
        "version": "1.23.0"
      }
    }
  },
  "provider": {
    "databricks": {}
  },
  "resource": {
    "databricks_job": {
      "job1": {
        "name": "job1",
        "task": [
          {
            "task_key": "STEP_1",
            "run_job_task": {
              "job_id": ""
            }
          }
        ]
      }
    }
  }
}

I assume the reason is that Terraform expects the job ID as a string and not an int. But quoting the ID results in a different error:

Error: failed to load databricks.yml: error unmarshaling JSON: json: cannot unmarshal string into Go struct field RunJobTask.resources.jobs.tasks.run_job_task.job_id of type int
@ckelly

ckelly commented Oct 2, 2023

same here - we're hoping to ideally:

  • reference another bundle/ job by name as a job_id here (via resources.jobs... or some variation)
  • reference by a defined job id (this seems to be how we more or less can do it via the GUI currently)
  • define the job in the same bundle and reference by resources.jobs...

@andrewnester
Contributor

Thanks for reporting the issue. We're aware of the problem and are working on a fix. We will keep this thread updated.

andrewnester self-assigned this Oct 3, 2023

@kyleries

We are in the exact same position. I wanted to drop a note here in case you need anyone else to test the solution. Happy to run a specific branch/commit if you'd like!

@timreddick-8451

Same issue. Posted a description of the issue on the Databricks Community forum.

https://community.databricks.com/t5/data-engineering/using-run-job-task-in-databricks-asset-bundles/m-p/44807/highlight/true#M27716

@kyleries

Thanks, Tim - I just posted this issue to the community forum and hadn't got back here yet!

@andrewnester
Contributor

We're working on a fix for the issue and will keep this thread updated. Unfortunately, it's not as simple as a mere type change, due to where the type information for the bundle schema comes from. We expect this to take ~2 weeks. We'll keep you posted.

@BMeyn

BMeyn commented Oct 26, 2023

@andrewnester, any updates on this?

andrewnester assigned pietern and unassigned andrewnester Oct 26, 2023

@pietern
Contributor

pietern commented Oct 26, 2023

Work is in progress and coming along. I expect the fix for this to take another week or so.

What's underpinning this fix is a change in the way we're loading and processing the bundle configuration. Currently, we load it directly into the typed Go structures, which prevents us from storing a string (e.g. ${resources.other_job.id}) in an integer field (e.g. the job_id). To address this, we're loading the bundle configuration into a dynamically typed structure that we mirror into the Go structures as needed. This means we can retain the string value for the integer field and pass it along to Terraform during deployment. Because this is a rather foundational change to the way we deal with bundle configuration, we're working through it systematically and diligently to avoid regressions when we enable it.
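
For readers following along, here is a minimal Go sketch of the difference described above. It is not the CLI's actual implementation: the RunJobTask struct, the loadTyped and loadDynamic helpers, and the reference pattern are all hypothetical names made up for illustration, and JSON stands in for the bundle's YAML to keep the example within the standard library. It only shows why decoding straight into a typed struct rejects a reference string, while decoding into an untyped map first lets the string survive.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// RunJobTask is a hypothetical typed shape with an integer job_id,
// similar in spirit to the struct named in the error earlier in this thread.
type RunJobTask struct {
	JobID int `json:"job_id"`
}

// refPattern (an assumption for this sketch) matches a value that consists
// entirely of one reference, such as ${resources.jobs.job_a.id}.
var refPattern = regexp.MustCompile(`^\$\{[a-zA-Z0-9_.]+\}$`)

// loadTyped decodes directly into the typed struct.
// A string job_id makes json.Unmarshal return an error here.
func loadTyped(raw []byte) (RunJobTask, error) {
	var t RunJobTask
	err := json.Unmarshal(raw, &t)
	return t, err
}

// loadDynamic decodes into an untyped map first, so a reference string
// survives loading. It returns job_id as a string to hand downstream.
func loadDynamic(raw []byte) (string, error) {
	var m map[string]any
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	switch v := m["job_id"].(type) {
	case float64: // encoding/json decodes JSON numbers into float64
		return fmt.Sprintf("%d", int64(v)), nil
	case string:
		if refPattern.MatchString(v) {
			return v, nil // keep the reference verbatim for later resolution
		}
		return "", fmt.Errorf("job_id %q is neither an integer nor a reference", v)
	default:
		return "", fmt.Errorf("job_id has unsupported type %T", v)
	}
}

func main() {
	ref := []byte(`{"job_id": "${resources.jobs.job_a.id}"}`)

	if _, err := loadTyped(ref); err != nil {
		fmt.Println("typed load failed:", err)
	}
	if v, err := loadDynamic(ref); err == nil {
		fmt.Println("dynamic load kept:", v)
	}
}
```

In this sketch the dynamic path still validates the value, so a plain string that is not a reference is rejected rather than silently passed through.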

@meretri
Author

meretri commented Nov 15, 2023

@pietern thanks for working on this! any update on the timeline?

@hansh0801

track

@rooftopvalley

track

@datacom-bozhu

Can we please get an update on when the fix is expected?

@binnisb

binnisb commented Jan 12, 2024

> same here - we're hoping to ideally:
>
> * reference another bundle/ job by name as a job_id here (via `resources.jobs...` or some variation)
> * reference by a defined job id (this seems to be how we more or less can do it via the GUI currently)
> * define the job in the same bundle and reference by `resources.jobs...`

Yes, we are in the same boat. We need to be able to reference another job by name as well. Hope this issue is still alive.

@pietern
Contributor

pietern commented Jan 12, 2024

Hey all, development for this issue is very much alive, though it's taking longer to finalize.

If you're keen to follow along, there are two places to look:

Note that because the change is fundamental (thus ~risky), we're making sure it doesn't regress existing use cases.

Thanks for your patience.

@andrewnester
Contributor

Hey everyone! Exciting news: the change to make "run_job_task" work has been merged and released in CLI version 0.214.0.

As an example, it can be used in this form:

resources:
  jobs:
    job_a:
      name: "job_a"
      tasks:
        - task_key: TestTask
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
          notebook_task:
            notebook_path: ./src/test.py
    job_b:
      name: "job_b"
      tasks:
        - task_key: TestTask
          run_job_task: 
            job_id: ${resources.jobs.job_a.id}

Closing the issue as this functionality now works in DABs

@virtualdvid

Does this work in a JSON file?

@andrewnester
Contributor

@virtualdvid what do you mean by that?

@virtualdvid

virtualdvid commented Mar 11, 2024

@andrewnester we use JSON files to deploy workflows in several environments. The example here is a YAML file. I'm wondering if I can use ${resources.jobs.job_a.id} in a JSON file, like:

"run_job_task": {
    "job_id": "{{resources.jobs.job_a.id}}"
},

@andrewnester
Contributor

Ah, I see. If you're referring to JSON, I guess you're using the databricks jobs create command, correct?
Then no, such syntax is not supported. The feature in this thread is implemented for DABs (Databricks Asset Bundles), which is why you see YAML being used: that's how bundle configuration is defined.
