# Databricks Asset Bundles
Databricks Asset Bundles are the recommended approach to CI/CD on Databricks. Use Databricks Asset Bundles to describe Databricks resources such as jobs and pipelines as source files, and bundle them together with other assets to provide an end-to-end definition of a deployable project. These bundles of files can be source controlled, and you can use external CI/CD automation such as GitHub Actions to trigger deployments.

You can use Databricks Asset Bundles to define and programmatically manage your Databricks CI/CD implementation, which usually includes:

* **Notebooks**: Databricks notebooks are often a key part of data engineering and data science workflows. You can keep notebooks under version control and run automated tests against them in a CI/CD pipeline to check that they behave as expected.
* **Libraries**: Manage the library dependencies required to run your deployed code. Use version control on libraries and include them in automated testing and validation.
* **Workflows**: Lakeflow Jobs let you schedule and run automated tasks using notebooks or Spark jobs.
* **Data pipelines**: You can also include data pipelines in CI/CD automation, using Lakeflow Declarative Pipelines, the framework in Databricks for declaring data pipelines.
* **Infrastructure**: Infrastructure configuration includes definitions and provisioning information for clusters, workspaces, and storage for target environments. Infrastructure changes can be validated and tested as part of a CI/CD pipeline, ensuring that they are consistent and error-free.

## Example Development Workflow

<img src="https://docs.databricks.com/aws/en/assets/images/bundles-cicd-53be5f4860e8ebcedc2702f870290cda.png" style="display: block; margin-left: auto; margin-right: auto; max-width: 100%;" />
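
The deployment step of a workflow like this can be triggered from GitHub Actions. The following is a minimal sketch, not a definitive setup: the file name, the branch name, the `dev` target, and the secret names `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are assumptions that you would adapt to your own repository and authentication method.

```yml
# .github/workflows/deploy-bundle.yml (hypothetical file name)
name: Deploy bundle

on:
  push:
    branches:
      - main # assumed branch; adjust to your branching model

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Assumed repository secrets; the Databricks CLI reads these environment variables
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    steps:
      # Check out the repository that contains databricks.yml
      - uses: actions/checkout@v4
      # Install the Databricks CLI on the runner
      - uses: databricks/setup-cli@main
      # Fail early if the bundle configuration is invalid
      - run: databricks bundle validate -t dev
      # Deploy the bundle to the dev target
      - run: databricks bundle deploy -t dev
```

Running `validate` before `deploy` keeps configuration errors from reaching the workspace.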


## The basics of `databricks.yml`
Each bundle must contain exactly one configuration file named `databricks.yml`, typically at the root of the project folder. The simplest `databricks.yml` defines the bundle `name` and a default target. For example:
```yml
bundle:
  name: my_bundle

targets:
  dev:
    default: true
```

Resources you want to deploy can either be defined directly in `databricks.yml` or in additional YAML files that you pull in with the `include` configuration. Here's an example of a simple asset bundle that deploys a job.

```yml
bundle:
  name: my_bundle

# Use include to break up bundle into multiple files. 
# Paths within a bundle are always relative to the yml they're defined in.
include:
  - resources/*.yml

# Targets that you can deploy into
# These can be different workspaces or different configurations of the pipeline
targets:
  dev:
    default: true
    workspace:
      host: https://company.cloud.databricks.com


# Resources can be defined in multiple files
resources:
  jobs:
    # The resource name used here is used to reference this job elsewhere in your DAB if needed
    default_python_job:
      # Use the REST API Documentation to understand what configuration options are available for each resource type
      name: default_python_job
      tasks:
        - task_key: notebook_task
          notebook_task:
            # Relative to this databricks.yml, which sits at the bundle root
            notebook_path: ./src/notebook.ipynb
```
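
Because the `include` pattern above picks up `resources/*.yml`, the same job could instead live in its own file. This is a sketch under assumptions: the file name `resources/default_python_job.yml` and the `src/notebook.ipynb` location are hypothetical. It shows how the relative path changes when the definition moves into a subfolder.

```yml
# resources/default_python_job.yml (hypothetical file name, matched by resources/*.yml)
resources:
  jobs:
    default_python_job:
      name: default_python_job
      tasks:
        - task_key: notebook_task
          notebook_task:
            # Paths are relative to this file, so ../src resolves to <bundle root>/src
            notebook_path: ../src/notebook.ipynb
```

Keeping one resource per file tends to make diffs smaller and reviews easier as a bundle grows.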


# Using Databricks Asset Bundles with Lakeflow Declarative Pipelines
For each resource type that you can deploy with Databricks Asset Bundles (DABs), the Databricks REST API documentation lists the available fields. Here is an incomplete list of the configuration options currently available for pipelines.
```yml
resources:
    pipelines:
        <pipeline-resource-name>: # DAB resource name for reference within asset bundle (this does not impact the deployed pipeline)
            name: <pipeline name> # Friendly identifier for this pipeline.
            catalog: <catalog> # A catalog in Unity Catalog to publish data from this pipeline to. 
            schema: <schema> # The default schema (database) where tables are read from or published to.
            root_path: <pipeline root> # Relative path for the root of this pipeline. This is used as the root directory when editing the pipeline in the Databricks user interface and it is added to sys.path when executing Python sources during pipeline execution.
            development: <true/false> # Whether the pipeline is in Development mode. Defaults to false.
            libraries:
                - glob:
                    include: <path to folder or file> # Files to include as part of the pipeline. Path can be a notebook, a sql or python file or a folder path that ends in /**
                - glob:
                    include: <path to folder or file> # Multiple paths can be included here

```

See all the available configurations for pipelines [here](https://docs.databricks.com/api/workspace/pipelines/create).
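
As an illustration, the template above might be filled in as follows. This is a sketch only: the resource name `bike_pipeline`, the `main` and `bike_demo` defaults, and the `src/bike_pipeline` folder layout are all hypothetical, and the fragment is assumed to live in a `databricks.yml` at the bundle root. The `${var.<name>}` references resolve to bundle variables declared under `variables`.

```yml
variables:
  catalog:
    description: Catalog that the pipeline publishes to
    default: main # hypothetical catalog name
  schema:
    description: Default schema for tables that do not specify one
    default: bike_demo # hypothetical schema name

resources:
  pipelines:
    bike_pipeline: # hypothetical DAB resource name
      name: bike_pipeline
      catalog: ${var.catalog}
      schema: ${var.schema}
      # Hypothetical layout: pipeline sources live under src/bike_pipeline
      root_path: ./src/bike_pipeline
      development: true
      libraries:
        - glob:
            # A folder path ending in /** includes every source file beneath it
            include: ./src/bike_pipeline/transformations/**
```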

# Deploying this demo using DABs

1. Install the Databricks CLI and authenticate to your workspace. [See the Databricks documentation for instructions on how to set the CLI up.](https://docs.databricks.com/aws/en/dev-tools/cli/tutorial)

2. Download the pipeline-bike folder from your workspace onto your local computer. Use the "Download as Zip (Notebook Source + File)" option, and unzip the folder once it's downloaded.

3. Copy `databricks.yml` from the deployment folder into the pipeline-bike folder and update the catalog, schema and workspace host values to reflect your environment.
    ```yml
    ...

    variables:
      catalog:
        description: Default catalog that the pipeline publishes assets to
        default: <replace with your catalog>
      schema:
        description: Default schema that the pipeline publishes assets to when no schema is specified in code
        default: <replace with your schema>
    
    ...

    targets:
      dev:
        mode: development
        default: true
        workspace:
          host: <replace with your workspace>
    
    ...

    ```

4. From a terminal, run `databricks bundle validate`. If validation reports any errors, fix them before you continue.
    ```bash
      $ databricks bundle validate
  
      Name: pipeline-bike
      Target: dev
      Workspace:
        Host: https://company.cloud.databricks.com/
        User: user@company.com
        Path: /Workspace/Users/user@company.com/.bundle/pipeline-bike/dev
    ```

5. Run `databricks bundle deploy` to deploy the bundle.
    ```bash
      $ databricks bundle deploy

      Uploading bundle files to /Workspace/Users/user@company.com/.bundle/pipeline-bike/dev/files...
      Deploying resources...
      Updating deployment state...
      Deployment complete!
    ```

6. Run `databricks bundle run generate_bike_data` to kick off a job to populate the raw data and start the pipeline.
   ```bash
    $ databricks bundle run generate_bike_data
    Run URL: https://company.cloud.databricks.com/...

    2025-08-29 14:35:33 "[dev user_name] init-pipeline-bike" RUNNING
    2025-08-29 14:40:05 "[dev user_name] init-pipeline-bike" TERMINATED SUCCESS
   ```
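
The steps above deploy to the single `dev` target. To deploy the same bundle to another environment, one option is to add a second target that overrides the variables from step 3. The sketch below assumes a hypothetical `prod` target; the placeholder values are yours to fill in.

```yml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: <replace with your dev workspace>

  prod: # hypothetical second target
    workspace:
      host: <replace with your prod workspace>
    # Target-level values take precedence over the defaults declared under variables
    variables:
      catalog: <replace with your prod catalog>
      schema: <replace with your prod schema>
```

With a target like this in place, `databricks bundle deploy -t prod` would deploy to it, while commands without `-t` keep using `dev` because it is marked `default: true`.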