Welcome to the Advanced Databricks Accelerator! This project provides a standardized approach to deploying Medallion Architecture (Bronze, Silver, Gold) pipelines on Databricks.
- Architectures Supported: Batch and Streaming
- Frameworks Supported: Delta Live Tables (DLT) and Traditional Workflows (non-DLT using Notebooks)
- Execution: Supports both Standard Databricks Compute and Serverless Compute
- Dynamic Configuration: Comes with an interactive deployment wizard (`deploy.py`) that generates the associated `databricks.yml` to bundle your codebase based on your environment configuration.
- `deploy.py`: The deployment wizard that configures your pipelines and generates the `databricks.yml` configuration.
- `src/`: Contains the pipeline code.
  - `pipelines/ldp_pipeline/`: DLT transformation logic.
  - `pipelines/spark_batch/`: Notebooks for non-DLT batch ingestion and transformations.
  - `pipelines/structured_streaming/`: Notebooks for non-DLT Structured Streaming ingestion and transformations.
- `databricks.yml`: The dynamically generated Databricks Asset Bundle (DAB) file used for deployment.
1. Prerequisites:
- Databricks CLI installed and configured.
- A target Databricks Workspace.
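Before running the wizard, it can help to confirm the CLI is actually on your `PATH`. A minimal sketch (not part of the accelerator; the function name is illustrative):

```python
import shutil


def databricks_cli_available() -> bool:
    """Return True if the `databricks` CLI executable is found on PATH."""
    return shutil.which("databricks") is not None


if __name__ == "__main__":
    if databricks_cli_available():
        print("Databricks CLI found")
    else:
        print("Install and configure the Databricks CLI first")
```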
2. Configure Pipeline Setup: Run the deployment wizard to configure your bundle properties:

       python deploy.py
The interactive wizard will guide you through:
- Selecting Batch vs Streaming.
- Selecting DLT vs Non-DLT.
- Specifying source/target Catalogs, Schemas, and Volumes.
- Configuring Compute (Serverless or standard existing clusters).
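Conceptually, the wizard collects these answers and renders them into `databricks.yml`. The sketch below shows one way such templating could work; the keys, template structure, and values are illustrative assumptions, not the accelerator's actual output:

```python
from string import Template

# Hypothetical minimal bundle template; a real databricks.yml generated by
# deploy.py will contain far more (pipelines, jobs, compute settings, etc.).
BUNDLE_TEMPLATE = Template("""\
bundle:
  name: $bundle_name

targets:
  $target:
    workspace:
      host: $host
""")


def render_bundle(answers: dict) -> str:
    """Render a databricks.yml string from wizard answers."""
    return BUNDLE_TEMPLATE.substitute(answers)


answers = {
    "bundle_name": "medallion_accelerator",            # hypothetical values
    "target": "dev",
    "host": "https://example.cloud.databricks.com",
}
print(render_bundle(answers))
```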
3. Deployment: Once `deploy.py` has successfully built your `databricks.yml`, follow the prompt at the end of the script to deploy automatically via the Databricks CLI, or execute the deployment manually:

       databricks bundle deploy -t dev
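The automatic deployment at the end of the wizard amounts to shelling out to the same CLI command shown above. A hedged sketch of how a Python script could do this (function names and flow are illustrative):

```python
import subprocess


def bundle_deploy_cmd(target: str) -> list:
    """Build the CLI invocation for deploying the bundle to a target."""
    return ["databricks", "bundle", "deploy", "-t", target]


def deploy(target: str = "dev") -> None:
    # Runs the same command shown in the manual step; requires a configured
    # Databricks CLI and a generated databricks.yml in the working directory.
    subprocess.run(bundle_deploy_cmd(target), check=True)
```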
Make sure you have raw sample data available in the designated source Volume path so the files can be processed into your Bronze schema.
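Unity Catalog Volume paths follow the `/Volumes/<catalog>/<schema>/<volume>/...` convention. A small helper for building the source path the Bronze layer reads from (the catalog, schema, and volume names below are placeholders, not values the accelerator requires):

```python
def volume_path(catalog: str, schema: str, volume: str, subdir: str = "raw") -> str:
    """Build a Unity Catalog Volume path like /Volumes/<catalog>/<schema>/<volume>/raw."""
    return f"/Volumes/{catalog}/{schema}/{volume}/{subdir}"


# Hypothetical names for illustration:
print(volume_path("main", "landing", "ingest"))  # → /Volumes/main/landing/ingest/raw
```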