Welcome to the Advanced Databricks Accelerator! This project provides a standardized approach to deploying Medallion Architecture (Bronze, Silver, Gold) pipelines on Databricks.
- Architectures Supported: Batch and Streaming
- Frameworks Supported: Delta Live Tables (DLT) and Traditional Workflows (non-DLT using Notebooks)
- Execution: Supports both Standard Databricks Compute and Serverless Compute
- Dynamic Configuration: Comes with an interactive deployment wizard (`deploy.py`) that generates the associated `databricks.yml` to bundle your codebase based on your environment configuration.
- `deploy.py`: The deployment wizard that configures your pipelines and generates the `databricks.yml` configuration.
- `src/`: Contains the pipeline code.
  - `pipelines/ldp_pipeline/`: DLT transformation logic.
  - `pipelines/spark_batch/`: Notebooks for non-DLT batch ingestion and transformations.
  - `pipelines/structured_streaming/`: Notebooks for non-DLT Structured Streaming ingestion and transformations.
- `databricks.yml`: The dynamically generated Databricks Asset Bundle (DAB) file used for deployment.
1. Prerequisites:
- Databricks CLI installed and configured.
- A target Databricks Workspace.
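Before running the wizard, it can help to confirm the CLI is actually on your `PATH`. A minimal sketch (not part of the accelerator; the function name is illustrative):

```python
import shutil


def databricks_cli_available() -> bool:
    """Return True if the `databricks` CLI executable is found on PATH."""
    return shutil.which("databricks") is not None


if __name__ == "__main__":
    if databricks_cli_available():
        print("Databricks CLI found")
    else:
        print("Install and configure the Databricks CLI first")
```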
2. Configure Pipeline Setup: Run the deployment wizard to configure your bundle properties:

       python deploy.py
The interactive wizard will guide you through:
- Selecting Batch vs Streaming.
- Selecting DLT vs Non-DLT.
- Specifying source/target Catalogs, Schemas, and Volumes.
- Configuring Compute (Serverless or standard existing clusters).
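Conceptually, the wizard collects these answers and renders them into `databricks.yml`. The sketch below shows one way such templating could work; the keys, template structure, and values are illustrative assumptions, not the accelerator's actual output:

```python
from string import Template

# Hypothetical minimal bundle template; a real databricks.yml generated by
# deploy.py will contain far more (pipelines, jobs, compute settings, etc.).
BUNDLE_TEMPLATE = Template("""\
bundle:
  name: $bundle_name

targets:
  $target:
    workspace:
      host: $host
""")


def render_bundle(answers: dict) -> str:
    """Render a databricks.yml string from wizard answers."""
    return BUNDLE_TEMPLATE.substitute(answers)


answers = {
    "bundle_name": "medallion_accelerator",            # hypothetical values
    "target": "dev",
    "host": "https://example.cloud.databricks.com",
}
print(render_bundle(answers))
```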
3. Deployment: Once `deploy.py` has successfully built your `databricks.yml`, follow the prompt at the end of the script to deploy automatically via the Databricks CLI, or execute the deployment manually:

       databricks bundle deploy -t dev
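The automatic deployment at the end of the wizard amounts to shelling out to the same CLI command shown above. A hedged sketch of how a Python script could do this (function names and flow are illustrative):

```python
import subprocess


def bundle_deploy_cmd(target: str) -> list:
    """Build the CLI invocation for deploying the bundle to a target."""
    return ["databricks", "bundle", "deploy", "-t", target]


def deploy(target: str = "dev") -> None:
    # Runs the same command shown in the manual step; requires a configured
    # Databricks CLI and a generated databricks.yml in the working directory.
    subprocess.run(bundle_deploy_cmd(target), check=True)
```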
Make sure you have raw sample data available in the designated source Volume path so the files can be processed into your Bronze schema.
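Unity Catalog Volume paths follow the `/Volumes/<catalog>/<schema>/<volume>/...` convention. A small helper for building the source path the Bronze layer reads from (the catalog, schema, and volume names below are placeholders, not values the accelerator requires):

```python
def volume_path(catalog: str, schema: str, volume: str, subdir: str = "raw") -> str:
    """Build a Unity Catalog Volume path like /Volumes/<catalog>/<schema>/<volume>/raw."""
    return f"/Volumes/{catalog}/{schema}/{volume}/{subdir}"


# Hypothetical names for illustration:
print(volume_path("main", "landing", "ingest"))  # → /Volumes/main/landing/ingest/raw
```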