Advanced Databricks Accelerator

Welcome to the Advanced Databricks Accelerator! This project provides a standardized approach to deploying Medallion Architecture (Bronze, Silver, Gold) pipelines on Databricks.

Features

  • Architectures Supported: Batch and Streaming
  • Frameworks Supported: Delta Live Tables (DLT) and Traditional Workflows (non-DLT using Notebooks)
  • Execution: Supports both Standard Databricks Compute and Serverless Compute
  • Dynamic Configuration: An interactive deployment wizard (deploy.py) generates the databricks.yml bundle configuration for your codebase from your environment settings.
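The generated file follows the Databricks Asset Bundle schema. As an illustrative sketch only (the actual contents are produced by deploy.py, and the bundle name, target, and workspace host below are placeholders), the output might resemble:

```yaml
# Illustrative sketch -- the real databricks.yml is generated by deploy.py.
# "accelerator" and the workspace host are placeholder values.
bundle:
  name: accelerator

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```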

Project Structure

  • deploy.py: The deployment wizard that configures your pipelines and generates the databricks.yml configuration.
  • src/: Contains the pipeline code.
    • pipelines/ldp_pipeline/: DLT transformation logic.
    • pipelines/spark_batch/: Notebooks for non-DLT Batch ingestion and transformations.
    • pipelines/structured_streaming/: Notebooks for non-DLT Structured Streaming ingestion and transformations.
  • databricks.yml: The dynamically generated Databricks Asset Bundle (DAB) file used for deployment.
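Conceptually, each Medallion layer refines the output of the previous one. The plain-Python sketch below is not the actual pipeline code (which runs as Spark/DLT on Databricks); it only illustrates the Bronze → Silver → Gold flow, and the field names (`region`, `amount`) are hypothetical:

```python
# Conceptual Medallion flow in plain Python (the real pipelines use Spark).
# Field names ("region", "amount") are hypothetical.

def bronze(raw_records):
    """Bronze: land raw records as-is, tagging each with its source."""
    return [{**r, "_source": "raw_volume"} for r in raw_records]

def silver(bronze_records):
    """Silver: cleanse -- drop rows with missing amounts, cast types."""
    return [
        {**r, "amount": float(r["amount"])}
        for r in bronze_records
        if r.get("amount") not in (None, "")
    ]

def gold(silver_records):
    """Gold: aggregate -- total amount per region."""
    totals = {}
    for r in silver_records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

raw = [
    {"region": "east", "amount": "10.5"},
    {"region": "west", "amount": ""},      # dropped at the Silver layer
    {"region": "east", "amount": "4.5"},
]
print(gold(silver(bronze(raw))))  # {'east': 15.0}
```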

Getting Started

  1. Prerequisites:

    • Databricks CLI installed and configured.
    • A target Databricks Workspace.
  2. Configure Pipeline Setup: Run the deployment wizard to configure your bundle properties:

    python deploy.py

    The interactive wizard will guide you through:

    • Selecting Batch vs Streaming.
    • Selecting DLT vs Non-DLT.
    • Specifying source/target Catalogs, Schemas, and Volumes.
    • Configuring Compute (Serverless or standard existing clusters).
  3. Deployment: Once deploy.py has successfully built your databricks.yml, follow the prompt at the end of the script to deploy automatically via the Databricks CLI, or execute the deployment manually:

    databricks bundle deploy -t dev
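The wizard-to-bundle flow above can be sketched as follows. This is a hypothetical simplification of what deploy.py might do internally, not its actual logic: the real script collects answers interactively and supports many more options, and the template keys here are illustrative:

```python
# Hypothetical sketch of rendering databricks.yml from wizard answers.
# The template and answer keys are illustrative, not deploy.py's real logic.

BUNDLE_TEMPLATE = """bundle:
  name: {bundle_name}

targets:
  {target}:
    mode: development
    workspace:
      host: {host}
"""

def render_bundle(answers: dict) -> str:
    """Fill the bundle template from the collected wizard answers."""
    return BUNDLE_TEMPLATE.format(
        bundle_name=answers["bundle_name"],
        target=answers["target"],
        host=answers["host"],
    )

yml = render_bundle({
    "bundle_name": "accelerator",
    "target": "dev",
    "host": "https://example.cloud.databricks.com",
})
print(yml)
```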

Development

Make sure raw sample data is available in the designated source Volume path so the files can be processed into your Bronze schema.
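For local testing, one way to produce sample files before uploading them to the Volume is a small helper like the one below. This is a hypothetical convenience script, not part of the repository; the output directory and record schema are placeholders, and you would still upload the result to your Volume (for example via the Databricks UI or CLI):

```python
# Hypothetical helper to generate sample raw files for the Bronze layer.
# The output directory and record schema are placeholders.
import csv
import pathlib

def write_sample_data(out_dir: str, n: int = 5) -> pathlib.Path:
    """Write a small CSV of fake order records to out_dir."""
    path = pathlib.Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    out_file = path / "orders_sample.csv"
    with out_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "region", "amount"])
        for i in range(n):
            writer.writerow([i, "east" if i % 2 else "west", 10.0 + i])
    return out_file

sample = write_sample_data("sample_data")
print(sample)
```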
