Developing with Databricks-Connect & Azure DevOps

A guide to building good data pipelines with Databricks Connect using best practices. Details: https://datathirst.net/blog/2019/9/20/series-developing-a-pyspark-application

About

This is a sample Databricks-Connect PySpark application designed as a template for best practice and usability.

The project is designed for:

  • Local Python development in an IDE (VSCode) using Databricks-Connect
  • A well-structured PySpark application
  • Simple data pipelines with reusable code
  • Unit testing with Pytest
  • Building into a Python wheel
  • A CI build with published test results
  • Automated deployments/promotions
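To illustrate the "reusable code" and "unit testing with Pytest" points above, here is a minimal sketch: a pure helper function of the kind a pipeline module might expose, plus the Pytest test that exercises it without needing a cluster. The function and file layout are illustrative, not taken from this repo.

```python
# pipelines/utils.py (hypothetical helper module)
def clean_column_name(name: str) -> str:
    """Normalise a raw column name: trim, lowercase, snake_case."""
    return name.strip().lower().replace(" ", "_")


# tests/test_utils.py -- Pytest discovers test_* functions automatically
def test_clean_column_name():
    assert clean_column_name(" Order Date ") == "order_date"
```

Keeping transformations as small pure functions like this is what makes them testable locally before they ever touch a Databricks cluster.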

Setup

Create a Conda Environment (open Conda prompt):

conda create --name dbconnectappdemo python=3.5

Activate the environment:

conda activate dbconnectappdemo

IMPORTANT: Open the requirements.txt in the root folder and ensure the version of databricks-connect matches your cluster runtime.
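For example, if your cluster runs Databricks Runtime 5.5, the pin in requirements.txt might look roughly like this (the version number is illustrative; use whichever matches your own cluster):

```
databricks-connect==5.5.*
```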

Install the requirements into your environment:

pip install -r requirements.txt

If you need to setup databricks-connect then run:

databricks-connect configure
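The configure command prompts for your workspace details and writes them to a JSON file at ~/.databricks-connect; the result looks roughly like this (all values below are placeholders):

```
{
  "host": "https://<region>.azuredatabricks.net",
  "token": "<personal-access-token>",
  "cluster_id": "<cluster-id>",
  "org_id": "<org-id>",
  "port": "15001"
}
```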

More help here & here

Setup Deployment

If you would like to deploy from your local PC to Databricks, create a file in the root called MyBearerToken.txt and paste in a bearer token from the Databricks UI.
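As a sketch of how a script might consume that token file, the helpers below read it and build the Authorization header the Databricks REST API expects. The function names are illustrative; this repo's actual deployment logic lives in Deploy.ps1 and is written in PowerShell.

```python
from pathlib import Path


def read_bearer_token(path: str = "MyBearerToken.txt") -> str:
    """Read the Databricks bearer token, stripping stray whitespace/newlines."""
    return Path(path).read_text().strip()


def auth_headers(token: str) -> dict:
    """Authorization header format used by the Databricks REST API."""
    return {"Authorization": f"Bearer {token}"}
```

A deployment script would then pass these headers to its HTTP calls against the workspace API.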

Copyright Data Thirst Ltd (2019)
