Debussy Concert

Debussy is a free, open-source, opinionated data architecture and engineering framework. It enables data analysts and engineers to build better data platforms through first-class data pipelines, following a low-code, self-service approach.

Description · Key Features · Key Benefits · Quick Start · Integrations
Full Documentation · Communication · Contributions · License

Description

In the data engineering field, everyone is reinventing the wheel all the time: it's still rare to see the adoption of software engineering best practices such as DRY, KISS or YAGNI. Despite the existence of several tools for data orchestration (e.g. Apache Airflow, Prefect, Dagster) and distributed data processing (e.g. Apache Spark, Apache Beam), every new data pipeline request usually turns into a lengthy development project. Think of developing a web application without the help of a web framework such as Django or Flask!

Worse still, although these data orchestration tools share key concepts, they have very distinct syntaxes and features, making migrations a daunting task! Moreover, simply adopting these tools does not guarantee that best practices are being followed, including those related to data architecture (data modeling and the data management lifecycle, among others).

While lots of companies have faced these same issues, most of them have decided to develop their own in-house solutions, missing the opportunity for collaboration and wider adoption of data architecture and software engineering best practices.

With that in mind, we created Debussy! Debussy Concert is the core component of Debussy. It's a code generation engine for orchestration tools, currently supporting only Airflow, but with others on the roadmap. It provides abstraction layers in the form of a musical-themed semantic model, decoupling the pipeline logic from the underlying orchestration tool and enabling a low-code approach to data engineering. We also provide pipeline templates (e.g. data ingestion, data transformation and reverse ETL) built with our engine, while always striving to follow the aforementioned best practices.
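As a rough illustration of what decoupling pipeline logic from the orchestration tool can look like, the sketch below describes a pipeline once and renders it through two interchangeable backends. All class names here (`Step`, `Pipeline`, `AirflowSourceRenderer`, `PlainTextRenderer`) are hypothetical and are not Debussy Concert's actual API or its musical-themed model; the generated Airflow code assumes Airflow 2's `BashOperator` and the `schedule_interval` argument.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Step:
    """One unit of work; a shell command keeps the sketch simple."""
    name: str
    command: str

    def __post_init__(self) -> None:
        # The name is reused as a Python variable in the generated source.
        if not self.name.isidentifier():
            raise ValueError(f"step name must be a valid identifier: {self.name!r}")


@dataclass
class Pipeline:
    """Tool-agnostic description of a linear pipeline (hypothetical model)."""
    name: str
    steps: List[Step] = field(default_factory=list)


class Renderer(Protocol):
    """Anything that can turn a Pipeline into text for some target tool."""
    def render(self, pipeline: Pipeline) -> str: ...


class AirflowSourceRenderer:
    """Emits the source code of an Airflow 2 DAG as a string."""

    def render(self, pipeline: Pipeline) -> str:
        lines = [
            "from datetime import datetime",
            "from airflow import DAG",
            "from airflow.operators.bash import BashOperator",
            "",
            f"with DAG(dag_id={pipeline.name!r}, start_date=datetime(2024, 1, 1), "
            "schedule_interval=None) as dag:",
        ]
        if not pipeline.steps:
            lines.append("    pass")
        for step in pipeline.steps:
            lines.append(
                f"    {step.name} = BashOperator("
                f"task_id={step.name!r}, bash_command={step.command!r})"
            )
        if len(pipeline.steps) > 1:
            # Chain the steps in declaration order: a >> b >> c
            lines.append("    " + " >> ".join(s.name for s in pipeline.steps))
        return "\n".join(lines) + "\n"


class PlainTextRenderer:
    """A second backend, showing that the model is independent of Airflow."""

    def render(self, pipeline: Pipeline) -> str:
        return f"{pipeline.name}: " + " -> ".join(s.name for s in pipeline.steps)


pipeline = Pipeline(
    "ingest_orders",
    [Step("extract", "echo extract"), Step("load", "echo load")],
)
dag_source = AirflowSourceRenderer().render(pipeline)
compile(dag_source, "<generated_dag>", "exec")  # syntax check only; Airflow is not imported
print(dag_source)
print(PlainTextRenderer().render(pipeline))
```

Because the pipeline description never mentions Airflow, supporting another orchestrator in this sketch would only mean adding another renderer class.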

Key Features

Generates data pipelines dynamically from YAML configuration files or directly through Python
Provides a semantic model for data pipeline development, abstracting the underlying orchestration engine
Enables seamless integration of first-class data projects, such as Airflow, Spark, and dbt
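To make the first feature more concrete, here is a minimal sketch of configuration-driven pipeline generation. It is not Debussy's configuration schema: the keys (`name`, `source`, `destination`, `tables`) and the `TaskSpec` and `build_tasks` names are assumptions made for illustration. The dictionary stands in for what a YAML parser would return, so the example runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Stand-in for a parsed YAML file such as (hypothetical schema):
#   name: sales_ingestion
#   source: mysql
#   destination: bigquery
#   tables: [orders, customers]
CONFIG: Dict[str, Any] = {
    "name": "sales_ingestion",
    "source": "mysql",
    "destination": "bigquery",
    "tables": ["orders", "customers"],
}

REQUIRED_KEYS = ("name", "source", "destination", "tables")


@dataclass(frozen=True)
class TaskSpec:
    """A task and the ids of the tasks it must wait for."""
    task_id: str
    depends_on: Tuple[str, ...] = ()


def build_tasks(config: Dict[str, Any]) -> List[TaskSpec]:
    """Expand one config into an extract/load task pair per table."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")

    tasks: List[TaskSpec] = []
    for table in config["tables"]:
        extract_id = f"extract_{config['source']}_{table}"
        load_id = f"load_{config['destination']}_{table}"
        tasks.append(TaskSpec(extract_id))
        tasks.append(TaskSpec(load_id, depends_on=(extract_id,)))
    return tasks


tasks = build_tasks(CONFIG)
for task in tasks:
    print(task.task_id, "<-", list(task.depends_on))
```

Adding a table to the configuration adds its tasks without touching any pipeline code, which is the essence of the low-code approach described above.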

Key Benefits

✔ Reduce the time to delivery and the costs of data pipeline development, while enabling higher ROI
✔ Avoid pipeline debt by following sound software engineering design principles
✔ Ensure your platform is following data architecture best practices

Quick Start

Debussy works on any installation of Apache Airflow 2.0. However, since we currently support only GCP-based data platforms as the target Data Lakehouse, we recommend deploying to Cloud Composer.

To use Debussy, first complete the following steps:

Select or create a Google Cloud Platform project.
Enable billing for your project.
Create a Cloud Composer 2 environment.
Install Debussy on your Cloud Composer instance: just upload the project to your plugins/ folder.
Check our User's Guide and examples to learn how to use it!
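For the plugin upload step, one option is the `gcloud composer environments storage plugins import` command. The helper below only builds and prints that command so you can review it before running it. The environment name, location and source path are placeholders, not values from this project, and `plugin_import_command` is a hypothetical helper name.

```python
import shlex
from typing import List


def plugin_import_command(environment: str, location: str, source: str) -> List[str]:
    """Build the gcloud argv that copies `source` into the environment's plugins/ folder."""
    return [
        "gcloud", "composer", "environments", "storage", "plugins", "import",
        f"--environment={environment}",
        f"--location={location}",
        f"--source={source}",
    ]


# Placeholder values: replace them with your own environment, region and local path.
cmd = plugin_import_command("my-composer-env", "us-central1", "path/to/debussy_concert")
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` on a machine where the Google Cloud CLI is installed and authenticated.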

Integrations

Debussy works with the tools and systems that you're already using with your data, including:

Integration	Notes
Apache Airflow	Open-source orchestration engine
Spark	Open-source distributed processing engine, used for the data ingestion pipelines
dbt	Open-source data transformation tool, used for the data transformation pipelines
Google Cloud Storage	Cloud-based blob storage, supported as a data source or destination
BigQuery	Google's serverless, massive-scale SQL analytics platform, supported as the analytical environment (a.k.a. Data Lakehouse)
MySQL	Leading open-source database, supported as a data source or destination
PostgreSQL	Leading open-source database, supported as a data source or destination
Other SQL relational DBs	Most RDBMSs are supported as data sources via JDBC drivers through Spark
AWS S3	Cloud-based blob storage, supported as a data source or destination

Full Documentation

See the Wiki for full documentation, examples, operational details and other information.

Communication

GitHub Issues

Discord Server

Contributions

We welcome all community contributions!

In order to have a more open and welcoming community, Debussy adheres to a code of conduct adapted from Contributor Covenant.

Please read through our contributing guidelines. Included are directions for opening issues, coding standards, and notes on development.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
