This repository has been archived by the owner on Jul 15, 2024. It is now read-only.

dataverbinders/nimbletl


nimbletl


Lightweight Python ETL toolkit using Prefect.

Introduction

A flexible Python ETL toolkit for data warehousing, built on Dask, Prefect and the PyData stack. It follows the design principles of these libraries and combines them with a functional programming approach to data engineering.
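To make the functional approach concrete, here is a minimal sketch in plain Python. It does not use nimbletl, Prefect or Dask; the helper names (`pipeline`, `drop_missing`, `rename`) and the sample records are hypothetical and only illustrate the idea of building an ETL flow from small, pure transformation steps.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single function."""
    return lambda records: reduce(lambda acc, step: step(acc), steps, records)


def drop_missing(column: str) -> Step:
    """Return a step that removes records where `column` is missing or None."""
    return lambda records: [r for r in records if r.get(column) is not None]


def rename(old: str, new: str) -> Step:
    """Return a step that renames key `old` to `new` without mutating the input."""
    return lambda records: [
        {(new if k == old else k): v for k, v in r.items()} for r in records
    ]


# Hypothetical in-memory records standing in for an extracted source.
raw = [
    {"gemeente": "A", "count": 100},
    {"gemeente": "B", "count": None},
]

# Each step is a pure function, so the composed flow is easy to test in isolation.
clean = pipeline(drop_missing("count"), rename("gemeente", "municipality"))
result = clean(raw)
```

Because every step returns new records instead of modifying its input, `raw` is left untouched and each step can be reused or reordered. In a Prefect-based flow, functions like these would typically be wrapped as tasks.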

Google Cloud Platform (GCP) is the core infrastructure, with BigQuery (GBQ) and Cloud Storage (GCS) as the main storage engines. We follow Google's recommendations on using BigQuery for data warehouse applications, which organize the warehouse into four layers.

nimble (/ˈnɪmb(ə)l/): quick and light in movement or action; agile. The name is also a wink at Kimball, the godfather of the star schema.

Usage

pip install -e git+https://github.com/dkapitan/nimbletl.git#egg=nimbletl

A conda environment is included for convenience, containing the most commonly used packages.

conda env create -f environment.yml

Try nl-open-data to see nimbletl in action and create a data warehouse with Dutch open data from various sources.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
