datacoon/awesome-dataops

Awesome DataOps

A curated list of DataOps open source software, online services, courses, and use cases.

Open source

Data Pipeline Orchestration

  • Apache Airflow - a platform created by the community to programmatically author, schedule, and monitor workflows.
  • Apache Oozie - Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
  • Dagster - A Python library for building data applications: ETL, ML, Data Pipelines, and more.
  • dbt CLI - the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
  • Reflow - A language and runtime for distributed, incremental data processing in the cloud

ETL tools

  • Apache Kafka - a distributed streaming platform.
  • Apache NiFi - an easy-to-use, powerful, and reliable system to process and distribute data.
  • Squirrel - a Python library for large-scale data loading, transforming and sharing.

Commercial products and services

Platforms

  • Astronomer - spin up and scale Apache Airflow clusters
  • Databand - Databand tracks your pipeline execution metadata, so you can evaluate changes in runtimes, code, data, and critical business KPIs.
  • DataKitchen - an end-to-end DataOps platform that automates and coordinates all the people, tools, and environments in your entire data analytics organization, from orchestration, testing, and monitoring to development and deployment.
  • Prefect - a new workflow management system, designed for modern infrastructure and powered by open-source software.
  • Saagie - Saagie DataOps Orchestrator integrates commercial and open source data technologies to accelerate project delivery.
  • Unravel - helps ops engineers, app developers, and enterprise architects reduce the complexity of delivering reliable application performance by providing unified visibility and operational intelligence to optimize your entire ecosystem.

Cloud ETL

  • AWS Glue - a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
  • Azure Data Factory - a hybrid data integration service that simplifies ETL operations.
  • Google Cloud Dataflow - unified stream and batch data processing that's serverless, fast, and cost-effective.
  • ETLWorks - a cloud-first, any-to-any data integration platform

Data catalogs

Testing and monitoring

  • RightData - a data testing, reconciliation, and validation suite that helps stakeholders identify issues related to data consistency, quality, completeness, and gaps.
