awesome-mlops/awesome-data-management

awesome-data-management

A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀

  • Amundsen: Data discovery and metadata engine for improving productivity when interacting with data.
  • Apache Atlas: Provides open metadata management and governance capabilities to build a data catalog.
  • CKAN: Open-source DMS (data management system) for powering data hubs and data portals.
  • DataHub: LinkedIn's generalized metadata search & discovery tool.
  • Datatile: A library for managing, validating, summarizing, and visualizing data.
  • Delta Lake: Storage layer that brings scalable, ACID transactions to Apache Spark and other engines.
  • Dolt: SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.
  • DVC: Management and versioning of datasets and machine learning models.
  • Hub: A dataset format for creating, storing, and collaborating on AI datasets of any size.
  • Intake: A lightweight package for finding, investigating, loading and disseminating data.
  • Quilt: A self-organizing data hub with S3 support.
  • lakeFS: Repeatable, atomic and versioned data lake on top of object storage.
  • Magda: A federated, open-source data catalog for all your big data and small data.
  • Marquez: Collect, aggregate, and visualize a data ecosystem's metadata.
  • Metacat: Unified metadata exploration API service for Hive, RDS, Teradata, Redshift, S3 and Cassandra.
  • Milvus: An open source embedding vector similarity search engine powered by Faiss, NMSLIB and Annoy.
  • OpenMetadata: A single place to discover, collaborate, and get your data right.
  • Spark: Unified analytics engine for large-scale data processing.