Overview

Versatile Data Kit (VDK) is a data framework that enables Data Engineers to

🧑‍💻 develop,
▶️ run,
📊 and manage data workloads, aka data jobs.

Its Lego-like design consists of lightweight Python modules installed via the pip package manager. All VDK plugins are easy to combine.

VDK CLI can generate a data job and run your Python code and SQL queries.

🎯 VDK SDK makes your code shorter, more readable, and faster to create.
🚦 Ready-to-use data ETL/ELT patterns make Data Engineering with VDK efficient.

Data Engineers use VDK to implement automatic pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database or any other data storage.
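To make the ingestion and transformation roles concrete, here is a minimal sketch of what a Python step in a data job might look like. In a real job, VDK calls `run(job_input)` and supplies the `job_input` object (`vdk.api.job_input.IJobInput`). `RecordingJobInput` below is a hypothetical stand-in that only records calls, so the example runs without VDK installed. The table names and sample rows are also assumptions made for illustration.

```python
from typing import Any, Dict, List


class RecordingJobInput:
    """Hypothetical stand-in for the job_input object VDK passes to a step.

    It mirrors two method names from VDK's IJobInput
    (send_object_for_ingestion, execute_query) and only records the calls.
    """

    def __init__(self) -> None:
        self.ingested: List[Dict[str, Any]] = []
        self.queries: List[str] = []

    def send_object_for_ingestion(
        self, payload: Dict[str, Any], destination_table: str
    ) -> None:
        self.ingested.append({"table": destination_table, "payload": payload})

    def execute_query(self, sql: str) -> None:
        self.queries.append(sql)


def run(job_input) -> None:
    """Step entry point: ingest raw rows, then transform them with SQL."""
    # Sample rows are made up for illustration.
    rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 32.5}]
    for row in rows:
        job_input.send_object_for_ingestion(
            payload=row, destination_table="raw_orders"
        )

    # Batch transformation expressed as a single SQL statement.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS order_totals AS "
        "SELECT COUNT(*) AS orders, SUM(amount) AS total FROM raw_orders"
    )


if __name__ == "__main__":
    fake = RecordingJobInput()
    run(fake)
    print(len(fake.ingested), "rows sent;", len(fake.queries), "query recorded")
```

Passing the job input as a parameter keeps the step easy to exercise with a stand-in like this one before it runs against a real database.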

Data Journey and Versatile Data Kit

VDK creates data processing workflows to:

Ingest data (extract)
Transform data (transform)
Export data (load)

Solve common data engineering problems

Ingest data from different sources, including CSV files, JSON objects, and data from REST API services.
Use Python/SQL and VDK templates to transform data.
Ensure data applications are packaged, versioned, and deployed correctly while dealing with credentials, retries, and reconnects.
Provide built-in monitoring and smart notification capabilities.
Track both code and data modifications and the relationship between them, allowing quicker troubleshooting and version rollback.
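As a rough illustration of the first point, the sketch below normalizes CSV text and a JSON document into a single list of row dictionaries using only the Python standard library. It does not use VDK's own ingestion plugins; the function names are hypothetical, and in a data job each resulting dictionary could be handed to the ingestion API as a payload.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, List


def rows_from_csv(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(io.StringIO(text))


def rows_from_json(text: str) -> List[Dict[str, Any]]:
    """Accept either a single JSON object or an array of objects."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def collect_rows(csv_text: str, json_text: str) -> List[Dict[str, Any]]:
    """Combine rows from both sources into one list of dictionaries."""
    rows: List[Dict[str, Any]] = list(rows_from_csv(csv_text))
    rows.extend(rows_from_json(json_text))
    return rows


if __name__ == "__main__":
    for row in collect_rows("id,name\n1,a\n", '{"id": 2, "name": "b"}'):
        print(row)
```

Note that CSV values arrive as strings while JSON values keep their types, so a real job would usually cast columns before loading them.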

What VDK can do

Getting Started

Create and run data jobs locally

pip install quickstart-vdk

This installs the core vdk packages and the vdk command line interface. You can use them to run jobs in your local shell environment.

See also the Getting Started section of the wiki.

Run the Control Service locally with Docker and Kubernetes

Using Kubernetes for your data jobs workflow provides additional benefits, such as continuous delivery, easier collaboration, streamlined data job orchestration, high availability, security, and job runtime isolation.

More info: https://kubernetes.io/docs/concepts/overview/

Prerequisites

helm
docker
kind (version 0.11.1 or later)

vdk server --install

You can then use the VDK CLI to create and deploy jobs and the UI to manage them.

Next Steps

▶️ Getting started with VDK Operations UI
📖 Use case examples that show how VDK fits into the data workflow.
📖 VDK with Trino DB.
🗣️ Get to know us and ask questions at our community meeting

Additional Resources

📖 Running in production
📖 Documentation for VDK.
▶️ VDK Operations UI Overview

Contributing

Create an issue or pull request on GitHub to submit suggestions or changes. If you are interested in contributing as a developer, visit the contributing page.

Contacts

Connect on Slack by:
- Joining the CNCF Slack workspace.
- Joining the #versatile-data-kit channel.
Follow us on Twitter.
Subscribe to the Versatile Data Kit YouTube Channel.
Join our development mailing list, used by developers and maintainers of VDK.

Code of Conduct

Everyone who works on the project's source code or engages in its issue trackers, Slack channels, and mailing lists is expected to be familiar with and follow the Code of Conduct.

aaalzya/versatile-data-kit
