Convert parquet files from Data Lake in AWS S3 into Delta Lake

cj-zhukov/delta-convertor

delta-convertor

delta-convertor is a Rust library for converting Parquet files from a data lake in AWS S3 into Delta Lake tables.

Description

delta-convertor reads Parquet files from the provided source, converts them to Delta format, and writes them into a Delta table. Once the table is registered in the AWS Glue catalog, it can be queried with AWS Athena.

Installation

Install delta-convertor with the cargo package manager, or run it with Docker.

Usage

  1. Run delta-convertor with mode="init" to initialize an empty Delta table in AWS S3 (at this point the table contains only the _delta_log folder).
  2. Run an AWS Glue crawler with the Delta Lake source option to create the Delta table in the AWS Glue catalog (this only needs to be done once).
  3. Run delta-convertor with mode="append" to write Parquet files from the data lake into the Delta Lake table.
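
The two run modes in the steps above can be sketched as a small Rust type. This is only an illustration: the `Mode` enum, its `FromStr` implementation, and the `describe` helper are hypothetical names and are not taken from the delta-convertor source. Only the mode strings "init" and "append" come from this README.

```rust
use std::str::FromStr;

/// Run modes named in the usage steps.
/// Hypothetical type for illustration; not the library's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    /// Create an empty Delta table (only the transaction log exists).
    Init,
    /// Write Parquet files from the data lake into the existing Delta table.
    Append,
}

impl FromStr for Mode {
    type Err = String;

    /// Parse the mode string exactly as written in the usage steps.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "init" => Ok(Mode::Init),
            "append" => Ok(Mode::Append),
            other => Err(format!(
                "unknown mode {other:?}; expected \"init\" or \"append\""
            )),
        }
    }
}

/// Short description of what each mode is expected to do,
/// paraphrasing the usage steps.
pub fn describe(mode: Mode) -> &'static str {
    match mode {
        Mode::Init => "initialize an empty Delta table in S3",
        Mode::Append => "append Parquet data to the Delta table",
    }
}
```

Parsing the mode up front means a misspelled value such as "apend" is rejected before any S3 access happens, for example with `"append".parse::<Mode>()`.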
