
Dataduct
========

Dataduct - DataPipeline for humans

Dataduct is a wrapper built on top of AWS Data Pipeline that makes it easy to create ETL jobs. Each job is specified as a series of steps in a YAML file, which Dataduct automatically translates into a Data Pipeline definition with the appropriate pipeline objects.
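
A pipeline definition might look roughly like the following. This is a minimal sketch, not a verified configuration: the top-level keys, the step types (``extract-local``, ``create-load-redshift``, ``upsert``), and every file path are assumptions made for illustration, so check them against the ``creating_an_etl`` and ``steps`` pages before use.

.. code-block:: yaml

    # Hypothetical pipeline definition; all names and paths are illustrative.
    name: example_upsert
    frequency: one-time
    load_time: 01:00  # Hour:Min in UTC

    description: Load a local file into Redshift, then upsert into a second table

    steps:
    -   step_type: extract-local              # read a file from the local machine
        path: data/word_data.txt

    -   step_type: create-load-redshift       # create the table if needed, then load it
        table_definition: tables/dev.test_table.sql

    -   step_type: upsert                     # merge rows into the destination table
        source: tables/dev.test_table.sql
        destination: tables/dev.test_table_2.sql

Each entry under ``steps`` stands for one unit of work. In this sketch, every step is assumed to consume the output of the step before it.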

Features include:

  • Visualizing pipeline activities
  • Extracting data from different sources such as RDS, S3, and local files
  • Transforming data using EC2 and EMR
  • Loading data into Redshift
  • Transforming data inside Redshift
  • Running QA checks on data between the source system and the warehouse

It is easy to create custom steps that extend the DSL to fit your requirements, and to run a backfill from the command-line interface.

Contents:

.. toctree::
   :maxdepth: 2

   introduction
   installation
   commands
   config
   creating_an_etl
   steps
   input_output
   hooks
   dataduct

Indices and tables
==================