dashbook/target-iceberg

Singer Target for Iceberg Tables

This is a Singer target that loads data from Singer streams into Iceberg tables. This repository provides multiple Singer targets, one for each Iceberg catalog. The commands below use the SQL catalog target, target-iceberg-sql.

Features

  • Creates Iceberg tables automatically if they don't exist
  • Incrementally loads versions into Iceberg tables
  • Converts Singer stream schemas into Iceberg table schemas
  • Validates records against the Singer stream schema
  • Generates metadata about syncs for data governance
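
This README does not document the type mapping the target applies when it converts a Singer stream schema into an Iceberg table schema. As a rough illustration of what such a conversion involves, the sketch below maps a few JSON Schema property types to Iceberg primitive type names. The mapping table, the function names, and the choice to reject nested types are all assumptions made for this example; they are not taken from the target's source.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from JSON Schema types to Iceberg primitive type names.
_PRIMITIVES = {
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
    "string": "string",
}


def iceberg_type(prop: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (Iceberg type name, required) for one JSON Schema property."""
    types = prop.get("type", "string")
    if isinstance(types, str):
        types = [types]
    # A property that allows "null" becomes an optional Iceberg field.
    required = "null" not in types
    non_null = [t for t in types if t != "null"]
    if len(non_null) != 1:
        raise ValueError(f"unsupported type union: {types}")
    base = non_null[0]
    if base == "string" and prop.get("format") == "date-time":
        return "timestamptz", required
    if base == "string" and prop.get("format") == "date":
        return "date", required
    if base not in _PRIMITIVES:
        raise ValueError(f"unsupported type in this sketch: {base}")
    return _PRIMITIVES[base], required


def iceberg_fields(schema: Dict[str, Any]) -> List[Tuple[str, str, bool]]:
    """Convert the top-level properties of a Singer schema to (name, type, required)."""
    return [
        (name, *iceberg_type(prop))
        for name, prop in schema.get("properties", {}).items()
    ]
```

A real converter would also need to handle nested objects, arrays, and decimals, which this sketch deliberately leaves out.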

Usage

The target ingests Singer messages into the Iceberg tables and stores the state in the singer-bookmark property.
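
For readers new to Singer, the messages a target reads are JSON objects, one per line, of type SCHEMA, RECORD, or STATE. The following sketch builds such a sequence for one stream. The stream name is borrowed from the configuration example in this README; the column names and the bookmark value are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, List


def singer_messages(
    stream: str,
    schema: Dict[str, Any],
    key_properties: List[str],
    records: Iterable[Dict[str, Any]],
    state: Dict[str, Any],
) -> List[str]:
    """Build the JSON lines a tap would emit for one stream."""
    lines = [
        json.dumps(
            {
                "type": "SCHEMA",
                "stream": stream,
                "schema": schema,
                "key_properties": key_properties,
            }
        )
    ]
    for record in records:
        lines.append(json.dumps({"type": "RECORD", "stream": stream, "record": record}))
    # The STATE message carries the bookmark the next sync can resume from.
    lines.append(json.dumps({"type": "STATE", "value": state}))
    return lines


# Hypothetical columns and bookmark, for illustration only.
example = singer_messages(
    stream="inventory-orders",
    schema={"properties": {"id": {"type": "integer"}, "total": {"type": "number"}}},
    key_properties=["id"],
    records=[{"id": 1, "total": 9.5}, {"id": 2, "total": 12.0}],
    state={"bookmarks": {"inventory-orders": {"last_id": 2}}},
)
```

Printing each element of example on its own line gives a stream that could be piped into the target's standard input.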

Sync mode

To run:

target-iceberg-sql --config config.json
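
Like other Singer targets, the command reads messages from standard input, so it is normally placed at the end of a pipe fed by a tap. The sketch below wires the two processes together from Python. The helper name run_pipeline is invented for this example, and any tap command passed to it (for instance a tap-postgres binary) is a placeholder for whichever tap you use.

```python
import subprocess
from typing import List


def run_pipeline(tap_cmd: List[str], target_cmd: List[str]) -> str:
    """Pipe the tap's stdout into the target's stdin; return the target's stdout."""
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the tap is notified if the target exits early.
    tap.stdout.close()
    output, _ = target.communicate()
    tap.wait()
    if tap.returncode != 0 or target.returncode != 0:
        raise RuntimeError(
            f"pipeline failed: tap={tap.returncode}, target={target.returncode}"
        )
    return output
```

A call might look like run_pipeline(["tap-postgres", "--config", "tap_config.json"], ["target-iceberg-sql", "--config", "config.json"]), where the tap name and tap_config.json are assumptions.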

Configuration

Example:

{
    "streams": {
      "inventory-orders": { 
        "identifier": "bronze.inventory.orders",
        "replicationMethod": "LOG_BASED"
      },
      "inventory-customers": { 
        "identifier": "bronze.inventory.customers",
        "replicationMethod": "LOG_BASED"
      },
      "inventory-products": { 
        "identifier": "bronze.inventory.products",
        "replicationMethod": "LOG_BASED"
      }
    },
    "bucket": "s3://example-postgres",
    "catalogName": "bronze",
    "catalogUrl": "postgres://postgres:postgres@postgres:5432",
    "awsRegion": "us-east-1",
    "awsAccessKeyId": "AKIAIOSFODNN7EXAMPLE",
    "awsSecretAccessKey": "$AWS_SECRET_ACCESS_KEY",
    "awsEndpoint": "http://localstack:4566",
    "awsAllowHttp": "true"
}
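
The example above uses $AWS_SECRET_ACCESS_KEY as the value of awsSecretAccessKey. This README does not say whether the target expands such placeholders itself. Assuming it does not, one option is to substitute them from the environment before writing config.json, as in the sketch below. The function name expand_placeholders and the rule that only values of the exact form $NAME are replaced are choices made for this example.

```python
import os
import re
from typing import Any, Mapping

# Matches a string that consists solely of a "$NAME" placeholder.
_PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_placeholders(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace "$NAME" strings with the value of NAME from env."""
    if isinstance(value, dict):
        return {key: expand_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(item, env) for item in value]
    if isinstance(value, str):
        match = _PLACEHOLDER.fullmatch(value)
        if match:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
    # Numbers, booleans, and ordinary strings pass through unchanged.
    return value
```

Failing loudly on a missing variable avoids writing a literal placeholder into the credentials field by accident.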

The configuration consists of 3 parts:

  • General parameters
  • Catalog parameters
  • Object store parameters

General parameters

The general parameters apply to each catalog and object store.

Parameter           Description
streams             A map of streams to replicate. Each stream is a map with the fields identifier and replicationMethod (optional).
bucket (optional)   Object store bucket where the Iceberg tables should be stored

Catalog parameters

Only one set of catalog parameters should be used in the configuration. Choose which catalog you want to use.

SQL catalog

Parameter     Description
catalogName   The name of the catalog
catalogUrl    The connection url of the catalog

Object store parameters

Only one set of object store parameters should be used in the configuration. Choose which object store you want to use.

AWS S3

Parameter                 Description
awsRegion                 The region of the bucket
awsAccessKeyId            The access key id
awsSecretAccessKey        The secret access key
awsEndpoint (optional)    The endpoint of the object store
awsAllowHttp (optional)   Allow http connections to the object store
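
The parameter tables above can be turned into a small pre-flight check that runs before the target is started. The sketch below covers the SQL catalog with AWS S3 only and treats every parameter not marked optional as required, which is an assumption read off the tables rather than behaviour confirmed by the target. The names REQUIRED and check_config are invented for this example.

```python
from typing import Any, Dict, List

# Parameters the tables list without an "(optional)" marker.
REQUIRED = [
    "streams",
    "catalogName",
    "catalogUrl",
    "awsRegion",
    "awsAccessKeyId",
    "awsSecretAccessKey",
]


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required parameter: {key}" for key in REQUIRED if key not in config
    ]
    streams = config.get("streams", {})
    if not isinstance(streams, dict):
        problems.append("streams must be a map of stream name to settings")
        return problems
    for name, settings in streams.items():
        # Each stream needs an identifier; replicationMethod is optional.
        if not isinstance(settings, dict) or "identifier" not in settings:
            problems.append(f"stream {name!r} has no identifier")
    return problems
```

Loading config.json with json.load and passing the result to check_config gives a quick list of omissions without contacting the catalog or the object store.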

Docker containers

Contributing

Feel free to open issues for any feedback or ideas! PRs are welcome.
