Skip to content

fujiwara/Rin

Repository files navigation

Rin

Rin is a Redshift data Importer by SQS messaging.

Architecture

  1. (Someone) creates a S3 object.
  2. S3 event notifications will send to a message to SQS.
  3. Rin will fetch messages from SQS, and publish a "COPY" query to Redshift.

Installation

Binary packages

Releases

Homebrew

$ brew install fujiwara/tap/rin

Docker

GitHub Packages

$ docker pull ghcr.io/fujiwara/rin:v1.1.3

Configuration

Configuring Amazon S3 Event Notifications.

  1. Create SQS queue.
  2. Attach SQS access policy to the queue. Example Walkthrough 1:
  3. Enable Event Notifications on a S3 bucket.
  4. Run rin process with configuration for using the SQS and S3.

config.yaml

queue_name: my_queue_name    # SQS queue name

credentials:
  aws_region: ap-northeast-1

redshift:
  host: localhost
  port: 5439
  dbname: test
  user: test_user
  password: '{{ must_env "REDSHIFT_PASSWORD" }}'
  schema: public
  reconnect_on_error: true # disconnect Redshift on error occurred

s3:
  bucket: test.bucket.test
  region: ap-northeast-1

sql_option: "JSON 'auto' GZIP"       # COPY SQL option

# define import target mappings
targets:
  - s3:
      key_prefix: test/foo/ignore
    discard: true  # Do not import and do not try following targets. Matches only.

- redshift:
      table: foo
    s3:
      key_prefix: test/foo

  - redshift:
      schema: xxx
      table: bar
    s3:
      key_prefix: test/bar
    break: true       # Do not try following targets.

  - redshift:
      schema: $1      # expand by key_regexp captured value.
      table: $2
    s3:
      key_regexp: test/schema-([a-z]+)/table-([a-z]+)/

  - redshift:
      host: redshift.example.com       # override default section in this target
      port: 5439
      dbname: example
      user: example_user
      password: example_pass
      schema: public
      table: example
    s3:
      bucket: redshift.example.com
      region: ap-northeast-1
      key_prefix: logs/example/
    sql_option: "CSV DELIMITER ',' ESCAPE"

A configuration file is parsed by kayac/go-config.

go-config expands environment variables using syntax {{ env "FOO" }} or {{ must_env "FOO" }} in a configuration file.

When the password for Redshift is empty, Rin will try call GetClusterCredentials API to get a temporary password for the cluster.

Credentials

Rin requires credentials for SQS and Redshift.

  1. credentials.aws_access_key_id and credentials.aws_secret_access_key
  • used for SQS and Redshift(COPY query and Data API access).
  1. credentials.aws_iam_role
  • used for Redshift COPY query only.
  • for SQS and Redshift Data API, Rin will try to get a instance credentials.

Run

daemon mode

Rin waits new SQS messages and processing it continually.

$ rin -config config.yaml [-debug]

-config also accepts HTTP/S3/File URL to specify the location of configuration file. For example,

$ rin -config s3://rin-config.my-bucket/config.yaml

batch mode

Rin process new SQS messages and exit.

$ rin -config config.yaml -batch [-debug]

Set max execution time

A CLI option -max-execution-time is set max execution time for running SQS worker and batch process.

SQL Drivers

Rin has two ways to connect to Redshift.

postgres driver

postgres driver is the default. Rin connects to Redshift with PostgreSQL protocol over TCP in the VPC network.

host, port, user and password fields are required in the redshift section.

redshift:
  driver: postgres # default
  host: localhost
  port: 5439
  user: test_user
  password: '{{ must_env "REDSHIFT_PASSWORD" }}'

redshift-data driver

redshift-data driver connects to Redshift via Redshift Data API.

Redshift Data API does not require a VPC network.

With provisoned cluster, driver, cluster and user are required.

redshift:
  driver: redshift-data
  cluster: your-cluster-name
  user: test_user

With Redshift serverless, driver, workgroup are required.

redshift:
  driver: redshift-data
  workgroup: your-workgroup-name

See also github.com/mashiike/redshift-data-sql-driver.