An input plugin for Embulk (https://github.com/embulk/embulk/) that unions all data loaded by your defined embulk input & filters plugin configuration.

civitaspo/embulk-input-union

embulk-input-union

An input plugin for Embulk that unions all the data loaded by the Embulk input and filter plugin configurations you define.

Overview

  • Plugin type: input
  • Resume supported: no
  • Cleanup supported: yes
  • Guess supported: no

Configuration

  • union: Embulk configurations for input data. (array of config, required)
    • name: The name of the bulk load. (string, optional)
    • exec: The Embulk execution configuration. (config, optional)
      • max_threads: Maximum number of threads to run concurrently. (int, default: the number of available CPU cores)
    • in: The Embulk input plugin configuration. (config, required)
    • filters: The Embulk filter plugin configurations. (array of config, default: [])
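
The options above can be combined as in the following minimal sketch, which sets the optional name and exec.max_threads keys on one entry and leaves filters at its default. The names first_load and second_load and the thread count of 2 are hypothetical values chosen for illustration; the file paths and parser settings are reused from this repository's example, and how the plugin schedules the two loads is not specified here.

```yaml
in:
  type: union
  union:
    # First bulk load: named, with a thread cap (illustrative value).
    - name: first_load            # hypothetical name
      exec:
        max_threads: 2            # default is the number of available CPU cores
      in:
        type: file
        path_prefix: ./example/data01.tsv
        parser:
          type: csv
          delimiter: "\t"
          skip_header_lines: 0
          null_string: ""
          columns:
            - { name: id, type: long }
            - { name: description, type: string }
            - { name: name, type: string }
            - { name: t, type: timestamp, format: "%Y-%m-%d %H:%M:%S %z" }
            - { name: payload, type: json }
      # filters is omitted here, so it falls back to its default: []
    # Second bulk load: only the required "in" key plus an optional name.
    - name: second_load           # hypothetical name
      in:
        type: file
        path_prefix: ./example/data02.tsv
        parser:
          type: csv
          delimiter: "\t"
          skip_header_lines: 0
          null_string: ""
          columns:
            - { name: id, type: long }
            - { name: description, type: string }
            - { name: name, type: string }
            - { name: t, type: timestamp, format: "%Y-%m-%d %H:%M:%S %z" }
            - { name: payload, type: json }

out:
  type: stdout
```

Both entries declare the same columns in this sketch, on the assumption that unioned loads should share a schema.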

Example

in:
  type: union
  union:
    - in:
        type: file
        path_prefix: ./example/data01.tsv
        parser:
          type: csv
          delimiter: "\t"
          skip_header_lines: 0
          null_string: ""
          columns:
            - { name: id, type: long }
            - { name: description, type: string }
            - { name: name, type: string }
            - { name: t, type: timestamp, format: "%Y-%m-%d %H:%M:%S %z" }
            - { name: payload, type: json }
          stop_on_invalid_record: true
      filters:
        - type: column
          add_columns:
            - { name: group_name, type: string, default: "data01" }
    - name: example
      in:
        type: file
        path_prefix: ./example/data02.tsv
        parser:
          type: csv
          delimiter: "\t"
          skip_header_lines: 0
          null_string: ""
          columns:
            - { name: id, type: long }
            - { name: description, type: string }
            - { name: name, type: string }
            - { name: t, type: timestamp, format: "%Y-%m-%d %H:%M:%S %z" }
            - { name: payload, type: json }
          stop_on_invalid_record: true
      filters:
        - type: column
          add_columns:
            - { name: group_name, type: string, default: "data02" }

out:
  type: stdout

Development

Run examples

$ ./gradlew gem --write-locks
$ embulk bundle install --gemfile ./example/Gemfile
$ embulk run example/config.yml -I build/gemContents/lib -b example

Run tests

$ ./gradlew scalatest

Run the formatter

## Only check for format violations
$ ./gradlew spotlessCheck

## Fix all format violations
$ ./gradlew spotlessApply

Build

$ ./gradlew gem --write-locks  # add -t to watch for file changes and rebuild continuously

Release a new gem

$ ./gradlew gemPush

CHANGELOG

CHANGELOG.md

License

MIT LICENSE
