
Join File filter plugin for Embulk

This plugin combines the rows Embulk loads with rows from a file containing tabular data, joining them on a common field.

Overview

  • Plugin type: filter

Configuration

  • base_column: the column of the data Embulk loads whose values are matched against counter_column (hash, required)
    • name: name of the column
    • type: type of the column (see below)
    • format: format of the timestamp if type is timestamp
  • counter_column: the column of the data loaded from the file that is matched against base_column (hash, default: {name: id, type: long})
    • name: name of the column
    • type: type of the column (see below)
    • format: format of the timestamp if type is timestamp
  • joined_column_prefix: prefix added to joined data columns (string, default: "_joined_by_embulk_")
  • file_path: path to the file to join (string, required)
  • file_format: format of the file (string, required; one of csv, tsv, yaml, json, although only json is currently implemented)
  • columns: columns to read from the file and append to each row (array of hash, required)
    • name: name of the column
    • type: type of the column (see below)
    • format: format of the timestamp if type is timestamp
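Taken together, these options describe a left join: the rows read from file_path are indexed by counter_column, and each row Embulk loads looks up its base_column value in that index. The following is a minimal Python sketch of that idea, not the plugin's Java implementation; the function name join_rows, filling unmatched rows with None, and letting the last row win when the file repeats a key are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def join_rows(
    base_rows: Iterable[Row],
    file_rows: Iterable[Row],
    base_column: str,
    counter_column: str,
    columns: List[str],
    prefix: str = "_joined_by_embulk_",
) -> List[Row]:
    """Left-join file_rows onto base_rows where base_column == counter_column.

    Hypothetical helper: unmatched rows get None in every joined column,
    and if the file repeats a key, the last row with that key wins.
    """
    # Index the file rows by the counter column for constant-time lookups.
    index: Dict[Any, Row] = {}
    for row in file_rows:
        index[row[counter_column]] = row

    joined: List[Row] = []
    for row in base_rows:
        match: Optional[Row] = index.get(row.get(base_column))
        out = dict(row)  # keep the original columns unchanged
        for name in columns:
            # Assumption: joined columns are named prefix + column name.
            out[prefix + name] = match.get(name) if match is not None else None
        joined.append(out)
    return joined


if __name__ == "__main__":
    master = [{"id": 0, "name": "civitaspo"}, {"id": 2, "name": "mori.ogai"}]
    loaded = [{"name_id": 2}, {"name_id": 9}]
    for r in join_rows(loaded, master, "name_id", "id", ["id", "name"]):
        print(r)
```

Indexing the file once keeps the cost linear in the number of rows, which suits a small master file joined against a large input.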

Column types

name       description
boolean    true or false
long       64-bit signed integer
timestamp  date and time with nanosecond precision
double     64-bit floating-point number
string     string
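Values read from the file have to be coerced into the declared type of each column. A minimal Python sketch of that step follows, using a hypothetical convert helper. Two caveats: Python's datetime keeps only microseconds, so it cannot hold the nanosecond precision of Embulk timestamps, and the format string is treated here as a Python strptime pattern, which is an assumption (Embulk timestamp formats use Ruby-style strftime directives, which share common ones such as %Y-%m-%d %H:%M:%S).

```python
from datetime import datetime
from typing import Any, Optional


def convert(value: Any, column_type: str, fmt: Optional[str] = None) -> Any:
    """Coerce a raw file value into one of the column types listed above (hypothetical)."""
    if value is None:
        return None  # missing values stay missing
    if column_type == "boolean":
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in ("true", "false"):
            return text == "true"
        raise ValueError(f"not a boolean: {value!r}")
    if column_type == "long":
        return int(value)  # Python ints are unbounded; Embulk longs are 64-bit
    if column_type == "double":
        return float(value)
    if column_type == "string":
        return str(value)
    if column_type == "timestamp":
        if fmt is None:
            raise ValueError("timestamp columns need a format")
        # Assumption: fmt is a Python strptime pattern (microsecond precision only).
        return datetime.strptime(str(value), fmt)
    raise ValueError(f"unknown column type: {column_type!r}")
```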

Example

filters:
  - type: join_file
    base_column: {name: name_id, type: long}
    counter_column: {name: id, type: long}
    joined_column_prefix: _joined_by_embulk_
    file_path: master.json
    file_format: json
    columns:
      - {name: id, type: long}
      - {name: name, type: string}
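With this configuration, each row keeps its original columns and gains one prefixed column per entry in columns. The sketch below computes that output schema in Python, assuming the prefix is simply prepended to each column name. The input schema with a single name_id column is hypothetical, since the README does not show the loaded data, and the clash check only illustrates why joined_column_prefix is useful; it is not confirmed plugin behavior.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (name, type)


def output_schema(
    input_schema: List[Column],
    columns: List[Column],
    prefix: str = "_joined_by_embulk_",
) -> List[Column]:
    """Return the input columns followed by one prefixed column per joined column."""
    existing = {name for name, _ in input_schema}
    joined: List[Column] = []
    for name, column_type in columns:
        new_name = prefix + name
        # A prefixed name that collides with an input column would be ambiguous.
        if new_name in existing:
            raise ValueError(f"joined column {new_name!r} clashes with an input column")
        joined.append((new_name, column_type))
    return list(input_schema) + joined


# Hypothetical input schema; the README does not show the loaded data's columns.
schema = output_schema([("name_id", "long")], [("id", "long"), ("name", "string")])
print(schema)
# [('name_id', 'long'), ('_joined_by_embulk_id', 'long'), ('_joined_by_embulk_name', 'string')]
```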

Run Example

$ ./gradlew classpath
$ embulk run -I lib example/config.yml

Supported Data Formats

  • csv (not implemented)
  • tsv (not implemented)
  • yaml (not implemented)
  • json

Supported Data Format Examples

CSV

id,name
0,civitaspo
2,mori.ogai
5,natsume.soseki
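CSV input is listed as not implemented, but the sample above maps naturally onto a lookup table. Here is a minimal sketch using Python's csv module; the helper name load_csv_table is hypothetical. Every CSV value arrives as a string, so the key column is converted to int to match a long base_column.

```python
import csv
import io
from typing import Dict


def load_csv_table(text: str, key: str = "id") -> Dict[int, Dict[str, str]]:
    """Parse CSV with a header row and index the records by an integer key column."""
    reader = csv.DictReader(io.StringIO(text))
    table: Dict[int, Dict[str, str]] = {}
    for record in reader:
        # csv yields strings only; convert the key so it matches a long column.
        table[int(record[key])] = dict(record)
    return table


SAMPLE = "id,name\n0,civitaspo\n2,mori.ogai\n5,natsume.soseki\n"
table = load_csv_table(SAMPLE)
print(table[5]["name"])  # natsume.soseki
```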

TSV

Tab characters are hard to show here, so each tab is written as \t.

id\tname
0\tcivitaspo
2\tmori.ogai
5\tnatsume.soseki
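In a real TSV file the separators are actual tab characters, not the two characters shown above. The sketch below is also hypothetical, since TSV input is not implemented. It writes the sample as a Python string literal, where the escape \t does produce a real tab, and parses it by passing delimiter="\t" to csv.reader. The helper name read_tsv is an assumption.

```python
import csv
import io
from typing import Dict, List

# "\t" inside this Python literal is a real tab character.
SAMPLE = "id\tname\n0\tcivitaspo\n2\tmori.ogai\n5\tnatsume.soseki\n"


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text whose first line holds the column names."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    return [dict(zip(header, values)) for values in rows]


records = read_tsv(SAMPLE)
print(records[1])  # {'id': '2', 'name': 'mori.ogai'}
```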

YAML

- id: 0
  name: civitaspo
- id: 2
  name: mori.ogai
- id: 5
  name: natsume.soseki

JSON

[
  {
    "id": 0,
    "name": "civitaspo"
  },
  {
    "id": 2,
    "name": "mori.ogai"
  },
  {
    "id": 5,
    "name": "natsume.soseki"
  }
]
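JSON is the one format currently supported. Assuming the file holds a top-level array of objects as in the sample, here is a minimal Python sketch of loading it and indexing the records by the counter column. index_json_table is a hypothetical name and not part of the plugin.

```python
import json
from typing import Any, Dict, List

SAMPLE = """[
  {"id": 0, "name": "civitaspo"},
  {"id": 2, "name": "mori.ogai"},
  {"id": 5, "name": "natsume.soseki"}
]"""


def index_json_table(text: str, key: str = "id") -> Dict[Any, Dict[str, Any]]:
    """Load a JSON array of objects and map each key value to its record."""
    records: List[Dict[str, Any]] = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of objects")
    # Unlike CSV, JSON numbers already arrive as ints, so no key conversion is needed.
    return {record[key]: record for record in records}


table = index_json_table(SAMPLE)
print(table[2]["name"])  # mori.ogai
```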

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously

About

Currently, only the json format is supported.
