
Mask filter plugin for Embulk

Masks columns with asterisks in a variety of patterns. The plugin is still in its initial development phase and lacks basic features needed for production use.

Overview

  • Plugin type: filter

Configuration

Caution: Mask types such as all and email are now specified with type, not with pattern as in version 0.1.1 and earlier.

  • columns: list of target columns to be replaced with asterisks (array, required)
    • name: name of the column (string, required)
    • type: mask type; one of all, email, regex or substring (string, default: all)
    • paths: list of JSON paths and mask types; applies only when the column type is JSON
      • [{key: $.json_path1}, {key: $.json_path2}] would mask both the $.json_path1 and $.json_path2 nodes
      • Elements under the nodes are converted to strings and then masked (e.g., [0,1,2] -> *******)
    • length: if specified, the filter replaces the value with a fixed number of asterisks (integer, optional; supported only for the all, email and substring types)
    • pattern: regex pattern such as "[0-9]+" (string, required for the regex type)
    • start: the beginning index for the substring type; zero-based and inclusive (integer, default: 0)
    • end: the ending index for the substring type; exclusive (integer, default: length of the target column)

Example

If you have the following data in a CSV file or a file in another format,

| first_name | last_name | gender | age | contact |
|---|---|---|---|---|
| Benjamin | Bell | male | 30 | bell.benjamin_dummy@example.com |
| Lucas | Duncan | male | 20 | lucas.duncan_dummy@example.com |
| Elizabeth | May | female | 25 | elizabeth.may_dummy@example.com |
| Christian | Reid | male | 15 | christian.reid_dummy@example.com |
| Amy | Avery | female | 40 | amy.avercy_dummy@example.com |

the following filter configuration

filters:
  - type: mask
    columns:
      - { name: last_name}
      - { name: age}
      - { name: contact, type: email, length: 5}

would produce

| first_name | last_name | gender | age | contact |
|---|---|---|---|---|
| Benjamin | **** | male | ** | *****@example.com |
| Lucas | ****** | male | ** | *****@example.com |
| Elizabeth | *** | female | ** | *****@example.com |
| Christian | **** | male | ** | *****@example.com |
| Amy | ***** | female | ** | *****@example.com |
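
The behaviour of the all and email types in this example can be reproduced with a short Python sketch. This is not the plugin's implementation (the plugin itself is written in Java); `mask_value` and `apply_mask` are hypothetical names, and the sketch assumes that values are converted to strings before masking and that the email type masks only the part before the `@`.

```python
def mask_value(value, mask_type="all", length=None):
    """Mask one value; `length` fixes the number of asterisks when given."""
    text = str(value)  # assumption: non-string values are stringified first
    if mask_type == "email":
        # Assumption: only the local part (before "@") is masked.
        local, sep, domain = text.partition("@")
        stars = "*" * (length if length is not None else len(local))
        return stars + sep + domain
    return "*" * (length if length is not None else len(text))


def apply_mask(row, columns):
    """Return a copy of `row` with every configured column masked."""
    masked = dict(row)
    for spec in columns:
        name = spec["name"]
        # `type` defaults to "all", as in the configuration reference.
        masked[name] = mask_value(row[name], spec.get("type", "all"), spec.get("length"))
    return masked


# Same column settings as the YAML configuration above.
columns = [
    {"name": "last_name"},
    {"name": "age"},
    {"name": "contact", "type": "email", "length": 5},
]
row = {
    "first_name": "Benjamin",
    "last_name": "Bell",
    "gender": "male",
    "age": 30,
    "contact": "bell.benjamin_dummy@example.com",
}
result = apply_mask(row, columns)
print(result)
```

Columns that are not listed, such as first_name and gender, pass through unchanged.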

If you use the regex and/or substring types, the following configuration

filters:
  - type: mask
    columns:
      - { name: first_name, type: regex, pattern: "[a-z]"}
      - { name: contact, type: substring, start: 5, length: 5}

would produce

| first_name | last_name | gender | age | contact |
|---|---|---|---|---|
| B******* | Bell | male | 30 | bell.***** |
| L**** | Duncan | male | 20 | lucas***** |
| E******** | May | female | 25 | eliza***** |
| C******** | Reid | male | 15 | chris***** |
| A** | Avery | female | 40 | amy.a***** |
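
The regex and substring semantics can be illustrated with a minimal Python sketch. It is not the plugin's code: `mask_regex` and `mask_substring` are hypothetical helpers, and the sketch assumes that each regex match is replaced by as many asterisks as it has characters, and that for substring the range from start (inclusive) to end (exclusive) is replaced by `length` asterisks when length is given.

```python
import re


def mask_regex(value, pattern):
    """Replace every match of `pattern` with asterisks of the same width."""
    # Assumption: one asterisk per matched character.
    return re.sub(pattern, lambda m: "*" * len(m.group(0)), str(value))


def mask_substring(value, start=0, end=None, length=None):
    """Mask text[start:end]; `start` is inclusive, `end` is exclusive."""
    text = str(value)
    if end is None:
        end = len(text)  # documented default: length of the target value
    # Clamp the indices so out-of-range settings do not raise.
    start = max(0, min(start, len(text)))
    end = max(start, min(end, len(text)))
    stars = "*" * (length if length is not None else end - start)
    return text[:start] + stars + text[end:]


print(mask_regex("Benjamin", "[a-z]"))
print(mask_substring("bell.benjamin_dummy@example.com", start=5, length=5))
print(mask_substring("abcdef", start=1, end=3))
```

With start: 5 and no end, everything from the sixth character onward is masked, which is why each contact value keeps its first five characters.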

JSON columns are also partially supported.

If you have a user column with the following JSON data structure,

{
  "full_name": {
    "first_name": "Benjamin",
    "last_name": "Bell"
  },
  "gender": "male",
  "age": 30,
  "email": "test_mail@example.com"
}

the following filter configuration

filters:
  - type: mask
    columns:
      - { name: user, paths: [{key: $.full_name.first_name}, {key: $.email, type: email}]}

would produce

{
  "full_name": {
    "first_name": "********",
    "last_name": "Bell"
  },
  "gender": "male",
  "age": 30,
  "email": "*********@example.com"
}
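
The JSON masking above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's implementation: `mask_text` and `mask_json` are hypothetical names, only simple dotted paths such as `$.a.b` are handled, and non-string nodes are assumed to be serialized compactly before masking, so that `[0,1,2]` becomes seven asterisks as described in the configuration reference.

```python
import json


def mask_text(value, mask_type="all"):
    """Mask a JSON node after converting it to a string."""
    # Assumption: non-string nodes are serialized without spaces.
    text = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
    if mask_type == "email":
        local, sep, domain = text.partition("@")
        return "*" * len(local) + sep + domain
    return "*" * len(text)


def mask_json(document, paths):
    """Mask the nodes addressed by simple dotted paths such as "$.a.b"."""
    data = json.loads(document)
    for spec in paths:
        path = spec["key"]
        keys = (path[2:] if path.startswith("$.") else path).split(".")
        parent = data
        for key in keys[:-1]:
            # Walk down to the parent object; stop silently on a missing key.
            parent = parent.get(key) if isinstance(parent, dict) else None
        leaf = keys[-1]
        if isinstance(parent, dict) and leaf in parent:
            parent[leaf] = mask_text(parent[leaf], spec.get("type", "all"))
    return data


user = json.dumps({
    "full_name": {"first_name": "Benjamin", "last_name": "Bell"},
    "gender": "male",
    "age": 30,
    "email": "test_mail@example.com",
})
masked = mask_json(user, [
    {"key": "$.full_name.first_name"},
    {"key": "$.email", "type": "email"},
])
print(json.dumps(masked, indent=2))
```

Paths that do not exist in the document are skipped in this sketch; how the plugin handles them is not stated here.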

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously