Mask filter plugin for Embulk

Mask columns with asterisks in a variety of patterns (still in initial development phase and missing basic features to use in production).

Overview

Plugin type: filter

Configuration

Caution : Now we use type to specify mask types such as all and email, instead of pattern which was used in version 0.1.1 or earlier.

columns: target columns which would be replaced with asterisks (string, required)
- name: name of the column (string, required)
- type: mask type, all, email, regex or substring (string, default: all)
- paths: list of JSON path and type, works if the column type is JSON
  - [{key: $.json_path1}, {key: $.json_path2}] would mask both $.json_path1 and $.json_path2 nodes
  - Elements under the nodes would be converted to string and then masked (e.g., [0,1,2] -> *******)
- length: if specified, this filter replaces the column with fixed number of asterisks (integer, optional. supported only in all, email, substring.)
- pattern: Regex pattern such as "[0-9]+" (string, required for regex type)
- start: The beginning index for substring type. The value starts from 0 and inclusive (integer, default: 0)
- end: The ending index for substring type. The value is exclusive (integer, default: length of the target column)

Example

If you have below data in csv or other format file,

first_name	last_name	gender	age	contact
Benjamin	Bell	male	30	bell.benjamin_dummy@example.com
Lucas	Duncan	male	20	lucas.duncan_dummy@example.com
Elizabeth	May	female	25	elizabeth.may_dummy@example.com
Christian	Reid	male	15	christian.reid_dummy@example.com
Amy	Avery	female	40	amy.avercy_dummy@example.com

below filter configuration

filters:
  - type: mask
    columns:
      - { name: last_name}
      - { name: age}
      - { name: contact, type: email, length: 5}

would produce

first_name	last_name	gender	age	contact
Benjamin	****	male	**	*****@example.com
Lucas	******	male	**	*****@example.com
Elizabeth	***	female	**	*****@example.com
Christian	****	male	**	*****@example.com
Amy	*****	female	**	*****@example.com

If you use regex and/or substring types,

filters:
  - type: mask
    columns:
      - { name: first_name, type: regex, pattern: "[a-z]"}
      - { name: contact, type: substring, start: 5, length: 5}

would produce

first_name	last_name	gender	age	contact
B*******	Bell	male	30	bell.*****
L****	Duncan	male	20	lucas*****
E*******	May	female	25	eliza*****
C********	Reid	male	15	chris*****
A**	Avery	female	40	amy.a*****

JSON type column is also partially supported.

If you have a user column with this JSON data structure

{
  "full_name": {
    "first_name": "Benjamin",
    "last_name": "Bell"
  },
  "gender": "male",
  "age": 30,
  "email": "test_mail@example.com"
}

below filter configuration

filters:
  - type: mask
    columns:
      - { name: user, paths: [{key: $.full_name.first_name}, {key: $.email, type: email}]}

would produce

{
  "full_name": {
    "first_name": "********",
    "last_name": "Bell"
  },
  "gender": "male",
  "age": 30,
  "email": "*********@example.com"
}

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
config/checkstyle		config/checkstyle
gradle/wrapper		gradle/wrapper
lib/embulk/filter		lib/embulk/filter
src		src
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE.txt		LICENSE.txt
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mask filter plugin for Embulk

Overview

Configuration

Example

Build

About

Releases

Packages

Contributors 2

Languages

License

beniyama/embulk-filter-mask

Folders and files

Latest commit

History

Repository files navigation

Mask filter plugin for Embulk

Overview

Configuration

Example

Build

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages