Embulk filter plugin that strips HTML tags, leaving plain text

Strip HTML Tags filter plugin for Embulk

This plugin strips HTML tags from values of specified columns.

Overview

  • Plugin type: filter

Configuration

  • columns: names of the columns whose values have HTML tags stripped (array, default: [])

Example

These settings strip tags from columns foo and bar and leave other columns untouched.

in:
  type: file
  path_prefix: ./test.csv
  parser:
    type: csv
    charset: UTF-8
    delimiter: ","
    columns:
      - {name: foo, type: string}
      - {name: bar, type: string}
      - {name: baz, type: string}

filters:
  - type: strip_html_tags
    columns:
      - foo
      - bar

out:
  type: stdout

It converts a CSV record like this:

<a>foo</a>,<div>bar</div>,<p>baz</p>

into:

foo,bar,<p>baz</p>
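The per-column behavior in this example can be sketched as follows. This is a minimal illustration, not the plugin's source: it assumes a naive regular expression that deletes anything shaped like <...>, and the class and method names (StripHtmlTagsSketch, stripTags, filterRecord) are hypothetical. Only columns named in the columns option are rewritten; every other value passes through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the filter's effect; NOT the plugin's actual implementation.
class StripHtmlTagsSketch {
    // Assumed naive pattern: "<" followed by any non-">" characters, then ">".
    // It does not handle ">" inside attribute values, comments, or HTML entities.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Remove every tag-like run from a single cell value; null cells stay null.
    static String stripTags(String value) {
        if (value == null) {
            return null;
        }
        return TAG.matcher(value).replaceAll("");
    }

    // names: column names in record order; values: the record's cells;
    // targets: the hypothetical equivalent of the "columns" option.
    static List<String> filterRecord(List<String> names, List<String> values, Set<String> targets) {
        List<String> out = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            // Strip only the configured columns; copy the rest untouched.
            out.add(targets.contains(names.get(i)) ? stripTags(v) : v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("foo", "bar", "baz");
        List<String> values = List.of("<a>foo</a>", "<div>bar</div>", "<p>baz</p>");
        List<String> result = filterRecord(names, values, Set.of("foo", "bar"));
        System.out.println(String.join(",", result)); // prints foo,bar,<p>baz</p>
    }
}
```

The regex keeps the sketch dependency-free; a fuller HTML-to-text conversion might also decode entities such as &amp; or turn block elements into whitespace, and the plugin itself may behave differently in those cases.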

Build

$ ./gradlew gem