Timestamp Hs filter plugin for Embulk

Convert string to timestamp at high speed.

Overview

  • Plugin type: filter

Configuration

  • default_timezone: Default timezone (string, default: UTC)
  • default_timestamp_format: Default timestamp format in SimpleDateFormat style (string, default: yyyy-MM-dd hh:mm:ss)
  • column_options: Per-column timestamp options, keyed by column name (hash, required)
    • timezone: Timezone of the column (string, default: default_timezone)
    • format: Timestamp format of the column (string, default: default_timestamp_format)
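
The options above might be combined as in the following minimal sketch. The column names created_at and updated_at are hypothetical, and the timezone value is only an illustration. Each column sets one option and falls back to the corresponding default for the other. Note that in Java's SimpleDateFormat, hh is the 12-hour clock field and HH is the 24-hour one; the patterns here reuse the hh form from this README, so check the result against your own data.

```yaml
filters:
  - type: timestamp_hs
    # Fallbacks for any column that does not set its own value
    default_timezone: 'UTC'
    default_timestamp_format: 'yyyy-MM-dd hh:mm:ss'
    column_options:
      # Hypothetical column: own timezone, default format
      created_at: {timezone: 'Asia/Tokyo'}
      # Hypothetical column: own format with milliseconds, default timezone
      updated_at: {format: 'yyyy-MM-dd hh:mm:ss.SSS'}
```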

Example

Input CSV (two timestamp columns per row):

2016-01-01 10:02:30.100,2016-01-01 10:02:30.111
2016-01-02 10:02:30.200,2016-01-02 10:02:30.211
2016-01-03 10:02:30.300,2016-01-03 10:02:30.311
2016-01-04 10:02:30.400,2016-01-04 10:02:30.411
2016-01-05 10:02:30.500,2016-01-05 10:02:30.511

Configuration:

in:
  type: file
  path_prefix: applog
  parser:
    type: csv
    delimiter: ","
    columns:
    - {name: standardTimestamp, type: timestamp, format: '%Y-%m-%d %H:%M:%S.%L'}
    - {name: highSpeedTimestamp, type: string}

filters:
  - type: timestamp_hs
    default_timezone: 'UTC'
    column_options:
      highSpeedTimestamp: {format: 'yyyy-MM-dd hh:mm:ss.SSS'}

Result:

+-----------------------------+------------------------------+
| standardTimestamp:timestamp | highSpeedTimestamp:timestamp |
+-----------------------------+------------------------------+
| 2016-01-01 10:02:30.100 UTC |  2016-01-01 10:02:30.111 UTC |
| 2016-01-02 10:02:30.200 UTC |  2016-01-02 10:02:30.211 UTC |
| 2016-01-03 10:02:30.300 UTC |  2016-01-03 10:02:30.311 UTC |
| 2016-01-04 10:02:30.400 UTC |  2016-01-04 10:02:30.411 UTC |
| 2016-01-05 10:02:30.500 UTC |  2016-01-05 10:02:30.511 UTC |
+-----------------------------+------------------------------+

Performance

The input file has 1,000,000 lines, each with one timestamp column.

  • OS: Windows 10
  • CPU: Core i5 2.67GHz
  • Embulk: 0.8.6

  • Embulk standard timestamp parsing: 240.910s
  • With timestamp_hs filter: 1.902s (about 127 times faster)

Install

$ embulk gem install embulk-filter-timestamp_hs