Skip to content
master
Switch branches/tags
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
pkg
 
 
src
 
 
 
 
 
 
 
 
 
 
 
 

README.md

UTF8Parquet output plugin for Embulk

** This is actually a clone of https://github.com/choplin/embulk-output-parquet/

  • We have added support for UTF-8 instead of binary fields.
  • It converts json objects to string parquet format.

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: no

Install

embulk gem install embulk-output-utf8parquet

Configuration

  • path_prefix: A prefix of output path. This is hadoop Path URI, and you can also include scheme and authority within this parameter. (string, required)
  • file_ext: An extension of output path. (string, default: .parquet)
  • sequence_format: (string, default: .%03d)
  • block_size: A block size of parquet file. (int, default: 134217728(128M))
  • page_size: A page size of parquet file. (int, default: 1048576(1M))
  • compression_codec: A compression codec. available: UNCOMPRESSED, SNAPPY, GZIP (string, default: UNCOMPRESSED)
  • default_timezone: Time zone of timestamp columns. This can be overwritten for each column using column_options
  • default_timestamp_format: Format of timestamp columns. This can be overwritten for each column using column_options
  • column_options: Specify timezone and timestamp format for each column. Format of this option is the same as the official csv formatter. See document.
  • extra_configurations: Add extra entries to Configuration which will be passed to ParquetWriter
  • overwrite: Overwrite if output files already exist. (default: fail if files exist)
  • addUTF8: If true, string columns are stored with OriginalType.UTF8 (boolean, default false)

Example

out:
  type: parquet
  path_prefix: file:///data/output

How to write parquet files into S3

out:
  type: parquet
  path_prefix: s3a://bucket/keys
  extra_configurations:
    fs.s3a.access.key: 'your_access_key'
    fs.s3a.secret.key: 'your_secret_access_key'

Build

$ ./gradlew gem

About

Expand official embulk-output-parquet plugin to support UTF8

Resources

License

Releases

No releases published

Packages

No packages published