Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,3 +6,6 @@
/classpath/
build/
.idea
*.iml
.ruby-version

19 changes: 11 additions & 8 deletions README.md
@@ -1,20 +1,24 @@
# Hdfs output plugin for Embulk
# Hdfs file output plugin for Embulk

A file output plugin for Embulk that writes files to HDFS.

## Overview

* **Plugin type**: file output
* **Load all or nothing**: no
* **Load all or nothing**: yes
* **Resume supported**: no
* **Cleanup supported**: no
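An "all or nothing" load is commonly built by staging output in a temporary location and publishing it with a single rename. The `working_path` and `output_path` options suggest this pattern, but the sketch below is an assumption, not the plugin's code: the class `WorkingPathCommitSketch` and its methods are hypothetical, and local `java.nio.file` stands in for Hadoop's `FileSystem`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch, not the plugin's code: tasks write only under a working
// directory, and a single directory rename publishes all files on commit.
// Local java.nio.file stands in for Hadoop's FileSystem here.
class WorkingPathCommitSketch {

    // Each task writes its files under the working directory.
    static void writeTaskFile(Path workingDir, String name, String content) throws IOException {
        Files.createDirectories(workingDir);
        Files.write(workingDir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
    }

    // Runs once, after every task has succeeded.
    static void commit(Path workingDir, Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            // Refuse to publish into an existing directory rather than mixing runs.
            throw new FileAlreadyExistsException(outputDir.toString());
        }
        Path parent = outputDir.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // One rename: readers see either no output or the complete output.
        Files.move(workingDir, outputDir, StandardCopyOption.ATOMIC_MOVE);
    }

    // Runs instead of commit() when any task fails, so nothing reaches the output path.
    static void abort(Path workingDir) throws IOException {
        if (!Files.exists(workingDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(workingDir)) {
            // Reverse order deletes children before their parent directories.
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("embulk-sketch");
        Path working = base.resolve("working");
        Path output = base.resolve("output");
        writeTaskFile(working, "out.000.00.txt", "a,b\n");
        writeTaskFile(working, "out.001.00.txt", "c,d\n");
        commit(working, output);
        try (Stream<Path> published = Files.list(output)) {
            System.out.println("working exists: " + Files.exists(working)
                    + ", published files: " + published.count());
        }
    }
}
```

The design choice here is that a failed run never touches `output_path` at all: `abort` only cleans the working directory. On HDFS, a directory rename within one namespace is a single metadata operation, which is what makes this pattern attractive there.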

## Configuration

- **config_files** list of paths to Hadoop's configuration files (array of strings, default: `[]`)
- **config** configuration parameters that overwrite those loaded from `config_files` (hash, default: `{}`)
- **output_path** the path where files are finally stored (string, default: `"/tmp/embulk.output.hdfs_output.%Y%m%d_%s"`)
- **working_path** the path where files are stored temporarily (string, default: `"/tmp/embulk.working.hdfs_output.%Y%m%d_%s"`)
- **path_prefix** prefix of target files (string, required)
- **file_ext** suffix of target files (string, required)
- **sequence_format** format for the sequence part of target file names (string, default: `'.%03d.%02d'`)
- **rewind_seconds** when `path_prefix` contains a date format (like `/tmp/embulk/%Y-%m-%d/out`), the format is expanded using the current time minus this number of seconds (int, default: `0`)
- **overwrite** overwrite files when files with the same names already exist (boolean, default: `false`)
  - *caution*: even if this property is `true`, idempotence is not guaranteed. To ensure idempotence, remove the output files before or after each run.
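The naming options above can be illustrated with a small sketch. It assumes a target name is `path_prefix` (date format expanded at now minus `rewind_seconds`), followed by `sequence_format` applied to a task index and a file index, then a `.` and `file_ext`. That joining rule, the two index arguments and the class `OutputFileNameSketch` are hypothetical, and only `%Y`, `%m` and `%d` are expanded here.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not the plugin's code: builds a target file name from
// path_prefix, rewind_seconds, sequence_format and file_ext.
class OutputFileNameSketch {

    // Expands only %Y, %m and %d; a real implementation would support full strftime.
    static String expandDate(String pattern, ZonedDateTime time) {
        return pattern
                .replace("%Y", time.format(DateTimeFormatter.ofPattern("yyyy")))
                .replace("%m", time.format(DateTimeFormatter.ofPattern("MM")))
                .replace("%d", time.format(DateTimeFormatter.ofPattern("dd")));
    }

    static String fileName(String pathPrefix, long rewindSeconds, ZonedDateTime now,
                           String sequenceFormat, int taskIndex, int fileIndex, String fileExt) {
        // rewind_seconds: the date format is expanded with "now minus rewind_seconds".
        ZonedDateTime base = now.minusSeconds(rewindSeconds);
        String prefix = expandDate(pathPrefix, base);
        // sequence_format, e.g. ".%03d.%02d" with (0, 0) gives ".000.00".
        String sequence = String.format(sequenceFormat, taskIndex, fileIndex);
        return prefix + sequence + "." + fileExt;
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.of(2015, 9, 1, 0, 30, 0, 0, ZoneOffset.UTC);
        // One hour of rewind moves the date back to 2015-08-31.
        // prints /tmp/embulk/hdfs_output/2015-08-31/out.000.00.txt
        System.out.println(fileName("/tmp/embulk/hdfs_output/%Y-%m-%d/out", 3600, now,
                ".%03d.%02d", 0, 0, "txt"));
    }
}
```

The example shows why `rewind_seconds` matters for date-partitioned paths: a job that starts just after midnight can still write into the previous day's directory.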

## Example

@@ -24,14 +28,13 @@ out:
  config_files:
    - /etc/hadoop/conf/core-site.xml
    - /etc/hadoop/conf/hdfs-site.xml
    - /etc/hadoop/conf/mapred-site.xml
    - /etc/hadoop/conf/yarn-site.xml
  config:
    fs.defaultFS: 'hdfs://hdp-nn1:8020'
    dfs.replication: 1
    mapreduce.client.submit.file.replication: 1
    fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
    fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
  path_prefix: '/tmp/embulk/hdfs_output/%Y-%m-%d/out'
  file_ext: 'txt'
  overwrite: true
  formatter:
    type: csv
    encoding: UTF-8
10 changes: 5 additions & 5 deletions build.gradle
@@ -12,7 +12,7 @@ configurations {
provided
}

-version = "0.1.2"
+version = "0.2.0"

sourceCompatibility = 1.7
targetCompatibility = 1.7
@@ -22,7 +22,7 @@ dependencies {
provided "org.embulk:embulk-core:0.7.0"
// compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
compile 'org.apache.hadoop:hadoop-client:2.6.0'
-compile 'com.google.guava:guava:14.0'
+compile 'com.google.guava:guava:15.0'
testCompile "junit:junit:4.+"
}

@@ -57,9 +57,9 @@ task gemspec {
Gem::Specification.new do |spec|
spec.name = "${project.name}"
spec.version = "${project.version}"
-spec.authors = ["takahiro.nakayama"]
-spec.summary = %[Hdfs output plugin for Embulk]
-spec.description = %[Dumps records to Hdfs.]
+spec.authors = ["Civitaspo"]
+spec.summary = %[Hdfs file output plugin for Embulk]
+spec.description = %[Stores files on Hdfs.]
spec.email = ["civitaspo@gmail.com"]
spec.licenses = ["MIT"]
spec.homepage = "https://github.com/civitaspo/embulk-output-hdfs"
2 changes: 1 addition & 1 deletion lib/embulk/output/hdfs.rb
@@ -1,3 +1,3 @@
Embulk::JavaPlugin.register_output(
-"hdfs", "org.embulk.output.HdfsOutputPlugin",
+"hdfs", "org.embulk.output.hdfs.HdfsFileOutputPlugin",
File.expand_path('../../../../classpath', __FILE__))
219 changes: 0 additions & 219 deletions src/main/java/org/embulk/output/HdfsOutputPlugin.java

This file was deleted.
