Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,3 +7,5 @@
build/
.idea
*.iml
+.ruby-version

13 changes: 12 additions & 1 deletion README.md
@@ -14,6 +14,7 @@ Read files on Hdfs.
- **config** overwrites configuration parameters (hash, default: `{}`)
- **input_path** file path on Hdfs. you can use glob and Date format like `%Y%m%d/%s`.
- **rewind_seconds** When you use Date format in input_path property, the format is executed by using the time which is Now minus this property.
+- **partition** when this is true, partition input files and increase task count. (default: `true`)

## Example

@@ -24,12 +25,13 @@ in:
- /opt/analytics/etc/hadoop/conf/core-site.xml
- /opt/analytics/etc/hadoop/conf/hdfs-site.xml
config:
-fs.defaultFS: 'hdfs://hdp-nn1:8020'
+fs.defaultFS: 'hdfs://hadoop-nn1:8020'
dfs.replication: 1
fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
input_path: /user/embulk/test/%Y-%m-%d/*
rewind_seconds: 86400
+partition: true
decoders:
- {type: gzip}
parser:
@@ -50,6 +52,15 @@ in:
- {name: c3, type: long}
```

+## Note
+- the feature of the partition supports only 3 line terminators.
+- `\n`
+- `\r`
+- `\r\n`
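
To illustrate why the line terminator matters for partitioning: when a file is cut at an arbitrary byte offset, each partition boundary has to be moved forward to the start of the next line so that no record is split in two. The following is only a minimal sketch of that idea, not the code in this pull request. `SplitBoundarySketch` and `alignToNextLine` are hypothetical names, and the sketch scans an in-memory byte array instead of an HDFS input stream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of embulk-input-hdfs.
public class SplitBoundarySketch {

    /**
     * Returns the index of the first byte after the next line terminator
     * found at or after {@code offset}. Recognizes "\n", "\r" and "\r\n".
     * Returns {@code data.length} when no terminator follows.
     */
    public static int alignToNextLine(byte[] data, int offset) {
        for (int i = offset; i < data.length; i++) {
            if (data[i] == '\n') {
                return i + 1;
            }
            if (data[i] == '\r') {
                // Treat "\r\n" as a single terminator.
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    return i + 2;
                }
                return i + 1;
            }
        }
        return data.length;
    }

    public static void main(String[] args) {
        // One record per line, using all three supported terminators.
        byte[] data = "a,1\r\nb,2\nc,3\rd,4".getBytes(StandardCharsets.US_ASCII);
        int naiveCut = data.length / 2;               // arbitrary byte offset
        int aligned = alignToNextLine(data, naiveCut); // start of the next full line
        System.out.println("naive cut: " + naiveCut + ", aligned cut: " + aligned);
    }
}
```

A terminator outside these three (for example a Unicode line separator) would not be detected by a scan like this, which is one plausible reading of the restriction stated in the note.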

+## The Reference Implementation
+- [hito4t/embulk-input-filesplit](https://github.com/hito4t/embulk-input-filesplit)

## Build

```
4 changes: 2 additions & 2 deletions build.gradle
@@ -12,7 +12,7 @@ configurations {
provided
}

-version = "0.0.3"
+version = "0.1.0"

sourceCompatibility = 1.7
targetCompatibility = 1.7
@@ -22,7 +22,7 @@ dependencies {
provided "org.embulk:embulk-core:0.7.0"
// compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
compile 'org.apache.hadoop:hadoop-client:2.6.0'
-compile 'com.google.guava:guava:14.0'
+compile 'com.google.guava:guava:15.0'
testCompile "junit:junit:4.+"
}

2 changes: 1 addition & 1 deletion lib/embulk/input/hdfs.rb
@@ -1,3 +1,3 @@
Embulk::JavaPlugin.register_input(
-"hdfs", "org.embulk.input.HdfsFileInputPlugin",
+"hdfs", "org.embulk.input.hdfs.HdfsFileInputPlugin",
File.expand_path('../../../../classpath', __FILE__))
231 changes: 0 additions & 231 deletions src/main/java/org/embulk/input/HdfsFileInputPlugin.java

This file was deleted.
