civitaspo
Owner

When loading small files with ScatterExecutor, this plugin sometimes creates many empty files.
This can increase the namenode's heap usage, because the namenode keeps metadata for every file in memory.
So this change makes the plugin skip creating a file when no data is added to it.
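
For illustration only, here is a minimal Python sketch of the idea, not the plugin's actual code. `LazyFileOutput`, `run_tasks`, and `demo` are hypothetical names, and a local temp directory stands in for HDFS. The output file is opened on the first non-empty write, so a task that receives no data never creates a file.

```python
import os
import tempfile


class LazyFileOutput:
    """Hypothetical writer that opens its file only on the first non-empty write."""

    def __init__(self, path):
        self.path = path
        self._fh = None  # no file exists until data arrives

    def write(self, data: bytes) -> None:
        if not data:
            return  # ignore empty chunks so they never trigger file creation
        if self._fh is None:
            self._fh = open(self.path, "wb")
        self._fh.write(data)

    def close(self) -> bool:
        """Close the file if one was opened; return True if a file was created."""
        if self._fh is None:
            return False
        self._fh.close()
        self._fh = None
        return True


def run_tasks(out_dir, chunks_per_task):
    """Run one writer per task and return the names of files actually created."""
    created = []
    for i, chunks in enumerate(chunks_per_task):
        out = LazyFileOutput(os.path.join(out_dir, f"file_{i:03d}.csv"))
        for chunk in chunks:
            out.write(chunk)
        if out.close():
            created.append(os.path.basename(out.path))
    return created


def demo():
    # Three output tasks, but only the first one receives any data.
    with tempfile.TemporaryDirectory() as out_dir:
        created = run_tasks(out_dir, [[b"a,b\n1,2\n"], [], [b""]])
        assert sorted(os.listdir(out_dir)) == created
        return created
```

In this sketch `demo()` leaves a single file on disk even though three tasks ran, which is the same shape as the log in this PR: 10 output tasks, one uploaded file.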

The log is below. Because a test is difficult to write, I added only an example. (TODO: add a test.)

[embulk-output-hdfs] embulk run example/config_avoid_create_0byte_file.yml -Ilib
2016-04-27 10:50:00.436 +0900: Embulk v0.8.8
2016-04-27 10:50:02.495 +0900 [INFO] (0001:transaction): Loaded plugin embulk/output/hdfs from a load path
2016-04-27 10:50:02.578 +0900 [INFO] (0001:transaction): Listing local files at directory 'example' filtering filename by prefix 'data'
2016-04-27 10:50:02.587 +0900 [INFO] (0001:transaction): Loading files [example/data.csv]
2016-04-27 10:50:02.695 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 10 = input tasks 1 * 10
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2016-04-27 10:50:03.421 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2016-04-27 10:50:03.631 +0900 [INFO] (0016:task-0000): Uploading '/tmp/embulk-output-hdfs_example/file_000.00.csv'
2016-04-27 10:50:03.633 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2016-04-27 10:50:03.639 +0900 [INFO] (main): Committed.
2016-04-27 10:50:03.639 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"example/data.csv"},"out":{}}

@civitaspo
Owner Author

TODO: when users set header_line: true on the formatter, this plugin still creates files that contain only a header line.

@coveralls
coveralls commented Apr 27, 2016

Coverage increased (+2.9%) to 80.46% when pulling 29283f9 on avoid_create_0byte_files into 6fa4524 on master.

@civitaspo civitaspo merged commit a5a0034 into master Apr 27, 2016
@civitaspo civitaspo deleted the avoid_create_0byte_files branch April 27, 2016 02:02