
Document how the Hadoop Reducer implementation impacts the DoFn#process() semantics
tzolov committed Jun 28, 2012
1 parent 25f3280 commit 1d020be259b6b23b1a5ebd0613637f50bf291dc2
Showing with 12 additions and 2 deletions.
  1. +12 −2 src/main/java/com/cloudera/crunch/DoFn.java
@@ -52,8 +52,18 @@ public void configure(Configuration conf) {
/**
* Processes the records from a {@link PCollection}.
*
- * @param input The input record
- * @param emitter The emitter to send the output to
+ * <br/>
+ * <br/>
+ * <b>Note:</b> Crunch can reuse a single input record object whose content
+ * changes on each {@link #process(Object, Emitter)} method call. This
+ * functionality is imposed by Hadoop's <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reducer.html">Reducer</a> implementation:
+ * <i>The framework will reuse the key and value objects that are passed into the reduce, therefore the application
+ * should clone the objects they want to keep a copy of.</i>
+ *
+ * @param input
+ * The input record.
+ * @param emitter
+ * The emitter to send the output to
*/
public abstract void process(S input, Emitter<T> emitter);
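The note added in this hunk is easiest to see with a small simulation. The sketch below does not use Crunch or Hadoop; `Record`, `Fn`, and `drive` are hypothetical stand-ins for a mutable input type, a `process`-style callback, and a framework loop that refills one shared object on every call. It only illustrates why a reference kept across calls should point to a copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical mutable record, standing in for a reusable input object. */
    static final class Record {
        private String value;
        void set(String value) { this.value = value; }
        String get() { return value; }
        /** Returns an independent copy that later set() calls cannot change. */
        Record copy() {
            Record r = new Record();
            r.set(value);
            return r;
        }
    }

    /** Simplified stand-in for a process(input, emitter)-style callback. */
    interface Fn {
        void process(Record input);
    }

    /** Mimics the framework: one Record instance is refilled for every call. */
    static void drive(List<String> data, Fn fn) {
        Record shared = new Record();
        for (String s : data) {
            shared.set(s);
            fn.process(shared);
        }
    }

    /** Buggy: stores the shared reference, so every entry ends up identical. */
    static List<String> collectWithoutCopy() {
        List<Record> kept = new ArrayList<>();
        drive(Arrays.asList("a", "b", "c"), kept::add);
        return values(kept);
    }

    /** Safe: copies the input before keeping it beyond the current call. */
    static List<String> collectWithCopy() {
        List<Record> kept = new ArrayList<>();
        drive(Arrays.asList("a", "b", "c"), input -> kept.add(input.copy()));
        return values(kept);
    }

    private static List<String> values(List<Record> records) {
        List<String> out = new ArrayList<>();
        for (Record r : records) {
            out.add(r.get());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectWithoutCopy()); // [c, c, c]
        System.out.println(collectWithCopy());    // [a, b, c]
    }
}
```

In a real DoFn the copy step depends on the input type; the `copy()` method here is an assumption made for the illustration, not part of the Crunch API.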
