Document how the Hadoop Reducer implementation impacts the DoFn#process() semantics
commit 1d020be259b6b23b1a5ebd0613637f50bf291dc2 1 parent 25f3280
@tzolov tzolov authored
Showing with 12 additions and 2 deletions.
  1. +12 −2 src/main/java/com/cloudera/crunch/DoFn.java
14 src/main/java/com/cloudera/crunch/DoFn.java
@@ -52,8 +52,18 @@ public void configure(Configuration conf) {
/**
* Processes the records from a {@link PCollection}.
*
- * @param input The input record
- * @param emitter The emitter to send the output to
+ * <br/>
+ * <br/>
+ * <b>Note:</b> Crunch can reuse a single input record object whose content
+ * changes on each {@link #process(Object, Emitter)} method call. This
+ * functionality is imposed by Hadoop's <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reducer.html">Reducer</a> implementation:
+ * <i>The framework will reuse the key and value objects that are passed into the reduce, therefore the application
+ * should clone the objects they want to keep a copy of.</i>
+ *
+ * @param input
+ * The input record.
+ * @param emitter
+ * The emitter to send the output to
*/
public abstract void process(S input, Emitter<T> emitter);
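
The object-reuse note in the Javadoc above can be illustrated with a small standalone sketch. It does not use Crunch or Hadoop: `ReuseDemo`, `MutableRecord`, `keepReferences`, and `keepCopies` are hypothetical names invented for this example, and the loop only imitates a framework that overwrites one record instance between calls. Assuming that behavior, keeping the reference loses earlier values, while copying preserves them.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    /** Hypothetical mutable record, standing in for a reused input object. */
    static final class MutableRecord {
        String value;

        MutableRecord(String value) {
            this.value = value;
        }

        MutableRecord copy() {
            return new MutableRecord(value);
        }
    }

    /** Keeps the reference on every call, so all entries alias one object. */
    public static List<String> keepReferences(String[] inputs) {
        MutableRecord reused = new MutableRecord(null);
        List<MutableRecord> kept = new ArrayList<>();
        for (String s : inputs) {
            reused.value = s;   // imitates the framework overwriting the record
            kept.add(reused);   // stores the same instance again
        }
        return values(kept);
    }

    /** Copies the record on every call, so each entry keeps its own value. */
    public static List<String> keepCopies(String[] inputs) {
        MutableRecord reused = new MutableRecord(null);
        List<MutableRecord> kept = new ArrayList<>();
        for (String s : inputs) {
            reused.value = s;
            kept.add(reused.copy());
        }
        return values(kept);
    }

    private static List<String> values(List<MutableRecord> records) {
        List<String> out = new ArrayList<>();
        for (MutableRecord r : records) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b", "c"};
        System.out.println(keepReferences(inputs)); // [c, c, c]
        System.out.println(keepCopies(inputs));     // [a, b, c]
    }
}
```

In a real `DoFn`, the equivalent precaution would be to copy any input that has to outlive the current `process` call; how to copy depends on the record type in use.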