Skip to content
Chris Lu edited this page Nov 11, 2015 · 3 revisions

Reduce() is another important function for MapReduce.

Reduce(), LocalReduce(), MergeReduce()

A Reduce() step is divided into 2 steps:

  1. Reduce locally for each dataset shard.
  2. Merge results from step 1, and reduced to a single value.

Here are the actual Reduce() source code:

func (d *Dataset) Reduce(f interface{}) (ret *Dataset) {
	return d.LocalReduce(f).MergeReduce(f)
}

Reduce(), LocalReduce(), MergeReduce() accept the same kind of function as parameter:

  func(Value, Value)Value  // Value is any user defined type

When this reducer function runs, it takes 2 values of "Value" type, and returns 1 value of the same type.

The initial value would be the zero value for type "Value".

ReduceByKey(), LocalReduceByKey()

Here are the actual ReduceByKey() source code:

func (d *Dataset) ReduceByKey(f interface{}) *Dataset {
	return d.LocalSort(nil).LocalReduceByKey(f).MergeSorted(nil).LocalReduceByKey(f)
}

It will perform these steps:

  1. LocalSort() sorts the (Key,Value) pairs by Key on each local dataset shard
  2. LocalReduceByKey() locally reduce Values from the same Key, into (Key, Value) pairs. Now the Key within each local dataset shard will be unique.
  3. MergeSorted() merges each locally sorted reduced results
  4. LocalReduceByKey() runs local reduce again, but this time there is only one dataset shard after the merge.

ReduceByKey(), LocalReduceByKey() accept still the same kind of function as parameter:

  func(Value, Value)Value  // Value is any user defined type

ReduceByUserDefinedKey()

You may notice ReduceByKey() is using nil, which is a default lessThanFunc to compare keys. It will work when keys are integer, string, or float, but not work for user defined key type. ReduceByUserDefinedKey() is to customize lessThanFunc for user defined keys

func (d *Dataset) ReduceByUserDefinedKey(lessThanFunc interface{}, reducer interface{}) *Dataset {
	return d.LocalSort(lessThanFunc).LocalReduceByKey(reducer).MergeSorted(lessThanFunc).LocalReduceByKey(reducer)
}

Here are the function types:

  lessThanFunc: func(Key, Key) bool        // Key is any user defined type for key
  reducer:      func(Value, Value) Value   // Value is any user defined type for value
Clone this wiki locally