This repository has been archived by the owner on Oct 18, 2019. It is now read-only.

Clean up and simplify Go pipeline example. #31

Merged
merged 1 commit into from Dec 3, 2015

Conversation

matttproud
Contributor

This commit cleans up a few things about the Go pipeline:

  1. idiom: use `sync.WaitGroup` instead of atomic counter for
     rendezvous.

  2. throughput: switch to buffered channels since message handout
     needn't be synchronous.

  3. cruft removal: removed dead/unused functions.

  4. throughput: kick off reading from all buffers in separate
     goroutines, since the scheduler bounds how many run in parallel
     by the maximum number of processors.

  5. simplify/throughput: drop pre-processing of tallies into maps
     and replace it with a channel that merely contains the
     neighborhood name.  Neighborhood processing occurs in-flight
     with the mappers.

  6. throughput: the regexp/substring match can be made more
     efficient by rejecting up front any candidate whose value is
     too short to contain the needle.

  7. readability: separate mapper and reducers into individual
     routines.

  8. tidying: remedy lint errors.

  9. throughput: swap `strings.Split` for `strings.SplitN` since the
     former makes two passes over the input (one to count the
     separators, one to split) while the latter makes one.
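
As a rough illustration of items 1, 2, 5, and 7 (not the code from this commit): the sketch below assumes each input line is comma-separated with the neighborhood name in the first field. The names `mapper`, `reducer`, `countNeighborhoods`, and the buffer size of 64 are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// mapper sends the neighborhood name (assumed to be the first
// comma-separated field) of every line to out, then signals the
// WaitGroup that it has finished.
func mapper(lines []string, out chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, line := range lines {
		out <- strings.SplitN(line, ",", 2)[0]
	}
}

// reducer tallies neighborhood names until the channel is closed.
func reducer(in <-chan string) map[string]int {
	tallies := make(map[string]int)
	for name := range in {
		tallies[name]++
	}
	return tallies
}

// countNeighborhoods runs one mapper goroutine per shard and a single
// reducer in the calling goroutine.
func countNeighborhoods(shards [][]string) map[string]int {
	// Buffered so mappers need not wait on the reducer for each send.
	names := make(chan string, 64)

	var wg sync.WaitGroup
	for _, shard := range shards {
		wg.Add(1)
		go mapper(shard, names, &wg)
	}

	// Close the channel once every mapper is done so the reducer's
	// range loop terminates.
	go func() {
		wg.Wait()
		close(names)
	}()

	return reducer(names)
}

func main() {
	shards := [][]string{
		{"Astoria,first line", "Harlem,second line"},
		{"Astoria,third line"},
	}
	fmt.Println(countNeighborhoods(shards))
}
```

The `sync.WaitGroup` is the rendezvous point: the channel is closed only after every mapper has called `Done`, which replaces polling or decrementing an atomic counter.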

Overall this change ranges from performance-neutral to an
improvement. In my local runs, it shaved about 2.5 seconds off the
substring run.
dimroc added a commit that referenced this pull request Dec 3, 2015
Clean up and simplify Go pipeline example.
@dimroc dimroc merged commit c12e228 into dimroc:master Dec 3, 2015
@dimroc
Owner

dimroc commented Dec 3, 2015

Great changes. Thanks! 👍
