Description: brain-dead simple parallel processing for ruby
Homepage:
Clone URL: git://github.com/ahoward/forkoff.git
ahoward (author)
Mon Oct 12 09:34:40 -0700 2009
commit  7671b5abf108e8107304b16f80b8395b088ffdff
tree    959cc80c0bfdcaa6c3665cfe3d78f4bdcabb4167
parent  b63f58a16dd3c12b13177af1e1c51af05278322e
NAME
 
  forkoff
 
SYNOPSIS
 
  brain-dead simple parallel processing for ruby
 
URI
 
  http://rubyforge.org/projects/codeforpeople
  http://github.com/ahoward/forkoff
 
INSTALL
 
  gem install forkoff
 
DESCRIPTION
 
  forkoff works on any enumerable object: it runs the given block once per
  element, each in a child process, and collects the results. forkoff limits
  the number of concurrent child processes, which is 2 by default.
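
  the basic mechanism can be sketched in plain ruby. this is only an
  illustration under stated assumptions, not forkoff's actual source: the
  helper name run_in_child is hypothetical, and the sketch handles one element
  at a time (no parallelism yet). it forks a child, runs the block there, and
  ships the marshaled return value back to the parent over a pipe.

```ruby
# hypothetical helper, not part of forkoff: run the block for one element
# in a child process and return its result to the parent via a pipe
def run_in_child(item)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    result = yield(item)                  # the block runs in the child
    writer.write(Marshal.dump(result))    # serialize the return value
    writer.close
    exit!(0)                              # leave without running at_exit handlers
  end

  writer.close
  data = reader.read                      # returns once the child closes its end
  reader.close
  Process.wait(pid)                       # reap the child
  Marshal.load(data)
end

squares = (0..3).map { |i| run_in_child(i) { |n| n ** 2 } }
p squares
```

  because the result crosses a process boundary, the block's return value has
  to be something Marshal can dump in this sketch.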
 
SAMPLES
 
  
  <========< samples/a.rb >========>
 
  ~ > cat samples/a.rb
 
    # forkoff makes it trivial to do parallel processing with ruby, the following
    # prints out each word in a separate process
    #
    
      require 'forkoff'
    
      %w( hey you ).forkoff!{|word| puts "#{ word } from #{ Process.pid }"}
 
  ~ > ruby samples/a.rb
 
    hey from 7907
    you from 7908
 
 
  <========< samples/b.rb >========>
 
  ~ > cat samples/b.rb
 
    # for example, this takes only 4 seconds or so to complete (8 iterations
    # running in two processes = twice as fast)
    #
    
      require 'forkoff'
    
      a = Time.now.to_f
    
      results =
        (0..7).forkoff do |i|
          sleep 1
          i ** 2
        end
    
      b = Time.now.to_f
    
      elapsed = b - a
    
      puts "elapsed: #{ elapsed }"
      puts "results: #{ results.inspect }"
 
  ~ > ruby samples/b.rb
 
    elapsed: 4.19184589385986
    results: [0, 1, 4, 9, 16, 25, 36, 49]
 
 
  <========< samples/c.rb >========>
 
  ~ > cat samples/c.rb
 
    # forkoff does *NOT* spawn processes in batches, waiting for each batch to
    # complete. rather, it keeps a certain number of processes busy until all
    # results have been gathered. in other words the following will ensure that 3
    # processes are running at all times, until the list is complete. note that
    # the following will take about 3 seconds to run (3 sets of 3 @ 1 second).
    #
    
    require 'forkoff'
    
    pid = Process.pid
    
    a = Time.now.to_f
    
    pstrees =
      %w( a b c d e f g h i ).forkoff! :processes => 3 do |letter|
        sleep 1
        { letter => ` pstree -l 2 #{ pid } ` }
      end
    
    
    b = Time.now.to_f
    
    puts
    puts "pid: #{ pid }"
    puts "elapsed: #{ b - a }"
    puts
    
    require 'yaml'
    
    pstrees.each do |pstree|
      puts pstree.to_yaml  # Kernel#y is only defined inside irb on newer rubies
    end
 
  ~ > ruby samples/c.rb
 
    
    pid: 7922
    elapsed: 3.37899208068848
    
    ---
    a: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |-+- 07923 ahoward ruby -Ilib samples/c.rb
       |-+- 07924 ahoward (ruby)
       \-+- 07925 ahoward ruby -Ilib samples/c.rb
    
    ---
    b: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |-+- 07923 ahoward ruby -Ilib samples/c.rb
       |-+- 07924 ahoward ruby -Ilib samples/c.rb
       \-+- 07925 ahoward ruby -Ilib samples/c.rb
    
    ---
    c: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |-+- 07923 ahoward ruby -Ilib samples/c.rb
       |-+- 07924 ahoward (ruby)
       \-+- 07925 ahoward ruby -Ilib samples/c.rb
    
    ---
    d: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |-+- 07932 ahoward ruby -Ilib samples/c.rb
       |--- 07933 ahoward ruby -Ilib samples/c.rb
       \--- 07934 ahoward ruby -Ilib samples/c.rb
    
    ---
    e: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |--- 07932 ahoward (ruby)
       |-+- 07933 ahoward ruby -Ilib samples/c.rb
       \-+- 07934 ahoward (ruby)
    
    ---
    f: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |--- 07932 ahoward (ruby)
       |-+- 07933 ahoward ruby -Ilib samples/c.rb
       \-+- 07934 ahoward ruby -Ilib samples/c.rb
    
    ---
    g: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |-+- 07941 ahoward ruby -Ilib samples/c.rb
       |--- 07942 ahoward ruby -Ilib samples/c.rb
       \--- 07943 ahoward ruby -Ilib samples/c.rb
    
    ---
    h: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |-+- 07941 ahoward (ruby)
       |-+- 07942 ahoward ruby -Ilib samples/c.rb
       \--- 07943 ahoward ruby -Ilib samples/c.rb
    
    ---
    i: |
      -+- 07922 ahoward ruby -Ilib samples/c.rb
       |--- 07942 ahoward (ruby)
       \-+- 07943 ahoward ruby -Ilib samples/c.rb
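
  the "keep n processes busy" behaviour described in samples/c.rb can be
  approximated as follows. this is a rough sketch, not forkoff's
  implementation: parallel_map is a hypothetical name, and using one thread
  per worker slot pulling from a shared Queue is an assumption made for
  illustration. each thread forks one child per job, so a slot picks up the
  next job as soon as its child finishes instead of waiting for a whole batch.

```ruby
require 'thread'

# hypothetical helper, not part of forkoff's api
def parallel_map(items, processes = 2, &block)
  items = items.to_a
  jobs  = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  processes.times { jobs << :done }        # one stop marker per worker thread

  results = Array.new(items.size)

  workers = Array.new(processes) do
    Thread.new do
      while (job = jobs.pop) != :done
        item, index = job
        reader, writer = IO.pipe

        pid = fork do
          reader.close
          writer.write(Marshal.dump(block.call(item)))
          writer.close
          exit!(0)
        end

        writer.close
        # read exactly one marshaled object; this does not wait for EOF, so a
        # sibling child that inherited this pipe cannot hold the read up
        results[index] = Marshal.load(reader)
        reader.close
        Process.wait(pid)
      end
    end
  end

  workers.each(&:join)
  results                                  # same order as the input
end

letters = parallel_map(%w( a b c d e f g h i ), 3) { |letter| letter.upcase }
p letters
```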
    
 
 
  <========< samples/d.rb >========>
 
  ~ > cat samples/d.rb
 
    # forkoff supports two strategies for reading the result from the child: via
    # pipe (the default) or via file. select one with the :strategy option.
    #
    
      require 'forkoff'
    
      %w( hey you guys ).forkoff :strategy => :file do |word|
        puts "#{ word } from #{ Process.pid }"
      end
 
  ~ > ruby samples/d.rb
 
    hey from 7953
    you from 7954
    guys from 7955
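
  for comparison with the pipe approach, a file-based hand-off could look
  roughly like this. it is a sketch of the general idea only, not a
  description of what forkoff does internally: run_in_child_via_file and the
  'forkoff-sketch' tempfile prefix are made up for the example. the child
  writes its marshaled result to a temporary file and the parent reads it once
  the child has exited.

```ruby
require 'tempfile'

# hypothetical helper, not part of forkoff: pass the child's result back
# through a temporary file instead of a pipe
def run_in_child_via_file(item)
  file = Tempfile.new('forkoff-sketch')
  file.close                               # keep the path, drop our handle

  pid = fork do
    result = yield(item)
    File.open(file.path, 'wb') { |f| f.write(Marshal.dump(result)) }
    exit!(0)                               # skip finalizers so the file survives
  end

  Process.wait(pid)                        # the file is complete once the child exits
  Marshal.load(File.binread(file.path))
ensure
  file.unlink if file                      # clean up in the parent
end

words = %w( hey you guys ).map { |word| run_in_child_via_file(word) { |w| w.reverse } }
p words
```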
 
 
 
HISTORY
  1.1.0
    - move to a model with one work queue and signals sent from consumers to
      producer to notify ready state. this lets smaller jobs race through a
      single process even while a larger job may have one sub-process bound up.
      incorporates a fix from http://github.com/fredrikj/forkoff for a problem
      where some processes would lag behind when jobs didn't have similar
      execution times.
 
  1.0.0
    - move to github
 
  0.0.4
    - code re-org
    - add :strategy option
    - default number of processes is 2, not 8
 
  0.0.1
 
    - updated to use producer threads pushing onto a SizedQueue for each consumer
      channel. in this way the producers do not build up a massive parallel data
      structure but provide data to the consumers only as fast as they can fork
      and process it. basically for a 4 process run you'll end up with 4
      channels of size 1 between 4 producers and 4 consumers, each consumer is a
      thread popping off jobs, forking, and yielding results.
 
    - removed use of Queue for capturing the output. now it's simply an array
      of arrays which removed some sync overhead.
 
    - you can configure the number of processes globally with
 
        Forkoff.default['processes'] = 4
 
    - you can now pass either an options hash
 
        forkoff( :processes => 2 ) ...
 
      or plain vanilla number
 
        forkoff( 2 ) ...
 
      to the forkoff call
 
    - default number of processes is 8, not 2
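
    the producer/consumer layout described for 0.0.1 can be sketched like
    this. it is an illustration only and not the 0.0.1 source: channel_map is
    a hypothetical name, and handing elements to producers round-robin is an
    assumption. each channel is a SizedQueue of size 1, so a producer blocks
    until its consumer has taken the previous job.

```ruby
require 'thread'

# hypothetical helper, not part of forkoff
def channel_map(items, processes = 2, &block)
  items    = items.to_a
  results  = Array.new(items.size)
  channels = Array.new(processes) { SizedQueue.new(1) }   # one slot per channel

  producers = channels.each_with_index.map do |channel, offset|
    Thread.new do
      # assumed round-robin split: this producer feeds every processes-th item
      offset.step(items.size - 1, processes) do |index|
        channel.push([items[index], index])               # blocks while the slot is full
      end
      channel.push(:done)
    end
  end

  consumers = channels.map do |channel|
    Thread.new do
      while (job = channel.pop) != :done
        item, index = job
        reader, writer = IO.pipe
        pid = fork do
          reader.close
          writer.write(Marshal.dump(block.call(item)))
          writer.close
          exit!(0)
        end
        writer.close
        results[index] = Marshal.load(reader)             # read one marshaled object
        reader.close
        Process.wait(pid)
      end
    end
  end

  (producers + consumers).each(&:join)
  results
end

squares = channel_map(0..7, 4) { |i| i ** 2 }
p squares
```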
        
 
  0.0.0
 
    initial version