<blockquote>
Simple things should be simple. Complex things should be possible.
<br/><em>Alan Kay</em>
</blockquote>

## Installation

To run this tutorial you need to install streamline.js and its companion [ez-streams](https://github.com/Sage/ez-streams) library, which is the default streaming library for streamline.

``` sh
npm install -g streamline
npm install ez-streams
```

## [Hello world!](tuto1-hello._js)

Let us start with streamline's version of node's hello world:

```javascript
"use strict";
var ez = require('ez-streams');

ez.devices.http.server(function(request, response, _) {
    response.writeHead(200, {
        'Content-Type': 'text/plain; charset=utf8'
    });
    response.end("Hello world!");

}).listen(_, 1337);
console.log('Server running at http://127.0.0.1:1337/');
```

To run it, save this source as `tuto1._js` and start it with:

``` sh
_node tuto1
```

Now, point your browser to http://127.0.0.1:1337/. You should get a `"Hello world!"` message.

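If you prefer the command line, you can also check the server with `curl`:

``` sh
curl http://127.0.0.1:1337/
```
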
This code is very close to the original version. Just a few differences:

* The server is created with streamline's `ez.devices.http.server` rather than with node's `http.createServer` call.
* The server callback takes an additional `_` parameter. This parameter is streamline's _callback stub_. This is the magic token that we will pass to all asynchronous calls that expect a node.js callback.
* The `request` and `response` parameters are streamline wrappers around node's request and response streams. These wrappers don't make a difference for now but they will make it easier to read and write from these streams later.
* `listen` is called with an `_` argument. This is because `listen` is an asynchronous call. The streamline version prints the `'Server running ...'` message after receiving the `listening` event, while the original node version prints the message without waiting for the `listening` event. This is a really minor difference though, and streamline makes it easy to avoid the wait if you don't care: just call `listen` as a _future_ by passing `!_` instead of `_` (see the sketch after this list). If you're discovering _streamline.js_, don't worry about all this now. I'll talk more about futures at the end of this tutorial.
* The source file extension is `._js` instead of `.js` and you run it with `_node` rather than `node`. This is because _streamline.js_ extends the JavaScript language and the code needs to be transformed before being passed to the JavaScript engine (note: `_node` has a `--cache` option which speeds up load time by short-circuiting the transformation when files don't change).

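Here is what the non-blocking variant hinted at above could look like. This is just a sketch based on the `!_` convention; futures are explained at the end of this tutorial:

```javascript
// Sketch: pass !_ instead of _ so that listen returns a future and we
// do not wait for the 'listening' event before logging.
ez.devices.http.server(function(request, response, _) {
    response.writeHead(200, {
        'Content-Type': 'text/plain; charset=utf8'
    });
    response.end("Hello world!");
}).listen(!_, 1337);
console.log('Server starting at http://127.0.0.1:1337/');
```
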
## [Setting up a simple search form](tuto2-form._js)

Now, we are going to be a bit more ambitious and turn our page into a simple search form:

```javascript
"use strict";
var ez = require('ez-streams');
var url = require('url');
var qs = require('querystring');

var begPage = '<html><head><title>My Search</title></head><body>' + //
    '<form action="/">Search: ' + //
    '<input name="q" value="{q}"/>' + //
    '<input type="submit"/>' + //
    '</form><hr/>';
var endPage = '<hr/>generated in {ms}ms</body></html>';

ez.devices.http.server(function(request, response, _) {
    var query = qs.parse(url.parse(request.url).query),
        t0 = new Date();
    response.writeHead(200, {
        'Content-Type': 'text/html; charset=utf8'
    });
    response.write(_, begPage.replace('{q}', query.q || ''));
    response.write(_, search(_, query.q));
    response.write(_, endPage.replace('{ms}', new Date() - t0));
    response.end();
}).listen(_, 1337);
console.log('Server running at http://127.0.0.1:1337/');

function search(_, q) {
    return "NIY: " + q;
}
```

Nothing difficult here. We are using node's `url` and `querystring` helper modules to parse the URL and its query string component. We are now writing the response in 3 chunks with the asynchronous `write` method of the wrapped response stream.

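For example, for a request to `/?q=hello`, the parsing boils down to this (standard node behavior, shown here only to make the two helper calls concrete):

```javascript
var url = require('url');
var qs = require('querystring');

// url.parse('/?q=hello').query === 'q=hello'
// qs.parse('q=hello') gives { q: 'hello' }
var query = qs.parse(url.parse('/?q=hello').query);
console.log(query.q); // 'hello'
```
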
We are going to implement the `search` function next. For now we are just returning a `NIY` message. Note that we pass `_` as the first parameter to our `search` function. We need this parameter because `search` will be an asynchronous function.

## [Calling Google](tuto3-google._js)

Now we are going to implement the `search` function by passing our search string to Google. Here is the code:

```javascript
function search(_, q) {
    if (!q || /^\s*$/.test(q)) return "Please enter a text to search";
    // pass it to Google
    var json = ez.devices.http.client({
        url: 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' + q,
        proxy: process.env.http_proxy
    }).end().response(_).checkStatus(200).readAll(_);
    // parse JSON response
    var parsed = JSON.parse(json);
    // Google may refuse our request. Return the message then.
    if (!parsed.responseData) return "GOOGLE ERROR: " + parsed.responseDetails;
    // format result in HTML
    return '<ul>' + parsed.responseData.results.map(function(entry) {
        return '<li><a href="' + entry.url + '">' + entry.titleNoFormatting + '</a></li>';
    }).join('') + '</ul>';
}
```

`ez.devices.http.client` is a small wrapper around node's `http.request` call. It allows us to obtain the response with a simple `response(_)` asynchronous call, and to read from this response with a simple asynchronous `readAll(_)` call (there is also an asynchronous `read` call which would let us read one chunk at a time, or read up to a given length). Notice how the calls can be naturally chained to obtain the response data.

In this example we do not need to post any data to the remote URL. But this would not be difficult either. It would just be a matter of calling asynchronous `write(_, data)` methods before calling the `end()` method.

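For example, a POST request could look roughly like the sketch below. The endpoint and payload are made up for illustration, and the chunk-by-chunk read loop assumes that `read(_)` returns `undefined` once the response stream is exhausted:

```javascript
// Hypothetical endpoint and payload, for illustration only.
var client = ez.devices.http.client({
    url: 'http://example.com/api/search',
    method: 'POST',
    headers: { 'content-type': 'application/json' }
});
client.write(_, JSON.stringify({ q: 'hello' })); // write the request body
var reader = client.end().response(_).checkStatus(200);
// read the response one chunk at a time instead of using readAll(_)
var chunk, body = '';
while ((chunk = reader.read(_)) !== undefined) body += chunk;
```
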
## [Dealing with errors](tuto4-catch._js)

If our `search` function fails, an exception will be propagated. If we don't do anything special, the exception will bubble up to the request dispatcher created by `ez.devices.http.server(...)`. This dispatcher will catch it and generate a 500 response with the error message.

This is probably a bit rude to our users. But we can do a better job by trapping the error and injecting the error message into our HTML page. All we need is a `try/catch` inside our `search` function:

```javascript
function search(_, q) {
    if (!q || /^\s*$/.test(q)) return "Please enter a text to search";
    // pass it to Google
    try {
        var json = ez.devices.http.client({
            url: 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' + q,
            proxy: process.env.http_proxy
        }).end().response(_).checkStatus(200).readAll(_);
        // parse JSON response
        var parsed = JSON.parse(json);
        // Google may refuse our request. Return the message then.
        if (!parsed.responseData) return "GOOGLE ERROR: " + parsed.responseDetails;
        // format result in HTML
        return '<ul>' + parsed.responseData.results.map(function(entry) {
            return '<li><a href="' + entry.url + '">' + entry.titleNoFormatting + '</a></li>';
        }).join('') + '</ul>';
    } catch (ex) {
        return 'an error occurred. Retry or contact the site admin: ' + ex.message;
    }
}
```

## [Searching through files](tuto5-files._js)

Now, we are going to extend our search to also search the text in local files. Our `search` function becomes:

```javascript
function search(_, q) {
    if (!q || /^\s*$/.test(q)) return "Please enter a text to search";
    try {
        return '<h2>Web</h2>' + googleSearch(_, q) + '<hr/><h2>Files</h2>' + fileSearch(_, q);
    } catch (ex) {
        return 'an error occurred. Retry or contact the site admin: ' + ex.stack;
    }
}

function googleSearch(_, q) {
    var json = ez.devices.http.client(...
    ...
    return '<ul>' + ...
}

function fileSearch(_, q) {
    var t0 = new Date();
    var results = '';

    function doDir(_, dir) {
        fs.readdir(dir, _).forEach_(_, function(_, file) {
            var f = dir + '/' + file;
            var stat = fs.stat(f, _);
            if (stat.isFile()) {
                fs.readFile(f, 'utf8', _).split('\n').forEach(function(line, i) {
                    if (line.indexOf(q) >= 0) results += '<br/>' + f + ':' + i + ':' + line;
                });
            } else if (stat.isDirectory()) {
                doDir(_, f);
            }
        });
    }
    doDir(_, __dirname);
    return results + '<br/>completed in ' + (new Date() - t0) + ' ms';
}
```

The `forEach_` function is streamline's asynchronous variant of the standard ECMAScript 5 `forEach` array function. It is needed here because the body of the loop contains asynchronous calls, and streamline would give us an error if we were to use the synchronous `forEach` with an asynchronous loop body. Note that streamline also provides asynchronous variants for the other ES5 array functions: `map`, `some`, `every`, `filter`, `reduce` and `reduceRight`.

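As a sketch, and assuming the other variants follow the same convention as `forEach_` (an underscore suffix, with the iteration callback taking `_` as its first parameter), an asynchronous `map` over a directory could look like this:

```javascript
var fs = require('fs');

// Sketch: collect the size of each entry in a directory with the
// asynchronous map_ variant (same calling convention as forEach_).
var sizes = fs.readdir(__dirname, _).map_(_, function(_, file) {
    return fs.stat(__dirname + '/' + file, _).size;
});
```
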
Otherwise, there is not much to say about `fileSearch`. It uses simple recursive directory traversal.

## [Searching in MongoDB](tuto6-mongo._js)

Now, we are going to extend our search to a MongoDB database.

To run this you need to install MongoDB and start the `mongod` daemon. You also have to install the node MongoDB driver:

```sh
npm install mongodb
```

We have to modify our `search` function again:

```javascript
function search(_, q) {
    if (!q || /^\s*$/.test(q)) return "Please enter a text to search";
    // pass it to Google
    try {
        return '<h2>Web</h2>' + googleSearch(_, q) //
            + '<hr/><h2>Files</h2>' + fileSearch(_, q) //
            + '<hr/><h2>Mongo</h2>' + mongoSearch(_, q);
    } catch (ex) {
        return 'an error occurred. Retry or contact the site admin: ' + ex.stack;
    }
}
```

Here comes `mongoSearch`:

``` javascript
var mongodb = require('mongodb');

function mongoSearch(_, q) {
    var t0 = new Date();
    var db = new mongodb.Db('tutorial', new mongodb.Server("127.0.0.1", 27017, {}));
    db.open(_);
    try {
        var coln = db.collection('movies', _);
        if (coln.count(_) === 0) coln.insert(MOVIES, _);
        var re = new RegExp(".*" + q + ".*");
        return coln.find({
            $or: [{
                title: re
            }, {
                director: re
            }]
        }, _).toArray(_).map(function(movie) {
            return movie.title + ': ' + movie.director;
        }).join('<br/>') + '<br/>completed in ' + (new Date() - t0) + ' ms';
    } finally {
        db.close();
    }
}
```

where `MOVIES` is used to initialize our little movies database:

```javascript
var MOVIES = [{
    title: 'To be or not to be',
    director: 'Ernst Lubitsch'
}, {
    title: 'La Strada',
    director: 'Federico Fellini'
}, {
    ...
}];
```

The `mongoSearch` function is rather straightforward once you know the mongodb API. The `try/finally` construct is interesting: it guarantees that the database will be closed regardless of whether the `try` block completes successfully or throws an exception.

## [Parallelizing](tuto7-parallel._js)

So far so good. But the code that we have written executes completely sequentially. So we only start the directory search after having obtained the response from Google and we only start the Mongo search after having completed the directory search. This is very inefficient. We should run these 3 independent search operations in parallel.

This is where _futures_ come into play. The principle is simple: if you call an asynchronous function with `!_` instead of `_`, the function returns a _future_ `f` that you can call later as `f(_)` to obtain the result.

So, to parallelize, we just need a small change to our `search` function:

```javascript
function search(_, q) {
    if (!q || /^\s*$/.test(q)) return "Please enter a text to search";
    try {
        // start the 3 futures
        var googleFuture = googleSearch(!_, q);
        var fileFuture = fileSearch(!_, q);
        var mongoFuture = mongoSearch(!_, q);
        // join the results
        return '<h2>Web</h2>' + googleFuture(_) //
            + '<hr/><h2>Files</h2>' + fileFuture(_) //
            + '<hr/><h2>Mongo</h2>' + mongoFuture(_);
    } catch (ex) {
        return 'an error occurred. Retry or contact the site admin: ' + ex.stack;
    }
}
```

We can also go further and parallelize the directory traversal. This could be done with futures but there is a simpler way to do it: passing the number of parallel operations as the second argument to the `forEach_` call:

```javascript
function doDir(_, dir) {
    fs.readdir(dir, _).forEach_(_, 4, function(_, file) {
        var stat = fs.stat(dir + '/' + file, _);
        ...
    });
}
```

We could pass -1 instead of 4 to execute all iterations in parallel. But then we would risk running out of file descriptors when traversing large trees. The best way to handle this is to pass -1 and use the `flows.funnel` function to limit concurrency in the low-level function. Here is the modified function:

```javascript
var fs = require('fs'),
    flows = require('streamline/lib/util/flows');

function fileSearch(_, q) {
    var t0 = new Date();
    var results = '';
    // allocate a funnel for 20 concurrent executions
    var filesFunnel = flows.funnel(20);

    function doDir(_, dir) {
        fs.readdir(dir, _).forEach_(_, -1, function(_, file) {
            var stat = fs.stat(dir + '/' + file, _);
            if (stat.isFile()) {
                // use the funnel to limit the number of open files
                filesFunnel(_, function(_) {
                    fs.readFile(dir + '/' + file, 'utf8', _).split('\n').forEach(function(line, i) {
                        if (line.indexOf(q) >= 0) results += '<br/>' + dir + '/' + file + ':' + i + ':' + line;
                    });
                });
            } else if (stat.isDirectory()) {
                doDir(_, dir + '/' + file);
            }
        });
    }
    doDir(_, __dirname);
    return results + '<br/>completed in ' + (new Date() - t0) + ' ms';
}
```

The `filesFunnel` function acts like a semaphore. It limits the number of concurrent entries in its inner function to 20.

With this implementation, each call to `fileSearch` opens 20 files at most but we could still run out of file descriptors when lots of requests are handled concurrently. The fix is simple though: move the `filesFunnel` declaration one level up, just after the declaration of `flows`. And also bump the limit to 100 because this is now a global funnel:

```javascript
var fs = require('fs'),
    flows = require('streamline/lib/util/flows');
// allocate a funnel for 100 concurrent open files
var filesFunnel = flows.funnel(100);

function fileSearch(_, q) {
    // same as above, without the filesFunnel var declaration
}
```

## Fixing race conditions

And, last but not least, there is a concurrency bug in this code! Let's fix it.

The problem is in the code that initializes the movies collection in MongoDB:

```javascript
if (coln.count(_) === 0) coln.insert(MOVIES, _);
```

The problem is that execution can yield at every point where `_` appears. So this code can get interrupted between the `coln.count(_)` call and the `coln.insert(MOVIES, _)` call, and we can end up in the unfortunate situation where two or more requests get a count of 0, which would lead to multiple insertions of the `MOVIES` list.

This is easy to fix, though. All we need is a little funnel to restrict access to this critical section:

```javascript
var mongodb = require('mongodb'),
    mongoFunnel = flows.funnel(1);

function mongoSearch(_, q) {
    ...
    db.open(_);
    try {
        var coln = db.collection('movies', _);
        mongoFunnel(_, function(_) {
            if (coln.count(_) === 0) coln.insert(MOVIES, _);
        });
        var re = new RegExp(".*" + q + ".*");
        return ...
    } finally {
        db.close();
    }
}
```

## Wrapping up

In this tutorial we have learned how to:

* [Create a simple web server](tuto1-hello._js)
* [Set up a little search form](tuto2-form._js)
* [Call a Google API to handle the search](tuto3-google._js)
* [Handle errors](tuto4-catch._js)
* [Search a tree of files](tuto5-files._js)
* [Search inside MongoDB](tuto6-mongo._js)
* [Parallelize and fix race conditions](tuto7-parallel._js)

This should give you a flavor of what _streamline.js_ programming looks like. Don't forget to read the [README](../README.md) and the [FAQ](../FAQ.md).