
Formaline for NodeJS

formaline is a low-level Node.js module for handling form requests ( HTTP POST ) and for fast parsing of file uploads; it is also ready to use with connect middleware.


Installation

with npm:

 $ npm install formaline

with git:

 $ git clone git://

If you want to use Node.js together with Apache ( for testing purposes only ), a simple way to do this is to enable Apache's mod_proxy and add these lines to your Apache virtual host:

ProxyPass /test/ http://localhost:3000/test/
ProxyPassReverse /test/ http://localhost:3000/test/

Change the path and the port to match your setup.

Please always update to the latest release; this library is in active development.


Features

  • Real-time parsing of file uploads; it also supports the "multiple" attribute for HTML5-capable browsers.
  • It works with HTML5 AJAX-powered multiple file uploads.
  • Module instances can be created with a configuration object.
  • Useful configuration parameters ( listeners, uploadThreshold, logging, .. ).
  • Many events for controlling module execution.
  • A very fast and simple parser ( see the parser-benchmarks directory ).
  • It handles filename collisions ( filenames are translated to a 40-character hex string built with SHA-1 ).
  • It can also return the SHA-1 data checksum of received files.
  • Fluid exception handling.
  • It supports duplicate field names.
  • Uploaded files left incomplete because the total upload threshold was exceeded can be preserved or auto-removed.
  • It easily integrates with connect middleware.
  • The progress ratio ( plus chunks and bytes ) of received data can be tracked.
  • It works!
  • etc..


formaline has been successfully tested ( so far ) against:

browsers like:

  • Firefox 3+, Chrome 9+, Safari 4+, Opera 10+ and IE5.5+ ..

  • Links2, Lynx

and some different kinds of client-side POSTs:

( multiple file selection is available only for HTML5-capable browsers )

The library is capable of handling multiple files uploaded with a single POST or with multiple POSTs, independently of the kind of client code used.

Simple Usage

 var formaline = require('formaline'),
     form = new formaline( { } );           // <-- empty config object

add event listeners:

 form.on( 'filereceived', function( sha1filename, origfilename, filedir, filetype, filesize, filefield, filesha1sum ){ .. }  )  

the listed params ( sha1filename, origfilename, filedir, .. ) are automatically passed to the callback!

for example, if I assign an anonymous function to myListener:

 var myListener = function( ){ console.log( arguments ); };
 form.on( 'filereceived', myListener ); // <-- myListener gets sha1filename, origfilename, etc.. as arguments

see the Events & Listeners section for a complete list of callback signatures!

then, parse request:

 form.parse( req, res, next ); // next is a callback  function( .. ){ .. }
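
Putting the pieces together, a minimal standalone server might look like this ( a sketch: the port, the console message and the response text are illustrative, not part of formaline's API ):

 var http = require('http'),
     formaline = require('formaline');

 http.createServer( function ( req, res ) {
     if ( req.method === 'POST' ) {
         var form = new formaline( { } ); // <-- empty config object
         form.on( 'filereceived', function ( sha1filename, origfilename, filedir ) {
             console.log( 'received: ' + origfilename + ' -> ' + filedir + sha1filename );
         });
         form.on( 'end', function ( incompleteFiles, stats, res, next ) {
             res.writeHead( 200, { 'content-type': 'text/plain' } );
             res.end( 'upload complete' );
         });
         form.parse( req, res, function ( ) { } ); // next: a no-op callback here
     } else {
         res.writeHead( 200, { 'content-type': 'text/plain' } );
         res.end( 'try a POST' );
     }
 }).listen( 3000 );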

Configuration Options

You could create a formaline instance with some configuration options :

  • 'uploadRootDir' : ( string ) the default root directory for file uploads is '/tmp/' .

    • it is the root directory for file uploads and must already exist! ( formaline will try to use '/tmp/', otherwise it throws an exception )
    • a new sub-directory with a random name is created for every upload request .
  • 'uploadThreshold' : ( integer ) default value is 1024 * 1024 * 1024 bytes ( 1GB ).

    • it indicates the upload threshold in bytes for file uploads ( multipart/form-data ) before formaline stops writing to disk .
    • it also limits data received with serialized fields ( x-www-form-urlencoded ) .
  • 'holdFilesExtensions' : ( boolean ) default value is true .

    • it indicates whether to keep the extensions of uploaded files ( like .jpg, .txt, etc.. ) .
  • 'checkContentLength' : ( boolean ) default value is false .

    • by default, formaline doesn't stop when it finds a Content-Length header > uploadThreshold; it tries to receive all the data for the request and writes the received bytes to disk, until they reach the upload threshold .
    • if the value is set to true and the Content-Length header exceeds uploadThreshold, formaline stops before receiving the data payload .
  • 'removeIncompleteFiles' : ( boolean ) the default value is true.

    • if true, formaline auto-removes files left incomplete because the upload threshold was exceeded, then emits a 'fileremoved' event,
    • if false, no event is emitted, but the list of incomplete files is passed to the 'end' listener as an array of paths.
  • 'sha1sum' : ( boolean ) default value is false.

    • it is possible to check file data integrity by calculating the SHA-1 checksum ( a 40-character hex string )
    • it is calculated incrementally as file data is received
  • 'logging' : ( string ) the default value is 'debug:off,1:on,2:on,3:on' ( debug is off ).

    • it enables various logging levels; it is possible to switch one or more levels on or off at the same time.
    • debug: 'off' turns off debug logging; to see parser stats you have to enable the 2nd level.
  • 'emitDataProgress' : ( boolean or integer > 1 ) the default value is false.

    • when it is true, a 'dataprogress' event is emitted on every data chunk received.
    • to change the emitting factor, specify an integer k > 1: 'dataprogress' is then emitted every k data chunks, starting from the first ( events are emitted on chunk indexes 1 + ( 0 * k ), 1 + ( 1 * k ), 1 + ( 2 * k ), etc.. ); for example, with k = 10, events fire on chunks 1, 11, 21, and so on.
  • 'listeners' : ( config object ) it is possible to specify a configuration object for listeners here, or to add them in the normal way with 'addListener' / 'on' .

    • see below
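
For example, a config that enables parser stats and emits a progress event every 10 chunks could look like this ( a sketch combining the options above ):

 var form = new formaline({
     logging: 'debug:on,1:on,2:on,3:off', // 2nd level on -> parser stats are shown
     emitDataProgress: 10                 // 'dataprogress' on chunks 1, 11, 21, ..
 });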

Events & Listeners

Event types:

  • fatal exceptions : headersexception, pathexception, bufferexception, streamexception ( the data transmission is interrupted, and the 'end' event is emitted ).

  • informational : filereceived, field, dataprogress, end

  • need attention : fileremoved, warning

Listeners are called with the following arguments, which are automatically passed to the callbacks :

  • 'exception': function ( etype, isupload, errmsg, isfatal ) { .. },

  • 'field': function ( fname, fvalue ) { .. },

  • 'filereceived': function ( sha1filename, origfilename, filedir, filetype, filesize, filefield, filesha1sum ) { .. },

  • 'fileremoved': function ( sha1filename, origfilename, filedir, filetype, filesize, filefield ) { .. },

  • 'dataprogress': function ( bytesReceived, chunksReceived, ratio ) { .. },

  • 'end': function ( incompleteFiles, stats, res, next ) { .. }
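
For instance, an 'exception' listener matching the signature above might just log what happened ( the handling shown here is illustrative ):

 form.on( 'exception', function ( etype, isupload, errmsg, isfatal ) {
     console.log( 'exception:', etype, '( upload:', isupload, ')', errmsg );
     if ( isfatal ) {
         // fatal types ( headersexception, pathexception, bufferexception,
         // streamexception ) interrupt the transmission; the 'end' event follows
     }
 });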

Advanced Usage

require the module:

 var formaline = require('formaline');

build a config object:

 var config = { 

     logging: 'debug:on,1:on,2:on,3:off',

     uploadRootDir: '/var/www/upload/',

     checkContentLength: false,

     uploadThreshold: 3949000,  

     removeIncompleteFiles: true,

     emitDataProgress: false, 

     sha1sum: true,

     listeners: {

         'exception': function ( etype, isupload, errmsg, isfatal ) { .. },

         'field': function ( fname, fvalue ) { .. },

         'filereceived': function ( sha1filename, origfilename, filedir, filetype, filesize, filefield, filesha1sum ) { .. },

         'fileremoved': function ( sha1filename, origfilename, filedir, filetype, filesize, filefield ) { .. },

         'dataprogress': function ( bytesReceived, chunksReceived, ratio ) { .. },

         'end': function ( incompleteFiles, stats, res, next ) {
             res.writeHead( 200, { 'content-type': 'text/plain' } );
             // ..
         }

     } // end listeners config

 };

create an instance with the config, then parse the request:

 new formaline( config ).parse( req, res, next );

or, equivalently:

 var form = new formaline( config ); 
 form.parse( req, res, next );


File Uploads

When a file is found in the data stream:

  • it is written directly to disk, chunk by chunk, until the end of the file is reached.

  • a directory with a random integer name is created inside the upload root directory ( the default is /tmp/ ), for example: /tmp/123456789098/ ; this assures there are no collisions on file names between different POSTs .

  • the file name is cleaned of weird chars, then converted to a hash string with SHA-1 .

  • when two files with the same name are uploaded through the same POST action, the resulting SHA-1 string would be the same; to avoid a collision, the SHA-1 string is regenerated with a seed added to the file name ( the current time in millis ). This assures that the first file will not be overwritten .

  • when a file reaches the allowed upload threshold:

    • if removeIncompleteFiles === true, the file is auto-removed and a 'fileremoved' event is emitted;
    • if removeIncompleteFiles === false, the file is kept in the filesystem, the 'end' event is emitted, and an array of paths ( listing the incomplete files ) is passed to the 'end' callback.
  • when a file is completely received, a 'filereceived' event is emitted.

  • the 'filereceived' and 'fileremoved' events are emitted with these parameters attached: sha1filename, origfilename, filedir, filetype, filesize, filefield, and sha1sum ( the latter only when the file was fully received ) .

  • when the mime type is not recognized from the file extension, the default value for filetype is 'application/octet-stream' .
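
As an illustration of the naming scheme described above ( a sketch using Node's crypto module, not formaline's exact code — the cleaning regex and the collision flag are assumptions ):

 var crypto = require('crypto'),
     path = require('path');

 function hashedName( origfilename, collided ) {
     var ext = path.extname( origfilename ),               // kept when holdFilesExtensions is true
         clean = origfilename.replace( /[^\w.\-]/g, '' ),  // strip weird chars ( illustrative regex )
         seed = collided ? String( Date.now() ) : '';      // re-seed with current millis on a name clash
     return crypto.createHash( 'sha1' )
                  .update( clean + seed )
                  .digest( 'hex' ) + ext;                  // 40 hex chars ( + original extension )
 }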

Parser Implementation & Performance

A Note about Parsing Data Rate vs Network Throughput

Overall parsing data rate depends on many factors. It is generally possible to reach a ( parsing ) data rate of 700 MB/s and more with a real Buffer fully loaded in RAM ( searching a basic ~60-byte string, like the one Firefox uses; see parser-benchmarks ), but in my opinion this test only emulates a high-throughput network that delivers all the data in a single chunk, and is therefore not a real-world case.

Unfortunately, sending data over the network is sometimes a long-running task: the data is chopped into many chunks, and the chunk size may change because of the underlying TCP flow control ( typically the chunk size is between ~4K and ~64K ). Now, the point is that the parser is called for every chunk of data received, and the total delay of calling it becomes more perceptible with a lot of chunks.

Let me explain:

( using a super-fast Boyer-Moore parser )

In the world of Fairies :

  • the data received is not chopped,
  • there is a low repetition of pattern strings in the received data ( this yields roughly n/m comparisons ),
  • network throughput == network bandwidth (wow),

the parser reaches a time complexity ( in the best case ) of :

 O( ( data chunk length ) / ( pattern length ) ) * O( time to do a single comparison ) 
  or, for simplicity,  
 O( n / m ) * O( t ) = O( n / m )

t is considered to be a constant value: it doesn't add anything in terms of complexity, but it is still a non-zero value.

( for the purists, O here stands for Theta, i.e. tight complexity ).

Anyway, if I set T = ( average time to execute the parser on a single chunk ), then :

T = ( average number of comparisons ) * ( average time to do a single comparison ) ~= ( n / m ) * ( t )

In the real world, Murphy's laws assure that the best case doesn't exist :O

  • data is chopped,
  • in some cases ( a very large CSV file ) there is a big number of comparisons between chars ( which decreases the data rate ); however, for optimism and for simplicity, I'll take the previously calculated time complexity O( n / m ) for good, and thus also the time T, although it's not totally correct,
  • network throughput < network bandwidth,
  • the time 't' to do a single comparison depends on how the comparison is implemented.

the average time becomes something like:

( average time to execute the parser on a single chunk ) * ( average number of data chunks ) * ( average number of parser calls per data chunk * average delay time of a single call )

or, to simplify it, a number like:

( T ) * ( k ) * ( c * d ) ~= ( n / m ) * ( t ) * ( k ) * ( c * d )

When k, the number of data chunks, increases, the term ( k ) * ( c * d ) carries a considerable weight in terms of time consumption; it's obvious that calling a function 10^4 times is a heavy job for the system compared to calling it only once.

A single GB of transferred data, with a data chunk size of 40K, is typically split ( on average ) into ~ 26000 chunks!

However, in a general case:

  • we can do very little to reduce the time delay (d) of parser calls or to reduce the number (k) of chunks ( or to manually increase their size ); these things don't depend entirely on us.
  • we can minimize the number 'c' of parser calls to a single call for every chunk, i.e. c = 1.
  • we can still minimize the time 't' to do a single char comparison, which obviously reduces the overall execution time.

For these reasons:

  • instead of building a complex state machine, I have written a simple implementation of the Quick Search algorithm, using only high-performance for loops,

  • I have tried to avoid long switch( .. ){ .. } statements and long chains of if( .. ){ .. } else { .. },

  • to minimize the time 't' to do a single comparison, I have used two simple char lookup tables, 255 bytes long, implemented with nodeJS Buffers ( one for the boundary pattern string to match, one for the CRLFCRLF sequence ).

The only limit of this implementation is that it doesn't support a boundary longer than 254 bytes; this doesn't seem to be a real problem with the major browsers I have tested, which all use a boundary made entirely of ASCII chars, typically ~60 bytes in length.
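
The lookup-table idea can be sketched like this ( an illustrative Quick Search, not formaline's actual code ); note how storing shifts in a one-byte-per-entry Buffer is what caps the supported pattern ( boundary ) length at 254 bytes:

 // build the bad-character shift table for a pattern ( a Buffer )
 function buildShiftTable( pattern ) {
     var m = pattern.length,
         table = new Buffer( 256 );
     for ( var c = 0; c < 256; c++ ) {
         table[ c ] = m + 1;            // default shift; must fit in a byte -> m <= 254
     }
     for ( var i = 0; i < m; i++ ) {
         table[ pattern[ i ] ] = m - i; // shift given by the rightmost occurrence
     }
     return table;
 }

 // search 'pattern' in 'data' ( both Buffers ), return first match index or -1
 function quickSearch( data, pattern, table ) {
     var n = data.length, m = pattern.length, i = 0, j;
     while ( i <= n - m ) {
         for ( j = 0; j < m && data[ i + j ] === pattern[ j ]; j++ );
         if ( j === m ) { return i; }     // full match at index i
         if ( i + m >= n ) { break; }     // no char right after the window
         i += table[ data[ i + m ] ];     // jump using the char after the window
     }
     return -1;
 }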

RFC2046 ( page 19 ) excerpt:

Boundary delimiters must not appear within the encapsulated material, and must be no longer than 70 characters, not counting the two leading hyphens.


  • HTTP 1.1: RFC2616

    • 19.4 Differences Between HTTP Entities and RFC 2045 Entities.
    • 19.4.1 MIME-Version ( .. HTTP is not a MIME-compliant protocol .. )
    • 19.4.5 No Content-Transfer-Encoding ( .. HTTP does not use the Content-Transfer-Encoding (CTE) field of RFC 2045.. )
  • (MIME) Part Two: Media Types: RFC2046

    • 5.1.1 Common Syntax


for Italian GitHubbers -> LinkedIn Group

Special thanks to dvv and mrxot87 for supporting me.

Future Releases

  • more performance modifications in quickSearch.js .
  • add other examples with AJAX, and write about tested client-side uploaders .
  • add the option to build a JSON response .
  • add a readable stream from files while they are being uploaded .
  • add some other server-side security checks, and write about them .
  • give the choice to replace the parser with a custom one .
  • find and test some weird boundary string types .
  • add some unit tests .
  • Restify ?


(The MIT License)

Copyright (c) 2011 Guglielmo Ferri <>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

