Formaline for NodeJS

formaline is a new nodeJS module for handling simple form POSTs and for fast parsing of file uploads ( multipart/form-data and x-www-form-urlencoded ); it is also ready to use with connect middleware.

Installation

with npm: $ npm install formaline

with git: $ git clone git://github.com/rootslab/formaline.git

If you want to use nodeJS together with Apache, only for testing purposes, a simple way to do this is to enable Apache mod_proxy and add these lines to your Apache virtual host:

ProxyPass /test/ http://localhost:3000/test/
ProxyPassReverse /test/ http://localhost:3000/test/

Change the path and the port to match yours.

Features

  • Real-time parsing of file uploads; it also supports HTML5 multiple file uploads.
  • Instances can be created via a configuration object.
  • Useful configuration parameters ( listeners, maxBytes, auto-removal of incomplete files, .. ).
  • Fluid exception handling.
  • Many events for total control of the parsing flow.
  • Very fast and simple parser ( see parser-benchmarks ).
  • Incomplete file uploads ( due to exceeding a max bytes limit ) can be preserved or auto-removed.
  • It easily integrates with connect middleware.
  • Works!
  • etc..

Simple Usage

var formaline = require('formaline'),
    form = new formaline( { } );           // <-- empty config object

add event listeners:

...
form.on( 'filereceived', function( filename, filedir, filetype, filesize, filefield ){ .. }  ) 
...

parse request:

form.parse( req, res, next ); // next is a callback  function( .. ){ .. }
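
As a fuller sketch, formaline can sit inside a plain nodeJS HTTP server; the port, the field name and the response bodies below are placeholders ( and how the response is ended depends on the listeners you attach, here the next callback does it ):

var http = require('http'),
    formaline = require('formaline');

http.createServer( function ( req, res ) {
    if ( req.method === 'POST' ) {
        var form = new formaline( { } );                 // <-- empty config object
        form.on( 'field', function ( fname, fvalue ) {
            console.log( 'field ->', fname, fvalue );
        } );
        form.on( 'filereceived', function ( filename, filedir, filetype, filesize, filefield ) {
            console.log( 'received ->', filename, 'in', filedir );
        } );
        form.parse( req, res, function () {              // next, called when parsing is done
            res.writeHead( 200, { 'content-type': 'text/plain' } );
            res.end( 'upload parsed\n' );
        } );
    } else {                                             // serve a trivial upload form for testing
        res.writeHead( 200, { 'content-type': 'text/html' } );
        res.end( '<form method="POST" enctype="multipart/form-data">' +
                 '<input type="file" name="upload" multiple/>' +
                 '<input type="submit"/></form>' );
    }
} ).listen( 3000 );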

Events

Type of events:

  • 'fatal' exceptions : headersexception, filepathexception, exception (the data transmission is interrupted).
  • informational : filereceived, field, dataprogress, end
  • warning : fileremoved, warning

Listener callbacks and their params:

  • 'warning': function( msg ){ ... },

  • 'headersexception': function ( isUpload, errmsg, res, next ) { .. },

  • 'exception': function ( isUpload, errmsg, res, next ) { .. },

  • 'filepathexception': function ( path, errmsg, res, next ) { .. },

  • 'field': function ( fname, fvalue ) { .. },

  • 'filereceived': function ( filename, filedir, filetype, filesize, filefield ) { .. },

  • 'fileremoved': function ( filename, filedir, filetype, filesize, filefield ) { .. },

  • 'dataprogress': function ( bytesReceived, chunksReceived ) { .. },

  • 'end': function ( incompleteFiles, response, next ) { .. }
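
For example, a few of these listeners could be attached with 'on'; the logging and the response handling here are only illustrative:

var formaline = require('formaline'),
    form = new formaline( { } );

form.on( 'warning', function ( msg ) {
    console.log( 'warning ->', msg );
} );

form.on( 'exception', function ( isUpload, errmsg, res, next ) {
    console.error( 'fatal exception ->', errmsg, '( upload:', isUpload, ')' );
    next();
} );

form.on( 'end', function ( incompleteFiles, response, next ) {
    if ( incompleteFiles && incompleteFiles.length ) {
        console.log( 'incomplete files ->', incompleteFiles );    // only filled when removeIncompleteFiles is false
    }
    response.writeHead( 200, { 'content-type': 'text/plain' } );
    response.end();
} );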

Advanced Usage

require the module:

var formaline = require('formaline');

build a config object:

var config = { 

        /*
         temporary upload directory for file uploads
         for every upload request, a subdirectory is created that contains the received files
         default is /tmp/
        */

    tmpUploadDir: '/var/www/upload/',

        /*
         boolean: default is false; when true, it emits 'dataprogress' on every chunk
         integer chunk factor: emits the 'dataprogress' event every k chunks, starting from the first chunk ( 1+(0*k), 1+(1*k), 1+(2*k), 1+(3*k), .. ),
         minimum chunk factor value is 2
        */

    emitDataProgress: !true, //true, false, 3, 10, 100.. (every k chunks)

        /*
         max bytes allowed, this is the max number of bytes written to disk before stopping;
         this also holds for serialized fields, not only for file uploads
        */

    maxBytes: 3949000, //bytes ex.: 1024*1024*1024 , 512

        /*
         default is false: bypass the header value and continue writing to disk
         until maxBytes bytes are written.
         if true -> stop receiving data when the Content-Length header exceeds maxBytes
        */

    blockOnReqHeaderContentLength: !true,

        /*
         remove files left incomplete due to maxBytes;
         if true, formaline emits a 'fileremoved' event,
         otherwise it returns an array of incomplete file paths when the 'end' event is emitted
        */

    removeIncompleteFiles: true,

        /*
         enable various logging levels
         it is possible to switch one or more levels on/off at the same time
         debug:'off' turns off logging; to see parser stats, enable the 2nd level
        */

    logging: 'debug:on,1:on,2:on,3:off', //string ex.: 'debug:on,1:off,2:on,3:off'

        /*
         it is possible to specify a configuration object for listeners here,
         or to add them in the normal way, with the 'addListener' or 'on' functions

         events:
            - headersexception, filepathexception, exception: indicate a closed request caused by a 'fatal' exception
            - warning: indicates a value/operation that is not allowed ( it doesn't block data receiving )
            - fileremoved: indicates that a file was removed because it exceeded the max allowed bytes
            - dataprogress: emitted on every (k) chunk(s) received
        */

    listeners: {
        'warning': function(msg){
            ...
        },
        'headersexception': function ( isUpload, errmsg, res, next ) {
            ...
            next();               
        },
        'exception': function ( isUpload, errmsg, res, next ) {
            ...
            next();
        },
        'filepathexception': function ( path, errmsg, res, next ) {
            ...
            next();
        },
        'field': function ( fname, fvalue ) {
            ...
        },
        'filereceived': function ( filename, filedir, filetype, filesize, filefield ) {
            ...
        },
        'fileremoved': function ( filename, filedir, filetype, filesize, filefield ) {
            ...
        },
        'dataprogress': function ( bytesReceived, chunksReceived ) {
            ...
        },
        'end': function ( incompleteFiles, response, next ) {
            response.writeHead(200, {'content-type': 'text/plain'});
            response.end();
            //next();
        }
    }
};

create an instance with config, then parse request:

new formaline( config ).parse( req, res, next );

or

var form = new formaline(config); 
form.parse( req, res, next);
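
Since parse( req, res, next ) matches the Connect middleware signature, a sketch of a Connect integration could look like the following; Connect's API differs between versions, so this assumes a version where connect() returns an app, and the '/upload' route is just a placeholder:

var connect = require('connect'),
    formaline = require('formaline');

var app = connect();

app.use( '/upload', function ( req, res, next ) {
    new formaline( config ).parse( req, res, next );   // the config object built above
} );

app.use( function ( req, res ) {
    res.end( 'ok\n' );                                 // fallback for every other route
} );

app.listen( 3000 );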


File Creation

When a file is found in the data stream:

  • it is written directly to disk chunk by chunk, until the end of the file is reached.

  • a directory with a random integer name is created in the upload path directory ( default is /tmp/ ), for example: /tmp/123456789098/; this ensures there are no file name collisions between different posts.

  • when two files with the same name are uploaded through the same form post action, the file that causes the collision is renamed with a prefix equal to the current time in millis ( see the sketch after this list );

    for example: we are uploading two files with the same name, like hello.jpg; the first one is received and written to disk with its original name, the second one is received but its name causes a collision, so it is also written to disk, but with a name like 1300465416185_hello.jpg. This ensures the first file is not overwritten.

  • when a file reaches the max bytes allowed:

    • if removeIncompleteFiles === true, the file is auto-removed and a 'fileremoved' event is emitted;
    • if removeIncompleteFiles === false, the file is kept on the filesystem, the 'end' event is emitted and an array of paths ( listing the incomplete files ) is passed to the callback.
  • when a file is totally received, a 'filereceived' event is emitted.

  • the filereceived and fileremoved events are emitted with these params: filename, filedir, filetype, filesize, filefield.
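
The resulting paths can be illustrated with a small sketch ( this is not formaline's internal code, only the naming scheme described above ):

var path = require('path');

var uploadDir = '/tmp/',                               // tmpUploadDir
    requestDir = path.join( uploadDir, String( Math.floor( Math.random() * 1e12 ) ) ),
    seen = { };                                        // file names already written for this request

function targetPath( filename ) {
    if ( seen[ filename ] ) {                          // name collision within the same post
        return path.join( requestDir, Date.now() + '_' + filename );
    }
    seen[ filename ] = true;
    return path.join( requestDir, filename );
}

console.log( targetPath( 'hello.jpg' ) );   // /tmp/<random int>/hello.jpg
console.log( targetPath( 'hello.jpg' ) );   // /tmp/<random int>/1300465416185_hello.jpg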

Parser

A Note about Parsing Data Rate vs Network Throughput


Overall parsing data-rate depends on many factors. It is generally possible to reach 700 MB/s and more ( searching a basic ~60 byte boundary string, like the one Firefox uses ) with a real data Buffer fully loaded in RAM, but in my opinion this kind of parsing test emulates a network with a very high throughput more than a real case.

Unfortunately, sending data over the network is sometimes a long task; the data is chunked, and the chunk size may change because of the underlying TCP flow control ( typically the chunk size is ~8K to ~1024K ). Now, the point is that the parser is called for every chunk of data received, so the total delay of calling the method becomes more perceptible with a lot of chunks.

Let me explain:

In a fairy-tale world, a super-fast Boyer-Moore parser, in the best case where:

  • data is not chunked,
  • there is low pattern repetition ( this gives a result of n/m comparisons ),
  • network throughput == network bandwidth ( not realistic ),

reaches a time complexity of :

O( ( data chunk length ) / ( pattern length ) ) * ( time to do a single comparison ) = T
  or  for simplicity  
 O( n / m ) * t = T

(for the purists, O stands for Theta).

In the real world, Murphy's laws assure that the best case doesn't exist: :O

  • data is chunked,
  • in some cases ( a very large CSV file ) there is a big number of char comparisons ( which decreases the parser data-rate ); however, for optimism and simplicity, we use the previous time result T = O( n / m ) * t,
  • network throughput < network bandwidth,
  • the time 't' to do a single comparison depends on how the comparison is implemented,

the time complexity starts to look something like:

( T ) * ( number of chunks ) * ( average number of parser calls per chunk ) * ( average delay time of a single call )
  or
( T ) * ( c * k * d ) => ( O( n / m ) * t ) * ( c * k * d )

When the number k of chunks increases, the value ( c * k * d ) starts to have a considerable weight in terms of time consumption; I think it's obvious that, for the system, calling a function 10^4 times is a heavier job than calling it only once.

A single GB of data transferred, with an HTTP chunk size of 40K, is typically split (on average) into ~26000 chunks!

However, in a general case,

  • we can do very little about reducing the delay of calling the parser or the number of chunks ( by increasing their size ); it doesn't totally depend on us.
  • we can minimize the number of parser calls 'c' to a single call for every chunk, c = 1.
  • we can minimize the time 't' to do a single char comparison, which obviously reduces the overall execution time.

For these reasons:

  • I try not to use long switch( .. ){ .. } statements or long chains of if(..){..} else {..},
  • instead of building a complex state machine, I wrote a simple implementation of the QuickSearch algorithm ( sketched below ), using only high-performance for-loops,
  • to minimize the time 't' to do a comparison, I have used two simple char lookup tables, 255 bytes long, implemented with nodeJS Buffers ( one for the boundary pattern to match, one for the CRLFCRLF sequence ).

The only limit of this implementation is that it doesn't support a boundary longer than 254 bytes; for now this doesn't seem to be a real problem, as all the major browsers I have tested use a boundary made entirely of ASCII chars, typically ~60 bytes in length.
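
A minimal sketch of the QuickSearch idea follows ( a shift table indexed by the byte that comes right after the current window ); it searches a boundary inside a single Buffer and is not formaline's actual parser:

function buildShiftTable( pattern ) {                   // pattern is a Buffer ( the boundary )
    var m = pattern.length, table = [ ], i;
    for ( i = 0; i < 256; i++ ) table[ i ] = m + 1;     // default: jump past the whole window
    for ( i = 0; i < m; i++ ) table[ pattern[ i ] ] = m - i;
    return table;
}

function quickSearch( text, pattern, table ) {          // text and pattern are Buffers
    var n = text.length, m = pattern.length, j = 0, k;
    while ( j <= n - m ) {
        for ( k = 0; k < m && pattern[ k ] === text[ j + k ]; k++ );
        if ( k === m ) return j;                        // boundary found at offset j
        if ( j + m >= n ) break;                        // no byte left after the window
        j += table[ text[ j + m ] ];                    // shift using the byte after the window
    }
    return -1;
}

var boundary = Buffer.from( '--boundary123' ),
    chunk = Buffer.from( 'some data...--boundary123more data' );

console.log( quickSearch( chunk, boundary, buildShiftTable( boundary ) ) );   // -> 12

Keeping the shift values in a Buffer presumably explains the 254 byte limit: the maximum shift is ( boundary length + 1 ) and it must fit in a single byte.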

TODO

  • some code performance modifications in quickSearch.js and formaline.js
  • some code variables cleaning in formaline.js
  • change the core parser with a custom one
  • add some other server-side security checks, and write about it
  • in progress..

License

(The MIT License)

Copyright (c) 2011 Guglielmo Ferri <44gatti@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
