initial commit

commit 1bc5776d3543640b4d9aeec13fe23e2b4ed5912f 0 parents
@andrewffff authored
22 LICENSE
@@ -0,0 +1,22 @@
+(The MIT License)
+
+Copyright (c) 2012 Andrew Francis <andrew@sullust.net>
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
205 README.md
@@ -0,0 +1,205 @@
+
+# node-webmon
+
+node-webmon logs data from the Internet into a Postgres database. It can log data from raw sockets, socket.io connections, or an HTTP/HTTPS URL which is polled on a regular basis.
+
+
+## WARNING
+
+node-webmon can be configured (on purpose, or by accident) to connect to an information source far more
+often than the source's owner is comfortable with. It doesn't even respect robots.txt! Be careful, and
+pay attention when you run it.
+
+
+## Installation
+
+You need Postgres! Setting up and connecting to Postgres is beyond the scope of this README. node-webmon has
+been tested with Postgres 9.1.
+
+Install node-webmon for all users using npm:
+
+ sudo npm install node-webmon -g
+
+node-webmon needs to be supplied with a connection string such as *tcp://andrew@127.0.0.1/webmon* (meaning to connect
+to the "webmon" database on the local machine as the "andrew" user). Once you have the database ready, run the SQL
+found below (also printed by node-webmon --print-schema) to reset the schema. This drops any existing node-webmon tables.
+
+The data sources to poll are configured in the database itself.
+
+
+### Web source
+
+Add a web source like this:
+
+ INSERT INTO source(normal_wait_secs, error_wait_secs, protocol, address)
+ VALUES (60, 600, 'web', 'http://example.com/webaddr/');
+
+This will cause node-webmon to fetch and log the contents of http://example.com/webaddr/ every 60 seconds. It will
+wait for 600 seconds if an error (including any non-2xx response code) occurs. URLs which redirect are not supported.
+
+If you omit error_wait_secs, the value for normal_wait_secs will be used instead.
+
+### Raw socket source
+
+Add a raw socket like this:
+
+ INSERT INTO source(normal_wait_secs, protocol, address)
+ VALUES (60, 'raw', '127.0.0.1:9000');
+
+node-webmon will connect to 127.0.0.1:9000 and each line (separated by a newline character) which is sent to it
+over the socket will be logged.
+
+If the socket connection fails, or the socket is closed from the other end, node-webmon will reconnect after
+normal_wait_secs seconds.
+
+
+### socket.io source
+
+A socket.io source is a lot like a raw socket source, but instead of connecting with a raw TCP socket, the
+<a href="http://socket.io/">socket.io</a> v0.9 protocol is used. node-webmon only listens for "message" type
+notifications from the server at this time. Example:
+
+ INSERT INTO source(normal_wait_secs, error_wait_secs, protocol, address)
+ VALUES (10, 60, 'socket.io', 'https://127.0.0.1/socket.io/');
+
+
+### Removing a source
+
+Logged data references entries in the source table. To deactivate a source without deleting previously
+collected log data, set the active column for its row to FALSE.
+
+
+## Running
+
+node-webmon should be started with a URI-style postgresql connection string as its only parameter. For
+example:
+
+ node-webmon tcp://user@127.0.0.1/webmondb
+
+The configuration is read from the database's source table at startup. node-webmon will immediately
+start connecting to sources and logging.
+
+node-webmon should tolerate network and server errors. Local errors, including problems with the
+database, will result in an exception being thrown - meaning you're dumped out to the command line
+with a stack trace and error message. If you want to keep node-webmon running persistently, you may
+want to run it under a monitor program such as Forever.
+
+If you change the configuration in the database, you will have to restart node-webmon for changes
+to take effect.
+
+
+## Logging
+
+Web sources' data is logged into the log_http table. Socket.io and raw socket sources' data
+goes into log_stream. When network/source errors occur, they will be logged into log_error.
+
+
+## Caveats
+
+- Node.js processes network traffic before our code gets it. It's also single threaded and
+ prone to garbage collection pauses, so don't assume logged timestamps are precise.
+
+- Has not been extensively tested under error conditions.
+
+- No compression of logged data.
+
+- node-webmon is young and hasn't seen a lot of testing in the wild.
+
+- No real effort is made to write log data into Postgres in an efficient manner.
+
+
+## Database schema
+
+You can see the database schema by running "node-webmon --print-schema".
+
+ BEGIN;
+
+ DROP TABLE IF EXISTS log_http;
+ DROP TABLE IF EXISTS log_stream;
+ DROP TABLE IF EXISTS log_error;
+ DROP TABLE IF EXISTS source;
+ DROP TYPE IF EXISTS source_type;
+
+
+
+ CREATE TYPE source_type AS ENUM ('web', 'raw', 'socket.io');
+
+ CREATE TABLE source(
+ id SERIAL PRIMARY KEY,
+ active BOOLEAN NOT NULL DEFAULT true,
+ normal_wait_secs INTEGER NOT NULL,
+ error_wait_secs INTEGER DEFAULT NULL,
+ protocol source_type NOT NULL,
+ address TEXT NOT NULL
+ );
+
+
+
+ CREATE TABLE log_error(
+ id SERIAL PRIMARY KEY,
+
+ src_id INTEGER NOT NULL REFERENCES source(id),
+
+ ts TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+
+ error TEXT NOT NULL
+ );
+
+
+ CREATE TABLE log_stream(
+ id SERIAL PRIMARY KEY,
+
+ src_id INTEGER NOT NULL REFERENCES source(id),
+
+ ts_connection TIMESTAMP WITHOUT TIME ZONE,
+
+ ts_received TIMESTAMP WITHOUT TIME ZONE,
+
+ line_received TEXT NOT NULL
+ );
+
+
+ CREATE TABLE log_http(
+ id SERIAL PRIMARY KEY,
+
+ src_id INTEGER NOT NULL REFERENCES source(id),
+
+ ts_request_made TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+ ts_headers_start TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+ ts_content_start TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+ ts_content_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+
+ resp_code INTEGER,
+ resp_headers TEXT,
+ resp_content TEXT
+ );
+
+
+
+ COMMIT;
+
+## License
+
+(The MIT License)
+
+Copyright (c) 2012 Andrew Francis <andrew@sullust.net>
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
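A note on the web source timing described in the README: the sketch below shows one way the delay before the next poll could be chosen from `normal_wait_secs` and `error_wait_secs`, including the fallback when `error_wait_secs` is NULL. The helper name `nextWaitMs` is hypothetical and is not part of this commit.

```javascript
// Hypothetical helper (not in node-webmon): choose the delay before the next poll.
function nextWaitMs(source, hadError) {
  // A NULL (or missing) error_wait_secs falls back to normal_wait_secs.
  var errorWait = (source.error_wait_secs !== null && source.error_wait_secs !== undefined)
    ? source.error_wait_secs
    : source.normal_wait_secs;
  return 1000 * (hadError ? errorWait : source.normal_wait_secs);
}

var example = { normal_wait_secs: 60, error_wait_secs: 600 };
console.log(nextWaitMs(example, false)); // 60000
console.log(nextWaitMs(example, true));  // 600000
```

With the README's example row (60 and 600), a successful fetch is followed by a 60 second wait and a failed one by a 600 second wait.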
80 libs/ConfigFromDb.js
@@ -0,0 +1,80 @@
+
+var http = require('http'),
+ https = require('https'),
+ url = require('url');
+
+// given a database row from the source table, do some basic checking and
+// return a stream config structure, or raise an exception
+exports.configFromDb = function(row) {
+ var streamConfig = {
+ src_id: row.id,
+ protocol: row.protocol,
+ name: '[' + row.id + '] ' + row.address,
+ normal_wait_secs: row.normal_wait_secs,
+ error_wait_secs: row.error_wait_secs !== null ? row.error_wait_secs : row.normal_wait_secs
+ };
+
+ switch(row.protocol) {
+ case "web":
+ // parse url, augment with unparsed URL and reference to http or https module
+ streamConfig.httpOptions = url.parse(row.address);
+
+ switch(streamConfig.httpOptions.protocol) {
+ case 'http:':
+ streamConfig.httpImpl = http;
+ break;
+ case 'https:':
+ streamConfig.httpImpl = https;
+ break;
+ default:
+ throw "Not a valid HTTP or HTTPS url: " + row.address;
+ }
+
+ streamConfig.httpOptions.headers = { 'Connection': 'keep-alive' };
+
+ break;
+
+ case "raw":
+ var splitAddress = row.address.split(':');
+ if(splitAddress.length == 2) {
+ var port = parseInt(splitAddress[1], 10);
+ if(port > 0 && splitAddress[1] === ''+port && port < 65536) {
+ streamConfig.netOptions = { host: splitAddress[0], port: port };
+ }
+ }
+
+ if(!streamConfig.netOptions)
+ throw "Not a valid hostname:port address: " + row.address;
+
+ streamConfig.timeout = null;
+
+ break;
+
+ case "socket.io":
+ // broadly check that we have a http/https url to pass to socket.io
+ switch(url.parse(row.address).protocol) {
+ case "http:":
+ case "https:":
+ break;
+ default:
+ throw "Not a valid HTTP or HTTPS socket.io url: " + row.address;
+ }
+
+ streamConfig.ioAddress = row.address;
+ streamConfig.ioOptions = {
+ 'reconnection delay': 1000 * row.normal_wait_secs,
+ 'max reconnection attempts': 2,
+ 'force new connection': true
+ };
+
+ break;
+
+ default:
+ throw "Not a recognised protocol: " + row.protocol;
+ break;
+ }
+
+ return streamConfig;
+}
+
+
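To make the "raw" address check easier to reason about in isolation, here is a minimal standalone sketch. The function name `parseHostPort` is hypothetical. It follows the same split-and-round-trip idea as the code above and adds an explicit 1 to 65535 range check. Like that code, it assumes a single colon, so IPv6 literals are rejected.

```javascript
// Hypothetical standalone version of the host:port validation for "raw" sources.
function parseHostPort(address) {
  var parts = address.split(':');
  if (parts.length !== 2) return null;

  var port = parseInt(parts[1], 10);
  // NaN fails the range comparison; the string round trip rejects "08" or "80abc".
  if (!(port >= 1 && port <= 65535) || parts[1] !== String(port)) return null;

  return { host: parts[0], port: port };
}

console.log(parseHostPort('example.com:8000'));   // { host: 'example.com', port: 8000 }
console.log(parseHostPort('ftp://example.com/')); // null
```

Returning null instead of throwing keeps the sketch easy to test. The caller can decide whether an invalid address should raise an error.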
77 libs/HttpResponseDecoder.js
@@ -0,0 +1,77 @@
+
+var Buffer = require('buffer').Buffer,
+ Iconv = require('iconv').Iconv;
+
+/**
+ * Turns the raw bytes from a good (2xx status, and headers received) HTTP
+ * response into a string, after checking for certain errors. If Content-Type
+ * indicates a particular character encoding, we honor it, otherwise we
+ * assume the content is encoded in UTF-8.
+ *
+ * From your HTTP response: pass in the numeric response code, an object containing
+ * the received headers (with keys lowercased), and an array of Buffer objects
+ * containing the received data.
+ *
+ * In terms of the example at http://nodejs.org/api/http.html#http_http_request_options_callback
+ * this is res.statusCode, res.headers, and an array of all the buffers emitted with
+ * the "data" events. (You must _not_ call res.setEncoding!)
+ *
+ * This function will return the response body as a string, if the body seems correct
+ * given the received headers. If not an exception will be raised.
+ *
+ * Errors currently checked for:
+ *
+ * - Non 2xx status code.
+ *
+ * - Status code or headers not received.
+ *
+ * - The headers contain Content-Length, and we have received more or
+ * fewer bytes than we should have.
+ *
+ * - The headers indicate a character encoding which we do not understand.
+ *
+ * - The body content is not well-formed according to the appropriate
+ * character encoding.
+ *
+ */
+exports.checkAndDecode = function(resp_code, resp_headers, resp_content) {
+ // check response code and header
+ if(!resp_code || !resp_headers) {
+ throw new Error("No headers received in response to HTTP request");
+ }
+
+ if(resp_code < 200 || resp_code >= 300) {
+ throw new Error("Received non-2xx HTTP response: " + resp_code);
+ }
+
+ // grab encoding from the content-type header. default to utf-8
+ var encoding = 'utf-8';
+ if(resp_headers.hasOwnProperty('content-type')) {
+ var ct = resp_headers['content-type'];
+ var startIdx = ct.lastIndexOf('charset=');
+
+ if(startIdx >= 0) {
+ startIdx += 8;
+ var endIdx = ct.indexOf(';', startIdx);
+ encoding = ct.substring(startIdx, endIdx >= 0 ? endIdx : undefined).trim();
+ }
+ }
+
+ // combine buffers, and complain if we have contradictory content-length information
+ resp_content = Buffer.concat(resp_content);
+ if(resp_headers && resp_headers.hasOwnProperty('content-length')) {
+ if(resp_content.length !== +resp_headers['content-length']) {
+ throw new Error("content-length header does not match content received");
+ }
+ }
+
+ // convert buffer to utf-8. if it is already ostensibly utf-8, this will validate it
+ try {
+ var codec = new Iconv(encoding, 'utf-8');
+ return codec.convert(resp_content).toString();
+ } catch(e) {
+ throw new Error("Character encoding error in received content: " + e.toString());
+ }
+}
+
+
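The charset lookup in checkAndDecode can be read on its own as a small string-parsing step. The sketch below isolates it under a hypothetical name, `charsetFromContentType`, with the fallback passed in as a parameter. It takes the same approach as the code above, so it shares the same limits: the match on `charset=` is case-sensitive and quoted values such as `charset="utf-8"` are not unquoted.

```javascript
// Hypothetical helper mirroring the charset lookup: find the last "charset="
// parameter and read up to the next ';' (or the end of the header value).
function charsetFromContentType(contentType, fallback) {
  var startIdx = contentType.lastIndexOf('charset=');
  if (startIdx < 0) return fallback;

  startIdx += 'charset='.length;
  var endIdx = contentType.indexOf(';', startIdx);
  return contentType.substring(startIdx, endIdx >= 0 ? endIdx : undefined).trim();
}

console.log(charsetFromContentType('text/plain; charset=utf-7; foo=bar', 'utf-8')); // utf-7
console.log(charsetFromContentType('text/plain', 'utf-8'));                         // utf-8
```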
285 node-webmon.js
@@ -0,0 +1,285 @@
+#!/usr/bin/env node
+var net = require('net'),
+ fs = require('fs');
+
+var pg = require('pg'),
+ carrier = require('carrier'),
+ commander = require('commander'),
+ ioclient = require('socket.io-client');
+
+var HttpResponseDecoder = require('./libs/HttpResponseDecoder'),
+ configFromDb = require('./libs/ConfigFromDb').configFromDb;
+
+
+
+//
+// Stuff to log content and errors into the database
+//
+
+function DbWriter(db,errorLogObject,successLogObject) {
+ this.db = db;
+ this.errorLogObject = errorLogObject;
+ this.successLogObject = successLogObject;
+}
+
+DbWriter.prototype.logHttp = function(data) {
+ // convert resp_headers from a key-value object back to a string
+ var headers = "";
+ Object.getOwnPropertyNames(data.resp_headers).forEach(function(k) {
+ headers += k + ": " + data.resp_headers[k] + "\r\n";
+ });
+
+ this.db.query({
+ name: 'loghttp',
+ text: "INSERT INTO log_http(src_id," +
+ "ts_request_made, ts_headers_start, ts_content_start, ts_content_end," +
+ "resp_code, resp_headers, resp_content) " +
+ "VALUES ($1," +
+ "to_timestamp( $2 /1000.0), to_timestamp( $3 / 1000.0), to_timestamp( $4 /1000.0), to_timestamp( $5 / 1000.0), " +
+ "$6, $7, $8)",
+ values: [
+ data.src_id,
+ data.ts_request_made, data.ts_headers_start, data.ts_content_start, data.ts_content_end,
+ data.resp_code, headers, data.resp_content]
+ });
+
+ if(this.successLogObject) this.successLogObject.log(data);
+}
+
+
+DbWriter.prototype.logStream = function(data) {
+ this.db.query({
+ name: 'logstream',
+ text: "INSERT INTO log_stream(src_id,ts_connection,ts_received,line_received) " +
+ "VALUES ($1, to_timestamp($2 / 1000.0), to_timestamp( $3 /1000.0), $4)",
+ values: [data.src_id, data.ts_connection, data.ts_received, data.line_received]
+ });
+
+ if(this.successLogObject) this.successLogObject.log(data);
+}
+
+
+DbWriter.prototype.logError = function(data) {
+ this.db.query({
+ name: 'logerror',
+ text: "INSERT INTO log_error(src_id,ts,error) " +
+ "VALUES ($1, to_timestamp($2 /1000.0), $3)",
+ values: [data.src_id, data.ts, data.error.toString()]
+ });
+
+ if(this.errorLogObject) this.errorLogObject.log(data);
+}
+
+
+
+//
+// the actual program
+//
+function run(pgClient) {
+ var writer = new DbWriter(pgClient, console, console);
+
+ function doWeb(streamConfig) {
+ // accumulate stuff to log
+ var receivedData = {
+ src_id: streamConfig.src_id,
+
+ ts_request_made: null,
+ ts_headers_start: null,
+ ts_content_start: null,
+ ts_content_end: null,
+
+ resp_code: null,
+ resp_headers: null,
+ resp_content: []
+ };
+
+ // request object
+ var req = streamConfig.httpImpl.request(streamConfig.httpOptions);
+
+ // make sure we don't act more than once as the result of errors
+ var haveFinished = false;
+ function finish(err) {
+ if(!haveFinished) {
+ haveFinished = true;
+
+ if(err) {
+ req.abort();
+ writer.logError({
+ src_id: receivedData.src_id,
+ ts: Date.now(),
+ error: err
+ });
+ } else {
+ writer.logHttp(receivedData);
+ }
+
+ setTimeout(function() { doWeb(streamConfig); }, 1000*(err ? streamConfig.error_wait_secs : streamConfig.normal_wait_secs));
+ }
+ }
+
+ req.on('response', function(res) {
+ receivedData.resp_code = res.statusCode;
+ receivedData.resp_headers = res.headers;
+
+ res.on('data', function(chunk) {
+ if(!receivedData.ts_content_start)
+ receivedData.ts_content_start = Date.now();
+ receivedData.resp_content.push(chunk);
+ });
+
+ res.on('end', function() {
+ receivedData.ts_content_end = Date.now();
+
+ try {
+ // check and convert from raw bytes to a string
+ receivedData.resp_content = HttpResponseDecoder.checkAndDecode(
+ receivedData.resp_code, receivedData.resp_headers, receivedData.resp_content);
+ finish();
+ } catch(err) {
+ finish(err);
+ }
+ });
+ });
+
+ req.once('socket', function() {
+ // HTTP request should have just been sent
+ receivedData.ts_request_made = Date.now();
+
+ // next byte received = start of the server's HTTP response
+ req.socket.once('data', function() {
+ receivedData.ts_headers_start = Date.now();
+ });
+ });
+
+ req.on('error', finish); // finish will be called with error object
+
+ req.end();
+ }
+
+
+ function doRawStream(streamConfig) {
+ var socket = net.connect(streamConfig.netOptions);
+ var connectedAt = null;
+
+ socket.on('connect', function() {
+ connectedAt = Date.now();
+ });
+
+ socket.setTimeout(+streamConfig.timeout, function() {
+ //console.log(streamConfig.name, ' closing due to timeout');
+ socket.destroy(); // net.Socket has no close(); destroy() leads to the 'close' event below
+ });
+
+ carrier.carry(socket, function(line) {
+ var d = Date.now();
+
+ writer.logStream({
+ src_id: streamConfig.src_id,
+ ts_connection: connectedAt,
+ ts_received: d,
+ line_received: line
+ });
+ });
+ socket.on('error', function() {}); // avoid an uncaught 'error'; 'close' always follows and handles it
+ socket.on('close', function(hadError) {
+ var errText = streamConfig.name + ' socket closed ' +
+ (hadError ? '(due to underlying error)' : '(healthily)') +
+ ', reconnecting in ' + streamConfig.normal_wait_secs + 's';
+
+ writer.logError({
+ src_id: streamConfig.src_id,
+ ts: Date.now(),
+ error: errText
+ });
+
+ setTimeout(function() { doRawStream(streamConfig); }, streamConfig.normal_wait_secs*1000);
+ });
+ }
+
+
+ function doIoStream(streamConfig) {
+ var conn = ioclient.connect(streamConfig.ioAddress, streamConfig.ioOptions);
+ var connectedAt = null;
+
+ // we regard disconnection as an "error" for logging purposes, and a reconnect
+ // to be a "new connection". we use the "normal wait" time for the reconnection
+ // attempts that socket.io itself performs. when socket.io gives up after a few
+ // attempts, we use the "error wait" time before forcing a retry
+
+ function handleError(note,needsReconnect) {
+ writer.logError({
+ src_id: streamConfig.src_id,
+ ts: Date.now(),
+ error: note
+ });
+
+ if(needsReconnect) {
+ setTimeout(function() { doIoStream(streamConfig); }, 1000*streamConfig.error_wait_secs);
+ }
+ }
+
+ // https://github.com/LearnBoost/socket.io/wiki/Exposed-events - would be
+ // nice to capture the "anything" bit somehow!
+
+ conn.on('connect', function() { connectedAt = Date.now(); });
+ conn.on('reconnect', function() { connectedAt = Date.now(); });
+
+ conn.on('connect_failed' , function() { handleError("connect_failed", true); });
+ conn.on('reconnect_failed', function() { handleError("reconnect_failed", true); });
+ conn.on('error', function() { handleError("unspecified error", false); });
+
+ conn.on('message', function(data) {
+ var d = Date.now();
+
+ writer.logStream({
+ src_id: streamConfig.src_id,
+ ts_connection: connectedAt,
+ ts_received: d,
+ line_received: JSON.stringify(data)
+ });
+ });
+ }
+
+ pgClient.query("SELECT id, normal_wait_secs, error_wait_secs, protocol, address FROM source WHERE active IS TRUE", function(err,result) {
+ if(err)
+ throw "Could not fetch configuration from database: " + err.toString();
+
+ if(!result.rows.length)
+ throw "No active sources in source table";
+
+ result.rows.map(configFromDb).forEach(function(streamConfig) {
+ switch(streamConfig.protocol) {
+ case "web": doWeb(streamConfig); break;
+ case "raw": doRawStream(streamConfig); break;
+ case "socket.io": doIoStream(streamConfig); break;
+ }
+ });
+ });
+}
+
+
+//
+// parse and act on command line
+//
+commander
+ .version('0.1.0')
+ .usage('--print-schema | <database connection string>')
+ .option('--print-schema', 'Print SQL that will (destructively!) set up the database with a blank config')
+ .parse(process.argv);
+
+if(commander.args.length + (commander.printSchema?1:0) != 1) {
+ console.log("You must provide --print-schema or a connection string. Never both!");
+ process.exit(1);
+} else if(commander.printSchema) {
+ console.log("%s", fs.readFileSync(__dirname + '/schema.sql', 'utf-8'));
+ process.exit(0);
+} else {
+ var pgClient = new pg.Client(commander.args[0]);
+ pgClient.connect(function(err) {
+ if(err)
+ throw "Could not connect to database: " + err.toString();
+ run(pgClient);
+ });
+}
+
+
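For raw socket sources, each newline-terminated line is logged separately, and doRawStream delegates the splitting to the carrier module. As an illustration of what that step involves, here is a minimal sketch of splitting a chunked stream into lines. `LineSplitter` is a hypothetical stand-in, not carrier's actual implementation. It assumes the chunks are already strings, whereas real sockets deliver Buffers and multi-byte characters split across chunks would need something like `string_decoder`.

```javascript
// Hypothetical stand-in for line splitting: buffer partial text between chunks
// and call onLine once per complete, newline-terminated line.
function LineSplitter(onLine) {
  this.pending = '';
  this.onLine = onLine;
}

LineSplitter.prototype.push = function (chunk) {
  var parts = (this.pending + chunk).split('\n');
  // The last element is an incomplete line (or '' if the chunk ended on '\n').
  this.pending = parts.pop();
  for (var i = 0; i < parts.length; i++) {
    this.onLine(parts[i]);
  }
};

var lines = [];
var splitter = new LineSplitter(function (line) { lines.push(line); });
splitter.push('first li');
splitter.push('ne\nsecond line\nthi');
splitter.push('rd line\n');
console.log(lines); // [ 'first line', 'second line', 'third line' ]
```

Because a line is only emitted once its newline arrives, the receive timestamp taken in the callback reflects the arrival of the chunk that completed the line.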
31 package.json
@@ -0,0 +1,31 @@
+{
+ "name": "node-webmon",
+ "description": "A utility to poll webpages regularly, and/or stream data from a raw socket or socket.io server. All received data is logged into postgresql.",
+ "homepage": "http://github.com/andrewffff/node-webmon",
+ "author": "Andrew Francis (http://github.com/andrewffff)",
+ "bin": {
+ "node-webmon": "./node-webmon.js"
+ },
+ "repository": {
+ "type": "git",
+ "url": "http://github.com/andrewffff/node-webmon.git"
+ },
+ "keywords": [
+ "cli",
+ "monitor",
+ "web",
+ "archive",
+ "socket.io"
+ ],
+ "version": "0.1.0",
+ "preferGlobal": true,
+ "license": "MIT",
+ "dependencies": {
+ "carrier": "0.1.7",
+ "commander": "1.0.4",
+ "iconv": "1.2.3",
+ "pg": "0.8.4",
+ "socket.io-client": "0.9.10"
+ },
+ "main": "node-webmon.js"
+}
67 schema.sql
@@ -0,0 +1,67 @@
+
+BEGIN;
+
+DROP TABLE IF EXISTS log_http;
+DROP TABLE IF EXISTS log_stream;
+DROP TABLE IF EXISTS log_error;
+DROP TABLE IF EXISTS source;
+DROP TYPE IF EXISTS source_type;
+
+
+
+CREATE TYPE source_type AS ENUM ('web', 'raw', 'socket.io');
+
+CREATE TABLE source(
+ id SERIAL PRIMARY KEY,
+ active BOOLEAN NOT NULL DEFAULT true,
+ normal_wait_secs INTEGER NOT NULL,
+ error_wait_secs INTEGER DEFAULT NULL,
+ protocol source_type NOT NULL,
+ address TEXT NOT NULL
+);
+
+
+
+CREATE TABLE log_error(
+ id SERIAL PRIMARY KEY,
+
+ src_id INTEGER NOT NULL REFERENCES source(id),
+
+ ts TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+
+ error TEXT NOT NULL
+);
+
+
+CREATE TABLE log_stream(
+ id SERIAL PRIMARY KEY,
+
+ src_id INTEGER NOT NULL REFERENCES source(id),
+
+ ts_connection TIMESTAMP WITHOUT TIME ZONE,
+
+ ts_received TIMESTAMP WITHOUT TIME ZONE,
+
+ line_received TEXT NOT NULL
+);
+
+
+CREATE TABLE log_http(
+ id SERIAL PRIMARY KEY,
+
+ src_id INTEGER NOT NULL REFERENCES source(id),
+
+ ts_request_made TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+ ts_headers_start TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+ ts_content_start TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+ ts_content_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
+
+ resp_code INTEGER,
+ resp_headers TEXT,
+ resp_content TEXT
+);
+
+
+
+COMMIT;
+
168 tests/testConfigFromDb.js
@@ -0,0 +1,168 @@
+
+var assert = require('assert'),
+ http = require('http'),
+ https = require('https');
+
+
+var configFromDb = require('../libs/ConfigFromDb.js').configFromDb;
+
+
+function shouldRaiseError(errorSubstring, configRow) {
+ var err = null;
+ try {
+ configFromDb(configRow);
+ } catch(e) {
+ err = e.toString();
+ }
+
+ assert.notStrictEqual(err, null);
+ assert(err.indexOf(errorSubstring) >= 0);
+}
+
+
+shouldRaiseError("Not a valid HTTP or HTTPS url", {
+ id: 123,
+ protocol: 'web',
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'ftp://example.com/'
+});
+
+
+shouldRaiseError("Not a valid HTTP or HTTPS url", {
+ id: 123,
+ protocol: 'web',
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'example.com/'
+});
+
+
+shouldRaiseError("Not a valid hostname:port address", {
+ id: 123,
+ protocol: 'raw',
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'ftp://example.com/'
+});
+
+
+shouldRaiseError("Not a valid HTTP or HTTPS socket.io url", {
+ id: 123,
+ protocol: 'socket.io',
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'ftp://example.com/'
+});
+
+
+shouldRaiseError("Not a recognised protocol", {
+ id: 123,
+ protocol: 'blahblah',
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'ftp://example.com/'
+});
+
+
+shouldRaiseError("Not a recognised protocol", {
+ id: 123,
+ protocol: 'https', // NOPE! should be web
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'https://example.com/'
+});
+
+
+assert.deepEqual(
+configFromDb({
+ id: 123,
+ protocol: 'web',
+ normal_wait_secs: 650,
+ error_wait_secs: null,
+ address: 'http://example.com/'
+}),
+{ src_id: 123,
+ protocol: 'web',
+ name: '[123] http://example.com/',
+ normal_wait_secs: 650,
+ error_wait_secs: 650,
+ httpOptions:
+ { protocol: 'http:',
+ slashes: true,
+ host: 'example.com',
+ hostname: 'example.com',
+ href: 'http://example.com/',
+ pathname: '/',
+ path: '/',
+ headers: { Connection: 'keep-alive' } },
+ httpImpl: http
+});
+
+
+assert.deepEqual(
+configFromDb({
+ id: 123,
+ protocol: 'web',
+ normal_wait_secs: 650,
+ error_wait_secs: 222,
+ address: 'https://example.com/'
+}),
+{ src_id: 123,
+ protocol: 'web',
+ name: '[123] https://example.com/',
+ normal_wait_secs: 650,
+ error_wait_secs: 222,
+ httpOptions:
+ { protocol: 'https:',
+ slashes: true,
+ host: 'example.com',
+ hostname: 'example.com',
+ href: 'https://example.com/',
+ pathname: '/',
+ path: '/',
+ headers: { Connection: 'keep-alive' } },
+ httpImpl: https
+});
+
+
+
+assert.deepEqual(configFromDb({
+ id: 123,
+ protocol: 'raw',
+ normal_wait_secs: 11,
+ error_wait_secs: 60,
+ address: 'example.com:8000'
+}),
+{ src_id: 123,
+ protocol: 'raw',
+ name: '[123] example.com:8000',
+ normal_wait_secs: 11,
+ error_wait_secs: 60,
+ netOptions: { host: 'example.com', port: 8000 },
+ timeout: null }
+);
+
+
+
+assert.deepEqual(configFromDb({
+ id: 123,
+ protocol: 'socket.io',
+ normal_wait_secs: 30,
+ error_wait_secs: null,
+ address: 'http://example.com/socket.io/'
+}),
+{ src_id: 123,
+ protocol: 'socket.io',
+ name: '[123] http://example.com/socket.io/',
+ normal_wait_secs: 30,
+ error_wait_secs: 30,
+ ioAddress: 'http://example.com/socket.io/',
+ ioOptions:
+ { 'reconnection delay': 30000,
+ 'max reconnection attempts': 2,
+ 'force new connection': true } }
+);
+
+
+
139 tests/testHttpResponseDecoder.js
@@ -0,0 +1,139 @@
+
+var assert = require('assert'),
+ Buffer = require('buffer').Buffer;
+
+
+var checkAndDecode = require('../libs/HttpResponseDecoder.js').checkAndDecode;
+
+function mustThrow(a,b,c) {
+ assert.throws(function(){
+ checkAndDecode(a,b,c);
+ });
+}
+
+function mustNotThrow(a,b,c) {
+ assert.doesNotThrow(function(){
+ checkAndDecode(a,b,c);
+ });
+}
+
+
+// Some sample encodings. Steal some Japanese text from node-iconv's test.
+// Note the source file itself is UTF-8! Our UTF-7 example is longer than
+// necessary (the ASCII version would read fine as UTF-7) but still valid
+var HELLO_STRING = "Hello";
+var HELLO_ASCII = new Buffer([0x48, 0x65, 0x6c, 0x6c, 0x6f]);
+var HELLO_EBCDIC = new Buffer([0xC8, 0x85, 0x93, 0x93, 0x96]);
+var HELLO_UTF_7 = new Buffer("+AEgAZQBsAGwAbw-");
+
+var ICONVEXAMPLE_ISO_2022_JP = new Buffer([0x1b, 0x24, 0x40, 0x24, 0x2c]);
+var ICONVEXAMPLE_UTF_8 = new Buffer('が');
+
+var CREPE_UTF_8 = new Buffer([0x63, 0x72, 0xC3, 0xAA, 0x70, 0x65]);
+var INVALID_UTF_8 = new Buffer([0x63, 0x72, 0xC3, 0x70, 0x65]); // missing second byte in two-byte code
+assert.equal(CREPE_UTF_8.toString(), 'crêpe');
+
+var CREPE_UTF_8_SPLIT = [
+ new Buffer([]),
+ new Buffer([0x63, 0x72, 0xC3, 0xAA]),
+ new Buffer([ 0x70]),
+ new Buffer([]),
+ new Buffer([ 0x65])
+ ];
+
+var INVALID_UTF_8_SPLIT = [
+ new Buffer([]),
+ new Buffer([0x63, 0x72, 0xC3 ]),
+ new Buffer([ 0x70]),
+ new Buffer([ 0x65])
+ ];
+
+
+
+// Need a 2xx code, headers, and an array of buffers
+mustNotThrow(200, { 'content-type': 'text/plain' }, [new Buffer("hello world")]);
+
+mustThrow(302, { 'content-type': 'text/plain' }, [new Buffer("hello world")]);
+mustThrow(null, { 'content-type': 'text/plain' }, [new Buffer("hello world")]);
+mustThrow(200, null, [new Buffer("hello world")]);
+mustThrow(302, { 'content-type': 'text/plain' }, null);
+
+
+// XXX - We choke on a null header block, but allow zero headers in an object. Correct?
+mustNotThrow(200, {}, [new Buffer("hello world")]);
+mustThrow(200, null, [new Buffer("hello world")]);
+
+
+// Test basic decoding. Note "Hello" in ASCII is the same in UTF-7 and UTF-8! There
+// is also a more complicated UTF-7 encoding which doesn't equal "Hello" in anything else
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-7' }, [HELLO_UTF_7]));
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-7' }, [HELLO_ASCII]));
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-8' }, [HELLO_ASCII]));
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=ascii' }, [HELLO_ASCII]));
+
+assert.notEqual(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=ascii' }, [HELLO_UTF_7]));
+
+
+// Make sure the charset can be anywhere
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-7' }, [HELLO_UTF_7]));
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; foo=bar; charset=utf-7' }, [HELLO_UTF_7]));
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-7; foo=bar' }, [HELLO_UTF_7]));
+assert.equal(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-7 ' }, [HELLO_UTF_7]));
+assert.notEqual(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-8' }, [HELLO_UTF_7]));
+assert.notEqual(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; foo=bar; charset=utf-8' }, [HELLO_UTF_7]));
+assert.notEqual(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-8; foo=bar' }, [HELLO_UTF_7]));
+assert.notEqual(HELLO_STRING, checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-8 ' }, [HELLO_UTF_7]));
+
+
+// utf-9 does not exist so we should not accept it
+mustNotThrow(200, { 'content-type': 'text/plain; charset=utf-8' }, [HELLO_ASCII]);
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-9' }, [HELLO_ASCII]);
+
+
+// ISO-2022-JP. No character encoding implies UTF-8. Interpreting utf-8 as ISO-8859-1 should yield incorrectness
+assert.equal(
+ checkAndDecode(200, { 'content-type': 'text/plain; charset=utf-8' }, [ICONVEXAMPLE_UTF_8]),
+ checkAndDecode(200, { 'content-type': 'text/plain; charset=iso-2022-jp' }, [ICONVEXAMPLE_ISO_2022_JP]));
+
+assert.equal(
+ checkAndDecode(200, { 'content-type': 'text/plain' }, [ICONVEXAMPLE_UTF_8]),
+ checkAndDecode(200, { 'content-type': 'text/plain; charset=iso-2022-jp' }, [ICONVEXAMPLE_ISO_2022_JP]));
+
+assert.notEqual(
+ checkAndDecode(200, { 'content-type': 'text/plain; charset=iso-8859-1' }, [ICONVEXAMPLE_UTF_8]),
+ checkAndDecode(200, { 'content-type': 'text/plain; charset=iso-2022-jp' }, [ICONVEXAMPLE_ISO_2022_JP]));
+
+
+// the same content received all at once, versus in several little blocks
+assert.equal(
+ checkAndDecode(200, {}, [CREPE_UTF_8]),
+ checkAndDecode(200, {}, CREPE_UTF_8_SPLIT));
+
+
+// invalid utf-8
+mustNotThrow(200, { 'content-type': 'text/html; charset=utf-8' }, [CREPE_UTF_8]);
+mustThrow(200, { 'content-type': 'text/html; charset=utf-8' }, [INVALID_UTF_8]);
+
+mustNotThrow(200, { 'content-type': 'text/html; charset=utf-8' }, CREPE_UTF_8_SPLIT);
+mustThrow(200, { 'content-type': 'text/html; charset=utf-8' }, INVALID_UTF_8_SPLIT);
+
+
+// make sure content-length header is respected. crêpe (accent over middle character)
+// is 5 characters which are encoded as 6 bytes in utf-8. content-length can be a number
+// or string
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': '5' }, [CREPE_UTF_8]);
+mustNotThrow(200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': '6' }, [CREPE_UTF_8]);
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': '7' }, [CREPE_UTF_8]);
+
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': 5 }, [CREPE_UTF_8]);
+mustNotThrow(200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': 6 }, [CREPE_UTF_8]);
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': 7 }, [CREPE_UTF_8]);
+
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': '5' }, CREPE_UTF_8_SPLIT);
+mustNotThrow(200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': '6' }, CREPE_UTF_8_SPLIT);
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': '7' }, CREPE_UTF_8_SPLIT);
+
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': 5 }, CREPE_UTF_8_SPLIT);
+mustNotThrow(200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': 6 }, CREPE_UTF_8_SPLIT);
+mustThrow (200, { 'content-type': 'text/plain; charset=utf-8', 'content-length': 7 }, CREPE_UTF_8_SPLIT);
+
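The content-length tests above rely on the difference between character count and byte count. The following sketch spells that out, assuming a modern Node.js where `Buffer.from` is available (the tests in this commit use the older `new Buffer` constructor). The helper name `matchesContentLength` is hypothetical.

```javascript
// "crêpe" with a precomposed ê (U+00EA): 5 characters, 6 bytes in UTF-8.
var text = 'cr\u00eape';
var bytes = Buffer.from(text, 'utf8');
console.log(text.length);  // 5
console.log(bytes.length); // 6

// Hypothetical helper: compare the received byte count with Content-Length,
// accepting the header as either a number or a numeric string.
function matchesContentLength(headers, chunks) {
  if (!Object.prototype.hasOwnProperty.call(headers, 'content-length')) return true;
  return Buffer.concat(chunks).length === Number(headers['content-length']);
}

console.log(matchesContentLength({ 'content-length': '6' }, [bytes])); // true
console.log(matchesContentLength({ 'content-length': 5 }, [bytes]));   // false
```

Content-Length counts bytes on the wire, so the comparison has to happen on the concatenated Buffers before any decoding to a string.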