#7 Want function for parsing an integer
Reviewed by: Dave Pacheco <dap@joyent.com>
Approved by: Dave Pacheco <dap@joyent.com>
melloc committed Mar 13, 2017
1 parent 825aba4 commit 6ea6cb4
Showing 6 changed files with 816 additions and 3 deletions.
4 changes: 4 additions & 0 deletions CHANGES.md
@@ -4,6 +4,10 @@

None yet.

## v1.4.0 (2017-03-13)

* #7 Add parseInteger() function for safer number parsing

## v1.3.1 (2016-09-12)

* #13 Incompatible with webpack
1 change: 1 addition & 0 deletions Makefile
@@ -34,6 +34,7 @@ test:
	node test/hrtimeadd.js
	node test/extraprops.js
	node test/merge.js
	node test/parse-integer.js
	@echo tests okay

include ./Makefile.targ
55 changes: 55 additions & 0 deletions README.md
@@ -139,6 +139,61 @@ Returns true if the given string ends with the given suffix and false
otherwise.


### parseInteger(str, options)

Parses the contents of `str` (a string) as an integer. On success, the integer
value is returned (as a number). On failure, an error is **returned** describing
why parsing failed.
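
Because failures are returned rather than thrown, callers have to test the
result with `instanceof Error` before using it as a number. The following is a
minimal sketch of that pattern. `parseOrThrow` and `stubParse` are hypothetical
names used only for illustration (neither is part of jsprim), and `stubParse`
stands in for `parseInteger()` so that the snippet runs without the module
installed.

```javascript
/*
 * Hypothetical helper, not part of jsprim: turns the "returned Error"
 * convention into a thrown exception for callers that prefer try/catch.
 * "parse" is any function that returns either a number or an Error.
 */
function parseOrThrow(parse, str, options) {
    var result = parse(str, options);
    if (result instanceof Error) {
        throw result;
    }
    return result;
}

/*
 * Stand-in for parseInteger() so that this sketch is self-contained.
 * It accepts only unsigned decimal digits.
 */
function stubParse(str) {
    if (/^[0-9]+$/.test(str)) {
        return Number(str);
    }
    return new Error('invalid number: ' + JSON.stringify(str));
}

var port = parseOrThrow(stubParse, '8080');
console.log(port);

var bad = stubParse('80a');
console.log(bad instanceof Error);
```

Returning the error keeps the common "check and report" path free of
`try`/`catch`, while a wrapper like the one above can restore exception
semantics where that is preferred.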

By default, leading and trailing whitespace characters are not allowed, nor are
trailing characters that are not part of the numeric representation. This
behaviour can be toggled by using the options below. The empty string (`''`) is
not considered valid input. If the return value cannot be precisely represented
as a number (i.e., is smaller than `Number.MIN_SAFE_INTEGER` or larger than
`Number.MAX_SAFE_INTEGER`), an error is returned. Additionally, the string
`'-0'` will be parsed as the integer `0`, instead of as the IEEE floating point
value `-0`.
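
To illustrate why the range check and the `'-0'` rule matter, here is a small
sketch using only built-in behaviour. The helper names `inSafeRange` and
`isNegativeZero` are assumptions made for this example and are not part of
jsprim.

```javascript
/* Fall back to the ES6 values on engines that lack the constants. */
var MAX_SAFE = Number.MAX_SAFE_INTEGER || 9007199254740991;
var MIN_SAFE = Number.MIN_SAFE_INTEGER || -9007199254740991;

/* Hypothetical helper: is "n" inside the precisely representable range? */
function inSafeRange(n) {
    return n >= MIN_SAFE && n <= MAX_SAFE;
}

/* Distinguish -0 from 0 without Object.is(): 1 / -0 is -Infinity. */
function isNegativeZero(n) {
    return n === 0 && 1 / n === -Infinity;
}

/* Past the safe range, distinct integers collapse to the same double. */
var collapsed = (MAX_SAFE + 1 === MAX_SAFE + 2);

/* The built-in parser preserves the sign of zero. */
var builtinZero = parseInt('-0', 10);

console.log(collapsed, isNegativeZero(builtinZero));
```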

This function accepts both upper and lowercase characters for digits, similar to
`parseInt()`, `Number()`, and [strtol(3C)](https://illumos.org/man/3C/strtol).

The following may be specified in `options`:

Option | Type | Default | Meaning
------------------ | ------- | ------- | ---------------------------
base | number | 10 | numeric base (radix) to use, in the range 2 to 36
allowSign | boolean | true | whether to interpret any leading `+` (positive) and `-` (negative) characters
allowImprecise | boolean | false | whether to accept values that may have lost precision (past `MAX_SAFE_INTEGER` or below `MIN_SAFE_INTEGER`)
allowPrefix | boolean | false | whether to interpret the prefixes `0b` (base 2), `0o` (base 8), `0t` (base 10), or `0x` (base 16)
allowTrailing | boolean | false | whether to ignore trailing characters
trimWhitespace | boolean | false | whether to trim any leading or trailing whitespace/line terminators
leadingZeroIsOctal | boolean | false | whether a leading zero indicates octal

Note that if `base` is unspecified and either `allowPrefix` or
`leadingZeroIsOctal` is enabled, then the leading characters can change the base
from its default of 10. If `base` is explicitly specified and `allowPrefix` is
true, then the prefix will only be accepted if it matches the specified base.
`base` and `leadingZeroIsOctal` cannot be used together.
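
The interaction between `base`, `allowPrefix` and `leadingZeroIsOctal` can be
sketched as a standalone function. This is an illustration of the rules in the
note above, not the library's code: `effectiveBase` and `PREFIX_BASES` are
hypothetical names, and the function assumes any sign has already been removed
from the input.

```javascript
/* Prefix letters recognised when allowPrefix is set (see the table above). */
var PREFIX_BASES = { b: 2, o: 8, t: 10, x: 16 };

/*
 * Hypothetical sketch: given the text that follows any sign, report the base
 * that would be used and how many prefix characters would be skipped.
 */
function effectiveBase(digits, opts) {
    var explicit = Object.prototype.hasOwnProperty.call(opts, 'base');
    var base = explicit ? opts.base : 10;
    var skip = 0;
    var prefixBase = -1;

    if (digits.charAt(0) === '0') {
        if (opts.allowPrefix) {
            var letter = digits.charAt(1).toLowerCase();
            if (Object.prototype.hasOwnProperty.call(PREFIX_BASES, letter)) {
                prefixBase = PREFIX_BASES[letter];
            }
            /* An explicit base only accepts its own prefix. */
            if (prefixBase !== -1 && (!explicit || prefixBase === base)) {
                base = prefixBase;
                skip = 2;
            }
        }
        /* A bare leading zero selects octal only when no prefix was seen. */
        if (prefixBase === -1 && opts.leadingZeroIsOctal) {
            base = 8;
        }
    }

    return { base: base, skip: skip };
}

console.log(effectiveBase('0x1f', { allowPrefix: true }));
console.log(effectiveBase('0x1f', { base: 10, allowPrefix: true }));
console.log(effectiveBase('017', { leadingZeroIsOctal: true }));
```

In the second call the `0x` prefix does not match the explicit base of 10, so it
is not consumed. Under the defaults, a parser following these rules would then
read the leading `0` as a digit and treat `x1f` as trailing characters.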

**Context:** It's tricky to parse integers with JavaScript's built-in facilities
for several reasons:

- `parseInt()` and `Number()` by default allow the base to be specified in the
input string by a prefix (e.g., `0x` for hex).
- `parseInt()` allows trailing nonnumeric characters.
- `Number(str)` returns 0 when `str` is the empty string (`''`).
- Both functions return incorrect values when the input string represents a
valid integer outside the range of integers that can be represented precisely.
Specifically, `parseInt('9007199254740993')` returns 9007199254740992.
- Both functions always accept `-` and `+` signs before the digits.
- Some older JavaScript engines always interpret a leading 0 as indicating
octal, which can be surprising when parsing input from users who expect a
leading zero to be insignificant.
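
Most of the behaviours listed above can be reproduced with the built-in
functions alone. The snippet below is a short demonstration. The variable name
`surprises` is only for this example, and the legacy octal case is omitted
because it depends on the engine.

```javascript
/* Each entry pairs a built-in call with one of the pitfalls listed above. */
var surprises = {
    /* A "0x" prefix switches both functions to base 16. */
    hexPrefixParseInt: parseInt('0x10'),
    hexPrefixNumber: Number('0x10'),
    /* Trailing non-numeric characters are silently ignored. */
    trailing: parseInt('12abc', 10),
    /* The empty string becomes 0 instead of an error. */
    empty: Number(''),
    /* Values past the safe range are rounded to a nearby double. */
    imprecise: parseInt('9007199254740993', 10),
    /* A leading sign is always accepted. */
    signed: Number('+5')
};

console.log(surprises);
```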

While each of these may be desirable in some contexts, there are also times when
none of them are wanted. `parseInteger()` grants greater control over what input
is permissible.

### iso8601(date)

Converts a Date object to an ISO8601 date string of the form
249 changes: 248 additions & 1 deletion lib/jsprim.js
@@ -2,7 +2,7 @@
* lib/jsprim.js: utilities for primitive JavaScript types
*/

-var mod_assert = require('assert');
+var mod_assert = require('assert-plus');
var mod_util = require('util');

var mod_extsprintf = require('extsprintf');
@@ -29,6 +29,8 @@ exports.mergeObjects = mergeObjects;
exports.startsWith = startsWith;
exports.endsWith = endsWith;

exports.parseInteger = parseInteger;

exports.iso8601 = iso8601;
exports.rfc1123 = rfc1123;
exports.parseDateTime = parseDateTime;
@@ -279,6 +281,251 @@ function parseDateTime(str)
}
}


/*
 * Number.*_SAFE_INTEGER isn't present before node v0.12, so we hardcode
 * the ES6 definitions here, while allowing for them to someday be higher.
 */
var MAX_SAFE_INTEGER = Number.MAX_SAFE_INTEGER || 9007199254740991;
var MIN_SAFE_INTEGER = Number.MIN_SAFE_INTEGER || -9007199254740991;


/*
 * Default options for parseInteger().
 */
var PI_DEFAULTS = {
    base: 10,
    allowSign: true,
    allowPrefix: false,
    allowTrailing: false,
    allowImprecise: false,
    trimWhitespace: false,
    leadingZeroIsOctal: false
};

var CP_0 = 0x30;
var CP_9 = 0x39;

var CP_A = 0x41;
var CP_B = 0x42;
var CP_O = 0x4f;
var CP_T = 0x54;
var CP_X = 0x58;
var CP_Z = 0x5a;

var CP_a = 0x61;
var CP_b = 0x62;
var CP_o = 0x6f;
var CP_t = 0x74;
var CP_x = 0x78;
var CP_z = 0x7a;

var PI_CONV_DEC = 0x30;
var PI_CONV_UC = 0x37;
var PI_CONV_LC = 0x57;


/*
 * A stricter version of parseInt() that provides options for changing what
 * is an acceptable string (for example, disallowing trailing characters).
 */
function parseInteger(str, uopts)
{
    mod_assert.string(str, 'str');
    mod_assert.optionalObject(uopts, 'options');

    var baseOverride = false;
    var options = PI_DEFAULTS;

    if (uopts) {
        baseOverride = hasKey(uopts, 'base');
        options = mergeObjects(options, uopts);
        mod_assert.number(options.base, 'options.base');
        mod_assert.ok(options.base >= 2, 'options.base >= 2');
        mod_assert.ok(options.base <= 36, 'options.base <= 36');
        mod_assert.bool(options.allowSign, 'options.allowSign');
        mod_assert.bool(options.allowPrefix, 'options.allowPrefix');
        mod_assert.bool(options.allowTrailing,
            'options.allowTrailing');
        mod_assert.bool(options.allowImprecise,
            'options.allowImprecise');
        mod_assert.bool(options.trimWhitespace,
            'options.trimWhitespace');
        mod_assert.bool(options.leadingZeroIsOctal,
            'options.leadingZeroIsOctal');

        if (options.leadingZeroIsOctal) {
            mod_assert.ok(!baseOverride,
                '"base" and "leadingZeroIsOctal" are ' +
                'mutually exclusive');
        }
    }

    var c;
    var pbase = -1;
    var base = options.base;
    var start;
    var mult = 1;
    var value = 0;
    var idx = 0;
    var len = str.length;

    /* Trim any whitespace on the left side. */
    if (options.trimWhitespace) {
        while (idx < len && isSpace(str.charCodeAt(idx))) {
            ++idx;
        }
    }

    /* Check the number for a leading sign. */
    if (options.allowSign) {
        if (str[idx] === '-') {
            idx += 1;
            mult = -1;
        } else if (str[idx] === '+') {
            idx += 1;
        }
    }

    /* Parse the base-indicating prefix if there is one. */
    if (str[idx] === '0') {
        if (options.allowPrefix) {
            pbase = prefixToBase(str.charCodeAt(idx + 1));
            if (pbase !== -1 && (!baseOverride || pbase === base)) {
                base = pbase;
                idx += 2;
            }
        }

        if (pbase === -1 && options.leadingZeroIsOctal) {
            base = 8;
        }
    }

    /* Parse the actual digits. */
    for (start = idx; idx < len; ++idx) {
        c = translateDigit(str.charCodeAt(idx));
        if (c !== -1 && c < base) {
            value *= base;
            value += c;
        } else {
            break;
        }
    }

    /* If we didn't parse any digits, we have an invalid number. */
    if (start === idx) {
        return (new Error('invalid number: ' + JSON.stringify(str)));
    }

    /* Trim any whitespace on the right side. */
    if (options.trimWhitespace) {
        while (idx < len && isSpace(str.charCodeAt(idx))) {
            ++idx;
        }
    }

    /* Check for trailing characters. */
    if (idx < len && !options.allowTrailing) {
        return (new Error('trailing characters after number: ' +
            JSON.stringify(str.slice(idx))));
    }

    /* If our value is 0, we return now, to avoid returning -0. */
    if (value === 0) {
        return (0);
    }

    /* Calculate our final value. */
    var result = value * mult;

    /*
     * If the string represents a value that cannot be precisely represented
     * by JavaScript, then we want to check that:
     *
     * - We never increased the value past MAX_SAFE_INTEGER
     * - We don't make the result negative and below MIN_SAFE_INTEGER
     *
     * Because we only ever increment the value during parsing, there's no
     * chance of moving past MAX_SAFE_INTEGER and then dropping below it
     * again, losing precision in the process. This means that we only need
     * to do our checks here, at the end.
     */
    if (!options.allowImprecise &&
        (value > MAX_SAFE_INTEGER || result < MIN_SAFE_INTEGER)) {
        return (new Error('number is outside of the supported range: ' +
            JSON.stringify(str.slice(start, idx))));
    }

    return (result);
}


/*
 * Interpret a character code as a base-36 digit.
 */
function translateDigit(d)
{
    if (d >= CP_0 && d <= CP_9) {
        /* '0' to '9' -> 0 to 9 */
        return (d - PI_CONV_DEC);
    } else if (d >= CP_A && d <= CP_Z) {
        /* 'A' - 'Z' -> 10 to 35 */
        return (d - PI_CONV_UC);
    } else if (d >= CP_a && d <= CP_z) {
        /* 'a' - 'z' -> 10 to 35 */
        return (d - PI_CONV_LC);
    } else {
        /* Invalid character code */
        return (-1);
    }
}


/*
 * Test if a value matches the ECMAScript definition of trimmable whitespace.
 */
function isSpace(c)
{
    return (c === 0x20) ||
        (c >= 0x0009 && c <= 0x000d) ||
        (c === 0x00a0) ||
        (c === 0x1680) ||
        (c === 0x180e) ||
        (c >= 0x2000 && c <= 0x200a) ||
        (c === 0x2028) ||
        (c === 0x2029) ||
        (c === 0x202f) ||
        (c === 0x205f) ||
        (c === 0x3000) ||
        (c === 0xfeff);
}


/*
 * Determine which base a character indicates (e.g., 'x' indicates hex).
 */
function prefixToBase(c)
{
    if (c === CP_b || c === CP_B) {
        /* 0b/0B (binary) */
        return (2);
    } else if (c === CP_o || c === CP_O) {
        /* 0o/0O (octal) */
        return (8);
    } else if (c === CP_t || c === CP_T) {
        /* 0t/0T (decimal) */
        return (10);
    } else if (c === CP_x || c === CP_X) {
        /* 0x/0X (hexadecimal) */
        return (16);
    } else {
        /* Not a meaningful character */
        return (-1);
    }
}


function validateJsonObjectJS(schema, input)
{
var report = mod_jsonschema.validate(input, schema);
5 changes: 3 additions & 2 deletions package.json
@@ -1,13 +1,14 @@
 {
   "name": "jsprim",
-  "version": "1.3.1",
+  "version": "1.4.0",
   "description": "utilities for primitive JavaScript types",
   "main": "./lib/jsprim.js",
   "repository": {
     "type": "git",
-    "url": "git://github.com/davepacheco/node-jsprim.git"
+    "url": "git://github.com/joyent/node-jsprim.git"
   },
   "dependencies": {
+    "assert-plus": "1.0.0",
     "extsprintf": "1.0.2",
     "json-schema": "0.2.3",
     "verror": "1.3.6"