s10n stands for "sanitization", just as l10n stands for "localization". See also i18n, l10n et al.
A library to make basic user input sanitization and subsequent validation an easier job.
Sanitization is NOT validation, but it can help make validation an easier job and/or help to suggest to a user an input variation that better matches input expectations or requirements.
As with validation, sanitization (if in place) should be applied on both the frontend and the backend, since a user can bypass frontend sanitization and validation and send input directly to a backend endpoint.
Let's assume the following scenario of a username input.
The rule is that valid input may contain only a-z, A-Z, digits, underscores and dashes.
A user submits the string #UsEr #$%"' NaMe 5_6-9.
The input is rejected, the rule is presented to the user,
and the user is expected to remove all invalid characters.
The input then becomes the valid string UsErNaMe5_6-9.
Alternatively, an app might suggest (or enforce) a valid input. The examples below demonstrate the default and tuned behaviour of the relevant semantic sanitizer (in the tuned variant, spaces are replaced with underscores).
let input = " UsEr #$%' NaMe 5_6-9 ";
s10n(input).keepUsername().value; // "UsErNaMe5_6-9"
s10n(input).keepUsernameLC().value; // "username5_6-9"
s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9"
s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9"
Semantic sanitizers applied are a combination of elementary and compound transformers with an optional parameter to replace spaces (in this particular use case).
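To make the composition concrete, here is a minimal plain-JavaScript sketch of the same idea. It does not use s10n: `keepUsernameSketch` and `isValidUsername` are hypothetical names, and the order of steps is an assumption chosen to reproduce the outputs shown above, not the library's actual implementation.

```javascript
// Hypothetical helper, not part of s10n: approximates the documented
// keepUsername behaviour with plain regular expressions.
function keepUsernameSketch(input, whitespaceReplacement = "") {
  return input
    .replace(/[^A-Za-z0-9_\s-]/g, "") // drop everything except the allowed charset and whitespace
    .trim() // ignore leading and trailing whitespace
    .replace(/\s+/g, () => whitespaceReplacement); // merge whitespace runs, then replace or strip them
}

// The validation rule from the scenario above, applied after sanitization.
const isValidUsername = (value) => /^[A-Za-z0-9_-]+$/.test(value);

const rawUsername = " UsEr #$%' NaMe 5_6-9 ";
console.log(keepUsernameSketch(rawUsername)); // "UsErNaMe5_6-9"
console.log(keepUsernameSketch(rawUsername, "_")); // "UsEr_NaMe_5_6-9"
console.log(isValidUsername(rawUsername)); // false
console.log(isValidUsername(keepUsernameSketch(rawUsername))); // true
```

The sketch keeps sanitization and validation as separate steps: the sanitizer only proposes a cleaner value, and the validator still decides whether that value is acceptable.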
Let's assume the input received from a user is
" \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r "
Here are some issues worth attention and optimization:
- it contains problematic whitespaces
- it contains sequences of 2 or more whitespaces
- it contains leading and trailing whitespaces
- there is a variety of line break characters, potentially hazardous (CRLF injection)
- there are leading and trailing empty lines
- line break characters are invalid in a one line input
Any of the above can be considered unnecessary contamination of the data.
With all issues fixed, the above input becomes:
"Sensible text\nLine 2" for a multiline input
"Sensible text Line 2" for a simple string input
let input = " \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r ";
s10n(input).minimizeWhitespaces().value; // "Sensible text Line 2"
s10n(input)
.preserveLineBreaks() // modifier for subsequent methods to preserve line breaks
.minimizeWhitespaces().value; // "Sensible text\nLine 2"
minimizeWhitespaces does the following:
- normalizes line break characters, i.e. CRLF (\r\n) and individual CR (\r) are converted into LF (\n) (default behaviour)
- normalizes whitespaces into the standard space character (\x20)
- merges continuous whitespaces into a single space character
- normalizes lines in a multiline input (strips leading and trailing spaces in each line)
- trims leading and trailing whitespaces
- trims leading and trailing line breaks
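The steps listed above can be approximated in plain JavaScript as follows. This is a hedged sketch rather than the library's code: `minimizeWhitespacesSketch` and the `WS` character class are hypothetical, the whitespace set is the one listed in this document plus tab, and blank lines in the middle of the input are left untouched.

```javascript
// Hypothetical sketch, not the library's implementation.
// Whitespace set as listed in this document, plus tab.
const WS = "[\\x20\\t\\xA0\\u200B\\u200C\\u200D\\u2060\\uFEFF]";

function minimizeWhitespacesSketch(input, preserveLineBreaks = false) {
  // Normalize CRLF and individual CR into LF.
  let value = input.replace(/\r\n|\r/g, "\n");
  // When line breaks are disregarded they count as ordinary whitespace.
  if (!preserveLineBreaks) value = value.replace(/\n/g, " ");
  return value
    .replace(new RegExp(WS + "+", "g"), " ") // normalize and merge whitespace runs
    .replace(/ ?\n ?/g, "\n") // strip spaces around each line break
    .replace(/^[ \n]+|[ \n]+$/g, ""); // trim leading and trailing spaces and line breaks
}

const messy =
  " \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r ";
console.log(JSON.stringify(minimizeWhitespacesSketch(messy))); // "Sensible text Line 2"
console.log(JSON.stringify(minimizeWhitespacesSketch(messy, true))); // "Sensible text\nLine 2"
```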
Explore sandbox for more use cases.
Run npm i s10n to add s10n as a dependency to your project.
In your app, import s10n by either of these methods:
const s10n = require("s10n") -- node style
import s10n from "s10n" -- module import style
Pick an appropriate version on jsdelivr CDN and add to the html file. Example:
<script src="https://cdn.jsdelivr.net/npm/s10n@latest/dist/s10n.min.js"></script>
Check the examples across this documentation for the use cases.
Use sandbox to play around.
s10n
offers a number of elementary, compound and semantic
transformers and sanitizers as well as a method to apply
an arbitrary sanitizer.
Below are the usage examples to give a general impression of the API.
s10n(" Some text \n Yet basically valid \n\n ")
.preserveLineBreaks()
.minimizeWhitespaces().value; // "Some text\nYet basically valid"
s10n(" My User Name ").keepUsernameLC("_").value; // "my_user_name"
let input = " Some arbitrary \t \xA0 text ";
s10n(input)
.normalizeWhitespaces()
.trim().value; // "Some arbitrary text"
s10n(input)
.mergeWhitespaces()
.trim().value; // "Some arbitrary text"
Modifiers affect behaviour of subsequent transformers.
Defines whether to preserve or disregard line break characters when applying transformers.
Default behaviour is to disregard line break characters.
This setting doesn't affect some transformers (e.g. trimLineBreaks()
).
These are marked correspondingly.
let input = " \n\n\n ";
s10n(input).trim().value; // ""
s10n(input)
.preserveLineBreaks()
.trim().value; // "\n\n\n"
Call disregardLineBreaks when subsequent sanitizers should
disregard line breaks after preceding transformations
have been affected by preserveLineBreaks.
By default, whenever any sanitizer affects line break characters
a \n
is considered as a valid or target line break character.
This behaviour can be changed for subsequent sanitizers
(e.g. setLineBreakCharacter('\r')
).
Whenever line breaks in a string get normalized
CRLF (\r\n
) is converted into a single line break character
(\n
by default, or a value assigned by setLineBreakCharacter
method).
let input = "\r\n\n\n\r\r";
s10n(input).normalizeLineBreaks().value; // "\n\n\n\n\n"
s10n(input)
.setLineBreakCharacter("\r")
.normalizeLineBreaks().value; // "\r\r\r\r\r"
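The normalization rule can be sketched in plain JavaScript without the library. `normalizeLineBreaksSketch` is a hypothetical helper written for illustration; the only assumption is that CRLF must be matched before a lone CR or LF so that it yields one line break rather than two.

```javascript
// Hypothetical helper illustrating the rule above; not the library's code.
function normalizeLineBreaksSketch(input, lineBreakCharacter = "\n") {
  // The alternation tries "\r\n" first, so a CRLF pair becomes a single line break.
  return input.replace(/\r\n|\r|\n/g, () => lineBreakCharacter);
}

const mixedBreaks = "\r\n\n\n\r\r";
console.log(JSON.stringify(normalizeLineBreaksSketch(mixedBreaks))); // "\n\n\n\n\n"
console.log(JSON.stringify(normalizeLineBreaksSketch(mixedBreaks, "\r"))); // "\r\r\r\r\r"
```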
Elementary transformers have a pretty limited scope of responsibility. Normally used for basic transformations and as building blocks by compound transformers, semantic sanitizers and custom transformers/sanitizers.
s10n treats an extended set of characters, including
\x20\u200B\u200C\u200D\u2060\uFEFF\xA0
as whitespaces.
Characters \n and \r are not considered whitespaces
when the preserveLineBreaks modifier is applied.
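The reason an extended set matters is that JavaScript's built-in \s token does not match the zero-width characters in that list. The following sketch checks this with standard RegExp behaviour only; the `extendedWhitespace` pattern is an assumed construction for illustration, not necessarily how s10n builds its own.

```javascript
// Characters listed above as the extended whitespace set.
const extendedSet = ["\x20", "\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF", "\xA0"];

// Which of them does the built-in \s token miss?
const missedByBuiltIn = extendedSet
  .filter((ch) => !/\s/.test(ch))
  .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase());
console.log(missedByBuiltIn.join(" ")); // U+200B U+200C U+200D U+2060

// A character class covering \s plus the zero-width characters (assumed construction).
const extendedWhitespace = /[\s\u200B\u200C\u200D\u2060]/g;
console.log("a\u200Bb\u2060c".replace(extendedWhitespace, "")); // "abc"
```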
- trim() - removes leading and trailing whitespaces
- trimLineBreaks() - always removes leading and trailing line break characters, disregarding the LineBreak modifier setting
- mergeLineBreaks() - normalizes and merges consecutive line breaks, disregarding the LineBreak modifier setting
- normalizeWhitespaces() - converts all whitespaces into space characters (\x20)
- mergeWhitespaces() - merges continuous clusters of whitespaces into a single space character (\x20)
- stripWhitespaces() - strips all whitespaces from input
Examples:
let input = "\n Z\tY \x0A \n X W\uFEFFV \n\n \n";
s10n(input).trim().value; // "Z\tY \x0A \n X W\uFEFFV"
s10n(input)
.preserveLineBreaks()
.trim().value; // <same as input>
s10n(input).trimLineBreaks().value; // " Z\tY \x0A \n X W\uFEFFV \n\n "
s10n(input)
.preserveLineBreaks()
.trimLineBreaks().value; // <same as with disregardLineBreaks>
s10n("\n\r\r\r\n\nfoo\n\r\nbar\n\r\r\r\n\n").mergeLineBreaks().value; // "\nfoo\nbar\n"
s10n("\n \r\r\r\n\nfoo\n \r\nbar\n\r\r \r\n\n").mergeLineBreaks().value; // "\n \nfoo\n \nbar\n \n"
s10n(input).normalizeWhitespaces().value; // " Z Y X W V "
s10n(input)
.preserveLineBreaks()
.normalizeWhitespaces().value; // "\n Z Y \n X W V \n\n \n"
s10n(input).mergeWhitespaces().value; // " Z Y X W V "
s10n(input)
.preserveLineBreaks()
.mergeWhitespaces().value; // "\n Z Y \n X W V \n\n \n"
s10n(input).stripWhitespaces().value; // "ZYXWV"
s10n(input)
.preserveLineBreaks()
.stripWhitespaces().value; // "\nZY\nXWV\n\n\n"
s10n(input)
.preserveLineBreaks()
.normalizeWhitespaces()
.mergeWhitespaces()
.trimLineBreaks()
.trim().value; // "Z Y \n X W V \n\n"
See also normalizeLineBreaks()
- normalizeLineBreaks(lineBreakCharacter = undefined) - transforms CRLF, CR, LF into a line break character defined by the rules below:
  - as specified by the lineBreakCharacter argument
  - if lineBreakCharacter is undefined, then as set by setLineBreakCharacter()
  - if setLineBreakCharacter() wasn't applied, then defaults to LF ('\n')
- normalizeMultiline() - strips whitespaces that immediately precede or follow line break characters; ignores the LineBreak modifier setting
Examples:
let input = "\r\n\r abc \r\n def \r \t ghi \n \t\t \n \r\n\n\r\n\n\r\r \r\r\n";
s10n(input).normalizeLineBreaks().value; // "\n\n abc \n def \n \t ghi \n \t\t \n \n\n\n\n\n\n \n\n"
s10n(input).normalizeMultiline().value; // "\r\n\rabc\r\ndef\rghi\n\n\r\n\n\r\n\n\r\r\r\r\n"
s10n(input)
.normalizeLineBreaks()
.normalizeMultiline().value; // "\n\nabc\ndef\nghi\n\n\n\n\n\n\n\n\n\n"
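The behaviour described for normalizeMultiline() can be illustrated with a single regular expression. This is a sketch under assumptions: `normalizeMultilineSketch` is a hypothetical name, and for brevity it treats only spaces and tabs as whitespace rather than the full extended set.

```javascript
// Hypothetical helper, not the library's code. Spaces and tabs that touch a
// line break character are removed; the line break itself is kept as is.
function normalizeMultilineSketch(input) {
  return input.replace(/[ \t]*([\r\n])[ \t]*/g, "$1");
}

console.log(JSON.stringify(normalizeMultilineSketch(" a \n\t b \r\n c "))); // " a\nb\r\nc "
```

Note that line break characters are not converted here, so CRLF and CR survive unchanged; the leading and trailing spaces also stay because they are not adjacent to a line break.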
See also minimizeWhitespaces()
These methods' behaviour is NOT affected
by LineBreak modifier
(disregarded by default, i.e. \s
RegExp token comprises \r
and \n
).
Specify \n
and/or \r
explicitly whenever those should be kept or removed.
Method argument should follow RegExp character class specification.
- keepOnlyCharset(allowedChars = "-A-Za-z0-9_\\x20.,}{\\]\\[)(", regexpFlags) - keeps listed characters only
- keepOnlyRegExp(regexp, regexpFlags) - keeps characters as per RegExp (RegExp object or regexp body as a string)
- remove(disallowedChars, regexpFlags) - removes listed characters
- replace(needle, replacement = "", regexpFlags) - replaces a needle (a string or a RegExp object) with the replacement string
regexpFlags
in the methods above is an optional parameter and
defaults to the flags as specified in _regexp
("gu").
Examples:
let input1 = "ABCDabcd01239 _-.,(abcd){defg}[hijk]";
s10n(input1).keepOnlyCharset("}{][)(").value; // "(){}[]"
s10n(input1).keepOnlyRegexp(/\{.*?\}|\[.*?\]|\(.*?\)/gu).value; // "(abcd){defg}[hijk]"
let input2 = "ABCDEFGHabcdefghABCDEFGHabcdefgh";
s10n(input2).remove("ABCD").value; // "EFGHabcdefghEFGHabcdefgh"
s10n(input2).remove("ABCD", "giu").value; // "EFGHefghEFGHefgh"
s10n(input2).remove(/ABCD/).value; // "EFGHabcdefghABCDEFGHabcdefgh"
s10n(input2).remove(/ABCD/giu).value; // "EFGHefghEFGHefgh"
s10n(input2).remove(/ABCD/, "giu").value; // "EFGHefghEFGHefgh"
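Since the method arguments follow the RegExp character class syntax, the charset-based methods can be pictured as thin wrappers around a generated character class. The helpers below are hypothetical (`keepOnlyCharsetSketch`, `removeSketch`) and only mirror what the descriptions above say; the library's real implementation may differ.

```javascript
// Hypothetical helpers, not the library's code: build a RegExp from a
// character class body and apply it with String.prototype.replace.
function keepOnlyCharsetSketch(input, allowedChars, flags = "gu") {
  return input.replace(new RegExp("[^" + allowedChars + "]", flags), "");
}
function removeSketch(input, disallowedChars, flags = "gu") {
  return input.replace(new RegExp("[" + disallowedChars + "]", flags), "");
}

console.log(keepOnlyCharsetSketch("user name_01!", "A-Za-z0-9_")); // "username_01"
console.log(removeSketch("ABCDabcd", "ABCD")); // "abcd"
console.log(removeSketch("ABCDabcd", "ABCD", "giu")); // ""
// Line breaks are dropped unless listed explicitly in the character class.
console.log(JSON.stringify(keepOnlyCharsetSketch("a\nb c", "a-z"))); // "abc"
console.log(JSON.stringify(keepOnlyCharsetSketch("a\nb c", "a-z\\n"))); // "a\nbc"
```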
- toLowerCase() - converts to lower case
- toUpperCase() - converts to upper case
Examples:
let input = "aBcD01";
s10n(input).toLowerCase().value; // "abcd01"
s10n(input).toUpperCase().value; // "ABCD01"
Compound transformers implement complex transformation rules applying multiple transformations, often using elementary transformers.
- keepBase10Digits() - strips out anything but 0-9
- keepBase16Digits() (alias: keepHexDigits()) - strips out anything but 0-9a-fA-F; best chained with toLowerCase() or toUpperCase() for a consistent result
- minimizeWhitespaces() - removes leading, trailing and continuous clusters of whitespaces and line breaks; when preceded by preserveLineBreaks(), treats input as a multiline string and thus trims spaces in every line
Examples:
let input1 = " XYZ 20fE\n\n ";
s10n(input1).keepBase10Digits().value; // "20"
s10n(input1).keepBase16Digits().value; // "20fE"
s10n(input1)
.keepBase16Digits()
.toLowerCase().value; // "20fe"
s10n(input1)
.keepHexDigits()
.toLowerCase().value; // "20fe"
let input2 = " Some text \n Yet basically valid \n\n ";
s10n(input2).minimizeWhitespaces().value; // "Some text Yet basically valid"
s10n(input2)
.preserveLineBreaks()
.minimizeWhitespaces().value; // "Some text\nYet basically valid"
Semantic sanitizers implement semantically meaningful yet heavily opinionated sanitization rules for particular use cases.
- keepOnlyEmailPopularCharset() - keeps only A-Za-z0-9_@.-
- keepOnlyEmailExtendedCharset() - keeps only A-Za-z0-9_@.+)(-
- keepOnlyEmailRfcCharset() - keeps only charset as per rfc (A-Za-z0-9_\\-@.+)( \":;<>\\\\,\\[\\]}{!#$%&'*/=?^`|~)
- keepUsername(whiteSpaceReplacement = "") - keeps only a-zA-Z0-9_-, whitespaces are stripped, or merged and replaced with whiteSpaceReplacement if specified
- keepUsernameLC(whiteSpaceReplacement = "") - same as keepUsername but the result is converted to lower case
Examples:
let input = " UsEr #$%\"' NaMe + (5_6-9) @ .Co.Uk ";
s10n(input).keepOnlyEmailPopularCharset().value; // "UsErNaMe5_6-9@.Co.Uk"
s10n(input).keepOnlyEmailExtendedCharset().value; // "UsErNaMe+(5_6-9)@.Co.Uk"
s10n(input).keepOnlyEmailRfcCharset().value; // "UsEr#$%\"'NaMe+(5_6-9)@.Co.Uk"
s10n(input)
.keepOnlyEmailPopularCharset()
.toLowerCase().value; // "username5_6-9@.co.uk"
s10n(input).keepUsername().value; // "UsErNaMe5_6-9CoUk"
s10n(input).keepUsernameLC().value; // "username5_6-9couk"
s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9_CoUk"
s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9_couk"
Note: sanitized email input may still be invalid, but it is (arguably) easier to double-check and fix.
What if those semantic sanitizers do not fit my needs? Consider implementing a customized transformer.
A custom transformer is a method to apply complex sanitization logic using elementary or compound transformers, semantic sanitizers or applying a completely unique rule set.
- apply(callback, ...arguments) - callback will receive the current value, the calling context (a reference to the current s10n object as this), and any extra arguments passed
- extend(methodName, method) - registers a re-usable custom transformation method
  - extend should be called on the s10n object itself rather than in a sanitization chain
  - the method is accessible in every sanitization chain once registered
  - the method should transform this.value and/or call other built-in or registered custom transformers/sanitizers
  - the method should return this to make it chainable
  - do not define the method as an arrow function
Example:
s10n("c00l").apply(
(value, context, needle, replacement) => value.replace(context._regexp(needle), replacement),
"0",
"o"
).value; // "cool"
s10n.extend("makeCool", function() {
// replaces 'o' and 'O' followed by whitespaces (extended set) with a single '0'
this.replace(this._regexp("o\\s+", "gi"), "0");
return this;
});
s10n("coO\x0A o\t l").makeCool().value; // "co00l"
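The rule about arrow functions follows from how `this` binding works in JavaScript. The miniature below is a hypothetical stand-in for a chainable sanitizer (`MiniChain`, `miniChain` and the `broken` method are made up for illustration and are not s10n's internals); it shows why a registered method needs to be a regular function that returns `this`.

```javascript
// Hypothetical miniature of a chainable sanitizer; not s10n's implementation.
class MiniChain {
  constructor(value) {
    this.value = String(value);
  }
  replace(needle, replacement) {
    this.value = this.value.replace(needle, replacement);
    return this; // returning `this` keeps the chain going
  }
}
const miniChain = (value) => new MiniChain(value);

// Registers a method on the prototype so that every new chain can use it.
miniChain.extend = (methodName, method) => {
  MiniChain.prototype[methodName] = method;
};

// A regular function: `this` is the chain the method was called on.
miniChain.extend("makeCool", function () {
  return this.replace(/o\s+/gi, "0");
});
console.log(miniChain("coO\n o\t l").makeCool().value); // "co00l"

// An arrow function takes `this` from the scope where it was defined,
// so it never sees the chain instance and cannot transform its value.
miniChain.extend("broken", () => this);
console.log(miniChain("x").broken() instanceof MiniChain); // false
```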
Getting the sanitized value (as a string)
is as simple as terminating the
transformation chain with .value.
E.g. s10n(" my User Name ").keepUsernameLC().value.
In a string context .value is optional, as a string
is returned by default.
E.g. `Username: ${s10n(" my User Name ").keepUsernameLC()}`
or s10n(" my User Name ").keepUsernameLC() + ''
Explicit value access methods:
- value - value as is
- toString() - same as .value
- toNumber() - converts the sanitized string into a Number. Use with caution, as it returns NaN if the sanitized string contains anything but a valid Number literal.
Examples:
let input = "65";
s10n(input).value; // "65"
s10n(input).toString(); // "65"
`${s10n(input)}`; // "65"
s10n(input) + ""; // "65"
s10n(input).toNumber(); // 65
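The string-context behaviour relies on standard JavaScript coercion: template literals and concatenation call an object's toString(). The class below is a hypothetical miniature (`SanitizedSketch` is not part of s10n) and assumes that number conversion goes through the built-in Number(); the library may convert differently.

```javascript
// Hypothetical miniature wrapper, not s10n's implementation.
class SanitizedSketch {
  constructor(value) {
    this.value = String(value);
  }
  // Called implicitly by template literals and string concatenation.
  toString() {
    return this.value;
  }
  // Assumption: conversion uses Number(), which yields NaN for non-numeric text.
  toNumber() {
    return Number(this.value);
  }
}

console.log(`${new SanitizedSketch("65")}`); // "65"
console.log(new SanitizedSketch("65") + ""); // "65"
console.log(new SanitizedSketch("65").toNumber()); // 65
console.log(new SanitizedSketch("65px").toNumber()); // NaN
```

With plain Number(), an empty string converts to 0 rather than NaN, so in this sketch it would be worth checking for an empty value before converting.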
- _regexp(patternString, flags = "gu") - using this utility ensures that \s entities in the pattern string are replaced with an extended set of whitespaces. Recommended for use in the apply callback.
Example:
s10n("\t \xA0 ABC\n\t \uFEFF").apply((value, context) =>
// replaces extended set of whitespaces with dashes
value.replace(context._regexp("\\s"), "-")
).value; // "----ABC----"
Refer to CONTRIBUTING.md for details.