s10n

s10n stands for "sanitization", just as l10n stands for "localization". See also i18n, l10n, et al.

A library to make basic user input sanitization and subsequent validation an easier job.

Use cases

Sanitization is NOT validation, but it can help make validation an easier job and/or help to suggest to a user an input variation that better matches input expectations or requirements.

As with validation, sanitization, if in place, should be applied on both the frontend and the backend, since a user can bypass frontend sanitization and validation and send input directly to a backend endpoint.

Example 1. Username

Let's assume the following scenario of a username input.

The rule is that valid input may contain only a-z, A-Z, digits, underscores and dashes.

A user submits the string #UsEr #$%"' NaMe 5_6-9. The input is rejected, the rule is presented to the user, and the user is expected to remove all invalid characters. The input then becomes the valid string UsErNaMe5_6-9.

Alternatively, an app might suggest (or enforce) a valid input. The examples below demonstrate the default and the tuned behaviour of the relevant semantic sanitizer (in the tuned variant, spaces get replaced with underscores).

let input = "  UsEr #$%' NaMe  5_6-9  ";
s10n(input).keepUsername().value; // "UsErNaMe5_6-9"
s10n(input).keepUsernameLC().value; // "username5_6-9"
s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9"
s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9"

Semantic sanitizers applied are a combination of elementary and compound transformers with an optional parameter to replace spaces (in this particular use case).
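
To make that combination concrete, here is a minimal stand-alone sketch of the same idea in plain JavaScript. It is not s10n's implementation: the helper names keepUsernameSketch and keepUsernameLCSketch are hypothetical, and the order of the steps is an assumption chosen so that the results match the outputs shown in the example.

```javascript
// Hypothetical helpers for illustration only; not part of s10n.
function keepUsernameSketch(input, whitespaceReplacement = "") {
  return input
    .replace(/[^A-Za-z0-9_\s-]/g, "") // drop everything except allowed characters and whitespace
    .trim() // drop leading and trailing whitespace
    .replace(/\s+/g, whitespaceReplacement); // strip or replace each remaining whitespace run
}

function keepUsernameLCSketch(input, whitespaceReplacement = "") {
  return keepUsernameSketch(input, whitespaceReplacement).toLowerCase();
}

const raw = "  UsEr #$%' NaMe  5_6-9  ";
console.log(keepUsernameSketch(raw)); // "UsErNaMe5_6-9"
console.log(keepUsernameLCSketch(raw, "_")); // "user_name_5_6-9"
```

Removing the disallowed characters before collapsing whitespace means that a run such as " #$%' " turns into a single replacement character rather than several.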

Example 2. Arbitrary text

Let's assume the input received from a user is " \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r "

Here are some issues worth attention and optimization:

it contains problematic whitespaces
it contains sequences of 2 or more whitespaces
it contains leading and trailing whitespaces
there is a variety of line break characters, potentially hazardous (CRLF injection)
there are leading and trailing empty lines
line break characters are invalid in a one line input

Any of the above can be considered unnecessary contamination of the data.

With all of these issues fixed, the input above would become:

"Sensible text\nLine 2" for a multiline input
"Sensible text Line 2" for a simple string input

let input = "  \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible   text  \n Line 2  \n\r\n\r\r  ";

s10n(input).minimizeWhitespaces().value; // "Sensible text Line 2"

s10n(input)
  .preserveLineBreaks() // modifier for subsequent methods to preserve line breaks
  .minimizeWhitespaces().value; // "Sensible text\nLine 2"

minimizeWhitespaces does the following:

normalizes line break characters, i.e. CRLF (\r\n) and individual CR (\r) are converted into LF (\n) (default behaviour)
normalizes whitespaces into standard space character (\x20)
merges continuous whitespaces into a single space character
normalizes lines in a multiline input (strips leading and trailing spaces in each line of a multiline input)
trims leading and trailing whitespaces
trims leading and trailing line breaks
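
The steps above can be approximated in plain JavaScript as follows. This is a rough sketch under stated assumptions, not the library's code: minimizeWhitespacesSketch is a hypothetical name, the boolean parameter stands in for the preserveLineBreaks() modifier, and merging consecutive line breaks in multiline mode is an assumption based on the description of minimizeWhitespaces() in the API section.

```javascript
// Hypothetical stand-alone sketch; not s10n's actual implementation.
function minimizeWhitespacesSketch(input, preserveLineBreaks = false) {
  const text = input
    .replace(/\r\n?/g, "\n") // CRLF and lone CR become LF
    .replace(/[\u200B\u200C\u200D\u2060]/g, " "); // zero-width characters that \s does not match
  if (!preserveLineBreaks) {
    // Single-line mode: every whitespace run, line breaks included, becomes one space.
    return text.replace(/\s+/g, " ").trim();
  }
  return text
    .replace(/[^\S\n]+/g, " ") // merge runs of whitespace other than LF
    .replace(/ ?\n ?/g, "\n") // strip the space left around each line break
    .replace(/\n+/g, "\n") // merge consecutive line breaks (assumption)
    .trim(); // trim leading and trailing whitespace and line breaks
}

const sample =
  "  \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible   text  \n Line 2  \n\r\n\r\r  ";
console.log(JSON.stringify(minimizeWhitespacesSketch(sample))); // "Sensible text Line 2"
console.log(JSON.stringify(minimizeWhitespacesSketch(sample, true))); // "Sensible text\nLine 2"
```

The zero-width characters are handled separately because JavaScript's \s covers \xA0 and \uFEFF but not \u200B, \u200C, \u200D or \u2060.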

Explore sandbox for more use cases.

[ ^^ Back to TOC ^^ ]

Installation and Usage

Option 1. Install as a project dependency

Run npm i s10n to add s10n as a dependency to your project.

In your app import s10n by either of the methods:

const s10n = require("s10n") -- node style
import s10n from "s10n" -- module import style

Option 2. Link directly to the html file

Pick an appropriate version on jsdelivr CDN and add to the html file. Example:

<script src="https://cdn.jsdelivr.net/npm/s10n@latest/dist/s10n.min.js"></script>

Usage

Check the examples across this documentation for the use cases.

Use sandbox to play around.

[ ^^ Back to TOC ^^ ]

API

s10n offers a number of elementary, compound and semantic transformers and sanitizers as well as a method to apply an arbitrary sanitizer.

Below are the usage examples to give a general impression of the API.

s10n("  Some text  \n Yet basically valid \n\n  ")
  .preserveLineBreaks()
  .minimizeWhitespaces().value; // "Some text\nYet basically valid"

s10n("  My User Name  ").keepUsernameLC("_").value; // "my_user_name"

let input = "  Some   arbitrary \t \xA0 text  ";

s10n(input)
  .normalizeWhitespaces()
  .trim().value; // "Some   arbitrary     text"

s10n(input)
  .mergeWhitespaces()
  .trim().value; // "Some arbitrary text"

[ ^^ Back to TOC ^^ ]

Modifiers

Modifiers affect behaviour of subsequent transformers.

Treating line break characters

Defines whether to preserve or disregard line break characters when applying transformers.

Default behaviour is to disregard line break characters. This setting doesn't affect some transformers (e.g. trimLineBreaks()). These are marked correspondingly.

let input = " \n\n\n ";
s10n(input).trim().value; // ""
s10n(input)
  .preserveLineBreaks()
  .trim().value; // "\n\n\n"

Call disregardLineBreaks() when subsequent sanitizers should disregard line breaks again after preceding transformations have been affected by preserveLineBreaks().
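
One way to picture a modifier is as a flag stored on the chain object and read by later transformers. The class below is a hypothetical miniature written for illustration; LineBreakAwareChain and its keepLineBreaks field are assumed names, and the real library may keep this state differently.

```javascript
// Hypothetical miniature of a chain with a line break modifier; not s10n's source.
class LineBreakAwareChain {
  constructor(value) {
    this.value = String(value);
    this.keepLineBreaks = false; // default: line breaks are treated as whitespace
  }
  preserveLineBreaks() {
    this.keepLineBreaks = true;
    return this;
  }
  disregardLineBreaks() {
    this.keepLineBreaks = false;
    return this;
  }
  trim() {
    // With the modifier on, \r and \n are excluded from the trimmed set.
    const edge = this.keepLineBreaks ? "[^\\S\\r\\n]+" : "\\s+";
    this.value = this.value.replace(new RegExp(`^${edge}|${edge}$`, "g"), "");
    return this;
  }
}

console.log(JSON.stringify(new LineBreakAwareChain(" \n\n\n ").trim().value)); // ""
console.log(
  JSON.stringify(new LineBreakAwareChain(" \n\n\n ").preserveLineBreaks().trim().value)
); // "\n\n\n"
```

Because each modifier returns this, it can be switched on and off at any point in a chain and only affects the transformers that follow it.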

Line break character

By default, whenever a sanitizer affects line break characters, \n is treated as the valid (target) line break character.

This behaviour can be changed for subsequent sanitizers (e.g. setLineBreakCharacter('\r')). Whenever line breaks in a string get normalized CRLF (\r\n) is converted into a single line break character (\n by default, or a value assigned by setLineBreakCharacter method).

let input = "\r\n\n\n\r\r";

s10n(input).normalizeLineBreaks().value; // "\n\n\n\n\n"

s10n(input)
  .setLineBreakCharacter("\r")
  .normalizeLineBreaks().value; // "\r\r\r\r\r"
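
The normalization itself fits in a single regular expression. The function below is a sketch with a hypothetical name (normalizeLineBreaksSketch) and a plain parameter in place of setLineBreakCharacter(); it illustrates the rule rather than reproducing the library's code.

```javascript
// Hypothetical helper; illustrates the normalization rule only.
function normalizeLineBreaksSketch(input, lineBreak = "\n") {
  // "\r\n" is listed first so that a CRLF pair counts as one line break, not two.
  return input.replace(/\r\n|\r|\n/g, lineBreak);
}

const breaks = "\r\n\n\n\r\r";
console.log(JSON.stringify(normalizeLineBreaksSketch(breaks))); // "\n\n\n\n\n"
console.log(JSON.stringify(normalizeLineBreaksSketch(breaks, "\r"))); // "\r\r\r\r\r"
```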

[ ^^ Back to TOC ^^ ]

Elementary transformers

Elementary transformers have a deliberately limited scope of responsibility. They are normally used for basic transformations and as building blocks for compound transformers, semantic sanitizers and custom transformers/sanitizers.

Transform whitespaces

s10n treats an extended set of characters, including \x20\u200B\u200C\u200D\u2060\uFEFF\xA0, as whitespaces. Characters \n and \r are not considered whitespaces when the preserveLineBreaks modifier is applied.

trim() - removes leading and trailing whitespaces
trimLineBreaks() - always removes leading and trailing line break characters, disregarding the LineBreak modifier setting
mergeLineBreaks() - normalizes and merges consecutive line breaks, disregarding the LineBreak modifier setting
normalizeWhitespaces() - all whitespaces are converted into space characters (\x20)
mergeWhitespaces() - merges continuous clusters of whitespaces into a single space character (\x20)
stripWhitespaces() - strips all whitespaces from input

Examples:

let input = "\n  Z\tY \xA0 \n X W\uFEFFV \n\n  \n";

s10n(input).trim().value; // "Z\tY \xA0 \n X W\uFEFFV"
s10n(input)
  .preserveLineBreaks()
  .trim().value; // <same as input>

s10n(input).trimLineBreaks().value; // "  Z\tY \xA0 \n X W\uFEFFV \n\n  "
s10n(input)
  .preserveLineBreaks()
  .trimLineBreaks().value; // <same as with disregardLineBreaks>

s10n("\n\r\r\r\n\nfoo\n\r\nbar\n\r\r\r\n\n").mergeLineBreaks().value; // "\nfoo\nbar\n"
s10n("\n \r\r\r\n\nfoo\n \r\nbar\n\r\r \r\n\n").mergeLineBreaks().value; // "\n \nfoo\n \nbar\n \n"

s10n(input).normalizeWhitespaces().value; // "   Z Y     X W V      "
s10n(input)
  .preserveLineBreaks()
  .normalizeWhitespaces().value; // "\n  Z Y   \n X W V \n\n  \n"

s10n(input).mergeWhitespaces().value; // " Z Y X W V "
s10n(input)
  .preserveLineBreaks()
  .mergeWhitespaces().value; // "\n Z Y \n X W V \n\n \n"

s10n(input).stripWhitespaces().value; // "ZYXWV"
s10n(input)
  .preserveLineBreaks()
  .stripWhitespaces().value; // "\nZY\nXWV\n\n\n"

s10n(input)
  .preserveLineBreaks()
  .normalizeWhitespaces()
  .mergeWhitespaces()
  .trimLineBreaks()
  .trim().value; // "Z Y \n X W V \n\n"

Handle line breaks

normalizeLineBreaks(lineBreakCharacter = undefined) - transforms CRLF, CR, LF into a line break character defined following the rules below:
- as specified by lineBreakCharacter argument
- if param lineBreakCharacter is undefined, then as set by setLineBreakCharacter()
- if setLineBreakCharacter() wasn't applied, then defaults to LF ('\n')
normalizeMultiline() - strips whitespaces that immediately precede or follow line break characters; ignores LineBreak modifier setting

Examples:

let input = "\r\n\r  abc  \r\n  def   \r \t   ghi   \n \t\t  \n \r\n\n\r\n\n\r\r  \r\r\n";
s10n(input).normalizeLineBreaks().value; // "\n\n  abc  \n  def   \n \t   ghi   \n \t\t  \n \n\n\n\n\n\n  \n\n"
s10n(input).normalizeMultiline().value; // "\r\n\rabc\r\ndef\rghi\n\n\r\n\n\r\n\n\r\r\r\r\n"
s10n(input)
  .normalizeLineBreaks()
  .normalizeMultiline().value; // "\n\nabc\ndef\nghi\n\n\n\n\n\n\n\n\n\n"
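
The normalizeMultiline() rule (strip whitespaces that touch a line break, leave the line break characters themselves alone) can be sketched with one regular expression. normalizeMultilineSketch is a hypothetical name, and for brevity the sketch relies on JavaScript's built-in \s rather than the extended whitespace set.

```javascript
// Hypothetical helper; not s10n's implementation.
function normalizeMultilineSketch(input) {
  // [^\S\r\n] matches any whitespace character except CR and LF.
  return input.replace(/[^\S\r\n]*([\r\n])[^\S\r\n]*/g, "$1");
}

console.log(JSON.stringify(normalizeMultilineSketch("  abc  \r\n  def   \r \t   ghi   \n")));
// "  abc\r\ndef\rghi\n"
```

Leading whitespace on the first line survives because it does not touch a line break; a trim() step would be needed to remove it.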

Keep/Remove/Replace

These methods' behaviour is NOT affected by LineBreak modifier (disregarded by default, i.e. \s RegExp token comprises \r and \n). Specify \n and/or \r explicitly whenever those should be kept or removed.

Method argument should follow RegExp character class specification.

keepOnlyCharset(allowedChars = "-A-Za-z0-9_\\x20.,}{\\]\\[)(", regexpFlags) - keep listed characters only
keepOnlyRegExp(regexp, regexpFlags) - keep characters as per RegExp (RegExp object or regexp body as a string)
remove(disallowedChars, regexpFlags) - remove listed characters
replace(needle, replacement = "", regexpFlags) - replaces a needle (which is a string, or a RegExp object) with the replacement string

regexpFlags in the methods above is an optional parameter and defaults to the flags as specified in _regexp ("gu").

Examples:

let input1 = "ABCDabcd01239 _-.,(abcd){defg}[hijk]";
s10n(input1).keepOnlyCharset("}{][)(").value; // "(){}[]"
s10n(input1).keepOnlyRegExp(/\{.*?\}|\[.*?\]|\(.*?\)/gu).value; // "(abcd){defg}[hijk]"

let input2 = "ABCDEFGHabcdefghABCDEFGHabcdefgh";
s10n(input2).remove("ABCD").value; // "EFGHabcdefghEFGHabcdefgh"
s10n(input2).remove("ABCD", "giu").value; // "EFGHefghEFGHefgh"
s10n(input2).remove(/ABCD/).value; // "EFGHabcdefghABCDEFGHabcdefgh"
s10n(input2).remove(/ABCD/giu).value; // "EFGHefghEFGHefgh"
s10n(input2).remove(/ABCD/, "giu").value; // "EFGHefghEFGHefgh"
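
Since the string arguments follow the RegExp character class specification, keeping and removing characters can be pictured as wrapping the argument in [^...] or [...]. The helpers below are hypothetical (keepOnlyCharsetSketch and removeCharsSketch are assumed names) and cover only string arguments, not RegExp objects.

```javascript
// Hypothetical helpers showing the character class idea; not s10n's source.
function keepOnlyCharsetSketch(input, allowedChars, flags = "gu") {
  // A negated class matches every character that is NOT listed, which is then removed.
  return input.replace(new RegExp(`[^${allowedChars}]`, flags), "");
}

function removeCharsSketch(input, disallowedChars, flags = "gu") {
  return input.replace(new RegExp(`[${disallowedChars}]`, flags), "");
}

const mixed = "ABCDabcd01239 _-.,(abcd){defg}[hijk]";
// In this sketch, square brackets must be escaped inside the class.
console.log(keepOnlyCharsetSketch(mixed, "}{\\]\\[)(")); // "(){}[]"

const letters = "ABCDEFGHabcdefghABCDEFGHabcdefgh";
console.log(removeCharsSketch(letters, "ABCD")); // "EFGHabcdefghEFGHabcdefgh"
console.log(removeCharsSketch(letters, "ABCD", "giu")); // "EFGHefghEFGHefgh"
```

Because the argument is spliced into a character class, a string such as "ABCD" means "any of A, B, C or D", not the literal sequence ABCD.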

[ ^^ Back to TOC ^^ ]

Other transformations

toLowerCase() - converts to lower case
toUpperCase() - converts to upper case

Examples:

let input = "aBcD01";
s10n(input).toLowerCase().value; // "abcd01"
s10n(input).toUpperCase().value; // "ABCD01"

[ ^^ Back to TOC ^^ ]

Compound transformers

Compound transformers implement complex transformation rules applying multiple transformations, often using elementary transformers.

keepBase10Digits() - strips out anything but 0-9
keepBase16Digits() - (alias: keepHexDigits()) - strips out anything but 0-9a-fA-F; best chained with toLowerCase() or toUpperCase() for a consistent result
minimizeWhitespaces() - removes leading, trailing and continuous clusters of whitespaces and line breaks; when preceded with preserveLineBreaks() treats input as a multiline string and thus trims spaces in every line

Examples:

let input1 = "  XYZ 20fE\n\n  ";

s10n(input1).keepBase10Digits().value; // "20"
s10n(input1).keepBase16Digits().value; // "20fE"
s10n(input1)
  .keepBase16Digits()
  .toLowerCase().value; // "20fe"
s10n(input1)
  .keepHexDigits()
  .toLowerCase().value; // "20fe"

let input2 = "  Some text  \n Yet basically valid \n\n  ";

s10n(input2).minimizeWhitespaces().value; // "Some text Yet basically valid"

s10n(input2)
  .preserveLineBreaks()
  .minimizeWhitespaces().value; // "Some text\nYet basically valid"

[ ^^ Back to TOC ^^ ]

Semantic sanitizers

Semantic sanitizers implement semantically meaningful yet heavily opinionated sanitization rules for particular use cases.

keepOnlyEmailPopularCharset() - keeps only A-Za-z0-9_@.-
keepOnlyEmailExtendedCharset() - keeps only A-Za-z0-9_@.+)(-
keepOnlyEmailRfcCharset() - keeps only charset as per rfc ( A-Za-z0-9_\\-@.+)( \":;<>\\\\,\\[\\]}{!#$%&'*/=?^`|~)
keepUsername(whiteSpaceReplacement = "") - keeps only a-zA-Z0-9_-, whitespaces are stripped or are merged and replaced with whiteSpaceReplacement if any specified
keepUsernameLC(whiteSpaceReplacement = "") - same as keepUsername but the result is converted to lower case

Examples:

let input = "  UsEr   #$%\"' NaMe +  (5_6-9) @ .Co.Uk  ";

s10n(input).keepOnlyEmailPopularCharset().value; // "UsErNaMe5_6-9@.Co.Uk"
s10n(input).keepOnlyEmailExtendedCharset().value; // "UsErNaMe+(5_6-9)@.Co.Uk"
s10n(input).keepOnlyEmailRfcCharset().value; // "UsEr#$%\"'NaMe+(5_6-9)@.Co.Uk"
s10n(input)
  .keepOnlyEmailPopularCharset()
  .toLowerCase().value; // "username5_6-9@.co.uk"

s10n(input).keepUsername().value; // "UsErNaMe5_6-9CoUk"
s10n(input).keepUsernameLC().value; // "username5_6-9couk"
s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9_CoUk"
s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9_couk"

Note: the sanitized email input is still invalid, but it is (arguably) easier to double-check and fix.

What if those semantic sanitizers do not fit my needs? Consider implementing a customized transformer.

[ ^^ Back to TOC ^^ ]

Custom transformations

A custom transformer is a method to apply complex sanitization logic using elementary or compound transformers, semantic sanitizers or applying a completely unique rule set.

apply(callback, ...arguments) - callback will receive current value, calling context (reference to current s10n object as this), and any extra arguments passed
extend(methodName, method) - registers a re-usable custom transformation method
- extend should be called on s10n object itself rather than in a sanitization chain
- the method is accessible at every sanitization chain once registered
- the method should transform this.value and/or call other built-in or registered custom transformers/sanitizers
- the method should return this to make it chainable
- do not define the method as an arrow function

Example:

s10n("c00l").apply(
  (value, context, needle, replacement) => value.replace(context._regexp(needle), replacement),
  "0",
  "o"
).value; // "cool"

s10n.extend("makeCool", function() {
  // replaces each 'o' or 'O' followed by whitespaces (extended set) with a single '0'
  this.replace(this._regexp("o\\s+", "gi"), "0");
  return this;
});
s10n("coO\x0A o\t l").makeCool().value; // "co00l"
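
The rules listed for extend() follow from how prototype methods work in JavaScript. The miniature below is a hypothetical sketch (MiniChain and mini are assumed names, not s10n internals): a method registered on the prototype is visible to every chain, returning this keeps it chainable, and it must be a regular function because arrow functions do not get their own this.

```javascript
// Hypothetical miniature of an extendable chain; not s10n's source.
class MiniChain {
  constructor(value) {
    this.value = String(value);
  }
}

function mini(value) {
  return new MiniChain(value);
}

// Registering on the prototype makes the method available to every chain.
mini.extend = function (methodName, method) {
  MiniChain.prototype[methodName] = method;
};

mini.extend("makeCool", function () {
  // A regular function: `this` is the chain the method was called on.
  this.value = this.value.replace(/o\s+/gi, "0");
  return this; // keeps the method chainable
});

console.log(mini("coO\n o\t l").makeCool().value); // "co00l"
```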

[ ^^ Back to TOC ^^ ]

Getting sanitized value

Getting the sanitized value (as a string) is as simple as terminating the transformation chain with .value. E.g. s10n(" my User Name ").keepUsernameLC().value. In a string context .value is optional, as a string is returned by default. E.g. `Username: ${s10n(" my User Name ").keepUsernameLC()}` or s10n(" my User Name ").keepUsernameLC() + ''

Explicit value access methods:

value - value as is
toString() - same as .value
toNumber() - converts the sanitized string into a Number. Use with caution, as it will return NaN if the sanitized string contains anything other than a valid Number literal.

Examples:

let input = "65";
s10n(input).value; // "65"
s10n(input).toString(); // "65"
`${s10n(input)}`; // "65"
s10n(input) + ""; // "65"
s10n(input).toNumber(); // 65
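
The string-context behaviour is what JavaScript gives any object that defines toString(). The class below is a hypothetical sketch (ValueSketch is an assumed name), and its toNumber() uses the built-in Number() conversion, which may differ from what the library does.

```javascript
// Hypothetical wrapper showing how string coercion can work; not s10n's source.
class ValueSketch {
  constructor(value) {
    this.value = String(value);
  }
  toString() {
    return this.value; // used by template literals and by `+ ""`
  }
  toNumber() {
    return Number(this.value); // assumption: plain built-in conversion
  }
}

const wrapped = new ValueSketch("65");
console.log(`${wrapped}`); // "65"
console.log(wrapped + ""); // "65"
console.log(wrapped.toNumber()); // 65
console.log(new ValueSketch("65px").toNumber()); // NaN
```

Note that the built-in Number("") returns 0 rather than NaN, so with a conversion like this one an empty sanitized string is worth checking for explicitly.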

[ ^^ Back to TOC ^^ ]

Utility methods

_regexp(patternString, flags = "gu") - using this utility will ensure that \s entities in pattern string are replaced with an extended set of whitespaces. Recommended for use in apply callback.

Example:

s10n("\t \xA0 ABC\n\t \uFEFF").apply((value, context) =>
  // replaces extended set of whitespaces with dashes
  value.replace(context._regexp("\\s"), "-")
).value; // "----ABC----"
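
A helper of this kind could work by rewriting each \s token in the pattern before compiling it. The sketch below is an assumption about the general approach, not the library's code: extendedRegExpSketch is a hypothetical name, and the naive substitution breaks if \s already sits inside a character class or follows an escaped backslash.

```javascript
// Hypothetical helper; the real _regexp may be implemented differently.
// JavaScript's \s already covers \xA0 and \uFEFF; the zero-width characters are added.
const EXTENDED_WHITESPACE_CLASS = "[\\s\\u200B\\u200C\\u200D\\u2060]";

function extendedRegExpSketch(patternString, flags = "gu") {
  // Replace every literal "\s" token with the extended character class.
  return new RegExp(patternString.replace(/\\s/g, EXTENDED_WHITESPACE_CLASS), flags);
}

console.log("\u200BA\u2060 B".replace(extendedRegExpSketch("\\s"), "-")); // "-A--B"
console.log("coO\n o\t l".replace(extendedRegExpSketch("o\\s+", "gi"), "0")); // "co00l"
```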

[ ^^ Back to TOC ^^ ]

Development and Publishing

Refer to CONTRIBUTING.md for details.

[ ^^ Back to TOC ^^ ]

Table of Contents

Use cases

Example 1. Username

Example 2. Arbitrary text

Installation and Usage

Option 1. Install as a project dependency

Option 2. Link directly to the html file

Usage

API

Modifiers

Treating line break characters

Line break character

Elementary transformers

Transform whitespaces

Handle line breaks

Keep/Remove/Replace

Other transformations

Compound transformers

Semantic sanitizers

Custom transformations

Getting sanitized value

Utility methods

Development and Publishing
