diff --git a/README.md b/README.md index 0da6a6be..e440228d 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ JavaScript AST analysis. This package has been created to export the [Node-Secur The goal is to quickly identify dangerous code and patterns for developers and Security researchers. Interpreting the results of this tool will still require you to have a set of security notions. -> 💖 I have no particular background in security. I'm simply becoming more and more interested and passionate about static code analysis. But I would be more than happy to learn that my work can help prevent potential future attacks (or leaks). +> **Note** I have no particular background in security. I'm simply becoming more and more interested and passionate about static code analysis. But I would be more than happy to learn that my work can help prevent potential future attacks (or leaks). ## Goals The objective of the project is to successfully detect all potentially suspicious JavaScript codes.. The target is obviously codes that are added or injected for malicious purposes.. @@ -71,7 +71,7 @@ console.log(warnings); The analysis will return: `http` (in try), `crypto`, `util` and `fs`. -> ⚠️ There is also a lot of suspicious code example in the root cases directory. Feel free to try the tool on these files. +> **Warning** There is also a lot of suspicious code example in the `./examples` cases directory. Feel free to try the tool on these files. ## Warnings @@ -106,26 +106,24 @@ import * as i18n from "@nodesecure/i18n"; console.log(i18n.getToken(jsxray.warnings.parsingError.i18n)); ``` -## Warnings Legends (v2.0+) +## Warnings Legends -> Node-secure versions equal or lower than 0.7.0 are no longer compatible with the warnings table below. +> **Warning** versions of NodeSecure greather than v0.7.0 are no longer compatible with the warnings table below. -This section describe all the possible warnings returned by JSXRay. 
+This section describe all the possible warnings returned by JSXRay. Click on the warning **name** for additional information and examples. -| name | description | -| --- | --- | -| parsing-error | An error occured when parsing the JavaScript code with meriyah. It mean that the conversion from string to AST as failed. If you encounter such an error, **please open an issue here**. | -| unsafe-import | Unable to follow an import (require, require.resolve) statement/expr. | -| unsafe-regex | A RegEx as been detected as unsafe and may be used for a ReDoS Attack. | -| unsafe-stmt | Usage of dangerous statement like `eval()` or `Function("")`. | -| unsafe-assign | Assignment of a protected global like `process` or `require`. | -| encoded-literal | An encoded literal has been detected (it can be an hexa value, unicode sequence, base64 string etc) | -| short-identifiers | This mean that all identifiers has an average length below 1.5. Only possible if the file contains more than 5 identifiers. | -| suspicious-literal | This mean that the sum of suspicious score of all Literals is bigger than 3. | -| obfuscated-code (**experimental**) | There's a very high probability that the code is obfuscated... | -| weak-crypto (**experimental**) | The code probably contains a weak crypto algorithm ("md5...) | - -> 👀 More details on warnings and their implementations [here](./WARNINGS.md) +| name | experimental | description | +| --- | :-: | --- | +| [parsing-error](./docs/parsing-error.md) | ❌ | The AST parser throw an error | +| [unsafe-import](./docs/unsafe-import.md) | ❌ | Unable to follow an import (require, require.resolve) statement/expr. | +| [unsafe-regex](./docs/unsafe-regex.md) | ❌ | A RegEx as been detected as unsafe and may be used for a ReDoS Attack. | +| [unsafe-stmt](./docs//unsafe-stmt.md) | ❌ | Usage of dangerous statement like `eval()` or `Function("")`. 
| +| [unsafe-assign](./docs/unsafe-assign.md) | ❌ | Assignment of a protected global like `process` or `require`. | +| [encoded-literal](./docs/encoded-literal.md) | ❌ | An encoded literal has been detected (it can be an hexa value, unicode sequence or a base64 string) | +| [short-identifiers](./docs/short-identifiers.md) | ❌ | This mean that all identifiers has an average length below 1.5. | +| [suspicious-literal](./docs/suspicious-literal.md) | ❌ | A suspicious literal has been found in the source code. | +| [obfuscated-code](./docs/obfuscated-code.md) | ✔️ | There's a very high probability that the code is obfuscated. | +| [weak-crypto](./docs/weak-crypto.md) | ✔️ | The code probably contains a weak crypto algorithm (md5, sha1...) | ## API @@ -153,6 +151,32 @@ interface Report { +
+runASTAnalysisOnFile(pathToFile: string, options?: RuntimeFileOptions): Promise< ReportOnFile > + +```ts +interface RuntimeOptions { + module?: boolean; + isMinified?: boolean; +} +``` + +Run the SAST scanner on a given JavaScript file. + +```ts +export type ReportOnFile = { + ok: true, + warnings: Warning[]; + dependencies: ASTDeps; + isMinified: boolean; +} | { + ok: false, + warnings: Warning[]; +} +``` + +
+ ## Contributors ✨ diff --git a/WARNINGS.md b/WARNINGS.md deleted file mode 100644 index fa574b52..00000000 --- a/WARNINGS.md +++ /dev/null @@ -1,93 +0,0 @@ -# Warnings - -## Introduction -This document provides a more complete explanation of how certain warnings are currently implemented. Most of them are still at the experimentation stage and a lot of iteration will be necessary to make them accurate. - ---- - -## obfuscated-code -This new warning has been integrated in the v2.0 release of the package. A complete Google Drive document has been written to describe all patterns of obfuscation tools and way of detecting them. - -- [JSXRay - Patterns of obfuscated JavaScript code](https://docs.google.com/document/d/11ZrfW0bDQ-kd7Gr_Ixqyk8p3TGvxckmhFH3Z8dFoPhY/edit?usp=sharing) - -For the moment no implementation has been completely frozen. - -## unsafe-regex -JS-X-Ray use the npm package [safe-regex](https://github.com/davisjam/safe-regex) to checkup all Literal and RegEx Constructor. - -## unsafe-assign -The analysis traces the assignment of several global variables considered to be dangerous. They can often be used for malicious purposes and hide information from tools like ours. - -On Node.js we track the use of `require` and `process` (and particulary things like `process.mainModule.require`). With the example below the analysis will still be able to trace the use of require: - -```js -const b = process; -const c = b.mainModule; -c.require("http"); -``` - -## short-identifiers -The current analysis store in memory all Identifiers name. There are several sources: -- VariableDeclarator: `var boo;` -- AssignmentExpression: `boo = 5;` -- FunctionDeclaration: `function boo() {}` -- Property of ObjectExpression: `{ boo: 5 }` - -However we do not take into account the properties of objects for this warning. The warning is generated only if: - -- The file is not already declared as Minified. -- There is more than five identifiers. 
-- The sum of all identifiers name length is below 1.5. - -## encoded-literal -The analysis checks all the Literals in the tree and search for encoded values. JS-X-Ray currently supports three types of detection: -- Hexadecimal sequence: `'\x72\x4b\x58\x6e\x75\x65\x38\x3d'` -- Unicode sequence: `\u03B1` -- Base64 encryption: `z0PgB0O=` - -Hexadecimal and Unicode sequence are tested directly on the raw Literal provided by meriyah. For base64 detection we the npm package [is-base64](https://github.com/miguelmota/is-base64). - -JavaScript implementation: -```js -const hasHexadecimalSequence = /\\x[a-fA-F0-9]{2}/g.exec(node.raw) !== null; -const hasUnicodeSequence = /\\u[a-fA-F0-9]{4}/g.exec(node.raw) !== null; -const isBase64 = isStringBase64(node.value, { allowEmpty: false }); -``` - -## suspicious-literal -It's one of the most interesting warnings. I personally built it with the idea of detecting long strings of characters that are very common in malicious obfuscated/encrypted codes like in [smith-and-wesson-skimmer](https://badjs.org/posts/smith-and-wesson-skimmer/). - -The basic idea is to say that any string longer than 45 characters with no space is very suspicious... Then we establish a suspicion score that will be incremented according to several criteria: - -- if the string contains space in the first 45 characters then we set the score to zero, else we set the score to one. -- if the string has more than 200 characters then we add 1 to the score. -- we add one to the score for each 750 characters. So a length of 1600 will add two to the score. -- we add two point to the score if the string contains more than 70 unique characters. - -So it's possible for a string with more than 45 characters to come out with a score of zero if: -- there is space in the first 45 characters of the string. -- less than 70 unique characters. 
- -JavaScript implementation: -```js -function strCharDiversity(str) { - return new Set([...str]).size; -} - -function strSuspectScore(str) { - if (str.length < 45) { - return 0; - } - - const includeSpace = str.includes(" "); - const includeSpaceAtStart = includeSpace ? str.slice(0, 45).includes(" ") : false; - let suspectScore = includeSpaceAtStart ? 0 : 1; - if (str.length > 200) { - suspectScore += Math.floor(str.length / 750); - } - - return strCharDiversity(str) >= 70 ? suspectScore + 2 : suspectScore; -} -``` - -The warning is generated only if the sum of all scores exceeds three. diff --git a/docs/encoded-literal.md b/docs/encoded-literal.md new file mode 100644 index 00000000..c608dca5 --- /dev/null +++ b/docs/encoded-literal.md @@ -0,0 +1,21 @@ +# Encoded literal + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| encoded-literal | `Information` | `sast_warnings.encoded_literal` | ❌ | + +## Introduction + +The SAST scanner assert all Literals in the tree and search for encoded values. JS-X-Ray currently supports three types of detection: +- Hexadecimal sequence: `'\x72\x4b\x58\x6e\x75\x65\x38\x3d'` +- Unicode sequence: `\u03B1` +- Base64 encryption: `z0PgB0O=` + +Hexadecimal and Unicode sequence are tested directly on the raw Literal provided by meriyah. For base64 detection we use the npm package [is-base64](https://github.com/miguelmota/is-base64). 
+ +Example of a JavaScript implementation: +```js +const hasHexadecimalSequence = /\\x[a-fA-F0-9]{2}/g.exec(node.raw) !== null; +const hasUnicodeSequence = /\\u[a-fA-F0-9]{4}/g.exec(node.raw) !== null; +const isBase64 = isStringBase64(node.value, { allowEmpty: false }); +``` diff --git a/docs/obfuscated-code.md b/docs/obfuscated-code.md new file mode 100644 index 00000000..41a2d5bf --- /dev/null +++ b/docs/obfuscated-code.md @@ -0,0 +1,80 @@ +# Obfuscated code + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| obfuscated-code | `Critical` | `sast_warnings.obfuscated_code` | ✔️ | + +## Introduction + +An **experimental** warning capable of detecting obfuscation and sometimes the tool used. The scanner is capable to detect: + +- [freejsobfuscator](http://www.freejsobfuscator.com/) +- [jjencode](https://utf-8.jp/public/jjencode.html) +- [jsfuck](http://www.jsfuck.com/) +- [obfuscator.io](https://obfuscator.io/) +- morse +- [trojan source](https://trojansource.codes/) + +Example of obfuscated code is in the root `examples` directory. + +### Technical note +A complete G.Drive document has been written to describe the patterns of obfuscation tools and some way of detecting them: + +- [JSXRay - Patterns of obfuscated JavaScript code](https://docs.google.com/document/d/11ZrfW0bDQ-kd7Gr_Ixqyk8p3TGvxckmhFH3Z8dFoPhY/edit?usp=sharing) + +> **Note** There is no frozen implementation and this is an early implementation + +## Example + +The following code uses Morse code to obfuscate its real intent. This was used in an attack and I find it quite funny so i implemented morse detection 😂. 
+ +```js +function decodeMorse(morseCode) { + var ref = { + '.-': 'a', + '-...': 'b', + '-.-.': 'c', + '-..': 'd', + '.': 'e', + '..-.': 'f', + '--.': 'g', + '....': 'h', + '..': 'i', + '.---': 'j', + '-.-': 'k', + '.-..': 'l', + '--': 'm', + '-.': 'n', + '---': 'o', + '.--.': 'p', + '--.-': 'q', + '.-.': 'r', + '...': 's', + '-': 't', + '..-': 'u', + '...-': 'v', + '.--': 'w', + '-..-': 'x', + '-.--': 'y', + '--..': 'z', + '.----': '1', + '..---': '2', + '...--': '3', + '....-': '4', + '.....': '5', + '-....': '6', + '--...': '7', + '---..': '8', + '----.': '9', + '-----': '0', + }; + + return morseCode + .split(' ') + .map(a => a.split(' ').map(b => ref[b]).join('')) + .join(' '); +} + +var decoded = decodeMorse(".-- --- .-. -.. .-- --- .-. -.."); +console.log(decoded); +``` diff --git a/docs/parsing-error.md b/docs/parsing-error.md new file mode 100644 index 00000000..5c511e8d --- /dev/null +++ b/docs/parsing-error.md @@ -0,0 +1,22 @@ +# Parsing Error + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| ast-error | `Information` | `sast_warnings.ast_error` | ❌ | + +## Introduction + +Parsing Error is throw when the library [meriyah](https://github.com/meriyah/meriyah) fail to parse the javascript source code into an AST. But it can also happen when the AST analysis fails because we don't manage a case properly. 
+ +> **Note** If you are in the second case, please open an issue [here](https://github.com/NodeSecure/js-x-ray/issues) + +## Example + +```json +{ + "kind": "parsing-error", + "value": "[10:30]: Unexpected token: ','", + "location": [[0,0],[0,0]], + "file": "helpers\\asyncIterator.js" +} +``` diff --git a/docs/short-identifiers.md b/docs/short-identifiers.md new file mode 100644 index 00000000..eb9ee509 --- /dev/null +++ b/docs/short-identifiers.md @@ -0,0 +1,30 @@ +# Short identifiers + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| short-identifiers | `Warning` | `sast_warnings.short_identifiers` | ❌ | + +## Introduction + +The SAST store in memory all Identifiers id so we are able later to sum the length of all ids. We are looking at several ESTree Node in the tree: +- VariableDeclarator: `var boo;` +- AssignmentExpression: `(boo = 5)` +- FunctionDeclaration: `function boo() {}` +- Property of ObjectExpression: `{ boo: 5 }` + +However, we do not take into consideration the properties of Objects for this warning. The warning is generated only if: + +- The file is not already declared as **Minified**. +- There is more than **five** identifiers. +- The sum of all identifiers name length is below `1.5`. + +## Example + +```json +{ + "kind": "short-identifiers", + "location": [[0,0], [0,0]], + "value": 1.5, + "file": "lib\\compile-env.js" +} +``` diff --git a/docs/suspicious-literal.md b/docs/suspicious-literal.md new file mode 100644 index 00000000..d1380128 --- /dev/null +++ b/docs/suspicious-literal.md @@ -0,0 +1,55 @@ +# Suspicious literal + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| suspicious-literal | `Warning` | `sast_warnings.suspicious_literal` | ❌ | + +## Introduction + +Thats one of the most interesting JS-X-Ray warning. 
We designed it with the idea of detecting long strings of characters that are very common in malicious obfuscated/encrypted codes like in [smith-and-wesson-skimmer](https://badjs.org/posts/smith-and-wesson-skimmer/). + +The basic idea is to say that any string longer than 45 characters with no space is very suspicious... Then we establish a suspicion score that will be incremented according to several criteria: + +- if the string contains **space** in the first **45** characters then we set the score to `zero`, else we set the score to `one`. +- if the string has more than **200 characters** then we add `1` to the score. +- we add one to the score for each **750 characters**. So a length of __1600__ will add `two` to the score. +- we add `two` point to the score if the string contains more than **70 unique characters**. + +So it's possible for a string with more than 45 characters to come out with a score of zero if: +- there is space in the first 45 characters of the string. +- less than 70 unique characters. + +The implementation is done in the [@nodesecure/sec-literal](https://github.com/NodeSecure/sec-literal/blob/main/src/utils.js) package and look like this: +```js +function stringCharDiversity(str, charsToExclude = []) { + const data = new Set(str); + charsToExclude.forEach((char) => data.delete(char)); + + return data.size; +} + +// --- +const kMaxSafeStringLen = 45; +const kMaxSafeStringCharDiversity = 70; +const kMinUnsafeStringLenThreshold = 200; +const kScoreStringLengthThreshold = 750; + +function stringSuspicionScore(str) { + const strLen = stringWidth(str); + if (strLen < kMaxSafeStringLen) { + return 0; + } + + const includeSpace = str.includes(" "); + const includeSpaceAtStart = includeSpace ? str.slice(0, kMaxSafeStringLen).includes(" ") : false; + + let suspectScore = includeSpaceAtStart ? 
0 : 1; + if (strLen > kMinUnsafeStringLenThreshold) { + suspectScore += Math.ceil(strLen / kScoreStringLengthThreshold); + } + + return stringCharDiversity(str) >= kMaxSafeStringCharDiversity ? suspectScore + 2 : suspectScore; +} +``` + +> **Note** The warning is generated only if the sum of all scores exceeds **three**. diff --git a/docs/unsafe-assign.md b/docs/unsafe-assign.md new file mode 100644 index 00000000..be213ae9 --- /dev/null +++ b/docs/unsafe-assign.md @@ -0,0 +1,19 @@ +# Unsafe Assignment + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| unsafe-assign | `Warning` | `sast_warnings.unsafe_assign` | ❌ | + +## Introduction + +The SAST scanner traces the assignment of several global variables considered to be dangerous. They can often be used for malicious purposes and hide information from tools like ours. + +On Node.js we track the use of `require` and `process` (and particulary things like `process.mainModule.require`). With the example below the analysis will still be able to trace the use of require: + +```js +const b = process; +const c = b.mainModule; +c.require("http"); +``` + +> **Note** We may remove this warning in future release (it generate to much noise for almost no additional value). diff --git a/docs/unsafe-import.md b/docs/unsafe-import.md new file mode 100644 index 00000000..ec1a3005 --- /dev/null +++ b/docs/unsafe-import.md @@ -0,0 +1,38 @@ +# Unsafe Import + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| unsafe-import | `Warning` | `sast_warnings.unsafe_import` | ❌ | + +## Introduction + +On JS-X-Ray we intensively track the use of `require` CallExpression and also ESM Import declarations. Knowing the dependencies used is really important for our analysis and that why when the SAST fail to follow an important it will throw an `unsafe-import` warning. 
+ +> **Note** Sometimes we trigger this warning on purpose because we have detected a malicious import + +### CJS Note +We analyze and trace several ways to require in Node.js (with CJS): +- require +- require.main.require +- require.mainModule.require +- require.resolve + +## Example + +The code below try to require Node.js core dependency `http`. JS-X-Ray sucessfully detect it and throw an `unsafe-import` warning. + +```js +function unhex(r) { + return Buffer.from(r, "hex").toString(); +} + +const g = Function("return this")(); +const p = g["pro" + "cess"]; + +// Hex 72657175697265 -> require +const evil = p["mainMod" + "ule"][unhex("72657175697265")]; + +// Hex 68747470 -> http +evil(unhex("68747470")).request +``` + diff --git a/docs/unsafe-regex.md b/docs/unsafe-regex.md new file mode 100644 index 00000000..a0807c87 --- /dev/null +++ b/docs/unsafe-regex.md @@ -0,0 +1,33 @@ +# Unsafe Import + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| unsafe-regex | `Warning` | `sast_warnings.unsafe_regex` | ❌ | + +## Introduction + +This warning has been designed to detect and report any regular expressions (regexes) that could lead to a catastrophic backtracking. This can be used by an attacker to drastically reduce the performance of your application. We often call this kind of attack REDOS. 
+ +Learn more: +- [How a RegEx can bring your Node.js service down](https://lirantal.medium.com/node-js-pitfalls-how-a-regex-can-bring-your-system-down-cbf1dc6c4e02) +- [An additional non-backtracking RegExp engine](https://v8.dev/blog/non-backtracking-regexp) +- [The Impact of Regular Expression Denial of Service (ReDoS) in Practice](https://infosecwriteups.com/introduction-987fdc4c7b0) +- [Why Aren’t Regexes a Lingua Franca?](https://davisjam.medium.com/why-arent-regexes-a-lingua-franca-esecfse19-a36348df3a2) +- [Comparing regex matching algorithms](https://swtch.com/~rsc/regexp/regexp1.html) + +> **Note** credit goes to the `safe-regex` package author for the last three resources. + +### Technical implementation + +Under the hood the package [safe-regex](https://github.com/davisjam/safe-regex) is used to assert all **RegExpLiteral** and RegEx Constructor (eg `new RegEx()`). + +## Example + +```json +{ + "kind": "unsafe-regex", + "location": [[286,18],[286,65]], + "value": "^node_modules\\/(@[^/]+\\/?[^/]+|[^/]+)(\\/.*)?$", + "file": "index.js" +} +``` diff --git a/docs/unsafe-stmt.md b/docs/unsafe-stmt.md new file mode 100644 index 00000000..29d944e5 --- /dev/null +++ b/docs/unsafe-stmt.md @@ -0,0 +1,31 @@ +# Unsafe Statement + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| unsafe-stmt | `Warning` | `sast_warnings.unsafe_stmt` | ❌ | + +## Introduction + +Warning to notify of the usage of `eval()` or `Function()` in the source code. Their use is not recommended and can be used to execute insecure code (for example to retrieve the `globalThis` / `window` object). + +- [MDN - Never use eval()!](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval#never_use_eval!) + +## Example + +The warning **value** can be either `Function` or `eval`. 
+ +```json +{ + "kind": "unsafe-stmt", + "location": [[49,37],[49,62]], + "value": "Function", + "file": "index.js" +} +``` + +Example of a dangerous code that an attacker may use: +```js +const xxx = Function("return this")(); +// xxx is equal to globalThis +console.log(xxx); +``` diff --git a/docs/weak-crypto.md b/docs/weak-crypto.md new file mode 100644 index 00000000..2cc450f2 --- /dev/null +++ b/docs/weak-crypto.md @@ -0,0 +1,23 @@ +# Weak crypto + +| Code | Severity | i18n | Experimental | +| --- | --- | --- | :-: | +| weak-crypto | `Information` | `sast_warnings.weak_crypto` | ✔️ | + +## Introduction + +Detect usage of weak crypto algorithm with the Node.js core `Crypto` dependency. Algorithm considered to be weak are: + +- md5 +- md4 +- md2 +- sha1 +- ripemd160 + +## Example + +```js +import crypto from "crypto"; + +crypto.createHash("md5"); +``` diff --git a/cases/event-stream.js b/examples/event-stream.js similarity index 100% rename from cases/event-stream.js rename to examples/event-stream.js diff --git a/cases/forbes-skimmer.js b/examples/forbes-skimmer.js similarity index 100% rename from cases/forbes-skimmer.js rename to examples/forbes-skimmer.js diff --git a/cases/jscrush.js b/examples/jscrush.js similarity index 100% rename from cases/jscrush.js rename to examples/jscrush.js diff --git a/cases/kopiluwak.js b/examples/kopiluwak.js similarity index 100% rename from cases/kopiluwak.js rename to examples/kopiluwak.js diff --git a/cases/modrrnize.js b/examples/modrrnize.js similarity index 100% rename from cases/modrrnize.js rename to examples/modrrnize.js diff --git a/cases/npm-audit.js b/examples/npm-audit.js similarity index 100% rename from cases/npm-audit.js rename to examples/npm-audit.js diff --git a/cases/obfuscate.js b/examples/obfuscate.js similarity index 100% rename from cases/obfuscate.js rename to examples/obfuscate.js diff --git a/cases/rate-map.js b/examples/rate-map.js similarity index 100% rename from cases/rate-map.js rename to 
examples/rate-map.js diff --git a/cases/smith.js b/examples/smith.js similarity index 100% rename from cases/smith.js rename to examples/smith.js diff --git a/index.js b/index.js index 2edb0eeb..a872fff8 100644 --- a/index.js +++ b/index.js @@ -127,7 +127,8 @@ export const warnings = Object.freeze({ obfuscatedCode: { code: "obfuscated-code", i18n: "sast_warnings.obfuscated_code", - severity: "Critical" + severity: "Critical", + experimental: true }, weakCrypto: { code: "weak-crypto",