Commit

Working initial version supporting arbitrary nested and back-to-back wildcard syntax.

QUnit tests demonstrate usage.
swestwood committed Jul 11, 2013
1 parent 6eeff90 commit d6218a8
Showing 6 changed files with 616 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
node_modules
7 changes: 6 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
**structured.js** is a Javascript library that provides a simple interface for static analysis of Javascript code, backed by the abstract syntax tree generated by Esprima. Structured.js works in-browser (`<script src='structured.js'></script>`), or as a standalone npm module.
**structured.js** is a Javascript library that provides a simple interface for static analysis of Javascript code, backed by the abstract syntax tree generated by Esprima. Structured.js works in-browser `<script src='structured.js'></script>`, or as a standalone npm module.

### Examples

@@ -12,3 +12,8 @@
var result = Structured.match(code, structure); // true

Check out the test suite for more.


### Tests

Run structured.js tests with `npm test`
32 changes: 32 additions & 0 deletions package.json
@@ -0,0 +1,32 @@
{
"name": "structured",
"version": "0.1.0",
"description": "Simple interface for checking structure of JS code against a template, backed by Esprima.",
"main": "structured.js",
"scripts": {
"test": "node testrunner"
},
"repository": {
"type": "git",
"url": "git://github.com/Khan/structuredjs.git"
},
"keywords": [
"parsing",
"analysis",
"ast",
"checker",
"structure"
],
"author": "swestwood",
"license": "BSD",
"bugs": {
"url": "https://github.com/Khan/structuredjs/issues"
},
"dependencies": {
"esprima": "~1.0.3",
"underscore": "~1.5.1"
},
"devDependencies": {
"qunit": "~0.5.16"
}
}
214 changes: 214 additions & 0 deletions structured.js
@@ -0,0 +1,214 @@
/*
* StructuredJS provides an API for static analysis of code based on an abstract
* syntax tree generated by Esprima (compliant with the Mozilla Parser
* API at https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API).
*
* Dependencies: esprima.js, underscore.js
*/

/* Detect npm versus browser usage */
var exports;
if (typeof module !== "undefined" && module.exports) {
exports = module.exports = {};
var esprima = require("esprima");
var _ = require("underscore");
} else {
exports = this.Structured = {};

@jeresig

jeresig Jul 11, 2013

Member

Considering that underscore and esprima aren't being required here, but are expected, it would be good to throw an error of some sort if those two globals don't exist.

It'd also be good to make a note in the README that those are two dependencies (even though they're listed in the package.json).

@swestwood

swestwood Jul 11, 2013

Author Contributor

Sounds good! 55fdaf7
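
A minimal sketch of the check suggested in this thread, under stated assumptions: `assertGlobals` is a hypothetical helper name, and the error text is illustrative. It is not the code from the follow-up commit referenced above.

```javascript
// Hypothetical helper, not part of structured.js as committed here.
// Throws if any of the named globals is missing from `root`.
function assertGlobals(root, names) {
    var missing = names.filter(function(name) {
        return typeof root[name] === "undefined";
    });
    if (missing.length > 0) {
        throw new Error("structured.js requires these globals: " +
            missing.join(", "));
    }
}

// Assumed usage in the browser branch, where `this` is the global object:
// assertGlobals(this, ["esprima", "_"]);
```

Failing at load time with a message that names the missing dependency is usually easier to diagnose than a later `ReferenceError` from inside `match`.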

}

(function(exports) {

/* Returns true if the code (a string) matches the structure in rawStructure
Throws an exception if code is not parseable.
Example:
code = "if (y > 30 && x > 13) {x += y;}";
rawStructure = function structure() { if(_) {} };
match(code, rawStructure); */

@jeresig

jeresig Jul 11, 2013

Member

Your comment structure threw me. Ending the comment after the text of the last line made the comment (visually) blend into the following code. On my first read I thought that everything was just one massive comment and was confused. Additionally at KA we tend to prefix our multi-line comments with *, thus your comment would look something like:

/*
 * Returns true if the code (a string) matches the structure in rawStructure
 *
 * Throws an exception if code is not parseable.
 *
 * Example:
 *   code = "if (y > 30 && x > 13) {x += y;}";
 *   rawStructure = function structure() { if(_) {} };
 *   match(code, rawStructure);
 */

@swestwood

swestwood Jul 11, 2013

Author Contributor

Reformatted comments: 71f3ed9

function match(code, rawStructure) {
var structure = parseStructure(rawStructure);
var codeTree = esprima.parse(code);
var toFind = structure.body;
var peers = [];
if (_.isArray(structure.body)) {
toFind = structure.body[0];
peers = structure.body.slice(1);
}
var result = checkMatchTree(codeTree, toFind, peers);
return result;
}

/* Returns a tree parsed out of the structure. The returned tree is an
abstract syntax tree with wildcard properties set to undefined.
structure is a specification looking something like:
function structure() {if (_) { var _ = 3; }}
where _ denotes a blank (anything can go there),
and code can go before or after any statement (only the nesting and
relative ordering matter). */
function parseStructure(structure) {
var fullTree = esprima.parse(structure.toString());
if (fullTree.type !== "Program" || fullTree.body.length !== 1 ||
fullTree.body[0].type !== "FunctionDeclaration" ||
!fullTree.body[0].body) {
throw "Poorly formatted structure code.";
}
var tree = fullTree.body[0].body;
simplifyTree(tree);
return tree;
};

@jeresig

jeresig Jul 11, 2013

Member

You don't need to suffix function declarations with a ;. You'll only need to do that if you're doing an assignment, for example:

var parseStructure = function(structure) { ... };

@swestwood

swestwood Jul 11, 2013

Author Contributor

Oops, thanks for catching that -- I think they were left over from a refactor. Fixed: 71f3ed9


/* Recursively traverses the tree and sets _ properties to undefined
and empty bodies to null.
Wildcards are explicitly to undefined -- these undefined properties

@jeresig

jeresig Jul 11, 2013

Member

*explicitly set to

@swestwood

swestwood Jul 11, 2013

Author Contributor
must exist and be non-null in order for code to match the structure.
Empty statements are deleted from the tree -- they need not be matched.
If the subtree is an array, we just iterate over the array using
for (var key in tree) */
function simplifyTree(tree) {
for (var key in tree) {
if (!tree.hasOwnProperty(key)) {
continue; // inherited property
}
if (_.isObject(tree[key])) {
if (isWildcard(tree[key])) {
tree[key] = undefined;
} else if (tree[key].type === esprima.Syntax.EmptyStatement) {
// Arrays are objects, but delete tree[key] does not
// update the array length property -- so, use splice.
_.isArray(tree) ? tree.splice(key, 1) : delete tree[key];
} else {
simplifyTree(tree[key]);
}
}
}
};

/* Returns whether or not the node is intended as a wildcard node, which
can be filled in by anything in others' code. */
function isWildcard(node) {
return (node.name && node.name === "_") ||
(_.isArray(node.body) && node.body.length === 0);
};

/* Returns true if currTree matches the wildcard structure toFind.
currTree: The syntax node tracking our current place in the user's code.
toFind: The syntax node from the structure that we wish to find.
peersToFind: The remaining ordered syntax nodes that we must find after
toFind (and on the same level as toFind). */
function checkMatchTree(currTree, toFind, peersToFind) {
if (_.isArray(toFind)) {

@jeresig

jeresig Jul 11, 2013

Member

Any particular reason for still having these lines?

@swestwood

swestwood Jul 11, 2013

Author Contributor

Mostly as a sanity check -- the recursion is pretty complex with toFind and peersToFind, and it's easy for arrays to slip by since the key access still works on them. We could remove it, but it's nice to have just in case for development.

console.error("toFind should never be an array.");
console.error(toFind);
}
if (exactMatchNode(currTree, toFind)) {
return true;
}
for (var key in currTree) {
if (!currTree.hasOwnProperty(key) || !_.isObject(currTree[key])) {
continue; // Skip inherited properties and non-object values
}
// Recursively check for matches
if ((_.isArray(currTree[key]) &&

@jeresig

jeresig Jul 11, 2013

Member

Perhaps too cute but I'd be inclined to write this something like:

(_.isArray(currTree[key]) ? checkNodeArray : checkMatchTree)(currTree[key], toFind, peersToFind)

@swestwood

swestwood Jul 11, 2013

Author Contributor

That is awesome! I think you're right that it might be too cute, though :) I think I'll keep it verbose for now.

@sophiebits

sophiebits Jul 11, 2013

Contributor

I'd personally do

if (_.isArray(currTree[key]) ?
        checkNodeArray(currTree[key], toFind, peersToFind) :
        checkMatchTree(currTree[key], toFind, peersToFind)) {
    return true;
}

which is still pretty clear (clearer than yours?) and avoids checking isArray twice.

checkNodeArray(currTree[key], toFind, peersToFind)) ||
(!_.isArray(currTree[key]) &&
checkMatchTree(currTree[key], toFind, peersToFind))) {
return true;
}
}
return false;
};

/* Returns true if this level of nodeArr matches the node in
toFind, and also matches all the nodes in peersToFind in order. */
function checkNodeArray(nodeArr, toFind, peersToFind) {
for (var i = 0; i < nodeArr.length; i += 1) {
if (checkMatchTree(nodeArr[i], toFind, peersToFind)) {
if (!peersToFind || peersToFind.length === 0) {
return true; // Found everything needed on this level.
} else {
// We matched this node, but we still have more nodes on
// this level we need to match on subsequent iterations
toFind = peersToFind.shift();
}
}
}
return false;
};

/* Checks whether the currNode exactly matches the node toFind.
A match is exact if for every non-null property on toFind, that
property exists on currNode and:
0. If the property is undefined on toFind, it must exist on currNode.
1. Otherwise, the values have the same type (ie, they match).
2. If the values are numbers or strings, they match.
3. If the values are arrays, checkNodeArray on the arrays returns true.
4. If the values are objects, checkMatchTree on those objects
returns true (the objects recursively match to the extent we
care about, though they may not match exactly). */
function exactMatchNode(currNode, toFind) {
for (var key in toFind) {
// Ignore inherited properties; also, null properties can be
// anything and do not have to exist.
if (!toFind.hasOwnProperty(key) || toFind[key] === null) {
continue;
}
var subFind = toFind[key];
var subCurr = currNode[key];
// Undefined properties can be anything, but they must exist.
if (subFind === undefined) {
if (subCurr === null || subCurr === undefined) {
return false;
} else {
continue;
}
}
// currNode does not have the key, but toFind does
if (subCurr === undefined || subCurr === null) {
return false;
}
// Now handle arrays/objects/values
if (_.isObject(subCurr) !== _.isObject(subFind) ||
_.isArray(subCurr) !== _.isArray(subFind) ||
(typeof(subCurr) !== typeof(subFind))) {
console.error("Object/array/other type mismatch.");
return false;
} else if (_.isArray(subCurr)) {
// Both are arrays, do a recursive compare.
// (Arrays are objects so do this check before the object check)
if (subFind.length === 0) {
continue; // Empty arrays can match any array.
}
var newToFind = subFind[0];
var peers = subFind.length > 1 ? subFind.slice(1) : [];
if (!checkNodeArray(subCurr, newToFind, peers)) {
return false;
}
} else if (_.isObject(subCurr)) {
// Both are objects, so do a recursive compare.
if (!checkMatchTree(subCurr, subFind)) {
return false;
}
} else if (!_.isObject(subCurr)) {
// Check that the non-object (number/string) values match
if (subCurr !== subFind) {
return false;
}
} else { // Logically impossible, but as a robustness catch.

@jeresig

jeresig Jul 11, 2013

Member

Perhaps it'd be better to throw an exception here, rather than log an error to the console.

@swestwood

swestwood Jul 11, 2013

Author Contributor

Now throws an exception (keeps the console.errors for debugging): 71f3ed9
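
One way to act on the suggestion in this thread, sketched under assumptions: `failUnexpected` is a hypothetical helper name and the message text is illustrative. This is not the code from the commit referenced above. It keeps the console output for debugging and then throws, so an unexpected state cannot pass silently.

```javascript
// Hypothetical helper, not part of structured.js as committed here.
function failUnexpected(message, values) {
    // Log the offending values first so they are visible when debugging.
    values.forEach(function(value) {
        console.error(value);
    });
    // Then stop execution instead of silently continuing the match.
    throw new Error(message);
}

// Assumed usage inside the "logically impossible" branch:
// failUnexpected("Unexpected node state", [currNode, subCurr]);
```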

console.error("Some weird never-before-seen situation!");
console.error(currNode);
console.error(subCurr);
}
}
return true;
};

exports.match = match;

})(exports);
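
The in-order matching that `checkNodeArray` performs can be illustrated with a simplified, self-contained sketch. The assumptions are significant: `matchInOrder` and `sameType` are hypothetical names, the inputs are flat arrays of plain objects rather than full Esprima trees, and an index is used in place of `peersToFind.shift()` so that the caller's array is left unmodified.

```javascript
// Simplified sketch: returns true if every entry in `wanted` is matched by
// some entry of `items`, in the same relative order. Extra items in between
// are allowed, mirroring "only the nesting and relative ordering matter".
function matchInOrder(items, wanted, matches) {
    var next = 0; // index of the next wanted entry still to be found
    for (var i = 0; i < items.length && next < wanted.length; i += 1) {
        if (matches(items[i], wanted[next])) {
            next += 1;
        }
    }
    return next === wanted.length;
}

// Illustrative matcher: compares node types only.
function sameType(node, pattern) {
    return node.type === pattern.type;
}

var statements = [
    {type: "VariableDeclaration"},
    {type: "IfStatement"},
    {type: "ExpressionStatement"},
    {type: "ForStatement"}
];

// An if followed later by a for: found in order.
var inOrder = matchInOrder(statements,
    [{type: "IfStatement"}, {type: "ForStatement"}], sameType); // true

// A for followed later by an if: the order is wrong.
var outOfOrder = matchInOrder(statements,
    [{type: "ForStatement"}, {type: "IfStatement"}], sameType); // false
```

In the library itself the per-item check is the recursive `checkMatchTree`, so a wanted node may also be found nested inside an item rather than only at the top level.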
7 changes: 7 additions & 0 deletions testrunner.js
@@ -0,0 +1,7 @@
/* Runs the QUnit tests for StructuredJS. `node testrunner.js` */

var runner = require("./node_modules/qunit");

@jeresig

jeresig Jul 11, 2013

Member

This can just be var runner = require("qunit");

@swestwood

swestwood Jul 11, 2013

Author Contributor

Cool, thanks: 71f3ed9

runner.run({
code: {path: "./structured.js", namespace: "structured"},
tests: "tests.js"
});
0 comments on commit d6218a8
