Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name-agnostic way to represent an AST #85

Closed
alexfernandez opened this issue May 24, 2020 · 2 comments
Closed

Name-agnostic way to represent an AST #85

alexfernandez opened this issue May 24, 2020 · 2 comments

Comments

@alexfernandez
Copy link

alexfernandez commented May 24, 2020

Hi! I'm writing a code challenge website, with some rather unusual features like rewarding the smallest code, the fastest, and so on. It's a work in progress, lots of fun!

I'm using abstract-syntax-tree to find out the size of a solution, to disregard tangential details like variable names. It is working great so far.

To find out who submitted a given code first, I want to be able to identify identical solutions, so that users are not compelled to send the same thing over and over. For instance there are always minute variations in runtime which should be disregarded. I would like to use this library too.

Solutions are stored to the database; I need to store the AST for each in an agnostic way, so that they can be compared. I cannot use equal(ast1, ast2) since that would involve doing n^2 comparisons.

I think I can use the following algorithm:

  • Walk over the tree.
  • Find out all identifiers.
  • For each one, replace the name with a string from a sequence. For instance: a, b, ..., z, a0, a1 and so on.
  • Convert the tree to code again.

The end result should be similar to a minification algorithm, at least the part about variables. It is not perfect: it will not detect minute changes in code. But it should work.

Question 1: Do you think it will detect all identical ASTs, or am I missing something?
Question 2: would a minify() function be an interesting addition to the library?

Thanks in advance!

@emilos
Copy link
Contributor

emilos commented May 24, 2020

hey @alexfernandez very interesting!

Question 1: Yes, I think it's possible to detect same ASTs. I think a nice way to implement this is to have a way to create a hash out of an abstract syntax tree, and then just compare the hashes.

Question 2: yes, that would be nice at some point. I documented some techniques here https://github.com/buxlabs/astoptech

PRs are always welcome, even for basics :)

@alexfernandez
Copy link
Author

@emilos Thanks for your prompt reply!

Hmmm, detecting ASTs disregarding variable (and function) names is not as trivial as I thought. The different namespaces (variables, functions) must be considered. After a quick look at the types of nodes there are multiple ways to declare variables, e.g. as VariableDeclaration but also as ImportSpecifier or as parameter names in a FunctionDeclaration. Then there are even more ways of using them! They all need to be considered for a valid minification and/or AST identification.

I think I will leave this right now, I hope I can revisit it in the future. Thanks anyway!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants