To do this, the tool makes use of the Moses statistical machine translation framework to perform the translation, along with some pre- and post-processing to handle code-specific considerations.
Using the tool
Currently, building from the DockerFile requires additional data (the phrase tables and language models), which is too large to host on GitHub. This data is provided in the Docker image, but you will need to extract it if you wish to rebuild the image yourself.
jsNaughty depends on several components to run the deobfuscation process. The main service relies on three servers - two instances of mosesserver and one instance of lmServer.
The mosesserver is part of Moses and is used to provide the initial translation suggestions after preprocessing. One server is used with the hashing option, and one with the original obfuscated names.
The lmServer is used to resolve inconsistencies between translations of a variable used on different lines. Moses translates each line individually, but in source code, variables must be named consistently within a scope.
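To illustrate the consistency problem, here is a minimal sketch of one way to reconcile per-line suggestions into a single name per variable. The function name and the score-summing heuristic are assumptions for illustration; the real tool consults a language model (via lmServer) rather than raw score totals.

```python
from collections import defaultdict

def pick_consistent_names(line_candidates):
    """For each line, `line_candidates` maps an obfuscated variable to a
    list of (suggested_name, score) pairs from the translator. Choose one
    name per variable for the whole scope by summing the support each
    suggestion receives across all lines and keeping the best one.
    Hypothetical sketch - the real tool scores candidates with lmServer."""
    totals = defaultdict(lambda: defaultdict(float))
    for candidates in line_candidates:
        for var, suggestions in candidates.items():
            for name, score in suggestions:
                totals[var][name] += score
    return {var: max(names, key=names.get) for var, names in totals.items()}

# Two lines suggest different names for the same obfuscated variable 'a';
# the scope-wide winner is the name with the most total support.
lines = [
    {"a": [("count", 0.6), ("total", 0.4)]},
    {"a": [("count", 0.7), ("i", 0.3)]},
]
print(pick_consistent_names(lines))  # {'a': 'count'}
```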
Examples of how we run these servers can be found in DockerFolder/startServers.sh or the README file in the jsnaughty subdirectory (where the website code is located). In tools/config.py are the definitions of how to connect to each server.
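For orientation, the kind of definitions in tools/config.py looks roughly like the sketch below. The attribute names, hosts, and ports here are illustrative assumptions only; check the actual file for the real values.

```python
# Hypothetical sketch of tools/config.py-style definitions; the real
# names and ports may differ - consult the file in the repository.

# mosesserver instance translating the original obfuscated names
MOSES_SERVER_URL = "http://localhost:8080/RPC2"

# mosesserver instance using the hashing option
MOSES_SERVER_HASHED_URL = "http://localhost:8081/RPC2"

# lmServer used to score candidate renamings for cross-line consistency
LM_SERVER_HOST = "localhost"
LM_SERVER_PORT = 8082
```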
Also, during experiments, we found that jsNaughty performed similarly in quality to JSNice, but that the recovered names covered very different sets of names. We found that combining them created a tool more effective than either alone. This option is built into both the website and the script included in the Docker image.
However, it is dependent on JSNice's web service being available. If this step fails, the tool will instead fall back on just using our translation framework to generate names.
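The fallback behavior can be sketched as follows. This is a minimal illustration of the control flow only: `translate`, `query_jsnice`, and `merge` are hypothetical stand-ins for the real scripts in the Docker image, which may be structured quite differently.

```python
def deobfuscate(source, translate, query_jsnice, merge):
    """Hypothetical sketch of the combined pipeline's fallback logic.

    `translate` runs our Moses-based renaming, `query_jsnice` calls
    JSNice's web service (raising OSError on failure), and `merge`
    combines the two suggestion sets. All three callables are
    illustrative assumptions, not the tool's actual API.
    """
    translated = translate(source)
    try:
        # Prefer the combined result when JSNice is reachable.
        return merge(translated, query_jsnice(source))
    except OSError:
        # JSNice unavailable: fall back on the translation alone.
        return translated
```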
One source of slowdowns during translation is the size of the phrase table. The phrase table size can be reduced via pruning: the translations with the least support can be removed, drastically reducing the table size while hopefully not affecting translation quality (see http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc5). We are still investigating how much the pruning affects our accuracy - the Docker image and website are using the full phrase tables.
However, if you have an uncompressed phrase table and the associated corpora, you can reduce the table size in the following manner (on Linux):
    <moses-phrase-filter-path>/Bin/Linux/Index/IndexSA.O32 corpus.clear
    <moses-phrase-filter-path>/Bin/Linux/Index/IndexSA.O32 corpus.ugly
    nohup cat phrase-table | <moses-path>/contrib/sigtest-filter/filter-pt -e corpus.clear -f corpus.ugly -l a+e -n 30 > phrase-table.pruned &
    <moses-path>/bin/processPhraseTableMin -no-alignment-info -encoding None -in phrase-table.pruned -nscores 4 -out phrase-table-pruned.minphr