Bilateral Neural Network implementation in Tensorflow

Bilateral Neural Networks for Cross-Language Algorithm Classification

This project implements the Bilateral Neural Networks introduced in our two papers:

  • SANER'19: Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification, by Nghi D. Q. BUI, Yijun YU, Lingxiao JIANG, in the 26th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering, Research Track, Zhejiang University in Hangzhou, February 24-27, 2019
  • NL4SE-AAAI'18: Cross-Language Learning for Program Classification Using Bilateral Tree-Based Convolutional Neural Networks, by Nghi D. Q. BUI, Lingxiao JIANG, and Yijun YU. In the proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) Workshop on NLP for Software Engineering, New Orleans, Louisiana, USA, 2018.

You can find the papers here: https://bdqnghi.github.io/publications/

Here is our proposed neural network structure. In short, it is a variant of the Bilateral Neural Network, a class of neural networks used to find similarity or a relationship between two comparable things, also known as the "Siamese Neural Network" (SNN). Each subnetwork in this case can be:

Installation

workflow

We have prepared a fully automated workflow (Figure 1) for you to classify programs against known algorithm names.

Figure 1. The workflow to use our tools

To do so, you first need to install Docker:

sudo apt-get install docker-ce

Enter the following scripts on the command line to build and run the system respectively:

./r clean

Modify the inputs

Inside the input subfolder, you will find the following files:

input
├── algorithm.name
├── all.algo
├── all.lang
├── config.json
├── language.name
└── srcml_node_map.tsv

If you just want to see how it works, we have prepared only two algorithms as follows:

algorithm.name: Names of algorithms

bubblesort
mergesort

Note. You can add more algorithm names. In the NL4SE paper, we used 6 algorithms, included in the all.algo file. You can replace algorithm.name with all.algo.

language.name: names of programming languages

java .java
cpp .cpp

Note. You can add more programming languages. In the NL4SE paper, we used only C++ and Java. Other programming languages are included in the all.lang file. You can replace language.name with all.lang.
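
To make the two file formats above concrete, here is a minimal Python sketch of how they could be read. It assumes one algorithm name per line in algorithm.name and one whitespace-separated `<language> <extension>` pair per line in language.name, as in the samples shown. The function names `parse_algorithm_names` and `parse_language_names` are hypothetical and are not part of this repository.

```python
from typing import Dict, List


def parse_algorithm_names(text: str) -> List[str]:
    """Return one algorithm name per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_language_names(text: str) -> Dict[str, str]:
    """Map each language name to its file extension, e.g. 'java' -> '.java'."""
    languages: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected '<language> <extension>', got: {line!r}")
        name, extension = parts
        languages[name] = extension
    return languages


# Sample contents copied from the two files shown above.
algorithms = parse_algorithm_names("bubblesort\nmergesort\n")
languages = parse_language_names("java .java\ncpp .cpp\n")
print(algorithms)  # ['bubblesort', 'mergesort']
print(languages)   # {'java': '.java', 'cpp': '.cpp'}
```

In practice the text would come from the files in the input subfolder rather than from inline strings.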

config.json: configuration of the GitHub API. Please substitute your own username and access token.

{
    "GITHUB_USERNAME": "...",
    "GITHUB_ACCESS_TOKEN": "..."
}
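
Since the placeholders must be replaced before the GitHub API can be used, a small check like the following can catch a forgotten edit early. This is only a sketch: the two key names come from the file above, while the helper names `validate_github_config` and `load_github_config` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Key names taken from the config.json shown above.
REQUIRED_KEYS = ("GITHUB_USERNAME", "GITHUB_ACCESS_TOKEN")


def validate_github_config(config: dict) -> dict:
    """Raise ValueError if a required key is missing, empty, or still '...'."""
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value or value == "...":
            raise ValueError(f"{key} is not set")
    return config


def load_github_config(path: str) -> dict:
    """Read a JSON config file and validate its credentials."""
    return validate_github_config(json.loads(Path(path).read_text()))
```

For example, `load_github_config("input/config.json")` would raise a `ValueError` for the unmodified template shown above.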

srcml_node_map.tsv: the syntax node types of selected programming language(s)

UNIT_KIND = 0
DECL = 1
DECL_STMT = 2
INIT = 3
EXPR = 4
EXPR_STMT = 5
COMMENT = 6
CALL = 7
CONTROL = 8
INCR = 9
...
STRONG = 383
OMP_OMP = 384
SPECIAL_CHARS = 385

Note that you need to make sure this file is consistent with the underlying parser.

$ docker run -it fasttool/fast fast -v
fast v0.0.7 commit id: 3e368dd1e56f5bb8f02673b1c7441f567eab67ee with local changes id: e1b7ca5bf36050ce774cb2650446115bb49bf91ac5d889a9dbb4911ed8130225
built with 6.4.0 on Nov 14 2017 at 20:00:04

If you use a different version of fast and the language grammar has changed, you may need to regenerate the input file ./ast2vec/ast2vec/fast_pb2.py.
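
As a rough sanity check on srcml_node_map.tsv, the sketch below parses the node map and reports ids that are unused. It assumes each line has the form `NAME = ID` as in the excerpt above (a tab-separated `NAME<TAB>ID` line is also accepted), and it assumes that ids are meant to be contiguous from 0, which the excerpt suggests but does not state. The function names `parse_node_map` and `missing_ids` are hypothetical and not part of this repository, and this check does not replace comparing the file against the parser itself.

```python
import re
from typing import Dict, List


def parse_node_map(text: str) -> Dict[str, int]:
    """Parse 'NAME = ID' (or 'NAME<TAB>ID') lines into a name -> id mapping."""
    node_ids: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r"[=\s]+", line)
        if len(fields) != 2 or not fields[1].isdecimal():
            raise ValueError(f"malformed node map line: {line!r}")
        name, node_id = fields[0], int(fields[1])
        if name in node_ids:
            raise ValueError(f"duplicate node type: {name}")
        node_ids[name] = node_id
    return node_ids


def missing_ids(node_ids: Dict[str, int]) -> List[int]:
    """Return ids in 0..max(id) that no node type uses."""
    if not node_ids:
        return []
    used = set(node_ids.values())
    return [i for i in range(max(used) + 1) if i not in used]


# A made-up sample in which id 3 has been left out on purpose.
sample = "UNIT_KIND = 0\nDECL = 1\nDECL_STMT = 2\nEXPR = 4\n"
print(missing_ids(parse_node_map(sample)))  # [3]
```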

Other parameters for the TensorFlow framework are stored in the following two files, each corresponding to one TensorFlow run.

./ast2vec/ast2vec/parameters.py
./bi-tbcnn/bi-tbcnn/parameters.py

References

@inproceedings{DBLP:conf/aaai/BuiJY18,
  author    = {Nghi D. Q. Bui and
               Lingxiao Jiang and
               Yijun Yu},
  title     = {Cross-Language Learning for Program Classification Using Bilateral
               Tree-Based Convolutional Neural Networks},
  booktitle = {The Workshops of the The Thirty-Second {AAAI} Conference on Artificial
               Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018.},
  pages     = {758--761},
  year      = {2018},
  crossref  = {DBLP:conf/aaai/2018w},
  url       = {https://aaai.org/ocs/index.php/WS/AAAIW18/paper/view/17338},
  timestamp = {Thu, 19 Jul 2018 13:38:55 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/aaai/BuiJY18},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@INPROCEEDINGS{8667995, 
    author={B. {Nghi D. Q.} and Y. {Yu} and L. {Jiang}}, 
    booktitle={2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)}, 
    title={Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification}, 
    year={2019}, 
    volume={}, 
    number={}, 
    pages={422-433}, 
    keywords={Neural networks;Prediction algorithms;Classification algorithms;Syntactics;Semantics;Machine learning algorithms;Task analysis;cross-language mapping;program classification;algorithm classification;code embedding;code dependency;neural network;bilateral neural network}, 
    doi={10.1109/SANER.2019.8667995}, 
    ISSN={1534-5351}, 
    month={Feb},
}
