PolyFax: An empirical study toolkit on GitHub projects

Introduction

PolyFax provides three basic features: a repository crawler, vulnerability-fixing commit categorization (VCC), and language interaction categorization (LIC). Its precision and recall suggest that it can serve multiple purposes. For example, VCC can be used for empirical analysis and can provide abundant training data for machine-learning- or deep-learning-based vulnerability detectors, since the code snippets, issues, or even CVEs associated with the commits can be retrieved from the VCC results. Moreover, because its implementation is language-independent, PolyFax is not limited to any particular programming language.

PolyFax also includes a multi-task wrapper, which lets the Crawler and the Analyzers run as parallel processes.
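The wrapper itself is not described here, so the following is only a minimal sketch of the idea, not PolyFax's implementation: work items (e.g., repository names) are split round-robin into TaskNum buckets and each bucket is handled by a separate process via Python's standard `concurrent.futures`. The round-robin split and the `handle_bucket` placeholder are hypothetical.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def split_tasks(items, task_num):
    """Round-robin split of work items into task_num buckets (hypothetical scheme)."""
    buckets = [[] for _ in range(task_num)]
    for i, item in enumerate(items):
        buckets[i % task_num].append(item)
    return buckets


def handle_bucket(bucket):
    # Hypothetical placeholder for per-process work, e.g. crawling or analyzing
    # each repository in the bucket; here it only tags each name as done.
    return [f"done:{repo}" for repo in bucket]


def run_parallel(items, task_num):
    """Run one worker process per bucket and flatten the results."""
    task_num = max(1, task_num)
    buckets = split_tasks(items, task_num)
    # The "fork" start method keeps this runnable as a plain script on Linux
    # (the platform PolyFax is tested on); it is not available on Windows.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=task_num, mp_context=ctx) as pool:
        results = list(pool.map(handle_bucket, buckets))
    return [r for bucket_result in results for r in bucket_result]
```

Processes rather than threads are used in this sketch so that CPU-bound analysis work is not serialized by Python's global interpreter lock; TaskNum then bounds how many repositories are processed concurrently.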

In addition, PolyFax provides three analyzers that implement negative binomial regression (NBR) analysis: language selection vs. vulnerabilities, language interfacing mechanism vs. vulnerabilities, and single language vs. vulnerabilities.

Thanks to its object-oriented design, developers and researchers can easily extend or customize PolyFax.

Setup PolyFax

To set up PolyFax from source, follow these three steps:

1. Check prerequisites. PolyFax is well tested with Python 3 on Ubuntu 18.04; Python 3.8+ is recommended.
2. Download the source code, e.g., by cloning https://github.com/Daybreak2019/PolyFax.
3. Enter the PolyFax directory and run dependence.sh to install the required dependencies (e.g., fuzzywuzzy, nltk).

Additionally, we provide a Docker image with all dependencies installed (recommended). The image also contains the data for the paper On the Vulnerability Proneness of Multilingual Code. Use the command "docker pull daybreak2019/fse22_vpomc" to download the image. The datasets can also be retrieved from Figshare.

Use PolyFax

The following sections demonstrate PolyFax's four primary functionalities: grabbing repositories from GitHub, running the vulnerability-fixing commit categorization (VCC) analyzer, running the language interaction categorization (LIC) analyzer, and running the NBR analysis.

Before running the experiments, execute the following commands to prepare the environment:

    docker pull daybreak2019/fse22_vpomc:v1.0
    docker run -itd --name "polyfax" daybreak2019/fse22_vpomc:v1.0
    docker attach polyfax

    # Inside the container:
    cd /root && git clone https://github.com/Daybreak2019/PolyFax && cd PolyFax

Default parameters

PolyFax reads its default parameters from the configuration file config.ini, which contains the following fields:

UserName: the username of the GitHub account
Token: the access token of the GitHub account
TaskNum: the number of processes PolyFax uses
Languages: the languages the projects should contain
Domains: the domains the projects belong to
MaxGrabNum: the maximum number of projects to grab

In particular, {Languages=[]} and {Domains=[]} mean that the Crawler does not filter projects by language or domain, and {MaxGrabNum=-1} means that the Crawler grabs as many repositories as possible.
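The filter semantics above can be illustrated with a minimal Python sketch. It assumes config.ini is a standard INI file with a single section, here called `Default`, and that Languages and Domains are written as Python-style lists; the section name and the list syntax are assumptions, since the file's exact layout is not shown. It also assumes a project must contain every listed language to pass the language filter.

```python
import ast
import configparser

# Hypothetical config.ini contents: the [Default] section name and the
# Python-list syntax for Languages/Domains are assumptions for illustration.
SAMPLE = """
[Default]
UserName = your-github-username
Token = your-access-token
TaskNum = 4
Languages = []
Domains = []
MaxGrabNum = 5
"""


def load_settings(text, section="Default"):
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case ("MaxGrabNum") instead of lowercasing
    parser.read_string(text)
    cfg = parser[section]
    return {
        "UserName": cfg["UserName"],
        "Token": cfg["Token"],
        "TaskNum": cfg.getint("TaskNum"),
        "Languages": ast.literal_eval(cfg["Languages"]),
        "Domains": ast.literal_eval(cfg["Domains"]),
        "MaxGrabNum": cfg.getint("MaxGrabNum"),
    }


def accepts(settings, repo_languages, repo_domain):
    """An empty Languages/Domains list disables that check."""
    langs, domains = settings["Languages"], settings["Domains"]
    # Assumption: the project must contain all listed languages.
    if langs and not set(langs) <= set(repo_languages):
        return False
    if domains and repo_domain not in domains:
        return False
    return True


def quota_reached(settings, grabbed):
    """MaxGrabNum = -1 means no limit on the number of grabbed projects."""
    limit = settings["MaxGrabNum"]
    return limit != -1 and grabbed >= limit
```

With the sample values, `accepts(settings, {"C", "Python"}, "web")` is true because both filters are empty, and `quota_reached(settings, 5)` is true because MaxGrabNum is 5.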

Grabbing repositories from GitHub

With {MaxGrabNum=5} configured for demonstration, run the following command to grab repositories from GitHub. In this step, the Crawler fetches each repository's profile, clones the repository, and saves its commits to local storage.

    python polyfax.py -a crawler
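For orientation, a crawler of this kind typically talks to the GitHub REST API. The sketch below is an assumption about the general approach, not PolyFax's actual code: it builds the requests for a repository profile (`GET /repos/{owner}/{repo}`) and one page of its commits (`GET /repos/{owner}/{repo}/commits`) using the Token from config.ini, plus the `git clone` command for the sources. The helper names are hypothetical, and only `fetch_json` touches the network.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token):
    # GitHub's REST API accepts a personal access token in the Authorization header.
    return {"Authorization": f"token {token}", "Accept": "application/vnd.github+json"}


def profile_request(owner, repo, token):
    """Request for the repository profile (GET /repos/{owner}/{repo})."""
    return urllib.request.Request(f"{API_ROOT}/repos/{owner}/{repo}", headers=_headers(token))


def commits_request(owner, repo, token, page=1, per_page=100):
    """Request for one page of the commit list (GET /repos/{owner}/{repo}/commits)."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/commits?{query}", headers=_headers(token)
    )


def clone_command(owner, repo, dest):
    """argv for cloning the repository sources, e.g. for subprocess.run()."""
    return ["git", "clone", f"https://github.com/{owner}/{repo}.git", dest]


def fetch_json(request):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `fetch_json(profile_request("octocat", "Hello-World", token))` returns the profile as a dict, and requesting `commits_request(..., page=n)` for increasing `n` until an empty list comes back retrieves the full commit history.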

The runtime log looks similar to the following:

Run analyzer of vulnerability-fixing commit categorization (VCC)

Once the repository profiles and commits have been grabbed to local storage, run the following command to categorize vulnerability-fixing commits:

    python polyfax.py -a vcc
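The README does not spell out the VCC rules. As a rough illustration only, the sketch below flags commit messages that contain category keywords, tolerating small misspellings through fuzzy string matching. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy (one of PolyFax's listed dependencies); the categories, keywords, and 0.85 threshold are hypothetical.

```python
import difflib
import re

# Hypothetical keyword table; PolyFax's actual categories and keywords may differ.
CATEGORY_KEYWORDS = {
    "memory": ["buffer overflow", "use after free", "null pointer", "out of bounds"],
    "injection": ["sql injection", "command injection", "xss"],
    "general": ["vulnerability", "security", "cve"],
}


def _ngrams(words, n):
    """All runs of n consecutive words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def categorize_commit(message, threshold=0.85):
    """Return the categories whose keywords approximately occur in the message."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    matched = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        for keyword in keywords:
            candidates = _ngrams(words, len(keyword.split()))
            # Similarity ratio in [0, 1]; 1.0 means an exact match.
            if any(difflib.SequenceMatcher(None, keyword, c).ratio() >= threshold
                   for c in candidates):
                matched.append(category)
                break  # one hit is enough for this category
    return matched
```

Comparing keywords against word n-grams of the same length, rather than against the whole message, keeps a long commit message from diluting the similarity score, while the threshold still admits typos such as "overlfow".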

The runtime log looks similar to the following:

Run analyzer of language interaction categorization (LIC)

Once the repository profiles have been grabbed and the repository sources cloned locally by the Crawler, run the following command to categorize the projects by language interaction mechanism:

    python polyfax.py -a lic
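A minimal, hypothetical way to detect language interaction is to scan source files for interface signatures: JNI types or the CPython C API header suggest a foreign function interface, while process-spawning or socket calls suggest inter-process communication. The category names and regular expressions below are illustrative assumptions, not PolyFax's rules.

```python
import re

# Hypothetical signatures; categories and patterns are illustrative only.
SIGNATURES = {
    "FFI": [r"\bJNIEnv\b", r"#include\s*<Python\.h>", r"\bimport\s+ctypes\b"],
    "IPC": [r"\bimport\s+subprocess\b", r"\bsocket\s*\(", r"\bpopen\s*\("],
}


def categorize_project(files):
    """Map each detected interaction category to the files that show it.

    `files` maps a relative path to that file's text.
    """
    found = {}
    for path, text in sorted(files.items()):
        for category, patterns in SIGNATURES.items():
            if any(re.search(pattern, text) for pattern in patterns):
                found.setdefault(category, []).append(path)
    return found
```

A project can fall into several categories at once, which is why the result maps each category to the files that triggered it rather than returning a single label.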

The runtime log looks similar to the following:

Run the NBR analysis from the paper [On the Vulnerability Proneness of Multilingual Code]

Before running the NBR experiments, copy the corresponding data into PolyFax with the following command:

    cp -rf /root/FSE22_Data/* /root/PolyFax/Data/
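NBR here stands for negative binomial regression, a standard model for over-dispersed count data such as the number of vulnerabilities per project. In its common NB2 form, the expected count $\mu_i$ of project $i$ is log-linear in its predictors $x_{ij}$ (for example, indicators for a language selection, an interfacing category, or a single language; the exact predictors and controls are defined in the paper), and the dispersion parameter $\alpha$ lets the variance exceed the mean:

$$
\log \mu_i = \log \mathbb{E}[Y_i \mid \mathbf{x}_i] = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
\operatorname{Var}(Y_i \mid \mathbf{x}_i) = \mu_i + \alpha \mu_i^2
$$

A coefficient is usually read through its incidence rate ratio: increasing $x_{ij}$ by one multiplies the expected count by $e^{\beta_j}$. For a hypothetical $\beta_j = 0.4$, $e^{0.4} \approx 1.49$, i.e., about 49% more expected vulnerabilities with the other predictors held fixed.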

NBR: #Security vulnerabilities vs. language selection

    python polyfax.py -a nbr-combo

The results correspond to Tables 3-5 in the paper.
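To make the model concrete, the following plain-Python sketch evaluates the NB2 negative binomial log-likelihood, with mean $\mu_i = \exp(\beta_0 + \sum_j \beta_j x_{ij})$ and variance $\mu_i + \alpha\mu_i^2$, for given coefficients and dispersion. Fitting the model means maximizing this quantity over $\beta$ and $\alpha$, which statistics packages (e.g., statsmodels' `NegativeBinomial`) do numerically. The function names are hypothetical, and this is not how `nbr-combo` is implemented.

```python
import math


def nb2_mean(x, beta):
    """mu = exp(beta[0] + sum_j beta[j] * x[j-1]); beta[0] is the intercept."""
    return math.exp(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))


def nb2_logpmf(y, mu, alpha):
    """log P(Y = y) under NB2 with mean mu > 0 and dispersion alpha > 0."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            - r * math.log1p(alpha * mu)                      # r * log(r / (r + mu))
            + y * math.log(alpha * mu / (1.0 + alpha * mu)))  # y * log(mu / (r + mu))


def nb2_loglik(ys, xs, beta, alpha):
    """Total log-likelihood of observed counts ys given predictor rows xs."""
    return sum(nb2_logpmf(y, nb2_mean(x, beta), alpha) for y, x in zip(ys, xs))
```

As $\alpha \to 0$ the model reduces to Poisson regression, and the ratio `nb2_mean([1.0], beta) / nb2_mean([0.0], beta)` equals $e^{\beta_1}$, the incidence rate ratio of a 0/1 predictor such as a hypothetical language-selection indicator.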

NBR: #Security vulnerabilities vs. language interfacing category

    python polyfax.py -a nbr-lic

The results correspond to Tables 6 and 7 in the paper.
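Because the language interfacing category is a categorical variable, a regression of this kind needs it dummy-coded: one 0/1 column per category, with one reference category left out so that each coefficient compares a category against that reference. A minimal sketch of this general technique follows; the function name and the category labels in the example are hypothetical.

```python
def dummy_code(values, reference):
    """Return (levels, rows): 0/1 indicator columns for every level except `reference`."""
    if reference not in values:
        raise ValueError(f"reference level {reference!r} does not occur in the data")
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical per-project interfacing categories.
levels, rows = dummy_code(["FFI", "IPC", "none", "FFI"], reference="none")
```

Here `levels` is `["FFI", "IPC"]` and each project gets one row; with `none` as the reference, the coefficient on the `FFI` column would compare FFI projects against projects without language interaction.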

NBR: #Security vulnerabilities vs. single language

    python polyfax.py -a nbr-single

The results correspond to Table 8 in the paper.
