
Dynamic Analysis Framework for Python

This framework performs dynamic analysis of Python code. The tool accepts a piece of Python code along with a valid input, executes it, and builds a graph representation that captures the runtime semantics of the program. The graph is then fed into a Graph Convolutional Neural Network, which predicts the NumPy API that is semantically closest to the code.
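For example, a snippet like the one below (a hypothetical illustration, not an example taken from the dataset) sums a list element by element; its runtime graph would ideally be mapped to an API such as numpy.sum:

def sum_elements(xs):
    # manual accumulation loop; semantically equivalent to numpy.sum(xs)
    total = 0
    for x in xs:
        total += x
    return total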

More details on the tool and how it works can be found in the Technical Report here.


Installation

The tool has been tested on Python 3.8.2 and 3.8.10. Before proceeding to install the tool, please check the version of your Python install using the command:

python --version

If the output is not Python 3.8.*, we strongly recommend either installing Python 3.8 from here (if you do not have an existing Python installation) or creating a virtual environment using Anaconda (if you already have one or more versions of Python installed). If applicable, follow the instructions below to install Python 3.8 using Anaconda; otherwise skip ahead to Installing the Repository.

Installing Python 3.8 using Anaconda (Optional)

  1. Download the Anaconda installer from here and follow all the steps to install. Be sure to initialize conda and restart the terminal to ensure the installation is completed.
  2. Run the following to create a new environment with Python 3.8.2 and follow the prompts:
conda create --name myenv python=3.8.2
  3. Run the following to activate the new environment:
conda activate myenv
  4. Verify the Python version using the command:
python --version

Installing the Repository

  1. Clone the repository using the following command (one may use the SSH link as well) and navigate inside the directory:
git clone https://github.com/aayan636/semantic-analysis-python.git
cd semantic-analysis-python
  2. Run the following to install dependencies:
pip install -r requirements.txt
  3. To download the datasets used in our experiments, run the following command, which will download and unzip the dataset in the correct location:
python gnn/data/download_dataset.py

Reproducing Results from the Report

This section describes how to reproduce the results from the tech report linked above. To see how to use the framework on custom code, please skip to the Usage section below.

Baselines

First, navigate to the correct directory (from the base directory of the repo):

cd Evaluation

Section 5.2 of the report deals with baseline numbers on the predictions generated by the Codex large language models. To test them out, please request an OpenAI API key from here. After that, use the following commands (note that this can take 5-6 minutes to run on a MacBook Pro (2020) with a 2.4 GHz i9 processor due to rate throttling):

export OPENAI_API_KEY=<Your OpenAI API Key>
python run_llm_baseline.py --model M --dataset N

where M can be code-cushman-001 or code-davinci-002 (the two Codex models we compare against) and N can be 1, 2 or 3 (referring to the three datasets Wild1, Wild2 and Wild3 in Section 5.2). Note that we set the temperature of the LLM to 0.1, as is recommended for optimal performance, so the numbers may vary slightly across runs.
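For example, to evaluate code-davinci-002 on the Wild1 dataset:

python run_llm_baseline.py --model code-davinci-002 --dataset 1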

Scaling across different Implementations

Section 5.4 of the report deals with scaling across programs written in a different coding style from the code used to generate the training dataset. To replicate these experiments, run the following (takes approximately 1 minute with R=1 on a MacBook Pro (2020) with a 2.4 GHz i9 processor):

python run_instrumented_wild.py --length L --repeats R

where L is any positive integer (in the report we try 1, 2, 5 and 10) and R can be any positive integer (in the report we use only 10). L is the length of the input with which the code snippets are executed, while R is the number of times each code snippet is repeated (due to randomness the results can differ slightly; a higher R leads to lower variance but takes more time).
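For instance, to run with input length 5 and 10 repeats, as in the report:

python run_instrumented_wild.py --length 5 --repeats 10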

Scaling across Input Size

Navigate to the following directory from the base directory of the repo:

cd gnn

Section 5.3 of the report deals with scaling across programs run on inputs of varying sizes. To reproduce these numbers, run the following (note that this can take 15-30 minutes on a MacBook Pro (2020) with a 2.4 GHz i9 processor, depending on the size of the dataset):

python train_or_eval.py --mode evaluate --testMode M

where M can be test, testL, testLL or train (the different datasets on which we report accuracies). The model is trained on the train partition. The code and downloaded data correspond only to dataset 2, as that is what most of the experiments were run on.
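For example, to evaluate on the testL partition:

python train_or_eval.py --mode evaluate --testMode testL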

Generating New Data

Warning: dataset generation takes between 2 and 15 hours on a MacBook Pro (2020) with a 2.4 GHz i9 processor using 16 cores. To generate data for training a new model, navigate to the home directory of the repo and run the following:

./script.sh 1000 train

This will generate 15 commands, similar to the following:

python run_instrumented.py 1 1000 train >1.out 2> 1.err &
python run_instrumented.py 2 1000 train >2.out 2> 2.err &
...
python run_instrumented.py 15 1000 train >15.out 2> 15.err &

Run each of the commands; each one launches an independent process in the background (you can simply copy and paste all the commands at once to execute them). Each process will generate 1000 graphs; you can choose to generate more or fewer.
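Equivalently, since the generated commands follow the pattern above, a shell loop (a convenience sketch, not part of the repository) launches the same 15 background jobs:

for i in $(seq 1 15); do
  python run_instrumented.py $i 1000 train >$i.out 2> $i.err &
done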

Repeat the process, replacing train with test, testL and testLL to generate the respective datasets. The generated datasets are created in the home directory; move them all to gnn/data and then proceed to the Training section to see how to train a new model.

mv D gnn/data/

where D can be train, test, testL, testLL
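For instance, once all four splits have been generated, they can be moved in a single command:

mv train test testL testLL gnn/data/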

Training

After generating the new dataset, navigate to the directory (from the home directory of the repo):

cd gnn

and run the following to start the training:

python train_or_eval.py --mode train --batch_size 40 --num_epochs 20 --data_path data

Other flags can be set to change the save directories, etc., but sensible defaults are provided. This configuration takes approximately 2-3 hours on a MacBook Pro (2020) with a 2.4 GHz i9 processor.

Usage

To evaluate the framework on a custom piece of code, we have provided a sample implementation of how this can be done. Navigate to the home directory of the repo and run the following:

python run_custom.py

In run_custom.py, take a look at the testWildImpl function. There, a user needs to generate an input to run their code on and call their code snippet, which is written in custom.py. The user can write any code here, as long as they return the final value of interest, as demonstrated in the example. The actual implementation can also span multiple files, as long as the driver is defined as in run_custom.py.
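As a rough illustration (the function name, signature and mapping below are hypothetical, not the repository's shipped example), custom.py could contain a plain-Python loop whose runtime behavior the framework should recognize as a NumPy API such as numpy.max:

# custom.py -- hypothetical snippet; the name and signature are illustrative only
def my_snippet(arr):
    # find the largest element with an explicit loop;
    # the framework should map these runtime semantics to an API like numpy.max
    best = arr[0]
    for x in arr:
        if x > best:
            best = x
    return best  # return the value of interest

The driver (testWildImpl in run_custom.py) would then generate a suitable input list and call my_snippet on it, returning its result, as described above.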

This repository has been migrated over from here.
