PappasBrent/maki (forked from appleseedlab/maki)

A tool for analyzing syntactic and semantic properties of C Preprocessor macros in C programs

Artifact Documentation

Purpose

This artifact provides the source code for Maki, a tool for analyzing macro usage portability as described in this artifact's associated paper. This artifact also contains a dataset with the paper's major results, and instructions on how to build and run Maki to replicate these results.

We are applying for the available and reusable badges. We believe this artifact deserves the available badge because it is publicly available on Zenodo at https://doi.org/10.5281/zenodo.7783131 (DOI 10.5281/zenodo.7783131). We believe this artifact deserves the reusable badge because it includes instructions for reproducing all the paper's major results, along with a dataset one may verify them against. This artifact also utilizes Docker to facilitate reuse, as recommended in the ICSE 2024 Call for Artifact Submissions.

Provenance

The artifact as reported in the original paper is available on Zenodo (https://doi.org/10.5281/zenodo.7783131). A pre-print of the original paper referencing this artifact can be found here: https://pappasbrent.com/assets/Semantic_Analysis_of_Macro_Usage_for_Portability_-_Preprint.pdf.

Data

The datasets directory contains the original data that all major results of the paper are based on, and is approximately 2 megabytes in size. An explanation of its directory structure and contents follows:

datasets/
├── figure_data - Contains all raw data for paper figures
│   ├── all_raw_data.csv - Contains all programs' macro definition properties.
│   ├── figure_2_chart_data.csv - Contains data for Figure 2 in the paper.
│   ├── figure_3_chart_data.csv - Contains data for Figure 3 in the paper.
│   └── figure_4_chart_data.csv - Contains data for Figure 4 in the paper.
├── linux_patch_submissions - Contains submitted Linux kernel patches.
│   ├── accepted/ - Contains submitted Linux patches that maintainers accepted.
│   └── rejected/ - Contains submitted Linux patches that maintainers rejected.
├── porting_enscript_and_m4 - Contains notes taken while hand-porting all macros
│   │                         in enscript and m4 to C.
│   ├── enscript_maki_output.csv - Contains Maki's analysis results for
│   │                              enscript, formatted as a CSV file.
│   ├── enscript_transformation_notes.md - Contains notes taken while hand-
│   │                                      porting macros in enscript to C.
│   ├── enscript_transformations.diff - Diff of changes before and after hand-
│   │                                   porting macros in enscript to C.
│   ├── m4_maki_output.csv - Contains Maki's analysis results for m4, formatted
│   │                        as a CSV file.
│   ├── m4_transformation_notes.md - Contains notes taken while hand-porting
│   │                                macros in m4 to C.
│   └── m4_transformations.diff - Diff of changes before and after hand-porting
│                                 macros in m4 to C.
├── porting_linux_ipc_and_sound_atmel - Contains notes taken while hand-porting
│   │                                   all macros in the ipc and sound/atmel
│   │                                   Linux modules.
│   ├── linux_ipc_sound_atmel_transformation_notes.md - Contains notes taken
│   │                                                   while hand-porting all
│   │                                                   macros in the ipc and
│   │                                                   sound/atmel Linux
│   │                                                   modules.
│   └── linux_ipc_sound_atmel_transformations.diff - Diff of changes before and
│                                                    after hand-porting macros
│                                                    in the ipc and sound/atmel
│                                                    Linux modules.
├── random_sample_notes
│   ├── all.csv - Contains notes taken while manually verifying the properties
│   │             of all macros in the random sample.
│   ├── only_false_negatives.csv - Contains notes only on macros in the random
│   │                              sample for which Maki reported a false
│   │                              negative.
│   └── only_false_positives.csv - Contains notes only on macros in the random
│                                  sample for which Maki reported a false
│                                  positive.
└── time_data - Contains data on Maki's performance.
    ├── defn_analysis_times.csv - Contains performance data for analyzing
    │                             macro definitions across all programs.
    └── invocation_analysis_times.csv - Contains performance data for analyzing
                                        macro invocations across all programs.

Setup

Requirements

  • Hardware:
    • At least eight CPU cores and 8GB of RAM are recommended.
    • 1.94GB of free space if one only wishes to "kick the tires" and verify that Maki works by replicating only the analysis of the program bc. Replicating Maki's entire evaluation requires 620GB of storage space.
  • Software: Docker; tested with Docker 24.0.6.
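The storage figures above can be checked before starting. The following is a minimal sketch, not part of Maki: the helper names are hypothetical, it assumes decimal gigabytes (10^9 bytes), and it assumes the path you pass is on the filesystem that will hold the evaluation data. Docker may keep its images and containers on a different filesystem than your working directory.

```python
import shutil

# Storage figures quoted in the Requirements list above.
KICK_THE_TIRES_GB = 1.94
FULL_EVALUATION_GB = 620

# Assumption: the README's "GB" means decimal gigabytes.
BYTES_PER_GB = 10**9


def has_enough_space(free_bytes: int, required_gb: float) -> bool:
    """Return True if free_bytes covers required_gb gigabytes."""
    return free_bytes >= required_gb * BYTES_PER_GB


def check_path(path: str = ".") -> dict:
    """Report which evaluation modes fit on the filesystem holding path."""
    free = shutil.disk_usage(path).free
    return {
        "kick_the_tires": has_enough_space(free, KICK_THE_TIRES_GB),
        "full_evaluation": has_enough_space(free, FULL_EVALUATION_GB),
    }
```

Calling `check_path("/var/lib/docker")` is one possible use on a typical Linux host, assuming that is where your Docker installation stores its data.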

Instructions

First, either:

  • Build the Docker image:

    docker build -t maki:1.0 .
  • Or load the Docker image from the provided tar file:

    docker load --input maki.tar

After building/loading the image, run the Docker image as a container:

docker run --name maki-container -it maki:1.0

Build Maki's associated Clang plugin:

bash build_clang_plugin.sh

To restart and attach to the same container in the future without creating a new one, run the following commands:

docker start maki-container
docker attach maki-container

Usage

Basic usage

First, follow the instructions listed in the Setup section, being sure to build Maki's Docker image, run it as a container, and build Maki's Clang plugin frontend. Then one may run Maki's wrapper script, e.g.:

bash build/bin/cpp2c tests/addressed_args.c

Running the above command will tell Maki to analyze all macro invocations in the source file tests/addressed_args.c and print the results to standard output. The last line of this output should be a large JSON object containing the properties of the last macro invocation in the file:

Invocation    {
    "Name" : "ADDR_OF",
    "DefinitionLocation" : "/maki/tests/addressed_args.c:3:9",
    "InvocationLocation" : "/maki/tests/addressed_args.c:9:5",
    "ASTKind" : "Expr",
    "TypeSignature" : "int *(int)",
    "InvocationDepth" : 0,
    "NumASTRoots" : 1,
    "NumArguments" : 1,
    "HasStringification" : false,
    "HasTokenPasting" : false,
    "HasAlignedArguments" : true,
    "HasSameNameAsOtherDeclaration" : false,
    "IsExpansionControlFlowStmt" : false,
    "DoesBodyReferenceMacroDefinedAfterMacro" : false,
    "DoesBodyReferenceDeclDeclaredAfterMacro" : false,
    "DoesBodyContainDeclRefExpr" : false,
    "DoesSubexpressionExpandedFromBodyHaveLocalType" : false,
    "DoesSubexpressionExpandedFromBodyHaveTypeDefinedAfterMacro" : false,
    "DoesAnyArgumentHaveSideEffects" : false,
    "DoesAnyArgumentContainDeclRefExpr" : true,
    "IsHygienic" : true,
    "IsDefinitionLocationValid" : true,
    "IsInvocationLocationValid" : true,
    "IsObjectLike" : false,
    "IsInvokedInMacroArgument" : false,
    "IsNamePresentInCPPConditional" : false,
    "IsExpansionICE" : false,
    "IsExpansionTypeNull" : false,
    "IsExpansionTypeAnonymous" : false,
    "IsExpansionTypeLocalType" : false,
    "IsExpansionTypeDefinedAfterMacro" : false,
    "IsExpansionTypeVoid" : false,
    "IsAnyArgumentTypeNull" : false,
    "IsAnyArgumentTypeAnonymous" : false,
    "IsAnyArgumentTypeLocalType" : false,
    "IsAnyArgumentTypeDefinedAfterMacro" : false,
    "IsAnyArgumentTypeVoid" : false,
    "IsInvokedWhereModifiableValueRequired" : false,
    "IsInvokedWhereAddressableValueRequired" : false,
    "IsInvokedWhereICERequired" : false,
    "IsAnyArgumentExpandedWhereModifiableValueRequired" : false,
    "IsAnyArgumentExpandedWhereAddressableValueRequired" : true,
    "IsAnyArgumentConditionallyEvaluated" : false,
    "IsAnyArgumentNeverExpanded" : false,
    "IsAnyArgumentNotAnExpression" : false
}
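To work with this output programmatically, one option is to extract the JSON objects that follow each `Invocation` label. The sketch below is illustrative only and is not part of Maki: the function name `parse_invocations` is hypothetical, and it assumes each record consists of the word `Invocation`, optional whitespace, and then a single JSON object, as in the example above.

```python
import json


def parse_invocations(output: str) -> list:
    """Collect the JSON objects that directly follow an 'Invocation' label."""
    decoder = json.JSONDecoder()
    marker = "Invocation"
    records = []
    pos = 0
    while True:
        start = output.find(marker, pos)
        if start == -1:
            break
        after_marker = start + len(marker)
        brace = output.find("{", after_marker)
        if brace == -1:
            break
        # Only whitespace may separate the label from the object; this skips
        # unrelated occurrences of the word "Invocation" elsewhere in the text.
        if output[after_marker:brace].strip():
            pos = after_marker
            continue
        obj, end = decoder.raw_decode(output, brace)
        records.append(obj)
        pos = end
    return records


def count_hygienic(records: list) -> int:
    """Count invocations whose IsHygienic property is true."""
    return sum(1 for record in records if record.get("IsHygienic"))
```

For instance, after saving the wrapper script's output to a file, `parse_invocations(open("out.txt").read())[-1]["Name"]` would give the name of the last macro invocation reported (here, `out.txt` is a placeholder file name).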

Copying evaluation results out of the Docker container

Run the following command on your host system to copy files out of the Docker container to your host system:

docker cp maki-container:/path/to/source-file /path/to/destination-file

For example, to copy the file README.md out of the container to the current directory in the host system, run the following command:

docker cp maki-container:/maki/README.md .

Testing

Maki's test suite is located in the tests/Tests directory and automated with LLVM LIT and FileCheck. If you used the provided Dockerfile to build Maki, then these dependencies are already installed, and you can run Maki's test suite with the following commands:

mkdir -p build
cmake -S . -B build/ \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
    -DMAKI_ENABLE_TESTING=ON \
    -DLLVM_EXTERNAL_LIT=/usr/local/bin/lit \
    -DFILECHECK_PATH=/usr/bin/FileCheck-14
cmake --build build/ -t check-cpp2c

Otherwise, you will first need to download the following dependencies:

  • The Python lit script from PyPI:

    python3 -m pip install lit
  • FileCheck as one of LLVM's dev dependencies:

    sudo apt install llvm-dev
  • jq from your package manager, e.g.,

    sudo apt install jq

Then from the project root, run the following command to configure Maki with testing enabled and to run its test suite:

mkdir -p build
cmake -S . -B build/ \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
    -DMAKI_ENABLE_TESTING=ON \
    -DLLVM_EXTERNAL_LIT=<lit_path> \
    -DFILECHECK_PATH=<filecheck_path>
cmake --build build/ -t check-cpp2c

Where <lit_path> and <filecheck_path> are the paths to your lit Python script and FileCheck binary, respectively.
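If you are unsure where lit and FileCheck were installed, their paths can be looked up on your PATH. The following is a minimal sketch with hypothetical helper names; it only assembles the configure command shown above and does not run it. The candidate binary name `FileCheck-14` mirrors the Docker setup and may differ on your system.

```python
import shutil


def find_tool(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def cmake_configure_args(lit_path, filecheck_path, build_dir="build/"):
    """Build the argument list for the CMake configure step with testing on."""
    return [
        "cmake", "-S", ".", "-B", build_dir,
        "-DCMAKE_EXPORT_COMPILE_COMMANDS=ON",
        "-DMAKI_ENABLE_TESTING=ON",
        f"-DLLVM_EXTERNAL_LIT={lit_path}",
        f"-DFILECHECK_PATH={filecheck_path}",
    ]
```

A possible use is `cmake_configure_args(find_tool(["lit"]), find_tool(["FileCheck", "FileCheck-14"]))`, after checking that neither lookup returned `None`; the resulting list can be passed to `subprocess.run` from the project root.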

Replicating major paper results (kicking the tires)

Replicating all the results presented in the paper would require more than 17 days; therefore we provide reviewers with the option to replicate a small subset of the results, which should only require about five minutes.

If one only wishes to "kick the tires" and verify that Maki's evaluation framework works correctly, then they may run the following script:

bash replicate_results.sh

By default, this will run Maki's evaluation on the program bc only. This script will attempt to download a compressed copy of bc's source code to evaluation/archived_programs, extract it to evaluation/extracted_programs, and run Maki's Clang frontend and Python analysis on it with eight processes. Intermediate and final results will be placed in subdirectories and files within the evaluation directory. The names of these subdirectories and files, along with how to interpret them, are as follows:

  • evaluation/macro_invocation_analyses/: Contains Maki's output for analyzing macro invocations in all programs. For each program, Maki will create a parallel directory structure, and for each C source file analyzed, Maki will create a file with the same name but with the extension .cpp2c containing the results of Maki's macro invocation analysis for that source file.

  • evaluation/macro_definition_analyses/: Contains Maki's output for analyzing macro definitions in all programs. Maki writes its results for each program to a JSON file with the same name as the analyzed program. This file contains a single JSON object whose keys are the machine-readable names of the characteristics studied in our evaluation. Each key maps to a JSON object with three fields: total, the total number of macro definitions that display the characteristic associated with that key; olms, the number of object-like macros that display the characteristic; and flms, the number of function-like macros that display the characteristic.

  • evaluation/figure_data/: Contains CSV-formatted macro definition analysis data for all programs that was used to generate the figures in the original paper.

    • evaluation/figure_data/all_raw_data.csv: Contains all macro definition analyses for all programs in CSV form.
    • evaluation/figure_data/figure_2_data.csv: Contains the data used to generate Figure 2 in the original paper.
    • evaluation/figure_data/figure_3_data.csv: Contains the data used to generate Figure 3 in the original paper.
    • evaluation/figure_data/figure_4_data.csv: Contains the data used to generate Figure 4 in the original paper.
  • evaluation/time_data/: Contains CSV files reporting Maki's performance data.

    • evaluation/time_data/defn_analysis_times.csv: Lists the time elapsed for Maki to analyze all macro definitions in each program.
    • evaluation/time_data/invocation_analysis_times.csv: Lists the time elapsed for Maki to analyze all macro invocations in each program.
    • evaluation/time_data/total_time_analysis.csv: Presents the five-point summary of, and the total elapsed time for, analyzing all macro invocations and definitions in all programs.
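As an illustration of the macro definition analysis format described above, the following sketch loads one program's JSON file and computes, for each characteristic, the share of matching definitions that are function-like. The function names are hypothetical and not part of Maki's Python library; only the total, olms, and flms fields come from the description above.

```python
import json


def load_definition_analysis(path):
    """Load one program's macro definition analysis from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def function_like_share(analysis):
    """Map each characteristic to flms / total (0.0 when total is zero)."""
    shares = {}
    for characteristic, counts in analysis.items():
        total = counts["total"]
        shares[characteristic] = counts["flms"] / total if total else 0.0
    return shares
```

For the bc run, the input would be the JSON file for bc under evaluation/macro_definition_analyses/; its exact file name and extension are an assumption to verify against your own output.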

Resetting

If Maki's evaluation is cancelled or encounters an error before running to completion, please remove all intermediate analysis files before attempting to restart the evaluation. In particular, if one encounters the following error:

IndexError: index -1 is out of bounds for axis 0 with size 0

Then one must reset the evaluation directory to a clean state before trying to replicate the evaluation again. We provide a script for doing this:

bash reset_evaluation.sh

This script removes the following evaluation subdirectories:

  • archived_programs
  • evaluation_programs
  • extracted_programs
  • figure_data
  • macro_definition_analyses
  • macro_invocation_analyses
  • time_data
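For readers who want to see what the reset amounts to, the sketch below removes the subdirectories listed above from a given evaluation directory. It is an illustrative equivalent written for this document, not the contents of reset_evaluation.sh, and the function name is hypothetical.

```python
import shutil
from pathlib import Path

# Subdirectories named in the list above.
SUBDIRECTORIES = [
    "archived_programs",
    "evaluation_programs",
    "extracted_programs",
    "figure_data",
    "macro_definition_analyses",
    "macro_invocation_analyses",
    "time_data",
]


def reset_evaluation(evaluation_dir):
    """Delete the listed subdirectories if present; return the names removed."""
    removed = []
    for name in SUBDIRECTORIES:
        target = Path(evaluation_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Other files in the evaluation directory are left untouched, so scripts stored there survive the reset.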

Replicating major paper results (all results)

The evaluation directory must be in a clean state before trying to replicate all results reported in the original paper. One can run the script reset_evaluation.sh, as described in the Resetting section above, to do this. One may then perform Maki's full evaluation by passing the flag --all to the script replicate_results.sh like so:

bash replicate_results.sh --all

The script will download all 21 evaluation programs listed in evaluation/evaluation_programs_all.py to evaluation/archived_programs, and extract and build them in evaluation/extracted_programs. After building all programs, Maki's Clang frontend will run on eight processes to analyze all macro invocations in each of them. Maki will place its analyses for a program's macro invocations in evaluation/macro_invocation_analyses as soon as it finishes running. Next, Maki's Python library will use the results from the previous step to analyze all macro definitions for each program, and will place these results in JSON files in evaluation/macro_definition_analyses. Finally, Maki will convert the data in these JSON files to CSV format, and place the results in evaluation/figure_data. This directory contains data used to generate the figures in the original paper. CSV files containing the time elapsed to analyze each program's macro invocations and definitions will be placed in evaluation/time_data as well.
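The final conversion step can be pictured as flattening the per-program JSON objects into rows. The sketch below is a guess at the general shape of that step, not Maki's actual code: the function name and the column layout (program, characteristic, total, olms, flms) are assumptions, and the real CSV files may be organized differently.

```python
import csv
import io


def analyses_to_csv(analyses):
    """Flatten {program: {characteristic: {total, olms, flms}}} to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed column layout; the real figure data may use other columns.
    writer.writerow(["program", "characteristic", "total", "olms", "flms"])
    for program in sorted(analyses):
        for characteristic, counts in sorted(analyses[program].items()):
            writer.writerow([
                program,
                characteristic,
                counts["total"],
                counts["olms"],
                counts["flms"],
            ])
    return buffer.getvalue()
```

Sorting the programs and characteristics keeps the output stable between runs, which makes it easier to diff regenerated data against the provided dataset.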

As a reminder, it took us over 19 days to run Maki on all the programs in our benchmark. We ran Maki with eight processes on all programs except Linux, for which we used 32 processes. Our machine had two 64-core CPUs and 512GB of RAM. If one attempts to run Maki's full evaluation using fewer processes, or on a machine with less available memory, then it will likely take longer to complete.

Uninstalling

To completely uninstall Maki, first delete all its associated Docker containers:

docker container rm maki-container

Then, delete Maki's Docker image:

docker image rm maki:1.0

You may also want to clear your Docker installation's cache to remove any lingering data related to Maki on your system:

docker system prune -f

Note: This will tell Docker to delete all cached data that it is not currently using, so be careful not to delete any cached data you may want to keep!

Development

If you would like to contribute to Maki, please first install ClangFormat. On Ubuntu you can install ClangFormat by running:

sudo apt install clang-format

Please format all your changes to Maki's C++ code with ClangFormat before committing them. This is to ensure that the project's coding style remains consistent.

Patch files mapping

To decrease the length of file names used for the accepted and rejected Linux patch submissions, the code uses file names PATCH_Ref_# (# is a number) as a substitute for the full name of the patch. The mapping of patch reference to submitted patch is as follows:

Accepted Patches:

  • PATCH_Ref_1 == PATCH-1-1-staging-gdm724x-Replace-macro-GDM_TTY_READY-with-static-inline-function.mbox
  • PATCH_Ref_2 == PATCH-media-atomisp-pci-hive_isp_css_common-host-vmem-Replace-SUBWORD-macros-with-functions.mbox
  • PATCH_Ref_3 == PATCH-media-atomisp-pci-sh_css-Replace-macro-STATS_ENABLED-with-function.mbox
  • PATCH_Ref_4 == PATCH-media-imx-imx-media-fim-Replace-macro-icap_enabled-with-function.mbox
  • PATCH_Ref_5 == PATCH-staging-media-atomisp-pci-Replace-bytes-macros-with-functions.mbox
  • PATCH_Ref_6 == PATCH-v2-staging-greybus-gpio-Replace-macro-irq_data_to_gpio_chip-with-function.mbox

Rejected Patches:

  • PATCH_Ref_7 == PATCH-mdeia-ipu3-ipu33-mmu-Replace-macro-IPU3_ADDR2PTE-with-a-function.mbox
  • PATCH_Ref_8 == PATCH-staging-iio-frequency-ad9832-Replace-macro-AD9832_PHASE-with-function.mbox
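The mapping above can also be kept as a small lookup table when scripting over the patch files. This is a convenience sketch, not code shipped with Maki; the names `PATCH_REFS`, `ACCEPTED_REFS`, `full_patch_name`, and `patch_status` are hypothetical, while the file names are copied verbatim from the mapping.

```python
# Patch reference -> full name of the submitted patch file.
PATCH_REFS = {
    "PATCH_Ref_1": "PATCH-1-1-staging-gdm724x-Replace-macro-GDM_TTY_READY-with-static-inline-function.mbox",
    "PATCH_Ref_2": "PATCH-media-atomisp-pci-hive_isp_css_common-host-vmem-Replace-SUBWORD-macros-with-functions.mbox",
    "PATCH_Ref_3": "PATCH-media-atomisp-pci-sh_css-Replace-macro-STATS_ENABLED-with-function.mbox",
    "PATCH_Ref_4": "PATCH-media-imx-imx-media-fim-Replace-macro-icap_enabled-with-function.mbox",
    "PATCH_Ref_5": "PATCH-staging-media-atomisp-pci-Replace-bytes-macros-with-functions.mbox",
    "PATCH_Ref_6": "PATCH-v2-staging-greybus-gpio-Replace-macro-irq_data_to_gpio_chip-with-function.mbox",
    "PATCH_Ref_7": "PATCH-mdeia-ipu3-ipu33-mmu-Replace-macro-IPU3_ADDR2PTE-with-a-function.mbox",
    "PATCH_Ref_8": "PATCH-staging-iio-frequency-ad9832-Replace-macro-AD9832_PHASE-with-function.mbox",
}

# References 1 through 6 were accepted; 7 and 8 were rejected.
ACCEPTED_REFS = {f"PATCH_Ref_{n}" for n in range(1, 7)}


def full_patch_name(ref):
    """Return the full patch file name for a reference such as PATCH_Ref_3."""
    return PATCH_REFS[ref]


def patch_status(ref):
    """Return 'accepted' or 'rejected'; raise KeyError for unknown references."""
    if ref not in PATCH_REFS:
        raise KeyError(ref)
    return "accepted" if ref in ACCEPTED_REFS else "rejected"
```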
