This repository contains the code and data produced for the paper Challenges of Producing Software Bill Of Materials for Java (IEEE Security & Privacy, 2023).
@article{sbomchallenges,
title = {Challenges of Producing Software Bill Of Materials for Java},
journal = {IEEE Security \& Privacy},
year = {2023},
doi = {10.1109/MSEC.2023.3302956},
author = {Musard Balliu and Benoit Baudry and Sofia Bobadilla and Mathias Ekstedt and Martin Monperrus and Javier Ron and Aman Sharma and Gabriel Skoglund and César Soto-Valero and Martin Wittlinger},
url = {http://arxiv.org/pdf/2303.11102},
}

The structure of the repository is as follows:
- sbom-production contains all scripts used for creating CycloneDX SBOM files for each of the 26 study subjects using 6 different SBOM producers.
- ground-truth-production contains all scripts used for extracting a ground truth dataset of dependency trees for each study subject.
- metrics-computation contains all code used for computing metrics relating to the performance of the SBOM tools.
- results-march-2023 contains all experimental data.
- sbom2023_plot contains additional code and resources related to the creation of figures for the paper.
The performance of the following 6 CycloneDX SBOM producers was studied. The versions listed were the latest as of Fri 5 May 2023 13:02:33 CEST:
| Producer | Version |
|---|---|
| Build Info Go | 1.9.3 |
| CycloneDX Generator | 8.4.3 |
| CycloneDX Maven Plugin | 2.7.8 |
| jbom | 1.2.1 |
| OpenRewrite | 4.45.0 |
| Depscan | 4.1.2 |
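
All six producers emit CycloneDX documents, so their output can be read in a uniform way. The following is a minimal sketch, not code from this repository: it assumes the SBOM is in CycloneDX JSON form, where libraries are listed in a top-level `components` array with `group`, `name`, and `version` fields. The function name `component_coordinates` and the sample document are hypothetical.

```python
import json


def component_coordinates(sbom_json: str) -> set[str]:
    """Return the group:name:version strings listed in a CycloneDX JSON SBOM."""
    sbom = json.loads(sbom_json)
    coordinates = set()
    # "components" is optional in CycloneDX, so default to an empty list.
    for component in sbom.get("components", []):
        group = component.get("group", "")
        name = component.get("name", "")
        version = component.get("version", "")
        coordinates.add(f"{group}:{name}:{version}")
    return coordinates


# Hypothetical SBOM used only for illustration.
example = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "group": "org.slf4j", "name": "slf4j-api", "version": "1.7.36"},
    {"type": "library", "group": "junit", "name": "junit", "version": "4.13.2"}
  ]
}
"""

print(sorted(component_coordinates(example)))
# prints ['junit:junit:4.13.2', 'org.slf4j:slf4j-api:1.7.36']
```

Producers may omit fields such as `group` or `version`; the sketch substitutes an empty string in that case rather than skipping the component.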
The following versions of 26 Java projects using Maven were selected as study subjects:
| # | GitHub Repository | Commit Hash | Stable release as of 01.01.23 |
|---|---|---|---|
| 1 | jenkins | ce7e5d7 | 2.384 |
| 2 | mybatis-3 | c195f12 | 3.5.11 |
| 3 | flink | c41c8e5 | 1.15.3 |
| 4 | checkstyle | 233c91b | 10.6.0 |
| 5 | CoreNLP | f7782ff | 4.5.1 |
| 6 | neo4j | c082e80 | 5.3.0 |
| 7 | async-http-client | 7a370af | 2.12.3 |
| 8 | error-prone | 27de40b | 2.17.0 |
| 9 | alluxio | d5919d8 | 2.9.0 |
| 10 | javaparser | 1ae25f3 | 3.15.15 |
| 11 | undertow | f52b70c | 2.3.2.Final |
| 12 | webcam-capture | e19125c | 0.3.12 |
| 13 | handlebars.java | 2afc50f | 4.2.1 |
| 14 | jooby | f71b551 | 3.0.0.M1 |
| 15 | tika | 41319f3 | 2.6.0 |
| 16 | orika | eef8209 | 1.5.4 |
| 17 | spoon | ee73f43 | 10.2.0 |
| 18 | accumulo | 706612f | 2.1.0 |
| 19 | couchdb-lucene | 8554737 | 2.1.0 |
| 20 | jHiccup | a440bda | 2.0.10 |
| 21 | vulnerability-assessment-tool | 3d261af | 3.2.5 |
| 22 | para | 41d9005 | 1.47.2 |
| 23 | launch4j-maven-plugin | 3f9818e | 2.2.0 |
| 24 | jacop | 1a395e6 | 4.9.0 |
| 25 | selenese-runner-java | 3e84e8e | 4.2.0 |
| 26 | commons-configuration | 59e5152 | 2.8.0 |
If you are interested in reproducing our results, the script reproduce.sh is provided for your convenience. This script will do the following:
- Generate SBOMs for each study subject and SBOM producer.
- Extract ground truth dependency information from each study subject.
- Calculate the accuracy/precision for each SBOM producer and compare these values with our results, outputting whether the values match or not.
⚠️ Please note that this script can take a considerable amount of time (~2 hours on a laptop) since SBOM production needs to be carried out by 6 different producers on 26 different study subjects.
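
To illustrate the kind of comparison made in the last step, here is a minimal sketch that scores a producer's reported dependencies against a ground truth set. It assumes set-based precision and recall over `group:name:version` strings; the code in metrics-computation may define its metrics differently. The function name `precision_recall` and the `org.example` coordinates are hypothetical.

```python
def precision_recall(reported: set[str], ground_truth: set[str]) -> tuple[float, float]:
    """Compute set-based precision and recall of reported dependencies."""
    true_positives = len(reported & ground_truth)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical data: the producer finds two real dependencies,
# misses two others, and reports one that is not in the ground truth.
ground_truth = {
    "org.example:lib-a:1.0",
    "org.example:lib-b:2.0",
    "org.example:lib-c:3.0",
    "org.example:lib-d:4.0",
}
reported = {
    "org.example:lib-a:1.0",
    "org.example:lib-b:2.0",
    "org.example:lib-x:9.9",
}

precision, recall = precision_recall(reported, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints precision=0.67 recall=0.50
```

Matching on the full `group:name:version` string is a strict choice: a dependency reported with the wrong version counts as both a false positive and a miss.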
The following software is required:
- Java version 17 or newer
- Apache Maven
- Docker
- Python 3.10 or newer
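
Because a full run is long, it may be worth checking the environment first. The sketch below is not part of the repository. It assumes the tools are exposed on `PATH` under their usual command names (`java`, `mvn`, `docker`) and only checks that they are present; it does not verify that the installed Java is version 17 or newer.

```python
import shutil
import sys

# Assumed command names for Java, Apache Maven, and Docker.
REQUIRED_COMMANDS = ("java", "mvn", "docker")


def missing_commands(commands=REQUIRED_COMMANDS) -> list[str]:
    """Return the commands that cannot be found on PATH."""
    return [command for command in commands if shutil.which(command) is None]


def python_is_recent_enough(minimum=(3, 10)) -> bool:
    """Check that the running interpreter is at least the given version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Missing commands:", ", ".join(missing))
    if not python_is_recent_enough():
        print("Python 3.10 or newer is required")
    if not missing and python_is_recent_enough():
        print("All prerequisites found")
```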