
question: Proposal to Increase Fuzzing Test Coverage #1050

Open
DaveLak opened this issue Mar 6, 2024 · 2 comments

DaveLak commented Mar 6, 2024

Question

Hi,
I was looking over fuzzing test coverage for projects integrated with OSS-Fuzz, and I noticed this project (integrated in #1030) has relatively low coverage. It appears the existing test setup is blocking some of the instrumentation from effectively analyzing what is being tested.

I've read CONTRIBUTING.md and understand the request to outline design decisions before implementation; however, since I wasn't sure what questions I would need to answer upfront, and because this felt like a good learning exercise, I went ahead and took a pass at improving the coverage. With that said, I take no offense if you aren't interested in what I'm proposing, but I felt my results were worth sharing in case you find them valuable.

TL;DR

OSS-Fuzz has two tools that measure code coverage of tests:

The OSS-Fuzz coverage sanitizer

Useful for measuring basic branch coverage of tests (it uses Coverage.py under the hood).

Areas of Improvement

  • The current coverage (28.33%) is not insignificant, but it is not comprehensive, leaving some blind spots in the tests.
  • VBA binary embedding & image embedding are not currently covered by the fuzz targets despite being two of the more interesting APIs from a security perspective.

Fuzz Introspector

Useful for fuzz-test insights and debugging. Its features include static analysis of the libraries under test to determine coverage and to identify instrumentation blockers in the tests and source code.

Areas of Improvement

  • The current test helper implementation prevents the instrumentation from gathering meaningful insights. Fixing this can improve observability into test efficiency and help identify gaps in coverage.

Changes With my Proposal

In addition to increased coverage of security-relevant APIs, my current change-set does the following:

| Metric | Value Before | Value After | Change |
| --- | --- | --- | --- |
| Fuzz Test Count | 1 | 9 | +8 |
| Lines of code covered (OSS-Fuzz) | 3,073 | 6,303 | +3,230 lines |
| Code coverage % (OSS-Fuzz) | 28.33% | 58% | +29.67 percentage points |
| Functions statically reachable by fuzzers (Fuzz Introspector) | 0.0% | 44.0% | +44.0 percentage points |
| Runtime code coverage of functions (Fuzz Introspector) | 2.4% | 63.0% | +60.6 percentage points |

Additional Details

Background & Current State (The Problem)

Looking at the public Fuzzing Introspection page for this project, I noticed that the listed code coverage is 28.33%, while the static reachability is 0.00%. There appear to be two reasons for this:

  • The code coverage report is generated via the OSS-Fuzz coverage sanitizer, while the Fuzz Introspector report populates the static reachability metric.
  • Both reports appear to indicate the xlsx_fuzzer.py fuzz target does not cover much of the functionality of this library, and more significantly, the Fuzz Introspector report indicates that fuzz_helpers.py code is blocking the instrumentation from conducting deeper analysis (a sketch of a more analysis-friendly target layout follows this list).
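
To illustrate the structural point only (this is not the project's actual xlsx_fuzzer.py or fuzz_helpers.py, and the file name is hypothetical): a target that imports the library under atheris.instrument_imports() and calls it directly from TestOneInput, rather than through a helper class, gives both Atheris's runtime coverage instrumentation and Fuzz Introspector's static call-graph analysis something they can follow. A minimal sketch, assuming Atheris as the engine (which OSS-Fuzz uses for Python projects):

```python
# Illustrative sketch only -- not the project's actual target or helpers.
# The library is imported under instrument_imports() and called directly
# from TestOneInput, with no helper-class indirection in between.
import io
import sys

import atheris

with atheris.instrument_imports():
    import xlsxwriter


def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    # Write to an in-memory buffer so each iteration is hermetic.
    workbook = xlsxwriter.Workbook(io.BytesIO(), {"in_memory": True})
    worksheet = workbook.add_worksheet()
    worksheet.write(0, 0, fdp.ConsumeUnicodeNoSurrogates(64))
    workbook.close()


if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Because the workbook is written to an in-memory buffer, no files are left behind in the container, and because every library call appears directly in TestOneInput, static reachability can actually be computed for it.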

My Proposal (The Solution?)

Following the guidance outlined in the best practices section(s) of the OSS-Fuzz docs, I have experimented with the following:

  1. Refactoring the existing fuzzing test code to remove instrumentation blockers in the utility class.
  2. Replacing the existing fuzz target with a new basic_fuzzer.py target that exercises a significant portion of the xlsxwriter.worksheet feature set (a sketch of what such a target could look like follows this list).
  3. Adding several new fuzz targets to exercise functionality that is complex or difficult to cover in the basic tests (e.g., VBA binary files, embedded images, different permutations of charts, formulas, formatting, etc.)
  4. Generating seed corpus and dictionary files for each fuzz target to give the fuzzing engine a jumpstart, providing starting inputs that are known to invoke different code paths (so the fuzzing engine can spend its cycles finding new ones).
  5. Updating the build.sh script to support passing all of the above to the appropriate locations inside the OSS-Fuzz container.
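
For concreteness, here is a rough sketch of what a basic_fuzzer.py-style target along these lines could look like; the exact API selection, value ranges, and exception handling are illustrative assumptions, not the precise contents of my change-set:

```python
# basic_fuzzer.py-style sketch -- the API selection and ranges below are
# illustrative assumptions, not the exact contents of the proposed branch.
import io
import sys

import atheris

with atheris.instrument_imports():
    import xlsxwriter
    from xlsxwriter.exceptions import XlsxWriterException


def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    workbook = xlsxwriter.Workbook(io.BytesIO(), {"in_memory": True})
    worksheet = workbook.add_worksheet()
    bold = workbook.add_format({"bold": fdp.ConsumeBool()})

    # Let the fuzzer choose how many cells to touch and whether each one is
    # written as plain data or as a formula, so string handling, formula
    # parsing, and formatting paths all get exercised.
    for _ in range(fdp.ConsumeIntInRange(1, 16)):
        row = fdp.ConsumeIntInRange(0, 1024)
        col = fdp.ConsumeIntInRange(0, 255)
        text = fdp.ConsumeUnicodeNoSurrogates(32)
        if fdp.ConsumeBool():
            worksheet.write_formula(row, col, text, bold)
        else:
            worksheet.write(row, col, text, bold)

    try:
        workbook.close()
    except XlsxWriterException:
        # Documented library exceptions are expected behaviour, not bugs;
        # anything else propagates and is reported by the fuzzer.
        pass


if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Seed corpora and dictionaries (step 4) are typically wired in by having build.sh (step 5) copy <target>_seed_corpus.zip and <target>.dict files into $OUT, which is the usual OSS-Fuzz convention for attaching them to a fuzz target.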

Results

The table in the TL;DR section above outlines the primary improvements. Beyond that, I found two possible bugs:

  1. An unhandled exception is raised when None is passed in specific arguments.
  2. An unhandled exception is raised when passing UTF-8 surrogate characters in specific arguments. This issue might be similar to:

Both feel rather low impact to me. However, I am unsure of the full threat model for this library, so I won't go into further detail or put up my PR until I hear back.

Please let me know the best way to share my reproduction cases with you! I am happy to post them in a private fork and add you as a contributor, chat over email, or anything else really.

Next Steps (for your consideration)

If the changes outlined in my proposal above sound like something you would find valuable, the steps to get this implemented are roughly:

  1. Please let me know the best way to chat with you about the two (probably minor) bugs I may have found.
  2. Please let me know if you have questions or concerns or if I need to include anything!
  3. I'll put my PR into draft on this repo, tag whoever is interested, and await feedback.
  4. Some minor changes to the Dockerfile in the OSS-Fuzz repo are required. Getting that merged is a somewhat coordinated effort, as your approval is also necessary for that PR. We would probably want to wait until both PRs (here and there) are approved and then merge them at around the same time. (There is a chance the daily ClusterFuzz run may fail once, but I don't think it matters if it's only one day.)
@jmcnamara
Owner

Thanks for the detailed analysis and summary.

In reply to your steps:

  1. You can email me at the address in the docs: https://xlsxwriter.readthedocs.io/author.html#asking-questions. I could also turn on the "Private vulnerability reporting" feature for the repo if that helps.
  2. I don't have any concerns.
  3. Sounds good.
  4. Sounds okay too.


DaveLak commented Mar 6, 2024

Thanks for the quick reply.

You can email me at the address in the docs: xlsxwriter.readthedocs.io/author.html#asking-questions

Great, I'll do that shortly.

I could also turn on the "Private vulnerability reporting" feature for the repo if that helps.

That probably isn't necessary right now. I'll email a minimal reproduction case for each and some details and we can take it from there.
