Modernize Capstone testing #1984

Rot127 · 2023-04-03T13:42:07Z

Capstone's disassembly results can currently be tested in too many ways:

- Test binaries: they print (but do not test!) the instruction details (e.g. which operands are read and written).
- MC tests, taken from LLVM's MC test files (in suite/MC). They only test the disassembly of bytes to their assembly strings. Two tools can run them:
  - a Python script
  - the cstest binary
- Issue tests (files named issue.cs). They look very similar to the MC tests, but are not related. They can test detail information as well. cstest processes them.

This is very confusing and could be unified.

I propose to:

- Get rid of the test_<arch> binaries, because they hard-code every test.
- Get rid of the Python scripts, because they duplicate cstest.
- Rewrite cstest.

cstest

cstest should be written from scratch. It needs modernization anyway (e.g. remove global variables) and we could settle on a single test file format.

This new format should support simple bytes <-> assembly string testing, as well as testing the content of cs_detail.
Once this is done, we can also write scripts to translate LLVM's MC regression tests into our file format.
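
As a rough illustration of what a single test case in such a unified format would need to carry, here is a minimal Python sketch. Every name in it (`DisasmCase`, `ExpectedInsn`, `check_case`, `fake_disasm`, the `op_count` detail key) is hypothetical and not part of Capstone or cstest. The disassembler is passed in as a callable, and the one used here is a hard-coded stand-in, so the sketch runs without Capstone installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# One decoded instruction as the checker sees it: (assembly text, detail fields).
Decoded = Tuple[str, Dict[str, object]]


@dataclass
class ExpectedInsn:
    asm_text: str  # always compared
    details: Dict[str, object] = field(default_factory=dict)  # compared only if listed


@dataclass
class DisasmCase:
    arch: str
    raw_bytes: bytes
    expected: List[ExpectedInsn]


def check_case(case: DisasmCase,
               disasm: Callable[[str, bytes], List[Decoded]]) -> List[str]:
    """Return human-readable mismatches; an empty list means the case passed."""
    actual = disasm(case.arch, case.raw_bytes)
    if len(actual) != len(case.expected):
        return [f"expected {len(case.expected)} instructions, got {len(actual)}"]
    errors = []
    for i, (exp, (text, details)) in enumerate(zip(case.expected, actual)):
        if text != exp.asm_text:
            errors.append(f"insn {i}: text {text!r}, expected {exp.asm_text!r}")
        for key, want in exp.details.items():
            if details.get(key) != want:
                errors.append(f"insn {i}: {key}={details.get(key)!r}, expected {want!r}")
    return errors


def fake_disasm(arch: str, code: bytes) -> List[Decoded]:
    """Stand-in for a real disassembler call; knows one AArch64 encoding only."""
    table = {bytes.fromhex("1f2003d5"): ("nop", {"op_count": 0})}
    return [table[code[i:i + 4]] for i in range(0, len(code), 4)]


case = DisasmCase("aarch64", bytes.fromhex("1f2003d5"),
                  [ExpectedInsn("nop", {"op_count": 0})])
print(check_case(case, fake_disasm))  # prints []
```

A case that lists no details behaves like a plain bytes <-> assembly string test, so one structure could cover both uses.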

Before the v6 release we should also test every possible detail setting for correctness. See #1984 (comment)

CI

- Test that the bindings build and work correctly.
- Test the build with -DCAPSTONE_DIET.
- Add Coverity scans (Setup Coverity GitHub Action #2029).
- Test all additional features like skipping bytes (see bugs like: skipdata doesn't work correctly from python #2336).

Rot127 · 2023-04-03T20:33:44Z

Also consider #1281 when generating tests.

Rot127 · 2023-04-06T11:41:47Z

The current MC tests assume that a byte string can be disassembled independently of other byte strings. This is not true for instructions which depend on each other (IT/VPT blocks in ARM, for example).
This needs to be considered.
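
To make the dependency concrete, here is a toy Python model of it. It is not Capstone code: `decode_stream` is a hypothetical decoder that understands only two Thumb encodings (`ite eq` and the 16-bit MOV immediate), which is just enough to show that the same bytes decode differently depending on what came before them.

```python
from typing import List


def decode_stream(code: bytes) -> List[str]:
    """Toy Thumb decoder: handles only `ite eq` and MOV (immediate, T1)."""
    out = []
    pending: List[str] = []  # condition codes still owed to following instructions
    for off in range(0, len(code), 2):
        half = int.from_bytes(code[off:off + 2], "little")
        if half == 0xBF0C:  # ite eq: next insn gets "eq", the one after "ne"
            out.append("ite eq")
            pending = ["eq", "ne"]
        elif (half & 0xF800) == 0x2000:  # MOV (immediate), encoding T1
            rd, imm = (half >> 8) & 0x7, half & 0xFF
            # Inside an IT block the instruction is conditional and sets no flags.
            mnemonic = ("mov" + pending.pop(0)) if pending else "movs"
            out.append(f"{mnemonic} r{rd}, #{imm}")
        else:
            out.append("<unknown>")
    return out


block = bytes.fromhex("0cbf01200020")

# Decoded as one unit, the IT state carries over to the two MOVs.
together = decode_stream(block)
# Decoded one instruction at a time, that state is lost.
separate = [decode_stream(block[i:i + 2])[0] for i in range(0, len(block), 2)]

print(together)  # prints ['ite eq', 'moveq r0, #1', 'movne r0, #0']
print(separate)  # prints ['ite eq', 'movs r0, #1', 'movs r0, #0']
```

A test format would therefore need a way to say that several encodings belong to one decoding run.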

Rot127 · 2023-05-05T16:24:52Z

@kabeor @aquynh We thought about choosing YAML as the format for the test files.
It is easier to read than JSON and can be used for the bindings as well.
Also, YAML libraries are available in most distros.
Any opinions?

kabeor · 2023-05-06T02:42:33Z

@Rot127 That makes sense. Could you show us a code example?

Rot127 · 2023-05-06T12:51:12Z

I'll add one once the ~~PPC~~ AArch64 refactor is coming to an end.

Rot127 · 2023-05-30T18:59:47Z

The build of the bindings and their correct function isn't tested in the CI jobs either (#2034 (comment)).

XVilka · 2023-05-31T10:00:01Z

See also #1954

peace-maker · 2023-07-27T09:38:54Z

To verify that the bindings produce the same output as the raw C library, there are currently tests which reimplement the test_* binaries in the binding language, including the exact same detail output, and then simply diff the output of the test_* binary against that of e.g. the test_*.py script.

One could write a test runner in every supported binding language which can be triggered by the cstest program and outputs in the same format as the detail printers in cstest.

Currently there are multiple places where the instructions are printed which all have to be updated when the structs change:

- In cstool
- In the separate test_* binaries
- In cstest
- (In every binding (Python, Java))

Having only one place to stringify an instruction would streamline the update process. With cstool as the baseline, instruction printing could be split into a separate library with arm_insn_to_string functions which only handle rendering the instruction details to a string array. Printing and reformatting those parts can be done by the consumers.
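
A minimal sketch of that split, written in Python for brevity. The names `insn_to_lines`, `print_tool_style` and `print_test_style` and the dictionary layout are all hypothetical and do not correspond to existing Capstone functions or structs. One function renders the details to a list of strings, and each consumer decides how to lay them out.

```python
from typing import Dict, List


def insn_to_lines(insn: Dict) -> List[str]:
    """Render one instruction's details as a list of plain strings."""
    lines = [f"{insn['mnemonic']} {insn['op_str']}".strip()]
    for i, op in enumerate(insn.get("operands", [])):
        lines.append(f"operands[{i}].type: {op['type']}")
        lines.append(f"operands[{i}].access: {op['access']}")
    return lines


# Two consumers that format the same lines differently.
def print_tool_style(insn: Dict) -> str:
    """One tab-indented line per entry."""
    return "\n".join("\t" + line for line in insn_to_lines(insn))


def print_test_style(insn: Dict) -> str:
    """All entries on a single line."""
    return " | ".join(insn_to_lines(insn))


insn = {"mnemonic": "mov", "op_str": "r0, r1",
        "operands": [{"type": "REG", "access": "WRITE"},
                     {"type": "REG", "access": "READ"}]}
print(print_test_style(insn))
```

When a struct changes, only `insn_to_lines` would need updating in this arrangement.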

Rot127 · 2023-07-27T10:48:27Z

> Only having one place to stringify an instruction would streamline the update process.

I very much like the idea to centralize the printing of instructions. Would you mind putting this into its own issue?

> One could write a test runner in every supported binding language which can be triggered by the cstest program and outputs in the same format as the detail printers in cstest

I am not sure I understand you correctly here, but I think testing the instruction details by comparing strings between the output of cstool and the bindings is a bad idea.

If we have the test data in a YAML file, we can just as well test the objects directly.
This would also allow us to test binary data.
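
A small sketch of what testing the objects directly could look like in a binding's test runner. The `compare` helper is hypothetical, the `expected` dictionary stands in for data that would be loaded from a test file, and `SimpleNamespace` stands in for an instruction object from a binding. None of this is existing Capstone code.

```python
from types import SimpleNamespace
from typing import Any, List


def compare(expected: Any, actual: Any, path: str = "insn") -> List[str]:
    """Walk the expected data and check the matching attributes on the object."""
    if isinstance(expected, dict):
        errors = []
        for key, want in expected.items():
            if hasattr(actual, key):
                errors += compare(want, getattr(actual, key), f"{path}.{key}")
            else:
                errors.append(f"{path}.{key}: attribute missing")
        return errors
    if isinstance(expected, list):
        if not isinstance(actual, (list, tuple)) or len(actual) != len(expected):
            return [f"{path}: expected a sequence of {len(expected)} items"]
        errors = []
        for i, (want, got) in enumerate(zip(expected, actual)):
            errors += compare(want, got, f"{path}[{i}]")
        return errors
    # Leaf value: ints, strings and raw bytes are all compared by equality.
    return [] if expected == actual else [f"{path}: expected {expected!r}, got {actual!r}"]


insn = SimpleNamespace(mnemonic="nop", bytes=bytes.fromhex("1f2003d5"), operands=[])
expected = {"mnemonic": "nop", "bytes": bytes.fromhex("1f2003d5"), "operands": []}
print(compare(expected, insn))  # prints []
```

Only the fields named in the expected data are checked, so a test can be as narrow or as detailed as needed, and no printer has to be kept in sync between languages.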

XVilka · 2023-08-05T17:09:44Z

Using YAML to test the data layout is also common practice in the LLVM project:

XVilka · 2023-08-05T17:12:24Z

> Only having one place to stringify an instruction would streamline the update process. When using cstool as the baseline, instruction printing could be split into a separate library with arm_insn_to_string functions which only handle rendering the instruction details to a string array. Printing and reformatting those parts can be done by the consumers.

The best approach would be a new small library in the same repository, libcsprint or something similar. That way it could also be used on its own, outside of cstool.

Rot127 · 2023-08-31T17:45:32Z

Just an idea for how to test every possible execution path and generate a set of instructions which covers every possible detail combination.

Depending on the architecture, the number of unique ways to set the detail struct should be around <num operand groups> * 2 (just a guess: multiple groups share a single execution path, while some have a very diverse one. See add_cs_detail()).

For AArch64 for example this would be roughly 162 * 2 (a very large number in comparison to other archs).

To determine the execution path for each operand group:

1. Execute every possible encoding (assuming 4-byte instructions) and write every valid decoding to a file.
   - Alternatively, we can also just test every encoding from suite/MC/.
2. For every valid encoding, run gcov on it and diff the coverage graph of all add_cs_detail() calls.
   - If it contains a unique add_cs_detail() path, mark this encoding as a valid test.
   - If not, decode the next instruction.

The resulting 200-300 instructions can be checked manually for valid details.
Then add tests manually for paths which couldn't be reached with this method (e.g. PPC prefixed instructions).
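
The selection step can be sketched as a simple filter over coverage signatures. This is only an illustration under assumptions: `select_unique_paths` is a hypothetical helper, the gcov run is replaced by a callable that returns the set of add_cs_detail() locations an encoding reached, and the coverage table uses made-up labels and toy one-byte "encodings".

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def select_unique_paths(encodings: Iterable[bytes],
                        coverage_of: Callable[[bytes], FrozenSet[str]]) -> List[bytes]:
    """Keep the first encoding seen for each distinct coverage signature."""
    seen: Set[FrozenSet[str]] = set()
    selected: List[bytes] = []
    for enc in encodings:
        signature = coverage_of(enc)
        if signature not in seen:  # a path no earlier encoding has taken
            seen.add(signature)
            selected.append(enc)
    return selected


# Made-up coverage data standing in for per-encoding gcov results.
toy_coverage: Dict[bytes, FrozenSet[str]] = {
    b"\x01": frozenset({"reg_op"}),
    b"\x02": frozenset({"reg_op"}),            # same path as b"\x01", skipped
    b"\x03": frozenset({"reg_op", "imm_op"}),
    b"\x04": frozenset({"mem_op"}),
}
picked = select_unique_paths(sorted(toy_coverage), toy_coverage.__getitem__)
print(picked)  # prints [b'\x01', b'\x03', b'\x04']
```

Comparing whole signatures rather than individual lines keeps an encoding whenever its combination of covered code is new, which matches the "unique path" criterion above.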

XVilka · 2023-09-22T08:39:11Z

Once it's done some files could be removed, e.g.:

suite/regress/*

XVilka mentioned this issue Jun 13, 2023

Fix Python bindings after changes to cs_detail #2041

Merged

XVilka mentioned this issue Jul 18, 2023

PowerPC regressions in v5.0 #2087

Closed

Rot127 mentioned this issue Jul 26, 2023

Fix running cstest in CI #2126

Merged

This was referenced Aug 25, 2023

auto-sync progress tracker: Refactor and implement architectures #2015

Open

Apple compatibility changes for #2026 + misc. AARCH64 naming cleanups. Rot127/capstone#1

Closed

Rot127 mentioned this issue Dec 21, 2023

By default compile as universal2 for macOS #2221

Merged

Rot127 mentioned this issue Feb 28, 2024

Invalid information for ARM memory operand on next #2281

Open

Rot127 added this to the v6 milestone Mar 19, 2024

Rot127 added the Testing Test related issue label Mar 19, 2024

Rot127 linked a pull request Jul 4, 2024 that will close this issue

Modern testing #2384

Draft
