Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure auto-sync docs to have them more contained #2355

Merged
merged 5 commits into from
Jun 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
127 changes: 26 additions & 101 deletions docs/AutoSync.md → suite/auto-sync/ARCHITECTURE.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,9 @@
# Auto-Sync

`auto-sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->

You can find it in `suite/auto-sync`.
# Architecture of the Auto-Sync framework

This document is split into four parts.

Expand All @@ -15,8 +12,8 @@ This document is split into four parts.
3. Instructions how to refactor an architecture to use `auto-sync`.
4. Notes about how to add a new architecture to Capstone with `auto-sync`.

Please read the section about architecture module design in
[ARCHITECTURE.md](ARCHITECTURE.md) before proceeding.
Please read the section about capstone module design in
[ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) before proceeding.
The architectural understanding is important for the following.

## Update procedure
Expand Down Expand Up @@ -98,102 +95,30 @@ _Note_: For details about this checkout `suite/auto-sync/CppTranslator/README.md
Because the result of the `CppTranslator` is not perfect,
we still have many syntax problems left.

Those need to be fixed by hand.
In order to ease this process we run the `Differ` after the `CppTranslator`.

The `Differ` parses each _translated_ file and the corresponding source file _currently_ used in Capstone.
It then compares specific nodes from the just translated file to the equivalent nodes in the old file.

The user can choose if she accepts the version from the translated file or the old file.
This decision is saved for every node.
If there exists a saved decision for a node, the previous decision automatically applied again.

Every other syntax error must be solved manually.
Those need to be fixed partially by hand.

## Update an architecture
**Differ**

To update an architecture do the following:

Rebase `llvm-capstone` onto the new LLVM release (if not already done).
```
# 1. Clone Capstone's LLVM
git clone https://github.com/capstone-engine/llvm-capstone
cd llvm-capstone
git checkout auto-sync

# 2. Rebase onto the new LLVM release and resolve the conflicts.

# 3. Build tblgen
mkdir build
cd build
cmake -G Ninja -DLLVM_TARGETS_TO_BUILD=<ARCH> -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug

# 4. Run the updater
cd ../../suite/auto-sync/
./Updater/ASUpdater.py -a <ARCH>
```

The update script will execute the steps described above and copy the new files to their directories.

Afterward try to build Capstone and fix any build errors left.

If new instructions or operands were added, add test cases for those
(recession tests for instructions are located in `suite/MC/`).

TODO: Operand and detail tests
<!--
TODO: Wait until `cstest` is rewritten and add description about operand testing.
Issue: https://github.com/capstone-engine/capstone/issues/1984
-->

## Refactor an architecture for `auto-sync`

To refactor an architecture to use `auto-sync`, you need to add it to the configuration.

1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)

Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:

```
./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
```

The task after this is to:

- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.

**Notes:**

- If you find yourself fixing the same syntax error multiple times,
please consider adding a `Patch` to the `CppTranslator` for this case.

- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.

- Running the `Differ` after everything is done, preserves your version of syntax corrections, and the next user can auto-apply them.

- Sometimes the LLVM code uses a single function from a larger source file.
It is not worth it to translate the whole file just for this function.
Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.
In order to ease this process we run the `Differ` after the `CppTranslator`.

- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):
The `Differ` compares our two versions of C files we have now.
One of them are the C files currently used by the architecture module.
On the other hand we have the translated C files. Those are still faulty and need to be fixed.

```
// generate content <FILENAME.inc> begin
// generate content <FILENAME.inc> end
```
Most fixes are syntactical problems. Those were almost always resolved before, during the last update.
The `Differ` helps you to compare the files and let you select which version to accept.

The update script will insert the content of the `.inc` file at this place.
Sometimes (not very often though), the newly translated C files contain important changes.
Most often though, the old files are already correct.

## Adding a new architecture
The `Differ` parses both files into an abstract syntax tree and compares certain nodes with the same name
(mostly functions).

Adding a new architecture follows the same steps as above. With the exception that you need
to implement all the Capstone files from scratch.
The user can choose if she accepts the version from the translated file or the old file.
This decision is saved for every node.
If there exists a saved decision for two nodes, and the nodes did not change since the last time,
it applies the previous decision automatically again.

Check out an `auto-sync` supporting architectures for guidance and open an issue if you need help.
The `Differ` is far from perfect. It only helps to automatically apply "known to be good" fixes
and gives the user a better interface to solve the other problems.
But there will still be syntax errors left afterward. These must be fixed by hand.
114 changes: 95 additions & 19 deletions suite/auto-sync/README.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,19 @@
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
Copyright © 2024 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->

# Architecture updater
# Architecture updater - Auto-Sync

This is Capstones updater for some architectures.
Unfortunately not all architectures are supported yet.
`auto-sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.

## Install dependencies
Please refer to [intro.md](intro.md) for an introduction about this tool.

## Install

Setup Python environment and Tree-sitter

Expand All @@ -20,11 +24,25 @@ sudo apt install python3-venv
# Setup virtual environment in Capstone root dir
python3 -m venv ./.venv
source ./.venv/bin/activate
```

Install Auto-Sync framework

```
cd suite/auto-sync/
pip install -e .
```

## Update
## Architecture

Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works.

This step is essential! Please don't skip it.

## Update an architecture

Updating an architecture module to the newest LLVM release, is only possible if it uses Auto-Sync.
Not all arch-modules support Auto-Sync yet.

Check if your architecture is supported.

Expand Down Expand Up @@ -52,6 +70,14 @@ Run the updater
./src/autosync/ASUpdater.py -a <ARCH>
```

## Update procedure

1. Run the `ASUpdater.py` script.
2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
and update them if necessary.
3. Try to build Capstone and fix the build errors.


## Post-processing steps

This update translates some LLVM C++ files to C.
Expand All @@ -60,7 +86,7 @@ you will get build errors if you try to compile Capstone.

The last step to finish the update is to fix those build errors by hand.

## Developer
## Additional details

### Overview updated files

Expand Down Expand Up @@ -96,14 +122,7 @@ Those files are written by us:
- `<ARCH>Mapping.*`: Binding code between the architecture module and the LLVM files. This is also where the detail is set.
- `<ARCH>Module.*`: Interface to the Capstone core.

### Update procedure

1. Run the `ASUpdater.py` script.
2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
and update them if necessary.
3. Try to build Capstone and fix the build errors.

### Update details
### Relevant documentation and troubleshooting

**LLVM file translation**

Expand All @@ -129,9 +148,66 @@ Documentation about the `.inc` file generation is in the [llvm-capstone](https:/

**Formatting**

- If you make changes to the `CppTranslator` please format the files with `black`
- If you make changes to the `CppTranslator` please format the files with `black` and `usort`
```
source ./.venv/bin/activate
pip3 install black
python3 -m black --line-length=120 CppTranslator/*/*.py
pip3 install black usort
python3 -m usort format src/autosync
python3 -m black src/autosync
```

## Refactor an architecture for Auto-Sync framework

Not all architecture modules support Auto-Sync yet.
Here is an overview of the steps to add support for it.

<hr>

To refactor one of them to use `auto-sync`, you need to add it to the configuration.

1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)

Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:

```
./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
```

The task after this is to:

- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Edit the main header file of the architecture (`include/capstone/<ARCH>.h`) to include the generated enums (see below)
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.

**Notes:**

- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):

```
// generate content <FILENAME.inc> begin
// generate content <FILENAME.inc> end
```

The update script will insert the content of the `.inc` file at this place.

- If you find yourself fixing the same syntax error multiple times,
please consider adding a `Patch` to the `CppTranslator` for this case.

- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.

- Running the `Differ` after everything is done, preserves your version of syntax corrections, and the next user can auto-apply them.

- Sometimes the LLVM code uses a single function from a larger source file.
It is not worth it to translate the whole file just for this function.
Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.

## Adding a new architecture

Adding a new architecture follows the same steps as above. With the exception that you need
to implement all the Capstone files from scratch.

Check out an `auto-sync` supporting architectures for guidance and open an issue if you need help.