Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Architecture updater (auto-sync) - Updating AArch64 #2026

Merged
merged 350 commits into from
Nov 15, 2023

Conversation

Rot127
Copy link
Collaborator

@Rot127 Rot127 commented May 20, 2023

Draft PR for the auto-sync refactor of AArch64.

  • Make Capstone build with generated and translated files.
  • Add details
    • Implement operand handling
    • Fix td files
      • SME vector lead/store/read/write have incorrect outs/ins set
  • Update and add tests
  • za writes are not listed in register write list
  • Fix ARM Disassembler to set Decoder arg Do during LLVM 17 update
  • Add AArch64_GRP_PRIVILEGE again
  • Handle detail in printSyspAlias printSysAlias
  • Handle alias
  • Brackets in some names: ID: 73 (bfmlalb{)
  • Make sure post index works well.
  • 'A0 4D DF 4C' op 4 missing
  • 0x21c8df4d disp missing.
  • Missing #0 imm for '\x20\x98\xe0\x5e'
  • MIssing imm 3000df4c

Todo after merge

  • Fix this 1am decision and rename HelperMethods.py to Helpers.py.
  • New Decoder pointer arg must be added to ARM, PPC, TriCore.
  • Add CS_OP_MEM_REG and CS_OP_MEM_IMM op types to prevent confusion that the CS_OP_MEM is a flag within the type.
  • Generate CAPSTONE_DIET builds into all printOperand() functions.
  • lookupByName for ARM
  • LLVM bug (42e9dec) for bfc alias.
  • U/SBFM alias modulo change depending on S and N bit. (wait for encoding info).

Breaking changes

  • Renamed all ARM64 -> AArch64 (for filenames, enums variable names). Necessary to be consistent with LLVM.
  • SME operands changed (contin more detail, terminology is closer to the docs).
  • System operands change (now categorized into SysAlias, SysImm, SysReg).

Closes

closes #1857
closes #1853
closes #1854
closes #1876 (matches LLVM disassembly now, which is considered source of truth).
closes #1881
closes #1887
closes #1890
closes #1895
closes #1953
closes #2055
closes #2072
closes #2130
closes #1842
closes #1760
closes #1670
closes #1669
closes #1652
closes #1629
closes #1621
closes #1620
closes #1591

wait for discussion: #1889

@Rot127
Copy link
Collaborator Author

Rot127 commented May 20, 2023

@kabeor @aquynh @XVilka Any objections against renaming ARM64 to AArch64?

LLVM refers to it as AArch64 and Capstone is inconsistent with it. So would you agree that we should stick to only one of those names (I'd prefer AArch64 so we are closer to LLVM)?
Edit: (AArch64 would also mean less if statements in the update scripts).

@kabeor
Copy link
Member

kabeor commented May 20, 2023

@Rot127 Not at all. However, do this mean that users need change their code if they update from v4? (Maybe alias for arm64_xxx like functions needed?)

@Rot127
Copy link
Collaborator Author

Rot127 commented May 20, 2023

However, do this mean that users need change their code if they update from v4?

Yes. Everything which was hand edited in Capstone uses ARM64 as prefix (file-, enum-, function names etc.). But code from LLVM uses AArch64.

If we decide to use ARM64 or leave the current inconsistent naming, we add complexity to the update scripts and the backends. Because we generate most enums and mapping tables now with the backends, we would need to check the target name every time it is used (and emit ARM64 instead of AArch64).

The decision is basically:

Change to ARM64

Pro: Saves users to trouble to update their code.
Con: On the cost of adding more (maybe a lot) complexity to auto-sync

Use AArch64 everywhere

The other way around.

Edit: Personally I would still like to go the AArch64 way for easier maintainability. Also there are many changes coming up anyway. So the users are only forced once to adapt their code.
On the other hand, I can not judge how deeply CS is integrated in other projects.

@XVilka
Copy link
Contributor

XVilka commented May 20, 2023

No objections against renaming

@Rot127
Copy link
Collaborator Author

Rot127 commented May 22, 2023

Yes. Everything which was hand edited in Capstone uses ARM64 as prefix (file-, enum-, function names etc.). But code from LLVM uses AArch64.

@kabeor @aquynh Any more comments regarding this? Otherwise I would go forward with it.

@Rot127
Copy link
Collaborator Author

Rot127 commented May 23, 2023

@kabeor If there is not a really strong argument for capitalized target names in identifiers, I would like to revert this commit (capstone-engine/llvm-capstone@cef9d8a).
For AArch64 it gets really difficult to patch most of the usage out of the translated files. Also AArch64 identifiers weren't capitalized before anyway.

@Rot127 Rot127 mentioned this pull request Jun 4, 2023
@Rot127
Copy link
Collaborator Author

Rot127 commented Jun 4, 2023

@FinnWilkinson While updating AArcch64 I also do a lot of necessary refactor and renaming while at it. You added the SME extension and I would like to change the Matrix SME op as well. So the naming is more in line with the SME docs.

typedef struct aarch64_op_sme {
  aarch64_sme_op_type type; ///< AArch64_SME_OP_TILE, AArch64_SME_OP_TILE_VEC, AArch64_SME_OP_ACC_MATRIX
  aarch64_reg tile;  ///< Matrix tile register
  aarch64_reg slice_reg; ///< slice index reg
  uint8_t slice_offset;	  ///< slice index offset
} aarch64_op_sme;

The sme_op_types would follow the LLVM description of those ops:

  • AArch64_SME_OP_TILE: Single tile (only tile is set).
  • AArch64_SME_OP_TILE_VEC: Indexed tile (tile, slice_reg, slice_offset is set).
  • AArch64_SME_OP_ACC_MATRIX: Complete accumulator matrix (only tile is set to reg ZA)

Could you quickly give a thumbs up if this makes sense?

@R33v0LT R33v0LT mentioned this pull request Jul 2, 2023
XVilka referenced this pull request in qemu/qemu Jul 11, 2023
this are the changes from volumit
(https://github.com/volumit/qemu/commits/master) compacted into one
patch.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
@XVilka
Copy link
Contributor

XVilka commented Jul 20, 2023

Please rebase on top of the latest next

@Rot127
Copy link
Collaborator Author

Rot127 commented Jul 20, 2023

Do it after PPC is in a more polished state (in a few days).

@Rot127
Copy link
Collaborator Author

Rot127 commented Aug 24, 2023

@kabeor @aquynh The disassembler tests run through so far and every possible instruction was decoded at least once without segfault. Going to continue with Rizin integration now, which will test the details set (CI still fails because of the yet unfixed issues.cs). But in general everything should be in a good shape.
Feel free to start your review.

@XVilka
Copy link
Contributor

XVilka commented Sep 1, 2023

Since reviews take so long, I recommend starting them right now for this PR @kabeor @aquynh

@FinnWilkinson
Copy link
Contributor

@FinnWilkinson While updating AArcch64 I also do a lot of necessary refactor and renaming while at it. You added the SME extension and I would like to change the Matrix SME op as well. So the naming is more in line with the SME docs.

typedef struct aarch64_op_sme {
  aarch64_sme_op_type type; ///< AArch64_SME_OP_TILE, AArch64_SME_OP_TILE_VEC, AArch64_SME_OP_ACC_MATRIX
  aarch64_reg tile;  ///< Matrix tile register
  aarch64_reg slice_reg; ///< slice index reg
  uint8_t slice_offset;	  ///< slice index offset
} aarch64_op_sme;

The sme_op_types would follow the LLVM description of those ops:

  • AArch64_SME_OP_TILE: Single tile (only tile is set).
  • AArch64_SME_OP_TILE_VEC: Indexed tile (tile, slice_reg, slice_offset is set).
  • AArch64_SME_OP_ACC_MATRIX: Complete accumulator matrix (only tile is set to reg ZA)

Could you quickly give a thumbs up if this makes sense?

@Rot127 Appologies that I haven't seen this sooner... I've been watching this PR for a while and have completely missed this.

This looks good yes for SME. I presume that for SME2's new multi-vector operations & vector-groups this is encoded into the instruction enum rather than neededing to be another field of the aarch64_op_sme struct?
Example usage can be seen in this SUDOT SME2 instruction.

- Do not generate constants automatically (dscript is way too buggy).
- Update printing of details.
@Rot127 Rot127 marked this pull request as ready for review November 2, 2023 18:40
@Rot127
Copy link
Collaborator Author

Rot127 commented Nov 2, 2023

@kabeor @aquynh I open this now for review so you have enough time.

There are still minor things left. Mostly syntax changes in the Python tests (hence the failing tests) and some naming problems (the failing fuzz test). I will run the complete instruction space once through it in the next days to check for segfaults again. But generally the issues.cs pass and this will not change much.

I'll add the encoding after this is merged. Since this is already huge.

@Rot127
Copy link
Collaborator Author

Rot127 commented Nov 5, 2023

0x0 - 0xffffffff go all through now (with -d and without -d).

@Rot127
Copy link
Collaborator Author

Rot127 commented Nov 6, 2023

@kabeor @aquynh I think I'd like to address the last two points on the list with LLVM 17. Since it needs an update anyways. Besides that, I would really like to see this merged. Then we got the biggest part of this whole thing behind us.

@kabeor
Copy link
Member

kabeor commented Nov 8, 2023

Will merge it this weekend:)

@kabeor
Copy link
Member

kabeor commented Nov 12, 2023

@Rot127 LGTM, can you make a warning to ask them to upgrade their code when people who are still calling arm64xxx? Will merge after that.

@Rot127
Copy link
Collaborator Author

Rot127 commented Nov 12, 2023

@kabeor Where would you add such a warning?
Also mind that there is still the V6 guide.
What do you think about finishing the guide and make a v6 alpha release?
In the release page all the big changes could be listed as well as missing features (encoding, real/alias detail ARM etc.). This would also bring to attention, that there is quite some change going on.

@XVilka
Copy link
Contributor

XVilka commented Nov 12, 2023

I think such a warning, if necessary, should be done in a separate PR

@XVilka
Copy link
Contributor

XVilka commented Nov 13, 2023

@kabeor @aquynh, what do you think about merging this as is and further improvements in the following PRs? This one grew very big already, and merging it will help more widespread testing of this code.

@kabeor
Copy link
Member

kabeor commented Nov 14, 2023

Well I mean a warning comes from code.

For example, if a user directly upgrades to v6 without reading the documentation, when he still calls the previous code (containing arm64 API), the code should return a warning and stop running.

It's ok to be done in a separate PR, can I assign this to you? @Rot127

And please confirm, if there is nothing still being done in this PR, I will merge now.

@Rot127
Copy link
Collaborator Author

Rot127 commented Nov 14, 2023

And please confirm, if there is nothing still being done in this PR, I will merge now.

You can merge.

For example, if a user directly upgrades to v6 without reading the documentation, when he still calls the previous code (containing arm64 API), the code should return a warning and stop running.

Since we changed all the names, people wouldn't be able to build Capstone. Or are you speaking of prebuilds?

In the latter case yes sure. For cstool we could add a warning if someone passes arm64. You can assign this to me. libcapstone again will fail to link, since everything is renamed now.

But this is also why I think we should make some kind of pre-release. So everyone is aware and knows the details.

@kabeor
Copy link
Member

kabeor commented Nov 15, 2023

Cool, merged!

@kabeor kabeor merged commit d3eb79c into capstone-engine:next Nov 15, 2023
11 checks passed
@Rot127 Rot127 deleted the auto-sync-aarch64 branch June 15, 2024 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment