Skip to content

Revise x86_64 inline asm: Fix for LDC v1.29+, and enable on Windows and with GDC too#458

Merged
RazvanN7 merged 2 commits intodlang-community:masterfrom
kinke:asm
Apr 29, 2022
Merged

Revise x86_64 inline asm: Fix for LDC v1.29+, and enable on Windows and with GDC too#458
RazvanN7 merged 2 commits intodlang-community:masterfrom
kinke:asm

Conversation

@kinke
Copy link
Contributor

@kinke kinke commented Apr 28, 2022

Resolves #457.

@kinke
Copy link
Contributor Author

kinke commented Apr 28, 2022

The GDC/GCC-style inline asm syntax is also inline-able with LDC, unlike naked DMD-style one. IIRC, LDC v1.22 added support for that syntax.

@WebFreak001
Copy link
Member

@dd86k can you review this?

@codecov
Copy link

codecov bot commented Apr 28, 2022

Codecov Report

Merging #458 (5ed52e3) into master (b772900) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #458      +/-   ##
==========================================
+ Coverage   82.85%   82.86%   +0.01%     
==========================================
  Files          11       11              
  Lines        8315     8321       +6     
==========================================
+ Hits         6889     6895       +6     
  Misses       1426     1426              
Impacted Files Coverage Δ
src/dparse/lexer.d 85.67% <100.00%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b772900...5ed52e3. Read the comment docs.

@kinke
Copy link
Contributor Author

kinke commented Apr 28, 2022

FWIW, here's the almost-identical GCC-style one for C++, supported by gcc and clang: godbolt

@dd86k
Copy link

dd86k commented Apr 28, 2022

@dd86k can you review this?

@WebFreak001: I don't mind testing this, but what would I be able to test, or should I expect from this PR, precisely?

@kinke
Copy link
Contributor Author

kinke commented Apr 28, 2022

The unittest runner (dub test) would fail for me without this change with LDC v1.29 on Linux (#457, obviously with a CPU with SSE 4.2), now works. I haven't tested it on Windows, nor with GDC.

@dd86k
Copy link

dd86k commented Apr 29, 2022

To resume: Run tests with this PR on a Linux system with SSE4.2 using LDC 1.29.0 and GDC? Optionally on Windows?

@kinke
Copy link
Contributor Author

kinke commented Apr 29, 2022

Yep, that should cover everything I think. - Note that LDC versions < 1.22 couldn't be used anymore to compile libdparse; that would require some more static if + enum business to use the DMD-style syntax for those.

@dd86k
Copy link

dd86k commented Apr 29, 2022

Cloned https://github.com/kinke/libdparse asm onto my machine.

Commit: 0.19.1+commit.1.g378e41e

Tests using dub test --compiler=ldc2 and dub test --compiler=gdc respectively:

  • Linux x86_64 5.16.19
    • LDC 1.28.0: 8 modules passed unittests
    • GDC 11.2.0: All unit tests have been run successfully.
  • Windows 10 21H1 x86_64 10.19044.1645
    • LDC 1.28.1: 8 modules passed unittests
  • Windows 11 21H2 x86_64 10.22000.613
    • LDC 1.29.0: 8 modules passed unittests
    • Note: VS2019 Build Tools + VS2022 Community installed

Didn't test GDC on Windows because last time I tried using GDC on Windows last year, it couldn't compile D code at all.

Hope this answers it for you! I'll be here if you need anything else.

Processor info

Linux (host):

Name:        AuthenticAMD AMD Ryzen 9 5950X 16-Core Processor            
Identifier:  Family 0x19 Model 0x21 Stepping 0x0
Cores:       16 cores, 32 threads
Max. Memory: 256 TiB physical, 256 TiB virtual
Baseline:    x86-64-v3
Techs:       core-performance-boost htt
Extensions:  x87/fpu +f16c mmx extmmx amd64/x86-64 +lahf64 amd-v/vmx +svm=v1 aes-ni adx sha bmi1 bmi2
SSE:         sse sse2 sse3 ssse3 sse4.2 sse4a
AVX:         avx avx2
AMX:         None
Mitigations: ibpb ibrs ibrs_pref stibp stibp_on ssbd
Cache L1-D:   16x   32 KiB,  512 KiB total, si
Cache L1-I:   16x   32 KiB,  512 KiB total, si
Cache L2-U:   16x  512 KiB,    8 MiB total, si ci
Cache L3-U:    2x   32 MiB,   64 MiB total, si nwbv

Windows 10 (guest, should be same for 11):

Name:        AuthenticAMD AMD Ryzen 9 5950X 16-Core Processor
Identifier:  Family 0x19 Model 0x21 Stepping 0x0
Cores:       2 cores, 4 threads
Max. Memory: 256 TiB physical, 256 TiB virtual
Baseline:    x86-64
Techs:       htt
Extensions:  x87/fpu mmx extmmx amd64/x86-64 +lahf64 amd-v/vmx +svm=v1 aes-ni
SSE:         sse sse2 sse3 ssse3 sse4.2 sse4a
AVX:         avx avx2
AMX:         None
Mitigations:
ParaVirt.:   Hyper-V
Cache L1-D:    2x   32 KiB,   64 KiB total, si
Cache L1-I:    2x   32 KiB,   64 KiB total, si
Cache L2-U:    2x  512 KiB,    1 MiB total, si ci
Cache L3-U:    1x   32 MiB,   32 MiB total, si nwbv

* given $(B chars).
*/
void skip(bool matching, chars...)(const ubyte*, ulong*, ulong*) pure nothrow
void skip(bool matching, chars...)(const ubyte* bytes, ulong* pindex, ulong* pcolumn) pure nothrow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these attributes not be inferred?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely, but I didn't want to mix in other refactorings (the pointers would have been gone first and replaced by refs obviously...).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think avoiding refactoring solely on principle is a good way to kill progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't avoid it by principle, I decide from time to time. The point here is clearly asm, not tampering with a codebase I've seen first yesterday and don't plan to contribute to regularly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no one contributes to this codebase regularly ;)

@kinke
Copy link
Contributor Author

kinke commented Apr 29, 2022

[pushed a little extension to make LDC < v1.22 work again]

@maxhaton
Copy link

In some quick tests I did, this whitespace skipping approach doesn't seem to be particularly quick, I'm not convinced there's all that much point in keeping this at all.

@RazvanN7
Copy link
Contributor

Merging this to unblock the D-Scanner pipeline

@RazvanN7 RazvanN7 merged commit b94a157 into dlang-community:master Apr 29, 2022
@edi33416
Copy link

In some quick tests I did, this whitespace skipping approach doesn't seem to be particularly quick, I'm not convinced there's all that much point in keeping this at all

I agree. While I like reading some assembly as much as the next person, I don't think we should use inline asm anymore.
I think this code will hinder better optimizations, such as using the ymm or zmm registers for the newer cpus.
I'm pretty sure that @dd86k 's 16 core would do better without the inline asm.

I believe that the inline asm should be removed in a future PR. This will prevent future issues like this one, and will enable the compiler to perform optimizations as the compiler and architectures evolve without requiring code patches.

@RazvanN7
Copy link
Contributor

This PR: dlang-community/D-Scanner#866 confirms that this fix is correct, however, we still need a new tag so that the dub test also passes. cc @WebFreak001

@WebFreak001
Copy link
Member

ok will tag release

@RazvanN7
Copy link
Contributor

@WebFreak001 Is there anything else that needs to be done for the dub registry to pick up the new tag or does it simply take a while before it gets updated? I see that the dub registry [1] still thinks that 0.19.1 is the latest tag.

[1] https://code.dlang.org/packages/libdparse

@WebFreak001
Copy link
Member

ping @Hackerpilot

otherwise wait for 0~8 hours

@RazvanN7
Copy link
Contributor

ok, no problem, I can wait. I just wanted to make sure that we are on track. Thanks for your assistance!

@kinke kinke deleted the asm branch April 29, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Program exited with code -11 on not Windows and ldc-1.29.0

6 participants

Comments