Currently, webasm are only have le(LittleEndian) mode. #1212

Closed
lygstate opened this issue Jun 3, 2018 · 31 comments

Comments

@lygstate
Contributor

lygstate commented Jun 3, 2018

Is it possible to add a big-endian mode?
At the very least, don't prevent implementors from providing one.
Give an option.

@daminetreg

IMHO it would make wasm files and serialized data layouts far less portable, and today's major modern platforms are all little-endian. I can't think of anything other than PPC where big-endianness is still relevant.

I find the choice of having only one endianness really great, and we rely on it heavily for the wasm product we are currently building.

@lygstate
Contributor Author

lygstate commented Jun 3, 2018

PPC is what I want to support :) For VxWorks and the rest of the airplane industry, big-endian CPUs are the most widely used :) A lot of existing code also assumes big-endian by default, so big-endian support in wasm would be a big win there.

@lygstate
Contributor Author

lygstate commented Jun 3, 2018

For example, if I want to use QEMU to emulate PPC on a little-endian machine, that would be faster :)
Because we have good wasm JITs.

@SimHacker

SimHacker commented Jun 21, 2019

The endian ship has sailed. Welcome to the monoculture!

Lucky for you the PowerPC is designed to swap bytes really efficiently. ;)

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_lwbrx_lbx_lwbri_instrs.htm

It can also shift and mask very fast, since it was designed to efficiently emulate other instruction sets.

https://devblogs.microsoft.com/oldnewthing/20180810-00/?p=99465

@lars-t-hansen
Contributor

I suppose endianness could in some sense be an attribute of the memory object, in the same way as shareability is -- one would only be allowed to use a memory in a module if the endianness attributes of the memory and the module's imported memory are equal.

Non-native-endian access is not always simple. It's true some architectures have load/store-with-reverse-endianness instructions, but do they have ditto atomic accesses? I'm inclined to doubt it. There would have to be a significant amount (not just now but over time) of big-endian-only software that could be compiled to wasm before requiring a big-endian mode would pay off. Like sticking to ieee-conforming floating point, sticking to little-endian simplifies a lot of things for most users and for the implementations.

@rossberg
Member

I suppose endianness could in some sense be an attribute of the memory object

Endianness isn't limited to memory instructions, though. Another example are the reinterpret instructions.

And then there is hardware that is neither little nor big endian, IIRC.

@lars-t-hansen
Contributor

I suppose endianness could in some sense be an attribute of the memory object

Endianness isn't limited to memory instructions, though. Another example are the reinterpret instructions.

I assume that you're referring to some floating point layouts being mixed-endian (eg little-endian within the word but the words in big-endian order)? I haven't seen those in a while; I know I encountered them on some ARM systems but that's over a decade ago.

And then there is hardware that is neither little nor big endian, IIRC.

I don't doubt it, though to my knowledge I've never worked on such a system myself and I don't know any concrete examples.

Even granting both of your objections: solving 99% of use cases instead of just 95% of use cases might be a worthwhile improvement. I'm not exactly advocating doing so, I'm mostly interested in probing the design space.

@Serentty

I think instead of adding an explicit big endian mode to WebAssembly and introducing incompatibility, it would be better to add instructions which make it fast to deal with big endian data. One possibility is big endian load and store operations, but even just something like a single-byte i32.bswap instruction could allow big endian processing with very little overhead. JITs wouldn't actually have to perform a swap on a big endian architecture: they could fuse it with the memory access instruction next to it and compile it into a native big endian access.
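To make the fusion idea concrete, here is a minimal Rust sketch (Rust is used only for illustration, and the function names are hypothetical). It models a wasm `i32.load` as a little-endian read and the proposed `i32.bswap` as `u32::swap_bytes`, then shows the direct big-endian read that the pair is equivalent to. That equivalence is what would let an engine on a big-endian host emit a single native load.

```rust
/// Read a big-endian u32 the way a wasm module could with a bswap instruction:
/// a little-endian load followed by a byte swap.
pub fn load_u32_be_via_swap(mem: &[u8], offset: usize) -> u32 {
    let bytes = [mem[offset], mem[offset + 1], mem[offset + 2], mem[offset + 3]];
    let le = u32::from_le_bytes(bytes); // models i32.load (always little-endian)
    le.swap_bytes() // models the proposed i32.bswap
}

/// What a fused, native big-endian load would compute in one step.
pub fn load_u32_be_direct(mem: &[u8], offset: usize) -> u32 {
    let bytes = [mem[offset], mem[offset + 1], mem[offset + 2], mem[offset + 3]];
    u32::from_be_bytes(bytes)
}
```

For the bytes `12 34 56 78`, both functions return `0x12345678`.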

@binji
Member

binji commented Apr 13, 2020

Yep, adding an i32.bswap was discussed a while back (see FutureFeatures.md) and could be a nice small proposal.

@dtig
Member

dtig commented Apr 13, 2020

In the interest of probing the design space, i32.bswap would work for MVP Wasm, but if the intention is to support some of the in-progress proposals like Threads and/or SIMD, that might be somewhat more challenging. For example, building big-endian atomic operations out of a separate byte swap operation might render the accesses non-atomic. In the SIMD proposal, we assume that only one 128-bit type is introduced and that the representations are interchangeable, so while doable, adding a big-endian mode would at minimum require additional byte swap operations; more may be needed to detect different representations and swap only when necessary.
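One way to see the atomicity concern: an atomic read-modify-write on byte-swapped data cannot simply be a native atomic add plus a swap, because the carry would propagate through the bytes in the wrong order. Below is a Rust sketch, under the assumption that an implementation falls back to a compare-and-swap loop. The helper name `fetch_add_swapped` is hypothetical and not part of any proposal.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically add `delta` to a counter that is stored byte-swapped relative
/// to the byte order the native atomic instructions use.
/// Returns the previous logical (un-swapped) value.
pub fn fetch_add_swapped(cell: &AtomicU32, delta: u32) -> u32 {
    let mut raw = cell.load(Ordering::Relaxed);
    loop {
        let logical = raw.swap_bytes();
        let new_raw = logical.wrapping_add(delta).swap_bytes();
        // Retry if another thread changed the cell since we read `raw`.
        match cell.compare_exchange_weak(raw, new_raw, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return logical,
            Err(actual) => raw = actual,
        }
    }
}
```

A plain load, swap, add, swap, store sequence would let another thread's update slip in between the load and the store, which is the non-atomicity mentioned above. The loop stays atomic but costs more than a single native instruction.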

@SoniEx2

SoniEx2 commented Sep 11, 2020

as we mentioned in #1374, it might be interesting to create an LLVM backend that force-swaps every emitted store/load and it'd solve this problem while retaining backwards compatibility. this would be easily detectable by a hypothetical wasm->PPC compiler and retains all guarantees currently made by wasm, at the only expense of being slower on LE platforms. no need to change the binary format or create a new one, just need to change the compiler/LLVM.

@sunfishcode
Member

It wouldn't just be a change to LLVM; it'd be a new C ABI. The existence of such a wasm C ABI might benefit a small number of people, but it would bubble up into many places throughout the ecosystem, creating extra work and confusion for a lot of people.

@SoniEx2

SoniEx2 commented Sep 11, 2020

it would just be a change to LLVM. the new ABI would be a side-effect of the change more than anything.

it's not like wasm comes with a stdlib...

let it "bubble up", tbh. if you're using that switch, it's on you to make it work. and it should/will work if you compile everything you need with it.

@sunfishcode
Member

There are libc implementations for wasm. They're widely used.

As an LLVM maintainer, I'm opposed to such a change landing upstream.

@SoniEx2

SoniEx2 commented Sep 11, 2020

what if it just wasn't exposed and you had to go out of your way to enable it?

@sunfishcode
Member

One of the main goals of wasm is to enable modules that run well on many platforms, however what you're describing is building different modules for different platforms. Also, even with a hidden option, there's a risk that it will grow in scope over time, a risk that people will misinterpret and/or misuse it, a risk that people will point to it as a precedent for adding more such features, and a risk that it could become a maintenance or development burden. Adding a new endianness to an LLVM target involves, among other things, adding a new target triple, and target triples end up getting a fair amount of visibility.

Wasm is a little-endian platform, by design. The LLVM Wasm backend is focused on that.

@SoniEx2

SoniEx2 commented Sep 13, 2020

keep wasm little-endian, add stuff to LLVM to make up for it.

@sunfishcode
Member

That would create a big-endian ABI, which risks creating a lot of extra work and confusion.

@lygstate
Contributor Author

I think that, rather than creating a big-endian mode/ABI in Wasm, adding enough instructions so that wasm doesn't lose performance on big-endian machines is the better option.

@SoniEx2

SoniEx2 commented Sep 14, 2020

that's basically #1374 tho and there are things that make it impossible (mainly arrays and unions)

@ppmag

ppmag commented Nov 26, 2020

Most portable serialization formats have network byte order (big-endian).
I'm expecting significant overhead in my serialization code: I need to swap every size prefix, not just the ints themselves...

The idea of having i64.bswap looks really nice to me.
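For a sense of where the swaps show up in such code, here is a minimal Rust sketch of a length-prefixed frame with a big-endian (network byte order) size prefix. The frame layout and function names are assumptions for illustration and are not taken from any particular serialization format.

```rust
/// Encode `payload` as a 4-byte big-endian length prefix followed by the bytes.
/// Sketch only: assumes the payload length fits in a u32.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&len.to_be_bytes()); // a byte swap on little-endian targets
    out.extend_from_slice(payload);
    out
}

/// Decode one frame; returns `None` if the buffer is truncated.
pub fn decode_frame(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    buf.get(4..end)
}
```

On a little-endian target such as wasm, each `to_be_bytes` or `from_be_bytes` call amounts to a byte swap, which is the operation a dedicated bswap instruction would make cheap.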

@sunfishcode
Member

sunfishcode commented Oct 28, 2022

Wasm is little-endian, by design. There are now multiple wasm engines implementing wasm's little-endian semantics on big-endian hosts, and they appear to work well. See #1426 for discussion of a bswap instruction.

@SoniEx2

SoniEx2 commented Oct 28, 2022

we feel like interface types should support be/le translation. (interface types are shipped with the binary, right? so it's basically "free" (aka slow on the "wrong" platform) as far as libc's and whatnot are concerned?)

@sunfishcode
Member

Interface types (now the component model) does encapsulate endianness. That said, adding a big-endian mode to Wasm would still have enormous costs and confusion for the ecosystem as a whole.

@SoniEx2

SoniEx2 commented Oct 29, 2022

so we can have an ABI defined entirely by interface types and not have to worry about performance tuning for weird wasm VMs? ^^

@sunfishcode
Member

You can have an ABI defined entirely by the component model. This doesn't mean you'll never have to worry about performance tuning though.

@SoniEx2

SoniEx2 commented Oct 29, 2022

we feel like that should enable LLVM to use big-endian calling conventions when generating wasm, while the libc implementation itself is still LE, and then it just generates component model stuff to adapt between them?

then a special BE VM can detect that and use a big endian libc and get better performance that way!

@sunfishcode
Member

The LLVM backend will not be adding big-endian support. Wasm is a little-endian platform, by design. The LLVM Wasm backend is focused on that.

@penzn

penzn commented Oct 29, 2022

Interface types (now the component model) does encapsulate endianness.

@sunfishcode, do you mean it is endianness-agnostic or can it actually express endianness? I could not find a good reference to that in the repo.

The LLVM backend doesn't yet support producing Component Model definitions; also, switching the ABI isn't a simple task.

@sunfishcode
Member

It's endianness-agnostic. When a component model API passes or returns a value with a type like u32, it's just an integer value in a particular range, and not a sequence of bytes you can observe. If you store it in linear memory and observe the bytes there, at that point, it's you writing those bytes, and not the component model.
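A small Rust sketch of that distinction, with hypothetical helper names: the `u32` value is the same number either way, and byte order only becomes observable once code chooses how to write it into memory.

```rust
/// The bytes a wasm `i32.store` would place in linear memory (little-endian).
pub fn bytes_as_wasm_store(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// The bytes the same value would occupy if the program chose big-endian order.
pub fn bytes_as_big_endian(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}
```

Both functions receive the identical integer; only the layout written by the program differs.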

I think I misunderstood the question above. The component model can define ABIs, however that's different from the C ABI that compilers expect to talk to their libc with. The component model does not automatically make it possible to make a big-endian C application on top of a little-endian libc.

@SoniEx2

SoniEx2 commented Oct 29, 2022

well it should, that'd be amazing for using wasm as a weird IR for weird platforms
