djgpp ? #3
Comments
|
I'm happy that people are still finding that post! So, right now it's not actually 16-bit code, despite what the compiler flags make it look like. For context, when Intel extended x86 from 16-bit to 32-bit, they added two ways to make use of the new 32-bit instructions. You could either switch to the new “protected mode”, where instructions were 32-bit by default and an escape code made an instruction 16-bit, or stay in the old “real mode”, where instructions are 16-bit by default. The code that the compiler is generating here is for real mode, but it still includes 32-bit instructions. I actually had a horrible bug earlier where the 16-bit and 32-bit instructions were swapped, because for some reason Intel decided to make the escape codes for “switch to 16-bit” and “switch to 32-bit” the same, with the meaning depending on the current mode, and I had forgotten that compiler flag.
There are some unfortunate caveats about the setup here, including that in order to prevent LLVM from crashing, I have to tell it to use 32-bit pointers even though only the low 16 bits can actually be used in real mode without using a segment selector. So pointers are essentially twice the size that they need to be.
DJGPP is different from what I've done here so far in that it switches the CPU to protected mode. This is nice because it means that you can use 32-bit pointers and get a flat(-ish) address space, which is what Rust expects. If you actually wanted to develop software for DOS, this would probably be a good idea. I just wanted to start with real mode because it's definitively DOS, as opposed to protected mode, which seeps into the time of Windows. Real mode is just more retro.
I don't see how I could use DJGPP to help with this since it's based around GCC, and unfortunately Rust does not have any sort of GCC-based backend available at the moment. That said, DJGPP is fine if you want to write in C.
If you're interested in learning how to extend this, prod the hardware, play with VGA text mode, and so on, then I would recommend joining the Rust community Discord server, and I can walk you through the MS-DOS API, the Rust programming language, inline assembly, and all sorts of fun stuff like that. Here's a link. |
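The mode-dependent escape code described above is the x86 operand-size override prefix, byte `0x66`. The following is a minimal sketch of how the same prefix toggles the operand size in either direction. The names `DefaultSize` and `effective_operand_bits` are made up for illustration, and the sketch only looks at a leading prefix byte, which is a simplification of real instruction decoding.

```rust
/// Default operand size selected by the current mode (or, more precisely,
/// by the current code segment). Hypothetical type for this sketch.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum DefaultSize {
    /// Real mode, or a 16-bit code segment.
    Bits16,
    /// A 32-bit protected-mode code segment.
    Bits32,
}

/// The operand-size override prefix byte on x86.
pub const OPERAND_SIZE_PREFIX: u8 = 0x66;

/// Returns the effective operand size in bits for an instruction.
/// Simplification: only a leading 0x66 byte is checked, and the instruction
/// is assumed to be one whose operand size the prefix actually affects.
pub fn effective_operand_bits(default: DefaultSize, instruction: &[u8]) -> u32 {
    let toggled = instruction.first() == Some(&OPERAND_SIZE_PREFIX);
    match (default, toggled) {
        // No prefix in 16-bit code, or a prefix in 32-bit code: 16-bit operand.
        (DefaultSize::Bits16, false) | (DefaultSize::Bits32, true) => 16,
        // A prefix in 16-bit code, or no prefix in 32-bit code: 32-bit operand.
        (DefaultSize::Bits16, true) | (DefaultSize::Bits32, false) => 32,
    }
}
```

For example, the bytes `66 B8 78 56 34 12` load a 32-bit immediate when decoded as 16-bit code, while `66 B8 34 12` loads a 16-bit immediate when decoded as 32-bit code. That is why code assembled for the wrong default size ends up with its 16-bit and 32-bit instructions swapped.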
|
Wait a minute, you might be onto something with this whole DJGPP idea! I might be able to write some sort of stub in C using DJGPP that loads a module of 32-bit Rust code! |
|
@stuaxo Oh, by the way, when you get on the Discord server, my username there is “Kiong-luē Liân-huâ”. |
|
Back when I was playing with DOS, my practical knowledge stopped at the different memory models of real mode (Turbo Pascal and Turbo C let you choose these in the IDE). Protected mode always seemed out of reach, as there were scary-looking bits of assembly you could download, including the (then new) flat real mode. Can you write a bit about how com.ld and startup.s work? Is generating .exe rather than .com harder because they have a particular layout? I found this old guide on using nasm to create exes, not sure if it helps? I'll try and jump on the Discord server at some point, though free time is a bit fragmented these days. |
|
Logged onto discord long enough to work out that I'd set my username to |
|
It's funny, because to me protected mode is “normal” (since it's what modern software all runs in) and real mode is the thing to be learning. The Luckily, Yes, the EXE format is harder because it's a lot more complicated, and includes information about how to use multiple segments of memory. I've actually considered, instead of figuring out how to get the Rust compiler to generate EXEs, simply finding an ELF loader for MS-DOS. That might also have the benefit of letting me switch to protected mode. |
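To make the .COM versus .EXE difference concrete: a DOS EXE starts with a header whose first two bytes are the signature `MZ` (the reversed `ZM` is also accepted by DOS), while a .COM file has no header at all and is loaded as a flat image at offset 0x100 of a single segment. The sketch below only inspects those first two bytes. The names `DosImage` and `classify` are invented for illustration, and nothing here parses the rest of the EXE header.

```rust
/// Kind of DOS program image, judged only by its first two bytes.
/// Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DosImage {
    /// Starts with the "MZ" (or "ZM") signature of the EXE header.
    MzExe,
    /// No header: a flat image that fits in one segment.
    FlatCom,
}

/// Offset within its segment at which DOS places a .COM image,
/// directly after the 256-byte program segment prefix.
pub const COM_LOAD_OFFSET: u16 = 0x100;

/// Classifies an image by its signature bytes only.
pub fn classify(image: &[u8]) -> DosImage {
    if image.starts_with(b"MZ") || image.starts_with(b"ZM") {
        DosImage::MzExe
    } else {
        DosImage::FlatCom
    }
}
```

Because a .COM image needs no header, relocation table, or segment information, a linker script that emits a flat binary positioned at 0x100 is enough to produce one, which is part of why it is the easier format to target.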
|
I just stumbled upon this issue. Can't help but chime in. I am also an MS-DOS enthusiast, and I made some attempts to target DOS via DJGPP some months ago. Alas, creating a target descriptor may not be enough, as the linker is likely expecting a different intermediate compilation outcome. I would have an .EXE file, but it would crash immediately once run due to a memory access violation (although depending on how it's compiled, I could also get a SIGFPE signal due to a division by zero). I wouldn't be surprised if it was related to wrong symbol names or something like that. For posterity, this is roughly what I tried for the target triple, built then with xargo on a barebones no_std project. Perhaps someone else more familiar with the subject can continue building on top of this or provide any feedback on parts which are clearly incorrect. Still, it might be true that bootstrapping a C project that runs Rust modules might be more feasible.
{
"abi-return-struct-as-int": true,
"allows-weak-linkage": false,
"arch": "x86",
"cpu": "i686",
"custom-unwind-resume": true,
"data-layout": "e-m:x-p:32:32-i32:32-f64:32-n8:16:32-a:0:32-S128",
"dynamic-linking": false,
"eliminate-frame-pointer": false,
"emit-debug-gdb-scripts": false,
"env": "djgpp",
"exe-suffix": ".exe",
"executables": true,
"function-sections": false,
"late-link-args": {
"gcc": [
"-Wl,--end-group"
]
},
"linker": "i686-pc-msdosdjgpp-gcc",
"ar": "i686-pc-msdosdjgpp-ar",
"linker-flavor": "gcc",
"llvm-target": "i686-pc-windows-gnu",
"position-independent-executables": false,
"disable-redzone": true,
"os": "msdos",
"post-link-objects": [
],
"pre-link-args": {
"gcc": [
"-m32",
"-march=i686",
"-fno-pie",
"-fno-use-linker-plugin",
"-nostdlib",
"-Wl,--as-needed",
"-Wl,--gc-sections",
"-Wl,--start-group"
]
},
"pre-link-objects-exe": [
"/usr/i686-pc-msdosdjgpp/lib/crt0.o",
"/usr/i686-pc-msdosdjgpp/lib/libc.a"
],
"requires-uwtable": true,
"staticlib-prefix": "",
"staticlib-suffix": ".a",
"target-c-int-width": "32",
"target-endian": "little",
"target-family": "windows",
"target-pointer-width": "32",
"vendor": "pc"
}
#![feature(start, lang_items)]
#![no_std]
#![no_main]
use core::panic::PanicInfo;
use libc::{c_char, c_int};
extern "C" {
fn exit(c: c_int);
}
#[start]
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const c_char) -> isize {
0
}
#[panic_handler]
fn handle_panic(_info: &PanicInfo) -> ! {
// exit using libc
unsafe {
exit(-1);
core::hint::unreachable_unchecked()
}
} |
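As a small cross-check for target descriptions like the one above, the `data-layout` string can be read mechanically: its components are separated by `-`, a lone `e` means little-endian, and `p:32:32` declares 32-bit pointers with 32-bit ABI alignment. The helpers below are a sketch with invented names (`pointer_bits`, `is_little_endian`). They only handle the default address space and are not how rustc or LLVM parse the string.

```rust
/// Returns the pointer size in bits declared by an LLVM data-layout string,
/// taken from the default-address-space `p:<size>:<abi>` component if present.
/// Hypothetical helper for this sketch.
pub fn pointer_bits(data_layout: &str) -> Option<u32> {
    data_layout
        .split('-')
        // Find the component that starts with "p:" and keep what follows.
        .find_map(|part| part.strip_prefix("p:"))
        // The first field after "p:" is the pointer size in bits.
        .and_then(|rest| rest.split(':').next())
        .and_then(|size| size.parse().ok())
}

/// Returns true if the layout declares little-endian byte order (`e`).
pub fn is_little_endian(data_layout: &str) -> bool {
    data_layout.split('-').any(|part| part == "e")
}
```

Applied to the layout string in the target file above, `pointer_bits` yields 32, which agrees with `"target-pointer-width": "32"`, and `is_little_endian` agrees with `"target-endian": "little"`. A mismatch between these fields is one of the easier mistakes to rule out.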
|
@Enet4 Thanks for this! I think I'll probably want to come back to this soon, so this is something I'll consider as well. |
|
I looked into this and also couldn't get it to work yet. But I found some things which might help: The
#include <stdio.h>
int main(int argc, char **argv) {
return 0;
}
results in:
00001f10 <_main>:
1f10: 55 push %ebp
1f11: 89 e5 mov %esp,%ebp
1f13: b8 00 00 00 00 mov $0x0,%eax
1f18: 5d pop %ebp
1f19: c3 ret
1f1a: 90 nop
1f1b: 90 nop
1f1c: 90 nop
1f1d: 90 nop
1f1e: 90 nop
1f1f: 90 nop
where this:
#![feature(start)]
#![no_main]
#![no_std]
use core::panic::PanicInfo;
#[start]
#[no_mangle]
pub extern "C" fn main() -> i32 {
0
}
#[panic_handler]
fn handle_panic(_info: &PanicInfo) -> ! {
loop {}
}
results in:
00001df0 <_main>:
1df0: 55 push %ebp
1df1: 89 e5 mov %esp,%ebp
1df3: 83 ec 08 sub $0x8,%esp
1df6: 8b 45 0c mov 0xc(%ebp),%eax
1df9: 8b 4d 08 mov 0x8(%ebp),%ecx
1dfc: 89 45 fc mov %eax,-0x4(%ebp)
1dff: 89 4d f8 mov %ecx,-0x8(%ebp)
1e02: e8 30 00 00 00 call 1e37 <___main+0x17>
1e07: 31 c0 xor %eax,%eax
1e09: 83 c4 08 add $0x8,%esp
1e0c: 5d pop %ebp
1e0d: c3 ret
1e0e: 90 nop
1e0f: 90 nop
That call to The source code of djgpp from djlsr205.zip contains all the startup code which helps to follow along, see I hope this helps someone to figure this out. I'm giving up for now. |
|
They might help on the DJGPP mailing list; I'm fairly sure the devs respond there. I'd ask, but I don't have quite enough x86 asm experience to know what to ask. |
|
I got further. It seems to be a linker problem. The actual Rust object file looks like this:
00000000 <_main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: e8 00 00 00 00 call 8 <_main+0x8>
8: b8 05 00 00 00 mov $0x5,%eax
d: 5d pop %ebp
e: c3 ret
f: 90 nop
00000010 <_rust_begin_unwind>:
10: 55 push %ebp
11: 89 e5 mov %esp,%ebp
13: 8b 45 08 mov 0x8(%ebp),%eax
16: eb fe jmp 16 <_rust_begin_unwind+0x6>
But the
#include <stdio.h>
extern int rustmain();
int main(int argc, char **argv) {
return rustmain();
}
and called the main Rust function
use libc::{c_char, c_int};
extern "C" {
pub fn printf(format: *const c_char, ...) -> c_int;
}
#[no_mangle]
pub extern "C" fn rustmain() -> i32 {
unsafe {
printf(b"Hello, World!\0".as_ptr() as *const i8);
}
0
}
To get it to compile to this point I had to add |
|
Hmm, the Rust-generated assembler doesn't seem to reference anything outside of itself:
00000000 <_rustmain>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 8d 05 00 00 00 00 lea 0x0,%eax
c: 89 04 24 mov %eax,(%esp)
f: c7 44 24 04 0e 00 00 movl $0xe,0x4(%esp)
16: 00
17: e8 00 00 00 00 call 1c <_rustmain+0x1c>
1c: 89 45 fc mov %eax,-0x4(%ebp)
1f: 8b 45 fc mov -0x4(%ebp),%eax
22: 89 04 24 mov %eax,(%esp)
25: e8 00 00 00 00 call 2a <_rustmain+0x2a>
2a: 31 c0 xor %eax,%eax
2c: 83 c4 0c add $0xc,%esp
2f: 5d pop %ebp
30: c3 ret
31: 90 nop
32: 90 nop
33: 90 nop
34: 90 nop
35: 90 nop
36: 90 nop
37: 90 nop
38: 90 nop
39: 90 nop
3a: 90 nop
3b: 90 nop
3c: 90 nop
3d: 90 nop
3e: 90 nop
3f: 90 nop
00000040 <_rust_begin_unwind>:
40: 55 push %ebp
41: 89 e5 mov %esp,%ebp
43: 8b 45 08 mov 0x8(%ebp),%eax
46: eb fe jmp 46 <_rust_begin_unwind+0x6>
But the linker seems to do something sensible:
0000c780 <_rustmain>:
c780: 55 push %ebp
c781: 89 e5 mov %esp,%ebp
c783: 83 ec 0c sub $0xc,%esp
c786: 8d 05 00 58 01 00 lea 0x15800,%eax
c78c: 89 04 24 mov %eax,(%esp)
c78f: c7 44 24 04 0e 00 00 movl $0xe,0x4(%esp)
c796: 00
c797: e8 50 00 00 00 call c7ec <__ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6as_ptr17h42cd1679a299a9e9E+0x1c>
c79c: 89 45 fc mov %eax,-0x4(%ebp)
c79f: 8b 45 fc mov -0x4(%ebp),%eax
c7a2: 89 04 24 mov %eax,(%esp)
c7a5: e8 70 00 00 00 call c81a <_printf+0x2a>
c7aa: 31 c0 xor %eax,%eax
c7ac: 83 c4 0c add $0xc,%esp
c7af: 5d pop %ebp
c7b0: c3 ret
c7b1: 90 nop
c7b2: 90 nop
c7b3: 90 nop
c7b4: 90 nop
c7b5: 90 nop
c7b6: 90 nop
c7b7: 90 nop
c7b8: 90 nop
c7b9: 90 nop
c7ba: 90 nop
c7bb: 90 nop
c7bc: 90 nop
c7bd: 90 nop
c7be: 90 nop
c7bf: 90 nop
I guess there is some adjusting of pointers going on. I don't know enough about linkers unfortunately. |
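The pointer adjustment visible in the two listings above can be read directly from the bytes. A near call is the opcode `e8` followed by a 32-bit little-endian displacement, and that displacement is relative to the address of the next instruction. The sketch below (the function name `call_target` is invented for illustration) decodes such a call and is enough to reproduce the targets that objdump prints.

```rust
/// Decodes a 5-byte near call (`e8` followed by a little-endian rel32) located
/// at `address` and returns the address it transfers control to.
/// Returns None if the bytes are not such a call. Hypothetical helper.
pub fn call_target(address: u32, bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 5 || bytes[0] != 0xE8 {
        return None;
    }
    let rel = i32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // The displacement is measured from the instruction after the call.
    let next = address.wrapping_add(5);
    Some(next.wrapping_add(rel as u32))
}
```

In the unlinked object file the displacement is still zero, so `e8 00 00 00 00` at 0x17 decodes to 0x1c, which is simply the next instruction and explains the odd-looking `call 1c <_rustmain+0x1c>`. In the linked executable, `e8 50 00 00 00` at 0xc797 decodes to 0xc7ec and `e8 70 00 00 00` at 0xc7a5 decodes to 0xc81a, matching the listing. The second target shows up as `_printf+0x2a` rather than `_printf`, which hints that the displacement the linker wrote is not the intended one.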
|
@fschulze Wow, I really appreciate this! Is this 32-bit protected mode? |
|
@fschulze I'm coming back to this now! I'll let you know if I need you to walk me through anything. This seems very promising. |
|
Hm... it seems that this is targeting the Pentium II (i686). I wonder if it would be possible to pass |
|
I just checked and it does indeed support |
|
Okay, so in my experience what seems to be happening is that when C code calls Rust it works fine, but when Rust code calls C it ends up calling the wrong address by some small offset. I've played around with the target specification a bit but didn't find anything that fixed it yet. |
|
As an example, if I call |
|
Well, this is one way to fix the issue:
let exit = (libc::exit as usize) - 0xA;
let exit: extern "C" fn(libc::c_int) -> ! = core::mem::transmute(exit);
exit(0); |
|
Either it is because of some kind of calling convention or there could be differences in the output of llvm versus what the gnu linker wants. Is the offset always the same, also for other functions? Have you been able to use printf or some other simpler function other than exit? If so, that info might be helpful when asking on the djgpp mailing list after all. |
I can do some tests to see.
I tried writing a hello world program like this (I also tried a version where I didn't offset the pointer to the string):
let puts = (libc::puts as usize) - 0xA;
let puts: extern "C" fn(*const libc::c_char) -> () = core::mem::transmute(puts);
puts(((b"Hello from Rust!\0".as_ptr() as usize) - 0xA) as *const libc::c_char);
It crashed not just the program, and not just the OS, but all of DOSBox. |
|
It seems that the offset is not constant. Every time I call a function, it increases by 0xA. |
|
Did you notice that the offset corresponds to the bytes used for the instructions? |
|
If you used i686-pc-msdosdjgpp-objdump on the object file, the offsets won't be finalized. I think you have to look at the final exe for that. |
|
This is on the final EXE. I see you're right though. Ten bytes pass in my code, and it's ten bytes more offset. |
|
It seems to me like one side is trying to generate addresses which are relative. |
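One way to see how a relative address could drift by exactly the number of code bytes that have passed is to model the arithmetic. The sketch below is a hypothetical model, not a confirmed description of what LLVM or the DJGPP linker does: it assumes the linker measures the displacement from the start of the section rather than from the call's own displacement field. All function names are invented for illustration.

```rust
/// Displacement a near call needs: the target minus the address just past
/// the 4-byte displacement field located at `field_addr`.
pub fn correct_rel32(target: u32, field_addr: u32) -> i32 {
    target.wrapping_sub(field_addr.wrapping_add(4)) as i32
}

/// HYPOTHETICAL mismatch: the displacement is measured from the start of the
/// section (`section_base`) because the linker assumes the object file has
/// already accounted for the field's offset within the section.
pub fn mismatched_rel32(target: u32, section_base: u32) -> i32 {
    target.wrapping_sub(section_base.wrapping_add(4)) as i32
}

/// Address the CPU jumps to when it executes a call whose displacement field
/// sits at `field_addr` and contains `rel`.
pub fn landing(field_addr: u32, rel: i32) -> u32 {
    field_addr.wrapping_add(4).wrapping_add(rel as u32)
}
```

Under this model a call lands at the target plus the offset of its displacement field within the section. Two calls whose fields are 0xA bytes apart therefore miss by amounts that differ by 0xA, which is consistent with the growing offset observed in this thread. It would also explain why subtracting a fixed 0xA only repairs one particular call.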
|
This is interesting.
#[no_mangle]
pub unsafe extern "C" fn rust_main() -> libc::c_int {
libc::puts as i32
}
This returns the correct address for |
|
It even ends up calling a different address each time when I do this. Maybe LLVM is smart enough to realize that the
let puts_usize = libc::puts as usize;
let hello = b"Hello!\0".as_ptr() as *const libc::c_char;
core::mem::transmute::<usize, extern "C" fn(*const libc::c_char) -> i32>(puts_usize)(hello);
core::mem::transmute::<usize, extern "C" fn(*const libc::c_char) -> i32>(puts_usize)(hello); |
|
Okay, this just reached a whole other level of strangeness. I realized that even though the disassembler is showing it calling different addresses, if you look at the machine code bytes in question, they're identical. And when I paste the hex into other disassemblers, it shows it as calling address 0xB5. |
FYI, there are DJGPP builds for Debian/Ubuntu that default to i386. https://launchpad.net/~jwt27/+archive/ubuntu/djgpp-toolchain |
|
Hm... I'm still not convinced it's the calling convention, because I don't think calculating relative offsets should be part of that. |
|
If I pass a function pointer from C to Rust, I'm able to call functions that don't take pointers, such as |
|
I posted a question about this on the DJGPP mailing list. https://groups.google.com/d/msg/comp.os.msdos.djgpp/0l6wjO-oSM0/wucHtHpCAgAJ |
|
Using some keywords from your mail, I found this in the LLVM source; maybe it's a lead: http://llvm.org/doxygen/RuntimeDyldCOFFI386_8h_source.html#l00131 |
|
Not sure if this helps in any way, but maybe it's worth checking whether the relocation resolution matches the COFF specification as presented in the official DJGPP website. http://www.delorie.com/djgpp/doc/coff/ |
|
Unfortunately I'm not sure which of those relocation types is being used, as none of them match the names that Rust gives. |
|
Which names does rust give ? |
|
It has stuff like “static”, “dynamic”, and “dynamic-no-pic”. |
|
If I set the code model to large, the issue goes away. However, now the compiler doesn't generate relative jumps at all, and instead loads the absolute address into a register and calls the register. So function calls are now more instructions and also introduce register pressure. Still, it's better than not working. |
|
That's interesting. I just tried that out with this:
// imports, root attributes, panic handler, and other declarations omitted
#[start]
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const c_char) -> isize {
unsafe {
puts(b"Rust says hello DOS!\0".as_ptr() as *const c_char);
}
0
}
RUSTFLAGS='-C code-model=large' RUST_TARGET_PATH=`pwd` xargo build
This compiles and runs, but it does not print anything; it just exits gracefully. Replacing I might do some extra sleuthing later. |
|
I haven't been able to get C's I/O functions to work. Passing pointers around seems like it sometimes doesn't work even now. I've been writing to the screen using the VGA buffer at |
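For readers unfamiliar with writing to the screen directly: in VGA text mode each character cell is a 16-bit value, with the character code in the low byte and the color attribute in the high byte, laid out row by row in an 80-column grid. The sketch below fills an ordinary slice rather than real video memory, so it can be run anywhere. The names `cell` and `write_text` are invented, and how to reach the actual buffer from a DJGPP protected-mode program (and the volatile writes that would need) is deliberately left out.

```rust
/// Width of the standard VGA text mode in character cells.
pub const COLUMNS: usize = 80;

/// Packs a character byte and an attribute byte into one text cell:
/// character in the low byte, attribute (colors) in the high byte.
pub fn cell(ch: u8, attribute: u8) -> u16 {
    ((attribute as u16) << 8) | (ch as u16)
}

/// Writes `text` into `screen` starting at (row, col) and stops at the end of
/// the buffer. Returns the number of cells written. `screen` is a plain slice
/// standing in for video memory in this sketch.
pub fn write_text(screen: &mut [u16], row: usize, col: usize, text: &[u8], attribute: u8) -> usize {
    let start = row * COLUMNS + col;
    let mut written = 0;
    for (slot, &ch) in screen.iter_mut().skip(start).zip(text) {
        *slot = cell(ch, attribute);
        written += 1;
    }
    written
}
```

With attribute 15 (bright white on black), the letter `H` becomes the cell value 0x0F48. Text is passed as raw bytes rather than `&str` because the hardware interprets each byte through its own code page, not as UTF-8.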
|
Whelp, I don't have much to show for it this time. I can be sure that the program runs the declared main function, as performing thousands of volatile writes leads to delays in the program's execution, but the screen isn't updated to reflect the intended changes. In particular, this main function in C works just fine and prints the given text.
#include <stdio.h>
int main(int argc, char* argv[]) {
puts("Hello DOS from C.");
return 0;
}
I was almost about to say that the equivalent in Rust does nothing, but... this code:
#[start]
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const c_char) -> c_int {
unsafe {
puts(b"Rust says hello DOS!\0".as_ptr() as *const c_char);
}
0
}
Is resulting in this output if I run the C program first. If I don't, it just prints a new line with no visible characters. The assembly shows that at some point the function
0000c760 <_main>:
c760: 55 push %ebp
c761: 89 e5 mov %esp,%ebp
c763: 83 ec 14 sub $0x14,%esp
c766: 8b 45 0c mov 0xc(%ebp),%eax
c769: 8b 4d 08 mov 0x8(%ebp),%ecx
c76c: ba 90 5a 00 00 mov $0x5a90,%edx
c771: 89 45 fc mov %eax,-0x4(%ebp)
c774: 89 4d f8 mov %ecx,-0x8(%ebp)
c777: ff d2 call *%edx
c779: 89 e0 mov %esp,%eax
c77b: c7 40 04 15 00 00 00 movl $0x15,0x4(%eax)
c782: c7 00 00 0a 01 00 movl $0x10a00,(%eax)
c788: b8 10 c8 00 00 mov $0xc810,%eax
c78d: ff d0 call *%eax
c78f: 89 45 f4 mov %eax,-0xc(%ebp)
c792: 89 e0 mov %esp,%eax
c794: 8b 4d f4 mov -0xc(%ebp),%ecx
c797: 89 08 mov %ecx,(%eax)
c799: b8 30 c8 00 00 mov $0xc830,%eax
c79e: ff d0 call *%eax
c7a0: 31 c0 xor %eax,%eax
c7a2: 83 c4 14 add $0x14,%esp
c7a5: 5d pop %ebp
c7a6: c3 ret
Minor note: I had a look at your reproducible example on the mailing list, and I noticed that the null terminator was missing in one of the string literals, although I don't believe that it could ever make a difference there. |
|
Is it worth trying dosemu2? There are a lot of logging options available, including messages when outputting to video. |
|
Oh yeah, forgetting the null terminator would be an issue if you pass it to |
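Since a missing terminator is easy to overlook in byte-string literals, it can be checked before a pointer ever crosses into C. The sketch below uses the standard library's `CStr::from_bytes_with_nul`, which accepts a byte slice only if it ends in a NUL byte and contains no other NUL. The wrapper name `as_c_string` is invented for illustration, and a `no_std` build would need the equivalent type from `core::ffi` on a toolchain that provides it.

```rust
use std::ffi::CStr;

/// Returns the bytes as a C string only if they end in a single NUL byte with
/// no interior NULs, which is what C functions taking `char *` rely on.
/// Hypothetical wrapper for this sketch.
pub fn as_c_string(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}
```

A literal such as `b"Hello from Rust!\0"` passes the check, while the same literal without `\0` is rejected instead of letting the callee read past the end of the string. Calling `.as_ptr()` on the returned `CStr` yields the pointer to hand to C.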
|
I think the main problem here is that rustc generates PE-COFF objects. I'm quite surprised this even links at all, and that both djgpp's and mingw's objdump recognize the files as their own. I knew the formats were similar but I didn't think they would be totally ambiguous. The relocations look valid (the same) in both objdumps however.
This is because PE-COFF puts const strings in an |
That is indeed very strange considering that it has no problem actually showing a string when I write it to the screen using my own routine. Maybe it is including it somewhere else since it's typed as a byte array instead of a string. I did that because it's not in UTF-8, which Rust strings must be. |
It's because that loop is unrolled and each byte is written individually:
movl $72, %ecx
movl $15, %edx
calll __ZN5dos322io11write_entry17h0442a9727dcda470E
movl $101, %ecx
movl $15, %edx
calll __ZN5dos322io11write_entry17h0442a9727dcda470E
movl $108, %ecx
movl $15, %edx
calll __ZN5dos322io11write_entry17h0442a9727dcda470E
movl $108, %ecx
movl $15, %edx
calll __ZN5dos322io11write_entry17h0442a9727dcda470E
... |
|
That makes sense. So you checked and confirmed that the compiler is generating PE object files instead of COFF ones? Is there any easy solution for this? |
|
The only real solution, I think, is to add support for the coff-go32 target in LLVM. A workaround might be to have a binutils compiled with Or an even dirtier hack: make rustc emit asm, then fix up the section names with a script and assemble it with |
|
I wonder if the version of Binutils that I have from the DJGPP build that I got might have PE support. If not, it seems I'll probably end up with yet another dependency. |
|
Can you use objcopy to convert it into the correct format ? From some googling, it looks like it can convert from COFF to PE. Every time I read these, I end up going on a google about executable or object formats, but never seem to find something quite useful enough :) |
I tried this, and the relocations are still wrong. So the object format may not be the problem after all, and maybe PE can be similar enough to go32 that it's safe to link with each other, at least for simple programs. |
|
Apologies for the noise, I should have scrolled up :) Is there much involved in adding support? I guess most of the parts are already there? |
|
It seems DJGPP uses the value of a relative relocation differently: it adds the offset of the relocation in the section. I wrote a little Perl script to change the relocation format of a Windows COFF to the DJGPP variant and rename the .rdata section to .text. It did work for a tiny |
|
@arbruijn This is huge! I'll try it out very soon. |
|
@arbruijn Thank you for sharing this. I'm afraid that I haven't gotten it to work yet. Is it only meant to work on non-linked COFF object files? How did you apply the script in the building process? It's also worth noting that there were a few recent breaking changes to the target specification schema. A specific change last May has changed |
|
There's probably a better way, but what I did was:
|
|
I don't know if it will help, but there is now a GCC frontend for Rust https://rust-gcc.github.io/ |
|
@stuaxo Interesting, that would also open up other architectures like the CPU of the Dreamcast, which isn't yet supported by LLVM |


DJGPP has been packaged for Linux: https://launchpad.net/~stsp-0/+archive/ubuntu/djgpp/+packages Could that help, or is the aim just to have 16-bit code?
I'm pretty fuzzy on how this works (though I managed to compile the example and run it in DOSBox).
I read your post:
https://www.reddit.com/r/rust/comments/ask2v5/dos_the_final_frontier/
As I'm new to Rust, I'm not sure where to extend things to play with this.
BlogOS has some VGA text mode routines, which look like an interesting place to start playing, though at the moment I'm not even sure how to build another file apart from dos.com.