An i386 emulator for executing live-bootstrap in order to analyze (and check) the live-bootstrap contents. The final goal is to produce documentation similar to this, which is produced by a program that parses the kaem files and outputs which files are read and produced.
The first argument for the Emulator program is the path (terminated with a '/') to the directory where
live-bootstrap is located. This can be preceded or followed by the '-l' option to specify a log
file to which all output of the Emulator program will be written. (Note that the executables that
are executed can also produce output to stdout and stderr that will not be written to the
log file.) There is also an option '-gen' followed by a number, which causes the Emulator to
generate a program.cpp with the code of that step. (After this the Emulator terminates.)
(When compiled with -DENABLE_DO_TRACE, the program also accepts the -trace option, and when
compiled with -DTRACE_MEMORY, it also accepts the -trace_mem option to activate tracing.
These are primarily for debugging purposes and will generate lots of output.) The remaining arguments
are the executable and its arguments. When live-bootstrap is located in a sibling directory, the
following command can be used:
./Emulator ../live-bootstrap/seed/stage0-posix/ bootstrap-seeds/POSIX/x86/kaem-optional-seed kaem.x86
The results of the execution will be placed in the results directory (and its subdirectories), which
are created automatically.
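A minimal sketch of how such a command line could be parsed is shown below. It is not the code of Emulator.cpp: the Options struct and the parse_options function are hypothetical, and the sketch assumes that the options may appear anywhere before the executable, with the first non-option argument being the live-bootstrap directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line; the field names are
// illustrative and not taken from Emulator.cpp.
struct Options {
    std::string source_dir;            // live-bootstrap directory, must end with '/'
    std::string log_file;              // set by "-l <file>"
    long gen_step = -1;                // set by "-gen <nr>", -1 means "not requested"
    bool trace = false;                // "-trace" (only useful with -DENABLE_DO_TRACE)
    bool trace_mem = false;            // "-trace_mem" (only useful with -DTRACE_MEMORY)
    std::vector<std::string> program;  // executable followed by its arguments
};

// Returns false when the command line is malformed.
bool parse_options(const std::vector<std::string>& args, Options& opt) {
    std::size_t i = 0;
    // Options and the directory path are accepted in any order until a
    // non-option argument follows the directory path.
    for (; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-l" && i + 1 < args.size()) {
            opt.log_file = args[++i];
        } else if (a == "-gen" && i + 1 < args.size()) {
            opt.gen_step = std::strtol(args[++i].c_str(), nullptr, 10);
        } else if (a == "-trace") {
            opt.trace = true;
        } else if (a == "-trace_mem") {
            opt.trace_mem = true;
        } else if (opt.source_dir.empty()) {
            if (a.empty() || a.back() != '/') return false;  // path must end with '/'
            opt.source_dir = a;
        } else {
            break;  // first element of the program to execute
        }
    }
    // Everything from here on belongs to the emulated program, options included.
    opt.program.assign(args.begin() + static_cast<std::ptrdiff_t>(i), args.end());
    return !opt.source_dir.empty() && !opt.program.empty();
}
```

With the example command above, source_dir becomes ../live-bootstrap/seed/stage0-posix/ and program holds the kaem-optional-seed path followed by kaem.x86.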
The program also produces the files skip_processes_new.txt and stat.txt. The first can be used for incremental
execution, where steps that have already been executed successfully are skipped. For this, the contents
of skip_processes_new.txt need to be copied to the file skip_processes.txt, up to the step from which
execution has to continue. Each line contains a step/process number, followed by the number of
all its child steps and the exit code (in hexadecimal) of the step.
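The line format of skip_processes.txt could be read as in the sketch below. It assumes whitespace-separated fields with the step number and the child count written in decimal; only the hexadecimal exit code is stated in the description. SkipEntry and read_skip_processes are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// One entry of skip_processes.txt (field layout assumed, see above).
struct SkipEntry {
    unsigned long children = 0;   // number of child steps
    unsigned long exit_code = 0;  // exit code, stored as hexadecimal in the file
};

// Reads all entries from the stream, keyed by step number.
std::map<unsigned long, SkipEntry> read_skip_processes(std::istream& in) {
    std::map<unsigned long, SkipEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        unsigned long step;
        SkipEntry e;
        // Decimal step number and child count, then a hexadecimal exit code.
        if (fields >> std::dec >> step >> e.children >> std::hex >> e.exit_code)
            entries[step] = e;  // malformed lines are silently ignored
    }
    return entries;
}
```

A step found in this map could then be skipped, with its recorded exit code handed back to the parent step.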
Note that the contents of the live-bootstrap directory are not modified.
Files that are modified are placed in the results directory. Because the program always searches
for a file in the results directory first, before looking in the live-bootstrap directory, it is possible
that skipping steps leads to incorrect results.
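The lookup order for reading a file can be sketched with std::filesystem. The function map_path and its parameters are hypothetical; the point is that a file written by an earlier step shadows the original in the live-bootstrap directory, which is also why files left over from an earlier run can influence skipped steps.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Maps a path used by the emulated program to a host path for reading,
// preferring the results directory over the live-bootstrap directory.
fs::path map_path(const fs::path& results_dir, const fs::path& source_dir,
                  const std::string& guest_path) {
    // Drop leading '/' so the guest path is interpreted relative to each root.
    std::string rel = guest_path;
    while (!rel.empty() && rel.front() == '/') rel.erase(0, 1);
    fs::path in_results = results_dir / rel;
    if (fs::exists(in_results))
        return in_results;      // produced by an earlier step: this copy wins
    return source_dir / rel;    // otherwise fall back to the unmodified sources
}
```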
The file stat.txt contains statistics about the parent-child relation between steps and
which files are executed, read and/or written by each step.
The program can also generate a C++ program (in the file program.cpp) for a specific step (after
which execution will stop). The code to enable this can be found at the end of Processor::int_execve
where the method init_statements is called.
The source also includes some additional programs: M1_Emulator.cpp, which can generate a
C++ program from an M1 file, and sdiff.cpp, which is (work in progress) for comparing programs
generated by Emulator.cpp and M1_Emulator.cpp. These programs were developed at one point for
debugging purposes and are not being maintained.
The program missing_inst.cpp can be used to determine which instructions are not yet implemented
in Emulator.cpp, using the output of objdump, with the following command:
objdump -d executable | ./missing_inst | sort
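As a rough illustration of what such a filter could do, the sketch below extracts the mnemonic column from objdump -d lines and checks it against a set of implemented mnemonics. Whether missing_inst.cpp works on mnemonics or on opcode bytes is an assumption here, and the names mnemonic_of and is_missing are hypothetical. Prefixes such as rep are reported as their own mnemonic in this simple version.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Extracts the mnemonic from one line of `objdump -d` output, e.g.
//   " 8048054:\t83 ec 04             \tsub    $0x4,%esp"  ->  "sub".
// Headers, labels and continuation lines of long encodings yield "".
std::string mnemonic_of(const std::string& line) {
    std::size_t first_tab = line.find('\t');
    // An instruction line starts with "address:" followed by a tab.
    if (first_tab == std::string::npos || first_tab == 0 || line[first_tab - 1] != ':')
        return "";
    std::size_t second_tab = line.find('\t', first_tab + 1);
    if (second_tab == std::string::npos)
        return "";  // only encoding bytes, no instruction column
    std::istringstream rest(line.substr(second_tab + 1));
    std::string mnemonic;
    rest >> mnemonic;
    return mnemonic;
}

// The set of implemented mnemonics is hypothetical; in practice it would
// have to mirror what Emulator.cpp supports.
bool is_missing(const std::string& mnemonic, const std::set<std::string>& implemented) {
    return !mnemonic.empty() && implemented.count(mnemonic) == 0;
}
```

A small main reading std::cin line by line and printing every missing mnemonic would fit the pipe into sort shown above.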
The scan_trace program can be used to parse the output of the strace command when executing
live-bootstrap in a chroot environment with the Bash script
run_chroot.
The output of this can be viewed here.
The program stops parsing the input when the tcc-boot0 executable is executed for the first time.
The most important development steps, with pointers to commits:
My first goal was to have the emulator be able to process the hex0-seed file
and have it reproduce itself by processing hex0_x86.hex0. Assuming that live-bootstrap
is in an adjacent git repository, the following commands should work:
./Emulator ../live-bootstrap/sysa/stage0-posix/src/bootstrap-seeds/POSIX/x86/hex0-seed ../live-bootstrap/sysa/stage0-posix/src/x86/hex0_x86.hex0 out.bin
diff ../live-bootstrap/sysa/stage0-posix/src/bootstrap-seeds/POSIX/x86/hex0-seed out.bin
This goal was achieved in commit 1d5494e2. After this, changes were made such that the above command no longer works.
My second goal was to have the emulator be able to process the kaem-optional-seed file.
This goal was achieved with the following command, where the first argument is the
path to the stage0 source directory:
./Emulator ../live-bootstrap/sysa/stage0-posix/src/ bootstrap-seeds/POSIX/x86/kaem-optional-seed kaem.x86
This ends with a message about an unknown opcode in hex1.
I also made some changes such that the generated files are placed in the directory
x86/artifact, relative to where the emulator is executed.
This was implemented in commit f88cf442.
The next step will be to process the hex1 file.
My next goal was to have the emulator be able to process the hex1 file.
I made some progress and found some interesting resources:
After some debugging, for which I modified the code to support debugging, I managed to implement this in commit f88cf442.
There appeared to be another bug, which took me a lot of time to find: sign extension was not applied for certain subtract and compare instructions (opcode 0x83). This bug is fixed in commit 5e37d614.
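The difference matters because opcode 0x83 carries an 8-bit immediate that the processor sign-extends to the operand size. The sketch below (function names hypothetical) shows the widening and how a compare against -1 only sets the zero flag when the immediate is sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Widens the imm8 of opcode 0x83 to 32 bits the way the processor does.
std::uint32_t sign_extend_imm8(std::uint8_t imm8) {
    return static_cast<std::uint32_t>(
        static_cast<std::int32_t>(static_cast<std::int8_t>(imm8)));
}

// The buggy variant: plain zero extension.
std::uint32_t zero_extend_imm8(std::uint8_t imm8) { return imm8; }

// CMP r/m32, imm8 (opcode 0x83 with reg field 7) sets ZF when value - imm == 0.
bool cmp_imm8_sets_zf(std::uint32_t value, std::uint8_t imm8) {
    return value - sign_extend_imm8(imm8) == 0;
}
```

For example, cmp eax, -1 is encoded with the immediate 0xFF; with zero extension it would compare eax against 255 instead of 0xFFFFFFFF.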
With commit df17a9eb
it seems we have a working hex2-0 program. From M0.hex2 it did produce an M0 executable,
but it looks like M0 contains instructions that have not been implemented yet
in the emulator. Again, I spent substantial time debugging, looking at output
and comparing it with the input files. It cannot be excluded that there are still
bugs in the current Emulator that have resulted in an incorrectly working hex2-0
program.
With commit 9e7eda25
(and some before), I fixed some problems related to processes affecting each other.
I was not aware that the brk system call zeroes memory, and I also did not take care
that the registers were saved properly when switching between processes. The kaem
command line interpreter stores the environment in a register, which got corrupted.
It now works until the execution of cc_x86, which makes use of some instructions not
yet supported by the emulator.
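The brk behaviour can be modelled as below. This is a simplified, hypothetical model: Heap and its members are not taken from Emulator.cpp, there is no upper limit on the heap, and memory is tracked per byte, whereas the kernel works with whole pages. It does show the properties that matter here: brk(0) returns the current break, and newly obtained memory reads as zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal model of the Linux brk system call for one emulated process.
struct Heap {
    std::uint32_t heap_start;
    std::vector<std::uint8_t> bytes;  // bytes[0] corresponds to heap_start

    explicit Heap(std::uint32_t start) : heap_start(start) {}

    std::uint32_t current_break() const {
        return heap_start + static_cast<std::uint32_t>(bytes.size());
    }

    // Returns the new break on success and the unchanged break otherwise,
    // as the kernel does; brk(0) is the usual way to query the break.
    std::uint32_t brk(std::uint32_t requested) {
        if (requested < heap_start)
            return current_break();
        // resize() value-initialises new elements, so grown memory reads as
        // zero. This byte-level model always zero-fills after a shrink; the
        // kernel only guarantees that for pages it actually released.
        bytes.resize(requested - heap_start);
        return current_break();
    }
};
```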
In commit 00c51a9d,
I added all the additional instructions that were mentioned in cc_x86.M1. I also had to fix a bug related to the indentation getting larger than the
message length of the trace functions. Now it seems to compile cc_x86 correctly and
it also seems to process M2-0.c correctly. But the resulting M2 program contains
some instructions that are not supported yet. These are mentioned in
x86_defs.M1.
In commit 8b1d7005,
I implemented code generation. When executed, it produces a program called program.cpp,
which has (it seems) the same behaviour as the emulator executing M2. A file called
functions.txt can contain lines with a hexadecimal address and a function name, separated by a space,
which will be used to name functions (actually methods) in the generated
program.cpp file.
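Reading functions.txt could look like the sketch below, assuming one entry per line with the address written in hexadecimal without a 0x prefix. read_function_names is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads lines of the form "<hex address> <name>", e.g. "8048a3c parse_line".
std::map<std::uint32_t, std::string> read_function_names(std::istream& in) {
    std::map<std::uint32_t, std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::uint32_t address;
        std::string name;
        if (fields >> std::hex >> address >> name)
            names[address] = name;  // later entries for the same address win
    }
    return names;
}
```

While emitting a method for a given address, the generator can look the address up in this map and fall back to a generated name when it is absent.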
In the process of debugging the Emulator, I developed the program M1_Emulator.cpp, which
can generate a program based on an M1 file. I also developed a program to compare the
two types of generated programs. This program is called sdiff.cpp. While working on this,
I discovered that no code was generated for the call-via-EAX-register instruction. These changes
can be found in commit 9fe05336.
After a lot of work, I was finally able to fix the bug in Emulator.cpp that caused
the emulation of cc_x86 to produce differences with respect to the offsets of members in
structs and unions. The problem was that some CMP instructions had their arguments
swapped. I fixed the problem with the help of section 17.2.1, ModR/M and SIB Bytes.
The commit with the fix is b7a80666.
It now stops at the execution of M1-0, because this is an ELF file with a section containing
symbolic information. Processing these requires some additional work on the Emulator.
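The role of the direction bit in the CMP encodings, relevant to the fix in commit b7a80666, can be sketched as follows. The source does not say which CMP opcodes were affected, so this only illustrates the general rule: for opcode 0x39 (CMP r/m32, r32) the flags come from r/m minus reg, for 0x3B (CMP r32, r/m32) from reg minus r/m. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a ModR/M byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).
struct ModRM {
    std::uint8_t mod, reg, rm;
};

ModRM decode_modrm(std::uint8_t byte) {
    return ModRM{static_cast<std::uint8_t>(byte >> 6),
                 static_cast<std::uint8_t>((byte >> 3) & 7),
                 static_cast<std::uint8_t>(byte & 7)};
}

// Left- and right-hand side of the subtraction that CMP performs.
struct CmpOperands {
    std::uint32_t lhs, rhs;
};

// Bit 1 of the opcode (the direction bit) tells whether the reg operand is
// the destination (left-hand side) or the source.
CmpOperands cmp_operands(std::uint8_t opcode, std::uint32_t reg_value, std::uint32_t rm_value) {
    bool reg_is_destination = (opcode & 0x02) != 0;
    return reg_is_destination ? CmpOperands{reg_value, rm_value}
                              : CmpOperands{rm_value, reg_value};
}

// Carry flag ("below", unsigned) as set by lhs - rhs.
bool cmp_sets_cf(CmpOperands ops) { return ops.lhs < ops.rhs; }
```

Swapping the operands leaves the zero flag unchanged but inverts the carry and sign results, so only the conditional jumps that test ordering go wrong, which makes such a bug easy to miss.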
With commit 544ab04c
M2-Planet and M2-Mesoplanet are correctly generated, but the execution of these
executables to build the extra MesCC tools does not work, resulting in empty executables.
With commit 2914a15d
mescc-tools and mescc-tools-extra are compiled correctly by M2-Planet and M2-Mesoplanet.
At commit d8fee0d7 the following has been implemented:
- Store command line arguments and environment variables on the stack
- Some performance improvements with respect to memory access.
- Adapted for live-bootstrap commit 0e6133ee.
- Implemented incremental execution.
The commit be0cbbca
fixes some problems with keeping track of the current directory, getting the full path of
a file (given the current directory) and mapping the full path to either a file in the
live-bootstrap source and downloaded distributions or a file in the results directory.
I also implemented the mkdir and getcwd system calls and fixed the chdir system call,
which did not check whether the target directory exists. This check is used by the cp command
to determine whether the target is a directory or a file.
The emulator now executes 228 steps and stops at loading the m2-mes file, because it has
a different ELF header.
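Getting the full path of a file given the current directory, as done in commit be0cbbca, can be handled lexically, as in this sketch. The function full_path is hypothetical, and ignoring symbolic links is an assumption about what the live-bootstrap steps need.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Combines the current directory of the emulated process with a path and
// resolves "." and ".." lexically, returning an absolute path.
std::string full_path(const std::string& cwd, const std::string& path) {
    std::string combined = (!path.empty() && path[0] == '/') ? path : cwd + "/" + path;
    std::vector<std::string> parts;
    std::size_t pos = 0;
    while (pos <= combined.size()) {
        std::size_t next = combined.find('/', pos);
        if (next == std::string::npos) next = combined.size();
        std::string part = combined.substr(pos, next - pos);
        if (part == "..") {
            if (!parts.empty()) parts.pop_back();  // ".." at the root stays at the root
        } else if (!part.empty() && part != ".") {
            parts.push_back(part);                 // skips "//" and "."
        }
        pos = next + 1;
    }
    std::string result;
    for (const std::string& p : parts) result += "/" + p;
    return result.empty() ? "/" : result;
}
```

The resulting absolute path is what would then be mapped to either the live-bootstrap sources or the results directory.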
The commit df322a8f
adds code to write the stat.txt file with information about the parent-child relationship
between steps (processes) and which files are executed, read and written.
The exit status (given with the exit system call) was not correctly transferred to the parent
step/process (using the waitpid system call).
The exit status of each step is now also stored in the skip_processes(_new).txt files.
The chmod system call was not working because it was not operating on the 'mapped' file.
Added the command line option '-l', followed by a filename, to specify the file to which all output will be sent. Some of the executables also print output to stdout and stderr, which would otherwise get 'mixed' with the output of the Emulator.
Fixed some problems found with -Wall, including a 'bug' in the memory access code that might only show up
on rare occasions.
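The description does not say what exactly went wrong with passing on the exit status. For reference, the sketch below shows the Linux encoding of the status word that waitpid reports for a normal exit, which an emulator has to reproduce when it writes the status for the parent step. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Status word for a normal termination: the low 8 bits of the exit code go
// into bits 8-15, and the low 7 bits (terminating signal) are zero.
std::uint32_t wait_status_from_exit(std::uint32_t exit_code) {
    return (exit_code & 0xFF) << 8;
}

// Counterparts of the WIFEXITED and WEXITSTATUS macros.
bool exited_normally(std::uint32_t status) { return (status & 0x7F) == 0; }
std::uint32_t exit_code_of(std::uint32_t status) { return (status >> 8) & 0xFF; }
```

Copying the raw exit code into the status word instead would make a child that exits with 1 look like it was killed by signal 1.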
The commit e9a8efd5 has a number of improvements and fixes to get GNU Mes running. It runs rather slowly in the Emulator and not all steps in which it is used have been executed. For that reason the Emulator might still contain bugs with respect to the execution of GNU Mes. The following changes have been made:
- Added system calls: dup, ioctl, fcntl, gettimeofday, wait4, newuname, and clock_gettime.
- Added option for enabling program.cpp generation: -gen <nr>
- Replaced #if by #ifdef for TRACE_MEMORY.
- Included the address of each function in the generated code in a comment.
- Added some extra instructions. Found those with the help of the missing_inst.cpp program.
- Added SIGNEXT in calls of the store and load methods.
- Generate program.cpp also when encountering an unknown instruction or unknown interrupt.
- Fixed the unlink system call: it was not working on the mapped file.
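On i386 Linux, int 0x80 selects the system call by the value in the eax register. The sketch below maps the numbers of the system calls mentioned in this README to their names. The syscall_name function is hypothetical, but the numbers follow the i386 system call table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// i386 system call numbers (value of eax at int 0x80) for the calls
// mentioned in this README.
std::string syscall_name(std::uint32_t eax) {
    switch (eax) {
        case 1:   return "exit";
        case 7:   return "waitpid";
        case 10:  return "unlink";
        case 11:  return "execve";
        case 12:  return "chdir";
        case 15:  return "chmod";
        case 39:  return "mkdir";
        case 41:  return "dup";
        case 45:  return "brk";
        case 54:  return "ioctl";
        case 55:  return "fcntl";
        case 78:  return "gettimeofday";
        case 114: return "wait4";
        case 122: return "newuname";
        case 183: return "getcwd";
        case 265: return "clock_gettime";
        default:  return "";  // not covered by this sketch
    }
}
```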
The execution of the following steps seems to take hours and looks like it is stuck in an endless loop.
The commit 6a8802d0
introduced the scan_trace program, which can parse the output of the strace command to find out
which files are read and created. The program stops parsing the input at the first execution of
tcc-boot0.
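A minimal sketch of parsing one line of strace -f output written to a file, assuming the usual format in which the path is the first double-quoted argument (as for open, openat and execve). TraceCall and parse_trace_line are hypothetical names; escaped quotes inside paths are not handled.

```cpp
#include <cassert>
#include <string>

// One system call as reported by `strace -f -o file`, e.g.
//   1234  openat(AT_FDCWD, "x86/hex0_x86.hex0", O_RDONLY) = 3
struct TraceCall {
    std::string name;    // "openat"
    std::string path;    // first double-quoted argument, "" if none
    std::string result;  // text after " = ", e.g. "3"
};

// Returns false for lines that are not a complete system call.
bool parse_trace_line(const std::string& line, TraceCall& call) {
    std::size_t open_paren = line.find('(');
    std::size_t equals = line.rfind(") = ");
    if (open_paren == std::string::npos || equals == std::string::npos || equals < open_paren)
        return false;
    // The name is the word directly before '(' (after an optional pid).
    std::size_t name_start = line.find_last_of(" \t", open_paren);
    name_start = (name_start == std::string::npos) ? 0 : name_start + 1;
    call.name = line.substr(name_start, open_paren - name_start);
    // First quoted argument: the path for open, openat, execve, ...
    std::size_t q1 = line.find('"', open_paren);
    std::size_t q2 = (q1 == std::string::npos) ? q1 : line.find('"', q1 + 1);
    call.path = (q2 == std::string::npos) ? "" : line.substr(q1 + 1, q2 - q1 - 1);
    call.result = line.substr(equals + 4);
    return true;
}
```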
For debugging M2-Planet, two additional compile options were added: -DCATCH_SEGFAULT (to catch
segmentation faults) and -DCALL_STACK_CHECK (to enable call stack checks).
To process a trace.txt file for the AMD64 architecture that was produced with run_chroot_AMD64,
some changes were made to scan_trace.cpp to deal with unfinished lines.
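Assuming that the unfinished lines are the <unfinished ...> and <... resumed> pairs that strace -f emits when system calls of different processes interleave, they can be merged per process id as in this sketch. The class name is hypothetical, and the sketch assumes that each line starts with the pid, as in output written with -o.

```cpp
#include <cassert>
#include <map>
#include <string>

// strace -f splits an interrupted system call into
//   1234  read(3,  <unfinished ...>
//   1234  <... read resumed>"abc", 4096) = 3
// feed() glues such pairs back together so that each returned line holds one
// complete call. It returns "" while a call is still pending.
class UnfinishedMerger {
public:
    std::string feed(const std::string& line) {
        const std::string unfinished = " <unfinished ...>";
        const std::string resumed_tag = " resumed>";
        std::string pid = line.substr(0, line.find(' '));
        if (line.size() >= unfinished.size() &&
            line.compare(line.size() - unfinished.size(), unfinished.size(), unfinished) == 0) {
            pending_[pid] = line.substr(0, line.size() - unfinished.size());
            return "";
        }
        std::size_t resumed = line.find(resumed_tag);
        auto it = pending_.find(pid);
        if (resumed != std::string::npos && it != pending_.end()) {
            std::string merged = it->second + line.substr(resumed + resumed_tag.size());
            pending_.erase(it);
            return merged;
        }
        return line;  // an ordinary, complete line
    }

private:
    std::map<std::string, std::string> pending_;  // pid -> first half of the call
};
```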
- Program generation is broken with respect to interrupt 0x80. The generator takes the current value
of the eax register for generating a specific system call. But in the GNU Mes code, this value is an argument that is passed on, and thus differs depending on how the encapsulating function is called.
- There seems to be a mismatch between the addresses mentioned in the symbol table in the ELF file and the actual addresses. The cause of this mismatch is unknown at the moment.