# ELF Exploration

The traditional input for a linker is `.o` files, which are ELF files. See [ASSEMBLER.md](../docs/ASSEMBLER.md) for more details, including links to more information about ELF files.

Fully parsing ELF files was out of scope for this project, but this notebook contains some exploration I did into them.

In [54]:
# We use the pyelftools library to parse ELF files

# Import dependencies
try:
	from elftools.elf.elffile import ELFFile
	from hexdump import hexdump
	# We need to import this directly so that if it's missing we trigger the except handler, because
	# rv32i will throw a different error
	from bitstring import BitArray
	from rv32i import bits_to_line
except ModuleNotFoundError:
	# Install if missing, then re-run this cell so that the imports above succeed
	%pip install pyelftools hexdump bitstring

We need a RISC-V ELF file to play with. To get one, run:

```bash
$ ./riscv_gcc_docker.sh -march=rv32i -mabi=ilp32 -c -o elf_example.o ./csrc/elf_example.c
```

To compare this with the equivalent text-based assembly, run:

```bash
$ make asm/compiled/elf_example.s
```

In [55]:
elf = ELFFile.load_from_path('../elf_example.o')

# Some basic sanity checks
assert elf.elfclass == 32, "we only support 32 bit code!"
assert elf.little_endian, "we only support little endian"
assert elf['e_machine'] == 'EM_RISCV', "we only support RISC-V"
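
For reference, the three checks above correspond to fixed fields near the start of the ELF header. The following is a minimal sketch of reading them by hand with only the standard library. `read_elf_ident` and `example_header` are names made up for this example, and the header bytes are synthetic rather than taken from `elf_example.o`.

```python
import struct

# Constants from the generic ELF specification
ELFCLASS32 = 1    # e_ident[EI_CLASS] value for 32-bit objects
ELFCLASS64 = 2    # e_ident[EI_CLASS] value for 64-bit objects
ELFDATA2LSB = 1   # e_ident[EI_DATA] value for little-endian
EM_RISCV = 243    # e_machine value assigned to RISC-V

def read_elf_ident(header: bytes) -> dict:
    """Pull the class, byte order and machine out of the first 20 header bytes."""
    if header[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    little_endian = header[5] == ELFDATA2LSB
    # e_type and e_machine are 16-bit fields right after the 16-byte e_ident
    fmt = "<HH" if little_endian else ">HH"
    e_type, e_machine = struct.unpack_from(fmt, header, 16)
    return {
        "elfclass": {ELFCLASS32: 32, ELFCLASS64: 64}.get(header[4]),
        "little_endian": little_endian,
        "e_machine": e_machine,
    }

# Synthetic header for a 32-bit little-endian RISC-V relocatable file (assumed values)
example_header = (
    b"\x7fELF" + bytes([ELFCLASS32, ELFDATA2LSB, 1, 0]) + bytes(8)
    + struct.pack("<HH", 1, EM_RISCV)
)
info = read_elf_ident(example_header)
print(info)
```

With a real file, the first 20 bytes read in binary mode would take the place of `example_header`.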

## Sections

ELF files are split into sections, each of which has a different name/type/purpose. [This page][sections] has some information about what some of them are. I think some of them are non-standard. [This page][riscv-elf-spec] appears to be the RISC-V ELF spec (or spec modifications), and might be helpful.

As of this writing, here's what was in the ELF file, and what I know about each section. Run the cell below to print the names of all the sections in the current ELF file.

Sections (in order):
- `[NULL]` (0 bytes): The reserved null section at index 0. This can be ignored.
- `.text` (212 bytes): Has executable code in it, stored as raw machine code: 32-bit RV32I instructions in little-endian byte order. For example, the first four bytes (`13 01 01 FE`) form the word `0xFE010113`, which is `addi sp, sp, -32`.
	- _Very_ weirdly, if you try to disassemble it with a **1 byte** offset (i.e. discard the first byte and the last three bytes), it disassembles as mostly-valid (but totally nonsense) assembly. This is probably a byte-order artifact of reading each word as big-endian, rather than anything about the file format.
- `.rela.text` (336 bytes): Relocation entries for `.text`, not code. Each entry tells the linker which offset in `.text` to patch, which symbol to use, and what kind of fix-up to apply.
- `.data` (0 bytes): Read-write non-executable data, contains initialized static or global variables.
- `.bss` (4 bytes): "Read-write section containing uninitialized data". It has no contents in the file, only a size that says how much zeroed memory to reserve.
- `.sdata` (2 bytes): "This section holds initialized small data that contribute to the program memory image." ([Source][.sdata])
- `.comment` (27 bytes): Pretty sure this is a comment that can be ignored. I've only ever seen one that has information about the GCC version.
- `.Pulp_Chip.Info` (78 bytes): Pulp appears to be a specific type of chip, and this section looks safe to ignore. Google has very few results for this section. [Source][pulp]
- `.symtab` (256 bytes): Contains the symbol table, which maps each symbol (function, global variable, section, etc.) to its section, its offset within that section, its size, and its type and binding.
- `.strtab` (106 bytes): Contains the string table: the null-terminated symbol names that `.symtab` entries refer to by byte offset.
- `.shstrtab` (81 bytes): The section header string table, which holds the names of the sections in the ELF file.
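
As a quick check on the `.text` contents, here is a minimal sketch that reads the first four bytes of the section as one little-endian 32-bit word and splits it into RV32I I-type fields. `decode_i_type` is a helper made up for this example (it is not part of `rv32i`), and the bytes are hard-coded from the start of `.text` in the hexdump rather than read from the file.

```python
import struct

def decode_i_type(word: int) -> dict:
    """Split a 32-bit RV32I I-type instruction word into its fields."""
    imm = word >> 20
    if imm & 0x800:          # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": imm,
    }

first_bytes = bytes.fromhex("130101FE")      # first four bytes of .text
(word,) = struct.unpack("<I", first_bytes)   # "<I" = little-endian unsigned 32-bit
fields = decode_i_type(word)

print(hex(word))   # 0xfe010113
print(fields)      # opcode 0x13 (OP-IMM), rd = x2, rs1 = x2, imm = -32
```

Register `x2` is `sp`, so these fields spell out `addi sp, sp, -32`, a typical function prologue. Unpacking the same bytes with `">I"` instead puts `0xFE` in the low byte, which is not a valid opcode.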

The above linked page also mentions:
- `.rodata`: "read-only section containing const variables"
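
Going back to `.rela.text` in the list above: on a 32-bit target each entry is a 12-byte `Elf32_Rela` record, so 336 bytes would be 28 entries. The sketch below unpacks such records with the standard library. `parse_rela`, `RELOC_NAMES`, and the sample entries are assumptions made for illustration; the sample values are not read from `elf_example.o`, and the name table covers only a handful of RISC-V relocation types.

```python
import struct

# Elf32_Rela: r_offset (u32), r_info (u32), r_addend (i32)
RELA_FORMAT = "<IIi"
RELA_SIZE = struct.calcsize(RELA_FORMAT)   # 12 bytes

# A few relocation type numbers from the RISC-V psABI (not a complete list)
RELOC_NAMES = {
    18: "R_RISCV_CALL",
    26: "R_RISCV_HI20",
    27: "R_RISCV_LO12_I",
    28: "R_RISCV_LO12_S",
    51: "R_RISCV_RELAX",
}

def parse_rela(data: bytes) -> list:
    """Decode a .rela.* section into a list of dicts."""
    entries = []
    for r_offset, r_info, r_addend in struct.iter_unpack(RELA_FORMAT, data):
        r_type = r_info & 0xFF    # low 8 bits: relocation type
        entries.append({
            "offset": r_offset,            # where in .text to patch
            "symbol_index": r_info >> 8,   # index into .symtab
            "type": RELOC_NAMES.get(r_type, str(r_type)),
            "addend": r_addend,
        })
    return entries

# Two made-up entries: a HI20 relocation against symbol 12, then a RELAX hint
sample = struct.pack(RELA_FORMAT, 0x14, (12 << 8) | 26, 0)
sample += struct.pack(RELA_FORMAT, 0x14, 51, 0)
entries = parse_rela(sample)
for entry in entries:
    print(entry)
```

On the real file, `elf.get_section_by_name('.rela.text').data()` could be passed in place of `sample`.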

[sections]: https://michaeljclark.github.io/asm.html
[pulp]: https://github.com/chrta/zephyr-sim3u/blob/master/soc/riscv32/openisa_rv32m1/linker.ld
[.sdata]: https://refspecs.linuxfoundation.org/LSB_3.1.1/LSB-Core-PPC64/LSB-Core-PPC64/specialsections.html
[riscv-elf-spec]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#gabi
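
To show how `.symtab` and `.strtab` fit together, here is a rough sketch that decodes 16-byte `Elf32_Sym` records and looks each name up in a string table. `parse_symtab`, `read_c_string`, and the sample tables are hypothetical and built by hand for this example; they are not the contents of `elf_example.o`.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size (u32 each), st_info, st_other (u8 each), st_shndx (u16)
SYM_FORMAT = "<IIIBBH"
SYM_SIZE = struct.calcsize(SYM_FORMAT)   # 16 bytes

def read_c_string(strtab: bytes, offset: int) -> str:
    """Read the null-terminated string starting at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")

def parse_symtab(symtab: bytes, strtab: bytes) -> list:
    """Decode a .symtab section, resolving names through .strtab."""
    symbols = []
    for st_name, st_value, st_size, st_info, _st_other, st_shndx in struct.iter_unpack(SYM_FORMAT, symtab):
        symbols.append({
            "name": read_c_string(strtab, st_name),
            "value": st_value,       # offset within the section in a .o file
            "size": st_size,
            "bind": st_info >> 4,    # 0 = LOCAL, 1 = GLOBAL
            "type": st_info & 0xF,   # 1 = OBJECT, 2 = FUNC
            "shndx": st_shndx,       # index of the section the symbol lives in
        })
    return symbols

# Hand-built tables: the mandatory null symbol, a function and a variable
sample_strtab = b"\x00main\x00counter\x00"
sample_symtab = struct.pack(SYM_FORMAT, 0, 0, 0, 0, 0, 0)
sample_symtab += struct.pack(SYM_FORMAT, 1, 0, 156, (1 << 4) | 2, 0, 1)
sample_symtab += struct.pack(SYM_FORMAT, 6, 0, 4, (1 << 4) | 1, 0, 4)
symbols = parse_symtab(sample_symtab, sample_strtab)
for sym in symbols:
    print(sym)
```

At 16 bytes per entry, the 256-byte `.symtab` listed above would hold 16 symbols, including the null symbol at index 0.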

The following cell will print the contents of each section:

In [103]:
INDENT = '   '
DISASSEMBLER_BYTE_OFFSET = 1
ATTEMPT_TO_DISASSEMBLE = False

print("Sections in the ELF file (in order):")
for section in elf.iter_sections():
	print(f"- `{'[NULL]' if section.is_null() else section.name}` ({section.data_size} bytes):")

	if section.is_null():
		continue

	data = section.data()
	try:
		# Conceivably should be utf-8, but if it's not valid ascii then it's probably binary
		strrep = data.decode('ascii')
	except UnicodeDecodeError:
		# Interpreting the data as compiled rv32i assembly and decoding it appears not to work,
		# so by default we just hexdump it
		strrep = hexdump(data, result='return')

		# Only try to disassemble code sections whose size is a multiple of 32 bits.
		# Match by prefix so that `.rela.text` (which merely contains ".text") is skipped.
		is_code = section.name.startswith('.text') and section.data_size % 4 == 0
		if ATTEMPT_TO_DISASSEMBLE and is_code:
			lines = []
			for i in range(DISASSEMBLER_BYTE_OFFSET, len(data), 4):
				cur_bytes = data[i:i+4]
				try:
					lines.append(bits_to_line(BitArray(cur_bytes)))
				except Exception as err:
					lines.append(f"Failed to decode {cur_bytes.hex()}: {err}")
			strrep = '\n'.join(lines)

	strrep = f"```\n{strrep}\n```"
	
	print(INDENT + ('\n' + INDENT).join(strrep.split('\n')))

Sections in the ELF file (in order):
- `[NULL]` (0 bytes):
- `.text` (212 bytes):
   ```
   00000000: 13 01 01 FE 23 2E 11 00  23 2C 81 00 13 04 01 02  ....#...#,......
   00000010: 23 26 A4 FE B7 07 00 00  83 A7 07 00 13 87 17 00  #&..............
   00000020: B7 07 00 00 23 A0 E7 00  83 27 C4 FE 13 F7 F7 0F  ....#....'......
   00000030: B7 07 00 00 83 C7 07 00  B3 07 F7 00 13 F7 F7 0F  ................
   00000040: B7 07 00 00 23 80 E7 00  83 27 C4 FE 63 98 07 00  ....#....'..c...
   00000050: B7 07 00 00 83 C7 07 00  6F 00 00 03 83 27 C4 FE  ........o....'..
   00000060: 93 87 F7 FF 13 85 07 00  97 00 00 00 E7 80 00 00  ................
   00000070: 93 07 05 00 83 25 C4 FE  13 85 07 00 97 00 00 00  .....%..........
   00000080: E7 80 00 00 93 07 05 00  13 85 07 00 83 20 C1 01  ............. ..
   00000090: 03 24 81 01 13 01 01 02  67 80 00 00 13 01 01 FE  .$......g.......
   000000A0: 23 2E 11 00 23 2C 81 00  13 04 01 02 23 26 A4 FE  #...#,......#&..
   000000B0: 03 25 C4 FE 97 00 