Build your own: linker

Parsing ELF Files

The first step of writing our linker will be parsing ELF files. We'll use nom, a parser combinator library for Rust, to parse ELF files into in-memory data structures, and along the way we'll learn what goes into an ELF file. I'll closely mirror the System V ABI documentation in my sections.

What is an ELF file?

ELF (Executable and Linkable Format) is a standard format for object files, shared libraries, and executables. In the compilation pipeline, an ELF file is the output of the assembler, and both the input and output of the linker. ELF files serve a few distinct purposes:

contain the machine code that will form the running program
contain information for linking separately compiled object files together
contain information for building the process image

There are 3 main types of ELF file:

Relocatable files: relocatable ELF files contain object code that is ready to be linked with other object files, static libraries (which are just archives of object files), and shared libraries. Relocatable files are not independently usable, as they need to be relocated (we'll get to exactly what that means in a little bit).
Executable files: executable ELF files contain executable code with no unresolved references. The bytes in an executable ELF file are directly loaded into memory by the operating system and executed.
Shared object files: shared object ELF files contain position-independent code that is loaded directly into memory at runtime. Executable ELF files that link with shared objects declare their dependency on each shared object file in their metadata, and rely on the dynamic linker (part of the operating system) to load the associated shared objects into memory at runtime.
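
To make the three file types concrete, here is a minimal sketch of how one might tell them apart by reading the e_type field, which sits right after the 16-byte e_ident array at the start of the file. The names ElfType and classify are hypothetical and not part of the linker we build in this post, and the sketch assumes a little-endian file that is already known to be ELF.

```rust
/// The three ELF file types discussed above, plus a catch-all.
/// `ElfType` is a hypothetical name used only for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ElfType {
    Relocatable,  // ET_REL = 1
    Executable,   // ET_EXEC = 2
    SharedObject, // ET_DYN = 3 (position-independent executables also use this value)
    Other(u16),
}

/// Reads `e_type` from the first bytes of an ELF file.
/// Assumption: the file is little-endian and its magic has already been checked.
pub fn classify(bytes: &[u8]) -> Option<ElfType> {
    // e_ident occupies bytes 0..16; e_type is the 2-byte field that follows.
    if bytes.len() < 18 {
        return None;
    }
    let e_type = u16::from_le_bytes([bytes[16], bytes[17]]);
    Some(match e_type {
        1 => ElfType::Relocatable,
        2 => ElfType::Executable,
        3 => ElfType::SharedObject,
        other => ElfType::Other(other),
    })
}
```

Since our linker only accepts relocatable files, a check like this could be used to reject anything that does not classify as ElfType::Relocatable.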

Setting up

We'll start by setting up a new Rust project.

cargo new linker

and filling in main.rs.

use std::fs::File;
use std::io::Read;
use std::path::PathBuf;
use structopt::StructOpt;

use ld_rs::elf::ElfFile64;
use ld_rs::link;

#[derive(StructOpt, Debug)]
#[structopt(name = "basic")]
struct Opt {
    #[structopt(name = "FILE", parse(from_os_str))]
    filenames: Vec<PathBuf>,
}

fn generic_error(action: &str) -> ! {
    eprintln!("Something went wrong {} that file.", action);
    std::process::exit(1)
}

fn main() {
    let opt = Opt::from_args();

    let mut elfs = Vec::new();

    for f in opt.filenames.iter() {
        let mut buf = Vec::new();
        let mut file = File::open(f).unwrap_or_else(|_| generic_error("opening"));
        file.read_to_end(&mut buf)
            .unwrap_or_else(|_| generic_error("reading"));

        match ElfFile64::parse(&buf[..]) {
            Ok(elf) => {
                elfs.push(elf);
            }
            Err(_) => {
                eprintln!("That is not an ELF file!");
                std::process::exit(1);
            }
        };
    }

    let mut file = File::create("output.o").expect("could not create file");
    ElfFile64::write_out(link(elfs), &mut file).expect("could not write file");
}

First, I've used the structopt crate to parse command-line options. The arguments to the linker are a list of relocatable object files, which we'll link together into a new relocatable object file (our linker will do the equivalent of ld -r: it will produce an object file which still needs to be relocated).

In main, we first parse the command line arguments. Next, for each file specified on the command line, we open it and parse an ELF file with ElfFile64::parse (which we'll define shortly). Once we have our list of ELF files, we link them together with link (see post #3 of this series) and write the result out to a new object file output.o (see post #2 of this series).

ElfFile64

The ElfFile64 type represents a 64-bit ELF file (there are also 32-bit ELF files which our linker will not support). There are two representations of an ELF file: there is the representation of the file on disk, and there is the in-memory representation we will use to link (ElfFile64). To parse, we will write parser combinators for several ElfFile64*Raw types which we will combine into an ElfFile64.
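
Since we only support 64-bit files, it helps to see where that distinction lives on disk. The following is a rough sketch, not the parser we write in this post: it checks the magic bytes and the class byte (index 4 of e_ident, where 1 means 32-bit and 2 means 64-bit) by hand. The names IdentError and check_ident are made up for this illustration.

```rust
pub const ELF_MAGIC: &[u8] = b"\x7FELF";

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum IdentError {
    TooShort,
    BadMagic,
    Not64Bit(u8),
}

/// Checks the identification bytes at the start of a file.
/// `check_ident` is an illustrative helper, not part of the real linker.
pub fn check_ident(input: &[u8]) -> Result<(), IdentError> {
    // e_ident is always 16 bytes long.
    if input.len() < 16 {
        return Err(IdentError::TooShort);
    }
    // Bytes 0..4 must be 0x7F 'E' 'L' 'F'.
    if !input.starts_with(ELF_MAGIC) {
        return Err(IdentError::BadMagic);
    }
    // Byte 4 is the file class: 1 = 32-bit, 2 = 64-bit.
    match input[4] {
        2 => Ok(()),
        class => Err(IdentError::Not64Bit(class)),
    }
}
```

A 32-bit object file would fail here with Not64Bit(1), which is the case our linker does not handle.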

We define the ElfFile64 type in an elf module.

mod parse;
pub mod relocation;
pub mod section;
pub mod symbol;
mod write;

use parse::{ElfFile64HeaderRaw, ElfFile64Raw, ElfFile64RawParseError};
use relocation::get_relocations;
use section::{get_sections, organize_sections, Section64};
use symbol::{get_symbols, Symbol64};

pub const ELF_MAGIC: &[u8] = b"\x7FELF";
pub const EHSIZE_64: usize = 64;

#[derive(Debug)]
pub struct ElfFile64 {
    pub header: ElfFile64HeaderRaw,
    pub unorganized_sections: Vec<Section64>,
    pub symbols: Vec<Symbol64>,
}

#[derive(Debug)]
pub enum ElfFileError {
    ParseError,
    InvalidFileError,
}

impl From<ElfFile64RawParseError> for ElfFileError {
    fn from(_: ElfFile64RawParseError) -> Self {
        ElfFileError::ParseError
    }
}

impl ElfFile64 {
    pub fn parse(input: &[u8]) -> Result<ElfFile64, ElfFileError> {
        Ok(ElfFile64Raw::parse(input)?.into())
    }
}

impl From<ElfFile64Raw> for ElfFile64 {
    fn from(raw: ElfFile64Raw) -> Self {
        let sections = get_sections(&raw);

        let (mut unorganized_sections, symtab, relas, index_map) = organize_sections(sections);

        let symbols = get_symbols(&raw, &symtab, &index_map);
        for rela in relas {
            let referenced_section = &mut unorganized_sections[rela.info as usize];
            referenced_section.relocations = Some(get_relocations(&raw, &rela));
        }

        ElfFile64 {
            header: raw.header,
            unorganized_sections,
            symbols,
        }
    }
}

In this section, we first declare some modules that we'll use later. Then we define the ElfFile64 type itself, which is primarily a list of sections and symbols (we also keep the header around, which we'll parse in the next section, mainly to make writing the ELF file out easier). We also define a custom error enum (which isn't really required for a project like this, but is good practice in larger libraries). ElfFile64::parse is implemented by calling ElfFile64Raw::parse on the bytes of the file and calling .into() on the result; the ? operator converts any ElfFile64RawParseError into an ElfFileError through the From impl above. Finally, we implement From<ElfFile64Raw> for ElfFile64. We carry forward the raw header, split off the sections that become their own in-memory data structures (the symbol table and the relocation tables), attach each set of relocations to the section it applies to, and return the ElfFile64 at the end. We'll get more into what each of the helper functions does in a moment.
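
The parse-then-convert pattern described above can be seen in isolation in the toy sketch below. All of the types here (RawFile, RawParseError, CookedFile, FileError) are hypothetical stand-ins for ElfFile64Raw, ElfFile64RawParseError, ElfFile64, and ElfFileError; the "format" is just four magic bytes followed by a payload, so treat this as an illustration of the From and ? mechanics rather than real ELF parsing.

```rust
/// Hypothetical stand-in for the on-disk ("raw") representation.
#[derive(Debug)]
pub struct RawFile {
    pub magic: [u8; 4],
    pub payload: Vec<u8>,
}

/// Hypothetical stand-in for the raw parser's error type.
#[derive(Debug, PartialEq, Eq)]
pub struct RawParseError;

impl RawFile {
    /// Toy parser: the first four bytes are the magic, the rest is payload.
    pub fn parse(input: &[u8]) -> Result<RawFile, RawParseError> {
        if input.len() < 4 {
            return Err(RawParseError);
        }
        let mut magic = [0u8; 4];
        magic.copy_from_slice(&input[..4]);
        Ok(RawFile { magic, payload: input[4..].to_vec() })
    }
}

/// Hypothetical stand-in for the in-memory representation.
#[derive(Debug, PartialEq, Eq)]
pub struct CookedFile {
    pub payload_len: usize,
}

/// Hypothetical stand-in for the public error enum.
#[derive(Debug, PartialEq, Eq)]
pub enum FileError {
    ParseError,
}

impl From<RawParseError> for FileError {
    fn from(_: RawParseError) -> Self {
        FileError::ParseError
    }
}

impl From<RawFile> for CookedFile {
    fn from(raw: RawFile) -> Self {
        CookedFile { payload_len: raw.payload.len() }
    }
}

impl CookedFile {
    pub fn parse(input: &[u8]) -> Result<CookedFile, FileError> {
        // `?` turns a RawParseError into a FileError via From;
        // `.into()` turns the RawFile into a CookedFile via From.
        Ok(RawFile::parse(input)?.into())
    }
}
```

Keeping the raw parser's error type private behind a From impl means callers only ever see the public error enum, which is the same shape the elf module uses.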
