# Compiling

<div class="alert alert-block alert-info">
    You can find all of the C programs in this notebook in the subdirectory containing this notebook:
    <code>./src/compiling</code>
</div>

Compiling is the process of translating code written in one language into another language.
When people talk about compiling C code, they usually mean translating C code into a language that the
computer can run directly (i.e., machine code). A *compiler* is a program that performs this translation.

When using the GNU compiler `gcc`, the process of compiling a C program to machine code goes through several steps:

1. preprocessing
2. compiling
3. assembling
4. linking

This notebook describes each step using the following example program:

```c
// hello1.c

#include <stdio.h>

int main(void) {
    puts("Hello, world!");
    return 0;
}
```

## Preprocessing

The preprocessor reads a source code file looking for lines that begin with `#`. Such lines are
called *preprocessor directives* and they instruct the preprocessor to perform some kind of action.
The `#include` directive instructs the preprocessor to insert the contents of the included file into the
source code text.

Among other transformations, the preprocessor by default also removes comments from the source code, replacing each one with a single space.
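As a small illustration of a directive other than `#include`, the sketch below uses `#define` to create a macro. The file name `greeting.c`, the macro `GREETING`, and the function `greeting_length` are made up for this example and are not part of `src/compiling`.

```c
// greeting.c (hypothetical file name)

#include <string.h>

#define GREETING "Hello, world!"  /* the preprocessor replaces GREETING textually */

/* After preprocessing, this comment should be gone and the return
   statement should pass the string literal itself to strlen. */
size_t greeting_length(void) {
    return strlen(GREETING);
}
```

Running the preprocessor on a file like this should show the macro name replaced by the string literal and the comments removed.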

We can look at the output generated by the preprocessor by instructing `gcc` to stop after the preprocessing step
using the option `-E`.
Run this step on the command line from inside the directory `src/compiling`:

```sh
gcc -E hello1.c > hello1.i
```

You can look at the contents of `hello1.i` using any text editor or directly in the terminal using `cat hello1.i`.
Notice that the preprocessor output contains a declaration of the `puts` function (copied in from `stdio.h`) but no definition of it.
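To make the distinction between a declaration and a definition concrete, the sketch below declares `puts` by hand instead of including `<stdio.h>`. The file name `hello_decl.c` and the function `greet` are hypothetical; in real code you should include the header rather than copy declarations.

```c
// hello_decl.c (hypothetical file name)

// Declaration only: it gives the compiler the name, parameter type, and
// return type of puts. <stdio.h> normally supplies this line for us.
int puts(const char *s);

// Calls puts and passes its return value back to the caller.
// The definition of puts is not in this file; it lives in the C standard library.
int greet(void) {
    return puts("Hello, world!");
}
```

The declaration is all the compiler needs to translate the call; the code for `puts` itself is supplied later in the process.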

The output of the preprocessor is called a *translation unit*. For the purposes of this course, a
translation unit consists of one C source code file and all of the header files included in the source
code file.

## Compiling

The next step in the overall compilation process is to compile the preprocessor output into a lower-level
programming language called *assembly language*. Assembly language is a human-readable language whose
statements map closely to the target architecture's machine code instructions.

We can look at the assembly output generated by the compiler by instructing `gcc` to stop after the compilation step
using the option `-S`.
Run this step on the command line from inside the directory `src/compiling`:

```sh
gcc -S hello1.c
```

The compiler writes the assembly code to the file `hello1.s`. You can look at the contents of the file
using any text editor or directly in the terminal using `cat hello1.s`. On x86 targets, notice that there is a `call`
instruction that refers to the `puts` function, but the definition of the function is not part of the assembly output.
(Other architectures use a different instruction name for function calls.)

<div class="alert alert-block alert-info">
    Most programmers do not routinely work in assembly, but high-performance applications sometimes
    use hand-written assembly in performance-critical parts of the program to optimize speed or memory use.
</div>

## Assembling

The next step in the overall compilation process is to translate the assembly code into *object code*. Object
code is machine code for a single translation unit. If the translation unit calls a function defined outside
of it, then the object code contains the call and records the function's name as an unresolved symbol, but it
does not contain the object code for the function itself. In our example, the object code for `hello1.c` will not contain
the object code for the `puts` function.

We can instruct `gcc` to stop after the assembling step using the `-c` option. Run this step on the command line from inside the directory `src/compiling`:

```sh
gcc -c hello1.c
```

`gcc` writes the object code to the file `hello1.o`. The generated object code is in a binary format
that is not easily human-readable.

## Linking

The last step in the overall compilation process is to *link* together the object code for each translation unit
into a single program. The linker takes object code and attempts to resolve calls to functions across translation
units. For the `puts` function, the linker must link to the C standard library where the object code for
`puts` resides. `gcc` links to the C standard library automatically.
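As a minimal sketch of the kind of call the linker resolves, consider a program split across two hypothetical files, `square.c` and `sum.c` (neither is part of `src/compiling`). The listing shows them back to back so that it also compiles as a single file; kept in separate files, the object code for `sum.c` would contain a call to `square` but not its code, and the linker would connect the two.

```c
// square.c (hypothetical): defines square
int square(int x) {
    return x * x;
}

// sum.c (hypothetical): calls square but does not define it
int square(int x);  // declaration only; the definition lives in square.c

int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```

Under these assumptions, each file could be compiled to object code with `gcc -c`, and the resulting `.o` files, together with one that defines `main`, could then be passed to `gcc` to be linked into an executable.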

The full process of preprocessing, compiling, assembling, and linking is the default action of `gcc`.

```sh
gcc hello1.c
```

This command produces the executable program `a.out`. The `-o` option lets you specify a different name for the
executable program:

```sh
gcc hello1.c -o hello1
```

<div class="alert alert-block alert-warning">
    When using the <code>-o</code> option, <code>gcc</code> will overwrite any existing file that has the
    output file's name. Be careful not to give the name of the input file as the output file!
</div>