In [None]:
# PURPOSE:  Written to illustrate a minimal assembly program that simply exits.
# INPUT:    None.
# OUTPUT:   A status code that can be viewed with 'echo $?' later.
# VARIABLES:
#           %eax holds the system call number
#           %ebx holds the return status
.section .data

.section .text
.globl _start
_start:
    movl $1, %eax   # system call number for exit goes in %eax
    movl $0, %ebx   # exit status goes in %ebx
    int $0x80       # interrupt 0x80 hands control to the kernel, which runs exit

## Outline of an Assembly Language Program

At the beginning there are lots of lines that begin with hashes `#`. These are *comments*.

Get into the habit of writing comments in your code that will help other programmers understand both why the program exists and how it works. Always include the following in your comments:
- The purpose of the code.
- An overview of the processing involved.
- Anything strange your program does and why it does it.

> *You'll find that many programs end up doing strange things. Usually there is a reason for that, but unfortunately, programmers often fail to document such things in their comments. So, future programmers either have to learn the reason by modifying the code and watching it break, or just leave it alone whether it is still needed or not. You should always document any strange behavior your program performs. Unfortunately, figuring out what is strange and what is straightforward comes mostly with experience.*

After the comments, the next line says `.section .data`.

Anything starting with a period isn't directly translated into a machine instruction. Instead, it's an instruction to the assembler itself.

These are called *assembler directives* or *pseudo-operations* because they are handled by the assembler and are not actually run by the computer.

The `.section` command breaks your program into sections. `.section .data` starts the data section, where you list any memory storage you will need for data.

Our program doesn't use any, so we don't need the section. It's just here for completeness. However, almost every program you write in the future will have data.

Right after this you have `.section .text`, which starts the text section where the program instructions live.

The next line is `.globl _start`, another directive, which tells the assembler that `_start` is important to remember.

`_start` is a *symbol*, which means that it is going to be replaced by something else either during assembly or linking. `.globl` means that the assembler shouldn't discard this symbol after assembly because the linker will need it. 

`_start` is a special symbol that always needs to be marked with `.globl` because it marks the location of the start of the program. If the location is not marked this way, the computer won't know where to begin running your program when it loads it.

Symbols are generally used to mark locations of programs or data, so you can refer to them by name instead of by their location number. The assembler and linker can take care of keeping track of addresses so that you can concentrate on writing your program.

The next line `_start:` defines the value of the `_start` label. 

A *label* is a symbol followed by a colon. A label defines a symbol's value: it tells the assembler to make the symbol's value the location of the next instruction or data element.

This way, if the actual physical location of the data or instruction changes, you won't have to rewrite any references to it; the symbol automatically gets the new value.

Now we get into actual computer instructions. The first such instruction is `movl $1, %eax`, which transfers the number `1` into the `%eax` register.

In assembly language, many instructions have *operands*. 

`movl` has two operands: the *source* and the *destination*.

Operands can be numbers, memory location references, or registers.

> *(See Appendix B for more information on which instructions take which kinds of operands.)*

On x86 processors, there are several general-purpose registers (all of which can be used with `movl`):
- `%eax`
- `%ebx`
- `%ecx`
- `%edx`
- `%edi`
- `%esi`

In addition to these general-purpose registers, there are also several special-purpose registers, including:
- `%ebp`
- `%esp`
- `%eip`
- `%eflags`

Note that on x86 processors, even the general-purpose registers have some special purposes, or had them before the architecture went 32-bit. For most instructions they behave as general-purpose registers, but each of them has at least one instruction where it is used in a special way. Most of those instructions won't be covered in this book.

You may be wondering, *why do all of these registers begin with the letter e?* The reason is that early generations of x86 processors were 16-bit rather than 32-bit. Therefore, the registers were only half the length they are now. In later generations of x86 processors, the size of the registers doubled. They kept the old names to refer to the first half of the register and added an `e` to refer to the extended versions of the register.

Usually you will only use the extended versions. Newer models also offer a 64-bit mode, which doubles the size of these registers yet again and uses an `r` prefix to indicate the larger registers (e.g. `%rax` is the 64-bit version of `%eax`). However, when this book was written (2003/2004) these processors were not widely used, so they are not covered in this book.
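The relationship between the old and the extended names can be seen in a small sketch. It assumes 32-bit code and uses `movw`, the 16-bit counterpart of `movl`, which is not otherwise introduced in this section; the value 3 is arbitrary.

```asm
.section .text
.globl _start
_start:
    movl $0, %ebx   # clear all 32 bits of the extended register %ebx
    movw $3, %bx    # write only the low 16 bits, using the older name %bx
    movl $1, %eax   # system call number for exit
    int $0x80       # %ebx now holds 3, which becomes the exit status
```

Because `%bx` is simply the low half of `%ebx`, writing through the old name changes the value seen through the extended one.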

In our code, the `movl` instruction moves the number `1` into `%eax`. The dollar sign in front of the `1` indicates that we want to use *immediate mode addressing*. Without the dollar sign, it would do *direct addressing*, loading whatever number is at address 1 instead.
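The difference between the two modes can be sketched with a labeled data item rather than a raw address such as 1, which an ordinary program is not allowed to read. The label `answer`, its value, and the `.long` data directive are assumptions added for illustration; they are not part of the original program.

```asm
.section .data
answer:                 # hypothetical label marking one 32-bit data item
    .long 5

.section .text
.globl _start
_start:
    movl $9, %ebx       # immediate mode: %ebx receives the number 9 itself
    movl answer, %ebx   # direct addressing: %ebx receives the value stored at answer
    movl $1, %eax       # system call number for exit
    int $0x80           # the exit status is taken from %ebx
```

Assembled as 32-bit code, this should exit with status 5, because the second `movl` overwrites the 9 loaded by the first.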

We move the number 1 into `%eax` because we are preparing to call the Linux kernel. The number `1` is the number of the *`exit` system call*.

Many operations such as calling other programs, dealing with files, and exiting have to be handled by the operating system through system calls. When you make a system call, the system call number has to be loaded into `%eax`.

> *(For a complete listing of system calls and their numbers, see Appendix C).*

*Parameters* are extra data stored in other registers. In the case of the `exit` system call, the operating system requires a status code to be loaded in `%ebx`. This value is then returned to the system, and it is the value you retrieve when you type `echo $?` afterwards.

So, we load `%ebx` with `0` by typing `movl $0, %ebx`.

Registers are used for much more than system calls: they are where all program logic, such as addition, subtraction, and comparison, takes place.

Loading registers doesn't do anything by itself. Linux simply requires that certain registers be loaded with certain parameter values before making a system call.

`%eax` is always required to be loaded with the system call number. For the other registers, however, each system call has different requirements. In the `exit` system call, `%ebx` is required to be loaded with the exit status.

> *(See Appendix C for a list of common system calls and what is required to be in each register.)*
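To see the parameter in `%ebx` take effect, here is a variation of the program with a non-zero status. The value 42 and the file name `exit.s` are arbitrary choices for this sketch, and the build commands in the comments assume the GNU assembler and linker.

```asm
# Build and run (file names are hypothetical):
#   as exit.s -o exit.o
#   ld exit.o -o exit
#   ./exit
#   echo $?
# On a 64-bit host, pass --32 to as and -m elf_i386 to ld to build 32-bit code.
.section .text
.globl _start
_start:
    movl $1, %eax    # system call number for exit
    movl $42, %ebx   # exit status: an arbitrary non-zero value
    int $0x80        # hand control to the kernel
```

After running it, `echo $?` should print `42` instead of `0`.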

The next instruction is the "magic" one. It is `int $0x80` where:
- `int` stands for *interrupt*.
- `0x80` is the interrupt number to use.

You may be wondering why it's `0x80` instead of just `80`. The reason is that the number is written in *hexadecimal*: numbers starting with `0x` are in hexadecimal. Elsewhere, a trailing `H` is sometimes used to mark hexadecimal numbers instead.

> *(For more information about this, see Chapter 10.)*
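As a quick check of the value, each hexadecimal digit is weighted by a power of 16:

$$
\texttt{0x80} = 8 \times 16^{1} + 0 \times 16^{0} = 128
$$

So `int $0x80` requests interrupt number 128 in decimal, not 80.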

An *interrupt* interrupts the normal program flow, and transfers control from our program to Linux so that it will do a system call.

Strictly speaking, the interrupt transfers control to whoever set up an *interrupt handler* for that interrupt number. In the case of Linux, all of them are set to be handled by the Linux kernel.

In this case, all we're doing is asking Linux to terminate the program, so control never comes back to us. If we didn't signal the interrupt, no system call would be performed.

### Quick System Call Review
To recap:
- The features of the computer's operating system are accessed through system calls.
- System calls are invoked by setting up the registers in a special way and issuing the instruction `int $0x80`.
- Linux knows which system call we want to access by what we stored in the `%eax` register. 
- Each system call has other requirements as to what needs to be stored in the other registers.
- System call number 1 is the `exit` system call, which requires the status code to be placed in `%ebx`.