## Memory: Real and Virtual


_burton rosenberg, 29 june 2023_


### Table of contents.

1. <a href="#intro">Introduction</a>
1. <a href="#byte">What is a byte?</a>
1. <a href="#intrepr">Representations of Integers</a>
1. <a href="#intmem">The memory layout of integers</a>



### <a name=intro>Introduction</a>


For the purposes of this section, a computer consists of three elements:

- A memory device, called the RAM.
- A control and computation device, called the CPU.
- A connection between the RAM and CPU, called the bus.

The CPU sends a request over the bus to the RAM to read or write a data item at a particular address. In a simplified model, the RAM is a collection of cells, each capable of storing a _byte_, and each cell has an integer index. The index is important in several ways:

- It locates the byte.
- Consecutive indices are useful in creating data items that span multiple bytes.
- Access patterns, such as indexing, are integer calculations on the indices.

<div style="float:right;margin:2em;width:450px;border:1px solid green;padding:1em;">
<a title="MCS-80/85 Family User's Manual, Intel, October 1979" href="https://archive.org/details/Mcs80_85FamilyUsersManual/">
<img width="512" alt="Data buses of the 8085, from the MCS-80/85 Family User's Manual" src="../images/databuses_8085.jpg">
<br>MCS-80/85 Family User's Manual, Intel, October 1979.
</a>
</div>


The view of RAM as an integer-indexed collection of bytes is mostly true for a simple computer, such as one based on the Intel 8085 CPU. Few CPUs are that simple today. The addresses seen by programmers are _virtual addresses_. These are not just a single range of integers: each running program has its own space of addresses. Each virtual address in use is assigned a _physical address_, and this assignment merges all the virtual spaces into one physical space.

We shall continue to think of addresses as integers, with the understanding that they are virtual addresses; the simpler the CPU, the closer these are to physical addresses. How much access a program has to even this virtual space depends on the programming language. Programming languages focus on operating on meaningful data objects, such as integers, floating point numbers and strings, rather than on raw bytes.

Programming languages vary in how far they abstract these objects from their underlying construction. Ultimately, all data objects are built out of bytes. The C language adopts a memory model that is abstract enough for the usual data objects to be used efficiently and portably across platforms, but it can also treat the RAM in a raw manner, as an array of bytes.

In C, a data location is held in a value of a type called a _pointer_. A pointer has integer-like properties. An integer can be added to it, which allows reference to a sequence of memory elements. Pointers into the same sequence can be compared for order, in a way that is compatible with addition. On some platforms a pointer can be printed and will look like an integer expressed in hexadecimal notation.

A pointer can be derived from a variable, giving a reference to that variable. The pointer is then typed by what it points to.

The type of a pointer determines how many bytes it refers to. A pointer to an integer addresses a sequence of bytes, the number of bytes depending on the precision of the integer. A typical `int` is 32 bits, hence a pointer to `int` refers to 4 bytes.

The C language defines the type `char` to occupy one byte. Hence a pointer to `char` provides our method to access the RAM natively, byte by byte. This ability is one reason operating systems are commonly written in C.

### Pointer notation

The notation for pointers is based on how the pointer will be used. The declaration of a pointer to integer, with name `p`, is `int *p`. This means that the expression `*p` is of type `int`. The number of bytes in the data type `int` is given by the C operator `sizeof`. It looks like a function call, but it is an operator, not a function. For 32 bit integers, `sizeof(int)` is 4, and `p+1` is then the location 4 bytes beyond that of `p`.

With `p` of type `int *`, so is `p+i` for integer `i`. Hence `*(p+i)` is of type `int`. This expression retrieves or stores the i-th element in a sequence of `int`'s, where the 0-th `int` is located at location `p`. C provides a more obvious notation for this, `p[i]`. The same notation `a[i]` applies when `a` is declared as an array of `int`, because in such an expression the array name `a` is converted to a pointer of type `int *` to its first element.

Pointer notation can seem difficult, but it is always a picture of how the data will be used. A common situation is to have an array of strings. A string is an array of bytes with a special end-of-string character, the null character `'\0'`, in the last byte. If the name of this array is `a`, then `a[i]` should be a pointer to char. Hence `*(*(a+i)+j)` would be the j-th letter of the i-th string. Overall, as a function parameter, `a` can be declared as either `char ** a` or `char * a[]`.




```c
/* test.c: a minimal program; argv is an array of strings, declared as char * argv[] */
int main(int argc, char * argv[]) {

    return 0 ;
}
```
