## References, Pointers and RAM memory


_burton rosenberg, 23 june 2023_


### Table of contents.

1. <a href="#intro">Sizeof</a>
1. <a href="#pointers">Pointers</a>
1. <a href="#arrays">Arrays</a>
1. <a href="#caveats">Caveats</a>




### <a name=intro>Sizeof</a>

We have looked at the architecture of the Intel 8085 processor, in my opinion the most influential microprocessor design ever. We look at this microprocessor because it is simple enough to be documented in a few pages. Today's computers have a few concepts that the 8085 did not have, notably protection and virtual memory, but it is otherwise pretty much exactly the computer we compute on today.

<div style="float:right;margin:2em;width:450px;border:1px solid green;padding:1em;">
<a title="MCS-80/85 Family User's Manual, Intel, October 1979" href="https://archive.org/details/Mcs80_85FamilyUsersManual/">
<img src="databuses_8085.png">
<br>MCS-80/85 Family User's Manual, Intel, October 1979.
</a>
</div>


We had mentioned that a basic computer had four components,

- The CPU, containing instruction sequencing and decode, registers, and an ALU for instruction execution.
- A RAM memory, an integer addressable array of 8-bit bytes.
- An I/O system, which varies widely in its functionality and implementation.
- And a bus to connect these three components.

This diagram from the MCS-80/85 Family User's Manual helpfully shows us that the bus has a channel for addresses, a channel for data, and a channel for control.

The C language provides almost direct contact with the RAM. Higher-level languages such as Java and Python do not, and in fact attempt to hide the RAM behind abstractions. We will therefore use C to understand RAM.

The architectural view of RAM is an array of integer addressable bytes, which are almost certainly 8 bits long. The byte aligns with the C language type `char`. Both hardware and software collect bytes with consecutive addresses to form data types that require multiple bytes. Most C compilers now support,

- The 2 byte short integer, 16 bits.
- The 4 byte integer, 32 bits.
- The 8 byte long integer, 64 bits.

The C operator `sizeof` will return the number of bytes in a type. Let us check this machine for the length of its data types. The C Language requires that a char is one byte.



In [1]:
%%file check_int_lengths.c
#include<stdio.h>

int main(int argc, char * argv[]) {
    
    printf("a char is %zu byte\na short int is %zu bytes\nan int is %zu bytes\na long int is %zu bytes\n",
          sizeof(char), sizeof(short int), sizeof(int), sizeof(long int)) ;
    return 0 ;
}

Writing check_int_lengths.c


In [2]:
%%bash
S=check_int_lengths
cc -o $S $S.c
./$S
rm $S.c $S

a char is 1 byte
a short int is 2 bytes
an int is 4 bytes
a long int is 8 bytes


### <a name=pointers>Pointers</a>

For a variable declaration such as `int i`, memory is set aside for the storage of the 4 byte integer `i`, and the compiler maintains a table binding the name `i` to a memory reference. The variable can appear in one of two contexts,

- If the variable appears in an expression, the context is called an _R-value_, because this happens when the variable is on the right hand side of the assignment operator. The compiler writes machine code to retrieve a value from the memory reference.
- If the variable appears in a context where it will be assigned into, the context is called an _L-value_, because this happens when the variable is on the left hand side of the assignment operator. The compiler writes machine code to store a value at the memory reference.

In these cases, the details of the memory reference are completely hidden from the programmer. The compiler has successfully abstracted away all the memory details into a name.

However, other situations require that the programmer have possession of a memory reference, and orchestrate the use of the reference to retrieve and store at the target of the memory reference. The reference is of type _pointer_, and the use of C pointers is fun but tricky.

The C pointer can give us access to memory that is very similar to the physical RAM device that implements the memory. It is a bit dangerous, though, to use a pointer as if it were an integer address. There are both hardware and software reasons why this cannot be exactly true. To have truly platform independent code one should understand that C has taken a middle path, maintaining some abstraction in the pointer data type while exposing as much as possible of the hardware aspect of the RAM.

- The first thing to know is that a pointer has a type, and the thing it references has a type. So a pointer is a pointer to an integer, or a pointer to a char, or a pointer to a pointer to an integer. And so forth.
- The second thing to know about a pointer is that it is declared with the `*` in a way that is a picture of how it will be used. So a pointer to int is declared `int * ip`. Therefore `* ip` is an `int`, and `ip` is a pointer to `int`.
- And the last thing you should know about a pointer is that it can be created from a variable by the `address of` operator `&`. For `int i`, `&i` is an `int *`. And the reason why it is the __last__ thing you should know is that the use of this operator, in my experience, is very rare and often a bad idea. A helpful fact is that `&` and `*` side by side will often cancel each other out.
 
 
#### Operations on pointers.

There are some operations that can be performed on pointers, for use in expressions. The two most widely used are addition with an integer and comparison to the special _null pointer_, denoted `(void *) 0`.

##### Addition

Given a pointer `p` and an integer `i`, the expressions `p+i` and `i+p` are valid. The result is a pointer of the same type as `p`. In memory, the resulting pointer is a reference to the i-th item of the type that `p` points to, in a (possibly) boundless sequence of such items.

##### Null pointer comparison

Two pointers of the same type can be compared. Using `==` they can be compared for equality. If equal, they point to the same place in memory. They can also be compared for order using `<` and `>`, discovering which data elements appear at higher memory addresses and which at lower. The rules for this are complicated beyond this C introduction.

There is a special pointer value, the _null pointer_, that is created as `(void *) 0`; however, standard includes create the macro `NULL` for this. A pointer can be compared for equality to the null pointer using `==`.

#### Pointers as RAM

Given the above, at least a portion of RAM can be modeled as `p+i` where `p` is of type `char *`. One can try to get a view of all of RAM by converting the null pointer to a `char *`, but then you'd be a pirate.

You are also a pirate if you convert pointers to `long` and use them as integer addresses to RAM.

However, it is fun and instructive to be a pirate. 


In [3]:
%%file pirate-pointers.c
#include<stdio.h>

int main(int argc, char * argv[]){
    long int l_i = 0x0023456789abcdef ;
    char * p ;
    p = (char *) &l_i ;
    while (*p) {
        printf("%x ", (0xff)&*p) ;
        p = p + 1 ;
    }
    printf("%x ", (0xff)&*p) ;
    return 0 ;
    
}

Writing pirate-pointers.c


In [4]:
%%bash
S=pirate-pointers
cc -o $S $S.c
./$S
rm $S.c $S

ef cd ab 89 67 45 23 0 

### <a name=arrays>Arrays</a>

Given a pointer `p` and an integer `i` there is a short syntax for `*(p+i)`. It is `p[i]`, and `p` is then considered an array.

C provides a bit more than this syntactic shortcut for the use of arrays. It allows a pointer to a type to be defined and memory sufficient for the array to be allocated with a single declaration, `int a[10]`, as an example. The result is a variable `a` that in most expressions acts as an `int *`, and the `10*sizeof(int)` bytes ascending from the reference `a` are set aside by the compiler for use by the array.

There is also an initialization syntax for the array; see Harbison and Steele. One difference between `a` declared as an array and `a` declared as a pointer to int is that the array `a` cannot be assigned into. It is, in a sense, a constant pointer.

Pointers within an array can be subtracted in a way that is consistent with `a[i]` being equal to `*(a+i)`. This gets a little tricky. While `a[i]-a[j]` subtracts the contents of the array at locations `i` and `j`, `(a+i)-(a+j)` subtracts two pointers and is equal to `i-j`. This is consistent with the array of bytes view of memory, except that C is not all that free with memory references. The pointers have a type, and we are walking up and down in an array in units of that type. Within the array, we are incrementing as if memory references were integers.

In [5]:
%%file more-pirate-pointers.c
#include<stdio.h>

int main(int argc, char * argv[]){
    int a[10] ;
    int i ;
    for (i=0;i<10;i++) *(a+i) = i ; // same as a[i] = i
    printf("%d %td\n", a[7]-a[4], (a+4)-(a+7)) ;
    return 0 ;
}

Writing more-pirate-pointers.c


In [6]:
%%bash
S=more-pirate-pointers
cc -o $S $S.c
./$S
rm $S.c $S

3 -3


In [7]:
%%file null-pointer-demo.c
#include<stdio.h>

int main(int argc, char * argv[]){
    char * s[5] = { "hello", ", ", "world", "!\n", (void *) 0} ;
    char ** s_p = s ;
    while (*s_p) {
        printf("%s", *s_p) ;
        s_p++ ;
    }
    return 0 ;
}

Writing null-pointer-demo.c


In [8]:
%%bash
S=null-pointer-demo
cc -o $S $S.c
./$S
rm $S.c $S

hello, world!




### <a name=caveats>Many caveats</a>

The notion of an address being an integer glosses over many details. 

First off, there is a difference between the addresses that the CPU and the programmer see and the addresses that the bus and the RAM see. The former are _virtual addresses_ and the latter are _physical addresses_.

This is needed since many programs are running independently on the computer, making references to memory locations. Each must work in its own independent memory space, so that programs need not be concerned if their addresses conflict. This is accomplished with a _virtual memory system_ that is implemented partially in hardware and partially by the operating system in software.

This is an advanced topic and we refer to it only in passing, so that you are aware of it. If you are interested, this is definitely a very interesting thing to learn about. Almost all computers now have virtual memory; however, our Intel 8085 did not.

Second, seeking efficiency, memory stores and fetches tend to be in "buckets" rather than bytes. Much like grocery shopping, where one makes a list of all needed items for a single trip to the store, a memory store or fetch carries a _cache line_ of bytes, which includes the byte of interest but also other bytes, on the assumption that they too will soon be needed.

Third, this bucketing is carried out on various levels, creating a level 1, level 2 and level 3 cache. The idea of an integer addressable memory is not denied by these caches, but the notion that we are traveling to RAM with each memory request is. Data is batched even more than a cache line, to make sure that the slow process of accessing RAM is minimized when possible.

Fourth, the Intel architecture uses a _segmented memory_ scheme, in which memory is tagged with a few flavors, such as Data, Text, and Stack. A memory address is a pair: a segment descriptor and an integer offset within the segment. This is complicated beyond words and specific to Intel. However, other architectures are free to make exotic memory structures such as segments. These complications are best hidden, not just for the purposes of teaching, but for the purposes of C code that works on all architectures without source code changes.