# Module 2 - Memory

The concept of memory is pervasive in C. Knowing how memory is laid out and how to access it is vital to many operations in C. Additionally, because C is a low-level language, the programmer has to manage memory manually, especially on the heap.

The process address space is divided into several sections; the most important ones are:
- stack
- heap
- code segment
- data segment (static [read-only], initialized, uninitialized)



## Stack

Recall that the stack grows in one direction: downwards is the norm, but some architectures grow it upwards. The stack is divided into stack frames (also known as activation records or activation frames), one per function call, which keep track of that call's local variables and arguments. This gives rise to one form of scope, the function scope. The call stack (also called the function stack, or just "the stack") works just like the stack data structure. Recall the classic operations: push and pop. For the call stack, this means pushing and popping frames and, in more detail, pushing and popping individual pieces of data.

At the macro level, a stack frame is pushed onto the stack each time a function is called and popped off when it returns. At the micro level, individual pieces of data are pushed and popped. Recall the assembly in **Module 1**: the instruction `push` (and its counterpart `pop`) pushes (pops) the operand provided to it. Under the hood, on x86-64, `push` decrements the stack pointer (`rsp`) and then stores the operand at the new top of the stack, and `pop` does the reverse. The typical `push rbp` at the start of a function and `pop rbp` (or `leave`) before its `ret` mark the beginning and end of a function's frame. Data is pushed only if it needs saving*.

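*When looking at the disassembly of C code, I often see the compiler pushing or storing data that does not need saving. This is usually because the code was compiled without optimizations (GCC's default, `-O0`), where the compiler keeps every variable in memory so the generated code stays simple and easy to debug.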

Local (automatic) variables, that is, data not declared `static` or at file scope (such data goes in the data segments), are placed on the stack. One clear example is the `highestGrades` example in **Module 1**. In `main()`, the `grades` array is declared and built. Looking at the assembly, one can see `mov DWORD PTR [rbp-0x30],0x64`. For now, think of `rbp` as the base (frame) pointer, which points to the base of the current stack frame. So, this instruction moves `0x64` (100) onto the stack at position `rbp-0x30`. Arguments can also be passed on the stack, depending on the calling convention (on x86-64 System V, for instance, the first six integer or pointer arguments are passed in registers and any further ones on the stack).

When a frame is popped, nothing is actually erased; the stack pointer is simply moved back. When a new frame is pushed, it reuses the same region of memory, so its data overwrites the old. This is why it is dangerous to return pointers to stack-allocated data.

```c
#include <stdio.h>

char* fn(void) {
	char n = 'C';
	// BUG (on purpose): returns the address of a local variable
	return &n;
}

int main(void) {
	char* r = fn();
	printf("%p\n", (void*)r);

	return 0;
}
```

Output:

```
/tmp/tmpmhe27yel.c: In function ‘fn’:
    6 |         return &n;
      |                ^~

(nil)
```


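As seen above, GCC warns that the function returns the address of a local variable (`-Wreturn-local-addr`), and it does not even return the address: the program prints "(nil)", which is how glibc's `printf` displays a null pointer. However, this check is easy to circumvent.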

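```c
#include <stdio.h>

char* fn(void) {
	char n = 'C';
	char* np = &n; // same address as before, now hidden behind a variable
	return np;
}

int main(void) {
	char* r = fn();
	// Undefined behavior: `*r` reads from fn()'s frame after it was popped
	printf("%p: %c\n", (void*)r, *r);

	return 0;
}
```

Output:

```
0x7fff3535d36f: C
```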


The built-in check does not trigger here because the pointer is returned indirectly instead of via `return &n`. The program happened to print `C`, but reading `*r` after `fn()` has returned is undefined behavior. Suppose that, after `main()` stores the address in `r`, another function is called and writes new data to `0x7fff3535d36f`. Reading through `r` would then return that new data instead of `'C'`. This is just a simple scenario, but it has dangerous implications.

## Heap

Recall that the heap is where "objects" live or, more generally, where dynamically allocated data goes. This can be as simple as a single integer.

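```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
	// Request space for one int on the heap
	int* heapInt = malloc(sizeof(int) * 1);
	if (heapInt == NULL) { // malloc returns NULL when it fails
		return 1;
	}
	*heapInt = 4;

	printf("Heap int@%p: %d\n", (void*)heapInt, *heapInt);

	free(heapInt); // release the memory once it is no longer needed

	return 0;
}
```

Output:

```
Heap int@0x5637e4839840: 4
```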


Data placed on the heap can only be accessed through pointers (see the next section), since `malloc()` hands back the address of the allocated block rather than a named variable. However, unlike stack data, which is discarded automatically when its frame is popped, heap data is not reclaimed automatically. Hence the need for memory management. The approach varies between languages: some use garbage collection (like Java, Python, JavaScript), while others require manual deallocation (C, C++, Zig). There is also a newer model, implemented by Rust through its ownership and borrowing rules, where the compiler determines at compile time when memory is freed.

In Java, heap allocations are done with the `new` keyword, such as `String str = new String("Heap!");`, and are never freed explicitly. Instead, the runtime periodically runs the garbage collector, which reclaims objects (such as the string referenced by `str`) once nothing refers to them anymore. Because of this, one does not need to worry as much about memory. When writing C, however, memory usage must be thought through. If memory is not freed once it is no longer used (a memory leak), the leaks pile up as more memory is requested over time, until at some point allocations fail or the operating system kills the process. This is why it is imperative to free data once it is no longer used and to manage the freeing properly, since mistakes cause other errors such as double frees.

There is a useful tool called `valgrind` that can detect memory errors, such as double frees, memory leaks, invalid reads/writes, etc. However, its reports can sometimes be a bit cryptic about what the error specifically means or what may have caused it.

## Pointers

Pointers are scattered all over C code and are foundational to writing C; nine times out of ten, you will encounter and/or use them. As mentioned before, pointers are merely addresses (i.e. numbers). So, `0xaef0` can be considered both an address and a "normal" number. Since machines these days are 64-bit, meaning they can use 64-bit addresses, the range of addressable locations is huge. Additionally, certain ranges of addresses are reserved for specific sections of a process (the process address space). For example, the Aru32 architecture has the following overall memory layout:

| Address Range           | Purpose    |
| ----------------------- | ---------- |
| 0x00000000 - 0x0003FFFF | Reserved   |
| 0x00040000 - 0x2003FFFF | Bootloader |
| 0x20040000 - 0xA003FFFF | User Space |
| 0xA0040000 - 0xA007FFFF | Buffer     |
| 0xA0080000 - 0xFFFFFFFF | Kernel/OS  |

with the process address space as:

| Address Range           | Section            |
| ----------------------- | ------------------ |
| 0x20040000 - 0x2007FFFF | Uninitialized Data |
| 0x20080000 - 0x2008FFFF | Constant Data      |
| 0x20090000 - 0x2018FFFF | Data               |
| 0x20190000 - 0x2098FFFF | Text               |
| 0x20990000 - 0x6098FFFF | Heap               |
| 0x60990000 - 0x609907FF | Safeguard          |
| 0x60990800 - 0x709907FF | Stack              |
| 0x70990800 - 0xA003FFFF | Unused             |

So for user code running on the IAru-0, pointers can only be valid between `0x20040000` and `0x709907FF`. If a pointer held the number `0x20de4` and a user process tried to use it, the system would see that the address lies in the reserved range and raise a segmentation fault (oh, the dreaded segfault!), crashing the program. This leads to the most common type of segfault: null pointer dereferences.

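On mainstream platforms, a null pointer is simply the address `0x0`. In C, an integer constant with the value 0 (such as `0` or `0x0`) is a null pointer constant, so `int* ptr = 0x0;` means the same thing as `int* ptr = NULL;` (and compiles without complaint), although `NULL` states the intent more clearly. Similar to the Aru32 architecture, many other systems reserve the range from `0x0` up to some address, so accessing address `0x0` (a null pointer dereference) typically crashes with a segfault. Formally, dereferencing a null pointer is undefined behavior.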

Nonetheless, pointers are how data (and even code) is referred to from anywhere in a program, and dereferencing a pointer is how that data is accessed.

A common way of thinking about pointers is as references, similar to object references in Java. (Strictly speaking, both C and Java pass arguments by value; passing a pointer copies the address, which still lets the callee reach the caller's data.) Passing pointers lets functions change data owned by their caller, which is helpful when a function needs to return multiple values or to update one of its inputs in place.



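Since pointers are just numbers, basic arithmetic can be applied to them, within limits. C only allows adding an integer to a pointer, subtracting an integer from a pointer, subtracting two pointers into the same array, and comparing pointers. Other operations, such as multiplying two pointers, are rejected by the compiler unless the pointer is first converted to an integer, and doing so anyway (or moving a pointer outside the object it points into) leads to bugs, security vulnerabilities, and undefined behavior.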

Adding integers to (or subtracting them from) pointers allows us to "move" through memory. This was seen before with arrays. Since arrays are contiguous blocks of memory, iterating through them simply means adding to a pointer and then dereferencing it. If `int* arr;`, then `*(arr + 3)` first adds 3 to the address held by `arr` and then dereferences the result. However, recall that an `int` is typically 4 bytes long. If `+3` were taken literally as 3 bytes, `arr + 3` would point into the middle of the first integer, at its fourth byte (its most or least significant byte, depending on endianness), which is a big no-no. By writing `*(arr + 3)`, we meant to get the element at index 3 of `arr` (its fourth element). Instead of making us figure out how many bytes to move, the compiler takes the size of the pointed-to type into account during pointer arithmetic, so `arr + 3` actually advances `3 * sizeof(int)` bytes, i.e. 12 bytes when `int` is 4 bytes.

A major caveat concerns void pointers. Since `void` is considered a type (or rather, the lack of one), a `void*` is simply a pointer to an arbitrary address: the type of the data at that address is unknown, which is what makes `void*` useful for generic code. To access the data behind a `void*`, convert it to a pointer of a known type, which gives the compiler the size and interpretation of the data.

```c
#include <stdio.h>

int main(void) {
	int a = 3;
	void* ap = &a;
	printf("%p\n", ap);
	ap += 2; // GCC extension: arithmetic on void* moves by single bytes

	printf("%p\n", ap);

	return 0;
}
```

Output:

```
0x7ffd237134bc
0x7ffd237134be
```


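With GCC, arithmetic on a `void*` moves by single bytes, as if `sizeof(void)` were 1. This is a GCC extension: standard C does not allow arithmetic on `void*`, and GCC warns about it with `-Wpointer-arith` (also enabled by `-pedantic`). Dereferencing, however, is an error: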

```c
#include <stdio.h>

int main(void) {
	int a = 3;
	void* ap = &a;
	printf("%p\n", ap);
	ap += 2;

	printf("%p: %d\n", ap, *ap);

	return 0;
}
```

Output:

```
/tmp/tmpbf2rqyx2.c: In function ‘main’:
   10 |         printf("%p: %d\n", ap, *ap);
      |                                ^~~
/tmp/tmpbf2rqyx2.c:10:32: error: invalid use of void expression
```

### Aside: Function Pointers

Since pointers are just addresses in memory, and code also lives in memory, pointers to code exist as well. However, in languages higher level than assembly, such as C, the only valid pointers to code are function pointers. Function pointers have a slightly unusual syntax and (obviously) require a function to point to. Their form is `ret_t (*fn)(params_t)`, where the parentheses around `*fn` are required: without them, `ret_t *fn(params_t)` would declare a function returning a `ret_t*`.

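```c
#include <stdio.h>

long func(char* str, int len) {
	(void)str; // unused in this demo
	(void)len;
	long r = 1;

	// do stuff...

	return r;
}

int main(void) {
	// `funcptr` points to a function taking (char*, int) and returning long
	long (*funcptr)(char*, int);

	funcptr = &func; // get the address of `func`
	// or alternatively, since a function name converts to a pointer:
	// funcptr = func;

	// Call through the pointer; (*funcptr)("Pointers!", 9) is equivalent
	printf("%ld\n", funcptr("Pointers!", 9));

	return 0;
}
```

Output:

```
1
```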


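The example above is merely for demonstration purposes. A clearer example is class methods: under the hood, a method is an ordinary function that receives the object as a hidden argument, and dynamically dispatched methods (such as C++ virtual functions) are called through tables of function pointers. Although C is not an object-oriented language, it can support some of the OOP pillars, which can be "emulated" via structs and function pointers.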

```c
#include <stdio.h>

// A "class" with data fields and two "methods" stored as function pointers
typedef struct _3DPoint {
	double x;
	double y;
	double z;
	void (*set)(struct _3DPoint*, double, double, double);
	void (*show)(struct _3DPoint*);
} _3dpoint_t;

// The object is passed explicitly, playing the role of `this`
void set(_3dpoint_t* point, double x, double y, double z) {
	point->x = x;
	point->y = y;
	point->z = z;
}

void show(_3dpoint_t* point) {
	printf("[%f, %f, %f]\n", point->x, point->y, point->z);
}

int main(void) {
	_3dpoint_t point;
	// "Bind" the methods by storing the functions' addresses in the struct
	point.set = set;
	point.show = show;

	point.set(&point, 2.3, 3.2, 1);
	point.show(&point);

	return 0;
}
```

Output:

```
[2.300000, 3.200000, 1.000000]
```


The field `set` is merely a pointer that, instead of pointing to data on the stack or the heap, points to the code of the function assigned to it, in this case `void set(_3dpoint_t*, double, double, double)`.

Structs will be explained in more detail in the following module.