# Module 1 - C Basics

## Strings

Recall that strings are arrays of characters, sometimes denoted as `char[]`, ending with the null terminator `\0` (`0x0`). How the length of a string is obtained varies depending on how the string is created and used, either on the stack or on the heap. Some ways that strings are created are as follows.

In [1]:
#include <stdlib.h>
#include <string.h>

int main(void) {
	char str0[] = "This is a string.";
	char str1[] = {'H', 'e', 'l', 'l', 'o', '\n', '\0'};
	char* str2 = malloc(13); // room for 12 characters plus '\0'
	if (str2 != NULL) strcpy(str2, "Heap string.");
	free(str2);
}

`str0` creates a stack-allocated string from a literal, and the null terminator is added automatically. `str1` spells the string out in its array form, where the null terminator must be added manually. `str2` is a heap-allocated string accessed through a character pointer (more detail in **Module 2**). The pointer must first point to allocated memory, for example from `malloc`; after that there are several ways to place a string in the heap, the most common being `strcpy`.

Due to how strings are represented, getting the length is more complicated. The compiler only helps when the string is declared as an array in the same scope, so that its size is known at compile time. For example, `str0` has the length `sizeof(str0) / sizeof(str0[0])`. Note that this count includes the null terminator. Let's break it down.

- `sizeof`: an operator that yields how many bytes its operand occupies
- `sizeof(str0)`: how many bytes `str0` uses (a character is one byte), including `\0` (18 bytes)
- `sizeof(str0[0])`: how many bytes `str0[0]` (the first element) uses (1 byte)

$18 / 1 = 18$ elements (17 characters plus `\0`)

Note the emphasis on the same scope. When an array is passed to a function, it decays to a pointer, so its size information is lost.

Otherwise, the length must be calculated manually by looping, or be given explicitly. Since getting the length of a string is one of the most common needs, `string.h` includes a `strlen()` function, which does not include the null terminator in its returned length. Nevertheless, there are other situations that require iterating through a string, and the most common way is to advance a pointer until it reaches the null terminator.

For example:

In [None]:
#include <stdio.h>

int main(void) {
	char str[] = "Some string.";
	char* temp = str;

	for (int i = 0; *temp != '\0'; temp++, i++) {
		if (i % 2 == 0) printf("%c\n", *temp);
	}
}

S
m
 
t
i
g


## Function Definitions

When a function is called, the compiler expects it to have been declared earlier in the file. In other words, functions are not hoisted. For example, the following code does not compile.

In [None]:
#include <stdio.h>

int main(void) {
	char r = functionA(10);

	return 0;
}

char functionA(int a) {
	printf("Hi %d!\n", a);

	return 'H';
}

/tmp/tmp9gk6tnhc.c: In function ‘main’:
    5 |         char r = functionA(10);
      |                  ^~~~~~~~~
/tmp/tmp9gk6tnhc.c: At top level:
/tmp/tmp9gk6tnhc.c:10:6: error: conflicting types for ‘functionA’; have ‘char(int)’
   10 | char functionA(int a) {
      |      ^~~~~~~~~
/tmp/tmp9gk6tnhc.c:5:18: note: previous implicit declaration of ‘functionA’ with type ‘int()’
    5 |         char r = functionA(10);
      |                  ^~~~~~~~~
[C kernel] GCC exited with code 1, the executable will not be executed

However, it may work under certain circumstances, such as:

In [None]:
#include <stdio.h>

int main(void) {
	functionA();

	return 0;
}

void functionA() {
	printf("Hi!\n");
}

/tmp/tmptvt74u5s.c: In function ‘main’:
    5 |         functionA();
      |         ^~~~~~~~~
/tmp/tmptvt74u5s.c: At top level:
   10 | void functionA() {
      |      ^~~~~~~~~
/tmp/tmptvt74u5s.c:5:9: note: previous implicit declaration of ‘functionA’ with type ‘void()’
    5 |         functionA();
      |         ^~~~~~~~~


Hi!


But the compiler diagnostics indicate otherwise. When a function is called before it is declared, older C compilers assume its signature to be `int f()`: a return type of `int` and unspecified parameters. This may happen to work when the actual return type is `int` (or there is no return value) and no parameters are used. In any other case the later definition conflicts with the assumed one and the result is a compiler error, as seen in the first example. Implicit declarations were removed from the language in C99, so relying on them is not portable.

There are two ways to fix this: either move the function definition before the call, or place a forward declaration before the call. A forward declaration takes the form of the normal function signature but ends with `;` instead of a `{ ... }` body. The following illustrates both.

In [None]:
#include <stdio.h>

// Method 1: forward declaration
char functionA(int a);

// Method 2: definition prior
char functionA(int a) {
	printf("Hi %d!\n", a);

	return 'H';
}

int main(void) {
	char r = functionA(10);

	return 0;
}

Hi 10!


Also note that it is completely fine to have both a forward declaration and a definition before the call. A common case is declaring all functions at the top of the file and defining them later, which gives the reader an overview of what the file contains.

## Arrays

Arrays are a foundational data structure. Under the hood, an array is just a contiguous block of bytes. How many bytes an element takes up depends on the element type, so an array can hold anything as long as all elements are of the same type, at least in certain languages, such as C.

Considering this, an array in C carries no metadata: in most expressions, and whenever it is passed to a function, it is treated as a pointer to its first element, so information such as its length is not available.
Getting the length can therefore be tricky, similar to strings (after all, strings are just character arrays).

One way is to use a sentinel, that is, an element indicating the termination. For example, strings have the null character as their sentinel. Deciding what sentinel to use depends on the type of data the array might hold.

In [None]:
#include <stdio.h>

int highestGrades(int grades[]) {
	int max = -1;
	for (int i = 0; grades[i] != -1; i++) {
		int curr = grades[i];

		if (curr > max) max = curr;
	}

	return max;
}

int main(void) {
	int grades[] = {
		100, 78, 88, 99, 89, 82, 50, 20, 74, 75, -1
	};

	int highest = highestGrades(grades);
	printf("%d\n", highest);
}

100


In this example, we are given an array of grades, `grades`. We know that grades cannot be negative, so using `-1` as the sentinel value makes perfect sense. This allows us to loop through it using `grades[i] != -1`, similar to `*str != '\0'`. Note, however, that if `grades` is used in the same scope where it was defined, getting its length is easy: `sizeof(grades) / sizeof(grades[0])`. But since it is being passed to a function, it decays to a pointer. The x86-64 assembly below, taken from an equivalent program in which the function is named `getMax` and the array holds different grades, shows this:

```nasm
0000000000001169 <getMax>:
    1169:	endbr64 
    116d:	push   rbp
    116e:	mov    rbp,rsp
    1171:	mov    QWORD PTR [rbp-0x18],rdi
    1175:	mov    DWORD PTR [rbp-0xc],0xffffffff
    ... 

00000000000011d0 <main>:
    11d0:	endbr64 
    11d4:	push   rbp
    11d5:	mov    rbp,rsp
    11d8:	sub    rsp,0x40
    11dc:	mov    rax,QWORD PTR fs:0x28
    11e3: 
    11e5:	mov    QWORD PTR [rbp-0x8],rax
    11e9:	xor    eax,eax
    11eb:	mov    DWORD PTR [rbp-0x30],0x64
    11f2:	mov    DWORD PTR [rbp-0x2c],0x5a
    11f9:	mov    DWORD PTR [rbp-0x28],0x1e
    1200:	mov    DWORD PTR [rbp-0x24],0x35
    1207:	mov    DWORD PTR [rbp-0x20],0x4e
    120e:	mov    DWORD PTR [rbp-0x1c],0x63
    1215:	mov    DWORD PTR [rbp-0x18],0x45
    121c:	mov    DWORD PTR [rbp-0x14],0x54
    1223:	mov    DWORD PTR [rbp-0x10],0x5e
    122a:	mov    DWORD PTR [rbp-0xc],0xffffffff
    1231:	lea    rax,[rbp-0x30]
    1235:	mov    rdi,rax
    1238:	call   1169 <getMax>
    ...
```

Unnecessary instructions were omitted.
Although this seems daunting, let's break it down.
The array is created on the stack by the instructions beginning at address `0x11eb`, with its first element placed at `rbp-0x30`. We can confirm this because the instruction at `0x122a` stores `0xffffffff`, which is `-1`, the sentinel. Based on the label `<getMax>` and the instruction `call`, we can infer that `0x1238` is where the function call happens.
What about the argument?
Although this will be elaborated on later, arguments are passed according to a calling convention. In this convention, the first argument is passed in `rdi`. So, since the array starts at `rbp-0x30`, the instruction at `0x1231` loads that address into `rax`, which is then copied to `rdi`.
In the function, at `0x1171`, the contents of `rdi` (in this case, the address of the array, aka a pointer) are stored at `rbp-0x18`.
The rest does not matter for now; it simply works on that address.
In the end, an array passed to a function decays to a pointer, losing any size information it had.

So, another way is to simply pass the length to the function as an additional argument. This is most commonly seen in the `main()` signature: `int main(int argc, char** argv)`.
In this case, `argv` is an array of strings, and `argc` just indicates how many strings it has.

As mentioned many times, arrays are closely related to pointers, so another notation for an array parameter uses a pointer, such as `int* arr`. In a function parameter list, this is equivalent to `int arr[]`. Just as the contents of an array can be set via `arr[i] = x`, the same can be done through a pointer. This gets into pointer arithmetic (a core feature of C). Accessing an element is analogous to dereferencing, so `arr[i] = x` is the same as `*(arr + i) = x`.

In [7]:
#include <stdio.h>


int main(void) {
	// Declare two arrays of length 20
	// Local arrays are not zero-filled automatically; each element is assigned below before it is read
	int arr[20];
	int arr2[20];

	for (int i = 0; i < 20; i++) {
		arr[i] = i * 20; // assignment using array notation
		*(arr2 + i) = i * 20; // assignment using pointer/arithmetic notation

		printf("arr[i]: %d\tarr2[i]: %d\n", arr[i], arr2[i]);
	}

	return 0;
}

arr[i]: 0	arr2[i]: 0
arr[i]: 20	arr2[i]: 20
arr[i]: 40	arr2[i]: 40
arr[i]: 60	arr2[i]: 60
arr[i]: 80	arr2[i]: 80
arr[i]: 100	arr2[i]: 100
arr[i]: 120	arr2[i]: 120
arr[i]: 140	arr2[i]: 140
arr[i]: 160	arr2[i]: 160
arr[i]: 180	arr2[i]: 180
arr[i]: 200	arr2[i]: 200
arr[i]: 220	arr2[i]: 220
arr[i]: 240	arr2[i]: 240
arr[i]: 260	arr2[i]: 260
arr[i]: 280	arr2[i]: 280
arr[i]: 300	arr2[i]: 300
arr[i]: 320	arr2[i]: 320
arr[i]: 340	arr2[i]: 340
arr[i]: 360	arr2[i]: 360
arr[i]: 380	arr2[i]: 380


## Bit Operations

Bit operations are commonly used in many systems programs, as they allow the manipulation of individual bits.

These bit operators are:
- `|`: bitwise or
- `&`: bitwise and
- `^`: bitwise xor
- `~`: bitwise not

The operations behind these bitwise operators follow the same logic as their logical/boolean counterparts.
For example, the OR truth table is:

| A | B | A OR B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Bitwise OR follows the same logic but on the bit level. For example, `1 | 2` is `01 | 10` in binary terms. Thus applying the operator is:

| Operand | Bit 1 | Bit 0 |
|---|---|---|
| `1` | 0 | 1 |
| `2` | 1 | 0 |
| Result | 1 | 1 |

So `1 | 2` results in `3`.


Considering this capability, bit operations can help with specific bit-level needs. The most common are:
- extracting a bit (or bitrange)
- setting a bit (or bitrange)
- clearing a bit (or bitrange)
- toggling a bit (or bitrange)

There are two additional bit operations:
- `<<`: logical left shift
- `>>`: logical/arithmetic right shift

These, as their name implies, shift the bits of a number.
For example, `3 << 2` means shift `3` to the left by `2` bits. In binary form: `0b11 << 2`. So the two `1`s are shifted to the left by 2: `0b1100`. Note that shifting left creates `0`s on the least significant bits.

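However, there are two types of right shifts, due to the nature of signed and unsigned numbers. An arithmetic shift fills the leftmost (most significant) bits with copies of the sign bit, so negative numbers get `1`s instead of the `0`s a logical shift would insert. This preserves the sign in two's complement. In C, right-shifting an unsigned value is always logical, while right-shifting a negative signed value is implementation-defined (most compilers use an arithmetic shift).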

For example:
Take `-90` stored in a `signed char`, which is `0b1010_0110` in binary. An arithmetic shift `-90 >> 2` results in `0b1110_1001` (`-23`). Note that the lowest bits `0b10` got "chopped off" and `0b11` was added in the most significant bits.
A logical shift would lead to `0b0010_1001` (`41`) instead.

Using a combination of the bitwise operators and bit shifts, the aforementioned needs can be met.

To extract a bit (or bitrange), three operands are needed: the number itself, a mask, and a shift.
A mask indicates "how many bits" one wants to work on. Extracting a single bit would result in the mask simply being `0b1`. However, extracting a range would require more `1`s. For example, if the range is to be 4 bits long, the mask would need to be `0b1111`. The shift ensures the extracting begins at the right place and results in just the needed number.

In [None]:
#include <stdio.h>

int main(void) {
	// The bits to be extracted are the middle four

	int a = 0b110110;
	int mask = 0b1111;
	// Shift of 1 since the "middle" starts at bit 1 (0-indexed from the right),
	// so shifting right by 1 places the range at bit 0
	int shift = 1;

	int extracted = (a >> shift) & mask;

	printf("Expected: %d; Actual: %d\n", 0b1011, extracted);

	return 0;
}

Expected: 11; Actual: 11


To set a bit (or bitrange) means forcing the bit to `1` regardless of its previous value.

Note that bit operations are not C-specific, as they exist in other languages, but they are used a lot in C code.