# The Basics of Programming in C
## *A Living Tutorial With An Eye Towards Scientific Computing*

Written by Adam George Morgan

*Last Updated June 16, 2025*

The C programming language continues to be an important part of modern consumer and scientific software. The purpose of this notebook is to provide a "one-stop-shop" for the essential information needed to write and understand C code. I'm preparing it mostly for self-study, but I hope the content below can help others as well. 

This tutorial unashamedly leans into my own biases: 
1) scientific computing is my primary interest, so I may exclude certain features of C that someone interested in another discipline like security or commercial software would consider "essential";
2) Python is my mother tongue when it comes to coding, so much of my discussion focuses on the differences between Python and C. If you're unfamiliar with Python, this notebook may not be the right resource for you (admittedly, if you've successfully launched this tutorial notebook and are reading this message, then you probably know enough Python to appreciate the material).

Consider yourself warned!

As my understanding of C in particular and coding in general becomes more refined, I'll return to this document and provide updates. Please get in touch with me (say, by opening up a GitHub issue) if you have any concerns with the material here, or suggestions for major improvements.

This notebook would not even be possible without Brendan Rius' creation of a C kernel in Jupyter: https://github.com/brendan-rius/jupyter-c-kernel. Thanks Brendan for the excellent work! 

### Why Learn C? 

I genuinely love programming in Python, but you can't go your whole life using the same tool for every job. So, what can C do that Python can't? Here are some answers relevant to my own interests. 

1) *C provides a pathway to writing fast code*. Python is not optimally performant: for instance, Python loops are notoriously slow. While NumPy and SciPy provide many wrappers of battle-tested Fortran codes to mitigate some of these performance issues, the time will probably come when you need to write your own low-level code to achieve good performance. In my experience, the two most common low-level languages used in scientific computing codes are Fortran and C++. Since C++ arose as an expansion of C, knowing C can only help you get a better grasp on great codes (proviso: C++ and C are a lot more distinct than their names suggest, so while C learning supplements C++ learning, it can't replace it).
2) *C provides a pathway for writing code that runs on faster machines*. Thanks to the power of modern GPUs, high-performance computing (HPC) is no longer the exclusive province of supercomputing clusters. NVIDIA's GPUs can be programmed with the CUDA toolkit which is essentially a C/C++-based interface between CPU and GPU. So, C is helpful for understanding modern HPC (although Python- and Julia-based versions of CUDA are also supported nowadays, so this argument holds a little less water than the first one).
3) *C appears to be everywhere*. It seems like every big project I work with uses some C here and there, and I would like to know what the hell is going on insofar as is possible. 
4) Finally, *the more languages you know, the better you understand coding in general*. 

### Other Helpful Resources

- *Modern C for Absolute Beginners* by Slobodan Dmitrović. Apress, 2024. 

- *Guide to Scientific Computing in C++*, ed. 2 by Joe Pitt-Francis and Jonathan Whiteley. Springer, 2017.

- *Solving PDEs in C++* by Yair Shapira. SIAM, 2006. 

### Some Unsolicited "Wisdom"

Here are some things I've learned to keep at the front of my mind with C programming:

1) Every byte used must be accounted for.
2) Every byte used must be cared for.
3) For his spiritual betterment, John the Baptist gave up the comforts of civilization and sustained himself on locusts and wild honey. For the same reason, we must (sometimes) give up the comforts of classes. 

### Our First C Program

Let's start with a "Hello, World!" program. This will illuminate the basic structure of all C scripts. 

In [1]:
#include <stdio.h>

int main() {
    printf("Hello World!");
    return 0; 
}

Hello World!

Let's dissect this code snippet. 

1) The first line (`#include <stdio.h>`) allows us to use a header file (`.h` file) declaring the functions needed for handling inputs and outputs, including printing. The `std` in the filename is short for "standard", and the `io` is short for "input/output". The angle brackets `<, >` are used when including headers from the C standard library. `#include` is a **directive** handled by the preprocessor (the first stage of compilation), telling it to paste in the header's contents before the rest of the code is compiled.

2) You can probably figure out what `printf` does from context (note: it's taken from `stdio.h`), but what is `int main()` doing? The parentheses suggest it's a *function*, but why does it return `0` if the point of the script is just to print "Hello World!"? The answer is that every C program *must* define a `main` function: it is where execution starts. Its `int` return value is the exit status reported back to the operating system, and by convention `0` means "success" (note that this is the integer `0` and **NOT** the floating-point `0.`! more on this later). In the same way that `__init__.py` files are integral to Python's directory structure, `int main()` is integral to C's script structure.

3) To be clear: `main()` is indeed a function. Note how the `return` statement is placed inside {braces}. 

4) `int` is short for "integer", and specifies the data type of `main()`'s `return` value. In C, you must declare your variables and their data types.

5) Note how each statement in the code terminates with a semicolon `;`. Deletion of any one of these semicolons will give a compilation error.

6) Unlike in Python, formatting and indenting are *optional*. You can pick from a variety of conventions regarding where braces and line breaks go, and how many spaces to use when indenting. The important thing is to stick with the choice you make and be consistent! In this notebook, we use four-space indents for nested code, keep { opening braces on current lines, and make new lines for } closing braces.

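Strictly speaking, `main()` is a little flexible. Since the C99 standard, reaching the end of `main` without a `return` statement is equivalent to `return 0;`, so the `return` can be left out. Writing the keyword `void` in the parameter list also states explicitly that `main` takes no arguments: 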

In [2]:
#include <stdio.h>

int main(void) {
    printf("Hello World!");
}

Hello World!

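Here `void` sits in the parameter list, so think of it as being short for "void input" rather than "void output": `main` still returns an `int`, and dropping the `return` is allowed only because `main` is special (falling off its end counts as `return 0;`). Below, I'll prefer this approach since it always saves us one line of code. As noted on p.10 of Dmitrović, the `void` approach is also slightly more bug-robust than the `int main() ... return 0;` one. 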

Before moving on, I'll discuss some limitations of Jupyter C notebooks. For a typical Python notebook, the notebook is a "workspace" and variables from one block can be called in another. However, in a C notebook each block is treated as an individual script: in particular, it requires a `main()`. Additionally, in another departure from Python, C is *not compiled at runtime*. However, using a notebook makes it really seem like it is! I think the clarity that we gain by adding commentary in rich text is well worth the conceptual weirdness of appearing to make C compile and run simultaneously. 

### Declaring Variables. Data Types. 

In Python, I can instantiate a whole bunch of different sorts of variables without being very careful: I can just type `x = 1` to store the integer `1` in memory, I can type `x = ["32", 32.]` to have a list whose elements contain both string and float data types, and so forth. 

As already mentioned, however, in C you must be more careful. You need to be explicit about data types when you *declare* your variables. Here is an example where we allocate memory for three integers (by declaring them), assign two of them values, and set the third equal to the sum of the first two: 

In [3]:
#include <stdio.h>

int main(void) {
    int a, b, c; /* remember, int = integer */
    a = 1;
    b = 2; 
    c = a + b;
    printf("%d", c); /* %d is a stand-in for an int that comes in the second arg. of printf */
}

3

Here is a similar code where `a,b` are double-precision floating-point reals: 

In [4]:
#include <stdio.h>

int main(void) {
    double a, b, c; /* double = double-precision floating-point real number */
    a = 1.1;
    b = 2.2; 
    c = a + b;
    printf("%f", c); /* use %f for printing doubles instead of ints */
}

3.300000

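C also has a type `float` for single-precision floating point reals. A literal like `1.1` is a `double` by default; the suffix "f" turns it into a `float` literal. Writing `float x = 1.1` still gives you a `float` (the `double` value is converted when `x` is initialized), but `float x = 1.1f` avoids the hidden conversion. The suffix matters most inside expressions: counterintuitively, `x * 1.1` is surreptitiously carried out in double precision, while `x * 1.1f` stays in single precision. 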

Often, we'll also have to work with the type `char`: C has no "basic" `string` type, but `char` is as close as it gets. 

In [5]:
#include <stdio.h>

int main(void) {
    char a, b;
    a = 'a'; /* chars are enclosed by SINGLE quotes ONLY */
    b = 'b'; 
    printf("%c \n", a); /* use %c for printing chars. Also, \n means "new line" */
    printf("%c", b);
}

a 
b

In each of the above examples, values were assigned to `a,b,c` after they were declared. Below, we'll also see that you can assign a value during declaration. 

In addition to C having fewer data types than Python, it has no option to write custom classes! This is, to my knowledge, a big reason why C++ is more widely used than C in the scientific computing world. 

Jargon alert: the `d, f, c` appearing after the `%` in the args. of `printf` above are called **format specifiers**. C is pretty sensitive about these, & funny things can happen if you use the wrong one, so be careful!

A comment on style: in C, it's typical to use snake_case for names of variables, functions, etc. That is, it's widely considered cleaner to do this

In [6]:
#include <stdio.h>

int main(void) {
    char my_first_var, my_second_var;
    my_first_var = 'a';
    my_second_var = 'b'; 
    printf("%c \n", my_first_var);
    printf("%c", my_second_var);
}

a 
b

and not this 

In [7]:
#include <stdio.h>

int main(void) {
    char myFirstVar, mySecondVar;
    myFirstVar = 'a';
    mySecondVar = 'b'; 
    printf("%c \n", myFirstVar);
    printf("%c", mySecondVar);
}

a 
b

even though both scripts work and have the same output (I shouldn't have to say that names like `my_First_Var` constitute blasphemy instead of mere style violations). 

### The `struct` Data Type

In Python, we can organize a bunch of complicated data into a single variable using a `dict`: a Python `dict` maps key strings to values, which can take any type. In scientific computing, `dict`s are useful for cleanly organizing function inputs and outputs. Is there something similar in C we can take advantage of? Yes! C has a rough analogue of `dicts` implemented with the `struct` (short for "structure") data type. Let's see how `struct`s work by example. 

In [8]:
#include <stdio.h>
struct MyStruct {
    int my_int;
    char *my_string;
};

typedef struct MyStruct MyStructTypeName;

int main(void) {
    MyStructTypeName s = {.my_int = 32 , .my_string = "octopus"};

    printf("%d \n", s.my_int);
    printf("%s", s.my_string);
}

32 
octopus

Breaking it down...
1) We use the `struct` data type to create a family of structures called `MyStruct`. An instance of `MyStruct` contains two **members**: an integer `my_int` and a string `my_string`. Notice that a `MyStruct` instance is not declared before `main()`! 
2) The line beginning with `typedef` lets the compiler know that declaring `MyStruct` instances with the custom type name `MyStructTypeName` is OK. This allows us to easily create instances of `MyStruct`. Conventionally, the "alias" `MyStructTypeName` would just be `MyStruct`, but I think it's pedagogically cleaner to avoid the repetition in the `typedef` when seeing this for the first time.
3) Inside `main`, we instantiate a `MyStruct` instance and store it in memory using the variable `s`. Note how we initialize `s` in an array-like fashion: order is not important because we're using member labels (keep those dots in mind!).
4) The values of the members of `s` are accessed for the `printf` calls with dot-notation: `s.member_name`. In fancy language, `.` becomes the **member access operator** when acting on structures and member names. This is analogous to how Python dict values are accessed via `dict["key"]`. Strictly speaking, the syntax is closer to how the attribute values of a Python class instance are accessed, but unlike Python classes, `structs` do not support methods. All in all, structures aren't exactly Python `dict`s or classes, though they share plenty of similarities with both of these familiar ideas. 
5) Finally, it's conventional to use "EachFirstLetterIsCapitalizedNoUnderscores" names for C structures.  

It's easy to prescribe or edit the member values for a `struct` using the dot notation. For example, the code block below produces the same output as the previous one. 

In [9]:
#include <stdio.h>
struct MyStruct {
    int my_int;
    char *my_string;
};

typedef struct MyStruct MyStruct; /* now that we know about alias naming, repetition is OK */

int main(void) {
    MyStruct s;

    s.my_int = 32; /* no type needed because this is done in the definition of MyStruct */
    s.my_string = "octopus";

    printf("%d \n", s.my_int);
    printf("%s", s.my_string);
}

32 
octopus

When a `struct` is created, the computer makes room in memory for all of its members. If only one member needs to hold a value at any given time, there is a memory-saving trick: use the `union` data type, whose members all share the same memory. See Dmitrović ch.16 for more on `union`.

### The `#define` Directive

If your code involves a repeated parameter used all over the place, you may wish to give it a name without creating a variable for it, using the following trick: 

In [10]:
#include <stdio.h>
#define MY_PARAM 32

int main(void) {
    int x = MY_PARAM, y = 2 * MY_PARAM;

    printf("MY_PARAM = %d\n", MY_PARAM);
    printf("x = %d\n", x);
    printf("y = %d", y);
}

MY_PARAM = 32
x = 32
y = 64

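Thanks to `#define`, `MY_PARAM` is a **macro** and not a variable: it has no address and no storage of its own! Instead, just before compilation proper, every appearance of the name `MY_PARAM` in the code is replaced by the text `32` (appearances inside string literals are left alone, which is why the first `printf` above still prints the words "MY_PARAM"). Thus we avoid storing the repeated parameter "32" *inside* a variable. In practice, an optimizing compiler can often achieve the same saving for a `const int`, so the main benefit is usually convenience rather than memory.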

Like `#include`, `#define` is a directive for the compiler: directives are indicated with a hash symbol `#` prefix. There is an entire zoo of helpful directives and built-in macros that we don't have time to cover here. For a survey of these additional directives *et cetera*, have a look at Dmitrović ch.23.

### Simple Logical Statements

Here is an example of an `if` statement in C: 

In [11]:
#include <stdio.h>

int main(void) {
    int a = 1, b = 2;
    if (a > 1) {
        printf("I am printing a = %d", a);
    } else {
        printf("I am printing b = %d", b);
    }
}

I am printing b = 2

Remember to close any braces you open! 

We can form more complex logical statements using the "and" operator `&&`, the "or" operator `||`, and the "not" operator `!`. I remark that "or" is a mathematical/nonexclusive or.

In [12]:
#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3;
    if (a > 1) {
        printf("I am printing a = %d", a);
    } else if (b != 1 && c < 3) {
        printf("I am printing b = %d", b);
    } else if (a > 1 || b == 32 || c > 1) {
        printf("I am printing c = %d", c);
    } else {
        printf("I'm all out of ideas!");
    }
}

I am printing c = 3

The `switch` statement allows for conditional execution in the case where each condition can be phrased in terms of a single variable being equal to a single value. `switch` is much less flexible than `if/else`, but its rigidity is compensated by its elegance. 

In [13]:
#include <stdio.h>

int main(void) {
    int x = 1;

    switch(x) {
        case -1:
            printf("x is -1.");
            break;
        case 0: 
            printf("x is precisely 0.");
            break ; 
        case 2:
            printf("x is 2.");
            break;
        default:
            printf("Oops! x is not equal to any of the given options."); 
            break; 
    }
}

Oops! x is not equal to any of the given options.

For simple conditional execution, C also includes a very handy **ternary operator** or **conditional expression** `statement ? a : b` which returns `a` if `statement` is true and `b` otherwise.  

In [14]:
#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c;
    c = (a > 1) ? a : b;
    printf("%d", c);
}

2

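**Warning**: in a condition, C treats `0` as "false" and *any* nonzero value as "true", and comparisons like `a > 1` evaluate to the `int` values `0` or `1`. C was not designed around a boolean data type. If you really want to use boolean data types, put `#include <stdbool.h>` into your script to get the type `bool` and the values `true` and `false`. We won't use `stdbool` in this tutorial.  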

### Loops 

The only thing to worry about with `for` loops in C is setting up the index properly. Defining a loop index consists of three statements:

1) the index declaration and initialization,
2) the range of values the index can take, and
3) the rule for incrementing the index between passes through the loop.

Here's an example that prints the sum of the first `N = 10` natural numbers (endpoint-inclusive). 

In [15]:
#include <stdio.h>

int main(void) {
    int N = 10, sum = 0;
    for (int k = 1; k <= N; ++k) { /* "++k" = shortcut for "k = k + 1" */
        sum += k; /* shortcut for "sum = sum + k" */
    }
    printf("%d", sum);
}

55

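In general, `k++` and `++k` do not mean the same thing! Both increment `k`, but as expressions they have different values: `k++` evaluates to the *old* value of `k` (the increment happens afterwards), while `++k` increments `k` first and evaluates to the *new* value `k + 1`. In the loop above the value of the expression is never used, so either form would work. 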

The same result can be obtained with a `while` loop, at the cost of declaring the index outside the loop, so that `k` stays in scope for the rest of `main()`: 

In [16]:
#include <stdio.h>

int main(void) {
    int N = 10, sum = 0, k = 1;
    while (k <= N) {
        sum += k;
        k++; 
    }
    printf("%d", sum);
}

55

### Functions with Arguments 

So far, we've seen that every C script must include a function `main()`. Defining your own C functions is pretty straightforward once you understand that declarations are imperative: naturally, we must specify the data type of the *inputs* and *outputs* of any function we want to define. Here is an example: 

In [17]:
#include <stdio.h>

int my_func(int n) {
    /* Takes in an integer "n" and returns "32 x n" */
    return 32 * n;
}

int main(void) {
    printf("%d", my_func(2));
}

64

I emphasize that the custom function `my_func` is *global*, hence it can be called within `main()`. We could also declare `my_func()` but not define it, use it in `main()`, and define it after `main()`. Thus C supports an elegant development process where you build your `main()` out of a bunch of helper function declarations, and then define the helpers after `main()` is written (such a design philosophy is conducive to efficient, modular testing). We could *not*, however, define `my_func()` or any other helpers inside `main()` (you can declare other functions in `main()`, though).

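As in Python, functions can have multiple inputs, separated by commas. If you want a function `my_func()` to accept no input arguments, you can declare it as `[placeholder_type] my_func(void)`. The `void` can be left out, but before the C23 standard an empty pair of parentheses in a declaration means "arguments unspecified" rather than "no arguments", so writing `void` is the safer habit. If, on the other hand, you want a function to have no *outputs*, it should be declared with the return type `void`. 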

### Worked Example: Checkerboard

*Based on Chapter 1, Problem 5 in Shapira's book* 

Let's put together everything we've learned about logic, loops, and functions to perform a classical coding exercise. The goal is to write a script that prints an 8$\times$8 "checkerboard" where red tiles are represented by the symbol "+" and black tiles are represented by the symbol "-". We assume that the top-left tile on the board is red. 

I've chosen to solve this problem by defining two helper functions that tell us what symbol to place in the i,j-th tile of the board. From there, `main()` loops through each tile and determines its symbol. 

In [18]:
#include <stdio.h>

int even_and_odd(int i, int j) {
    /* True if first arg is Even and second arg is Odd */
    return i % 2 == 0 && j % 2 == 1; /* a % b = a mod b; agrees with Python when a >= 0 and b > 0 */
}

char board_tile(int i, int j) {
    /* Get the i,j ^th tile in the checkerboard */
    int even_row_odd_column = even_and_odd(i, j);
    int odd_row_even_column = even_and_odd(j, i);
    return even_row_odd_column || odd_row_even_column ? '-' : '+'; /* Use single quotes for chars! */
}

int main(void) {
    /* Define number of tiles per axis in the checkerboard
    The qualifier "const" helps the compiler by saying that 
    a var's value will not change during execution */
    const int n = 8; 

    /* Fill up the checkerboard one tile at a time*/
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            printf("%c ", board_tile(i, j)); /* note the white space */

            /* Do a linebreak at the end of each row of the board */
            if (j == n - 1) {
                printf("\n");
            }
        }
    }
}

+ - + - + - + - 
- + - + - + - + 
+ - + - + - + - 
- + - + - + - + 
+ - + - + - + - 
- + - + - + - + 
+ - + - + - + - 
- + - + - + - + 


### Addresses and Pointers

C is all about fine-grained control over memory. We've seen this idea appear in practice already: our C variables need to be *declared* so the compiler knows how much memory to set aside for them. Declarations, however, are only the tip of the iceberg. The computer has a finite number of bytes available in its memory at any given time. Each of these bytes can be labelled with a [hexadecimal](https://en.wikipedia.org/wiki/Hexadecimal) integer index, making this index the "address" of the particular byte. Note that C distinguishes hexadecimal notation with a prefix `0x`. For practical reasons that will become clear soon, one often needs to know the hex address of a given variable `x`. Given `x`, we can immediately determine its address using the operator `&`, as the script below demonstrates. 

In [19]:
#include <stdio.h>

int main(void) {
    double x = 7.89; 
    
    printf("The value of x is %f \n", x); 
    printf("The address of x in memory is %p", &x); /* addresses require the "p" format specifier */
}

The value of x is 7.890000 
The address of x in memory is 0x7ffd476c6988

There are a few things to note here. 
 
1) While `x` is a `double`, `&x` is an address (its type is `double*`), which `%p` prints here as a hexadecimal integer.
2) If you run the code box several times, the value of `&x` changes.

What's going on here? 

When the script above is run, the computer figures out an address where it can store the value of the `double x`. This address is very likely to change when the script is re-run: modern operating systems typically randomize where a program's memory is placed on each run (a security feature known as address space layout randomization), and in any case nothing obliges the system to hand out the same bytes twice. So, the output from the code block above *does make sense*!

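The memory taken up by the variable `x` in this code is **statically allocated** in the loose sense I'll use throughout this notebook: the amount of memory `x` needs is fixed at compilation time. A note on terminology: in the language of the C standard, a local variable like `x` has *automatic* storage (its memory is set aside when `main()` is entered and released when `main()` returns), and "static" is reserved for variables that persist for the whole run of the program. Either way, we don't have to worry about requesting or releasing `x`'s memory by hand.  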

A **pointer** is a variable that stores the address of some other variable. For example, if we initialize an integer 
`int x = 2`, then the variable `p_x` defined by `int *p_x = &x` (or: `int* p_x = &x`) is a pointer containing the address of `x`. We then say that `p_x` "points to `x`". 

If `p_x` is any pointer, we can access the value of the variable `p_x` points to with `*p_x`. That is, the **dereferencing** operator `*` acting on pointers is the inverse of the operator `&` acting on "normal" variables. 

Let's put the discussion from the previous two boxes into practice with a very basic script. 

In [20]:
#include <stdio.h>

int main(void) {
    int x = 2; 
    int *p_x = &x;
    
    printf("The value p_x points to is %d. \n", *p_x); 
    printf("The address of x (equivalently, the value of p_x itself) is %p.", p_x); 
}

The value p_x points to is 2. 
The address of x (equivalently, the value of p_x itself) is 0x7ffe41dd0d04.

There is a generic pointer type `void*`. A `void*` can hold the address of a variable of any type, but it cannot be dereferenced directly: it must first be converted back to a typed pointer such as `int*`. 

Strictly speaking, the `%p` format specifier expects a `void*` argument, so the fully portable way to print a pointer is `printf("%p", (void*)p_x)`. It looks like the cast isn't necessary in a C Jupyter notebook (try it out in the above code box, for example), though a compiler may warn about its absence when pedantic warnings are switched on. 

If, for some reason, you really want a pointer `p_x` to be assigned a value but aren't ready to *really* assign it a value, you can declare and initialize `p_x` as the null pointer `NULL`. This is a special macro made available by standard headers such as `stdio.h` and `stddef.h`, and it plays a role like Python's `None`. 

### Worked Example: Switching Two Variables with Pointers

*Based on Problem 4.2 in Pitt-Francis and Whiteley*

Suppose we have two integers `j` and `k` that have been assigned some values. Our task is to switch these values using pointers. 

In [21]:
#include <stdio.h>

int main(void) {
    /* Initialize two integer vars */
    int j = -1;
    int k = 32;

    /* Make a clone of j's value for later 
    The qualifier "const" tells the compiler
    that j_clone is not going to change 
    during execution */
    const int j_clone = j; 

    /* Define pointers */
    int *p_j = &j, *p_k = &k;

    /* Stage 1: Replace j with k using pointers */
    *p_j = *p_k;

    /* Stage 2: Replace k with original j-value via j_clone */
    *p_k = j_clone;

    // Check all works as expected.
    printf(" Actual result: j = %d, k = %d \n ", j, k);
    printf("Expected result: j = 32, k = -1");
}

 Actual result: j = 32, k = -1 
 Expected result: j = 32, k = -1

Fun fact: thanks to the `new` keyword, in C++ the above exercise can be done by introducing an auxiliary `int*` instead of the auxiliary `const int` we called `j_clone`. That is, we can switch using pointers *only*! This illustrates that, even though C++ is an "offshoot" of C, the two languages have some nontrivial foundational differences. 

### Arrays

I have a hard time conceiving of any scientific computing code that *doesn't* use some version of an "array". Those of us who are comfortable in MATLAB or NumPy may take arrays for granted owing to their ubiquity in these settings, and therefore will probably not think much about how arrays are stored in memory. As you probably expected to hear, however, in C the relationship between arrays and memory is essential knowledge.  

An array is just a chunk of memory that stores repeated instances of a particular data type. Each instance stored in the array, be it an `int`, `double`, or `char`, is called an **entry** of the array (just like in MATLAB, NumPy, etc.). The relationship between entries is encoded by their proximity in memory. 

Often, it's helpful to treat an array like a special type of pointer. An array of length `n` points to the zeroth entry in the array (C uses zero-indexing), and all subsequent array entries are stored consecutively in the next `n-1` chunks of memory. In fact, C supports you treating arrays like pointers. 

Don't believe me? Check out this code where we initialize an array of length 3, then compare the addresses of its *entries* to the *array* itself. 

In [22]:
#include <stdio.h>

int main(void) {
    int a[3] = {0, 1, 2}; /* array length is specified with [3] here */

    for (int k = 0; k < 3; k++) {
        printf("%d \n", a[k]); /* k^th entry: just like in Python */
        printf("%p \n", &a[k]); /* address of k^th entry */
        printf("%p \n", a + k); /* this will also give the address of k^th entry, & convince you arrays are pretty much pointers! */
    }
}

0 
0x7ffde6b73630 
0x7ffde6b73630 
1 
0x7ffde6b73634 
0x7ffde6b73634 
2 
0x7ffde6b73638 
0x7ffde6b73638 


We've found that `&a[k] = a + k`! In particular, `&a[0] = a`, & you now 100% believe that arrays are very similar to pointers that point to the array's zeroth entries.

Note that the addresses of `a[0]` and `a[1]` (and `a[1]` and `a[2]`) differ by four. This is because the entries of `a` are of type `int`, and so take up four bytes of memory on this machine (the C standard doesn't fix the size of an `int`, but four bytes is typical). If you change the code to make the entries of `a` the `double` type, you'll find that consecutive addresses differ by eight because doubles typically take up eight bytes. You should also experiment with the code above when `a` has `char` entries to reverse-engineer how many bytes `char` variables take up... you can check your answer using the `sizeof` operator. 

Finally, you should be asking "why isn't `a + 1` actually outputting the address `a` plus one?". The answer is that adding one to a pointer takes you to the address of the *next entry*, which depends on the type being pointed to. In light of the discussion of the previous paragraph, then, in the above code block we expect the address `a + 1` to be `a` plus *four* bytes, and this is indeed what we see. Similarly, adding an `int n` to a pointer takes you `n` entries (not `n` bytes) away. 

For more on the delicateness of the relationship between arrays and pointers, see the discussion on [array-to-pointer decay here](https://stackoverflow.com/questions/1461432/what-is-array-to-pointer-conversion-aka-decay). It's also important to note that arrays are automatically converted to **actual** pointers when used as function arguments!

At this point, you know enough to be careful with arrays. Let's try a similar experiment on a two-dimensional array (or matrix). We can think of a 2D array `a` as an "array whose entries are arrays", or (with a sizeable grain of salt, since `a` really converts to a pointer to its zeroth *row*) as a pointer to a pointer. 

In [23]:
#include <stdio.h>

int main(void) {
    int a[3][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}}; /* C does row-first indexing */

    for (int j = 0; j < 3; j++) {
        for (int k = 0; k < 3; k++) {
            printf("%d \n", a[j][k]);
            printf("%p \n", &a[j][k]);
            printf("%p \n", a[j] + k);
        }
    }
}

0 
0x7ffe34d05f20 
0x7ffe34d05f20 
1 
0x7ffe34d05f24 
0x7ffe34d05f24 
2 
0x7ffe34d05f28 
0x7ffe34d05f28 
3 
0x7ffe34d05f2c 
0x7ffe34d05f2c 
4 
0x7ffe34d05f30 
0x7ffe34d05f30 
5 
0x7ffe34d05f34 
0x7ffe34d05f34 
6 
0x7ffe34d05f38 
0x7ffe34d05f38 
7 
0x7ffe34d05f3c 
0x7ffe34d05f3c 
8 
0x7ffe34d05f40 
0x7ffe34d05f40 


Unfortunately, C isn't natively capable of printing any given array all at once. If you want to see all entries of an array, you'll have to use a `for` loop to run through all entry indices as we've done above (but see below for an exception in the case of `char` entries). To a Pythoner this seems like an awful idea, but remember that C is so fast that `for` loops don't cause performance issues. 

In C, strings are arrays with character entries: since we've by now internalized that arrays are basically pointers tailored for storing consecutive memory slots, this makes perfect sense. Thus, the "entries" of a string can be accessed with the square bracket subscript notation. 

In [24]:
#include <stdio.h>

int main(void) {
    char my_string[] = "A string is an array of chars"; /* arrays can be initialized with [] instead of [num_entries] */
    printf("%c\n", my_string[0]);
    printf("%c\n", my_string[1]);
    printf("%c\n", my_string[5]);
    printf("%c\n", my_string[13]);
    printf("%s\n", my_string); /* unlike generic arrays, strings can be printed all at once! */
}

A
 
i
n
A string is an array of chars


Before going further, it's important to note that C stores 2D arrays in memory by unwinding row-wise. That is, the zeroth row is first in memory, then the first row, and so on. As noted in section 1.14 of Shapira, this is important when it comes to efficiency considerations while looping through arrays. 

### Worked Example: Pascal's Triangle

*Adapted from section 1.19 in Shapira*

Given an integer $n$, we want to print an array giving the top $n\times n$ slice of [Pascal's triangle](https://en.wikipedia.org/wiki/Pascal%27s_triangle). We orient our triangle so that the "1" at the tip is in the bottom-left corner of the array, necessitating a little bit of care in our `for` loops. 

In [25]:
#include <stdio.h>

int main(void) {

    const int n = 3;

    int a[n][n];

    /* Fill the left col and bottom row with 1's */
    for (int i = 0; i < n; i++) {
      a[0][i] = a[i][0] = 1;
    }

    /* Pull existing array elements to form the triangle from the bottom-up */
    for (int i = 1; i < n; i++) {
      for (int j = 1; j < n; j++) {
        a[i][j] = a[i][j-1] + a[i-1][j];
      }
    }

    /* Now we have to print the triangle from the top-down */
    for (int i = n - 1; i >= 0; i--) {
      for (int j = 0; j < n; j++) {
        printf("%d ", a[i][j]);
        if (j == n-1) {
           printf("\n");
        }
      }
    }
}

1 3 6 
1 2 3 
1 1 1 


Admittedly, the formatting of the print looks a little wonky for $n>3$, but the point is to learn the practical mechanics of array construction and manipulation. 

### Recursion

C also supports **recursion**, meaning that a function is allowed to call itself in its definition. Here's an example where we use a naive recursion to compute $j^k$, where $j,k$ are integers and $k \geq 0$. 

In [26]:
/* can use the "unsigned" type to stress that we're only dealing with nonnegative exponents */
int power(int base, unsigned exp) {
    return exp ? base * power(base, exp - 1) : 1;  
}

#include <stdio.h>

int main(void) {
    printf("2^9 = %d", power(2, 9));
}    

2^9 = 512

### Dynamic Memory Allocation

Recall that so far all of the memory for our variables has been statically allocated: the amount of memory each variable needs is settled during compilation. (Strictly speaking, the local variables we've been using have what C calls *automatic* storage, but the key point stands: their sizes are decided up front.) While the particular value of the variable is (pretty much) allowed to change throughout program execution, the amount of memory taken up by the variable cannot change. For instance, we cannot statically allocate memory for an array of length three and then change the array's length to 32 later in the program. This is a big limitation in scientific computing, where we often want to continually append entries to an array as we pass through a loop. 

The limitations of static memory allocation can be bypassed using **dynamic memory allocation**. In dynamic allocation, memory is allocated during execution instead of during compilation. Dynamically-allocated memory is flexible, but needs to be carefully tracked in a program. So, a simple variable like `const int x = 32` can be statically allocated with no issues (indeed, dynamically allocating memory for `x` is counterintuitive and needlessly confusing!), but a changing string will definitely need to be dynamically allocated (see below). 

The total portion of memory available for dynamic memory allocation is called the **heap**. 

A (possibly) helpful analogy: static memory is like the food a *voyageur* packs in his canoe when he begins his trip, and dynamic memory is like the food he finds by foraging or trapping during the trip. 

Let's see how to perform dynamic memory allocation via example. 

### Worked Example: Changing Strings

Consider a string variable `my_string` taking the value `"a string"`. We want to change the value of `my_string` to `"a new string"`. 

Since strings (remember of course that by "C strings" I mean "C arrays whose values are `char`s") cannot be assigned (though they can, confusingly, be initialized), we can't just declare `my_string` initialized with value `"a string"` and then set `my_string = "a new string";`. Try it yourself if you don't believe me!  

Even though we can't assign values to arrays, the `string.h` header file includes a helpful function `strcpy(s1, s2)` (short for "string copy") that copies the value of the string `s2` into the string `s1`. This solves the problem, right? 

In [27]:
#include <stdio.h>
#include <string.h> /* header for working with strings */

int main(void) {
    char my_string[] = "a string"; 

    printf("Before: %s\n", my_string); 

    strcpy(my_string, "a new string"); /* copy the second argument string into the first one */

    printf("After: %s\n", my_string); 
}

/tmp/tmpivhiolht.c: In function 'main':
    9 |     strcpy(my_string, "a new string"); /* copy the second argument string into the first one */
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/tmpivhiolht.c:5:10: note: destination object 'my_string' of size 9
    5 |     char my_string[] = "a string";
      |          ^~~~~~~~~


Before: a string
After: a new string


[C kernel] Executable exited with code -11

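Wrong! This code throws up warnings because `"a new string"` has 12 characters while `"a string"` has 8 characters, so when we try to copy `"a new string"` into `my_string` there's not enough room. The warning message is helpful here! Note that the warning talks about sizes 13 and 9 instead of sizes 12 and 8 because each C string includes an extra "terminator character" (the null character `'\0'`) to signify its end. Worse, a warning doesn't stop the program from compiling and running: `strcpy()` writes past the end of `my_string`, which is undefined behavior. Here the printed output happens to look right, but the executable then exits with code -11, which on Linux typically indicates a segmentation fault. 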

Holy moly! Even with a helper function we can't solve the problem with static memory allocation. Let's see an alternative solution using dynamic memory allocation. As usual, discussion will follow after the code. 

In [31]:
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* needed for dynamic memory allocation */

int main(void) {
    char* my_string = malloc(9 * sizeof(char)); /* dynamically allocate memory for my_string */

    /* If the memory allocation succeeded, continue */
    if (my_string) {
        strcpy(my_string, "a string"); /* assign my_string the value "a string" */

        printf("Before: %s\n", my_string); 

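        /* reallocate the memory for my_string to consist of more bytes; keep the result in a
           temporary pointer so the original block isn't lost if realloc() fails and returns NULL */
        char* tmp = realloc(my_string, 13 * sizeof(char));

        /* Proceed only if reallocation was successful */
        if (tmp) {
            my_string = tmp; /* safe to overwrite the old pointer now */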

            strcpy(my_string, "a new string"); /* reassign my_string the desired final value */

            printf("After: %s\n", my_string); 
        }
    }
    free(my_string); /* finally, free the memory you allocated previously */
}

Before: a string
After: a new string


This gives a correct, clean answer with no size-related warnings! Let's understand why. 

1) Instead of statically allocating `my_string` as in the first "solution", here we dynamically allocate this variable using the function `malloc()` (for "memory allocation"). You tell `malloc()` how much memory you want to allocate for a given variable (in this case, (8 + 1) times `sizeof(char)`, the number of bytes taken up by a `char`) and it sets this memory aside at runtime. More specifically, `malloc()` returns a pointer to a reserved chunk of heap memory with the given size. 
2) Since we've set aside memory for `my_string`, we can use `strcpy()` to give it the value `"a string"`.
3) To change the value of `my_string`, we re-allocate the memory allotted to this variable using `realloc()` (for "re-allocate"). `realloc()` takes in two arguments: the first is a pointer to the memory block whose size we want to change, and the second is the new size in bytes. In our example, we want to change the size of `my_string` to (12 + 1) times `sizeof(char)`. Since `realloc()` may have to move the block to a new address to find enough room, it returns a pointer to the resized block, and that returned pointer is the one we must use from then on.
4) Once memory is successfully reallocated, we can use `strcpy()` again. Easy enough!
5) Finally, **any memory that is dynamically allocated must be freed (using `free()`) before the program terminates**. In this simple program we could technically get away with forgetting `free()` but in general this is not a good idea. Consider a modification of the above exercise where we had to change the values of a huge number of strings. If we dynamically allocated memory for each string but didn't free it when the string's value was changed, we could easily run out of usable memory and crash the program! The unnecessary loss in available heap memory due to an imbalance between allocations and `free()` calls is known as a **memory leak**.

There is an additional dynamic allocation function called `calloc()`. This works like `malloc()`, but takes two arguments (the number of entries and the size in bytes of each entry) and initializes every byte of the allocated memory to 0. 

*A final remark on the changing string exercise*: in my brief experience with C I've seen lots of programmers pooh-pooh the use of `strcpy()` because, as we've discovered, incorrectly tracking the number of characters in the source and destination strings can lead to warnings/errors. For this reason, you could make the argument that a function like [`memcpy()`](https://www.geeksforgeeks.org/memcpy-in-cc/) would be more robust here. However, for the purposes of demonstrating the efficacy of dynamic memory allocation, the problems with `strcpy()` are pedagogically desirable! 