# The (very) basics of C, and how it relates to Python

The reference implementation of Python (CPython) is written in C, and many high-performance libraries, such as NumPy, are largely written in C. Furthermore, several patterns that appear at the fringes of Python are easier to understand if you have a grasp of the basic patterns of C.

So, to avoid the problems that can arise from making a duck-typed, interpreted, and IMO 'loose' language your only touchstone for programming, we are going to dive into some very basic C. Some of this C is directly relevant to Python, and some is designed to deepen your conceptual understanding of how computers work.

C is a good touchstone, but if you already understand any language that requires manual memory management, you will already have most of what the following sections aim to convey.

### Breaking down hello world in C 

Looking at the code from the introduction
[hello_world.c](../../../edit/Intermediate%20Python/Intermediate%20Python/c_scripts/hello_world.c)

```C
#include <stdio.h>

int main() {
    printf("Hello World!");
    return 0;
}
```

This is an incredibly simple script that will teach us little more than how C code is laid out.
The first line, `#include <stdio.h>`, refers to the `standard` `input` `output` (`stdio`) library.
It directs the compiler to include the declarations of the functions, variables, types, and constants that the `stdio` library provides.
This library is stored on your computer, and the compiler will know, or be told, where to find it.
We need it here to access the `printf` function, which we use to output the string to the command line.

The `main` function is the program's entry point: when the program runs, execution starts here.
Preceding `main` is `int`, the function's return type, which means `main` hands an integer back to whatever ran the program (for example, the shell). By convention this integer is used for error handling.
The `return 0;` line is where `main` exits and returns 0 to signal success. Anything other than 0 is assumed to be an error of some kind.
Another common pattern is the `void` return type, used for functions that do not return a value.
The function body is defined within curly braces `{ }`. Unlike in Python, the indentation does not matter to the compiler, but we maintain it for readability.
The line `printf("Hello World!");` is largely self-explanatory.
The last notable feature is the semicolon `;` at the end of each statement, which terminates it. In Python a statement is terminated by a newline, but the C compiler largely ignores newlines and extra spaces between tokens. (Preprocessor directives such as `#include` are the exception: each must sit on its own line.)
Given that, this function could be written:


```C
#include <stdio.h>
int main() {printf("Hello World!");return 0;}
```

This is valid C; it's just harder to read and is thus not preferred. Readability is important!

### Creating Variables

In Python a variable is created by simply assigning it a value.

```Python
my_string = 'hello world'

print(my_string)
```

We can then output that variable by simply passing it to `print`.

The equivalent C script is as follows:

```C
#include <stdio.h>

int main() {
    char my_string[12] = "hello world";
    printf("%s", my_string);
    return 0;
}
```

Not only is this longer, it's also vastly less flexible.

The following two scripts contain this code:

[variable_hw.py](../../../edit/Intermediate%20Python/Intermediate%20Python/python_scripts/variable_hw.py)

[variable_hw.c](../../../edit/Intermediate%20Python/Intermediate%20Python/c_scripts/variable_hw.c)

The next cell runs the Python script, then compiles and runs the C program. Here are a few exercises to get you started modifying C code:

1. Try to modify the printed string in Python, and in C, to a message of your choice.
2. Try to use the `void` pattern, where the C function does not return a value.
3. Try to make both programs store and print an integer instead of a string.

The cell below the one that compiles and runs the programs has hints and discussion points, but you are strongly encouraged to try to get these three problems working without them. __Expect errors__.

In [None]:
%%bash

python3 ./python_scripts/variable_hw.py

gcc -o ./c_programs/variable_hw ./c_scripts/variable_hw.c
./c_programs/variable_hw

<details>
<summary>Task 1</summary>

<details>
<summary>Help I'm getting errors</summary>

warning: initializer-string for array of ‘char’ is too long

Look at the number you use when declaring the string variable.

Weird looking output �

Look at the number you use when declaring the string variable.

Anything else...

Go back to the original code and only change line 5.

</details>

<details>
<summary>Solution</summary>

``` python
my_string = 'my new string'

print(my_string)
```

``` C 
#include <stdio.h>

int main() {
    char my_string[14] = "my new string";
    printf("%s", my_string);
    return 0;
}
```

Why? 

C needs to know beforehand how much memory the string will take to store.
Or, more accurately, it needs to be told to allocate a piece of memory that is the same size as, or larger than, the string.
So `char my_string[50] = "my new string";` is valid, if wasteful, code.
We can also let the C compiler do some of the work for us,
so `char my_string[] = "my new string";` is also valid code.

More on this later.

</details>

</details>

<details>
<summary>Task 2</summary>

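```C
#include <stdio.h>

void main() {
    char my_string[14] = "my new string";
    printf("%s", my_string);
}
```

ERRORS!

This is where we find the danger of a `void main`, and where the state of the Jupyter notebook gets problematic.
Standard C expects `main` to return `int`. gcc will typically compile a `void main` with only a warning, but the exit status the program then hands back to the shell is not well defined (often it is whatever value happens to be left over in a register). A non-zero exit status makes Jupyter's `%%bash` magic report the cell as failed.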

</details>

<details>
<summary>Task 3</summary>


<details>
<summary>Help I'm getting errors</summary>

error: invalid initializer

or

17224 Segmentation fault: 11 

Try swapping char with int.

Still broken, probably with a segmentation fault?

Look at the `printf` format specifier: `%s` is for strings, while `%d` is the one we use for (decimal) integers.

</details>


<details>
<summary>Solution</summary>

``` python
my_int = 42

print(my_int)
```

``` C 
#include <stdio.h>

int main() {
    int my_int = 42;
    printf("%d", my_int);
    return 0;
}
```

</details>

</details>

### What can we take from this, other than C is more picky than Python?

This helps us jump to the main lesson you should learn from working with C when it comes to programming.
This lesson will help us write more reliable Python and more performant Python.


## Think about memory

Programming is as much about programming the computer's memory as it is about programming the calculation, if not more.

In this example, we have done two very simple tasks, each of which involves several things.

1. Set `my_string` to the value "hello world".

``` C 
char my_string[12] = "hello world";
```

Deconstructing this, the important points are:
- `char`, this tells the compiler that the datatype we are storing is a 'character'
- `[12]`, this tells the compiler that we will be storing 12 characters in a row

The actual data stored ("hello world") is largely irrelevant here; both of these items are there to let the compiler reserve the correct chunk of memory.
Each `char` is one byte, which on virtually every modern machine is 8 bits (a bit can be 0 or 1). There are 11 characters in the string we want to store, plus we need one null character (`'\0'`), which is a sort of invisible full stop for the computer (it tells the computer the text has ended and to stop reading).
The `[12]` is thus required to tell the compiler to reserve 12 lots of 8 bits, or 96 bits in total.
If you leave the number out, the compiler can calculate this.

Taking the Python equivalent:

```Python
my_string = 'hello world'
```

The computer still needs 12 bytes to store the 11 characters plus a null character, but on top of this Python uses additional bytes to store information about the string object itself (such as its type, its length, and a reference count). Using the following code:

```Python
import sys
my_string = 'hello world'
print(sys.getsizeof(my_string))
```
```
60
```

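Python is using 60 bytes where C used 12. The exact figure depends on your Python version and platform, but in CPython it is always far more than 12.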

2. Print `my_string` to stdout

```C
printf("%s", my_string);
```

What `printf` does is read out the bytes of the string, one after another, until it encounters the null character, which tells it to stop. However, to do this it needs to know that the memory it has been given (starting at the address of `my_string`) holds a string, and we tell `printf` this with the format specifier `%s`. There are other specifiers for other types (`%d` for `int`, `%f` for `double`, and so on), but we won't go into those here.

```Python
print(my_string)
```

Python, by comparison, is much simpler.
`print` can use the additional information stored in the extra bytes of the string object to determine its data type and length, then output it.


### So what does this mean?

In C we must be explicit about our memory: we need to define its size and also how to read it.
In Python this is all automatic but comes at the cost of memory and speed.

## Pointers

This will oversimplify pointers, but it will give you just enough information to understand some of the most important aspects of C and Python.

To work with memory at a high level we need to know two things:

1. How large is the memory object (in bytes)?
2. What is the location (address) of its first byte?

To make sense of the second point, we can think about memory as a very long list of boxes:

#IMAGE

Each box has a label (its address) and stores one byte, that is, 8 bits, each of which is a 0 or a 1.

A `pointer` is a variable that stores such a memory address.
However, pointers go one step further: they point to a memory location holding a value of a given type, and the compiler can use this to do smart things that make coding easier and memory access safer.

### Why?

The answer here again comes down to memory. If we have gigabytes of data in RAM and want to pass that data to a function, it is far more efficient to pass the location of that memory than a copy of its values.
The function can then decide how it accesses that memory, for example to make reading and writing efficient.

There is far more to this topic but for now this will suffice.

### How?

The C pointer syntax needs to be able to do the following things:

1. Define a pointer to a memory location
2. Get a pointer to a memory location

These use the `*` and `&` syntax:

A `*` denotes that we are creating a pointer, or are expecting a pointer.
An `&` preceding a variable gets a pointer to (the memory address of) that variable.

For example:
```C
// define a variable for an integer and a pointer for an integer
int A_var;
int * A_ptr;
// At this point the pointer and variable are unlinked 
// We have given them similar names because we intend 
// to link them later


// Assign A the value 8
A_var = 8;
// Get the memory location of A_var and assign it to A_ptr
A_ptr = &A_var;
// Now the pointer points to the memory (We could have done this without assigning A_var)
```
3. Update or modify a pointer to a new value


```C
// define a variable for an integer and a pointer for an integer
int A_var;
int * My_Int_ptr;
// At this point the pointer and variable are unlinked 
// My_Int_ptr is just a pointer to an integer, same in all ways
// as A_ptr but named differently to note its use case


// Assign A the value 8
A_var = 8;
// Get the memory location of A_var and assign it to My_Int_ptr
My_Int_ptr = &A_var;
// Now the pointer points to the memory containing 8

// Define a new variable and assign it
int B_var = 4;

// We can update My_Int_ptr to point at B_var's memory location
// just like we could update the value of a variable
My_Int_ptr = &B_var;

```

4. Access the data at a memory location referred to by a pointer

```C
//Reusing the first example:
// define a variable for an integer and a pointer for an integer
int A_var;
int * A_ptr;
// At this point the pointer and variable are unlinked 

// Assign A the value 8
A_var = 8;
// Get the memory location of A_var and assign it to A_ptr
A_ptr = &A_var;
// Now the pointer points to the memory (We could have done this without assigning A_var)

// We can 'dereference' the pointer to get to the value
// in essence *A_ptr is the same as A_var
printf("%d", *A_ptr);
```

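This is where pointer syntax can get a bit murky, as we use `*` both to say 'this here is a pointer' and to say 'make this pointer act like a variable'*
<details>
<summary>*</summary>
Not strictly true: in a declaration such as `int * A_ptr;` the `*` is part of the type, while in an expression such as `*A_ptr` it is the dereference (indirection) operator, which gives you the value stored at the address the pointer holds.
</details>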

To convince yourself of how this works, have a look at the supplied code and try the challenges below.

The following cell will compile and run the code:
[pointer_basics.c](../../../edit/Intermediate%20Python/Intermediate%20Python/c_scripts/pointer_basics.c)


In [None]:
%%bash
gcc -o ./c_programs/pointer_basics ./c_scripts/pointer_basics.c
./c_programs/pointer_basics

## Pointer Challenges

1. Comment out (with `//`) the assignment of A_var (line 10). Try to predict what will happen, then try to explain in words why it happened.

2. Comment out the pointer assignment (line 13). Try to predict what will happen, then try to explain in words why it happened.

Put lines 10 and 13 back.

3. After printing the value of *A_ptr on line 18, update the value of A_var, then print the value of *A_ptr again. Try to predict what will happen, then try to explain in words why it happened.

4. Now add a line that updates the value in memory directly (so don't use A_var), and then add the line of code `printf("%d\n", A_var);`.

5. We can use `printf("%p\n", A_ptr);` to print a pointer. This is commented out in the code, but you can uncomment it (one place at a time). Try to predict and explain what happens and why.

## More more more

There is much more to pointers than this; in fact, we have not yet seen their most frequent use case.
What we have learnt is that there are two ways of thinking about variables, and therefore memory.
We can use the **value**, or we can use the address, or **reference**.
The pointer is the variable that explicitly handles the reference.

### I just want to do data science / plotting / maths

Leaving C here will give you the glimmer of understanding that will allow you to better utilize Python.
The interaction between programmer and memory is the thing that is being abstracted by Python.
We will conclude with some examples of how and why this is happening before moving into the intermediate Python topics in the following notebook.


Consider the following Python code, and try to add comments around how it is working, and what overhead this is incurring compared to C:

In [None]:
# Create a variable:
a = 1
# Your analysis: 

# Print the variable:
print(a)
# Your analysis:

a = 1.1
# Your analysis:

print(a)
# Your analysis:


def my_function_no_arg():
    a = 2
    print(a)

my_function_no_arg()
print(a)
# Your analysis:

def my_function_with_arg(a):
    a = 2
    print(a)

my_function_with_arg(a)
print(a)
# Your analysis:

Let's do something similar with a different data structure:

In [None]:
print("a = b---------")
# Create a variable:
a = [1, 2, 3, 4, 5]
# Your analysis: 

# Print the variable:
print(f"a = {a}")
# Your analysis:

a = [1.1, 2.2, 3.3, 4.4, 5.5]
# Your analysis:

print(f"a = {a}")
# Your analysis:

b = a

print(f"b = {b}")
# Your analysis:

print("Popping b------")

b.pop()

print(f"a = {a}")
# Your analysis:
print(f"b = {b}")

print("the same but in functions----------")

def my_function_no_arg():
    a.pop()
    print(f"a in function = {a}")

my_function_no_arg()
print(f"a = {a}")
print(f"b = {b}")
# Your analysis:

def my_function_with_arg(c):
    c.pop()
    print(f"c in function = {c}")

my_function_with_arg(a)
print(f"a = {a}")
# Your analysis:

def my_function_with_internal(c):
    _c = c
    _c.insert(1,1)
    print(f"c in function = {c}")

my_function_with_internal(a)
print(f"a = {a}")
print(f"b = {b}")
# Your analysis:


print("and finally this---")
def my_function_with_reassignment(d):
    _d = d
    _d = [1, 2, 3, 4, 5]
    print(f"_d = {_d}")
    _d.pop()
    print(f"_d = {_d}")

my_function_with_reassignment(a)
print(f"a = {a}")
print(f"b = {b}")
# Your analysis:

See if you can understand what Python is doing here. If you can, well done, because it's not obvious.

<details>
<summary>The answer and why this matters is here:</summary>

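For int and float (and other immutable types, such as str and tuple), assignment behaves as if the value were copied. Strictly, every Python name is a reference to an object, but these objects can never be modified in place, so sharing them is harmless. This means

``` Python
a = 1
b = a
```

binds the name `b` to the same object that `a` refers to. Because that object cannot be changed, anything that 'changes' `b` (such as `b = b + 1`) creates a new object and rebinds `b`, so `a` is never affected.
Note that Python is handling data types for us here: on reassignment to a value of a new type, Python creates an object of that type (allocating memory as needed) and rebinds the name to it.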
Think about the difference between:

```Python
a = 1
print(a)
a = 1.1
print(a)
```

and

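```c
int a = 1;
printf("%d\n", a);
// a is an int for its whole lifetime, so storing 1.1
// needs a new variable declared with the float type
float a_float = 1.1f;
printf("%f\n", a_float);
```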

Same outcome, more work in C.

Back to the unintuitive behavior: lists (and other mutable types, such as dicts) are a weird pointer-y blend.

```Python
a = [1,2,3,4,5]
b = a
```

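here `b` refers to the very same list object as `a`, a bit like a second pointer to the same memory. Why? Because binding a new name is quicker than copying the whole list, potentially a lot quicker.

As a result, when you modify `b` in place (for example with `b.pop()`), you modify `a`, because both variables reference the same memory.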

This creates all of the behavior seen above. 

However, the last example reassigns, which cuts through the clutter. It does the following:

```Python
a = [1,2,3,4,5]
b = a
b = [1,1,1,1,1]
b.pop()
```

and here `a` is unchanged. This is because the right-hand side of `b = [1,1,1,1,1]` constructs a _new_ list in _new_ memory; the `=` then rebinds `b` to that new memory location, so the memory `a` references is left alone.
</details>

In a nutshell, Python tries to avoid creating new memory (copying) unless you ask it to. You must know and understand this, or the following will happen in some form:

In [None]:
a = [1, 2, 3, 4, 5]
print(f"  a: {a}")

b = a
for i in range(len(a)):
    b[i] = b[i] + i
print("add: +")
print(f"  b: {b}")

c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
print(" eq: =")
print(f"  c: {c}")

Briefly, the unaware programmer was trying to create a new list `b`, where each element is the corresponding element of `a` plus its index, and then to sum `a` and `b` element-wise into `c`.
We need not understand why they would want to do this to sympathize when they are disappointed to find that `c` comes out as twice `b` rather than `a + b`.

We, as newly informed programmers, know that `b` refers to the same list as `a`, so the first loop, by modifying the elements of `b`, was also modifying the elements of `a`. The result is logical to us, although not intuitive.

Challenge: Update the following code cell to produce the expected answer by modifying the way we create b.
Extended challenge: Look at the results of %%timeit to make your code as performant as possible, just remove the print statements first.

In [3]:
a = [i for i in range(1000)]
print(f"  a: {a}")

b = a
for i in range(len(a)):
    b[i] = b[i] + i
print("add: +")
print(f"  b: {b}")

c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
print(" eq: =")
print(f"  c: {c}")


Example %%timeit cell
Run this and make a note of the time. A good solution will be quicker, give the correct answer, and produce the correct output from the print statements (when they are uncommented).

In [4]:
%%timeit
a = [i for i in range(1000)]
# print(f"  a: {a}")

b = a
for i in range(len(a)):
    b[i] = b[i] + i
# print("add: +")
# print(f"  b: {b}")

c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
# print(" eq: =")
# print(f"  c: {c}")


69 μs ± 895 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [5]:
%%timeit
a = [i for i in range(1000)]
# print(f"  a: {a}")

b = []
for i in range(len(a)):
    b.append(a[i] + i)
# print("add: +")
# print(f"  b: {b}")

c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
# print(" eq: =")
# print(f"  c: {c}")

70.2 μs ± 1.05 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [6]:
%%timeit
a = [i for i in range(1000)]
n = len(a)
# print(f"  a: {a}")

b = [a[i] + i for i in range(n)]
# print("add: +")
# print(f"  b: {b}")

c = [a[i] + b[i] for i in range(n)]
# print(" eq: =")
# print(f"  c: {c}")

63.8 μs ± 748 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Solutions

<details>
<summary>Slow but correct</summary>

```python
%%timeit
a = [i for i in range(1000)]
#print(f"  a: {a}")

b = []
for i in range(len(a)):
    b.append(a[i] + i)
#print("add: +")
#print(f"  b: {b}")

c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
#print(" eq: =")
#print(f"  c: {c}")
```

</details>

My quicker solution was around 9% faster than the slow but correct one (63.8 μs versus 70.2 μs in the runs above).

<details>
<summary>Quicker and correct</summary>

```python
%%timeit
a = [i for i in range(1000)]
n = len(a)
# print(f"  a: {a}")

b = [a[i] + i for i in range(n)]
# print("add: +")
# print(f"  b: {b}")

c = [a[i] + b[i] for i in range(n)]
# print(" eq: =")
# print(f"  c: {c}")
```

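List comprehensions are quicker mainly because they avoid looking up and calling the `append` method on every iteration; CPython builds the list with a dedicated internal instruction instead. (The list still grows as it is built, since the interpreter does not know its final size in advance.)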

</details>

For a more visual overview of pointers and how Python uses them, you can watch [this video](https://www.youtube.com/watch?v=0Om2gYU6clE) (15 minutes long).

## Next section 
Now we can learn Python, safe in the knowledge that when we find something acting like a pointer we can debug it. 
We also know that Python will be 'helping' with our memory, but sometimes our design patterns lead it to do the wrong thing.

[02-other-data-structures](./02-other-data-structures.ipynb)

