# Theory and perspective

## Introduction

Python is a Turing-complete scripting language, which means it is a very useful tool for solving problems with logical algorithms. Put simply, it tells your computer how to crunch data or do computations for you.

If you are reading this you are most likely a biology student who wishes to analyze some data.
This is what I intend to teach you here. In my personal experience the adoption of programming languages is not hindered by a lack of “how to code” literature. The internet is full of sources that teach you how to code; what they do not teach you is how to think like the people who created the languages or tools you are using. This matters for **you** especially,
because your training as a biologist differs in some fundamental aspects from that of mathematicians, computer scientists and physicists. These differences in training enable you to understand and investigate biological processes in a way that I never could, but they also hinder you in developing and utilizing *code*. For this reason I will focus more on the way of thinking than on syntax and libraries. Once you have understood the fundamental concepts you should be able to close the gaps using the [official documentation](https://docs.python.org/3/) and [tutorial](https://docs.python.org/3/tutorial/index.html).

Let me begin by quoting the first paragraph from the [official tutorial](https://docs.python.org/3/tutorial/index.html).

> Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

I assume this paragraph did not teach you much, which can be attributed to two gaps in your knowledge. The first gap is your vocabulary, or your knowledge of definitions. This is knowledge that you can easily acquire by consulting textbooks or Wikipedia. I attempted to mark the terms that are most strongly affected by this gap here:

> Python is an easy to learn, powerful *programming language*. It has *efficient* *high-level* *data structures* and a simple but effective approach to *object-oriented* programming. Python’s elegant *syntax* and *dynamic typing*, together with its *interpreted* nature, make it an ideal language for *scripting* and rapid *application* development in many areas on most *platforms*.

So understanding these terms is a hurdle, especially because most of them are suspiciously familiar, like *efficient* for example. I doubt, however, that you measure efficiency in *operations* and *memory* consumption, and this is exactly how the term *efficient* should be understood in this paragraph. As mentioned before, though, vocabulary is not the major challenge. Let me highlight two more words for you:

> Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a **simple** but effective approach to object-oriented programming. Python’s **elegant** syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

The terms here show the second gap in your knowledge: the way you approach, understand and interact with **problems**. Yes, for you a problem is probably a hurdle, an inconvenience, an obstacle; for me it is the foundation of scientific study. For me science consists of **problems** and **solutions**, which fit together like puzzle pieces and enable us to understand and control the world. You can only publish if you have both a **problem** and a **solution**: if you publish an observation, you publish something you believe will help someone else to **solve** a question. So this person has a question or **problem** and you provide a **solution**. This question-answer relationship is anchored rather deeply in computer science, where you as the user want to achieve something, thereby defining a **problem**, and then create your **solution** based on the work of other software developers.

I chose the previous example to show you this rather alien way of thinking. It may seem familiar to you now, but unless you previously studied a mathematics-related subject I assure you it is not. To stress this point let us discuss **simple** and **elegant**. What do they mean here? I would claim they refer to a very mathematical concept of perfection, which we may embody in the sentence: “Say precisely enough”. This means everything that needs to be said is said and anything else is not. This is often achieved by very few strict and solid definitions. Python, for example, was initially lauded for its rather small number of [reserved terms](https://docs.python.org/3.13/reference/lexical_analysis.html#keywords) and [built-in functions](https://docs.python.org/3/library/functions.html). It was considered **simple** and **elegant**, because it used **few** terms and combined them into a bigger system. It is **simple**, because it has **few** constituent parts, and **elegant**, because they achieve the larger goal by their interaction. It is in the interaction of the objects, concepts and definitions that beauty can be found, and this is what is referred to here. Teaching a course on mathematical and logical aesthetics is beyond me, however, so I will summarize with: it takes roughly 3 years to learn and you are fine hobbling along and ignoring it. What you cannot ignore, however, is the way of thinking around it.

Let me [quote](https://en.wikiquote.org/wiki/Donald_Knuth) Donald Knuth, a rather well-known computer scientist, on what makes a good programmer:

> The psychological profiling [of a programmer] is mostly the ability to shift levels of abstraction, from low level to high level. To see something in the small and to see something in the large. 

What he refers to can be described as the ability to change perspective: to switch from a cellular view to the organ, to the organism and to the entire ecosystem depending on the question asked and, more importantly, to stay quiet about the aspects that do not matter for the question. In other words, to **say precisely enough**.

This way of thinking makes great engineers and physicists, as they are able to find the common characteristic of a brown bear, a moose and a sack of grain on the road. They all are obstacles that slow down traffic. They all have a position on the road, which can be expressed by the distance along it. This information is sufficient to steer traffic away from the obstacle and is therefore precisely enough for a road manager.

The central mechanism they use to describe things is the *equivalence*-class. *Equivalence* means we can exchange entities in a context without anything changing. If someone is thirsty it does not matter if we hand them water or ice tea, therefore water and ice tea are *equivalent* in the context of quenching thirst. In the same way a 100 €-bill and a 10 €-bill are equivalent for buying one bag of flour. It does not matter which one you have, it buys a bag of flour, at least until inflation ruins this example. An *equivalence*-class describes all things that are *equivalent* within a context. Since a brown bear, a moose and a sack of grain all block the road they are part of the same *equivalence*-class.

So once it is clear what the things share, they also *decide* to perceive only the characteristics they deem relevant. If something blocks the road they only need to know where to drive. Since the road is a long line they only need to know how far from the beginning of the road the obstacle is placed. The pitfall is already integrated in the example: removing a sack of grain and removing a living animal are two quite distinct tasks requiring distinct equipment. Note that they would not attempt to attach the nature of the obstacle to their description, but would prefer to say something like “bear spray required for removal”. While a bear always requires bear spray, there might be other obstacles that can be removed with bear spray, so “bear spray required for removal” is the larger *equivalence*-class.
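If you are curious how such *equivalence*-classes might look in code, here is a minimal sketch. You do not need to understand the syntax yet. The obstacle attributes (`position_km`, `removal`) and the helper `group_by` are made up for this illustration.

```python
from collections import defaultdict

# Hypothetical obstacles; the attribute names and values are invented.
obstacles = [
    {"name": "brown bear", "position_km": 12.5, "removal": "bear spray"},
    {"name": "moose", "position_km": 3.0, "removal": "patience"},
    {"name": "sack of grain", "position_km": 7.2, "removal": "forklift"},
]


def group_by(items, key):
    """Sort items into classes: items with the same key value are equivalent."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item["name"])
    return dict(classes)


# For the road manager everything on the road is equivalent: it blocks traffic.
blocks_road = group_by(obstacles, key=lambda obstacle: "blocks the road")

# For the removal crew the required equipment defines the classes.
by_removal = group_by(obstacles, key=lambda obstacle: obstacle["removal"])

# The road manager only perceives the positions along the road.
positions = sorted(obstacle["position_km"] for obstacle in obstacles)

print(blocks_road)
print(by_removal)
print(positions)
```

The same three things end up in one class or in three classes, depending only on which characteristic we decide to look at.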

Now you may wonder why you were trained to think differently. The answer is quite simple: this way of thinking makes quite poor observers. If you hand them a piece of paper and tell them to observe a rat for 30 minutes there is a good chance they will write five sentences: “Rat walked to the far end of the cage. Rat walked to the center of the cage. Rat ran around the cage. Rat walked back to far end of the cage. Rat walked around the cage.” You know that this is not “precisely enough”, but they will only know after you begin asking questions.

The correct use of abstraction is, next to a few fundamental mathematical concepts used to abstract problems, the difference between a capable and an incapable programmer. It is the difference between endless despair and effortless lightness, and it is what I hope to demonstrate and teach you in this course.

## Course plan

After the philosophical part let us look at an example so we know what we are talking about. Let us assume you have a friend Alice and a friend Bob. Alice is a computer-vision specialist, which means she can find things in images. Bob researches cancer. Bob is interested in the growth of cancer cells, so he gets five Petri dishes, adds his cell culture and photographs them with a microscope every day for twelve days. Assume it looks like this image from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:384_microwell_plate_imaged_with_2.5_x_magnification_in_3_channels_with_ZEISS_Celldiscoverer_7_%2830614936632%29.jpg):

![ExampleImage](https://upload.wikimedia.org/wikipedia/commons/6/64/384_microwell_plate_imaged_with_2.5_x_magnification_in_3_channels_with_ZEISS_Celldiscoverer_7_%2830614936632%29.jpg)

Since counting all these cells would take a long time and also risks introducing human mistakes, Bob asks Alice for help. Alice runs a computer-vision program over the images and creates a few comma-separated-values files (.csv) as output. She then goes on vacation. Since he has no idea how to evaluate the files Alice sent him, Bob approaches you for help.

So the goal of the course is to take a few comma-separated-values files and create a few plots to learn something about the data they contain. To begin this process we will first talk about data, **variables**, **values** and **operators**. These are the fundamental building blocks of our code. Afterwards we will talk about **control-structures**, like **loops** and **conditional-instructions**. These allow our code to react to its inputs and process an arbitrary amount of data. We will then talk about **functions**, **classes** and **modules**. These help us organize our code and ensure we only deal with the things we really need to think about. In the end we will use external modules to visualize and analyze the data. This is the final step that helps us answer Bob's questions.

So the course is separated into five parts with separate content and goals:

| Part | Content                     | Goal                                              |
| ---- | --------------------------- | ------------------------------------------------- |
|    1 | Theory and perspective      | You have an idea how programmers think            |
|    2 | Fundamental building blocks | You can add numbers                               |
|    3 | Control structures          | You can write short simple programs               |
|    4 | Classes and modules         | You can write more complex programs               |
|    5 | External modules            | You visualized data and can now learn on your own |

We will follow this example through the course and use it to illustrate the different steps.

## Approaching a problem

So let us begin with the first question: How do we start?
Which questions are relevant?
What do we need to proceed?

Please partner up into groups of two or three people and write down in the next cell what you need to know to begin your plan. Please remember “Say precisely enough”.

Behind this spoiler is my suggested answer. It is not *the* right answer, but I will claim that it is *a* right answer, as there is more than one. Please do not open the spoiler until you are confident that your answer in the cell above is correct.

<details>
  <summary>Click to reveal suggestion</summary>

I propose that we need to understand the **problem** before we can design the **solution**. So we first have to ask Bob what he wants, or how he measures growth, and second what Alice already provided us with.

</details>

So first let us ask Bob what he means by growth. “Well, the size and the number,” he answers. What does this mean for us? It means we have to find a way to visualize the change in the number of cells and in the area they occupy in the images. To do this we first have to calculate both. If we just consider the number of cells, we want to have a table describing the number of cells in each Petri dish on each day. So we want to create something like:

| Day | Dish 1 | Dish 2 | Dish 3 | Dish 4 | Dish 5 |
| --- | ------ | ------ | ------ | ------ | ------ | 
|   1 |     12 |     21 |     31 |     15 |     27 |
|   2 |     20 |     39 |     57 |     24 |     62 |
|   3 |     43 |     76 |    112 |     61 |    112 |
|   4 |     87 |    151 |    209 |    119 |    205 |
|   5 |    172 |    299 |    421 |    235 |    398 |
|   6 |    351 |    612 |    871 |    472 |    772 |
|   7 |    721 |   1224 |   1721 |    932 |   1398 |
|   8 |   1404 |   2450 |   3554 |   1791 |   2765 |
|   9 |   2900 |   5011 |   7212 |   3451 |   5132 |
|  10 |   5832 |  10182 |  14781 |   6827 |  10091 |
|  11 |  10915 |  19923 |  28732 |  13001 |  19872 |
|  12 |  19983 |  35871 |  50321 |  25874 |  38762 |

Remember the numbers are made up; they just visualize what we want to achieve.

So now that we know where we are going we should ask where we are starting, or in other words, what Alice provided. So we look at the files Alice sent and see that they are all named similarly:

```
Day_1_dish_1_zoom_3.csv
Day_1_dish_2_zoom_3.csv
Day_1_dish_3_zoom_3.csv
Day_1_dish_4_zoom_3.csv
Day_1_dish_5_zoom_3.csv
Day_2_dish_1_zoom_5.csv
Day_2_dish_1_zoom_3.csv
```

It seems like Alice encoded so-called **meta-data** into the file names. **Meta-data** describe something that is not described in the data itself, in this case the day of recording, the dish that was used and the zoom factor of the microscope. This is rather convenient for us, since these data are often provided in separate files or within the file itself, which requires a little extra effort to combine the information again.
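To give you a first impression of how such meta-data can be recovered, here is a minimal sketch that takes one of the file names above apart. The function name `parse_file_name` is my own choice, and the sketch assumes every file follows the `Day_<day>_dish_<dish>_zoom_<zoom>.csv` pattern exactly.

```python
def parse_file_name(file_name):
    """Split a name like 'Day_1_dish_2_zoom_3.csv' into its meta-data."""
    stem = file_name.rsplit(".", 1)[0]  # drop the '.csv' ending
    parts = stem.split("_")             # ['Day', '1', 'dish', '2', 'zoom', '3']
    return {
        "day": int(parts[1]),
        "dish": int(parts[3]),
        "zoom": int(parts[5]),
    }


meta_data = parse_file_name("Day_2_dish_1_zoom_5.csv")
print(meta_data)
```

A file that does not follow the pattern would make this sketch fail, which is usually what you want: it is better to notice an unexpected file than to silently count it for the wrong day.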

So now let us take a look into one of the comma-separated-value-files to see what data she managed to extract from his images. Assume we see something like this:

Since I made up the example I have a rough idea what the values are supposed to mean. With this example I wish to show how ambiguously data is often represented and how we can work with it anyway.

So now that we have looked at our inputs and our goal we can begin creating a plan on how to get there. Now is the time to exercise some abstraction and impose a structure on our solution, so I want you to get together in groups of two or three people and write a simple set of instructions to get to the table we defined as our goal. The goal is to turn them into code later, so your steps should be concrete enough to follow but free of unnecessary detail. Please remember “Say precisely enough”.

Once again, there usually is no perfect solution, not only because we are all human and all make mistakes, but because the way you approach a problem mirrors the way you think. The code you write is read by other people, just like the plan you just designed together, so it should follow not only your way of thinking but also theirs. This means that in software there are solutions that are preferred by convention and tradition. They are used because they always were, and they are therefore familiar to most users. This means neither that they are good nor that you have to follow them, just that other people will recognize and understand them faster.

<details>
  <summary>Click to reveal suggestion</summary>

What we are doing is essentially counting so:

1. For every file we do the following:
	1. We open the file
	2. We figure out what day and dish it is
	3. We create a counter for the number of cells
	4. We create a counter for the area covered by the cells
	5. We ignore the first line
	6. For every line we do the following:
		1. We increase the cell counter
		2. We add the cell area to the cell-area counter
	7. We save the cell-counter
	8. We save the area counter
</details>

The solution is an algorithm, a set of clear instructions that lead to a defined result. The difference between the algorithms you were taught when you worked on cell cultures or mixed chemicals to prove that something existed within a cell, and computer algorithms, is the participation of the executing party. If I tell you to grab a glass of water and put it on my table, you will grab it and put it upright on my table with the water still in it, because you inferred what I wanted. To achieve this you used your human experience and your ability to reason about the world to assume “He wants to drink the water later, so I have to ensure it is not spilled.” Computers cannot do this, because they have no human experience and cannot reason like you. A **computer is a logical machine**, a predictable system. It has, by design, no ability to act outside the strict parameters it was given.

To visualize the difference between the way you reason and the way a computer does, imagine a landscape full of hills and creeks. When you think, you wander through this landscape: you walk up onto a hill and look around, then you explore a little valley, and then you may slowly wander to your final destination. Now imagine a large metal ball roughly 3 meters in size and place it on a slope. What will it do? It will roll down the hill, not sideways, not upwards, just down until it comes to rest. This is how computers reason. If they begin at a certain point in the landscape they always reach the same final solution, the same final state. The commands you give them decide where the hills and valleys are, and the parameters decide where they start. This is why some people tell you that instructions for a computer have to be very detailed, or in our analogy that the landscape has to be very finely crafted, with tiny valleys in the slopes a few centimeters wide. This is not correct. Instructions for a computer **do not need to be detailed**, but **you have to know how the logic will flow**, or in our mental image, how the ball will roll.

Neither your program nor your computer will deceive you, purposefully misunderstand your instructions or arbitrarily deviate from them; only humans have enough freedom in their thinking to do this. You may have the impression that your program or machine-learning algorithm does these things, but it just follows its logical path. This means that the burden of fault lies entirely with us, who instruct the machines, and in a few rare cases with cosmic rays that might flip a bit. In exchange for this burden we gain control: it is our choice how the logical flows of our programs are organized and how they react to their inputs. A small lab rat may refuse to participate in an experiment or sabotage it, but today even our most powerful computer will do as instructed.

After this little side trip let us return to the algorithm we wrote. The rhetorical question is: was this algorithm made for a computer? The answer is: not really. It is a set of instructions made for us, so that we can teach a computer how to do it.
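As a preview of where the course is heading, the list from the suggestion above could be translated into Python roughly as follows. This is only a sketch under assumptions: the file contents are invented, the real column layout of Alice's files is not known here, and I assume the cell area sits in the second column. To keep the example self-contained it reads from text in memory instead of real files.

```python
import csv
import io

# Made-up file contents; the "cell_id,area" layout is an assumption.
example_files = {
    "Day_1_dish_1_zoom_3.csv": "cell_id,area\n1,10.5\n2,8.0\n3,12.5\n",
    "Day_1_dish_2_zoom_3.csv": "cell_id,area\n1,9.0\n2,11.0\n",
}

cell_counts = {}
cell_areas = {}

for file_name, content in example_files.items():    # for every file
    file = io.StringIO(content)                      # "open" the file
    parts = file_name.rsplit(".", 1)[0].split("_")   # figure out day and dish
    day, dish = int(parts[1]), int(parts[3])
    cell_counter = 0                                 # counter for the cells
    area_counter = 0.0                               # counter for the area
    reader = csv.reader(file)
    next(reader)                                     # ignore the first line
    for row in reader:                               # for every line
        cell_counter += 1                            # increase the cell counter
        area_counter += float(row[1])                # add the cell area
    cell_counts[(day, dish)] = cell_counter          # save the cell counter
    cell_areas[(day, dish)] = area_counter           # save the area counter

print(cell_counts)
print(cell_areas)
```

Every line of the plan reappears as one or two lines of code, which is the reason we wrote the plan first.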

The first thing this little list gives us is **smaller problems**. “The man who moves a mountain begins by carrying away small stones.” (attributed to Confucius). To solve our **big problem** we need to break it up into smaller ones. Then we look at the smaller ones and repeat the process recursively until the solution is either obvious or someone has already solved it. Most problems you encounter were already solved; unfortunately you do not know which ones, so choosing which problem to focus on is challenging. Since I want to teach you something, I will focus on the problem that is in my opinion the most educational at the moment.

## How computers work

The first problem I would like to discuss is increasing a counter, because it teaches us a lot about computers. So let us play bad news, good news and terrible news. Bad news: “This will be rather theoretical and more difficult to understand.” Good news: “In almost all cases it does not matter.” Terrible news: “When it matters it will cost you between a few hours and a paper.” So what am I talking about?

### Data representation and memory

I am talking about **data representation**, or how computers “remember” things. You may have heard that computers are all “ones and zeros”. This refers to the way **memory** works in a computer. Remember, the people who built them are always trying to find the simplest solution. So what is the most basic, simplest thing to remember? The answer they arrived at was truth. So the memory of a computer is based on an element called a **memory-cell** that is either filled or empty, “true” or “false”. Everything else is built from these cells. Your phone number, your program and the pictures on your phone are all saved in such cells. This is kind of a hassle, since it is difficult to figure out what each cell actually means. Therefore there are some **conventions** on **interpreting** them.

Probably the simplest thing to memorize after a True/False or **boolean** value is a **number**. You all know decimal numbers like “19”, “15” or “9”. If you recall your earlier education you should remember that the value of a number depends on its digits and their positions. So “19” means calculate ```1*10 + 9*1```. This becomes relevant because there are simpler systems. If you wish to reduce the number of characters to remember, you can drop the character “9” and simply read “15” as ```1*9 + 5*1```. You can continue this process until you are left with two characters, “1” and “0”. In theory you could reduce further, but then you no longer have an elegant way to write larger numbers. Since this system has only two characters it is called binary. Now, if you paid attention, the cells have two states, “filled” and “empty”, and we have two digits, “1” and “0”. We can therefore use the cells to write binary numbers like “1001”, which translates to ```1*(2*2*2) + 0*(2*2) + 0*2 + 1*1```, or 9.
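The positional reading described above can be sketched in a few lines. The function name `value_of` is made up for this illustration; Python's built-in `int()` already does the same job when you pass it a base.

```python
def value_of(digits, base):
    """Read a string of digits as a number in the given base."""
    value = 0
    for digit in digits:
        # Shift what we have one position to the left, then add the new digit.
        value = value * base + int(digit)
    return value


nineteen = value_of("19", 10)     # 1*10 + 9*1
fourteen = value_of("15", 9)      # 1*9 + 5*1
nine = value_of("1001", 2)        # 1*8 + 0*4 + 0*2 + 1*1

print(nineteen, fourteen, nine)

# The built-in int() agrees with our hand-written version.
print(int("1001", 2) == nine)
```

Note that the same characters “15” stand for fifteen in base 10 but for fourteen in base 9: the value depends on the convention used to interpret them.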

So a **computer** saves numbers as **binary numbers** made up of ones and zeros. There is a wrinkle, however. Since a computer is a real physical object it can only save a limited number of ones and zeros, so we have to squeeze multiple numbers into the same memory. How do we do this? The simple answer: we write them one behind the other and remember how long they are. It is convention to define the **length of a number as a multiple of 8 memory-cells**. So we have numbers that are 8 cells, 16 cells, 32 cells or 64 cells long. There are also longer numbers but they are uncommon. There are advantages to choosing 8, like its proximity to 10 and the fact that 8 digits in binary correspond to two digits in hexadecimal numbers, but these are not relevant now.
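To get a feeling for what these lengths mean, the following sketch lists the largest number that fits into each of them and writes the number 19 into 8 cells. It only counts upwards from zero and ignores negative numbers, which need an additional convention not covered here.

```python
# Largest number that fits into a given number of memory-cells,
# assuming all cells are used for the value itself.
largest_by_width = {width: 2 ** width - 1 for width in (8, 16, 32, 64)}

for width, largest in largest_by_width.items():
    print(width, "cells hold numbers from 0 to", largest)

# The number 19 written into exactly 8 cells, padded with zeros on the left.
nineteen_in_8_cells = format(19, "08b")
print(nineteen_in_8_cells)
```

Each additional cell doubles the range, which is why 64 cells are enough for almost every counter you will ever need.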

Considering that this was a lot of rather dense information I would suggest you take a **break** and let the new information settle. Maybe you have some questions that would otherwise only surface later and that you would like to have answered. Let us attend to them before we dive deeper into the inner workings of a computer.

Now that we have learned how data like numbers are stored we should discuss how we operate on them. First we have to find our data-cells. To achieve this we need an easy way to give every cell a number, or **enumerate them**. We call this number the **address** of the cell. Because giving every single cell its own number would leave us with a lot of numbers, we combine cells into addressable units. The single memory-cells are called **bits**. Roughly speaking, if the unit a computer handles in one step contains 64 bits it is a 64-bit architecture, and if it contains only 32 bits it is a 32-bit architecture. This is relevant if you need to use code from the last millennium, but for Python it does not matter, as opposed to the next step.

Once we have an address for every cell we can do a little magic. Since every cell can contain a number and an address is a number, we can save addresses in cells. The cell containing the address is called a reference or a pointer, because it references or points at another cell. If we store a bunch of numbers behind the referenced memory-cell we can find them all with just one address and a second cell that tells us how many cells there are. However, running around with a bunch of numbers is rather cumbersome, so we **give** our single **memory-cell** or array of **memory-cells** a **name** like ```cell_counter```. This is called a **variable**. A **variable** is something that stands for something else. In computer science this is always something that is saved in memory-cells.
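Python hides the addresses from you, but the idea of a name that refers to memory-cells is still visible. The following sketch, with made-up values, shows one name for a single value, one name for a whole array of values, and two names that refer to the very same cells.

```python
cell_counter = 0             # a name for a single value
areas = [10.5, 8.0, 12.5]    # one name for a whole array of values

number_of_areas = len(areas)  # the stored length of the array
first_area = areas[0]         # the first value behind the name

# A second name for the same cells; nothing is copied here.
same_areas = areas
same_areas.append(9.0)

print(number_of_areas, first_area)
print(areas)                 # the change is visible through both names
print(same_areas is areas)   # both names refer to the same object
```

This behaviour regularly surprises beginners: changing the list through one name changes what you see through the other, because there is only one list.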

### Computation

Now we can start doing something with our cell. The first thing to do is probably replacing whatever is in it with something we want, like the number ```0```. So we write ```cell_counter = 0``` and read it as “Take the **memory_cell** called ```cell_counter``` and assign it the value 0.” The reading-aloud part is meant literally, as [pointing and calling](https://en.wikipedia.org/wiki/Pointing_and_calling) helps you to actively focus on the process and hopefully improves retention. So please do it in the beginning until your brain does it on its own.

As you saw, ```=``` is not the mathematical “=” but the so-called **assignment-operator**. An **operator** is something that takes one or multiple things and turns them into another. A better example than the **assignment-operator** is the addition operator ```+```, which works as you would expect from mathematics. If we write ```cell_counter + 1``` we read it as “be whatever is in the **memory_cell** called ```cell_counter``` in addition to ```1```”. If we write ```1 + cell_counter``` we read it as “be ```1``` in addition to whatever is in the **memory_cell** called ```cell_counter```”.

You may note that the **addition-operator** does not write anything to a **memory-cell**, it just stores the result temporarily. To store it permanently we need to use the **assignment-operator** again. So we write ```cell_counter = cell_counter + 1``` and read “Take the **memory_cell** called ```cell_counter``` and assign it the value you obtain by taking the **memory_cell** called ```cell_counter``` in addition to ```1```”. Operators get their power from the fact that they can simply be combined into more complex things. We will later encounter more operators like ```-```, ```*``` or ```==```. The last one, ```==```, is the comparison operator; it is true if the things on the left and right are equal and false otherwise. So ```1==2``` would be read “be ```true``` if ```1``` is equal to ```2```, be ```false``` otherwise”.
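You can check these readings yourself with a few lines. The names `unchanged` and `is_equal` are only there so that we can look at the intermediate results.

```python
cell_counter = 0

cell_counter + 1                  # the sum is computed and then thrown away
unchanged = cell_counter          # so the memory-cell still holds 0

cell_counter = cell_counter + 1   # now the sum is assigned back to the cell

is_equal = 1 == 2                 # comparison first, then assignment

print(unchanged)
print(cell_counter)
print(is_equal)
```

Note that Python writes the two truth values as `True` and `False`, with a capital first letter.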

We will do some reading exercises now. Please partner up and read the following lines to each other. Then write down what you read out in the empty lines below. I advise you to put your finger below the symbol you are reading. It might seem foolish, but it should make the task easier and also stresses that we read from left to right.

```a = 5```

<details>
  <summary>Click to reveal expected answer</summary>

Take the **memory_cell** called ```a``` and assign it the value ```5```.

</details>

```a + b```

<details>
  <summary>Click to reveal expected answer</summary>

Be whatever is in the **memory_cell** called ```a``` in addition to whatever is in the **memory_cell** called ```b```.

</details>

```c = (a + b) + c```

Here I smuggled in a new pair of symbols ```()```. During your work new concepts will appear all the time and then you have to research them. In this case you may recall from your math education, that whatever is in the brackets gets done first so you have to adapt the reading accordingly.

<details>
  <summary>Click to reveal expected answer</summary>
    
Take the **memory_cell** called ```c``` and assign it the value you get by first taking whatever is in the **memory_cell** called ```a``` in addition to whatever is in the **memory_cell** called ```b```, and then adding whatever is in the **memory_cell** called ```c```.

</details>

Now that we have some practice let us return to our counter. Remember we wanted to increase it and wrote ```cell_counter = cell_counter + 1```. This is a little verbose, so let us just write ```cell_counter += 1```, which means the same thing but is written differently, saving us a little thought and time.

Now let us experience emergent behavior by combining what we already learned about memory cells. Let us assume we use 8 bits to represent our cell-counter and add ```1``` to it.
Let us consider a few cases.

1. The ```cell_counter``` value is ```0```.
2. The ```cell_counter``` value is ```52```.
3. The ```cell_counter``` value is ```255```.

The computer does simple addition, so if it adds ```1``` to ```5``` you can emulate the process by writing them below each other and using [carry-arithmetic](https://en.wikipedia.org/wiki/Carry_(arithmetic)), as shown in the example below. Remember that binary only has 0 and 1. So if you add 1 to 1 your result is 0 and you carry 1.

| Bit-value  | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| ---------- | --- | -- | -- | -- | - | - | - | - |
| ```1```    | 0   | 0  | 0  | 0  | 0 | 0 | 0 | 1 | 
| ```5```    | 0   | 0  | 0  | 0  | 0 | 1 | 0 | 1 | 
| Carry over | 0   | 0  | 0  | 0  | 0 | 0 | 0 | 1 | 
| Result     | 0   | 0  | 0  | 0  | 0 | 1 | 1 | 0 | 

As you can see the result is ```00000110```, or in other words ```128*0 + 64*0 + 32*0 + 16*0 + 8*0 + 4*1 + 2*1 + 1*0 = 4 + 2 = 6```.
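The column-by-column procedure from the table can also be written down as a small function. This is a sketch for illustration, not how Python adds numbers internally; the name `add_binary` is made up, and both inputs are assumed to be bit strings of the same length.

```python
def add_binary(first, second):
    """Add two equally long bit strings column by column, right to left."""
    carry = 0
    result = ""
    for bit_a, bit_b in zip(reversed(first), reversed(second)):
        total = int(bit_a) + int(bit_b) + carry
        result = str(total % 2) + result  # the digit that stays in this column
        carry = total // 2                # the digit carried to the next column
    return result, carry


result, carry = add_binary("00000001", "00000101")  # 1 + 5
print(result)
print(int(result, 2))
print(carry)
```

The function also returns the carry that is left over after the leftmost column, which is worth keeping an eye on in the exercise below.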

Please repeat the above process for the cases 1 - 3 in the cell below.

<details>
  <summary>Click to reveal expected answer</summary>
    
1. 0 corresponds to ```00000000``` so the result of the addition should be ```00000001``` or 1.
2. 52 corresponds to ```00110100``` so the result of the addition should be ```00110101``` or 53.
3. 255 corresponds to ```11111111``` so the result of the addition should be ```100000000``` or 256. This is an issue, since the cell is only 8 bits wide but we would need 9 bits to represent it.

</details>

Now what happens if we add up so many numbers that the result no longer fits into our memory-cells? Well, the computer simply does as it is told: it adds the numbers up anyway and, when it saves the result, writes down only the 8 bits it has room for. So 255 (```11111111```) + 1 (```00000001```) suddenly becomes 0 (```00000000```). This is an error called an overflow. It is typical of errors that are not obvious when you write a program, do not cause your program to crash, and ruin the data you process with it. To avoid it you have to ensure there is always enough space in your memory-cells. This is not a problem for normal Python, but in practice you will use a lot of code other people wrote, where it is relevant. “numpy” is a good example. It is one of the things you do not have to worry about until they appear in front of you and wreck your results. There will be more, there is no complete list, but rest assured you are not the first person to encounter them and there are solutions for whatever problem you may face.
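You can provoke an overflow yourself without installing anything. The sketch below imitates an 8-bit cell in two ways: by keeping only the lowest 8 bits by hand (the function `add_8bit` is made up for this purpose) and with the 8-bit number type `c_uint8` from Python's standard `ctypes` module. It does not use numpy.

```python
import ctypes


def add_8bit(a, b):
    """Add two numbers but keep only the lowest 8 bits of the result."""
    return (a + b) & 0b11111111


wrapped = add_8bit(255, 1)
normal = 255 + 1                              # a normal Python number just grows
from_ctypes = ctypes.c_uint8(255 + 1).value   # an 8-bit cell drops the ninth bit

print(wrapped)
print(normal)
print(from_ctypes)
```

Note that nothing here raises an error or prints a warning. The program happily continues with a cell count of 0, which is exactly what makes an overflow so dangerous.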

### Instructions

Now we know how data is stored and manipulated, but we have not yet talked about how a **computer knows what to do**, or how our algorithm exists inside the computer. As you can imagine it is connected to the **operators** we discussed earlier.

If you look at the algorithm we wrote before you may realize that it is written line by line. This is a very simple and straightforward concept, so we may as well adopt it for storing the algorithm. So we create a **list** or **array** of **instructions** and follow them from top to bottom.

This raises two questions: **how** do we store instructions and **what** instructions do we store? Remember, the people that created this wanted a few simple parts to build this language. So they decided to store the **instructions** as numbers in **memory-cells**, and they used one specific **memory-cell** to remember where in the list they were. This one is called the *program counter* or *instruction pointer*.

Regarding what **instructions** to store, they chose the smallest possible ones. The resulting **instruction**-set is called [assembly](https://en.wikibooks.org/wiki/X86_Assembly/GNU_assembly_syntax). We are looking at it not because you will use it, but because it helps us to understand some concepts better, so instead of the real commands we will use a few **operators**. The first concept we are going to explore is the **atomic** **instruction**. **Atomic** means no longer divisible, so an **atomic instruction** or **atomic operation** is one that can not be split into smaller **instructions**. You can begin a long philosophical debate about which **instructions** should be **atomic** and which ones should not be, but this is pointless, because an **atomic instruction** is whatever the computer-chip-manufacturer built as an **atomic instruction** or your programming language declares as **atomic**.

To solve the next exercises we will define a few **atomic instructions** in the form of **operators**. The first new operator is the *goto*-operator. While it still exists in older programming languages like C++, it is already extinct in newer ones like Python. Since I sincerely hope you will never have to use it outside of the following exercise, which teaches you how computers deal with complex programs, we will simplify it to a “goto line” that sets the program counter to a specific line.

As you might have guessed, the exercise includes going to different lines in a small piece of code. If we always do this it is quite pointless, so we need a way to **control** how we move through the code. First something needs to be decided, so we introduce the ```<``` or “less than”-operator. It is ```True``` if whatever is on the left is smaller than whatever is on the right. So ```4 < 5``` is ```True``` and ```6 < 6``` is ```False```. Next we introduce a **keyword** that executes **instructions** only if the comparison is ```True```. Then we combine them into an **operator** we call ```if_less``` that executes the next line if the first value on its right is smaller than the second value on its right.

To better illustrate look at the next example:

```
cell_counter = 0
if_less cell_counter, 2
  cell_counter += 1 
if_less cell_counter, 1
  cell_counter += 1 
```

So line by line:
1. “A memory cell called ```cell_counter``` is set to the value ```0```.”
2. “Since the current value of ```cell_counter``` (```0```) is less than ```2```, execute the next line.”
3. “Add the value ```1``` to the memory-cell called ```cell_counter```.”
4. “Since the current value of ```cell_counter``` (```1```) is not less than ```1```, do not execute the next line.”
5. Is not executed.

Thus after the last step ```cell_counter``` has a value of ```1```.

Now let us summarize the **operators** this exercise is going to use:

| Operator      | Example                 | Description                                                                        |
| ------------- |  ---------------------- | ---------------------------------------------------------------------------------- |
| ```+=```      | ```cell_counter += 1``` | increases the variable on the left by the value on the right                       |
| ```-=```      | ```cell_counter -= 1``` | decreases the variable on the left by the value on the right                       |
| ```if_less``` | ```if_less 4, 5```      | executes the line below if the first value on the right is smaller than the second |
| ```goto```    | ```goto 4```            | sets the *instruction-pointer* to the value on the right and continues from there  |

Now I will provide an example that reduces ```cell_counter``` until it is ```10```, assuming it starts at a larger value.

```
if_less 10, cell_counter
    cell_counter -= 1
if_less 10, cell_counter
    goto 0
```

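So if ```10``` is smaller than ```cell_counter``` we reduce it by ```1```. If ```10``` is still smaller than ```cell_counter```, we start again from the beginning: ```goto 0``` sets the *instruction-pointer* back to the first line, because we count the lines starting from ```0```. So we **loop** around.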
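
If you are curious how a machine could follow such a list of instructions, here is a rough sketch of one in Python. It is only an illustration under several assumptions: the names ```run``` and ```value_of```, the way each instruction is written as a tuple, the starting value ```13``` and the safety limit ```max_steps``` are all made up for this example. You do not need to understand every line yet. The point is that a single number, the *instruction-pointer*, decides which line is executed next.

```python
def value_of(item, cells):
    """Return the number behind item: a cell name is looked up, a number is used as is."""
    return cells[item] if isinstance(item, str) else item


def run(program, cells, max_steps=10_000):
    """Follow the instructions in program until the instruction-pointer leaves the list."""
    pointer = 0  # the instruction-pointer, lines are counted from 0
    for _ in range(max_steps):  # safety limit against endless loops
        if pointer >= len(program):
            break
        name, *arguments = program[pointer]
        if name == "+=":
            cells[arguments[0]] += value_of(arguments[1], cells)
            pointer += 1
        elif name == "-=":
            cells[arguments[0]] -= value_of(arguments[1], cells)
            pointer += 1
        elif name == "if_less":
            smaller = value_of(arguments[0], cells) < value_of(arguments[1], cells)
            pointer += 1 if smaller else 2  # run or skip the next line
        elif name == "goto":
            pointer = arguments[0]
        else:
            raise ValueError(f"unknown instruction: {name}")
    return cells


# The countdown example from above, one tuple per line.
countdown = [
    ("if_less", 10, "cell_counter"),   # line 0
    ("-=", "cell_counter", 1),         # line 1
    ("if_less", 10, "cell_counter"),   # line 2
    ("goto", 0),                       # line 3
]

print(run(countdown, {"cell_counter": 13}))  # {'cell_counter': 10}
```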

Now it is your turn: form groups of two or three and write some "pseudo-assembly-code" that
sets the ```cell_counter``` to ```0``` and increases it in a **loop** until it reaches ```50```.

<details>
  <summary>Click to reveal expected answer</summary>
    
As usual, there is more than one correct solution.
Here is my suggestion:


```
cell_counter = 0
if_less cell_counter, 50
    cell_counter += 1
if_less cell_counter, 50
    goto 1
```
    
</details>

Good, after this you know more about assembly than most programmers. So why did I pull you through this side-quest? First of all, a lot of people struggle with *multi-threading* because they do not grasp the concept of **atomic instructions**; you now know what to google. Second, you will encounter **functions** while you program, and the concept of the *instruction-pointer* helps with understanding some of the more arcane ways to use them. Third, you may read about **instruction**-pipelines being a bottleneck in modern central-processing-units.

Let us begin with the last word-salad. As you have learned, all these instructions are just numbers in **memory-cells**, which tell a chip what to do. The challenge comes once you realize just how many numbers there are and how long it takes to move the content from a **memory-cell** on your hard-drive or in your *random-access-memory* to your chip. If you are interested feel free to read about [memory-latency](https://en.wikipedia.org/wiki/Memory_latency), but I advise you not to bother that much. Short summary: “You can not have all the instructions on the chip and you do not want to fetch a new one every time, so you order a package.” This package then goes into an **instruction**-pipeline, but if there are a lot of ```if```- or ```goto```-like **instructions** in it, most of the stuff in the pipeline is useless and the central-processing-unit has to wait for new **instructions**, thereby losing time.

Now to the major point: **functions**. To recall: “Our **instructions** are written in a list. We execute the **instruction** referred to by the *instruction-pointer*. We have an **instruction** to move the *instruction-pointer* to an arbitrary point in the list.” What you may not see yet is that we can use ```goto``` to move freely through the code and thereby reuse it. Just like with a “choose-your-own-adventure”-book we can separate our program into chapters, where every chapter is responsible for something we want to do, and then just select the chapters. This is essentially the purpose of **functions**.

This does not really justify the cost of learning this until you realize you can write ```goto a```, setting the *instruction-pointer* to the value stored in ```a```. This means your code can decide which **function** to run based on the value of a **variable** or, put the other way around, a **variable** can be a **function**. This may seem minor at the moment, but it is a powerful concept used in more advanced programs.
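
To give you a first impression of what this looks like in Python, here is a small sketch. The function names ```count_cells``` and ```plot_cells``` and the dictionary ```chapters``` are invented for this illustration, and the syntax for writing functions will be covered properly later in the course.

```python
def count_cells():
    return "counting cells"


def plot_cells():
    return "plotting cells"


# A variable can hold a function, just like it can hold a number.
chosen_chapter = count_cells
print(chosen_chapter())   # counting cells

# The value of another variable decides which function is run.
chapters = {"count": count_cells, "plot": plot_cells}
task = "plot"
print(chapters[task]())   # plotting cells
```

Note the missing brackets in ```chosen_chapter = count_cells```: without brackets the function is stored, with brackets it is executed.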

So what should you remember from this little excursion:
- There are **atomic operations** from which everything else is built
- There are methods to control what is executed like ```if```
- There are **functions**, that contain code that can be reused
- **Variables** can contain **functions** as values

### Abstraction and interpreter

Now we have discussed in great detail how we increase the ```cell_counter```, so we can count the cells Bob recorded and Alice detected, and it seemed very complicated. This is where we return to **abstraction**. You may have already noticed that some **operators** could be expressed using different operators. For example ```a += 1``` can be written as ```a = a + 1```. This is what happens on a small scale, but a lot of PhD theses in computer science were dedicated to the task of making this simpler. Python is one of the results of this work.

For example we wanted to **loop** over a **variable** and increase it while it was smaller than a value. In our “pseudo-assembly” this looked like:

```
cell_counter = 0
if_less cell_counter, 50
    cell_counter += 1
if_less cell_counter, 50
    goto 1
```

In Python they combined the ```goto``` and the ```if``` into a single statement, ```while```.
So they write:

```python
cell_counter = 0
while cell_counter < 50:
    cell_counter += 1
```

To actually show something they use a **function** ```print```. So the final code should look like this:

```python
cell_counter = 0
while cell_counter < 50:
    cell_counter += 1
print(cell_counter)
```

Please enter this in the cell below and execute it by pressing the ```Shift``` and ```Return``` keys at the same time.

In [None]:
# Your code goes here

Now some magic happened. Not only did it display the final value of ```50```, it also increased the number in the square brackets to the left of the cell, which tracks the execution order. But let's focus on the magic. Jupyter took the text you wrote into this cell and sent it to the Python-**interpreter**, which **executed** it and instructed Jupyter to print “50”. But what does all this mean?

Well, you already know what text is, so let's start with Jupyter. Jupyter is the software that sends what you are currently reading to your browser. It consists of one part that displays things and another part that **interprets** Python, the Python-**interpreter**. Now what is an interpreter and what does it do? Let us use a different word for **interpreter**: translator. The Python-**interpreter** translates your text into **instructions**, or numbers, that can be sent to the central-processing-unit where they are executed. Its job is to turn text into **instructions**.
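
You can peek at this translation step yourself. The following is a sketch that assumes the standard Python interpreter (CPython). There the text is first translated into an intermediate list of simple instructions called bytecode, which the interpreter then executes, so the picture painted above is a simplification. The built-in function ```compile``` and the standard-library module ```dis``` make the translated instructions visible. Their exact names differ between Python versions, so none are listed here, and the variable names are made up for this example.

```python
import dis

# The same text you would type into a notebook cell.
source_text = (
    "cell_counter = 0\n"
    "while cell_counter < 50:\n"
    "    cell_counter += 1\n"
)

# Translate the text once; nothing is executed yet.
translated = compile(source_text, "<example>", "exec")

# Collect the names of the instructions the text was turned into.
instruction_names = [step.opname for step in dis.get_instructions(translated)]
print(len(instruction_names), "instructions, starting with", instruction_names[:3])

# Now execute the translated instructions and look at the result.
memory = {}
exec(translated, memory)
print(memory["cell_counter"])  # 50
```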

Part of the interpretation is finding incorrect **syntax**, meaning text that can not be translated into instructions. A closely related problem is a name the **interpreter** has never seen. If you write ```cellCounter``` instead of ```cell_counter``` in one place in the code above, you will get an error (Python calls this one a ```NameError```) telling you that either ```cell_counter``` or ```cellCounter``` is “not defined”, depending on where you made the mistake. What the **interpreter** tells us here is that it does not know what you are talking about when you use that name.

As humans we can easily deduce that ```cellCounter``` is a misspelling of ```cell_counter```, so we may not even notice it. But for the **interpreter** these are totally different things, not related to each other. This is why programming can be quite frustrating. The computer does not think like you do. It is just a dumb rock rolling down a logical path, and if you mistype it will be the wrong path.
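
Here is the misspelling in action, as a small sketch. Python reports an unknown name as a ```NameError```. Normally the error would stop your program, so the ```try``` and ```except``` lines are added here only to catch it and show the message; you will learn about them later.

```python
cell_counter = 0
try:
    while cell_counter < 50:
        cellCounter += 1   # misspelled on purpose
except NameError as error:
    message = str(error)
    print("NameError:", message)   # the message names cellCounter
```

Nothing about ```cell_counter``` appears in the message: for the interpreter the two names have nothing to do with each other.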

So far so good. The Python-**interpreter** interprets my text according to some rules so the chip can execute it, but how was it programmed? Maybe not the question you were asking, but a very relevant one. The **interpreter** translates and sends every instruction as it comes. This means it keeps reading the text. There is an alternative, however: instead of keeping the text you could translate all of it once and store it as numbers, or in *binary-form*. The programs that do this are called *compilers*. While you can interpret or compile any programming language, some are built to be **interpreted** and some are built to be **compiled**; both approaches have advantages and disadvantages.

Typically **interpreted** programming languages like Python are used for high-level tasks because they are easier to adapt, while **compiled** programming languages are used for lower-level tasks like processing images on the byte level. Most often they are combined, however. “Numpy” for example is used in Python for fast calculations but is written in a **compiled** programming language. The reason for this difference is the time the **interpreter** and the **compiler** get to process your text. The **interpreter** has to send **instructions** right now so the central-processing-unit does not idle, and it should not waste the central-processing-unit's time by processing your text. For the **compiler** it is the other way round: it has all the time in the world to process your text and find a faster way to run it later. This means **compilers** often have special options to create faster programs. **Interpreters** do something similar, where they attempt to optimize during runtime, but a program will usually be slower when it is run by an **interpreter** than when it was **compiled**, assuming it was reasonably optimized for both.

So what you want is to use an **interpreted** programming language like Python to evaluate your experiments and make your plots, and have specialists write optimized code to handle the general details. To achieve this you have to use their work, which in Python is delivered to you in modules, but more about this in a few days.

### Data and file-system

So now you have an idea how computers process data and how you can coerce them to solve your problems. What we only touched on was how they store data. The machines we use today are mostly [Von Neumann architectures](https://en.wikipedia.org/wiki/Von_Neumann_architecture). This does not only mean that things are strictly separated, but also that all data is stored in a “memory-unit”. This data includes the data we work on, in this case the comma-separated-value files, but also the programs we run, in this case the Jupyter notebooks (ipynb-files).

Since you are biologists you already know reality is never as simple as the model. Here your attention to detail may help you remember some things that programmers often forget. Unfortunately it will not matter that much for most of your problems. As mentioned in the instruction-section, the central-processing-unit that does all our calculations has a rather small memory. Imagine it like a clipboard where you only carry what you really need, while the majority of the data lies in a library. Do not think of the central university library here, but rather of a small room filled with folders and books. This is the random-access-memory or **RAM**: if you need something, you stand up, go there and put it on your clipboard.

If you were ever forced to work with paper you know that one never takes only one sheet; usually a folder or book is taken and placed on the desk, and only then is a sheet put on the clipboard. For a computer this area of information storage would be the **cache**. **Caches** are adjacent to the processors of the central- or graphical-processing-unit and keep the data that was used last. Due to some technical reasons they can only copy a **page** at a time. A memory-**page** is simply a part of the **RAM**. Imagine it like having a huge newspaper-size printout of a book. Only one printout fits on the desk at any time; if you need a new one you have to go and get it.

Why do I mention **caches** at all? If you ever stumble across terms like *cache-coherence*, *cache-collision* or *cache-miss* you now have a rough idea what is happening. Also, if you ever have to process a lot of data fast, you may remember that you should try to keep all the data used in your calculation close to each other to speed things up. Do not overthink this, however, because there are usually easier ways to make code run faster than reordering your data.

Back to our library image: if the **RAM** is a small library, what is the big library you have to take the bus to get to? It is usually the hard-drive or some external drive. It takes a long time to get data from there, but it is large. Now an interesting quirk: except for the drives, everything gets wiped once the computer shuts down. Just imagine flushing the entire building down the toilet and only leaving the university-library standing. In other words, the drive is **permanent** storage and the rest is **non-permanent** storage.

Now since we just learned that only the drive survives the computer shutdown, it becomes clear why people do not bother too much with organizing anything else. On the drives, however, we want to organize things, therefore we need a system: a **file-system**.

The duty of a **file-system** is to know where on a drive data in the form of **files** is stored and who is permitted to see or change them. This leads us to two essential concepts: **folders** and **users**. A **folder** is something you put stuff in, like a container. You can put everything into a **folder**, especially other **folders**. This idea is quite universal, and once you stop thinking of **folder**-icons on a desktop you will discover that this concept reappears everywhere in computer science. Stuff that contains itself is a fundamental organizational structure which you will use later in this course in the form of ```list``` or ```dict```.
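
As a small preview, here is a sketch of folders inside folders written as a ```dict``` inside a ```dict```. The folder and file names as well as the helper ```count_files``` are invented for this illustration. Note how the function calls itself whenever it finds a folder inside a folder: a structure that contains itself is handled by code that uses itself.

```python
# A folder is a dict; its content is either another folder (a dict) or a file.
folders = {
    "course": {
        "data": {"cells.csv": "file", "notes.txt": "file"},
        "notebooks": {"day1.ipynb": "file"},
    }
}


def count_files(folder):
    """Count all files in a folder, including those in folders inside of it."""
    total = 0
    for content in folder.values():
        if isinstance(content, dict):   # a folder inside the folder
            total += count_files(content)
        else:                           # a file
            total += 1
    return total


print(count_files(folders))  # 3
```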

Now we discussed **folders**. They make it easier to find stuff by limiting what you or your program can see inside of them. To ensure you do not see what you are not supposed to see, like password-hashes or personal information, we use **users**. A **user** is essentially like a key for a lock. Once you have entered your password and authenticated yourself, you are handed the keys of that **user**. So you can log in as “YourName”, but also as “GenericPrinterUser”. The first one can access your personal files, the second one can use the printer.

To ensure **users** only have access to the things they are supposed to use, we use permissions and ownership. So the research-data may be owned by “YourPostDoc”, but you have permission to **read** it. This becomes relevant if you attempt to **write** into it, because then the operating-system will ask the **file-system** if you have **permission** and then politely tell you that you can not do that, so you stand in front of a locked door. This is also known as “Help, my program does not write any output into the folder, but it does in another.” Please remember this; it will save you a day and potentially a rather embarrassing moment in your future.
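
In Python the locked door shows up as a ```PermissionError```. The following sketch shows one way to check a folder before you blame your program. The helper ```try_to_write``` and the file name ```output.txt``` are made up for this example; ```os.access``` and ```tempfile``` are part of Python's standard library. The example uses a temporary folder that belongs to you, so both checks should succeed here.

```python
import os
import tempfile


def try_to_write(folder, file_name="output.txt"):
    """Return True if a small file could be written into folder, False if it is locked."""
    path = os.path.join(folder, file_name)
    try:
        with open(path, "w") as handle:
            handle.write("test")
    except PermissionError:   # the file-system said no
        return False
    return True


with tempfile.TemporaryDirectory() as own_folder:
    print(os.access(own_folder, os.W_OK))   # ask whether writing is permitted
    print(try_to_write(own_folder))         # actually try it
```

If ```try_to_write``` returns ```False``` for the folder your results should go to, the problem is the permissions, not your code.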

Good, so now we know there are **users** and **permissions**, but that all sounds rather difficult, and people do not want to enter a password every time they want to use the printer. Instead we use user-groups. User-groups are simply lists with users in them, and all the users get the permissions the group has. This is cumulative, so if you are in the “writeData”-group and in the “readData”-group you can both read and write. So now you know what IT is on about when they talk about **permissions** and **users**.
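
The cumulative nature of groups can be sketched in a few lines of Python. This is only a toy model and not how an operating-system actually stores permissions: the dictionary ```group_permissions``` and the helper ```permissions_of``` are invented for this illustration.

```python
# Each group grants a set of permissions.
group_permissions = {
    "readData": {"read"},
    "writeData": {"write"},
}


def permissions_of(groups):
    """Collect every permission granted by any of the groups a user is in."""
    allowed = set()
    for group in groups:
        allowed |= group_permissions[group]   # add this group's permissions
    return allowed


print(sorted(permissions_of(["readData"])))               # ['read']
print(sorted(permissions_of(["readData", "writeData"])))  # ['read', 'write']
```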

This concludes today’s lesson. Tomorrow we will begin a more hands-on lesson about Python itself. Some of you may have hoped that things become clearer while the course advances; I have to warn you that things build upon each other. If you have doubts about core-concepts, and you should have after you were just forced through half a semester of computer-science in a few hours, please try to resolve them now.

**The rest of the time is dedicated to questions and I advise you to ask as many as possible now, so you struggle less tomorrow.**