# Control Structures

## Table of contents

1. [Recall](#Recall)
2. [If-statement](#If-statement)
3. [While loop](#While-loop)
4. [For loop](#For-loop)
5. [Break and continue](#Break-and-continue)
6. [Range and len](#Range-and-len)
7. [Updating algorithm](#Updating-algorithm)
8. [Need for order](#Need-for-order)
9. [Functions](#Functions)
10. [Writing a program](#Writing-a-program)
    1. [Design](#Design)
    2. [Skeleton](#Skeleton)
    3. [Test](#Test)
    4. [Flesh out the skeleton](#Flesh-out-the-skeleton)
11. [Readable code](#Readable-code)

## Recall

Last unit we learned about **operators** and data-**types**.
**Operators** like ```+```, ```/``` or ```<```, allow us to work with **variables** containing different **values**.
With the help of **lists** we can group **ints**, **floats**, **strs** and other data-**types** together.

The combination of these allowed us to formulate a first partial version of our final algorithm.

```Python
units_file = open("./data_neuron/session_2023111501010_units.csv")
# Skip the first row
# Figure out when the last spike occurs in seconds
seconds_column = list()
seconds_counter = 1
# For every second between 1 and the last spike's occurrence:
    seconds_column.append(seconds_counter)
    seconds_counter += 1
table = [seconds_column]
unit_columns = dict()
# For every unit:
    unit_column = seconds_column.copy()
    # For every second in unit_column
        unit_column[second] = 0
    unit_columns[unit_id] = unit_column
# For every row in the unit file:
    rat_id, unit_id, channel, spike_time = row.split(",")
    unit_columns[unit_id][int(spike_time)] += 1
# For every unit:
    table.append(unit_column)
immobility_file = open("./data_neuron/session_2023111501010_immobility.csv")
immobility_phases = list()
# Skip the first row in immobility file
# For every row in the immobility file after the first:
    begin_seconds, end_seconds = row.split(",")
    immobility_phases.append((float(begin_seconds), float(end_seconds)))
immobility_column = seconds_column.copy()
# For every second in the immobility-list:
    is_in_phase = False
    # For every phase in immobility-list
        begin_in_seconds = phase[0]
        end_in_seconds = phase[1]
        is_in_phase = second > begin_in_seconds and second < end_in_seconds
        # If yes stop
table.append(immobility_column)
```

To fill in the gaps and find the oxytocin unit, we should recall the first module.
In the first module we used ```if_less``` and ```goto``` to jump around in our pseudo-assembly-code.
We now want to learn how this is done in Python.

## If-statement
The if-statement is rather simple: it consists of the keyword ```if```, followed by something that is or can be converted into a **bool**, a ```:``` and then an indented block of further instructions.
These instructions are executed if the condition between the keyword and ```:``` is ```True```.
Here is a code snippet illustrating the use of the if-statement. 
```Python
if True:
	print("Hello")
if False:
	print("world!")
```

Please predict what this snippet will print and then try it in the next block.

In [None]:
# Copy the code here
if True:
    print("Hello")
if False:
    print("world!")
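
The phrase "can be converted into a **bool**" can be made concrete with the built-in ```bool``` function. The following sketch uses made-up example values (they are not part of our data set) to show which values count as ```True``` or ```False``` in a condition:

```Python
zero_as_bool = bool(0)          # the number zero counts as False
number_as_bool = bool(42)       # every other number counts as True
empty_str_as_bool = bool("")    # an empty str counts as False
text_as_bool = bool("text")     # a non-empty str counts as True
empty_list_as_bool = bool([])   # an empty list counts as False
print(zero_as_bool, number_as_bool, empty_str_as_bool, text_as_bool, empty_list_as_bool)

number_of_spikes = 3  # made-up value
if number_of_spikes:
    # The int 3 is converted into True, so this line is executed
    print("There was at least one spike")
```

This is why a condition like ```if number_of_spikes:``` works even though ```number_of_spikes``` is an **int** and not a **bool**.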

The if-statement can be expanded with two optional statements:
```elif``` (short for “else-if”) and ```else```.
The elif-block is executed if no earlier if- or elif-block was executed and its own condition is ```True```.
The else-block is executed if no if- or elif-block was executed.
Here is a short snippet to demonstrate the use of an if-statement with ```elif``` and ```else```:
```Python
a = 5
b = 6
if a < b:
	print("a is smaller than b")
elif a == b:
	print("a is equal b")
else:
	print("a is bigger than b")
```

Copy the code into the next box and change ```a``` and ```b``` until all three messages have been printed.

In [None]:
# Copy the code here and then modify it

a = 5
b = 6
if a < b:
    print("a is smaller than b")
elif a == b:
    print("a is equal b")
else:
    print("a is bigger than b")

## While loop
The next statement is the while-loop.
It consists of the keyword ```while``` followed by something that can be converted into a **bool**, ```:``` and then an indented block of instructions.
These instructions are executed while the condition between the keyword and ```:``` remains ```True```.
Here is a short example of a while loop:
```Python
counter = 0
while counter < 20:
	print(counter)
	counter += 1
```

Please predict what this code will print before you copy and execute it.

In [None]:
# Copy the code here
counter = 0
while counter < 20:
    print(counter)
    counter += 1

## For loop
The last control-structure is the for-loop.
It works similarly to the while-loop, but it iterates over a sequence like a **list** or **tuple**.
It consists of the keyword ```for```, the variable name the current element will have, the keyword ```in```, a sequence (e.g. a **list**), a ```:``` and an indented block of instructions.
Here is an example of a for-loop:
```Python
elements = ["Hello", "", "world", "", "!", 42, 3.0, True]
for element in elements:
	print(element)
```
Please predict the output of this code snippet before executing it in the next block.

In [None]:
# Copy the code here
elements = ["Hello", "", "world", "", "!", 42, 3.0, True]
for element in elements:
    print(element)

## Break and continue
Within loops two special statements can be used: the ```break```-statement, which breaks out of the loop, and the ```continue```-statement, which jumps to the beginning of the next loop cycle.
Since they change the flow of the loop, they are almost always encountered within an if-statement.
Here is an example of a for-loop with ```break``` and ```continue```.
```Python
elements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for element in elements:
    if element in [2, 3, 5, 7]:
        continue
    if element > 8:
        print("Breaking")
        break
    print(element)
```

Please predict what this code prints before you execute it and compare your prediction to the actual results.

In [None]:
# Copy the code here


elements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for element in elements:
    if element in [2, 3, 5, 7]:
        continue
    if element > 8:
        print("Breaking")
        break
    print(element)

## Range and len
We often want to iterate over a range of numbers.
We can use a for-loop and ```range``` for this.
```range``` is a **function** taking 3 **arguments**: ```start```, ```end``` and ```step```.
So if we write ```small_numbers = range(0, 10, 1)``` we get all numbers from ```0``` to ```9```.
So ```end``` is not included in the range.
This may seem weird, but it fits the way **lists** work.
Since **list** indices start at ```0```, a list with 10 elements is indexed by ```0```, ```1```, ```2```, ```3```, ```4```, ```5```, ```6```, ```7```, ```8``` and ```9``` or ```range(0, 10, 1)```.
This becomes more convenient if we use ```len```. ```len``` is a **function** that returns the length of its **argument**.
Now let us see both of them in action:
```Python
names = ["John Doe", "Erika Musterfrau", "Max Mustermann", "Karl Dosenkohl", "Hein Janmaat", "Juan Pérez", "Kalle Svensson", "Fred Bloggs"]
for index in range(0, len(names), 1):
	print(names[index])
```

Please copy and execute the code, then adapt it so that only every second name is printed.

In [None]:
# Copy the code here

<details>
  <summary>Click to reveal solution</summary>

```Python
names = ["John Doe", "Erika Musterfrau", "Max Mustermann", "Karl Dosenkohl", "Hein Janmaat", "Juan Pérez", "Kalle Svensson", "Fred Bloggs"]
for index in range(0, len(names), 2):
    print(names[index])
```

</details>
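
To see what a ```range``` contains, it can be converted into a **list**. The sketch below uses a made-up list of letters. Note that Python's own documentation calls the second argument ```stop``` rather than ```end```, and that ```start``` and ```step``` may be left out, in which case they default to ```0``` and ```1```:

```Python
letters = ["a", "b", "c", "d", "e"]   # made-up example list

all_indices = list(range(0, len(letters), 1))
print(all_indices)            # [0, 1, 2, 3, 4]

# start and step are optional: range(5) is the same as range(0, 5, 1)
short_form = list(range(len(letters)))
print(short_form)             # [0, 1, 2, 3, 4]

every_second_index = list(range(0, len(letters), 2))
print(every_second_index)     # [0, 2, 4]
```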

## Updating algorithm

With this knowledge we can now finish our script:

```Python
units_file = open("./data_neuron/session_2023111501010_units.csv")
first_row_unit_file = True
last_spike_occurrence = 0
units = set()
for row in units_file:
    # Skip the header row
    if first_row_unit_file:
        first_row_unit_file = False
        continue
    rat_id, unit_id, channel, spike_time = row.split(",")
    unit_id = int(unit_id)
    if unit_id not in units:
        units.add(unit_id)
    spike_time = float(spike_time)
    spike_time_rounded_up = int(spike_time) + 1
    if last_spike_occurrence < spike_time_rounded_up:
        last_spike_occurrence = spike_time_rounded_up
units_file.close()

seconds_column = list()
seconds_counter = 1
# range excludes its end, so we add 1 to include the last second
for second in range(1, last_spike_occurrence + 1):
    seconds_column.append(seconds_counter)
    seconds_counter += 1
table = [seconds_column]

unit_columns = dict()
for unit_id in units:
    unit_column = seconds_column.copy()
    for table_index in range(0, len(seconds_column)):
        unit_column[table_index] = 0
    unit_columns[unit_id] = unit_column

units_file = open("./data_neuron/session_2023111501010_units.csv")
first_row_unit_file = True
for row in units_file:
    if first_row_unit_file:
        first_row_unit_file = False
        continue
    rat_id, unit_id, channel, spike_time = row.split(",")
    spike_time = float(spike_time)
    unit_id = int(unit_id)
    # A spike at time t belongs to second int(t) + 1, stored at index int(t)
    unit_columns[unit_id][int(spike_time)] += 1
units_file.close()
for unit in unit_columns.keys():
    table.append(unit_columns[unit])

immobility_file = open("./data_neuron/session_2023111501010_immobility.csv")
immobility_phases = list()
first_row_immobility_file = True
for row in immobility_file:
    if first_row_immobility_file:
        first_row_immobility_file = False
        continue
    begin_seconds, end_seconds = row.split(",")
    immobility_phases.append((float(begin_seconds), float(end_seconds)))
immobility_file.close()
immobility_column = seconds_column.copy()
for table_index in range(0, len(seconds_column)):
    second = seconds_column[table_index]
    is_in_phase = False
    for phase in immobility_phases:
        begin_in_seconds = phase[0]
        end_in_seconds = phase[1]
        is_in_phase = second > begin_in_seconds and second < end_in_seconds
        # Stop searching as soon as one matching phase is found
        if is_in_phase:
            break
    immobility_column[table_index] = is_in_phase
table.append(immobility_column)
```
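
The counting step in the middle of the script can be tried without the data files. The sketch below uses three made-up rows in the same four-field format (the values are invented, not taken from the recording) and counts the spikes per second for a single unit:

```Python
# Made-up rows in the format rat_id,unit_id,channel,spike_time
rows = ["1,7,3,0.25\n", "1,7,3,0.75\n", "1,7,3,2.10\n"]

last_second = 3                      # assumed duration of the made-up recording
spike_counts = [0] * last_second     # one counter per second: [0, 0, 0]
for row in rows:
    rat_id, unit_id, channel, spike_time = row.split(",")
    # float() ignores the trailing line break; int() then cuts off the decimals
    second_index = int(float(spike_time))
    spike_counts[second_index] += 1
print(spike_counts)                  # [2, 0, 1]
```

Two spikes fall into the first second, none into the second and one into the third.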

## Need for order

As you can see, the code, while complete, is now rather long and complex.
This makes it more difficult to read.
Considering that code is more often read than written, improving readability will be our next topic.

## Functions

After we used ```range``` and ```len``` previously, let us talk more about what they are: **functions**.
From a language perspective **functions** are similar to **operators**: they take a number of **values** and often **return** a **value**.
So instead of ```sum = a + b``` we might write ```sum = add(a, b)```. 
From a code-structure perspective they are organizational units or **abstractions** that combine multiple lines of code into a single thing.
They are therefore constructed from other **functions** and **operators**.
Let us write an ```add``` **function** so we can investigate its parts.
```Python
def add(a, b):
	sum = a + b
	return sum
```
As you can see a **function** looks quite similar to the other control-structures.
It starts with the keyword ```def``` (like define) followed by the name of the **function**,
```(``` the **arguments** of the **function**, ```)```, ```:``` and an indented block of instructions.

The name of the function is used to call it later, so our example **function** is called ```add``` and can be called like ```c = add(2, 4)```.
The **arguments** are what goes into a **function**, like food goes into your mouth or raw material into a factory.
Often they are processed into a final product that is **returned**, but some functions only modify their **mutable** inputs, like adding **values** to a **list**.
Let us practice this by creating a new **function**.
It will be the [fizz-buzz](https://de.wikipedia.org/wiki/Fizz_buzz)-**function**.
It is supposed to print “Fizz” if the number we put in is divisible by 3, “Buzz” if it is divisible by 5, and the number itself otherwise.
If both are the case, it should print “Fizz Buzz”.
This is a common test in programming interviews and a nice example.
To test if something is divisible we use ```%```, which gives us the rest (remainder) of a division.
So let us begin with defining our **function**. Its name should obviously be “fizzBuzz” and its argument a number.
So we write:
```Python
def fizzBuzz(number):
```
Now we have to do something in it.
Let us first get the rest of the division by ```3``` and print it out to test our function.
We are interested in the rest of the division, because it is zero if the number is divisible by ```3```.

```Python
def fizzBuzz(number):
	rest_division_by_three = number % 3
	print(rest_division_by_three)
# We also have to call the function so it gets executed
fizzBuzz(5)
```
Now copy the code and predict what it prints. Afterwards get the rest for a division by 5.

In [None]:
# Your code goes here

<details>
  <summary>Click to reveal solution</summary>

```Python
def fizzBuzz(number):
    rest_division_by_three = number % 3
    rest_division_by_five = number % 5
# We also have to call the function so it gets executed
fizzBuzz(5)
```

</details>

Now that we have the rest we should try to print “Fizz Buzz” if the number is divisible by ```3``` and ```5```, “Fizz” if it is divisible by ```3```, “Buzz” if it is divisible by ```5```, else we just print the number.

Please recall the comparison-**operators** from the last unit and what we learned so far to adapt the function, so it prints what was described above:

In [None]:
# Your code goes here

<details>
  <summary>Click to reveal solution</summary>

```Python
def fizzBuzz(number):
    rest_division_by_three = number % 3
    rest_division_by_five = number % 5
    divisible_by_three = rest_division_by_three == 0
    divisible_by_five = rest_division_by_five == 0

    if divisible_by_three and divisible_by_five:
        print("Fizz Buzz")
    elif divisible_by_three:
        print("Fizz")
    elif divisible_by_five:
        print("Buzz")
    else:
        print(number)
# We also have to call the function so it gets executed
fizzBuzz(5)
```

</details>

Now that we have a working function we should test it by running it on a larger set of numbers.
Please build a loop around your **function** so it runs on all integers from 0 until 100.

In [None]:
# Your code goes here

<details>
  <summary>Click to reveal solution</summary>

```Python
def fizzBuzz(number):
    rest_division_by_three = number % 3
    rest_division_by_five = number % 5
    divisible_by_three = rest_division_by_three == 0
    divisible_by_five = rest_division_by_five == 0

    if divisible_by_three and divisible_by_five:
        print("Fizz Buzz")
    elif divisible_by_three:
        print("Fizz")
    elif divisible_by_five:
        print("Buzz")
    else:
        print(number)

# Create a for loop from 0 until 100
for number in range(0, 101, 1):
    fizzBuzz(number)
```

</details>

The last step before we can fully utilize **functions** is the keyword ```return```.
Similar to ```break``` in loops, ```return``` signals that the function should be left, with a little twist:
the **function** call becomes (returns) the value behind the ```return```.
If there is nothing behind it, the function returns ```None```.
To better illustrate this we will create a **function** that checks if a number is even and returns the result as a **bool**.

```Python
def is_even(number):
	divisible_by_two = number % 2 == 0
	return divisible_by_two
```

Now we will use a for-loop to print out all even numbers between 0 and 20:

```Python
for number in range(0, 21, 1):
	if is_even(number):
		print(number)
```

Please copy both samples in the next cell and execute them.

In [None]:
# Combine the code here

<details>
  <summary>Click to reveal solution</summary>

```Python
def is_even(number):
    divisible_by_two = number % 2 == 0
    return divisible_by_two

# Print all even numbers between 0 and 20
for number in range(0, 21, 1):
    if is_even(number):
        print(number)
```

</details>
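
The two statements about ```return``` above, that it leaves the **function** immediately and that a ```return``` without a value gives back ```None```, can be checked with a small sketch. The function names ```describe_sign``` and ```append_zero``` are made up for this illustration:

```Python
def describe_sign(number):
    if number < 0:
        return "negative"   # the function is left here for negative numbers
    return "not negative"   # only reached if the return above was not executed

def append_zero(values):
    values.append(0)        # modifies the mutable list that was passed in
    return                  # nothing behind return, so the result is None

print(describe_sign(-3))    # negative
print(describe_sign(5))     # not negative

counts = [1, 2]
result = append_zero(counts)
print(result)               # None
print(counts)               # [1, 2, 0]
```

The second function is an example of a **function** that only modifies its **mutable** input instead of returning a product.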

Congratulations, you know all the necessary building blocks to write simple programs.
So we will now redesign the existing code into a more readable format.

## Writing a program

Writing a program consists of a few steps:
1. **Design**: Here we figure out what our program needs and how it should run. We did this from the introduction until now.
2. **Skeleton/smallest parts**: You should always begin with a very small part that is easy to understand and write.
    This was our check-list.
3. **Test regularly**: As soon as you can run your code you should do so. This helps you to find mistakes while it is still small. Later you may wish to write [automated tests](https://en.wikipedia.org/wiki/Test_automation) and expand into [test-driven-development](https://en.wikipedia.org/wiki/Test-driven_development).
4. **Get feedback**: After you have written something, ask another competent person to look at your solution; they might find mistakes you did not see.
5. **Incremental improvement**: Do not try to solve your problem as a whole. Work **function** by **function**, **line** by **line**, otherwise you will be overwhelmed and confused by your own work.
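
A first automated test does not need any extra tools. The sketch below repeats the ```is_even``` **function** from the previous section and checks a few hand-picked cases with the ```assert``` statement, which we have not used so far: it raises an ```AssertionError``` if the condition behind it is ```False``` and does nothing otherwise. The name ```test_is_even``` is made up for this illustration.

```Python
def is_even(number):
    divisible_by_two = number % 2 == 0
    return divisible_by_two

def test_is_even():
    # Hand-picked cases, including the edge case 0
    assert is_even(0)
    assert is_even(4)
    assert not is_even(7)

test_is_even()
print("All tests passed")
```

If someone later changes ```is_even``` by mistake, running ```test_is_even``` again will point out the problem immediately.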

What we built so far is a rough first draft.
While working with this draft we learned whether what we wanted to achieve could be done and how it could be done.
Now we should rewrite it into a more permanent state, so we can improve on it later.

Unfortunately, a lot of code never gets past its first trial phase, because people believe they have no time to improve it.
This means their code cannot be understood, changed or improved, making it useless for everyone else, and for themselves after a few weeks.
Creating well-structured code or rewriting unstructured code is essential to maintain a productive code base.

If you believe this can be achieved with “small changes” you might be right; after all, it only takes “small changes” to turn a lobster into an elephant or a toaster into an airplane.
I would however ask you to consider that it is easier to build an airplane from scratch than to turn a toaster into one.
For this reason we will now repeat the design process described above.

### Design

We have to remember the rough structure that we used before.
Please try to summarize what we did in roughly 3-5 steps.

<details>
  <summary>Click to reveal suggested solution</summary>
    
    1. Get maximal duration and units
    2. Create a list with all the seconds
    3. Create an empty list for every unit
    4. Get amount of spikes for every second
    5. Get immobility for every second

</details>

### Skeleton

Now we want to create a small number of simple commands that correspond to our design.

```Python
units_file_path = "./data_neuron/session_2023111501010_units.csv"
immobility_file_path = "./data_neuron/session_2023111501010_immobility.csv"

def get_maximal_duration_and_units(units_file_path):
    longest_duration = 0
    units = set()
    # Add proper implementation
    return (longest_duration, units)

def get_a_list_with_all_the_seconds(longest_duration):
    seconds = list()
    # Add proper implementation
    return seconds

def get_empty_unit_spike_counts(seconds, units):
    empty_spike_counts = dict()
    # Add proper implementation
    return units

def get_amount_of_spikes_for_every_second(units_file_path, units_spike_counts):
    # Add proper implementation
    return

def get_immobility_for_every_second(immobility_file_path, seconds):
    is_immobile = list()
    # Add proper implementation
    return is_immobile

# Main part of the script
table = list()
longest_duration, units = get_maximal_duration_and_units(units_file_path)
seconds = get_a_list_with_all_the_seconds(longest_duration)
table.append(seconds)
units_spike_counts = get_empty_unit_spike_counts(seconds, units)
get_amount_of_spikes_for_every_second(units_file_path, units_spike_counts)
for unitid in units_spike_counts.keys():
    table.append(units_spike_counts[unit_id])
table.append(get_immobility_for_every_second(immobility_file_path, seconds))

print(table)
```

This is what is often called a skeleton implementation.
The rough structures are here, but the details are not fleshed out.

I hope the code looks less overwhelming and more logically structured now.
This is one purpose of a skeleton: giving us a piece of code we can understand more easily.
Another purpose is giving us a minimal code snippet we can run and therefore test.

### Test

The reason we wish to have code that can run as early as possible is that it permits us to fail as early as possible.
This failure might help us uncover an error in our solution or a misconception about our problem.
These flaws will not disappear if we discover them later; we will just have lost more work pursuing the wrong path.
Therefore we wish to run our code as early as possible to see if it behaves as expected.

Please run the code above to see if there are any errors in it.

In [None]:
# Copy code to test here

<details>
  <summary>Click to reveal the explanation</summary>

Considering that a misconception or error would have required a design change,
I decided to hide a typo in the code above.
I hope you found it and managed to correct it.

</details>

### Flesh out the skeleton

After testing it, we can now begin to flesh out our **functions**.
First we write down our plan from the comments,
before we implement it line by line, incrementally approaching the final function.
This mirrors our approach to the full program,
so we apply our solution method recursively,
until the final problem becomes trivial.

Let us begin with ```get_maximal_duration_and_units```:


```Python
def get_maximal_duration_and_units(units_file_path):
    longest_duration = 0
    units = set()
    # 1. Open the file
    # 2. Iterate over every row:
    #    1. Skip first row
    #    2. Extract unit id and spike time
    #    3. Convert their types
    #    4. If the spike time is bigger than longest duration update it
    #    5. If the unit Id is not in units add it
    # 3. close the file once we no longer need it
    return (longest_duration, units)
```

Since we already did our research regarding what we need to use earlier,
we can now replace the comments with code.

```Python
def get_maximal_duration_and_units(units_file_path):
    longest_duration = 0
    units = set()
    # 1. Open the file
    units_file = open(units_file_path)
    # 2. Iterate over every row:
    first_row = True
    for row in units_file:
        # 1. Skip first row
        if first_row:
            first_row = False
            continue
        # 2. Extract unit id and spike time
        rat_id, unit_id, channel, spike_time = row.split(",")
        # 3. Convert their types
        unit_id = int(unit_id)
        spike_time = float(spike_time)
        spike_time_rounded_up = int(spike_time) + 1
        # 4. If the spike time is bigger than longest duration update it
        if longest_duration < spike_time_rounded_up:
            longest_duration = spike_time_rounded_up
        # 5. If the unit Id is not in units add it
        if unit_id not in units:
            units.add(unit_id)
    # 3. close the file once we no longer need it
    units_file.close()
    return (longest_duration, units)
```

Normally we remove useless comments to avoid overloading the reader with information.
This means we only leave information that is not written directly in the code.
In this case it means we will remove all comments, leaving us with:

```Python
def get_maximal_duration_and_units(units_file_path):
    longest_duration = 0
    units = set()

    units_file = open(units_file_path)
    first_row = True
    for row in units_file:
        if first_row:
            first_row = False
            continue
        rat_id, unit_id, channel, spike_time = row.split(",")
        unit_id = int(unit_id)
        spike_time = float(spike_time)
        spike_time_rounded_up = int(spike_time) + 1
        if longest_duration < spike_time_rounded_up:
            longest_duration = spike_time_rounded_up
        if unit_id not in units:
            units.add(unit_id)
    units_file.close()
    return (longest_duration, units)
```
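
Forgetting ```close``` is a common mistake. Python also offers the ```with``` statement, which closes the file automatically when its indented block ends. The sketch below is an alternative and not the approach used in the rest of this unit; it writes a small made-up file first so it can run without our data, and the name ```count_data_rows``` is hypothetical:

```Python
import os
import tempfile

def count_data_rows(file_path):
    row_count = 0
    with open(file_path) as data_file:   # closed automatically after the block
        first_row = True
        for row in data_file:
            # Skip the header row, exactly as in the function above
            if first_row:
                first_row = False
                continue
            row_count += 1
    return row_count

# Create a small made-up csv-file with a header and two data rows
example_path = os.path.join(tempfile.mkdtemp(), "example_units.csv")
with open(example_path, "w") as example_file:
    example_file.write("rat_id,unit_id,channel,spike_time\n")
    example_file.write("1,7,3,0.25\n")
    example_file.write("1,7,3,2.10\n")

print(count_data_rows(example_path))     # 2
```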

Please use the cell below to repeat this process for the other four functions, thereby completing the program.
Remember the steps above and work line by line.
Use ```print``` to check your results.

In [None]:
# Write your code here

<details>
  <summary>Click to reveal suggested solution</summary>

```Python
units_file_path = "./data_neuron/session_2023111501010_units.csv"
immobility_file_path = "./data_neuron/session_2023111501010_immobility.csv"

def get_maximal_duration_and_units(units_file_path):
    longest_duration = 0
    units = set()

    units_file = open(units_file_path)
    first_row = True
    for row in units_file:
        if first_row:
            first_row = False
            continue
        rat_id, unit_id, channel, spike_time = row.split(",")
        unit_id = int(unit_id)
        spike_time = float(spike_time)
        spike_time_rounded_up = int(spike_time) + 1
        if longest_duration < spike_time_rounded_up:
            longest_duration = spike_time_rounded_up
        if unit_id not in units:
            units.add(unit_id)
    units_file.close()
    return (longest_duration, units)

def get_a_list_with_all_the_seconds(longest_duration):
    seconds = list()
    for second in range(1, longest_duration + 1):
        seconds.append(second)
    return seconds

def get_empty_unit_spike_counts(seconds, units):
    empty_spike_counts = dict()
    for unit_id in units:
        empty_spike_count = seconds.copy()
        for index in range(0, len(empty_spike_count)):
            empty_spike_count[index] = 0
        empty_spike_counts[unit_id] = empty_spike_count
    return empty_spike_counts

def get_amount_of_spikes_for_every_second(units_file_path, units_spike_counts):
    units_file = open(units_file_path)
    first_row = True
    for row in units_file:
        if first_row:
            first_row = False
            continue
        rat_id, unit_id, channel, spike_time = row.split(",")
        unit_id = int(unit_id)
        spike_time = float(spike_time)
        spike_time_rounded_down = int(spike_time)
        # A spike at time t belongs to second int(t) + 1, which is stored at index int(t)
        index = spike_time_rounded_down
        units_spike_counts[unit_id][index] += 1
    units_file.close()
    return

def get_immobility_for_every_second(immobility_file_path, seconds):
    is_immobile = seconds.copy()
    phases = list()
    immobility_file = open(immobility_file_path)
    first_row = True
    for row in immobility_file:
        if first_row:
            first_row = False
            continue
        begin_in_seconds, end_in_seconds = row.split(",")
        # float() also accepts whole numbers, so it works for both kinds of values
        begin_in_seconds = float(begin_in_seconds)
        end_in_seconds = float(end_in_seconds)
        phase = (begin_in_seconds, end_in_seconds)
        phases.append(phase)
    immobility_file.close()
    for index in range(0, len(is_immobile)):
        second = seconds[index]
        is_in_phase = False
        for phase in phases:
            begin_in_seconds, end_in_seconds = phase
            if second > begin_in_seconds and second < end_in_seconds:
                is_in_phase = True
                break
        is_immobile[index] = is_in_phase
    return is_immobile

# Main part of the script
table = list()
longest_duration, units = get_maximal_duration_and_units(units_file_path)
seconds = get_a_list_with_all_the_seconds(longest_duration)
table.append(seconds)
units_spike_counts = get_empty_unit_spike_counts(seconds, units)
get_amount_of_spikes_for_every_second(units_file_path, units_spike_counts)
for unit_id in units_spike_counts.keys():
    table.append(units_spike_counts[unit_id])
table.append(get_immobility_for_every_second(immobility_file_path, seconds))

print(table)
```

</details>

## Readable code

Introducing functions has already made our code more reader-friendly.
However, you may have noticed that there are still some questions left unanswered
and that we have not yet discussed some methods that make the solution friendlier to the reader.
This is the reason why we will now talk about **Readability**.

**Readability** refers to the ability of a reader to understand the text/code.
As a general rule, if the reader needs more than half the time to read your text that you needed to write it, you goofed.
If they need as much time to read it as you needed to write it, you should seriously consider professionalizing your writing style.

I mention this because academia has a relevant fraction of fully self-taught "programmers" who believe hard work is required to understand code.
They usually conclude that whoever fails to understand their incoherent excuse for code "just cannot program".
Industrial and trained wisdom usually attributes a failure to understand code to incomplete documentation or lack of "domain-knowledge",
instead of programming skill. "Domain-knowledge" refers to knowledge about the subject, like cancer-cells, astrophysics or neuroscience.
So in other words if someone fails to understand your code the following explanations are seen as legitimate:

1.	They do not understand the area the code is applied in (e.g. they do not know what cancer-cells are)
2.	They do not know basic language structures (e.g. for-loop)
3.	The code is not written well enough (It is usually this)

Please remember this if people ask you how your code works.
It usually means you failed to explain your goal and methods well enough. 
So how do you make your code more readable:

- Use clear **variable** and **function** names. So your cell-area should be called ```cell_area``` and not ```car```.
- Try to write simple lines. One line should ideally do one to three things; if it does more, split it.
- Use intermediate values. So if you add a few numbers before another step, introduce a **variable** storing the sum.
- Keep your functions short; they should fit on a small laptop screen.
- Use comments or [docstrings](https://peps.python.org/pep-0257/) for functions to explain what goes in,
      what goes out and what the function should achieve.
- Use comments in the code to explain why things are done a certain way.
- Use comments in the code to reference papers or online sources you read to understand the code.
- Use comments in the code to introduce concepts that might be novel to another programmer.
- Use a [style guide](https://google.github.io/styleguide/pyguide.html) once you start a bigger project.
- Avoid writing [unmaintainable code](https://github.com/Droogans/unmaintainable-code).
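
Several of these points can be seen in a small before-and-after sketch. Both functions below compute the same thing; the names and the ellipse example are made up for this illustration and are not part of our program.

```Python
import math

# Hard to read: unclear names and everything in one line
def car(a, b):
    return math.pi * (a / 2) * (b / 2)

# Easier to read: clear names, intermediate values and a docstring
def get_cell_area(length, width):
    """Approximates the area of a cell as an ellipse.

    length and width are the two diameters of the cell, both in the same unit.
    Returns the area in that unit squared.
    """
    half_length = length / 2
    half_width = width / 2
    # Area of an ellipse: pi times the product of the two semi-axes
    cell_area = math.pi * half_length * half_width
    return cell_area

print(car(4, 2) == get_cell_area(4, 2))   # True
```

The second version is longer, but a reader no longer has to guess what ```car```, ```a``` and ```b``` stand for.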

If your code starts reading like a paper, with references to different web resources and additional explanations, you are doing things mostly right.
Jupyter Notebook is designed to support this approach, known as [literate programming](https://en.wikipedia.org/wiki/Literate_programming).

Please take your solution and add the comments to it.
Then let someone else read the code, so they can tell you what is not fully clear to them.
If they claim to fully understand your code, they are most probably too shy to criticize, so keep asking until they find a flaw.

In [None]:
# Your code with comments should be here

<details>
  <summary>Click to reveal suggested solution</summary>

```Python
def get_maximal_duration_and_units(units_file_path):
    """!
        @brief This function reads in a csv-file containing the recorded units and returns the maximal duration and unit ids
        @details We assume that we get a csv-file with a header line and 4 fields per row.
            The name and contents of the csv-file should be defined as given in param, so we can extract the unit id and spike time.
        @param units_file_path the path to the csv-file as a str
            The 2nd field should contain the id of the unit.
            The 4th the spike time.
        @return the highest spike time and a set of discovered unit-ids
    """
    longest_duration = 0
    units = set()

    units_file = open(units_file_path)
    first_row = True
    for row in units_file:
        if first_row:
            first_row = False
            continue
        rat_id, unit_id, channel, spike_time = row.split(",")
        unit_id = int(unit_id)
        spike_time = float(spike_time)
        spike_time_rounded_up = int(spike_time) + 1
        if longest_duration < spike_time_rounded_up:
            longest_duration = spike_time_rounded_up
        # A set ignores values it already contains, so no check is needed
        units.add(unit_id)
    units_file.close()
    return (longest_duration, units)

def get_a_list_with_all_the_seconds(longest_duration):
    """!
        @brief Gets a list containing all the seconds between 1 and the longest duration
        @param longest_duration the last second in the list as an int
        @return a list with all the seconds
    """
    seconds = list()
    for second in range(1, longest_duration + 1):
        seconds.append(second)
    return seconds

def get_empty_unit_spike_counts(seconds, units):
    """!
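        @brief Creates a dictionary containing a list filled with 0s for every unit-id
        @details the lists in the dict are intended as counters for the spikes
        @param seconds a list enumerating the seconds, used to get the right length
        @param units a set of unit-ids
        @return a dict containing a list filled with 0s for every unit-id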
    """
    empty_spike_counts = dict()
    for unit_id in units:
        empty_spike_count = seconds.copy()
        for index in range(0, len(empty_spike_count)):
            empty_spike_count[index] = 0
        empty_spike_counts[unit_id] = empty_spike_count
    return empty_spike_counts

def get_amount_of_spikes_for_every_second(units_file_path, units_spike_counts):
    """!
        @brief This function counts the spikes in every second
        @details it reads in the units row by row and fills up the corresponding entries in the dict
            the dict should contain an entry for every unit-id containing a list with an int for every second
            the dict is changed in place, so nothing is returned
        @param units_file_path a path to the unit-csv-file
        @param units_spike_counts a dict with a list of spike counts for every unit-id
    """
    units_file = open(units_file_path)
    first_row = True
    for row in units_file:
        if first_row:
            first_row = False
            continue
        rat_id, unit_id, channel, spike_time = row.split(",")
        unit_id = int(unit_id)
        spike_time = float(spike_time)
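        spike_time_rounded_down = int(spike_time)
        # Second 1 covers the spike times from 0 up to 1 and sits at index 0,
        # so the rounded down spike time is already the index
        index = spike_time_rounded_down
        units_spike_counts[unit_id][index] += 1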
    units_file.close()
    return

def get_immobility_for_every_second(immobility_file_path, seconds):
    """!
        @brief Gets a list expressing immobility for every second
        @details This function takes the phases given in the immobility file
            and creates a list of bools marked accordingly.
            The list is created by copying seconds
        @param immobility_file_path the path to the immobility-csv
        @param seconds a list enumerating the seconds
        @return a list of bools marking immobility of every second
    """
    is_immobile = seconds.copy()
    phases = list()
    immobility_file = open(immobility_file_path)
    first_row = True
    for row in immobility_file:
        if first_row:
            first_row = False
            continue
        begin_in_seconds, end_in_seconds = row.split(",")
        # float() also accepts values with decimals, int() would fail on them
        begin_in_seconds = float(begin_in_seconds)
        end_in_seconds = float(end_in_seconds)
        phase = (begin_in_seconds, end_in_seconds)
        phases.append(phase)
    immobility_file.close()
    for index in range(0, len(is_immobile)):
        second = seconds[index]
        is_in_phase = False
        for phase in phases:
            begin_in_seconds, end_in_seconds = phase
            if second > begin_in_seconds and second < end_in_seconds:
                is_in_phase = True
                break
        is_immobile[index] = is_in_phase
    return is_immobile

# Main part of the script

# Name the files we are going to use
units_file_path = "./data_neuron/session_2023111501010_units.csv"
immobility_file_path = "./data_neuron/session_2023111501010_immobility.csv"

# Create a table
table = list()
# Add a column for the seconds
longest_duration, units = get_maximal_duration_and_units(units_file_path)
seconds = get_a_list_with_all_the_seconds(longest_duration)
table.append(seconds)
# Add columns for the spike-counts
units_spike_counts = get_empty_unit_spike_counts(seconds, units)
get_amount_of_spikes_for_every_second(units_file_path, units_spike_counts)
for unit_id in units_spike_counts.keys():
    table.append(units_spike_counts[unit_id])
# Add a column for the immobility
table.append(get_immobility_for_every_second(immobility_file_path, seconds))

print(table)
```

</details>

## Looking at data

Now we know how to make the code easy to read.
This is good for us as programmers, but we are also scientists working with data, so we should discuss the data too.
What are data?
Usually they are quantifiable observations or measurements,
like the body-temperature of an animal or the number of times it pressed a button.
So they are numbers.
We have to present these numbers in a way that they can be used in the program and understood by us and our peers.

This means we have to learn how to look at our data and how to write them down.
Usually some of the numbers belong together, like multiple temperature measurements of the same animal. In this case we could say they lie along one [dimension](https://en.wikipedia.org/wiki/Dimension_(vector_space)).
A dimension usually refers to something you can measure along, like time, temperature or button presses.
So dimensions correspond to variables.
Imagine a dimension as a [number line](https://en.wikipedia.org/wiki/Number_line).
On it goes everything an instrument produces, so if you have two thermometers you have two lines or two dimensions.
If you have a clock, this is another number line.

Every time you take readings from your instruments you write down all these numbers, getting a data point.
So if you read off a clock and a thermometer you get a time and a temperature on two axes, a time axis and a temperature axis.
If you read fast enough your points start to form patterns; if you connect them you get lines or plots.
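
A minimal sketch of this idea, assuming invented readings from a clock and a thermometer (the values and variable names are hypothetical): each data point is a tuple, and each position in the tuple belongs to one dimension.

```Python
# Hypothetical readings: each tuple is one data point,
# the first entry is the time in seconds, the second the temperature in °C.
data_points = [(0, 36.9), (60, 37.1), (120, 37.4)]

# Splitting the data points by dimension gives one list per axis
times = list()
temperatures = list()
for time, temperature in data_points:
    times.append(time)
    temperatures.append(temperature)

print(times)
print(temperatures)
```

The two resulting lists are what you would later put on the time axis and the temperature axis of a plot.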

If you want to compare two experiments or two things you lay the axes over each other.
So you put the time and temperature axes of experiment 1 over those of experiment 2, giving you two overlapping plots.
Whether two points truly belong on the same axis and are in the same dimension depends on the question you ask,
but the principle stays the same.

In general we have to distinguish between continuous data like temperature and discrete data like the number of button presses.
Discrete means that there are steps or boxes. In our case we have unit 1, unit 2, unit 3 but no unit 2.23, so the units are discrete.
We only observed potentially interesting units in this case and filtered out most of the others.
We also view our time as discrete despite it being continuous, because we jump from one second to the next.
This is a deliberate choice we made to get the spikes per second.
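
To make the step from continuous to discrete time concrete, here is a minimal sketch. The spike times are invented for illustration; the rounding assumes that a spike between 0 and 1 s counts as part of second 1.

```Python
# Hypothetical spike times in seconds (continuous values)
spike_times = [0.13, 0.98, 1.02, 3.5]

# Cutting off the decimals and adding 1 assigns each spike to a whole second:
# spikes between 0 and 1 s belong to second 1, and so on.
discrete_seconds = list()
for spike_time in spike_times:
    discrete_seconds.append(int(spike_time) + 1)

print(discrete_seconds)
```

The exact spike times are lost in this step, which is the price we pay for being able to count spikes per second.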

As an exercise please think about the units we want to analyze.
How many dimensions do you find there, and are they discrete or continuous?
Please note your results in the next cell.

<details>
  <summary>Click to reveal possible solution</summary>

Remember, looking at data is interpretation, and while there are many wrong interpretations, there are also many right ones.
You will learn with time which perspective or interpretation is beneficial to your work.

For each unit we can measure a spike frequency, so we have three dimensions:
- Time
- Units
- Number of spikes

All of them are discrete.

</details>

## Structuring data

Now that we have identified our dimensions, we can begin to store them.
Usually you can split your dimensions/variables into controlled and resulting ones.
Controlled variables are the ones you or the experiment designer decide on, like the number of days or the number of dishes.
Resulting variables are usually what we are interested in, like total cell area or cell count.
These are usually hypothesized to [depend](https://en.wikipedia.org/wiki/Dependent_and_independent_variables) on the controlled ones.

When you store your data you can rely on the fact that there is a [countable](https://en.wikipedia.org/wiki/Countable_set) and practically finite set of measurements.
Countable means we can assign every measurement a number, so we can store them in a **list**.
So the simplest way to store measurements is in **lists** sorted by the precise time of recording.
This is usually not really helpful, so we return to the controlled variables and think about them.

We should store our data so that we can access it by the controlled variables.
So we store by unit and second.
The seconds are contiguous, meaning there are no gaps in our discrete numbers,
therefore we can store them in **lists**.
The units are not contiguous, so we store them in sparse arrays or **dicts** instead.
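
The difference can be sketched as follows. The spike counts and the unit ids 3, 17 and 42 are hypothetical and only chosen to show ids with gaps between them.

```Python
# Seconds are contiguous: 1, 2, 3, ... so the position in the list
# already tells us which second a value belongs to.
spikes_per_second = [4, 0, 7]  # hypothetical counts for seconds 1 to 3
count_in_second_2 = spikes_per_second[2 - 1]  # lists start counting at 0

# Hypothetical unit ids with gaps: a list would need unused slots for
# the ids in between, a dict only stores the ids that exist.
spikes_per_unit = {3: 11, 17: 5, 42: 9}
count_of_unit_17 = spikes_per_unit[17]

print(count_in_second_2, count_of_unit_17)
```

With a list, unit 42 would force us to keep 39 unused entries around; the dict avoids this by using the unit id itself as the key.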

Lastly we have to deal with the immobility.
It belongs to every second in every unit but only exists once, so we only use one list.
For later evaluation we have to add it to every second in every unit.
This is treated separately, because it does not contain spike counts, so we end up with two tables,
which we combined into one for convenience.
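
One possible way to use the single immobility list together with every unit is sketched below. The data are invented, and summing the spikes during immobility is only an assumed example of a "later evaluation", not something prescribed by this unit.

```Python
# Hypothetical data for three seconds
units_spike_counts = {3: [4, 0, 7], 17: [1, 2, 0]}
is_immobile = [False, True, True]

# The immobility list exists only once, but the shared index lets us
# look it up for every second of every unit.
immobile_spike_totals = dict()
for unit_id in units_spike_counts:
    total = 0
    for index in range(0, len(is_immobile)):
        if is_immobile[index]:
            total += units_spike_counts[unit_id][index]
    immobile_spike_totals[unit_id] = total

print(immobile_spike_totals)
```

Because all lists have one entry per second, the same index always refers to the same second, no matter which list we look into.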

Next we should think about how we should store the contents of the table.
Here I would advocate for lists so we can iterate through them.
The question is whether we should first select the unit and then the second, or first the second and then the unit.
The consideration here is what we will more likely access together: all data of one unit or all data of one second.
Since the units are physically separated and we want to investigate their spike counts,
we may wish to see changes over time, meaning that the seconds should stay together.
This means we will have a **dict** that contains the units, and the units themselves are **lists** containing the values for the seconds.
In code this would look like this:
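
A minimal sketch of this structure, with made-up unit ids and spike counts (the values are hypothetical and not taken from the recording):

```Python
# First level: a dict with one entry per unit id (hypothetical ids)
# Second level: a list with one spike count per second
units_spike_counts = {
    3: [4, 0, 7],
    17: [1, 2, 0],
}

# Select the unit first, then the second
spikes_of_unit_17 = units_spike_counts[17]
spikes_of_unit_17_in_second_2 = units_spike_counts[17][2 - 1]

# All seconds of one unit stay together, so changes over time are easy to follow
for unit_id in units_spike_counts:
    for spike_count in units_spike_counts[unit_id]:
        print(unit_id, spike_count)
```

The outer loop walks over the units and the inner loop over the seconds, which matches the order in which the data are nested.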

This may seem irrelevant at the moment because we changed nothing about the structure of our code.
It becomes very relevant later, however, when you deal with more complex data that do not just slide into a convenient form.
It is also faster in some languages to iterate along the lowest level of order, in our case the seconds,
because the **memory-cells** are neighbors,
meaning they are fetched and read together.

I assume you have some questions.
Please ask them now, so you have understood all relevant concepts before we move on to classes and using publicly available code in the next unit.