# Classes and modules

## Recall

Last unit we learned how to use control structures like ```for``` and ```if```, and how to write functions.
We used this knowledge to complete the code that reads in the comma-separated-values (csv) files.

<details>
  <summary>Show Code</summary>

```Python
def process_csv(csv_file, dishes):
    """!
        @brief This function reads in a specific csv-file and adds its contents to the dict dishes
        @details We assume that we get a csv-file with a header line and 7 fields per row.
            The name and contents of the csv-file should be defined as given in param, so we can extract the day and dish number.
            The contents are then stored in a dict passed as dishes defined in param.
        @param csv_file the path to the csv-file as a str, it should follow the form Day_[day]_dish_[dish]_zoom_[zoom].csv,
            where all [] denote numbers extracted as meta data.
            The 5th field should contain the area of the cells.
        @param dishes a dict that will be filled with the contents of the csv-file.
            The first layer of keys will be the dish-numbers, the values belonging to them are dicts.
            These dicts contain the days as keys and dicts as values. The lowest layer dicts contain
            the "cell_count" and "area" as keys. Their values are the total cell-count for that day and dish and the
            total area for that day and dish respectively.
        @return None
    """
    with open(csv_file, "r") as csv_file_handle:
        _, day, _ , dish_number, _, zoom_factor = csv_file.split("_")
        cell_counter = 0
        cell_area_counter = 0
        line_counter = 0
        for line in csv_file_handle:
            if line_counter != 0:
                cell_counter += 1
                # The last two fields are the x and y coordinates of the center of area
                cell_id, nucleus_x, nucleus_y, nucleus_area, cell_area, center_of_area_x, center_of_area_y = line.split(",")
                cell_area_counter += float(cell_area)
            line_counter += 1
        if dish_number not in dishes.keys():
            dishes[dish_number] = {}
        dishes[dish_number][day] = {
            "cell_count": cell_counter,
            "area": cell_area_counter
        } 
    return

csv_files = [
    "./data/Day_1_dish_1_zoom_3.csv"
]

# Create something to save the dishes
dishes = {}

# Go through all files
for csv_file in csv_files:
    process_csv(csv_file, dishes)
```

</details>

We then restructured the data to make it more accessible.

<details>
  <summary>Show Code</summary>

```Python
area = []
count = []
cells = {"area": area, "count": count}
# We know that the dishes are numbered so we iterate over them
for dish_number in range(1, len(dishes) + 1, 1):
    dish = dishes[str(dish_number)]
    dish_area = []
    dish_count = []
    # We know that the days in the dishes are numbered
    for day_number in range(1, len(dish) + 1, 1):
        value_pair = dish[str(day_number)]
        day_area = value_pair["area"]
        day_count = value_pair["cell_count"]
        dish_area.append(day_area)
        dish_count.append(day_count)
    area.append(dish_area)
    count.append(dish_count)
print(cells)
```

</details>

You may have suspected that reading in a csv-file is a common task and that there is a more comfortable solution than coding it yourself. In Python, we often use code provided via the Python Package Index (PyPI). This code usually comes in the form of modules. Most modules use classes. So this unit deals first with classes and then with modules.
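
One such ready-made solution ships with Python itself: the ```csv``` module from the standard library, which does not even need to be installed from the package index. The following is only a minimal sketch of how the counting from the recall code could look with it. The column names in the header line and the two data rows are assumptions made up for illustration, and the text is read from an in-memory string instead of one of Alice's files.

```Python
import csv
import io

# Hypothetical file contents: header names and values are made up for this sketch
sample = io.StringIO(
    "cell_id,nucleus_x,nucleus_y,nucleus_area,cell_area,center_of_area_x,center_of_area_y\n"
    "1,45,74,65,231,49,71\n"
    "2,12,30,50,180,14,28\n"
)

cell_count = 0
total_area = 0.0
# DictReader uses the header line as keys, so we neither skip the header
# by hand nor split the lines ourselves
for row in csv.DictReader(sample):
    cell_count += 1
    total_area += float(row["cell_area"])

print(cell_count, total_area)
```

With a real file, the object returned by ```open(csv_file, newline="")``` would take the place of ```sample```.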

## Classes

In the last unit you learned how to use **functions** to structure your code. **Functions** usually take **arguments** as inputs. Programmers noticed that most **functions** are closely associated with a particular set of **arguments**. So they concluded that these arguments should be bundled in [structures](https://en.wikipedia.org/wiki/Struct_(C_programming_language)) and associated with their **functions**. This led to **classes** and the emergence of [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming).

Object-oriented programming tries to understand the world as a limited set of abstract ideas. It simplifies by finding a set of shared attributes and behaviors. Alice, for example, decided to describe all her cells by:

- their ID
- the position of the nucleus
- the area of the nucleus
- the center of area of the cell
- the cell area

So she defined a "cell" **class**.
The **instances** of this **class**, that is, the objects belonging to it, are the entries in the csv-file she gave Bob.

As mentioned before, classes are a combination of **functions** and **values**. So we could add a **function** to this cell-**class** if we want, for example one that calculates the difference between the nucleus area and the total cell area. A **function** that belongs to a class is called a **method**.

There are a few **methods** that should always exist. If we do not write them ourselves, Python falls back to default implementations inherited from the built-in ```object``` class. A **method** that almost every class defines is the constructor, which transfers the **values** into the object / **instance**. In Python it is called ```__init__```.

To create a **class** we begin with the keyword ```class``` followed by its name and ```:```. We then list its contents, such as the **methods**, in the indented block below. Here we write a **function** called ```__init__```. The first argument of every regular **method** in Python is the **instance** of the **class** itself; by convention it is named ```self```, and the other **arguments** follow.

Let us put this into code:
```Python
# Define the class with the name "Cell"
class Cell:
    # Write the constructor with all the arguments we need to store our attributes
    def __init__(self, identification, position_nucleus, area_nucleus, center_cell_area, cell_area):
        # I usually give the attributes the same names as the arguments to avoid confusion
        # You may just as well name them differently and write:
        # self.id = identification
        self.identification = identification
        self.position_nucleus = position_nucleus
        self.area_nucleus = area_nucleus
        self.center_cell_area = center_cell_area
        self.cell_area = cell_area
```

This snippet defines the **class**, but does not create an **instance**. The computer knows what a cell is, but it does not know any specific cell. To create a specific cell we have to create an **instance**. In our case, we will create an **instance** called ```cell_1```:

```Python
cell_1_id = 1
cell_1_position_nucleus = (45, 74)
cell_1_nucleus_area = 65
cell_1_area = 231
cell_1_center_of_area = (49, 71)
cell_1 = Cell(cell_1_id, cell_1_position_nucleus, cell_1_nucleus_area,  cell_1_center_of_area, cell_1_area)
```
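
To make visible what happens during such a call, here is a small sketch with two hypothetical classes that are not part of Alice's data: ```Empty``` has no constructor of its own, so Python uses the default one, while ```Loud``` reports when its ```__init__``` runs.

```Python
class Empty:
    # No __init__ written: Python falls back to the default inherited from object
    pass


class Loud:
    def __init__(self, value):
        # Runs automatically each time Loud(...) is called
        print("__init__ called with", value)
        self.value = value


empty = Empty()  # works, but the instance starts without any attributes of its own
loud = Loud(42)  # prints the message, then hands back the new instance
print(loud.value)
```

Note that we never call ```__init__``` directly and never pass ```self``` ourselves. Writing the class name followed by the arguments is enough.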

If we now wish to get the area of the cell, we can access the attribute (also called a member) by following the name of our **instance** with ```.``` and the name of the member:

```Python
cell_1_area = cell_1.cell_area
```
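
**Methods** are reached with the same ```.``` notation, followed by parentheses. The sketch below uses the difference between cell area and nucleus area mentioned earlier as its example. The class ```SimpleCell``` and the method name ```area_difference``` are hypothetical and shortened for illustration; they are not Alice's full cell class.

```Python
class SimpleCell:
    def __init__(self, area_nucleus, cell_area):
        # Shortened constructor: only the two attributes this sketch needs
        self.area_nucleus = area_nucleus
        self.cell_area = cell_area

    def area_difference(self):
        # self gives the method access to the attributes of its own instance
        return self.cell_area - self.area_nucleus


simple_cell = SimpleCell(65, 231)
# The instance in front of the dot is passed as self automatically
print(simple_cell.area_difference())
```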

This approach becomes useful when we deal with a large number of **instances**, in our case a few hundred cells for example.
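
As a rough sketch of what that could look like, the following example stores several instances in a list and then works on all of them at once. The three rows are made-up values, not measurements from Alice's files, and the name ```cell_list``` is only chosen for this illustration.

```Python
class Cell:
    def __init__(self, identification, position_nucleus, area_nucleus, center_cell_area, cell_area):
        self.identification = identification
        self.position_nucleus = position_nucleus
        self.area_nucleus = area_nucleus
        self.center_cell_area = center_cell_area
        self.cell_area = cell_area


# Made-up example rows, in the same order as the constructor arguments
rows = [
    (1, (45, 74), 65, (49, 71), 231),
    (2, (12, 30), 50, (14, 28), 180),
    (3, (80, 15), 70, (78, 19), 250),
]

# Create one instance per row; *row unpacks the tuple into the five arguments
cell_list = [Cell(*row) for row in rows]

# Every instance carries its own values, so we can loop over them uniformly
total_area = sum(cell.cell_area for cell in cell_list)
largest = max(cell_list, key=lambda cell: cell.cell_area)
print(len(cell_list), total_area, largest.identification)
```

Compared to the nested dicts from the recall section, each value is now reached by a name that is defined in one place, the class.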

Now you know how classes are generally used. Let us start with a few exercises. First, write your own Cell **class** and add a method that calculates the ratio between nucleus area and cell area. To access the attributes inside the class use ```self```. If you are stuck at this exercise, try to search for inspiration on the internet.

```Python
# Your code goes here
```

<details>
  <summary>Click to reveal solution</summary>

```Python
class Cell:
    def __init__(self, identification, position_nucleus, area_nucleus, center_cell_area, cell_area):
        self.identification = identification
        self.position_nucleus = position_nucleus
        self.area_nucleus = area_nucleus
        self.center_cell_area = center_cell_area
        self.cell_area = cell_area
    
    def nucleus_to_total_area_ratio(self):
        return self.area_nucleus / self.cell_area

cell_1_id = 1
cell_1_position_nucleus = (45, 74)
cell_1_nucleus_area = 65
cell_1_area = 231
cell_1_center_of_area = (49, 71)
cell_1 = Cell(cell_1_id, cell_1_position_nucleus, cell_1_nucleus_area,  cell_1_center_of_area, cell_1_area)

print(cell_1.nucleus_to_total_area_ratio())
```

</details>

In the previous example we used **tuples** to represent the positions. We could use a class for this instead. Please write a **class** that represents a point and contains a **method** to calculate the distance to another point.

```Python
# Your code goes here
```

<details>
  <summary>Click to reveal solution</summary>

```Python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def distance(self, other_point):
        # Euclidean distance: square root (power of 1/2) of the summed squared differences
        distance = ((self.x - other_point.x) ** 2 + (self.y - other_point.y) ** 2) ** (1/2)
        return distance

point_1 = Point(45, 74)
point_2 = Point(49, 71)
distance = point_1.distance(point_2)
print(distance)
```

</details>

ToDo:
- Docstrings
- Type hints
- Mention inheritance and data classes
- Mention class methods
- Modules, using the csv reader as an example
- What is a module
- How to write a module
- Create the algorithm that creates the csv files