# External modules

Last unit we visualized Bob's data. Now we will analyze them a little, then talk about simulation, and finally work with real data.

So let us first take a look at the data again:

In [None]:
import csv
import pathlib
import matplotlib.pyplot

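def process_csv(csv_file, dishes):
    # newline="" is what the csv module documentation recommends for files passed to a reader
    with open(csv_file, "r", newline="") as csv_file_handle:
        # The file name has four parts separated by "_": the second is the day, the fourth the dish number
        _, day, _, dish_number = csv_file.stem.split("_")
        day = int(day)
        dish_number = int(dish_number)
        cell_counter = 0
        cell_area_counter = 0
        reader = csv.DictReader(csv_file_handle)
        for row in reader:
            cell_counter += 1
            # The column name in the csv-files starts with a space
            cell_area_counter += int(row[" Cell Area"])
        if dish_number not in dishes:
            dishes[dish_number] = {}
        # Store the totals of this file under its dish and day
        dishes[dish_number][day] = {
            "cell_count": cell_counter,
            "area": cell_area_counter
        }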

csv_files = list()
data_folder = pathlib.Path("./data")
for csv_file in data_folder.iterdir():
    if "dish_" in csv_file.stem:
        csv_files.append(csv_file)


dishes = {}
for csv_file in csv_files:
    process_csv(csv_file, dishes)

area = []
count = []
cells = {"area": area, "count": count}
# We know that the dishes are numbered so we iterate over them
for dish_number in range(1, len(dishes) + 1, 1):
    dish = dishes[dish_number]
    dish_area = []
    dish_count = []
    # We know that the days in the dishes are numbered
    for day_number in range(1, len(dish) + 1, 1):
        value_pair = dish[day_number]
        day_area = value_pair["area"]
        day_count = value_pair["cell_count"]
        dish_area.append(day_area)
        dish_count.append(day_count)
    area.append(dish_area)
    count.append(dish_count)

figure, axes = matplotlib.pyplot.subplots(2, 1)
# Days and dishes are numbered from 1 in the file names, so the plot uses the same numbering
days = list(range(1, len(cells["count"][0]) + 1))
for dish_index in range(len(cells["count"])):
    axes[0].plot(days, cells["count"][dish_index], label=f"Dish {dish_index + 1}")
    axes[1].plot(days, cells["area"][dish_index], label=f"Dish {dish_index + 1}")
figure.suptitle("Cell growth")
axes[0].set_title("Cell count")
axes[0].set_xlabel("Days")
axes[0].set_ylabel("Number of cells")
axes[1].set_title("Cell area")
axes[1].set_xlabel("Days")
axes[1].set_ylabel("Area covered by cells")
axes[0].legend()
axes[1].legend()
matplotlib.pyplot.show()

## Histogram
Todo

## Simulation

As you know, Alice and Bob are not real and neither are their data, so how did I generate the csv-files? The answer is simulation. Especially in Physics, simulations are an often-used tool to answer questions that cannot be answered with simple experiments. If we want to know how galaxies form, we can neither make one in our own backyard nor observe its formation within our lifetimes, so we build a mathematical model in a computer and investigate that instead.

In biology, computer simulations are more difficult to perform, because we lack a sufficiently advanced mathematical understanding of the problems we investigate. Put more simply: “It is easier to calculate how two galaxies collide than how two cells interact with each other.” You can see this in the way I simulated our cells.

I first created a big empty dish. Then I placed a cell in it and let its nucleus grow while growing a cell body around it. Whenever the nucleus split by accident, I considered this a normal cell division. If a cell lost most of its body or was too small, I considered it dead. If you find this simplification revolting, you have already understood why biology is not easy to simulate. There are a lot more nuances and rules to consider than in Physics.
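To make these rules a bit more concrete, here is a toy sketch of such a simulation. It is not the code that produced the csv-files. Each cell is reduced to a single made-up size value, and the growth range, the minimum size and the split probability are assumptions chosen purely for illustration.

```python
import random


def simulate_dish(days, seed=0):
    """Toy model: return the number of living cells after each day."""
    rng = random.Random(seed)   # fixed seed so that a run can be repeated
    cells = [10.0]              # start with one cell; the number is its (made-up) size
    counts = []
    for _ in range(days):
        next_cells = []
        for size in cells:
            size *= rng.uniform(0.8, 1.5)   # assumed random growth or shrinkage
            if size < 5.0:                  # assumed rule: a cell that is too small is dead
                continue
            if rng.random() < 0.3:          # assumed chance that the nucleus splits
                next_cells.extend([size / 2, size / 2])   # division into two halves
            else:
                next_cells.append(size)
        cells = next_cells
        counts.append(len(cells))
    return counts


print(simulate_dish(days=10))
```

Changing the seed gives a different dish, which is roughly how several dishes with different growth curves could be produced from the same set of rules.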

I mention simulations because I believe that during your career you may encounter questions that can be answered by writing and running a short program instead of using a plant or animal, and because I expect the use of simulation to slowly spread within biology. In the latter case, always remember that a simulation is a simplified mathematical model and therefore flawed, so whenever you use one, ask which corners were cut and how this will influence your research.

The code I used to create the csv-files can be found in ```cell_simulation.py```. 

## Real data

Todo

# Rework t

After the philosophical part, let us look at an example so we know what we are talking about.

Our group focuses on how oxytocin, a neurohormone synthesized in the hypothalamus, influences maternal care. A major feature of mammalian maternal care is milk supply through the mammary gland. You may have already heard that oxytocin is secreted into the bloodstream in the pituitary gland following suckling by the offspring. It elicits milk ejections by evoking contractions of smooth muscle cells in the mammary ducts.

Interestingly, oxytocin is not continuously secreted during suckling but released in bursts. This burst-like secretion is caused by bursts of activity in oxytocin neurons. We try to understand how this activity is generated by investigating the behavior of rats and the activity in their brains.

The example we picked for you is the analysis of such neural activity together with behavioral data. You will work with our experimental data. These data consist of extracellular spike recordings (neuroscientists call action potentials "spikes") in which many cells were recorded simultaneously, as well as pre-analyzed video-based behavioral data. We analyzed the movement of the dam, because the animal stops moving before milk ejection.

Only one of the cells recorded is a potential oxytocin neuron. The final goal of the course will be to find this oxytocin neuron by identifying the characteristic firing pattern shown in the following image.

![Sketch of the experiment and resulting plot](img/experiment.jpg)

A. Recording setup with an electrode that has many contacts along one [single shank](https://en.wikipedia.org/wiki/Neuropixels).
B. Top: Maternal behavior (immobility vs. mobility). Bottom: Neural activity, shown as firing rate in spikes per second (Hz). There are seven distinct bursts, each reaching around 100 spikes per second. Note also that this burst pattern has been described to coincide with immobility. The combined investigation of both neural activity and behavior thus provides converging evidence that the recorded neuron is an oxytocin neuron.

![The action potentials of an oxytocin cell showing a burst with multiple spikes](img/OTburst.png)

An example of an oxytocin burst consisting of multiple spikes. You can see a large number of spikes in the center of the image.
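A firing rate in spikes per second, like the one in panel B, can be obtained by counting spikes in time bins of fixed width. The following is a minimal sketch under stated assumptions: the spike times are made up, the function names `firing_rate` and `burst_bins` are hypothetical, and the threshold of 50 spikes per second is an arbitrary choice for illustration, not the criterion used in our analysis.

```python
def firing_rate(spike_times, duration, bin_width=1.0):
    """Count spikes in consecutive bins and convert the counts to spikes per second (Hz)."""
    n_bins = int(duration / bin_width)
    counts = [0] * n_bins
    for t in spike_times:
        index = int(t // bin_width)   # number of the bin this spike falls into
        if 0 <= index < n_bins:       # ignore spikes outside the recording window
            counts[index] += 1
    return [count / bin_width for count in counts]


def burst_bins(rates, threshold):
    """Return the indices of all bins whose rate reaches the threshold."""
    return [index for index, rate in enumerate(rates) if rate >= threshold]


# Made-up spike times in seconds: sparse firing and one dense burst in the fourth second
spike_times = [0.2, 1.1, 2.7] + [3.0 + i * 0.01 for i in range(100)] + [4.5]
rates = firing_rate(spike_times, duration=5.0)
print(rates)                               # [1.0, 1.0, 1.0, 100.0, 1.0]
print(burst_bins(rates, threshold=50.0))   # [3]
```

With one-second bins the count in each bin already equals the rate in Hz. Narrower bins resolve a burst more finely but make the rate noisier, so the bin width is a choice that has to be justified for real data.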

# ToDo

- Introduce histograms
- Introduce pandas
- Add neural data