# 7 Input/Output in vanilla python, numpy and pandas

In this notebook we'll cover reading and writing files using Python's built-in functionality and two third-party packages that you need to have installed. This course doesn't cover `pandas`, but since its file-reading functionality is quite convenient, I'll briefly show you how to use it. We'll also start with the project in this notebook.

## 7.1 Reading and writing files with open()

If you have read from or written to a language-agnostic file before (e.g. while logging your experiment), you've probably seen or written something like this:

```
f = fopen("sub1.log", "w");
fprintf(f, "reaction took %f seconds\n", rt);
fclose(f);
```

What you're doing here is opening the file with the specified permission. In the case above, we opened the file with writing permission (`"w"`). You can also use read permission (`"r"`), append permission (`"a"`, which unlike `"w"` doesn't overwrite the previous content), or several at once (e.g. `"w+"` or `"r+"` to both read and write; note that `"w+"` still empties the file first, while `"r+"` keeps the existing content). Closing the file is important to prevent corrupted files (i.e. something unexpected happening to the content).

If you were to translate the part above directly into Python, it would look like this:

```python
f = open("sub1.log", "w")
f.write(f"reaction took {rt} seconds\n")
f.close()
```

So you see the principle is the same, but notice the differences:
 * `open` instead of `fopen`
 * `.write` and `.close` are methods of the object which represents the stream into the file.
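
To make the difference between the `"w"` and `"a"` permissions concrete, here is a minimal sketch. The file name `demo.log` and the temporary directory are made up for illustration, so that nothing on your disk gets overwritten.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.log")

f = open(path, "w")   # "w" creates the file, or empties it if it already exists
f.write("first\n")
f.close()

f = open(path, "w")   # opening with "w" again discards "first"
f.write("second\n")
f.close()

f = open(path, "a")   # "a" keeps the existing content and adds to the end
f.write("third\n")
f.close()

f = open(path, "r")   # "r" only allows reading
content = f.read()
f.close()

print(content)
```

Only the lines `second` and `third` end up in the file, because the second `open(path, "w")` threw away what was there before.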
 
One nifty trick to avoid having to close the file yourself is to use a `context manager` (the `with` statement). Those are more fit for a late intermediate course than a beginner's course, but the bottom line in the context of opening a file is that it gets closed for you once you leave the code block:

```python
with open("sub1.log", "w") as f:
    f.write(f"reaction took {rt} seconds\n")
```

Depending on the permission, the text stream object provides you with a few functions to read from or write to the file:

 * in read mode
    * `.read()`
    * `.readlines()`
 * in write mode
    * `.write()`
    * `.writelines()`

In [8]:
with open("tmp", "r") as f:
    lines = f.readlines()  # list of strings, one per line
    l = f.read()           # empty string: readlines() already read until the end
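
Here is a small round trip through these four methods, as a sketch with made-up content in a temporary file (the name `tmp.txt` is an assumption for the demo). Two details are easy to miss: `.writelines()` doesn't add line breaks for you, and a second read on the same open file returns nothing.

```python
import os
import tempfile

# Hypothetical scratch file with made-up content.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")

with open(path, "w") as f:
    f.write("rt\n")                       # one string
    f.writelines(["0.41\n", "0.52\n"])    # several strings, "\n" is not added for you

with open(path, "r") as f:
    whole = f.read()    # the entire file as one string
    rest = f.read()     # nothing left to read, so this is an empty string

with open(path, "r") as f:
    lines = f.readlines()   # a list with one string per line, "\n" included

print(repr(whole))
print(repr(rest))
print(lines)
```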

In the data folder, there's a .csv file that we'll use to get to know the scientific python stack.

**Exercise**

Open the file with reading permission and read the content using both `.read()` and `.readlines()`. You'll have to open the file twice, because once you have run either of these methods, the file has already been read "until the end".

In [12]:
# your code here


## 7.2 Using numpy

With `open` we can read any content in a file as a string. For numeric data, you'll have to process the string in order to do any analysis. Numpy offers solutions to this. We're going to assume that the data is present in a human-readable way. There are also ways to store numpy arrays in another form, but we won't cover that.

The function we'll use to read the data with numpy is `np.genfromtxt`. It takes a lot of optional arguments, which makes it very flexible but also complicated. Have a look:

In [13]:
import numpy as np
np.genfromtxt?

To direct your attention towards some important ones: `delimiter` specifies the character between values in the file (e.g. `","` or `"\t"`). `skip_header` specifies how many lines at the beginning of the file should be ignored (e.g. `skip_header=1` to skip a row of variable names).
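
To see these two arguments in action without touching the exercise file, here is a minimal sketch on made-up data. The column names and values are invented, and `io.StringIO` only stands in for a real file; with a file on disk you would pass the filename instead.

```python
import io

import numpy as np

# Made-up, comma-separated data with one header row.
text = "rt,correct\n0.41,1\n0.52,0\n0.38,1\n"

# io.StringIO makes the string behave like an open file.
data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)

mean_rt = data[:, 0].mean()   # first column holds the made-up reaction times

print(data.shape)
print(mean_rt)
```

The result is a two-dimensional array with one row per line and one column per variable. The variable names themselves are lost because the header row was skipped.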

**Exercise**

Read the same file as above, but using `numpy` this time. Make sure that you specify the right arguments.

In [15]:
# your code here

## 7.3 Using pandas 

I said before that we won't cover `pandas` here, which is almost true. The one exception I'm making is the file-reading part, because it shows you what a DataFrame looks like. For the kind of data we're handling here, a DataFrame would probably be the way to go, but I'll stick to the design principles of the workshop and just show you what `pandas` would look like if you want to use it in the future.

In [None]:
import pandas as pd
filename = "../data.csv"
df = pd.read_csv(filename, delimiter="\t")
print(df)

## 7.4 The pickle module

Unlike the options you saw so far, `pickle` gives you a way to store most python objects in a python-specific way. If you have a complicated object that can't be stored in a human-readable file, `pickle` is the way to go. There are really only two functions you need to know:

 * `pickle.load`
 * `pickle.dump`

Because pickling means translating into a stream of 0s and 1s, we need to open files in binary mode. Assuming we have some complex python object that we want to save:

```python
import pickle

with open("outfile.pic", "wb") as f:  # notice the added "b" for binary
    pickle.dump(complex_object, f, -1)
```

We added a "b" to get _binary_ writing permission. The `-1` is the protocol used to serialize the data; a negative number selects the highest available protocol. You can omit it as well, in which case the default protocol is used.

If we want to read the object back in, we can do it like this:

```python
with open("outfile.pic", "rb") as f:
    complex_object = pickle.load(f)
```

**Exercise**

There's a mysterious string in the data folder. Try to load it using `pickle`.

In [17]:
# your code here

## 7.5 Starting your project

It's finally time to start the mini-project that we're developing concurrently. I strongly believe in figuring stuff out by yourself as a learning tool, but I'm aware that this approach doesn't work for everyone. As a consequence, I tried to design the project assignments in a way that makes them largely, but not completely, solvable using the stuff we covered so far.

Assuming this is an online workshop, we will now split the group into two parts: Those who feel comfortable taking on the challenge on their own go to the breakout room. If you're uncertain about whether you'll manage to do that or just prefer a more guided approach, you can stay here. We will work out a solution together. Both options are absolutely fine, so don't feel pressured! You can also change your mind midway and switch rooms.

Here's the task:

**Project exercise 1**

Write a function that reads the `.csv`-file in the data folder. The data consists of multiple variables. The variable names are in the first row. Your function should follow this structure:

```python
def read_data(filename):
    ...
    return data
```

To be more specific, I want the data to consist of a dictionary with one entry per variable. The keys should be the variable names and the values a list or a numpy array of the values for this variable. You can certainly stay in this notebook or open a new one and play around to get the solution (this is actually strongly encouraged).

You can use any of the methods outlined above. If you go for the vanilla python one, you will need the `str.split` and `str.strip` methods of strings and either the `int` or the `float` class.
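
In case you haven't used these string methods before, here is a small hint on two made-up lines. It is not a solution, just a sketch of what the methods do. The comma as separator is an assumption; check what the actual file uses.

```python
# Two made-up lines, as you would get them from .readlines().
header = "rt,correct\n"
row = "0.41,1\n"

# .strip() removes the trailing line break, .split(",") cuts at every comma.
names = header.strip().split(",")

# float turns each piece of text into a number.
values = [float(piece) for piece in row.strip().split(",")]

print(names)
print(values)
```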

Have fun!