# Reading and writing files (input and output)

Files are a bit tricky to deal with. You basically have to follow a ceremony to work with them:

1. Open the file and specify whether you want to read from it or write to it
2. Do what you want to do with the file's contents
3. Close the file

Opening a file is not that difficult:

In [None]:
file = open( "data/masterkurs2018_n2a_differentiated_20180108_20180109_day1.csv" )

And if there is no error message then this worked.

Now, for the next part. Let us just read the first line:

In [None]:
file.readline()

Do you recognize this line? Also, note the many `\t`s. These are tab characters, and they separate the columns. We have a lot of empty columns.

What happens when we call `readline` again?

In [None]:
file.readline()

We get the next line! That is nice. But calling `readline` again and again is tedious. And we want the computer to do the tedious work, right? So how about this?

In [None]:
for line in file:
    print( line, end = '' ) #each line already has an end of line marker. We do not want a double end of line

The for loop stops automatically when there are no more lines left. If you run the loop again it will print nothing, because the file object remembers how far it has read and only returns the lines that have not been read yet.
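
The following is a minimal sketch of this behaviour. It assumes a small, hypothetical demo file created in a temporary directory (not the course data), and it uses the file method `seek` to jump back to the beginning so that the lines can be read a second time.

```python
import os
import tempfile

# Hypothetical demo file so the example does not depend on the course data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as out:
    out.write("first\nsecond\nthird\n")

file = open(demo_path)
first_pass = [line for line in file]   # reads all three lines
second_pass = [line for line in file]  # nothing is left, so this list is empty
file.seek(0)                           # jump back to the start of the file
third_pass = [line for line in file]   # all three lines are available again
file.close()

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 3 0 3
```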

Finally, it is very important to close the file when we are done with it, because any open file consumes some resources from the system. It's not much, but if you open thousands of files and never close them, the system might experience a shortage of resources. Closing a file is also not that difficult:

In [None]:
file.close()

There is a nice trick that you can use so that your files will always be closed when you are done with them:

In [None]:
with open( "data/masterkurs2018_n2a_differentiated_20180108_20180109_day1.csv" ) as file:
    first_line = file.readline()
    print( first_line )

print( "The file should be closed now" )

#file.readline() #this line will cause an error because the file is already closed

The `with` trick is nice. But it only works when all the code that uses the file fits into one block. If your file code spans several Jupyter cells, you have to call `close` yourself.
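
To see what `with` actually guarantees, here is a small sketch that again assumes a hypothetical demo file in a temporary directory. It checks the `closed` attribute of the file object after an error inside the block, and then shows a `try`/`finally` version that does roughly the same thing by hand.

```python
import os
import tempfile

# Hypothetical demo file so the example does not depend on the course data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as out:
    out.write("hello\n")

# Even if something goes wrong inside the block, the file gets closed.
try:
    with open(demo_path) as file:
        raise RuntimeError("something went wrong while reading")
except RuntimeError:
    pass

closed_after_error = file.closed
print(closed_after_error)  # prints: True

# Roughly what `with` does for you behind the scenes:
file = open(demo_path)
try:
    content = file.read()
finally:
    file.close()  # runs whether or not reading raised an error
```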

The two parameters of the `open` function that you will need most often are the filename and the mode (the action). The mode can be one of the following:

* "r" read from the file. This is also the default if you do not specify the second parameter
* "w" create a new file or overwrite an existing file
* "a" append to the end of an existing file, or create the file if it does not exist yet

So, if you want to write to a file you can do the following:

In [None]:
with open( "data/my_file", "w" ) as out: #note the "w" here for "write". And our variable is called "out"
    out.write( "This is a message in a bottle" )

#let us retrieve the message again:
with open( "data/my_file", "r" ) as file:
    print( file.read() )
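
The difference between "w" and "a" is easiest to see side by side. This is a small sketch that assumes a hypothetical file in a temporary directory, so nothing in your `data` folder gets overwritten.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(demo_path, "w") as out:   # "w" creates the file
    out.write("one\n")

with open(demo_path, "a") as out:   # "a" keeps the old content and adds to it
    out.write("two\n")

with open(demo_path) as file:
    after_append = file.read()      # 'one\ntwo\n'

with open(demo_path, "w") as out:   # "w" throws the old content away
    out.write("three\n")

with open(demo_path) as file:
    after_overwrite = file.read()   # 'three\n'

print(repr(after_append))
print(repr(after_overwrite))
```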

This already covers most of the things that you need when working with files. By default, Python assumes that whatever you read or write is text, and it uses the default encoding of your system, which is often, but not always, UTF-8. You can pass `encoding="utf-8"` to `open` if you want to be explicit. If you need to read/write something other than text <a href="https://docs.python.org/3/library/functions.html#open" target="_blank">you have to specify "rb", "wb" or "ab"</a>.
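
Here is a minimal sketch of the difference between text mode and binary mode. The file name and the example word are made up for illustration. The `encoding` parameter is passed explicitly so that the result does not depend on the system's default.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

# Text mode: we write a string and Python encodes it as UTF-8.
with open(demo_path, "w", encoding="utf-8") as out:
    out.write("Grüße")

# Binary mode: we get the raw bytes exactly as they are stored on disk.
with open(demo_path, "rb") as raw:
    raw_bytes = raw.read()

# Text mode again: Python decodes the bytes back into a string.
with open(demo_path, "r", encoding="utf-8") as file:
    text = file.read()

print(type(raw_bytes), len(raw_bytes))  # bytes object with 7 bytes
print(type(text), len(text))            # str object with 5 characters
```

The two lengths differ because `ü` and `ß` each take two bytes in UTF-8.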

When you read a file the lines in the file will still have their *new line marker*. The easiest way to get rid of it is to use the `rstrip` method on the line string:

In [None]:
with open( "data/masterkurs2018_n2a_differentiated_20180108_20180109_day1.csv" ) as file:
    for line in file:
        print( line.rstrip( '\n' ) ) # `\n` is the new line marker that starts a new line.
        #it is also possible to leave out the parameter to `rstrip` but this will also remove `\t`s
        #when you have empty columns at the end of the line. This will cause problems in practice.
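
To make the problem with empty columns concrete, here is a small sketch with a made-up line that has four tab-separated columns, the last two of which are empty.

```python
# A made-up line with four columns; the last two columns are empty.
line = "a\tb\t\t\n"

only_newline = line.rstrip("\n")  # removes just the new line marker
all_whitespace = line.rstrip()    # also removes the trailing tabs

print(len(only_newline.split("\t")))    # prints: 4
print(len(all_whitespace.split("\t")))  # prints: 2
```

With the plain `rstrip()` the two empty columns are lost, so the rows of a table would no longer all have the same number of columns.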

Speaking of files, you can also use Python to ask about the file system. The `os` module provides several facilities for this:

In [None]:
from os import getcwd, listdir
print( "We are currently in the directory {}".format( getcwd() ) )
print( "And here we have the following files and folders:" )
for filename in listdir():
    print( filename )
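
Note that `listdir` returns the names of folders as well as files. If you need to tell them apart, something like the following sketch could work. It assumes a hypothetical temporary directory with one file and one subfolder, and it uses `os.path.join` and `os.path.isfile` from the standard library.

```python
import os
import tempfile

# Hypothetical directory with one file and one subfolder.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "subfolder"))
with open(os.path.join(demo_dir, "notes.txt"), "w") as out:
    out.write("hi")

files = []
folders = []
for name in sorted(os.listdir(demo_dir)):
    full_path = os.path.join(demo_dir, name)  # listdir returns bare names only
    if os.path.isfile(full_path):
        files.append(name)
    else:
        folders.append(name)

print("files:", files)      # prints: files: ['notes.txt']
print("folders:", folders)  # prints: folders: ['subfolder']
```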

In [None]:
from IPython.display import display, HTML
from os import listdir

topic_of_next_chapter = 'plot stuff' #this does not seem right

for filename in listdir():
    if topic_of_next_chapter in filename:
        display( HTML( '<a href="{}" target="_blank">Next up: Matplotlib!</a>'.format( filename ) ) )