# Read And Write Files
## Prerequisites
This unit assumes that you are familiar with the following content:
- [Variables](../20_variables_and_datatypes/10_variables_eng.ipynb)
- [In- and Output](../20_variables_and_datatypes/20_in_and_output_eng.ipynb)
- [Primitive Datatypes](../20_variables_and_datatypes/30_datatypes_eng.ipynb)
- [Conditionals](../30_conditionals/conditionals_eng.ipynb)
- [Complex Data Types](../40_complex_data_types/lists_eng.ipynb)
- [Loops](../50_loops/for_loop_sja_eng.ipynb)

You will also need the following chapter to work with CSV files:
- [Libraries](../70_libraries/libraries_eng.ipynb)

## Motivation
When working with computers, files are omnipresent. Files are created, read, changed, copied, sent, moved, deleted, restored, ... So far, all data in our programs has been lost as soon as the program ended. With files, it is possible to save data permanently (persistently) and to access it again later.

Of course, it is also possible to work with files in Python 🐍.

## What is a file?
All of you have already worked with files: Word files, Excel files, this notebook, Python programs, ... But what *exactly* is a file? A possible definition could be:

> A file is a set of **logically related** and mostly **sequentially** ordered data that is stored **permanently** on a medium and can be addressed using a **name** (identifier).

Please also have a look at the Wikipedia article on [files](https://de.wikipedia.org/wiki/Datei) (in German).

### Example: textfile.txt
Save a simple e-mail as a text file (extension .txt) with the name textfile.txt. How do the points above apply to this file?
- The text of the e-mail is usually logically related, e.g. it deals with one subject.
- The text is structured sequentially: one line follows the other, lines consist of words, words consist of letters, and the letters are lined up one after the other.
- The file textfile.txt is stored permanently (once you have saved it).
Even if you close the e-mail program or switch off the computer, the file remains on the computer's storage medium.
You can open the file again next time, even with another program.
- The file has a unique name: textfile.txt

## Where is the file located?
Nowadays, programs and apps save files "somewhere" on the computer or smartphone.
As a user, you usually do not have to worry about where files are stored.
(Do you know where the mp3 files are on your smartphone?)

If you want to access files from your own programs, you need to know **where** exactly these files are located.
### Important for this notebook
The following applies to this and the other notebooks:
Unless otherwise specified, the file that is accessed is in the same directory as the notebook.
If you download a notebook, you must also download the files and save them in the same folder.
Otherwise, some things will not work.
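
If you are unsure which folder Python uses, the following minimal sketch may help. It assumes that the standard library module `pathlib` is available and that the notebook's working directory is the folder the notebook is stored in, which is usually, but not always, the case. The variable names are chosen for illustration only.

```python
from pathlib import Path

# Folder in which Python looks for files that are given without a path
working_dir = Path.cwd()
print("Working directory:", working_dir)

# "lorem_ipsum.txt" is the example file used later in this notebook.
# exists() only checks for the file; it does not open or change anything.
candidate = Path("lorem_ipsum.txt")
full_path = candidate.absolute()
print("Full path:", full_path)
print("File found:", candidate.exists())
```

If the last line prints `False`, the file is probably stored in a different folder than the notebook.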


## Access files in Python
The basic handling of a file always consists of the following three steps:
- Open the file and assign the file to a variable
- Edit the file
     - Read from the file (read access)
     - Write to the file (write access)
- Close the file

To open a file there is the function `open()`.
There are methods such as `.write()`, `.read()` or `.close()` for further handling of the file.
There are also libraries that offer additional functions for special file formats such as .csv, .json, ...
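
The three steps can be sketched as follows. The file name `three_steps_demo.txt` is hypothetical, and the file is created in a temporary folder (an assumption made here so that nothing in your notebook folder is overwritten).

```python
import os
import tempfile

# Hypothetical demo file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "three_steps_demo.txt")

# Step 1: open the file and assign it to a variable
file = open(path, "w")
# Step 2: work with the file (here: write access)
file.write("Hello file!\n")
# Step 3: close the file
file.close()

# The same three steps again, this time with read access
file = open(path, "r")
content = file.read()
file.close()

print(content)
```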


## Open Files
A file can be opened with the Python function `open()`.
The function expects the name of the file as a parameter. If the file is not in the same directory as the program, the name must be preceded by the path to the file.
In addition, the mode in which the file is to be opened can optionally be specified.
The available modes are listed in the [Python Documentation](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files).
The central modes are:

| Mode   | Description |
|:-------|:------------|
| r      | The file is only read. Write access leads to an error. If the file does not exist -> error message. The read pointer starts at the beginning of the file. |
| w      | The file is only written. Read access leads to an error. If the file does __not__ exist, a new file is created. If the file exists, its old content is deleted. |
| a      | New content is appended to the old content (append). The write pointer is positioned at the end of the file. If the file does not exist, a new file is created. |
| r+     | Read and write access. Error if the file does not exist. |
| w+     | Read and write access. A new file is created if the file does not exist. The content of an existing file is **deleted**! |
| rb     | A "b" can be appended to a mode. The file is then treated as a binary file instead of a text file. |

If no mode is specified, the default value is `"r"`. **Recommendation:** ALWAYS specify a mode. This makes the program easier to maintain.
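
The behavior of the modes `"r"`, `"w"` and `"a"` can be tried out with a small sketch like the one below. The file name `modes_demo.txt` and the temporary folder are assumptions made for this illustration. `try`/`except` is used only to catch the expected errors so that the program keeps running.

```python
import io
import os
import tempfile

# Hypothetical demo file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

# "r": opening a file that does not exist leads to an error
missing_file_error = False
try:
    open(path, "r")
except FileNotFoundError:
    missing_file_error = True

# "w": the file is created
file = open(path, "w")
file.write("first\n")
file.close()

# "a": new content is appended to the old content
file = open(path, "a")
file.write("second\n")
file.close()

file = open(path, "r")
after_append = file.read()
file.close()

# "w" again: the old content is deleted
file = open(path, "w")
file.write("third\n")
file.close()

# "r": reading works, writing leads to an error
file = open(path, "r")
after_rewrite = file.read()
write_error = False
try:
    file.write("not allowed")
except io.UnsupportedOperation:
    write_error = True
file.close()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```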

## Examples and tasks
In the following tasks and examples, the modes "r" and "w" are examined more closely.
### Create A File In Write Mode
In the following program, a file is opened in write mode.
Since the file (probably) does not yet exist on your computer, a new one will be created.
The program does not actually write anything to the file, but the file exists afterwards nonetheless.
(Important: If no further path is specified, the file is created in the same folder in which the notebook file is located.)

Run the program and then check whether the file has been created.

In [None]:
# Program 1
# File is opened for writing
file = open("new_file.txt", "w")
# The file is closed again
file.close()


### Open A File in Read Mode
In the next program, the file from the first program is opened in read mode and closed again. Run the program.
Then **delete** the file "new_file.txt" and run the program again. What happens?

In [None]:
# Program 2
# File is opened for reading
file = open("new_file.txt", "r")
# The file is closed again
file.close()

### Open an existing file in write mode
Open the file "new_file.txt" with a text editor, enter a few characters and lines, then save and close it.
Now run program 1 (see above) again. Then use the text editor to check the content of the file. What happened?

## Read files
How do you read from a file? Files are organized sequentially (see above), i.e. they consist of successive lines.
The `for` loop is suitable for processing sequences. Specifically, you can iterate over the lines of a file:

In [None]:
# open file
file = open('lorem_ipsum.txt', 'r')

#Read the file line by line and output the lines
for line in file:
    print(line)

#close file
file.close()

In the output, every line is followed by an empty line: each line read from the file still ends with a line break, and `print()` adds a second one.
There are several ways to correct this behavior.
On the one hand, you can set the parameter `end` of the function `print()` to an empty string: `end=""`.
Another possibility is to "strip" the line first. For strings, there is a method called `.strip()`.
It removes whitespace (spaces, tabs, and line breaks) at the beginning and end of a string.
`.strip()` is often used when processing form input so that a leading or trailing space does not change the input.
Alternatively, you can use `.lstrip()` or `.rstrip()`. These strip only at the beginning (left) or only at the end (right) of the string.
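
The difference between the three strip methods and the `end` parameter can be seen on a single string. The following sketch uses a made-up line instead of a file; `repr()` is used only to make spaces and the line break `\n` visible in the output.

```python
# A made-up line as it could come from a file: leading spaces, line break at the end
raw_line = "  Lorem ipsum\n"

stripped = raw_line.strip()         # whitespace removed on both sides
left_stripped = raw_line.lstrip()   # whitespace removed only at the beginning
right_stripped = raw_line.rstrip()  # whitespace removed only at the end

print(repr(stripped))        # 'Lorem ipsum'
print(repr(left_stripped))   # 'Lorem ipsum\n'
print(repr(right_stripped))  # '  Lorem ipsum'

# end="" prevents print() from adding its own line break;
# the line break that is already part of the string is still printed
print(raw_line, end="")
```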

In [None]:

# open file
file = open('lorem_ipsum.txt', 'r')

#Read the file line by line and output stripped lines
for line in file:
    line = line.strip()
    print(line)

#close file
file.close()

### Output the contents of a file twice
In the following program, the `for` loop is run twice. What does the output look like? Why?

In [None]:
# open file
file = open('lorem_ipsum.txt', 'r')

#Read the file line by line and output stripped lines
print("First round")
for line in file:
    line = line.strip()
    print(line)
    
#Read the file line by line and output stripped lines
print("Second round")
for line in file:
    line = line.strip()
    print(line)

#close file
file.close()

While a file is read, the "reading pointer" moves through the file character by character. After the first loop, the pointer is at the end of the file, so the second loop finds nothing left to read.
Can we change the pointer position so that we can read specific characters at specific positions?
The method `.seek()`, which is introduced below, places the pointer wherever we want.
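
The effect of the reading pointer can be made visible by counting the lines in each round. This is only a sketch: the file `pointer_demo.txt` and its three lines are made up for this illustration and are written to a temporary folder.

```python
import os
import tempfile

# Hypothetical demo file with three lines in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "pointer_demo.txt")
file = open(path, "w")
file.write("one\ntwo\nthree\n")
file.close()

file = open(path, "r")

# First round: the pointer moves from the beginning to the end of the file
first_round = 0
for line in file:
    first_round += 1

# Second round: the pointer is still at the end, nothing is left to read
second_round = 0
for line in file:
    second_round += 1

file.close()

print("Lines in first round:", first_round)    # 3
print("Lines in second round:", second_round)  # 0
```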

### Read the file into a list in one go
Line breaks may be redundant and are only there because, e.g., a paper page has a limited width.
In some cases, it may make sense to read the entire text "at once" without iterating over the lines with a loop.
The method `.readlines()` does just that. The result is a **list** in which each entry is one line of the file.
If the file contains no line breaks, the list has exactly one entry.

In [None]:
# open file
file = open('lorem_ipsum.txt', 'r')

# Read all lines of the file into a list in one go
lines = file.readlines()
print(lines)

# close file
file.close()
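
The method `.read()` mentioned earlier also reads a file in one go, but returns a single string instead of a list. The sketch below compares both on a made-up two-line file (`read_demo.txt` in a temporary folder, both assumptions for this illustration). It also shows that the entries returned by `.readlines()` still contain the line breaks, while the string method `.splitlines()` removes them.

```python
import os
import tempfile

# Hypothetical demo file with two lines in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "read_demo.txt")
file = open(path, "w")
file.write("line one\nline two\n")
file.close()

# .read() returns the whole content as ONE string
file = open(path, "r")
whole_text = file.read()
file.close()

# .readlines() returns a list with one entry per line (including "\n")
file = open(path, "r")
lines = file.readlines()
file.close()

# .splitlines() splits a string into lines and drops the line breaks
clean_lines = whole_text.splitlines()

print(repr(whole_text))  # 'line one\nline two\n'
print(lines)             # ['line one\n', 'line two\n']
print(clean_lines)       # ['line one', 'line two']
```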


### Open file with `with`
As seen in the previous examples, files must always be closed after opening.
Since forgetting to close a file is a common source of errors, Python offers the `with` statement.
It ensures that an opened file is always closed correctly, even if an error occurs inside the block.

In [None]:
# Open file in read mode
with open('lorem_ipsum.txt', 'r') as file:
    # Read the file line by line and output lines
    for line in file:
        print(line)

# The file is closed automatically when the with block is left
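
Whether a file is really closed can be checked with the attribute `.closed` of the file object. The following sketch uses a hypothetical file `with_demo.txt` in a temporary folder and provokes an error on purpose to show that the file is closed in that case as well.

```python
import os
import tempfile

# Hypothetical demo file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "with_demo.txt")

with open(path, "w") as file:
    file.write("some text\n")
    closed_inside = file.closed   # still open inside the block
closed_after = file.closed        # closed as soon as the block is left

print("Closed inside the block:", closed_inside)  # False
print("Closed after the block:", closed_after)    # True

# Even if an error occurs inside the block, the file is closed
try:
    with open(path, "r") as file:
        number = int(file.read())  # fails on purpose: "some text" is not a number
except ValueError:
    closed_after_error = file.closed

print("Closed after an error:", closed_after_error)  # True
```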

## Write Files
To be able to write to a file, it must be opened in a mode that allows writing (e.g. the mode `'w'`).
You can then write data to the file using the `.write()` method. This is shown in the following cell.

In [None]:
with open('numbers.txt', 'w') as f:
    for i in range(100):
        f.write(str(i) + '\n')

Check the result in the file "numbers.txt" with a text editor.
Question: Why is the integer `i` converted into a string?
Another question: Why is a `\n` appended to the number?
Experiment with the above program and check the changes to the file with a text editor.



### Task: Spell me the alphabet please!
To teach your friends at the daycare center how to spell all the letters of the alphabet, create a program that writes all letters from a to z on one line of a file.
Note: The function `chr()` converts a number into the corresponding character.
The number 97 corresponds to an **a**, the number 98 corresponds to a **b**, etc.
The ASCII table defines these assignments.
ASCII is a character encoding standard that assigns the characters and control commands of a typewriter to bit patterns.
The bit patterns are usually given as numbers from 0 to 127.
See also [here](https://de.wikipedia.org/wiki/American_Standard_Code_for_Information_Interchange#ASCII table).
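
As a small illustration of the note above, `chr()` and its counterpart `ord()` (which converts a character back into its number) can be tried out like this. The variable names are chosen for this sketch only.

```python
# chr() converts a number into the corresponding character
letter_a = chr(97)
letter_b = chr(98)

# ord() is the reverse function: character -> number
number_of_a = ord("a")

print(letter_a)     # a
print(letter_b)     # b
print(number_of_a)  # 97
```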

In [None]:
#with open ...

## Place the reading pointer with `.seek()`
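The reading pointer can be repositioned using the `.seek()` method. The method accepts up to two arguments.
The first argument specifies by how many bytes (!) the pointer is shifted.
The second argument is optional and specifies where the shift starts from. The following applies:

* Second argument = 0 →  from the start (default value)
* Second argument = 1 →  from the current position
* Second argument = 2 →  from the end

Note: For files opened in text mode (without "b"), Python only supports shifts relative to the start of the file. With 1 or 2 as the second argument, only a shift of 0 is allowed, e.g. `file.seek(0, 2)` jumps to the end of the file. Other shifts relative to the current position or to the end require binary mode (e.g. `"rb"`).

Examples:
* file.seek(3) → The pointer is placed after the third byte, so the next read starts with the fourth byte
* file.seek(5,1) → The pointer is moved 5 bytes forward from the current position (binary mode only)
* file.seek(0,0) → Pointer back to the beginning of the file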

Experiment with the parameters of `.seek()` in the following program.

In [None]:
file = open("numbers.txt", "r")
file.seek(60,0)
for line in file:
    print(line)

file.seek(0)
for line in file:
    line = line.strip()
    print(line)
file.close()
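
The second argument of `.seek()` can be explored with a sketch like the following. It opens a made-up file (`seek_demo.txt` in a temporary folder, both assumptions for this illustration) in binary mode `"rb"`, because shifts relative to the current position or to the end are not supported in text mode. In binary mode, the data read are bytes (shown as `b"..."`), and `.tell()` returns the current pointer position.

```python
import io
import os
import tempfile

# Hypothetical demo file with exactly 10 bytes in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "seek_demo.txt")
with open(path, "w") as file:
    file.write("0123456789")

with open(path, "rb") as file:
    file.seek(3)                 # 3 bytes from the start
    from_start = file.read(2)    # reads bytes 3 and 4, pointer is now at 5
    file.seek(2, 1)              # 2 bytes forward from the current position -> 7
    from_current = file.read(1)  # reads byte 7, pointer is now at 8
    file.seek(-2, 2)             # 2 bytes before the end -> 8
    from_end = file.read()       # reads the rest of the file
    position = file.tell()       # pointer is at the end

print(from_start)    # b'34'
print(from_current)  # b'7'
print(from_end)      # b'89'
print(position)      # 10

# In text mode, a shift relative to the current position leads to an error
text_mode_error = False
with open(path, "r") as file:
    try:
        file.seek(5, 1)
    except io.UnsupportedOperation:
        text_mode_error = True

print("Error in text mode:", text_mode_error)  # True
```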

## Task 1: Two outputs
The program introduced above was supposed to display the content of a file twice.
Copy that program into the following cell and complete it so that the content is displayed twice, in two different formats.

In [None]:
# TODO: This question is not clear!


## Task 2: Always Backup
Create a program that copies the text file "lorem_ipsum.txt".
Afterward, extend the program so that it first asks for the name of the original file and then for the name of the new file.
The file is then copied, provided that the original file exists.


## Task 3: CSV files
The file `studenten.csv` contains a list of students in CSV format.
Use the `csv` module to read the file and then display the names of the students on the list.
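
As a starting point, the following sketch shows how the `csv` module from the standard library can be used in general. It does not use `studenten.csv`, whose structure is not described here. Instead, it creates a small made-up file `csv_demo.csv` with the hypothetical columns `name` and `city` in a temporary folder.

```python
import csv
import os
import tempfile

# Hypothetical demo file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "csv_demo.csv")

# Write a header row and two made-up data rows
# (newline="" is recommended by the csv documentation)
with open(path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["name", "city"])
    writer.writerow(["Ada", "London"])
    writer.writerow(["Grace", "New York"])

# Read the file again: every row is returned as a list of strings
with open(path, "r", newline="") as file:
    reader = csv.reader(file)
    header = next(reader)  # the first row contains the column names
    rows = []
    for row in reader:
        rows.append(row)

print(header)  # ['name', 'city']
for row in rows:
    print(row[0])  # first column of each data row
```

For your own file, you may have to adapt the column index and, depending on the file, the separator (parameter `delimiter` of `csv.reader()`).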

## Final task: Alice in Wonderland

Write a program that creates a text file called alice_words.txt. The file should contain an alphabetical list of all words and their frequency in the text version of Alice's Adventures in Wonderland. Use functions to structure your program.

A text version of the book can be found in the file `alice.txt`.

The first lines of your output file should look similar to this:

```text
Word    Count
a       631
a-piece 1
abide   1
able    1
about   94
above   3
absence 1
absurd  2
```

Your program should also print the longest word in the book. How many characters does this word have?