<font size="+3">Write and read files</font>

In Python, we can read information stored in files and save our results to files too.

**Table of contents:**
* [1. File basics](#File-basics)
* [2. CSV files](#CSV-files)
* [3. JSON files](#JSON-files)
* [4. Pickle files](#Pickle-files)

# File basics

When we work with files in Python, remember the three main **modes:**
- ```r``` : read mode (default): use ```read()```, ```readline()```
- ```w``` : write mode: existing files are overwritten; use ```write('string')```
- ```a``` : append mode: strings are written at the end of the file

You can find more information at: https://docs.python.org/3/library/functions.html#open

## Write file
Let's create a text file and write one sentence, 'hello world', into it.

In [1]:
f = open('helloworld.txt','w') # open in write (w) mode
f.write('hello world')         # write a string
f.close()                      # close file handle

You have to open the file, write to it, and close it. If you forget to close the file, the data might not be completely written to the hard drive. A trick to avoid forgetting is the following:
- ```with open(...) as f:``` : automatically closes the file

In [2]:
with open('helloworld.txt', 'w') as f:
    f.write('Hello world')

Now the file is only open inside the **with** block, and it is closed when the program leaves the block.
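
To check this behaviour, a minimal sketch can inspect the file object's ```closed``` attribute inside and after the block. The scratch file name ```with_demo.txt``` and the use of the temporary directory are assumptions for illustration only:

```python
import os
import tempfile

# Hypothetical scratch file in the temp directory, so nothing real is overwritten
path = os.path.join(tempfile.gettempdir(), 'with_demo.txt')

with open(path, 'w') as f:
    f.write('Hello world')
    closed_inside = f.closed   # False: we are still inside the with block

closed_outside = f.closed      # True: leaving the block closed the file

print(closed_inside, closed_outside)
os.remove(path)                # clean up the scratch file
```

The file is closed even if an exception is raised inside the block, which is why this form is usually preferred over calling ```close()``` by hand.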

## Read file

Reading a file is also easy: open it in read mode ('r') and use the method **.read()** to read its full content:

In [3]:
with open('helloworld.txt', 'r') as f:
    message = f.read()
    print(message)

Hello world


## Appending to a file
Often, you just want to add something at the end of a file. You don't need to read it all and write it again. Instead, you can open it in append ('a') mode and write at the end:

In [4]:
with open('helloworld.txt', 'a') as f:
    f.write('\nhello world again')
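
The difference between write and append mode can be sketched as follows. This is only an illustration: the file name ```modes_demo.txt``` is hypothetical, and the file lives in a temporary directory so no real data is touched:

```python
import os
import tempfile

# Work in a temporary directory that is deleted at the end
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    with open(path, 'w') as f:   # 'w' creates the file
        f.write('first\n')

    with open(path, 'w') as f:   # 'w' again: the old content is discarded
        f.write('second\n')

    with open(path, 'a') as f:   # 'a': the old content is kept
        f.write('third\n')

    with open(path, 'r') as f:
        modes_demo_content = f.read()

print(repr(modes_demo_content))  # 'second\nthird\n'
```

Only the text written in the second ```'w'``` call and the appended text survive; the first write is lost.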

## Reading line by line

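You will often have to read large files. In those cases, you can read line by line with **readline()**. This saves memory because only the current line is held in memory at a time, instead of the whole file. Each line keeps its trailing newline character, and **readline()** returns an empty string at the end of the file, which ends the loop.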

In [5]:
with open('helloworld.txt') as f:
    line = f.readline()
    while line:
        print(line)
        line = f.readline()

Hello world

hello world again
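
A file object can also be iterated over directly, which reads one line at a time just like the **readline()** loop. The sketch below assumes a hypothetical scratch file ```lines_demo.txt``` and strips the trailing newline from each line, which avoids the blank lines in the printed output:

```python
import os
import tempfile

# Hypothetical scratch file with two lines
path = os.path.join(tempfile.gettempdir(), 'lines_demo.txt')
with open(path, 'w') as f:
    f.write('Hello world\nhello world again')

lines = []
with open(path) as f:
    for line in f:                        # one line per iteration
        lines.append(line.rstrip('\n'))   # drop the trailing newline

print(lines)
os.remove(path)                           # clean up the scratch file
```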


### Example : Save a list to txt file

In [6]:
# Let's save this list
user = ['AfD', 'CDU', 'CSU', 'fdp', 'spdde', 'Die Gruenen', 'dieLinke']

# example 1
with open('germany_party_account.txt','w') as f:
    for u in user:
        f.write('{}\n'.format(u)) # '\n' is the new line character

In [7]:
# example 2
with open('germany_party_account2.txt', 'w') as f:
    f.write("\n".join(user))

Some quick questions:
- What is the difference between the two examples above?
- Is the content of both files exactly the same?
- What would be the content of the file if both examples used the same filename?
- How can you change the code to save with tabulators as separator between party names instead?

### Exercise 1: writing your own csv file
Using the data about user names and user ids below, write their content in a file in CSV (comma-separated values) format. Remember that CSV files have a comma (',') between fields of each row and a line break ('\n') between rows.

In [8]:
user = ['AfD', 'CDU', 'CSU', 'fdp', 'spdde', 'Die Gruenen', 'dieLinke']
# Note: user_id has 8 entries but user has 7; zip() stops at the shorter list
user_id = [844081278,20429858,21107582,39475170,26458162,14553288,44101578,35562287]

In [9]:
# Your code here

In [10]:
# %load solutions/42_files_ex_1.py

# CSV files

CSV files are very common, and the **csv** module has useful functions to read and write them. The example below writes the data to a CSV file with a ```csv.writer()```:


In [11]:
import csv
with open("germany_party_account4.csv", "w", newline="") as f: # newline="" lets the csv module handle line endings
    writer = csv.writer(f)
    writer.writerow(["username","user_id"])
    for u,i in zip(user,user_id):
        writer.writerow([u,i]) # <- Be careful: pass one row as a single sequence of fields (e.g. a list)

If your data is stored in a list of dictionaries, you can store it easily as a csv file:

In [12]:
# Let's store the info as a list of dictionaries
list_of_dict= []
for u, i in zip(user, user_id):
    list_of_dict.append({'username':u, 'user_id':i})
print(list_of_dict)


[{'username': 'AfD', 'user_id': 844081278}, {'username': 'CDU', 'user_id': 20429858}, {'username': 'CSU', 'user_id': 21107582}, {'username': 'fdp', 'user_id': 39475170}, {'username': 'spdde', 'user_id': 26458162}, {'username': 'Die Gruenen', 'user_id': 14553288}, {'username': 'dieLinke', 'user_id': 44101578}]


In [13]:
with open('germany_party_account5.csv', 'w', newline='') as output_file: # newline='' lets the csv module handle line endings
    fc = csv.DictWriter(output_file, fieldnames=list_of_dict[0].keys())
    fc.writeheader()
    fc.writerows(list_of_dict)
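
Reading works in the mirror-image way. As a minimal sketch, ```csv.DictReader``` uses the header row as keys and returns one dictionary per row. The scratch file name ```csv_read_demo.csv``` and the two-row sample are assumptions for illustration:

```python
import csv
import os
import tempfile

rows = [
    {'username': 'CDU', 'user_id': 20429858},
    {'username': 'CSU', 'user_id': 21107582},
]
# Hypothetical scratch file in the temp directory
path = os.path.join(tempfile.gettempdir(), 'csv_read_demo.csv')

with open(path, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['username', 'user_id'])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline='') as f:
    rows_read = list(csv.DictReader(f))   # the header row supplies the keys

print(rows_read)
os.remove(path)                           # clean up the scratch file
```

Note that every value comes back as a string, so numeric fields such as ```user_id``` need an explicit ```int()``` conversion after reading.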

Some quick questions:
- Are the files germany_party_account4.csv and germany_party_account5.csv exactly the same? How can you tell?
- What did writeheader() do above?
- What would happen if a dictionary has an extra field that the others don't have?

# JSON files
As with CSV, Python has a module called **json** with functions to read and write files in the JSON format. For example, consider the following tweet:

In [14]:
import json
tweet = {
    "created_at": 'Sat Feb 08 15:29:13 +0000 2020',
    "full_text": "I love python!",
    "user": {
        "id": "1234567890",
        "screen_name": "PythonLover"
    }
}

In [15]:
tweet

{'created_at': 'Sat Feb 08 15:29:13 +0000 2020',
 'full_text': 'I love python!',
 'user': {'id': '1234567890', 'screen_name': 'PythonLover'}}

We can dump it to a string with **dumps()**. This is how it would look in a file:

In [16]:
json.dumps(tweet) 

'{"created_at": "Sat Feb 08 15:29:13 +0000 2020", "full_text": "I love python!", "user": {"id": "1234567890", "screen_name": "PythonLover"}}'
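
For nested data, the one-line form is hard to read. As a small sketch, the ```indent``` parameter of **dumps()** produces a human-readable layout (the tweet is repeated here so the example runs on its own):

```python
import json

tweet = {
    "created_at": "Sat Feb 08 15:29:13 +0000 2020",
    "full_text": "I love python!",
    "user": {
        "id": "1234567890",
        "screen_name": "PythonLover"
    }
}

# indent=2 puts each key on its own line, nested by two spaces per level
pretty = json.dumps(tweet, indent=2)
print(pretty)
```

The same ```indent``` parameter is accepted by **dump()** when writing to a file.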

We can write it to a file with **dump()**:

In [17]:
# Write tweet data into json format (Serialization)
with open("tweet_example.json", "w") as write_file:
    json.dump(tweet, write_file)

And we can read it with **load()**:

In [18]:
# Read json format file (Deserialization)
with open("tweet_example.json", "r") as read_file:
    tweet2= json.load(read_file)

In [19]:
tweet2

{'created_at': 'Sat Feb 08 15:29:13 +0000 2020',
 'full_text': 'I love python!',
 'user': {'id': '1234567890', 'screen_name': 'PythonLover'}}

Quick question:
- Guess how to load a json object from a string. You can do it with the result of dumps() you ran before.

### Exercise 2: read a JSONL file
JSONL (JSON Lines) is a common format in which each line of the file contains one JSON object. Write code that writes three tweets to a JSONL file and reads them back.

In [20]:
# Your code here

In [21]:
# %load solutions/42_files_ex_2.py

# Pickle files

Pickle is Python's own object serialization format, and it can store most Python objects. Pickle uses a binary format, so **remember to use 'wb' (write-binary) mode for writing and 'rb' (read-binary) mode for reading.** Only load pickle files that you trust, because unpickling can execute arbitrary code.

#### Why Use Pickle?
Pickle is particularly useful for saving complex data types like lists, dictionaries, custom classes, and other objects. It allows you to store these objects in a binary file, which can later be reloaded into Python, preserving the object’s structure and data.
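
As a rough illustration of what "preserving the object's structure" means, the sketch below round-trips a dictionary in memory with ```pickle.dumps()``` and ```pickle.loads()``` (the in-memory counterparts of ```dump()``` and ```load()```) and compares the result with a JSON round trip. The ```record``` dictionary is made up for this example:

```python
import json
import pickle

# Made-up record containing a tuple and a set
record = {'party': 'CDU', 'ids': (20429858, 21107582), 'tags': {'a', 'b'}}

# In-memory pickle round trip: the types survive unchanged
restored = pickle.loads(pickle.dumps(record))
print(restored == record)
print(type(restored['ids']), type(restored['tags']))

# JSON has no tuple type, so the tuple comes back as a list
# (and a set cannot be encoded as JSON at all)
via_json = json.loads(json.dumps({'ids': record['ids']}))
print(type(via_json['ids']))
```

The trade-off is that pickle files can only be read from Python, while JSON files can be read by almost any language.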

In [22]:
import pickle
data = list(zip(user,user_id)) # store a list of tuples rather than a one-shot zip iterator
with open('tweet_example.pickle', 'wb') as f: # set 'wb' instead of 'w'
    pickle.dump(data, f)

In [23]:
with open('tweet_example.pickle', 'rb') as f:
    data = pickle.load(f)

In [24]:
for i in data:
    print(i)

('AfD', 844081278)
('CDU', 20429858)
('CSU', 21107582)
('fdp', 39475170)
('spdde', 26458162)
('Die Gruenen', 14553288)
('dieLinke', 44101578)
