# File Input/Output, JSON, CSVs

<style>
section.present > section.present { 
    max-height: 90%; 
    overflow-y: scroll;
}
</style>

<small><a href="https://colab.research.google.com/github/brandeis-jdelfino/cosi-10a/blob/main/lectures/notebooks/12_file_io.ipynb">Link to interactive slides on Google Colab</a></small>

To add: file path overview (at least cover `/` and `..`)

# Reading files

Python allows you to read the contents of files:

In [None]:
f = open('../../snippets/names.txt')
print(f.read())

You can also get the data one line at a time:

In [None]:
f = open('../../snippets/names.txt')
for line in f:
    print(line, end='')

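Note the `end=''` in the `print` call. Each line read from a file keeps its trailing newline character, and `print` adds its own newline by default, so `end=''` avoids printing a blank line after every name. Depending on how you want to use the data, you can instead remove the newline with the `strip()` string method, which also removes any other leading or trailing whitespace: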

In [None]:
f = open('../../snippets/names.txt')
for line in f:
    print(f"Name: {line.strip()}")

## File objects

`open()` returns a file object. File objects have a number of methods, and are also **iterables**, as we just saw.
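Two of those methods are `readline()`, which reads a single line, and `readlines()`, which reads all remaining lines into a list. The sketch below first creates a small file with a hypothetical name, `names_demo.txt`, which stands in for `names.txt` because that file's contents aren't shown here. Then it reads the file both ways:

```python
# Create a small sample file; 'names_demo.txt' is a hypothetical name
f = open('names_demo.txt', 'w')
f.write("Harry\nHermione\nRon\n")
f.close()

f = open('names_demo.txt')
first = f.readline()    # one line, including its trailing '\n'
rest = f.readlines()    # every remaining line, as a list of strings
f.close()

print(repr(first))  # 'Harry\n'
print(rest)         # ['Hermione\n', 'Ron\n']
```

The file object remembers its position, so `readlines()` picks up right after the line that `readline()` already consumed.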

# Writing Files

You can also write strings to files:

In [None]:
f = open('../../snippets/output.txt', 'w')
f.write("Hello, files!")
f.close()

g = open('../../snippets/output.txt', 'r')
print(f"File contents: {g.read()}")
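Unlike `print`, `write()` does not add a newline, so you have to include `'\n'` yourself wherever you want a line break. Here is a small sketch that uses a hypothetical scratch file, `lines_demo.txt`:

```python
# 'lines_demo.txt' is a hypothetical scratch file
f = open('lines_demo.txt', 'w')
f.write("first")
f.write("second\n")   # write() adds no newline of its own
f.write("third\n")
f.close()

g = open('lines_demo.txt')
contents = g.read()
g.close()
print(repr(contents))  # 'firstsecond\nthird\n'
```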

## File modes

Notice that we called `open` slightly differently for writing vs. reading:

`open('../../snippets/output.txt', 'w')`  
vs.  
`open('../../snippets/output.txt', 'r')`

The second parameter is the "mode". There are several modes, but these are the most commonly used:

| character | mode |
|:---:|:---|
| r | open for reading (default) |
| w | open for writing, truncating the file first |
| a | open for writing, appending to the end of file if it exists |
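The difference between `w` and `a` is easiest to see side by side. This sketch writes to a hypothetical file, `log_demo.txt`, several times and checks the contents after each step:

```python
# 'log_demo.txt' is a hypothetical file name
f = open('log_demo.txt', 'w')    # 'w': create the file, or empty it if it exists
f.write("line 1\n")
f.close()

f = open('log_demo.txt', 'a')    # 'a': keep existing contents, add to the end
f.write("line 2\n")
f.close()

g = open('log_demo.txt')         # 'r' is the default mode
after_append = g.read()
g.close()
print(repr(after_append))        # 'line 1\nline 2\n'

f = open('log_demo.txt', 'w')    # 'w' again: the two earlier lines are discarded
f.write("line 3\n")
f.close()

g = open('log_demo.txt')
after_truncate = g.read()
g.close()
print(repr(after_truncate))      # 'line 3\n'
```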

## Closing files

Files should be closed when they are no longer needed.

We did this above with the `.close()` method.

This is especially important when writing. Writes are usually buffered, so data may not actually reach the file until `close()` is called!

In [None]:
f = open('../../snippets/output.txt', 'w')
f.write("Hello, files!")

g = open('../../snippets/output.txt', 'r')
print(f"File contents: {g.read()}")
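In the cell above, `g.read()` will likely show an empty string because the text is still sitting in `f`'s buffer. Closing `f` before reading fixes this. If you need the data written out while keeping the file open, file objects also have a `flush()` method that writes out buffered data. Here is a minimal sketch with a hypothetical file name:

```python
# 'flush_demo.txt' is a hypothetical scratch file
f = open('flush_demo.txt', 'w')
f.write("Hello, files!")
f.flush()   # push buffered data out to the file without closing it

g = open('flush_demo.txt', 'r')
before_close = g.read()
g.close()

f.close()   # still close f once we're done with it
print(f"File contents: {before_close}")  # File contents: Hello, files!
```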

## `with`

There's a convenient way to make sure you never forget to close a file: the `with` statement.

In [None]:
with open('../../snippets/names.txt', 'r') as f:
    print(f"File closed before for loop? {f.closed}")
    
    for line in f:
        print(line, end='')
    
    print(f"File closed after for loop (still inside the with block)? {f.closed}")

print(f"File closed at end? {f.closed}")

This code opens the file, assigns the file object to `f`, executes the code inside the `with` block, then automatically closes the file when exiting the `with` block.
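The `with` statement closes the file even if an exception is raised inside the block. This sketch uses a hypothetical file name, `with_demo.txt`. It raises an error partway through reading and then checks `f.closed`:

```python
# 'with_demo.txt' is a hypothetical scratch file
with open('with_demo.txt', 'w') as f:
    f.write("some data\n")

try:
    with open('with_demo.txt') as f:
        first_line = f.readline()
        raise ValueError("something went wrong mid-read")
except ValueError as e:
    print(f"Caught: {e}")

# f is still bound to the file object, and with has already closed it
print(f"File closed after the error? {f.closed}")  # True
```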

## Context managers

File objects are **context managers**, which means they can be used with the `with` statement to manage resources automatically when entering and exiting a `with` block.

There are other context managers in Python, and you can even [write your own](https://docs.python.org/3/reference/datamodel.html#context-managers). 
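Here is a rough illustration of what writing your own involves. A class becomes a context manager by defining two methods. `__enter__` runs when the `with` block starts, and its return value is what `as` binds. `__exit__` runs when the block ends, even after an exception. `Announce` below is a hypothetical example that just records these events in a list:

```python
class Announce:
    """Hypothetical context manager that logs entering and exiting a block."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(f"enter {self.name}")
        return self          # this is the value bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append(f"exit {self.name}")
        return False         # False means: don't suppress any exception


events = []
with Announce("demo", events) as a:
    events.append(f"inside {a.name}")

print(events)  # ['enter demo', 'inside demo', 'exit demo']
```

File objects follow the same pattern: their `__exit__` is what closes the file.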

It's good practice to handle file objects with `with` rather than closing manually.

# JSON

We know how to read and write strings, but what about other types, such as ints, floats, lists, and dictionaries?

Python has the `json` package, which contains utilities for reading and writing **JSON**.

JSON stands for "JavaScript Object Notation". It is a very popular, widely used data exchange format. If you need to store structured data, it's a good default choice.

JSON can represent strings, ints, floats, booleans, lists, dictionaries, and `None`.

In [None]:
import json
mydata = {
    "numbers": [1,2,3,4],
    "another number": 2.75,
    "more dictionaries": [{'a': 1, 'b': 2, 'c': 3}]
}
json.dumps(mydata)
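Notice that `dumps` returns a string, and that Python values are translated into JSON syntax along the way. Strings always use double quotes, `True`/`False` become `true`/`false`, and `None` becomes `null`. Here is a quick check with a small, made-up dictionary:

```python
import json

record = {'name': 'Luna', 'graduated': True, 'wand': None}
text = json.dumps(record)

print(type(text))  # <class 'str'>
print(text)        # {"name": "Luna", "graduated": true, "wand": null}
```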

## dump / dumps

The `dump` and `dumps` functions **serialize** data structures:
* `json.dumps(<object>)` **serializes** a data structure to a string.  
* `json.dump(<object>, <file object>)` **serializes** a data structure to a string and writes the string to a file.

"Serializing" a data structure means converting it to a string (or bytes) representation.

An easy way to remember the difference: the `s` in `dumps` stands for "string".

## Pretty printing

`json.dumps()` also takes an optional `indent` parameter. If specified, it will "pretty print" the JSON:

In [None]:
import json
mydata = {
    "numbers": [1,2,3,4],
    "another number": 2.75,
    "more dictionaries": [{'a': 1, 'b': 2, 'c': 3}]
}
print(json.dumps(mydata, indent=4))

## load / loads

`load`/`loads` do the opposite of `dump`/`dumps`: they **parse** strings into data structures.
* `json.loads(<str>)` **parses** a string into a data structure.  
* `json.load(<file object>)` reads the contents of a file and **parses** it into a data structure.
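Here is `loads` on its own. It parses a JSON string like one you might receive from a web service. The string is made up for this sketch. JSON `null` comes back as `None`:

```python
import json

text = '{"name": "Harry", "house": "Gryffindor", "year": 1, "owl": null}'
data = json.loads(text)

print(type(data))     # <class 'dict'>
print(data["house"])  # Gryffindor
print(data["owl"])    # None
```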

In [None]:
import json
mydata = {
    "numbers": [1,2,3,4],
    "another number": 2.75,
    "more dictionaries": [{'a': 1, 'b': 2, 'c': 3}]
}
with open('../../snippets/test.json', 'w') as f:
    json.dump(mydata, f)

with open('../../snippets/test.json', 'r') as f:
    data = json.load(f)

print(data)
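A round trip through JSON doesn't always give back exactly the same Python object. Two common surprises are worth knowing. Tuples come back as lists, because JSON has only one sequence type. Dictionary keys that aren't strings are converted to strings, because JSON object keys must be strings. For example:

```python
import json

original = {1: 'one', 'pair': (3, 4)}
restored = json.loads(json.dumps(original))

print(restored)              # {'1': 'one', 'pair': [3, 4]}
print(restored == original)  # False
```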

# CSV files

CSV stands for "comma-separated values".

In CSV files, rows of data are represented by lines in a file, and columns of data are separated by a specific character, called a **delimiter**. Commas (`,`) are commonly used as a delimiter, but any character can be a delimiter.

CSV files are another common way to store structured data, especially if the data is tabular (like a spreadsheet).

Here's an example of CSV data. Each line contains 4 fields: `id`, `name`, `house`, `hair color`:

In [None]:
with open('../../snippets/csv_example.csv', 'r') as f:
    for line in f:
        print(line, end='')

## Reading CSVs

We could just use `.split(',')` to split each line into a list. But Python provides some nice CSV utilities in the `csv` module.

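`csv.reader()` creates an iterable object that produces each row as a list of strings. The `csv` documentation recommends opening the file with `newline=''` so the module can handle line endings itself.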

In [None]:
import csv
with open('../../snippets/csv_example.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        print(line)
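Every field comes back as a string, so numeric columns need converting before you do arithmetic with them. This sketch uses `io.StringIO` to stand in for an open file. The two rows are hypothetical data in the same 4-field layout:

```python
import csv
import io

# In-memory CSV text; io.StringIO behaves like an open file for reading
text = "11,Harry,Gryffindor,Brown\n18,Draco,Slytherin,Blonde\n"

ids = []
for row in csv.reader(io.StringIO(text)):
    student_id, name, house, hair = row   # unpack the four fields
    ids.append(int(student_id))           # each field arrives as a str

print(ids)  # [11, 18]
```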

## Writing CSVs

`csv.writer()` creates an object with a `writerow` method, which takes a list of fields and writes it out as a single CSV row. This example uses `|` as the delimiter instead of a comma.

In [None]:
import csv
data = [
    ['11', 'Harry', 'Gryffindor', 'Brown'],
    ['18', 'Draco', 'Slytherin', 'Blonde'],
    ['22', 'Cho', 'Ravenclaw', 'Black'],
    ['28', 'Ron', 'Gryffindor', 'Red'],
    ['47', 'Hermione', 'Gryffindor', 'Brown']]

with open('../../snippets/csv_example2.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='|')
    for d in data:
        writer.writerow(d)
        
with open('../../snippets/csv_example2.csv', 'r') as f:
    for line in f:
        print(line, end='')
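Writer objects also have a `writerows` method that writes a whole list of rows in one call, which can replace the `for` loop above. This sketch uses a hypothetical output file name and reads the result back with `csv.reader` and the same delimiter to check it:

```python
import csv

rows = [
    ['11', 'Harry', 'Gryffindor', 'Brown'],
    ['18', 'Draco', 'Slytherin', 'Blonde'],
]

# 'csv_writerows_demo.csv' is a hypothetical output file
with open('csv_writerows_demo.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='|')
    writer.writerows(rows)   # writes every row in one call

# Read it back with the same delimiter to check the result
with open('csv_writerows_demo.csv', 'r', newline='') as f:
    read_back = list(csv.reader(f, delimiter='|'))

print(read_back == rows)  # True
```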

## How useful is this, really?

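Why use the `csv` module when `split()` and `join()` are so easy? It handles many edge cases for you. For example, a field that contains the delimiter is automatically wrapped in double quotes when written, and `csv.reader` removes those quotes when reading it back.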

In [None]:
import csv
data = [
    ['11', 'Harry', 'Gryffindor', 'Brown,mostly'],
    ['18', 'Draco', 'Slytherin', 'Blonde,mostly'],
    ['22', 'Cho', 'Ravenclaw', 'Black,really'],
    ['28', 'Ron', 'Gryffindor', 'Red,very'],
    ['47', 'Hermione', 'Gryffindor', 'Brown']]

with open('../../snippets/csv_example2.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for d in data:
        writer.writerow(d)
        
with open('../../snippets/csv_example2.csv', 'r') as f:
    for line in f:
        print(line, end='')
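To see why this matters when reading, compare a naive `split(',')` with `csv.reader` on one of the lines written above. The line is typed in directly here so the sketch doesn't depend on the file:

```python
import csv
import io

line = '11,Harry,Gryffindor,"Brown,mostly"'

naive = line.split(',')                        # splits inside the quoted field too
parsed = next(csv.reader(io.StringIO(line)))   # respects the quotes

print(naive)   # ['11', 'Harry', 'Gryffindor', '"Brown', 'mostly"']
print(parsed)  # ['11', 'Harry', 'Gryffindor', 'Brown,mostly']
```

The naive version produces 5 fields with stray quote characters, while `csv.reader` recovers the original 4 fields.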