# Reading and Writing Files

In this lecture, we'll discuss some basic methods for reading and writing files. Our primary focus today will be on text. We'll also briefly discuss reading *delimited files*, such as `.csv`s, but we'll spend more time on these when we work with rectangular data using the `pandas` module. 

## Writing Data

Whenever we interact with a file -- including creating, writing, and reading -- we will use the `open()` function. We need to specify both the path to the file and the *mode* of interaction. For example, `"w"` specifies that we want to *write* to the file.  

In [1]:
f = open("my_file.txt", "w")

In the file explorer, we can see that `my_file.txt` has been created! Additionally, a variable `f` has been assigned. 

In [2]:
f

<_io.TextIOWrapper name='my_file.txt' mode='w' encoding='UTF-8'>

Let's write some text to `f`. The return value of `write()` is the number of characters in the string. 

In [9]:
f.write("To boldly go\n") # \n starts a newline

13

After calling `f.write()`, the new text may not appear in the file right away, because Python buffers writes before sending them to disk. To make sure the text is saved, we *close* the connection: 

In [10]:
f.close()
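If a file needs to stay open for a while, the `flush()` method of file objects pushes buffered text out without closing the connection. Here is a minimal sketch, assuming a scratch file in a temporary directory (the name `flush_demo.txt` is made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "flush_demo.txt")

f = open(path, "w")
n_chars = f.write("To boldly go\n")  # write() returns the number of characters written
f.flush()                            # push buffered text to the file without closing it

# A second, read-only connection can now see the text.
with open(path, "r") as g:
    contents = g.read()

f.close()                            # we still close the file when we are done
print(n_chars, repr(contents))
```

Closing a file flushes it as well, so `flush()` is only needed when you want the text to be visible while the file is still open.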

Once we've closed the connection, we can't write any more text to `f`: 

In [11]:
f.write("To boldly go\n") # \n starts a newline
# ---

ValueError: I/O operation on closed file.

You should **always** close files after you have opened them. However, there is a nice syntactical way to avoid having to remember to do this, using the `with` keyword: 

In [12]:
with open("my_file2.txt", "w") as f:
    f.write("to boldly go\n")

The `with` statement opens the file, binds it to the name `f` for the duration of the indented block, and automatically closes it when the block ends, even if an error occurs inside the block. Note that `with` does not create a new scope: the name `f` still exists afterward, but it refers to a closed file. You should usually use `with` rather than `open()` and `f.close()` unless there are concrete requirements otherwise. 
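One way to check that `with` really closes the file is to inspect the `closed` attribute of the file object. The sketch below assumes a scratch file in a temporary directory (the name `with_demo.txt` is hypothetical), and also shows that the file is closed when the block ends because of an error:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as f:
    f.write("to boldly go\n")
    closed_inside = f.closed    # False: the file is still open inside the block

closed_after = f.closed         # True: the file was closed when the block ended

# The file is also closed when the block exits because of an error.
try:
    with open(path, "a") as g:
        raise RuntimeError("something went wrong")
except RuntimeError:
    closed_after_error = g.closed   # True

print(closed_inside, closed_after, closed_after_error)
```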

A very common pattern is to loop over data and write it line-by-line to a file. Here's an example: 

In [13]:
D = {"Chihiro" : "Spirited Away",
     "Ashitaka" : "Princess Mononoke",
     "Sophie" : "Howl's Moving Castle"}

with open("movies.txt", "w") as f:
    for key, val in D.items():
        f.write(key + " is the protagonist of " + val + ".\n")

Opening a file with mode `"w"` erases the contents of `movies.txt` if the file already exists. To add to the end of the file instead, use mode `"a"` (for *append*):  

In [14]:
with open("movies.txt", "a") as f:
    f.write("Satsuki and Mei are the protagonists of My Neighbor Totoro")
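To see the difference between the two modes side by side, here is a small sketch that writes to the same file three times. It assumes a scratch file in a temporary directory (the name `modes_demo.txt` is made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:
    f.write("first\n")

with open(path, "w") as f:      # "w" discards the previous contents
    f.write("second\n")

with open(path, "a") as f:      # "a" keeps them and adds to the end
    f.write("third\n")

with open(path, "r") as f:
    contents = f.read()

print(contents, end = "")
```

Only `second` and `third` survive, because the second `"w"` call erased `first`.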


## Reading Data

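Reading data is straightforward if you remember how iterators work. The `f.readline()` method returns the current line of the file, including its trailing newline character, and then advances to the next line, much like the `__next__()` method of iterators. Because each line already ends in a newline, passing `end = ""` to `print()` avoids printing an extra blank line. 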

In [16]:
with open("movies.txt", "r") as f:
    print(f.readline(), end = "")
    print(f.readline())

Chihiro is the protagonist of Spirited Away.
Ashitaka is the protagonist of Princess Mononoke.



Because file objects are iterators over their lines, you can conveniently loop over a file with a `for`-loop. 

In [17]:
with open("movies.txt", "r") as f:
    for line in f:
        print(line, end = "")

Chihiro is the protagonist of Spirited Away.
Ashitaka is the protagonist of Princess Mononoke.
Sophie is the protagonist of Howl's Moving Castle.
Satsuki and Mei are the protagonists of My Neighbor Totoro
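The connection between files and iterators can be checked directly. The following sketch, which assumes a two-line scratch file with a made-up name, shows that a file object is its own iterator and that `readline()` and `next()` differ only in how they signal the end of the file:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "iter_demo.txt")
with open(path, "w") as f:
    f.write("line one\nline two\n")

with open(path, "r") as f:
    is_own_iterator = iter(f) is f   # a file object is its own iterator
    first = next(f)                  # the same line readline() would have returned
    second = f.readline()
    at_end = f.readline()            # readline() returns "" at the end of the file
    try:
        next(f)                      # next() raises StopIteration instead
        raised = False
    except StopIteration:
        raised = True

print(is_own_iterator, repr(first), repr(second), repr(at_end), raised)
```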

## CSV Files

`CSV` stands for "comma-separated values," and it is a common format for representing tabular data. A raw `CSV` file might look like this: 

```csv

"Picard", "TNG", "Enterprise D" 
"Kirk", "TOS", "Enterprise A"
"Janeway", "VOY", "Voyager"
.
.
.
```
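To get a feel for how text like this turns into Python data, we can hand a string to `csv.reader` via `io.StringIO`, which makes a string behave like an open file. This is only a sketch: the two rows are copied from the example above, and `skipinitialspace=True` is used because the sample puts a space after each comma.

```python
import csv
import io

# Two rows in the style of the example above.
raw = '"Picard", "TNG", "Enterprise D"\n"Kirk", "TOS", "Enterprise A"\n'

# skipinitialspace=True ignores the space after each comma,
# so the quotes around each field are recognized and removed.
reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
rows = list(reader)

for row in rows:
    print(row)
```

Each row comes back as a list of strings, one string per field.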

Python's standard library includes a `csv` module with utilities for reading and writing `CSV` files. For example, let's take a quick peek at the Palmer Penguins data set, which was collected by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/). I downloaded these data from [Kaggle](https://www.kaggle.com/), an excellent source of interesting data sets. The file `palmer_penguins.csv` can be downloaded from [this link](https://philchodrow.github.io/PIC16A/content/IO_and_modules/IO/palmer_penguins.csv). Place `palmer_penguins.csv` in the same directory as this notebook so that the code below can find it. 

<figure class="image" style="width:50%">
  <img src="https://allisonhorst.github.io/palmerpenguins/man/figures/lter_penguins.png" alt="Three stylized penguins, one each of the species Adelie, Gentoo, and Chinstrap, with labels above their heads and patches of color behind them.">
  <figcaption><i>Illustrations of the penguins in the Palmer Penguins data set, by Allison Horst.</i></figcaption>
</figure>

In [18]:
import csv
with open("palmer_penguins.csv", "r") as f:
    reader = csv.reader(f)
    for row in list(reader)[0:10]:
        print(row[0:5])

['studyName', 'Sample Number', 'Species', 'Region', 'Island']
['PAL0708', '1', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '2', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '3', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '4', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '5', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '6', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '7', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '8', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
['PAL0708', '9', 'Adelie Penguin (Pygoscelis adeliae)', 'Anvers', 'Torgersen']
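Calling `list(reader)` reads the entire file into memory before slicing. For large files, one alternative is `itertools.islice`, which stops reading after the requested number of rows. The sketch below uses a small made-up file (`captains.csv`, with hypothetical column names) rather than the penguins data, so that it runs on its own:

```python
import csv
import itertools
import os
import tempfile

# Hypothetical scratch file with made-up rows, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "captains.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([
        ["captain", "series", "ship"],
        ["Picard", "TNG", "Enterprise D"],
        ["Kirk", "TOS", "Enterprise A"],
        ["Janeway", "VOY", "Voyager"],
    ])

with open(path, "r", newline="") as f:
    reader = csv.reader(f)
    # islice(reader, 2) yields only the first two rows; the rest are never read.
    first_rows = [row[0:2] for row in itertools.islice(reader, 2)]

print(first_rows)
```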


We can also write new CSV files. For example, let's make a subset file that only includes Adelie Penguins. 

In [19]:
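# newline="" lets the csv module handle line endings itself
with open("palmer_penguins.csv", "r", newline="") as original: 
    with open("palmer_penguins_subset.csv", "w", newline="") as subset: 
        reader = csv.reader(original)
        writer = csv.writer(subset)
        
        for row in reader: 
            # column 2 holds the species; the header row does not match, so it is not copied
            if row[2] == "Adelie Penguin (Pygoscelis adeliae)":
                writer.writerow(row)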





You can check the file `palmer_penguins_subset.csv` and observe that all rows in the new data are Adelie penguins. 
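Filtering by position, as in `row[2]`, requires remembering which column is which. The `csv` module also provides `csv.DictReader` and `csv.DictWriter`, which use the header row to refer to columns by name. Here is a minimal sketch on made-up in-memory data (the column names `name` and `species` and all of the rows are hypothetical, not taken from the penguins file):

```python
import csv
import io

# Made-up data standing in for a CSV file with a header row.
raw = "name,species\nA,Adelie\nB,Gentoo\nC,Adelie\n"

reader = csv.DictReader(io.StringIO(raw))   # each row becomes a dict keyed by column name
out = io.StringIO()                         # stands in for an output file
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)

writer.writeheader()                        # keep the header in the subset
for row in reader:
    if row["species"] == "Adelie":
        writer.writerow(row)

result_lines = out.getvalue().splitlines()
print(result_lines)
```

Unlike the positional version, this subset keeps its header row, since `writeheader()` writes it explicitly.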

While it is possible to use the `csv` module for reading and manipulating tabular data, we will see much more flexible and powerful methods when we introduce the *data frames* paradigm via the `pandas` module. 