# Reading and Writing Files

### Before the start of this video: Download the file `palmer_penguins.csv` and put it in the same folder as this notebook.

In this lecture, we'll discuss some basic methods for reading and writing files. Our primary focus today will be on text. We'll also briefly discuss reading *delimited files*, such as `.csv`s, but we'll spend more time on these when we work with rectangular data using the `pandas` module. 

## Writing Data

Whenever we interact with a file -- including creating, writing, and reading -- we will use the `open()` function. We need to specify both the path to the file and the *mode* of interaction. For example, `"w"` specifies that we want to *write* to the file.  
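As a minimal sketch, here is what opening a file for writing might look like. The file name `my_file.txt` matches the one discussed in this lecture; the variable name `f` is just a convention.

```python
# Open (and create, if necessary) my_file.txt in write mode
f = open("my_file.txt", "w")

# Display the file object; the exact text shown depends on your system
print(f)

# Note: f is deliberately left open here; it still needs to be closed later
```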

If we look, we can see that a file called `my_file.txt` has been created. Additionally, the variable `f` has been assigned a file object, which displays like this (the encoding shown depends on your operating system):

`<_io.TextIOWrapper name='my_file.txt' mode='w' encoding='cp1252'>`

Text that we write is buffered, so it may not actually appear in the file until we close the connection.
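Here is a small sketch of writing and then closing, assuming the same `my_file.txt` as above. The text being written is only a placeholder.

```python
f = open("my_file.txt", "w")

# write() returns the number of characters written
n_chars = f.write("Hello, file!\n")

# Closing the connection flushes any buffered text to disk
f.close()

print(n_chars)
```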

Now that the connection is closed, we can't write to the file any more.
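To see this for ourselves, we can try writing to a closed file and catch the resulting error. This is only an illustration; the variable `message` is a name introduced for this sketch.

```python
f = open("my_file.txt", "w")
f.close()

try:
    # Writing to a closed file raises a ValueError
    f.write("more text")
except ValueError as err:
    message = str(err)
    print("Could not write:", message)
```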

You should **always** close files after you have opened them. However, there is a nice syntactical way to avoid having to remember to do this, using the `with` keyword: 
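A minimal sketch of the `with` pattern, again using `my_file.txt` and placeholder text:

```python
# The file is open only inside the indented block
with open("my_file.txt", "w") as f:
    f.write("Written inside a with block\n")

# After the block ends, the file has been closed for us
print(f.closed)  # True
```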

You can think of the `with` keyword as assigning the name `f` for the duration of the indented block. Once the block ends, whether normally or because of an error, Python automatically closes `f` for us. The name `f` still exists afterward, but it refers to a closed file. You should usually use `with` rather than `open()` and `f.close()` unless there are concrete requirements otherwise. 

A very common pattern is to loop over data and write it line-by-line to a file. Here's an example: 
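The sketch below assumes a short, made-up list of movie titles; the list `movies` and its contents are hypothetical, chosen only so that there is something to write to `movies.txt`.

```python
# Hypothetical data to write, one title per line
movies = ["Casablanca", "Spirited Away", "The Matrix"]

with open("movies.txt", "w") as f:
    for title in movies:
        # write() does not add a newline for us, so we add one explicitly
        f.write(title + "\n")
```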

The use of the `"w"` argument to `open()` will erase an existing file `movies.txt` if one exists. To append to the file, use mode `"a"` instead:  
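As a sketch of the difference, the block below first creates `movies.txt` with mode `"w"` and then adds one more line with mode `"a"`. The titles are placeholders.

```python
# Start from a known state: "w" replaces any existing contents
with open("movies.txt", "w") as f:
    f.write("Casablanca\n")

# "a" keeps the existing contents and writes at the end of the file
with open("movies.txt", "a") as f:
    f.write("Parasite\n")

with open("movies.txt", "r") as f:
    print(f.read())
```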

## Reading Data

Reading data is not difficult, provided that you can remember iterators. The `f.readline()` method returns the current line of the file and then advances to the next line, very similar to the `__next__()` method of iterators. 

For example, calling `f.readline()` twice reads the first two lines of the file.
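Here is a minimal sketch of reading two lines. So that it runs on its own, it first writes a small `movies.txt` with placeholder titles.

```python
# Create a small file to read from (placeholder contents)
with open("movies.txt", "w") as f:
    f.write("Casablanca\nSpirited Away\nThe Matrix\n")

with open("movies.txt", "r") as f:
    first = f.readline()   # reads line 1 and advances
    second = f.readline()  # reads line 2 and advances

# Each line keeps its trailing newline, so suppress print's own
print(first, end="")
print(second, end="")
```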


To read the whole file at once, use `f.read()`, or loop over `f` directly to process one line at a time.
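Both approaches are sketched below, again using a small `movies.txt` with placeholder titles so that the example is self-contained.

```python
# Create a small file to read from (placeholder contents)
with open("movies.txt", "w") as f:
    f.write("Casablanca\nSpirited Away\nThe Matrix\n")

# Option 1: read the entire file into a single string
with open("movies.txt", "r") as f:
    contents = f.read()

# Option 2: a file object is an iterator over its lines
with open("movies.txt", "r") as f:
    lines = [line.rstrip("\n") for line in f]

print(contents)
print(lines)
```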


## CSV Files

`CSV` stands for "comma-separated values," and it is a common format for representing tabular data. A raw `CSV` file might look like this: 

```csv

"Boston", "Massachusetts", "The Bay State"
"Lansing", "Michigan", "The Great Lakes State"
"Sacramento", "California", "The Golden State"
.
.
.
```

Python's `csv` module offers some utilities for reading and writing `CSV` files. For example, let's take a quick peek at the Palmer Penguins data set, which was collected by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/). I downloaded these data from [Kaggle](https://www.kaggle.com/), an excellent source of interesting data sets. If you haven't already, please make sure that the file `palmer_penguins.csv` is in the same directory as this notebook. 

<figure class="image" style="width:50%">
  <img src="https://allisonhorst.github.io/palmerpenguins/man/figures/lter_penguins.png" alt="Three stylized penguins, one each of the species Adelie, Gentoo, and Chinstrap, with labels above their heads and patches of color behind them.">
  <figcaption><i>Illustrations of the penguins in the Palmer Penguins data set, by Allison Horst.</i></figcaption>
</figure>

To read the file, we first need to import the `csv` module.
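As a quick sketch of what `csv.reader` does, the block below imports the module and parses two of the sample rows from above, held in an in-memory string instead of a file. The use of `io.StringIO` and `skipinitialspace=True` (which ignores the space after each comma) are choices made for this illustration.

```python
import csv
import io

# Two sample rows, stored in a string rather than a file
raw = (
    '"Boston", "Massachusetts", "The Bay State"\n'
    '"Lansing", "Michigan", "The Great Lakes State"\n'
)

# csv.reader accepts any iterable of lines, including a file-like object
reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
rows = list(reader)

print(rows[0])  # ['Boston', 'Massachusetts', 'The Bay State']
```

Each row comes back as a list of strings, with the quotation marks removed.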

__You have my permission to stop typing here__ 

The file is long, so let's look at only the first 10 rows and the first five entries of each row.

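```python
import csv

# newline="" lets the csv module handle line endings itself
with open("palmer_penguins.csv", "r", newline="") as f:
    reader = csv.reader(f)
    # list(reader) loads every row; the slice keeps the first 10
    for row in list(reader)[0:10]:
        print(row[0:5])
```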

We can also write new CSV files. For example, let's make a subset file that only includes Adelie Penguins. 

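```python
import csv

# newline="" lets the csv module handle line endings itself
with open("palmer_penguins.csv", "r", newline="") as original:
    with open("palmer_penguins_subset.csv", "w", newline="") as subset:
        reader = csv.reader(original)
        writer = csv.writer(subset)

        for row in reader:
            # Column 2 holds the species name
            if row[2] == "Adelie Penguin (Pygoscelis adeliae)":
                writer.writerow(row)
```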

You can check the file `palmer_penguins_subset.csv` and observe that all rows in the new data are Adelie penguins. 
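We could also do this check in code by collecting the distinct values in the species column. The sketch below uses a tiny stand-in file, `subset_demo.csv`, with made-up rows so that it runs without the real data; to check the real subset, you would open `palmer_penguins_subset.csv` instead.

```python
import csv

# Hypothetical stand-in rows (not real data), with the species in column 2
demo_rows = [
    ["study_A", "1", "Adelie Penguin (Pygoscelis adeliae)"],
    ["study_A", "2", "Adelie Penguin (Pygoscelis adeliae)"],
]
with open("subset_demo.csv", "w", newline="") as f:
    csv.writer(f).writerows(demo_rows)

# Collect the distinct values found in column 2
with open("subset_demo.csv", "r", newline="") as f:
    species = {row[2] for row in csv.reader(f)}

print(species)
```

If the set contains a single entry, then every row in the file has the same species.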

While it is possible to use the `csv` module for reading and manipulating tabular data, we will see much more flexible and powerful methods when we introduce the *data frames* paradigm via the `pandas` module. 