# Files

Reading data from a file is an essential skill in any programmer's toolbox.

We will focus on ASCII ("American Standard Code for Information Interchange"), or "plain text", files in this lesson. File suffixes for text files are typically ".txt". (By the way, Python scripts are actually plain text files, but use ".py" for easy identification.)

## Previewing files

First, let's have a look at a simple file. It has the following contents in it:

## Reading in files

### Syntax

The basic syntax for reading in a file is as follows:

```
with open(<path_to_file>, "r") as <variable_name>:
    <code>
```

Note the following:

1. The `with` keyword keeps the file open only while the indented code runs; Python closes the file automatically afterwards
2. We use the `open` function to open the file
3. We pass the path to the file along with `"r"`, which tells Python that we want to open the file in "read mode"
4. We can store a reference to the file as `<variable_name>`, which we can use in the indented code
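
To make the points above concrete, here is a minimal sketch. It first creates a small temporary file (the name `example.txt` and its contents are made up for illustration and are not part of the lesson's data), then reads it back and checks that the file is closed once the indented block ends.

```python
import os
import tempfile

# Hypothetical sample file, created only so this sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")

with open(path, "w") as file:   # "w" (write mode) is covered later in this lesson
    file.write("1.5\n2.5\n")

with open(path, "r") as file:
    contents = file.read()      # Read the whole file into a single string
    print(file.closed)          # False: we are still inside the with block

print(file.closed)              # True: the file was closed when the block ended
print(repr(contents))           # repr makes the newline characters visible
```

The variable `file` still exists after the block, but it now refers to a closed file, so any further reading has to happen inside the indented code.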

Let's open `data_1col.txt` and print each line:

In [None]:
with open("data/data_1col.txt", "r") as file:
    for line in file:  # The elements of "file" are the lines within the file by default
        print(line)

Rather than just printing, let's try storing each line as an element in a list:

In [None]:
data1_list = []

with open("data/data_1col.txt", "r") as file:
    for line in file:
        data1_list.append(line)

By the way, if we just want a list in which each element is one line of the file, we can accomplish the same thing with the `readlines` method:

In [None]:
with open("data/data_1col.txt", "r") as file:
    data1_list = file.readlines()

In [None]:
print(data1_list)

Notice that the `\n` characters (newline characters, which mark the end of each line in the file) are included.

But what if we just look at one element at a time?

In [None]:
print(data1_list[0])

Notice that Python prints the number AND the blank line (from the `\n`).
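
If the extra blank line is unwanted, one option is to remove the newline before printing, or to tell `print` not to add its own. The sketch below uses a made-up line (`"3.2\n"` is an assumption, not a value from the lesson's file) to show both approaches.

```python
line = "3.2\n"  # Hypothetical line, as it would come out of a text file

print(repr(line))           # repr shows the hidden newline character
stripped = line.strip()     # strip removes leading and trailing whitespace
print(repr(stripped))
print(line, end="")         # end="" stops print from adding a second newline
```

Using `line.rstrip("\n")` instead of `line.strip()` would remove only the trailing newline and leave any other whitespace in place.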

Ok, great! We have learned to extract values from a data file. Can we now do mathematical operations with this number?

In [None]:
print(data1_list[0] / 2)  # Raises a TypeError, because the element is a string

In [None]:
print(type(data1_list[0]))

Ok, so we need to convert this string (type `str`) to a number, like a `float`.

In [None]:
print(float(data1_list[0]))

And now let's try doing that same mathematical operation:

In [None]:
print(float(data1_list[0]) / 2)

Ok, let's try something a little more involved: How can we calculate the mean and standard deviation of the values in the file?

First, we need to convert all of the strings into floats:

In [None]:
data1_float_list = []
for element in data1_list:
    data1_float_list.append(float(element))
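
The same conversion can be written more compactly as a list comprehension. The sketch below uses a made-up list (`raw_lines` is a hypothetical stand-in for `data1_list`) and also skips blank lines, since `float` accepts surrounding whitespace such as `\n` but raises a `ValueError` on an empty or whitespace-only string.

```python
# Hypothetical stand-in for data1_list, including one blank line
raw_lines = ["1.0\n", "2.5\n", "\n", "4.0\n"]

# Convert every non-blank line to a float; float() ignores the trailing "\n"
float_values = [float(element) for element in raw_lines if element.strip()]

print(float_values)
```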

Let's use the `numpy` module's functionality to get the mean and standard deviation of the values:

In [None]:
import numpy as np

print(data1_float_list)
print()
print(np.mean(data1_float_list))
print()
print(np.std(data1_float_list))
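
One detail worth knowing: by default `np.std` computes the population standard deviation (it divides by N, i.e. `ddof=0`). As a cross-check that does not need `numpy`, the sketch below uses Python's built-in `statistics` module on a made-up list of values (`values` is an assumption, not the lesson's data).

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0]  # Hypothetical data for illustration

mean_value = statistics.mean(values)
population_std = statistics.pstdev(values)  # Divides by N, like np.std's default
sample_std = statistics.stdev(values)       # Divides by N - 1, like np.std(..., ddof=1)

print(mean_value)
print(population_std)
print(sample_std)
```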

## Handling multiple columns and headers

What if the file we're reading has multiple columns and a header? Let's look at the contents of another file:

And let's read it in:

In [None]:
with open("data/data_2col.txt", "r") as file:
    lines = file.readlines()

for line in lines:
    print(line)

Headers are typically the first line or lines of the file, and tell you what each column in the file represents. They are sometimes denoted with a `#` symbol so you can distinguish them from the data.

So we can imagine detecting header lines by checking if the first character is `#`.
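
A minimal sketch of that check is shown here, using the string method `startswith`. The list `example_lines` is made up for illustration and only assumes a file whose header line begins with `#`.

```python
# Hypothetical lines, as readlines() might return them for a two-column file
example_lines = ["# time value\n", "0.0 1.5\n", "1.0 2.5\n"]

header_lines = []
data_lines = []

for line in example_lines:
    if line.startswith("#"):     # True when the first character is "#"
        header_lines.append(line)
    else:
        data_lines.append(line)

print(header_lines)
print(data_lines)
```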

## Processing lines of data within a file

How could we perform numerical operations on the numbers in this file? What steps would we have to take? 

We need to isolate the numbers and convert them to floats, like we saw before. Is there a simple way to do this?

### The `split` method

One simple technique is to use the `split` method to split apart each string using whitespace as a delimiter:

In [None]:
splitLines = []

for line in lines:
    print(line.split())  # Printing the result of the split method on each line.
                         # Note the absence of an argument.
    splitLines.append(line.split())  # Adding each split line to a list for later use.

Note that each line (a string) has been turned into a list of strings, where the whitespace has been removed. So `split` is a method that acts on a string to create a list of string values. By default, `split` splits a string on any run of whitespace (spaces, tabs, and newlines), but other delimiters can be used instead (e.g., `line.split(",")` if the file contains values separated by commas). Look up the documentation for `split` if you want to learn more!
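
To illustrate the difference between the default behaviour and an explicit delimiter, here is a short sketch with made-up strings. Note that splitting on `","` does not remove the trailing newline, so it helps to call `strip` first.

```python
whitespace_line = "1.0   2.0\t3.0\n"   # Hypothetical line with mixed whitespace
comma_line = "1.0,2.0,3.0\n"           # Hypothetical comma-separated line

default_split = whitespace_line.split()        # Splits on any run of whitespace
comma_split = comma_line.split(",")            # The last element keeps its "\n"
clean_split = comma_line.strip().split(",")    # Strip the newline, then split

print(default_split)
print(comma_split)
print(clean_split)
```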

Let's look at that list we created during our loop above.

In [None]:
print(splitLines)

In [None]:
print(splitLines[1])
print(splitLines[1][1])
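
From here, one possible way to get numbers out of the split lines is to skip the header and convert each field to a float, column by column. The sketch below uses a made-up list (`example_split_lines` and its header names are assumptions, not the contents of `data_2col.txt`).

```python
# Hypothetical result of calling split() on each line of a two-column file
example_split_lines = [["#", "time", "value"], ["0.0", "1.5"], ["1.0", "2.5"], ["2.0", "4.0"]]

column_1 = []
column_2 = []

for fields in example_split_lines:
    # Skip blank lines (empty lists) and header lines that start with "#"
    if not fields or fields[0].startswith("#"):
        continue
    column_1.append(float(fields[0]))
    column_2.append(float(fields[1]))

print(column_1)
print(column_2)
```

Each column is now a list of floats, so it can be passed to functions such as `np.mean` in the same way as the single-column data earlier.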

## Writing data to a file

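The syntax for writing data is nearly identical to the syntax for reading data; the only difference is that we pass `"w"` (for write) instead of `"r"` (read) as the second argument to the `open` function. Opening a file in write mode creates the file if it does not exist and replaces its contents if it does: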

In [None]:
dataToWrite = "# header line\nline 1\nline 2"

with open("data/new_data.txt", "w") as file:
    file.write(dataToWrite)

Note that we use the `write` method to write the data, and that data must be a string with each line separated by a `"\n"`. So, if we want to save a list of data to a file, we would have to convert the data to this specific type of string first.

### Preparing data for writing

We can use the `join` method to convert a list of strings to a single string to be saved to a file:

In [None]:
dataAsList = ["# header", "line 1 data", "line 2 data"]

dataAsString = "\n".join(dataAsList)
print(dataAsString)

Let's break down how this works:

1. The `join` method is called on a string that we define (`"\n"` in this case)
2. The string that we define is the separator that will be placed between each of the string elements in the list we pass to the `join` method
3. The list with the string elements we want joined into a string is passed to the `join` method.

So, if our data is a list of lists, and we want each column of the data separated with a space, we can create the data string as follows:

In [None]:
dataAsList = [["#", "value1", "value2", "value3"], [5, 7, 12], [30, 15, -4]]

rows = []
for row in dataAsList:
    stringRowList = [str(element) for element in row]  # Converting each list element to a string
    stringRow = " ".join(stringRowList)  # Turning the list of strings into a space-separated string
    rows.append(stringRow)  # Appending each space-separated string to rows

dataAsString = "\n".join(rows)  # Turning the list of space-separated strings to a string separated by \n characters
print(dataAsString)
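
Putting the pieces together, the sketch below converts a list of lists to a string, writes it to a file, and reads it back to confirm what was saved. It writes to a temporary directory so that it is self-contained; the file name `new_data.txt` and the values in `table` are made up for illustration. It also adds a final `"\n"`, which `join` does not do on its own, so that the last line ends like all the others.

```python
import os
import tempfile

table = [["#", "value1", "value2"], [5, 7], [30, 15]]  # Hypothetical data

# Build one space-separated string per row, then join the rows with newlines
row_strings = [" ".join(str(element) for element in row) for row in table]
data_as_string = "\n".join(row_strings) + "\n"

path = os.path.join(tempfile.mkdtemp(), "new_data.txt")

with open(path, "w") as file:
    file.write(data_as_string)

with open(path, "r") as file:
    read_back = file.read()

print(read_back)
```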

## Appending data to a file

The syntax for appending data is identical to the syntax for writing data, other than using `"a"` (for append) instead of `"w"` (write) for the second argument in the `open` function. This will append lines to the _end_ of a file, rather than totally rewriting the file (erasing the data that already exists in the file and only writing the new data).
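
As a minimal sketch of the difference, the example below writes a small file and then appends one more line to it. The file is created in a temporary directory, and its name and contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "appended_data.txt")  # Hypothetical file

with open(path, "w") as file:       # "w" creates the file (or overwrites it)
    file.write("# header line\nline 1\n")

with open(path, "a") as file:       # "a" keeps the existing contents
    file.write("line 2\n")

with open(path, "r") as file:
    appended_lines = file.readlines()

print(appended_lines)
```

Note that appended text starts exactly where the file ends, so if the existing contents do not finish with a `"\n"`, the new text continues on the same line.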