## Files
---
Everything we've done so far has been completely self-contained in the notebook, and every time we run any of the cells we will get exactly the same output. The power of programming is to be able to take the same piece of code and apply it to different data to get different results. One common way in which this is done is writing a script which can analyse a data file. To do that we need to learn how to open files.

The simplest thing we can do with files is read a file in and print it to the screen:

In [9]:
with open("file.py") as f:
    for line in f:
        print(line, end="")

#this is a file with a single instruction and a comment
print("hello")

which is the contents of the file `file.py`.

There are a few new things here so let's go through them in turn. The first thing is to open the file. You open files using the `open` function. The part `open("file.py")` says to open the file `file.py`. This returns a *file handle* which is assigned to the variable `f`. If the file does not exist, or is not readable, then the script will exit with an error (have a try and see what the error looks like!). The use of a `with` statement means that when the code inside the `with` block has finished running, the file will be closed automatically.
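To see the automatic closing in action, the sketch below checks the file handle's `closed` attribute inside and after the `with` block. So that it can run on its own, it first creates a small file in a temporary directory; the file name `example.txt` and the set-up lines are assumptions made purely for illustration.

```python
import os
import tempfile

# Set-up: create a small file in a temporary directory so the example
# is self-contained. The name "example.txt" is made up for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    print("hello", file=f)

with open(path) as f:
    inside = f.closed   # the file is still open inside the block
after = f.closed        # leaving the block has closed the file

print("Closed inside the with block?", inside)
print("Closed after the with block?", after)
```

The first line printed should report `False` and the second `True`.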

In the next line (`for line in f:`) we are looping over the lines of the file. This loop looks just like those we used when looping over lists a few chapters previously. When looping over a list you get each of the *elements* in turn but when looping over an open file you get each of the *lines* in turn. We assign the string containing the line from the file to the variable `line`.

Finally, we print the string `line`. Each line in the file already ends with a "new-line" character so when it is printed, it will print the new-line too. By default the `print` function will also add its own new-line so we disable that by using `end=""`.
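The new-line character at the end of each line is invisible when printed normally. As a small sketch, using a made-up string in place of a line read from a file, `repr` can be used to reveal it and the string method `rstrip` to remove it:

```python
# A made-up line, ending in a new-line like the lines read from a file
line = "hello\n"

print(repr(line))              # repr shows the hidden new-line as \n

stripped = line.rstrip("\n")   # remove the trailing new-line, if there is one
print(repr(stripped))
```

Stripping the new-line from the string is an alternative to passing `end=""` to `print`; which one to use depends on whether you need the cleaned-up string later on.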

### Exercise: reading numbers
- Printing out Python code isn't the most useful so let's make a data file to read instead. Make a new file called `data.txt` and put inside it:
`12`

`54`

`7`

`332`

`54`

`1`

`0`

- Print out the content of `data.txt`.


In [10]:
with open("data.txt") as f:
    for line in f:
        print(line)

12

54

7

332

54

1

0


As you can see, there is an empty line after each number we read from the file. Why is this happening?
Let's store the content of the opened file in a variable. We will use the method `.read()`:

In [21]:
with open("data.txt") as f:
    content = f.read()
content

'12\n54\n7\n332\n54\n1\n0'

Since each number is written on a separate line, there is a "newline" character `\n` after each number (in this file, the last line has none).

Python's `print()` function by default also ends with a `newline` character: `\n`. So, effectively, we are printing two consecutive "newline" characters.

By explicitly setting the `end` parameter of `print` to the empty string `""`, we can avoid an empty line being printed after each line in the file:

In [11]:
with open("data.txt") as f:
    for line in f:
        print(line, end="")

12
54
7
332
54
1
0

### Data type conversion
---

Simply reading the data and printing it isn't very useful. Let's take a first step towards some data analysis and pretend that the task we're trying to do is to read in data from the file and add 17 to each value.

In [5]:
with open("data.txt") as f:
    for line in f:
        new_number = line + 17  # Here is where we do our "data analysis"
        print(new_number, end="")

TypeError: can only concatenate str (not "int") to str

We have got an error.
This is telling us that there is an error occurring when trying to add 17 to the data read in from the line in the file. The *type* of the error is `TypeError` which tells us the problem is likely due to incorrect data types (i.e. string, float, int, list etc.). The error message says `can only concatenate str (not "int") to str` which implies that the computer believes that we're trying to concatenate (join together) something with a string. The only two things involved in this operation are `line` and `17`. We know that `17` is an integer so `line` must be a string!

When reading from a file like this, everything it gives you will *always* be a string, even if the string only contains digits like `"12"`. If we know that the file only contains integers then we can convert each number as it comes in using the `int` function. Also, since we're now printing integers, we no longer need the `end=""` tweak:

In [6]:
with open("data.txt") as f:
    for line in f:
        number = int(line)  # Here we do the type conversion
        new_number = number + 17  # Here is where we do our "data analysis"
        print(new_number)

29
71
24
349
71
18
17
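The conversion above works even though each line still ends with a new-line character, because `int` ignores whitespace around the digits. A minimal sketch, using made-up strings rather than lines from the file, and also showing `float` for values with a decimal point:

```python
a = int("12\n")      # the trailing new-line is ignored
b = int("  54  ")    # so are surrounding spaces
c = float("3.5\n")   # float converts text containing a decimal number

print(a + 17, b + 17, c + 17)
```

Note that `int("3.5")` would fail, since the text is not a whole number; in that case `float` is the function to reach for.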


### Exercise

- Change the instruction to multiply the data by 10 instead of adding 17.

In [14]:
with open("data.txt") as f:
    for line in f:
        number = int(line)  # Here we do the type conversion
        new_number = number * 10  # Here is where we do our "data analysis"
        print(new_number)

120
540
70
3320
540
10
0


- After looping through the data, print out the sum of all the data values seen.
    - hint: Make an integer before the loop, initially set to zero and add to it each time around the loop using `+=`:
        ```python
        num = 3
        num += 4
        print(num)  # `num` will now be 7
        ```

    - Note: This is the same as writing:
        ```python
        num = 3
        num = num + 4
        print(num)  # `num` will now be 7
        ```

In [4]:
total = 0

with open("data.txt") as f:
    for line in f:
        number = int(line)
        total += number
print("Sum of all values is:", total)

Sum of all values is: 460


- Print out the count of the number of data points seen as well. 

In [7]:
total = 0
count = 0

with open("data.txt") as f:
    for line in f:
        number = int(line)
        total += number
        count += 1

print("Sum of all", count, "values is:", total)

Sum of all 7 values is: 460


- Print out the mean average of the data in the file.

In [10]:
total = 0
count = 0

with open("data.txt") as f:
    for line in f:
        number = int(line)
        total += number
        count += 1

mean = total / count

print("Sum of all", count, "values is:", total)
print("The mean is", mean)

Sum of all 7 values is: 460
The mean is 65.71428571428571


- See what happens if you run the script after deleting the contents of `data.txt`. Add an `if` statement to fix it.

#### If you have a Linux or Mac computer, run these commands to delete and recreate the (empty) file
``` 
!rm -f data.txt
!touch data.txt
```
(copy the commands and paste them into a new cell, then run it)

#### If you have a Windows computer, run these commands to delete and recreate the (empty) file
```
!del data.txt
!copy NUL data.txt
```
(copy the commands and paste them into a new cell, then run it)

In [15]:
total = 0
count = 0

with open("data.txt") as f:
    for line in f:
        number = int(line)
        total += number
        count += 1

print("Sum of all", count, "values is:", total)

if count > 0:
    mean = total / count
    print("The mean is", mean)

Sum of all 0 values is: 0


- Collect the statistics into a summary dictionary with keys `"sum"`, `"count"` and `"mean"`.

In [16]:
stats = {"sum": 0, "count": 0}

with open("data.txt") as f:
    for line in f:
        number = int(line)
        stats["sum"] += number
        stats["count"] += 1

if stats["count"] > 0:
    stats["mean"] = stats["sum"] / stats["count"]

print(stats)

{'sum': 0, 'count': 0}


### Writing to a file

Let's now create a file and write a string to it using the `file` argument of the `print()` function. Opening the file with `"w"` as the second argument to `open` opens it for writing:

In [18]:
with open("text.txt", "w") as f: 
    print("#this is a comment", file=f)
    print("This is text!", file=f)

It doesn't seem like it did anything, but it actually created a `text.txt` file in our working directory.
We can open the file in the editor, or read it as we have just learnt:

In [21]:
with open("text.txt", "r") as f:
    for line in f:
        print(line)

#this is a comment

This is text!



Trying to open a non-existent file in `w` (write) or `a` (append) mode
creates the file for us, but doing that in `r` (read) mode gives us an
error instead. We'll learn more about errors later on.


In [4]:

with open('this-doesnt-exist.txt', 'r') as f:
    print("It's working!")


FileNotFoundError: [Errno 2] No such file or directory: 'this-doesnt-exist.txt'
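The difference between the `w` and `a` modes mentioned above can be sketched as follows: `w` starts the file afresh each time it is opened, while `a` adds to the end of whatever is already there. The file name `modes.txt` and the use of a temporary directory are assumptions for illustration only, chosen so that no real files are touched.

```python
import os
import tempfile

# Work in a temporary directory; the name "modes.txt" is made up for this sketch.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:   # "w" creates the file (or empties an existing one)
    print("first", file=f)

with open(path, "a") as f:   # "a" keeps the existing content and adds to the end
    print("second", file=f)

with open(path) as f:        # no mode given, so the file is opened for reading
    appended = f.read()
print(appended, end="")

with open(path, "w") as f:   # opening with "w" again discards the old content
    print("third", file=f)

with open(path) as f:
    overwritten = f.read()
print(overwritten, end="")
```

After the append step the file holds both `first` and `second`; after the final `w` step only `third` remains. This is why `w` should be used with care on files whose contents you want to keep.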