# Introduction to Python

## Lecture 8: Files, Exceptions

We have already discussed how to print to the terminal and read in user input. 
Another type of input/output is reading from and writing to files. 

Programs are usually deterministic -- there are no unexpected events (except maybe because you created a bug in your code). 

Once a program depends on input, that is no longer guaranteed because input might be malformed or otherwise unexpected. It is also possible that files do not exist or cannot be accessed for other reasons. In those situations, Python programs raise **Exceptions**.

A **file** is a collection of data that resides permanently on disk. It is independent of the program you're running. 

We often need to read and write data from files.

A file has a **name** and a location in your **file system**. The location is called the **path** of the file. 

### File names
On Mac and Unix systems, path names start with "/". This top level (the root directory) contains a number of directories, such as 

/Users
/bin
/dev
/etc
/usr
/var
/tmp
...

These directories may live on the same physical disk, or they may be distributed over many disks or even live on a remote server. 

/Users/daniel/test.txt  refers to a text file test.txt in the directory /Users/daniel, which is the home directory for the user daniel.


On Windows, files are organized according to the disk partition they live on. 

C:\ is the name of the main partition that contains system files. Windows file names use \ to separate components of a path instead of /.
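
Because the separator differs between systems, it is usually safer to let Python assemble a path than to glue strings together. The following is a minimal sketch using the standard `pathlib` module. The "pure" path classes are used only so that the example prints the same thing on every operating system, and the Windows path `C:\Users\daniel\test.txt` is a hypothetical counterpart to the Unix example above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Build the same (hypothetical) location in Mac/Unix style and in Windows style.
posix_path = PurePosixPath("/Users", "daniel", "test.txt")
windows_path = PureWindowsPath("C:\\", "Users", "daniel", "test.txt")

print(posix_path)         # /Users/daniel/test.txt
print(windows_path)       # C:\Users\daniel\test.txt

# A path object can be taken apart again.
print(posix_path.name)    # test.txt
print(posix_path.parent)  # /Users/daniel
```

In everyday code, `pathlib.Path` picks the right style for the system the program is running on.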

### File extensions
Most files have an extension (for example .txt) indicating the type and purpose of the file. Some operating systems hide file extensions, even though the extension is really part of the file name. To open a file in Python you need to specify its full name, including the extension. 

To force your operating system to display file extensions follow these steps: 

 * OSX/macOS: https://support.apple.com/kb/PH25381
 * Windows: http://kb.winzip.com/kb/entry/26/
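
Python itself always sees the full file name. As a small sketch, the standard function `os.path.splitext` separates a name into the part before the extension and the extension itself; the file name `archive.tar.gz` is just an illustrative example.

```python
import os.path

root, extension = os.path.splitext("test.txt")
print(root)       # test
print(extension)  # .txt

# Only the last extension is split off.
print(os.path.splitext("archive.tar.gz"))  # ('archive.tar', '.gz')
```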

### Text vs. Binary Files

**Text files** are human readable. These files are encoded like Python strings (often Unicode / UTF-8). You can view the content of a text file in any text editor. Python programs (.py) are text files. 

**Binary files** use some special coding scheme, for example multimedia files (.jpg, .png, .mp3, .pdf, ...) but also text with formatting (.docx) and compressed data (.zip, .tgz). If you open these files in a text editor, you will only see gibberish. 

We will mostly be working with text files. Special modules exist to work with many binary files, so you rarely have to write your own code to read or write these files (unless you invent your own binary format). 
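
To make the difference concrete, the sketch below writes a tiny text file and then reads it back twice: once as text and once as raw bytes. It uses the `open` function described in the next section; the `"rb"` mode (read binary) and the temporary file name `demo.txt` are additions for illustration and are not otherwise used in this lecture.

```python
import os
import tempfile

# A throwaway file in a temporary directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write one short word that contains a non-ASCII character.
text_out = open(path, "w", encoding="utf-8")
text_out.write("café")
text_out.close()

# Text mode: Python decodes the bytes and hands back a string.
text_in = open(path, "r", encoding="utf-8")
as_text = text_in.read()
text_in.close()

# Binary mode: Python hands back the bytes exactly as stored on disk.
raw_in = open(path, "rb")
as_bytes = raw_in.read()
raw_in.close()

print(as_text, len(as_text))    # café 4
print(as_bytes, len(as_bytes))  # b'caf\xc3\xa9' 5
```

The string has four characters, but the file holds five bytes because UTF-8 stores "é" as two bytes.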

### Reading from files

To read from a file you first need to open it using the `open` function. 

`open` returns a **file object** (also called a *stream*) that allows you to read from (and/or write to) this specific file.

In [10]:
test_file = open("test.txt",'r')  # "r" means "open for reading" 

In [11]:
test_file

<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>

The `readline` method reads a single line, including the \n at the end.

In [12]:
line = test_file.readline()

In [13]:
line

'Hello World\n'

In [14]:
line.strip()

'Hello World'

In [15]:
test_file.readline()

''

In [20]:
test_file.readline()  # once the end of the file is reached, readline returns an empty string

''

In [21]:
test_file.seek(0)

0

In [22]:
test_file.readline()

'Hello World\n'

`readline` returns an empty string if there are no more lines in the file.

In [None]:
test_file.readline() 

`close` closes the file object. Because only one program at a time should be allowed to write to a file (why?), you should always close file objects when they are no longer needed. 

In [23]:
test_file.close()

In [24]:
test_file.readline()

ValueError: I/O operation on closed file.
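
It is easy to forget the call to `close`. One common alternative, not otherwise used in this lecture, is the `with` statement, which closes the file automatically when the indented block ends. The sketch below first creates a small temporary file so that it can run on its own; that setup step and the temporary location are assumptions made for the example.

```python
import os
import tempfile

# Setup: create a small file to read (writing is covered later in this lecture).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
setup = open(path, "w")
setup.write("Hello World\nline 2\n")
setup.close()

# The file is open only inside the indented block.
with open(path, "r") as test_file:
    first_line = test_file.readline()
    print(first_line.strip())   # Hello World

# After the block, the file object has been closed for us.
print(test_file.closed)         # True
```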

You can also use a for loop to iterate through the lines of a file. 

In [28]:
test_file = open("test.txt",'r')

for line in test_file: 
    print(line.strip())  # .strip() removes leading and trailing whitespace, including the \n
    
test_file.close()    

Hello World
line 2
line 3


In [27]:
test_file = open("test.txt",'r')
line = test_file.readline()
while line != '':  # readline returns '' at the end of the file
    print(line.strip())
    line = test_file.readline()
test_file.close()


Hello World
line 2
line 3


One important feature of file objects is that they are **buffered**. When you open a file, the operating system copies part of the file (or all of it) into main memory. Memory access is a lot faster than disk access.
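
Buffering also applies in the other direction: text written to a file is typically collected in memory first and only passed on to the operating system later. The following sketch, which uses the writing functions introduced in the next section, shows the `flush` method that pushes the buffer out on request. The temporary file name is hypothetical, and exactly when an unflushed buffer is written out is an implementation detail that the example does not rely on.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")

out = open(path, "w")
out.write("hello\n")   # may still be sitting in the in-memory buffer

out.flush()            # hand the buffered text to the operating system now

# A second file object opened on the same file can now see the text.
check = open(path, "r")
after_flush = check.read()
check.close()

out.close()            # close also flushes whatever is left in the buffer
print(repr(after_flush))   # 'hello\n'
```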

### Writing to files

To write to a file, we open it with the "w" option.

In [33]:
test_file2 = open("test2.txt",'w') # "w" means "open for writing"

**Warning**: opening a file this way creates a NEW file. If the file already exists, the old file is overwritten and its contents are lost. Python will NOT ask you for confirmation.
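
One way to guard against accidental overwriting is to check whether the file exists before opening it for writing. This is only a sketch: it uses the standard function `os.path.exists`, and the temporary directory and the file contents are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test2.txt")

# Create a file with some content, so there is something to lose.
first = open(path, "w")
first.write("important data\n")
first.close()

# Check before opening the same file with "w" again.
if os.path.exists(path):
    print("File already exists, not overwriting it.")
else:
    second = open(path, "w")
    second.write("new data\n")
    second.close()

# The original content is still there.
check = open(path, "r")
content = check.read()
check.close()
print(repr(content))   # 'important data\n'
```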

We can now direct the output of `print` calls to this new file. 

In [34]:
print("lorem ipsum", file=test_file2)

In [35]:
print("dolor sit amet", file=test_file2)

In [36]:
test_file2.close()

`print` works with files the same way it works with the console because, 
from the point of view of Python, *the console is also a file object*. 

In [37]:
import sys
sys.stdout

<ipykernel.iostream.OutStream at 0x10b122208>

In [38]:
print("lorem ipsum", file=sys.stdout)

lorem ipsum
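
The same idea extends to anything that behaves like a file object. As an illustration that goes slightly beyond the lecture, the standard class `io.StringIO` is a file object that lives entirely in memory, and `print` writes to it just as it writes to the console or to a file on disk.

```python
import io

buffer = io.StringIO()   # an in-memory object that behaves like a text file

print("lorem ipsum", file=buffer)
print("dolor sit amet", file=buffer)

# getvalue returns everything that has been written so far.
captured = buffer.getvalue()
print(repr(captured))    # 'lorem ipsum\ndolor sit amet\n'
```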


You can also use the `write` method to write to a file. 

In [45]:
test_file2 = open("test2.txt",'a') # "a" means "open for appending": new text is added at the end
test_file2.write("consectetur adipiscing elit\n") # returns the number of characters written
test_file2.write("some other line\n") # returns the number of characters written


16

In [46]:
test_file2.close()

### Example 1: A word puzzle

Goal: Find all English words which contain exactly the vowels A E I O U in that order. Ignore consonants. 
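
One possible approach, sketched under assumptions: keep only the vowels of each word and compare the result with "AEIOU". The function names and the four sample words are made up for illustration; the real exercise would read one word per line from a word list file.

```python
def vowels_of(word):
    """Return the vowels of word, in order, as an uppercase string."""
    return "".join(ch for ch in word.upper() if ch in "AEIOU")


def find_aeiou_words(lines):
    """Return the words whose vowels are exactly A, E, I, O, U in that order."""
    matches = []
    for line in lines:
        word = line.strip()          # remove the \n at the end of each line
        if vowels_of(word) == "AEIOU":
            matches.append(word)
    return matches


# A tiny stand-in for a word list file, one word per line.
sample_lines = ["facetious\n", "abstemious\n", "education\n", "python\n"]
result = find_aeiou_words(sample_lines)
print(result)   # ['facetious', 'abstemious']
```

Because `find_aeiou_words` only loops over its argument, an open file object can be passed in place of the list.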


### Example 2: Analyzing temperature data
source: https://www.ncdc.noaa.gov/cag/time-series/us/110/00/tavg/all/01/1895-2017.csv

# Exceptions

We have already seen a few exceptions. For example: 

In [None]:
while = 27

SyntaxErrors are examples of **compile time** exceptions. With these errors in place, your program will not even run.

Other types of exceptions occur while your program is running. These are **runtime** exceptions.

In [None]:
print(lkwjer)

In [None]:
27+"hello"

In [None]:
x = "27.5"
int(x)
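
Each of the three cells above fails with a different exception class. The sketch below collects the class names, using the try-except construct that is introduced later in this lecture; the list `errors` is just a helper for the example.

```python
errors = []

try:
    print(lkwjer)            # the name lkwjer was never defined
except NameError as e:
    errors.append(type(e).__name__)

try:
    27 + "hello"             # a number and a string cannot be added
except TypeError as e:
    errors.append(type(e).__name__)

try:
    int("27.5")              # "27.5" is not a valid integer literal
except ValueError as e:
    errors.append(type(e).__name__)

print(errors)   # ['NameError', 'TypeError', 'ValueError']
```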

A careful programmer should be able to avoid these problems in their code.

Unfortunately, when a program reads input from the user or from a file, exceptions may be out of our control.

In [None]:
f = open("kjhwerkj.txt",'r')

Rather than stopping the program, it might be better to *anticipate* exceptions and *handle* them gracefully.

The **try-except** construct allows you to capture and handle runtime errors.

In [None]:
try:
    f = open("somefile.txt",'r')
    print("Opened file correctly.")
except FileNotFoundError: 
    print("Sorry, file did not exist.")
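
The same pattern can be wrapped in a function, so that the rest of the program only has to deal with an ordinary return value. The helper `read_first_line` and the temporary files below are hypothetical and serve only to make the sketch runnable on its own.

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of the file without the newline, or None if the file is missing."""
    try:
        f = open(path, "r")
    except FileNotFoundError:
        return None
    line = f.readline()
    f.close()
    return line.strip()


# Setup: one file that exists and one path that does not.
folder = tempfile.mkdtemp()
existing = os.path.join(folder, "somefile.txt")
out = open(existing, "w")
out.write("Hello World\n")
out.close()
missing = os.path.join(folder, "missing.txt")

print(read_first_line(existing))   # Hello World
print(read_first_line(missing))    # None
```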

In [None]:
okay = False
while not okay: 
    try: 
        x = input("please type an integer number:")
        x_int = int(x)
        f = open("weljhwer.txt",'r')
        okay = True
    except ValueError:
        print("{} is not an integer. Try again.".format(x)) 
    except FileNotFoundError: 
        print("File not found.")
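
The retry logic of the loop above can be tried without typing anything by feeding it a list of candidate inputs in place of `input`. This is a sketch only; the function `first_valid_int` and the sample inputs are made up for illustration.

```python
def first_valid_int(candidates):
    """Return the first candidate that converts to an int, or None if none does."""
    for text in candidates:
        try:
            return int(text)
        except ValueError:
            print("{} is not an integer. Try again.".format(text))
    return None


# "abc" and "27.5" are rejected, "42" is accepted.
result = first_valid_int(["abc", "27.5", "42"])
print(result)   # 42
```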