## ENGI E1006: Introduction to Computing for Engineers and Applied Scientists
---


So far, we have relied mainly on `input` and `print` to get information into and out of our programs. More commonly, programs use files to get data in and out.

A **file** is a collection of data that resides permanently on disk. It is independent of the program you're running. 

We often need to read data from files and write data to files.

A file has a **name** and a location in your **file system**. The location is called the **path** of the file. 

### File names
On Mac and Unix systems, absolute path names start with "/", the top-level (root) directory. It contains a number of directories, such as

/Users
/bin
/dev
/etc
/usr
/var
/tmp
...

These directories may live on the same physical disk, or they may be distributed over many disks or even live on a remote server. 

/Users/tim/test.txt  refers to a text file test.txt in the directory /Users/tim, which is the home directory for the user tim.


On Windows, files are organized according to the disk partition (drive) they live on.

C:\ is the name of the main partition, which contains the system files. Windows file names use \ instead of / to separate the components of a path.
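
To make the difference between the two path styles concrete, here is a minimal sketch using Python's standard `pathlib` module. The Windows directory `C:\Users\tim` is a hypothetical counterpart to the `/Users/tim` example above, not something taken from a real system. The "pure" path classes only manipulate path strings and never touch the disk, so this runs on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A Mac/Unix-style path: components are separated by "/"
unix_path = PurePosixPath("/Users/tim") / "test.txt"
print(unix_path)          # /Users/tim/test.txt
print(unix_path.name)     # test.txt
print(unix_path.parent)   # /Users/tim

# A Windows-style path (hypothetical directory): components are separated by "\"
windows_path = PureWindowsPath("C:\\Users\\tim") / "test.txt"
print(windows_path)        # C:\Users\tim\test.txt
print(windows_path.name)   # test.txt
```

Note that the backslash has to be written as `\\` inside an ordinary Python string, because a single `\` starts an escape sequence such as `\n`.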

### File extensions
Most files have an extension (for example .txt) indicating the type and purpose of the file. Some operating systems hide file extensions, even though the extension is really part of the file name. To open a file in Python you need to specify its full name, including the extension.

To force your operating system to display file extensions follow these steps: 

 * OSX/macOS: https://support.apple.com/kb/PH25381
 * Windows: http://kb.winzip.com/kb/entry/26/
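
As a small illustration that the extension is just the last part of the file name, the sketch below uses the standard `os.path.splitext` function. The file name `archive.tar.gz` is made up for this example.

```python
import os.path

# Split a file name into its base name and its extension
base, extension = os.path.splitext("test.txt")
print(base)        # test
print(extension)   # .txt

# Only the last extension is split off (hypothetical file name)
print(os.path.splitext("archive.tar.gz"))   # ('archive.tar', '.gz')
```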

### Text vs. Binary Files

**Text files** are human readable. Their contents are characters, like a Python string, stored using a text encoding (most often UTF-8). You can view the content of a text file in any text editor. Python programs (.py) are text files.

**Binary files** use a format-specific encoding. Examples are multimedia files (.jpg, .png, .mp3, ...), documents with formatting (.pdf, .docx), and compressed data (.zip, .tgz). If you open these files in a text editor, you will only see gibberish.

We will mostly be working with text files. Special modules exist to work with many binary formats, so you rarely have to write your own code to read or write these files (unless you invent your own binary format).
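
The following sketch may help to see the difference between the characters in a text file and the bytes stored on disk. It writes a short string to a temporary file (the file name `cafe.txt` is hypothetical) and reads it back twice, once as text and once in binary mode (`"rb"`). It uses the `read` method, which returns the whole content of the file at once, and it assumes UTF-8 as the encoding.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "cafe.txt")

# Write four characters as text, encoded as UTF-8
text_file = open(path, "w", encoding="utf-8")
text_file.write("caf\u00e9")   # \u00e9 is the letter e with an acute accent
text_file.close()

# Read the file back as text: we get a string
text_file = open(path, "r", encoding="utf-8")
as_text = text_file.read()
text_file.close()

# Read it again in binary mode: we get the raw bytes stored on disk
binary_file = open(path, "rb")
as_bytes = binary_file.read()
binary_file.close()

print(len(as_text))    # 4 characters
print(as_bytes)        # b'caf\xc3\xa9'
print(len(as_bytes))   # 5 bytes, because the accented letter takes two bytes in UTF-8
```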

### Reading from files

To read from a file you first need to open it using the `open` function. 

`open` returns a **file object** (also called a *file handle* or *stream*) that allows you to read from (and/or write to) this specific file.

In [None]:
test_file = open("file1.txt", 'r')  # "r" means "open for reading" 

In [None]:
test_file

The `readline` method reads a single line, including the \n at the end.

In [None]:
line = test_file.readline()

In [None]:
line

In [None]:
line.strip()

In [None]:
test_file.readline()

In [None]:
test_file.readline()  # once the end of the file is reached, readline returns an empty string

In [None]:
test_file.seek(0)  # move the current position back to the start of the file

In [None]:
test_file.readline()

`readline` returns an empty string if there are no more lines in the file; otherwise it returns the next line.

In [None]:
test_file.readline() 
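
The cells above depend on a file `file1.txt` being present. As a self-contained sketch of the same steps, the code below first creates a small two-line file in a temporary directory (the file name and contents are made up) and then repeats the `readline` and `seek` calls.

```python
import os
import tempfile

# Create a small two-line file to read (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "two_lines.txt")
setup_file = open(path, "w")
setup_file.write("first line\nsecond line\n")
setup_file.close()

test_file = open(path, "r")
first = test_file.readline()    # 'first line\n'
second = test_file.readline()   # 'second line\n'
at_end = test_file.readline()   # '' because the end of the file was reached

test_file.seek(0)               # jump back to the beginning of the file
again = test_file.readline()    # 'first line\n' once more
test_file.close()

print(repr(first), repr(second), repr(at_end), repr(again))
```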

`close` closes the file object. An open file uses operating system resources, and two programs writing to the same file at the same time can corrupt it (why?). You should therefore always close file objects when they are no longer needed.

In [None]:
test_file.close()

In [None]:
test_file.readline()  # raises ValueError: I/O operation on closed file
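
Because it is easy to forget to call `close`, Python also offers the `with` statement, which closes the file automatically. This is an addition to the material above, shown only as a minimal sketch; the temporary file `demo.txt` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# The file is closed automatically when the indented block ends,
# even if an error occurs inside it
with open(path, "w") as out_file:
    out_file.write("hello\n")

print(out_file.closed)    # True

with open(path, "r") as in_file:
    first_line = in_file.readline()

print(in_file.closed)     # True
print(repr(first_line))   # 'hello\n'
```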

You can also use a for loop to iterate through the lines of a file. 

In [None]:
test_file = open("test.txt",'r')

for line in test_file: 
    print(line.strip())  # .strip() removes leading and trailing whitespace, including the \n
    
test_file.close()    

In [None]:
test_file = open("test.txt", 'r')
line = test_file.readline()
while line != '':  # readline returns '' at the end of the file
    print(line.strip())
    line = test_file.readline()

test_file.close()
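
Looping over the lines is the usual starting point for processing data stored in a file. As a sketch, the code below adds up the numbers in a small data file with one number per line. The file `numbers.txt` and its contents are made up for this example and are created in a temporary directory first.

```python
import os
import tempfile

# Create a small data file with one number per line (hypothetical data)
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
data_file = open(path, "w")
data_file.write("3\n10\n\n4.5\n")
data_file.close()

total = 0.0
count = 0
data_file = open(path, "r")
for line in data_file:
    line = line.strip()   # remove the trailing \n
    if line != "":        # ignore blank lines
        total += float(line)
        count += 1
data_file.close()

print(count, total)   # 3 17.5
```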


One important feature of file objects is that they are **buffered**. When you read from a file, the operating system copies part of the file (or all of it) into main memory, because memory access is a lot faster than disk access.
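
Buffering also applies when writing: text passed to a file object may sit in memory for a while before it reaches the disk. The sketch below tries to make this visible by checking the size of the file on disk before and after calling the `flush` method. The file name is hypothetical, and the size before flushing depends on the buffering details of your Python version, so it is printed without a promised value.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

out_file = open(path, "w")
out_file.write("hello")

# The text may still be sitting in the buffer in memory,
# so the file on disk is typically still empty at this point
size_before = os.path.getsize(path)

out_file.flush()                           # force the buffer to be written out
size_after_flush = os.path.getsize(path)   # 5: all five characters have arrived

out_file.close()   # close() also writes out anything left in the buffer
print(size_before, size_after_flush)
```

This is one more reason to close files when you are done: data that was written but never flushed can be lost if the program crashes.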

### Writing to files

To write to a file, we open it with the "w" option.

In [None]:
test_file2 = open("test2.txt", 'w') # "w" means "open for writing"

**Warning**: opening a file this way creates a NEW file. If the file already exists, the old file is overwritten and its contents are lost. Python will NOT ask you for confirmation.
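
One way to guard against overwriting a file by accident is to check whether it already exists before opening it with "w". The following is a minimal sketch using the standard `os.path.exists` function; the file name `results.txt` and its contents are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.txt")

# First run: the file does not exist yet, so "w" simply creates it
out_file = open(path, "w")
out_file.write("important results\n")
out_file.close()

# Later: check before opening the same file with "w" again
if os.path.exists(path):
    print("File already exists, not overwriting it.")
    overwritten = False
else:
    out_file = open(path, "w")
    out_file.write("new results\n")
    out_file.close()
    overwritten = True

# The original contents are still there
check_file = open(path, "r")
contents = check_file.read()
check_file.close()
print(repr(contents))   # 'important results\n'
```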

We can now direct the output of `print` calls to the file object `test_file2`.

In [None]:
print("lorem ipsum", file=test_file2)

In [None]:
print("dolor sit amet", file=test_file2)

In [None]:
test_file2.close()

`print` works with files just as it does with the console because, from the point of view of Python, *the console is also a file object*.

In [None]:
import sys
sys.stdout

In [None]:
print("lorem ipsum", file=sys.stdout)
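
A practical consequence is that a function can write to "some file object" without caring whether it is the console or a file on disk. The sketch below illustrates this with a hypothetical helper `write_lines` and a temporary file `copy.txt`; neither name comes from the text above.

```python
import os
import sys
import tempfile

def write_lines(lines, out):
    """Print each string in lines to the file object out."""
    for line in lines:
        print(line, file=out)

lines = ["lorem ipsum", "dolor sit amet"]

# The console: sys.stdout is the file object print uses by default
write_lines(lines, sys.stdout)

# A file on disk (hypothetical name, in a temporary directory)
path = os.path.join(tempfile.mkdtemp(), "copy.txt")
out_file = open(path, "w")
write_lines(lines, out_file)
out_file.close()

# Read the file back to check what was written
in_file = open(path, "r")
contents = in_file.read()
in_file.close()
print(repr(contents))   # 'lorem ipsum\ndolor sit amet\n'
```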

You can also use the `write` method to write to a file. Unlike `print`, `write` does not add a \n at the end, so you have to include it yourself.

In [None]:
test_file2 = open("test2.txt", 'a')  # "a" means "open for appending": existing contents are kept
test_file2.write("consectetur adipiscing elit\n") # returns the number of characters written
test_file2.write("some other line\n") # returns the number of characters written


In [None]:
test_file2.close()
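
To see the difference between the "w" and "a" options side by side, here is a small self-contained sketch. It writes to a temporary file (the name `log.txt` is hypothetical), appends a second line, and reads the result back.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

out_file = open(path, "w")       # creates the file (or empties an existing one)
out_file.write("line 1\n")
out_file.close()

out_file = open(path, "a")       # keeps the contents and adds to the end
n = out_file.write("line 2\n")   # n is 7: six characters plus the \n
out_file.close()

in_file = open(path, "r")
contents = in_file.read()
in_file.close()

print(n)                # 7
print(repr(contents))   # 'line 1\nline 2\n'
```

Had the second `open` used "w" instead of "a", the file would contain only `line 2`.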