<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Input and output](#Input-and-output)
	* [The `input` function](#The-input-function)
	* [The `print` function](#The-print-function)
	* [`stdin`, `stdout`, and `stderr`](#stdin,-stdout,-and-stderr)
	* [Basic file I/O](#Basic-file-I/O)
	* [Delimited data files](#Delimited-data-files)
* [Reading from a URL](#Reading-from-a-URL)
* [Pickling objects](#Pickling-objects)

# Learning Objectives:

After completion of this module, learners should be able to:

* use & explain common Python idioms for working with text files
* retrieve information from websites
* pickle and load Python objects

## Basic file I/O

The generic way to work with files is by *opening* them (either for reading or writing). The built-in Python function for opening files is `open()`. We'll create a string to write to a file for illustrative purposes.

In [None]:
# Let's first create a string to write to a file
lumberjack = """He's a lumberjack and he's OK. He sleeps all night and he works all day.
I cut down trees, I eat my lunch,
I go to the lava-try.
On Wednesdays I go shoppin'
And have buttered scones for tea. """

In [None]:
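# Now, let's write a file to disk
# (the path './tmp/lumberjack.txt' is an assumed example; the ./tmp directory must exist)
outfile = open('./tmp/lumberjack.txt', mode='w')
outfile.write(lumberjack)   # writes the whole string, line breaks included
outfile.close()             # flush the data to disk and release the file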

* The value returned by the function `open` is a *stream object* that we'll usually refer to as an *open file* or *file handle*.
* The first argument to `open()` is a string giving the path to the file. Forward slashes (`/`) separate directories and work on every platform, including Windows; on Windows, backslashes (`\`) are accepted too (escape them as `\\` or use a raw string such as `r'C:\data\file.txt'`).
* The optional keyword argument `mode='w'` opens the file for *writing*. The available mode characters are listed below; they can be combined (e.g., `'rb'` or `'w+'`).

|Character|Meaning|
|:-:|:-|
|`r`| open for reading (default) |
|`w`| open for writing, truncating the file first |
|`x`| open for exclusive creation, failing if the file already exists |
|`a`| open for writing, appending to the end of the file if it exists |
|`b`| binary mode |
|`t`| text mode (default) |
|`+`| open for updating (reading and writing) |
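As a sketch of how a few of these modes differ, the cell below writes, truncates, appends, and then tries exclusive creation on the same scratch file. The file name `modes-demo.txt` and the use of a temporary directory are assumptions made so the cell runs on its own.

```python
import os
import tempfile

# Hypothetical scratch file for this sketch (not part of the original notebook)
path = os.path.join(tempfile.mkdtemp(), 'modes-demo.txt')

with open(path, 'w') as f:      # 'w' creates the file (or truncates it)
    f.write('first\n')
with open(path, 'w') as f:      # opening with 'w' again discards 'first'
    f.write('second\n')
with open(path, 'a') as f:      # 'a' appends after the existing content
    f.write('third\n')

with open(path) as f:           # default mode is 'r' in text mode
    text = f.read()
print(repr(text))               # 'second\nthird\n'

try:
    with open(path, 'x') as f:  # 'x' refuses to touch an existing file
        pass
except FileExistsError as err:
    print('x mode failed:', type(err).__name__)

with open(path, 'rb') as f:     # 'b' returns bytes instead of str
    raw = f.read()
print(raw)                      # on Windows the bytes contain b'\r\n' line endings
```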

* The invocation `outfile.write(long_string)` writes the text string as is (including line breaks) to disk.
* Further calls of the form `outfile.write(`*`string`*`)` append more text after what has already been written.
* The method `outfile.writelines(sequence_of_strings)` writes multiple strings at once; it does not add newlines between them.
* The invocation `outfile.close()` closes the file that was previously opened for writing.
* Stream objects have several useful methods and attributes we can investigate using `help`.
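The following sketch shows `write`, `writelines`, and `close` on a file handle, including the fact that `writelines` adds no separators. The temporary path is an assumption so the cell is self-contained; `help(outfile)` would list the remaining methods.

```python
import os
import tempfile

# Hypothetical path for this sketch
path = os.path.join(tempfile.mkdtemp(), 'writelines-demo.txt')

outfile = open(path, mode='w')
outfile.write('one line\n')                          # returns the number of characters written
outfile.writelines(['a', 'b', 'c\n'])                # no separators added: becomes 'abc\n'
outfile.writelines(w + '\n' for w in ['x', 'y'])     # add the newlines yourself
outfile.close()

print(outfile.closed)                                # True

with open(path) as infile:
    content = infile.read()
print(repr(content))                                 # 'one line\nabc\nx\ny\n'
```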

It is good practice to close an open file (stream object) with its `close()` method as soon as it is no longer needed. The Python language reference does not guarantee that open files will be closed when the program exits. The CPython implementation does, in practice, close any unclosed files, but that is not guaranteed in all Python implementations (e.g., IronPython or Jython). Of greater concern, text written to a file may sit in a buffer until the file is flushed or closed, so a file left open throughout a program run can behave unexpectedly when it is read or written later. It is safest to match every `file = open(...)` call with a corresponding `file.close()` call.

There is an even safer idiom for file handling:

```python
with open(filename) as file:
    ...  # work with file inside the block
```

The `with` statement encloses a block that uses `file`, and at the end of the block the file is closed automatically, *no matter how the block exits*. Even if an exception occurs before the end of the block, the file is closed before the exception is passed up to an outer exception handler; the same holds if the block contains `return`, `continue`, or `break` statements. The `with` block is also shorter than the equivalent `try`/`finally` block. For other use cases, [The Python "with" Statement by Example](http://preshing.com/20110920/the-python-with-statement-by-example/) contains an interesting discussion of using `with` together with context managers.
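A small sketch of that guarantee: an exception is raised inside the `with` block, yet the file is already closed (and its data flushed) by the time the exception is caught. The second half shows roughly the `try`/`finally` pattern that `with` replaces. The scratch path is an assumption for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'with-demo.txt')  # hypothetical path

try:
    with open(path, 'w') as f:
        f.write('partial')
        raise RuntimeError('something went wrong inside the block')
except RuntimeError:
    pass

print(f.closed)          # True: closed before the exception reached the handler

# Roughly the same guarantee written by hand with try/finally
g = open(path)
try:
    data = g.read()
finally:
    g.close()
print(data, g.closed)    # partial True
```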

* The `readline` method reads a single line from a file each time it is called. When there are no more lines to read, `readline` returns an empty string.
* The `readlines` method reads the whole file at once and returns a list of its lines. Notice that the newline characters are preserved at the end of each element of the list.
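Here is a sketch contrasting the two methods on a three-line sample (the sample text and temporary path are assumptions so the cell runs on its own): `readline` consumes one line, `readlines` returns whatever remains, and a further `readline` returns `''`.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines-demo.txt')  # hypothetical path
with open(path, 'w') as f:
    f.write('I cut down trees,\nI eat my lunch,\nI go to the lava-try.\n')

with open(path) as infile:
    first = infile.readline()     # one line, newline included
    rest = infile.readlines()     # list of the remaining lines
    at_end = infile.readline()    # '' once the file is exhausted

print(repr(first))
print(rest)
print(repr(at_end))
```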

* At the end of the `with` block, the file is closed without having to execute `infile.close()`.
* The stream method `read()` reads data from the stream object `infile` into a string `lines`.
* By default, the data from the stream is treated as text (`str`), but this can be overridden: opening the file with mode `'rb'` yields `bytes` instead.
* Attempting to read a non-existent file raises a `FileNotFoundError`.
* Opening a non-existent file for writing or appending creates it (empty until something is written).
* Attempting to read from or write to a stream that has been closed raises a `ValueError`.
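The bullets above can be sketched in one cell: `read()` pulls the whole file into a single string `lines`, the `with` block closes `infile` without an explicit `close()`, and a later read raises `ValueError`. The file name and its one-line content are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'read-demo.txt')  # hypothetical path
with open(path, 'w') as outfile:
    outfile.write("He's a lumberjack and he's OK.\n")

with open(path) as infile:
    lines = infile.read()         # the whole file as one str

print(type(lines).__name__, repr(lines))
print(infile.closed)              # True: no infile.close() needed

try:
    infile.read()
except ValueError as err:         # reading a closed stream fails
    print('ValueError:', err)
```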

In [None]:
# Opening a non-existent file for reading: FileNotFoundError
with open('no-such-file') as infile: # Default: mode='r'
    pass

In [None]:
# By contrast, opening a non-existent file in write/append mode *creates* the file
with open('./tmp/make-the-file', 'a') as newfile:
    pass
%ls -l tmp/make-the-file

In [None]:
# Once closed, we cannot write more to newfile.
newfile.write("More stuff") # raises a ValueError

A more Pythonic way to read data from a text file is line by line, using a `for` loop. Like the various data collections, a stream is an *iterable* in Python, so it can be looped over. This is the wiser choice for arbitrarily large files, which could in principle fill the available memory if read all at once, because only one line needs to be held in memory at a time.

In [None]:
# It is often more idiomatic to read by lines in a loop
with open('data/lumberjack.txt') as infile:
    for line in infile:
        print(line.upper(), end="")

Comprehensions also work well for reading and transforming a file's contents.

For example, inside a `with` block, a comprehension over the open file can find all of the 's' words.
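One possible sketch, assuming "'s' words" means words that begin with the letter s (ignoring case and surrounding punctuation). It writes its own copy of part of the lyrics to a temporary path so the cell stands alone; in the notebook you would open `data/lumberjack.txt` instead.

```python
import os
import tempfile

# Self-contained sample (hypothetical path) with part of the lyrics
path = os.path.join(tempfile.mkdtemp(), 'lumberjack.txt')
with open(path, 'w') as f:
    f.write("I cut down trees, I eat my lunch,\n"
            "I go to the lava-try.\n"
            "On Wednesdays I go shoppin'\n"
            "And have buttered scones for tea.\n")

# Nested comprehension: loop over lines, then over the words in each line
with open(path) as infile:
    s_words = [word.strip(",.'")
               for line in infile
               for word in line.split()
               if word.lower().startswith('s')]

print(s_words)   # ['shoppin', 'scones']
```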

### Reading by chunk

In [None]:
# We can read a file in fixed-size chunks (not just line by line)
from random import random

CHUNK_SIZE = 4    # In text mode, read(n) returns at most n characters (not bytes)

with open('data/lumberjack.txt') as infile:
    while True:
        chunk = infile.read(CHUNK_SIZE)
        if not chunk:    # read() returns '' at end of file
            break

        # for each 4-character chunk, randomly choose upper or lower case
        if random() < 0.5:
            print(chunk.upper(), end='')
        else:
            print(chunk.lower(), end='')

## Delimited data files

Very often, text files represent tabular data, using a *delimiter* to separate columns. We illustrate a few examples of writing and reading CSV (*comma-separated values*) files, TSV (*tab-separated values*) files, and files with other delimiters.

Let's make a multiplication table.

`print()` accepts the `file=` keyword argument to print directly to a file.

Let's do this without generating all of the data in memory.

In [None]:
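# Here is a "hand-crafted" way to write a TSV file of numbers

nrows = 10
ncols = 4
filename = './tmp/table.tsv'

with open(filename, 'w') as table:
    for i in range(1, nrows + 1):
        # Write one row at a time: the products i*1 ... i*ncols separated by tabs.
        # print() adds the trailing newline; only the current row exists in memory.
        print(*(i * j for j in range(1, ncols + 1)), sep='\t', file=table)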

Read in the table.
* read in one line at a time
* split the values
* what data type did Python assume?
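A sketch of those three steps. To run on its own it first writes a small 3 x 4 stand-in for `./tmp/table.tsv` to a temporary path (an assumption); each line is then stripped of its newline and split on tabs. Every value comes back as a `str`: Python does not guess numeric types when reading text.

```python
import os
import tempfile

# Self-contained stand-in for ./tmp/table.tsv (hypothetical path, 3 rows x 4 columns)
filename = os.path.join(tempfile.mkdtemp(), 'table.tsv')
with open(filename, 'w') as table:
    for i in range(1, 4):
        print(*(i * j for j in range(1, 5)), sep='\t', file=table)

with open(filename) as table:
    for line in table:                           # one line at a time
        values = line.rstrip('\n').split('\t')   # split on tabs
        print(values, [type(v).__name__ for v in values])
```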

Read the table into a list-of-lists
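A minimal sketch using a nested list comprehension, converting each field to `int` as it is read. As before, it writes its own small stand-in table to an assumed temporary path.

```python
import os
import tempfile

# Self-contained stand-in for ./tmp/table.tsv (hypothetical path)
filename = os.path.join(tempfile.mkdtemp(), 'table.tsv')
with open(filename, 'w') as table:
    for i in range(1, 4):
        print(*(i * j for j in range(1, 5)), sep='\t', file=table)

# Outer comprehension: one list per line; inner: one int per tab-separated field
with open(filename) as table:
    rows = [[int(v) for v in line.rstrip('\n').split('\t')] for line in table]

print(rows)   # [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
```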

Of course, rather than developing code for reading and writing to CSV/TSV files, we can use a module from the Python Standard Library (namely `csv`). When confronted with a new data analysis problem, *always* check to see if there is a library. There is a good chance someone else has had a similar problem to solve and has developed a module that will solve the problem for you.

In [None]:
# Here, we open a CSV file and wrap it in a csv.reader object. Iterating over
# the reader in a for loop yields one row at a time as a Python list of
# strings. We could split each line with str.split(','), but csv.reader has
# already been written for us and also handles quoted fields containing commas.
import csv
with open('./data/AAPL.csv', newline='') as csvfile:  # newline='' as the csv docs recommend
    stockreader = csv.reader(csvfile)
    for row in stockreader:
        print(row)
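The `csv` module also writes delimited files. As a sketch, the hand-crafted multiplication table can be produced with `csv.writer` and a tab delimiter, then read back with `csv.reader`; the temporary path and the 3 x 4 size are assumptions for this example.

```python
import csv
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), 'table.tsv')  # hypothetical path

# Write: csv.writer converts each number to text and inserts the delimiter
with open(filename, 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    for i in range(1, 4):
        writer.writerow([i * j for j in range(1, 5)])

# Read back: csv.reader yields lists of strings, converted here to ints
with open(filename, newline='') as f:
    rows = [[int(v) for v in row] for row in csv.reader(f, delimiter='\t')]

print(rows)   # [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
```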

# Reading from a URL

The Python Standard Library includes a package called `urllib`. Its `urllib.request.urlopen` function can read from URLs using several schemes, including `http:`, `https:`, `ftp:`, `file:`, and `data:`.
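To try `urlopen` without relying on a live website, here is a sketch using two non-HTTP schemes: a `data:` URL, whose payload is embedded in the URL itself, and a `file:` URL pointing at a scratch file (its path is an assumption). In both cases `read()` returns `bytes`.

```python
import pathlib
import tempfile
from urllib.request import urlopen

# A data: URL carries its (percent-encoded) payload inline
with urlopen('data:text/plain;charset=utf-8,Hello%2C%20World%21') as u:
    contents = u.read()
print(contents)                  # b'Hello, World!'
print(contents.decode('utf-8'))  # Hello, World!

# A file: URL built from a local path (hypothetical scratch file)
p = pathlib.Path(tempfile.mkdtemp()) / 'hello.txt'
p.write_bytes(b'hi from a file\n')
with urlopen(p.as_uri()) as u:
    file_contents = u.read()
print(file_contents)             # b'hi from a file\n'
```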

In [None]:
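from urllib.request import urlopen

url='http://www.wunderground.com/history/airport/KAGC/2016/1/6/DailyHistory.html?req_city=Pittsburgh&req_state=PA&req_statename=Pennsylvania&reqdb.zip=15206&reqdb.magic=1&reqdb.wmo=99999&format=1'
url_conn = urlopen(url)   # open a connection to the page, much like open() does for a file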

`urlopen` creates a connection to the website much like `open` created a connection to a file. We can now call `.read` inside the `with` context manager.

In [None]:
# Python 3 (Note: Python 2 urlopen does not use the with statement as below)
with url_conn as u:
    contents = u.read()

In [None]:
print(contents)

In [None]:
print(contents.decode('utf-8'))  # read() returned bytes; decode to str (assuming the page is UTF-8)

In many cases, quite a lot of additional text processing is required before you can work with the data. Python has many modules to help; for example, [Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) is great for parsing HTML content.

Let's do some of it by hand:
* strip trailing newline character
* replace `<br />` with empty strings
* split along `,` per line
* remove empty lines
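A sketch of these steps. Because the downloaded page is not reproduced here, `raw` is a made-up sample in the same general shape (comma-separated lines ending in `<br />`, plus blank lines); the column names and values are hypothetical, not real weather data. In the notebook, `raw` would be `contents.decode('utf-8')`.

```python
# Hypothetical stand-in for the downloaded text (made-up rows, not real data)
raw = ("\n"
       "Time,TemperatureF,Conditions<br />\n"
       "12:51 AM,30.9,Clear<br />\n"
       "1:51 AM,30.0,Clear<br />\n"
       "\n")

weather = []
for line in raw.splitlines(keepends=True):
    line = line.rstrip('\n')            # strip trailing newline character
    line = line.replace('<br />', '')   # replace <br /> with empty strings
    if not line:                        # remove empty lines
        continue
    weather.append(line.split(','))     # split along ',' per line

print(weather)
```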

Compute the average temperature
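A minimal sketch, assuming `weather` is a list of lists of strings whose first row is a header containing a column named `TemperatureF` (both the column name and the rows below are hypothetical). The values must be converted from `str` to `float` before averaging.

```python
# Hypothetical parsed rows: header first, then data (made-up values)
weather = [['Time', 'TemperatureF', 'Conditions'],
           ['12:51 AM', '30.9', 'Clear'],
           ['1:51 AM', '30.0', 'Clear'],
           ['2:51 AM', '29.1', 'Overcast']]

header, *rows = weather
col = header.index('TemperatureF')           # assumed column name
temps = [float(row[col]) for row in rows]    # strings -> numbers
average = sum(temps) / len(temps)
print(f'Average temperature: {average:.1f} F')
```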

# Pickling objects

A `pickle` is a binary serialization of a Python object, here written to a file. We open a new file called `weather.pkl` for writing in binary mode with `'wb'`.

In [None]:
import pickle
with open('./tmp/weather.pkl', 'wb') as out_file:
    pickle.dump(weather, out_file)
%ls -l tmp/weather.pkl

Using the same context-manager idiom, we can load the pickle back into a new object. Be careful: unpickling data from an untrusted source is dangerous, because a maliciously crafted pickle can execute arbitrary code during `load`. Only load pickle files you created yourself or otherwise trust.
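As a sketch of the full round trip, the cell below pickles a small hypothetical dictionary (standing in for `weather`) to a temporary file, loads it back, and checks that the result is an equal but distinct object. It also shows `pickle.dumps`/`pickle.loads`, which do the same thing in memory with a `bytes` object.

```python
import os
import pickle
import tempfile

# Hypothetical object standing in for `weather`
data = {'city': 'Pittsburgh', 'temps': [30.9, 30.0, 29.1]}

path = os.path.join(tempfile.mkdtemp(), 'data.pkl')  # hypothetical path
with open(path, 'wb') as out_file:     # pickles are bytes: use binary mode
    pickle.dump(data, out_file)

with open(path, 'rb') as in_file:
    restored = pickle.load(in_file)

print(restored == data, restored is data)   # True False: equal, but a new object

# The same round trip in memory, without a file
blob = pickle.dumps(data)
print(type(blob).__name__)                   # bytes
print(pickle.loads(blob) == data)            # True
```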

In [None]:
with open('./tmp/weather.pkl','rb') as in_file:
    new_weather = pickle.load(in_file)   

In [None]:
for line in new_weather:
    print(line[2])

<img src='img/copyright.png'>