# Session 6: input and output

One of the most basic but important things we need to be able to do is provide *input* to a program or script, and have the program generate *output*.

There are two distinct types of input and output (I/O):

* *Interactive* I/O<br><br>
    * reading from the keyboard<br><br>
    * writing to the screen<br><br>
* *File-based* I/O<br><br>
    * reading from a file<br><br>
    * writing to a file<br><br>

##### Exercise

Actually, keyboard and screen are not the only types of interactive I/O that are possible. 

Think of at least one other type of input and one other type of output.

Investigate (OK, google) whether/how these types of I/O can be dealt with within *Python*. Are there built-in functions or functions within the modules we've already discussed? Perhaps there are functions within some of the modules pre-loaded in anaconda? Or are there packages/modules you could download somewhere?

## Interactive output

Interactive output to the screen typically happens via the *print* command (which we know and love).

### Simple print

The way we have used the *print* command so far is referred to as "simple printing". 

This means that we simply type "print", followed by the list of objects we want to print, separated by commas.

In [120]:
a = 10
b = 'Attitude Adjuster'
print a
print
print b
print
print a, b
print
print "The number a = " , a
print
print "The number is ", a, " and the string is ", b

10

Attitude Adjuster

10 Attitude Adjuster

The number a =  10

The number is  10  and the string is  Attitude Adjuster


Note that <br>

* *Python* automatically adds a space between every object that is printed.<br><br>

* every *print* command starts printing on a new line<br><br>
    * so a *print* on its own is a way to add vertical space<br><br>
    * we can suppress that behaviour by finishing the *print* statement with a comma<br><br>

In [121]:
print "This is the beginning of a single line...",
print "  ...and this is the end of the same line."

This is the beginning of a single line...   ...and this is the end of the same line.


One additional thing to note here is that the syntax of the *print* command has changed slightly between *Python 2* and *Python 3*. 

The main difference is that, in *Python 3*, *print* is effectively considered as a normal function, which requires parentheses when called. 

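Be careful when there is more than one argument, though. In *Python 3*, all the arguments go inside a single pair of parentheses, separated by commas, e.g. print("a", "b"). In *Python 2*, that same line prints a *tuple*, so to replicate simple printing there, each argument has to be given its own parentheses, as in the second and third statements below. Those two forms only behave this way in *Python 2*: in *Python 3*, the second statement prints only its first argument, and the trailing comma in the third has no effect.

The single-argument form is allowed in both versions, so if we want to write code compatible with both, we could make a point of writing print statements like the first one here (placing "from __future__ import print_function" at the top of a *Python 2* file makes the full *Python 3* syntax available):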

In [122]:
print("Hello World!")
print("Hello World"),("oops, I forgot the exclamation mark!")
print("Hello World"),
print("oops, I forgot the exclamation mark!")

Hello World!
Hello World oops, I forgot the exclamation mark!
Hello World oops, I forgot the exclamation mark!


It would probably not be a bad idea to do this as a matter of course. However, since I'm so used to the *Python 2* version, I'll most likely forget to do that. 

It's your call if you want to do the same.
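
For reference, here is a minimal sketch of how the simple-printing examples above might look with the *Python 3* print function. The keyword arguments "sep" and "end" belong to the built-in print(); the variables are just the ones from the earlier example. The sketch assumes *Python 3* (in *Python 2* it would need "from __future__ import print_function" at the top of the file).

```python
a = 10
b = "Attitude Adjuster"

# All arguments go inside ONE pair of parentheses, separated by commas.
print("The number is", a, "and the string is", b)

# "end" replaces the newline that print normally adds; it plays the role
# of the trailing comma in the Python 2 print statement.
print("This is the beginning of a single line...", end=" ")
print("...and this is the end of the same line.")

# "sep" replaces the single space that print puts between arguments.
print(a, b, sep=" | ")
```

A bare print() with no arguments still produces an empty line, just like a bare *print* statement.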

### Formatted printing

One of the downsides of simple printing is that it gives us very little control over how things appear on the printed line.

For example, maybe we want to print just the first few decimal places of a *float*. Simple printing doesn't allow that:

In [123]:
import math
print math.pi

3.14159265359


The solution is formatted printing. The best way to illustrate how this works is by example.

Here is how we would print pi with a formatted *print* command, in a whole bunch of different formats:

In [124]:
print "Pi = %5.2f" % math.pi     #print as float: 
                                 #5 spaces in total, 2 decimal places

Pi =  3.14


In [125]:
print "Pi = %2d" % math.pi       #print integer part (truncate if reqd):
                                 #2 spaces in total      

Pi =  3


In [126]:
print "Pi = %10.3e" % math.pi     #print in exponential notation: 
                                  #10 spaces total, 3 decimal places

Pi =  3.142e+00


In [127]:
print "Pi = %10s" % math.pi      #print as a string: 
                                 #10 spaces total

Pi = 3.14159265359


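The general notation here is hopefully pretty obvious -- the main thing you need to remember is the letter that denotes what format to use (e.g. "f", "d", "e", "s"). Note that the number of spaces is a *minimum* width: as the last example shows, output that needs more room than that simply takes it.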

What if we want to print out more than one object? Easy:

In [128]:
print "Pi = %5.2f   e = %5.2f" % (math.pi, math.e)

Pi =  3.14   e =  2.72


So the thing that we are asking to be printed -- the thing after the "%" -- is actually provided to the formatted print statement as a *tuple*.
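
One consequence of the *tuple* convention is worth a quick sketch: if the single thing we want to print is itself a *tuple*, it has to be wrapped in a one-element *tuple*. The variable names below are made up for illustration, and the sketch uses print() calls so that it runs under *Python 3*.

```python
import math

# One value: no tuple needed.
one = "Pi = %5.2f" % math.pi

# Several values: supply them as a tuple, in the order of the codes.
two = "Pi = %5.2f   e = %5.2f" % (math.pi, math.e)

# If the single thing to print IS a tuple, wrap it in a one-element
# tuple (note the trailing comma); otherwise its items are matched
# against the formatting codes one by one.
point = (1, 2)
three = "point = %s" % (point,)

print(one)
print(two)
print(three)
```

Writing "point = %s" % point instead would raise a TypeError, because two values would be competing for a single formatting code.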

There are a couple of additional important things we should know:<br>

1. *printing* always involves an initial conversion of the output to a single string<br>

2. this implies we don't immediately have to print this string; we could *store* it instead

In [129]:
output = "Pi = %5.2f   e = %5.2f" % (math.pi, math.e)
print output

Pi =  3.14   e =  2.72


The other thing to note is that there are two fundamental ways to convert non-*strings* to *strings*. Both are actually implemented as *methods* that every data type should have:

1. "str" and "__str__"
    * all objects in *Python* should provide a method "__str__" 
    * this method should return a nice string representation of the object
    * this method is what's being called when we use the "%s" formatting statement
    * it's also what is called when we use the "str()" function to convert<br><br>
    
2. "repr" and "__repr__"
    * all objects in *Python* should provide a method "__repr__" 
    * this method should return another string representation of the object
    * this representation should be such that the built-in function *eval* recreates the object     
    * this method is what's being called when we use the "%r" formatting statement
    * this *method* is called when we use the "repr()" function to convert

All of this probably seems a little strange, but a couple of examples should clarify it:

In [130]:
mypi = math.pi
print type(mypi)
print mypi.__str__()   #looks awkward, I know, but illustrates the underlying method
print "%10s" % mypi    #%s formatting statement; calls __str__
print str(mypi)        #str() function; calls __str__
#
print
#
print mypi.__repr__()  #looks awkward, I know, but illustrates the underlying method
print "%10r" % mypi
print repr(mypi)
test = eval(repr(mypi))
print "%0.15f" % test  #note that "%0.15f" means "use as many spaces as you need"
print type(test)       #checking that "eval(repr)" has, in fact, returned the original type

<type 'float'>
3.14159265359
3.14159265359
3.14159265359

3.141592653589793
3.141592653589793
3.141592653589793
3.141592653589793
<type 'float'>
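
To see the two methods from the other side, here is a small sketch of a home-made data type that provides both. The class "Point" is purely hypothetical and exists only for this illustration; print() calls are used so that the sketch runs under *Python 3*.

```python
class Point(object):
    """Hypothetical 2-D point used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Friendly form: used by str(), "%s" and print.
        return "(%g, %g)" % (self.x, self.y)

    def __repr__(self):
        # Unambiguous form: used by repr() and "%r"; written so that
        # eval() can rebuild an equivalent object.
        return "Point(%r, %r)" % (self.x, self.y)


p = Point(1.5, -2)
print("%s" % p)        # calls __str__
print("%r" % p)        # calls __repr__
q = eval(repr(p))      # rebuilds a Point from its repr string
print(q.x, q.y)
```

Here eval(repr(p)) only works because the name "Point" is defined where eval is called; that is the sense in which the repr string "recreates" the object.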


##### Exercise

What happens when we use the "eval" function on the *string* created by "str()"?

Here is a complete list of the various formatting codes that are available:

|Formatting code|Meaning|
|:-------------:|-------|
|  d  |	Signed integer decimal.	 |
|  i  |	Signed integer decimal.	 |
|  o  | Signed octal value.|
|  u  |	Obsolete type – it is identical to 'd'.|
|  x  |	Signed hexadecimal (lowercase).|
|  X  |	Signed hexadecimal (uppercase).|
|  e  |	Floating point exponential format (lowercase).|
|  E  |	Floating point exponential format (uppercase).|
|  f  |	Floating point decimal format.|
|  F  |	Floating point decimal format.|
|  g  |	Floating point format.  Uses lowercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.|
|  G  |	Floating point format. Uses uppercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.|
|  c  |	Single character (accepts integer or single character string).|
|  r  |	String (converts any Python object using repr()).|
|  s  |	String (converts any Python object using str()).|
|  %  |	No argument is converted, results in a '%' character in the result.|
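
As a quick illustration, here are several of the codes from the table applied to some sample values. The values are arbitrary, and the sketch uses print() calls so that it runs under *Python 3*.

```python
print("%d" % 42)               # signed decimal integer          -> 42
print("%o" % 8)                # octal                           -> 10
print("%x %X" % (255, 255))    # hexadecimal, lower/upper case   -> ff FF
print("%e" % 12345.678)        # exponential, 6 decimal places   -> 1.234568e+04
print("%g" % 0.00001234)       # exponent below -4: exponential  -> 1.234e-05
print("%g" % 1234.5)           # otherwise: decimal              -> 1234.5
print("%c" % 65)               # character with code 65          -> A
print("%r %s" % ("hi", "hi"))  # repr() keeps the quotes         -> 'hi' hi
print("%d%%" % 50)             # "%%" gives a literal percent    -> 50%
print("%2d" % 12345)           # width is only a minimum         -> 12345
```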

##### Exercise

Play around with both formatted and unformatted printing. Try to get a feel for how you can generate attractive-looking output for the user. 

As a specific example, suppose you want to print a *list* of *lists* (of various data types). Write a function that will print this in an attractive format, no matter what the data type.

## Interactive input

How do we go about allowing the user of our program to provide input via the keyboard?

*Python* provides two basic functions for this that we should know about.

### raw_input

We've actually already met the "raw_input()" function -- I briefly explained it in the context of the tic-tac-toe exercise at the end of the last session.

raw_input() asks the user for input from the keyboard; its optional input parameter is a prompt *string* provided to the user. It *always* returns the input as a *string*.

In [131]:
nr_s = raw_input('enter your house number as a string')
print 'The number was', nr_s, 'and its type was', type(nr_s)
#
print
#
nr_i =  raw_input('enter your house number as an integer')
print 'The number was', nr_i, 'and its type was', type(nr_i)

enter your house number as a stringten
The number was ten and its type was <type 'str'>

enter your house number as an integer10
The number was 10 and its type was <type 'str'>
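
One thing to be aware of: in *Python 3*, raw_input() was renamed input(). A rough sketch of how code could cope with both versions is shown below. The names "read_line" and "ask_house_number", and the "reader" parameter, are hypothetical and only there so the function can be tried out without a keyboard.

```python
# Pick whichever line-reading function this Python version provides.
try:
    read_line = raw_input   # Python 2
except NameError:
    read_line = input       # Python 3: input() behaves like raw_input()


def ask_house_number(reader=read_line):
    """Prompt for a house number and return it as an int.

    "reader" is a hypothetical hook for illustration; by default the
    function reads from the keyboard.
    """
    text = reader("enter your house number: ")
    return int(text)        # the reader always hands back a string
```

Calling ask_house_number() with no arguments prompts the user as usual; passing something like "lambda prompt: '10'" as the reader returns the integer 10 without any typing.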


In the hints for the tic-tac-toe exercise, we already saw how we can convert input strings into other data types. 

In particular, we saw how to convert input of the form "1 2 2" into a numerical list via the *split()* *method* and the *map()* *function*:

In [132]:
my_input = '1 2 2'
split_input = my_input.split(" ")
listint_input = map(int,split_input)
print split_input
print listint_input

['1', '2', '2']
[1, 2, 2]
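
The same conversion can also be sketched with a list comprehension, which behaves identically in *Python 2* and *Python 3* (where map() returns an iterator rather than a *list*). The variable names are made up for illustration, and print() calls are used so that the sketch runs under *Python 3*.

```python
my_input = "1 2 2"

# split() with no argument splits on any run of whitespace, so extra
# spaces between the numbers do no harm.
tokens = my_input.split()

# Convert each piece to an int.
numbers = [int(tok) for tok in tokens]

print(tokens)
print(numbers)

# Comma-separated input such as "1, 2" works the same way once the
# separator is given; int() ignores the leftover surrounding spaces.
pair = [int(tok) for tok in "1, 2".split(",")]
print(pair)
```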


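For simpler input (e.g. a single integer), we can of course also just rely on the *eval()* function (bearing in mind that *eval()* will execute whatever expression it is given, so it should only be used on input we trust):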

In [133]:
my_input = "2"
my_input_i = eval(my_input)
print my_input_i, type(my_input_i)

2 <type 'int'>
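
If the input comes from someone we don't fully trust, one alternative worth knowing about is the function literal_eval() from the standard *ast* module. The sketch below is not part of the original example; it just shows that literal_eval() converts the same kinds of strings but refuses anything that isn't a plain literal. It uses print() calls so that it runs under *Python 3*.

```python
import ast

# literal_eval() only accepts Python literals (numbers, strings, lists,
# tuples, dicts, ...), so it cannot run arbitrary code the way eval() can.
print(ast.literal_eval("2"))            # an int
print(ast.literal_eval("10.0"))         # a float
print(ast.literal_eval("[1, 2, 3]"))    # a list
print(ast.literal_eval("1, 2, 3"))      # a tuple

# Anything that is not a plain literal is rejected with an exception.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", type(err).__name__)
```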


### (not so raw) input

The other way to read user input from the keyboard is via the function *input()*. 

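This function is a bit more snazzy than *raw_input()*, in that it works out what sort of data type we entered. It does this by evaluating whatever we type as a *Python* expression, i.e. it is equivalent to eval(raw_input()). (In *Python 3*, *raw_input()* was renamed *input()*, and this evaluating version was removed.)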

In [134]:
i = input('enter an integer: ')
print i, type(i)

enter an integer: 10
10 <type 'int'>


In [135]:
f = input('enter a float: ')
print f, type(f)

enter a float: 10.0
10.0 <type 'float'>


In [136]:
s = input('enter a string (use quotes!): ')
print s, type(s)

enter a string (use quotes!): "test"
test <type 'str'>


The *input()* function is actually pretty clever -- it even works for lists and tuples:

In [137]:
l = input('enter a list (use standard list notation): ')
print l, type(l)
t = input('enter a tuple (use standard tuple notation with parentheses): ')
print t, type(t)
t = input('enter a tuple (use standard tuple notation without parentheses): ')
print t, type(t)

enter a list (use standard list notation): [1, 2, 3]
[1, 2, 3] <type 'list'>
enter a tuple (use standard tuple notation with parentheses): (1, 2, 3)
(1, 2, 3) <type 'tuple'>
enter a tuple (use standard tuple notation without parentheses): 1, 2, 3
(1, 2, 3) <type 'tuple'>


Note that this means that we could have saved ourselves all that pain of using *split()* and *map()* in our tic-tac-toe program. Sorry about that -- I thought it'd be useful for you to learn these things :)

##### Exercise

Write a function that prompts the user for input and then reads this and tests it for validity. 

For example, the function might prompt the user to enter a real number, read that number, and complain if the user entered something that isn't a real number. In that case, it should reprompt a finite number of times (say 3 times), before finally stopping with an error message.

##### Exercise

First, write a function that prompts the user to generate a 4-digit PIN. It should ask the user to enter this twice and make sure that the same PIN was entered both times (and repeat the operation if it wasn't). The function should then return the PIN to the calling program.

Then write another function that will prompt the user for her PIN, giving her 3 attempts to enter it correctly. If the correct PIN is entered, the function should call another function (let's call it "access_granted"); if the incorrect PIN is entered 3 times, it should call yet another function (let's call it "access_denied"). Write those functions as well, but just make them "dummies" for now -- i.e. all they have to do is print "access granted" or "access denied". 

Finally, write a wrapper main program that calls your functions and test it all out.

## Reading and writing files

Of course, we don't always want to just read from the keyboard and write to the screen.

More often than not, the data we want to read in will be stored in a file, and the results our program produces will need to be written to another file.

The most basic way of reading and writing files in *Python* involves *opening* a file *object* and then using the *read* and *write* methods that are automatically available for such *objects*.

You may be a little confused at this point that we're referring to a *method* associated with an *object*. After all, we haven't formally defined the term *object* in the course so far, and we've only really met *methods* as functions that are associated with, well, "objects" that are associated with a particular data type.

Don't worry about this for now. We'll define the term *object* a little more properly later in the course -- as we'll see there, in some sense everything is an *object* in *Python*. For the moment, it's perfectly fine -- and actually pretty much correct -- to just think of *file* as being another data type. 

So we can have *objects* (files) that have that data type, and as such they can have methods associated with them.

The best way to understand how this basic way of reading and writing files works is by example (and the following ones are stolen shamelessly from Hans Fangohr's textbook).

In [138]:
# 1. Write a file
out_file = open("test.txt", "w")                 # "w" = "open for writing"
out_file.write("This is the first line. \n" + \
               "This is the second line.")       # use "write" method - 
                                                 # "\n" is line break (in file)
                                                 # "+" = concat
                                                 # "\" = command contd next line
                                                 # (no comments after "\")

            
out_file.close()                                 # close file with "close" method


# 2. Read a file
in_file = open("test.txt", "r")                  # "r" = "open for reading"
text = in_file.read()                            # read file with "read" method 
                                                 # put into string "text"

in_file.close()                                  # close file with "close" method


# 3. Display data
print text

This is the first line. 
This is the second line.


Things to note here:<br>

* the *object* returned by *file.read()* is a *string*<br><br>

* it's good practice to *close* any files that you've *opened*<br><br>

* we can insert line breaks (and other special characters) by using the "escape" character "\"
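
Since forgetting to *close* a file is an easy mistake to make, it's worth seeing a sketch of the "with" statement, which closes the file for us. This is an alternative to the explicit open/close pattern above, not a replacement for understanding it. The temporary directory is only there so that the sketch leaves no files behind, and print() calls are used so that it runs under *Python 3*.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# The file is closed automatically when the "with" block ends,
# even if an error occurs inside it.
with open(path, "w") as out_file:
    out_file.write("This is the first line.\n")
    out_file.write("This is the second line.\n")

# Looping over a file object yields one line at a time, each still
# ending in its "\n"; rstrip("\n") removes that.
with open(path, "r") as in_file:
    lines = [line.rstrip("\n") for line in in_file]

print(lines)
print(in_file.closed)   # True: the "with" block closed the file
```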

We can also use the same formatting statements we learned about for screen output for file-based I/O:

In [139]:
f = open("table.txt", "w")
for i in range(1, 11):
    f.write("%2d x 17 = %4d\n" % (i, i*17))
f.close()
#
f = open("table.txt", "r")
text = f.read() 
f.close()
print text

 1 x 17 =   17
 2 x 17 =   34
 3 x 17 =   51
 4 x 17 =   68
 5 x 17 =   85
 6 x 17 =  102
 7 x 17 =  119
 8 x 17 =  136
 9 x 17 =  153
10 x 17 =  170



## Reading and writing *arrays*

With the things we have learned so far, we can read and write just about any text-based input and output to and from a file. 

However, doing so can be pretty tedious: we always have to think pretty hard about formatting (when we do output) and carefully split strings into various bits of data (when we do input).

This can get pretty annoying...
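
To make the point concrete, here is a rough sketch of what reading the multiplication table from the earlier example back into numbers looks like when done by hand. The text is held in a string rather than read from table.txt so that the sketch is self-contained, and the names "factors" and "products" are made up for illustration.

```python
# Three lines in the layout written earlier, e.g. " 3 x 17 =   51".
text = " 1 x 17 =   17\n 2 x 17 =   34\n 3 x 17 =   51\n"

factors = []
products = []
for line in text.splitlines():
    fields = line.split()             # e.g. ['3', 'x', '17', '=', '51']
    factors.append(int(fields[0]))    # first field: the multiplier
    products.append(int(fields[4]))   # fifth field: the result

print(factors)
print(products)
```

Note how the code depends on knowing exactly which field holds which piece of data; change the file layout and the indices have to change with it.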

What file and data formats do we usually want to work with?

* file:
    * one or more columns of numerical or string data
    * possibly a few lines with header comments
    * columns separated by space, tab, comma, ....

* data:
    * *numpy* *arrays*
    * one per file column

Wouldn't it be nice if there were some simple functions that make reading and writing this type of data easier for us?

Well, there are...

#### *loadtxt* and *savetxt*

*numpy* provides two functions for us that make reading and writing *arrays* to and from column-based files really really simple.

These functions are *numpy.loadtxt()* and *numpy.savetxt()*.

Here is an example of how to use them:

In [140]:
import numpy as np
x = np.arange(3.0)                    #create an array x

print " x"                            #print x out for us
print "==="
for i in range(len(x)):
    print x[i]
print
print
    
np.savetxt('test_save.txt', x)        #use savetxt to save x 
                                      #to file test_save.txt

a = np.loadtxt('test_save.txt')       #use loadtxt to read the data
                                      #from the file into the new
                                      #vector a

print type(a)                         #check the type of a
print
print " a"                            #print a out for us
print "==="
for i in range(len(a)):
    print a[i]
print    

print "the actual file content:"  
print "========================"
f = open('test_save.txt', 'r')
print f.read()                        #inspect actual file contents

 x
===
0.0
1.0
2.0


<type 'numpy.ndarray'>

 a
===
0.0
1.0
2.0

the actual file content:
========================
0.000000000000000000e+00
1.000000000000000000e+00
2.000000000000000000e+00



So this is very nice. We dumped the array x into a text file, writing one element per line. We didn't have much control over the formatting, but that's a detail we can worry about later.
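
For the impatient, here is a brief sketch of how the formatting can be controlled: *savetxt* accepts an optional "fmt" argument that takes the same formatting codes we met for printing. The in-memory buffer (io.StringIO) stands in for a file so that nothing is written to disk; that choice, and the use of print() calls so the sketch runs under *Python 3*, are assumptions made for this illustration.

```python
import io
import numpy as np

x = np.arange(3.0)

# savetxt accepts an open file-like object as well as a file name;
# an in-memory buffer is used here so nothing is written to disk.
buffer = io.StringIO()
np.savetxt(buffer, x, fmt="%.2f")   # two decimal places per element

print(buffer.getvalue())
```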

Now let's try the obvious way of reading and writing multiple 1-D arrays to a single file:

In [141]:
import numpy as np
x = np.arange(5)                      #create an array x
y = x**2                              #create an array y

print " x     y  "                    #print x and y out for us
print "========="
for i in range(len(x)):
    print x[i],' ', y[i]
print
print
    
np.savetxt('test_save.txt', (x, y),\
           fmt = '%4i')               #use savetxt to save x and y to file 
                                      #we use the "i" formatting string
                                      #since our variables are *ints* 
                                      #otherwise it will be exponential
                                      #notation by default
            
a, b = np.loadtxt('test_save.txt',\
                  dtype=int)          #use loadtxt to read the data
                                      #from the file into two new
                                      #vectors called a and b
                                      #specify type of vectors via *dtype*
                                      #(default is *float*)

print type(a), type(b)                #check the types of a and b
print
print " a     b  "                    #print a and b out for us
print "========="
for i in range(len(a)):
    print a[i],' ', b[i]
print    
    
print "the actual file content:"  
print "========================"
f = open('test_save.txt', 'r')
print f.read()                        #inspect actual file contents

 x     y  
=========
0   0
1   1
2   4
3   9
4   16


<type 'numpy.ndarray'> <type 'numpy.ndarray'>

 a     b  
=========
0   0
1   1
2   4
3   9
4   16

the actual file content:
========================
   0    1    2    3    4
   0    1    4    9   16



Hmmm, so the reading and writing *seemed* to go OK, but when we look at the actual file contents, we see that the two 1-D arrays were **not** written to a 2-D column file (one column per array). Instead, they were written to a 2-row file (one row per array).

This is once again due to the fact that, when we pass something like "(x, y)" as an argument, we're passing something like a *list* of *lists*.

(Actually, in this case, it's a *tuple* of *arrays* -- in general terms, what we're talking about here really is a sequence of sequences -- but list of lists is probably more intuitive.)

What the function *savetxt* does is take a single list (sequence) and write it to file, one element per line.

If the elements of that list (sequence) are themselves lists (sequences), then each of them still occupies a single line, with its own elements spread across the columns of that line.

Viewed like this, the behaviour above makes perfect sense. 

Note also that *loadtxt* reads things in exactly the same way.

So how can we get *savetxt* and *loadtxt* to deal with column-based I/O?

Well, we basically have to transpose the dimensions of the multi-dimensional *tuples* we're using.

Luckily, there are convenient ways to do that:

* writing with *savetxt*:<br><br>
    * *numpy* provides a function *column_stack*, which stacks 1-D *arrays* into a suitable 2-D *array*<br><br>
    * so instead of passing "(x, y)" to *savetxt*, we pass "np.column_stack((x,y))"<br><br>

* reading with *loadtxt*:<br><br>
    * *loadtxt* has an optional *boolean* input parameter *unpack*<br><br>
    * if this is set to *True*, things are read in column-based format automatically
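
Before touching any files, a quick sketch of what "transposing" means here, using array shapes only. The names "rows" and "cols" are made up for illustration, and print() calls are used so that the sketch runs under *Python 3*.

```python
import numpy as np

x = np.arange(5)
y = x**2

rows = np.array((x, y))            # how "(x, y)" looks as a 2-D array
cols = np.column_stack((x, y))     # one column per 1-D array

print(rows.shape)                  # (2, 5): 2 lines of 5 numbers
print(cols.shape)                  # (5, 2): 5 lines of 2 numbers

# column_stack gives the transpose of the row-wise arrangement.
print(np.array_equal(cols, rows.T))
```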

Let's see how this works by using it in our previous example:

In [142]:
import numpy as np
x = np.arange(5)                      #create an array x
y = x**2                              #create an array y

print " x     y  "                    #print x and y out for us
print "========="
for i in range(len(x)):
    print x[i],' ', y[i]
print
print
    
np.savetxt('test_save.txt',\
           np.column_stack((x, y)),\
           fmt = '%4i')               #use savetxt to save x and y to file 
                                      #we use the "i" formatting string
                                      #since our variables are *ints* 
                                      #otherwise it will be exponential
                                      #notation by default
            
a, b = np.loadtxt('test_save.txt',\
                  unpack=True, \
                  dtype=int)          #use loadtxt to read the data
                                      #from the file into two new
                                      #vectors called a and b
                                      #specify type of vectors via *dtype*
                                      #(default is *float*)

print type(a), type(b)                #check the types of a and b
print
print " a     b  "                    #print a and b out for us
print "========="
for i in range(len(a)):
    print a[i],' ', b[i]
print    
    
print "the actual file content:"  
print "========================"
f = open('test_save.txt', 'r')
print f.read()                        #inspect actual file contents

 x     y  
=========
0   0
1   1
2   4
3   9
4   16


<type 'numpy.ndarray'> <type 'numpy.ndarray'>

 a     b  
=========
0   0
1   1
2   4
3   9
4   16

the actual file content:
========================
   0    0
   1    1
   2    4
   3    9
   4   16



Both *loadtxt* and *savetxt* are actually more flexible than our example might suggest.

For example, they can be used to read and write files whose columns contain data of different types, files with header comments, and files with different column delimiters.
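
As a small taste of that flexibility, here is a sketch that writes a header comment and uses a different formatting code for each column. The "header" and "fmt" arguments are part of *savetxt*; the in-memory buffer (io.StringIO) standing in for a file, and the particular header text, are just choices made for this illustration. print() calls are used so that the sketch runs under *Python 3*.

```python
import io
import numpy as np

x = np.arange(5)
y = np.sqrt(x)

buffer = io.StringIO()
np.savetxt(buffer,
           np.column_stack((x, y)),
           fmt=["%4d", "%8.3f"],        # one formatting code per column
           header="x  sqrt(x)")         # written as a "# ..." comment line

print(buffer.getvalue())

# loadtxt skips lines starting with "#" by default.
buffer.seek(0)
a, b = np.loadtxt(buffer, unpack=True)
print(a)
print(b)
```

Note that both columns come back as *floats* here, since that is the default *dtype* for *loadtxt*.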

##### Exercise

Experiment with *savetxt* and *loadtxt*. The full specifications can be found here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html

http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html

Here are some specific things to try (but there are loads more!):

* Try writing to and reading from a 3 column file, where different columns are different data types (e.g. one *int*, one *float* and one *string*).

* Try out different delimiters, e.g. produce csv (comma separated values) output. 

* Can you figure out how to generate output using tabs as delimiters?

* Generate a 5 column file and then read in *only* column 3.

* Skip rows 2 and 4 from an input file when reading it.

##### Exercise

The UK government (and indeed most governments and large organizations) collates a ton of interesting and useful data that is available for convenient download, usually in csv format. 

Grab the latest data set on smoking statistics for England here:

https://data.gov.uk/dataset/statistics_on_smoking_england

We will want to use this data to answer the question "What is the amount of money spent on smoking-prevention prescriptions per smoking-related death over the last 30 years?".

First, familiarize yourself with the data set -- it comes in 3 csv files and a *Word* document explaining the format. Then, write a function that gathers the necessary data. Finally, write a program that calls this function and manipulates the data to create an array with the quantity you need (let's call it frac_spend) and saves frac_spend as a function of year in a 2-column text-based file.

##### Exercise (advanced)

Plot frac_spend vs year using *matplotlib* functions. We have not yet covered this properly, but take a look back at our discussion of the *matplotlib* package and/or take a look at the endless info on matplotlib online.

##### Exercise

Continue work on your tic-tac-toe and/or Connect 4 program. Specifically, write the parts (functions?) that will read the user input and write the information the user needs to the screen.