# Input / Output Operations

Programs are of little use unless they have a means of interacting with their users.

## Output to screen - print function.
We have had a brief look at python's print function over the last couple of tutorials.

In [1]:
from __future__ import print_function
s = 'string1'  # a random string
i = 5  # an integer
f = 0.33  # a float

If we want to print out a certain variable on screen we can use the `print` function.

In [2]:
print(s)  # string1
print(i)  # 5
print(f)  # 0.33

string1
5
0.33


We can also use print to print out multiple variables.

In [3]:
print(s, i, f)

string1 5 0.33


A good practice is to describe what we are printing when we print the value of a variable.

In [4]:
print('The value of the variable s is:', s)

The value of the variable s is: string1


We can even place variable values in text.

### old formatting scheme

In [5]:
print('%s is a string, %i is an integer and %f is a float' %(s, i, f))

string1 is a string, 5 is an integer and 0.330000 is a float


When we place a `%s` in a string, python knows to expect a string variable to take its place. The same happens with integers (`%i`) and floats (`%f`).

In [6]:
print('%s is a string' %f)

0.33 is a string


So the float was converted to a string and then printed out.  
The reverse does not work: `print('%i is a string' %s)` would raise a *TypeError*, because a string cannot be formatted as an integer.
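
The two cases can be checked side by side. This is only a minimal sketch: the variable names mirror the earlier cells, and the `try`/`except` wrapper is there purely so the failing case can be shown without stopping the program.

```python
s = 'string1'
f = 0.33

# %s calls str() on whatever it receives, so a float is accepted
as_text = '%s is a string' % f
print(as_text)

# %i requires a number, so passing a string raises TypeError
try:
    '%i is an integer' % s
    raised = False
except TypeError:
    raised = True
print('TypeError raised:', raised)
```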

### a better way...

A better way to format is to use the `str.format()` method, which offers more functionality.

In [7]:
print('{} {}'.format(12, 23))

12 23


We can also configure the order in which the variables are passed to the string, by placing an index into the brackets.

In [8]:
print('{1} {0}'.format(12, 23))

23 12


With this we essentially instruct Python to use the second argument (index 1) first and the first argument (index 0) second.  

Another useful option is to define the minimum number of characters the variable should occupy in the string. If the variable needs more characters than that, it is printed in full (nothing is discarded). If it needs fewer, the rest is filled with spaces, and in that case we can choose how the value is aligned within the reserved width.

The syntax is:
```python
'{:<10}'.format(variable)  # this reserves 10 characters for the variable and aligns it left
# By changing the number we can define how many characters we will reserve for the variable:
'{:<5}'.format(variable)   # this reserves 5 characters and aligns the variable to the left
# We can also align the variable to the middle or the right of the reserved characters:
'{:^5}'.format(variable)  # middle alignment
'{:>5}'.format(variable)  # right alignment
```

In [9]:
print('{:>20}'.format(s))

             string1


This tells python to create a string of 20 characters and align our string to the right of that (`>`). 

Here `string1` contains 7 characters, but we told the string to reserve 20 characters for this variable and to place it in the end of those 20 characters. The rest were *padded* with whitespace. 
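
A quick way to check how the width behaves is to look at the length of the result. The sketch below assumes the same `s` as in the earlier cells; the names `padded` and `narrow` are only for illustration. It also shows that the width acts as a minimum, so a value longer than the width is kept whole.

```python
s = 'string1'

# 7 characters right-aligned inside a field of 20
padded = '{:>20}'.format(s)
print(repr(padded), len(padded))

# A width smaller than the value does not cut anything off
narrow = '{:>3}'.format(s)
print(repr(narrow), len(narrow))
```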

We can also choose the character used to *pad* the surplus positions by placing it right after the colon (`:`), before the alignment symbol.

A final option is to *truncate* long strings by adding a precision (`.N`). For floats the same syntax sets the number of decimals, which is useful when we don't care about all of them.

In [10]:
# we can use padding to see this more clearly
print('{:_>20}'.format(s))
# left alignment (with padding):
print('{:_<20}'.format(s))
# middle alignment (with padding):
print('{:_^20}'.format(s))
# we can also truncate long strings (lets say to 5 characters)
print('{:.5}'.format(s))
# truncating + padding
print('{:_<10.5}'.format(s))

_____________string1
string1_____________
______string1_______
strin
strin_____


### Number types:

In [11]:
# Integers:
print('{:d}'.format(42))  # 'd' plays the role of '%i' / '%d' in the old scheme
# Floats:
print('{:f}'.format(3.141592653589793))
# with padding: 
print('{:5d}'.format(42))
# zero-padding to a width of 6 and rounding to 2 decimals
print('{:06.2f}'.format(3.141592653589793))

42
3.141593
   42
003.14


### Named Placeholders:

In [12]:
# from dictionary
dic = {'first':'first_string','second':'second_string','third':'third_string'}
print('{first}, {third}'.format(**dic))
# explicit naming
print('{first} {last}'.format(first='str1', last='st2'))
# from list
range_list = list(range(21,30))
print('{l[2]}, {l[5]},   {l[0]}'.format(l=range_list))
# Parameterized formats
print('{:.{prec}} = {:.{prec}f}'.format('Gibberish', 2.7182, prec=3))
print('{:{align}{width}}'.format('test', align='^', width='10'))

first_string, third_string
str1 st2
23, 26,   21
Gib = 2.718
   test   


### Datetime:

In [13]:
from datetime import datetime
dt = datetime(2001, 2, 3, 4, 5)
print('{:{dfmt} {tfmt}}'.format(dt, dfmt='%Y-%m-%d', tfmt='%H:%M'))

2001-02-03 04:05


### Formatting example:

In [14]:
print('Without formatting:')
for x in range(1, 11):
    print(x, x ** 2, x ** 3)
    
print('\n')

print('With formatting:')
for x in range(1, 11):
    print('{0:2d} {1:3d} {2:4d}'.format(x, x ** 2, x ** 3))

Without formatting:
1 1 1
2 4 8
3 9 27
4 16 64
5 25 125
6 36 216
7 49 343
8 64 512
9 81 729
10 100 1000


With formatting:
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000


## Keyboard Input.

This is when we ask the user to type something on the keyboard so that we can use it in our code.

This is done through the built-in `input()` function, the equivalent of Python 2's `raw_input()`. The function accepts an optional prompt to display and returns whatever the user typed as a string.

In [15]:
name = input('What is your name? ')  # prints the prompt 'What is your name? ' and stores the typed text in the variable name
print(name, type(name))
age = input('What is your age? ') # same thing with age
print(age, type(age))
# So, no matter what we enter, it is interpreted as a string.
# If we are expecting an integer we need to cast it
age = int(age) 
print(type(age))

What is your name? Thanos
Thanos <class 'str'>
What is your age? 28
28 <class 'str'>
<class 'int'>


When reading input from users, things might not always turn out as expected. A smart thing to do is to check whether the input matches your specifications and, if it does not, ask the user to enter it again.
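
For numeric input, the check usually means catching the error raised by the cast. The sketch below is one possible way to do it: the helper name `parse_age` and the accepted range of 0 to 150 are assumptions made for illustration, not part of the tutorial.

```python
def parse_age(text):
    """Return the age as an int, or None if the text is not a valid age."""
    try:
        age = int(text.strip())
    except ValueError:
        return None  # the text was not a whole number
    # The 0-150 range is an arbitrary choice for this sketch
    return age if 0 <= age <= 150 else None

# Typical use with input() (commented out so the sketch runs unattended):
# age = parse_age(input('What is your age? '))
# while age is None:
#     age = parse_age(input('Please enter a whole number: '))

print(parse_age('28'), parse_age('twenty'), parse_age('-3'))
```

Keeping the validation in a small function separates it from the prompting loop, so it can be checked without typing anything.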

### Example

Let's say we want to make a quiz.

In [16]:
print('Which of the following animals is also a name for a programming language?')
print('{:<15} {:<15}'.format('A: Python', 'B: Pitbull'))
print('{:<15} {:<15}'.format('C: Bear', 'D: Horse'))

Which of the following animals is also a name for a programming language?
A: Python       B: Pitbull     
C: Bear         D: Horse       


The first thing we want to do is to create a loop that will force the user to answer only with a letter a, b, c or d.

In [17]:
answer = input('Please enter your answer: ')
while answer.lower() not in ('a', 'b', 'c', 'd'):
    answer = input('Not an appropriate choice. The answer must be A, B, C or D.\nTry again: ')

# With this loop we repeat the prompt until the user answers with a letter that is in the given set.
# By using the .lower() method we ensure that both upper and lowercase letters are acceptable.
if answer.lower() == 'a':
    print('Congratulations, you chose correctly!')
else:
    print('Sorry, the correct answer was A: Python.')

Please enter your answer: e
Not an appropriate choice. The answer must be A, B, C or D.
Try again: b
Sorry, the correct answer was A: Python.


## Using files for I/O operations.

Variables are no good for storing data for long-term use. Once the program has finished, its variables are removed and the data they stored is deleted from memory. **Files** are a way of storing data on your disk for long-term use.

In [18]:
# say we create a list
a_list = list(range(55)) + ['a', 'ab', 'abc'] * 15

We want to store this list in a file so that we can use it in the future. First, we need to specify the path where we want the file to be stored.

```python
# Linux paths:
file_location = '/home/thanos/path/to/file'
# Windows paths:
file_location = 'C:\\Users\\thanos\\path\\to\\file'
```

The default location is your working directory (the directory from which Python was launched).
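
If you are unsure which directory that is, the standard `os` module can tell you. A minimal sketch, where the file name `test_file` is just the example name used in this tutorial:

```python
import os

cwd = os.getcwd()  # the directory that relative paths are resolved against
print(cwd)

# A bare file name such as 'test_file' therefore refers to this absolute path
full_path = os.path.abspath('test_file')
print(full_path)
```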

In [19]:
# Feel free to change your path to a valid one. I'll use the current directory.
file_location = 'test_file'

We can open a file with Python's built-in `open()` function.

```python
handle = open(path, 'wb')
```

This command instructs Python to open the file identified by `path` and refer to it with the variable `handle`.

- The `'w'` option indicates that the file is opened only for writing (not reading).
- The `'b'` option indicates that the file is being written in binary format.

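In Python 3, binary mode means that we read and write `bytes` objects, whereas in text mode (without `'b'`) we work with `str` objects and Python takes care of the encoding and of newline translation. If we want to write a text file we shouldn't add the `'b'` option.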

If the file doesn't exist, Python will create it. If a file with the same name already exists in this location, Python will overwrite it!

In [20]:
f = open(file_location, 'w')
# In Python 3, f is a text file object, something like: <_io.TextIOWrapper name='test_file' mode='w' encoding='UTF-8'>

In [21]:
# In order to store something in the file we can use the file's .write() function.
f.write('This is the file \n')
# The previous command writes the string 'This is the file \n' to the file.
# If we want to write the contents of the list, we need to first convert them to a string.
for i in a_list:
    f.write(str(i) + '\n') # we chose to separate the elements with a new line
# Now that we're done with the file we need to close it.
# This is done with the file.close() method 
f.close()

The `file.close()` method is important and should not be forgotten.  
If we don't close the file manually, Python will eventually close it when the file object is garbage collected, but we don't know exactly when this will be.  
Leaving file handles open is bad practice because it wastes system resources.

Now let's try and retrieve the stored data from the file.  
This time we need to open the file for reading.  
We will use a different syntax this time to open the file. Instead of `open()` - `close()` we will use the `with` statement. This allows us to open a file and perform all actions on the file with **indented** commands. We don't need to close the file afterwards!
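
To see that the `with` statement really closes the file for us, we can inspect the file object's `closed` attribute. This is a minimal sketch that writes to a throwaway temporary location; the directory from `tempfile` and the name `with_demo.txt` are assumptions used only for illustration.

```python
import os
import tempfile

# A throwaway path so the sketch does not touch any real files
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w') as f:
    f.write('hello\n')
    inside = f.closed   # False: the file is still open inside the block

after = f.closed        # True: leaving the block closed the file
print(inside, after)
```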

In [22]:
new_list = []  # we will store the retrieved elements here
with open(file_location, 'r') as f:  # this command opens a file for reading ('r') and refers to it as f
    f.readline()  # reads the first line of the file (the header we wrote) and discards it
    for line in f:  # this automatically iterates through every remaining line of the file until the eof
        new_list.append(line)  # because the elements were line-separated we just need 
                               # to append each line as a new element of the list
print(new_list)

['0\n', '1\n', '2\n', '3\n', '4\n', '5\n', '6\n', '7\n', '8\n', '9\n', '10\n', '11\n', '12\n', '13\n', '14\n', '15\n', '16\n', '17\n', '18\n', '19\n', '20\n', '21\n', '22\n', '23\n', '24\n', '25\n', '26\n', '27\n', '28\n', '29\n', '30\n', '31\n', '32\n', '33\n', '34\n', '35\n', '36\n', '37\n', '38\n', '39\n', '40\n', '41\n', '42\n', '43\n', '44\n', '45\n', '46\n', '47\n', '48\n', '49\n', '50\n', '51\n', '52\n', '53\n', '54\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n', 'a\n', 'ab\n', 'abc\n']


Because the format isn't what we wanted, we need to strip the trailing newline character (`'\n'`) from each string and convert the numbers back to integers. This is not a very effective way to store data, though.
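
One way to do that clean-up is sketched below. The helper name `parse_line` is hypothetical, and the sample list only imitates a few of the lines read above. It treats any text made up solely of digits as an integer, so it would not handle negative numbers or floats.

```python
# A few lines in the same shape as the ones read from the file above
raw_lines = ['0\n', '1\n', '54\n', 'a\n', 'abc\n']

def parse_line(line):
    """Strip the trailing newline and turn digit-only text into an int."""
    text = line.rstrip('\n')
    return int(text) if text.isdigit() else text

restored = [parse_line(line) for line in raw_lines]
print(restored)
```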

Other open modes are `'a'`, which opens the file for **appending** (new data is written at the end and the existing content is kept), and `'r+'`, which opens an existing file for **reading and writing**. Similarly, `'w+'` and `'a+'` add reading to the `'w'` and `'a'` modes.
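
The difference between these modes is easiest to see on a small scratch file. The sketch below uses a temporary directory and the made-up name `modes_demo.txt` so that no real file is affected.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as f:   # 'w' creates the file (or empties an existing one)
    f.write('first\n')
with open(path, 'a') as f:   # 'a' keeps the content and writes at the end
    f.write('second\n')
with open(path, 'r+') as f:  # 'r+' allows reading and writing; the file must exist
    content = f.read()

print(repr(content))
```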

### Pickle module

This is one of the most convenient ways to store Python objects.

In [23]:
import pickle
pickle_file = 'test_file.pkl'  # or /path/to/test_file.pkl; the suffix is optional
with open(pickle_file, 'wb') as f:  # pickle files must be opened in binary mode
    pickle.dump(a_list, f)  # this command stores the data in a_list inside the pickle file
# This file is now NOT human-readable
with open(pickle_file, 'rb') as f:
    new_list = pickle.load(f)  # this command loads data from a pickle file to memory
print(new_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc']


With pickle, the new list has exactly the same format as the old one. As a plus, we managed that with a single call for saving (`.dump()`) and a single call for loading (`.load()`). No iterations are needed when using pickle. Pickle files are only good for storing data. They are not suitable for creating any sort of human-readable file (for example log files).
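
Pickle is not limited to flat lists. The following sketch round-trips a small nested structure through a temporary file; the `record` dictionary and the file name `record.pkl` are invented for this example.

```python
import os
import pickle
import tempfile

record = {'name': 'thanos', 'scores': [1, 2.5, 'abc'], 'pair': (3, 4)}
path = os.path.join(tempfile.mkdtemp(), 'record.pkl')

with open(path, 'wb') as f:
    pickle.dump(record, f)
with open(path, 'rb') as f:
    loaded = pickle.load(f)

# The types survive the round trip: ints stay ints, tuples stay tuples
print(loaded == record, type(loaded['pair']))
```

Note that loading a pickle file can execute arbitrary code, so only unpickle files that come from a source you trust.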

### CSV module

This module is for storing data to (and loading data from) the csv format.
CSV (comma separated values) is one of the most common formats for storing data (most commonly from spreadsheets) into text files. The standard csv format separates the values of an entry in a spreadsheet with commas and entries (rows or records) with new lines.  
Other similar formats are the tab-separated values (TSV) format, or more custom formats that use custom symbols to separate values.

In [24]:
# let's say we have a list of triples that we want to store as a csv file
trip_list = [('thanos', 1, 2), ('mike', 2, 3), ('george', 5, 6), ('maria', 9, 8), ('xristina', 12, 23)]
import csv
with open('a_file.csv', 'w', newline='') as f:  # newline='' lets the csv module control the line endings
    writer = csv.writer(f, delimiter=',')  # here we are using the default delimiter and it is not necessary.
                                           # if we wanted a space separated values file we could use delimiter=' ',
                                           # if we wanted a tab-separated values file we could have used delimiter='\t'
    for i in trip_list:
        writer.writerow(i)

This creates a csv file named a_file.csv containing the following:

    thanos,1,2
    mike,2,3
    george,5,6
    maria,9,8
    xristina,12,23

Now, let's try and load it back.

In [25]:
new_list = []
with open('a_file.csv', 'r', newline='') as f:  # newline='' is the setting recommended for the csv module
    reader = csv.reader(f, delimiter=',')  # again here delimiter is optional
    for row in reader:
        new_list.append(row)
print(new_list)

[['thanos', '1', '2'], ['mike', '2', '3'], ['george', '5', '6'], ['maria', '9', '8'], ['xristina', '12', '23']]


If we wanted a list of tuples we should have changed the line inside the loop to `new_list.append(tuple(row))`.
Then we would have gotten new_list: `[('thanos', '1', '2'), ('mike', '2', '3'), ('george', '5', '6'), ('maria', '9', '8'), ('xristina', '12', '23')]`.

Again, there are differences from the original (for example, integers are read back as strings and, depending on our situation, may need to be identified and cast back to integers).
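
When the layout of the columns is known in advance, the cast can be done while reading. In this minimal sketch an `io.StringIO` object stands in for the file so that the example is self-contained, and the two rows are a shortened copy of the data above.

```python
import csv
import io

# io.StringIO stands in for the opened csv file
csv_text = 'thanos,1,2\nmike,2,3\n'

rows = []
for name, a, b in csv.reader(io.StringIO(csv_text)):
    rows.append((name, int(a), int(b)))  # cast the numeric columns back

print(rows)
```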

In future tutorials we will learn easier ways of reading and writing csv files.

### Other formats

Besides these, there are many more ways of storing data, such as HDF5, json, memory-mapped data and others.

- **hdf5** is a versatile data model that can represent complex objects. It offers good performance with large files. **module: h5py**

- **json** is a lightweight text format designed for human-readable data interchange. **module: json**

- **memory-mapped data** (such as LMDB) offer higher I/O performance. **module: mmap**

- **xls**, or excel spreadsheets. **module: xlrd**

- **xml**, **module: xml**

- **rdf**, **module: rdflib**

and many more...
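
As a small taste of one of these, here is a hedged sketch of a round trip with the standard `json` module. It reuses a shortened version of the triples from the csv example and works on a string rather than a file to stay self-contained.

```python
import json

trip_list = [('thanos', 1, 2), ('mike', 2, 3)]

text = json.dumps(trip_list)   # a human-readable string
restored = json.loads(text)    # numbers stay numbers, tuples come back as lists

print(text)
print(restored)
```

Unlike csv, the numbers keep their type, but json has no tuple type, so the triples are returned as lists.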