## Lesson 6 - Python and IPython Review

* McKinney Appendix: Python Language Essentials
* McKinney Chapter 3

### Python Review (plus a few new features)

For this lesson, we will open up an IPython terminal (type `ipython` at your bash prompt) and follow along. Then we will work through an example script, executed from the command line.

#### List Methods

from [Python Documentation](https://docs.python.org/2/tutorial/datastructures.html)

Here are all of the methods of list objects:

**list.append(x)**
Add an item to the end of the list; equivalent to a[len(a):] = [x].

**list.extend(L)**
Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.

**list.insert(i, x)**
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).

**list.remove(x)**
Remove the first item from the list whose value is x. It is an error if there is no such item.

**list.pop([i])**
Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position. You will see this notation frequently in the Python Library Reference.)

**list.index(x)**
Return the index in the list of the first item whose value is x. It is an error if there is no such item.

**list.count(x)**
Return the number of times x appears in the list.

**list.sort(cmp=None, key=None, reverse=False)**
Sort the items of the list in place (the arguments can be used for sort customization, see sorted() for their explanation).

**list.reverse()**
Reverse the elements of the list, in place.

An example that uses most of the list methods:

	>>> a = [66.25, 333, 333, 1, 1234.5]
	>>> print a.count(333), a.count(66.25), a.count('x')
	2 1 0
	>>> a.insert(2, -1)
	>>> a.append(333)
	>>> a
	[66.25, 333, -1, 333, 1, 1234.5, 333]
	>>> a.index(333)
	1
	>>> a.remove(333)
	>>> a
	[66.25, -1, 333, 1, 1234.5, 333]
	>>> a.reverse()
	>>> a
	[333, 1234.5, 1, 333, -1, 66.25]
	>>> a.sort()
	>>> a
	[-1, 1, 66.25, 333, 333, 1234.5]
	>>> a.pop()
	1234.5
	>>> a
	[-1, 1, 66.25, 333, 333]

You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed -- they return the default None. This is a design principle for all mutable data structures in Python.
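
A short sketch of what that principle means in practice (the variable names are illustrative only). Assigning the result of an in-place method is a common slip, because the result is None, whereas the built-in sorted() returns a new list. The single-argument print calls are written so the snippet should run on both Python 2.7 and Python 3.

```python
# In-place methods change the list and return None.
a = [3, 1, 2]
result = a.sort()          # sorts a in place
print(result)              # None
print(a)                   # [1, 2, 3]

# sorted() leaves its argument alone and returns a new list.
b = [3, 1, 2]
c = sorted(b)
print(b)                   # [3, 1, 2]
print(c)                   # [1, 2, 3]
```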

#### Dict Operations

from [Python Documentation](https://docs.python.org/2/tutorial/datastructures.html) 

Another useful data type built into Python is the dictionary (see Mapping Types — dict). Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”. Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().

It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

The main operations on a dictionary are storing a value with some key and extracting the value given the key. It is also possible to delete a key:value pair with del. If you store using a key that is already in use, the old value associated with that key is forgotten. It is an error to extract a value using a non-existent key.

The keys() method of a dictionary object returns a list of all the keys used in the dictionary, in arbitrary order (if you want it sorted, just apply the sorted() function to it). To check whether a single key is in the dictionary, use the in keyword.

Here is a small example using a dictionary:

	>>> tel = {'jack': 4098, 'sape': 4139}
	>>> tel['guido'] = 4127
	>>> tel
	{'sape': 4139, 'guido': 4127, 'jack': 4098}
	>>> tel['jack']
	4098
	>>> del tel['sape']
	>>> tel['irv'] = 4127
	>>> tel
	{'guido': 4127, 'irv': 4127, 'jack': 4098}
	>>> tel.keys()
	['guido', 'irv', 'jack']
	>>> 'guido' in tel
	True
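
Since extracting a value with a non-existent key is an error, here is a minimal sketch of three common ways to cope with a key that may be missing. The `get()` method is not mentioned in the text above but is a standard dict method; the phone-book contents are made up for illustration.

```python
tel = {'jack': 4098, 'guido': 4127}

# 1. Test for membership before indexing.
if 'sape' in tel:
    sape_number = tel['sape']
else:
    sape_number = None

# 2. Catch the KeyError raised by a missing key.
try:
    irv_number = tel['irv']
except KeyError:
    irv_number = None

# 3. dict.get() returns a default (None unless given) instead of raising.
jack_number = tel.get('jack')
unknown_number = tel.get('nobody', 0)

print(jack_number)         # 4098
print(unknown_number)      # 0
```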

The dict() constructor builds dictionaries directly from sequences of key-value pairs:

	>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
	{'sape': 4139, 'jack': 4098, 'guido': 4127}

In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions:

	>>> {x: x**2 for x in (2, 4, 6)}
	{2: 4, 4: 16, 6: 36}

When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:

	>>> dict(sape=4139, guido=4127, jack=4098)
	{'sape': 4139, 'jack': 4098, 'guido': 4127}

#### Ordered Dicts

Dicts are inherently unordered, but there are ways to order them if you need to. If you wanted to print the contents of a dict in its natural disordered state, you could write:

	for key, value in mydict.iteritems():
	    print key, value

To print the dict sorted by key, you can retrieve the keys as a list, sort them, and then print each key-value pair one by one.

	D = {'a': 1, 'b': 2, 'c': 3}
	sorted_keys = D.keys()   # Unordered keys list
	sorted_keys.sort()       # Sorted keys list
	for key in sorted_keys:  # Iterate through sorted keys
	    print key, '=>', D[key]
	    
The built-in sorted() function saves a step: called on a dict, it returns a new sorted list of the keys.
	
	sorted_keys = sorted(D)
	for key in sorted_keys:
	    print key, '=>', D[key]
	    
Whether you sort your dict or not, being able to print it out or iterate through it is useful for debugging and in your programs.

If you really want a dict to be ordered, and not just *printed* in order using a list of sorted keys, you can use the OrderedDict class from the collections module, which remembers the order in which keys were inserted.

	from collections import OrderedDict
	
	print 'Regular dictionary:'
	d = {}
	d['a'] = 'A'
	d['b'] = 'B'
	d['c'] = 'C'
	d['d'] = 'D'
	d['e'] = 'E'
	
	for k, v in d.items():
	    print k, v
	
	print '\nOrderedDict:'
	d = OrderedDict()
	d['a'] = 'A'
	d['b'] = 'B'
	d['c'] = 'C'
	d['d'] = 'D'
	d['e'] = 'E'
	
	for k, v in d.items():
	    print k, v
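
Note that an OrderedDict keeps keys in insertion order; it does not sort them. A minimal sketch, assuming you want the keys in sorted order: insert the pairs already sorted. The names `D` and `by_key` are illustrative.

```python
from collections import OrderedDict

D = {'b': 2, 'c': 3, 'a': 1}

# sorted(D.items()) returns (key, value) pairs ordered by key;
# OrderedDict then preserves that insertion order.
by_key = OrderedDict(sorted(D.items()))
print(list(by_key.keys()))     # ['a', 'b', 'c']

# Keys added later go to the end, wherever they would sort.
by_key['0'] = 0
print(list(by_key.keys()))     # ['a', 'b', 'c', '0']
```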
	    
And getting into the really nitty-gritty:

**dict.items()** Return a copy of the dictionary’s list of (key, value) pairs. Creates the items all at once and returns a list. 

**dict.iteritems()** Return an iterator over the dictionary's (key, value) pairs. An iterator is an object that produces one item at a time, each time next() is called on it, instead of building the whole list up front.
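
A small sketch of the difference. `iteritems()` exists only on Python 2 dicts, so the snippet falls back to `iter(d.items())` elsewhere; that fallback is an assumption added for portability, not part of the description above.

```python
d = {'a': 'A'}

# items() hands over all the pairs; list() makes that explicit.
pairs = list(d.items())
print(pairs)                   # [('a', 'A')]

# iteritems() returns an iterator instead (Python 2 only);
# elsewhere, fall back to an iterator over items().
if hasattr(d, 'iteritems'):
    pair_iter = d.iteritems()
else:
    pair_iter = iter(d.items())

first = next(pair_iter)        # one pair is produced per next() call
print(first)                   # ('a', 'A')

# Once the pairs run out, next() raises StopIteration.
try:
    next(pair_iter)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)               # True
```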


#### Looping Techniques

from [Python Documentation](https://docs.python.org/2/tutorial/datastructures.html)

When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the enumerate() function.

	>>> for i, v in enumerate(['tic', 'tac', 'toe']):
	...     print i, v
	...
	0 tic
	1 tac
	2 toe

To loop over two or more sequences at the same time, the entries can be paired with the zip() function.

	>>> questions = ['name', 'quest', 'favorite color']
	>>> answers = ['lancelot', 'the holy grail', 'blue']
	>>> for q, a in zip(questions, answers):
	...     print 'What is your {0}?  It is {1}.'.format(q, a)
	...
	What is your name?  It is lancelot.
	What is your quest?  It is the holy grail.
	What is your favorite color?  It is blue.

To make a dictionary from those two lists:

	>>> foo = dict(zip(questions, answers))
	>>> foo
	{'quest': 'the holy grail', 'name': 'lancelot', 'favorite color': 'blue'}

To loop over a sequence in reverse, first specify the sequence in a forward direction and then call the reversed() function.

	>>> for i in reversed(xrange(1,10,2)):
	...     print i
	...
	9
	7
	5
	3
	1

To loop over a sequence in sorted order, use the sorted() function which returns a new sorted list while leaving the source unaltered.

	>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
	>>> for f in sorted(set(basket)):
	...     print f
	...
	apple
	banana
	orange
	pear

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the iteritems() method.

	>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
	>>> for k, v in knights.iteritems():
	...     print k, v
	...
	gallahad the pure
	robin the brave

It is sometimes tempting to change a list while you are looping over it; however, it is often simpler and safer to create a new list instead.

	>>> import math
	>>> raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]
	>>> filtered_data = []
	>>> for value in raw_data:
	...     if not math.isnan(value):
	...         filtered_data.append(value)
	...
	>>> filtered_data
	[56.2, 51.7, 55.3, 52.5, 47.8]
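
To see why changing a list during a loop is risky, here is a minimal sketch with made-up values. Removing an element shifts the later ones left, so the loop can step over an item it never examined.

```python
# Removing items from the list you are looping over skips elements.
values = [1, 2, 2, 3]
for v in values:
    if v == 2:
        values.remove(v)   # later items shift left; the second 2 is skipped
print(values)              # [1, 2, 3]

# Building a new list visits every element exactly once.
original = [1, 2, 2, 3]
kept = []
for v in original:
    if v != 2:
        kept.append(v)
print(kept)                # [1, 3]
print(original)            # [1, 2, 2, 3]
```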

#### List Comprehensions

from [Python Documentation](https://docs.python.org/2/tutorial/datastructures.html)

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.

For example, assume we want to create a list of squares, like:

	>>> squares = []
	>>> for x in range(10):
	...     squares.append(x**2)
	...
	>>> squares
	[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can obtain the same result with:

	squares = [x**2 for x in range(10)]

This is also equivalent to `squares = map(lambda x: x**2, range(10))`, but it’s more concise and readable.

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:

	>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
	[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

and it’s equivalent to:

	>>> combs = []
	>>> for x in [1,2,3]:
	...     for y in [3,1,4]:
	...         if x != y:
	...             combs.append((x, y))
	...
	>>> combs
	[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

Note how the order of the for and if statements is the same in both these snippets.

If the expression is a tuple (e.g. the `(x, y)` in the previous example), it must be parenthesized.

	>>> vec = [-4, -2, 0, 2, 4]
	>>> # create a new list with the values doubled
	>>> [x*2 for x in vec]
	[-8, -4, 0, 4, 8]
	>>> # filter the list to exclude negative numbers
	>>> [x for x in vec if x >= 0]
	[0, 2, 4]
	>>> # apply a function to all the elements
	>>> [abs(x) for x in vec]
	[4, 2, 0, 2, 4]
	>>> # call a method on each element
	>>> freshfruit = ['  banana', '  loganberry ', 'passion fruit  ']
	>>> [weapon.strip() for weapon in freshfruit]
	['banana', 'loganberry', 'passion fruit']
	>>> # create a list of 2-tuples like (number, square)
	>>> [(x, x**2) for x in range(6)]
	[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
	>>> # the tuple must be parenthesized, otherwise an error is raised
	>>> [x, x**2 for x in range(6)]
	  File "<stdin>", line 1
	    [x, x**2 for x in range(6)]
	               ^
	SyntaxError: invalid syntax
	>>> # flatten a list using a listcomp with two 'for'
	>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
	>>> [num for elem in vec for num in elem]
	[1, 2, 3, 4, 5, 6, 7, 8, 9]

List comprehensions can contain complex expressions and nested functions:

	>>> from math import pi
	>>> [str(round(pi, i)) for i in range(1, 6)]
	['3.1', '3.14', '3.142', '3.1416', '3.14159']

### Example: Survey for Python Students

```python
# survey_sio209.py

print """
#################################
SIO 209: Python for Data Analysis
#################################

Please answer the following questions.
"""

def get_free_text(prompt):
    answer = raw_input(prompt)
    return answer

def get_score(prompt):
    while True:
        answer = raw_input(prompt)
        if answer in ["0", "1", "2", "3"]:
            break
        else:
            print "You must select a number from 0 to 3."
    return answer

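def get_yes_no(prompt):
    answer = ""
    while answer not in ["Y", "N"]:
        answer = raw_input(prompt)
        # Keep the first letter, uppercased; [:1] is safe on empty input,
        # where answer[0] would raise an IndexError
        answer = answer.strip()[:1].upper()
    return answer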

def get_computer():
    answer = ""
    while answer not in ["Mac", "Linux", "Windows"]:
        answer = raw_input(
            "Which operating system does your computer run? " + 
            "(Mac, Linux, Windows) \n> ")
    return answer

first_name = get_free_text("First name? \n> ")
last_name = get_free_text("Last name? \n> ")
computer_yes_no = get_yes_no("Do you have your own laptop? (Y, N) \n> ")
computer_operating_system = get_computer()

print """
For the following questions, select a number from 0 to 3, where
\t0 - none
\t1 - some
\t2 - moderate
\t3 - experienced

What is your experience with the following:"""

score_command_line = get_score("Command line? \n> ")
score_bash = get_score("Bash? \n> ")
score_r = get_score("R? \n> ")
score_matlab = get_score("MATLAB? \n> ")
score_perl = get_score("Perl? \n> ")
score_python = get_score("Python? \n> ")

answers = {'name_first': first_name, 
           'name_last': last_name, 
           'computer_has': computer_yes_no, 
           'computer_os': computer_operating_system, 
           'score_command': score_command_line, 
           'score_bash': score_bash, 
           'score_r': score_r, 
           'score_matlab': score_matlab, 
           'score_perl': score_perl, 
           'score_python': score_python}
keys = sorted(answers)

outfile = "answers_" + first_name + "_" + last_name + ".csv"
# The with block closes the file even if a write fails
with open(outfile, 'w') as f:
    for key in keys:
        f.write("%s,%s\n" % (key, answers[key]))

print """
Thank you for completing the survey!

Your answers have been stored in the file 'answers_%s_%s.csv'.
""" % (first_name, last_name)
```
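
As a possible follow-up, here is a sketch of reading such an answers file back into a dict with the standard csv module. The function name `read_answers` and the sample lines are hypothetical; they only mimic the `key,value` format the script writes.

```python
import csv

def read_answers(lines):
    """Turn 'key,value' lines back into a dict.

    `lines` can be an open file or any iterable of strings.
    """
    answers = {}
    for row in csv.reader(lines):
        if len(row) == 2:          # skip blank or malformed lines
            key, value = row
            answers[key] = value
    return answers

# Hypothetical sample in the same format the survey writes.
sample = ["computer_os,Linux\n", "name_first,Ada\n", "score_python,2\n"]
restored = read_answers(sample)
print(restored['score_python'])    # 2
```

With a real file you would pass the open file object, for example `read_answers(open(outfile))`. Scores come back as strings, and a free-text answer containing a comma would not survive the round trip, because the survey script writes lines with plain string formatting rather than the csv module.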

### The star operator

from [Stack Overflow](http://stackoverflow.com/questions/2921847/what-does-the-star-operator-mean)
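
In brief, and as a sketch rather than a full treatment: in a function call, `*` unpacks a sequence into positional arguments and `**` unpacks a dict into keyword arguments; in a `def`, `*args` and `**kwargs` collect any extra arguments. The function names `describe` and `collect` are made up for illustration.

```python
def describe(name, quest, color):
    return '{0} seeks {1} and likes {2}'.format(name, quest, color)

# * unpacks a sequence into positional arguments.
answers = ['lancelot', 'the holy grail', 'blue']
print(describe(*answers))          # lancelot seeks the holy grail and likes blue

# ** unpacks a dict; keys are matched to parameter names.
named = {'name': 'robin', 'quest': 'a shrubbery', 'color': 'yellow'}
print(describe(**named))           # robin seeks a shrubbery and likes yellow

def collect(*args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dict of extra keyword arguments.
    return args, kwargs

print(collect(1, 2, x=3))          # ((1, 2), {'x': 3})
```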

### IPython Review (plus a few new features)

#### IPython Interpreter

Launch `ipython2`, which is IPython using Python 2. You don't want Jupyter's default `ipython`, which uses Python 3. When you launch, notice all the information in the welcome message:

	[luke@jupyter ~]$ ipython2
	Python 2.7.5 (default, Jun 24 2015, 00:41:19) 
	Type "copyright", "credits" or "license" for more information.
	
	IPython 3.2.1 -- An enhanced Interactive Python.
	?         -> Introduction and overview of IPython's features.
	%quickref -> Quick reference.
	help      -> Python's own help system.
	object?   -> Details about 'object', use 'object??' for extra details.

**Tab completion** is a major feature of IPython. Here are some examples from McKinney Ch. 3:

	In [1]: an_apple = 27 
	In [2]: an_example = 42
	In [3]: an<Tab>

	In [4]: b = [1, 2, 3]
	In [5]: b.<Tab>

	In [6]: import datetime
	In [7]: datetime.<Tab>

**Introspection** is what we do with the question mark (?) after an object. One question mark, `object?`, shows the docstring along with other details about the object and its use. Two question marks, `object??`, also show the source code, if available. ? and ?? can be used with *any* object in Python.
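
The `?` syntax only works inside IPython. As a rough plain-Python analogue (an approximation, not what IPython does internally), the standard inspect module can fetch the same two pieces of information. `textwrap.dedent` is just an arbitrary example object.

```python
import inspect
import textwrap

# Roughly what `textwrap.dedent?` shows: the docstring.
doc = inspect.getdoc(textwrap.dedent)
print(doc.splitlines()[0])

# Roughly what `textwrap.dedent??` adds: the source, if available.
try:
    source = inspect.getsource(textwrap.dedent)
except (IOError, OSError, TypeError):
    source = None              # e.g. built-ins written in C have no Python source
if source is not None:
    print(source.splitlines()[0])
```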

**The %run command** allows you to run a Python program from within IPython. This is similar to running `python file args` at a system prompt, but with the advantage of giving you IPython’s tracebacks, and of loading all variables into your interactive namespace for further use (unless -p is used). Try using `%run` with an existing Python script. Then run `dir()` to list the names in the current scope and check that the variables the script created are there (note that `%who` is far more useful).

	In [1]: %run regex_ex4.py
	In [2]: dir()
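
Outside IPython, the standard library's `runpy.run_path` gives a loosely similar effect: it executes a script and returns the names it defined. This is offered as an analogy only, not as a description of how `%run` works; the throwaway script below is a stand-in for your own file.

```python
import os
import runpy
import tempfile

# Write a tiny throwaway script (a stand-in for your own file).
script = "greeting = 'hello'\ncount = len(greeting)\n"
handle, path = tempfile.mkstemp(suffix='.py')
with os.fdopen(handle, 'w') as f:
    f.write(script)

# run_path executes the file and returns its top-level names as a dict.
namespace = runpy.run_path(path)
os.remove(path)

print(namespace['greeting'])       # hello
print(namespace['count'])          # 5
```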
	
**Magic commands** are special commands that give you additional control over your session and computer:

	%quickref  IPython reference card
	%who       Display variables defined in interactive namespace
	%whos      Display variables defined in interactive namespace with types and info
	%reset     Delete all variables in interactive namespace
	%hist      Print command history
	%run       Run a Python script inside IPython
	%magic     Documentation on all the magic commands
	!pwd       Execute pwd or any other system shell command

**Command history** is recalled interactively. Just type the first few letters of your command, then the up-arrow, to see all the previous commands you've typed that begin with those letters.

**Input and output variables** are stored for use later on:

	In [1]: 2 ** 27
	Out[1]: 134217728
	
	In [2]: In[1]       # input of command 1
	Out[2]: u'2 ** 27'
	
	In [3]: Out[1]      # output of command 1
	Out[3]: 134217728
	
	In [4]: _           # previous output
	Out[4]: 134217728
	
	In [5]: __          # previous previous output
	Out[5]: 134217728
	
	In [6]: _1          # output of command 1
	Out[6]: 134217728
	
	In [7]: _i1         # input of command 1
	Out[7]: u'2 ** 27'

#### IPython Notebook

Using the IPython notebook is similar to using the IPython interpreter, but the user interface is a bit different. It takes some getting used to, but it's worth it. The notebook will let you compartmentalize your code, provide rich comments with Markdown, and see output plots inline (using the magic `%matplotlib inline`).

Shortcut   | Action
:--        |:--
Shift-Enter| run cell
Ctrl-Enter | run cell in-place
Alt-Enter  | run cell, insert below
Ctrl-m x | cut cell
Ctrl-m c | copy cell
Ctrl-m v | paste cell
Ctrl-m d | delete cell
Ctrl-m z | undo last cell deletion
Ctrl-m - | split cell
Ctrl-m a | insert cell above
Ctrl-m b | insert cell below
Ctrl-m o | toggle output
Ctrl-m O | toggle output scroll
Ctrl-m l | toggle line numbers
Ctrl-m s | save notebook
Ctrl-m j | move cell down
Ctrl-m k | move cell up
Ctrl-m y | code cell
Ctrl-m m | markdown cell
Ctrl-m t | raw cell
Ctrl-m 1-6 | heading 1-6 cell
Ctrl-m p | select previous
Ctrl-m n | select next
Ctrl-m i | interrupt kernel
Ctrl-m . | restart kernel
Ctrl-m h | show keyboard shortcuts