# Python tips and tricks
MCL Tutorial / Spring 2016 / David Dumas <david@dumas.io>

## Inspired by / see also

* R. Hettinger's PyCon 2013 presentation "Transforming code into Beautiful, Idiomatic Python"

  * [slides](https://speakerdeck.com/pyconslides/transforming-code-into-beautiful-idiomatic-python-by-raymond-hettinger-1)
  * [video](http://www.youtube.com/watch?v=OSGv2VnC0go)
  * [notes & code as github repo, by Jeff Paine](https://gist.github.com/JeffPaine/6213790)
* D. Beazley and B. Jones, _Python Cookbook_, 3ed.  O'Reilly, 2013.
  * [O'Reilly page](http://shop.oreilly.com/product/0636920027072.do)
  * [Amazon](http://www.amazon.com/Python-Cookbook-Third-David-Beazley/dp/1449340377)
  
## Warning

This presentation is about Python 3.  Most of the tips apply to Python 2, sometimes requiring minor changes to syntax or names.

However, let's be clear:  Python 3 is the present and the future.  Python 2 is the past.  Use Python 3.

*Python 2 is the Windows XP of the Python world.*

---
## Python delivered, free

* [Sage Math Cloud](https://cloud.sagemath.com)
* [PythonAnywhere](https://www.pythonanywhere.com/)

## Takeout

* [Anaconda](https://www.continuum.io/downloads) -- Python + many common scientific libraries, packaged for easy installation on Windows, Mac, Linux.  Easiest way for most users to go from "zero to python" in a few minutes.
* [Official installers](https://www.python.org/downloads/) from `python.org`

---
# Use the iterator protocol

If `range()` and `len()` appear regularly in your code, there is probably a better way!

In [None]:
L = ['lion','zebra','bear','asp']

# Don't do this
for i in range(len(L)):
    print(L[i])

In [None]:
# Instead, do this
for x in L:
    print(x)

In [None]:
# Better yet
zoo = ['lion','zebra','bear','asp']
for animal in zoo:
    print(animal)

Advantage: **for object in collection** is flexible, expresses intent, avoids need for index variable

In [None]:
# works for unordered collections, too
teams = { 'White Sox', 'Cubs' ,'Yankees','Red Sox'} # this is a set
for t in teams:
    print(t)

---
# Aside: `set`, an unordered collection of distinct elements

In [None]:
L = [1,2,3,3,3]
set(L)  # set constructed from a list

In [None]:
type({})    # empty dict

In [None]:
type(set())  # empty set

In [None]:
A = {1,2,3,3,999}
B = {4,5,6,6,999}
A | B  # union

In [None]:
A & B  # intersection

In [None]:
A - B  # difference

These overloaded operators (|, &, -) are shorthand for calling methods of the set object.  However, calling the methods directly has an advantage: They can take the union, difference, etc. with *any iterable*.

In [None]:
#  A | [7,8,9]   # -> TypeError
#  A & [7,8,9]   # -> TypeError
#  A - [7,8,9]   # -> TypeError
A.union( [7,8,9] )
A.intersection( [7,8,9] )
A.difference( [7,8,9] )

There are also methods to modify (mutate) the set object itself, e.g. union becomes "update".

In [None]:
S = {'alpha', 'gamma'}
S.update( {'beta'} )    # equivalent to S = S | {'beta'}
S

---
# Back to iteration.

# If you really need the index: `enumerate`

In [None]:
L = ['lion','zebra','bear','asp']

# Ask for it
for i,x in enumerate(L):
    print('Animal with index {} is: {}'.format(i,x))

---
# Iterating over sorted collection:  `sorted`

In [None]:
L = ['lion','zebra','bear','asp']

for x in sorted(L):
    print(x)

---
# Custom sort: `key`

`key` is an optional keyword parameter of `sorted`.  It is a function that is applied to each element of L, and the results (not the elements themselves) are compared.

In [None]:
L = ['lion','zebra','bear','asp']

print('Animals by length of name:')
for x in sorted(L,key=len):
    print(x)

    
print('\nAnimals by number of "e"s:')
num_e = lambda s:s.count('e')
for x in sorted(L,key=num_e):
    print(x)

## Also operating on iterables (lists, sets, etc):

* `all` : cast to boolean and take logical and
* `any` : cast to boolean and take logical or
* `min`, `max` : These accept `key`, like `sorted`
* `sum`
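
A brief sketch of these built-ins on the animal list used earlier.  The particular conditions being tested are chosen only for illustration.

```python
L = ['lion', 'zebra', 'bear', 'asp']

# all / any consume an iterable of truth values
all_short = all(len(s) <= 5 for s in L)     # every name has at most 5 letters?
any_z = any(s.startswith('z') for s in L)   # some name starts with 'z'?

# min / max accept key, like sorted
shortest = min(L, key=len)
longest = max(L, key=len)

# sum adds up the elements of an iterable
total_letters = sum(len(s) for s in L)

print(all_short, any_z, shortest, longest, total_letters)
```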

---
# Iterating over dictionaries

In [None]:
d = {'a': 1, 'b': 2, 'c': 8675309}

for k in d:    # iterates over KEYS
    print(k)

In [None]:
d = {'a': 1, 'b': 2, 'c': 8675309}

for k in d:
    print(d[k])

If you really want keys and values together: `items()`

In [None]:
d = {'a': 1, 'b': 2, 'c': 8675309}

for k,v in d.items():
    print(k,'->',v)

Related: The membership test `in` checks **keys**, not values

In [None]:
d = {'a': 1, 'b': 2, 'c': 8675309}

8675309 in d
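
To make the point concrete, here is the same dictionary tested three ways.  Searching `d.values()` is one way to ask about values instead of keys.

```python
d = {'a': 1, 'b': 2, 'c': 8675309}

print('c' in d)               # True: 'c' is a key
print(8675309 in d)           # False: 8675309 is a value, not a key
print(8675309 in d.values())  # True: search the values explicitly
```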

---
# Extended slice syntax

In [None]:
L = ['Mathematical','science','is','in','my',
     'opinion','an','indivisible','whole']

In [None]:
L[0]  # First element

In [None]:
L[-1] # Last element

In [None]:
L[:-1] # All elements except the last (drop one from the end)

In [None]:
L[2:-1] # Drop two from the start, one from the end

In [None]:
L[::2] # Every other element (steps of 2)

In [None]:
L[::-1] # Reversed list (steps of -1)

In [None]:
L[-2:0:-1] # Reversed list, except first and last elements

---
# Use tuple unpacking

In [None]:
f = (34,8812,'Anne Example')  # Maybe this came from a file or database query
age = f[0]
idnum = f[1]
name = f[2]

name

In [None]:
# Better
age, idnum, name = (34,8812,'Anne Example')

name
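
Unpacking also works directly in a `for` statement and gives a tidy way to swap variables.  A small sketch; the second record is made up for illustration.

```python
# Hypothetical records, e.g. rows from a file or database query
records = [(34, 8812, 'Anne Example'), (27, 9120, 'Bob Sample')]

# Unpack each tuple in the for statement itself
for age, idnum, name in records:
    print(name, 'is', age)

# Swap two variables without a temporary
a, b = 1, 2
a, b = b, a
print(a, b)
```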

---
# List and generator comprehensions

In [None]:
# Squares of integers congruent to 2 mod 7
[ x*x for x in range(100) if x % 7 == 2 ]

In [None]:
( x*x for x in range(100) if x % 7 == 2 )

In [None]:
# Generate the whole list, then iterate over it
for y in [ x*x for x in range(100) if x % 7 == 2 ]:
    print(y)

In [None]:
# Generate elements one by one, run loop body as they are produced
for y in ( x*x for x in range(100) if x % 7 == 2 ):
    print(y)

Generators are objects: they can be assigned to variables and passed around.  However, they are single-use.

In [None]:
G = ( x*x for x in range(100) if x % 7 == 2 )
for x in G:
    print('foo')
for x in G:  # Iteration already ended; will not run
    print('bar')

A function which creates and returns a list can often be replaced by a generator function.
The statement `yield x` replaces "append x to the list we will later return".
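
A minimal sketch of that transformation, using the squares example from above.  The function names `squares_list` and `squares_gen` are made up for this illustration.

```python
def squares_list(n):
    """Build the whole list, then return it."""
    result = []
    for x in range(n):
        if x % 7 == 2:
            result.append(x*x)
    return result

def squares_gen(n):
    """Yield the same values one at a time; no list is built."""
    for x in range(n):
        if x % 7 == 2:
            yield x*x

# Both can be used the same way in a for loop
for y in squares_gen(100):
    print(y)
```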

---
# Gems from the `collections` module

## `namedtuple`:  creates a class which behaves like a `tuple` (immutable ordered collection), but whose entries have names as well as indices.

In [None]:
from collections import namedtuple

PersonDatum = namedtuple('PersonDatum',['age','idnum','name'])

F = PersonDatum(34,8812,'Anne Example')
F

In [None]:
print(F.age)
print(F.idnum)
print(F.name)

Alternative: A dict with keys 'age', 'idnum', 'name'.  *How to choose?*

* A dict is not a bad choice.
* Use `namedtuple` when the keys/attributes are **fixed** and will be available in a fixed **order**.
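
A short sketch of what the tuple-like behavior buys you, reusing `PersonDatum` from above.

```python
from collections import namedtuple

PersonDatum = namedtuple('PersonDatum', ['age', 'idnum', 'name'])
F = PersonDatum(34, 8812, 'Anne Example')

age, idnum, name = F         # unpacks like a tuple
first = F[0]                 # indexable like a tuple
older = F._replace(age=35)   # returns a new instance; F itself is immutable
as_dict = F._asdict()        # convert to a dict-like mapping when needed

print(older)
print(as_dict['name'])
```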

## `defaultdict`: a dictionary where missing keys are assigned a default value on first use

Example: quick word length histogram

In [None]:
# Instead of this...
counts = {}
L = ['Mathematical','science','is','in','my',
     'opinion','an','indivisible','whole']

for word in L:
    n = len(word)
    if n not in counts:
        counts[n] = 0
    counts[n] += 1

for l,count in counts.items():
    print(count,'words of length',l)

In [None]:
# A defaultdict makes it a bit cleaner
from collections import defaultdict
counts = defaultdict(int)
L = ['Mathematical','science','is','in','my',
     'opinion','an','indivisible','whole']

for word in L:
    n = len(word)
    counts[n] += 1

for l,count in counts.items():
    print(count,'words of length',l)

Passing `int` as the parameter to `defaultdict` means that `int()` is called to generate a new value when an unknown key is accessed.  Since `int()` returns zero, this means the default value is zero.

Use `defaultdict(list)` instead, replace `counts[n] += 1` with `counts[n].append(word)`, and the same loop *groups* words by their length.
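
A sketch of the grouping variant.  The name `by_length` is just an illustrative choice.

```python
from collections import defaultdict

L = ['Mathematical','science','is','in','my',
     'opinion','an','indivisible','whole']

by_length = defaultdict(list)   # missing keys start out as an empty list
for word in L:
    by_length[len(word)].append(word)

for n, words in sorted(by_length.items()):
    print(n, words)
```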

## Also from `collections`:

* `OrderedDict` : Dictionary which remembers order in which keys were added, uses this order for iteration
* `deque` : Like a list, but removing the first element is **not** an expensive operation
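
A quick sketch of both, with made-up keys and queue contents.

```python
from collections import OrderedDict, deque

od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
keys_in_order = list(od)   # iteration follows insertion order

queue = deque(['a', 'b', 'c'])
queue.append('d')          # add at the right end
front = queue.popleft()    # remove from the left end; cheap, unlike list.pop(0)

print(keys_in_order, front, list(queue))
```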

---
# Gems from the `itertools` module

## built-in, but related: `zip`

In [None]:
L = [1,   2,   3,   5,  8, 13 ]
M = ['a', 'b', 'c', 'd']
N = ['foo','bar']
for l,m,n in zip(L,M,N):
    print(l,'-',m,'-',n)
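
Note that `zip` stops as soon as the shortest input is exhausted, so the loop above runs only twice.  If you would rather pad the shorter inputs, `itertools.zip_longest` is one option; the fill value `'-'` below is an arbitrary choice.

```python
from itertools import zip_longest

L = [1, 2, 3, 5, 8, 13]
M = ['a', 'b', 'c', 'd']

short = list(zip(L, M))                           # stops when M runs out
padded = list(zip_longest(L, M, fillvalue='-'))   # continues until L runs out

print(short)
print(padded)
```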

## `groupby`

Takes an iterable and a key function, and returns iterators over the longest runs of consecutive elements on which the key is constant

In [None]:
from itertools import groupby

L = ['Mathematical','science','is','in','my',
     'opinion','an','indivisible','whole']
for l,words in groupby(L,len):
    print('Found a run of words of length',l,':',list(words))
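
Because `groupby` only looks at consecutive elements, the same length can show up in several separate runs.  To collect *all* words of each length, one common approach is to sort by the same key first.  A sketch:

```python
from itertools import groupby

L = ['Mathematical','science','is','in','my',
     'opinion','an','indivisible','whole']

# Sort by length first so that equal lengths become adjacent
groups = {n: list(words) for n, words in groupby(sorted(L, key=len), key=len)}

for n in sorted(groups):
    print(n, groups[n])
```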

## `permutations`

In [None]:
from itertools import permutations

L = [1,2,3,4]
list(permutations(L))

In [None]:
L = [1,2,3,4]
list(permutations(L,2))

## `combinations`

In [None]:
from itertools import combinations

L = [1,2,3,4]
list(combinations(L,2))

# Useful decorators

Modify a function's behavior.  (They are functions that map functions to functions.)

## `functools.lru_cache`

Suppose you have a function which is expensive, but which will be called many times with the same arguments.

In [None]:
# Simple
def long_running(n):
    # expensive computation
    return result

In [None]:
# Better, use caching
def long_running(n,cache={}):
    if n in cache:
        return cache[n]
    else:
        # expensive computation
        cache[n] = result
        return result

In [None]:
# Cleaner and imposes size limits
from functools import lru_cache

@lru_cache(maxsize=128)
def long_running(n):
    # expensive computation
    return result

long_running(123)  # Will take a long time
long_running(123)  # Will return instantly
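
For a version you can actually run, here is a sketch using the naive recursive Fibonacci function as a stand-in for the expensive computation.  The function `fib` is an illustrative choice, not part of the examples above.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    """Naive recursive Fibonacci; takes exponential time without the cache."""
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

print(fib(80))            # fast, because repeated subproblems hit the cache
print(fib.cache_info())   # hit/miss statistics kept by lru_cache
```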

## Others (OO)
* `staticmethod` : method with no "self"; can treat a class as a "bag of functions"
* `classmethod` : method that gets class (not instance) as first parameter; useful for alternate constructors
* `property` : zero-parameter method turns into a computed attribute, i.e. `A.foo` instead of `A.foo()`
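
A small sketch showing all three on one class.  The `Circle` class and its method names are invented for this example.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        # Computed attribute: accessed as c.diameter, no parentheses
        return 2 * self.radius

    @classmethod
    def from_diameter(cls, diameter):
        # Alternate constructor: receives the class, returns an instance
        return cls(diameter / 2)

    @staticmethod
    def is_valid_radius(r):
        # No self or cls; just a function that lives in the class
        return r > 0

c = Circle.from_diameter(10)
print(c.radius, c.diameter, Circle.is_valid_radius(c.radius))
```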

# Use keyword arguments for clarity

In [None]:
move_data('asd.dat','jkl.dat',True)
# Moved... where?  What's the source?  And what is "True" for?
# Did I just overwrite my priceless data?

In [None]:
move_data(source='asd.dat',dest='jkl.dat',overwrite_existing=True)
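
If you write the function yourself, Python 3 lets you *require* keyword arguments by putting a bare `*` in the signature.  The `move_data` below is a hypothetical stand-in that only reports what it would do.

```python
def move_data(*, source, dest, overwrite_existing=False):
    """Hypothetical stand-in: describes the move instead of performing it."""
    return 'move {} -> {} (overwrite={})'.format(source, dest, overwrite_existing)

msg = move_data(source='asd.dat', dest='jkl.dat', overwrite_existing=True)
print(msg)

try:
    move_data('asd.dat', 'jkl.dat', True)   # positional call is rejected
except TypeError as e:
    print('TypeError:', e)
```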

# Use context managers for file I/O

The file is closed on exit from the with-block, whether the block finishes normally or is interrupted by an exception.

In [None]:
with open('out.txt','wt') as outfile:
    outfile.write('First line\n')
    risky_operation()  # possible exception terminates program?
    outfile.write('Possible second line\n')
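
A runnable sketch of the same situation, in which a deliberately raised `RuntimeError` stands in for `risky_operation()` and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')

try:
    with open(path, 'wt') as outfile:
        outfile.write('First line\n')
        raise RuntimeError('simulated failure')   # stands in for risky_operation()
except RuntimeError:
    pass

was_closed = outfile.closed   # the with-block closed the file despite the exception

with open(path) as infile:
    contents = infile.read()  # the first line was still written out

print(was_closed, repr(contents))
```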

# Error suppression context manager

Possibly more readable than a bunch of `try` ... `except: pass` blocks

In [None]:
from contextlib import suppress
import os

with suppress(FileNotFoundError):
    os.remove('does_not_exist.dat')
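
One detail worth seeing in action: when the suppressed exception occurs, the rest of the with-block is skipped and execution continues after it.  A sketch, using a path in a fresh temporary directory so the file is certain to be missing.

```python
import os
import tempfile
from contextlib import suppress

missing = os.path.join(tempfile.mkdtemp(), 'does_not_exist.dat')
log = []

with suppress(FileNotFoundError):
    os.remove(missing)
    log.append('removed')    # skipped: the exception ends the block

log.append('continuing')     # execution resumes here
print(log)
```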

# Assertions

Not very popular in the Python community, but can be useful for debugging.  Assertion checking can be disabled with the `-O` command-line option.

In [None]:
assert True, 'no problem'
assert False, 'big problem'

Assertions should be reserved for statements that will be true unless a contract has been broken, so that it would not make sense to continue.  Assertions are not for catching routine errors.

In [None]:
def handle_everything():
    queue = get_work_queue()
    queue.process_all_events()
    assert queue.is_empty(), 'Queue not empty after process_all_events()'
    cleanup()

* If the condition shows that an error has occurred, raise an exception
* If the condition shows that a bug is present, *maybe* check it with an assertion
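
A sketch of the distinction in a single function.  The function `split_evenly` is invented for this example: bad input from the caller raises an exception, while the assertion guards a property that can only fail if the function itself is wrong.

```python
def split_evenly(items, n):
    """Split a list into n chunks whose sizes differ by at most one."""
    if n <= 0:
        # Routine error caused by the caller: raise an exception
        raise ValueError('n must be positive')
    q, r = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        chunks.append(items[start:start+size])
        start += size
    # If this fails, the code above has a bug
    assert start == len(items), 'chunks do not cover all items'
    return chunks

print(split_evenly(list(range(7)), 3))
```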

# New-style string formatting

In [None]:
# OLD printf style
'%s runs %d %s projects every semester' % ('MCL',4,'amazing')

In [None]:
# new style
'{} runs {} {} projects every semester'.format('MCL',4,'amazing')

In [None]:
# new style, positional indices
'{0} runs {2} {1} projects every semester'.format('MCL','amazing',4)

In [None]:
# new style, named fields
'{lab} runs {number} {adjective} projects every semester'.format(lab='MCL',number=4,adjective='amazing')

In [None]:
# new style, fields from dict
d = {'lab': 'MCL', 'number': 4, 'adjective': 'amazing'}
'{lab} runs {number} {adjective} projects every semester'.format(**d)
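
The braces can also carry a format specification after a colon, which controls width, precision, padding and so on.  A few illustrative cases, with arbitrarily chosen values:

```python
padded_float = '{:>8.3f}'.format(3.14159)   # right-aligned, width 8, 3 decimals
thousands = '{:,}'.format(8675309)          # comma as thousands separator
zero_filled = '{:05d}'.format(42)           # zero-padded to width 5

print(repr(padded_float), thousands, zero_filled)
```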