![Py4Eng](img/logo.png)

# Pythonic Idioms
## Yoav Ram

# The Zen of Python

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


# PEP8

Highlights of PEP 8, the Python style guide, together with a few related conventions:

- Four spaces per indentation level
- Never mix tabs and spaces
- Two blank lines around top-level functions and classes
- One blank line between methods inside a class
- Space after `,` and `:`, but not before
- Space before and after operators (`=`, `+`, etc.), except in argument list
- `joined_lowercase` for variables and functions
- `ALL_CAPS` for constants
- `StudlyCaps` for classes
- `camelCase` only if adding to existing code that uses it
- `_` prefix for internal ("hidden") names; double underscores on both sides (`__init__`) for special methods
- Maximum 79 characters per line; break inside parentheses, brackets, or braces (argument lists, dicts), or use `\` if you must
- One statement per line
- Docstrings (`"""`) for how to use the code
- Comments (`#`) for why and how the code is written
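
The conventions above are easier to remember when seen together. Here is a minimal sketch of a toy module that follows them; `MAX_RETRIES`, `RetryPolicy`, and `should_retry` are hypothetical names invented for this illustration.

```python
"""Toy module illustrating the conventions listed above."""

MAX_RETRIES = 3  # ALL_CAPS for a constant


class RetryPolicy:  # StudlyCaps for a class
    """Decide whether another attempt is allowed."""

    # no spaces around `=` in the argument list
    def __init__(self, max_retries=MAX_RETRIES):
        self.max_retries = max_retries
        self._attempts = 0  # `_` prefix marks an internal attribute

    def should_retry(self):  # joined_lowercase for a method
        """Return True for the first `max_retries` calls."""
        self._attempts += 1
        return self._attempts <= self.max_retries


policy = RetryPolicy(max_retries=2)
results = [policy.should_retry() for _ in range(3)]
print(results)  # [True, True, False]
```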

# Swapping values

In some languages:

In [2]:
a = 5
b = 3
print(a, b)

tmp = a
a = b
b = tmp
print(a, b)

5 3
3 5


In Python we can use _tuple packing and unpacking_:

In [3]:
print(a, b)
a, b = b, a
print(a, b)

3 5
5 3


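The right-hand side `b, a` is packed into a tuple, which is then unpacked into the names on the left-hand side, so no temporary variable is needed.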
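
Unpacking is useful beyond swapping. The sketch below, using made-up values, shows unpacking a tuple into names, collecting the remaining items with a starred name, and unpacking pairs inside a loop.

```python
# Unpack a tuple into separate names
x, y = (3, 4)

# A starred name collects the remaining items into a list
first, *rest = [1, 2, 3, 4]

# Unpacking also works when looping over pairs
pairs = [('John', 'Cleese'), ('Eric', 'Idle')]
full_names = [given + ' ' + family for given, family in pairs]

print(x, y)         # 3 4
print(first, rest)  # 1 [2, 3, 4]
print(full_names)   # ['John Cleese', 'Eric Idle']
```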

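# Last result is `_`

In an interactive session (the Python prompt, IPython, or Jupyter), `_` holds the value of the last evaluated expression: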

In [4]:
5 + 4

9

In [5]:
print(_)

9


# Zip sequences

`zip` takes several iterables and creates a new iterator in which the item at index `i` is a tuple of the elements at index `i` of the original iterables.

In [6]:
given = ['John', 'Eric', 'Terry', 'Michael']
family = ['Cleese', 'Idle', 'Gilliam', 'Palin']
zip(given, family)

<zip at 0x10fdad048>

To use the `zip` object returned, we need to iterate over it or convert it to another type:

In [7]:
pythons = dict(zip(given, family))
print(pythons)

{'John': 'Cleese', 'Eric': 'Idle', 'Terry': 'Gilliam', 'Michael': 'Palin'}
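
A few more common uses of `zip`, as a small sketch: iterating over pairs in a `for` loop, the way `zip` stops at the shortest input, and "unzipping" with `zip(*pairs)`.

```python
given = ['John', 'Eric', 'Terry', 'Michael']
family = ['Cleese', 'Idle', 'Gilliam', 'Palin']

# Iterate over the pairs directly
full_names = []
for g, f in zip(given, family):
    full_names.append(g + ' ' + f)
print(full_names[0])  # John Cleese

# zip stops when the shortest input is exhausted
short = list(zip('abc', [1, 2]))
print(short)  # [('a', 1), ('b', 2)]

# zip(*pairs) reverses the operation and returns tuples
pairs = list(zip(given, family))
given_again, family_again = zip(*pairs)
print(given_again)  # ('John', 'Eric', 'Terry', 'Michael')
```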


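# Implicit conversion to `bool`

Any object can be used where a `bool` is expected, such as in an `if` condition. Empty containers, empty strings, zero, and `None` are treated as `False`; most other values are treated as `True`: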

In [8]:
lst = []
if lst:
    print(lst[0])
else:
    print("Empty list")

Empty list


In [9]:
a = 51
if a:
    print(a)
else:
    print('a is zero')

51


In [10]:
myname = ''
if not myname:
    print("I am nameless")
myname = 'Slim Shady'
if myname:
    print("My name is", myname)

I am nameless
My name is Slim Shady


In [11]:
class A:
    
    def __init__(self, a):
        self.a = a
        
    def __bool__(self):
        return bool(self.a)
    
a1 = A(1)
a2 = A(0)
if a1:
    print(a1.a)
if a2:
    print(a2.a)

1
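
When a class does not define `__bool__`, Python falls back to `__len__`: an object with length zero is considered false. A minimal sketch, where `Basket` is a hypothetical class invented for this illustration:

```python
class Basket:
    """A container that defines __len__ but not __bool__."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


print(bool(Basket([])))         # False, because len() is 0
print(bool(Basket(['apple'])))  # True

# Common built-in values that are treated as False
falsy = [0, 0.0, '', [], {}, set(), None]
print(any(falsy))  # False
```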


# Default arguments

Functions can have default arguments:

In [12]:
def foo(a=1):
    print(a)

foo(5)
foo()

5
1


If the default value is mutable, we should take care:

In [13]:
def foo(item, container=[]):
    container.append(item)
    return container

In [14]:
print(foo(5))
print(foo(5))
print(foo(5))

[5]
[5, 5]
[5, 5, 5]


The default value is evaluated once, when the function is defined. All calls that omit `container` therefore share the same list object, and each call mutates it.

So what can we do?

In [15]:
def foo(item, container=None):
    if container is None:
        container = []
    container.append(item)
    return container

In [16]:
print(foo(5))
print(foo(5))
print(foo(5))

[5]
[5]
[5]
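
To see where the shared list lives, we can inspect the function's `__defaults__` attribute, which stores the default values. In this sketch `append_to` is a hypothetical name for the deliberately buggy version of the function.

```python
def append_to(item, container=[]):  # deliberately buggy: shared default
    container.append(item)
    return container


print(append_to.__defaults__)  # ([],)
append_to(5)
print(append_to.__defaults__)  # ([5],)
append_to(5)
print(append_to.__defaults__)  # ([5, 5],)

# Both calls return the very same list object
print(append_to(1) is append_to(2))  # True
```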


# String formatting

Older Python code uses `%` formatting, but the newer approach is to use the `format` method of strings (`str.format`). Since Python 3.6 there are also formatted string literals (f-strings).

See [Python String Format Cookbook](https://mkaz.github.io/2012/10/10/python-string-format/) for help.

In [17]:
chorus = """{0} bottles of beer on the wall,
{0} bottles of beer.
Take one down, pass it around,
{1} bottles of beer on the wall..."""

for bottles in range(3, 0, -1):
    print(chorus.format(bottles, bottles - 1))

3 bottles of beer on the wall,
3 bottles of beer.
Take one down, pass it around,
2 bottles of beer on the wall...
2 bottles of beer on the wall,
2 bottles of beer.
Take one down, pass it around,
1 bottles of beer on the wall...
1 bottles of beer on the wall,
1 bottles of beer.
Take one down, pass it around,
0 bottles of beer on the wall...
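
Besides positional fields such as `{0}`, `format` accepts named fields and format specifications after a `:`. The sketch below also shows the equivalent f-string, which requires Python 3.6 or later; the values are made up.

```python
line = '{count} bottles of beer on the wall'
print(line.format(count=3))  # 3 bottles of beer on the wall

count = 2
print(f'{count} bottles of beer on the wall')  # f-string, Python 3.6+

# A format specification after ':' controls the presentation
two_decimals = '{:.2f}'.format(3.14159)
right_aligned = '{:>5}|'.format(42)
print(two_decimals)   # 3.14
print(right_aligned)  # '   42|' (right-aligned in 5 characters)
```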


## Exercise

Given the variables below, print the string:

> Io was discovered in 1610 by Galileo Galilei and was last visited in 2007 by New Horizons. Its radius is 1822 km and its mass is 8.93194e+22 kg.

In [20]:
name = 'Io'
discovery_year = 1610
discoverer = 'Galileo Galilei'
radius = 1821.6
mass = 8.931938e22
last_visited = '2007'
last_visitor = 'New Horizons'



# Sorting

Python has two built-in ways to sort, and both are very flexible.

- `sorted` accepts any iterable and returns a new sorted list.
- `list.sort` is a list method that sorts in place and returns `None`.

In [19]:
data = [32, 44, 1, 22, 30, -53, 75]

print(data)
print(sorted(data))
data.sort()
print(data)

[32, 44, 1, 22, 30, -53, 75]
[-53, 1, 22, 30, 32, 44, 75]
[-53, 1, 22, 30, 32, 44, 75]


You can also pass `sorted` and `sort` a `key` function that will determine the sorting:

In [20]:
names = ['Kobe', 'Shaq', 'MJ', 'Magic', 'Larry']
print("Unsorted:\n", names)
print("Sorted by default order:\n", sorted(names))
print("Sorted by length:\n", sorted(names, key=len))

Unsorted:
 ['Kobe', 'Shaq', 'MJ', 'Magic', 'Larry']
Sorted by default order:
 ['Kobe', 'Larry', 'MJ', 'Magic', 'Shaq']
Sorted by length:
 ['MJ', 'Kobe', 'Shaq', 'Magic', 'Larry']


**Note**: Python's sort, [timsort](https://en.wikipedia.org/wiki/Timsort), is a stable hybrid of merge sort and insertion sort designed to perform well on real-world data. It was implemented by Tim Peters in 2002 (Python v2.3) and has since been ported to Java and other platforms.
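
Two more options are worth a quick sketch: `reverse=True` for descending order, and an existing method such as `str.lower` as the `key`. Because the sort is stable, items with equal keys keep their original relative order. The `mixed` list is made up for this illustration.

```python
names = ['Kobe', 'Shaq', 'MJ', 'Magic', 'Larry']

# Longest first; ties keep their original order because the sort is stable
by_length_desc = sorted(names, key=len, reverse=True)
print(by_length_desc)  # ['Magic', 'Larry', 'Kobe', 'Shaq', 'MJ']

# Default string order puts uppercase letters before lowercase ones
mixed = ['bob', 'alice', 'Carol']
print(sorted(mixed))                 # ['Carol', 'alice', 'bob']
print(sorted(mixed, key=str.lower))  # ['alice', 'bob', 'Carol']

# list.sort works in place and returns None
result = mixed.sort()
print(result, mixed)  # None ['Carol', 'alice', 'bob']
```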

# Hello World!

Because _Hello World!_ is such a common getting started exercise, the Python developers decided to facilitate such exercises:

In [21]:
import __hello__

Hello world!


There are some additional easter eggs in Python; we'll see some of them as we advance.

# $\lambda$ expressions

Python supports anonymous functions using `lambda` expressions:

In [22]:
(lambda x: x + 2)(6)

8

In [23]:
type(lambda x: x + 2)

function

In [24]:
f = lambda x: x + 2
print(f(6))
print(type(f))

8
<class 'function'>
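
In practice, `lambda` is most often used to pass a short function as an argument, for example as the `key` of `sorted` or as the function given to `map`. A small sketch with illustrative `(name, number)` pairs:

```python
players = [('Kobe', 24), ('Shaq', 34), ('MJ', 23)]

# Sort by the second element of each pair
by_number = sorted(players, key=lambda player: player[1])
print(by_number)  # [('MJ', 23), ('Kobe', 24), ('Shaq', 34)]

# Apply a short function to every item
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
print(doubled)  # [2, 4, 6]
```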


# Higher-order functions

Higher-order functions are functions that take other functions as arguments, return functions, or both.

Here is a function that, given a power, returns a function that raises numbers to that power:

In [25]:
def make_pow(x):
    def pow(y):
        return y**x
    return pow
square = make_pow(2)
square(5)

25

Note that the above is an example of a [**closure**](https://en.wikipedia.org/wiki/Closure_%28computer_programming%29): the `square` function still has access to `x`, even though `x` is defined in the scope of `make_pow`, which has already returned.
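
Each call to `make_pow` creates a separate closure with its own `x`. The sketch below repeats the definition so that it runs on its own, and peeks at the captured value through the function's `__closure__` attribute.

```python
def make_pow(x):
    def pow(y):
        return y**x
    return pow


square = make_pow(2)
cube = make_pow(3)
print(square(5), cube(5))  # 25 125

# The captured value of x is stored in a closure cell
print(square.__closure__[0].cell_contents)  # 2
print(cube.__closure__[0].cell_contents)    # 3
```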

## Exercise

Below you will find the function `compose(inner, outer)`, which, given two functions `inner(x)` and `outer(x)`, defines a new function `composed(x) = outer(inner(x))`.

Use `compose` and `lambda` to define the function `f(x) = 2*x + 1`:
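
A minimal sketch of `compose`, written to match the description above; the exact original definition is an assumption. The usage line deliberately uses a different pair of functions than the exercise asks for.

```python
def compose(inner, outer):
    """Return a function that applies inner first and then outer."""
    def composed(x):
        return outer(inner(x))
    return composed


# Usage: (x + 3) squared
add_three_then_square = compose(lambda x: x + 3, lambda x: x**2)
print(add_three_then_square(1))  # 16
```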

# Decorators

Decorators are used to *modify functions* (and class decorators modify classes). A decorator is a function that takes a function as an argument (and possibly other arguments, too) and returns a function as its output.

For example, maybe you want to print the output of some functions before they return. Instead of rewriting those functions, you can modify them with a decorator:

In [26]:
# some functions
def foo(x):
    return x + 1
def bar(x):
    return 2 * x

foo(10)
bar(10)

20

In [27]:
# the decorator
def print_output(func):
    def new_func(x):
        ret_val = func(x)
        print(ret_val)
        return ret_val
    return new_func

foo = print_output(foo)
bar = print_output(bar)
foo(10)
bar(10)

11
20


20

If you write the decorated function after you already wrote the decorator, you can use `@decorator_name` before the function definition:

In [28]:
@print_output
def square(x):
    return x**2

square(2) + square(3)

4
9


13
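
The `print_output` decorator above only works for functions of a single argument, and the decorated function loses its original name and docstring. One possible generalization, sketched here, accepts any arguments with `*args, **kwargs` and uses `functools.wraps` to copy the metadata; `add` is a hypothetical function used only for the demonstration.

```python
from functools import wraps


def print_output(func):
    @wraps(func)  # copy func's name and docstring onto the wrapper
    def new_func(*args, **kwargs):
        ret_val = func(*args, **kwargs)
        print(ret_val)
        return ret_val
    return new_func


@print_output
def add(x, y=0):
    """Return x + y."""
    return x + y


total = add(2, y=3)  # prints 5
print(add.__name__)  # add
print(add.__doc__)   # Return x + y.
```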

## Exercise

Write a decorator function called `memoize` that adds *memoization* to a function. The decorator saves results of the decorated function in a dictionary that maps input (assume a single immutable argument) to output (assume a single return value). Then, if the decorated function is called again with the same input, the result is pulled from the dictionary instead of being recalculated.

Below is a recursive Fibonacci function, which is very inefficient without memoization.
The code below will compare the calculation of the 30th Fibonacci number with and without memoization.

In [216]:
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

In [217]:
%timeit -n 1 fib(30)
%timeit -n 1 memoize(fib)(30)

1 loop, best of 3: 549 ms per loop
1 loop, best of 3: 6.71 ms per loop


# Collections

Some interesting idioms from the `collections` module.

## `namedtuple`

`namedtuple` is a factory for classes that have a fixed set of fields. These classes are subclasses of `tuple`, so their fields are ordered, but they are also named.

In [29]:
from collections import namedtuple

Card = namedtuple('card', ('suit', 'number'))

cards = [Card(s, n) for s in 'H C S D'.split() for n in list(range(11)) + 'J Q K A'.split()]
cards[:10]

[card(suit='H', number=0),
 card(suit='H', number=1),
 card(suit='H', number=2),
 card(suit='H', number=3),
 card(suit='H', number=4),
 card(suit='H', number=5),
 card(suit='H', number=6),
 card(suit='H', number=7),
 card(suit='H', number=8),
 card(suit='H', number=9)]

`namedtuple` instances are still tuples, so they can be iterated over:

In [30]:
for x in cards[0]:
    print(x)

H
0


But the values can also be accessed by field name:

In [31]:
cards[-1].suit

'D'
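
Named tuples also provide a few helper methods and attributes. The sketch below uses a fresh `Card` class (with type name `'Card'`) so that it runs on its own, and shows `_fields`, `_replace`, and `_asdict`, as well as the fact that instances are immutable.

```python
from collections import namedtuple

Card = namedtuple('Card', ('suit', 'number'))
card = Card('H', 7)

suit, number = card              # unpack like any tuple
print(card._fields)              # ('suit', 'number')
print(card._replace(number=8))   # Card(suit='H', number=8)
print(dict(card._asdict()))      # {'suit': 'H', 'number': 7}

# Like tuples, named tuples are immutable
try:
    card.number = 8
except AttributeError:
    print('cannot modify a namedtuple field')
```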

## `defaultdict` 

A `dict` that supplies a default value for missing keys.
When creating a new instance, you provide a function that is called with no arguments to produce the default value for a missing key.

In [32]:
from collections import defaultdict

d = defaultdict(int)
d['a'] = 1

print(d)
print(d['a'], d['b'])

defaultdict(<class 'int'>, {'a': 1})
1 0


This is nice as it saves the trouble of calling `d.get`, but it can be extra useful when trying to build a multi-dict, in which each key has an associated collection of values, rather than a single value.

Instead of doing:

In [33]:
d = dict()
lst = d.get('a', [])
lst.append(1)
d['a'] = lst

print(d)

{'a': [1]}


you can use a single call:

In [34]:
d = defaultdict(list)
d['a'].append(1)

print(d)

defaultdict(<class 'list'>, {'a': [1]})


This code is shorter and cleaner, and also more efficient: lookup and insertion are done in one step in C, so we save a function call.
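
A typical use of the multi-dict pattern is grouping items by some key. The sketch below groups a made-up list of words by their first letter, and also shows a caveat: merely looking up a missing key inserts it.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_first_letter = defaultdict(list)
for word in words:
    by_first_letter[word[0]].append(word)

print(dict(by_first_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Caveat: looking up a missing key creates it with the default value
print('z' in by_first_letter)  # False
by_first_letter['z']
print('z' in by_first_letter)  # True
```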

## Example

This example integrates a bunch of idioms we saw.

We want to find the set of letters that follows each letter in *Gulliver's Travels*.

**Notes** 
- we use a dictionary in which the key is a letter and the value is a `set` that keeps track of letters following the key.
- `str.isalpha` returns `True` if the string is non-empty and all of its characters are letters. This includes non-ASCII letters, not only a-z and A-Z.
- `next` advances an iterator and returns its next item.
- `StopIteration` is raised when an iterator is depleted. We usually use iterators in `for` loops, which stop when `StopIteration` is raised, but if we manually advance an iterator with `next` then we need to use a `try-except` block.
- [`itertools.tee`](https://docs.python.org/3.5/library/itertools.html#itertools.tee) creates two (or more, you can specify how many) identical but independent iterators.
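
Before applying these tools to a whole book, here is a small sketch of the same technique on a single word: `tee` duplicates the iterator, `next` moves one copy a step ahead, and `zip` then yields pairs of consecutive letters.

```python
from itertools import tee

letters = iter('hello')
letters, next_letters = tee(letters)  # two independent iterators
next(next_letters)                    # skip 'h' so this copy is one ahead

pairs = list(zip(letters, next_letters))
print(pairs)  # [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]

# Calling next on a depleted iterator raises StopIteration
empty = iter('')
try:
    next(empty)
except StopIteration:
    print('iterator is depleted')
```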

In [35]:
from itertools import tee
from collections import defaultdict

d = defaultdict(set)

with open('../data/gulliver.txt') as f:
    lines = map(str.lower, f)
    for line in lines: # read line by line
        letters = filter(str.isalpha, line) # filter out non-letters
        letters, next_letters = tee(letters) # create two identical independent letters iterators
        try:
            next(next_letters) # next_letters should be one ahead of letters
        except StopIteration:
            continue # no letters, go to next line
        for ltr, next_ltr in zip(letters, next_letters): # iterate over pairs of consecutive letters
            d[ltr].add(next_ltr)
        
# now to print the results
for k,v in sorted(d.items()):
    print('{0}: {1}'.format(k, str.join('', sorted(v))))

a: abcdefghijklmnopqrstuvwxyz
b: abcdehijlmnoprstuvwy
c: abcdeghiklmnopqrstuwy
d: abcdefghijklmnopqrstuvwy
e: abcdefghijklmnopqrstuvwxyz
f: abcdefghijklmnopqrstuvwy
g: abcdefghijklmnopqrstuvwy
h: abcdefghijklmnopqrstuvwyz
i: abcdefghiklmnopqrstuvwxz
j: aeilou
k: abcdefghijlmnoprstuwy
l: abcdefghiklmnopqrstuvwy
m: abcdefghijklmnoprstuvwy
n: abcdefghijklmnopqrstuvwy
o: abcdefghijklmnopqrstuvwxyz
p: abcdefghiklmnoprstuwy
q: u
r: abcdefghijklmnopqrstuvwyz
s: abcdefghijklmnopqrstuvwy
t: abcdefghijklmnopqrstuvwxyz
u: abcdefghilmnopqrstwxyz
v: aeilouy
w: abcdefghiklmnoprstuvwy
x: abcdefhimnoprstuwy
y: abcdefghijklmnopqrstuvwy
z: aceilouyz


## Exercise

Now we want to keep track of the frequencies of the letters after each letter, not just their identity.

We can easily do this by using `collections.Counter` as the dictionary value instead of `set`.

Change the code to implement this change:

In [125]:
from itertools import tee
from collections import defaultdict

d = defaultdict(set)

with open('../data/gulliver.txt') as f:
    lines = map(str.lower, f)
    for line in lines: # read line by line
        letters = filter(str.isalpha, line) # filter out non-letters
        letters, next_letters = tee(letters) # create two identical independent letters iterators
        try:
            next(next_letters) # next_letters should be one ahead of letters
        except StopIteration:
            continue # no letters, go to next line
        for ltr, next_ltr in zip(letters, next_letters): # iterate over pairs of consecutive letters
            d[ltr].add(next_ltr)
        
# now to print the results
for k,v in sorted(d.items()):
    print('{0}: {1}'.format(k, str.join('', sorted(v))))

In [126]:
# now to print the results
for k1,c in sorted(d.items()):
    print(' {} \n---'.format(k1))
    for k2,v in sorted(c.items()):
        print('{}: {}'.format(k2, v), end=' | ')
    print()

 a 
---
a: 22 | b: 608 | c: 711 | d: 984 | e: 8 | f: 290 | g: 463 | h: 78 | i: 669 | j: 146 | k: 244 | l: 1598 | m: 524 | n: 3816 | o: 13 | p: 547 | q: 5 | r: 1946 | s: 2063 | t: 2507 | u: 185 | v: 520 | w: 191 | x: 14 | y: 327 | z: 13 | 
 b 
---
a: 173 | b: 19 | c: 1 | d: 9 | e: 1165 | h: 3 | i: 161 | j: 32 | l: 488 | m: 8 | n: 3 | o: 583 | p: 3 | r: 175 | s: 127 | t: 18 | u: 418 | v: 3 | w: 2 | y: 480 | 
 c 
---
a: 592 | b: 2 | c: 143 | d: 8 | e: 1170 | g: 2 | h: 1237 | i: 272 | k: 282 | l: 286 | m: 5 | n: 6 | o: 1445 | p: 5 | q: 7 | r: 242 | s: 26 | t: 526 | u: 273 | w: 28 | y: 27 | 
 d 
---
a: 997 | b: 432 | c: 150 | d: 209 | e: 1482 | f: 257 | g: 140 | h: 325 | i: 1334 | j: 13 | k: 28 | l: 202 | m: 393 | n: 203 | o: 761 | p: 149 | q: 16 | r: 341 | s: 656 | t: 1131 | u: 213 | v: 85 | w: 353 | y: 127 | 
 e 
---
a: 2718 | b: 436 | c: 1160 | d: 2809 | e: 1184 | f: 776 | g: 376 | h: 483 | i: 1153 | j: 22 | k: 144 | l: 1114 | m: 1216 | n: 2858 | o: 864 | p: 618 | q: 123 | r: 4381 | s: 2

## `deque`

[Deques](https://docs.python.org/3.5/library/collections.html#collections.deque) are a **generalization of stacks and queues** (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient **appends and pops from either side** of the deque with approximately the same **O(1) performance** in either direction. Though `list` objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for `pop(0)` and `insert(0, v)` operations which change both the size and position of the underlying data representation.

In [36]:
from collections import deque

lst = list(range(100000))  # a list of 100000 integers, not a list holding one range object
deck = deque(range(100000))

In [37]:
%timeit lst.append(1)
%timeit lst.insert(0, 1)

76 ns ± 5.14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
83.8 ms ± 9.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [38]:
%timeit deck.append(1)
%timeit deck.appendleft(1)

92.8 ns ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
87 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


Since deques support appending and popping at both ends, and these operations are thread-safe, they are well suited for communication between threads.
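
The operations on both ends can be sketched on a small deque; `rotate` shifts the items cyclically.

```python
from collections import deque

queue = deque(['a', 'b', 'c'])
queue.append('d')       # add on the right
queue.appendleft('z')   # add on the left
print(queue)            # deque(['z', 'a', 'b', 'c', 'd'])

left = queue.popleft()  # remove from the left
right = queue.pop()     # remove from the right
print(left, right)      # z d

queue.rotate(1)         # shift items one step to the right
print(queue)            # deque(['c', 'a', 'b'])
```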

## Example

Bounded length deques provide functionality similar to the tail filter in Unix:

In [39]:
def tail(filename, n=10):    
    with open(filename) as f:
        return deque(f, maxlen=n)

In [41]:
last10 = tail('../data/heart.csv', n=10)
for line in last10:
    print(line, end='')

63.0
12.0
29.0
48.0
297.0
50.0
68.0
26.0
161.0
14.0
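
The `tail` function works because a deque created with `maxlen` silently discards items from the opposite end once it is full. A small sketch of this behavior with a made-up stream of numbers:

```python
from collections import deque

window = deque(maxlen=3)
snapshots = []
for value in range(5):
    window.append(value)            # the oldest item is dropped when full
    snapshots.append(list(window))

for snapshot in snapshots:
    print(snapshot)
# [0]
# [0, 1]
# [0, 1, 2]
# [1, 2, 3]
# [2, 3, 4]
```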


# Future statements

A [future statement](https://docs.python.org/3.5/reference/simple_stmts.html#future) facilitates migration to future versions of Python that introduce **incompatible changes** to the language. It allows use of the new features on a per-module basis before the release in which the feature becomes standard.

A future statement must appear near the top of the module.

When working with Python 3.5, no future statements are needed. However, if you wish to allow the code to run with Python 2.x, future statements are very useful. For example, for using `print` as a function (`print(x)`) rather than a statement (`print x`), you can use `from __future__ import print_function`:

In [42]:
from __future__ import print_function
print('Hello World!')

Hello World!


If you really want to support Python 2, I suggest using [_python-future_](http://python-future.org/): it allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead.

## Exercise

The _curly brace delimited blocks_ feature was requested several times by users coming from other programming languages.

Use the future statement `braces` to run `fast_pow` written with curly brackets to compute $5^{17}$.

In [None]:
def fast_pow(x, y) {
    if y == 0 {
        return 1
    } elif y % 2 == 0 {
        tmp = fast_pow(x, y // 2)
        return tmp * tmp
    } else {
        return x * fast_pow(x, y - 1)
    }
}
print("5^17 =", fast_pow(5, 17))

# References
- Some code and ideas were taken from [Code like a Pythonista](http://python.net/~goodger/projects/pycon/2007/idiomatic/presentation.html)
- [Decorators I: Introduction to Python Decorators](http://www.artima.com/weblogs/viewpost.jsp?thread=240808)
- [Python syntax and semantics](https://en.wikipedia.org/wiki/Python_syntax_and_semantics) on Wikipedia
- The [Collections module](https://docs.python.org/3.5/library/collections.html) implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers.

## Colophon
This notebook was written by [Yoav Ram](http://python.yoavram.com) and is part of the [_Python for Engineers_](https://github.com/yoavram/Py4Eng) course.

The notebook was written using [Python](http://python.org/) 3.6.1.
Dependencies listed in [environment.yml](../environment.yml), full versions in [environment_full.yml](../environment_full.yml).

This work is licensed under a CC BY-NC-SA 4.0 International License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)