# Transforming Code into Beautiful, Idiomatic Python
https://www.youtube.com/watch?v=OSGv2VnC0go

A 2013 talk by Raymond Hettinger, a Python core developer, on some common patterns that are usually coded badly, and how to improve them.

## For loops
For loops in Python are not the same as in other languages. A better description would be 'for each', since they loop over collections using iterators.

### Looping over a range of numbers

In [1]:
# ugly way
for i in [0,1,2,3,4,5]:
    print(i**2)

0
1
4
9
16
25


In [2]:
# less ugly way
for i in range(6):
    print(i**2)

0
1
4
9
16
25


`range` _used to_ replicate the 1st block (in Python 2): it created a list of 6 elements and iterated over it. So with 1m+ elements the list gets big (~32MB).

In Python 3, `range` has been replaced with what used to be `xrange`: a lazy version of range, which produces the values one at a time on demand, so no memory (or nearly none) is used.
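
A rough way to see the difference is to compare the size of a `range` object with the list built from it. This is only a sketch: the exact byte counts depend on the Python build, so the code compares them rather than printing them.

```python
import sys

n = 1_000_000
lazy = range(n)          # stores only start, stop and step
eager = list(range(n))   # stores a reference to every element

# the range object stays tiny however large n is
print(sys.getsizeof(lazy) < sys.getsizeof(eager))   # True

# range still supports indexing, len() and membership tests
print(lazy[10], len(lazy), 999_999 in lazy)         # 10 1000000 True
```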

### Looping over a collection

In [3]:
colors = ['red','green','blue','yellow']

# ugly way, from C
for i in range(len(colors)):
    print(colors[i])

red
green
blue
yellow


In [4]:
# Better way
for color in colors:
    print(color)

red
green
blue
yellow


### Looping backwards

In [5]:
# the VERY ugly way
for i in range(len(colors)-1,-1,-1):
    print(colors[i])

yellow
blue
green
red


In [6]:
# the python way
for color in reversed(colors):
    print(color)

yellow
blue
green
red


### Looping over a collection and indices

In [7]:
# ugly way
for i in range(len(colors)):
    print(i,'-->',colors[i])

0 --> red
1 --> green
2 --> blue
3 --> yellow


In [8]:
# Python way
for index, color in enumerate(colors):
    print(index,'-->',color)

0 --> red
1 --> green
2 --> blue
3 --> yellow


### Looping over two collections

In [9]:
names = ['Raymond','Rachel','Matthew']

# ugly way
n=min(len(names),len(colors))
for i in range(n):
    print(names[i],'-->',colors[i])

Raymond --> red
Rachel --> green
Matthew --> blue


In [10]:
# Python way
for name, color in zip(names, colors):
    print(name, color)

Raymond red
Rachel green
Matthew blue


Similar to `xrange`, there used to be an `izip` (in `itertools`). In Python 3 the built-in `zip` is lazy in the same way, so `izip` is no longer needed.
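
A small sketch of what lazy means for `zip` in Python 3, reusing the lists from above: nothing is paired up until you ask for it, and the result can only be consumed once.

```python
names = ['Raymond', 'Rachel', 'Matthew']
colors = ['red', 'green', 'blue', 'yellow']

pairs = zip(names, colors)   # an iterator; no pairs built yet
first = next(pairs)
print(first)                 # ('Raymond', 'red')

rest = list(pairs)           # consumes the remaining pairs
print(rest)                  # [('Rachel', 'green'), ('Matthew', 'blue')]

# zip stops at the shortest input, so 'yellow' is never paired,
# and the iterator is now exhausted
print(list(pairs))           # []
```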

### Loop in sorted order
For stuff with simple comparisons, like sorting alphabetically

In [11]:
for color in sorted(colors):
    print(color)

blue
green
red
yellow


In [12]:
# reversal
for color in sorted(colors, reverse=True):
    print(color)

yellow
red
green
blue


But if you want custom sorting

In [13]:
# Python 2 way, using a custom comparator function.
# The cmp= argument was removed in Python 3, so this doesn't work any more
def compare_length(c1,c2):
    if len(c1) < len(c2): return -1
    if len(c1) > len(c2): return 1
    return 0

# print(sorted(colors, cmp=compare_length))

In [14]:
# Custom comparators SUCK. Key functions are better.
print(sorted(colors,key=len))

['red', 'blue', 'green', 'yellow']
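
If you are stuck with an old comparator function, `functools.cmp_to_key` can adapt it into a key function. A minimal sketch, assuming a comparator like the one above (rewritten here so the block runs on its own):

```python
from functools import cmp_to_key

colors = ['red', 'green', 'blue', 'yellow']

def compare_length(c1, c2):
    # old-style comparator: negative, zero or positive
    return len(c1) - len(c2)

by_cmp = sorted(colors, key=cmp_to_key(compare_length))
by_key = sorted(colors, key=len)

print(by_cmp)            # ['red', 'blue', 'green', 'yellow']
print(by_cmp == by_key)  # True
```

The key function is still the simpler choice whenever one exists: it is called once per element, whereas a comparator is called once per comparison.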


## Sentinels and guardians

Traditionally, if you want to loop until you reach a 'sentinel' value that terminates the sequence, you would write a `while True` loop with a `break` somewhere in it.

In [15]:
# blocks=[]
# while True:
#     block = f.read(32)
#     if block == '':
#         break
#     blocks.append(block)
    
# f isn't defined so this won't actually run

This is sucky for several reasons.

In [16]:
# from functools import partial

# blocks = []
# for block in iter(partial(f.read, 32),''):
#     blocks.append(block)

The important bit is the 2 args being passed to `iter()`. The first argument is the function you call over and over again; the second is the sentinel value, the 'break' signal.

The strength here is that you've made it an iterable, which is a powerful tool in Python. There's lots of stuff you can do with this.

The `partial` function is pretty useful too, but not covered here. It reduces the number of arguments you need to pass to a function.

Also note that sentinels can actually be problematic; they're not always good to use.
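
Since `f` isn't defined in the cells above, here is a runnable sketch of the same idiom using an in-memory `io.StringIO` as a stand-in for a real file (an assumption for illustration only). With a file opened in binary mode the sentinel would be `b''` instead of `''`.

```python
import io
from functools import partial

f = io.StringIO('abcdefghij')   # stand-in for a real file object

blocks = []
# call f.read(4) over and over until it returns the sentinel ''
for block in iter(partial(f.read, 4), ''):
    blocks.append(block)

print(blocks)   # ['abcd', 'efgh', 'ij']
```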

## Multiple exit points in loops, flag variables

Every for loop has an if built in, to test whether it's at the end of the sequence yet. Since we have an if, we can also have an else.

In [17]:
def find(seq, target):
    found = False # the flag variable
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i

find(colors, 'yellow')

3

In [18]:
# Pythonic way

def find(seq, target):
    for i, v in enumerate(seq):
        if v == target:
            break
    else: return -1
    return i

find(colors, 'yellow')

3

This can be hard to remember - `else` is not descriptive (any more). Think of it as `nobreak` instead. A for loop has two outcomes: you finish the loop or you break out of it.

## Dictionaries
Dicts are the fundamental tool for expressing relationships, linking, counting and grouping. You need to be super good at them.

### Looping over dictionary keys

In [19]:
d = {'Raymond': 'red', 'Rachel': 'green', 'Matthew': 'blue'}

for k in d:
    print(k)

Raymond
Rachel
Matthew


In [20]:
# another way he shows, which doesn't work as-is in Python 3 -
# his point was that you should use d.keys() when you want to mutate
# a dictionary while looping. In Python 2, keys() returned a list copy;
# in Python 3 it is a live view, so wrap it in list() to get a copy.

# for k in list(d.keys()):
#     if k.startswith('R'):
#         del d[k]

# without list(), the first matching key is deleted and then a
# RuntimeError is raised; run it 3 times and all the 'R' keys are
# gone, so it doesn't throw anymore
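
A runnable sketch of both behaviours in Python 3, working on copies so the `d` used in the rest of these notes is left alone. The names `broken` and `fixed` are just for illustration.

```python
d = {'Raymond': 'red', 'Rachel': 'green', 'Matthew': 'blue'}

# Deleting while iterating over the live dict raises RuntimeError
broken = dict(d)
try:
    for k in broken:
        if k.startswith('R'):
            del broken[k]
except RuntimeError as exc:
    print('RuntimeError:', exc)
print(broken)   # {'Rachel': 'green', 'Matthew': 'blue'}

# Iterating over a snapshot of the keys is safe
fixed = dict(d)
for k in list(fixed):
    if k.startswith('R'):
        del fixed[k]
print(fixed)    # {'Matthew': 'blue'}
```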

In [21]:
# he puts this up but skips over it - a dict comprehension,
# like a list comprehension but for dicts
{k:d[k] for k in d if not k.startswith('R')}

{'Matthew': 'blue'}

### Looping over KV pairs

In [22]:
# looping over keys and values: trad way, slow
for k in d:
    print(k,'-->',d[k])

Raymond --> red
Rachel --> green
Matthew --> blue


In [23]:
# good way
for k,v in d.items():
    print(k,'-->',v)

# R talks about iteritems - that was removed in Python 3,
# where items() itself returns a lazy view (like xrange -> range)

Raymond --> red
Rachel --> green
Matthew --> blue


### Creating a dictionary from pairs

In [24]:
d = dict(zip(names,colors))
d

{'Raymond': 'red', 'Rachel': 'green', 'Matthew': 'blue'}

### Counting with dictionaries


In [25]:
# most basic way
colors = ['red','green','red','blue','green','red']
d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1
d

{'red': 3, 'green': 2, 'blue': 1}

In [26]:
# next level: get
d = {}
for color in colors:
    d[color] = d.get(color, 0) + 1
    
d

{'red': 3, 'green': 2, 'blue': 1}

In [27]:
# ultra level - though the last one is still useful
from collections import defaultdict
d = defaultdict(int)
for color in colors:
    d[color] += 1
d

defaultdict(int, {'red': 3, 'green': 2, 'blue': 1})

### Grouping with dictionaries

In [28]:
names = ['raymond','rachel','matthew','roger',
        'betty','melissa','judith','charlie']

# basic idiom
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
d

{7: ['raymond', 'matthew', 'melissa', 'charlie'],
 6: ['rachel', 'judith'],
 5: ['roger', 'betty']}

In [29]:
# better way - setdefault - like get but has side effect of 
# inserting a missing key. Set default is crappy nondescriptive
# name, but hey ho
d = {}
for name in names:
    key = len(name)
    d.setdefault(key,[]).append(name)
d

{7: ['raymond', 'matthew', 'melissa', 'charlie'],
 6: ['rachel', 'judith'],
 5: ['roger', 'betty']}

In [30]:
# modern way
d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)
    
d

defaultdict(list,
            {7: ['raymond', 'matthew', 'melissa', 'charlie'],
             6: ['rachel', 'judith'],
             5: ['roger', 'betty']})

### `popitem()` in dictionaries

In [31]:
d = {'Raymond': 'red', 'Rachel': 'green', 'Matthew': 'blue'}

while d:
    key, value = d.popitem()
    print(key,'-->',value)
    
# popitem() is atomic apparently, which is important for threading

Matthew --> blue
Rachel --> green
Raymond --> red


### Linking dictionaries with `ChainMap`
Not touching this


In [32]:
import argparse
defaults = {'color':'red','user':'guest'}
parser = argparse.ArgumentParser()
parser.add_argument('-u','--user')
parser.add_argument('-c','--color')
namespace = parser.parse_args([])
command_line_args = {k:v for k,v in
                    vars(namespace).items() if v}

In [33]:
import os
d = defaults.copy()
d.update(os.environ)
d.update(command_line_args)
d
# this is bad apparently because it copies like crazy

{'color': 'red',
 'user': 'guest',
 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:...',
 ...}

In [34]:
from collections import ChainMap
d = ChainMap(command_line_args, os.environ, defaults)
d

ChainMap({}, environ({'LS_COLORS': 'rs=0:di=01;34:ln=01;36:...', ...}), ...)
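
The environment dump makes it hard to see what `ChainMap` is doing, so here is a smaller sketch with plain dicts. The `environment` and `command_line_args` values are made up for illustration; they stand in for `os.environ` and the parsed arguments.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
environment = {'user': 'alice'}          # hypothetical stand-in for os.environ
command_line_args = {'color': 'blue'}    # hypothetical parsed arguments

# lookups search the maps left to right, so earlier maps win;
# nothing is copied
settings = ChainMap(command_line_args, environment, defaults)
print(settings['color'], settings['user'])   # blue alice

# ChainMap holds references, so later changes to a layer show through
defaults['theme'] = 'dark'
print(settings['theme'])                     # dark
```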

## Other
### Put keyword arguments in all your functions
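
A minimal sketch of why this helps. The `search` function and its parameters are hypothetical, made up only to contrast the two call styles.

```python
def search(query, retweets=True, num_results=10, popular=False):
    # hypothetical function, only here to show the call styles
    return {'query': query, 'retweets': retweets,
            'num_results': num_results, 'popular': popular}

# positional: the reader has to look up what False, 20 and True mean
a = search('python', False, 20, True)

# keywords: the call documents itself
b = search('python', retweets=False, num_results=20, popular=True)

print(a == b)   # True
```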

### Named Tuples
Named tuples are better than plain tuples for making your outputs (including error messages) more readable. A named tuple is a subclass of `tuple`.


In [35]:
TestResults = (0,4)
TestResults

(0, 4)

In [36]:
from collections import namedtuple
TestResults = namedtuple('TestResults',['failed','attempted'])
TestResults
# this only defines the class; create an instance with
# TestResults(failed=0, attempted=4)

__main__.TestResults
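
The cell above only creates the class. A short sketch of actually using it, with the same field names:

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])
result = TestResults(failed=0, attempted=4)

print(result)               # TestResults(failed=0, attempted=4)
print(result.attempted)     # 4

# it is still a tuple, so unpacking and comparison work as before
failed, attempted = result
print(result == (0, 4))     # True
```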

### Getting values from a sequence

In [37]:
p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'
# note this is a tuple - the () are optional

In [38]:
# sucky way
fname = p[0]
lname = p[1]
age = p[2]
email = p[3]
fname

'Raymond'

In [39]:
# smart way
fname,lname,age,email = p
fname

'Raymond'

### Simultaneous updates
Very important - eliminates an entire class of errors due to out-of-order updates. Allows high-level thinking: 'chunking', thinking in Excel 'rows'.

In [40]:
# clunky way, with a temporary variable
def fibonacci(n):
    x=0
    y=1
    for i in range(n):
        print(x)
        t=y
        y=x+y
        x=t

fibonacci(10)

0
1
1
2
3
5
8
13
21
34


In [41]:
# smart way
def fibonacci(n):
    x,y = 0,1
    for i in range(n):
        print(x)
        x,y = y,x+y 
        # this is the simultaneous updates

fibonacci(10)

0
1
1
2
3
5
8
13
21
34


Don't build up strings by adding them with `+` in a loop; use `join`.
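
A quick sketch of the two approaches side by side, using the `names` list from earlier. Both give the same string; the difference is that `+` creates a new string on every pass.

```python
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

# repeated + builds a new string on every pass
s = names[0]
for name in names[1:]:
    s += ', ' + name

# join builds the result in one go
joined = ', '.join(names)

print(s == joined)   # True
```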

For lists, if you see `del names[0]`, `names.pop(0)`, or `names.insert(0,'mark')`, you're doing it wrong: each of these has to shift every other element in the list.

The correct data structure is `deque`, which is fast at both ends.

In [42]:
from collections import deque

names = deque(['raymond','rachel','matthew','roger',
        'betty','melissa','judith','charlie'])

del names[0]

In [43]:
names.popleft()

'rachel'

In [44]:
names.appendleft('mark')

In [45]:
names

deque(['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie'])

Decorators and context managers rock. Good naming is essential. Use wisely

In [46]:
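import urllib.request

def web_lookup(url, saved={}):
    # saved is a mutable default argument, so it persists between calls
    if url in saved:
        return saved[url]
    page = urllib.request.urlopen(url).read()
    saved[url] = page
    return page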

This has a business logic part (opening a url and returning a webpage) and an admin component (caching the webpage for future use and returning the cached version if it exists).

This is usually a bad idea - it mixes the two kinds of logic, and the cache logic is not reusable. It should be:

In [47]:
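import urllib.request
from functools import wraps

def cache(func):
    saved = {}
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrap(*args):
        if args in saved:
            return saved[args]  # return the cached result
        result = func(*args)
        saved[args] = result
        return result
    return wrap

@cache
def web_lookup(url):
    return urllib.request.urlopen(url).read()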
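
To check that a decorator like this really caches without hitting the network, here is a sketch that swaps the web lookup for a hypothetical `square` function and records each real call. The `calls` list is only there for the demonstration.

```python
from functools import wraps

def cache(func):
    saved = {}
    @wraps(func)
    def wrap(*args):
        if args not in saved:
            saved[args] = func(*args)   # only call through on a miss
        return saved[args]
    return wrap

calls = []

@cache
def square(n):
    # hypothetical stand-in for a slow lookup such as a web request
    calls.append(n)
    return n * n

print(square(4), square(4), square(5))   # 16 16 25
print(calls)                             # [4, 5]
```

The second `square(4)` never reaches the function body, which is the whole point. The standard library's `functools.lru_cache` offers a ready-made version of the same idea.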