# Hettinger on iterables

Raymond Hettinger says that `for` should have been called `foreach`.

This is because you should almost _never_ need an index to look up the items of a list.

## `range`

The list of ten numbers below will be allocated entirely in memory:


In [2]:
print('handmade list:')

for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
    if i % 2 == 0:
        print(i)

handmade list:
0
2
4
6
8


In Python 2, `range` built exactly this kind of list, allocating every number in memory up front.

Then, to save memory, `xrange` was added: it produces the numbers lazily, one at a time.

Finally, Python 3 made `range` behave like `xrange`, returning a lazy `range` object instead of a list, so the memory-friendly behavior is now the default:


In [5]:
print('range() function:')

for i in range(0, 9, 2):
    print(i)

range() function:
0
2
4
6
8
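To see the difference, here is a small sketch comparing a Python 3 `range` object with the list it would produce. The one-million bound is arbitrary, and exact byte counts vary by Python build, so only the comparison is printed.

```python
import sys

lazy = range(0, 1_000_000)   # stores only start, stop and step
eager = list(lazy)           # materializes one million int objects

print(type(lazy).__name__)                          # range
print(sys.getsizeof(lazy) < sys.getsizeof(eager))   # True
print(lazy[10], 999_999 in lazy)                    # 10 True
```

A `range` still supports indexing and fast membership tests, so in most code it can stand in for the list without ever building it.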


Again: indices should almost _never_ be needed to look up list items.

This means that this C++ habit is not cool:

In [8]:
colors = ['red', 'green', 'blue', 'yellow']

print('crappy C++ style:')

for i in range(len(colors)):
    print(colors[i])

crappy C++ style:
red
green
blue
yellow


`for` in Python is just like `foreach` in C#:

In [10]:
print('cool Python style:')

for color in colors:
    print(color)

cool Python style:
red
green
blue
yellow


LINQ of the .NET Framework: `Enumerable.Range(start, count)` is the equivalent of `range(start, start + count)` in Python (note that the second .NET argument is a count, not a stop value).
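Why `for` behaves like `foreach`: it is driven by the iterator protocol, not by indices. The sketch below is a simplified model of what `for color in colors:` does (the real loop is implemented in C and also handles `break` and `else`); `seen` and `it` are just illustrative names.

```python
colors = ['red', 'green', 'blue', 'yellow']

# Roughly what `for color in colors: seen.append(color)` does:
seen = []
it = iter(colors)            # ask the iterable for an iterator
while True:
    try:
        color = next(it)     # fetch the next item
    except StopIteration:    # the iterator is exhausted
        break
    seen.append(color)

print(seen == colors)        # True
```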

## `reversed`

We can use `for` to loop backwards through a list using crappy C++ style:

In [15]:
colors = ['red', 'green', 'blue', 'yellow']

print('crappy C++ style:')

for i in range(len(colors) - 1, -1, -1):
    print(colors[i])

crappy C++ style:
yellow
blue
green
red


Or we can just use the `reversed` function:

In [16]:
print('cool Python style:')

for color in reversed(colors):
    print(color)

cool Python style:
yellow
blue
green
red


LINQ of the .NET Framework: `Enumerable.Reverse()` is the equivalent of `reversed(<sequence>)` in Python.
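A short sketch of what `reversed` actually returns: a lazy iterator over the list, not a reversed copy. It needs a sequence (something with `len()` and indexing, or a `__reversed__` method), so a one-shot generator is rejected with a `TypeError`.

```python
colors = ['red', 'green', 'blue', 'yellow']

backwards = reversed(colors)      # a lazy iterator; the list is not copied
first = next(backwards)
print(first)                      # yellow

try:
    reversed(c for c in colors)   # generators have no length or indexing
except TypeError as err:
    print('TypeError:', err)
```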

## `enumerate`

We can use `for` and `range` to loop over a list, showing indices, using crappy C++ style:

In [19]:
colors = ['red', 'green', 'blue', 'yellow']

print('crappy C++ style:')

for i in range(len(colors)):
    print(i, '-->', colors[i])

crappy C++ style:
0 --> red
1 --> green
2 --> blue
3 --> yellow


Or we can just use the `enumerate` function:

In [21]:
print('cool Python style:')

for i, color in enumerate(colors):
    print(i, '-->', color)

cool Python style:
0 --> red
1 --> green
2 --> blue
3 --> yellow


LINQ of the .NET Framework: `IEnumerable<string>.Select((color, i) => $"{i} --> {color}")` is the equivalent of the `enumerate(<iterable>)` loop above in Python.
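Two details of `enumerate` worth knowing, shown in a minimal sketch: it accepts an optional `start` argument (handy for 1-based numbering), and each item it yields is an `(index, value)` tuple, which is what the `for i, color in ...` unpacking relies on.

```python
colors = ['red', 'green', 'blue', 'yellow']

# enumerate() takes an optional `start`, handy for 1-based numbering
for n, color in enumerate(colors, start=1):
    print(n, '-->', color)

# each item enumerate() yields is an (index, value) tuple
pairs = list(enumerate(colors))
print(pairs[:2])                  # [(0, 'red'), (1, 'green')]
```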

## `zip`

We can use `for`, `range` and `min` to loop over two lists using crappy C++ style:

In [24]:
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

print('crappy C++ style:')

n = min(len(names), len(colors))
for i in range(n):
    print(names[i], '-->', colors[i])

crappy C++ style:
raymond --> red
rachel --> green
matthew --> blue


Or we can just use the `zip` function:

In [27]:
print('cool Python style:')

for name, color in zip(names, colors):
    print(name, '-->', color)

cool Python style:
raymond --> red
rachel --> green
matthew --> blue


LINQ of the .NET Framework: `IEnumerable<string>.Zip(IEnumerable<string>, (name, color) => $"{name} --> {color}")` is the equivalent of the `zip(<iterable>, <iterable>)` loop above in Python.

Raymond Hettinger says that `zip` goes back 50 years to the original paper on Lisp.

In Python 2, `zip` builds a complete third list of tuples in memory before the loop even starts, which does not take advantage of modern CPU caches.

So Raymond Hettinger recommends `itertools.izip` instead, which yields the pairs lazily, one at a time.

In Python 3.x, `zip` itself is lazy like `izip`, and `izip` no longer exists.
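A minimal Python 3 sketch of that laziness, plus the one behavior to watch for: `zip` silently stops at the shortest input. `itertools.zip_longest` (the Python 3 name of Python 2's `izip_longest`) pads instead; the `'?'` fill value is an arbitrary choice.

```python
from itertools import zip_longest

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

pairs = zip(names, colors)        # a lazy iterator, not a list
print(next(pairs))                # ('raymond', 'red')

# zip() stops at the shortest input, so 'yellow' is silently dropped ...
print(len(list(zip(names, colors))))      # 3

# ... while zip_longest() pads the shorter input with a fill value
longest = list(zip_longest(names, colors, fillvalue='?'))
print(longest[-1])                # ('?', 'yellow')
```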

## `sorted`

The `sorted()` function returns a new sorted list and leaves the original untouched, so there is no crappy C++ index juggling to compare it with:

In [29]:
colors = ['red', 'green', 'blue', 'yellow']

for color in sorted(colors):
    print(color)

print('\nsort descending:')

for color in sorted(colors, reverse=True):
    print(color)

blue
green
red
yellow

sort descending:
yellow
red
green
blue


When a custom sort order is required, old C `qsort` habits tend to creep in: a comparison function that returns -1, 0 or 1:

In [31]:
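from functools import cmp_to_key


def compare_length(c1, c2):
    if len(c1) < len(c2): return -1
    if len(c1) > len(c2): return 1
    return 0

print('crappy C++ style with `cmp=`:')

try:
    print(sorted(colors, cmp=compare_length))  # Python 2 only
except TypeError:
    # Python 3 removed `cmp=`; cmp_to_key() wraps an old-style comparison function
    print(sorted(colors, key=cmp_to_key(compare_length)))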


crappy C++ style with `cmp=`:
['red', 'blue', 'green', 'yellow']


Use the `key` parameter of the `sorted()` function before resorting to qsort habits:

In [33]:
print('\ncool Python style:')

print(sorted(colors, key=len))


cool Python style:
['red', 'blue', 'green', 'yellow']


LINQ of the .NET Framework: `IEnumerable<string>.OrderBy(color => color.Length)` is the equivalent of `sorted(<iterable>, key=len)` here in Python.
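A few more `key=` patterns, as a sketch (the extra `'pink'` is added only to create a tie in length). A tuple key gives a secondary sort order, and `reverse=True` keeps the sort stable, so items with equal keys stay in their original order.

```python
colors = ['red', 'green', 'blue', 'yellow', 'pink']

# A tuple key sorts by length first, then alphabetically to break ties
by_len_then_name = sorted(colors, key=lambda c: (len(c), c))
print(by_len_then_name)   # ['red', 'blue', 'pink', 'green', 'yellow']

# key= and reverse= combine; equal keys ('blue', 'pink') keep their order
by_len_desc = sorted(colors, key=len, reverse=True)
print(by_len_desc)        # ['yellow', 'green', 'blue', 'pink', 'red']

print(colors)             # the original list is left untouched
```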

## `iter`

Raymond Hettinger wants to show us that Python's `iter` remedies the traditional approach to the sentinel-value (or control-break) pattern.

Before he can do that, he has to show us the crappy way it was done in, say, C++.

His example uses `open` to read a text file in fixed-size blocks:

In [35]:
from functools import partial
import os

txt = f'{os.getcwd()}/05-iter.txt'  # __file__ is not defined inside a notebook
size = 8

try:
    with open(txt) as f:

        print('crappy C++ style:')

        blocks = []
        while True:
            block = f.read(size)
            if block == '':
                break
            blocks.append(block)

        print(blocks)

    with open(txt) as f:

        print('\ncool Python style:')

        blocks = []
        for block in iter(partial(f.read, size), ''):
            blocks.append(block)

        print(blocks)

except IOError as err:
    print(err)

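SyntaxError: invalid syntax (<ipython-input-35-7f61a9167b48>, line 4)

The notebook was run under a Python 2 kernel, which rejects the f-string on line 4; f-strings require Python 3.6 or later.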

The two-argument form `iter(callable, sentinel)` keeps calling `callable()` with no arguments until it returns `sentinel`. Because `f.read` needs a size, `partial(f.read, size)` wraps it into a callable that takes no arguments.
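Since the cell above depends on a file that is not included here, this is a self-contained sketch of the same pattern, assuming an `io.StringIO` with made-up contents stands in for the open text file. `lambda: f.read(8)` would work in place of the `partial` just as well.

```python
import io
from functools import partial

# io.StringIO stands in for an open text file, so no file is needed
f = io.StringIO('abcdefghijklmnopqrst')

# iter(callable, sentinel): call f.read(8) until it returns '' (end of file)
blocks = list(iter(partial(f.read, 8), ''))
print(blocks)   # ['abcdefgh', 'ijklmnop', 'qrst']
```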

## `for`-`else`

We can use `for` and `enumerate` to find the index of a list item using crappy C++ style:

In [37]:
colors = ['red', 'green', 'blue', 'yellow']

print('crappy C++ style:')


def find_cpp_style(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i

print(find_cpp_style(colors, 'blue'))

crappy C++ style:
2


Or we can use the `for`-`else` construct:

In [40]:
print('cool Python style:')


def find_py_style(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        return -1
    return i

print(find_py_style(colors, 'blue'))

cool Python style:
2


Raymond Hettinger says that inside every `for` loop there is an `if` and a `goto`, which is why attaching an `else` to a `for` kind of makes sense: the `else` runs only when the loop ends without hitting `break`.

Raymond would have preferred the keyword to be `nobreak` instead of `else`.
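The example above only exercises the case where `break` is taken. This sketch reuses `find_py_style` to show the other paths: when the target is missing, or the sequence is empty, the loop finishes without `break` and the `else` runs. (For a plain list, `colors.index('blue')` also works, but it raises `ValueError` instead of returning -1.)

```python
def find_py_style(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:  # reached only when the loop ran to completion without `break`
        return -1
    return i


colors = ['red', 'green', 'blue', 'yellow']
print(find_py_style(colors, 'blue'))    # 2: break taken, else skipped
print(find_py_style(colors, 'purple'))  # -1: no break, so else ran
print(find_py_style([], 'blue'))        # -1: an empty loop never breaks
```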

## dictionary

Raymond Hettinger says that mastering dictionaries is a _fundamental_ Python skill and he starts his Python classes with three days of work on dictionary relationships, counting, grouping and linking.

There are two ways to loop over a dictionary's keys: directly, when the loop only reads the dictionary, or over a copy of the keys, when the loop mutates it:

In [42]:
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

print('dictionary key-value immutable (default) relationship:')

for k in d:
    print(k)

print('\ndictionary key-value mutable relationship:')

for k in list(d.keys()):  # copy the keys: deleting while iterating over d itself raises RuntimeError
    if k.startswith('r'):
        del d[k]
print(d)

dictionary key-value immutable (default) relationship:
matthew
rachel
raymond

dictionary key-value mutable relationship:
{'matthew': 'blue'}
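Here is a sketch of why the `list()` copy matters in Python 3, and of one alternative: building a new, filtered dictionary with a dict comprehension instead of deleting in place. The name `original` is only used to keep the starting data around.

```python
original = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# Deleting keys while iterating over the live dict fails in Python 3
d = dict(original)
error = None
try:
    for k in d:
        if k.startswith('r'):
            del d[k]
except RuntimeError as err:
    error = err
print('RuntimeError:', error)   # dictionary changed size during iteration

# Alternative to copying the keys: build a new, filtered dict
d = {k: v for k, v in original.items() if not k.startswith('r')}
print(d)                        # {'matthew': 'blue'}
```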


When the loop needs both keys and values, looking each value up by its key is the one anti-pattern to avoid in Python 3.x; loop over `items()` instead:

In [44]:
print('dictionary key-value relationship (anti-pattern):')

for k in d:
    print(k, '-->', d[k])  # causes re-hashing with every lookup

print('\ndictionary key-value relationship:')

for k, v in d.items():
    print(k, '-->', v)

dictionary key-value relationship (anti-pattern):
matthew --> blue

dictionary key-value relationship:
matthew --> blue


In Python 2.x, `iteritems()` was preferred over `items()` because `items()` built a complete list of `(key, value)` tuples in memory.

In Python 3.x, `items()` returns a lightweight view with the same efficiency, and `iteritems()` no longer exists.
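A small sketch of what "view" means: `d.items()` copies nothing, so it reflects changes made to the dictionary after the view was created.

```python
d = {'matthew': 'blue', 'rachel': 'green'}

items = d.items()         # a view onto d, not a copied list
d['raymond'] = 'red'      # modify the dict after creating the view

print(len(items))                     # 3: the view sees the new entry
print(('raymond', 'red') in items)    # True
```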

We can also generate a dictionary (`dict`) with `zip`, _linking_ two lists:

In [46]:
print('\n`dict` and `zip`:')

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

print(dict(zip(names, colors)))


`dict` and `zip`:
{'raymond': 'red', 'rachel': 'green', 'matthew': 'blue'}
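Two variations on the same linking idea, as a sketch: `enumerate` can supply integer keys when positions are all you need, and swapping the arguments to `zip` inverts the link. The output order assumes Python 3.7+, where dictionaries preserve insertion order.

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

# enumerate() supplies integer keys when positions are all you need
by_position = dict(enumerate(names))
print(by_position)   # {0: 'raymond', 1: 'rachel', 2: 'matthew'}

# swapping the order of the pair inverts the link: color -> name
by_color = dict(zip(colors, names))
print(by_color)      # {'red': 'raymond', 'green': 'rachel', 'blue': 'matthew'}
```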


Raymond Hettinger shows us three ways to count with a dictionary.

First, we see the beginner's way:

In [48]:
colors = ['red', 'green', 'red', 'blue', 'green', 'red']

print('\nbeginners counting:')

d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1
print(d)


beginners counting:
{'red': 3, 'green': 2, 'blue': 1}


Then we discover the `get()` method of `dict`:

In [50]:
print('\ncounting with `dict.get()`:')

d = {}
for color in colors:
    d[color] = d.get(color, 0) + 1
print(d)


counting with `dict.get()`:
{'red': 3, 'green': 2, 'blue': 1}


Then we get more modern with a subclass of `dict`, the `defaultdict`:

In [52]:
from collections import defaultdict

print('\ncounting with `defaultdict`:')

d = defaultdict(int)
for color in colors:
    d[color] += 1
print(d)


counting with `defaultdict`:
defaultdict(<class 'int'>, {'red': 3, 'green': 2, 'blue': 1})
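How `defaultdict(int)` makes `d[color] += 1` work, in a minimal sketch: on a missing key it calls the factory (`int()`, which returns 0) and stores the result. One consequence worth knowing is that a plain lookup inserts the key, while an `in` test does not.

```python
from collections import defaultdict

d = defaultdict(int)          # int() returns 0 for any missing key
d['red'] += 1                 # missing -> d['red'] = int() -> 0, then += 1
print(d['red'])               # 1

print('blue' in d)            # False: membership tests never create keys
print(d['blue'])              # 0: but a plain lookup inserts the default
print('blue' in d)            # True
print(dict(d))                # {'red': 1, 'blue': 0}
```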


We have our traditional way of grouping:

In [54]:
names = [
    'raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith',
    'charlie'
]

print('\ntraditional grouping (by string length):')

d = {}
for name in names:
    key = len(name)  # grouping by string length
    if key not in d:
        d[key] = []
    d[key].append(name)
print(d)


traditional grouping (by string length):
{7: ['raymond', 'matthew', 'melissa', 'charlie'], 6: ['rachel', 'judith'], 5: ['roger', 'betty']}


Then, we have our traditional, Python-specific way of grouping with `dict.setdefault()`:

In [56]:
print('traditional, Python-specific way of grouping:')

d = {}
for name in names:
    key = len(name)  # grouping by string length
    d.setdefault(key, []).append(name)
print(d)

traditional, Python-specific way of grouping:
{7: ['raymond', 'matthew', 'melissa', 'charlie'], 6: ['rachel', 'judith'], 5: ['roger', 'betty']}


The modern way of grouping returns again to `defaultdict`:

In [58]:
print('modern way of grouping with `defaultdict`:')

d = defaultdict(list)
for name in names:
    key = len(name)  # grouping by string length
    d[key].append(name)
print(d)

modern way of grouping with `defaultdict`:
defaultdict(<class 'list'>, {7: ['raymond', 'matthew', 'melissa', 'charlie'], 6: ['rachel', 'judith'], 5: ['roger', 'betty']})


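Finally, linking dictionaries: the classic example layers default settings, environment variables and command-line arguments (parsed with `argparse`), so that command-line arguments override environment variables, which in turn override the defaults: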

In [59]:
import argparse
import os

defaults = {'color': 'red', 'user': 'guest'}
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args([])
command_line_args = {k: v for k, v in vars(namespace).items() if v}

print('\ntraditional, memory-intensive way of linking dictionaries:')

d = defaults.copy()
d.update(os.environ)
d.update(command_line_args)

print(d)

print('\nmodern way of linking dictionaries:')

from collections import ChainMap

d = ChainMap(command_line_args, os.environ, defaults)  # searched left to right

print(d)


traditional, memory-intensive way of linking dictionaries:
(a long dictionary of the author's environment variables plus `'color': 'red'`; truncated in the original output)

ImportError: cannot import name ChainMap

`ChainMap` was added to `collections` in Python 3.3; the import fails here because the notebook was run under a Python 2 kernel.
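Since printing the real `os.environ` produces an unwieldy dump, here is a small self-contained sketch of how a `ChainMap` resolves lookups. `environment` and `command_line_args` are hypothetical stand-ins for `os.environ` and the parsed arguments.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
environment = {'user': 'admin'}        # hypothetical stand-in for os.environ
command_line_args = {'color': 'blue'}  # hypothetical parsed arguments

# Lookups search the maps from left to right; nothing is copied
settings = ChainMap(command_line_args, environment, defaults)
print(settings['color'], settings['user'])  # blue admin

# The chain is live: a change to an underlying dict shows through
defaults['timeout'] = 30
print(settings['timeout'])                  # 30

# Writes always go to the first mapping
settings['user'] = 'rachel'
print(command_line_args)                    # {'color': 'blue', 'user': 'rachel'}
```

This is what makes it less memory-intensive than `copy()` plus `update()`: the original dictionaries are only referenced, never merged.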

## `namedtuple`

Raymond Hettinger introduces the `namedtuple()` factory function in the context of `doctest` where the output of `doctest.testmod()` is a named tuple:

In [61]:
import doctest

print(doctest.testmod())

TestResults(failed=0, attempted=0)


We then replicate this output by defining our own named tuple:

In [63]:
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])

print(TestResults(0, 4))

TestResults(failed=0, attempted=4)
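A named tuple is still a tuple, just with readable field names. A short sketch of the everyday operations: access by name or position, unpacking, `_replace()` (which returns a new tuple, since tuples are immutable) and `_asdict()`.

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])
r = TestResults(failed=0, attempted=4)

print(r.attempted, r[1])          # 4 4: access by name or by position
failed, attempted = r             # unpacks like a plain tuple
print(r._replace(failed=1))       # TestResults(failed=1, attempted=4)
print(dict(r._asdict()))          # {'failed': 0, 'attempted': 4}
```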


## sequence unpacking

In [64]:
p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'

print('unpacking the c-language-neutral (crappy) way:')

fname = p[0]
lname = p[1]
age = p[2]
email = p[3]

print(fname, lname, age, email)

print('\nunpacking the Python way:')

fname, lname, age, email = p

print(fname, lname, age, email)

unpacking the c-language-neutral (crappy) way:
Raymond Hettinger 48 python@example.com

unpacking the Python way:
Raymond Hettinger 48 python@example.com
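Python 3 also supports extended unpacking with a starred name, which collects the leftover items into a list. A sketch using the same tuple; note that unpacking into the wrong number of names raises `ValueError`.

```python
p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'

fname, lname, *rest = p           # extended unpacking (Python 3)
print(rest)                       # [48, 'python@example.com']

*_, email = p                     # keep only the last field
print(email)                      # python@example.com

error = None
try:
    fname, lname = p              # four values, only two names
except ValueError as err:
    error = err
print('ValueError:', error)
```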


## updating multiple variables

In [66]:
def fibonacci_basic(n):
    x = 0
    y = 1
    for i in range(n):
        print(i, '-->', x)
        t = y
        y = x + y
        x = t


print('basic function:')
fibonacci_basic(10)  # prints as it goes; there is no return value to print


def fibonacci_modern(n):
    x, y = 0, 1
    for i in range(n):
        print(i, '-->', x)
        x, y = y, x + y  # no temp variables!


print('\nmodern function:')
fibonacci_modern(10)

basic function:
0 --> 0
1 --> 1
2 --> 1
3 --> 2
4 --> 3
5 --> 5
6 --> 8
7 --> 13
8 --> 21
9 --> 34

modern function:
0 --> 0
1 --> 1
2 --> 1
3 --> 2
4 --> 3
5 --> 5
6 --> 8
7 --> 13
8 --> 21
9 --> 34
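Why `x, y = y, x + y` needs no temporary variable: the whole right-hand side is evaluated into a tuple before anything is assigned. This sketch contrasts it with updating one variable at a time, which silently uses the already-changed value.

```python
# Tuple assignment: the whole right side is evaluated first, then unpacked
x, y = 0, 1
x, y = y, x + y          # right side is (1, 0 + 1) == (1, 1)
print(x, y)              # 1 1

# Updating one variable at a time uses the already-changed value
a, b = 0, 1
a = b                    # a is now 1 ...
b = a + b                # ... so b becomes 2, not 1
print(a, b)              # 1 2
```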


## join

In [68]:
names = [
    'raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith',
    'charlie'
]

print('do not do this:')

s = names[0]
for name in names[1:]:
    s += ', ' + name
print(s)


print('\nuse `join()`:')

print(', '.join(names))

do not do this:
raymond, rachel, matthew, roger, betty, melissa, judith, charlie

use `join()`:
raymond, rachel, matthew, roger, betty, melissa, judith, charlie
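The repeated `+=` version can copy the growing string on every step, while `join` builds the result in one pass. One catch, shown in a sketch with made-up numbers: `join` accepts only strings, so other items have to be converted first.

```python
ages = [48, 35, 29]

# join() raises TypeError on non-string items ...
try:
    ', '.join(ages)
except TypeError as err:
    print('TypeError:', err)

# ... so convert each item first
joined = ', '.join(map(str, ages))
print(joined)   # 48, 35, 29
```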


## `deque`

In [70]:
names = [
    'raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith',
    'charlie'
]

print('this is way too slow:')

del names[0]
names.pop(0)
names.insert(0, 'mark')

print(names)

print('\nuse the `deque` container instead:')

from collections import deque

names = deque([
    'raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith',
    'charlie'
])

del names[0]
names.popleft()
names.appendleft('mark')

print(names)

this is way too slow:
['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie']

use the `deque` container instead:
deque(['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie'])
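The reason for the speed difference: `list.pop(0)` and `list.insert(0, x)` have to shift every remaining element (O(n)), while `deque.popleft()` and `deque.appendleft()` are O(1). A deque can also be bounded with `maxlen`, which makes it a handy "last N items" buffer, as in this sketch.

```python
from collections import deque

recent = deque(maxlen=3)          # keeps only the three newest items
for name in ['raymond', 'rachel', 'matthew', 'roger']:
    recent.append(name)           # 'raymond' falls off the left end
print(recent)                     # deque(['rachel', 'matthew', 'roger'], maxlen=3)

recent.rotate(1)                  # move the last item to the front
rotated = list(recent)
print(rotated)                    # ['roger', 'rachel', 'matthew']
```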


## urlopen and cache decorator

In [71]:
import urllib.request
from functools import lru_cache


def web_lookup_old_school(url, saved={}):
    # the mutable default `saved` is created once and reused as a cache across calls
    if url in saved:
        return saved[url]
    with urllib.request.urlopen(url) as req:
        page = req.read()
        saved[url] = page
        return page


@lru_cache(maxsize=32)
def web_lookup_modern(url):
    with urllib.request.urlopen(url) as req:
        return req.read()

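ImportError: No module named request

The notebook was run under a Python 2 kernel; `urllib.request` and `functools.lru_cache` exist only in Python 3 (`lru_cache` since 3.2).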
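To watch the cache work without touching the network, here is a sketch with a hypothetical `slow_square` function standing in for `web_lookup_modern`. The `calls` list records each time the function body actually runs, and `cache_info()` reports the hits and misses.

```python
from functools import lru_cache

calls = []  # records every call that actually runs the function body


@lru_cache(maxsize=32)
def slow_square(n):
    calls.append(n)
    return n * n


print(slow_square(4), slow_square(4), slow_square(5))  # 16 16 25
print(calls)                       # [4, 5]: the second slow_square(4) was a cache hit
print(slow_square.cache_info())    # CacheInfo(hits=1, misses=2, maxsize=32, currsize=2)
```

Unlike the `saved={}` version, `lru_cache` also evicts the least recently used entries once `maxsize` is reached, so the cache cannot grow without bound.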