# 6. Built-in Modules

Python takes a "batteries included" approach to the standard library.

The full set of standard modules is too large to cover in one chapter. But some of these built-in packages are so closely intertwined with idiomatic Python that they may as well be part of the language specification.


## Item 42: Define Function Decorators with functools.wraps

Python has special syntax for decorators that can be applied to functions. Decorators have the ability to run additional code before and after any calls to the functions they wrap. This allows them to access and modify input arguments and return values. This functionality can be useful for enforcing semantics, debugging, registering functions, and more.

For example, say you want to print the arguments and return value of a function call. This is especially helpful when debugging a stack of function calls from a recursive function.

In [1]:
def trace(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' % 
            (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fib(n):
    if n in (0, 1):
        return n
    return fib(n-1) + fib(n-2)

fib(3)

fib((1,), {}) -> 1
fib((0,), {}) -> 0
fib((2,), {}) -> 1
fib((1,), {}) -> 1
fib((3,), {}) -> 2


2

The `@` symbol is equivalent to calling the decorator on the function it wraps and assigning the return value to the original name **in the same scope**.

```python
fib = trace(fib)
```

This works well, but the function doesn't think it's named `fib`.

In [2]:
fib

<function __main__.trace.<locals>.wrapper>

This behavior is problematic because it undermines tools that do introspection, such as debuggers and object serializers. The `help` built-in function is one example: it reports the wrapper instead of `fib`.

The solution is to use the `wraps` helper function from the `functools` module. **Applying it to the wrapper function will copy all of the important metadata about the inner function to the outer function**.
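
As a minimal sketch of what gets copied, compare a bare wrapper with one that uses `wraps`. The names `plain`, `preserving`, `add_plain`, and `add_wrapped` are hypothetical and exist only for this illustration.

```python
from functools import wraps


def plain(func):
    # No metadata is copied: the result looks like "wrapper".
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving(func):
    # wraps copies __name__, __doc__, and more from func onto wrapper.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain
def add_plain(a, b):
    """Add two numbers."""
    return a + b


@preserving
def add_wrapped(a, b):
    """Add two numbers."""
    return a + b


print(add_plain.__name__, add_plain.__doc__)      # wrapper None
print(add_wrapped.__name__, add_wrapped.__doc__)  # add_wrapped Add two numbers.
# wraps also stores the original function as __wrapped__.
print(add_wrapped.__wrapped__(1, 2))              # 3
```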

In [5]:
from functools import wraps

def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' % 
            (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fib(n):
    """Return the n-th Fibonacci number."""
    if n in (0, 1):
        return n
    return fib(n-1) + fib(n-2)

fib(3)

fib((1,), {}) -> 1
fib((0,), {}) -> 0
fib((2,), {}) -> 1
fib((1,), {}) -> 1
fib((3,), {}) -> 2


2

In [6]:
fib

<function __main__.fib>

In [7]:
help(fib)

Help on function fib in module __main__:

fib(n)
    Return the n-th Fibonacci number.



Things to Remember
* Decorators are Python syntax for allowing one function to modify another function at runtime.
* Using decorators can cause strange behaviors in tools that do introspection, such as debuggers.
* Use the wraps decorator from the functools built-in module when you define your own decorators to avoid any issues.

## Item 43: Consider contextlib and with Statements for Reusable try/finally Behavior

The with statement in Python is used to indicate when code is running in a special context. For example, mutual exclusion locks (see Item 38: “Use Lock to Prevent Data Races in Threads”) can be used in with statements to indicate that the indented code only runs while the lock is held.

In [9]:
from multiprocessing import Lock

lock = Lock()
with lock:
    print('Lock is held')

Lock is held


In [11]:
# It's equivalent to
lock.acquire()
try:
    print('Lock is held')
finally:
    lock.release()

Lock is held


The `with` statement version of this is better because it **eliminates the need to write the repetitive code of the try/finally construction**. It’s easy to make your objects and functions capable of use in with statements by using the **contextlib** built-in module. This module contains the **contextmanager decorator**, which lets a simple function be used in with statements. This is much easier than defining a new class with the special methods `__enter__` and `__exit__` (the standard way).
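
To make the comparison concrete, here is a rough sketch of the same behavior written both ways. The `Recorder` class, the `recorder` function, and the `events` list are hypothetical names used only for this illustration.

```python
from contextlib import contextmanager

events = []


class Recorder:
    """Class-based context manager using the special methods."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        events.append(('enter', self.name))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        events.append(('exit', self.name))
        return False  # do not swallow exceptions


@contextmanager
def recorder(name):
    """Function-based equivalent: code before yield is the setup,
    code in finally is the cleanup."""
    events.append(('enter', name))
    try:
        yield
    finally:
        events.append(('exit', name))


with Recorder('class'):
    events.append(('body', 'class'))

with recorder('function'):
    events.append(('body', 'function'))

print(events)
```

Both versions record the same enter, body, exit sequence; the decorated function simply needs less scaffolding.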

For example, say you want a region of your code to have more debug logging sometimes. Here, I define a function that does logging at two severity levels:

In [12]:
import logging

logging.basicConfig(level=logging.WARNING)

def my_func():
    logging.debug('Some debug data')
    logging.error('Error log data')
    logging.debug('More debug data')
    
my_func()

In [15]:
import logging
from contextlib import contextmanager


def my_func():
    logging.debug('Some debug data')
    logging.error('Error log data')
    logging.debug('More debug data')


@contextmanager
def debug_logging(level):
    logger = logging.getLogger()
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(old_level)

with debug_logging(logging.DEBUG):
    print('Inside: ')
    my_func()

print('After: ')
my_func()


Inside: 
After: 


### Using with Targets

The context manager passed to a `with` statement may also return an object. This object is assigned to a local variable in the `as` part of the compound statement. This gives the code running in the `with` block the ability to directly interact with its context.

For example, say you want to write a file and ensure that it’s always closed correctly. You can do this by passing `open` to the `with` statement. `open` returns a file handle for the `as` target of `with` and will close the handle when the `with` block exits.

```python
with open('/tmp/my_output.txt', 'w') as handle:
    handle.write('This is some data!')
```

**This approach is preferable to manually opening and closing the file handle every time.** It gives you confidence that the file is eventually closed when execution leaves the with statement. It also encourages you to reduce the amount of code that executes while the file handle is open, which is good practice in general.

To enable your own functions to supply values for `as` targets, all you need to do is `yield` a value from your context manager.
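
A minimal sketch of that idea, assuming a hypothetical `log_level` helper and a logger named `'my-log'` (neither appears above): the context manager yields the `Logger` it configured, so the `with` block can use it directly.

```python
import logging
from contextlib import contextmanager


@contextmanager
def log_level(level, name):
    # Hypothetical helper: temporarily change one named logger's level.
    logger = logging.getLogger(name)
    old_level = logger.level  # the logger's own level, not the inherited one
    logger.setLevel(level)
    try:
        yield logger  # this value becomes the `as` target
    finally:
        logger.setLevel(old_level)


with log_level(logging.DEBUG, 'my-log') as logger:
    inside = logger.isEnabledFor(logging.DEBUG)
    logger.debug('This is my message!')

print(inside)  # True
print(logging.getLogger('my-log').level == logging.NOTSET)  # True
```

The sketch saves `logger.level` instead of the effective level so that a logger with no explicit level goes back to inheriting from its parent.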

Things to Remember
* The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
* The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements.
* The value yielded by context managers is supplied to the as part of the with statement. It’s useful for letting your code directly access the cause of the special context.

## Item 44: Make pickle Reliable with copyreg

The pickle built-in module can serialize Python objects into a stream of bytes and deserialize bytes back into objects. Pickled byte streams shouldn’t be used to communicate between untrusted parties. The purpose of pickle is to let you pass Python objects between programs that you control over binary channels.

**The pickle module’s serialization format is unsafe by design**. The serialized data contains what is essentially a program that describes how to reconstruct the original Python object. This means a malicious pickle payload could be used to compromise any part of the Python program that attempts to deserialize it.

In contrast, **the json module is safe by design**. Serialized JSON data contains a simple description of an object hierarchy. Deserializing JSON data does not expose a Python program to any additional risk. Formats like JSON should be used for communication between programs or people that don’t trust each other.
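
As a small illustration of the two formats, the sketch below round-trips the same data through each module. The `state` dictionary is made-up example data.

```python
import json
import pickle

state = {'level': 3, 'lives': 4}  # made-up example data

# pickle: a byte stream that only programs you control should ever load.
pickled = pickle.dumps(state)
restored_from_pickle = pickle.loads(pickled)

# json: plain text that merely describes the data.
encoded = json.dumps(state)
restored_from_json = json.loads(encoded)

print(type(pickled).__name__, type(encoded).__name__)  # bytes str
print(restored_from_pickle == state, restored_from_json == state)  # True True
```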

## Item 45: Use datetime Instead of time for Local Clocks

Coordinated Universal Time (UTC) is the standard, time-zone-independent representation of time. UTC works great for computers that represent time as seconds since the UNIX epoch. But UTC isn’t ideal for humans. Humans reference time relative to where they’re currently located. People say “noon” or “8 am” instead of “UTC 15:00 minus 7 hours.” If your program handles time, you’ll probably find yourself converting time between UTC and local clocks to make it easier for humans to understand.

Python provides two ways of accomplishing time zone conversions. The old way, using the `time` built-in module, is disastrously error prone. The new way, using the `datetime` built-in module, works great with some help from the community-built package named `pytz`.

You should be acquainted with both time and datetime to thoroughly understand why **datetime is the best choice and time should be avoided**.


In [12]:
from time import localtime, strftime

now = 1407694710
local_tuple = localtime(now)
time_format = '%Y-%m-%d %H:%M:%S'
time_str = strftime(time_format, local_tuple)
time_str

'2014-08-11 02:18:30'

In [13]:
from time import mktime, strptime

time_tuple = strptime(time_str, time_format)
utc_now = mktime(time_tuple)
print(utc_now)

1407694710.0


**The time module fails to consistently work properly for multiple local times.** Thus, you should avoid the time module for this purpose. If you must use time, only use it to convert between UTC and the host computer’s local time. **For all other types of conversions, use the `datetime` module.**

...

To use `pytz` effectively, you should **always convert local times to UTC first**. Perform any datetime operations you need on the UTC values (such as offsetting). Then, convert to local times as a final step.
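
The convert-to-UTC-first workflow can be sketched with the standard library alone. This example assumes fixed UTC offsets from `datetime.timezone` as stand-ins for real time zones; unlike `pytz`, a fixed offset knows nothing about daylight saving transitions. The zone names and the arrival time are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets standing in for real time zone definitions.
pacific_daylight = timezone(timedelta(hours=-7), 'PDT')
nepal = timezone(timedelta(hours=5, minutes=45), 'NPT')

# 1. Start from a local time and convert it to UTC first.
arrival_local = datetime(2014, 5, 1, 23, 33, 24, tzinfo=pacific_daylight)
arrival_utc = arrival_local.astimezone(timezone.utc)

# 2. Do the arithmetic on the UTC value.
later_utc = arrival_utc + timedelta(hours=2)

# 3. Convert to the target local time as the final step.
later_in_nepal = later_utc.astimezone(nepal)

print(arrival_utc)     # 2014-05-02 06:33:24+00:00
print(later_in_nepal)  # 2014-05-02 14:18:24+05:45
```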

## Item 46: Use Built-in Algorithms and Data Structures

When you’re implementing Python programs that handle a non-trivial amount of data, you’ll eventually see slowdowns caused by the algorithmic complexity of your code. This usually isn’t the result of Python’s speed as a language (see Item 41: “Consider concurrent.futures for True Parallelism” if it is). The issue, more likely, is that you aren’t using the best algorithms and data structures for your problem.

Luckily, the Python standard library has many of the algorithms and data structures you’ll need to use built in. Besides speed, using these common algorithms and data structures can make your life easier. Some of the most valuable tools you may want to use are tricky to implement correctly. Avoiding reimplementation of common functionality will save you time and headaches.

### Double-ended Queue

The `deque` class from the `collections` module is a double-ended queue. **It provides constant time operations for inserting or removing items from its beginning or end**. This makes it **ideal for FIFO queues**.

In [16]:
from collections import deque

fifo = deque()
fifo.append(1)
x = fifo.popleft()
x

1
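
The same constant-time behavior applies at both ends. A small sketch (the variable name `d` is illustrative only):

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)  # insert at the beginning
d.append(4)      # insert at the end
print(d)         # deque([1, 2, 3, 4])

first, last = d.popleft(), d.pop()  # remove from both ends
print(first, last)  # 1 4
print(d)            # deque([2, 3])
```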

### Ordered Dictionary

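In Python 3.6 and earlier, standard dictionaries are unordered. That means a `dict` with the same keys and values can result in different orders of iteration. This behavior is a surprising byproduct of the way the dictionary’s fast hash table is implemented. Since Python 3.7, `dict` preserves insertion order as part of the language, so the loop below only terminates on older interpreters.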

In [17]:
from random import randint

a = {}
a['foo'] = 1
a['bar'] = 2

while True:
    z = randint(99, 1013)
    b = {}
    for i in range(z):
        b[i] = i
    b['foo'] = 1
    b['bar'] = 2
    for i in range(z):
        del b[i]
    if str(b) != str(a):
        break
        
print(a)
print(b)

{'bar': 2, 'foo': 1}
{'foo': 1, 'bar': 2}


The `OrderedDict` class from the `collections` module is a special type of dictionary that **keeps track of the order in which its keys were inserted**. Iterating the keys of an `OrderedDict` has predictable behavior. This can vastly simplify testing and debugging by making all code deterministic.

In [18]:
from collections import OrderedDict

a = OrderedDict()
a['foo'] = 1
a['bar'] = 2

b = OrderedDict()
b['foo'] = 'red'
b['bar'] = 'blue'

for v1, v2 in zip(a.values(), b.values()):
    print(v1, v2)

1 red
2 blue


### Default Dictionary

Dictionaries are useful for bookkeeping and tracking statistics. One problem with dictionaries is that you can’t assume any keys are already present. That makes it clumsy to do simple things like increment a counter stored in a dictionary.

The `defaultdict` class from the `collections` module simplifies this by automatically storing a default value when a key doesn’t exist. All you have to do is **provide a function that will return the default value each time a key is missing**. In this example, the `int` built-in function returns 0 (see Item 23: “Accept Functions for Simple Interfaces Instead of Classes” for another example). Now, incrementing a counter is simple.

In [19]:
from collections import defaultdict

stats = defaultdict(int)
stats['counter'] += 1
print(stats['counter'])

print(stats['counter2'])

1
0


### Heap Queue

Heaps are useful data structures for maintaining a priority queue. The `heapq` module provides functions for **creating heaps in standard list types with functions like heappush, heappop, and nsmallest**.

Items of any priority can be inserted into the heap in any order.

**Items are always removed by highest priority (lowest number) first.**

In [22]:
from heapq import heappush, heappop

a = []
heappush(a, 5)
heappush(a, 3)
heappush(a, 7)
heappush(a, 4)

print(a)
print(heappop(a), heappop(a), heappop(a), heappop(a))

[3, 4, 7, 5]
3 4 5 7


Accessing the 0 index of the heap will always return the smallest item.

**`heappush` and `heappop` each take logarithmic time in proportion to the length of the list.** Doing the same work with a standard Python list would scale linearly.
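
A priority queue is often built by pushing `(priority, item)` tuples, which compare by priority first. The sketch below uses made-up task names and also shows `nsmallest`, which was mentioned earlier but not demonstrated.

```python
from heapq import heappush, heappop, nsmallest

tasks = []  # a plain list used as the heap
heappush(tasks, (2, 'write tests'))
heappush(tasks, (1, 'fix bug'))
heappush(tasks, (3, 'update docs'))

# Index 0 always holds the smallest (highest priority) entry.
print(tasks[0])             # (1, 'fix bug')

# nsmallest returns a sorted list without modifying the heap.
print(nsmallest(2, tasks))  # [(1, 'fix bug'), (2, 'write tests')]

# heappop removes and returns the smallest entry.
print(heappop(tasks))       # (1, 'fix bug')
print(len(tasks))           # 2
```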

### Bisection

Searching for an item in a list takes linear time proportional to its length when you call the index method.

The `bisect` module’s functions, such as `bisect_left`, provide an efficient binary search through a sequence of sorted items. The index it returns is the insertion point of the value into the sequence.

The complexity of a binary search is logarithmic. That means using `bisect` to search a list of 1 million items needs about 20 comparisons (log2 of 1,000,000 is roughly 20), which is comparable to using `index` to linearly search a list of about 20 items. **It’s way faster!**

In [34]:
from bisect import bisect_left

x = list(range(10**8))

In [35]:
i = x.index(99991234)

In [36]:
# much faster
i = bisect_left(x, 99991234)
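
Because `bisect_left` returns an insertion point, it gives an index even when the value is absent. The sketch below shows that behavior on a tiny list and wraps it in a hypothetical `contains` helper (not part of `bisect`) for membership tests.

```python
from bisect import bisect_left

scores = [10, 20, 30, 40]  # must already be sorted

print(bisect_left(scores, 30))  # 2 (index of the existing item)
print(bisect_left(scores, 25))  # 2 (where 25 would be inserted)
print(bisect_left(scores, 99))  # 4 (past the end)


def contains(sorted_items, value):
    """Hypothetical helper: binary-search membership test."""
    i = bisect_left(sorted_items, value)
    return i < len(sorted_items) and sorted_items[i] == value


print(contains(scores, 30), contains(scores, 25))  # True False
```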

### Iterator Tools

The `itertools` built-in module contains a large number of functions that are **useful for organizing and interacting with iterators** (see Item 16: “Consider Generators Instead of Returning Lists” and Item 17: “Be Defensive When Iterating Over Arguments” for background). Not all of these are available in Python 2, but they can easily be built using simple recipes documented in the module. See help(itertools) in an interactive Python session for more details.

**The itertools functions fall into three main categories**:

**Linking iterators together**
* chain: Combines multiple iterators into a single sequential iterator.
* cycle: Repeats an iterator’s items forever.
* tee: Splits a single iterator into multiple parallel iterators.
* zip_longest: A variant of the zip built-in function that works well with iterators of different lengths.

**Filtering items from an iterator**
* islice: Slices an iterator by numerical indexes without copying.
* takewhile: Returns items from an iterator while a predicate function returns True.
* dropwhile: Returns items from an iterator once the predicate function returns False for the first time.
* filterfalse: Returns all items from an iterator where a predicate function returns False. The opposite of the filter built-in function.

**Combinations of items from iterators**
* product: Returns the Cartesian product of items from an iterator, which is a nice alternative to deeply nested list comprehensions.
* permutations: Returns ordered permutations of length N with items from an iterator.
* combinations: Returns the unordered combinations of length N with unrepeated items from an iterator.

There are even more functions and recipes available in the itertools module that I don’t mention here. Whenever you find yourself dealing with some tricky iteration code, it’s worth looking at the itertools documentation again to see whether there’s anything there for you to use.
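
A few of the functions listed above, one or two from each category, can be tried out in a short sketch. The input values are arbitrary.

```python
from itertools import (
    chain, combinations, cycle, islice, product, takewhile, zip_longest,
)

# Linking iterators together
chained = list(chain([1, 2], [3, 4]))
padded = list(zip_longest([1, 2, 3], 'xy'))  # missing values become None
print(chained)  # [1, 2, 3, 4]
print(padded)   # [(1, 'x'), (2, 'y'), (3, None)]

# Filtering items from an iterator
first_five = list(islice(cycle('ab'), 5))  # slice an endless iterator
small = list(takewhile(lambda n: n < 3, [1, 2, 3, 1]))
print(first_five)  # ['a', 'b', 'a', 'b', 'a']
print(small)       # [1, 2]

# Combinations of items from iterators
pairs = list(product([1, 2], 'ab'))
choose_two = list(combinations('abc', 2))
print(pairs)       # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(choose_two)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```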

Things to Remember
* Use Python’s built-in modules for algorithms and data structures.
* Don’t reimplement this functionality yourself. It’s hard to get right.

## Item 47: Use decimal When Precision Is Paramount

Python is an excellent language for writing code that interacts with numerical data. Python’s integer type can represent values of any practical size. Its double-precision floating point type complies with the **IEEE 754 standard**. The language also provides a **standard complex number type** for imaginary values. However, these aren’t enough for every situation.

In [2]:
# cost of calling
rate = 1.45
seconds = 3*60 + 42
cost = rate * seconds / 60
cost

5.364999999999999

In [3]:
round(cost, 2)

5.36

In [4]:
rate = 0.05
seconds = 5
cost = rate * seconds / 60
print(cost)
print(round(cost, 2))

0.004166666666666667
0.0


The solution is to use the `Decimal` class from the `decimal` built-in module. The `Decimal` class provides decimal arithmetic with 28 significant digits of precision by default. It can go even higher if required. This works around the precision issues in IEEE 754 floating point numbers. The class also gives you more control over rounding behaviors.

In [8]:
from decimal import Decimal, ROUND_UP

rate = Decimal('1.45')
seconds = Decimal('222')
cost = rate * seconds / Decimal('60')
print(cost)

# rounding
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)

5.365
5.37


In [10]:
rate = Decimal('0.05')
seconds = Decimal('5')
cost = rate * seconds / Decimal('60')
print(cost)

# rounding
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)

0.004166666666666666666666666667
0.01
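
The 28 significant digits in the first output line come from the default context. As a rough sketch, assuming the default context has not been changed elsewhere in the session, the precision can be raised temporarily with `localcontext`:

```python
from decimal import Decimal, localcontext

# Default context: 28 significant digits.
default_third = Decimal(1) / Decimal(3)
print(default_third)

# Raise the precision only inside this block.
with localcontext() as ctx:
    ctx.prec = 50
    precise_third = Decimal(1) / Decimal(3)
print(precise_third)

print(len(str(default_third)) - 2, len(str(precise_third)) - 2)  # 28 50
```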


The `Decimal` class still has limitations in its precision (e.g., 1/3). For representing rational numbers with no limit to precision, consider using the **Fraction** class in the `fractions` module.
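
A brief sketch of the difference, reusing the 1/3 case and the call-cost numbers from earlier in this item:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal rounds 1/3 to a finite number of digits, so the error shows up.
decimal_third = Decimal(1) / Decimal(3)
print(decimal_third * 3 == 1)   # False

# Fraction stores the exact numerator and denominator.
fraction_third = Fraction(1, 3)
print(fraction_third * 3 == 1)  # True

# The call cost from above (1.45 per minute for 222 seconds), kept exact.
cost = Fraction(145, 100) * 222 / 60
print(cost)         # 1073/200
print(float(cost))  # 5.365
```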

## Item 48: Know Where to Find Community-Built Modules

Python has a central repository of modules (https://pypi.python.org) for you to install and use in your programs. These modules are built and maintained by people like you: the Python community. When you find yourself facing an unfamiliar challenge, the Python Package Index (PyPI) is a great place to look for code that will get you closer to your goal.

To use the Package Index, you’ll need to use a command-line tool named `pip`. `pip` is installed by default in Python 3.4 and above (it’s also accessible with `python -m pip`). For earlier versions, you can find instructions for installing `pip` on the Python Packaging website (https://packaging.python.org).

Once installed, using `pip` to install a new module is simple. For example, here I install the `pytz` module that I used in another item in this chapter (see Item 45: “Use datetime Instead of time for Local Clocks”):

```shell
$ pip install pytz
```

Each module in the PyPI has its own software license. Most of the packages, especially the popular ones, have free or open source licenses (see http://opensource.org for details). In most cases, these licenses allow you to include a copy of the module with your program (when in doubt, talk to a lawyer).

Things to Remember
* The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
* pip is the command-line tool to use for installing packages from PyPI.
* pip is installed by default in Python 3.4 and above; you must install it yourself for older versions.
* The majority of PyPI modules are free and open source software.