<img src='img/logo.png'>
<img src='img/title.png'>
<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
	* [Pythonics](#Pythonics)
	* [Simple idioms](#Simple-idioms)
	* [Avoid global variables](#Avoid-global-variables)
	* [Rationalize imports (nothing circular)](#Rationalize-imports-%28nothing-circular%29)
	* [Avoid type or class checking](#Avoid-type-or-class-checking)
		* [Consider using Abstract Base Classes](#Consider-using-Abstract-Base-Classes)
		* [Even better, use *duck typing*](#Even-better,-use-*duck-typing*)
	* [Equality isn't identity](#Equality-isn't-identity)
	* [Use docstrings](#Use-docstrings)
		* [Usage](#Usage)
		* [Functions](#Functions)
		* [Modules](#Modules)
	* [Iteration tricks](#Iteration-tricks)
		* [Exercise (rewrite higher-order funcs as comprehension)](#Exercise-%28rewrite-higher-order-funcs-as-comprehension%29)
		* [(Potentially) infinite sequences](#%28Potentially%29-infinite-sequences)
	* [Pick good collection types](#Pick-good-collection-types)
	* [Context managers](#Context-managers)
	* [Conventions](#Conventions)
		* [Style](#Style)


# Learning Objectives:

After completion of this module, learners should be able to:

* Design better code using:
  * 'pythonic' idioms
  * conventions
  * PEP8

Don't reinvent the wheel: check whether the problem you want to solve has already been solved, in the standard library or elsewhere.

## Pythonics

List of examples to expand upon
  * List comprehensions
  * Generator expressions
  * Generators
  * Loop and Iterator techniques (zip, enumerate->itertools->toolz)
  * Avoid index counters
  * Contexts
  * Use of _
  * Documentation?
  * Avoid type() -> assert and fail early
  * Custom exceptions and defensive programming
  * `__main__` and scripts/CLI
  * Partial functions? Closures?

Many of these are described in more detail in Advanced Python; they are listed here for completeness, as special Python idioms worth remembering.
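As a quick taste of the "avoid index counters" item above, here is a minimal sketch using the built-ins `enumerate` and `zip`. The sample data (`names`, `prices`) is made up purely for illustration.

```python
# Hypothetical sample data, used only for illustration
names = ['ham', 'eggs', 'spam']
prices = [2.5, 1.0, 4.0]

# enumerate yields (index, item) pairs, so no manual counter is needed
numbered = [(i, name) for i, name in enumerate(names, start=1)]
print(numbered)

# zip walks several sequences in step
menu = dict(zip(names, prices))
for name, price in zip(names, prices):
    print(name, price)
```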

## Simple idioms

In [None]:
# un/packing tuples
a, b = (1, 2)
a, b = b, a
out = a, b   # commas define tuples, not parens
print(*out)

In [None]:
# avoiding index variables, e.g.,
(4 in range(9), 
 'requirements:\n' in open('data/graphviz-meta.yaml'),
 'D' in "Martin Durant"
)
# rather than looping or using find functions

In [None]:
# similarly for dictionaries
mydict = {'a': 0, 'b': 1}
if 'a' in mydict:
    print("no need for has_key()")
    
print(mydict.get('c', 'Default Value'))

In [None]:
# "empty" containers, None, and zero-like values are Falsy, everything else Truthy:
if None:
    print("Never happens")

mylist = ['anything', 5]
if mylist:
    print("non-empty thing")
# so you never need to write
if len(mylist) > 0:
    print('also non-empty thing')

## Avoid global variables
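
One possible illustration of the heading above, offered as a sketch rather than a rule: the first function depends on hidden module-level state, while the second receives and returns its state explicitly, which tends to be easier to test and reason about. All names here are hypothetical.

```python
# Hypothetical example: a running total kept in module-level state
total = 0

def add_to_total_global(x):
    global total          # hidden dependency on module state
    total += x
    return total

# The same logic with the state passed in and returned explicitly
def add_to_total(current, x):
    return current + x

running = 0
for value in [1, 2, 3]:
    running = add_to_total(running, value)
print(running)
```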

## Rationalize imports (nothing circular)

**Bad**

`modA.py`:
```python
import modB

def function1(x):
    y = modB.function1(x)
    # return something else

def function2(x):
    ...  # return something
```

`modB.py`:
```python
import modA

def function1(x):
    ...  # return something
    
def function2(x):
    y = modA.function2(x)
    # return something else
```

**Better**

`modC.py`:
```python
def function1(x):
    ...  # return something
    
def function2(x):
    ...  # return something else
```

`modA.py`:
```python
import modC

def function1(x):
    y = modC.function1(x)
    # return something
```

`modB.py`:
```python
import modC

def function2(x):
    y = modC.function2(x)
    # return something
```

## Avoid type or class checking

In [None]:
# Either assume and allow exception or use isinstance
import collections
odict = collections.OrderedDict([('a', 1), ('b', 2)])

def process_dict(d):
    "ONLY works on dictionaries"
    assert type(d) == dict, 'Wrong type' # Fails for subclasses
    # Process d...
process_dict(odict)  # AssertionError: an OrderedDict is not exactly a dict

In [None]:
isinstance(odict, dict)

### Consider using Abstract Base Classes

In [None]:
from collections.abc import Mapping
isinstance(odict, Mapping)

In [None]:
import src.mapping as mapping
shout = mapping.ShoutMap()

# An admittedly odd user mapping class
print(shout['Martin Durant'], len(shout), [x for x in shout])

isinstance(shout, dict), isinstance(shout, Mapping)

In [None]:
mapping??
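
The source of `src.mapping` is not reproduced here, so the following is only a sketch of what a user-defined mapping could look like; it is not the actual `ShoutMap`. The class name `UpperMap` and its behaviour are assumptions made for illustration. Subclassing `collections.abc.Mapping` requires just `__getitem__`, `__iter__` and `__len__`; methods such as `.get()` and `.items()` are then supplied by the ABC.

```python
from collections.abc import Mapping

class UpperMap(Mapping):
    """Hypothetical read-only mapping: each key maps to its upper-case form."""

    def __init__(self, keys):
        self._keys = list(keys)

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        return key.upper()

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

upper = UpperMap(['spam', 'eggs'])
print(upper['spam'], len(upper), list(upper))
print(isinstance(upper, dict), isinstance(upper, Mapping))
# Mixin methods are provided by the ABC
print(list(upper.items()), upper.get('ham', 'missing'))
```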

### Even better, use *duck typing*

Most of the time in Python you aren't interested in "what something is" but "what it can do."  This is known as "duck typing" after the expression "if it walks like a duck, and quacks like a duck, let's call it a duck."

In [None]:
def process_dict(d):
    # Just try to do some operations; catch problem if they aren't supported
    try:
        keys = d.keys()
        for key in keys:
            do_something(d[key])  # do_something: whatever processing you need
    except (AttributeError, TypeError):
        print("Object does not have both .keys() and .__getitem__()")
        raise

## Equality isn't identity

In [None]:
# is and == are not necessarily the same
x = [1, 2, 3]
y = [1, 2, 3]
print(x == y)
print(x is y) # are they the very same object?
# why is this important?

...but DO use `is` when comparing against `None`:

```python
if x is None: ...
if y is not None: ...
```
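
One reason to prefer `is None`, sketched with a deliberately contrived class (`AlwaysEqual` is hypothetical): `==` can be overridden by the object being compared, whereas identity cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose instances claim to equal everything."""
    def __eq__(self, other):
        return True

thing = AlwaysEqual()
print(thing == None)   # True: the overridden __eq__ was consulted
print(thing is None)   # False: identity cannot be overridden
```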

## Use docstrings

Docstrings for classes, modules, functions and methods are much better than comments within the code. You want the information to pop up in help and auto-generated docs, rather than forcing people to read through the source files to see how things are supposed to work.

See [PEP 257](https://www.python.org/dev/peps/pep-0257/)

See [NumPy Docstrings Standard](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt)

### Usage

For Python code blocks (packages, modules, classes, methods and functions), a string placed just after the declaration is a `docstring`. The string becomes the `__doc__` attribute of the object and can be used with the `help()` function or read by a utility like [Docutils](http://docutils.sourceforge.net) to generate documentation directly from the code.

By convention use triple quotes for `docstrings`.

In [None]:
# single line
def nudge():
    """As good as wink to a blind bat, eh?"""

In [None]:
help(nudge)

In [None]:
# multiple lines
class DeadParrot(object):
    '''It's not pining. It's passed on.
   
    It's rung down the curtain and
    joined the choir invisible.
    This is an ex-parrot!
    '''

In [None]:
help(DeadParrot)
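
Besides `help()`, the `__doc__` attribute mentioned above can be read directly, and the standard-library function `inspect.getdoc` returns it with indentation normalised. A small sketch, with a made-up function `lumberjack`:

```python
import inspect

def lumberjack():
    """Report on how the lumberjack is doing.

    He sleeps all night and works all day.
    """
    return "I'm OK"

print(lumberjack.__doc__)          # the stored docstring
print(inspect.getdoc(lumberjack))  # the docstring with indentation normalised
```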

### Functions

* Don't repeat the function signature in the docstring; `help()` already displays it, and a stated return type is not enforced.
* For methods, don't document the `self` argument.

In [None]:
# Pointless docstring
def compound_interest_v1(n, r, A0):
    '''compound_interest(n,r,A0) -> number'''
    print('n =', n)
    print('r =', r)
    print('A0 =', A0)
    return A0*(1+0.01*r)**n

In [None]:
help(compound_interest_v1)

In [None]:
# the better way
def compound_interest_v2(period, rate, principal):
    '''Compute the value of a principal investment after compound interest
    
    Arguments:
        period: number of compounding periods
        rate: interest rate per period, in percent
        principal: amount of initial investment
        
    Returns:
        The value of the investment after `period` periods
        
    Raises:
        ValueError: if principal is zero or negative
    '''
    print('period =', period)
    print('rate =', rate)
    print('principal =', principal)
    
    if principal <= 0:
        raise ValueError('The principal investment must be positive')
    return principal*(1+0.01*rate)**period

In [None]:
help(compound_interest_v2)

In [None]:
compound_interest_v2(2, 0.2, -1)  # raises ValueError: the principal is negative

### Modules

In [None]:
%%file tmp/shrubbery.py
"""A module for making beautiful shrubbery"""

class Shrubbery(list):
    '''A collection of shrubs'''
    
    def contains_family(self, family):
        '''Return True if any shrub in the shrubbery matches the family'''
        return any(family.lower() in shrub.family.lower() for shrub in self)
    
    def is_good(self):
        '''Return the goodness of the shrubbery
        
        Returns:
            True if any of the shrubs are of the laurel family
            False otherwise
        '''
        if self.contains_family('laurel'):
            # Yes, it is a good shrubbery. I like the laurels particularly.
            return True
        else:
            # You must find another shrubbery!
            return False
        

class Shrub(object):
    '''A small woody plant with many stems.
    
    Attributes
        family -- The common name of the genus of the shrub.
    '''
    
    def __init__(self,family):
        '''Create a Shrub of the genus `family`'''
        self.family = family
    

In [None]:
import tmp.shrubbery as shrubbery
a_shrubbery = shrubbery.Shrubbery()
a_shrubbery.append(shrubbery.Shrub('Laurel'))
a_shrubbery.extend([shrubbery.Shrub('Magnolia'),shrubbery.Shrub('Laurel')])
a_shrubbery.is_good()

In [None]:
help(shrubbery)

## Iteration tricks

*cf. NumPy index tricks*

In [None]:
# Consider the following pattern
inputs = list(range(6))
print("Inputs:", inputs)
outputs = []
for x in inputs:
    if x % 2 == 0:
        outputs.append(x**2 + 2)
print("Outputs:", outputs)

In [None]:
# Transforming/filtering a list is so common that it has its own syntax
[x**2 + 2 for x in inputs if not x % 2]

This *list comprehension* makes the procedure much more compact, and it now reads much as you might describe the operation in words.

We understand lists, and conceptually, taking successive values and accumulating results isn't hard. However, we are assuming that the full list exists beforehand in memory, and that we will process all the elements in one go to produce all the output. Some inputs are very large (e.g., lines in a file) or infinite (e.g., a stream from the web or a sensor; a mathematical sequence), so we want to consider *lazy evaluation*, where the processing only happens each time a new value is required. You can get there with similar syntax.

In [None]:
outputs = (x**2 + 2 for x in inputs if x % 2 == 0)
print(outputs)        # Nothing has been evaluated yet
print(next(outputs))  # We pull the first value
print(next(outputs))  # We pull the second value

This new syntax is a *generator expression*; when it is assigned to `outputs`, nothing is done except to define how values should be fetched. Those values are generated one at a time (hence the name), and the full sequence is never held in memory. The generator remembers where it is up to and holds that state until a new value is needed. The built-in `range` (`xrange` in Python 2) is similarly lazy: it computes each value on demand rather than storing them all, so you can count to an arbitrarily large number without filling up memory.

It is rare to use the `next()` function directly; instead, the generator can be used in a `for` loop, or passed to other functions that expect iterable things. There are many of these functions available, so that you can chain and manipulate data streams without ever evaluating them into lists.
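
A minimal sketch of both points, reusing the `inputs` list from the cells above (redefined here so the snippet stands alone): a generator expression handed straight to `sum()`, and two generators chained so that each stage pulls values from the previous one on demand.

```python
inputs = list(range(6))

# sum() consumes the generator one value at a time; no list is built
total = sum(x**2 + 2 for x in inputs if x % 2 == 0)
print(total)

# Generators can be chained: each stage pulls from the previous one on demand
squares = (x**2 for x in inputs)
shifted = (s + 2 for s in squares)
for value in shifted:
    print(value, end=' ')
```

Note that a generator can only be consumed once; after the loop, `shifted` is exhausted.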

The generator expression above is a specialized shorthand for *generators*: functions in which you define, using `yield`, how subsequent values are fetched. Ever wondered what "`for line in openfile:`" did? Open files are also lazy iterators that yield one line at a time, which is how you can avoid loading all lines at once into memory.
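
As a rough sketch of a generator function wrapped around a file-like object: `numbered_lines` is a hypothetical helper, and `io.StringIO` stands in for a real open file so that the example needs nothing on disk.

```python
import io

def numbered_lines(lines):
    """Yield (line_number, stripped_line) pairs one at a time."""
    for number, line in enumerate(lines, start=1):
        yield number, line.rstrip('\n')

# io.StringIO stands in for an open file here; both yield lines lazily
fake_file = io.StringIO("spam\neggs\nham\n")
gen = numbered_lines(fake_file)
first = next(gen)        # only the first line has been consumed so far
rest = list(gen)         # drain the remainder
print(first, rest)
```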

In [None]:
# What does this do?
list(map(lambda x: x[0], filter(lambda x: "py" in x[1],
         enumerate(open('best_conventions.ipynb')))))
# Is this a good way to do things?

# cf. `lambda x: x[0]` vs. `operator.itemgetter(0)`

### Exercise (rewrite higher-order funcs as comprehension)

The prior cell can be expressed as a list-comprehension that most people will find easier to read.  Do so!

### (Potentially) infinite sequences

In [None]:
# A new generator
from math import sqrt
def primes(k=float('inf')):
    "Simplistic prime maker, up to maximum k"
    if k < 2:
        return
    yield 2
    n = 3
    while n <= k:
        for x in range(3, int(sqrt(n))+1, 2):
            # Check odd divisors only; n itself is always odd
            if n % x == 0:
                break
        else:
            yield n  # <- magic
        n += 2

for p in primes(12):
    print(p, end=' ')
print()
for p in primes(20):
    print(p, end=' ')

In [None]:
# (almost) All these tools can deal with infinite sequences efficiently
import itertools
dir(itertools)
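
Two of those tools in action, as a small sketch: `itertools.count` produces an endless stream, while `itertools.islice` and `itertools.takewhile` cut a finite piece out of it without ever evaluating the rest.

```python
import itertools

evens = (n for n in itertools.count() if n % 2 == 0)   # infinite
first_five = list(itertools.islice(evens, 5))
print(first_five)

squares = (n * n for n in itertools.count(1))           # also infinite
small = list(itertools.takewhile(lambda s: s < 50, squares))
print(small)
```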

## Pick good collection types

In [None]:
from collections import Counter
try:
    from urllib.request import urlopen
except ImportError:
    from urllib import urlopen # Python 2.7
url = urlopen('http://www.gutenberg.org/cache/epub/98/pg98.txt')
book = url.read().decode('utf8')  # works on both Python 2 and 3
# For teaching locations w/o internet access:
# book = open('data/pg98.txt').read()
letters = Counter(book)
letters

See also: the toolz/cytoolz project (http://toolz.readthedocs.org/).
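
`Counter` is only one of several specialised containers in the standard-library `collections` module. The sketch below, on a made-up sentence, shows a few others that often replace hand-rolled dictionary or list bookkeeping.

```python
from collections import Counter, defaultdict, deque, namedtuple

text = "the quick brown fox jumps over the lazy dog the end"

# Counter: tally hashable items
words = Counter(text.split())
print(words.most_common(1))

# defaultdict: missing keys get a fresh default (here an empty list)
by_length = defaultdict(list)
for word in set(text.split()):
    by_length[len(word)].append(word)
print(sorted(by_length[3]))

# deque with maxlen: keeps only the most recent items
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(recent)

# namedtuple: a lightweight record with named fields
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x + p.y)
```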

## Context managers

In [None]:
# Consider the following
output = open('tmp/tempfile', 'w')
output.write('Hello')
print(open('tmp/tempfile').read())

Why didn't we see anything?

In [None]:
# How is this different?
import os
with open('tmp/tempfile', 'w') as output:
    output.write('Hello')
print(open('tmp/tempfile').read())
os.remove('tmp/tempfile')

You will see this used particularly with shared resources (e.g., locks, network connections), where you must ensure they are released after use.
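
You can also write your own context managers. The following is a minimal sketch using `contextlib.contextmanager` from the standard library; `managed_resource` and the `events` log are hypothetical stand-ins for a real lock or connection. The `finally` clause runs even when the body of the `with` block raises.

```python
from contextlib import contextmanager

events = []

@contextmanager
def managed_resource(name):
    events.append('acquire ' + name)       # set-up, runs on entering the block
    try:
        yield name
    finally:
        events.append('release ' + name)   # clean-up, runs however the block exits

try:
    with managed_resource('lock') as res:
        events.append('using ' + res)
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

print(events)
```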

## Conventions

There are a number of ways to write code, but here are some typical things you will see, and a few things you shouldn't see, in good code. The list is non-exhaustive!

In [None]:
class A:
    "Hidden class attribute"
    _private = 1
    public = 2
    visible = 3
a = A()

In [None]:
# press <TAB>
a.

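A leading underscore marks a name as private by convention, and tab completion typically omits such names. A bare underscore is also used for ignored/dummy variables, e.g., if we want to execute something five times, but don't care which iteration we are currently on: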
```python
for _ in range(5):
    do_something()
```

In [None]:
class B:
    "Special methods"
    def __init__(self):
        print("Initialised")
        self.ready = True

    def __repr__(self):
        return "A useful class"

b = B()
b

Lists of special methods:
  * http://www.diveintopython3.net/special-method-names.html
  * http://rafekettler.com/magicmethods.html

Special methods allow you to customise how your instances behave under standard operations/syntax, so that you can plug in classes to work in interesting new ways, e.g., adding uncertainties or units onto numerical types.
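
To make the "units" idea concrete, here is a deliberately tiny sketch. The `Length` class is hypothetical and handles metres only; a real units library would do far more. Defining `__add__` lets instances take part in the `+` syntax, and `__repr__` controls how they are displayed.

```python
class Length:
    """Hypothetical numeric type carrying a unit (metres only, for brevity)."""

    def __init__(self, metres):
        self.metres = metres

    def __add__(self, other):
        # Duck typing: anything with a .metres attribute can be added
        try:
            return Length(self.metres + other.metres)
        except AttributeError:
            return NotImplemented   # lets Python raise a TypeError

    def __repr__(self):
        return 'Length({!r} m)'.format(self.metres)

total = Length(1.5) + Length(2)
print(total)
```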

### Style

There are general guidelines, structure rules and specialized conventions for how your code should look. See the comprehensive list at https://www.python.org/dev/peps/pep-0008/ ; code can be checked against it using the `pep8` module and command-line utility (a third-party tool, since renamed `pycodestyle`). It can be run as follows:

```bash
% pep8 myfile.py
```

You might need to install the tool first:

```bash
% conda install -y pep8
```

or via the IDE environment (e.g., in `spyder`, Preferences -> Editor -> Code Introspection -> Style Analysis; produces style warnings in the editor, left margin).

None of these are enforced... but your IDE may warn you if you break them, and you can use automated tools (e.g., `autopep8`) to fix issues such as spacing.

  * Classes have names beginning with an upper case character
  * Instances begin with lower case
  * methods begin with lower case; commonly `use_underscores()` or `camelCase()`
  * functions usually `use_underscores()`
  * global static values are `ALL_UPPER`
  * one statement per line
  * many spacing concerns, like the 79-character line limit
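
The naming conventions in the list above might look like this in practice. This is only an illustrative sketch; every name in it is invented.

```python
MAX_RETRIES = 3              # module-level constant: ALL_UPPER


class RetryPolicy:           # class: starts with an upper-case character
    def __init__(self, retries=MAX_RETRIES):
        self.retries = retries

    def should_retry(self, attempt):   # method: lower case with underscores
        return attempt < self.retries


def describe_policy(policy):           # function: lower case with underscores
    return 'retry up to {} times'.format(policy.retries)


default_policy = RetryPolicy()         # instance: lower case
print(describe_policy(default_policy))
```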

Google has a good general style guide:
https://google-styleguide.googlecode.com/svn/trunk/pyguide.html
(but note that specific projects or institutions may have their own details, especially around documentation).

<img src='img/copyright.png'>