# Advanced Python



# PEP 8

PEP 8 is the Python style guide. The sections below reproduce some of its most relevant chapters.

## Code Lay-out

### Indentation
Use **4 spaces** per indentation level.

Continuation lines should align wrapped elements either vertically using Python's implicit line joining inside parentheses, brackets and braces, or using a hanging indent. When using a hanging indent, the following should be considered: there should be no arguments on the first line, and further indentation should be used to clearly distinguish the continuation line.

**Yes:**
```python

# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)
```
**No:**
```python
# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

```    
The 4-space rule is optional for continuation lines.

Optional:
```python
# Hanging indents *may* be indented to other than 4 spaces.
foo = long_function_name(
  var_one, var_two,
  var_three, var_four)
```
When the conditional part of an if-statement is long enough to require that it be written across multiple lines, it's worth noting that the combination of a two character keyword (i.e. if), plus a single space, plus an opening parenthesis creates a natural 4-space indent for the subsequent lines of the multiline conditional. This can produce a visual conflict with the indented suite of code nested inside the if-statement, which would also naturally be indented to 4 spaces. This PEP takes no explicit position on how (or whether) to further visually distinguish such conditional lines from the nested suite inside the if-statement. 

Acceptable options in this situation include, but are not limited to:
```python
# No extra indentation.
if (this_is_one_thing and
    that_is_another_thing):
    do_something()

# Add a comment, which will provide some distinction in editors
# supporting syntax highlighting.
if (this_is_one_thing and
    that_is_another_thing):
    # Since both conditions are true, we can frobnicate.
    do_something()

# Add some extra indentation on the conditional continuation line.
if (this_is_one_thing
        and that_is_another_thing):
    do_something()
```
(Also see the discussion of whether to break before or after binary operators below.)

The closing brace/bracket/parenthesis on multiline constructs may either line up under the first non-whitespace character of the last line of list, as in:
```python
my_list = [
    1, 2, 3,
    4, 5, 6,
    ]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
    )
```
or it may be lined up under the first character of the line that starts the multiline construct, as in:
```python
my_list = [
    1, 2, 3,
    4, 5, 6,
]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
)
```
### Tabs or Spaces?
Spaces are the preferred indentation method.

Tabs should be used solely to remain consistent with code that is already indented with tabs.

Python 3 disallows mixing the use of tabs and spaces for indentation.
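As a minimal sketch of what this means in practice (assuming Python 3), the hypothetical `source` string below indents one line with a tab and the next with eight spaces. Compiling it is rejected with a `TabError`:

```python
# Hypothetical snippet: the first body line uses a tab, the second uses spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except TabError as exc:
    # TabError is a subclass of IndentationError (itself a SyntaxError).
    outcome = type(exc).__name__

print(outcome)  # TabError
```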

Python 2 code indented with a mixture of tabs and spaces should be converted to using spaces exclusively.

When invoking the Python 2 command line interpreter with the -t option, it issues warnings about code that illegally mixes tabs and spaces. When using -tt these warnings become errors. These options are highly recommended!

### Maximum Line Length
Limit all lines to a maximum of **79** characters.

For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.

Limiting the required editor window width makes it possible to have several files open side-by-side, and works well when using code review tools that present the two versions in adjacent columns.

The default wrapping in most tools disrupts the visual structure of the code, making it more difficult to understand. The limits are chosen to avoid wrapping in editors with the window width set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web based tools may not offer dynamic line wrapping at all.

Some teams strongly prefer a longer line length. For code maintained exclusively or primarily by a team that can reach agreement on this issue, it is okay to increase the line length limit up to 99 characters, provided that comments and docstrings are still wrapped at 72 characters.

The Python standard library is conservative and requires limiting lines to 79 characters (and docstrings/comments to 72).

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
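The following sketch illustrates implied line continuation. The variable names and values are hypothetical and serve only to give the expression something to add up:

```python
# Hypothetical values, used only to have something to add up.
first_quarter, second_quarter, third_quarter, fourth_quarter = 10, 20, 30, 40

# The parentheses let the expression span several lines without backslashes.
yearly_total = (first_quarter
                + second_quarter
                + third_quarter
                + fourth_quarter)

# The same applies to long argument lists.
message = "{} items cost {} in total".format(
    4, yearly_total)

print(message)  # 4 items cost 100 in total
```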

Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable:
```python
with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())
```
(See the previous discussion on multiline if-statements for further thoughts on the indentation of such multiline with-statements.)

Another such case is with assert statements.

Make sure to indent the continued line appropriately.

### Should a Line Break Before or After a Binary Operator?
For decades the recommended style was to break after binary operators. But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator is moved away from its operand and onto the previous line. Here, the eye has to do extra work to tell which items are added and which are subtracted:

**No:**
```python
# operators sit far away from their operands
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)
```
To solve this readability problem, mathematicians and their publishers follow the opposite convention. Donald Knuth explains the traditional rule in his Computers and Typesetting series: "Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations" [3].

Following the tradition from mathematics usually results in more readable code:

**Yes:**
```python
# easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)
```
In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth's style is suggested.

### Blank Lines
Surround top-level function and class definitions with two blank lines.

Method definitions inside a class are surrounded by a single blank line.

Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).

Use blank lines in functions, sparingly, to indicate logical sections.

Python accepts the control-L (i.e. ^L) form feed character as whitespace; many tools treat these characters as page separators, so you may use them to separate pages of related sections of your file. Note that some editors and web-based code viewers may not recognize control-L as a form feed and will show another glyph in its place.

### Source File Encoding
Code in the core Python distribution should always use UTF-8 (or ASCII in Python 2).

Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an encoding declaration.

In the standard library, non-default encodings should be used only for test purposes or when a comment or docstring needs to mention an author name that contains non-ASCII characters; otherwise, using \x, \u, \U, or \N escapes is the preferred way to include non-ASCII data in string literals.

For Python 3.0 and beyond, the following policy is prescribed for the standard library (see PEP 3131): All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren't English). In addition, string literals and comments must also be in ASCII. The only exceptions are (a) test cases testing the non-ASCII features, and (b) names of authors. Authors whose names are not based on the Latin alphabet (latin-1, ISO/IEC 8859-1 character set) MUST provide a transliteration of their names in this character set.

Open source projects with a global audience are encouraged to adopt a similar policy.

### Imports
Imports should usually be on separate lines:

**Yes:** 
```python
import os
import sys
```
**No:**
```python
import sys, os
```
It's okay to say this though:
```python
from subprocess import Popen, PIPE
```
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Imports should be grouped in the following order:

- Standard library imports.
- Related third party imports.
- Local application/library specific imports.

You should put a blank line between each group of imports.

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path):
```python
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example
```
However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose:
```python
from . import sibling
from .sibling import example
```
Standard library code should avoid complex package layouts and always use absolute imports.

Implicit relative imports should never be used and have been removed in Python 3.

When importing a class from a class-containing module, it's usually okay to spell this:
```python
from myclass import MyClass
from foo.bar.yourclass import YourClass
```
If this spelling causes local name clashes, then spell them explicitly:
```python
import myclass
import foo.bar.yourclass
```
and use "myclass.MyClass" and "foo.bar.yourclass.YourClass".

Wildcard imports (`from <module> import *`) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn't known in advance).

When republishing names this way, the guidelines below regarding public and internal interfaces still apply.

### Module Level Dunder Names
Module level "dunders" (i.e. names with two leading and two trailing underscores) such as `__all__`, `__author__`, `__version__`, etc. should be placed after the module docstring but before any import statements except `from __future__` imports. Python mandates that future-imports must appear in the module before any other code except docstrings:

```python
"""This is the example module.

This module does stuff.
"""

from __future__ import barry_as_FLUFL

__all__ = ['a', 'b', 'c']
__version__ = '0.1'
__author__ = 'Cardinal Biggles'

import os
import sys
```

## String Quotes
In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.
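A small sketch of the "use the other quote character" advice. The sample sentences are made up for illustration, and each pair of spellings produces the same string:

```python
# Choosing the other quote character avoids escaping.
with_backslash = 'It\'s readable, isn\'t it?'
without_backslash = "It's readable, isn't it?"

quoted_backslash = "She said \"hello\"."
quoted_plain = 'She said "hello".'

print(with_backslash == without_backslash)  # True
print(quoted_backslash == quoted_plain)     # True
```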

## Whitespace in Expressions and Statements
### Pet Peeves
Avoid extraneous whitespace in the following situations:

Immediately inside parentheses, brackets or braces.

**Yes:** `spam(ham[1], {eggs: 2})`

**No:**  `spam( ham[ 1 ], { eggs: 2 } )`

Between a trailing comma and a following close parenthesis.

**Yes:** `foo = (0,)`

**No:**  `bar = (0, )`

Immediately before a comma, semicolon, or colon:

**Yes:** `if x == 4: print(x, y); x, y = y, x`

**No:**  `if x == 4 : print(x , y) ; x , y = y , x`

However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted.

**Yes:**
```python
ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:]
ham[lower:upper], ham[lower:upper:], ham[lower::step]
ham[lower+offset : upper+offset]
ham[: upper_fn(x) : step_fn(x)], ham[:: step_fn(x)]
ham[lower + offset : upper + offset]
```
**No:**
```python
ham[lower + offset:upper + offset]
ham[1: 9], ham[1 :9], ham[1:9 :3]
ham[lower : : upper]
ham[ : upper]
```
Immediately before the open parenthesis that starts the argument list of a function call:

**Yes:** `spam(1)`

**No:**  `spam (1)`

Immediately before the open parenthesis that starts an indexing or slicing:

**Yes:** `dct['key'] = lst[index]`

**No:**  `dct ['key'] = lst [index]`

More than one space around an assignment (or other) operator to align it with another.

**Yes:**
```python
x = 1
y = 2
long_variable = 3
```
**No:**
```python
x             = 1
y             = 2
long_variable = 3
```

## Other Recommendations
Avoid trailing whitespace anywhere. Because it's usually invisible, it can be confusing: e.g. a backslash followed by a space and a newline does not count as a line continuation marker. Some editors don't preserve it and many projects (like CPython itself) have pre-commit hooks that reject it.
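The backslash case can be checked directly. In this sketch (the two source strings are hypothetical), the first string ends its line with a bare backslash and compiles, while the second has a single space after the backslash and is rejected:

```python
continued = "total = 1 + \\\n    2\n"   # backslash directly before the newline
broken = "total = 1 + \\ \n    2\n"     # backslash, then a space, then the newline

namespace = {}
exec(compile(continued, "<continued>", "exec"), namespace)
print(namespace["total"])  # 3

try:
    compile(broken, "<broken>", "exec")
    result = "compiled"
except SyntaxError:
    result = "SyntaxError"
print(result)  # SyntaxError
```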

Always surround these binary operators with a single space on either side: assignment (`=`), augmented assignment (`+=`, `-=` etc.), comparisons (`==`, `<`, `>`, `!=`, `<>`, `<=`, `>=`, `in`, `not in`, `is`, `is not`), Booleans (`and`, `or`, `not`).

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator.

**Yes:**
```python
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
```
**No:**
```python
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
```
Function annotations should use the normal rules for colons and always have spaces around the -> arrow if present. (See Function Annotations below for more about function annotations.)

**Yes:**
```python
def munge(input: AnyStr): pass
def munge() -> AnyStr: pass
```
**No:**
```python
def munge(input:AnyStr): pass
def munge()->PosInt: pass
```
Don't use spaces around the = sign when used to indicate a keyword argument, or when used to indicate a default value for an unannotated function parameter.

**Yes:**
```python
def complex(real, imag=0.0):
    return magic(r=real, i=imag)
```
**No:**
```python
def complex(real, imag = 0.0):
    return magic(r = real, i = imag)
```
When combining an argument annotation with a default value, however, do use spaces around the = sign:

**Yes:**
```python
def munge(sep: AnyStr = None): pass
def munge(input: AnyStr, sep: AnyStr = None, limit=1000): pass
```
**No:**
```python
def munge(input: AnyStr=None): pass
def munge(input: AnyStr, limit = 1000): pass
```
Compound statements (multiple statements on the same line) are generally discouraged.

**Yes:**
```python
if foo == 'blah':
    do_blah_thing()
do_one()
do_two()
do_three()
```

**Rather not:**
```python
if foo == 'blah': do_blah_thing()
do_one(); do_two(); do_three()
```
While sometimes it's okay to put an if/for/while with a small body on the same line, never do this for multi-clause statements. Also avoid folding such long lines!

**Rather not:**

```python
if foo == 'blah': do_blah_thing()
for x in lst: total += x
while t < 10: t = delay()
```

**Definitely not:**

```python
if foo == 'blah': do_blah_thing()
else: do_non_blah_thing()

try: something()
finally: cleanup()

do_one(); do_two(); do_three(long, argument,
                             list, like, this)

if foo == 'blah': one(); two(); three()
```

### When to Use Trailing Commas
Trailing commas are usually optional, except they are mandatory when making a tuple of one element (and in Python 2 they have semantics for the print statement). For clarity, it is recommended to surround the latter in (technically redundant) parentheses.

**Yes:**
```python
FILES = ('setup.cfg',)
```

**OK, but confusing:**

```python
FILES = 'setup.cfg',
```
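It is the comma, not the parentheses, that makes the tuple. A quick check, reusing the `'setup.cfg'` value from the examples above:

```python
with_comma = ('setup.cfg',)
without_comma = ('setup.cfg')   # just a parenthesized string

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # str
print(len(with_comma), len(without_comma))  # 1 9
```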
When trailing commas are redundant, they are often helpful when a version control system is used, when a list of values, arguments or imported items is expected to be extended over time. The pattern is to put each value (etc.) on a line by itself, always adding a trailing comma, and add the close parenthesis/bracket/brace on the next line. However it does not make sense to have a trailing comma on the same line as the closing delimiter (except in the above case of singleton tuples).

**Yes:**

```python
FILES = [
    'setup.cfg',
    'tox.ini',
    ]
initialize(FILES,
           error=True,
           )
```

**No:**

```python
FILES = ['setup.cfg', 'tox.ini',]
initialize(FILES, error=True,)
```

### Comments
Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Comments should be complete sentences. The first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).

Block comments generally consist of one or more paragraphs built out of complete sentences, with each sentence ending in a period.

You should use two spaces after a sentence-ending period in multi-sentence comments, except after the final sentence.

When writing English, follow Strunk and White.

Python coders from non-English speaking countries: please write your comments in English, unless you are 120% sure that the code will never be read by people who don't speak your language.

#### Block Comments
Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).

Paragraphs inside a block comment are separated by a line containing a single #.

#### Inline Comments
Use inline comments sparingly.

An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.

Inline comments are unnecessary and in fact distracting if they state the obvious. Don't do this:

```python
x = x + 1                 # Increment x
```

But sometimes, this is useful:

```python
x = x + 1                 # Compensate for border
```

## Documentation Strings
Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.

Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.

PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself:

```python
"""Return a foobang

Optional plotz says to frobnicate the bizbaz first.
"""
```

For one liner docstrings, please keep the closing `"""` on the same line.
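A short sketch of both layouts side by side. The functions `is_even` and `clamp` are hypothetical and exist only to carry the docstrings:

```python
def is_even(number):
    """Return True if number is even."""
    return number % 2 == 0


def clamp(value, low, high):
    """Return value limited to the range [low, high].

    Values below low become low; values above high become high.
    """
    return max(low, min(value, high))


print(is_even.__doc__)  # Return True if number is even.
```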

### Google Style Docstrings

```python

def example_method(self, param1, param2):
    """Class methods are similar to regular functions.

    Note:
        Do not include the `self` parameter in the ``Args`` section.

    Args:
        param1: The first parameter.
        param2: The second parameter.

    Returns:
        True if successful, False otherwise.

    """
    return True
```

### Pandas doc string

```python
def add(num1, num2):
    """
    Add up two integer numbers.

    This function simply wraps the `+` operator, and does not
    do anything interesting, except for illustrating what is
    the docstring of a very simple function.

    Parameters
    ----------
    num1 : int
        First number to add
    num2 : int
        Second number to add

    Returns
    -------
    int
        The sum of `num1` and `num2`

    See Also
    --------
    subtract : Subtract one integer from another

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(25, 0)
    25
    >>> add(10, -10)
    0
    """
    return num1 + num2

```

### Epytext doc string
Historically a Javadoc-like style was prevalent, so it was taken as the basis for Epydoc (with its so-called Epytext format) to generate documentation.

```python

"""
This is a javadoc style.

@param param1: this is a first param
@param param2: this is a second param
@return: this is a description of what is returned
@raise KeyError: raises an exception
"""
```
### reST
Nowadays the more prevalent format is probably reStructuredText (reST), which Sphinx uses to generate documentation. Note: it is the default in JetBrains PyCharm (type triple quotes after defining a method and hit Enter). It is also the default output format of Pyment.

```python
"""
This is a reST style.

:param param1: this is a first param
:param param2: this is a second param
:returns: this is a description of what is returned
:raises KeyError: raises an exception
"""
```

## Naming Conventions
The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent -- nevertheless, here are the currently recommended naming standards. New modules and packages (including third party frameworks) should be written to these standards, but where an existing library has a different style, internal consistency is preferred.

### Overriding Principle
Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.

### Descriptive: Naming Styles
There are a lot of different naming styles. It helps to be able to recognize what naming style is being used, independently from what they are used for.

The following naming styles are commonly distinguished:

`b` (single lowercase letter)

`B` (single uppercase letter)

`lowercase`

`lower_case_with_underscores`

`UPPERCASE`

`UPPER_CASE_WITH_UNDERSCORES`

`CapitalizedWords` (or `CapWords`, or `CamelCase` -- so named because of the bumpy look of its letters [4]). This is also sometimes known as `StudlyCaps`.

Note: When using acronyms in `CapWord`s, capitalize all the letters of the acronym. Thus `HTTPServerError` is better than `HttpServerError`.

`mixedCase` (differs from `CapitalizedWord`s by initial lowercase character!)

`Capitalized_Words_With_Underscores` (ugly!)

There's also the style of using a short unique prefix to group related names together. This is not used much in Python, but it is mentioned for completeness. For example, the os.stat() function returns a tuple whose items traditionally have names like `st_mode`, `st_size`, `st_mtime` and so on. (This is done to emphasize the correspondence with the fields of the POSIX system call struct, which helps programmers familiar with that.)

The `X11` library uses a leading `X` for all its public functions. In Python, this style is generally deemed unnecessary because attribute and method names are prefixed with an object, and function names are prefixed with a module name.

In addition, the following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):

`_single_leading_underscore`: weak "internal use" indicator. E.g. from M import * does not import objects whose names start with an underscore.

`single_trailing_underscore_`: used by convention to avoid conflicts with a Python keyword, e.g. `Tkinter.Toplevel(master, class_='ClassName')`

`__double_leading_underscore`: when naming a class attribute, invokes name mangling (inside class FooBar, `__boo` becomes `_FooBar__boo`; see below).

`__double_leading_and_trailing_underscore__`: "magic" objects or attributes that live in user-controlled namespaces. E.g. `__init__`, `__import__` or `__file__`. Never invent such names; only use them as documented.

### Prescriptive: Naming Conventions

#### Names to Avoid
Never use the characters 'l' (lowercase letter el), 'O' (uppercase letter oh), or 'I' (uppercase letter eye) as single character variable names.

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use 'l', use 'L' instead.

#### ASCII Compatibility
Identifiers used in the standard library must be ASCII compatible as described in the policy section of PEP 3131.

#### Package and Module Names
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).

#### Class Names
Class names should normally use the CapWords convention.

The naming convention for functions may be used instead in cases where the interface is documented and used primarily as a callable.

Note that there is a separate convention for builtin names: most builtin names are single words (or two words run together), with the CapWords convention used only for exception names and builtin constants.

#### Type Variable Names
Names of type variables introduced in `PEP 484` should normally use `CapWords` preferring short names: `T`, `AnyStr`, `Num`. It is recommended to add suffixes `_co` or `_contra` to the variables used to declare covariant or contravariant behavior correspondingly:

```python
from typing import TypeVar

VT_co = TypeVar('VT_co', covariant=True)
KT_contra = TypeVar('KT_contra', contravariant=True)
```

#### Exception Names
Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix "Error" on your exception names (if the exception actually is an error).

#### Global Variable Names
(Let's hope that these variables are meant for use inside one module only.) The conventions are about the same as those for functions.

Modules that are designed for use via `from M import *` should use the `__all__` mechanism to prevent exporting globals, or use the older convention of prefixing such globals with an underscore (which you might want to do to indicate these globals are "module non-public").
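The effect of `__all__` on `from M import *` can be sketched without creating files. The helper `star_import` and the module names `demo_with_all` and `demo_without_all` are hypothetical; they build throwaway modules in memory purely for illustration:

```python
import sys
import types


def star_import(module_source, module_name):
    """Build a throwaway module from source and return what `import *` exposes."""
    module = types.ModuleType(module_name)
    exec(module_source, module.__dict__)
    sys.modules[module_name] = module
    namespace = {}
    exec("from {} import *".format(module_name), namespace)
    return sorted(name for name in namespace if not name.startswith("__"))


# With __all__, only the listed names are exported.
with_all = star_import(
    "__all__ = ['public']\npublic = 1\nhelper = 2\n_private = 3\n",
    "demo_with_all")

# Without __all__, every name not starting with an underscore is exported.
without_all = star_import(
    "public = 1\nhelper = 2\n_private = 3\n",
    "demo_without_all")

print(with_all)     # ['public']
print(without_all)  # ['helper', 'public']
```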

#### Function and Variable Names
Function names should be lowercase, with words separated by underscores as necessary to improve readability.

Variable names follow the same convention as function names.

`mixedCase` is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.

#### Function and Method Arguments
Always use `self` for the first argument to instance methods.

Always use `cls` for the first argument to class methods.

If a function argument's name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus class_ is better than clss. (Perhaps better is to avoid such clashes by using a synonym.)

#### Method Names and Instance Variables
Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.

Use one leading underscore only for non-public methods and instance variables.

To avoid name clashes with subclasses, use two leading underscores to invoke Python's name mangling rules.

Python mangles these names with the class name: if class Foo has an attribute named `__a`, it cannot be accessed by `Foo.__a`. (An insistent user could still gain access by calling `Foo._Foo__a`.) Generally, double leading underscores should be used only to avoid name conflicts with attributes in classes designed to be subclassed.
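A minimal sketch of name mangling, using the `Foo` class from the paragraph above plus a hypothetical subclass `Bar` to show why the mechanism avoids clashes:

```python
class Foo:
    __a = 1  # stored on the class as _Foo__a

    def read(self):
        return self.__a  # inside the class body, __a is rewritten to _Foo__a


class Bar(Foo):
    __a = 2  # stored as _Bar__a, so it does not clash with Foo's attribute


print(hasattr(Foo, '__a'))  # False
print(Foo._Foo__a)          # 1
print(Bar().read())         # 1
print(Bar._Bar__a)          # 2
```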

Note: there is some controversy about the use of `__names` (see below).

#### Constants
Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include `MAX_OVERFLOW` and `TOTAL`.



### Builtin Types 

Numeric

-   int, long (py2), float, complex

Sequences

-   str, list, tuple, bytes (py3)

Other Iterables

-   set, frozenset, dictionary, file

Structures

-   Functions, Classes, Modules

## Python 2 and 3 Compatibility

- Python 3 is a major revision of the language, not a drop-in upgrade
- It is not backwards compatible with Python 2

### Main Differences
- print
  + Py2: `print` is a statement
  + Py3: `print()` is a function
  + Py2: a trailing comma suppresses the newline
  + Py3: use the `end` parameter, which defaults to `'\n'`
- Division
  + Py3: `/` is true (float) division and `//` is floor (integer) division
  + Py2: `/` on two ints is floor division; mixed int/float operands are promoted to float
- Strings
  + Py3: strings are Unicode by default; use `b' ... '` to create a `bytes` object, the counterpart of the Py2 `str`
  + Py2: strings are byte strings (*ASCII*) by default; use `u' ... '` to create a Unicode string, the counterpart of the Py3 `str`
  + Use `s.encode()` and `b.decode()` to convert between str and bytes objects
- Lists vs Generators
  + Most functions/methods that create actual lists in Py2 return lazy iterators or views in Py3
  + Use `list(range(10))`, for example, to emulate the Py2 behaviour
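The Python 3 side of these differences can be sketched in a few lines. This assumes a Python 3 interpreter, and the sample values are made up:

```python
# print() is a function; `end` replaces Python 2's trailing comma.
print('no newline here', end='')
print()

# `/` is true division, `//` is floor division.
true_quotient = 7 / 2
floor_quotient = 7 // 2

# str is Unicode text; bytes holds encoded data.
text = 'caf\u00e9'
data = text.encode('utf-8')
round_trip = data.decode('utf-8')

# range() is lazy; wrap it in list() when an actual list is needed.
lazy = range(5)
numbers = list(lazy)

print(true_quotient, floor_quotient, len(text), len(data), numbers)
# 3.5 3 4 5 [0, 1, 2, 3, 4]
```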

### Python 2to3
- There is a script called `2to3`
- `2to3` converts Py2 code to Py3
- It makes the adjustments listed above
- It may not fix every renamed module in the standard library

### Writing future proof Py2 Code

To offer backward compatibility with Python 2 from your Python 3 code, you can use the pasteurize script. This adds these lines at the top of each module:

```python
from __future__ import absolute_import
```

Absolute imports are the Py3 behaviour: `import modulename` is always resolved through `sys.path`, so a standard library module is not shadowed by a sibling module inside the same package, unless you explicitly write `from . import modulename`.

```python
from __future__ import division
```

Makes Python 2 use `/` for *float* (true) division and `//` for *int* (floor) division.

```python
from __future__ import print_function
```

Makes `print()` a function in Python 2.

```python
from __future__ import unicode_literals
```

Gives Python 2 the Python 3 string literals, that is, `'text'` is now Unicode and `b'text'` is a byte string.

See: [Python compatible idioms](http://www.python-future.org/compatible_idioms.html)

### Pythonic code

```python
x = 5
y = 10
x, y = y, y+x
```

Replaces

```python
old_x = x
x = y
y = y + old_x
```

### Quick Dictionary Creation

```python
keys = list('ABC')
vals = list('abc')
list(zip(keys, vals))
```

```
[('A', 'a'), ('B', 'b'), ('C', 'c')]
```

```python
dict(zip(keys, vals))
```
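As a small sketch of what this produces, the snippet below builds the same dictionary twice, once with `dict(zip(...))` and once with an equivalent dict comprehension (the comprehension is an added alternative, not part of the original notes):

```python
keys = list('ABC')
vals = list('abc')

from_zip = dict(zip(keys, vals))
from_comprehension = {key: val for key, val in zip(keys, vals)}

print(from_zip)  # {'A': 'a', 'B': 'b', 'C': 'c'}
```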

### Setting Default Value

In [72]:
def get_or_set_default(my_dict, key, default):
    if key in my_dict:
        return my_dict[key]
    else:
        my_dict[key] = default
        return default

Becomes:

In [73]:
my_dict = {}
my_dict.setdefault('key', 'DEFAULT')

'DEFAULT'

Also see `defaultdict` later
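
As a brief preview, here is a minimal sketch that groups words by first letter, once with `setdefault` and once with `collections.defaultdict`. The word list and variable names are made up for illustration.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# setdefault: insert an empty list the first time a key is seen.
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# defaultdict: the factory (list) is called for every missing key.
by_letter_dd = defaultdict(list)
for word in words:
    by_letter_dd[word[0]].append(word)
```

Both approaches build the same mapping; `defaultdict` simply moves the default into the container itself.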


# Functions 

Definition

In [74]:
def function_name():
    #statements
    pass

Calling Function

In [75]:
function_name()

### Objects
Functions in Python are first class objects. 
Using a function name without parentheses is a reference to the function; 
when a function name is followed by parentheses (with or without arguments) it is a function call.  

For example:

In [76]:
def square(x):
    return x**2

In [77]:
fn = square
print(fn, square)

<function square at 0x10ccfdb70> <function square at 0x10ccfdb70>


In [79]:
print(fn(2), square(3))

4 9


## Parameters

In [80]:
def fn(arg1, arg2):
    '''accepts two arguments'''

In [81]:
fn(1,2) # positional parameters or args

-   calls function fn with arg1=1 and arg2=2

In [82]:
fn(arg2=1,arg1=2) #named parameters or key word args (kwargs)

-   calls function fn with arg2=1 and arg1=2

### Parameter Passing Semantics

Parameters are passed by **value**.  
All Python variables are **object references**, so the value that is copied is the reference, not the object itself.
Hence mutability is important when passing variables.

### Mutability
Mutable objects can be changed after instantiation. Immutable objects cannot. This is not the same thing as a constant, since the variable that refers to the immutable object may still be reassigned.

Immutability prevents an object referred to by one reference being altered through a different reference.

Immutable objects include: 
- int 
- long (Py2)
- float
- complex
- str
- tuple

Mutable objects include:
- list
- dict
- set
- file

Strings are immutable.

In [83]:
s1 = 'test'
s2 = s1
s1 is s2

True

In [84]:
s1 += 'ing'
print(s1, s2)

testing test


Lists are mutable.

In [85]:
l1 = [1,2,3]
l2 = l1
l1 is l2

True

In [86]:
l1 += [4]
l2

[1, 2, 3, 4]

### Hence ...

In [87]:
def doubler(x):
    x += x
    print(x)

a = 'a'
doubler(a)

aa


In [11]:
a

'a'

In [88]:
b = ['a']
doubler(b)

['a', 'a']


In [89]:
b

['a', 'a']
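
The reason `b` changed while `a` did not is that `+=` mutates a list in place but builds a new object for a string. A minimal sketch contrasting in-place mutation with rebinding (the function names here are illustrative, not part of the examples above):

```python
def extend_in_place(items):
    items += items          # list += mutates the caller's list


def rebind_only(items):
    items = items + items   # builds a new list; only the local name changes


shared = ['a']
extend_in_place(shared)
after_mutation = list(shared)   # the caller sees the change

rebind_only(shared)
after_rebind = list(shared)     # the caller's list is untouched
```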

### Scope
In Python a block (`if`, `for`, `while`, `with`) does not create a variable scope; functions do.
Global variables may be referenced inside a function, but not assigned (unless the `global` keyword is used).

In [14]:
VER = 1.0


def print_ver():
    print(VER)


print_ver()


def alter_ver():
    VER += 1


alter_ver()

1.0


UnboundLocalError: local variable 'VER' referenced before assignment

In [90]:
VER = 1.0


def global_alter_ver():
    global VER
    VER += 1


global_alter_ver()

Hence `print_ver` and `global_alter_ver` work, but `alter_ver` raises the `UnboundLocalError` shown above.
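
The other scope rule, that blocks do not create a scope of their own, can be sketched as follows. The names are illustrative only.

```python
for i in range(3):
    squared = i * i        # assigned inside the for block

# Both names are still visible after the loop has finished.
last_i = i
last_squared = squared


def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            found = n      # assigned inside nested for/if blocks
            break
    else:
        found = None       # for-else: runs only when the loop did not break
    return found           # visible here: same function scope
```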

### Optional Arguments

In [91]:
def fn(arg1, arg2=0):
    'arg2 is optional'

-   Default values make an argument optional
-   All mandatory arguments are defined before any optional arguments.
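
A small sketch of how defaults behave at the call site. The function `connect` and its parameters are hypothetical and only return their arguments.

```python
def connect(host, port=8080, timeout=10.0):
    """host is mandatory; port and timeout are optional."""
    return (host, port, timeout)


defaults = connect('localhost')                    # both defaults used
override_one = connect('localhost', timeout=2.5)   # skip port by naming timeout
override_all = connect('localhost', 9000, 1.0)     # positional overrides
```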

### Variable argument lists

In [92]:
def fn(*args):
    print(args)


fn(1,2,3)

(1, 2, 3)


-   **args** becomes a tuple

### Key word args

In [93]:
def fn(**kwargs):
    print(kwargs)



fn(a='A',b='B')

{'a': 'A', 'b': 'B'}


-   **kwargs** becomes a dictionary
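
Both forms can be combined in one signature, and the same `*` and `**` syntax unpacks a sequence or a mapping at the call site. A minimal sketch with an illustrative function name:

```python
def describe(first, *args, **kwargs):
    # first: ordinary parameter, args: extra positionals, kwargs: extra named
    return first, args, kwargs


result = describe(1, 2, 3, a='A')

# Unpacking at the call site.
positional = (1, 2, 3)
named = {'a': 'A'}
unpacked = describe(*positional, **named)
```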

### Type Hinting (Py3.5+)

With the rise of intelligent IDEs, type hinting has become more compelling, as types can be checked and resolved in real time. The IDEs (or a separate type checker) perform the validation, not Python itself. The interpreter does not enforce the hints; it only stores them as metadata and ignores them during execution.


- Type hinting for:
  + Parameters and
  + Return type

In [94]:
def greeting(name: str) -> str:
    return 'Hello ' + name
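
That the hints are stored but not enforced can be seen in a short sketch. `double` is an illustrative function; `__annotations__` is the attribute where the hints are kept.

```python
def greeting(name: str) -> str:
    return 'Hello ' + name


# The hints are kept as plain metadata on the function object.
hints = greeting.__annotations__


def double(x: int) -> int:
    return x * 2


# No error at runtime: a str is accepted despite the int hint.
# A static type checker would flag this call.
not_checked = double('ab')
```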

-  New module : `typing`

In [95]:
from typing import Dict, Tuple, List

### Type aliases

A type alias is defined by assigning the type to the alias. In this example, Vector and List[float] will be treated as interchangeable synonyms:

In [96]:
from typing import List
Vector = List[float]


def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]


# typechecks; a list of floats qualifies as a Vector.
new_vector = scale(2.0, [1.0, -4.2, 5.4])

The dictionary below will only contain strings.

In [97]:
from typing import Dict

mydict: Dict[str, str] = {}

## Unions and Optional

A value can be one of several types.   Below a `column` can be either `None` or a `str`.

In [98]:
from typing import Union, Optional

column = Union[None, str]

This achieves the same thing.

In [99]:
column = Optional[str]
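
In practice `Optional` is mostly seen on parameters that default to `None`. A minimal sketch with a hypothetical function:

```python
from typing import Optional


def column_label(column: Optional[str] = None) -> str:
    # The None check narrows the type to str for the rest of the function.
    if column is None:
        return '<unnamed>'
    return column.upper()
```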

## Type

In [100]:
from typing import Type

class BaseClass:
    pass

class SubClass(BaseClass):
    pass

def useclass(cs: Type[BaseClass]):
    print(type(cs))

useclass(BaseClass) 
useclass(SubClass) #passes static type checking

<class 'type'>
<class 'type'>


## Generics

Generic typing allows for type variables (created with `TypeVar`). These can be used to create collection classes whose members can potentially be of any type; a particular instance can be given a concrete type without subclassing the collection.

Example from the Python documentation.

In [101]:
from typing import TypeVar, Generic
from logging import Logger

T = TypeVar('T')

class LoggedVar(Generic[T]):
    def __init__(self, value: T, name: str, logger: Logger) -> None:
        self.name = name
        self.logger = logger
        self.value = value

    def set(self, new: T) -> None:
        self.log('Set ' + repr(self.value))
        self.value = new

    def get(self) -> T:
        self.log('Get ' + repr(self.value))
        return self.value

    def log(self, message: str) -> None:
        self.logger.info('%s: %s', self.name, message)
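
To show how a generic class is given a concrete type at the point of use, here is a smaller sketch. The `Stack` class is hypothetical and not taken from the Python documentation.

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')


class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# The element type is chosen per instance, without subclassing.
ints: Stack[int] = Stack()
ints.push(1)
ints.push(2)
top = ints.pop()

names = Stack[str]()
names.push('a')
```

A type checker would reject `ints.push('x')`; at runtime the call would still succeed, since hints are not enforced.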

## Type Checkers

The original type checker is called Mypy.  Facebook have created their own called Pyre, which is faster for large code bases.  Microsoft has a plugin for VS Code called Pyright.

## Stubs

The type checker only needs a stubs file to let programs access a Python module. There is no need to port the entire module to mypy. A stubs file is also a good starting point for porting an entire Python module to mypy. They can also highlight potential areas of improvement in the mypy type system.

A stubs file only contains a description of the public interface of the module without any implementations. Stubs can be dynamically typed, statically typed or a mixture of both. Based on preliminary results, it seems that over 95% of the names defined in the Python standard library can be given static types. The rest will be dynamically typed (use type 'Any') or have 'Any' components, but that is fine, since mypy supports seamless mixing of dynamic and static types in programs.

Mypy uses stubs from [Type Shed](https://github.com/python/typeshed) -- all stub changes should be contributed there. Mypy developers will periodically pull latest changes from typeshed to be included with mypy. There are stubs for both Python 2.7 and 3.x, though not every stub supports both 2 and 3. Some stubs can be shared between Python 2.7 and 3.x.

## MonkeyType
MonkeyType collects runtime types of function arguments and return values, and can automatically generate stub files or even add draft type annotations directly to your Python code based on the types collected at runtime.

**Example**

Say `some/module.py` originally contains:

```python
def add(a, b):
    return a + b
```

And myscript.py contains:

```python
from some.module import add

add(1, 2)
```

Now we want to infer the type annotation of add in some/module.py by running myscript.py with MonkeyType. One way is to run:

`$ monkeytype run myscript.py`

By default, this will dump call traces into a SQLite database in the file monkeytype.sqlite3 in the current working directory. You can then use the monkeytype command to generate a stub file for a module, or apply the type annotations directly to your code.

Running `monkeytype stub some.module` will output a stub:

```python
def add(a: int, b: int) -> int: ...
```

Running `monkeytype apply some.module` will modify `some/module.py` to:

```python
def add(a: int, b: int) -> int:
    return a + b
```    
    
This example demonstrates both the value and the limitations of MonkeyType. With MonkeyType, it's very easy to add annotations that reflect the concrete types you use at runtime, but those annotations may not always match the full intended capability of the functions. For instance, add is capable of handling many more types than just integers. Similarly, MonkeyType may generate a concrete List annotation where an abstract Sequence or Iterable would be more appropriate. MonkeyType's annotations are an informative first draft, to be checked and corrected by a developer.

# Functional Programming

## List Comprehension

In [102]:
[x**2 for x in range(5)]

[0, 1, 4, 9, 16]

-   Actual list created

The list comprehension behaves like `map()`, except that instead of requiring a function, the expression before *for* is applied to every element in the *iterable*. 
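
The correspondence with `map()` can be sketched side by side. The optional `if` clause also covers what `filter()` does.

```python
# Same result, two spellings.
squares_comp = [x**2 for x in range(5)]
squares_map = list(map(lambda x: x**2, range(5)))

# A trailing `if` filters elements before the expression is applied.
even_squares = [x**2 for x in range(10) if x % 2 == 0]
```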

## lambda 

Inline anonymous function

```python
lambda arg1[,...] : expression
```

-   `arg1[,...]` function argument list
-   expression is returned by the function

### Simple lambda functions 

Square

In [103]:
square = lambda x: x**2

Even

In [104]:
is_even = lambda x: x % 2 == 0

Normally one would not bind a name to a `lambda`; using `def` is preferred.  A lambda is normally used where a function expects another function to be passed as a *callback*. 
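
A typical callback use is the `key` argument of `sorted()`. A minimal sketch with made-up data:

```python
buildings = [('Eiffel Tower', 324), ('Sears Tower', 442), ('Empire State', 381)]

# The lambda is called once per element to compute the sort key.
by_height = sorted(buildings, key=lambda b: b[1])
tallest_first = sorted(buildings, key=lambda b: b[1], reverse=True)
```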

### Map

In [105]:
map(lambda x: x**2, range(5))

<map at 0x10d2fd6d8>

In [106]:
list(map(lambda x: x**2, range(5)))

[0, 1, 4, 9, 16]

### Filter

In [107]:
list(filter(lambda x: x % 2 == 0, range(5)))

[0, 2, 4]

### Generator expression

- Not an actual list
- Iterator object, i.e. supports `next()`

A generator expression creates an iterator instead of a list. The difference between a list comprehension and a generator expression is analogous to that between `range()` and `xrange()` in Python 2.  If the number of elements being processed is large, a generator expression will be much more memory efficient than a list comprehension.
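
The memory difference can be sketched with `sys.getsizeof`. Note the assumption: `getsizeof` reports only the container itself (not the integers it refers to), so the real gap is larger than the numbers suggest.

```python
import sys

n = 100_000
as_list = [x**2 for x in range(n)]   # every value is held in memory
as_gen = (x**2 for x in range(n))    # values are produced on demand

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # small and independent of n

# Consuming the generator still visits every value exactly once.
total = sum(as_gen)
```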

In [109]:
for i in (x**2 for x in range(5)):
    print(i, end=' ')

0 1 4 9 16 

In [110]:
o = (x**2 for x in range(5))

In [111]:
next(o)

0

In [112]:
o.__next__()

1

In [113]:
for i in o:
    print(i, end=' ')

4 9 16 

### Generator Functions 

-   The `yield` keyword is always present in a generator function
-   Calling the function returns a generator object

```python
o = generator_func()   
next(o)
```

-   Executes up until **yield**
-   "break point" where execution temporarily stops and returns a value

Generator functions create iterator objects.  These objects have a method called *`__next__()`* (*`next()`* in Python 2), normally invoked through the built-in `next()`.  Each time this method is called, the object runs the function's code up to the next `yield`, at which point the yielded value is returned.  When iterated over, the generator object has its `__next__()` method called successively until the function ends, at which point a *StopIteration* exception is raised.

### Updown

Example generator function `updown()` returns a generator object that iterates up from 0 to (but not including) n, and then down again from n to 1.

In [114]:
def updown(n):
    for i in range(n):
        yield i
    for i in range(n,0,-1):
        yield i

In [115]:
for x in updown(5) : print(x, end=' ')

0 1 2 3 4 5 4 3 2 1 

In [116]:
o = updown(5)
next(o)

0

In [117]:
next(o)

1

In [118]:
o.__next__(); o.__next__(); o.__next__()

4

In [119]:
next(o)

5

In [120]:
o.__next__(); o.__next__(); o.__next__(); o.__next__(); o.__next__()

StopIteration: 

# Iteration Patterns in Python

In [121]:
d = {'A':'a', 'B':'b', 'C':'c'}
print('keys')
for k in d:
    print(k, end=' ')
print('\nvalues')
for v in d.values():
    print(v, end='')

keys
A B C 
values
abc

In [122]:
print('key => value')
for k, v in d.items():
    print(k, ' => ', v)

key => value
A  =>  a
B  =>  b
C  =>  c


In [123]:
print('key => value')
for item in d.items():
    print('%s => %s' % item)

key => value
A => a
B => b
C => c


In [124]:
print('key => value')
print('\n'.join('%s => %s' % item for item in d.items()))

key => value
A => a
B => b
C => c


### Iterating over Files

The traditional way to iterate over a file (`for line in file:`) has the disadvantage of not safely limiting the possible size of the data read from the file at a time.  If we do not trust that the file has reasonably sized lines, we could do the following and read it in blocks of 32 characters.

```python
from functools import partial
with open('...') as f:
    blocks = []
    for block in iter(partial(f.read,32),''):
        blocks.append(block)
```

In [125]:
from functools import partial
with open('config.text') as f:
    blocks = []
    for block in iter(partial(f.read,32),''):
        blocks.append(block)
blocks

['# config file 1.1\n\nport = 10001 ',
 '# port to connect\nhost = localho',
 'st\nuser = pyuser\ndbname = pydb1\n',
 '   #end section\n']

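`iter` takes a sentinel as its second argument: it calls the function repeatedly until the function returns the sentinel (here the empty string `''` that `read` returns at end of file).  Appending to a list is more efficient than concatenating strings!  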
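
The same two-argument `iter` pattern can be tried without a file on disk. This sketch assumes an in-memory `io.StringIO` as a stand-in for the file and a block size of 4 to keep the output short.

```python
import io
from functools import partial

f = io.StringIO('abcdefghij')

# iter(callable, sentinel): call f.read(4) until it returns ''.
blocks = list(iter(partial(f.read, 4), ''))
```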

### Stdlib has interesting iterables

```python
import os
import re

for match in re.finditer(pattern, string):
    ...  # once for each regex match

for root, dirs, files in os.walk('/some/dir'):
    ...  # once for each sub-directory
```

## Itertools

The `itertools` module has a number of functions useful for iteration.

```python
import itertools
from itertools import chain, repeat, cycle

# itertools is a module full of tools for playing with iteration
for num in itertools.count():
    ...  # once for each integer... Infinite!

seq = chain(repeat(17, 3), cycle(range(4)))   # cycles forever
for num in seq:
    ...  # 17, 17, 17, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ...
```
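
Because these iterators never end, a bounded view is usually needed before they can be turned into a list. A minimal sketch using `itertools.islice`:

```python
from itertools import chain, count, cycle, islice, repeat

# islice takes a finite slice of a (possibly infinite) iterator.
first_five = list(islice(count(), 5))

seq = chain(repeat(17, 3), cycle(range(4)))
first_ten = list(islice(seq, 10))
```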

### Other functions useful with iterables

In [126]:
iterable = range(1,10)
new_list = list(iterable)

In [127]:
results = [square(x) for x in iterable]

In [128]:
total = sum(iterable)

In [129]:
smallest = min(iterable)
largest = max(iterable)

In [130]:
",".join('iterable')

'i,t,e,r,a,b,l,e'

### Getting the index of a list in a loop

In [131]:
my_list = "abc def ghi jkl".split()

In [132]:
#Don't do this
for i in range(len(my_list)):
    v = my_list[i]
    print(i, v)

0 abc
1 def
2 ghi
3 jkl


In [133]:
#Do it this way instead
for i, v in enumerate(my_list):
    print(i, v)

0 abc
1 def
2 ghi
3 jkl


### enumerate() makes useful pairs

In [134]:
names = ["Eiffel Tower", "Empire State", "Sears Tower"]
list(enumerate(names))

for num, name in enumerate(names):
    print(num, name)

0 Eiffel Tower
1 Empire State
2 Sears Tower


### Iteration vs indexing
**Limited:**

In [135]:
for i in range(len(my_list)):
    v = my_list[i]  # indexing
    print(i, v)

0 abc
1 def
2 ghi
3 jkl


**More powerful:**
* some iterables can't be indexed, such as an open file

In [136]:
for i, v in enumerate(iterable):
    print (i, v, end = '\t')

0 1	1 2	2 3	3 4	4 5	5 6	6 7	7 8	8 9	

```python
for linenum, line in enumerate(f, start=1):
    #...
```
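
A runnable version of the line-numbering idiom, assuming an in-memory `io.StringIO` in place of a real open file:

```python
import io

f = io.StringIO('first\nsecond\nthird\n')

# start=1 gives human-friendly line numbers; the file is never indexed.
numbered = [(linenum, line.rstrip('\n'))
            for linenum, line in enumerate(f, start=1)]
```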

### Looping over two lists

In [137]:
names = ["Eiffel Tower", "Empire State", "Sears Tower"]
heights = [324, 381, 442]

In [138]:
#Non - Pythonic way
for i in range(len(names)):
    name = names[i]
    height = heights[i]
    print("%s: %s meters" % (name, height))

Eiffel Tower: 324 meters
Empire State: 381 meters
Sears Tower: 442 meters


**A: zip() makes pair-wise loops**
* A pair of streams becomes a stream of pairs
* zip() takes iterables and produces iterables

In [139]:
for name, height in zip(names, heights):
    print("%s: %s meters" % (name, height))

Eiffel Tower: 324 meters
Empire State: 381 meters
Sears Tower: 442 meters


**dict() accepts a stream of pairs**

In [140]:
dict(zip(names, heights))

{'Eiffel Tower': 324, 'Empire State': 381, 'Sears Tower': 442}

**Powerful**

In [141]:
tall_buildings = {
   "Empire State": 381, "Sears Tower": 442,
   "Burj Khalifa": 828, "Taipei 101": 509,
}

In [142]:
print(max(tall_buildings.values()))

828


In [143]:
print(max(tall_buildings.items(), key=lambda b: b[1]))

('Burj Khalifa', 828)


In [144]:
print(max(tall_buildings, key=tall_buildings.get))

Burj Khalifa


### Abstracting iteration
* When doing 2 things in the loop such as picking items out of the list and then doing something with those values.
* The picking and the doing can be two separate pieces that can be abstracted apart.

```python
nums = [88, 73, 92, 72, 40, 30, 25, 20, 98, 72]
for n in nums:
    if n % 2 == 0:
        do_something(n)
```

## Iterable Pattern
- Function evens accepts an iterable and produces a new iterable

```python
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them

for n in evens(nums):
    do_something(n)
```

**Functions return one value - Generators produce a stream**

* A generator is like a function.
* When you call a function, it runs all its statements and returns one value.
* When you call a generator, it produces an iterator. As you iterate over that iterator, the statements in the generator run, and every time a `yield` statement is reached one more value is produced.
* It's kind of like a function that can keep producing values over and over again.
* Generators are a really powerful way to implement iteration.
* You should be writing more generators.

**Evens generator**
- Let's turn our `evens` function into a generator

In [145]:
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n

**Chaining iterables**

In [146]:
def do_something(n):
    print(n, end='')

In [147]:
for n in evens([1,2,3,4,5,6,7,8]):
    do_something(n)

2468
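
Because a generator accepts any iterable and is itself iterable, generators can be stacked into a pipeline. A minimal sketch; `squared` is a hypothetical second stage and `evens` is repeated so the block is self-contained.

```python
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n


def squared(stream):
    for n in stream:
        yield n * n


# Nothing runs until the pipeline is consumed; values flow one at a time.
pipeline = squared(evens(range(1, 9)))
result = list(pipeline)
```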

## Abstracting the iteration

In [150]:
with open("config.text") as f:
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            # A comment line, skip it
            continue
        if not line:
            # A blank line, skip it
            continue

        # An interesting line.
        do_something(line)

port = 10001 # port to connecthost = localhostuser = pyuserdbname = pydb1

Note:
There are essentially two parts to this code.  The first part produces the interesting lines in the file.  The second part applies some logic to the lines.

**Your own generator**

In [151]:
def interesting_lines(f):
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            continue
        if not line:
            continue
        yield line

In [152]:
def do_something_else(line):
    print(line.upper())


with open("config.text") as f:
    for line in interesting_lines(f):
        do_something(line)

with open("test.log") as f2:
    for line in interesting_lines(f2):
        do_something_else(line)

port = 10001 # port to connecthost = localhostuser = pyuserdbname = pydb12018-05-24 11:47:04,335:DEBUG:ROOT:TEST
2018-05-24 11:48:17,270:DEBUG:TEST1:TEST
2018-05-24 11:49:46,408:DEBUG:TEST1:TEST
2018-05-24 11:59:47,925:ERROR:TEST1:DEBUGGING
2018-05-24 12:10:59,632:DEBUG:TEST1:NEW MESSAGE
2018-05-24 12:11:59,727:DEBUG:TEST1:NEW MESSAGE


### Breaking out of a nested loop
- How is it done?

```python
for row in range(height):
    for col in range(width):

        value = spreadsheet.get_value(col, row)
        do_something(value)

        if this_is_my_value(value):
            break  # <-  ???
```

**Make the double loop single**
* Through the use of a generator

In [153]:
def range_2d(width, height):
    """Produce a stream of two-D coordinates."""
    for y in range(height):
        for x in range(width):
            yield x,y

```python
for col, row in range_2d(width, height):
    value = spreadsheet.get_value(col, row)
    do_something(value)
    
    if this_is_my_value(value):
        break
```
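
A runnable sketch of the single-loop version. The nested list `grid` is a hypothetical stand-in for the spreadsheet, and "my value" is assumed to be 5.

```python
def range_2d(width, height):
    """Produce a stream of two-D coordinates."""
    for y in range(height):
        for x in range(width):
            yield x, y


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

visited = []
for col, row in range_2d(3, 3):
    value = grid[row][col]
    visited.append(value)
    if value == 5:
        break   # a single break ends the whole traversal
```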

**Better: iterate cells**

```python
for cell in spreadsheet.cells():
    value = cell.get_value()
    do_something(value)
    
    if this_is_my_value(value):
        break
```

## Low Level Iteration
**Lower level**
* Iterable: produces an iterator
* Iterator: produces a stream of values
* Only operation on iterators is next()

```python
iterator = iter(iterable)  # iterable.__iter__()
value = next(iterator)     # iterator.__next__() (iterator.next() in Py2)
value = next(iterator)
...
```
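
These two calls are what a `for` loop does behind the scenes. A sketch of a hand-written equivalent (the function name is illustrative):

```python
def manual_for(iterable):
    """Collect the values of an iterable using only iter() and next()."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            break              # the iterator is exhausted
        collected.append(value)
    return collected
```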


**Low-level iteration**

```python
with open("blah.dat") as f:
    # Read the first line
    header_line = next(f)
    
    # Read the rest
    for data_line in f:
        # ...
```

**Making your object iterable**

In [154]:
class ToDoList(object):
    def __init__(self):
        self.tasks = []
        
    def __iter__(self):
        return iter(self.tasks)    # returns an iterator over the list

```python
todo = ToDoList()
...
for task in todo:
    # ...
```

**`__iter__` generators**

In [155]:
class ToDoList(object):
    def __init__(self):
        self.tasks = []

    def __iter__(self):
        for task in self.tasks:
            if not task.done:
                yield task

    def all(self):
        return iter(self.tasks)

    def done(self):
        # This is a generator expression
        return (t for t in self.tasks if t.done)
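
A usage sketch for the class above. `Task` is a hypothetical record type with `name` and `done` fields, and the class is repeated so the block runs on its own.

```python
from collections import namedtuple

Task = namedtuple('Task', ['name', 'done'])


class ToDoList(object):
    def __init__(self):
        self.tasks = []

    def __iter__(self):
        for task in self.tasks:
            if not task.done:
                yield task

    def all(self):
        return iter(self.tasks)

    def done(self):
        return (t for t in self.tasks if t.done)


todo = ToDoList()
todo.tasks.extend([Task('write', True), Task('review', False), Task('ship', False)])

pending = [t.name for t in todo]          # default iteration skips done tasks
finished = [t.name for t in todo.done()]
everything = [t.name for t in todo.all()]
```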

---
# Database Programming

## SQLite
The `sqlite3` module lets you run SQL commands against an SQLite database stored in a single file (or in memory).

In [156]:
import sqlite3

## Common DB API
- **`connect`**(*`filename`*) connect to a database stored in *filename* and return a *Connection* object 

- **`Connection.cursor`**() return a *cursor* object for the *connection*.

- **`Connection.commit`**() Commit Transaction (if `autocommit` is off).

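- **`Connection.close`**() Close connection. In `sqlite3` the connection also supports the *with* statement, which commits or rolls back the transaction but does not close the connection.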

- **cursor.execute**(*sql*) execute a query in an *sql* string.

- **cursor.executemany**(*sql*,[()..]) execute parameterised query against a list of parameters.

- **cursor.fetch\[all\]**() return query result.

- **cursor.description** query meta data i.e. field names and properties.
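
A minimal sketch of the connection used with the *with* statement in `sqlite3`. The table and rows are made up and the failure is simulated.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table users(id int primary key, username varchar(64))')

with conn:   # commits when the block succeeds
    conn.execute('insert into users values(?, ?)', (1, 'fred'))

try:
    with conn:   # rolls back when the block raises
        conn.execute('insert into users values(?, ?)', (2, 'sarah'))
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

usernames = [row[0] for row in conn.execute('select username from users')]
conn.close()   # the with statement does not close the connection
```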

## SQL examples

In [157]:
import sqlite3 as db
conn = db.connect(':memory:')
cur = conn.cursor()
sql = '...' # sql query as str
#cur.execute(sql)

## Create a table

In [158]:
sql = '''
   create table users(
   id int primary key,
   username varchar(64),
   password varchar(64))
   '''
cur.execute(sql)

<sqlite3.Cursor at 0x10d349f10>

## Insert a row
### Parameterised Query

In [159]:
sql = '''
    insert into users 
    values(?,?,?)'''
cur.executemany(sql,[
    (2, 'fred', 'qwerty'),
    (3, 'sarah', 'asdf')])
conn.commit() #if auto commit not on

## Selecting

In [160]:
sql = "select * from users"
cur.execute(sql)

<sqlite3.Cursor at 0x10d349f10>

**Fetching Rows**
- `cur.fetchall()` -> list of tuples
- `cur.fetchone()` -> tuple
- `for row in cur:` iterates over the result set
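
The three fetching styles can be mixed, since they all consume the same result set. A self-contained sketch using a cut-down, hypothetical version of the `users` table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table users(id int primary key, username varchar(64))')
cur.executemany('insert into users values(?, ?)', [(2, 'fred'), (3, 'sarah')])

cur.execute('select * from users order by id')
first = cur.fetchone()             # one row as a tuple
remaining = [row for row in cur]   # iteration continues where fetchone stopped
exhausted = cur.fetchone()         # None once the result set is used up
conn.close()
```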

In [161]:
cur.fetchall()

[(2, 'fred', 'qwerty'), (3, 'sarah', 'asdf')]

## Description
- The description allows us to fetch query meta data
- SQLite only supports the column name meta data, hence other fields are always `None`.

In [162]:
cur.description

(('id', None, None, None, None, None, None),
 ('username', None, None, None, None, None, None),
 ('password', None, None, None, None, None, None))

In [172]:
keys = [x[0] for x in cur.description]
keys

['id', 'username', 'password']

In [173]:
rows = [(2, 'fred', 'qwerty'), (3, 'sarah', 'asdf')]

In [174]:
dict_list = []
for row in rows:
    dict_list.append(dict(zip(keys,row)))
dict_list

[{'id': 2, 'username': 'fred', 'password': 'qwerty'},
 {'id': 3, 'username': 'sarah', 'password': 'asdf'}]

In [175]:
list_dict = {key: [] for key in keys}

In [176]:
list_dict

{'id': [], 'username': [], 'password': []}

In [177]:
for row in rows:
    for i, key in enumerate(keys):
        list_dict[key].append(row[i])
        
list_dict

{'id': [2, 3], 'username': ['fred', 'sarah'], 'password': ['qwerty', 'asdf']}

In [179]:
list_dict2 = dict.fromkeys(keys, [])
list_dict2

{'id': [], 'username': [], 'password': []}

In [180]:
for row in rows:
    for i, key in enumerate(keys):
        list_dict2[key].append(row[i])
        
list_dict2

{'id': [2, 'fred', 'qwerty', 3, 'sarah', 'asdf'],
 'username': [2, 'fred', 'qwerty', 3, 'sarah', 'asdf'],
 'password': [2, 'fred', 'qwerty', 3, 'sarah', 'asdf']}
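
The surprising result comes from `dict.fromkeys(keys, [])` using the *same* list object as the value for every key, whereas the dict comprehension creates a new list per key. A minimal sketch of the difference, with illustrative names:

```python
keys = ['id', 'username']

shared = dict.fromkeys(keys, [])        # one list, referenced by every key
separate = {key: [] for key in keys}    # a fresh list for each key

shared['id'].append(2)
separate['id'].append(2)

same_object = shared['id'] is shared['username']
```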

---

# Modules 

-   normal Python file
-   or pre-compiled C object (`.dll`, `.so`)
-   found in `sys.path`

## import 

import module

-   creates a namespace
-   named after module
-   same as file name sans extension
-   hence filename must be valid python identifier

## import invocation

```python
import mod
```

Imports a module from a file named ``mod.py`` found on ``sys.path``, with a namespace ``mod``.

```python
import mod1, mod2
```

Imports both modules ``mod1`` and ``mod2``.

```python
import mod1 as alias
```

Imports module ``mod1`` with a namespace ``alias``.

```python
from mod import attribute[, ...]
```

Imports module ``mod`` and places listed attributes into the ``__main__`` namespace, or that of the current module.

```python
from mod import *
```

Imports module mod and places *all* attributes into the `__main__` namespace, or that of the current module.
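
The import forms above can be tried with standard library modules. This sketch picks `math`, `os.path` and `collections` purely as examples.

```python
import math                      # namespace: math
import os.path as osp            # namespace: osp (alias)
from collections import Counter  # Counter lands in the current namespace

area = math.pi * 2 ** 2
path = osp.join('some', 'dir')
counts = Counter('abca')
```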

## Packages
- sub directories within `sys.path`
- directory must contain `__init__.py`
  + May be empty
  + Any code is executed on `import packagename`
  + most often imports default modules from within the package

**Importing Packages**

```python
import package #effectively executes __init__.py
import package.mod #imports module within package
```

---

# class 

`class` allows custom object types.

contains

- class variables
- methods
- docstrings

In [181]:
class C1(object): pass

The above code defines an empty class named `C1`. Note that the class name is in TitleCase by convention.  The value in parentheses is the *super* or *base* class.  That is, this class inherits from a built-in class called `object`.  Classes that inherit from `object` are *new style classes* (in Python 3 every class is new style, so `(object)` is optional).  An object of this class is instantiated when the class name is *called* as if it were a function.

In [182]:
new_object = C1() #instantiates the class

## class variable 

-   normal variable
-   class acts as a namespace

In [183]:
class C2(object): 
    class_var = 'Class Variable'

In [184]:
print("Static Ref:", C2.class_var) #static reference
c2_object = C2()
print("non-static:", c2_object.class_var) #non-static reference to the same variable

Static Ref: Class Variable
non-static: Class Variable


Class variables behave like any other module variables, that is *global* variables defined in the module, except that the class name acts as a namespace or qualifier before the variable name, and class variables can be assigned within a function without requiring the `global` keyword.

In [185]:
def fn():
    C2.class_var += " Modified"

fn()
C2.class_var

'Class Variable Modified'
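
One thing to watch for: assigning through an *instance* does not change the class variable, it creates an instance variable that hides it. A minimal sketch with a hypothetical class:

```python
class Shared(object):
    count = 0               # class variable, shared by all instances


a = Shared()
b = Shared()

Shared.count += 1           # changes the one shared variable
seen_by_a = a.count

a.count = 10                # creates an instance variable on a only
seen_by_b = b.count         # b still reads the class variable
```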

## instance vars 

-   variables contained in the *instance*

In [186]:
c1 = C2()
c1.var = 4
c1.__dict__['var']

4

Every time a new *instance* of the class is created (that is, we create an object of that class), it gets its own set of instance variables.  The instance variables are stored in a dictionary called `__dict__`.  Classes similarly also have dictionaries:

In [187]:
C2.__dict__

mappingproxy({'__module__': '__main__',
              'class_var': 'Class Variable Modified',
              '__dict__': <attribute '__dict__' of 'C2' objects>,
              '__weakref__': <attribute '__weakref__' of 'C2' objects>,
              '__doc__': None})

In [188]:
c1.__dict__

{'var': 4}

In [189]:
vars(c1)

{'var': 4}

In [190]:
vars(C2)

mappingproxy({'__module__': '__main__',
              'class_var': 'Class Variable Modified',
              '__dict__': <attribute '__dict__' of 'C2' objects>,
              '__weakref__': <attribute '__weakref__' of 'C2' objects>,
              '__doc__': None})

## methods 

-   must accept *self*
-   self represents the *instance object* the method was called from

In [191]:
class C3(object):
    def show_dict(self):
        print(self.__dict__)

The above class has a single method: `show_dict`.  This method accepts `self`, which must always be the first argument to a method.  `self` is a reference to the *instance* of the class from which the method is called.  The method is said to be *bound* to the instance. 

In [192]:
test1 = C3()
test1.var = 10
test1.show_dict()

{'var': 10}


In [193]:
test2 = C3()
test2.other_var = 999
test2.show_dict()

{'other_var': 999}


### Function bound and unbound methods

A function is a *callable* object defined outside of a class; a method is a *callable* defined within a class.  An unbound method is referenced using the class as a qualifier (in Python 3 this is just a plain function); a bound method is referenced using the *instance* as a qualifier.

In [194]:
C3.show_dict

<function __main__.C3.show_dict(self)>

In [195]:
test1.show_dict

<bound method C3.show_dict of <__main__.C3 object at 0x10d39c748>>

Both above references are to the same underlying function.  When *calling* an unbound method it is necessary to supply an instance to satisfy the `self` parameter.

In [196]:
C3.show_dict(test2)

{'other_var': 999}


In [197]:
test2.show_dict()

{'other_var': 999}


Both above calls are to the same method.  In both cases `self` is assigned to `test2`. 
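
A bound method can be stored and called later because it remembers its instance. A short sketch with a hypothetical `Greeter` class; `__self__` and `__func__` are the attributes Python 3 sets on bound methods.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name

    def greet(self):
        return 'Hello ' + self.name


g = Greeter('Ada')

bound = g.greet                 # bound method: carries g along
via_bound = bound()             # no argument needed for self
via_class = Greeter.greet(g)    # plain function: instance passed explicitly

remembers_instance = bound.__self__ is g
wraps_function = bound.__func__ is Greeter.greet
```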

## Initialising Objects `__init__`

As can be seen from the previous examples, an object has no instance variables by default.  Instance variables can be created at any time; however, we normally want all objects of a certain class to have the same set (or should I say, dictionary) of instance variables:

```python
def __init__(self): ...  # initialise object
```

-   intercepts the classname() call.
-   often used to initialise instance variables

In [198]:
class C4(object):
    def __init__(self):
        self.a = 'A'
        self.b = 'B'

In [199]:
o = C4()
o.__dict__

{'a': 'A', 'b': 'B'}

**Passing `__init__` arguments**

If init accepts additional arguments, these can be passed when the instance is created.

In [200]:
class C5(object):
    def __init__(self, arg):
        self.a = arg
        self.b = 'B'

In [201]:
o = C5(20)
o.__dict__

{'a': 20, 'b': 'B'}

# Inheritance

Part of the class definition

-   `class classname(baseclass[,...])`
-   attribute resolution delegation

In Python inheritance is a matter of attribute delegation. When an attribute (variable or method) is referenced from an object (`o.attribute`), Python first searches `o.__dict__`; if not found, it then searches `ClassName.__dict__`, and after that the base classes and their base classes.   

![attribute delegation](images/attribute_delegation.png)
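
The lookup order can be observed directly. This is a minimal sketch with hypothetical classes `Base` and `Child`; `__mro__` lists the classes in the order they are searched.

```python
class Base(object):
    origin = 'base class'

    def describe(self):
        return 'described by Base'


class Child(Base):
    origin = 'child class'


o = Child()

from_class = o.origin       # not in o.__dict__, found in Child.__dict__
from_base = o.describe()    # not in o or Child, found in Base.__dict__

o.origin = 'instance'       # now stored in o.__dict__, searched first
from_instance = o.origin

lookup_order = [cls.__name__ for cls in Child.__mro__]
```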

# Exceptions

An exception is a signal that an error or other unusual condition has occurred. There are a number of built-in exceptions, which indicate conditions like reading past the end of a file, or dividing by zero. You can also define your own exceptions.

## Handling Exceptions

```python
try: 
    block
except [exceptionclass] as e: 
    block
else: 
    block
finally: 
    block
```

In order to handle errors, you can set up exception handling blocks in your code. The keywords `try` and `except` are used to catch exceptions. When an error occurs within the try block, Python looks for a matching except block to handle it. If there is one, execution jumps there. If no exception occurs, Python runs the `else` block after the `try` block completes.

The `finally` block will be executed no matter what.
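
The order in which the blocks run can be traced with a small sketch. `safe_divide` is an illustrative function that records which blocks were entered.

```python
def safe_divide(a, b):
    steps = []
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append('except')    # only when the division fails
        result = None
    else:
        steps.append('else')      # only when no exception occurred
    finally:
        steps.append('finally')   # always
    return result, steps
```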

### Unhandled Exception

In [202]:
1/0

ZeroDivisionError: division by zero

### Example of Handled Exception

In [203]:
try :
    assert False, "force exception"
    x[4]
    1/0
    print(a)

except ZeroDivisionError:
    print("can't divide by zero")
    

except NameError:
    print("name not defined")

except IndexError as e:
    print("index out of range", e)

else:
    print("no exception occurred")
finally:
    print("we do this no matter what")

we do this no matter what


AssertionError: force exception

When handling multiple exceptions Python will jump to the first `except` clause that matches the exception. Exceptions are classes with an inheritance hierarchy. An exception can be matched by its own class or by any of its base classes.  

Exception clauses therefore should be sorted from the specific to the general.  Always make sure that a *catchall* except clause identifies the true nature of the exception with a stack trace or similar.

### Built-in Exception Classes

```
BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StopAsyncIteration
      +-- ArithmeticError
      |    +-- FloatingPointError
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      +-- AssertionError
      +-- AttributeError
      +-- BufferError
      +-- EOFError
      +-- ImportError
      |    +-- ModuleNotFoundError
      +-- LookupError
      |    +-- IndexError
      |    +-- KeyError
      +-- MemoryError
      +-- NameError
      |    +-- UnboundLocalError
      +-- OSError
      |    +-- BlockingIOError
      |    +-- ChildProcessError
      |    +-- ConnectionError
      |    |    +-- BrokenPipeError
      |    |    +-- ConnectionAbortedError
      |    |    +-- ConnectionRefusedError
      |    |    +-- ConnectionResetError
      |    +-- FileExistsError
      |    +-- FileNotFoundError
      |    +-- InterruptedError
      |    +-- IsADirectoryError
      |    +-- NotADirectoryError
      |    +-- PermissionError
      |    +-- ProcessLookupError
      |    +-- TimeoutError
      +-- ReferenceError
      +-- RuntimeError
      |    +-- NotImplementedError
      |    +-- RecursionError
      +-- SyntaxError
      |    +-- IndentationError
      |         +-- TabError
      +-- SystemError
      +-- TypeError
      +-- ValueError
      |    +-- UnicodeError
      |         +-- UnicodeDecodeError
      |         +-- UnicodeEncodeError
      |         +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
           +-- ResourceWarning
```

## Raising exceptions

```python
# raise Exceptionclass, message  # Python 2 only, a syntax error in Python 3
raise Exceptionclass(message)    # Python 2 and 3
assert boolean_expression, message
```
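
As a sketch of raising a user-defined exception, the example below defines a hypothetical `PriceError` class and `check_price` function. Because `PriceError` subclasses `ValueError`, a handler for `ValueError` also catches it.

```python
class PriceError(ValueError):
    """Hypothetical application-specific exception."""

def check_price(price):
    if price < 0:
        raise PriceError("price must not be negative: %r" % price)
    return price

try:
    check_price(-5)
except ValueError as e:      # also matches the PriceError subclass
    print(type(e).__name__, e)
```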

## traceback
The `traceback` module provides functions for displaying and formatting error messages.

In [204]:
import traceback as tb

- tb.**print_exception(etype, value, tb, limit=None, file=None)**
prints the exception and up to `limit` stack trace entries from `tb` to `file`.

- tb.**print_exc(limit=None, file=None)**
shorthand for `print_exception(*sys.exc_info(), limit, file)`.

- tb.**format_exc(limit=None)**
like `print_exc()` but returns a string.
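
A short sketch of `format_exc()` in use. The function `risky` is a made-up example that always fails.

```python
import traceback as tb

def risky():
    return 1 / 0

try:
    risky()
except ZeroDivisionError:
    text = tb.format_exc()   # the full stack trace, captured as a string

print(text)
```

Capturing the trace as a string is useful when it should go to a log file rather than to the screen.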

## logging
Often the best way to track errors and debug your program is by using a log file.  The logging module can be used both procedurally and in an object-oriented fashion.

In [205]:
import logging as log

Set up logging with
- **basicConfig**(*format*, *filename*, *level*)

Use a convenience function to log a message.
- **debug**(*message*)
- **warning**(*message*)
- **error**(*message*)


```python
try:
    1/0
except Exception:  # catchall for ordinary errors
    print("other error")
    #log.error("some other exception occurred")
    #log.error(tb.format_exc())
    log.exception("some other exception occurred")  # replaces the two lines above
```



---

In [206]:
log.warning('This is a warning')



In [207]:
log.error('This is an error')

ERROR:root:This is an error


### Configuration
Initially, logging can be configured with the `basicConfig` function.

In [208]:
help(log.basicConfig)

Help on function basicConfig in module logging:

basicConfig(**kwargs)
    Do basic configuration for the logging system.
    
    This function does nothing if the root logger already has handlers
    configured. It is a convenience method intended for use by simple scripts
    to do one-shot configuration of the logging package.
    
    The default behaviour is to create a StreamHandler which writes to
    sys.stderr, set a formatter using the BASIC_FORMAT format string, and
    add the handler to the root logger.
    
    A number of optional keyword arguments may be specified, which can alter
    the default behaviour.
    
    filename  Specifies that a FileHandler be created, using the specified
              filename, rather than a StreamHandler.
    filemode  Specifies the mode to open the file, if filename is specified
              (if filemode is unspecified, it defaults to 'a').
    format    Use the specified format string for the handler.
    datefmt   Use the specified 

### Formatting a log

In [209]:
print(log.Formatter.__doc__)


    Formatter instances are used to convert a LogRecord to text.

    Formatters need to know how a LogRecord is constructed. They are
    responsible for converting a LogRecord to (usually) a string which can
    be interpreted by either a human or an external system. The base Formatter
    allows a formatting string to be specified. If none is supplied, the
    the style-dependent default value, "%(message)s", "{message}", or
    "${message}", is used.

    The Formatter can be initialized with a format string which makes use of
    knowledge of the LogRecord attributes - e.g. the default value mentioned
    above makes use of the fact that the user's message and arguments are pre-
    formatted into a LogRecord's message attribute. Currently, the useful
    attributes in a LogRecord are described by:

    %(name)s            Name of the logger (logging channel)
    %(levelno)s         Numeric logging level for the message (DEBUG, INFO,
    %(levelname)s       Text logging level for

```python
help(log)

...

DATA
    BASIC_FORMAT = '%(levelname)s:%(name)s:%(message)s'
    CRITICAL = 50
    DEBUG = 10
    ERROR = 40
    FATAL = 50
    INFO = 20
    NOTSET = 0
    WARN = 30
    WARNING = 30
```

Adding a time field to the log message.

In [210]:
FORMAT = '%(asctime)s:' + log.BASIC_FORMAT
FORMAT

'%(asctime)s:%(levelname)s:%(name)s:%(message)s'

In [211]:
log.basicConfig(filename='my.log', level=log.DEBUG, format=FORMAT)

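Modules should not each open the log file themselves. Instead, each module obtains a named logger with `getLogger`, and its messages propagate to the handlers configured on the root logger.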

In [212]:
import logging
logger = logging.getLogger(__name__)
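
The sketch below shows a named logger in the object-oriented style. An in-memory `StringIO` stream stands in for a log file so the example is self-contained, and the logger name `myapp.db` is hypothetical.

```python
import io
import logging

stream = io.StringIO()                       # stands in for a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('myapp.db')       # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

logger.debug('connecting')
logger.error('connection failed')

# getLogger returns the same object every time it is given the same name
same_logger = logging.getLogger('myapp.db')
print(same_logger is logger)
print(stream.getvalue())
```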

# object special methods
- Inherited from the `object` class
- Known as dunder methods because of the double underscores in their names

In [213]:
class c(object): pass

c = c()
dir(c)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__']

### object special attributes
- **`__class__`**   the class used to construct the object
- **`__module__`** the name of the module the class was defined in, or `None` if unavailable.
- **`__doc__`** docstring

### object special attributes
- **`__hash__`** method returning a hash of the object. Most immutable objects are hashable; set members and dict keys **must** be hashable.
- **`__format__`** method used to create a formatted str, called by *format()*.
- **`__dict__`** dictionary of instance variables
- **`__slots__`** iterable of instance attribute names. If defined, it eliminates the need for `__dict__` and fixes the set of possible attributes, which reduces the memory footprint per instance.

**Slot Example**

In [214]:
class TestSlot(object):
    __slots__ = ['a','b','c']


c = TestSlot()
c.a

AttributeError: a

In [215]:
c.a =1
c.b =2
c.c =2
c.d =2

AttributeError: 'TestSlot' object has no attribute 'd'

In [216]:
c.__dict__

AttributeError: 'TestSlot' object has no attribute '__dict__'

### interceptors 
![interceptors](images/tie-interceptor.jpg)

- **`__setattr__`** method intercepts an attempt to set an attribute, e.g. `o.x = 10`
- **`__getattribute__`** method intercepts every attempt to get an attribute, e.g. `print(o.x)`
- **`__getattr__`** called only when normal lookup fails to find the attribute. This method is not inherited from *object*
- **`__delattr__`** intercepts `del o.attribute`.
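
A minimal sketch of the interceptors working together. The class `Traced` and its `log` attribute are invented for illustration.

```python
class Traced(object):
    def __init__(self):
        self.log = []                       # this assignment also goes through __setattr__

    def __setattr__(self, name, value):
        # Intercepts every assignment; delegate to object to actually store it
        object.__setattr__(self, name, value)
        if name != 'log':
            self.log.append(('set', name))

    def __getattr__(self, name):
        # Called only when normal lookup fails
        return 'no attribute %s' % name

    def __delattr__(self, name):
        self.log.append(('del', name))
        object.__delattr__(self, name)

o = Traced()
o.x = 10
print(o.x)        # found normally, so __getattr__ is not called
print(o.y)        # not found, so __getattr__ supplies a value
del o.x
print(o.log)
```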

### object special attributes
- **`__weakref__`**
Stores the weak references to the object, if any exist. Weak references do not prevent garbage collection;
that is, an object is destroyed when its "strong" reference count reaches 0.
Weak references are created using the `weakref` module.

### object creation and deletion
- **`__new__`** called to create the object; also overridden in *metaclasses*
- **`__init__`** called after the object is created, but before it is returned to the caller.
- **`__del__`** destructor, called when the reference count reaches 0. *object* does not define a destructor.
- **`__subclasshook__`** lets an abstract base class decide whether another class counts as its subclass.

### object special attributes
- **`__str__`**  "to string" method, produces a string i.e.  called by *str()*
- **`__repr__`** a string representation, that can be used as a literal to create *o*.

In [217]:
s = 'hello'
s.__str__()

'hello'

In [219]:
str(s)

'hello'

In [218]:
s.__repr__()

"'hello'"

In [220]:
repr(s)

"'hello'"

### Operator overriding
- **`__add__`** Override the `+` operator, that is `a.__add__(b)` is called when `a + b` is executed.
- **`__sub__`** Override the `-` operator
- **`__mul__`** Override the `*` operator
- **`__truediv__`** Override the `/` operator (`__div__` in Python 2).
- **`__iadd__`** Override the `+=` operator, that is `a.__iadd__(b)` is called when `a += b` is executed.
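
The following sketch overrides these operators on a hypothetical two-dimensional `Vector` class.

```python
class Vector(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):          # v + w
        return Vector(self.x + other.x, self.y + other.y)

    def __sub__(self, other):          # v - w
        return Vector(self.x - other.x, self.y - other.y)

    def __mul__(self, factor):         # v * 3
        return Vector(self.x * factor, self.y * factor)

    def __truediv__(self, factor):     # v / 2
        return Vector(self.x / factor, self.y / factor)

    def __iadd__(self, other):         # v += w, modifies v in place
        self.x += other.x
        self.y += other.y
        return self

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

v = Vector(1, 2)
w = Vector(3, 4)
print(v + w, w - v, v * 3, w / 2)
v += w
print(v)
```

Note that `__iadd__` must return the object, since `v += w` rebinds `v` to whatever the method returns.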


## Descriptors

A descriptor is an object which implements one or more of the following methods:

- **`__get__`**(*`self, object[, type]`*) 

Intercepts an attempt to access the value of a field. That is, if an object *o* belongs to a class with a descriptor field called *field*, whose class is Desc, then accessing it as follows:
  - `print(o.field)`  ... becomes ...
  - `print(Desc.__get__(field, o, type(o)))`

- **`__set__`**(*`self, object, value`*)

Intercepts an attempt to set the value of a field. That is, if an object *o* belongs to a class with a descriptor field called *field*, then setting it as follows:
  - `o.field = 10` ... becomes ...
  - `Desc.__set__(field, o, 10)`

- **`__delete__`**(*`self, object`*)

Intercepts an attempt to delete a field. 
  - `del o.field`  ... becomes ...
  - `Desc.__delete__(field, o)`

From
[python-descriptors-made-simple](https://www.smallsurething.com/python-descriptors-made-simple/)

In [221]:
# Price is the class for the descriptor
from weakref import WeakKeyDictionary

class Price(object):
    def __init__(self):
        self.default = 0
        self.values = WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self.values.get(instance, self.default)

    def __set__(self, instance, value):
        if value < 0 or value > 100:
            raise ValueError("Price must be between 0 and 100.")
        self.values[instance] = value

    def __delete__(self, instance):
        del self.values[instance]

In [222]:
class Book(object):
    price = Price() #Descriptor

    def __init__(self, author, title, price):
        self.author = author
        self.title = title
        self.price = price

    def __str__(self):
        return "{0} - {1}".format(self.author, self.title)
    
    def __repr__(self):
        return "Book('{0}', '{1}', {2})".format(self.author, self.title, self.price)

In [223]:
book = Book('fred', "fred's book", 10)
book

Book('fred', 'fred's book', 10)

In [224]:
book.price = 150

ValueError: Price must be between 0 and 100.

# Decorators
- @decorator
- Wraps function

## Closure

In [225]:
def make_double(fn):
    def inner(*args,**kwargs):
        return 2*fn(*args,**kwargs)
    return inner

- Function factory: accepts and returns a function
- outer function args are available to inner
- once *inner* returned, value of *fn* is fixed 

## Applying closure

In [226]:
def square(x):
    return x**2

orig = square
square = make_double(square)

- Using square

In [227]:
print(square(3), orig(3))

18 9


In [228]:
print(square)

<function make_double.<locals>.inner at 0x10ccfdea0>


## Decorator Equivalent

In [229]:
@make_double
def square(x):
    return x**2

In [230]:
print(square)

<function make_double.<locals>.inner at 0x10d373d90>


## Decorators with args
- To create a decorator that can take args
- Create an outer function that takes the args

### Decorator Example

In [231]:
def make_x(factor):
    def real_make_x(fn):
        def inner(*args,**kwargs):
            return factor*fn(*args,**kwargs)
        return inner
    return real_make_x

In [232]:
@make_x(3)
def square(x):
    '''Returns x**2
    >>> square(2)
    4
    '''
    return x**2

square(2) # 3 * 2**2

12

### Problem with Decorator Example
- `square.__doc__`
   Removed
- `square.__name__`
   inner
- **functools wraps** to the rescue 

In [233]:
square.__doc__

In [234]:
square.__name__

'inner'

In [235]:
orig.__name__

'square'

**Example**

In [236]:
from functools import wraps
def make_double(fn):
    @wraps(fn)
    def inner(*args,**kwargs):
        return 2*fn(*args,**kwargs)
    return inner

In [237]:
@make_double
def square(x):
    'returns the square of x'
    return x**2

In [238]:
square.__doc__

'returns the square of x'

In [239]:
square

<function __main__.square(x)>

In [240]:
square.__name__

'square'

**functools with args**

In [241]:
from functools import wraps, partial

def make_x(factor, fn=None):
    if not fn: 
        return partial(make_x, factor)
    @wraps(fn)
    def inner(*args, **kwargs):
        return factor*fn(*args, **kwargs)
    return inner

In [242]:
@make_x(3)
def square(x):
    '''Returns the square of x
    >>> square(2)
    4
    '''
    return x**2


square(2)

12

In [243]:
print(square.__name__, square.__doc__)

square Returns the square of x
    >>> square(2)
    4
    


### Decorator Exercise
- Create a logging decorator that
- Logs method as you enter it
- ... entering method x
- do method x
- Additionally, make a method timer decorator.
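
One possible shape for the timer part of this exercise, offered as a sketch rather than a model answer. The names `timed` and `last_elapsed` are chosen here for illustration.

```python
import time
from functools import wraps

def timed(fn):
    '''Decorator that reports how long each call to fn took.'''
    @wraps(fn)
    def inner(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs whether fn returned normally or raised
            inner.last_elapsed = time.perf_counter() - start
            print('%s took %.6f seconds' % (fn.__name__, inner.last_elapsed))
    inner.last_elapsed = None
    return inner

@timed
def square(x):
    'returns the square of x'
    return x**2

print(square(4))
```

Storing the duration on the wrapper as well as printing it makes the decorator easier to test.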

---

# Property

```python
property(fget=None, fset=None, fdel=None, doc=None)
```

- Executed in class body
- Assigns getter and setter methods

In [244]:
class Triangle(object):
    def __init__(self, base, height):
        self.base = base
        self.height = height
    
    def getarea(self):
        return 0.5 * self.base * self.height
        
    #area = property(fget=lambda self: self.height * self.base * 0.5)
    area = property(fget=getarea)
    
tri = Triangle(5, 10)
tri.area

25.0

## Example in Documentation

In [245]:
class C(object):
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x
    def setx(self, value):
        self._x = value
    def delx(self):
        del self._x
    x = property(getx, setx, delx, "I'm the 'x' property.")

## Original Class 
- person01.py

In [246]:
class Person(object):
    
    def __init__(self,name='', age=0):
        self.name = name
        self.age = age 

def test_person(pers):
    print('Name:',pers.name, 'Age:',pers.age)

if __name__ == '__main__':
    fred = Person('Fred',10)
    test_person(fred)
    test_person(Person('sally',80))

Name: Fred Age: 10
Name: sally Age: 80


### Altering Instance Variables
- name -> fname, surname
- Use property to retain attribute name as
- fname + surname

**Answer**

In [247]:
class Person(object):

    def __init__(self,name, age=0):
        self.set_name(name)
        self.age = age 

    def get_name(self):
        return self.fname + ' ' + self.surname

    def set_name(self,name):
        if ' ' in name:
            self.fname,self.surname = name.split(' ', 1)  # split once, so extra names stay in surname
        else:
            self.fname = name
            self.surname = ''
    name = property(get_name, set_name)

**Testing Answer**

In [248]:
def test_person(pers):
    print('Name:',pers.name, 'Age:',pers.age)

if __name__ == '__main__':
    fred = Person('Fred',10)
    test_person(fred)
    test_person(Person('Sally Derkins',80))
    print(fred.fname)

Name: Fred  Age: 10
Name: Sally Derkins Age: 80
Fred


### Property Decorator
- @property
- Used to decorate getter and setter and deleter methods

### Property Decorator Example

In [249]:
class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        assert value < 20, "limited to 20"
        self._x = value

    @x.deleter
    def x(self):
        del self._x

o = C()
o.x = 10
print(o.x)
del o.x

10


### Property Decorator Exercise
- Alter Person Class
- Use decorator instead

**Answer**

In [250]:
class Person(object):

    def __init__(self,_name, age=0):
        self.name = _name
        self.age = age

    @property
    def name(self):
        return self.fname + ' ' + self.surname

    @name.setter
    def name(self,_name):
        if ' ' in _name:
            self.fname,self.surname = _name.split(' ', 1)  # split once, so extra names stay in surname
        else:
            self.fname = _name
            self.surname = ''

### Emulating Property Class

In [251]:
class Property(object):

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        # Fall back to the getter's docstring, if there is a getter
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

**Emulating Property Class Cont...**

In [252]:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

**Emulating Property Class Cont...**

In [253]:
    # getter, setter and deleter are also methods of the Property class
    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

---
# Meta Programming

Meta programming is the practice of altering code using code. This can be done with *decorators* and *metaclasses*.

## Class Decorator
- decorators/decorate_class.py

In [254]:
from functools import wraps


def debug(method):
    '''Method decorator that prints debug messages'''
    @wraps(method)
    def inner(*args, **kwargs):
        print('entering', method.__name__)
        out = method(*args, **kwargs)
        print('returns', out)
        return out
    return inner

In [255]:
def debugclass(cls):
    '''Class decorator that applies the debug decorator
       to all the methods of the class.
    '''
    for key, val in vars(cls).items():
        if callable(val):
            setattr(cls, key, debug(val))
    return cls

### Apply Class Decorator

In [257]:
@debugclass
class Person(object):

    def __init__(self, name='', age=0):
        self.name = name
        self.age = age

    def getname(self):
        return self.name

    def getage(self):
        return self.age

In [258]:
def test_person(pers):
    print('Name:',pers.getname(), 'Age:',pers.getage())

## Meta Classes

### Classes are instances of type

In order to understand metaclasses, we must first understand that classes are instances of type.

In [259]:
#from personDec import Person
type(Person)

type

In [260]:
isinstance(Person, type)

True

## Create Class using type()

In [261]:
Spam = type('Spam',(object,),{'a':10,'b': lambda self: self.__dict__.get('x')}) 
spam = Spam()
spam.a

10

In [262]:
spam.b

<bound method <lambda> of <__main__.Spam object at 0x10d3652b0>>

In [263]:
spam.x = 5
spam.b()

5

### Default MetaClass is type
- That is, the type of a class is `type`
- Use the **`metaclass`** keyword argument in the class definition for an alternative *type* (Python 2 used the **`__metaclass__`** attribute)
- Custom metaclasses can extend *type*
- Subclasses inherit the metaclass

## Example MetaClass

In [264]:
class CustomMetaclass(type):
    def __init__(cls, name, bases, dct):
        print("Creating class %s using CustomMetaclass" % name)
        super(CustomMetaclass, cls).__init__(name, bases, dct)

class BaseClass(object, metaclass=CustomMetaclass):
    '''Class using the Custom Metaclass'''

class Subclass1(BaseClass):
    pass

Creating class BaseClass using CustomMetaclass
Creating class Subclass1 using CustomMetaclass


## `__new__()`

- **`__new__`** accepts a type as the first argument, and (usually) returns a new instance of that type. 

- **`__init__`** accepts an instance as the first argument and modifies the attributes of that instance. 

## Using `__new__()` to alter class properties

In [265]:
class MetaLogger(type):
    def __new__(cls,name,bases,dct):
        for attribute in dct:
            if callable(dct[attribute]):
                dct[attribute] = debug(dct[attribute])
        return super(MetaLogger,cls).__new__(cls,name,bases,dct)
    
class LoggedExample(metaclass=MetaLogger):

    def x(self, a):
        return a * 2

    
logged = LoggedExample()
logged.x(5)


entering x
returns 10


10

### Why not use `__init__()`?

Changing `dct` inside the metaclass's `__init__()` would not work, because by then the class object has already been created from `dct`. The attributes would have to be replaced on the class itself, for example with `setattr`.

---

## ABC
**Abstract Base Classes**

```python
from abc import ABCMeta
...
metaclass=ABCMeta
```

**Abstract Base Classes**
- allows you to create abstract classes
- @abstractmethod
- @abstractproperty (deprecated since Python 3.3; stack `@property` on `@abstractmethod` instead)

Subclasses are forced to override the above.
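
A minimal sketch of an abstract base class. `Shape` and `Square` are hypothetical names; the abstract property uses `@property` stacked on `@abstractmethod`.

```python
from abc import ABCMeta, abstractmethod

class Shape(metaclass=ABCMeta):      # hypothetical abstract class
    @abstractmethod
    def area(self):
        '''Subclasses must implement this.'''

    @property
    @abstractmethod
    def name(self):
        '''An abstract property.'''

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

    @property
    def name(self):
        return 'square'

try:
    Shape()                          # abstract classes cannot be instantiated
except TypeError as e:
    print('TypeError:', e)

sq = Square(3)
print(sq.name, sq.area())
```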

## Collections
The collections module contains container classes; the related ABCs live in `collections.abc`.

[Collections](https://docs.python.org/3/library/collections.html#collections-abstract-base-classes)

Including:
- **`namedtuple`** factory function for creating tuple subclasses with named fields 
- **`deque`** list-like container with fast appends and pops on either end
- **`Counter`** dict subclass for counting hashable objects
- **`OrderedDict`** dict subclass that remembers the order entries were added
- **`defaultdict`** dict subclass that calls a factory function to supply missing values 
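
A brief sketch of three of these containers. The `Point` type and the sample values are invented for illustration.

```python
from collections import namedtuple, deque, Counter

Point = namedtuple('Point', ['x', 'y'])   # tuple subclass with named fields
p = Point(1, 2)
print(p.x, p[1])                          # access by name or by index

d = deque([1, 2, 3])
d.appendleft(0)                           # fast append on the left end
d.pop()                                   # fast pop from the right end
print(d)

counts = Counter('banana')                # counts hashable items
print(counts.most_common(1))
```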


---

In [266]:
from collections import *
list_dict = defaultdict(list)

In [267]:
list_dict

defaultdict(list, {})

In [268]:
for row in rows:
    for i, key in enumerate(keys):
        list_dict[key].append(row[i])
        
list_dict

defaultdict(list,
            {'id': [2, 3],
             'username': ['fred', 'sarah'],
             'password': ['qwerty', 'asdf']})
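
The cell above relies on `rows` and `keys` defined elsewhere. A self-contained sketch follows; the input values are an assumption, inferred from the output shown.

```python
from collections import defaultdict

# Assumed inputs, inferred from the output shown above
keys = ['id', 'username', 'password']
rows = [(2, 'fred', 'qwerty'), (3, 'sarah', 'asdf')]

list_dict = defaultdict(list)
for row in rows:
    for i, key in enumerate(keys):
        # A missing key is created with list() before append is called
        list_dict[key].append(row[i])

print(dict(list_dict))
```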

# With and Context Managers
- The **`with`** statement provides for the automatic closing of resources
- Context managers are objects designed for use with the `with` statement.


## Opening Files

The file object is one of the best known context managers.  It makes a useful example to understand the operation of a context manager.

In [269]:
f = open('config.text', 'r')
# using context
with open('config.text', 'r') as f:
    'work_with_f'

### With equivalent

```python
f = open('config.text', 'r')  # open outside try: if it fails there is nothing to close
try:
    'work_with_f'
finally:
    f.close()
```

In [270]:
# Roughly equivalent to the built-in file object
class File(object):
    def __init__(self, file_name, method):
        self.file_obj = open(file_name, method)
    def __enter__(self):
        return self.file_obj
    def __exit__(self, type, value, traceback):
        if type is not None:
            print("Exception has been handled")
        self.file_obj.close()
        # Returning True suppresses the exception; a real file object does not
        return True

The context manager simply ensures that the file will be closed once outside the *with* block. 

### How it works

```python
with contextmanager() as c:
    do_stuff
```

- Equivalent

```python
mgr = contextmanager()
c = mgr.__enter__()
try:
    do_stuff
finally:
    # receives the exception type, value and traceback, or None for each
    mgr.__exit__(None, None, None)
```

It is in fact the `__enter__()` and `__exit__()` methods that are called.

### file context
- **`__enter__:`** return self 
- **`__exit__:`** self.close()

The enter and exit methods of the *file* object are simple.

## My Context Manager

In [271]:
class test_context(object):
    def __init__(self, context=''):
        self.context = context

    def __enter__(self):
        print('entering', self.context, 'context')
        return 'my context %s' % self.context
 
    def __exit__(self, exc_type, exc_val, exc_tb):
        print('closing', self.context)

### Using My Context Manager

In [272]:
with test_context('blah') as c:
    print("Welcome to ",c)

entering blah context
Welcome to  my context blah
closing blah


## Context Decorator
- contextlib package
- contextmanager decorator

### Using Context Decorator

In [273]:
from contextlib import contextmanager

In [274]:
@contextmanager
def test_context(context=''):
    # __enter__
    print('entering',context, 'context')
    yield 'my context %s' % context
    # __exit__
    print('closing', context)

### Using contextlib

In [275]:
with test_context('blah') as c:
    print("Welcome to ",c)

entering blah context
Welcome to  my context blah
closing blah
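
In the generator above, the code after `yield` is skipped if the `with` block raises. A hedged variant wraps the `yield` in `try`/`finally` so the exit code always runs. The names `safe_context` and `events` are hypothetical.

```python
from contextlib import contextmanager

events = []

@contextmanager
def safe_context(context=''):
    # __enter__
    events.append('entering ' + context)
    try:
        yield 'my context %s' % context
    finally:
        # __exit__: runs even if the with block raises
        events.append('closing ' + context)

try:
    with safe_context('blah') as c:
        raise ValueError('problem inside the with block')
except ValueError:
    events.append('error propagated')

print(events)
```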


### Supporting the Context-Management Protocol

In [276]:
from socket import socket, AF_INET, SOCK_STREAM

class LazyConnection:
    def __init__(self, address, family=AF_INET, type=SOCK_STREAM):
        self.address = address
        self.family = family
        self.type = type
        self.sock = None

    def __enter__(self):
        if self.sock is not None:
            raise RuntimeError('Already connected')
        self.sock = socket(self.family, self.type)
        self.sock.connect(self.address)
        return self.sock

    def __exit__(self, exc_ty, exc_val, tb):
        self.sock.close()
        self.sock = None

Note:
The key feature of this class is that it represents a network connection, but it doesn't actually do anything initially (e.g., it doesn't establish a connection). Instead, the connection is established and closed using the with statement (essentially on demand). For example:

### Using Solution

In [277]:
from functools import partial

conn = LazyConnection(('www.python.org', 80))
# Connection closed
with conn as s:
    # conn.__enter__() executes: connection open
    s.send(b'GET /index.html HTTP/1.0\r\n')
    s.send(b'Host: www.python.org\r\n')
    s.send(b'\r\n')
    resp = b''.join(iter(partial(s.recv, 8192), b''))
    # conn.__exit__() executes: connection closed

Note:

- The main principle behind writing a context manager is that you're writing code that's meant to surround a block of statements as defined by the use of the with statement. When the with statement is first encountered, the `__enter__()` method is triggered. The return value of `__enter__()` (if any) is placed into the variable indicated with the as qualifier. Afterward, the statements in the body of the with statement execute. Finally, the `__exit__()` method is triggered to clean up.

- This control flow happens regardless of what happens in the body of the with statement, including if there are exceptions. In fact, the three arguments to the `__exit__()` method contain the exception type, value, and traceback for pending exceptions (if any). The `__exit__()` method can choose to use the exception information in some way or to ignore it by doing nothing and returning None as a result. If `__exit__()` returns True, the exception is cleared as if nothing happened and the program continues executing statements immediately after the with block.

- One subtle aspect of this recipe is whether or not the LazyConnection class allows nested use of the connection with multiple with statements. As shown, only a single socket connection at a time is allowed, and an exception is raised if a repeated with statement is attempted when a socket is already in use. You can work around this limitation with a slightly different implementation, as shown here:

### Nested With

In [278]:
from socket import socket, AF_INET, SOCK_STREAM

class LazyConnection:
    def __init__(self, address, family=AF_INET, type=SOCK_STREAM):
        self.address = address
        self.family = family
        self.type = type
        self.connections = []

    def __enter__(self):
        sock = socket(self.family, self.type)
        sock.connect(self.address)
        self.connections.append(sock)
        return sock

    def __exit__(self, exc_ty, exc_val, tb):
        self.connections.pop().close()

### Example use

```python
from functools import partial

conn = LazyConnection(('www.python.org', 80))
with conn as s1:
    ...
    with conn as s2:
        ...
        # s1 and s2 are independent sockets
```

Note:
In this second version, the LazyConnection class serves as a kind of factory for connections. Internally, a list is used to keep a stack. Whenever `__enter__()` executes, it makes a new connection and adds it to the stack. The `__exit__()` method simply pops the last connection off the stack and closes it. It's subtle, but this allows multiple connections to be created at once with nested with statements, as shown.

Context managers are most commonly used in programs that need to manage resources such as files, network connections, and locks. A key part of such resources is they have to be explicitly closed or released to operate correctly. For instance, if you acquire a lock, then you have to make sure you release it, or else you risk deadlock. By implementing `__enter__()`, `__exit__()`, and using the with statement, it is much easier to avoid such problems, since the cleanup code in the `__exit__()` method is guaranteed to run no matter what.

---

# Design Patterns in Python

- Brandon Rhodes Vid http://pyvideo.org/video/1369/python-design-patterns-1
- Example Code: https://github.com/faif/python-patterns

## Design Patterns Implement

- Separation of Concerns
- Single point of responsibility
- Encapsulate what varies
- Code to interface not implementation
- Favour Composition over inheritance
- Dependency Inversion Principle

## Original GoF
- Creational Patterns (5)
- Structural Patterns (7)
- Behavioural Patterns (11)

![image.png](attachment:image.png)

## Creational Patterns

- Factories
- Singletons
- Prototype
- Builder

## Factories

- Trivial in Python because
    - No *new* operator
    - function and constructor look alike
    

## Factory Method

![Factory.png](images/FactoryUML.png)

### Abstract Factory
![AbstractFactory.png](images/AbstractFactory.png)

## Singletons

- Using a global variable is bad, because you can never switch in something dynamic: `mymodule.foo` cannot be intercepted!
- At a minimum, you should always make callers actually invoke a function to get your singleton
- In Python you can use *The Borg* (shared instance variables rather than a single object) instead

![Singleton.png](images/SingletonUML.png)

In [279]:
# A kind-of singleton
class MyClass:
    pass
_singleton = MyClass()


def get_singleton():
    return _singleton

## Singleton User Provided Class

In [280]:
_singleton = None

def get_singleton(cls=MyClass):
    global _singleton
    if _singleton is None:
        _singleton = cls()
    return _singleton
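
The Borg alternative mentioned in the list above can be sketched as follows. This is a minimal illustration of the idea, with the class-level `_shared_state` dictionary as the only moving part.

```python
class Borg(object):
    _shared_state = {}                 # one dict shared by every instance

    def __init__(self):
        # Every instance uses the same attribute dictionary
        self.__dict__ = self._shared_state

a = Borg()
b = Borg()
a.x = 1
print(b.x)          # b sees the attribute set through a
print(a is b)       # but they are still distinct objects
```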

## Prototype
- Provided by the standard library
- `copy` module
  - Introspects and copies an object
  

![Prototype.png](images/PrototypeUML.png)
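
A short sketch of cloning a prototype with the `copy` module. The `prototype` dictionary is a made-up object; it shows the difference between a shallow and a deep copy.

```python
import copy

prototype = {'tag': 'h1', 'attrib': {'class': ['title']}}   # hypothetical object

shallow = copy.copy(prototype)       # new outer dict, inner objects shared
deep = copy.deepcopy(prototype)      # everything copied recursively

prototype['attrib']['class'].append('big')
print(shallow['attrib']['class'])    # sees the change made to the prototype
print(deep['attrib']['class'])       # unaffected by it
```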

## Builder
- The most complex creational pattern is actually very useful
- A Builder receives instructions about what to build, and hides the details of the instances it links together

![Builder.png](images/Builder.png)

### XML without a builder

In [281]:
from xml.etree import ElementTree as etree

root = etree.Element('body')
h1 = etree.Element('h1')
h1.text = 'The Title'
h1.attrib = {'a':[1,2]}
root.append(h1)
p = etree.Element('p')
p.text = 'Always write Python'
root.append(p)
for child in root:
    print(child.tag)
    print(child.attrib)

h1
{'a': [1, 2]}
p
{}


### XML with a builder

In [282]:
from lxml.builder import E

doc = E('body',
        E('h1', 'The Title'),
        E('p', 'Always write Python'))

## Creational Patterns Summary

- Designed away by language:
    - Abstract Factory
    - Factory Method

- Trivial:
    - Singleton
    - Prototype

- Useful:
    - Builder

## Structural Patterns
- Adapter
- Bridge
- Composite
- Facade
- Flyweight
- Proxy
- Decorator

## Adapter
- Very Useful
- Wraps one class so that it behaves like another class
- Supplies a single interface to a set of interfaces within a system.
- Use When
    - A simple interface is needed to provide access to a complex system. 
    - There are many dependencies between system implementations and clients. 
    - Systems and subsystems should be layered. 

![Adapter.png](images/adaptorUML.png)

### Adapter example
- Sockets do not read and write like files do
- Tornado Future can wrap *asyncio* or *trollius*

In [283]:
import socket
s = socket.socket()
s.read

AttributeError: 'socket' object has no attribute 'read'

- Instead, they “send” and “receive”

In [284]:
s.send

<function socket.send>

In [285]:
s.recv

<function socket.recv>

### Adapter in stdlib
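- The Standard Library provides an adapter of its own
- `socket.makefile()` returns a file-like object (`socket._fileobject` in Python 2, an `io` wrapper around `socket.SocketIO` in Python 3)
- It translates file-like read() and write() operations into recv() and send()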

In [286]:
f = s.makefile()
f.read

<function TextIOWrapper.read(size=-1, /)>
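
To show what such an adapter does internally, here is a deliberately small hand-written version. `SocketFileAdapter` is a hypothetical class for illustration only; real code should prefer `makefile()`:

```python
import socket


class SocketFileAdapter:
    """Gives a socket the read()/write() interface that file users expect."""

    def __init__(self, sock):
        self._sock = sock

    def read(self, size=1024):
        return self._sock.recv(size)

    def write(self, data):
        self._sock.sendall(data)
        return len(data)


# socketpair() returns two already connected sockets, handy for a demo.
left, right = socket.socketpair()
writer = SocketFileAdapter(left)
reader = SocketFileAdapter(right)
writer.write(b"hello")
received = reader.read(5)
left.close()
right.close()
print(received)
```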

## Bridge

- Says not to use subclassing for two separate purposes in one class
- Otherwise you wind up with a separate subclass for every combination of the two

![Bridge.png](images/BridgeUML.png)

**Use When**

- Abstractions and implementations should not be bound at compile time. 
- Abstractions and implementations should be independently extensible. 
- Changes in the implementation of an abstraction should have no impact on clients. 
- Implementation details should be hidden from the client. 

- Partly rationalized by limitations
  of C++ (no mixins, much subclassing)

**Solution: another layer!**

- Have one class that translates
the Plain and Bordered ideas
into primitive window operations

Then

- Have another class that can
perform these operations under
Windows, Mac, and Linux
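
The two layers described above might look like this. All class and method names (`LinuxDrawing`, `MacDrawing`, `PlainWindow`, `BorderedWindow`, `draw_rect`, `draw_line`) are assumptions made for the sketch:

```python
class LinuxDrawing:
    """Implementation side: primitive operations for one platform."""

    def draw_rect(self, width, height):
        return f"linux rect {width}x{height}"

    def draw_line(self, length):
        return f"linux line {length}"


class MacDrawing:
    def draw_rect(self, width, height):
        return f"mac rect {width}x{height}"

    def draw_line(self, length):
        return f"mac line {length}"


class PlainWindow:
    """Abstraction side: knows what a window is, not how to draw it."""

    def __init__(self, drawing):
        self.drawing = drawing

    def render(self):
        return [self.drawing.draw_rect(80, 24)]


class BorderedWindow(PlainWindow):
    def render(self):
        return super().render() + [self.drawing.draw_line(80)]


window = BorderedWindow(MacDrawing())
print(window.render())
```

Two window kinds and two platforms need four classes here instead of four combined subclasses plus their bases, and the gap widens as either side grows.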

**This principle actually applies
at many levels in Python!**

**Question**

- My models are full of business
logic — how can I write simple tests
that don't write to the database?
    - Gary Bernhardt at PyCon 2012

```python
class Person(Database_Model):
    def rename(self, new_name): ...
    def direct_deposit(self, amount): ...
```

**Answer**
- Have models that do nothing but
persist information to storage
and then
- write a business logic layer
that implements the operations
that were expressed in methods
- Tests can call a *dummy* persistence class
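
A sketch of that split, under the assumption that persistence is reduced to `save` and `load`. `InMemoryPersonStore` and `PersonService` are hypothetical names; only `rename` and `direct_deposit` come from the question above:

```python
class InMemoryPersonStore:
    """Dummy persistence class for tests: keeps rows in a dict."""

    def __init__(self):
        self.rows = {}

    def save(self, person_id, fields):
        self.rows.setdefault(person_id, {}).update(fields)

    def load(self, person_id):
        return dict(self.rows[person_id])


class PersonService:
    """Business logic layer: no database code, only calls into the store."""

    def __init__(self, store):
        self.store = store

    def rename(self, person_id, new_name):
        if not new_name.strip():
            raise ValueError("name must not be empty")
        self.store.save(person_id, {"name": new_name.strip()})

    def direct_deposit(self, person_id, amount):
        balance = self.store.load(person_id).get("balance", 0)
        self.store.save(person_id, {"balance": balance + amount})


store = InMemoryPersonStore()
store.save(1, {"name": "Fred", "balance": 10})
service = PersonService(store)
service.rename(1, " Fred Bloggs ")
service.direct_deposit(1, 5)
print(store.load(1))
```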

## Composite
- Class whose instances
are designed in a tree structure
- Composition is a common pattern

- email.message.Message can have messages inside
- lxml.etree.Element lists child elements

![Composite.png](images/CompositeUML.png)

**Two common mechanisms:**

```python
# Store children in attributes

obj.attr = subobject

# Store a list or dict of children

obj.children = [subobject, ...]
```

**Use When**

- Hierarchical representations of objects are needed. 
- Objects and compositions of objects should be treated uniformly. 
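
A minimal sketch of treating single objects and compositions uniformly, using the "list of children" mechanism. `Leaf`, `Folder` and `total_size` are illustrative names:

```python
class Leaf:
    def __init__(self, size):
        self.size = size

    def total_size(self):
        return self.size


class Folder:
    def __init__(self, *children):
        self.children = list(children)

    def total_size(self):
        # A folder answers the same question by asking its children.
        return sum(child.total_size() for child in self.children)


tree = Folder(Leaf(1), Folder(Leaf(2), Leaf(3)), Leaf(4))
print(tree.total_size())
```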

## Facade

- An object that hides a complex
tree or network of other objects
- An etree Element lets you root.find()
and root.iter() which traverse the entire
tree of nodes for you

![Facade](images/FacadeUML.png)

- Supplies a single interface to a set of interfaces within a system.
- Use When
    - A simple interface is needed to provide access to a complex system. 
    - There are many dependencies between system implementations and clients. 
    - Systems and subsystems should be layered. 
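
The `find()` and `iter()` calls mentioned above can be tried with the Standard Library `xml.etree.ElementTree`. The sample markup is made up for the example:

```python
from xml.etree import ElementTree as etree

root = etree.fromstring(
    "<body><h1>The Title</h1><div><p>one</p><p>two</p></div></body>"
)

# The root element hides the traversal of the whole tree.
first_p = root.find(".//p")
all_p = [p.text for p in root.iter("p")]
print(first_p.text, all_p)
```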

## Flyweight

- A flyweight is a small object that is immutable and can be re-used in many contexts

![Flyweight](images/FlyweightUML.png)

- Facilitates the reuse of many fine grained objects, making the utilization of large numbers of objects more efficient.
- Use When
    - Many like objects are used and storage cost is high. 
    - The majority of each object's state can be made extrinsic. 
    - A few shared objects can replace many unshared ones. 
    - The identity of each object does not matter. 

**A text box implemented without flyweights**

```python
textbox.chars = [c1, c2, c3, ...]

c1.letter = 'T'
c1.width = 9
c1.height = 10
c1.depth = 0
c1.outline = [(0, 10), (9, 10), ...]
c1.x = 0
c1.y = 14
c1.draw()

c2.letter = 'h'
⋮
c3.letter = 'e'
⋮
```

- You have to create n instances
of the letter “T” so that each can
occupy a different x and y

```python
c1.letter = 'T'
c1.width = 9
c1.height = 10
c1.depth = 0
c1.outline = [(0, 10), (9, 10), ...]
c1.x = 0
c1.y = 14
c1.draw()
```

The flyweight pattern moves this
per-object state out into the
parent or larger context

```python
textbox.chars = [(cT, 0, 14),
                 (ch, 9, 14),
                 (ce, 18, 18),
                 (c_space, 27, 14),
                 (ch, 36, 14),
                 (ce, 45, 18),
                 ...]

cT.letter = 'T'
cT.width = 9
cT.height = 10
cT.depth = 0
cT.outline = [(0, 10), (9, 10), ...]
cT.draw(x, y)
```

### Flyweight
- A small set of objects can each
appear thousands of times inside
of parent objects
- But context like the x and y passed to Character.draw(x, y) will now need to be passed in to flyweight methods
- Saves memory at the expense of noise and higher coupling

### Flyweight in stdlib

CPython uses the flyweight pattern internally for integer objects.

Since integer objects are immutable, it keeps only a single copy of each small integer like 0, 1, 2, ... and hands them to your code over and over again.

## Proxy

A proxy object wraps another object, called the subject, and accepts method calls and attribute lookups on its behalf.

Proxying can be performed dynamically in Python with **`__getattr__()`** instead of having to write n proxying methods.

### Proxy

- weakref.proxy(obj) returns a proxy
- Remote procedure call libraries offer
proxies for remote APIs, including the
Standard Library xmlrpc.client.ServerProxy
and also Pyro and rpyc
- The Zope web framework used proxies
that enforced security

![Proxy.png](images/ProxyUML.png)

- Allows for object level access control by acting as a pass through entity or a placeholder object.
- Use When
    - The object being represented is external to the system. 
    - Objects need to be created on demand. 
    - Access control for the original object is required. 
    - Added functionality is required when an object is accessed. 

## Decorator

- Like the Proxy, a Decorator class
offers the same attributes and methods
as the subject that it wraps

- But instead of being completely
transparent, it varies the behavior
or edits the data passing in and
out of the subject

- Decorators tend to appear in “glue”
code in applications, not inside
Python libraries themselves

![Decorator.png](images/DecoratorUML.png)
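
A small sketch of a Decorator class (the pattern, not the `@decorator` syntax). `UppercaseWriter` is a hypothetical wrapper that edits the data passing into a file-like subject and forwards everything else unchanged:

```python
import io


class UppercaseWriter:
    def __init__(self, subject):
        self._subject = subject

    def write(self, text):
        # Vary the behavior: edit the data on its way to the subject.
        return self._subject.write(text.upper())

    def __getattr__(self, name):
        # Everything else is passed through untouched.
        return getattr(self._subject, name)


buffer = io.StringIO()
writer = UppercaseWriter(buffer)
writer.write("hello")
result = writer.getvalue()
print(result)
```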

## Behavioral Patterns
- Chain of responsibility
- Command
- Interpreter
- Iterator
- Mediator
- Memento
- Observer
- State
- Strategy
- Template
- Visitor

## Chain of responsibility

Pass a request along a chain of objects
until one of them decides to handle it

![Chain.png](images/ChainUML.png)

In [287]:
def processClick(self, x, y):
    if self.active:
        self.buttonPress(x, y)
    else:
        self.next.processClick(x, y)

- Used by GUIs and DOMs for mouse events
- Can be used by a “help” feature: maybe a button has no help, but the enclosing form does
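
The “help” case can be sketched like this. `Widget`, `help_text`, `parent` and `get_help` are names assumed for the example:

```python
class Widget:
    def __init__(self, name, help_text=None, parent=None):
        self.name = name
        self.help_text = help_text
        self.parent = parent  # next handler in the chain

    def get_help(self):
        if self.help_text is not None:
            return self.help_text          # this object handles the request
        if self.parent is not None:
            return self.parent.get_help()  # pass it along the chain
        return "no help available"


form = Widget("form", help_text="Fill in every field.")
button = Widget("ok_button", parent=form)
print(button.get_help())
```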

## Command

Replaces immediate actions like:

```python
paintLine(x1, y1, x2, y2)
```

Instead, you instantiate a command:

```python
cmd = PaintLineCommand(x1, y1, x2, y2)
cmd.do()
command_history.append(cmd)
```
Allows auditing and undo()

![image.png](attachment:image.png)

Crucial to version control systems
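
A runnable sketch of a command with `do()` and `undo()`. `AppendTextCommand` is a hypothetical stand-in for `PaintLineCommand`, acting on a plain list so the effect is easy to see:

```python
class AppendTextCommand:
    def __init__(self, document, text):
        self.document = document
        self.text = text

    def do(self):
        self.document.append(self.text)

    def undo(self):
        self.document.pop()


document = []
command_history = []
for text in ["first", "second"]:
    cmd = AppendTextCommand(document, text)
    cmd.do()
    command_history.append(cmd)

# Undo the most recent command.
command_history.pop().undo()
print(document)
```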

## Interpreter

- Instead of compiling to machine language, an interpreted language gets parsed into a data structure that an interpreter program iterates across, following the instructions
- The data structure can be an AST or bytecode
- Python itself is interpreted!
- Small domain-specific languages do sometimes get written in Python to ease customization

![Interpreter.png](images/InterpreterUML.png)
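
As a toy illustration of walking such a data structure, here is an evaluator for a tiny AST made of numbers and `(op, left, right)` tuples. The tuple format, `OPS` and `evaluate` are assumptions of this sketch:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node):
    """Recursively evaluate a number or an (op, left, right) tuple."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree = ("+", 1, ("*", 2, 3))
print(evaluate(tree))
```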

## Iterator

- Built-in because Python is awesome
- Introduced in 2001: the most important Python innovation of the decade
- See Raymond Hettinger's PyCodeConf 2011
  talk “What Makes Python Awesome?”
  

![Iterator.png](images/IteratorUML.png)

- Lets you write `for x in obj:` across your own user-defined obj, or loop by hand with `i = iter(obj)` and `next(i)`
- There are two basic ways to implement an iterator
- First, you can create an actual class to be your iterator that remembers:
    - What it is iterating across
    - Where it is in the sequence

### Iterator

In [288]:
class Box(object):
    def __init__(self): self.things = [10, 20, 30]
    def __iter__(self): return BoxIterator(self)

class BoxIterator(object):
    def __init__(self, box):
        self.box = box
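        self.index = -1

    def __iter__(self):
        # Iterators return themselves so they also work in a for loop.
        return self

    def __next__(self):  # Python 3 name for the Python 2 next() method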
        self.index += 1
        if self.index >= len(self.box.things):
            raise StopIteration()
        return self.box.things[self.index]

**Or, you can simply make `__iter__()` a generator**

In [289]:
#__iter__() a generator
class Box(object):
    def __init__(self):
        self.things = [10, 20, 30]

    def __iter__(self):
        for thing in self.things:
            yield thing

### Iterator

- Very powerful pattern
- Eliminates ugly mechanics of iteration
so code is more readable
- Enhances re-use because code is
easy to lock together in new ways;
again, see Raymond's talk!
- JavaScript is littered with: `for (var i=0; i < box.length; i++) ...`

## Mediator

- Parent object responds to child events
and plays out their consequences for
other child objects
- Child objects like buttons or form fields
can remain generic instead of having to
be subclassed to learn specific behavior
- Children have no references to each other

![Mediator.png](images/MediatorUML.png)

**Use When**

- Communication between sets of objects is well defined and complex. 
- Too many relationships exist and common point of control or communication is needed.
- Multiple objects need to notify the same list of observers

## Memento

- A memento is a record of an object's
internal state — might be a string, or
file, or complex data structure
- Callers ask an object instance for a
memento, then can hand it back later
to ask the object to restore itself
to its earlier state
- Similar to pickle but instead of
creating a new object like un-pickling,
it changes an existing object
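
A sketch of a memento that restores an existing object in place. The `Editor` class and its `create_memento` and `restore` methods are hypothetical; the memento here is simply a deep copy of the instance dictionary:

```python
import copy


class Editor:
    def __init__(self):
        self.lines = []
        self.cursor = 0

    def create_memento(self):
        # Snapshot of the internal state, independent of later changes.
        return copy.deepcopy(self.__dict__)

    def restore(self, memento):
        # Change this existing object rather than creating a new one.
        self.__dict__.clear()
        self.__dict__.update(copy.deepcopy(memento))


editor = Editor()
editor.lines.append("hello")
saved = editor.create_memento()

editor.lines.append("oops")
editor.cursor = 5
editor.restore(saved)
print(editor.lines, editor.cursor)
```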

## Observer

Without Observer pub-sub, you get:

In [290]:
class MyModel:
    def set_total(self, number):
        self.total = number
        self.titlebar.update()
        self.graph.update()
        self.summary.update()

Note:
- Very popular pattern in GUIs and DOMs
- Display elements let the framework know that they need to redraw when specific model attributes change
- Your models stay simple and have no knowledge of the big application you have built around them, so long as they signal a list of listeners when an attribute changes

With Observer pub-sub, things are simpler:

In [291]:
class MyModel:
    def set_total(self, number):
        self.total = number

class MyTitlebar:
    def __init__(self, model):
        subscribe(model, 'total', self.redraw)
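
The cell above does not show how `subscribe` works. One possible sketch follows, in which the model signals changes explicitly through a hypothetical `publish` helper; `_listeners`, `publish` and `redraw` are assumptions, and a real framework may intercept attribute changes differently:

```python
from collections import defaultdict

# (id of model, attribute name) -> list of callbacks
_listeners = defaultdict(list)


def subscribe(model, attribute, callback):
    _listeners[(id(model), attribute)].append(callback)


def publish(model, attribute):
    for callback in _listeners[(id(model), attribute)]:
        callback(getattr(model, attribute))


class MyModel:
    def set_total(self, number):
        self.total = number
        publish(self, 'total')  # the model only signals its listeners


class MyTitlebar:
    def __init__(self, model):
        self.text = ''
        subscribe(model, 'total', self.redraw)

    def redraw(self, value):
        self.text = f'Total: {value}'


model = MyModel()
titlebar = MyTitlebar(model)
model.set_total(42)
print(titlebar.text)
```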


## State

```
open()    close()
TCPStart → TCPOpen → TCPClosed
```
What we want to avoid:

```python
if state == 'start':
    if action == 'open':
        ⋮
    elif action == 'close':
        ⋮
elif state == 'open':
    if action == 'open':
        ⋮
    elif action == 'close':
        ⋮
```

In [292]:
from abc import ABC, abstractmethod 

class AbstractState(ABC):
    
    @abstractmethod
    def open(self): pass

    @abstractmethod
    def close(self): pass
    

class StateClosed(AbstractState):
    
    def open(self):
        print("opening")
        return StateOpen()
    
    def close(self):
        print('already closed')
        return self
    
    def __init__(self):pass

In [293]:
class StateOpen(AbstractState):

    def open(self):
        print("already open")
        return self

    def close(self):
        print('Closing')
        return StateClosed()
    
    def __init__(self):pass


In [294]:
class State(AbstractState):

    def __init__(self, state=StateClosed()):
        self.state = state

    def open(self):
        self.state=self.state.open()
        
    def close(self):
        self.state=self.state.close()

In [295]:
s = State()
s.state

<__main__.StateClosed at 0x10d3c4048>

In [296]:
s.open()
s.state

opening


<__main__.StateOpen at 0x10d39cba8>

- Represent each state as a class like TCPOpen or TCPClosed
- Give each class one method for each transition, like open() and close(), that returns the next state
- Avoids huge nested if-then's, but simple state machines can be a dict in Python instead
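
The dict alternative could look like this. The state and action strings and the `next_state` helper are choices made for the sketch; unknown combinations leave the state unchanged:

```python
# (current state, action) -> next state
TRANSITIONS = {
    ("closed", "open"): "open",
    ("open", "close"): "closed",
}


def next_state(state, action):
    return TRANSITIONS.get((state, action), state)


state = "closed"
for action in ["open", "open", "close"]:
    state = next_state(state, action)
print(state)
```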

## Strategy

- Parametrize a big process by passing in an object that specifies custom behavior
- Python has first-class functions so we often just pass callbacks instead
- A simple example of the Strategy pattern is the key= function that we pass to sorted() and list.sort()
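
For instance, the same `sorted()` call can be given different strategies through `key=`. The word list is made up for the example:

```python
words = ["banana", "Cherry", "apple", "fig"]

by_length = sorted(words, key=len)               # strategy: compare lengths
case_insensitive = sorted(words, key=str.lower)  # strategy: ignore case

print(by_length)
print(case_insensitive)
```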

## Template

- Give your classes empty methods like aboutToStart() and justStarted() that a subclass can customize
- The original author of threading in the Standard Library intended callers to subclass (!) Thread and provide their own, more interesting run() method
- In Python we often pass a callable instead, or, when there are many callbacks, pass a Strategy object instead
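
Both styles still work with `threading.Thread`. A brief side-by-side sketch, where `Worker`, `work` and `results` are names introduced for the example:

```python
import threading

results = []


class Worker(threading.Thread):
    # Template style: override the run() hook in a subclass.
    def run(self):
        results.append("subclass")


def work():
    results.append("callable")


# Callable style: pass the behavior in with target=.
for thread in (Worker(), threading.Thread(target=work)):
    thread.start()
    thread.join()

print(results)
```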

## Visitor

- Replaces doThis() and doThat() methods on tree nodes, which act on the current node and then call the same method on their children, with a single do(action) traversal method
- Not frequently encountered in Python
- We tend to turn this problem inside out and loop over the nodes ourselves: `for node in lxml_root.iter(): ...` (or `os.walk()` for a directory tree), doing something to each node in the loop body

## Behavioral Patterns Summary

- Are big solutions for big problems
- Most patterns work fine in Python
- Many of them turn up in big and popular libraries, or even in the Standard Library
- The Iterator pattern is now a foundation of how we write Python

---

# Regular Expressions

- Pattern matching
- Validation
- Searching, Replacing etc...

## re module

In [297]:
import re

- search
- match
- sub
- finditer, findall
- compile
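
A quick tour of the functions listed above on one sample string. The sample text and patterns are made up for illustration:

```python
import re

text = "cat bat rat"
pattern = re.compile(r"\b(\w)at\b")   # compile once, reuse many times

first = re.search(r"[br]at", text).group(0)   # first match anywhere
at_start = re.match(r"bat", text)             # match() only tries position 0
replaced = re.sub(r"[cbr]at", "dog", text)    # replace every match
letters = pattern.findall(text)               # list of the captured group
spans = [m.span() for m in pattern.finditer(text)]  # match objects, lazily

print(first, at_start, replaced, letters, spans)
```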

### Metachars

|Char    | Matches                        | example                    |
|:------|:--------------------------------|:----------------------------|
|**`.`**     |Match any character             |`foo.bar` foodbar             |
|**`^`**     |Match the beginning of line     |`^hi` lines starting "hi"     |
|**`$`**     | Match the end of line          |`hi$` lines ending "hi"       |
|**`[ABC]`** |Match A or B or C               |`h[ae]ll` hall, hell          |
|**`[^DEF]`**|Any character except D or E or F|`h[^ae]ll` h9ll but not hall  |
|**`[A-Z]`** |A character in the range A-Z    |`A[A-Z]T` ANT, but not AnT    |

### Special Character Sets
|Char|Meaning|
|--|:-------|
| **`\w`**|Word characters, same as `[A-Za-z0-9_]`|
| **`\W`**|Not word characters, same as `[^A-Za-z0-9_]`|
| **`\d`**|Digit same as `[0-9]`|
| **`\D`**|Not Digit, same as `[^0-9]`|
| **`\s`**|White space: space, tab, newline, carriage return, form feed, vertical tab|
| **`\S`**|Not whitespace|
| **`\b`**|Word break. Matches a word character to non word character transition or vice versa|

### Quantifiers
|Char|Number of matches|
|--|:-------|
| **`*`**|0 or more times|
| **`+`**|1 or more times|
| **`?`**|0 or 1 times|
| **`{N,M}`**|Between N and M times, inclusive (N and M are integers)|
| **`{N,}`**|At least N times (no maximum)|

Match an IP address, assuming that every octet from 0 to 999 is allowed (0.0.0.0 up to 999.999.999.999) rather than the real maximum of 255.255.255.255.

In [298]:
import re
ip_regex = r'^(\d{1,3}\.){3}\d{1,3}$'
re.search(ip_regex, '192.168.1.1')

<re.Match object; span=(0, 11), match='192.168.1.1'>
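
If the 0 to 255 limit matters, each octet can be spelled out as an alternation. This is one possible sketch; the names `octet` and `strict_ip` are introduced here, and leading zeros such as `01` are rejected by design:

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
strict_ip = re.compile(rf'^{octet}(?:\.{octet}){{3}}$', re.ASCII)

print(bool(strict_ip.search('192.168.1.1')))
print(bool(strict_ip.search('256.1.1.1')))
```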

In [299]:
email_regex = r'\b[\w.-]+@[\w.-]+\b'
re.search(email_regex, 'my email is fred.bloggs@acme-trading.com what is yours')

<re.Match object; span=(12, 40), match='fred.bloggs@acme-trading.com'>

## Groups

In [300]:
email_regex = r'\b([\w.-]+)@([\w.-]+)\b'
match = re.search(email_regex, 'my email is fred.bloggs@acme-trading.com what is yours')

In [301]:
match.groups()

('fred.bloggs', 'acme-trading.com')

In [302]:
match.group(1)

'fred.bloggs'

In [303]:
match.group(2)

'acme-trading.com'

In [304]:
match.group(0)

'fred.bloggs@acme-trading.com'
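
Groups can also be named with `(?P<name>...)`, which avoids counting parentheses. A short sketch using the same sample sentence; the group names `user` and `domain` are choices made here:

```python
import re

email_regex = re.compile(r'\b(?P<user>[\w.-]+)@(?P<domain>[\w.-]+)\b')
match = email_regex.search(
    'my email is fred.bloggs@acme-trading.com what is yours'
)

print(match.group('user'))
print(match.groupdict())
```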

# Profiling
- *__cProfile__* is a deterministic profiling module for Python programs, implemented as a C extension so its overhead is low.
- A profile is a set of statistics that describes how often and for how long various parts of the program executed. 
- These statistics can be formatted into reports via the *__pstats__* module.

## Calling cProfile

In [305]:
import cProfile
import re
cProfile.run('re.compile("foo|bar")')

         214 function calls (207 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 enum.py:284(__call__)
        2    0.000    0.000    0.000    0.000 enum.py:526(__new__)
        1    0.000    0.000    0.000    0.000 enum.py:836(__and__)
        1    0.000    0.000    0.000    0.000 re.py:232(compile)
        1    0.000    0.000    0.000    0.000 re.py:271(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:249(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:276(_optimize_charset)
        2    0.000    0.000    0.000    0.000 sre_compile.py:453(_get_iscased)
        1    0.000    0.000    0.000    0.000 sre_compile.py:461(_get_literal_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:492(_get_charset_prefix)
        1   

### cProfile output columns
- **ncalls**
    for the number of calls,
- **tottime**
    for the total time spent in the given function (and excluding time made in calls to sub-functions)
- **percall**
    is the quotient of tottime divided by ncalls
- **cumtime**
    is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
- **percall**
    is the quotient of cumtime divided by primitive calls
- **filename:lineno(function)**
    provides the respective data of each function

### Calling cProfile
- From the shell: `python -m cProfile -s tottime your_program.py`
- RunSnakeRun is a GUI program for viewing cProfile output
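
The same sorting can be done from inside a program with `cProfile.Profile` and `pstats.Stats`. A minimal sketch, where `build_squares` is a made-up workload:

```python
import cProfile
import io
import pstats


def build_squares(n):
    return [i * i for i in range(n)]


profiler = cProfile.Profile()
profiler.enable()
build_squares(10_000)
profiler.disable()

# Format the statistics into a report, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Timings vary from run to run, so only the structure of the report is predictable.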

### Line Profiling
- Line by line profiling
- Finer grained

Note:
Sometimes the profiler may lump together methods calls from different parts of your code, or just won't be precise enough.

`pip install line_profiler`

Then, modify your code and decorate each function that you want to profile with @profile. Let's decorate our function so it will look like this:

```python
@profile
def write_sorted_letters(nb_letters=10**7):
    ...
```

You can then run your program through the line profiler like this:

`kernprof -l -v your_program.py`

- `-l` for line by line profile
- `-v` for immediately visualizing the results

### result

    Total time: 21.4412 s
    File: ./sort.py
    Function: write_sorted_letters at line 5
    Line #      Hits         Time    Per Hit   % Time  Line Contents
    ================================================================
     5                                             @profile
     6                                             def write_sorted_letters(nb_letters=10**7):
     7         1            1        1.0      0.0  random_string = ''
     8  10000001      3230206        0.3     15.1  for _ in range(nb_letters):
     9  10000000      9352815        0.9     43.6  random_string += random.choice('abcdefghijklmnopqrstuvwxyz')
    10         1      1647254  1647254.0      7.7  sorted_string = sorted(random_string)
    11                                           
    12         1         1334     1334.0      0.0  with open("sorted_text.txt", "w") as sorted_text:
    13  10000001      2899712        0.3     13.5  for character in sorted_string:
    14  10000000      4309926        0.4     20.1  sorted_text.write(character)

### Analysis
- First, note that this profiling tool is slowing down our program almost twofold, from 11 seconds to 21.

- But there is an upside: we have our program on the right hand side and can see which lines are impacting the performance of our application and how much so.

### Realtime Continuous  Profiling

- The integrated profiler is simple enough and can get you where you want in terms of performance for single-thread, local workloads. But a large, threaded, web application is a whole different beast.

- Let's have a look at the `profiling` module.

- First, install it with `sudo pip install profiling`, then run your program through it: `profiling your_program.py`. Don't forget to remove the `@profile` decorator, which only works with line_profiler.

**Realtime Continuous Profiling Continued**

- It gives us a detailed tree-like view of the profile at the end of the program's run.

- It's interactive, so you can navigate and fold/unfold each line by using the arrow keys.

- There is also a live mode for long-running processes such as web servers. You can invoke it like this: `profiling live-profile your_server_program.py`. You can then explore your program's performance profile while it runs.

---

# Identifying memory leaks

## The Muppy Module
Muppy allows you to get hold of all objects:

In [306]:
from pympler import muppy
all_objects = muppy.get_objects()
len(all_objects)                           

140843

Note:
Muppy tries to help developers to identify memory leaks in Python applications. It enables the tracking of memory usage during runtime and the identification of objects which are leaking. Additionally, it provides tools that let you locate the source of objects that were not released.

Muppy is (yet another) Memory Usage Profiler for Python. The focus of this toolset is on the identification of memory leaks. Let's have a look at what you can do with muppy.

### Using The Muppy Module
You can also filter the objects by type.

In [307]:
import types
my_types = muppy.filter(all_objects, Type=types.CodeType)
len(my_types)

14878

In [308]:
for t in my_types[:10]:
    print(t)

<code object _wrap at 0x10a762150, file "<frozen importlib._bootstrap>", line 27>
<code object _new_module at 0x10a7621e0, file "<frozen importlib._bootstrap>", line 35>
<code object _get_module_lock at 0x10a762c90, file "<frozen importlib._bootstrap>", line 157>
<code object _lock_unlock_module at 0x10a762d20, file "<frozen importlib._bootstrap>", line 194>
<code object _call_with_frames_removed at 0x10a762db0, file "<frozen importlib._bootstrap>", line 211>
<code object _verbose_message at 0x10a762e40, file "<frozen importlib._bootstrap>", line 222>
<code object _requires_builtin at 0x10a762f60, file "<frozen importlib._bootstrap>", line 230>
<code object _requires_frozen at 0x10a7ad0c0, file "<frozen importlib._bootstrap>", line 241>
<code object _load_module_shim at 0x10a7ad150, file "<frozen importlib._bootstrap>", line 253>
<code object _module_repr at 0x10a7ad1e0, file "<frozen importlib._bootstrap>", line 269>


Note:
This result, for example, tells us that the number of lists remained the same, but the memory allocated by lists has increased by 8 bytes. This is the correct increase for an LP64 system (see 64-Bit_Programming_Models).

### You can create summaries

In [309]:
from pympler import summary
sum1 = summary.summarize(all_objects)
summary.print_(sum1)

                              types |   # objects |   total size
                                str |       52096 |      6.93 MB
                               dict |       13163 |      5.42 MB
                               code |       14878 |      2.05 MB
                               type |        2311 |      1.98 MB
                              tuple |       12059 |    783.04 KB
                                set |         472 |    475.75 KB
                 _io.BufferedWriter |           3 |    384.52 KB
                               list |        2128 |    354.66 KB
                            weakref |        3253 |    254.14 KB
                        abc.ABCMeta |         241 |    243.08 KB
  traitlets.traitlets.MetaHasTraits |         194 |    189.56 KB
                 wrapper_descriptor |        2382 |    186.09 KB
                  getset_descriptor |        2219 |    156.02 KB
                function (__init__) |        1110 |    147.42 KB
                         

### Compare them with other summaries.

In [310]:
sum2 = summary.summarize(muppy.get_objects())
diff = summary.get_diff(sum1, sum2)
summary.print_(diff)                          

                       types |   # objects |   total size
                        list |        9570 |      2.12 MB
                         str |        9561 |    684.40 KB
                         int |        1935 |     52.92 KB
                        dict |          45 |      8.46 KB
                       tuple |          46 |      2.98 KB
                       bytes |          22 |      2.60 KB
       zmq.sugar.frame.Frame |           7 |      1.26 KB
                     weakref |          16 |      1.25 KB
           member_descriptor |          16 |      1.12 KB
                        cell |          23 |      1.08 KB
  builtin_function_or_method |          15 |      1.05 KB
           method_descriptor |          12 |    864     B
         function (<lambda>) |           6 |    816     B
                      method |          11 |    704     B
             _asyncio.Future |           5 |    640     B


## The tracker module

In [311]:
from pympler import tracker
tr = tracker.SummaryTracker()
tr.print_diff()

                       types |   # objects |   total size
                        list |        9538 |    895.59 KB
                         str |        9535 |    682.04 KB
                         int |        1927 |     52.69 KB
                        dict |           3 |    592     B
       function (store_info) |           1 |    136     B
                        cell |           2 |     96     B
                      method |           1 |     64     B
            _ast.Interactive |           1 |     56     B
                 _ast.Module |          -1 |    -56     B
  builtin_function_or_method |          -3 |   -216     B
                       tuple |          -9 |   -632     B


Note:
A tracker object creates a summary (that is a summary which it will remember) on initialization. Now whenever you call tracker.print_diff(), a new summary of the current state is created, compared to the previous summary and printed to the console. As you can see here, quite a few objects got in between these two invocations. But if you don't do anything, nothing will change.

**Diff 2**

In [312]:
tr.print_diff()                               

                       types |   # objects |   total size
                       bytes |          13 |      1.54 KB
                       tuple |          14 |    912     B
                        list |           5 |    664     B
                         str |           7 |    659     B
  builtin_function_or_method |           4 |    288     B
                        cell |           6 |    288     B
         function (<lambda>) |           2 |    272     B
               list_iterator |           3 |    168     B
                        code |           1 |    144     B
              sqlite3.Cursor |           1 |    112     B
                     weakref |           1 |     80     B
                      method |           1 |     64     B
                       float |           1 |     24     B
              _ast.Attribute |          -1 |    -56     B
                   _ast.Call |          -1 |    -56     B


**Now check out this code snippet**

In [313]:
i = 1
l = [1,2,3,4]
d = {}
tr.print_diff()                               

        types |   # objects |   total size
         dict |          14 |      1.66 KB
         list |           8 |    680     B
          str |           7 |    595     B
     _ast.Num |           5 |    280     B
        tuple |           4 |    272     B
    _ast.Name |           3 |    168     B
  _ast.Assign |           3 |    168     B
        bytes |           1 |    157     B
      weakref |           1 |     80     B
    _ast.List |           1 |     56     B
    _ast.Dict |           1 |     56     B
          int |           1 |     28     B
        float |           1 |     24     B


Note:
As you can see, both the new list and the new dict appear in the summary, but not the 4 integers used. Why is that? Because they already existed before they were used here: some other part of the Python interpreter code already makes use of them. Thus, they are not new.

## Reference Browser
In case some objects are leaking and you don't know where they are still referenced, you can use the referrers browser. At first let's create a root object which we then reference from a tuple and a list.

In [314]:
from pympler import refbrowser
root = "some root object"
root_ref1 = [root]
root_ref2 = (root, )

def output_function(o):
    return str(type(o))

In [315]:
cb = refbrowser.ConsoleBrowser(root, maxdepth=2, str_func=output_function)

## ConsoleBrowser
Then we create a ConsoleBrowser, which will give us a referrers tree starting at root, printing to a maximum depth of 2, and uses str_func to represent objects. Now it's time to see where we are at.

In [316]:
cb.print_tree()

<class 'str'>-+-<class 'list'>--<class 'dict'>
              +-<class 'tuple'>--<class 'dict'>
              +-<class 'dict'>-+-<class 'function'>
                               +-<class 'dict'>
                               +-<class 'module'>
                               +-<class 'dict'>
                               +-<class 'dict'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<class 'function'>
                               +-<c

Note:
What we see is that the root object is referenced by the tuple and the list, as well as by three dictionaries. These dictionaries belong to the environment, e.g. the ConsoleBrowser we just started and the current execution context.

---

# Performance Tips


1. Get it right.
2. Test it's right.
3. Profile if slow.
4. Optimise.
5. Repeat from 2.

Note:
You can only know what makes your program slow after first getting the program to give correct results, then running it to see if the correct program is slow. When found to be slow, profiling can show what parts of the program are consuming most of the time. A comprehensive but quick-to-run test suite can then ensure that future optimizations don't change the correctness of your program. The numbered steps above summarize this cycle.

Certain optimizations amount to good programming style and so should be learned as you learn the language. An example would be moving the calculation of values that don't change within a loop, outside of the loop.

## Choose the Right Data Structure
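If your code frequently does any of these at the front of a list:
- del anylist[0]
- anylist.pop(0)
- anylist.insert(0,o)

then use **deque** (`collections.deque`), which adds and removes items at either end in O(1) time, whereas a list has to shift every remaining element.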
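
A short sketch of the deque equivalents of those list operations, with sample values chosen for the example:

```python
from collections import deque

queue = deque([1, 2, 3])
queue.appendleft(0)        # instead of anylist.insert(0, o)
first = queue.popleft()    # instead of anylist.pop(0)

print(first, list(queue))
```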

**Do not do this**

```python
s = ""
for substring in list:
    s += substring
```

- Use `s = "".join(list)` instead.
- The former is a very common and catastrophic mistake when building large strings. Similarly, if you are generating bits of a string sequentially, collect the bits in a list and join them once at the end instead of appending each one to the string.
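
A sketch of the collect-then-join approach for pieces that are generated one at a time. The `pieces` list and the sample values are illustrative:

```python
# Generate the bits first, then join them in a single pass.
pieces = [str(n) for n in range(5)]

s = "".join(pieces)
csv_line = ",".join(pieces)

print(s, csv_line)
```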

## Loops
- Loop *may* be less efficient than
   - map
   - filter
   - reduce

Note:
Python supports a couple of looping constructs. The for statement is most commonly used. It loops over the elements of a sequence, assigning each to the loop variable. If the body of your loop is simple, the interpreter overhead of the for loop itself can be a substantial amount of the overhead. This is where the map function is handy. You can think of map as a for moved into C code. The only restriction is that the "loop body" of map must be a function call. Besides the syntactic benefit of list comprehensions, they are often as fast or faster than equivalent use of map.

### Example Loop -> Map

```python
newlist = []
for word in oldlist:
    newlist.append(word.upper())
```

- You can use map to push the loop from the interpreter into compiled C code.
- Avoid attribute lookups by caching in a local, especially if it's occurring in a high-iteration loop. Everything that doesn't change can be cached as local variables, even instance methods. (This is one of the optimizations that the PyPy JIT does automatically.)
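
The loop above could be written with `map` or a list comprehension roughly as follows. This is a sketch with made-up sample data; note that in Python 3 `map` returns a lazy iterator, so it is wrapped in `list()` here:

```python
oldlist = ["spam", "eggs", "ham"]  # sample data for illustration

# map applies str.upper to each element without a Python-level loop body.
newlist_map = list(map(str.upper, oldlist))

# The equivalent list comprehension is usually just as fast and easier to read.
newlist_comp = [word.upper() for word in oldlist]

print(newlist_map == newlist_comp)
```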

## Guido van Rossum's Tips
- Avoid overengineering data structures. Tuples are better than objects (try namedtuple too though). Prefer simple fields over getter/setter functions.

- Built-in datatypes are your friends. Use more numbers, strings, tuples, lists, sets, dicts. Also check out the collections library, esp. deque.

- Be suspicious of function/method calls; creating a stack frame is expensive.

- Don't write Java (or C++, or Javascript, ...) in Python.

- Are you sure it's too slow? Profile before optimizing!

- The universal speed-up is rewriting small bits of code in C. Do this only when all else fails.

## Avoiding dots...

```python
upper = str.upper
newlist = []
append = newlist.append
for word in oldlist:
    append(upper(word))
```

Note:
Suppose you can't use map or a list comprehension? You may be stuck with the for loop. The for loop example has another inefficiency. Both newlist.append and word.upper are function references that are reevaluated each time through the loop. The original loop can be replaced with the version shown above, which looks both references up once before the loop starts.

This technique should be used with caution. It gets more difficult to maintain if the loop is large. Unless you are intimately familiar with that piece of code you will find yourself scanning up to check the definitions of append and upper.

### Local Variables
- Local faster than global

Note:
The final speedup available to us for the non-map version of the for loop is to use local variables wherever possible. If the above loop is cast as a function, append and upper become local variables. Python accesses local variables much more efficiently than global variables.
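
Cast as a function, the loop might look like the sketch below; the function name `upper_all` is an assumption made for this example:

```python
def upper_all(oldlist):
    # Inside a function, upper, newlist and append are local variables,
    # which the interpreter can access faster than globals.
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in oldlist:
        append(upper(word))
    return newlist

print(upper_all(["spam", "eggs"]))
```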

### Import Statement Overhead
- import only what is needed
- put rarely needed imports inside the function or conditional branch that uses them

Note:
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
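
A minimal sketch of a function-level import; the function `parse_config` is hypothetical, and `json` merely stands in for a module that is only needed on this code path:

```python
def parse_config(text):
    # The import runs on the first call only; later calls find the
    # module already cached in sys.modules.
    import json
    return json.loads(text)

print(parse_config('{"debug": true}'))
```

The trade-off is that the import statement itself is still executed on every call, so this suits rarely used paths rather than tight loops.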

### Data Aggregation
- Loop inside function

Note:
Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates. Here's a contrived example written in Python.

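```python
import time

x = 0
def doit1(i):
    global x
    x = x + i

numbers = range(100000)
t = time.time()
# One function call per element
for i in numbers:
    doit1(i)
print("%.3f" % (time.time() - t))
```

```python
x = 0
def doit2(numbers):
    global x
    # One function call in total; the loop runs inside the function
    for i in numbers:
        x = x + i

t = time.time()
doit2(numbers)
print("%.3f" % (time.time() - t))
```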


### Doing Stuff Less Often
- **sys.setcheckinterval**

Note:
The Python interpreter performs some periodic checks. In particular, it decides whether or not to let another thread run and whether or not to run a pending call (typically a call established by a signal handler). Most of the time there's nothing to do, so performing these checks each pass around the interpreter loop can slow things down. There is a function in the sys module, setcheckinterval, which you can call to tell the interpreter how often to perform these periodic checks. Prior to the release of Python 2.3 it defaulted to 10. In 2.3 this was raised to 100. If you aren't running with threads and you don't expect to be catching many signals, setting this to a larger value can improve the interpreter's performance, sometimes substantially. Note that this applies to Python 2: `sys.setcheckinterval` was deprecated in Python 3.2, where `sys.setswitchinterval` took over its role, and it was removed in Python 3.9.
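
A minimal sketch of the Python 3 counterpart, assuming Python 3.2 or later, where the interval is a time in seconds rather than an instruction count. The value 0.05 is an arbitrary choice for illustration, not a recommendation:

```python
import sys

old_interval = sys.getswitchinterval()   # 0.005 seconds by default

# Ask the interpreter to consider switching threads less often.
sys.setswitchinterval(0.05)
new_interval = sys.getswitchinterval()

# Restore the previous setting when done experimenting.
sys.setswitchinterval(old_interval)
```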

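### Use xrange instead of range (Python 2)
- In Python 2, range creates a real list
- A real list uses more memory
- Prefer generators to fully built structures
- In Python 3, range is lazy (xrange is gone), and many functions that returned real lists in Py2 return iterators or views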
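
The memory difference can be observed with `sys.getsizeof`. This is a sketch for Python 3; the exact byte counts depend on the interpreter build, so none are quoted here:

```python
import sys

lazy = range(100000)   # stores only start, stop and step
real = list(lazy)      # stores a reference to every element

lazy_size = sys.getsizeof(lazy)
real_size = sys.getsizeof(real)
print(lazy_size, real_size)
```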

### Re-map Functions at runtime

- Say you have a function

```python
class Test:
    def check(self,a,b,c):
        if a == 0:
            self.str = b*100
        else:
            self.str = c*100
```

```python
a = Test()
def example():
    for i in range(0,100000):
        a.check(i,"b","c")
```

```python
import profile
profile.run("example()")
```

```
         100005 function calls in 0.368 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.368    0.368 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   100000    0.180    0.000    0.180    0.000 <ipython-input-319-4c9847eebec3>:2(check)
        1    0.187    0.187    0.368    0.368 <ipython-input-320-ace2934187fc>:2(example)
        1    0.000    0.000    0.368    0.368 <string>:1(<module>)
        1    0.000    0.000    0.368    0.368 profile:0(example())
        0    0.000             0.000          profile:0(profiler)
```




And suppose this function gets called from somewhere else many times.

The `if` statement in `check` is then evaluated on every call, although its first branch is only taken the first time. You can instead let the method rebind itself after the first call:

**The first-call branch runs only once**

```python
class Test2:
    def check(self,a,b,c):
        self.str = b*100
        self.check = self.check_post
    def check_post(self,a,b,c):
        self.str = c*100
```

```python
a = Test2()
def example2():
    for i in range(0,100000):
        a.check(i,"b","c")

import profile
profile.run("example2()")
```

```
         100005 function calls in 0.366 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.366    0.366 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.000    0.000    0.000    0.000 <ipython-input-322-53cf9952682c>:2(check)
    99999    0.177    0.000    0.177    0.000 <ipython-input-322-53cf9952682c>:5(check_post)
        1    0.189    0.189    0.366    0.366 <ipython-input-323-eda624e9d4ab>:2(example2)
        1    0.000    0.000    0.366    0.366 <string>:1(<module>)
        1    0.000    0.000    0.366    0.366 profile:0(example2())
        0    0.000             0.000          profile:0(profiler)
```




---

# Multithreading
- Multitasking
- Multiprocessing
- Multicore
- Explicit vs. implicit multi-threading

## GIL
- Global Interpreter Lock
- Only one thread executes Python bytecode at a time per interpreter process
- Threads therefore cannot be used to achieve true multiprocessing.

## Thread and Threading
- `thread` module (`_thread` in Python 3): low level
    - only use its lock
- `threading`
    - `Thread` class

## Using threading.Thread
- Constructor

```python
# Python 3 signature; Python 2 had a trailing verbose=None instead of daemon
Thread(group=None, target=None, name=None,
       args=(), kwargs={}, *, daemon=None)
```

- Should be called with named args as the order of the signature may be altered in future.

## Thread constructor args
- **`group`** should be None; reserved for future extension when a ThreadGroup class is implemented.

- **`target`** is the callable object to be invoked by the `run()` method. 

- **`name`** is the thread name. Defaults to "Thread-N", where N is a small integer.
- **`args`**, **`kwargs`** are the arguments for *target*, empty by default.

## Two Approaches to Thread
- Two approaches:
    - subclass `Thread` and override *run*
    - set *target* to an existing function
- `start()` arranges for *run* to be invoked in the new thread; the default *run* calls *target* if it is not None
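
Both approaches side by side, as a minimal sketch. The names `Worker`, `work` and `results` are invented for this example:

```python
from threading import Thread

results = []

class Worker(Thread):
    # Approach 1: subclass Thread and override run().
    def run(self):
        results.append("subclass")

def work():
    # Approach 2: pass an existing function as target.
    results.append("target")

threads = [Worker(), Thread(target=work)]
for t in threads:
    t.start()   # start() invokes run() in a new thread
for t in threads:
    t.join()    # wait for both threads to finish

print(sorted(results))
```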

## Trivial Example

```python
from threading import Thread
from time import sleep

def timeout_message(message=None, repeat=3, timeout=5):
    def target():
        for i in range(repeat):
            sleep(timeout)
            print('\n'+message)
    
    Thread(target=target).start()
    input('Press Enter when done')

timeout_message('hello')
```

```
hello
Press Enter when donea

hello

hello
```


### Daemonization?
- A *thread* is said to be a *daemon* if Python can exit while the *thread* is still running.
- The previous example does not terminate unless both *Enter* has been pressed and `target()` has completed.

To Daemonize, replace:

```python
Thread(target=target).start()
```

With:

```python
t = Thread(target=target)
t.daemon = True  # setDaemon() is the older, now deprecated spelling
t.start()
```

## Local Storage
- `l = threading.local()`
- Attributes set on `l` (e.g. `l.x = 1`) hold different values for each *thread*.
- `l.__dict__` contains the current thread's values

- See the primes example.
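
Independently of the primes example, a minimal sketch of thread-local storage could look like this. The names `worker`, `seen` and the thread names are assumptions made for illustration:

```python
import threading

local = threading.local()
seen = {}

def worker(value):
    # Each thread sets and reads its own copy of local.value.
    local.value = value
    seen[threading.current_thread().name] = local.value

threads = [threading.Thread(target=worker, args=(i,), name=f"t{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned local.value, so it has no such attribute.
main_has_value = hasattr(local, "value")
print(seen, main_has_value)
```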

## Safe Multi Threading
- All External Resources must be handled by a single thread.
- All other threads must communicate via an *atomic* messaging system, e.g.:
    - `queue` module
    - Database
- OR use locks
- The logging module uses locks.

Note:
[Excellent talk by Raymond Hettinger](https://youtu.be/Bv25Dwe84g0)
[slides](https://dl.dropboxusercontent.com/u/3967849/pyru/_build/html/threading.html)

### Atomic operations
- reading or replacing a single instance attribute
- reading or replacing a single global variable
- fetching an item from a list
- modifying a list in place (e.g. adding an item using append)
- fetching an item from a dictionary
- modifying a dictionary in place (e.g. adding an item, or calling the clear method)

## Locking
- Locks 
- most fundamental synchronization mechanism

Note:
Locks are the most fundamental synchronization mechanism provided by the threading module. At any time, a lock can be held by a single thread, or by no thread at all. If a thread attempts to hold a lock that's already held by some other thread, execution of the first thread is halted until the lock is released.

Locks are typically used to synchronize access to a shared resource. For each shared resource, create a Lock object. When you need to access the resource, call acquire to hold the lock (this will wait for the lock to be released, if necessary), and call release to release it:

```python
from threading import Lock

lock = Lock()

lock.acquire()  # will block if lock is already held
...  # access shared resource
lock.release()
```

### Always release

```python
lock.acquire()
try:
    ...  # access shared resource
finally:
    lock.release() # release lock, no matter what
```

Note:
For proper operation, it's important to release the lock even if something goes wrong when accessing the resource. The try-finally form above guarantees this.

### Lock and with

```python
# Python 2.5 only needed: from __future__ import with_statement
with lock:
    ...  # access shared resource
```

The acquire method takes an optional wait flag, which can be used to avoid blocking if the lock is held by someone else. If you pass in False, the method never blocks, but returns False if the lock was already held:

```python
if not lock.acquire(False):
    ...  # failed to lock the resource
else:
    try:
        ...  # access shared resource
    finally:
        lock.release()
```

Note:
In Python 2.5 and later, you can also use the with statement, as in the first block of this section. When used with a lock, this statement automatically acquires the lock before entering the block, and releases it when leaving the block.

### lock.locked?

```python
if not lock.locked():
    # some other thread may run before we get
    # to the next line
    lock.acquire() # may block anyway
```

Note:
You can use the locked method to check if the lock is held. Note that you cannot use this method to determine if a call to acquire would block or not; some other thread may have acquired the lock between the method call and the next statement.

## Problems with Simple Locking 
- Locks are difficult to manage
- Hard to reason about

Note: 
The standard lock object doesn't care which thread is currently holding the lock; if the lock is held, any thread that attempts to acquire the lock will block, even if the same thread is already holding the lock. Consider the following example:

### Locking problem example

```python
import threading

lock = threading.Lock()

def get_first_part():
    lock.acquire()
    try:
        data = ...  # fetch data for first part from shared object
    finally:
        lock.release()
    return data

def get_second_part():
    lock.acquire()
    try:
        data = ...  # fetch data for second part from shared object
    finally:
        lock.release()
    return data
```

Note:
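Here, we have a shared resource, and two access functions that fetch different parts from the resource. The access functions both use locking to make sure that no other thread can modify the resource while we're accessing it. The problem appears when a third function needs both parts in a consistent state: it must hold the lock while calling the two access functions, but each of them then tries to acquire the same lock again and blocks forever.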

## Re-Entrant Locks (RLock) 
- an RLock blocks only if the lock is held by another thread

Note:
The RLock class is a version of simple locking that only blocks if the lock is held by another thread. While simple locks will block if the same thread attempts to acquire the same lock twice, a re-entrant lock only blocks if another thread currently holds the lock. If the current thread is trying to acquire a lock that it's already holding, execution continues as usual.

### RLock example

```python
lock = threading.Lock()
lock.acquire()
lock.acquire() # this will block

lock = threading.RLock()
lock.acquire()
lock.acquire() # this won't block
```

Note:
The main use for this is nested access to shared resources, as illustrated by the example in the previous section. To fix the access methods in that example, just replace the simple lock with a re-entrant lock, and the nested calls will work just fine.
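
A sketch of such nested access with a re-entrant lock. The `shared` dictionary and the function `get_both_parts` are hypothetical additions for illustration; with a plain `threading.Lock` the nested calls would block forever:

```python
import threading

lock = threading.RLock()
shared = {"first": 1, "second": 2}  # hypothetical shared object

def get_first_part():
    with lock:
        return shared["first"]

def get_second_part():
    with lock:
        return shared["second"]

def get_both_parts():
    # Hold the lock across both reads so no other thread can modify the
    # resource in between; the nested acquires succeed because the same
    # thread already owns the RLock.
    with lock:
        return get_first_part(), get_second_part()

print(get_both_parts())
```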

## Semaphores 
- advanced lock
- counter not flag
- allows multiple threads, up to a limited number

Note:

A semaphore is a more advanced lock mechanism. A semaphore has an internal counter rather than a lock flag, and it only blocks if more than a given number of threads have attempted to hold the semaphore. Depending on how the semaphore is initialized, this allows multiple threads to access the same code section simultaneously.

### Semaphore Example

```python
import threading

semaphore = threading.BoundedSemaphore()
semaphore.acquire() # decrements the counter
...  # access the shared resource
semaphore.release() # increments the counter
```

Note:
The counter is decremented when the semaphore is acquired, and incremented when the semaphore is released. If the counter is already zero when a thread tries to acquire the semaphore, that thread will block. When the semaphore is incremented again, one of the blocked threads (if any) will run.

Semaphores are typically used to limit access to a resource with limited capacity, such as a network connection or a database server. Just initialize the counter to the maximum number, and the semaphore implementation will take care of the rest.

`max_connections = 10`

`semaphore = threading.BoundedSemaphore(max_connections)`

If you don't pass in a value, the counter is initialized to 1.

Python's threading module provides two semaphore implementations; the Semaphore class provides an unlimited semaphore which allows you to call release any number of times to increment the counter. To avoid simple programming errors, it's usually better to use the BoundedSemaphore  class, which considers it to be an error to call release more often than you've called acquire.
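
To illustrate the limiting behaviour, here is a minimal sketch in which six threads compete for two slots. The names `use_connection`, `active` and `peak` are invented, and `time.sleep` merely simulates work on the resource:

```python
import threading
import time

max_connections = 2
semaphore = threading.BoundedSemaphore(max_connections)
state_lock = threading.Lock()   # protects the two counters below
active = 0
peak = 0

def use_connection():
    global active, peak
    with semaphore:              # blocks while max_connections threads are inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)         # simulated work on the limited resource
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_connection) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent users:", peak)
```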

## Events

```python
import threading
event = threading.Event()

# a client thread can wait for the flag to be set
event.wait()

# a server thread can set or reset it
event.set()
event.clear()
```

Note:
An event is a simple synchronization object; the event represents an internal flag, and threads can wait for the flag to be set, or set or clear the flag themselves.

If the flag is set, the wait method doesn't do anything. If the flag is cleared, wait will block until it becomes set again. Any number of threads may wait for the same event.
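
A runnable sketch of the client/server pattern above; the `log` list and the `client` function are assumptions added for illustration:

```python
import threading

event = threading.Event()
log = []

def client():
    event.wait()                 # blocks until the flag is set
    log.append("client resumed")

t = threading.Thread(target=client)
t.start()

log.append("server setting flag")
event.set()                      # wakes every thread waiting on the event
t.join()

print(log)
```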

## Conditions 
- advanced event
- a thread can wait on a condition until another thread calls *notify*

Note:

A condition is a more advanced version of the event object. A condition represents some kind of state change in the application, and a thread can wait for a given condition, or signal that the condition has happened. Here's a simple consumer/producer example. First, you need a condition object:

### represents the addition of an item to a resource

```python
condition = threading.Condition()
```

Note:
The producing thread needs to acquire the condition before it can notify the consumers that a new item is available:

### producer thread

```python
...  # generate item
condition.acquire()
...  # add item to resource
condition.notify() # signal that a new item is available
condition.release()
```

Note:
The consumers must acquire the condition (and thus the related lock), and can then attempt to fetch items from the resource:

### consumer thread

```python
condition.acquire()
while True:
    item = ...  # get item from resource (placeholder; falsy if none)
    if item:
        break
    condition.wait() # sleep until item becomes available
condition.release()
...  # process item
```

Note:
The wait method releases the lock, blocks the current thread until another thread calls notify or notify_all (spelled notifyAll in older code) on the same condition, and then reacquires the lock. If multiple threads are waiting, the notify method only wakes up one of the threads, while notify_all always wakes them all up.

To avoid blocking in wait, you can pass in a timeout value, as a floating-point value in seconds. If given, the method will return after the given time, even if notify hasn't been called. If you use a timeout, you must inspect the resource to see if something actually happened.

Note that the condition object is associated with a lock, and that lock must be held before you can access the condition. Likewise, the condition lock must be released when you're done accessing the condition. In production code, you should use `try`-`finally` or `with`, as shown earlier.

To associate the condition with an existing lock, pass the lock to the Condition constructor. This is also useful if you want to use several conditions for a single resource:

`lock = threading.RLock()`

`condition_1 = threading.Condition(lock)`

`condition_2 = threading.Condition(lock)`
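
Putting the producer and consumer together, a minimal runnable sketch might look like this. The `items` deque stands in for the shared resource, and the names `producer`, `consumer` and `consumed` are assumptions for illustration:

```python
import threading
from collections import deque

condition = threading.Condition()
items = deque()      # the shared resource
consumed = []

def consumer():
    with condition:                  # acquires the underlying lock
        while not items:             # re-check after every wake-up
            condition.wait()         # releases the lock while sleeping
        item = items.popleft()
    consumed.append(item)            # process item outside the lock

def producer(item):
    with condition:
        items.append(item)
        condition.notify()           # signal that a new item is available

c = threading.Thread(target=consumer)
c.start()
producer("job-1")
c.join()

print(consumed)
```

The `while not items` loop makes the consumer correct whichever thread runs first: if the item is already there, it never waits.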

## Queue Module
- provides a FIFO
    - first-in, first-out
- suitable for multi-threading
    - Locking is handled for the caller

Note:
The `queue` module (`Queue` in Python 2) provides a FIFO implementation suitable for multi-threaded programming. It can be used to pass messages or other data between producer and consumer threads safely. Locking is handled for the caller, so it is simple to have as many threads as you want working with the same Queue instance. A Queue's size (number of elements) may be restricted to throttle memory usage or processing.

The Queue class implements a basic first-in, first-out container. Elements are added to one “end” of the sequence using `put()`, and removed from the other end using `get()`.

```python
import queue
# was Queue in python 2.x

q = queue.Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())
```

```
0
1
2
3
4
```


Note:
This example uses a single thread to illustrate that elements are removed from the queue in the same order they are inserted.

### LIFO Queue
- LifoQueue
- last-in, first-out
- stack

Note:
In contrast to the standard FIFO implementation of Queue, the LifoQueue uses last-in, first-out ordering (normally associated with a stack data structure).

```python
q = queue.LifoQueue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())
```

```
4
3
2
1
0
```


- The item most recently put() into the queue is removed by get().

### Priority Queue
- PriorityQueue
- Uses the sort order

Note:

Sometimes the processing order of the items in a queue needs to be based on characteristics of those items, rather than just the order they are created or added to the queue. For example, print jobs from the payroll department may take precedence over a code listing printed by a developer. PriorityQueue uses the sort order of the contents of the queue to decide which to retrieve.

```python
class Job(object):
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print('New job:', description)
        return
    def __lt__(self, other):
        return self.priority < other.priority
```

```python
q = queue.PriorityQueue()

q.put( Job(3, 'Mid-level job') )
q.put( Job(10, 'Low-level job') )
q.put( Job(1, 'Important job') )
```

```
New job: Mid-level job
New job: Low-level job
New job: Important job
```

```python
while not q.empty():
    next_job = q.get()
    print('Processing job:', next_job.description)
```

```
Processing job: Important job
Processing job: Mid-level job
Processing job: Low-level job
```


In this single-threaded example, the jobs are pulled out of the queue in strictly priority order. If there were multiple threads consuming the jobs, they would be processed based on the priority of items in the queue at the time get() was called.

### Using Queues with Threads
- Printing to the terminal is not thread-safe
- Edit primes example to place all `print` statements in a single thread
- Use Queue to send messages for printing
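
Without reproducing the primes example, the idea can be sketched as follows: worker threads put messages on a queue and a single printer thread is the only one that calls `print`. The names `printer`, `worker`, `printed` and the `STOP` sentinel are assumptions made for this sketch:

```python
import queue
import threading

messages = queue.Queue()
printed = []
STOP = object()   # sentinel telling the printer thread to finish

def printer():
    # The only thread that ever touches the terminal.
    while True:
        msg = messages.get()
        if msg is STOP:
            break
        print(msg)
        printed.append(msg)

def worker(n):
    messages.put(f"worker {n} done")

printer_thread = threading.Thread(target=printer)
printer_thread.start()

workers = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

messages.put(STOP)        # all workers are done, so nothing follows STOP
printer_thread.join()
```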

# Parallel Processing
- The GIL prevents genuine parallel processing
- Have to use explicit multi-threading or multi-processing

Note: 
CPUs with multiple cores have become the standard in the recent
development of modern computer architectures and we can not only find
them in supercomputer facilities but also in our desktop machines at
home, and our laptops; even Apple’s iPhone 5S got a 1.3 GHz dual-core
processor in 2013.

However, the default Python interpreter was designed with simplicity in
mind and has a thread-safe mechanism, the so-called “GIL” (Global
Interpreter Lock). In order to prevent conflicts between threads, it
lets only one thread execute Python bytecode at a time, so CPU-bound
Python code effectively runs serially even on a multi-core machine.

In this introduction to Python’s `multiprocessing`
module, we will see how we can spawn multiple subprocesses to avoid some
of the GIL’s disadvantages.

### Multi-Threading vs. Multi-Processing

Note:

Depending on the application, two common approaches in parallel
programming are either to run code via threads or multiple processes,
respectively. If we submit “jobs” to different threads, those jobs can
be pictured as “sub-tasks” of a single process and those threads will
usually have access to the same memory areas (i.e., shared memory). This
approach can easily lead to conflicts in case of improper
synchronization, for example, if threads are writing to the same
memory location at the same time.

A safer approach (although it comes with additional overhead due to
the communication between separate processes) is to submit
multiple processes to completely separate memory locations (i.e.,
distributed memory): every process will run completely independently of
the others.

## The `Process` class

```python
import multiprocessing as mp
import random
import string

random.seed(123)

# Define an output queue
output = mp.Queue()

# define a example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase
                        + string.ascii_uppercase
                        + string.digits)
                   for i in range(length))
    output.put(rand_str)
```

Note:

The most basic approach is probably to use the
`Process` class from the
`multiprocessing` module.
Here, we will use a simple queue to collect random strings of length 5
generated in 4 parallel processes.

**Setup a list of processes that we want to run**

```python
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]
```

**Run processes**

```python
for p in processes:
    p.start()
```

**Exit the completed processes**

```python
for p in processes:
    p.join()
```

**Get process results from the output queue**

```python
results = [output.get() for p in processes]

print(results)
```

```
['pQSVw', 'xfFcp', 'oR4at', '3BVWD']
```


### Ordered Results

Note:
The order of the obtained results does not necessarily have to match the
order of the processes (in the `processes` list).
Since we eventually use the `.get()` method to
retrieve the results from the `Queue` sequentially,
the order in which the processes finished determines the order of our
results.
E.g., if the second process has finished just before the first process,
the order of the strings in the `results` list could
have also been
`['PQpqM', 'yzQfA', 'SHZYV', 'PSNkD']` instead of
`['yzQfA', 'PQpqM', 'SHZYV', 'PSNkD']`

The results could be sorted by their original order in the following manner.

Put tuples in the queue with an index called `pos` below.

```python
output.put((pos, rand_str))


...
[(0, 'h5hoV'), (1, 'fvdmN'), (2, 'rxGX4'), (3, '8hDJj')]
```
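
Once the `(pos, rand_str)` tuples have been collected from the queue, restoring the original order is a plain sort on the first tuple element. The sketch below reuses the tuple values shown above, placed in an assumed, shuffled arrival order:

```python
# Tuples as they might arrive from the queue (arrival order is assumed).
results = [(2, 'rxGX4'), (0, 'h5hoV'), (3, '8hDJj'), (1, 'fvdmN')]

# Tuples sort by their first element, i.e. by pos.
results.sort()

# Drop the index once the order has been restored.
strings = [rand_str for pos, rand_str in results]
print(strings)
```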

### The `Pool` class

Another and more convenient approach for simple parallel processing
tasks is provided by the `Pool` class.

There are four methods that are particularly interesting:

-   `Pool.apply`

-   `Pool.map`

-   `Pool.apply_async`

-   `Pool.map_async`



The `Pool.apply` and `Pool.map`
methods are basically equivalents to the
[`apply`](https://docs.python.org/2/library/functions.html#apply)
and
[`map`](https://docs.python.org/2/library/functions.html#map)
built-ins of Python 2 (`apply` was removed in Python 3).
Before we come to the `async` variants of the
`Pool` methods, let us take a look at a simple
example using `Pool.apply` and
`Pool.map`. Here, we will set the number of
processes to 4, which means that the `Pool` class
will only allow 4 processes running at the same time.

```python
def cube(x):
    return x**3

pool = mp.Pool(processes=4)
results = [pool.apply(cube, args=(x,)) for x in range(1,7)]
print(results)
```

```
[1, 8, 27, 64, 125, 216]
```

```python
pool = mp.Pool(processes=4)
results = pool.map(cube, range(1,7))
print(results)
```

```
[1, 8, 27, 64, 125, 216]
```


Note:
The `Pool.map` and `Pool.apply` methods
will block the main program until the submitted work has finished, which is
quite useful if we want to obtain results in a particular order for
certain applications.
In contrast, the `async` variants will submit all
processes at once and retrieve the results as soon as they are finished.
One more difference is that we need to use the `get`
method after the `apply_async()` call in order to
obtain the `return` values of the finished
processes.

```python
pool = mp.Pool(processes=4)
results = [pool.apply_async(cube, args=(x,)) for x in range(1,7)]
output = [p.get() for p in results]
print(output)
```

```
[1, 8, 27, 64, 125, 216]
```


# Forking other processes

- os.system
- os.popen
- subprocess

## os.system

```python
import os
help(os.system)
```

```
Help on built-in function system in module posix:

system(command)
    Execute the command in a subshell.
```

```python
os.system('echo hello')
#prints hello, returns 0 exit status of the shell
```

```
0
```

## subprocess

This module allows you to spawn processes, connect to their
input/output/error pipes, and obtain their return codes.  This module
intends to replace several older modules and functions:
    
- os.system
- os.spawn*
- os.popen*
- popen2.*
- commands.*

## subprocess.Popen class

```python
Popen(args, bufsize=0, executable=None,
      stdin=None, stdout=None, stderr=None,
      preexec_fn=None, close_fds=False, shell=False,
      cwd=None, env=None, universal_newlines=False,
      startupinfo=None, creationflags=0)
```

## Popen arguments
- **args**: A sequence containing the command and its arguments
- **stdout, stderr, stdin**: Standard file descriptors may be assigned to:
    - PIPE: a new pipe to the child process is created
    - file object (existing file)
    - file descriptor (existing file) 


## Popen examples

```sh
#shell command
output=`mycmd myarg`
```

Becomes

```python
output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]
```

```python
from subprocess import Popen, PIPE
output = Popen(["du", "-sh"], stdout=PIPE).communicate()[0]
print(output)
```

```
b' 34M\t.\n'
```


### Popen object methods

```python
communicate(self, input=None)
```

Interact with process: Send data to *stdin*.  Read data from *stdout*
and *stderr*, until end-of-file is reached.  Wait for the process
to terminate.  The optional input argument should be data
to be sent to the child process, or None, if no data should
be sent to the child.
`communicate()` returns a tuple (*stdout*, *stderr*).

### Replacing shell pipe line

```sh
#shell command
output=`uptime | cut -d, -f2`
```

Becomes

```python
p1 = Popen(["uptime"], stdout=PIPE)
p2 = Popen(["cut", "-d,", "-f2"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
output
```

```
b'  5:03\n'
```

## subprocess functions

```python
from subprocess import call, check_call, check_output
```

```python
call(*popenargs, **kwargs)
```
Run command with arguments.  Wait for command to complete, then
return the returncode attribute.
    
The arguments are the same as for the Popen constructor.  Example:

```python
retcode = call(["ls", "-l"])
retcode
```

```
0
```

```python
check_call(*popenargs, **kwargs)
```
Run command with arguments.  Wait for command to complete.  If the
exit code was zero then return, otherwise raise
CalledProcessError.  The CalledProcessError object will have the
return code in the returncode attribute.

The arguments are the same as for the Popen constructor.  Example:

```python
check_call(["ls", "-l"])
```

```
0
```

```python
check_output(*popenargs, **kwargs)
```
Run command with arguments and return its output as a byte string.

If the exit code was non-zero it raises a CalledProcessError.  The
CalledProcessError object will have the return code in the returncode
attribute and output in the output attribute.

The arguments are the same as for the Popen constructor.  Example:

```python
output = check_output(["ls", "-l", "/dev/null"])
output
```

```
b'crw-rw-rw-  1 root  wheel    3,   2 Nov 24 22:00 /dev/null\n'
```
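
The error path can be sketched in a platform-independent way by running the current Python interpreter as the child process. Using `sys.executable` and the inline `-c` scripts is a choice made for this example, not part of the material above:

```python
import subprocess
import sys

# Success: check_output returns the child's stdout as bytes.
out = subprocess.check_output([sys.executable, "-c", "print('hello')"])

# Failure: a non-zero exit status raises CalledProcessError.
try:
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    code = err.returncode
else:
    code = 0

print(out, code)
```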

---

# Concurrent Futures 

The `concurrent.futures` module provides a high-level interface for
asynchronously executing callables.

The asynchronous execution can be performed with threads,
using `ThreadPoolExecutor`, or separate processes,
using `ProcessPoolExecutor`. Both implement the same interface, which is
defined by the abstract Executor class.

### Executor Objects 

```python 
class concurrent.futures.Executor
```

An abstract class that provides methods to execute calls asynchronously.
It should not be used directly, but through its concrete subclasses.

```python
submit(fn, *args, **kwargs)
```

Schedules the callable, *fn*, to be executed
as `fn(*args, **kwargs)` and returns a Future object representing the
execution of the callable.

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:

    future = executor.submit(pow, 323, 1235)
    print(future.result())
```

7330187419711662525292446729952277967765833457839424373869116780517420149076198183898894905236339668910883614705749593461204605714622607490668683222808958179218191780471063516227460897872227090014097232715359988867409006240081206335670819048146328123935337644631341482039026277834498173918554303355637321060412263715606736911839847081166018722332660742474936263046482602637679145832497919840537694829188335160914131310111239449199642739655793719812086149415859534959085359215402107080568853413877372159233452025442228651418507639010743174496936173262981681095359156359401217962539764203947129055258900852300663811552683018727645219707243611502505332407412509113706415784954450373499498470564461122438759199401776852200401606640428677779377601753155808428527723343610817938497649317098807976560094452142608380040084257453294862872175833371967399756248792099979580669182692895291737594003884626270278670234491588018888596571060169103937561057223083861024046092685665503687387204434227037124219374130642

```python
map(func, *iterables, timeout=None, chunksize=1)
```

Equivalent to `map(func, *iterables)` except *func* is executed
asynchronously and several calls to *func* may be made concurrently.

Note:

The returned iterator raises
a `concurrent.futures.TimeoutError` if `next()` is called and the
result isn’t available after *timeout* seconds from the original call
to `Executor.map()`. *timeout* can be an int or a float. If *timeout* is
not specified or None, there is no limit to the wait time. If a call
raises an exception, then that exception will be raised when its value
is retrieved from the iterator. When using `ProcessPoolExecutor`, this
method chops *iterables* into a number of chunks which it submits to the
pool as separate tasks. The (approximate) size of these chunks can be
specified by setting *chunksize* to a positive integer. For very long
iterables, using a large value for *chunksize* can significantly improve
performance compared to the default size of 1.
With ThreadPoolExecutor, *chunksize* has no effect.

Changed in version 3.5: Added the *chunksize* argument.
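As a minimal sketch of the behaviour described above (the worker `slow_square` and the helper `fetch_with_timeout` are hypothetical names introduced for illustration), `map()` returns results in input order, and its iterator raises `TimeoutError` when a result is not ready in time:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow_square(x):
    """Hypothetical worker: sleeps briefly, then returns x squared."""
    time.sleep(0.05)
    return x * x

with ThreadPoolExecutor(max_workers=4) as executor:
    # Results are yielded in input order, even if calls finish out of order.
    squares = list(executor.map(slow_square, range(8)))

print(squares)

def fetch_with_timeout():
    """Return 'timed out' if the first result is not ready within 0.05 s."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        results = executor.map(time.sleep, [0.5], timeout=0.05)
        try:
            next(results)
        except TimeoutError:
            return "timed out"
    return "finished"

print(fetch_with_timeout())
```

Note that the `with` block still waits for the sleeping worker to finish before `fetch_with_timeout()` returns; the timeout only limits how long the caller waits for a result.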

`shutdown(wait=True)`

Signal the executor that it should free any resources that it is using
when the currently pending futures are done executing. Calls
to Executor.submit() and Executor.map() made after shutdown will
raise RuntimeError.

If *wait* is True then this method will not return until all the pending
futures are done executing and the resources associated with the
executor have been freed. If *wait* is False then this method will
return immediately and the resources associated with the executor will
be freed when all pending futures are done executing. Regardless of the
value of *wait*, the entire Python program will not exit until all
pending futures are done executing.

You can avoid having to call this method explicitly if you use
the `with` statement, which will shut down the Executor (waiting as
if `Executor.shutdown()` were called with *wait* set to True):

In [350]:
import shutil

with ThreadPoolExecutor(max_workers=4) as e:
    e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
    e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
    e.submit(shutil.copy, 'src3.txt', 'dest3.txt')
    e.submit(shutil.copy, 'src4.txt', 'dest4.txt')
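For comparison, a small sketch of calling `shutdown()` explicitly, assuming nothing beyond the standard library. It also illustrates that `submit()` raises `RuntimeError` once the executor has been shut down:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(pow, 2, 10)

# Block until pending futures are done and the resources are freed.
executor.shutdown(wait=True)
print(future.result())

try:
    executor.submit(pow, 2, 11)
except RuntimeError as exc:
    rejected = True
    print("submit after shutdown failed:", exc)
else:
    rejected = False
```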

### ThreadPoolExecutor 

ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

Deadlocks can occur when the callable associated with a Future waits on the results of another Future. For example:

In [351]:
import time

def wait_on_b():
    time.sleep(5)
    print(b.result())  
    # b will never complete because it is waiting on a.
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())  
    # a will never complete because it is waiting on b.
    return 6
executor = ThreadPoolExecutor(max_workers=2)
a = executor.submit(wait_on_b)
b = executor.submit(wait_on_a)

And:

In [352]:
def wait_on_future():
    f = executor.submit(pow, 5, 2)
    # This will never complete because there is only one worker thread and
    # it is executing this function.
    print(f.result())

executor = ThreadPoolExecutor(max_workers=1)
executor.submit(wait_on_future)

<Future at 0x11cef1240 state=running>

`class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')`

An Executor subclass that uses a pool of at most *max\_workers* threads
to execute calls asynchronously.
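One way to avoid the single-worker deadlock shown earlier is to let the caller, rather than a worker, wait on the intermediate result. The following is only a sketch; `square` and `add_one` are hypothetical helpers:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x ** 2

def add_one(x):
    return x + 1

with ThreadPoolExecutor(max_workers=1) as pool:
    # The caller, not a worker thread, waits for the first result ...
    first = pool.submit(square, 5)
    # ... and only then submits the task that depends on it.
    second = pool.submit(add_one, first.result())
    total = second.result()

print(total)
```

Because no worker ever blocks on another future from the same pool, a single worker thread is enough.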

### ThreadPoolExecutor Example 

In [353]:
from concurrent.futures import *
import urllib.request

In [354]:
URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

In [355]:
# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    # Handle each result as soon as its future completes
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

'http://www.bbc.co.uk/' page is 327926 bytes
'http://some-made-up-domain.com/' generated an exception: <urlopen error [Errno 8] nodename nor servname provided, or not known>
'http://www.cnn.com/' page is 1135305 bytes
'http://europe.wsj.com/' page is 2804527 bytes
'http://www.foxnews.com/' page is 231472 bytes


### ProcessPoolExecutor 

The ProcessPoolExecutor class is an Executor subclass that uses a pool
of processes to execute calls asynchronously. ProcessPoolExecutor uses
the multiprocessing module, which allows it to side-step the Global
Interpreter Lock but also means that only picklable objects can be
executed and returned.

The `__main__` module must be importable by worker subprocesses. This
means that ProcessPoolExecutor will not work in the interactive
interpreter.

Calling Executor or Future methods from a callable submitted to
a ProcessPoolExecutor will result in deadlock.

`class concurrent.futures.ProcessPoolExecutor(max_workers=None)`

An Executor subclass that executes calls asynchronously using a pool of at most *max\_workers* processes. If *max\_workers* is None or not given, it will default to the number of processors on the machine.
If *max\_workers* is less than or equal to 0, then a ValueError will be raised.

### ProcessPoolExecutor Example

In [356]:
import concurrent.futures
import math
PRIMES = [
        112272535095293,
        112582705942171,
        112272535095293,
        115280095190773,
        115797848077099,
        1099726899285419]

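def is_prime(n):
    # 0, 1 and negative numbers are not prime; 2 is the only even prime
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))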
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime,
             PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

112272535095293 is prime: True
112582705942171 is prime: True
112272535095293 is prime: True
115280095190773 is prime: True
115797848077099 is prime: True
1099726899285419 is prime: False


### Future Objects 

The Future class encapsulates the asynchronous execution of a
callable. Future instances are created by Executor.submit().

- `class concurrent.futures.Future`
Encapsulates the asynchronous execution of a callable. Future instances
are created by `Executor.submit()` and should not be created directly
except for testing.

- `cancel()`
Attempt to cancel the call. If the call is currently being executed and
cannot be cancelled then the method will return False, otherwise the
call will be cancelled and the method will return True.

- `cancelled()`
Return True if the call was successfully cancelled.

- `running()`
Return True if the call is currently being executed and cannot be
cancelled.

- `done()`
Return True if the call was successfully cancelled or finished running.

- `result(timeout=None)`
Return the value returned by the call. If the call hasn’t yet completed then this method will wait up to *timeout* seconds. 
    - If the call hasn’t completed in *timeout* seconds, then a `concurrent.futures.TimeoutError` will be raised. *timeout* can be an int or float. 
    - If *timeout* is not specified or None, there is no limit to the wait time. 
    - If the future is cancelled before completing then `CancelledError` will be raised. 
    - If the call raised, this method will raise the same exception.

- `exception(timeout=None)`
Return the exception raised by the call. If the call hasn’t yet
completed then this method will wait up to *timeout* seconds. 
    - If the call hasn’t completed in *timeout* seconds, then a `concurrent.futures.TimeoutError` will be raised. *timeout* can be an int or float. 
    - If *timeout* is not specified or None, there is no limit to the wait time. 
    - If the future is cancelled before completing then `CancelledError` will be raised. 
    - If the call completed without raising, None is returned.

- `add_done_callback(fn)`
Attaches the callable *fn* to the future. *fn* will be called, with the future as its only argument, when the future is cancelled or finishes running.

    - Added callables are called in the order that they were added and are always called in a thread belonging to the process that added them. If the callable raises an Exception subclass, it will be logged and ignored. If the callable raises a BaseException subclass, the behavior is undefined.

    - If the future has already completed or been cancelled, *fn* will be called immediately.
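A short sketch of these methods in use, assuming only the standard library; the callback `record` and the list `completed` are hypothetical names. The callback receives the finished future and inspects it with `cancelled()`, `exception()` and `result()`:

```python
from concurrent.futures import ThreadPoolExecutor

completed = []

def record(future):
    """Callback: receives the finished future as its only argument."""
    if future.cancelled():
        completed.append("cancelled")
    elif future.exception() is not None:
        completed.append("error: %s" % type(future.exception()).__name__)
    else:
        completed.append("result: %r" % (future.result(),))

with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(divmod, 7, 2)
    bad = executor.submit(divmod, 7, 0)   # raises ZeroDivisionError
    ok.add_done_callback(record)
    bad.add_done_callback(record)

# The order of completion is not guaranteed, so sort before printing.
print(sorted(completed))
```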

The following Future methods are meant for use in unit tests
and Executor implementations.

- `set_running_or_notify_cancel()`
This method should only be called by Executor implementations before
executing the work associated with the Future and by unit tests.

    - If the method returns False then the Future was cancelled, i.e. `Future.cancel()` was called and returned *True*. Any threads waiting on the Future completing (i.e. through `as_completed()` or `wait()`) will be woken up.

    - If the method returns True then the Future was not cancelled and has been put in the running state, i.e. calls to `Future.running()` will return *True*.

    - This method can only be called once and cannot be called after `Future.set_result()` or `Future.set_exception()` have been called.

- `set_result(result)`
Sets the result of the work associated with the Future to *result*.
This method should only be used by Executor implementations and unit
tests.

- `set_exception(exception)`
Sets the result of the work associated with the Future to
the Exception *exception*.
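The sketch below drives a few Future objects by hand, the way a unit test or an Executor implementation might. Creating `Future()` directly is done here only for illustration, in line with the note above that this is meant for testing:

```python
from concurrent.futures import Future

# A future that completes normally.
future = Future()
started = future.set_running_or_notify_cancel()   # True: not cancelled
future.set_result(42)

# A future whose work failed.
failed = Future()
failed.set_running_or_notify_cancel()
failed.set_exception(ValueError("boom"))

# A future cancelled before it started running.
cancelled = Future()
was_cancelled = cancelled.cancel()                           # True
may_run = cancelled.set_running_or_notify_cancel()           # False

print(started, future.result(), repr(failed.exception()))
print(was_cancelled, may_run, cancelled.cancelled())
```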

### Module Functions 

`concurrent.futures.wait(fs, timeout=None, return_when=ALL_COMPLETED)`

Wait for the Future instances (possibly created by
different Executor instances) given by *fs* to complete. Returns a named
2-tuple of sets. The first set, named done, contains the futures that
completed (finished or were cancelled) before the wait completed. The
second set, named not\_done, contains uncompleted futures.

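*timeout* can be used to control the maximum number of seconds to wait
before returning. *timeout* can be an int or float. If *timeout* is not
specified or None, there is no limit to the wait time.

*return\_when* indicates when this function should return. It must be one
of the constants listed below.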

### Constants
|Constant|Description|
| ------ |-----------|
|`FIRST_COMPLETED`|The function will return when any future finishes or is cancelled.|
|`FIRST_EXCEPTION`|The function will return when any future finishes by raising an exception. If no future raises an exception then it is equivalent to `ALL_COMPLETED`.|
|`ALL_COMPLETED`|The function will return when all futures finish or are cancelled.|
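A minimal sketch of `wait()` with two of these constants; the worker `sleep_then_return` and the sleep durations are assumptions chosen only to make one future clearly finish first:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def sleep_then_return(seconds):
    """Hypothetical worker: sleeps, then returns the sleep duration."""
    time.sleep(seconds)
    return seconds

with ThreadPoolExecutor(max_workers=2) as executor:
    fast = executor.submit(sleep_then_return, 0.05)
    slow = executor.submit(sleep_then_return, 0.5)

    # Return as soon as any one future finishes.
    done, not_done = wait([fast, slow], return_when=FIRST_COMPLETED)
    first_done = fast in done
    slow_pending = slow in not_done

    # The default, ALL_COMPLETED, waits for every future.
    done_all, not_done_all = wait([fast, slow])

print(first_done, slow_pending, len(done_all), len(not_done_all))
```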

`concurrent.futures.as_completed(fs, timeout=None)`

Returns an iterator over the Future instances (possibly created by
different Executor instances) given by *fs* that yields futures as
they complete (finished or were cancelled). Any futures given
by *fs* that are duplicated will be returned once. Any futures that
completed before `as_completed()` is called will be yielded first. The returned iterator raises a `concurrent.futures.TimeoutError` if `__next__()` is called and the
result isn’t available after *timeout* seconds from the original call
to `as_completed()`.

*timeout* can be an int or float. If *timeout* is not specified or None, there is no limit to the wait time.

# Async Programming

- User Mode Threading (event loop)
- Single OS thread
- I/O concurrency
- Non-blocking Sockets
- Callbacks, Futures, Coroutines
- epoll

Note:

As noted earlier, threading can be useful for I/O-bound work; however, race conditions can occur when the OS switches between threads. The OS is not aware of the inner workings of your code and thus cannot intelligently choose a "good time" to switch to a waiting thread. Asynchronous programming addresses this by running all the code in a single thread, inside an event loop that effectively implements user-mode threads within your program: a task gives up control only at points it chooses explicitly.

## Coroutines

Asynchronous programming can make use of *callbacks* and *futures*; however, coroutines are much easier to understand and reason about.

- In Python 2.5, a slight modification to the `yield` statement was introduced (PEP 342): `yield` can be used as an expression.

In [357]:
def grep(pattern):
    print("Looking for %s" % pattern)
    while True:
        line = (yield)
        if pattern in line:
            print(line)

- Where does `yield` get its value? From the caller, via `send()`.
- The resulting object is now a coroutine.

### Coroutine as consumer

In [358]:
o = grep('hello')
next(o)

Looking for hello


In [359]:
o.send('hi')

In [360]:
o.send('bye')

In [361]:
o.send('hello')

hello


In [362]:
o.send('Well hello there!')

Well hello there!


### Coroutine priming
- You can only `send()` after the first `next()`
- The coroutine needs to be advanced to the first `yield`
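A small sketch of why priming is needed, using a hypothetical `echo` coroutine. Sending a value before the first `next()` raises `TypeError`, because the generator has not yet reached a `yield` that could receive it:

```python
def echo():
    """Hypothetical coroutine that collects whatever is sent to it."""
    received = []
    while True:
        item = (yield received)
        received.append(item)

co = echo()
try:
    co.send('too early')    # not primed yet
except TypeError as exc:
    error = str(exc)
    print(error)

next(co)                    # advance to the first yield
print(co.send('hello'))     # the coroutine can now receive values
```

The failed `send()` does not start or break the generator, so it can still be primed and used afterwards.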

### Coroutine decorator
- Performs the first `next()` call automatically.
- Identifies as a coroutine

In [363]:
def coroutine(func):
    def start(*args,**kwargs):
        cr = func(*args,**kwargs)
        next(cr)
        return cr
    return start

In [364]:
@coroutine
def grep(pattern):
    print("Looking for %s" % pattern)
    while True:
        line = (yield)
        if pattern in line:
            print(line)

In [365]:
o = grep('hello')
o.send('hello again')

Looking for hello
hello again


### Closing a Coroutine
- A coroutine might run indefinitely
- Calling `o.close()` will shut it down.

### Catching close

In [366]:
@coroutine
def grep(pattern):
    print("Looking for %s" % pattern)
    try:
        while True:
            line = (yield)
            if pattern in line:
                print(line)
    except GeneratorExit:
        print("End of grep!")

In [367]:
o = grep('hello')
o.send('hello')
o.send('another hello')
o.close()

Looking for hello
hello
another hello
End of grep!


## Asyncio

Python 3.5 introduces the `async` and `await` keywords (functions using these keywords are also called “native coroutines”), which replace the older generator-based coroutines.

- `async` annotates a function as a coroutine.  That is: `async def my_coroutine()`.
- `await` replaces `yield from`, but is less flexible than yield.


- asyncio was introduced in Python 3.4
- trollius is a Python 2.7 backport (no longer maintained!)

### trollius example

```python
import trollius as asyncio
from trollius import From
 
@asyncio.coroutine
def my_coroutine(seconds_to_sleep=3):
    print('my_coroutine sleeping for: %d seconds' % seconds_to_sleep)
    yield From(asyncio.sleep(seconds_to_sleep))
    #py3: yield from asyncio.sleep(seconds_to_sleep) 

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(my_coroutine())
)
loop.close()
```

### asyncio generator example

```python
import asyncio
 
# Note: @asyncio.coroutine was deprecated in Python 3.8 and removed in 3.11.
@asyncio.coroutine
def my_coroutine(seconds_to_sleep=3):
    print('my_coroutine sleeping for: %d seconds' % seconds_to_sleep)
    yield from asyncio.sleep(seconds_to_sleep) 


loop = asyncio.get_event_loop()
loop.run_until_complete(
   asyncio.gather(my_coroutine())
)
loop.close()
```
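For comparison, the same example can be sketched with native coroutines. This assumes Python 3.7 or later for `asyncio.run()`; the shorter sleep times and the `main` wrapper are choices made here for illustration:

```python
import asyncio

async def my_coroutine(seconds_to_sleep=3):
    print('my_coroutine sleeping for: %s seconds' % seconds_to_sleep)
    await asyncio.sleep(seconds_to_sleep)
    return seconds_to_sleep

async def main():
    # gather() runs both coroutines concurrently on the single event loop
    # and returns their results in argument order.
    return await asyncio.gather(my_coroutine(0.2), my_coroutine(0.1))

results = asyncio.run(main())
print(results)
```

`asyncio.run()` creates the event loop, runs `main()` to completion and closes the loop, replacing the explicit `get_event_loop()` / `run_until_complete()` / `close()` sequence.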

## Future and Callbacks

```python
#can't run in Notebook
import trollius as asyncio
from trollius import From

@asyncio.coroutine
def my_coroutine(future, task_name, seconds_to_sleep=3):
    print('my_coroutine sleeping for: {0} seconds'.format(seconds_to_sleep))
    yield From(asyncio.sleep(seconds_to_sleep))
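    # NOTE: this line raises ZeroDivisionError, so set_result() below is never
    # reached and the done callbacks never fire; remove it to see them run.
    1/0
    future.set_result('{0} is finished'.format(task_name))

def got_result(future):
    print(future.result())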
```

```python
loop = asyncio.get_event_loop()

future1 = asyncio.Future()
future2 = asyncio.Future()

tasks = [
            my_coroutine(future1, 'task1', 3),
            my_coroutine(future2, 'task2', 1)]

future1.add_done_callback(got_result)
future2.add_done_callback(got_result)


loop.run_until_complete(
    asyncio.wait(tasks)
)
loop.close()
```
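A Python 3 sketch of the same idea with the standard `asyncio` module, without the deliberate error. It assumes Python 3.7 or later; the `messages` list and the `main` wrapper are additions for illustration:

```python
import asyncio

messages = []

async def my_coroutine(future, task_name, seconds_to_sleep):
    await asyncio.sleep(seconds_to_sleep)
    future.set_result('{0} is finished'.format(task_name))

def got_result(future):
    # Called by the event loop once the future has a result.
    messages.append(future.result())

async def main():
    loop = asyncio.get_running_loop()
    future1 = loop.create_future()
    future2 = loop.create_future()
    future1.add_done_callback(got_result)
    future2.add_done_callback(got_result)
    await asyncio.gather(
        my_coroutine(future1, 'task1', 0.2),
        my_coroutine(future2, 'task2', 0.1),
    )
    # Done callbacks are scheduled on the loop; yield once so they can run.
    await asyncio.sleep(0)

asyncio.run(main())
print(messages)
```

The callbacks fire in completion order, so `task2` (the shorter sleep) is reported before `task1`.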


# Tornado 

Tornado can be roughly divided into four major components:

- A web framework (including `RequestHandler`, which is subclassed to create web applications, and various supporting classes).
- Client- and server-side implementations of HTTP (`HTTPServer` and `AsyncHTTPClient`).
- An asynchronous networking library including the classes `IOLoop` and `IOStream`, which serve as the building blocks for the HTTP components and can also be used to implement other protocols.
- A coroutine library (`tornado.gen`) which allows asynchronous code to be written in a more straightforward way than chaining callbacks. This is similar to the native coroutine feature introduced in Python 3.5 (`async def`). Native coroutines are recommended in place of the `tornado.gen` module when available.

The Tornado web framework and HTTP server together offer a full-stack alternative to WSGI. While it is possible to use the Tornado HTTP server as a container for other WSGI frameworks (WSGIContainer), this combination has limitations and to take full advantage of Tornado you will need to use Tornado’s web framework and HTTP server together.

## Tornado Web Framework

### REST
Separate actions for different HTTP Methods:

- GET
- PUT
- POST
- PATCH
- DELETE

![Sloth](images/sloth.jpeg)



REST stands for **REpresentational State Transfer**. REST is resource based, 
which means that it is organised around things rather than actions. An example of a 
resource could be a contact, article or product. Each resource is identified by 
a URI. Multiple URIs may refer to the same resource.

REST applies actions to a resource through HTTP methods. The following methods 
are used: GET, POST, PUT, PATCH, DELETE.

A RESTful API for a contacts database may look something like 
the following:

Method|URI|Description
---|---|---
GET|contacts|List all contacts
GET|contacts/create|Form - create contact
POST|contacts|Add new contact
GET|contacts/2|Show contact 2
GET|contacts/2/edit|Form - edit contact 2
PUT|contacts/2|Update contact 2
DELETE|contacts/2|Delete contact 2
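To make the mapping from method and URI to action concrete, here is a framework-independent sketch covering a subset of the rows above. The in-memory `contacts` store, the `ROUTES` list and the `dispatch` function are all hypothetical and are not part of Tornado:

```python
import re

contacts = {2: {"name": "Ada"}}     # hypothetical in-memory resource store

def list_contacts():
    return list(contacts.values())

def show_contact(contact_id):
    return contacts[int(contact_id)]

def update_contact(contact_id, data):
    contacts[int(contact_id)].update(data)
    return contacts[int(contact_id)]

def delete_contact(contact_id):
    return contacts.pop(int(contact_id))

# (HTTP method, URI pattern, action), mirroring rows of the table above.
ROUTES = [
    ("GET", r"contacts", list_contacts),
    ("GET", r"contacts/(\d+)", show_contact),
    ("PUT", r"contacts/(\d+)", update_contact),
    ("DELETE", r"contacts/(\d+)", delete_contact),
]

def dispatch(method, uri, *payload):
    """Call the first action whose method and URI pattern both match."""
    for route_method, pattern, action in ROUTES:
        match = re.fullmatch(pattern, uri)
        if method == route_method and match:
            return action(*match.groups(), *payload)
    raise LookupError("no route for %s %s" % (method, uri))

print(dispatch("GET", "contacts/2"))
print(dispatch("PUT", "contacts/2", {"name": "Ada Lovelace"}))
print(dispatch("DELETE", "contacts/2"))
print(dispatch("GET", "contacts"))
```

The same URI (`contacts/2`) triggers different actions depending only on the HTTP method, which is the core of the resource-based style.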

### Hello World

```python
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
 ```

Note:

The app is structured as a list of routes, each paired with a handler;
in the above case we have only one route: `(r"/", MainHandler)`.
Once the main loop is started, it listens for requests on those routes.
When a request arrives, the handler method matching the request's HTTP method
(here `get`) is called on the class specified as the handler.

## Asynchronous routines

`tornado.gen` is a generator-based interface to make it easier to work in an asynchronous environment. Code using the gen module is technically asynchronous, but it is written as a single generator instead of a collection of separate functions. Since Python 3.5 and Tornado 4.3 it is possible to use native coroutines with Tornado. In Tornado 5.0 and later, the Tornado event loop is a wrapper around the asyncio event loop.

### Example: the following asynchronous handler

```python
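# Callback style. tornado.web.asynchronous and the callback argument of
# fetch() were removed in Tornado 6.0, so this only runs on older versions.
from tornado.httpclient import AsyncHTTPClient
from tornado.web import RequestHandler, asynchronous

class AsyncHandler(RequestHandler):
    @asynchronous
    def get(self):
        http_client = AsyncHTTPClient()
        http_client.fetch("http://example.com",
                          callback=self.on_fetch)

    def on_fetch(self, response):
        # do_something_with_response is a placeholder for application code
        do_something_with_response(response)
        self.render("template.html")
```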

**could be written with gen as:**

```python
class GenAsyncHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        http_client = AsyncHTTPClient()
        response = yield http_client.fetch("http://example.com")
        do_something_with_response(response)
        self.render("template.html")
```
**or, with native coroutines (Python 3.5+), as:**

```python
class GenAsyncHandler(RequestHandler):
    async def get(self):
        http_client = AsyncHTTPClient()
        response = await http_client.fetch("http://example.com")
        do_something_with_response(response)
        self.render("template.html")
```

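Most asynchronous functions in Tornado return a Future; yielding (or awaiting) this object returns its result.

You can also wait on a list or dict of Futures, which will be started at the same time and run in parallel; a list or dict of results will be returned when they are all finished. In a `@gen.coroutine` you can `yield` the list or dict directly; a native coroutine cannot `await` a plain list or dict, so wrap it in `tornado.gen.multi`:

```python
from tornado import gen
from tornado.httpclient import AsyncHTTPClient

async def get(self):
    http_client = AsyncHTTPClient()
    # url1 .. url4 are placeholders for the URLs to fetch
    response1, response2 = await gen.multi([http_client.fetch(url1),
                                            http_client.fetch(url2)])
    response_dict = await gen.multi(dict(response3=http_client.fetch(url3),
                                         response4=http_client.fetch(url4)))
    response3 = response_dict['response3']
    response4 = response_dict['response4']
```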
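In plain asyncio the same fan-out pattern can be sketched with `asyncio.gather`. The helper `fake_fetch` is a hypothetical stand-in for an HTTP fetch that does no network I/O, and `fetch_all` is an illustrative name, not a Tornado or asyncio API:

```python
import asyncio

async def fake_fetch(url, delay):
    """Hypothetical stand-in for an HTTP fetch; performs no network I/O."""
    await asyncio.sleep(delay)
    return "response from %s" % url

async def fetch_all(requests):
    """Await a dict of coroutines concurrently; return a dict of results."""
    keys = list(requests)
    results = await asyncio.gather(*(requests[key] for key in keys))
    return dict(zip(keys, results))

async def main():
    return await fetch_all({
        "response3": fake_fetch("url3", 0.2),
        "response4": fake_fetch("url4", 0.1),
    })

response_dict = asyncio.run(main())
print(response_dict["response3"])
print(response_dict["response4"])
```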

---

# Interacting with C
Modules:
- `instant`: inline, compile and execute C code from Python
- `ctypes`: access shared libraries
- `numba`: compile Python code to native machine code

## The instant module
- Instant allows inlining of C and C++ functions in Python codes

### The hello world example:

In [None]:
from instant import inline
source = """
double hw1(double r1, double r2)
{
return sin(r1 + r2);
}
"""
hw1 = inline(source)

x = 1.0
y = 2.5
print("sin({0}+{1}) = {2}".format(x,y,hw1(x,y)))

Note: **How it works**
- C/C++ code is automatically wrapped and compiled at run-time
- Resulting object files are stored, only recompiled when source code is changed
- Simple to use, but only works for smaller codes


## The ctypes module
- ctypes gives access to C datatypes from Python
- C libraries can be imported and called from Python
- All work is done in Python: no need for writing wrapper code in C
- Less elegant interface than a regular Python extension module

### ctypes usage

- Primitive C data types are interfaced:
- `c_int`, `c_bool`, `c_double`, `c_char` ...

- Libraries may be loaded by instantiating ctypes.CDLL:

```python
import ctypes

clib = ctypes.CDLL('./clib.so')  # path to the compiled shared library
```

### ctypes usage

- Arguments and return type for library functions must be converted to the correct C type:

```python
from ctypes import c_double

# arg1 and arg2 are ordinary Python floats
c_arg1 = c_double(arg1)
c_arg2 = c_double(arg2)
clib.function1.restype = c_double  # declare the C return type
```

After the conversion, calling library functions is intuitive:
```python
result = clib.function1(c_arg1,c_arg2)
```

### Hello world with ctypes (1)

- Unmodified C code, compiled to a shared library;

```C
#include <math.h>   /* sin */
#include <stdio.h>  /* printf */

double hw1(double r1, double r2)
{
    double s;
    s = sin(r1 + r2);
    return s;
}

void hw2(double r1, double r2)
{
    double s;
    s = sin(r1 + r2);
    printf("Hello, World! sin(%g+%g)=%g\n", r1, r2, s);
}

/* special version of hw1 where the result is an argument: */
void hw3(double r1, double r2, double *s)
{
    *s = sin(r1 + r2);
}
```

### Hello world with ctypes (2)

- C library is loaded into Python and accessed directly;

```python
#!/usr/bin/env python
from ctypes import *
hw_lib = CDLL('./hw.so')  # load shared library

hw_lib.hw1.restype = c_double  # specify return type
s = hw_lib.hw1(c_double(1), c_double(2.14159))
print(s, type(s))

# automatic conversion of arguments from Python to ctypes:
hw_lib.hw1.argtypes = [c_double, c_double]
s = hw_lib.hw1(1, 2.14159)
print(s, type(s))

hw_lib.hw2.argtypes = [c_double, c_double]
hw_lib.hw2.restype = None  # returns void
hw_lib.hw2(1, 2.14159)

s = c_double()
hw_lib.hw3(c_double(1), c_double(2.14159), byref(s))
print(s.value)
```
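The same steps can be tried without compiling `hw.so` by loading the system C math library instead. This is only a sketch and assumes a POSIX system where `ctypes.util.find_library("m")` (or the process's own symbol table) exposes `sin` and `modf`:

```python
import ctypes
import ctypes.util
import math

# Assumption: POSIX. Fall back to the symbols already loaded in the process.
libm_name = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_name) if libm_name else ctypes.CDLL(None)

# double sin(double): declare argument and return types once.
libm.sin.argtypes = [ctypes.c_double]
libm.sin.restype = ctypes.c_double
result = libm.sin(1.0 + 2.14159)
print(result, math.sin(1.0 + 2.14159))

# double modf(double x, double *iptr): the integer part comes back through
# a pointer argument, like hw3 above.
libm.modf.argtypes = [ctypes.c_double, ctypes.POINTER(ctypes.c_double)]
libm.modf.restype = ctypes.c_double
whole = ctypes.c_double()
frac = libm.modf(2.75, ctypes.byref(whole))
print(frac, whole.value)
```

Setting `argtypes` lets ctypes convert plain Python floats automatically, and `byref()` passes the `c_double` by pointer so the C function can write into it.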

### Ctypes bottom line

- Works well for interfacing a few C functions
- Less convenient for a large number of function calls
- Can be wrapped in a Python module, but with a performance loss