# Growth Intel - Python Programming

-----

**Alex Mitchell @data_alex**

**Prash Majmudar @prashmaj**

## Python
### Obligatory history

* Python is over 20 years old
* Creator Guido van Rossum has written up the history of Python / language design:
http://python-history.blogspot.co.uk/2009/01/introduction-and-overview.html
* Python is a multi-purpose language: it is dynamically typed, strongly typed and interpreted
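
To make "dynamically typed" and "strongly typed" concrete, here is a minimal sketch. The variable names are illustrative only, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# Dynamic typing: a name can be rebound to objects of different types at runtime.
value = 42
type_before = type(value).__name__
value = "forty-two"
type_after = type(value).__name__

# Strong typing: Python will not silently coerce unrelated types.
try:
    result = "1" + 1
except TypeError:
    # Mixing str and int raises TypeError rather than guessing what was meant
    result = None

print("%s -> %s, result=%s" % (type_before, type_after, result))
```

The type belongs to the object, not to the name, which is why rebinding `value` is allowed while `"1" + 1` is not.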

### This talk

* Focus is on Python 2.7 (Python 3 features might be mentioned)
* When we talk about Python, we're talking about CPython (Python written in C), not Jython, IronPython, PyPy etc.

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Why choose Python?

### Readable / Easy to get started 

* Readability is core to the language design
 * Implies maintainable, re-usable code
 
### Productivity
* Interpreted --> Faster to iterate (no compile + link phase) e.g. with REPL
* Large standard library
* Core language has higher order functions (hides implementation detail) - generally means less code required to get stuff done (less to maintain / debug etc.)
* Large community support

### It's fun

```import antigravity```

## Why Not Python?

* Dynamic typing, lack of control over memory management: discipline is required
* If you need to write fast CPU intensive code (though extensions such as NumPy exist) - you may want to use C  / C++
* ...

## Use cases

- Web Development (Django, Flask, Tornado, ...)
- Machine learning (scikit-learn, PyBrain, ...)
- Data analysis (Pandas, IPython NB)
- Data pipelines (Luigi, PySpark, mrjob, disco)
- Natural Language Processing (NLTK, gensim)
- Machine Vision
- Scripting / sysadmin (large standard lib for sysadmin tasks)
- Networking (Twisted)
- GUIs (Tkinter)
- Games (pygame)
- Animation

Python bindings exist for most external services you might use (SQL / NoSQL databases, AWS)



##  Getting started

### Pick a development environment

Some choices:

* PyCharm - IDE

* IPython + Editor (Vim, Emacs, Sublime).
  NB: Vim / Emacs - harder to learn (needs setup, plus key bindings). Productive once set up (but it takes years to perfect the setup!). Available on the server and locally - same toolset across environments.


### Source control

* Git hosted on GitHub is a good choice


### Package managers and dependency isolation

* Pip + virtualenv
 * `pip freeze`
 * `pip install -r requirements.txt`
* Anaconda

### Other tools

* Pylint

In [6]:
# Let's just do something
# Python borrows heavily from C, i.e. if / else, while, for etc.
print "Spam"
for x in range(10):
    print x
    


Spam
0
1
2
3
4
5
6
7
8
9


### What's happening here?

* Python is compiling the source files into bytecode (note this is not machine code)
* The bytecode of imported modules is cached in .pyc files - recompilation only occurs if the source file's timestamp differs
* The bytecode is run by the Python interpreter (the Python Virtual Machine, PVM)
* ASIDE: The CPython interpreter uses a Global Interpreter Lock (GIL), i.e. only one thread executes Python bytecode at a time. This has consequences for multi-threading in Python (more later)

<img src="python_compile.png">
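
The compile step can be inspected from Python itself. The sketch below uses the standard `dis` module on a made-up function (`greet` is purely illustrative); the exact instructions printed differ between Python versions, so no output is shown.

```python
import dis

def greet(name):
    return "Hello, " + name

# The function's code object holds the compiled bytecode as a byte string
bytecode = greet.__code__.co_code
print(type(bytecode).__name__)
print(len(bytecode))

# dis.dis prints a human-readable listing of the bytecode instructions
dis.dis(greet)
```

This is bytecode for the Python virtual machine, not machine code for the CPU.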

## Core types  / structures

* Ints
* Strings
* Lists
* Dicts
* Tuples
* Sets
* Bools
* None

A few other points to note:

* Variables in Python are basically aliases / names for objects
* Nearly everything is an object, including functions - these are first-class citizens. This means we can assign functions to variables, pass them into functions / methods, and operate on them
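
A short sketch of both points. All names here (`shout`, `apply_twice`, `handlers`) are made up for illustration.

```python
# Names are references to objects: both names point at the same list
a = [1, 2, 3]
b = a
b.append(4)
same_object = a is b   # a sees the appended item too

# Functions are objects: assign them to names, pass them as arguments,
# store them in data structures
def shout(text):
    return text.upper() + "!"

def apply_twice(func, value):
    return func(func(value))

yell = shout
handlers = {"shout": shout}

print(a)
print(yell("spam"))
print(apply_twice(shout, "spam"))
print(handlers["shout"]("eggs"))
```

Assignment never copies the object; use `list(a)` or `a[:]` if an independent copy of a list is needed.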

## Strings - Python 2

### Unicode and str

* Strings have two representations in Python 2:
 * Byte strings - `str`
 * Unicode - `unicode`
 
* Byte strings 
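
A minimal sketch of the difference between text and bytes. It is written so that it should run on both Python 2.7 (where the two types are `unicode` and `str`) and Python 3 (where they are `str` and `bytes`); the sample word is arbitrary.

```python
text = u"caf\xe9"              # unicode text: 4 characters
data = text.encode("utf-8")    # byte string: the accented letter takes 2 bytes in UTF-8

print(len(text))
print(len(data))

# Decode bytes back to text at the boundaries of the program (files, sockets, ...)
round_tripped = data.decode("utf-8")
print(round_tripped == text)
```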

### Raw strings

In [36]:
# In raw strings a backslash is kept literally instead of starting an escape sequence
# Use them for regular expressions
import re
regex_str = r'\bHello\b'
hello_regex = re.compile(regex_str)
matches = hello_regex.search('Good day and Hello    to you all')
print matches.group()

# Otherwise I need to double every backslash (less readable for complex regular expressions)
regex_str = '\\bHello\\b'


Hello


In [33]:
def concat_address(addresses):
    """Join the parts of each address into one comma-separated string."""
    return [", ".join(addr) for addr in addresses]


print concat_address([["Level 42", "1 Canada Sq", "Canary Wharf"]])

['Level 42, 1 Canada Sq, Canary Wharf']


Python variables explained

Insert links for explaining this stuff

### Lists

In [7]:

names =  ['alex', 'bob', 'fred', 'alice']
print "Original list: ", names

last = names.pop()
print "Last list element: ", last

# Lists are mutable
names.append('james')
print "List is now: ", names

#names.count()




Original list:  ['alex', 'bob', 'fred', 'alice']
Last list element:  alice
List is now:  ['alex', 'bob', 'fred', 'james']


In [8]:
# Slicing
print "start of array up to (not including) index of 2", names[:2]
print "Last element: ", names[-1]
print "First element: ", names[0]
print "Start at index 2, up to (not inc) index 4 ", names[2:4]

start of array up to (not including) index of 2 ['alex', 'bob']
Last element:  james
First element:  alex
Start at index 2, up to (not inc) index 4  ['fred', 'james']


### Dictionaries

In [9]:
# Key-value collection
url_visits = {"test.com":1256,  "growthintel.com":5000, "bbc.co.uk":5e6}
print url_visits
print "Gi has: ", url_visits["growthintel.com"]


{'bbc.co.uk': 5000000.0, 'test.com': 1256, 'growthintel.com': 5000}
Gi has:  5000


In [12]:
# Key does not exist
try:
    url_visits["bob.com"]
except KeyError as e:
    print "Got a Key Error:", e

Got a Key Error: 'bob.com'


In [20]:
print "Value for bob.com is: ", url_visits.get("bob.com", None)

# OR check key exists first

if "bob.com" in url_visits:
    print "Key exists!"
else:
    print "Key does not exist!"

Value for bob.com is:  None
Key does not exist!


### Defaultdicts

* Extends dictionaries with a default factory that supplies the value whenever a missing key is accessed

In [21]:
from collections import defaultdict
dict_of_lists = defaultdict(list)
dict_of_lists['a'].append('bob')
print dict_of_lists

defaultdict(<type 'list'>, {'a': ['bob']})


* dictionary comprehensions
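
A small sketch of dictionary comprehensions (available from Python 2.7). The data mirrors the earlier examples, but the variable names are new so that nothing defined above is overwritten.

```python
visits_by_url = {"test.com": 1256, "growthintel.com": 5000, "bbc.co.uk": 5e6}

# Filter an existing dict: keep only sites with at least 5000 visits
popular = {url: visits for url, visits in visits_by_url.items() if visits >= 5000}

# Build a dict from a sequence: map each name to its length
people = ['alex', 'bob', 'fred', 'alice']
name_lengths = {name: len(name) for name in people}

print(sorted(popular))
print(sorted(name_lengths.items()))
```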

In [None]:
Problem:
Create dict to store the following info
    
* Brian Cohen
* 
* 

### Sets

### Tuples

In [None]:
# The comma (not the parentheses) is what makes a tuple
a = 1,

# Single-item tuples MUST have the trailing comma
b = (1,) # (1) is just the integer 1

c = 1,2,3,4,"hello"

print a
print b
print c

* Tuples are **immutable**.
 * Can be used as keys in dictionaries (as long as every element is hashable)
 * Data containers to be passed to functions / methods - data cannot be modified (also see: `namedtuples`)
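
The three points above can be sketched as follows. The coordinates and the `Address` type are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

point = (51.5, -0.02)

# Immutable: item assignment raises TypeError
try:
    point[0] = 0.0
    mutated = True
except TypeError:
    mutated = False

# Hashable, so usable as a dictionary key
office_by_location = {point: "Canary Wharf"}

# namedtuple: a tuple whose fields can also be read by name
Address = namedtuple("Address", ["floor", "street", "area"])
addr = Address("Level 42", "1 Canada Sq", "Canary Wharf")

print(mutated)
print(office_by_location[(51.5, -0.02)])
print(addr.street)
print(addr[2])
```

A `namedtuple` keeps the immutability of a tuple while making call sites more readable than positional indexing.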


In [None]:
# Generators

In [None]:
## Functional features

Map, Reduce, Filter -> 
Lambda functions

It is usually preferable to use list / dict comprehensions instead
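
A minimal side-by-side sketch of the two styles. `reduce` is a builtin on Python 2, but importing it from `functools` works on both 2.7 and 3; the `list()` wrapper is only needed on Python 3, where `map` returns an iterator.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# map / filter with lambda functions
squares_of_evens = list(map(lambda n: n * n,
                            filter(lambda n: n % 2 == 0, numbers)))

# The same result as a list comprehension
squares_of_evens_lc = [n * n for n in numbers if n % 2 == 0]

# reduce folds a sequence into a single value
total = reduce(lambda acc, n: acc + n, numbers, 0)

print(squares_of_evens)
print(squares_of_evens_lc)
print(total)
```

The comprehension reads left to right and needs no lambdas, which is the usual argument for preferring it.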






In [None]:
Decorators

## Organising your code

Python code is organised into files (modules)
Collections of modules are packages

Python uses the file

### Packages



### Modules

Modules are imported once. The module's code is executed on first import and a namespace for that module is created.
Note: Modules are a sensible way to have an object created once only (Singleton)
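
A quick way to see the "imported once" behaviour, sketched with the standard `json` module (any module would do):

```python
import sys
import json
import json as json_again   # second import: a cache lookup, the module is not re-executed

# Imported modules are cached in sys.modules, keyed by module name
cached = "json" in sys.modules
same_module = json_again is json

print(cached)
print(same_module)
print(sys.modules["json"] is json)
```

Because every importer receives the same module object, state held at module level is shared across the whole program, which is what makes a module behave like a singleton.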

### Classes

In [None]:
FEEDBACK:
    
    Interested in other libraries e.g. PySpark
    Wants info on structuring an app

In [2]:
import functools
dir(functools)

['WRAPPER_ASSIGNMENTS',
 'WRAPPER_UPDATES',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 'cmp_to_key',
 'partial',
 'reduce',
 'total_ordering',
 'update_wrapper',
 'wraps']

In [5]:
import collections
dir(collections)

['Callable',
 'Container',
 'Counter',
 'Hashable',
 'ItemsView',
 'Iterable',
 'Iterator',
 'KeysView',
 'Mapping',
 'MappingView',
 'MutableMapping',
 'MutableSequence',
 'MutableSet',
 'OrderedDict',
 'Sequence',
 'Set',
 'Sized',
 'ValuesView',
 '__all__',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '_abcoll',
 '_chain',
 '_class_template',
 '_eq',
 '_field_template',
 '_get_ident',
 '_heapq',
 '_imap',
 '_iskeyword',
 '_itemgetter',
 '_repeat',
 '_repr_template',
 '_starmap',
 '_sys',
 'defaultdict',
 'deque',
 'namedtuple']

In [7]:
import itertools
dir(itertools)

['__doc__',
 '__file__',
 '__name__',
 '__package__',
 'chain',
 'combinations',
 'combinations_with_replacement',
 'compress',
 'count',
 'cycle',
 'dropwhile',
 'groupby',
 'ifilter',
 'ifilterfalse',
 'imap',
 'islice',
 'izip',
 'izip_longest',
 'permutations',
 'product',
 'repeat',
 'starmap',
 'takewhile',
 'tee']

## Sequences

* Sequence slicing
* Iterating on sequences, iterating on sequences in parallel (zip), generators, itertools
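
As a minimal sketch of the generator and `itertools` points, the hypothetical `countdown` function below produces its values lazily, one at a time. It should run on both Python 2.7 and Python 3.

```python
import itertools

def countdown(start):
    """Yield start, start - 1, ..., 1 one value at a time."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
first = next(gen)    # the builtin next() works on Python 2.6+ and 3
rest = list(gen)     # consumes whatever is left

# itertools.islice takes a slice of any iterator, even an infinite one
first_evens = list(itertools.islice(itertools.count(0, 2), 5))

print(first)
print(rest)
print(first_evens)
```

Nothing is computed until a value is requested, so generators suit long or unbounded sequences.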


### Iterating over multiple sequences in parallel

In [37]:
#  ZIP

a = [1,2,3,4]
b  = ['a', 'b', 'c', 'd']

# Use zip to create a new list of tuples to pass to the dict constructor
dict(zip(a,b))

{1: 'a', 2: 'b', 3: 'c', 4: 'd'}

In [19]:
# For long lists consider using izip, which returns an iterator rather than building a new list
import itertools

print zip(a,b)

print itertools.izip(a,b)

zip_gen = itertools.izip(a,b)
zip_gen.next()

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
<itertools.izip object at 0x7f152062c5f0>


(1, 'a')