Data Science In Python
=======================
Python Language
---------------
**Cambridge, MA, January 9th 2017**
<br>

<center>
<img src=http://ijstokes-public.s3.amazonaws.com/dspyr/img/AnacondaCIO_Logo width=400 />
</center>

**Copyrighted Continuum Analytics**


<br>
GitHub: https://github.com/Harvard-IACS/computefest2017-pythonml

Anaconda Cloud: https://anaconda.org/ijstokes/python-cf17-02-language/notebook
<br>

Taught by:

* Ian Stokes-Rees [ijstokes@continuum.io](mailto:ijstokes@continuum.io)
    * Twitter: [@ijstokes](http://twitter.com/ijstokes)
    * About.Me: [http://about.me/ijstokes](http://about.me/ijstokes)
    * LinkedIn: [http://linkedin.com/in/ijstokes](http://linkedin.com/in/ijstokes)

Setup
===
* Download Anaconda 4.2 for Python 2.7
* Start *Anaconda Navigator* from your desktop
    * green ouroboros ring -- yes, that's what it is
* Make sure it says *Python 2.7* in the top row
    * if not, click on *Environment* then *New Environment* and create one called `ana42py27` using Python 2.7
    * then search for "Anaconda" and install that meta-package
    * **but** note that this is a 300+ MB download, which will overwhelm the event's wireless access points
* Launch Notebook (may require "upgrade" or "install" first)
* Alternative for command line geeks:

```bash
conda create -n ana42py27 anaconda python=2.7
source activate ana42py27
mkdir pythonds
cd pythonds
jupyter notebook
```

In [None]:
# helper function to clear namespace:
def _clear():
    for thing in list(globals().keys()):  # copy the keys: the dict changes size as we delete
        if not thing.startswith("_"):
            del globals()[thing]

In [None]:
# helper function to list non-underscore, non-Jupyter attributes
def mydir():
    return [a for a in sorted(globals())
            if not (a.startswith('_') or a in 'In Out exit quit get_ipython mydir'.split())]

Reference
======
* [Python Standard Library modules](http://www.doughellmann.com/PyMOTW/py-modindex.html)
* [`builtin` functions and classes](http://docs.python.org/library/functions.html)
* [Idiomatic Python](http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html)
* Python Style Guides:
    * [PEP 8](http://legacy.python.org/dev/peps/pep-0008/)
    * [Google Python Style Guide](http://google-styleguide.googlecode.com/svn/trunk/pyguide.html)
* Magic methods:
    * [official specs](http://docs.python.org/reference/datamodel.html#special-method-names)
    * [great summary](http://www.rafekettler.com/magicmethods.html)
* [Scipy Lectures](http://www.scipy-lectures.org/)
* [Numpy Tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
    * [Numpy for Matlab Users](https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html)
* [Matplotlib Plotting Tutorial](http://matplotlib.org/users/pyplot_tutorial.html)
    * [gallery](http://matplotlib.org/gallery.html)

Numbers, Strings, and Math
==============

In [1]:
# basic math
3 + 7

10

In [3]:
# scientific notation
2.4 * 6.78E3

16272.0

In [4]:
7E6

7000000.0

In [5]:
4E-3

0.004

In [6]:
# complex numbers
(3+2j) * (3-2j)

(13+0j)

In [7]:
val = (3+2j) * (3-2j)

In [8]:
val

(13+0j)

In [9]:
type(val)

complex

In [10]:
# exponentiation
2**10

1024

In [11]:
2**20

1048576

In [12]:
# strings
"This is a short string"

'This is a short string'

In [13]:
'This string uses single quotes'

'This string uses single quotes'

In [15]:
# long strings
'''
This is a multi-line string
which can be delimited by 
either triple single quotes (')
or triple double quotes (")
'''

'\nThis is a multi-line string\nwhich can be delimited by \neither triple single quotes (\')\nor triple double quotes (")\n'

What do you think will happen if we try adding a string and an integer?
 
<center><img src="http://templeofmut.files.wordpress.com/2012/03/transmogrifier_2.gif">
</center>

In [17]:
# experiment!
4 + '6'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [18]:
# but what about adding strings?
'Ian' + 'Stokes-Rees'

'IanStokes-Rees'

In [19]:
'Ian'.__add__('Stokes-Rees')

'IanStokes-Rees'

In [20]:
# or multiplying strings?
'Ian' * 3

'IanIanIan'

References (aka *variables*)
===============

In [21]:
# expressions create objects which last as long as there is a reference to them
a = 42

In [22]:
a

42

In [23]:
a / 2

21

In [24]:
a + 10

52

In [25]:
b = a

In [26]:
b

42

In [27]:
type(a)

int

In [28]:
type(b)

int

In [29]:
id(a)

4300221480

In [31]:
id(b)

4300221480

In [32]:
c = 555

In [33]:
d = c

In [34]:
e = 555

In [36]:
id(c)

4349324168

In [37]:
id(d)

4349324168

In [38]:
id(e)

4349324096

In [39]:
c == d

True

In [41]:
c == e

True

In [43]:
c is d

True

In [44]:
c is e

False

In [45]:
x = 128
y = x
z = 128

In [47]:
x == y

True

In [48]:
x == z

True

In [49]:
x is y

True

In [50]:
x is z

True

In [51]:
id(x)

4300223368

In [52]:
id(y)

4300223368

In [53]:
id(z)

4300223368

In [55]:
x

128

In [56]:
y

128

In [57]:
y = y + 10

In [58]:
y

138

In [60]:
id(y)

4300223128

In [61]:
x

128

* these are typically called "variables", but a better name in Python is "reference"
* a "reference" refers to an object
* objects are Autonomous (no scope, heap allocated) and Anonymous (no name)
* only references have a scope
* references can be re-assigned to a new object at any time
* allowed to have multiple references to the same object
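
The bullets above can be sketched in a few lines. The names `first`, `second`, and `third` are made up for this illustration, and the code is written so that it should run unchanged on Python 2.7 and Python 3:

```python
# Two references to one list object, plus a reference to an equal but separate object.
first = [1, 2, 3]
second = first            # a second reference to the SAME object
third = [1, 2, 3]         # a new object with equal contents

same_object = first is second                           # one object, two references
equal_only = (first == third) and (first is not third)  # equal values, different objects

second.append(4)          # change the object through one reference...
                          # ...and the change is visible through the other ("first")

second = second + [5]     # re-assignment: "second" now refers to a brand new object
```

After the last line `first` still refers to the original (now four-entry) list, while `second` refers to a new five-entry list.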

In [None]:
# all objects have an ID, but don't read too much into this: it is a number that is unique to each object for as long as that object exists

In [62]:
# can delete references, but this doesn't delete the object
a

42

In [63]:
b

42

In [64]:
del a

In [65]:
a

NameError: name 'a' is not defined

In [66]:
b

42

In [None]:
# so how can we delete an object? (HINT: trick question)

Lists
===
* lists preserve order
* have no constraint on duplicate entries
* can be changed, reordered, added to, or removed from
* IMPORTANT: really just contain a collection of "unnamed" references, not objects

In [67]:
nums = [3, 7 , 6, 3, 0, 3, 4, 6, 5]

In [68]:
nums

[3, 7, 6, 3, 0, 3, 4, 6, 5]

In [69]:
# function calls! type names!
type(nums)

list

In [70]:
len(nums)

9

In [71]:
nums.__len__()

9

In [72]:
# referencing FROM ZERO
nums[0]

3

In [73]:
nums.__getitem__(0)

3

In [74]:
nums[-1]

5

In [76]:
colors = 'red green blue yellow white black pink'.split()

In [77]:
colors

['red', 'green', 'blue', 'yellow', 'white', 'black', 'pink']

In [78]:
colors[0]

'red'

In [79]:
# negative indexing ("back from end")

In [80]:
colors[-1]

'pink'

In [81]:
len(colors)

7

In [83]:
colors[len(colors)]

IndexError: list index out of range

In [84]:
colors[7]

IndexError: list index out of range

In [85]:
colors[len(colors) - 1]

'pink'

In [86]:
colors[-1]

'pink'

In [87]:
# selecting a subset of objects from a list (NOTE: last index is not included)

In [88]:
colors

['red', 'green', 'blue', 'yellow', 'white', 'black', 'pink']

In [90]:
colors[2:4]

['blue', 'yellow']

In [92]:
nums[2:6]

[6, 3, 0, 3]

In [93]:
nums

[3, 7, 6, 3, 0, 3, 4, 6, 5]

In [94]:
colors[:3]

['red', 'green', 'blue']

In [95]:
colors[:-1]

['red', 'green', 'blue', 'yellow', 'white', 'black']

In [97]:
colors[-4:-1]

['yellow', 'white', 'black']

In [98]:
colors[3:]

['yellow', 'white', 'black', 'pink']

In [99]:
colors[3:7]

['yellow', 'white', 'black', 'pink']

In [101]:
colors[-4:]

['yellow', 'white', 'black', 'pink']

In [103]:
id(colors)

4406498584

In [104]:
dupe = colors

In [105]:
id(dupe)

4406498584

In [106]:
dupe

['red', 'green', 'blue', 'yellow', 'white', 'black', 'pink']

In [107]:
copy = colors[:] # slice from beginning to end

In [108]:
copy

['red', 'green', 'blue', 'yellow', 'white', 'black', 'pink']

In [109]:
id(colors)

4406498584

In [110]:
id(copy)

4406540768

In [111]:
colors[3]

'yellow'

In [112]:
colors[3] = 'mauve'

In [113]:
colors

['red', 'green', 'blue', 'mauve', 'white', 'black', 'pink']

In [114]:
dupe

['red', 'green', 'blue', 'mauve', 'white', 'black', 'pink']

In [115]:
copy

['red', 'green', 'blue', 'yellow', 'white', 'black', 'pink']

In [116]:
colors[1]

'green'

In [117]:
id(colors[1])

4406779408

In [118]:
id(dupe[1])

4406779408

In [119]:
id(copy[1])

4406779408

In [120]:
copy.pop()

'pink'

In [121]:
copy.pop()

'black'

In [122]:
copy

['red', 'green', 'blue', 'yellow', 'white']

In [123]:
copy.append('purple')

In [124]:
copy

['red', 'green', 'blue', 'yellow', 'white', 'purple']

In [125]:
colors

['red', 'green', 'blue', 'mauve', 'white', 'black', 'pink']

In [126]:
dupe.sort()

In [127]:
dupe

['black', 'blue', 'green', 'mauve', 'pink', 'red', 'white']

In [128]:
colors

['black', 'blue', 'green', 'mauve', 'pink', 'red', 'white']

In [129]:
copy

['red', 'green', 'blue', 'yellow', 'white', 'purple']

Looping
====

In [130]:
nums

[3, 7, 6, 3, 0, 3, 4, 6, 5]

In [133]:
# indentation, scoping, print() function
for c in colors:
    print 'color is', c # Python 3: print('color is', c)

color is black
color is blue
color is green
color is mauve
color is pink
color is red
color is white


In [137]:
print 'START'

print nums

for index, x in enumerate(nums):

    y = 10*x + 3
    z = 4*x**2 + 7*x -5
    print index, (x, y, z)
    
print 'END'

START
[3, 7, 6, 3, 0, 3, 4, 6, 5]
0 (3, 33, 52)
1 (7, 73, 240)
2 (6, 63, 181)
3 (3, 33, 52)
4 (0, 3, -5)
5 (3, 33, 52)
6 (4, 43, 87)
7 (6, 63, 181)
8 (5, 53, 130)
END


In [138]:
ian = ('Ian', 41, 'Canadian')

In [139]:
ian

('Ian', 41, 'Canadian')

In [140]:
ian[0]

'Ian'

In [141]:
name, age, natl = ian # tuple unpacking

In [142]:
name

'Ian'

In [143]:
age

41

In [144]:
natl

'Canadian'

Functions
=====

In [1]:
# hello
def hello():
    print 'Hello Ian'

In [2]:
hello

<function __main__.hello>

In [4]:
hello()

Hello Ian


In [5]:
hi = hello

In [6]:
hi

<function __main__.hello>

In [7]:
hi()

Hello Ian


In [8]:
# hello name
def hello(name):
    print "Hello", name

In [10]:
hello('Andrew')

Hello Andrew


In [11]:
hi()

Hello Ian


In [12]:
hi('Tom')

TypeError: hello() takes no arguments (1 given)

In [13]:
id(hi)

4403562928

In [14]:
id(hello)

4403563408

In [15]:
from dis import dis

In [16]:
dis(hi)

  3           0 LOAD_CONST               1 ('Hello Ian')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        


In [17]:
dis(hello)

  3           0 LOAD_CONST               1 ('Hello')
              3 PRINT_ITEM          
              4 LOAD_FAST                0 (name)
              7 PRINT_ITEM          
              8 PRINT_NEWLINE       
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE        


In [18]:
# scoping of 'name'
name = 'Mary'

In [19]:
hello('Jane')

Hello Jane


In [20]:
name

'Mary'

In [21]:
# average (print)
nums = [1, 3, 6, 2, -4, 2, 3, 6]

In [26]:
running = 0
for n in nums:
    running += n**2 # sort-of equivalent to running = running + n**2
    y = 3*n + 2*n**2
    print n, n**2, n+10, y, running

1 1 11 5 1
3 9 13 27 10
6 36 16 90 46
2 4 12 14 50
-4 16 6 20 66
2 4 12 14 70
3 9 13 27 79
6 36 16 90 115


In [45]:
def f(x):
    ' A function f of x that calculates a second order function for y and also z'
    y = 3*x + 2*x**2
    z = 5*x**2 + 4*y**(0.5)
    return y, z # tuple packing

In [47]:
f

<function __main__.f>

In [48]:
help(f)

Help on function f in module __main__:

f(x)
    A function f of x that calculates a second order function for y and also z



In [49]:
f.__doc__

' A function f of x that calculates a second order function for y and also z'

In [51]:
f.__doc__ = 'a function that calculates y and z, returns two tuple (y,z)'

In [52]:
help(f)

Help on function f in module __main__:

f(x)
    a function that calculates y and z, returns two tuple (y,z)



In [53]:
f.color = 'green'

In [54]:
f.color

'green'

In [55]:
f.__dict__

{'color': 'green'}

In [30]:
f(3)

(27, 65.78460969082653)

In [31]:
result = f(3)

In [32]:
result

(27, 65.78460969082653)

In [33]:
type(result)

tuple

In [34]:
a = result[0]
b = result[1]

In [35]:
a

27

In [36]:
b

65.78460969082653

In [37]:
type(a)

int

In [38]:
type(b)

float

In [39]:
f(4)

(44, 106.5329983228432)

In [40]:
s, t = f(4) # tuple unpacking

In [42]:
s

44

In [43]:
t

106.5329983228432

Function Exercise
-----------------
* Create a list of 10 numbers, your choice.
* Use a `for` loop to iterate over it.
* Think about what you'd need to do to calculate the average
* Remember the `len()` function.

**Time: 5 minutes**

In [56]:
nums

[1, 3, 6, 2, -4, 2, 3, 6]

In [60]:
def myaverage(numlist):
    'calculate the mean from an iterable of numbers'
    print 'given numlist:', numlist
    total = 0
    for x in numlist:
        total += x
    mean = total / float(len(numlist))
    print "Got a mean of", mean
    return mean

In [61]:
avg = myaverage(nums)

given numlist: [1, 3, 6, 2, -4, 2, 3, 6]
Got a mean of 2.375


In [59]:
avg

2.375

WARNING WARNING DANGER DANGER
---------------------------
* In a pedagogical setting we use the `print()` function a lot -- this "just" prints some stuff to the screen
* In real programs and scripts we are much more selective and generally have only a few `print()` function calls

**QUESTION:** What do we really want to do in our `average()` function?

(someone who is new to Python)

In [None]:
# try calling average and "capturing" the result

In [None]:
# what is going on here?

In [None]:
# Dynamically typed but strongly typed
# make a prediction: what is the result going to be if we try adding 10 to an average?

In [None]:
# average

In [None]:
# what is different about this from what we saw before?

Scripts
====
* Python commands inside a file
* will be run in order from top to bottom
* only `print()` function calls will result in any output to the screen
* file can have any name
* run it with:

```
python path/to/scriptname
```

* Or, on Mac and Linux, add a shebang line and make the file executable
    * shebang: `#!/usr/bin/env python`
    * executable: `chmod a+x path/to/scriptname`
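
As a rough illustration of the bullets above, a script can be as small as the following. The file name `hello_script.py` and the function `greet` are invented for this sketch; the single-argument `print(...)` call is used because it behaves the same on Python 2.7 and Python 3.

```python
#!/usr/bin/env python
# hello_script.py -- hypothetical example file; run with: python hello_script.py

def greet(name):
    'build a greeting string for name'
    return 'Hello ' + name

# statements run in order from top to bottom;
# only the print call below produces output on the screen
message = greet('Ian')
print(message)
```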
    
**Demonstration**

Use *Anaconda Navigator* to start *Spyder*

Script Exercise
---------------
Use the Spyder IDE to create a script `average.py`.  Put inside it:

* the `average` function
* a list of numbers called `nums`
* call `average()`, passing it in the `nums` list of numbers
* capture the result into a reference called `result`
* print the result

**Test:** Try using `[2, 3, 4, 5]` as the input, and make sure you get `3.5` as the result

**Time: 10 minutes** (includes a break and Q&A at podium)


Improving Our Program
--------------------
* what kind of average did you implement?
* who got 3?
* who got no output?
* who got an exception/error?
* what happens if the list of numbers is empty?

In [None]:
# docstrings, float, self-increment, exceptions

In [None]:
# test empty

In [None]:
# show docstring

In [None]:
# scoping of function-internal variables

In [None]:
# functions as first class objects

In [None]:
# help

In [None]:
# alias reference

In [None]:
# help, id -- "mean" and "average" are just references to an anonymous function object

Exceptions
======
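* BAFP (better to ask forgiveness than permission, usually written EAFP: *easier* to ask forgiveness) vs. LBYL (look before you leap)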
* only catch exceptions if these two conditions hold:
    * you reasonably expect they will occur
    * you are sure that **you** should be responsible for dealing with it
* under all other circumstances simply allow the exception to occur
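
Here is one way the two styles might look for the empty-list case of an averaging function. The names `average_lbyl` and `average_eafp` are hypothetical, and returning `None` for an empty list is just one possible choice, not the only reasonable one:

```python
def average_lbyl(numlist):
    'look before you leap: test for the problem case first'
    if len(numlist) == 0:
        return None
    return sum(numlist) / float(len(numlist))

def average_eafp(numlist):
    'ask forgiveness: just try it, and handle the one failure we expect'
    try:
        return sum(numlist) / float(len(numlist))
    except ZeroDivisionError:
        # expected for an empty list, and we have decided it is our job to handle it
        return None
```

Note that `average_eafp` catches only `ZeroDivisionError`. Passing in something that is not a collection of numbers still raises, which fits the last bullet: let other exceptions occur.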

Q. Do I Have To Write My Own Functions?
----------------------------------
Answer: Generally, no, you *shouldn't* write your own functions if they already exist.

Q. So what should I do instead?
```
.







.
```
A. Use code that others have already written.  In Python there are 3.5 ways to do this:

* Built-in functions
    * 50 of them
* Standard library
    * *Batteries Included™* means everyone has these
    * 300 packages (aka *modules*), each with many functions included (and *classes*)
* Python Package Index (PyPI): http://pypi.python.org
    * Anaconda includes about 200 of these out of the box
* *methods* on objects ($\frac{1}{2}$)
    * methods are a kind of function
    * this relies on using *classes* that have been written by someone else

Reserved Words vs. Built-in Functions
=====================================
*Reserved Words* are part of the grammar of Python, in the way brackets, operators, colons, and other symbols are used -- **these are not objects or functions**.  Which ones have we seen so far?
```
.








.
```

**Answer:** `for in del def return try except`

There are only 33 of them, about half of which we'll see at least once today:

* 33 reserved words (aka keywords) in Python 3.5 (Python 2.7 has 31, including `print` and `exec`)
    * http://docs.python.org/3.5/reference/lexical_analysis.html#keywords
    * *logic:* `and, not, or, True, False`
    * *namespaces:* `import, from, as, del, global, nonlocal`
    * *object creation:* `class, def, lambda`
    * *functions:* `return, yield`
    * *looping:* `while, for, break, continue`
    * *conditional:* `if, else, elif`
    * *exceptions:* `try, except, finally, raise`
    * *misc:* `pass, assert, with, in, is, None`
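
If you want to check the list for the interpreter you are actually running, the Standard Library `keyword` module can report it. This is only a small sketch; the count it prints depends on your Python version, so no particular number is assumed here.

```python
import keyword

reserved = keyword.kwlist                    # reserved words for the running interpreter
count = len(reserved)                        # differs between Python versions

for_is_reserved = keyword.iskeyword('for')   # a reserved word
len_is_reserved = keyword.iskeyword('len')   # not reserved: len is a built-in function

print(count)
print(' '.join(reserved))
```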

*Built-in functions* are functions that you can use *"out of the box"* with Python.  We've only seen a few of these so far.  What are they?
```
.








.
```

**Answer:** `id() len() print()`, plus `type() help() enumerate() float()`

**NOTICE:** the difference between `return` as a *reserved word* and `print()` as a *built-in function*
* In Python 1.x and 2.x `print` was a *reserved word*, but it always should have been a *function*

**Question:** How many do you think there are in total?  How many do you think there are in Matlab, just for comparison?

`__builtin__`
=============
* The Python Language defines a special module called `__builtin__` that is part of the Standard Library (it is named `builtins` in Python 3)
* It contains *functions*, *exceptions*, and *classes* that are very common:
    * 10 core types
        * *int, long, float, bool, complex, str, list, dict, tuple, set*
    * 20 supporting types
        * *file, range, object, ...*
    * 40 exceptions (upper camel case, mostly ending in *Error* or *Warning*)
    * 50 functions
        * Math: *abs min max pow round sum divmod*
        * Logic: *all any apply map filter reduce*
        * Iterable: *len range zip iter next sorted*
        * Misc: *print format reload*
        * File: *open*
        * Check: *callable isinstance issubclass*
        * Convert: *bin chr hex cmp coerce oct ord unichr*
        * Introspect: *dir id vars locals globals hasattr getattr setattr delattr compile eval execfile intern hash repr*

* Any reference lookup that doesn't find the reference in the *local* namespace (first) or the *global* (which means *module*) namespace (second) will check the `__builtin__` module's namespace (third)
* CPython automatically provides a reference to the `__builtin__` module in every *global* namespace but gives it the name `__builtins__`
    * under normal use, you never need to use this module reference
* If the *local* or *global* namespace has a reference that is found in `__builtin__` then the `__builtin__` reference will be masked
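
A small sketch of masking, assuming a made-up function `call_masked_sum`. Inside the function a *local* reference called `sum` is found first, so the built-in `sum()` is never reached; outside the function the built-in is still found as usual.

```python
def call_masked_sum(values):
    'call sum() after a local reference named "sum" has masked the built-in'
    sum = 14                     # local reference: masks the built-in inside this function
    try:
        return sum(values)       # finds the local "sum", which is an int, not a function
    except TypeError as err:
        return 'TypeError: ' + str(err)

nums = [2, 3, 4, 5]
unmasked = sum(nums)             # no local or global "sum" here, so the built-in is used
masked = call_masked_sum(nums)   # a string describing the TypeError
```

The same thing happens at the top level of a notebook if you write `sum = 14`; there, `del sum` removes the masking reference and the built-in becomes visible again.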

In [None]:
# average: for loop -> sum

In [None]:
# average the new way:

In [None]:
# max, min
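
One possible version of the "new way", using built-in functions instead of a hand-written loop. The function name `average` follows the exercise above; treat this as a sketch rather than the definitive answer (it does not handle an empty list).

```python
def average(numlist):
    'calculate the mean from a list of numbers using built-in functions'
    return sum(numlist) / float(len(numlist))   # float() avoids integer division on Python 2

nums = [1, 3, 6, 2, -4, 2, 3, 6]
avg = average(nums)       # 2.375, the same result as the loop-based myaverage()
largest = max(nums)
smallest = min(nums)
```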

Tuple
======
* light-weight data structure
* associate a number of entries
* ordered (index look-up)
* like a C `struct`
* immutable

In [63]:
# Person
ian = ('Ian', 42, 'Canadian')
maggie = ('Margaret', 11, 'British')

In [64]:
ian[0]

'Ian'

In [65]:
maggie[0]

'Margaret'

In [67]:
ian.append('Syracuse')

AttributeError: 'tuple' object has no attribute 'append'

In [68]:
ian[1]

42

In [69]:
ian[1] = 41

TypeError: 'tuple' object does not support item assignment

In [70]:
# person, "constants"
hilary = ('Hilary', 8, 'American')

In [71]:
hilary

('Hilary', 8, 'American')

In [72]:
family = [ian, maggie, hilary]

In [73]:
family

[('Ian', 42, 'Canadian'),
 ('Margaret', 11, 'British'),
 ('Hilary', 8, 'American')]

In [74]:
emily = ('Emily', 40, 'American')

In [75]:
family.append(emily)

In [76]:
family

[('Ian', 42, 'Canadian'),
 ('Margaret', 11, 'British'),
 ('Hilary', 8, 'American'),
 ('Emily', 40, 'American')]

In [77]:
family.append('banana')

In [78]:
family

[('Ian', 42, 'Canadian'),
 ('Margaret', 11, 'British'),
 ('Hilary', 8, 'American'),
 ('Emily', 40, 'American'),
 'banana']

In [79]:
hello

<function __main__.hello>

In [80]:
family.append(hello)

In [81]:
family

[('Ian', 42, 'Canadian'),
 ('Margaret', 11, 'British'),
 ('Hilary', 8, 'American'),
 ('Emily', 40, 'American'),
 'banana',
 <function __main__.hello>]

In [82]:
family[-1]('Steve')

Hello Steve


In [83]:
family.pop()

<function __main__.hello>

In [84]:
family.pop()

'banana'

In [85]:
family

[('Ian', 42, 'Canadian'),
 ('Margaret', 11, 'British'),
 ('Hilary', 8, 'American'),
 ('Emily', 40, 'American')]

In [86]:
family.sort()

In [87]:
family

[('Emily', 40, 'American'),
 ('Hilary', 8, 'American'),
 ('Ian', 42, 'Canadian'),
 ('Margaret', 11, 'British')]

In [None]:
# good use of a list

In [None]:
# addition of numbers, strings, lists

In [None]:
# multiplication of numbers, strings, lists

In [89]:
# (x,y) points
points = [(3,7),
         (4,2),
         (8,6),
         (1,5),
         (6,7)]

In [90]:
for pt in points:
    print pt

(3, 7)
(4, 2)
(8, 6)
(1, 5)
(6, 7)


In [91]:
for x, y in points:
    print 'x is', x, 'and y is', y

x is 3 and y is 7
x is 4 and y is 2
x is 8 and y is 6
x is 1 and y is 5
x is 6 and y is 7


Dictionary
===========
* light-weight "associative array" data structure
* aka "map" or "hash map"
* associate a number of entries
* name each entry
* unordered (name look-up)
* mutable

Also:
* foundational data structure in Python (*"everything is a `dict`"*)
* highly optimized (don't bother writing your own hash map)
* Python 3.6 provided even more memory optimization (20-25% savings in most cases!)

In [92]:
# standard dict creation syntax
ian = {'name': 'Ian',
      'age': 42,
      'natl': 'Canadian'}

In [93]:
# look-up
ian

{'age': 42, 'name': 'Ian', 'natl': 'Canadian'}

In [94]:
ian['age']

42

In [95]:
ian['natl']

'Canadian'

In [96]:
# change
ian['age'] = 41

In [97]:
ian

{'age': 41, 'name': 'Ian', 'natl': 'Canadian'}

In [98]:
# add
ian['city'] = 'Syracuse'

In [99]:
ian

{'age': 41, 'city': 'Syracuse', 'name': 'Ian', 'natl': 'Canadian'}

In [100]:
# remove entry
del ian['city']

In [101]:
ian

{'age': 41, 'name': 'Ian', 'natl': 'Canadian'}

In [102]:
# dict constructor
maggie = dict(name='Maggie', age=11, natl='British')

In [103]:
maggie

{'age': 11, 'name': 'Maggie', 'natl': 'British'}

In [104]:
# (k,v) constructor
ian.items()

[('age', 41), ('name', 'Ian'), ('natl', 'Canadian')]

In [None]:
# list of dicts

In [152]:
# update (address, colors)
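
The last few ideas (the `(k, v)` constructor, a list of dicts, and `update`) might look like this. The `city` and `colors` entries are invented values for illustration only.

```python
ian = {'name': 'Ian', 'age': 41, 'natl': 'Canadian'}
maggie = dict(name='Maggie', age=11, natl='British')

# (k, v) constructor: dict() accepts a collection of (key, value) pairs
ian_copy = dict(list(ian.items()))

# list of dicts
family = [ian, maggie]
names = []
for person in family:
    names.append(person['name'])

# update: merge in another dict, adding new keys and overwriting existing ones
ian.update({'city': 'Syracuse', 'colors': ['green', 'blue']})
```

`ian_copy` is a separate dict object, so the `update` call on `ian` does not change it.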

Class
=====
* created with a `class` statement
* methods are "just" functions with special invocation handling
    * descriptor protocol (advanced topic, not for now)
    * instance object is passed automatically as first argument
* **dunder** (double-underscore) methods have pre-defined semantics
    * only use ones that are specified by the language
    * don't make up your own
    
**WARNING** What we're doing next is to help you understand how classes work in Python. Only at the **end** will we finally see the conventional way to define a class.  Along the road, however, we'll gain insights into Python's handling of classes.

In [105]:
# empty Person class (incl docstring)
class Person:
    'This is an empty person class'

In [106]:
Person

<class __main__.Person at 0x1068b4870>

In [107]:
# help
'Person' in dir()

True

In [108]:
help(Person)

Help on class Person in module __main__:

class Person
 |  This is an empty person class



In [110]:
# instance
ian = Person()

In [111]:
ian

<__main__.Person instance at 0x1068b5560>

In [112]:
ian.__class__

<class __main__.Person at 0x1068b4870>

In [113]:
ian.age

AttributeError: Person instance has no attribute 'age'

In [114]:
ian.name

AttributeError: Person instance has no attribute 'name'

In [115]:
# add attributes
ian.name = 'Ian'
ian.age  = 42
ian.natl = 'Canadian'

In [116]:
ian.name

'Ian'

In [117]:
ian

<__main__.Person instance at 0x1068b5560>

In [118]:
ian.age

42

In [119]:
ian.__dict__

{'age': 42, 'name': 'Ian', 'natl': 'Canadian'}

In [120]:
ian.__dict__['natl']

'Canadian'

In [121]:
ian.__dict__['color'] = 'green'

In [122]:
ian.__dict__

{'age': 42, 'color': 'green', 'name': 'Ian', 'natl': 'Canadian'}

In [123]:
ian.color

'green'

In [160]:
# where are those attributes?

In [125]:
# instantiation via person_init function
def person_init(p, name, age, natl):
    'add attributes to a person instance object'
    p.name = name
    p.age  = age
    p.natl = natl

In [126]:
maggie = Person()

In [127]:
maggie.name

AttributeError: Person instance has no attribute 'name'

In [128]:
person_init(maggie, 'Margaret', 11, 'British')

In [129]:
maggie.name

'Margaret'

In [130]:
maggie.__dict__

{'age': 11, 'name': 'Margaret', 'natl': 'British'}

In [131]:
# put function into class
class Person:
    def __init__(p, name, age, natl): # "p" is the new instance, passed in automatically when Person(...) is called
        'add attributes to a person instance object'
        p.name = name
        p.age  = age
        p.natl = natl

In [132]:
hilary = Person('Hilary', 8, 'American')

In [133]:
hilary.name

'Hilary'

In [134]:
hilary.age

8

In [165]:
# rename the first argument to the conventional "self"
class Person:
    def __init__(self, name, age, natl): # "self" is the new instance, passed in automatically
        'add attributes to a person instance object'
        self.name = name
        self.age  = age
        self.natl = natl

In [135]:
# define "birthday()" method to increment age
class Person:
    def __init__(self, name, age, natl): # "self" is the new instance, passed in automatically
        'add attributes to a person instance object'
        self.name = name
        self.age  = age
        self.natl = natl
        
    def birthday(self):
        self.age += 1
        
    def __str__(self):
        return "{n} is {a} years old and comes from {c}".format(n=self.name,
                                                                a=self.age,
                                                                c=self.natl)
    
    def __repr__(self):
        return "Person('{n}', {a}, '{c}')".format(n=self.name,
                                                a=self.age,
                                                c=self.natl)

In [136]:
emily = Person('Emily', 34, 'American')

In [137]:
emily

Person('Emily', 34, 'American')

In [138]:
print emily

Emily is 34 years old and comes from American


In [139]:
emily.birthday()

In [140]:
emily

Person('Emily', 35, 'American')

In [141]:
emily.birthday()

In [142]:
emily

Person('Emily', 36, 'American')

In [166]:
# define "about()" to return string

In [169]:
# define str and repr

In [167]:
# add list of children

In [168]:
# add len and getitem

In [None]:
# iterate over person

In [171]:
# add doc strings

In [172]:
# help
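
A sketch of where the remaining steps could lead. The `children` attribute and the `about()` method are assumptions made for this illustration, and the class is written as `Person(object)` so that it behaves the same on Python 2.7 and Python 3.

```python
class Person(object):
    'a person with a name, age, nationality, and a list of children'

    def __init__(self, name, age, natl):
        'add attributes to a person instance object'
        self.name = name
        self.age = age
        self.natl = natl
        self.children = []          # hypothetical attribute for this sketch

    def about(self):
        'return a one-line description string'
        return '{n} is {a} years old'.format(n=self.name, a=self.age)

    def __len__(self):
        'len(person) gives the number of children'
        return len(self.children)

    def __getitem__(self, index):
        'person[i] gives the i-th child'
        return self.children[index]

ian = Person('Ian', 42, 'Canadian')
ian.children.append(Person('Margaret', 11, 'British'))
ian.children.append(Person('Hilary', 8, 'American'))

child_names = []
for child in ian:                   # with no __iter__, the loop calls __getitem__ with 0, 1, 2, ...
    child_names.append(child.name)
```

Because `__getitem__` raises `IndexError` once the index runs past the end of the list, the `for` loop knows when to stop.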

Class Exercise
--------------
1. Create the classic "Circle" class with a radius, color, and (x,y) coordinate for its center
    * docstrings
    * `__init__`, `__str__`, `__repr__` methods
    * `circumference` method that returns `2*math.pi*self.r`
2. Create a bunch of circles
3. Put them into a list
4. Iterate over the list
5. Create a module `shapes.py` and put your `Circle` class into it.
    * try to do `from shapes import Circle`

Standard Library and Namespaces
=================

In [143]:
sin(3.14/2)
# sin(theta) = opposite/hypotenuse (theta in radians)

NameError: name 'sin' is not defined

In [144]:
import math #  ? have we just done the same as #include <math.h> ???

In [145]:
'sin' in dir(math)

True

In [146]:
sin(3.14/2)

NameError: name 'sin' is not defined

In [None]:
# import math

In [147]:
# sin @ pi/2
math.sin(3.14/2)

0.9999996829318346

In [None]:
# namespaced

In [149]:
# cos for comparison
math.cos(0)

1.0

In [151]:
# atan2 (div by pi mult by 180) for angles of points
s = math.sin # create local namespace alias to math.sin

In [153]:
s(3.14/2)

0.9999996829318346

In [154]:
import math as m # import the math module, but use "m" as the local reference
                 # saves us from doing "m = math"

In [155]:
m.sin(3.14/2)

0.9999996829318346

In [156]:
from math import atan2

In [160]:
from math import sin

def f(x):
    y = x + sin(x)
    return y

In [161]:
dis(f)

  4           0 LOAD_FAST                0 (x)
              3 LOAD_GLOBAL              0 (sin)
              6 LOAD_FAST                0 (x)
              9 CALL_FUNCTION            1
             12 BINARY_ADD          
             13 STORE_FAST               1 (y)

  5          16 LOAD_FAST                1 (y)
             19 RETURN_VALUE        


In [166]:
from math import sin

def f(x):
    s = sin
    y = x + s(x)
    return y

In [167]:
dis(f)

  4           0 LOAD_GLOBAL              0 (sin)
              3 STORE_FAST               1 (s)

  5           6 LOAD_FAST                0 (x)
              9 LOAD_FAST                1 (s)
             12 LOAD_FAST                0 (x)
             15 CALL_FUNCTION            1
             18 BINARY_ADD          
             19 STORE_FAST               2 (y)

  6          22 LOAD_FAST                2 (y)
             25 RETURN_VALUE        


In [158]:
from math import acos as ac #selective import with alias

In [159]:
from math import acos as ac
from math import atan as at
from math import atan2 as at2

from math import sin,cos,tan

<function math.acos>

In [None]:
# what else is in the math namespace? dir()

In [None]:
# that isn't very helpful! What did we use before to find out about "average()"?

Import Aliasing
--------------
*But I use sin and cos a lot! This namespace thing is going to be very burdensome!*


In [None]:
# option 1: setup your own alias "m"
m = math

In [None]:
# option 2: alias "s" and "c"
s = math.sin
c = math.cos

```
.






.
```
Scratch that, there are better ways to do both of those things:


In [None]:
# option 3: alias the module at import
import math as m

In [None]:
# option 4: selectively import object from module into current namespace
from math import sin, cos

In [None]:
# option 5: selectively import AND alias
from math import sin as s, cos as c

Anything Else About The Python Standard Library?
--------------------------------------------
Only that it is totally amazing and you should avoid it at your peril:
* stable
* always available
* optimized
    * written in C if necessary
* documented
* extensively tested
* used extensively in the field
    * how likely do you think it is you'll be the first to discover a bug?
    

How Do I Learn More?
-------------------
* Come to events like this!
* Google for what you need. GIYF: it will most likely point you to [python.org](http://python.org) and if not?
* Skim Chapter 10 of the Python Tutorial: [A Brief Tour of the Standard Library](https://docs.python.org/3/tutorial/stdlib.html)
    * and also possibly Chapter 11: [Part 2](https://docs.python.org/3/tutorial/stdlib2.html)
* Just browse or search the [Official Standard Library Module Index](https://docs.python.org/3/py-modindex.html)
    * make sure you're checking the version that matches your Python version
* Ask a friend!
* As a last resort, search the [Complete Index Of Everything Inside The Standard Library](https://docs.python.org/3/genindex-all.html)

So what's this **Anaconda** thing, then?
--------------------------------------
* [200+ of the most commonly used, publicly available, open source, libraries and tools](http://docs.continuum.io/anaconda/pkg-docs) for computational science in Python **that are not found in the Standard Library**
* They all "Just Work": no compilation, build chains, or manual dependency resolution required
* A further 200 packages that you can install on-demand:
    * `conda install biopython`
        * 2MB -- do this if you want to try it out
* about a 250 MB download, instead of about 25 MB for Python on its own
* Available for free, forever, for Windows, Mac, Linux (and Raspberry Pi)
* More than just Python (don't do these now!)
    * `conda install -c r r-essentials`
    * `conda install -c ijstokes julia`
        * v0.3.10, OS X only!

But What About the Python Package Index and `pip`?
---------------------------------
* the [Python Package Index](https://pypi.python.org/) has over 70,000 community-contributed packages
* `pip install fred`
* plays nicely with Anaconda and `conda` (so feel free to mix-and-match)
* remember that no one checks the packages in PyPI: *caveat emptor*!

Files
===
* best bet is to use a data-format-specific library that can read and write files from disk for you (e.g. HDF5)
* but sometimes you need to DIY
* and you should know how to do this anyway

In [169]:
# save points to a file
points
with open('xy_basic.tab', 'w') as fh:
    for x, y in points:
        fh.write('{x}\t{y}\n'.format(x=x, y=y))

In [170]:
!cat xy_basic.tab

3	7
4	2
8	6
1	5
6	7


In [1]:
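# read in file, shifting every point by (5, 5)
# steps: split each line, convert to int, append (the with-block closes the file)
result = []
with open('xy_basic.tab') as datafile:
    for line in datafile:
        x, y = line.split()      # split on whitespace (here: the tab)
        x = int(x) + 5           # text -> int, then shift
        y = int(y) + 5
        result.append((x, y))    # inner round-brackets create 2-tuple (x, y)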
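
As a hedged illustration of the "use a format-specific library" advice above, the standard library's `csv` module can handle the tab-delimited case instead of hand-written `split` and `format` calls. This is only a sketch: the file name `xy_csv.tab` and the sample `points` list are made up for the example and are not values from this notebook.

```python
import csv

# hypothetical sample data, standing in for the notebook's `points`
points = [(3, 7), (4, 2), (8, 6)]

# write one tab-separated row per point
with open('xy_csv.tab', 'w') as fh:
    writer = csv.writer(fh, delimiter='\t', lineterminator='\n')
    writer.writerows(points)

# read the rows back; csv yields lists of strings, so convert to int
with open('xy_csv.tab') as fh:
    loaded = [(int(x), int(y)) for x, y in csv.reader(fh, delimiter='\t')]

print(loaded)
```

The DIY loop is still worth knowing, but a library like this deals with quoting and delimiters for you.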

Methods
=====
* operations you can perform on an object -- e.g.
    * Perl: `sort(array)` is a function call, which returns a sorted array
    * Python: `array.sort()` is a method call, which acts on the array and sorts it in place (it returns `None`, not a new array)
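
A small sketch of the function-versus-method distinction, staying entirely in Python: the built-in `sorted()` function returns a new list, while the `.sort()` method changes the list it is called on. The variable names are just for illustration.

```python
nums = [3, 1, 2]

# sorted() is a function: it returns a new list and leaves nums alone
ordered = sorted(nums)
untouched = list(nums)      # snapshot to show nums is still [3, 1, 2]

# .sort() is a method: it reorders nums in place and returns None
returned = nums.sort()

print(ordered)
print(nums)
print(returned)
```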

In [None]:
# copy with slice

In [None]:
# reverse

In [None]:
# append

In [None]:
# extend

In [None]:
# multi-list stuff

In [None]:
# wrong append

In [None]:
# make a prediction: what is nums[-1]?

In [None]:
# What is the correct way to "add" a list?

In [None]:
# list addition (numbers)

In [None]:
# list addition (strings)

In [None]:
# addition

In [None]:
# but has nums changed?

In [None]:
# self-increment

In [None]:
# check id()

In [None]:
# not the same as "a = a + [1,2,3]"
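
The empty cells above are filled in live. One possible version of the list portion is sketched below; the specific values are assumptions chosen for the example, not the ones used in the session.

```python
nums = [1, 2, 3]
snapshot = nums[:]        # copy with slice: a new list with the same items

nums.append(4)            # append adds ONE item
nums.extend([5, 6])       # extend adds each item of another list
nums.append([7, 8])       # "wrong append": the whole list becomes one item
last = nums[-1]           # so nums[-1] is [7, 8], not 8

total = nums + [9]        # + builds a new list; nums itself is unchanged

a = [0]
alias = a                 # a second name for the same list object
a += [1, 2, 3]            # self-increment works in place: alias sees it too
same_after_iadd = a is alias

a = a + [4]               # this rebinds a to a brand-new list
same_after_add = a is alias
```

Comparing `id(a)` before and after each step shows the same thing as the `is` checks: `+=` keeps the original object, while `a = a + [...]` creates a new one.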

In [None]:
# sentence (string)

In [None]:
# title

In [None]:
# lower

In [None]:
# split

In [None]:
# chain: lower split

In [None]:
# split on letter

In [None]:
# shortcut: how to make a list of words (and methods on literals)

In [None]:
# format (name, age)

In [None]:
# number formatting :.2f

More references for string formatting:
* https://mkaz.github.io/2012/10/10/python-string-format/
* https://pyformat.info/
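
A hedged sketch of what the string cells above might contain. The sentence, name, and numbers are invented for the example.

```python
sentence = 'the quick brown fox'

titled = sentence.title()                # capitalise each word
words = titled.lower().split()           # chain: lower, then split on whitespace
pieces = sentence.split('o')             # split on a letter instead
colors = 'red green blue'.split()        # methods work on string literals too

# str.format with named fields; :.2f keeps two digits after the decimal point
line = '{name} is {age} years old'.format(name='Ada', age=36)
price = '{:.2f}'.format(3.14159)
```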

Booleans, Conditionals, Comprehensions
=====================

In [2]:
# -12 to 20, steps of 3
vals = range(-12,20,3)

In [3]:
# threes and evens
[v for v in vals if v % 2 == 0]

[-12, -6, 0, 6, 12, 18]

In [5]:
[v for v in vals if abs(v) == 3]

[-3, 3]

In [8]:
# keep short colors 'green red blue black yellow pink gold silver'
colors = 'green red blue black yellow pink gold silver'.split()

In [9]:
[c for c in colors if len(c) <= 4]

['red', 'blue', 'pink', 'gold']

In [None]:
# skip short        

In [None]:
# comprehension: evens

In [None]:
# short colors

In [None]:
# squared axis

In [None]:
# upper case colors

In [None]:
# random.randint

In [None]:
# How would you create a list of 20 random integers?
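
One way the remaining comprehension cells could be filled in, reusing `vals` and `colors` from above. The seed value and the 1 to 6 range are arbitrary choices for the sketch.

```python
import random

vals = list(range(-12, 20, 3))
colors = 'green red blue black yellow pink gold silver'.split()

squares = [v ** 2 for v in vals]                        # transform every item
shouting = [c.upper() for c in colors if len(c) <= 4]   # filter, then transform

random.seed(0)  # arbitrary seed, only so the example is repeatable
rolls = [random.randint(1, 6) for _ in range(20)]       # 20 random integers, 1..6 inclusive
```

Note that `random.randint(a, b)` includes both endpoints, unlike `range`.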

Comprehension Exercise
-----------------------
Create list comprehensions that:
* filter the list selecting only values greater than 3
* create a new list containing only the odd numbers, but multiply these by 10
    * odd numbers can be found by testing if `v%2 == 1`

**Time:** 5 minutes

Numpy and Pandas
=========
[Numpy](http://www.numpy.org/) (released 2006 by Travis Oliphant, founder of Continuum) provides the foundation for numerical computing in Python:
* defines an `ndarray` that can be used for vector (and matrix) computations
* functions and methods supporting linear algebra
* implemented in C
    * fast
    * memory efficient
    
[Pandas](http://pandas.pydata.org) (released 2009 by Wes McKinney; the maintainer is now Jeff Reback from Continuum):
* provides an R-style `DataFrame` class
* many convenience functions
* facilitates interaction with spreadsheets (Excel, CSV) or database tables
* built on top of Numpy
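
To make the "vector computations" point concrete, here is a minimal sketch contrasting `+` on plain lists with `+` on an `ndarray`. It assumes Numpy is installed (it ships with Anaconda); the values are arbitrary.

```python
import numpy as np

nums = [1, 2, 3]
arr = np.array(nums)

joined = nums + nums      # lists: + concatenates, giving 6 items
summed = arr + arr        # ndarrays: + adds element by element, giving 3 items
scaled = arr * 2          # multiplication is element-wise as well
dot = arr.dot(arr)        # dot product: 1*1 + 2*2 + 3*3
```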

In [None]:
# make a prediction: what will a + b return? (at least, what dimensions will it have?)

In [None]:
# multiply

In [None]:
# dot product

In [None]:
# -2pi to +2pi with linspace

In [None]:
# fast vectorized sin function

In [None]:
%matplotlib inline

In [None]:
# plot & title (with latex)

In [None]:
# noise from randn

In [None]:
# boolean masking arrays: >= 5

In [None]:
# flattens ndarray based on masking array

In [None]:
# big/small with np.where

In [None]:
# broadcast addition

In [None]:
# max method (3 variants)

In [None]:
# sum method (3 variants)

In [None]:
# mean method (3 variants)
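
A possible version of several of the Numpy cells above, offered as a sketch rather than the session's actual code. The array shapes and values are assumptions, and the "3 variants" are guessed to mean no axis, `axis=0`, and `axis=1`.

```python
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 101)   # 101 evenly spaced points
y = np.sin(x)                                 # vectorized: no Python loop needed

grid = np.arange(12).reshape(3, 4)            # 3 rows x 4 columns, values 0..11

mask = grid >= 5                              # boolean array, same shape as grid
picked = grid[mask]                           # masking returns a flat 1-D array
labels = np.where(mask, 'big', 'small')       # choose per element based on mask

shifted = grid + np.array([10, 20, 30, 40])   # the row is broadcast across every row of grid

overall = grid.max()                          # variant 1: single largest value
per_column = grid.max(axis=0)                 # variant 2: one value per column
per_row = grid.max(axis=1)                    # variant 3: one value per row
```

`sum` and `mean` accept the same `axis` argument, so the three variants carry over unchanged.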

Next Steps
======
* find a project where you can start using Python
* refer to the follow-up tutorials listed at the top
* join one of the Boston area Python and Data Science meetups:
    * [Boston Python User Group](http://www.meetup.com/bostonpython/)
    * Search for "Python" or "Data Science" at [meetup.com](http://www.meetup.com) near you
        * pro-tip: make sure the group is active and relevant before signing up!