NIA Intro to Python Class - May 17, 2017

# Day 3: Control Flow, plus advanced data types

Part 1 of today's talk focuses on control flow ([wiki article](http://en.wikipedia.org/wiki/Control_flow)), the part of a programming language's syntax that lets a program follow one or more branches of instructions conditionally, or repeat instructions in loops.
* <code>if</code>/<code>elif</code>/<code>else</code>
* <code>while</code> loops
* <code>for</code> loops
* nested <code>for</code> loops

Part 2 of today's talk covers two new data types that are third-party extensions to Python but are used almost universally in data analysis.
* Matrices using [NumPy arrays](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
* DataFrames using [Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)

---

## Preview for tomorrow:

[These](https://seaborn.pydata.org/examples/index.html) are just a few of the types of data visualizations you can do in Python.

## Review of material thus far

1. Familiarizing yourself with the Jupyter Notebook IDE
    * code completion using Tab key
    * print out all your variables
    * syntax highlighting
    * more
2. Python scalar data types
    * <code>int</code>
    * <code>float</code>
    * <code>bool</code>
3. Python iterable data types
    * <code>str</code>
    * <code>list</code>
    * <code>tuple</code>
    * <code>dict</code>
    * <code>set</code>
4. Operators and Operations on Iterables
    * add items to and delete items from iterables
    * split, slice and concatenate
    * Nested iterables
    * Basic sorting

## Conditional Statements

### The <code>if</code> statement (a simple conditional)

* Use the keyword <code>if</code>, followed by the test, followed by a colon
* lines that should be evaluated if the test is true should be indented.

In [1]:
if True:
    print( "True fact.")
print( "This line prints regardless.")

True fact.
This line prints regardless.


### <code>if</code>/<code>else</code> statements (a one alternative conditional)

* The <code>else</code> statement goes at the same indentation level as the matching <code>if</code> statement:

In [3]:
if True:
    print( "True fact.")
    print( "Yup." )
else:
    print( "This ain't gonna print." )
print( "This line prints regardless.")

True fact.
Yup.
This line prints regardless.


### Simple tests and compound tests

Use an operator inside the conditional, and use other operators to combine tests.

In [5]:
some_value = 0

if some_value < 0:
    print( str( some_value), "has a negative sign.")
else:
    print( str( some_value) , "doesn't have a negative sign.")

0 doesn't have a negative sign.


In [21]:
test_value = 0

if test_value < 10 or test_value > 20:
    print( str( test_value ), "is out-of-bounds.")
else:
    print( str( test_value ) , "is inbounds.")

0 is out-of-bounds.


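Boolean expressions are evaluated left to right and follow an order of operations: among the boolean operators, <code>not</code> is applied first, then <code>and</code>, then <code>or</code>. Use parentheses to clarify.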
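Here is a small sketch of how the order of operations changes a result. The variable names are made up for illustration.

```python
# Three hypothetical test results
a, b, c = True, False, False

# Without parentheses, "and" is evaluated before "or"
implicit = a or b and c        # parsed as: a or (b and c)

# Parentheses force the "or" to be evaluated first
explicit = (a or b) and c

print( implicit )
print( explicit )
```

The two expressions use the same values and operators but give different answers, which is why parentheses are worth the extra typing.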

### <code>any()</code> and <code>all()</code>

In [13]:
some_conditions = [False, False, False, True, True]

In [10]:
any( some_conditions )

True

In [11]:
all( some_conditions )

False
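As a rough illustration, <code>any()</code> and <code>all()</code> are handy when you collect several tests on the same value into a list. The value and the three tests below are invented for this example.

```python
test_value = 15

# Each item in the list is the True/False result of one test
conditions = [ test_value > 10, test_value < 20, test_value % 2 == 0 ]
print( conditions )

# any() is True if at least one test passed
passed_any = any( conditions )

# all() is True only if every test passed
passed_all = all( conditions )

print( passed_any )
print( passed_all )
```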

### By the way....

Coercing True or False values into integers is one way to count them.

In [15]:
int(True)

1

In [16]:
int(False)

0

In [19]:
sum( some_conditions )

2

### <code>if</code>/<code>elif</code>/<code>else</code> (multi-test conditional)

In [24]:
test_value = 0

if test_value < 10:
    print( str( test_value ), "is too low")
elif test_value > 20:
    print( str( test_value ), "is too high")
else:
    print( str( test_value ) , "is just right.")

0 is too low


### The <code>pass</code> statement: no naked <code>if</code> statements!
If you want Python to do nothing when the condition is true, you can't just leave a blank line; you have to use the keyword <code>pass</code>, properly indented.

In [58]:
planet_earth = { 'population' : 6e9, 'color' : 'blue' }

# There's nothing I can do...
if planet_earth['color'] == 'blue':
    pass
else:
    print( "Floating in my tin can." )

### One-liner if statements

You can put a simple conditional on one line if you want.

In [60]:
# compare values with ==, not is (is tests whether two names refer to the same object)
if 'man' == 5: the_devil = 6

## <a name="while"><code>while</code> loops</a>

A <code>while</code> loop evaluates a boolean expression and runs the code in the loop over and over as long as the expression evaluates to true.

In [59]:
age = 15
while age < 21:
    print( "No beer, you", str(age) + "-year-old, wait until next year.")
    age += 1

print( "You're 21, it's party time!")

No beer, you 15-year-old, wait until next year.
No beer, you 16-year-old, wait until next year.
No beer, you 17-year-old, wait until next year.
No beer, you 18-year-old, wait until next year.
No beer, you 19-year-old, wait until next year.
No beer, you 20-year-old, wait until next year.
You're 21, it's party time!


Another way to structure a while loop is to "loop forever" and use a conditional statement with the <code>break</code> keyword.
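A minimal sketch of the "loop forever" pattern, reusing the beer example from above:

```python
age = 15

# "while True" never stops on its own, so the loop body must break out
while True:
    if age >= 21:
        break   # jump out of the loop immediately
    print( "No beer, you", str(age) + "-year-old, wait until next year.")
    age += 1

print( "You're 21, it's party time!")
```

If the <code>break</code> is never reached the loop really does run forever, so double-check the exit condition.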

## <a name="for"><code>for</code> loops</a>

* "Python’s <code>for</code> statement iterates over the items of any sequence (a list or a string), in the order that they appear in the sequence." [reference](http://docs.python.org/2/tutorial/controlflow.html).
* Often, if you know exactly how many times you need to loop, you'll use the <code>range()</code> function, which produces a sequence of numbers for the for loop to iterate over (in Python 3 it returns a <code>range</code> object rather than a list).
* Each time through the loop, Python will put the next item in the sequence into the variable whose name you declare by putting it between the <code>for</code> and <code>in</code> keywords.
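A quick sketch of <code>range()</code> in Python 3. The variable names are just for illustration.

```python
counter = range(5)

# In Python 3 this is a range object; wrap it in list() to see the numbers
print( type( counter ) )
print( list( counter ) )

# range( start, stop, step ): stop is not included
evens = list( range(2, 11, 2) )
print( evens )

# The loop variable n takes each value in turn
total = 0
for n in range(1, 6):
    total += n
print( total )
```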

### Iterate N times

In [29]:
# range counts from 0
list( range(10) )

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [32]:
for i in range(10):
    if i == 1:
        suffix = 'st'
    elif i == 2:
        suffix = 'nd'
    elif i == 3:
        suffix = 'rd'
    else:
        suffix = 'th'
    print( str(i) + suffix, "time through the for loop." )

0th time through the for loop.
1st time through the for loop.
2nd time through the for loop.
3rd time through the for loop.
4th time through the for loop.
5th time through the for loop.
6th time through the for loop.
7th time through the for loop.
8th time through the for loop.
9th time through the for loop.


### Iterate over a list of objects

In [34]:
name_list = ['dick', 'jane', 'spot', 'mom', 'dad' ]

In [36]:
for name in name_list:
    print( "see", name, "run!" )

see dick run!
see jane run!
see spot run!
see mom run!
see dad run!


### Unpacking nested iterables inside the <code>for</code> loop
* See how you can unpack the tuple right inside the for loop:

In [47]:
name_list

['dick', 'jane', 'spot', 'mom', 'dad']

In [48]:
num_names = len( name_list )

In [49]:
num_names

5

In [50]:
indices = list( range( num_names ) )

In [51]:
indices

[0, 1, 2, 3, 4]

In [52]:
zipped_together = list( zip( indices, name_list ) )

In [53]:
zipped_together

[(0, 'dick'), (1, 'jane'), (2, 'spot'), (3, 'mom'), (4, 'dad')]

In [57]:
for i, name in zipped_together:
    print( "Line", i, "- See", name, "run!" )

Line 0 - See dick run!
Line 1 - See jane run!
Line 2 - See spot run!
Line 3 - See mom run!
Line 4 - See dad run!


### Using <code>enumerate()</code> to count off for you

* The code above is equivalent to using <code>enumerate()</code>.
* Use the function <code>enumerate()</code> if you need to slap an index onto an iterable you already have.
* Each time through the loop <code>enumerate()</code> returns a tuple of two values, the first being the index, and the second being the value.

In [58]:
for i, name in enumerate( name_list ):
    print( "Line", i, "- See", name, "run!" )

Line 0 - See dick run!
Line 1 - See jane run!
Line 2 - See spot run!
Line 3 - See mom run!
Line 4 - See dad run!
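If you would rather count from 1, <code>enumerate()</code> accepts an optional <code>start</code> argument. A short sketch, collecting the lines in a list (the <code>lines</code> variable is just for this example):

```python
name_list = ['dick', 'jane', 'spot', 'mom', 'dad']

lines = []
# start=1 makes the index begin at 1 instead of 0
for i, name in enumerate( name_list, start=1 ):
    lines.append( "Line " + str(i) + " - See " + name + " run!" )

for line in lines:
    print( line )
```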


## Nested <code>for</code> loops

You can put for loops inside other for loops. For example, here's a brute-force way to create a multiplication table.

In [None]:
import numpy as np

In [61]:
all_rows = []

for i in range(1,13):
    a_row = []
    for j in range(1,13):
        a_row.append( i * j )
    all_rows.append( a_row )

In [62]:
all_rows

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
 [6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
 [7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
 [9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
 [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
 [11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
 [12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]]
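Nested loops are also how you walk back through a nested list. Below is a sketch that uses a smaller 3 x 3 table to keep the output short, and counts the even entries. The names <code>small_rows</code> and <code>even_count</code> are made up for this example.

```python
# Build a 3 x 3 multiplication table the same way as above
small_rows = []
for i in range(1, 4):
    a_row = []
    for j in range(1, 4):
        a_row.append( i * j )
    small_rows.append( a_row )

# Outer loop visits each row, inner loop visits each value in that row
even_count = 0
for row in small_rows:
    for value in row:
        if value % 2 == 0:
            even_count += 1

print( small_rows )
print( even_count, "entries are even." )
```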

## NumPy arrays

* Rather than nested lists, you can use a matrix.
* Use one if all of your data is of the same type (ints, floats, bools).
* Row indices and column indices count from 0!
* NumPy matrices have basic statistics built in.
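A small sketch of the built-in statistics, using a made-up 2 x 3 matrix:

```python
import numpy as np

small_table = np.array( [[1, 2, 3],
                         [4, 5, 6]] )

# Statistics over the whole matrix
overall_mean = small_table.mean()
overall_max = small_table.max()

# axis=0 collapses the rows, giving one total per column
column_totals = small_table.sum( axis=0 )

# axis=1 collapses the columns, giving one total per row
row_totals = small_table.sum( axis=1 )

print( overall_mean, overall_max )
print( column_totals )
print( row_totals )
```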

In [63]:
# import the package and give it a nickname
import numpy as np

### Initialize a new matrix from a nested list

In [66]:
mult_table = np.array( all_rows )

In [67]:
mult_table

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

### The <code>.shape</code> attribute

In [70]:
mult_table.shape

(12, 12)

### Declare an empty matrix of a fixed size

In [None]:
np.zeros((12,12))

In [None]:
np.empty((12,12))
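The two functions differ in what the new matrix contains. A sketch with a smaller shape (the variable names are only for illustration):

```python
import numpy as np

# np.zeros fills every entry with 0.0
zeros_matrix = np.zeros( (2, 3) )

# np.empty only reserves the memory; treat the initial contents as garbage
scratch_matrix = np.empty( (2, 3) )
scratch_matrix[:, :] = 7   # overwrite every entry before using it

print( zeros_matrix )
print( scratch_matrix )
```

<code>np.empty()</code> is only safe if you are going to fill in every entry yourself, as the indexing loop in the next section does.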

### Indexing on NumPy Arrays

* Use brackets <code>[]</code>
* Inside brackets, row/column indices are separated by a comma

In [None]:
for i in range(1,13):
    for j in range(1,13):
        mult_table[ i-1, j-1 ] = i * j

mult_table
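Indexing also works for reading values, and the slicing you know from lists carries over. A sketch on a made-up 3 x 3 matrix:

```python
import numpy as np

small = np.array( [[1, 2, 3],
                   [2, 4, 6],
                   [3, 6, 9]] )

# A single element: row 1, column 2 (both count from 0)
one_value = small[1, 2]

# A single row
first_row = small[0]

# A single column: the colon means "every row"
first_column = small[:, 0]

# A sub-matrix: rows 0-1 and columns 0-1
top_left = small[0:2, 0:2]

print( one_value )
print( first_row )
print( first_column )
print( top_left )
```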

### Missing data

* Oftentimes, missing data is represented as <code>np.nan</code>, which stands for Not a Number.
* There is no missing-data representation for integers; <code>np.nan</code> is a float, which is why the table is converted to floats below.
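To see why the conversion to float matters, here is a sketch of what happens if you try to store <code>np.nan</code> in an integer array. It uses <code>try</code>/<code>except</code>, which we haven't covered yet, only to catch the error so the rest of the cell can run.

```python
import numpy as np

int_array = np.array( [1, 2, 3] )

# Integer arrays cannot hold np.nan, so this assignment raises an error
assignment_failed = False
try:
    int_array[0] = np.nan
except ValueError as err:
    assignment_failed = True
    print( "Could not store nan:", err )

# Converting to float first makes room for the missing value
float_array = int_array.astype(float)
float_array[0] = np.nan
print( float_array )
```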

In [87]:
decimal_mult_table = mult_table.astype(float)

In [None]:
decimal_mult_table

In [89]:
decimal_mult_table[0]

array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.])

In [90]:
decimal_mult_table[0].sum()

78.0

In [91]:
decimal_mult_table[0, 11]

12.0

In [93]:
decimal_mult_table[0, 11] = np.nan

In [95]:
decimal_mult_table[0]

array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        nan])

In [94]:
decimal_mult_table.sum()

nan

In [99]:
np.nansum( decimal_mult_table[0] )

66.0
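A couple of related tools, sketched on a made-up row: <code>np.isnan()</code> marks the missing entries with True, and since True counts as 1, summing the result counts them. <code>np.nanmean()</code> works like <code>np.nansum()</code> but takes the average.

```python
import numpy as np

row = np.array( [1.0, 2.0, np.nan, 4.0] )

# True wherever a value is missing
missing_mask = np.isnan( row )
print( missing_mask )

# Summing the True/False values counts the missing entries
num_missing = missing_mask.sum()
print( num_missing )

# nan-aware statistics skip the missing entries
row_total = np.nansum( row )
row_mean = np.nanmean( row )
print( row_total, row_mean )
```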

## Pandas DataFrame

* Emulates R's <code>data.frame</code> structure.
* Basically a NumPy matrix that
    * Has row and column names
    * Can have columns of different types
    * Handles missing data better

In [1]:
import pandas as pd

In [None]:
df1.dropna(how='any')
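The notes don't show how <code>df1</code> was created, so here is a minimal sketch that builds a small DataFrame from a dictionary of made-up columns (the column names and values are assumptions for illustration) and then drops the rows with missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column per key, each with its own type
df1 = pd.DataFrame( {
    'name'   : ['dick', 'jane', 'spot'],
    'age'    : [8, 7, np.nan],        # spot's age is missing
    'is_dog' : [False, False, True],
} )

print( df1 )
print( df1.dtypes )

# how='any' drops every row that has at least one missing value
cleaned = df1.dropna( how='any' )
print( cleaned )
```

Note that <code>dropna()</code> returns a new DataFrame and leaves <code>df1</code> itself unchanged.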