# Python for Data Analysis and HPC
## PHIT P8010 Fundamentals of High Performance Computing
Daniel Bauer (bauer@cs.columbia.edu)

February 2, 2016

## Acknowledgment
Many of the materials in this segment are courtesy of Larry Stead.

## Overview

Objectives: 

1. Introduction to Programming in Python
2. Data Analysis with Python: 
    - *Numpy* and *pandas*
    - Visualization with *Matplotlib*
3. Python and HPC
    - PBS scripts with Python.
    - Parallelism with IPython on the HPC cluster.
    - SMP processing with the *multiprocessing* module.

Each section will take about 1h. Short breaks in between. Small lab exercises.

# Part 1: Introduction to Programming in Python

## Python Overview

In [104]:
import antigravity

**What is Python?**
- A versatile, interpreted, high-level, dynamically typed programming language.
- Multi paradigm: simple procedural programming, object-orientation and functional programming.
- Scales well to different applications.
- Great developer community. Easy to get help.
- Lots of available libraries.
- Popular in science (data analytics etc.), web development, application development, scripting language as part of other software, ...

**Python design goals**

Easy to learn and use:
- automatic memory management
- high-level built-in data structures
- batteries included: Large standard library
- Code is often much faster to write than equivalent Java/C++ code

Readability:
- intuitive syntax
- minimal boilerplate

Dynamic Behavior:
- interpreted language
- dynamic typing
- introspection

Portable: 
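- different implementations: *CPython* (the reference interpreter), *Jython*, *IronPython*, *PyPy*; *Cython* is a related compiler that turns Python-like code into C extension modules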

Extendable: 
- Reusable code: Modules and Packages.
- All of Python is open source.
- Easy to write new modules in C.

**Criticism and Misconceptions**
- *“Python is a scripting language”*
   - False. Python has been used as a scripting language, but it is also used to develop large stand-alone applications.
- *“Python is interpreted, thus slower than running native code”*: somewhat true, but
   - Python can be used to ‘glue’ together native modules.
   - Libraries are often very efficient.
- *“Whitespaces are ugly.”*
   - You’ll get used to it.
- *“Dynamic typing is unsafe.”*
   - Python is strongly typed: operations on incompatible types raise errors at runtime instead of silently misbehaving.

**When not to use Python**
- When implementing low-level routines of CPU bound programs.
- In large collaborative projects?
     - Problem of dynamic typing.
     - Needs good documentation / workflow.
     
    
## Python Versions
Two branches: 
- Python 2: The final release series is 2.7 (current release: 2.7.10)
   - Some libraries not ported to Python 3.
- Python 3 (**used in this class**): Current release: 3.5.1 
   - Some major changes and clean-ups
   - Not backward compatible (often cannot execute 2.x code)
   - Ongoing development
   
## Installing and Running Python on the Systems Biology Cluster   
Connect to the cluster:

> ssh [username]@login.c2b2.columbia.edu 

or use *PuTTY* on Windows.
  
The standard Python version on the cluster is old (2.6). More up-to-date versions are installed in /nfs/apps/python. To select one of these versions as your Python environment (we use the variable name PYTHONDIR rather than PYTHONPATH, which Python itself reads as its module search path): 
> \$ export APPPATH=/nfs/apps  
> \$ export PYTHONDIR=\$APPPATH/python/3.3.3  
> \$ export PATH=\$PATH:\$PYTHONDIR/bin  

You can also add these three lines to your *.bashrc* file to make the change permanent. You can then run the interactive Python interpreter by typing 
> \$ python3  
> Python 3.3.3 (default, Jan 17 2014, 16:22:21)  
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux  
> Type "help", "copyright", "credits" or "license" for more information.  
> \>\>\>  

## Interactive Mode vs. Running Programs on the Command Line
Type an expression into the interactive interpreter to evaluate it and display the result:

In [6]:
7*6

42

We can also assign results to variables and reuse them. Once defined, variables remain valid throughout the session.

In [7]:
x = 7*6
y = x - 19
y

23

*IPython* (the environment you are reading this in) is another type of interactive environment.  

Alternatively, we can save a program in a file (with the suffix .py) and run it from the shell:

> \$ nano cube.py  
> \$ python3 cube.py

In [13]:
def cube(i):
    """
    Compute i^3
    """
    return i * i * i  # Compute and return the result

# By the way, this is a comment.
i = 7
print("Hello World! The cube of " + str(i) + " is " + str(cube(i)))

Hello World! The cube of 7 is 343


Other interpreters can be faster than the standard CPython interpreter (notably [PyPy](http://pypy.org/)).

## Elementary Python Syntax - Whitespaces and Blocks
In Python, indentation levels and line breaks are syntactically relevant!
- Single most hated Python feature.
- Actually useful: enforces readable code.
- Warning: Do not indent with tab characters; use 4 spaces per level. Python 3 rejects code that mixes tabs and spaces inconsistently.

In C/C++/Java:

```
while (x==1) {
  if (y) {
    f1();
  }
  f2();
}
```

In Python:
```
while x == 1:
    if y:
        f1()
    f2()
```
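Because indentation is part of the syntax, a block that is not indented is rejected before the code ever runs. A minimal sketch that checks this with the built-in ```compile``` function (the source string is made up for illustration):

```python
# Source code whose if-block is missing its indentation.
bad_source = "if True:\nprint('hi')\n"

try:
    compile(bad_source, '<example>', 'exec')
    error_name = None
except IndentationError as err:  # IndentationError is a subclass of SyntaxError
    error_name = type(err).__name__

print(error_name)  # IndentationError
```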

## Basic Data Types

**Elementary types and constants**:

In [18]:
x = None # None type 
t,f = True, False # Boolean
n = 42 # Integer
p = 3.14 # Float

**Container Types - Overview**:

In [21]:
s = 'Hello World' # String, sequence of characters. 
l = [1, 2, 3, 17.2, 'python', None] # List, ordered (mutable) sequence of objects
tup = ('Doe', 'Jane', 29, 'New York', True) # Tuple, immutable ordered sequence
d = {1:'A', 2:'B', 3:'A'} # Dictionary, maps keys (1, 2, 3) to values ('A', 'B', 'A')

**Other types:** 
In Python everything is an object and every object has a type.

* File objects  
* Instance objects of programmer-defined classes.  
* Functions are objects too!
* Classes are objects too!
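The built-in ```type``` function returns the type of any object, including functions and classes. A small sketch (```type_name``` is a helper name chosen for this example):

```python
def type_name(obj):
    """Return the name of the type of any object."""
    return type(obj).__name__

values = [None, True, 42, 3.14, 'Hello', [1, 2], (1, 2), {1: 'A'}, len, type_name, int]
names = [type_name(v) for v in values]
print(names)
# ['NoneType', 'bool', 'int', 'float', 'str', 'list', 'tuple', 'dict',
#  'builtin_function_or_method', 'function', 'type']
```

The last entry shows that a class such as ```int``` is itself an object, whose type is ```type```.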

## Python Memory Model
Variables are simply names that refer to objects in memory. Variables **do not** have types, but objects do.


In [4]:
a = 42
b = "fourtytwo"
c = 123.1

<img width="300px" src="files/memory_model-0.png">

We can reassign variables to objects of a different type. **Dynamic typing**

In [5]:
a, b = b, a

<img width="300px" src="files/memory_model-1.png">

In [None]:
c = None

<img width="300px" src="files/memory_model-2.png">
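When an object is no longer referenced by any variable, it is removed from memory automatically (CPython frees it as soon as its reference count drops to zero; a **garbage collector** additionally cleans up reference cycles).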
<img width="300px" src="files/memory_model-3.png">
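We can observe this with a weak reference, which points to an object without keeping it alive. A minimal sketch, assuming CPython (other implementations such as PyPy may free the object later); ```Box``` is a throwaway class for this example:

```python
import weakref

class Box(object):
    pass

b = Box()
ref = weakref.ref(b)  # a weak reference does not keep the object alive
print(ref() is b)     # True: the object is still referenced by b

del b                 # remove the only (strong) reference
print(ref())          # None in CPython: the object was freed immediately
```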

Python has **mutable** and **immutable** objects.
Objects of mutable types can be modified (lists, dictionaries, sets, instance objects):

In [10]:
l = [1, 2, 3, 17.2, 'python', None] # lists are mutable
l[5] = True
l

[1, 2, 3, 17.2, 'python', True]

In [12]:
tup = ('Doe', 'Jane', 29, 'New York', True) # tuples are immutable
tup[0] = 'Bob'

TypeError: 'tuple' object does not support item assignment

Be careful when multiple variables are assigned to a mutable object! 

In [14]:
l2 = l # Now l2 and l point to the same object
l2 

[1, 2, 3, 17.2, 'python', True]

In [16]:
l[0] = 'hello'
l2 # Because l2 points to the same object as l, the output changes.

['hello', 2, 3, 17.2, 'python', True]

**Object and Value Equality**  

We can test if two variables are assigned to the same object.

In [18]:
l is l2

True

In [20]:
l3 = ['hello',2,3,17.2,'python',True]
l3 is l2 # Not the same objects as l2

False

In [21]:
l3 == l2 # But the same value.

True
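To avoid accidental sharing, make a copy instead of assigning a second name. A short sketch contrasting an alias, a shallow copy (```list(...)```), and a deep copy from the standard ```copy``` module:

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                 # same object
shallow = list(original)         # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer and inner lists

original[0].append(99)

print(alias is original, shallow is original)  # True False
print(shallow[0])  # [1, 2, 99]: the inner lists are shared
print(deep[0])     # [1, 2]: a fully independent copy
```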

## Control Flow

**Conditional statements:** These work similarly to other languages.

```
if conditionExp1: 
  statement1
  ...
elif conditionExp2:
  statement2
  ...
elif conditionExp3:
  statement3
  ...
...
else: 
  statement4
```

In [35]:
if l3[0] == "hello" and b==42:
    print("Hi.")
else: 
    print("Bye.")

Hi.


Besides ```False``` itself, the following values also evaluate to False in a condition: the numbers 0 and 0.0, ```None```, and the empty list/string/tuple/dictionary/set.  
All other objects evaluate to True.
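The built-in ```bool``` function shows how a value behaves in a condition. A quick check over a few example values:

```python
falsy = [False, 0, 0.0, None, '', [], (), {}, set()]
truthy = [True, 1, -3, 'a', [0], (None,), {'k': 0}]

print([bool(v) for v in falsy])   # all False
print([bool(v) for v in truthy])  # all True: non-empty containers are true, whatever they contain
```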

In [45]:
print('type a positive integer number:')
n = int(input())  # Read a number

if not n:
    print('Error: n was 0.')
elif n < 0:
    print('Error: n is negative.')
elif n == 1:
    print('1 is not prime.')
elif n == 2 or n == 3 or n == 5 or n == 7:
    print(str(n) + ' is prime.')
elif not n % 2 or not n % 3 or not n % 5 or not n % 7:
    print(str(n) + ' is not prime.')
else:
    # Only reliable for n < 121 (= 11*11): larger numbers need more divisors.
    print(str(n) + ' is prime.')

type a positive integer number:
83
83 is prime.


Conditionals that simply choose between two values can be written more concisely as conditional expressions.

In [48]:
n=-1
if n < 0:  
    result = n + 1
else: 
    result = n - 1
result    

0

In [49]:
n + 1 if n < 0 else n - 1  # conditional expression

0

**While loops:**
The *while* loop repeats a block as long as its condition is True. *For* loops iterate over sequences and other iterables; they are discussed below.

In [52]:
x = int(input())  # Read a number

count = 0
while x > 0:  # repeat the next block while the condition is True
    x = x // 2  # integer division; x / 2 would produce a float in Python 3
    count += 1

print('approximate log_2: ' + str(count))

19
approximate log_2: 5
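The loop above relies on integer division. In Python 3, ```/``` always returns a float, while ```//``` performs floor division. A quick comparison:

```python
print(19 / 2)    # 9.5: / always returns a float in Python 3
print(19 // 2)   # 9: // is floor (integer) division
print(-19 // 2)  # -10: // rounds toward negative infinity
print(19 % 2)    # 1: the remainder, so (19 // 2) * 2 + 19 % 2 == 19
```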


In [53]:
x = 5
while x:
    x -= 1
    if x % 2:  # x is odd
        continue  # continue jumps to the next iteration of the loop, skipping remaining lines
    print(x)

4
2
0


In [54]:
x = 10
while True:  # infinite loop
  print(x)
  x -= 1
  if x == 7:
    break

10
9
8
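With a while loop we can also generalize the prime test from the conditionals section, which only tries the divisors 2, 3, 5 and 7 and is therefore only reliable below 121. A minimal sketch using trial division (```is_prime``` is a name chosen for this example, not a library function):

```python
def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # a composite n always has a divisor d with d*d <= n
        if n % d == 0:
            return False
        d += 1
    return True

print([k for k in range(1, 30) if is_prime(k)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Stopping at the square root keeps the number of loop iterations small even for larger inputs.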



## Lists, Tuples, and Dictionaries

**Sequence Operations**: Strings, lists, and tuples are sequences: objects that contain ordered elements.
All sequences support a shared set of operations.

In [56]:
lx = [1]
ly = [2,3,4,5]
len(ly) # get length

4

In [57]:
len("supercalifragilisticexpialidocious") # number of chars in string

34

In [58]:
lx + ly # concatenation, creates a NEW sequence of the same type

[1, 2, 3, 4, 5]

In [59]:
lx

[1]

In [62]:
lx + "hi" # only works for sequences of the same type

TypeError: can only concatenate list (not "str") to list

In [61]:
3 * ('A',) # Repetition

('A', 'A', 'A')

In [63]:
3 in ly # Testing for membership

True

In [65]:
'tuna' in 'fortunate' # For strings, we can also search for substrings.
# This does not work for other sequence types.

True

In [66]:
'banana'.count('a') # Count number of occurrences

3

In [67]:
'banana'.count('an') # Number of non-overlapping occurrences of a substring

2

In [68]:
(23,5,8.5).index(5) # return the index of this element (starting at 0)

1
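Because all sequences share these operations, one function can handle strings, lists, and tuples alike. A small sketch (```summarize``` is a hypothetical helper for illustration):

```python
def summarize(seq, item):
    """Works for any sequence type: returns (length, membership, count)."""
    return len(seq), item in seq, seq.count(item)

print(summarize('banana', 'a'))    # (6, True, 3)
print(summarize([1, 2, 1], 1))     # (3, True, 2)
print(summarize((23, 5, 8.5), 7))  # (3, False, 0)
```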

**Sequence Indexing and Slicing**: 

In [72]:
x = [(1, 2, 3), 'foo', 1.0]
x[1]  # Indexing starts at 0

'foo'

In [73]:
x[0][2] # Nested indexing.

3

In [75]:
x[-1] # Reverse indexing

1.0

Slicing returns a copy of a subsequence.

* ```x[i:j]``` returns the subsequence from index i (inclusive) to index j (exclusive)
* ```x[i:]``` returns the subsequence from index i (inclusive) to the end
* ```x[:j]``` returns the subsequence from the beginning to index j (exclusive)

In [77]:
x = [0,1,2,3,4]
x[1:]

[1, 2, 3, 4]

In [78]:
x[:-2] # Can use reverse indexing in slices

[0, 1, 2]

In [79]:
x[1:3]

[1, 2]
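Slices also accept an optional third value, the step, and an empty slice ```x[:]``` is a common idiom for copying a list. A few examples:

```python
x = [0, 1, 2, 3, 4]
print(x[::2])         # [0, 2, 4]: every second element
print(x[::-1])        # [4, 3, 2, 1, 0]: a negative step walks backwards
print('python'[1:4])  # yth: slicing works on strings too

y = x[:]              # a full slice is a (shallow) copy
y[0] = 'changed'
print(x[0])           # 0: the original list is unchanged
```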

**Iterating through Sequences with for-Loops**

In [86]:
for x in [1,2,3,4,5]:
  if x % 2 == 0:
    print(x)

2
4


The **range** function can be used to iterate through a sequence of numbers. In Python 3, ```range``` returns a lazy range object that produces the numbers on demand, so the sequence of numbers is not stored explicitly in memory.

In [87]:
for x in range(10): 
  print(x)

0
1
2
3
4
5
6
7
8
9


In [88]:
for x in range(-3,2):
  print(x)

-3
-2
-1
0
1


In [91]:
for x in range(-4,4,2): # using a step parameter
    print(x)

-4
-2
0
2
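Since a range object only stores its start, stop, and step, even a huge range costs almost no memory, and length, indexing, and membership tests for integers are computed arithmetically. A quick sketch:

```python
r = range(0, 10**12, 2)       # no list of 500 billion numbers is built
print(len(r))                 # 500000000000
print(r[3])                   # 6
print(10 in r, 11 in r)       # True False
print(list(range(-4, 4, 2)))  # [-4, -2, 0, 2]: list() materializes the numbers
```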


**List Comprehensions**  
Common task in data processing: apply some function to each element of an iterable and produce a new list (mapping, filtering). Python supports a special notation for this.

In [92]:
v = [1,3,2,9,4,12,8]
[x*2 for x in v]

[2, 6, 4, 18, 8, 24, 16]

In [94]:
[x for x in v if x > 5] # with a filter condition

[9, 12, 8]

In [99]:
[x if x % 2==0 else "odd" for x in v] # using conditional expressions in comprehensions

['odd', 'odd', 2, 'odd', 4, 12, 8]

In [98]:
[(a,b) for a in range(1,3) for b in ['a','b']] # multiple for statements are nested

[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
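A comprehension is shorthand for a for loop that appends to a new list. The following sketch spells out the filtered comprehension from above as an explicit loop and checks that both give the same result:

```python
v = [1, 3, 2, 9, 4, 12, 8]

result = []
for x in v:                   # loop over the input
    if x > 5:                 # filter condition
        result.append(x * 2)  # mapped value

print(result)                                  # [18, 24, 16]
print(result == [x * 2 for x in v if x > 5])   # True
```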

**Other list operations**: Lists have a number of operations not supported by other sequence types.  

Lists are mutable and can be modified:

In [113]:
a = ['apples', 'pears']
a[1] = 'oranges'
a

['apples', 'oranges']

In [114]:
a.append('bananas') # adding to a list
a

['apples', 'oranges', 'bananas']

In [115]:
x = a.pop() # remove last element from the list and return it.
# Lists can be used as Stacks
x

'bananas'

In [116]:
a

['apples', 'oranges']

In [119]:
l = ['apple', 'orange', 'banana', 'orange']
l.remove('orange') # remove first occurrence
l

['apple', 'banana', 'orange']

In [120]:
l.reverse()
l

['orange', 'banana', 'apple']

In [122]:
l.sort() # sort the list (in this case alphabetically)
l

['apple', 'banana', 'orange']
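Note that ```sort()``` modifies the list in place and returns ```None```. The built-in ```sorted``` function instead returns a new list and accepts any iterable. A short comparison:

```python
fruits = ['orange', 'banana', 'apple']
returned = fruits.sort()  # sorts in place
print(returned)           # None
print(fruits)             # ['apple', 'banana', 'orange']

words = ('pear', 'fig', 'kiwi')     # works for any iterable, even immutable ones
print(sorted(words))                # ['fig', 'kiwi', 'pear'] (a new list)
print(sorted(words, reverse=True))  # ['pear', 'kiwi', 'fig']
```

A common bug is writing ```l = l.sort()```, which replaces the list with ```None```.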

**Dictionaries**

A dictionary is a collection of objects (values) indexed by *unique* keys.
This data structure is extremely powerful and one of the reasons why Python is great for data analysis.

*Important:* Keys must be hashable, so lists, dictionaries, and other mutable built-in objects cannot be used as keys (immutable objects such as strings, numbers, and tuples of these can).

In [123]:
legs = {'cat':4, 'human':2, 'centipede':100}  # animals are keys (unique), numbers are values.
legs['cat'] # indexing by key

4

In [125]:
legs['python'] = 0 # adding a new key/value pair into the dictionary
legs

{'cat': 4, 'centipede': 100, 'human': 2, 'python': 0}

In [129]:
legs['centipede'] = 30 # overwriting value for a key
legs

{'cat': 4, 'centipede': 30, 'human': 2, 'python': 0}
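Tuples are a convenient way to build composite keys, while a list as a key raises a ```TypeError```. A small sketch with made-up grid coordinates as keys:

```python
grid = {(0, 0): 'start', (2, 3): 'goal'}  # tuples (immutable) are valid keys
print(grid[(2, 3)])                        # goal

try:
    bad = {['x', 'y']: 1}                  # lists are mutable, hence unhashable
    message = None
except TypeError as err:
    message = str(err)

print(message)                             # unhashable type: 'list'
```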

A very common pattern is to iterate through all keys in the dictionary to do something with the values.

In [131]:
for key in legs: 
   print('A '+ key + ' has ' + str(legs[key]) + ' legs.')

A python has 0 legs.
A centipede has 30 legs.
A human has 2 legs.
A cat has 4 legs.
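The keys come out in no guaranteed order here. The ```items()``` method yields key/value pairs directly, and ```get()``` returns a default instead of raising a ```KeyError``` for missing keys. A short sketch using the same dictionary:

```python
legs = {'cat': 4, 'human': 2, 'centipede': 30, 'python': 0}

# items() yields (key, value) pairs; sorted() fixes the order for printing
for animal, n in sorted(legs.items()):
    print('A ' + animal + ' has ' + str(n) + ' legs.')

# get() returns a default value for a missing key
print(legs.get('spider', 'unknown'))  # unknown
```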


### Exercises 1
1 - Use two nested for loops to compute the value 
of expressions of the form (a[1]+...+a[m])*(b[1]+...+b[n]).
For example for a = [1, 2, 4] and b = [2,3] the result should be 35.

In [134]:
a = [1,2,4]
b = [2,3]

2 - In the following list, every tuple describes a part-time employee in the format 
```(name, hours_worked_this_week, hourly_wage)```. Write a single list comprehension that produces a list of tuples in the format ```(name, total_pay)```, where ```total_pay``` is the product of ```hours_worked_this_week``` and ```hourly_wage```.

In [140]:
employees = [('Bob', 40, 18.25), ('Mary', 10, 20.00), ('John', 0, 100.90), ('Carl', 19, 17.21), ('Meg', 60, 22.10)]
expected_result = [('Bob', 730.0),('Mary', 200.0),('John', 0.0),('Carl', 326.99),('Meg', 1326.0)]

3 - Given the following dictionary mapping fruits to their color, write a program that creates a dictionary that maps colors to lists of fruits with this color. Hint: Make sure to initialize lists before you add fruits to them.

In [142]:
fruit_to_color = {'banana':'yellow',
                  'blueberry':'blue',
                  'cherry':'red',
                  'lemon':'yellow',
                  'kiwi':'green',
                  'strawberry':'red',
                  'tomato':'red'}

In [149]:
expected_result = {'blue': ['blueberry'],
                   'green': ['kiwi'],
                   'red': ['tomato', 'cherry', 'strawberry'],
                   'yellow': ['lemon', 'banana']}

## Strings

Much of the data you will process comes as strings. Even numeric data often needs to be converted from strings. In addition to all sequence operations, Python supports a number of additional operations on strings.

Strings can be defined using single or double quotes.

In [150]:
"Hello 'world'"

"Hello 'world'"

In [151]:
'Hello, "world"'

'Hello, "world"'

In [152]:
'hpc'.capitalize()

'Hpc'

In [153]:
'hpc'.upper() 

'HPC'

In [154]:
'HpC'.lower() # commonly used for normalization

'hpc'

In [155]:
'python'.startswith('py')

True

In [156]:
'python'.endswith('on')

True

**Splitting and joining**

In [158]:
langs = "python,java,lisp,haskell"
langs

'python,java,lisp,haskell'

In [159]:
langs.split(',') # Split the string on the ',' character

['python', 'java', 'lisp', 'haskell']

In [163]:
q = "An African \t or European\n swallow?"
print(q)

An African 	 or European
 swallow?


In [165]:
q.split() # Default: split on runs of whitespace, including tabs and line breaks

['An', 'African', 'or', 'European', 'swallow?']
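Calling ```split()``` without an argument behaves differently from splitting on a single space character, which matters for messy input. A quick comparison:

```python
s = 'a  b\tc'
print(s.split())     # ['a', 'b', 'c']: runs of any whitespace count as one separator
print(s.split(' '))  # ['a', '', 'b\tc']: every single space is a separator
```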

In [168]:
data = ['A','B','C']
','.join(data) # Good for producing CSV (comma separated value) files.

'A,B,C'

In [171]:
data = [27, 5.1, 'female', True]  # Careful: All entries need to be strings
','.join( [str(x) for x in data] )

'27,5.1,female,True'
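Reading such data reverses these steps: strip the line break, split on the delimiter, and convert each field. A minimal sketch, assuming the four fields have the types of the ```data``` list above:

```python
line = '27,5.1,female,True\n'      # a line as it would be read from a file
fields = line.strip().split(',')  # ['27', '5.1', 'female', 'True']

# Convert each field back to its type; bool('False') would be True,
# so the last field is compared with 'True' instead.
record = [int(fields[0]), float(fields[1]), fields[2], fields[3] == 'True']
print(record)                             # [27, 5.1, 'female', True]
print(','.join(str(x) for x in record))   # 27,5.1,female,True
```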

## File I/O

To read from or write to a file, it first needs to be opened.

The value of f below is a file object (an instance of class \_io.TextIOWrapper) that provides read and write methods.

In [5]:
f = open('testfile.txt', 'w') # open the file for writing
f.write("Hello world!\n") #write something to the file.
f.write("Hi!\n") #write something to the file.
f.close() # close the file

'w' is the mode in which the file is opened. 

* 'r' = read
* 'w' = write (overwrite the entire file if it exists)
* 'a' = append (open the file to write at the end)



For binary files (images, serialized data, etc.), append 'b' to the mode flag. In binary mode, the file object reads and writes ```bytes``` instead of ```str```, and no text encoding or newline translation is applied.

In [7]:
f = open('data.bin', 'wb')
f.close()
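A small round trip in binary mode. This sketch writes into a fresh temporary directory (via the standard ```tempfile``` module) so that no existing file is overwritten:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.bin')

f = open(path, 'wb')
f.write(bytes([0, 1, 2, 255]))  # binary mode expects bytes, not str
f.close()

f = open(path, 'rb')
data = f.read()
f.close()
print(data)        # b'\x00\x01\x02\xff'
print(type(data))  # <class 'bytes'>
```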

**Reading from a file**: 

In [8]:
f = open('testfile.txt', 'r') # open file for reading
line = f.readline()
while line: 
    print(line)
    line = f.readline()
f.close()

Hello world!

Hi!



In [9]:
f = open('testfile.txt','r')
for line in f:  # this is easier for text files
    print(line)
f.close()

Hello world!

Hi!



In [10]:
f = open('testfile.txt','r')
lines = f.readlines() # get all lines at once... bad for big files
f.close()
lines

['Hello world!\n', 'Hi!\n']

In [12]:
f = open('testfile.txt','r')
content = f.read() #read gets entire file
content

'Hello world!\nHi!\n'

In [13]:
f.seek(0) # seek jumps to specific position in the file. Rarely used.
s = f.read(5) # read 5 characters
while s: 
    print(s)
    s = f.read(5)
f.close()

Hello
 worl
d!
Hi
!



**Writing to a file**:

In [14]:
f = open('testfile.txt', 'w') # open the file for writing
f.write("Hello world!\n") #write something to the file.

lines = ["Hi","world"]
f.writelines(lines) # write a sequence of strings; note that no line breaks are added

f.flush() # explicitly flush the buffer, writing everything to disk

f.close() # close the file, also flushes the buffer

The ```with``` statement is a more elegant way to open a file:
  * The file is closed automatically when the block ends.
  * This also happens if an error occurs inside the block.

In [103]:
with open('testfile.txt','r') as f:
    lines = f.readlines() # or whatever other file operation you need...
    for l in lines:
        print(l)

Hello world!

Hiworld
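The output ```Hiworld``` appears because ```writelines``` writes each string exactly as given. A minimal sketch that adds the line breaks explicitly and uses ```with``` for both writing and reading (the file goes into a temporary directory, an assumption made so the example is safe to run):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'testfile.txt')

lines = ['Hi', 'world']
with open(path, 'w') as f:
    # writelines() writes the strings as-is, so add the line breaks explicitly
    f.writelines(line + '\n' for line in lines)

with open(path) as f:  # mode 'r' is the default
    read_back = [line.rstrip('\n') for line in f]

print(read_back)  # ['Hi', 'world']
print(f.closed)   # True: the with block closed the file
```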


## Functions
A function is a subroutine that computes some result given its parameters.

* More readable code: Break up code into meaningful units.
* Avoid duplicate code.
* Can be shared through modules.
* Abstract away from concrete problem.
* Powerful computational device: allows recursion

In [22]:
import math

def pythagoras(leg_a, leg_b):
    """
    Compute the length of the hypotenuse
    opposite of the right angle between leg_a
    and leg_b.
    """
    hypotenuse = math.sqrt(leg_a**2 + leg_b**2)
    return hypotenuse

In [23]:
pythagoras(3.0, 4.0)

5.0

Functions are objects too! This is important, because they can be passed to other functions.

In [27]:
pythagoras

<function __main__.pythagoras>

In [30]:
x = pythagoras
x(4.0,3.0)

5.0
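Because a function is an ordinary object, another function can receive it as a parameter and call it. A small sketch (```apply_pairwise``` is a hypothetical helper, and ```pythagoras``` is repeated without its docstring so the block runs on its own):

```python
import math

def pythagoras(leg_a, leg_b):
    return math.sqrt(leg_a**2 + leg_b**2)

def apply_pairwise(func, pairs):
    """Call func(a, b) for every (a, b) pair and collect the results."""
    return [func(a, b) for a, b in pairs]

print(apply_pairwise(pythagoras, [(3.0, 4.0), (5.0, 12.0)]))  # [5.0, 13.0]
print(apply_pairwise(max, [(1, 7), (9, 2)]))                  # [7, 9]
```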

In [36]:
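from functools import cmp_to_key

x = [(1, 'b'), (4, 'a'), (3, 'c')]
def compare(a, b):  # comparison function: negative, zero, or positive
    return (a[1] > b[1]) - (a[1] < b[1])
x.sort(key=cmp_to_key(compare))  # Python 3 sort() no longer accepts a comparison function directly
x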

[(4, 'a'), (1, 'b'), (3, 'c')]

In [37]:
x = [(1 , 'b') ,(4 , 'a') ,(3 , 'c')]
#somewhat more readable
def getkey(a): 
    return a[1]
x.sort(key=getkey)
x

[(4, 'a'), (1, 'b'), (3, 'c')]

**lambda expressions** define function objects without a name (function literals).

In [38]:
x = [(1 , 'b') ,(4 , 'a') ,(3 , 'c')]
# even more readable
x.sort(key=lambda a: a[1])
x

[(4, 'a'), (1, 'b'), (3, 'c')]

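**recursion:** Functions can call themselves. This is an intuitive way to describe some algorithms, but it has caveats: every call uses a new stack frame, CPython limits the recursion depth (by default to about 1000 nested calls), and a loop is often faster.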

In [42]:
def fac(n):
    """
    Compute n!
    """
    if n == 0:  # base case
        return 1
    else:
        return n * fac(n - 1)

fac(10)

3628800
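The same computation can be written with a loop, which avoids the recursion limit. A minimal sketch (```fac_iterative``` is a name chosen for this example), checked against the standard library's ```math.factorial```:

```python
import math
import sys

def fac_iterative(n):
    """Compute n! with a loop instead of recursion."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(fac_iterative(10))                        # 3628800
print(fac_iterative(10) == math.factorial(10))  # True
print(sys.getrecursionlimit())  # the current limit (1000 by default in CPython)
```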

## Object Oriented Programming

Object Oriented Programming (OOP) is at the core of
Python.
* Everything is an object!
* Operations are methods on objects.
* Modularization.
* We have seen examples of objects already:
   * Objects of built-in data types (int, str, list, dict ... ).
   * Functions.
* Can create our own types (classes).
* Python does not enforce OOP (unlike Java), but we need to understand at least what is going on.

**Classes**: 
   * User defined types of objects (including their methods, attributes, relations to other objects).
   * Can be instantiated into an object / is a ‘blueprint’ that describes how to build an object.

In [57]:
class Knight(object):
    """
    A knight with two legs,
    who can eat food.
    """
    legs = 2  # class attribute, shared by all instances

    def __init__(self, name):  # constructor (initializer)
        self.stomach = []
        self.name = name

    def eat(self, food):  # method
        self.stomach.append(food)
        print('Nom nom.')

**Instance objects**: instances of classes. 

In [58]:
lancelot = Knight("Lancelot")

**Attributes:** data fields on each instance object.

In [59]:
lancelot.name

'Lancelot'

**Methods**: functions that belong to the object and can access
and manipulate the object’s data. All methods are attributes too.

In [60]:
lancelot.eat("pie")

Nom nom.


In [61]:
lancelot.stomach

['pie']

In [62]:
lancelot.eat

<bound method Knight.eat of <__main__.Knight object at 0x108e29450>>

* The first parameter (```self```) of a method is automatically filled in when the method is called on an instance object.
* ```__init__``` is a special method that is called after an instance is created (to set up instance data etc.).
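Calling a method on an instance is shorthand for calling the function stored in the class with the instance as the first argument. A small sketch with a simplified ```Knight``` (the print statement is left out):

```python
class Knight(object):
    def __init__(self, name):
        self.stomach = []
        self.name = name

    def eat(self, food):
        self.stomach.append(food)

lancelot = Knight('Lancelot')
lancelot.eat('pie')           # self is filled in automatically
Knight.eat(lancelot, 'spam')  # equivalent explicit call through the class
print(lancelot.stomach)       # ['pie', 'spam']
```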

**Inheritance**: 
 * Classes inherit from one or more base classes.
 * Look up methods and class attributes in base classes if not found in class.

In [74]:
class Person(object): # "object" is the base class of all classes.
    
    def __init__(self):
        self.stomach = []
    
    def eat(self, food): # method
        self.stomach.append(food)
        print('Nom nom.')   
        
class Knight(Person): #inherit from Person
    
    def __init__(self):
        super(Knight, self).__init__()
    
    def go_on_quest(self):
        print('I shall seek the holy grail.')

lancelot = Knight() 
lancelot.go_on_quest()

I shall seek the holy grail.


In [76]:
lancelot.eat('coconut')

Nom nom.
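The built-ins ```isinstance``` and ```issubclass``` reflect this relationship, and the class attribute ```__mro__``` lists the order in which classes are searched for methods. A sketch with simplified classes that return strings instead of printing:

```python
class Person(object):
    def eat(self, food):
        return 'Nom nom.'

class Knight(Person):
    def go_on_quest(self):
        return 'I shall seek the holy grail.'

lancelot = Knight()
print(isinstance(lancelot, Person))          # True: a Knight is also a Person
print(issubclass(Knight, Person))            # True
print([c.__name__ for c in Knight.__mro__])  # ['Knight', 'Person', 'object']
print(lancelot.eat('coconut'))               # found in Person via the lookup order
```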


Python classes can implement a number of **special methods** to give language-level functionality.

In [85]:
class Knight(object):
    
    def __init__(self, name):
        self.name = name
        self.stomach = []
    
    def eat(self, food):
        self.stomach.append(food)
        print("Nom nom.")
        
    def __str__(self): # special method that converts this object into a string
        return self.name + (" (hungry)" if not self.stomach else "")
    
lancelot = Knight("Lancelot")    
print(lancelot) # calls str(lancelot) implicitly, which calls __str__

Lancelot (hungry)


In [86]:
lancelot.eat('cookie')
print(lancelot)

Nom nom.
Lancelot


In [87]:
lancelot # default representation for an instance

<__main__.Knight at 0x108e3d650>

In [88]:
class Knight(object):
    
    def __init__(self, name):
        self.name = name
        self.stomach = []
    
    def eat(self, food):
        self.stomach.append(food)
        print("Nom nom.")
        
    def __repr__(self): # representation for this object to be displayed in the interactive interpreter    
        return self.name + (" (hungry)" if not self.stomach else "")

In [90]:
l = Knight("Lancelot")
l

Lancelot (hungry)
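Other special methods hook into built-in functions and operators. For example, ```__len__``` is called by ```len()``` and ```__eq__``` by ```==```. A sketch with a variant of the ```Knight``` class, where treating knights with the same name as equal is an assumption made for illustration:

```python
class Knight(object):
    def __init__(self, name):
        self.name = name
        self.stomach = []

    def eat(self, food):
        self.stomach.append(food)

    def __len__(self):        # called by len(knight)
        return len(self.stomach)

    def __eq__(self, other):  # called by knight1 == knight2
        return isinstance(other, Knight) and self.name == other.name

a, b = Knight('Lancelot'), Knight('Lancelot')
a.eat('cookie')
print(len(a), len(b))  # 1 0
print(a == b, a is b)  # True False: equal value, different objects
```

In Python 3, a class that defines ```__eq__``` without ```__hash__``` has unhashable instances, so these knights cannot be used as dictionary keys.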

## Modules and Packages

* Python programs can consist of multiple modules (in multiple files).
* Independent groupings of code and data.
* Can be re-used in other programs.
* Can depend on other modules recursively.
* So far we have used a single module:
   * Used the interpreter’s interactive mode.
   * Written single-file python programs.
* We have seen example modules: math, antigravity, ...

**Structure of a module**:
* A module corresponds to any Python source file (no special syntax).
* The module ‘name’ is typically in file ‘name.py’.
* File can contain class and function definitions, code.
* Can contain a docstring (a string literal as the first statement in the file).

```
"""
A module to illustrate modules (saved in the file example_module.py).
"""
class A(object):
    def __init__(self, *args):
        self.args = args

def quadruple(x):
    return 4 * x

x = 42
print("This is an example module.")
```

**Imports**:

In [93]:
import example_module # will run all the code at top level

This is an example module.


In [94]:
example_module.x

42

In [96]:
a = example_module.A()
a

<example_module.A at 0x108e32550>

Can also import a class/function/name directly.

In [99]:
from example_module import A
a = A()
a

<example_module.A at 0x108e292d0>

**Main Functions:**
   * Problem: Modules often contain some test code that we do
     not want to run every time the module is imported.
   * Modules contain a special attribute ```__name__```.
   * ```__name__ == '__main__'``` if this module is the first one loaded
    (i.e. passed to the interpreter on the command line).
   * Always use the following main function idiom:
   
```
def main():
    ...

if __name__ == "__main__":
    main()
```
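A complete version of the idiom, reusing the ```cube``` function from earlier. Saved as, say, cube.py (a file name assumed here), ```python3 cube.py``` prints 343, while ```import cube``` from another module only defines the functions:

```python
def cube(i):
    """Compute i^3."""
    return i * i * i

def main():
    # Test code: only runs when the file is executed as a script.
    print(cube(7))

if __name__ == '__main__':
    main()
```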

**Packages**:
  * Packages are modules that contain other modules as attributes.
  * Packages allow you to create trees of modules.
  * A package corresponds to a directory (e.g. the module graphtools.directed.tree is in the file graphtools/directed/tree.py).
  * Package directories should contain a file ```__init__.py``` holding the package's own module code (it may be empty). Since Python 3.3, directories without it can still be imported as "namespace packages".
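The following sketch builds the hypothetical graphtools/directed/tree.py package tree in a temporary directory and imports it. The package, the module, and its ```size``` function are made up for illustration:

```python
import importlib
import os
import sys
import tempfile

# Build graphtools/directed/tree.py inside a fresh temporary directory.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'graphtools', 'directed')
os.makedirs(package_dir)
for d in (os.path.join(root, 'graphtools'), package_dir):
    open(os.path.join(d, '__init__.py'), 'w').close()  # empty package files

with open(os.path.join(package_dir, 'tree.py'), 'w') as f:
    f.write('def size(nodes):\n    return len(nodes)\n')

sys.path.insert(0, root)     # let Python find the new package
importlib.invalidate_caches()  # make sure newly created files are noticed
tree = importlib.import_module('graphtools.directed.tree')
print(tree.size(['a', 'b', 'c']))  # 3
```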

## Exercises 2

The file files/WHO_first9cols.csv contains a simplified section of a data set collected by the World Health Organization, containing social, economic, health, and political indicators. (Source: http://www.exploredata.net/Downloads/WHO-Data-Set)

* Write a class ```CountryInfo``` whose instances represent one row of the data (one attribute per column).
* Write a function that reads the data set from a file and returns a dictionary mapping country names to their ```CountryInfo``` instances.
* Create a list of countries sorted by population.
* Create a list of countries with a literacy rate of less than 50%.

In [27]:
class CountryInfo:
    def __init__(self, country, cid, continent, fertility, literacy, gni, enrolmentf, enrolmentm, population):
        self.country = country
        self.cid = int(cid) if cid else None
        self.continent = int(continent) if continent else None
        self.fertility = int(fertility) if fertility else None
        self.literacy = float(literacy) if literacy else None
        self.gni = int(gni) if gni else None
        self.enrolmentf = int(enrolmentf) if enrolmentf else None
        self.enrolmentm = int(enrolmentm) if enrolmentm else None
        self.population = int(population) if population else None

countries = {}     
with open("files/WHO_first9cols.csv",'r') as data: 
    head = data.readline()
    line = data.readline()
    while line:        
        countryobj = CountryInfo(*line.strip().rsplit(",",8))
        countries[countryobj.country] = countryobj
        line = data.readline()
    

## Resources and References
  - Official Python documentation (2.7 and 3) http://docs.python.org/
  - Official Python tutorial. http://docs.python.org/tutorial/
  - Online Python Cookbook. http://code.activestate.com/recipes/langs/python/
  - Mark Pilgrim, Dive into Python 3. http://diveintopython3.ep.io/
  - Daniel's Fall 2014 Python class: http://www.cs.columbia.edu/~bauer/cs3101-1/lectures.html