# EPA1333 - Computer Engineering for Scientific Computing
## Week 4 - Sept 26, 2017

**Think Python** -
**How to Think Like a Computer Scientist**

*Allen B. Downey*


## A 2-dimensional Matrix

A two-dimensional matrix can be represented by a list of lists.
```
M = [ 
      [1,2,3], 
      [4,5,6], 
      [7,8,9]
    ]
```

Elements can be accessed using: M[x][y], where x is the row index and y is the column index.

Note that M[1,2] does **not** work with standard Python lists.
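
The difference between the two index forms can be checked directly. This is a minimal sketch; the names `value` and `error_name` are only illustrative and not part of the original notes.

```python
M = [
      [1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]
    ]

# Index the row first, then the column within that row.
value = M[1][2]
print(value)

# M[1, 2] passes the tuple (1, 2) as one single index,
# which a plain Python list does not accept.
try:
    M[1, 2]
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print('M[1, 2] raises', error_name)
```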


In [1]:
# Initialize Matrix
M = [ 
      [1,2,3], 
      [4,5,6], 
      [7,8,9] 
    ]

print(M)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


In [2]:
# Select element

M[2][1]       # Remember indices start at 0!

8

In [3]:
# Manipulating elements
M[2][0] += 10
M

[[1, 2, 3], [4, 5, 6], [17, 8, 9]]

In [4]:
# Let's write a function to add 2 matrices together, elementwise.
# M + M does not work!
# M + M will concatenate 2 lists, ending up in a bigger matrix (twice as many rows).


def add_matrix( M, N ):
    """Add two matrices elementwise. Both matrices must have the same dimensions!
    Return a new matrix."""
    
    result = []
    for i in range( len(M) ):        # number of rows in matrix
        new_row = []
        for j in range( len(M[0]) ):    # number of columns in matrix (nr of elements in a row)
            new_row.append( M[i][j] + N[i][j] )
        result.append(new_row)

    return result

In [5]:
add_matrix(M,M)

[[2, 4, 6], [8, 10, 12], [34, 16, 18]]

In [6]:
# Check adding empty Matrix
add_matrix([[]],[[]])

[[]]

However, using lists for multi-dimensional matrices is a bit awkward.
Selecting rows is possible, but selecting columns is not straightforward.
Slicing also does not work as expected: a second slice is applied to the list of rows again, not to the columns.


In [7]:
# Select first row as a slice
print(M)
M[:1]

[[1, 2, 3], [4, 5, 6], [17, 8, 9]]


[[1, 2, 3]]

In [8]:
# Now, what if we want the first column of that first row... Nope...
M[:1][:1]

[[1, 2, 3]]

In [9]:
# Or what if we want to select the last few columns ... Nope...
M[:1][1:]

[]

Later on we will see how NumPy will provide a more suitable n-dimensional array.
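
Until then, a column can still be extracted from a list of lists with an explicit loop. The sketch below assumes a rectangular matrix; `get_column` is a hypothetical helper name, not a built-in function.

```python
def get_column(M, j):
    """Return column j of matrix M (a list of lists) as a new list.
    Hypothetical helper, assuming every row has at least j+1 elements."""
    column = []
    for row in M:
        column.append(row[j])
    return column


M = [
      [1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]
    ]

first_column = get_column(M, 0)
print(first_column)

# To slice columns within one row, index the row first and slice second.
last_columns_of_first_row = M[0][1:]
print(last_columns_of_first_row)
```

Note that `M[0][1:]` indexes a single row and then slices it, which is different from `M[:1][1:]`, where both slices act on the list of rows.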

### Exercise: Transposition and Multiplication of matrices

The transposition of an (m x n) matrix is an (n x m) matrix in which the rows and columns have been swapped.

The multiplication of an (m x n) matrix and an (n x p) matrix is a new (m x p) matrix.

<div class="alert alert-success">
<h2>Exercises</h2>
Write a function transpose that takes a matrix (list of lists) and returns a new matrix that is the transposition of that matrix.

Write a function that returns the matrix multiplication of two matrices. Check whether the multiplication is
defined (the number of columns of the first matrix must match the number of rows of the second matrix).
</div>

## Ch 11: Dictionaries

A dictionary is a very versatile data structure. It consists of *(key, value)* pairs.

```python
d = dict()

d = { 'k1' : 'v1', 'k2' : 'v2' }

d['k3'] = 'v3'
```
  
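You can only look up values by key. There is no positional index: you cannot ask
for the first, second, etc. element. Since Python 3.7 a dictionary keeps the order in which keys were inserted; the outputs in this notebook appear to come from an older version, in which the order is arbitrary.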

To traverse a dictionary you can get a (sorted) list of keys, a list of values or
a list of key, value pairs.

In [10]:
# Example 1: dictionaries are a type

d = {}  # empty dictionary

print( type(d ) )
 

<class 'dict'>


In [12]:
# initializing dictionaries

a = { 1 : 'one', 2 : 'two' , 3 : 'three' }

# values can be anything, including lists and dictionaries
b = { 'a' : [1,2,3], 'b' : a }

print(a)
print(b)

{1: 'one', 2: 'two', 3: 'three'}
{'b': {1: 'one', 2: 'two', 3: 'three'}, 'a': [1, 2, 3]}


In [13]:
# Accessing elements of a dictionary
a = { 1 : 'one', 2 : 'two' , 3 : 'three' }

print('Dictionary is', a)                # You can print a dictionary
print('Key 1 has value', a[1] )
print('Key 3 has value', a[3] )
print()

print('Dictionary is', b)
print('Key a has value', b['a'])

Dictionary is {1: 'one', 2: 'two', 3: 'three'}
Key 1 has value one
Key 3 has value three

Dictionary is {'b': {1: 'one', 2: 'two', 3: 'three'}, 'a': [1, 2, 3]}
Key a has value [1, 2, 3]


In [14]:
# Dictionaries are mutable
print(a)

a[3] = 'Something else'
print(a)

{1: 'one', 2: 'two', 3: 'three'}
{1: 'one', 2: 'two', 3: 'Something else'}


In [15]:
# Dicts are comparable
print( { 1 : 'a', 2 : 'b'} == { 1: 'a', 2: 'b'})  # the same dictionaries

print( { 1 : 'a', 2 : 'b'} == { 2: 'b', 1: 'a'})  # order of keys does not matter

print( {1:1, 2:2} == {2:2, 1:1, 3:3} )            # size of dictionary does matter


True
True
False


In [16]:
# But no + and * operators...
{ 'a' : 1 } + { 'b' : 2 }

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

In [17]:
# Adding values to a dictionary

# Just assign a value to it
a = { 'a' : 1, 'b' : 20}

a['c'] = 300

print(a)           # Note the order of keys is not sorted


{'c': 300, 'b': 20, 'a': 1}


In [19]:
# The in operator is defined for keys
print( 'c'  in a )

print( 1 in a )

True
False


In [20]:
# Traversing a dictionary

for key in a:
    print(key, a[key] )

c 300
b 20
a 1


In [21]:
# Traversing a dictionary in order

for key in sorted(a):
    print( key, a[key])

a 1
b 20
c 300


In [22]:
# Traversing the values

for value in a.values():
    print( value )

300
20
1


In [23]:
# Traversing key, value pairs

for item in a.items():
    print(item)

('c', 300)
('b', 20)
('a', 1)


In [24]:
# Traversing key, value pairs, using tuple assignment (see later)

for key, value in a.items():
    print(key, value)

c 300
b 20
a 1


In [25]:
# Deleting values from a dictionary

del a['a']
a

{'b': 20, 'c': 300}

### Using dictionaries: counting / histogram

A typical use of dictionaries is counting occurrences.

In [26]:
# Counting letters in a word

def letter_histogram( text ):
    """Create a histogram of the letters in the given text. The result is a dictionary."""
    
    d = {}    # Create an empty dictionary
    
    for letter in text:
        
        # Only count alphanumeric (letters/numbers), not punctuation, etc.
        if not letter.isalnum(): 
            continue            # continue immediately starts the next iteration,
                                # skipping the rest of the loop body.
                                # It is similar to 'break', but break
                                # jumps out of the loop entirely.
                                        
        if letter in d:         # Check if the key exists in dictionary already. 
                                # We can code this more efficiently (see later)
            d[letter] += 1  
        else:                
            d[letter] = 1
    
    return d

In [27]:
letter_histogram("The quick brown fox jumps over the lazy dog.")

{'T': 1,
 'a': 1,
 'b': 1,
 'c': 1,
 'd': 1,
 'e': 3,
 'f': 1,
 'g': 1,
 'h': 2,
 'i': 1,
 'j': 1,
 'k': 1,
 'l': 1,
 'm': 1,
 'n': 1,
 'o': 4,
 'p': 1,
 'q': 1,
 'r': 2,
 's': 1,
 't': 1,
 'u': 2,
 'v': 1,
 'w': 1,
 'x': 1,
 'y': 1,
 'z': 1}

Checking whether a key is in the dictionary, and using a default value if it is not, is a very common action. There is an easy way to do that:

    d.get(key, default-value) => Returns d[key] if key exists, otherwise default-value

In [28]:
def increment( d, k ):
    """Increment the value for key k in the dictionary d by 1. Start at 0 if not present yet."""
    d[k] = d.get(k, 0) + 1
    return d
    

In [31]:
# Now you can increment any value in a dictionary even if the key does not exist.

increment({}, 'k')

{'k': 1}
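
The same idea shortens the histogram function from above. This is a sketch of one possible rewrite; the name `letter_histogram_get` is made up here to avoid overwriting the original function.

```python
def letter_histogram_get(text):
    """Create a histogram of the letters/digits in text, using dict.get()."""
    d = {}
    for letter in text:
        # Only count alphanumeric characters, as in the original function.
        if not letter.isalnum():
            continue
        # get() returns 0 when the letter has not been seen yet.
        d[letter] = d.get(letter, 0) + 1
    return d


hist = letter_histogram_get("Hello, world!")
print(hist)
```

The `if letter in d: ... else: ...` branch disappears because `get` supplies the starting value.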

<div class="alert alert-success">
<h2>Exercises</h2>
Try Exercises 10.4 - 10.6.
</div>

## Ch 12: Tuples

Tuples are immutable sequences of values, similar to immutable lists. Values can be of any type.

```python
  t = ( 1, 2, 3)
  
  t[2]
  t[1:3]
  ```

In [32]:
# Initialization of a tuple

t = (1, 2, 3)

t

(1, 2, 3)

In [33]:
# Tuples are a separate type
type(t)

tuple

In [34]:
# Creating a singleton tuple (1 element) requires special syntax

singleton = ( 10, )      # Note the trailing comma

print(singleton)

print(type(singleton))

(10,)
<class 'tuple'>


In [35]:
# Creating an empty tuple
t1 = ()     
t2 = tuple()

print( t1 == t2)

True


In [36]:
# Slicing and tuples

t = tuple( range(10) )

print( 't\t\t', t )
print( 't[1]\t\t', t[1] )
print( 't[1:4]\t\t', t[1:4] )
print( 't[:3]\t\t', t[:3])
print( 't[7:]\t\t', t[7:])
print( 't[-1]\t\t', t[-1] )
print( 't[-3:-1]\t', t[-3:-1])


t		 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
t[1]		 1
t[1:4]		 (1, 2, 3)
t[:3]		 (0, 1, 2)
t[7:]		 (7, 8, 9)
t[-1]		 9
t[-3:-1]	 (7, 8)


In [37]:
# But tuple element assignment does not work

t[3] = 10

TypeError: 'tuple' object does not support item assignment

In [38]:
# Tuples can be compared, elementwise. Comparison stops at the first element that is different between the tuples.

print((1, 2) < ( 1, 3 ))
print((1, 2, 3) < ( 2, 0, 0))
print( (1,2) < (2,) )

True
True
True


In [39]:
# sorting and reverse sorting
# sorted() works for any sequence (tuples, lists, strings) and always returns a list

t = ( 132, 2, 4, 3821, 10, 29, 38  )

print('Sorted tuple', sorted(t))
print('Reverse sorted tuple', sorted(t, reverse=True))


Sorted tuple [2, 4, 10, 29, 38, 132, 3821]
Reverse sorted tuple [3821, 132, 38, 29, 10, 4, 2]


### Typical uses for tuples

* multiple return values of functions
* swapping of values
* as 'keys' for dictionaries

In [40]:
# Split a name into first and last name

def splitName( name ):
    nameparts = name.split()
    
    return (nameparts[0],  nameparts[1] )
    

In [41]:
# Use multiple return values

first, last = splitName( 'Steve Jobs' )

print('Firstname:', first)
print('Lastname:', last)

Firstname: Steve
Lastname: Jobs


In [None]:
# Swapping variables

a = 10
b = 'John'

print('a is', a)
print('b is', b)

print('Swapping...\n')
b, a = a, b

print('a is', a)
print('b is', b)


In [42]:
# Keys in dictionaries must be immutable, so no lists, but tuples are ok.

# Telephone book: { (first, lastname) : phonenumber }
tbook = dict()

def add( name, number ):
    ( first, last ) = splitName( name )
    
    tbook[( first, last )] = number

In [43]:
# Put some names in the telephone book
add( 'Steve Jobs', '555-123456')
add( 'Bill Gates', '555-987654')
add( 'Michael Dell', '555-101010')
add( 'Bill Hewlett', '555-555555')
add( 'Dave Packard', '555-888444')

print(tbook)

{('Steve', 'Jobs'): '555-123456', ('Bill', 'Gates'): '555-987654', ('Michael', 'Dell'): '555-101010', ('Bill', 'Hewlett'): '555-555555', ('Dave', 'Packard'): '555-888444'}


### Exercise

Sort a list of words by their length, longest words first.

Hint 1: Useful code to 'strip' texts from whitespace and punctuation.

```python
import string

string.whitespace     # the string ' \t\n\r\x0b\x0c'
string.punctuation    # the string '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

chars = string.punctuation + string.whitespace
"text".strip( chars )  # Strip leading and trailing characters listed in 'chars' 
```

Hint 2: Sorting sequences (e.g. lists and tuples)

```python
sorted( [ 10, 3, 8, 5] )                 # => [ 3, 5, 8, 10 ]
sorted( [ 10, 3, 8, 5], reverse=True )   # => [ 10, 8, 5, 3 ]
```

In [45]:
text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aenean vel magna scelerisque libero tempor consequat eu 
sit amet tellus. Donec sit amet ipsum id magna semper eleifend. 
Nam vehicula augue eget est pretium, quis accumsan est rutrum. 
Donec elit lorem, pretium ac semper gravida, accumsan sed mauris. 
Quisque quis enim eget tortor maximus fringilla quis sed libero. 
Proin ut gravida lectus. Mauris suscipit tempor lacus sed ultrices. 
Maecenas accumsan pretium posuere. Etiam facilisis ligula diam, 
ac ultrices ligula posuere id."""

In [46]:
import string

# Split string into words, and use lower-case only
words = text.lower().split()

In [47]:
# Let's check what we have...
words[:6]

['lorem', 'ipsum', 'dolor', 'sit', 'amet,', 'consectetur']

In [48]:
# Create a list of tuples with (length, word) pairs
l = []
for w in words:    
    # Cleanup word, remove punctuation and whitespace
    cleaned = w.strip( string.punctuation + string.whitespace )  
  
    l.append( (len(cleaned), cleaned) )

In [49]:
# Again, let's check what we have...
l[:6]

[(5, 'lorem'),
 (5, 'ipsum'),
 (5, 'dolor'),
 (3, 'sit'),
 (4, 'amet'),
 (11, 'consectetur')]

In [50]:
# Finally, sort the list of tuples in reverse order and print it.
for (size, w) in sorted( l, reverse=True ):
    print("%d : %s" %(size, w) )

11 : scelerisque
11 : consectetur
10 : adipiscing
9 : fringilla
9 : facilisis
9 : consequat
8 : vehicula
8 : ultrices
8 : ultrices
8 : suscipit
8 : maecenas
8 : eleifend
8 : accumsan
8 : accumsan
8 : accumsan
7 : quisque
7 : pretium
7 : pretium
7 : pretium
7 : posuere
7 : posuere
7 : maximus
7 : gravida
7 : gravida
6 : tortor
6 : tempor
6 : tempor
6 : tellus
6 : semper
6 : semper
6 : rutrum
6 : mauris
6 : mauris
6 : ligula
6 : ligula
6 : libero
6 : libero
6 : lectus
6 : aenean
5 : proin
5 : magna
5 : magna
5 : lorem
5 : lorem
5 : lacus
5 : ipsum
5 : ipsum
5 : etiam
5 : donec
5 : donec
5 : dolor
5 : augue
4 : quis
4 : quis
4 : quis
4 : enim
4 : elit
4 : elit
4 : eget
4 : eget
4 : diam
4 : amet
4 : amet
4 : amet
3 : vel
3 : sit
3 : sit
3 : sit
3 : sed
3 : sed
3 : sed
3 : nam
3 : est
3 : est
2 : ut
2 : id
2 : id
2 : eu
2 : ac
2 : ac


In [51]:
# Try to remove the duplicate values
# Use a dictionary, or better yet a set!
# A set cannot contain duplicate elements and has no order.

wordset = set()
for w in words:
    cleaned = w.strip( string.punctuation + string.whitespace ) # Cleanup word
    
    # Add tuple (length, word) into a set, which will be unique
    wordset.add( (len(cleaned), cleaned) )    
    
    
# Now print it in sorted order
for (size, word) in sorted(wordset, reverse=True) :
    print("%d : %s" % (size, word) )


11 : scelerisque
11 : consectetur
10 : adipiscing
9 : fringilla
9 : facilisis
9 : consequat
8 : vehicula
8 : ultrices
8 : suscipit
8 : maecenas
8 : eleifend
8 : accumsan
7 : quisque
7 : pretium
7 : posuere
7 : maximus
7 : gravida
6 : tortor
6 : tempor
6 : tellus
6 : semper
6 : rutrum
6 : mauris
6 : ligula
6 : libero
6 : lectus
6 : aenean
5 : proin
5 : magna
5 : lorem
5 : lacus
5 : ipsum
5 : etiam
5 : donec
5 : dolor
5 : augue
4 : quis
4 : enim
4 : elit
4 : eget
4 : diam
4 : amet
3 : vel
3 : sit
3 : sed
3 : nam
3 : est
2 : ut
2 : id
2 : eu
2 : ac


### Set (Ch 19)

A set is a mutable collection of values. There is no order, and each element can occur in the set only once.
Indexing and slicing do not work, as there is no order.

```python
s = set()      # empty set; note that {} creates an empty dictionary, not a set

s.add(1)
s.add(2)

for x in s:
   print(x)
   
   
# s[0] does not work: sets do not support indexing
```


In [53]:
# Example sets
s = set()

for i in range(5):
    s.add(i)
    
print('Set is', s, 'size:', len(s))

s.add(3)   # does nothing, 3 is already in the set

print('Set is', s, 'size:', len(s))


Set is {0, 1, 2, 3, 4} size: 5
Set is {0, 1, 2, 3, 4} size: 5
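
Two common uses of sets are removing duplicates from a list and fast membership tests. The following is a small sketch with made-up example data; `sorted()` is used only to print the elements in a predictable order.

```python
names = ['John', 'Liz', 'John', 'Bob', 'Liz']

# Converting a list to a set drops the duplicates.
unique_names = set(names)
print(len(names), len(unique_names))

# The in operator tests membership.
print('Liz' in unique_names)
print('Sarah' in unique_names)

# Sets also support the usual mathematical operations.
a = {1, 2, 3}
b = {3, 4}
union = a | b           # elements in a or b
intersection = a & b    # elements in both a and b
print(sorted(union), sorted(intersection))
```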


<div class="alert alert-success">
<h2>Exercises</h2>
Try Exercises 12.1 - 12.4.
</div>

### List comprehensions (Ch 19)

There is a nice shortcut for creating lists (and sets and dictionaries; a tuple can be built by passing a generator expression to tuple()).

Typical Python style: compact, and it reads like mathematical set notation (e.g. { x | x >= 2 } )

```python
 import math

 l = [ x for x in range(5) ]      # => [ 0, 1, 2, 3, 4 ]
 l = [ x*x for x in range(5) ]    # => [ 0, 1, 4, 9, 16 ]
 
 l = [ math.sqrt(x) for x in range(5) if ( x%3 == 0) ]   # => [ 0.0, 1.73205... ]

 # text is a string; split() turns it into a list of words
 l = [ ( len(word), word ) for word in text.split() if word.isalnum() ]

 # General form:
 # l = [ ... for ... in ... if ... ]
```



In [54]:
# Example without list comprehension
l = []
for i in range(5):
    l.append(i)
    
print(l)

[0, 1, 2, 3, 4]


In [55]:
# Example with list comprehension
l = [ i for i in range(5) ]
print(l)

[0, 1, 2, 3, 4]


In [56]:
import math
[ math.sqrt(x) for x in range(5) if ( x%3 == 0) ]


[0.0, 1.7320508075688772]

In [60]:
# A more complex example
text = "Hello this is a test."
[ ( len(word), word.lower() ) for word in text.split() if word.isalnum() ]


[(5, 'hello'), (4, 'this'), (2, 'is'), (1, 'a')]

In [61]:
# A dictionary example

d = { k : v for k, v in enumerate( text.split() )}
d

{0: 'Hello', 1: 'this', 2: 'is', 3: 'a', 4: 'test.'}

In [62]:
# A shorter version of our text exercise
# using a set and list comprehension notation.

# Create a clean set of (unique) words
wordset = { w.strip( string.punctuation + string.whitespace ) for w in text.lower().split() }

# Create a set of (length, word) tuples of cleanedup words in text.
tupleset = { ( len(w), w ) for w in wordset }

# Or in a one-liner ...
#tupleset = { ( len(w), w ) for w in 
#              [ x.strip(string.punctuation + string.whitespace) for x in text.lower().split() ]
#          }

# Now sort and print the tuples.
for t in sorted(tupleset, reverse=True):
    print("%d : %s" % t)

5 : hello
4 : this
4 : test
2 : is
1 : a


## Ch 13: Case study: data structure selection

Some examples of how to use data structures to solve problems in Python.

  * histograms
  * random numbers
  

In [63]:
# Random numbers
# random.random() returns a float in the half-open interval [0.0, 1.0)
import random

for i in range(5):
    print(random.random())

0.4149059649499386
0.04405580555914146
0.6458302014918825
0.23721253367545114
0.8305512197443271


In [64]:
# random.randint( min, max) returns an integer N with min <= N <= max (both ends included)
for i in range(5):
    print(random.randint( 10, 20 ))

12
11
19
13
14


In [65]:
# random.choice( list ) picks a random element from a sequence

l = ['John', 'Matthew', 'Liz', 'Sarah', 'Bob', 'Alison']
for i in range(3):
    print( random.choice(l) )


Matthew
Sarah
Matthew


In [66]:
# the seed value determines the 'pseudorandom' numbers.
# Useful if you want 'deterministic' behavior (e.g. for debugging)

# Set the random seed. Normally you do not set it, so you get different "random" numbers on each run.
random.seed(101)          # without argument: seeded from an OS randomness source, or else the current time

# If seed is the same, this will lead to the same numbers.
for i in range(5):
    print(random.random())

0.5811521325045647
0.1947544955341367
0.9652511070611112
0.9239764016767943
0.46713867819697397


<div class="alert alert-success">
<h2>Exercises</h2>
Try Exercise 13.8: Markov Analysis.

Feel free to try any of the other exercises in Ch 13.
</div>

## Ch 14: Files

Reading, writing, creating, renaming and deleting files.

Before we can read/write from/to a file, it must be opened.

```python
f = open( filename, mode )

f.readline()     # Read a single line
f.readlines()    # Returns a list of all lines
f.write( "text" )

f.close()
```

mode indicates how we want to use the file. Use a *combination* of:
  * 'r' - open the file for reading (default)
  * 'w' - open the file for writing, truncate the file!
  * 'a' - open the file for writing, append to the end.
  * 'x' - create a new file and open it for writing; fails if the file already exists.
  
  
  * 't' - open the file in text mode (recognizes newlines) (default)
  * 'b' - open the file in binary mode (reads and writes raw bytes, no newline translation)
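
The modes can be tried out safely in a temporary directory. The sketch below uses the `with` statement, which closes the file automatically at the end of the block; the file name `example.txt` and the variable names are made up for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt')

    # 'w' creates (or truncates) the file.
    with open(path, 'w') as f:
        f.write('first line\n')

    # 'a' appends to the end of the existing file.
    with open(path, 'a') as f:
        f.write('second line\n')

    # 'r' reads the file back.
    with open(path, 'r') as f:
        lines = f.readlines()

    # 'x' refuses to overwrite a file that already exists.
    try:
        with open(path, 'x') as f:
            pass
        created_twice = True
    except FileExistsError:
        created_twice = False

print(lines)
print(created_twice)
```

Using `with` avoids forgetting `f.close()`, also when an error occurs halfway through the block.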


In [131]:
# List files in directory
import os

os.listdir()

['.ipynb_checkpoints',
 'Untitled.ipynb',
 'Week4-Assignments.pdf',
 'Week4.ipynb']

In [132]:
# Create a file for writing
f = open('NEW_FILE.TXT', 'x')

In [133]:
os.listdir()

['.ipynb_checkpoints',
 'NEW_FILE.TXT',
 'Untitled.ipynb',
 'Week4-Assignments.pdf',
 'Week4.ipynb']

In [134]:
# Write something to the file... explicitly write a newline.
f.write( 'This is a line of text.\n' )


f.write( text )
f.flush()         # flush the file, necessary in case of buffering.
f.close()         # close the file, which also forces a flush()

In [137]:
# check
!type NEW_FILE.TXT


This is a line of text.
Hello this is a test.


In [136]:
# Now open the file again for reading.

f = open( 'NEW_FILE.TXT', 'r')

for line in f.readlines():
    print('Read line:', line, end='' )
    
f.close()

Read line: This is a line of text.
Read line: Hello this is a test.

In [138]:
# Append a line to the end
f = open('NEW_FILE.TXT', 'a')

f.write('\nA last line at the end\n')

f.close()

In [141]:
!type NEW_FILE.TXT

This is a line of text.
Hello this is a test.
A last line at the end


#### Renaming and deleting files

We can rename and delete files with the following commands.

```python
import os

os.rename( file1, file2)  # Rename file1 to file2
os.remove( file )         # Delete a file
```

In [142]:
os.rename( 'NEW_FILE.TXT', 'RENAMED_FILE.TXT')

os.listdir()

['.ipynb_checkpoints',
 'RENAMED_FILE.TXT',
 'Untitled.ipynb',
 'Week4-Assignments.pdf',
 'Week4.ipynb']

In [143]:
os.remove( 'RENAMED_FILE.TXT' )

os.listdir()

['.ipynb_checkpoints',
 'Untitled.ipynb',
 'Week4-Assignments.pdf',
 'Week4.ipynb']

#### Creating, deleting directories / folders

Python can also manipulate directories / folders.

```python
os.getcwd()     # get current directory
os.chdir( dir ) # change current directory ('..' is the parent directory)

os.mkdir( dir ) # create a new directory in the current directory
os.rmdir( dir ) # remove a directory
```
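
A short sketch of these directory functions, run inside a temporary directory so that nothing is left behind. The directory name `results` is an arbitrary example.

```python
import os
import tempfile

start_dir = os.getcwd()            # remember where we started

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)               # make the temporary directory current

    os.mkdir('results')            # create a subdirectory
    created = 'results' in os.listdir()

    os.rmdir('results')            # rmdir only removes empty directories
    removed = 'results' not in os.listdir()

    os.chdir(start_dir)            # go back before the temp dir is deleted

print(created, removed)
```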


### Storing data persistently

If you want to store (intermediate) results persistently, you have to save the result on persistent storage, such as a file or a persistent database.

In-memory Python data structures such as lists, dictionaries, tuples, etc. are all lost as soon as Python quits.

There are two things you have to do:
  1. Choose a persistent storage medium (file, database, etc).
  2. Choose a *suitable format* in which you want to store your data (data representation format)   

#### Suitable format

Not all data structures in Python can be written directly to a file/database.
Their in-memory (binary) representation is often difficult to store in a file/database.
  * simple types such as integers, floats, strings are *usually* ok.
  * binary files can usually only be read/edited by the 'same program' and are usually not human readable.

**Solutions:**
  1. Choose a standard representation form, such as CSV or JSON or pickling or ...
  2. Choose your own custom format (document it!), 
  > e.g. a list is represented by a number N (nr of elements in the list) followed by
  N lines each containing a string representation of the elements.
  
  [ 1, 2, 3 ] is represented as
  
        3<br>
        1<br>
        2<br>
        3<br>
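
The custom list format described above can be sketched as follows. The function names `save_list` and `load_int_list` are hypothetical, and the reader assumes that all elements are integers, since a text file only stores their string representation.

```python
import os
import tempfile


def save_list(filename, values):
    """Write the number of elements, then one element per line."""
    with open(filename, 'w') as f:
        f.write('%d\n' % len(values))
        for v in values:
            f.write('%s\n' % v)


def load_int_list(filename):
    """Read a list written by save_list, converting each element to int."""
    with open(filename, 'r') as f:
        n = int(f.readline())
        return [int(f.readline()) for i in range(n)]


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'list.txt')
    save_list(path, [1, 2, 3])

    with open(path) as f:
        raw = f.read()             # the file contents as text
    restored = load_int_list(path)

print(repr(raw))
print(restored)
```

The type information is not part of the file, which is why such a format must be documented: the reader has to know that every line holds an integer.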
         

         

### Pickling

Writing data usually uses strings. How do you write a list or dictionary to a file?

First we have to *encode* the list/dictionary into a sequence of bytes. This is
called *pickling* or *serializing*. Then we can write those bytes to a file.

When reading a pickled object from a file, the reverse must be done to turn it back into a 
list/dictionary that Python understands.

```python
import pickle
 
l = [1,2,3]
pickled_l = pickle.dumps(l)   # serialize the list (dumps returns a bytes object)

new_l = pickle.loads( pickled_l )  # deserialize the list (loads reads from a bytes object)

``` 
 
 

In [144]:
# Example of serializing a list
import pickle

l=[1,2,3]

pickled_l = pickle.dumps(l)

print(pickled_l)       # Unreadable, but understandable for python.

b'\x80\x03]q\x00(K\x01K\x02K\x03e.'


In [145]:
# Example of deserializing a serialized list
pickle.loads( pickled_l )

[1, 2, 3]

In [146]:
import os
l = [ 1,2,3 ]

# Open the file in binary mode!!!
f = open('STORAGE.TXT', 'xb')

# This will not work, cannot write a list directly.
#f.write( l )     

f.write( pickle.dumps( l ))
f.close()

In [147]:
# File is not human readable unfortunately...
!type STORAGE.TXT

€]q (KKKe.


In [148]:
# Open the file in binary mode!
f = open('STORAGE.TXT', 'rb')

s = f.read()       # read all bytes; pickled data may contain newline bytes, so readline() could stop early
f.close()

print(type(s))
print(s)

l = pickle.loads(s)
l

<class 'bytes'>
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'


[1, 2, 3]

In [149]:
os.remove('STORAGE.TXT')
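
The `pickle` module can also write to and read from an open file directly, with `pickle.dump` and `pickle.load`, which saves the separate `write` and `read` calls. A minimal sketch in a temporary directory; the file name and the example data are made up.

```python
import os
import pickle
import tempfile

data = {'numbers': [1, 2, 3], 'name': 'example'}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'storage.pickle')

    # Binary mode is required: pickle produces bytes, not text.
    with open(path, 'wb') as f:
        pickle.dump(data, f)

    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == data)   # True
```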

### Shelves

There is a module *shelve* that provides a persistent dictionary and does the *pickling* for you.
The only restriction is: the *keys* must be *strings*.

```python
import shelve

d = shelve.open( filename )

d[key] = [1,2,3]
data = d[key]

for k in d.keys():
    print(k, d[k])

d.close()
```

In [157]:
# Example shelve. Performance is not always the best.
import shelve

d = shelve.open('SHELVE_DB')

d['key1'] = [1,2,3]

print('Read from DB: key1:', d['key1'])

d.close()

Read from DB: key1: [1, 2, 3]


In [159]:
os.remove('SHELVE_DB.dat')   # the file name(s) that shelve creates depend on the platform
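
To see that the data really survives closing the shelf, it can be opened a second time. The sketch below works in a temporary directory, so the platform-dependent database files are cleaned up automatically; the name `shelve_demo` and the stored values are arbitrary examples.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'shelve_demo')

    # First session: store two values. The with statement closes the shelf.
    with shelve.open(path) as db:
        db['key1'] = [1, 2, 3]
        db['key2'] = {'a': 1}

    # Second session: the values are read back from disk.
    with shelve.open(path) as db:
        keys = sorted(db.keys())
        value = db['key1']

print(keys)
print(value)
```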

## Formatting strings / alignment / precision

When you output results, you may want to control the format of the output.
The string format operator % can be used for that.


    "%d %f %e %s" % ( decimal, float, scientificfloat, string )
    
### How to align fields
  * Use tabs (\t) in your format string (%d\t%d)
  * Use padding and precision in the format string ( %10.2f )
  * Use the rjust(x), ljust(x), center(x), zfill(x) methods of a string
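
The string methods listed above all pad a string to a given width. A small sketch with a width of 6; `repr()` is used so that the added spaces are visible in the output.

```python
number = '42'

right = number.rjust(6)         # pad with spaces on the left
left = number.ljust(6)          # pad with spaces on the right
centered = number.center(6)     # pad on both sides
zero_filled = number.zfill(6)   # pad with zeros on the left

print(repr(right))        # '    42'
print(repr(left))         # '42    '
print(repr(centered))     # '  42  '
print(repr(zero_filled))  # '000042'

# Width and precision in a format string: 10 wide, 2 decimals.
padded_float = "%10.2f" % 3.14159
print(repr(padded_float)) # '      3.14'
```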
  
  

In [160]:
M = [ [ random.random() * 20 for i in range(3) ] for i in range(3) ] 
M

[[13.269412890601211, 4.290459394759361, 4.4339249905248135],
 [5.770448667625123, 13.84845491990635, 4.247535367166622],
 [19.422119027075475, 1.4071096860232846, 3.8657325746218207]]

In [161]:
# Write a table
for r in M:
    print( "%f %f %f" % (r[0],r[1],r[2]))
    

13.269413 4.290459 4.433925
5.770449 13.848455 4.247535
19.422119 1.407110 3.865733


In [162]:
# Write a table, use tabs
for r in M:
    print( "%f\t%f\t%f" % (r[0],r[1],r[2]))
    

13.269413	4.290459	4.433925
5.770449	13.848455	4.247535
19.422119	1.407110	3.865733


In [163]:
# Write a table, using formatting <size>.<precision>
for r in M:
    print( "%10.3f %10.3f %10.3f" % (r[0],r[1],r[2]))
    

    13.269      4.290      4.434
     5.770     13.848      4.248
    19.422      1.407      3.866


In [164]:
# Write a table, using formatting <size>.<precision>, 0 padding
for r in M:
    print( "%015.3f %015.3f %015.3f" % (r[0],r[1],r[2]))
    

00000000013.269 00000000004.290 00000000004.434
00000000005.770 00000000013.848 00000000004.248
00000000019.422 00000000001.407 00000000003.866


In [165]:
# Similar for strings
# Create a matrix of words (3 words per row).
# Note: text now holds only 5 words, so the last rows are shorter or empty.

M = [ [w for w in text.split()[i:i+3] ] for i in range(0,9,3)]

M

[['Hello', 'this', 'is'], ['a', 'test.'], []]

In [None]:
# Write a table, using size parameters
# Format word by word, because a row may hold fewer than 3 words.
for r in M:
    print( " ".join( "%15s" % w for w in r ) )
    

In [None]:
# Write a table, using formatting with rjust() (or ljust())
# Format word by word, because a row may hold fewer than 3 words.
for r in M:
    print( " ".join( w.rjust(15) for w in r ) )
    

In [None]:
# Write a table, using formatting with center()
# Format word by word, because a row may hold fewer than 3 words.
for r in M:
    print( " ".join( w.center(15) for w in r ) )
    

### Zip, enumerate, tuples, lists, and dictionaries (Ch 12)

Some useful functions and idioms to create and traverse lists/dictionaries.


In [166]:
# zip is a function that merges multiple sequences into tuples.
# It stops at the end of the shortest sequence (here b, with 9 elements).

a = tuple( range(10) )
b = list( 'abcdefghi' ) 

print('a is', a)
print('b is', b)

list( zip( a, b) )

a is (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
b is ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']


[(0, 'a'),
 (1, 'b'),
 (2, 'c'),
 (3, 'd'),
 (4, 'e'),
 (5, 'f'),
 (6, 'g'),
 (7, 'h'),
 (8, 'i')]

In [167]:
# Let's pretend we want to make a telephone book, given 2 lists: names and numbers.

names = ['Bill Gates', 'Bill Hewlett', 'Dave Packard', 'Michael Dell', 'Steve Jobs']
phonenrs = ['555-987654', '555-555555', '555-888444', '555-101010', '555-123456']

# Create a list of (key, value) pairs
pairs = list(zip(names, phonenrs))
print(pairs)

[('Bill Gates', '555-987654'), ('Bill Hewlett', '555-555555'), ('Dave Packard', '555-888444'), ('Michael Dell', '555-101010'), ('Steve Jobs', '555-123456')]


In [168]:
# Now create a dictionary from these (key, value) pairs.
phonebook = dict( pairs )

print(phonebook)

{'Dave Packard': '555-888444', 'Bill Hewlett': '555-555555', 'Bill Gates': '555-987654', 'Steve Jobs': '555-123456', 'Michael Dell': '555-101010'}


In [169]:
# Or more efficiently without the intermediate step (building the list)

phonebook = dict( zip(names, phonenrs) )
print(phonebook)

{'Dave Packard': '555-888444', 'Bill Hewlett': '555-555555', 'Bill Gates': '555-987654', 'Steve Jobs': '555-123456', 'Michael Dell': '555-101010'}


In [170]:
# zip can also be used to "unzip" a list of tuples into lists of the separate elements.

a = ( 1, 2, 3 )
b = tuple('abc')
c = ('John', 'Mary', 'Bob')

In [171]:
l = list( zip( a, b, c ) )

print(l)

[(1, 'a', 'John'), (2, 'b', 'Mary'), (3, 'c', 'Bob')]


In [173]:
# The '*' indicates that a sequence (list) should be unpacked into separate arguments in a function call
# It is most commonly used in the argument list when calling a function

a = [1, 2, 3]

print( a )     # a is used as it is: a list
print( *a )    # a is used as its separate entities: equivalent to print( a[0], a[1], a[2])
print( a[0], a[1], a[2] )

[1, 2, 3]
1 2 3
1 2 3


In [174]:
# Now try to implement the reverse of the zip.   

x, y, z = zip( *l )    # The '*' forces the list to be 'unpacked'  

print(x)
print(y)
print(z)

(1, 2, 3)
('a', 'b', 'c')
('John', 'Mary', 'Bob')


In [175]:
# The use of enumerate in a for loop
# If you want to traverse a list but also want to keep track of the index in the list

for i in range(len( names ) ):
    print('Name %d is %s' % ( i, names[i] ))


Name 0 is Bill Gates
Name 1 is Bill Hewlett
Name 2 is Dave Packard
Name 3 is Michael Dell
Name 4 is Steve Jobs


In [176]:
# More elegant using enumerate which returns a tuple (index, value)
for i, name in enumerate( names ):
    print('Name %d is %s' % ( i, name ))


Name 0 is Bill Gates
Name 1 is Bill Hewlett
Name 2 is Dave Packard
Name 3 is Michael Dell
Name 4 is Steve Jobs


### Ch 15-17: Classes (not discussed)

Python supports object-oriented programming. You can create your own classes
to represent data objects and define your own methods on them, which keeps the interface
to your objects clear (*separation of concerns*).

Unfortunately, there is no time for us to discuss them in this course. Luckily, you will probably not need them for relatively easy analysis tasks. However, for more complex analyses,
defining your own classes may be useful. For now, know they exist and you can read up on them
at a later time.