# Day 2

## Day 2 Agenda
* __`enumerate()/zip()`__
* list comprehensions
* tuples
* dictionaries
* explaining __`this.py`__
* sets
* file I/O

## "Pythonic"

In [1]:
stooges = ['Shemp', 'Moe', 'Larry', 'Curly']

In [4]:
i = 0  # manual counter: it works, but it isn't very Pythonic
for stooge in stooges: # for thing in container
    print(f'index {i} is {stooge}')
    i += 1

index 0 is Shemp
index 1 is Moe
index 2 is Larry
index 3 is Curly


## __`enumerate()`__
* a built-in function that pairs an index with each item in an iterable
* returns an _enumerate_ object


In [8]:
for index, stooge in enumerate(stooges):
    print(f'stooge {index} is {stooge}')

stooge 0 is Shemp
stooge 1 is Moe
stooge 2 is Larry
stooge 3 is Curly


In [9]:
type(enumerate(stooges))

enumerate
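`enumerate()` also accepts an optional `start` argument, which is handy when you want human-friendly numbering that begins at 1. Here is a minimal sketch that re-creates the `stooges` list from above so it runs on its own:

```python
stooges = ['Shemp', 'Moe', 'Larry', 'Curly']

# start=1 makes the count begin at 1 instead of the default 0
for number, stooge in enumerate(stooges, start=1):
    print(f'stooge #{number} is {stooge}')

# enumerate() yields (index, item) tuples; list() collects them all at once
pairs = list(enumerate(stooges, start=1))
print(pairs)   # [(1, 'Shemp'), (2, 'Moe'), (3, 'Larry'), (4, 'Curly')]
```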

## __`zip(*iterables)`__
* built-in function that takes zero or more iterables and matches up each item in one iterable with the corresponding item in the other iterable(s)
* technically, it returns an iterator that aggregates elements from each iterable into tuples
* it stops as soon as the shortest iterable is exhausted
* why is it called __`zip`__?

In [10]:
first_names = ['Dave', 'Bruce', 'Taylor']
last_names = ['W-S', 'Lee', 'Swift']
employee_nums = [3456, 1, 2]

for first, last, num in zip(first_names, last_names, employee_nums):
    print(first, last, num)

Dave W-S 3456
Bruce Lee 1
Taylor Swift 2


In [11]:
stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico', 'Zeppo']
for stooge, marx in zip(stooges, marxbros):
    print(stooge, marx)

Larry Groucho
Moe Harpo
Curly Chico


In [14]:
from itertools import zip_longest # itertools is a standard-library module of iteration helpers
stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico', 'Zeppo']

for stooge, marx in zip_longest(stooges, marxbros):
    print(stooge, marx)

Larry Groucho
Moe Harpo
Curly Chico
None Zeppo
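By default `zip_longest()` pads the shorter iterable with `None`. Its `fillvalue` keyword lets you choose a different placeholder. The following is a small sketch; the `'(nobody)'` placeholder is just an illustrative choice:

```python
from itertools import zip_longest

stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico', 'Zeppo']

# fillvalue replaces the default None for the shorter iterable
pairs = list(zip_longest(stooges, marxbros, fillvalue='(nobody)'))
for stooge, marx in pairs:
    print(stooge, marx)

# for comparison, plain zip() silently stops at the shortest iterable
print(len(list(zip(stooges, marxbros))))   # 3
```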


# List Comprehensions

## List Comprehensions ("listcomps")
* a quick, compact way to build a list
* often more readable, and usually somewhat faster, than an equivalent __`for`__ loop that calls __`append()`__
* which is easier to read?

In [15]:
fruits = 'apple lemon cherry fig lime watermelon'.split() # Pythonic

In [16]:
fruit_lengths = []

for fruit in fruits:
    fruit_lengths.append(len(fruit))
    
print(fruit_lengths)

[5, 5, 6, 3, 4, 10]


In [17]:
fruit_lengths = [len(fruit) for fruit in fruits]

print(fruit_lengths)

[5, 5, 6, 3, 4, 10]


## List Comprehensions (cont'd)
* listcomps can generate a list from the Cartesian product of two or more iterables

In [19]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L', 'XL']

In [21]:
tshirts = [[color, size] for color in colors
                            for size in sizes]
tshirts

[['black', 'S'],
 ['black', 'M'],
 ['black', 'L'],
 ['black', 'XL'],
 ['white', 'S'],
 ['white', 'M'],
 ['white', 'L'],
 ['white', 'XL']]
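The standard library's `itertools.product()` computes the same Cartesian product. It yields tuples rather than lists, so this sketch converts each tuple to a list before comparing the result against the listcomp above:

```python
from itertools import product

colors = ['black', 'white']
sizes = ['S', 'M', 'L', 'XL']

# product() yields tuples in the same order as the nested for clauses
tshirts = [list(combo) for combo in product(colors, sizes)]

print(tshirts == [[color, size] for color in colors for size in sizes])   # True
```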

In [23]:
string = 'alphabet soup tastes great!'

In [24]:
print(list(string))

['a', 'l', 'p', 'h', 'a', 'b', 'e', 't', ' ', 's', 'o', 'u', 'p', ' ', 't', 'a', 's', 't', 'e', 's', ' ', 'g', 'r', 'e', 'a', 't', '!']


In [26]:
# keep every character that is not a lowercase vowel or a space
# (so punctuation such as '!' survives)

consonants = [char for char in string
              if char not in 'aeiou ']
print(consonants)

['l', 'p', 'h', 'b', 't', 's', 'p', 't', 's', 't', 's', 'g', 'r', 't', '!']
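The condition above excludes only lowercase vowels and spaces, so punctuation such as `'!'` slips through. If you want letters only, one option is to add a `str.isalpha()` check. This sketch treats `y` as a consonant:

```python
string = 'alphabet soup tastes great!'

# isalpha() drops spaces and punctuation; lower() lets capital vowels match too
consonants = [char for char in string
              if char.isalpha() and char.lower() not in 'aeiou']
print(consonants)
```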


## Lab: List Comprehensions
*  Start with the Cartesian product example (colors x sizes of t-shirts) and add a third list, __`sleeves = ['short', 'long']`__, then write a new listcomp that generates the Cartesian product __`colors x sizes x sleeves`__. __`tshirts`__ should look like this:<pre><b>
    [['black', 'S', 'short'],
     ['black', 'S', 'long'],
     ['black', 'M', 'short'],
     ['black', 'M', 'long'],
     ['black', 'L', 'short'],
     ['black', 'L', 'long'],
     ['black', 'XL', 'short'],
     ['black', 'XL', 'long'],
     ['white', 'S', 'short'],
     ['white', 'S', 'long'],
     ['white', 'M', 'short'],
     ['white', 'M', 'long'],
     ['white', 'L', 'short'],
     ['white', 'L', 'long'],
     ['white', 'XL', 'short'],
     ['white', 'XL', 'long']]
     
 </b></pre>
* Use a list comprehension to create a list of the squares of the integers from 1 to 25 (i.e., 1, 4, 9, 16, …, 625)
* Given a list of words, create a second list which contains all the words from the first list which do not end with a vowel
* Use a list comprehension to create a list of the integers from 1 to 100 which are not divisible by 5
* Use a list comprehension and __`zip()`__ to build a list of [name, ID number] pairs, taking the names and the ID numbers from two separate lists
  * start with a list of, say, 5 names: ['John', 'Mary', 'Edward', 'Linda', 'Dinesh']
  * and a list of, say, 5 ID numbers: [1003, 2043, 8762, 7862, 1093]
  * additional wrinkle: do not include any name whose corresponding ID is -1

## listcomps recap
* keep them short
* they are not _list incomprehensions_, so keep them simple
* use line breaks freely: they are ignored inside [] (and (), {}), so you don't need the ugly '\\' line-continuation character
* note that __`for`__ loops can do many things: scan a sequence to count or select items, compute aggregates (sums, averages), or perform any number of other processing tasks
  * in contrast, listcomps do ONE thing: build lists!

# Tuples

## Tuples
* immutable data type
* typically heterogeneous (cf. lists, which are typically homogeneous)
* generally imply some structure
  * a tuple typically represents a single object, with each element holding one aspect/attribute of it
  * if lists are typically used like the columns of a spreadsheet...
    * then tuples are typically the rows...

In [40]:
t = () # empty tuple (cf. empty list...[])
t

()

In [28]:
type(t)

tuple

In [38]:
t = (1,) # singleton tuple

In [39]:
t

(1,)
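It's the comma, not the parentheses, that makes a tuple. A quick check:

```python
not_a_tuple = (1)    # just the integer 1 inside grouping parentheses
a_tuple = (1,)       # the trailing comma is what makes a tuple
also_a_tuple = 1,    # parentheses are optional

print(type(not_a_tuple), type(a_tuple), type(also_a_tuple))
# <class 'int'> <class 'tuple'> <class 'tuple'>
```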

In [41]:
t = 'Jones', 'John', 1023, True # parentheses are optional: the commas make the tuple
t

('Jones', 'John', 1023, True)

In [45]:
# tuple unpacking
last_name, first_name, employee_num, full_time = t

In [46]:
employee_num # type(employee_num)

1023
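Unpacking has two other handy forms. A starred name collects "everything else" into a list, and you can swap two variables without a temporary variable. This short sketch re-creates `t` from above:

```python
t = 'Jones', 'John', 1023, True

# star unpacking: 'rest' collects every item after the first into a list
last_name, *rest = t
print(last_name, rest)   # Jones ['John', 1023, True]

# swapping builds a tuple on the right-hand side, then unpacks it
a, b = 1, 2
a, b = b, a
print(a, b)              # 2 1
```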

In [48]:
something = input('Enter something: ')
as_a_list = something.split() # split() always returns a list
as_a_tuple = tuple(as_a_list) # tuple() always returns a tuple

Enter something: apple fig pear


In [49]:
print(as_a_list, as_a_tuple, sep='\n')

['apple', 'fig', 'pear']
('apple', 'fig', 'pear')


In [50]:
person = 'Sarah Breedlove', 1867, 'Louisiana'

In [51]:
person[1]

1867

In [52]:
person[1] = 1868

TypeError: 'tuple' object does not support item assignment

In [53]:
# a tuple may contain a mutable object...

person = 'Curie', 'Marie', 1867, []
person

('Curie', 'Marie', 1867, [])

In [54]:
person[-1].extend('physicist chemist'.split())
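The call above doesn't reassign an element of the tuple. Instead, it mutates the list the tuple already holds, so it succeeds. The sketch below re-creates `person` so it runs on its own. Displaying `person` afterward shows the change, and trying to hash the tuple shows one consequence of holding a mutable object:

```python
person = 'Curie', 'Marie', 1867, []

# the tuple still refers to the same list object; only the list's contents change
person[-1].extend('physicist chemist'.split())
print(person)   # ('Curie', 'Marie', 1867, ['physicist', 'chemist'])

# a tuple that contains a list is not hashable, so it cannot be a dict key
try:
    hash(person)
except TypeError as err:
    print('TypeError:', err)
```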

## Lab: Tuples
* Given a list of words, sort them by length of word, rather than alphabetically.
* To do this, first create a list of tuples of the form (len, word), where the first element is the length of the word.
* Next, sort the tuples.
* Finally, extract the words from the list of tuples into a new list which is now sorted by length of word. Try to use a list comprehension if you can.

## Recap: Tuples
* not just "constant lists"
  (see http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists)
* remember that lists are (typically) ordered sequences of homogeneous values (i.e., an Excel/DB column)
* and tuples typically imply some structure and refer to multiple attributes of ONE item (person, country, building, etc.)
  * i.e., a database/Excel row

# Dictionaries



## Dictionaries
* a grouping of key/value pairs; since Python 3.7, dicts remember insertion order, but you look items up by key, not by position
* sometimes called a "map", "hashmap", or "associative array"

In [64]:
d = {} # empty dict

In [65]:
d = { 'X': 10, 'V': 5, 'I': 1 } # a dict can also be created with initial contents

In [67]:
d

{'X': 10, 'V': 5, 'I': 1}

In [68]:
d['L'] = 50 # add something to the dict
print(d)

{'X': 10, 'V': 5, 'I': 1, 'L': 50}


In [70]:
# iterating through a dict iterates through the keys 
for key in d: # for thing in container
    print(key, end=' ')

X V I L 

In [71]:
# ...of course we can print the values while iterating
for thing in d:
    print(thing, d[thing])

X 10
V 5
I 1
L 50


In [78]:
sbux_dict = {'venti': 20, 'tall': 12, 'grande': 16}
print(sbux_dict)

{'venti': 20, 'tall': 12, 'grande': 16}


In [73]:
print(sbux_dict.keys(), sbux_dict.values(),
      sbux_dict.items(), sep='\n')

dict_keys(['venti', 'tall', 'grande'])
dict_values([20, 12, 16])
dict_items([('venti', 20), ('tall', 12), ('grande', 16)])


In [74]:
total_ounces = 0
for amount in sbux_dict.values():
    total_ounces += amount

total_ounces

48

In [75]:
sum(sbux_dict.values())

48

## Dictionaries: View Objects
* __`keys()`__, __`values()`__, and __`items()`__ are view objects
* view objects provide a dynamic window into the dictionary

In [79]:
keys = sbux_dict.keys()
keys

dict_keys(['venti', 'tall', 'grande'])

In [80]:
# keys will change automagically after we add to the dict
print(keys)
sbux_dict['trenta'] = 31
print(keys)

dict_keys(['venti', 'tall', 'grande'])
dict_keys(['venti', 'tall', 'grande', 'trenta'])


In [77]:
keys

dict_keys(['venti', 'tall', 'grande', 'trenta'])
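Because a view is a live window, it also reflects deletions. If you need a snapshot that won't change, you can copy the keys into a list. A sketch:

```python
sbux_dict = {'venti': 20, 'tall': 12, 'grande': 16}

live_keys = sbux_dict.keys()        # a view: tracks later changes to the dict
snapshot = list(sbux_dict.keys())   # a list: a copy frozen at this moment

sbux_dict['trenta'] = 31
del sbux_dict['tall']

print(live_keys)   # dict_keys(['venti', 'grande', 'trenta'])
print(snapshot)    # ['venti', 'tall', 'grande']
```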

## Dictionaries: __`enumerate()`__
* __`enumerate()`__ works on a dict, but the index it reports is just each key's insertion position, which rarely means anything, so it isn't all that useful

In [81]:
for thing in sbux_dict: # dict iteration returns keys
    print(thing)

venti
tall
grande
trenta


In [82]:
for index, val in enumerate(sbux_dict):
    print('index', index, 'is', val)

index 0 is venti
index 1 is tall
index 2 is grande
index 3 is trenta


In [83]:
for thing in sbux_dict:
    print(thing, sbux_dict[thing])

venti 20
tall 12
grande 16
trenta 31


In [84]:
# We can iterate through the dict items (key/value tuples);
# they come back in the order the keys were inserted
for key, val in sbux_dict.items(): # tuple unpacking
    print(key, '=>', val)

venti => 20
tall => 12
grande => 16
trenta => 31


## __`get()`__/__`setdefault()`__: Dealing with missing dict keys

In [91]:
d = {'foo': 'bar'}

In [86]:
d['foo']

'bar'

In [87]:
d['foot']

KeyError: 'foot'

In [92]:
if 'foot' in d: # is 'foot' a key in this dict
    print(d['foot'])
# or just... d.get('foot')

In [93]:
print(d.get('foot'))

None
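`get()` also takes an optional second argument: a default to return instead of `None`. Unlike `setdefault()`, it never modifies the dict:

```python
d = {'foo': 'bar'}

# the second argument is returned in place of None when the key is missing
print(d.get('foot', 'no such key'))   # no such key
print(d.get('foo', 'no such key'))    # bar

# get() never adds the key, unlike setdefault()
print('foot' in d)                    # False
```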


In [94]:
d.setdefault('foo', 23) # get the value of 'foo' or add 'foo' 
# to dict with value = 23
# if 'foo' in d:
#     return d['foo']
# else:
#     d['foo'] = 23
#     return 23

'bar'

In [95]:
d

{'foo': 'bar'}

In [96]:
print(d.setdefault('foot', 23))
print(d)

23
{'foo': 'bar', 'foot': 23}


In [99]:
# what if we sort a dict?
for key in sorted(sbux_dict):
    print(key, sbux_dict[key])

grande 16
tall 12
trenta 31
venti 20


In [100]:
# By default, sorted() on a dict sorts the keys,
# which may not be what we want. To iterate in order
# of value instead, pass key=sbux_dict.get so that
# sorted() compares each key's value.

for k in sorted(sbux_dict, key=sbux_dict.get):
    print(k, '=>', sbux_dict[k])

tall => 12
grande => 16
venti => 20
trenta => 31
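To get the key/value pairs themselves in value order (for example, largest first), one approach is to sort `items()` with a `key` function and `reverse=True`. This sketch uses a `lambda` to pick out each pair's value:

```python
sbux_dict = {'venti': 20, 'tall': 12, 'grande': 16, 'trenta': 31}

# items() yields (key, value) tuples; key= tells sorted() to compare the values
by_size = sorted(sbux_dict.items(), key=lambda item: item[1], reverse=True)
for name, ounces in by_size:
    print(name, '=>', ounces)
```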


## Removing items from a dict
* __`del d[key]`__ = remove an item from the dict (raises __`KeyError`__ if the key is missing)
* __`dict.pop(key)`__ = remove an item and return its value
* __`dict.clear()`__ = empty out the dict

In [105]:
mydict = {'trenta': 31, 'grande': 16, 'venti': 20,
          'tall': 12}
print(mydict)

{'trenta': 31, 'grande': 16, 'venti': 20, 'tall': 12}


In [106]:
del mydict['trenta']
print(mydict)

{'grande': 16, 'venti': 20, 'tall': 12}


In [107]:
print(mydict.pop('venti'))

20


In [108]:
print(mydict)

{'grande': 16, 'tall': 12}


In [109]:
mydict.clear()
mydict

{}
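Two related methods are worth knowing. `pop()` accepts a default, so it doesn't raise `KeyError` for a missing key. `popitem()` removes and returns the most recently inserted pair; this last-in, first-out order is guaranteed since Python 3.7:

```python
mydict = {'grande': 16, 'tall': 12}

# pop() with a default returns the default instead of raising KeyError
print(mydict.pop('venti', None))   # None

# popitem() removes and returns the most recently inserted (key, value) pair
print(mydict.popitem())            # ('tall', 12)
print(mydict)                      # {'grande': 16}
```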

## Lab: dictionary
* use a dict to translate Roman numerals into their Arabic equivalents
1. load the dict with the Roman numerals M (1000), D (500), C (100), L (50), X (10), V (5), I (1)
2. read in a Roman numeral
3. print its Arabic equivalent
4. try it with MCLX = 1000 + 100 + 50 + 10 = 1160
5. __If you have time, handle the case where a smaller numeral precedes a larger one, e.g., XC = 100 - 10 = 90, or MCM = 1000 + (1000 - 100) = 1900__
6. __MCMXCIX = 1999__

## Dict Comprehension
* like a listcomp, a dictcomp creates a dict quickly

In [110]:
names = ['Sally', 'Bob', 'Martha', 'Dirk']
employee_ids = [345, 286, 453, 119]
id_dict = { name: emp_id + 1000
                   for name, emp_id in zip(names, employee_ids)}
print(id_dict)

{'Sally': 1345, 'Bob': 1286, 'Martha': 1453, 'Dirk': 1119}


In [111]:
d = { 'foo': 4, 'bar': -1, 'baz': -1, 'blah': 3, 'what': 2 }
print(d)

{'foo': 4, 'bar': -1, 'baz': -1, 'blah': 3, 'what': 2}


In [112]:
d.items()

dict_items([('foo', 4), ('bar', -1), ('baz', -1), ('blah', 3), ('what', 2)])

In [113]:
d = { key: val for key, val in d.items()
               if val != -1 }
print(d)

{'foo': 4, 'blah': 3, 'what': 2}


In [114]:
id_dict_inverse = { val : key for key, val in id_dict.items() }

In [115]:
id_dict_inverse

{1345: 'Sally', 1286: 'Bob', 1453: 'Martha', 1119: 'Dirk'}
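There is one caveat with inverting a dict. Values don't have to be unique, but keys do, so if two keys share a value, the later pair overwrites the earlier one in the inverted dict. This sketch uses hypothetical data in which two people share an ID:

```python
# hypothetical data: Sally and Martha share the same ID
id_dict = {'Sally': 1345, 'Bob': 1286, 'Martha': 1345}

# when inverted keys collide, later pairs overwrite earlier ones
id_dict_inverse = {val: key for key, val in id_dict.items()}
print(id_dict_inverse)   # {1345: 'Martha', 1286: 'Bob'}
```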

## Now we understand this code!

In [116]:
s = """Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr."""

d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print("".join([d.get(c, c) for c in s]))

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
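The loops above build a ROT13 substitution table by hand: each letter maps to the letter 13 places later, wrapping around the alphabet. An alternative sketch uses `str.maketrans()` and `str.translate()` to apply the same rotation. The standard library also offers a `rot_13` codec via `codecs.encode(s, 'rot_13')`.

```python
from string import ascii_lowercase, ascii_uppercase

# rotate each alphabet by 13 places, like the (i + 13) % 26 arithmetic above
lower_rot = ascii_lowercase[13:] + ascii_lowercase[:13]
upper_rot = ascii_uppercase[13:] + ascii_uppercase[:13]
rot13 = str.maketrans(ascii_lowercase + ascii_uppercase, lower_rot + upper_rot)

# characters not in the table (spaces, punctuation) pass through unchanged
print('Gur Mra bs Clguba, ol Gvz Crgref'.translate(rot13))
```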


# Sets

## Sets
* unordered collection, no duplicates
* most often used to remove duplicates, though sets also give fast membership tests and set algebra

In [120]:
s = { 'Annie', 'Betty', 'Cathy', 'Donna' }
print(s)

{'Annie', 'Betty', 'Donna', 'Cathy'}


In [121]:
s.add('Ellen')
print(s)

{'Cathy', 'Ellen', 'Donna', 'Annie', 'Betty'}


In [123]:
s.add('Annie')
print(s)

{'Cathy', 'Ellen', 'Donna', 'Annie', 'Betty'}


In [124]:
# we can use the 'in' operator
if 'Annie' in s:
    print('Yep!')

Yep!


## Deleting from a Set
* __`remove(item)`__: remove an item; raises __`KeyError`__ if it's not in the set
* __`discard(item)`__: remove an item if it's there; do nothing if it isn't
* __`pop()`__: remove and return an arbitrary (not random) element; raises __`KeyError`__ if the set is empty

In [125]:
print(s)

{'Cathy', 'Ellen', 'Donna', 'Annie', 'Betty'}


In [127]:
s.remove('Betty')


In [128]:
print(s)

{'Cathy', 'Ellen', 'Donna', 'Annie'}


In [130]:
s.discard('Loren')

In [131]:
print(s)

{'Cathy', 'Ellen', 'Donna', 'Annie'}
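Side by side, the difference between `discard()` and `remove()` shows up only when the item is missing. Here `'Loren'` is just an example of a name that isn't in the set:

```python
s = {'Cathy', 'Ellen', 'Donna', 'Annie'}

s.discard('Loren')           # missing item: silently does nothing

try:
    s.remove('Loren')        # missing item: raises KeyError
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'Loren'

print(len(s))                # 4
```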


In [132]:
print(s.pop())
print(s)

Cathy
{'Ellen', 'Donna', 'Annie'}


In [133]:
while s: # while the set is non-empty
    print(s.pop())

Ellen
Donna
Annie


## sets (cont'd)

In [134]:
even = set(range(2, 11, 2))
odd = set(range(1, 10, 2))
print(even, odd, sep='\n')

{2, 4, 6, 8, 10}
{1, 3, 5, 7, 9}


In [135]:
prime = {2, 3, 5, 7}
prime & odd

{3, 5, 7}

In [136]:
prime & even

{2}

In [137]:
odd | even

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [138]:
prime - even

{3, 5, 7}

In [139]:
prime ^ odd

{1, 2, 9}
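Each operator also has a named method. The methods accept any iterable as their argument, whereas the operators require both operands to be sets. This sketch re-creates the sets from above:

```python
even = set(range(2, 11, 2))
odd = set(range(1, 10, 2))
prime = {2, 3, 5, 7}

# each operator has an equivalent method
assert prime.intersection(odd) == prime & odd
assert odd.union(even) == odd | even
assert prime.difference(even) == prime - even
assert prime.symmetric_difference(odd) == prime ^ odd

# the methods also accept a list (prime & [1, 2, 3, 4] would raise TypeError)
print(prime.intersection([1, 2, 3, 4]))   # {2, 3}
```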

## sets + dicts

In [None]:
movies = {
    'Die Hard': { 'Bruce Willis', 'Alan Rickman', 'Bonnie Bedelia' },
    'The Sixth Sense' : { 'Toni Collette', 'Bruce Willis', 'Donnie Wahlberg' },
    'The Hunt for Red October' : { 'Sean Connery', 'Alec Baldwin' },
    'Highlander': { 'Christopher Lambert', 'Sean Connery' },
    '16 Blocks': { 'Bruce Willis', 'Yasiin Bey', 'David Morse' }
}

In [None]:
for title, stars in movies.items():
    if 'Bruce Willis' in stars:
        print(title)

In [None]:
for title, stars in movies.items():
    if stars & { 'Alan Rickman', 'Sean Connery' }:
        print(title)

## Subsets

In [None]:
set1 = { 1, 2, 3 }
set2 = { 1, 2, 3, 5, 7, 9 }

In [None]:
set1 <= set2 # <= means "subset"

In [None]:
set1 <= set1 # a set is always a subset of itself

In [None]:
set1 < set1 # but a set is never a proper subset of itself

In [None]:
set1 < set2 # set1 is a proper subset of set2 because set2 has all of set1 *and more*
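The comparisons above also have method forms: `issubset()`, `issuperset()`, and `isdisjoint()`, which is true when two sets share no elements. Like the other set methods, these accept any iterable:

```python
set1 = {1, 2, 3}
set2 = {1, 2, 3, 5, 7, 9}

print(set1.issubset(set2))           # True, same as set1 <= set2
print(set2.issuperset(set1))         # True, same as set2 >= set1
print(set1.isdisjoint({4, 6}))       # True: no elements in common
print(set1.issubset([1, 2, 3, 4]))   # True: the method accepts a list
```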

## Lab: Sets
* Use a set to find all of the unique words in the input and print them out in sorted order
* If the user entered __There is no there there__, your program should print out 
   <pre><b>
   is
   no
   there
   </b></pre>
* Note that `There` and `there` should be counted as the same word.

## Sets Recap
* unordered
* no duplicates
* operators &, |, -, ^
* use __`in`__ to test for membership
* subset vs. proper subset



# File I/O

## File I/O
* __`fileobj = open(filename, mode)`__
* __`mode`__ is a short string; the default is __`'r'`__ (read, text)
  * r = read (file must exist)
  * r+ = open for reading and writing (file must exist)
  * w = write (create, or truncate an existing file)
  * x = write, but only if the file does not already exist
  * a = append to the end (creates the file if it does not exist; a+ also allows reading)
* optional second letter:
  * t = text file (default)
  * b = binary
* __`fileobj.close()`__
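The following sketch shows the `'w'`, `'a'`, and default `'r'` modes in action. To avoid touching real files, it works inside a temporary directory from `tempfile`, which uses a `with` block (covered later). The file names are hypothetical:

```python
import os
import tempfile

# work in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'modes_demo.txt')   # hypothetical filename

    f = open(path, 'w')       # 'w' creates the file (or truncates an existing one)
    f.write('first line\n')
    f.close()

    f = open(path, 'a')       # 'a' writes at the end instead of truncating
    f.write('second line\n')
    f.close()

    f = open(path)            # default mode is 'r': read, text
    contents = f.read()
    f.close()

    new_path = os.path.join(tmpdir, 'created_by_append.txt')   # hypothetical
    f = open(new_path, 'a')   # 'a' also creates the file if it does not exist
    f.close()
    append_created = os.path.exists(new_path)

print(contents, end='')
print('append created a new file:', append_created)
```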

## File I/O: Open/Close

In [141]:
f = open('test.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: 'test.txt'

In [142]:
f = open('test.txt', 'w')
f.close()

In [145]:
!ls -l test.txt

-rw-r--r--  1 dws  staff  0 Oct 28 08:23 test.txt


In [146]:
f = open('test.txt', 'x')

FileExistsError: [Errno 17] File exists: 'test.txt'

## File I/O: Read/Write

In [147]:
poem = """TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference."""

len(poem)

729

In [148]:
f = open('poem.txt', 'w')
f.write(poem)

729

In [149]:
f.close()

In [150]:
f = open('poem.txt')
poem2 = f.read()
f.close()

In [151]:
poem == poem2

True

## File I/O: __`write()`__ vs. __`print()`__


In [160]:
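f = open('poem.txt', 'w')
# another example of why print being a function is good: it can write to any file object.
# print() normally adds a trailing newline; end='' keeps the file identical to poem
print(poem, file=f, end='')
f.close()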

In [157]:
f = open('poem.txt')
poem2 = f.read()
f.close()

In [158]:
poem == poem2

True

In [159]:
len(poem2)

729

## __`print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)`__
* __`sep`__ = separator between objects (default is a space)
* __`end`__ = what to print at the end (default is a newline)
* __`file`__ = where to print (default is __`sys.stdout`__, usually the screen)
* __`flush`__ = whether to force the output buffer to be flushed (default is __`False`__)

## File I/O: How to Read Data
* __`read()`__: slurps up the entire file at once
  * __`read(x)`__ reads at most __`x`__ characters (text mode) or bytes (binary mode)
* __`readline()`__: reads one line at a time
* __`readlines()`__: reads all remaining lines and returns them as a list of strings
* or iterate over the file object directly, one line at a time
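In this sketch, `read(n)`, `readline()`, and `readlines()` each pick up where the previous call left off. It uses `io.StringIO`, an in-memory text stream that supports the same read methods, as a stand-in for a real file:

```python
import io

# an in-memory text stream standing in for an open text file
f = io.StringIO('line one\nline two\nline three\n')

first_four = f.read(4)        # at most 4 characters: 'line'
rest_of_line = f.readline()   # resumes mid-line: ' one\n'
remaining = f.readlines()     # every remaining line, as a list
f.close()

print(repr(first_four), repr(rest_of_line), remaining)
```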

In [167]:
poem = ''
f = open('poem.txt')
for line in f: # each pass yields the next line, including its trailing '\n'
    poem += line
f.close()

In [168]:
print(poem)

TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.



## File I/O: __`with`__ statement
* the __`with`__ statement sets up a temporary "context" and closes the file automatically when the block exits (even if an exception is raised), so we don't have to remember to close it

In [178]:
with open('poem.txt') as f1: # ~ f1 = open('poem.txt')
    poem2 = f1.read()
    # at this point file is open
    print('in with statement, f1.closed =', f1.closed)

in with statement, f1.closed = False


In [173]:
poem == poem2

True

In [179]:
f1.closed

True
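The `with` statement closes the file when an exception exits the block, not only when the block ends normally. The sketch below demonstrates this; the scratch file name is hypothetical and lives in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'with_demo.txt')   # hypothetical scratch file
    with open(path, 'w') as out:
        out.write('hello\n')

    try:
        with open(path) as f3:
            first = f3.readline()
            raise ValueError('something went wrong mid-block')
    except ValueError:
        pass   # by the time we get here, the with statement has closed f3

print(first, end='')
print('f3.closed =', f3.closed)   # f3.closed = True
```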

## Quick Lab: File I/O
* write a Python program that prompts the user for a filename, opens that file, and writes its lines to a new file in reverse order, i.e.,

<pre><b>
    Original file       Reversed file
    Line 1              Line 4
    Line 2              Line 3
    Line 3              Line 2
    Line 4              Line 1
</b></pre>

## Lab: File I/O + dicts
* write a Python program to read a file and count the number of occurrences of each word in the file
* use a __`dict`__, indexed by word, to count the occurrences
* remember that __`d.get(key)`__ returns __`None`__ if there is no such key in the dict (vs. __`d[key]`__, which raises a __`KeyError`__), and don't forget the __`in`__ operator
  * or use a __`collections.defaultdict`__ if we've covered it
* treat __The__ and __the__ as the same word when counting
* print out words and counts, from most common to least common
* EXTRA: remove punctuation, so __Hamlet,__ == __Hamlet__ (hint: refer back to "import this")
* __The Road Not Taken__ and __Hamlet__ are in your materials

## File I/O: recap
* __`open()`__ returns file object
* __`close()`__ closes the file
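* __`read()`__ reads the whole file (or at most __`n`__ characters/bytes with __`read(n)`__)
* __`readline()`__ reads a line at a time
* __`readlines()`__ reads all lines into a list at once; it's usually better to iterate over the file object instead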
* can also iterate through a file object a line at a time
* __`with`__ statement sets up a temporary context (block) for file I/O and automatically closes file when block is exited

# End of Day 2