## Part 2 Agenda
* __`enumerate()/zip()`__
* list comprehensions
* tuples
* dictionaries
* sets
* file I/O

## "Pythonic" ... redux

In [1]:
cars = ['Tesla', 'Fisker', 'Rivian', 'Lordstown']

In [2]:
index = 0
for car in cars: # for thing in container
    print('index', index, 'is', car)
    index += 1

index 0 is Tesla
index 1 is Fisker
index 2 is Rivian
index 3 is Lordstown


## __`enumerate()`__
* a builtin function which associates an index with each item in a container
* returns a special object that "gives up" the index and item at that index...
    * ...each time we "knock on its door"
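
A small sketch of the "knock on its door" idea: calling the builtin `next()` on the object that `enumerate()` returns hands back one (index, item) pair per call. The `makers` list and the other variable names are made up for illustration.

```python
makers = ['Tesla', 'Fisker', 'Rivian']   # illustrative data

pairs = enumerate(makers)   # nothing has been produced yet
first = next(pairs)         # "knock" once
second = next(pairs)        # "knock" again
print(first, second)

# whatever is left can still be consumed, e.g. by list() or a for loop
remaining = list(pairs)
print(remaining)
```

A `for` loop does exactly this knocking for us, which is why we rarely call `next()` by hand.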

In [3]:
for index, car in enumerate(cars):
    print('car maker', index, 'is', car)

car maker 0 is Tesla
car maker 1 is Fisker
car maker 2 is Rivian
car maker 3 is Lordstown


In [5]:
for index, car in enumerate(cars, start=1):
    print('car maker', index, 'is', car)

car maker 1 is Tesla
car maker 2 is Fisker
car maker 3 is Rivian
car maker 4 is Lordstown


## __`zip(*iterables)`__
* __*iterables__ means "0+ iterables (containers)"
* builtin function which matches up each item in an iterable with the corresponding item in the other iterable(s)
* why is it called __`zip`__?
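
One way to picture the name: like the teeth of a zipper, items from each iterable are interlocked pairwise. The sketch below (with made-up `letters` and `numbers` lists) wraps `zip()` in `list()` so we can see everything it would produce, including the "0 iterables" case.

```python
letters = ['a', 'b', 'c']   # illustrative data
numbers = [1, 2, 3]

# list() forces zip to hand over all of its matched-up items at once
zipped = list(zip(letters, numbers))
print(zipped)

# with no arguments at all ("0+ iterables"), zip() simply produces nothing
empty = list(zip())
print(empty)
```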

In [6]:
first_names = ['Katherine', 'Bruce', 'Taylor']
last_names = ['Johnson', 'Lee', 'Swift']

for first, last in zip(first_names, last_names):
    print(first, last)

Katherine Johnson
Bruce Lee
Taylor Swift


In [7]:
first_names = ['Katherine', 'Bruce', 'Taylor']
last_names = ['Johnson', 'Lee', 'Swift']
employee_nums = [3456, 1234, 2468]

for first, last, num in zip(first_names, last_names, employee_nums):
    print(first, last, num)

Katherine Johnson 3456
Bruce Lee 1234
Taylor Swift 2468


In [11]:
first_names = ['Katherine', 'Bruce', 'Taylor']
last_names = ['Johnson', 'Lee', 'Swift', 'Frost']

for first, last in zip(first_names, last_names):
    print(first, last)

Katherine Johnson
Bruce Lee
Taylor Swift


In [10]:
import itertools # module that helps with iteration

first_names = ['Katherine', 'Bruce', 'Taylor']
last_names = ['Johnson', 'Lee', 'Swift', 'Frost']

for first, last in itertools.zip_longest(first_names, last_names, fillvalue='***'):
    print(first, last)

Katherine Johnson
Bruce Lee
Taylor Swift
*** Frost


In [16]:
first_names = ['Katherine', 'Bruce', 'Taylor']
last_names = ['Johnson', 'Lee', 'Swift', 'Frost']

for first, last in zip(first_names, last_names, strict=True): # strict= requires Python 3.10+
    print(first, last)

Katherine Johnson
Bruce Lee
Taylor Swift


ValueError: zip() argument 2 is longer than argument 1

## List Comprehensions ("listcomps")
* quick/compact way to build a list
* "more readable"/faster
* which is easier to read?
  * your answer will change over time...

In [20]:
# suppose we have a list of fruits
# rather than typing it in the "standard" way, we'll use a Pythonic shortcut
fruits = 'apple lemon cherry fig lime watermelon'.split() 
fruits

['apple', 'lemon', 'cherry', 'fig', 'lime', 'watermelon']

#### Now suppose we want a "parallel" list containing the lengths of each fruit string
* first we'll create that list the standard way...

In [21]:
fruit_lengths = [] 

for fruit in fruits:
    fruit_lengths.append(len(fruit))
    
print(fruit_lengths)

[5, 5, 6, 3, 4, 10]


* and now with a list comprehension...

In [23]:
fruit_lengths = [len(fruit) for fruit in fruits]

print(fruit_lengths)

[5, 5, 6, 3, 4, 10]


## List Comprehensions (cont'd)
* listcomps can generate a list from the Cartesian product of two or more iterables

In [25]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L', 'XL']

In [26]:
tshirts = [[color, size] for size in sizes
                             for color in colors]
tshirts

[['black', 'S'],
 ['white', 'S'],
 ['black', 'M'],
 ['white', 'M'],
 ['black', 'L'],
 ['white', 'L'],
 ['black', 'XL'],
 ['white', 'XL']]

* we can also use list comprehensions to *filter* one list into another

In [27]:
string = 'alphabet soup tastes great!'

In [29]:
print(list(string))

['a', 'l', 'p', 'h', 'a', 'b', 'e', 't', ' ', 's', 'o', 'u', 'p', ' ', 't', 'a', 's', 't', 'e', 's', ' ', 'g', 'r', 'e', 'a', 't', '!']


#### suppose we wanted to generate a list of all the consonants in a string, discarding vowels and spaces...

In [31]:
consonants = [char for char in string
                          if char not in 'aeiou! ']
print(consonants)

['l', 'p', 'h', 'b', 't', 's', 'p', 't', 's', 't', 's', 'g', 'r', 't']


## Lab: List Comprehensions
*  Start with the Cartesian product example (colors x sizes of t-shirts) and add a third list, __`sleeves = ['short', 'long']`__, then write a new listcomp which generates the Cartesian product __`colors x sizes x sleeves`__. __`tshirts`__ should look like this:<pre><b>
    [['black', 'S', 'short'],
     ['black', 'S', 'long'],
     ['black', 'M', 'short'],
     ['black', 'M', 'long'],
     ['black', 'L', 'short'],
     ['black', 'L', 'long'],
     ['white', 'S', 'short'],
     ['white', 'S', 'long'],
     ['white', 'M', 'short'],
     ['white', 'M', 'long'],
     ['white', 'L', 'short'],
     ['white', 'L', 'long']]
     
 </b></pre>
* Use a list comprehension to create a list of the squares of the integers from 1 to 25 (i.e., 1, 4, 9, 16, …, 625)
* Given a list of words, create a second list which contains all the words from the first list which do not end with a vowel
* Use a list comprehension to create a list of the integers from 1 to 100 which are not divisible by 5
* Use a list comprehension and __`zip()`__ to create a list of lists, where the list items are name and ID number that you grabbed from separate lists of names and ID numbers
  * start with a list of, say, 5 names ['John', 'Mary', 'Edward', 'Linda', 'Dinesh']
  * and a list of, say, 5 ID numbers [1003, 2043, 8762, 7862, 1093]
  * additional wrinkle: do not include any names whose corresponding ID is -1

## listcomps recap
* keep them short
* they are not _list incomprehensions_, so keep them simple
* use line breaks since they are ignored inside [] (and (), {}) and you therefore don't need the ugly '\\' line continuation character
* note that __`for`__ loops do many things: scanning a sequence to count or select items, computing aggregates (sums, averages), or any number of other processing tasks
  * in contrast, listcomps do ONE thing–generate lists!
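
A brief sketch of two of the points above, using a made-up sentence: a listcomp split across lines without any backslash, and a counting task that is better left to a plain `for` loop.

```python
words = 'the quick brown fox jumps over the lazy dog'.split()

# line breaks inside [] need no line continuation character
long_words = [word.upper()
              for word in words
              if len(word) > 4]
print(long_words)

# counting is an aggregate, not a list, so a for loop is the better fit
count = 0
for word in words:
    if len(word) > 4:
        count += 1
print(count)
```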

## Tuples
* immutable data type
* typically heterogeneous (cf. lists)
* generally imply some structure
 * tuples typically represent a single object, but multiple aspects/attributes of it
 * if lists are typically used like the __columns__ of a spreadsheet...
   * then tuples are typically the __rows__...
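
To make the spreadsheet analogy concrete, here is a sketch with invented employee data: each tuple is one row (several attributes of one person), and each list built from the rows is one column (the same attribute for everyone).

```python
# each tuple is one "row": last name, first name, employee number
rows = [
    ('Johnson', 'Katherine', 3456),
    ('Lee', 'Bruce', 1234),
    ('Swift', 'Taylor', 2468),
]

# each list is one "column": the same attribute across all rows
last_names = [row[0] for row in rows]
employee_nums = [row[2] for row in rows]

print(last_names)
print(employee_nums)
```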

In [32]:
t = () # empty tuple (cf. empty list...[])
t

()

In [33]:
type(t)

tuple

In [37]:
t = (3,) # "singleton tuple"

In [38]:
t

(3,)

In [50]:
t = 'Jones', 'John', 1023, True # no parens
t

('Jones', 'John', 1023, True)

In [44]:
# tuple unpacking
last_name, first_name, employee_num, full_time = t

In [41]:
employee_num # type(employee_num)

1023

In [45]:
something = input('Enter something: ')
as_a_list = something.split() # split() always returns a list
as_a_tuple = tuple(as_a_list) # tuple() always returns a tuple

Enter something:  Benioff Marc CEO 1997


In [46]:
print(as_a_list, as_a_tuple, sep='\n')

['Benioff', 'Marc', 'CEO', '1997']
('Benioff', 'Marc', 'CEO', '1997')


In [47]:
person = 'Sara Breedlove', 1867, 'Louisiana'

In [48]:
person[-1]

'Louisiana'

In [49]:
person[1] = 1868

TypeError: 'tuple' object does not support item assignment

## Lab: Tuples
* We don't really know enough yet to use a tuple in interesting ways, so instead let's just tinker around with tuples here in the notebook...
  * Create a tuple representing a city w/fields of your own choosing (e.g., city name, state/country, population, elevation, etc.)
  * "Add" a field to the tuple–since tuples are immutable, you will have to do this by concatenating tuples
  * Using the _in_ operator, check to see if a particular value is in the tuple
  * Using the __`.index()`__ method, find the position of a particular value in the tuple

## Recap: Tuples
* not just "constant lists" 
 (see http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists)
* remember that lists are (typically) ordered sequences of homogeneous values (i.e., Excel/DB column)
* and tuples typically imply some structure and refer to multiple attributes of ONE item (person, country, building, etc.)
   * i.e., database/Excel row


# Dictionaries
* "unordered" grouping of key/value pairs (items are looked up by key, not by position; insertion order is preserved as of Python 3.7)
* sometimes called a "map", "hashmap", or "associative array"
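
The quotes around "unordered" matter: since Python 3.7 a dict remembers the order in which keys were inserted, but items are still looked up by key rather than by position. A short sketch (the `roman` name is just for illustration):

```python
roman = {}
roman['X'] = 10
roman['V'] = 5
roman['I'] = 1

key_order = list(roman)   # keys come back in insertion order
print(key_order)
print(roman['V'])         # lookup is by key, not by position

# two dicts with the same pairs are equal regardless of order
same = roman == {'I': 1, 'V': 5, 'X': 10}
print(same)
```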

In [51]:
d = {} # empty dict

In [52]:
d = { 'X': 10, 'V': 5, 'I': 1 } # can be initialized when declared

In [53]:
d

{'X': 10, 'V': 5, 'I': 1}

In [54]:
d['L'] = 50 # add something to the dict
print(d)

{'X': 10, 'V': 5, 'I': 1, 'L': 50}


In [55]:
# iterating through a dict iterates through the keys 
for thing in d: # for thing in container
    print(thing, end=' ')

X V I L 

In [56]:
# ...of course we can print the values while iterating
for thing in d:
    print(thing, d[thing])

X 10
V 5
I 1
L 50


In [57]:
sbux_dict = {'venti': 20, 'tall': 12, 'grande': 16}
print(sbux_dict)

{'venti': 20, 'tall': 12, 'grande': 16}


In [58]:
print(sbux_dict.keys(), sbux_dict.values(), sbux_dict.items(), sep='\n')

dict_keys(['venti', 'tall', 'grande'])
dict_values([20, 12, 16])
dict_items([('venti', 20), ('tall', 12), ('grande', 16)])


In [59]:
total_ounces = 0
for amount in sbux_dict.values():
    total_ounces += amount

total_ounces

48

In [60]:
sum(sbux_dict.values())

48

## Dictionaries: View Objects
* __`keys()`__, __`values()`__, and __`items()`__ are view objects
* view objects provide a dynamic window into the dictionary
  * these objects change as the dictionary changes!
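
The same holds for `values()` and `items()`, and for deletions as well as additions. The sketch below uses a separate, made-up `cup_sizes` dict, and also shows one way to take a frozen snapshot when a live view is not what we want.

```python
cup_sizes = {'tall': 12, 'grande': 16}

values = cup_sizes.values()   # a view, not a copy
items = cup_sizes.items()

cup_sizes['venti'] = 20       # add a key...
del cup_sizes['tall']         # ...and remove one

print(list(values))           # the views reflect both changes
print(list(items))

# to freeze the current contents, copy the view into a list
snapshot = list(cup_sizes.keys())
cup_sizes['trenta'] = 31
print(snapshot)               # the copy does not see 'trenta'
```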

In [61]:
keys = sbux_dict.keys()
keys

dict_keys(['venti', 'tall', 'grande'])

In [62]:
# keys will change automagically after we add to the dict
print(keys)
sbux_dict['trenta'] = 31
print(keys)

dict_keys(['venti', 'tall', 'grande'])
dict_keys(['venti', 'tall', 'grande', 'trenta'])


In [63]:
keys

dict_keys(['venti', 'tall', 'grande', 'trenta'])

## __`get()`__: Dealing with missing dict values

In [64]:
d = {'foo': 'bar'}

In [65]:
d['foo']

'bar'

In [66]:
d['food']

KeyError: 'food'

In [69]:
if 'foo' in d: # is 'foo' a key in this dict?
    print(d['foo'])
# or just... d.get('foo')

bar


In [71]:
print(d.get('foot'))

None
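
`get()` also accepts an optional second argument, which is returned instead of `None` when the key is missing. A minimal sketch, including the common counting idiom (the word `'banana'` is just sample data):

```python
d = {'foo': 'bar'}

missing = d.get('foot', 'no such key')   # default is returned, no KeyError
print(missing)

# counting: a key we haven't seen yet should count as 0
counts = {}
for letter in 'banana':
    counts[letter] = counts.get(letter, 0) + 1
print(counts)
```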


In [72]:
# what if we sort a dict?
for key in sorted(sbux_dict):
    print(key, sbux_dict[key])

grande 16
tall 12
trenta 31
venti 20


In [73]:
# To iterate from smallest to largest size, we have to sort the
# dict by value (as opposed to key)
# By default, sorted() will sort by key--
# usually not what we want!

for k in sorted(sbux_dict, key=sbux_dict.get):
    print(k, '=>', sbux_dict[k])

tall => 12
grande => 16
venti => 20
trenta => 31
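
An alternative sketch, not the only way to do it: sort the `(key, value)` pairs from `items()` directly, here from largest to smallest using `reverse=True`. The `lambda` is a small throwaway function that picks the value out of each pair.

```python
sbux_dict = {'venti': 20, 'tall': 12, 'grande': 16, 'trenta': 31}

# sort the (key, value) pairs by the value, largest first
by_size = sorted(sbux_dict.items(), key=lambda pair: pair[1], reverse=True)

for name, ounces in by_size:
    print(name, '=>', ounces)
```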


## Removing items from a dict
* __`del`__ = remove an item from the dict
* __`dict.pop(key)`__ = remove item and return value
* __`dict.clear()`__ = empty out the dict

In [76]:
mydict = {'trenta': 31, 'grande': 16, 'venti': 20, 'tall': 12}
print(mydict)

{'trenta': 31, 'grande': 16, 'venti': 20, 'tall': 12}


In [77]:
del mydict['trenta']
print(mydict)

{'grande': 16, 'venti': 20, 'tall': 12}


In [78]:
print(mydict.pop('venti'))

20


In [79]:
print(mydict)

{'grande': 16, 'tall': 12}


In [80]:
mydict.clear()
mydict

{}
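
Like `d[key]`, both `del` and `pop(key)` raise a `KeyError` if the key is missing. `pop()` takes an optional default that avoids this, as in this small sketch (the `cups` dict is made up so as not to disturb `mydict`):

```python
cups = {'grande': 16, 'tall': 12}

gone = cups.pop('tall')            # key exists: removed and returned
absent = cups.pop('venti', None)   # key missing: default returned, no KeyError

print(gone, absent, cups)
```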

## Lab: dictionary
* use a dict to translate Roman numerals into their Arabic equivalents
1. load the dict with Roman numerals M (1000), D (500), C (100), L (50), X (10), V (5), I (1)
2. read in a Roman numeral
3. print Arabic equivalent
4. try it with MCLX = 1000 + 100 + 50 + 10 = 1160
5. __The rest of this could be homework...__
6. __Deal with the case where a smaller number precedes a larger number, e.g., XC = 100 - 10 = 90, or MCM = 1000 + (1000-100) = 1900__
  * e.g.,  __MCMXCIX = 1999__

## Sets
* unordered collection, no duplicates
* kind of a one-trick pony–remove duplicates

In [81]:
s = { 'Annie', 'Betty', 'Cathy', 'Donna' }
print(s)

{'Betty', 'Donna', 'Annie', 'Cathy'}


In [82]:
s.add('Ellen')
print(s)

{'Donna', 'Ellen', 'Betty', 'Annie', 'Cathy'}


In [86]:
s.add('Annie')
print(s)

{'Donna', 'Ellen', 'Betty', 'Annie', 'Cathy'}


In [None]:
# we can use the 'in' operator
if 'Annie' in s:
    print('Yep!')

## Deleting from a Set
* __`remove(item)`__: remove an item; raises __`KeyError`__ if it's not in the set
* __`discard(item)`__: remove an item if it's in the set; does nothing (no error) if it isn't
* __`pop()`__: removes and returns an arbitrary element of the set
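
A side-by-side sketch of `discard()` and `remove()` on a value that is not in the set. It uses `try`/`except` to catch the error, and a separate made-up set called `names` so that `s` is left alone.

```python
names = {'Annie', 'Betty'}

names.discard('Zelda')        # not in the set: silently does nothing

try:
    names.remove('Zelda')     # not in the set: raises KeyError
    outcome = 'removed'
except KeyError:
    outcome = 'KeyError'

print(outcome, sorted(names))
```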

In [87]:
print(s)

{'Donna', 'Ellen', 'Betty', 'Annie', 'Cathy'}


In [90]:
s.remove('Betty') # second run of this cell: 'Betty' was already removed, so remove() raises KeyError

KeyError: 'Betty'

In [89]:
print(s)

{'Donna', 'Ellen', 'Annie', 'Cathy'}


In [94]:
s.discard('Loren')

In [95]:
print(s)

{'Donna', 'Ellen', 'Annie', 'Cathy'}


In [96]:
print(s.pop())
print(s)

Donna
{'Ellen', 'Annie', 'Cathy'}


In [97]:
while s: # while the set is non-empty
    print(s.pop())

Ellen
Annie
Cathy


## Lab: Sets
* Use a set to find all of the unique words in the input and print them out in sorted order
* If the user entered __There is no there there__, your program should print out 
   <pre><b>
   is
   no
   there
   </b></pre>
* Note that `There` and `there` should be counted as the same word.

## Sets Recap
* unordered
* no duplicates
* use __`in`__ to test for membership
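
The recap in a few lines, using a made-up list of dice rolls: building a set drops the duplicates, `sorted()` gives back an ordered list, and `in` tests membership.

```python
rolls = [3, 1, 3, 6, 1, 1, 4]   # illustrative data with duplicates

unique = set(rolls)             # duplicates disappear
in_order = sorted(unique)       # sorted() returns a list, since sets have no order

print(in_order)
print(6 in unique, 5 in unique)
```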


## File Input/Output (I/O)
* we use a built-in function to open files:
  * __`fileobj = open(filename, mode)`__
  * mode is one or two letters
    * r = read
    * r+ = open for reading and writing
    * w = write (create/overwrite)
    * x = write, but only if file does not already exist
    * a = append (creates the file if it does not exist); a+ = append and read
  * second letter =
    * t = text file (default)
    * b = binary
* ...but we use a method to close the file (and do everything else)
  * __`fileobj.close()`__
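
A sketch that walks through several of the modes in turn. To avoid leaving files behind it works in a scratch directory from the standard `tempfile` module, which is an assumption of this example rather than something the modes require; the filename `modes.txt` is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # scratch directory, removed afterwards
    path = os.path.join(folder, 'modes.txt')

    f = open(path, 'w')      # w: create (or overwrite)
    f.write('one\n')
    f.close()

    f = open(path, 'a')      # a: add to the end of the file
    f.write('two\n')
    f.close()

    f = open(path, 'r')      # r: read (same as open(path))
    contents = f.read()
    f.close()

    try:
        open(path, 'x')      # x: refuses because the file already exists
        x_result = 'created'
    except FileExistsError:
        x_result = 'FileExistsError'

print(repr(contents), x_result)
```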

## File I/O: Open/Close

In [98]:
f = open('test.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: 'test.txt'

In [99]:
f = open('test.txt', 'w')
f.close()

In [100]:
f = open('test.txt', 'x')

FileExistsError: [Errno 17] File exists: 'test.txt'

## File I/O: Read/Write

In [101]:
poem = """TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference."""

len(poem)

729

In [102]:
f = open('poem.txt', 'w')
f.write(poem)

729

In [103]:
f.close()

In [104]:
f = open('poem.txt')
poem2 = f.read()
f.close()

In [105]:
poem == poem2

True

## File I/O: __`write()`__ vs. __`print()`__


In [108]:
f = open('poem.txt', 'w')
# another example of why print being a function is good
print(poem, file=f, end='')
f.close()

In [109]:
f = open('poem.txt')
poem2 = f.read()
f.close()

In [110]:
poem == poem2

True

In [111]:
len(poem2)

729

## File I/O: How to Read Data
* __`read()`__: slurps up entire file at once
  * __`read(x)`__ reads at most __`x`__ characters (bytes, if the file was opened in binary mode)
* __`readline()`__: reads a line at a time
* __`readlines()`__: reads all of the remaining lines and returns them as a list of strings
* or better yet, just let Python do the work!
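
A sketch of how the three methods behave when called one after another on the same file object: each picks up where the previous one stopped. The three-line file is made up and written to a scratch directory (via the standard `tempfile` module) so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lines.txt')
    f = open(path, 'w')
    f.write('alpha\nbeta\ngamma\n')
    f.close()

    f = open(path)
    chunk = f.read(3)             # at most 3 characters
    rest_of_line = f.readline()   # continues from where read() stopped
    remaining = f.readlines()     # everything that is left, as a list
    f.close()

print(repr(chunk), repr(rest_of_line), remaining)
```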


In [112]:
f = open('poem.txt') # again, for reading because no second arg

for line in f: # Python reads each line
    print(line, end='')
#
# ...
# 
# 

f.close()

TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

In [113]:
print(poem)

TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.


## File I/O: __`with`__ statement
* the __`with`__ statement sets up a temporary "context" 
  * closes the file automatically so we don't have to bother doing it

In [114]:
with open('poem.txt') as inputfile: # ~ inputfile = open('poem.txt')
    for line in inputfile:
        print(line, end='')
    # at this point file is open
    print('\nin with block, inputfile.closed =', inputfile.closed)

# down here, the file has already been closed for us

TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
in with block, inputfile.closed = False


In [115]:
poem == poem2

True

In [116]:
inputfile.closed

True
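
The same pattern works for writing, and the file is closed even if something goes wrong inside the block. A sketch under the assumption that we work in a scratch directory (standard `tempfile` module) with a made-up file `notes.txt`; the `RuntimeError` is raised on purpose to simulate a failure.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')

    with open(path, 'w') as outfile:
        outfile.write('first line\n')
    closed_after_write = outfile.closed    # the block has ended

    try:
        with open(path) as infile:
            first = infile.readline()
            raise RuntimeError('simulated failure mid-read')
    except RuntimeError:
        pass
    closed_after_error = infile.closed     # closed despite the exception

print(closed_after_write, closed_after_error, repr(first))
```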

## Quick Lab: File I/O
* write a Python program which prompts the user for a filename, then opens that file and writes the contents of the file to a new file, in reverse order, i.e.,

<pre><b>
    Original file       Reversed file
    Line 1              Line 4
    Line 2              Line 3
    Line 3              Line 2
    Line 4              Line 1
</b></pre>

## Group Lab: File I/O + dicts
* write a Python program to read a file and count the number of occurrences of each word in the file
* use a __`dict`__, indexed by word, to count the occurrences
* remember that __`d.get(key)`__ will return __`None`__ if there is no such key in the dict (vs. __`d[key]`__, which will throw an exception); the __`in`__ operator can also be used to check for a key first
  * or use a __`collections.defaultdict`__ if we've covered it
* treat __The__ and __the__ as the same word when counting
* print out words and counts, from most common to least common
* __EXTRA:__ remove punctuation, so __Hamlet,__ == __Hamlet__ # refer back to "import this"
* The Road Not Taken and Hamlet are in your materials

## File I/O: recap
* __`open()`__ returns file object
* __`close()`__ closes the file
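* __`read()`__ reads the entire file (or at most __`x`__ characters with __`read(x)`__)
* __`readline()`__ reads a line at a time
* __`readlines()`__ reads all lines into a list at once; prefer iterating over the file object, especially for large files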
* can also iterate through a file object a line at a time
* __`with`__ statement sets up a temporary context (a separate block) for file I/O and automatically closes file when block is exited