# A Story of Two Collections

- List - a linear collection of values that stay in order
- Dictionary - a "bag" of values, each with its own label

# Dictionaries

- dictionaries are Python's most powerful data collection
- dictionaries allow us to do fast database-like operations in Python
- dictionaries have different names in different languages: Associative Arrays (Perl/PHP); Properties, Map, or HashMap (Java); Property Bag (C#/.NET)

- lists index their entries based on the position in the list
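- dictionaries are like bags - entries have no positions (since Python 3.7 a dictionary remembers insertion order, but it still cannot be indexed by position)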
- so we index the things we put in the dictionary with a "lookup tag"

In [1]:
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75
print(purse)

{'money': 12, 'candy': 3, 'tissues': 75}


In [2]:
print(purse['candy'])

3


In [3]:
purse['candy'] = purse['candy'] + 2
print(purse)

{'money': 12, 'candy': 5, 'tissues': 75}


- dictionaries are like lists except that they use keys instead of numbers to look up values
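
The contrast between positional and keyed lookup can be sketched as follows. The variable names `purse_list` and `purse_dict` are made up for this illustration and do not appear elsewhere in these notes.

```python
# A list is indexed by position; a dictionary is indexed by key.
# Both variable names here are hypothetical, chosen for illustration.
purse_list = [12, 3, 75]
purse_dict = {'money': 12, 'candy': 3, 'tissues': 75}

second_item = purse_list[1]   # look up by position
candy = purse_dict['candy']   # look up by key

print(second_item, candy)
```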

# Dictionary Literals (Constants)

- dictionary literals use curly braces and have a list of key:value pairs
- you can make an empty dictionary using empty curly braces

In [4]:
jjj = {'chuck': 1, 'fred': 42, 'jan':100}
print(jjj)

{'chuck': 1, 'fred': 42, 'jan': 100}


In [5]:
ooo = { }
print(ooo)

{}


# Many Counters with a Dictionary

- one common use of dictionaries is counting how often we "see" something

In [6]:
ccc = dict()
ccc['csev'] = 1
ccc['cwen'] = 1
print(ccc)

{'csev': 1, 'cwen': 1}


In [7]:
ccc['cwen'] = ccc['cwen'] + 1
print(ccc)

{'csev': 1, 'cwen': 2}


# Dictionary Tracebacks

- it is an error to reference a key which is not in the dictionary
- we can use the in operator to see if a key is in the dictionary

In [8]:
abc = dict()
print(abc['ccc'])

KeyError: 'ccc'

In [9]:
'ccc' in abc

False
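
One way to combine the two points above is to test for the key before looking it up, so the lookup never raises a KeyError. This is only a sketch; the fallback string and the name `result` are arbitrary choices for the example.

```python
abc = dict()

# Check for the key first, so the lookup below can never raise KeyError.
if 'ccc' in abc:
    result = abc['ccc']
else:
    result = 'no such key'   # arbitrary fallback chosen for this example

print(result)
```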

In [11]:
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names:
    if name not in counts :
        counts[name] = 1
        continue
    counts[name] = counts[name] + 1
        
print(counts)

{'csev': 2, 'cwen': 2, 'zqian': 1}


# The "get" method for Dictionaries

- The pattern of checking to see if a key is already in a dictionary and assuming a default value if the key is not there is so common that there is a method called get() that does this for us

- get() returns the default value if the key does not exist, and there is no traceback

- We can use get() and provide a default value of zero when the key is not yet in the dictionary - and then just add one

In [13]:
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names:
    counts[name] = counts.get(name, 0) + 1
print(counts)

{'csev': 2, 'cwen': 2, 'zqian': 1}
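
To see what get() returns in each case, here is a small sketch. The names `present`, `missing`, and `no_default` are invented for the example.

```python
counts = {'csev': 2}

present = counts.get('csev', 0)    # key exists: returns the stored value
missing = counts.get('zqian', 0)   # key absent: returns the default we passed
no_default = counts.get('zqian')   # key absent, no default given: returns None

print(present, missing, no_default)

# get() only reads the dictionary; it does not add the missing key
print('zqian' in counts)
```

Because get() does not store anything, the counting loop still needs the assignment `counts[name] = ...` to record the new total.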


# Counting Pattern

The general pattern to count the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently.

In [15]:
counts = dict()
print('Enter a line of text: ')
line = input("")

words = line.split()

print('Words:', words)

print('Counting...')
for word in words:
    counts[word] = counts.get(word,0) + 1
print('Counts', counts)

Enter a line of text: 
the clown ran after the car and the car ran into the tent and the tent fell down on the clown and the car
Words: ['the', 'clown', 'ran', 'after', 'the', 'car', 'and', 'the', 'car', 'ran', 'into', 'the', 'tent', 'and', 'the', 'tent', 'fell', 'down', 'on', 'the', 'clown', 'and', 'the', 'car']
Counting...
Counts {'the': 7, 'clown': 2, 'ran': 2, 'after': 1, 'car': 3, 'and': 3, 'into': 1, 'tent': 2, 'fell': 1, 'down': 1, 'on': 1}
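
The same pattern can be wrapped in a function so it can be tried without typing input. This is a sketch only; the function name `count_words` and the sample sentence are assumptions made for the example.

```python
def count_words(line):
    """Return a dictionary mapping each word in line to how often it occurs."""
    counts = dict()
    for word in line.split():
        # Start from 0 the first time a word is seen, then add one
        counts[word] = counts.get(word, 0) + 1
    return counts

result = count_words('the car and the tent and the clown')
print(result)
```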


# Definite Loops and Dictionaries

Even though dictionaries are not indexed by position, we can write a for loop that goes through all the entries in a dictionary. Strictly, the loop goes through all of the keys in the dictionary, and we look up each value inside the loop. Since Python 3.7, the keys come out in the order they were inserted.

In [18]:
counts = {'chuck' : 1, 'fred' : 42, 'jan' : 100}
for key in counts :
    print(key, counts[key])

chuck 1
fred 42
jan 100


# Retrieving LISTS of Keys and Values

In [19]:
jjj = {'chuck' : 1, 'fred' : 42, 'jan' : 100}
print(list(jjj))

['chuck', 'fred', 'jan']


In [20]:
print(jjj.keys())

dict_keys(['chuck', 'fred', 'jan'])


In [21]:
print(jjj.values())

dict_values([1, 42, 100])


In [22]:
print(jjj.items())

dict_items([('chuck', 1), ('fred', 42), ('jan', 100)])
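
The keys(), values(), and items() methods return view objects rather than lists, as the `dict_keys(...)` style output above shows. When a real list is needed, for example to index by position, the view can be passed to list(). The variable names below are made up for this sketch.

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

# Wrap each view in list() to get an ordinary list
key_list = list(jjj.keys())
value_list = list(jjj.values())
item_list = list(jjj.items())   # a list of (key, value) tuples

# Lists support positional indexing; the views themselves do not
print(key_list[0], value_list[-1], item_list[1])
```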


# Two Iteration Variables

- we loop through the key-value pairs in a dictionary using *two* iteration variables
- each iteration, the first variable is the key and the second variable is the corresponding value for the key

In [24]:
jjj = {'chuck' : 1, 'fred' : 42, 'jan' : 100}
for a, b in jjj.items() :
    print(a,b)

chuck 1
fred 42
jan 100


In [26]:
handle = open('words.txt')
counts = dict()
for line in handle :
    words = line.split()
    for word in words :
        counts[word] = counts.get(word, 0) + 1
        
bigcount = None
bigword = None
for word, count in counts.items() :
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)

to 16
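
The cell above depends on the file words.txt. For experimenting without that file, the same two steps (count, then find the largest count) can be sketched on a small list of strings. The text in `lines` is invented for the example and is not the content of words.txt.

```python
# Stand-in for the lines of a file; this text is invented for the example.
lines = [
    'to be or not to be',
    'that is the question to ask',
]

# Step 1: count every word across all lines
counts = dict()
for line in lines:
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1

# Step 2: maximum loop over the key-value pairs
bigword = None
bigcount = None
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
```

If two words tie for the largest count, the strict `>` comparison keeps whichever one the loop reached first.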


# Assignment 9.4

Write a program to read through mbox-short.txt and figure out who has sent the greatest number of mail messages. The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file. After the dictionary is produced, the program reads through the dictionary using a maximum loop to find the most prolific committer.

In [41]:
fhand = open('mbox-short.txt')
email = dict()

for line in fhand :
    line = line.rstrip()
    if not line.startswith('From ') : continue
    words = line.split()
    email[words[1]] = email.get(words[1], 0) + 1
    
max_email = None
max_times = None
for k, v in email.items() :
    if max_email is None or v > max_times :
        max_email = k
        max_times = v
        
print(max_email, max_times)

cwen@iupui.edu 5
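
The filtering step relies on the trailing space in 'From ': without it, the test would also match header lines that begin with 'From:'. The sketch below shows this on a few sample lines. The lines and addresses are invented in the style of an mbox file and are not taken from mbox-short.txt.

```python
# Invented sample lines in the style of an mbox file (not real data).
sample_lines = [
    'From alice@example.com Sat Jan  5 09:14:16 2008',
    'From: alice@example.com',
    'Subject: hello',
    'From bob@example.com Sat Jan  5 10:02:11 2008',
    'From alice@example.com Sat Jan  5 11:30:45 2008',
]

email = dict()
for line in sample_lines:
    # 'From ' with a trailing space skips the 'From:' header lines
    if not line.startswith('From '):
        continue
    words = line.split()
    # words[1] is the second word on the line: the sender's address
    email[words[1]] = email.get(words[1], 0) + 1

print(email)
```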
