# Dictionaries

### CDH course "Programming in Python"

[index](https://colab.research.google.com/drive/1s05aR4wn2dU1C3se1oXfqKz2EY5ilrno)

Previous module: [9. String manipulation](https://colab.research.google.com/drive/1Z7NNMxqHTSMoUadH3pVL6Bg6oii4qUoQ)

### This module

- Learn about _dictionaries_, a useful way of storing and looking up data

## Exercise 10.1: Dictionaries

1 . Below are two dictionaries containing information about different types of fruit. Print a nice message about each fruit stating its colour and price. For example, _An apple is red and costs € 2.50_, etc.

In [3]:
fruit_colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange'}
fruit_prices = {'apple': 2.50, 'banana': 2.10, 'orange': 1.50}

# The example sentence includes 'a'/'an' ('an apple' / 'a banana')
# which was a bit of an oversight: the point of this exercise
# was to loop through a dictionary and access values.
# You could make a solution like this to avoid it.

for fruit in fruit_colors:
    color = fruit_colors[fruit]
    price = fruit_prices[fruit]

    print(fruit + 's', 'are', color, 'and cost €', price, sep=' ')

# Bonus points if you DID implement a solution for when to use
# 'a' or 'an'! Here is an example of how you can do that.

print()

def starts_with_vowel(word):
    '''
    Returns True if the word (a string) starts with a vowel.
    '''
    vowels = ['a', 'e', 'i', 'o', 'u']
    # the `in` check already gives True or False, so we can return it directly
    return word[0] in vowels


for fruit in fruit_colors:
    color = fruit_colors[fruit]
    price = fruit_prices[fruit]

    if starts_with_vowel(fruit):
        article = 'An'
    else:
        article = 'A'

    print(article, fruit, 'is', color, 'and costs €', price, sep=' ')

apples are red and cost € 2.5
bananas are yellow and cost € 2.1
oranges are orange and cost € 1.5

An apple is red and costs € 2.5
A banana is yellow and costs € 2.1
An orange is orange and costs € 1.5
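
The example sentence in the exercise shows the price as _€ 2.50_, while the output above shows `2.5`, because `print` drops the trailing zero of a float. One possible way to get two decimals is an f-string with the `.2f` format specifier. This is only a sketch; the helper name `describe_fruit` is made up for this illustration.

```python
fruit_colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange'}
fruit_prices = {'apple': 2.50, 'banana': 2.10, 'orange': 1.50}

def describe_fruit(fruit):
    '''
    Hypothetical helper: build the message for one fruit.
    '''
    color = fruit_colors[fruit]
    price = fruit_prices[fruit]
    # `:.2f` formats the number with exactly two decimals
    return f'{fruit}s are {color} and cost € {price:.2f}'

for fruit in fruit_colors:
    print(describe_fruit(fruit))
```

With this formatting, the first line printed should read `apples are red and cost € 2.50`.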


2 . Here is a longer list of fruit colours. Write a function `count_fruits` which gets a colour as input and returns the number of fruits that have that colour (according to `lots_of_fruit`).

In [4]:
lots_of_fruit = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange',
                 'cucumber': 'green', 'kiwi': 'green', 'strawberry': 'red',
                 'pineapple': 'yellow','blackberry': 'black', 'cherry': 'red',
                 'gooseberry': 'green', 'raspberry': 'red', 'mandarin': 'orange',
                 'lemon': 'yellow', 'lime': 'green'}

In [5]:
def count_fruits(color):
    '''Count the number of fruits in `lots_of_fruit` that match this colour.'''
    count = 0
    for value in lots_of_fruit.values():
        if value == color:
            count = count + 1
    return count

# let's see if it works!
assert count_fruits('red') == 4
assert count_fruits('lavender') == 0
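
The loop above can also be written more compactly. As a sketch of one alternative (not the only way to do it), the version below adds up a `1` for every value that matches, using the built-in `sum` with a generator expression. The name `count_fruits_short` is made up here to keep it apart from the solution above.

```python
lots_of_fruit = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange',
                 'cucumber': 'green', 'kiwi': 'green', 'strawberry': 'red',
                 'pineapple': 'yellow', 'blackberry': 'black', 'cherry': 'red',
                 'gooseberry': 'green', 'raspberry': 'red', 'mandarin': 'orange',
                 'lemon': 'yellow', 'lime': 'green'}

def count_fruits_short(color):
    '''Count the fruits of this colour by summing a 1 for each match.'''
    return sum(1 for value in lots_of_fruit.values() if value == color)

assert count_fruits_short('red') == 4
assert count_fruits_short('lavender') == 0
```

Both versions do the same work; the explicit loop is easier to step through when you are debugging.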

3 . The list `fruit_basket` contains a bunch of fruits. Can you make a dictionary `fruit_counts` which gives the amount for each fruit in `fruit_basket`? (Do not count the fruits by hand!)

In [6]:
fruit_basket = ['apple', 'banana', 'banana', 'banana', 'apple', 'orange',
                'orange', 'grape', 'grape', 'grape', 'grape', 'grape', 'grape',
                'grape', 'grape', 'grape', 'pear', 'apple', 'strawberry',
                'strawberry', 'strawberry', 'orange']

In [7]:
def count_items(items):
    '''
    Count the items in a list.

    Input: a list of items, such as strings
    Output: a dictionary with the total number of occurrences for each item.
    '''

    counts = dict() # we will keep track of our counts in here!

    for item in items:
        # the current count: either the current value in the dictionary
        # or 0 if we haven't seen this fruit yet
        current_count = counts.get(item, 0)
        new_count = current_count + 1
        counts[item] = new_count
    
    return counts

fruit_counts = count_items(fruit_basket)

# let's see if it works!
assert fruit_counts['apple'] == 3
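
Counting items is common enough that Python's standard library has a ready-made tool for it: `collections.Counter`. The sketch below shows how it could replace a hand-written counting function. Writing the loop yourself is still the point of the exercise, so treat this as a shortcut for later. The variable name `fruit_counter` is chosen here for illustration.

```python
from collections import Counter

fruit_basket = ['apple', 'banana', 'banana', 'banana', 'apple', 'orange',
                'orange', 'grape', 'grape', 'grape', 'grape', 'grape', 'grape',
                'grape', 'grape', 'grape', 'pear', 'apple', 'strawberry',
                'strawberry', 'strawberry', 'orange']

# a Counter is a special kind of dictionary that counts the items it is given
fruit_counter = Counter(fruit_basket)

print(fruit_counter['apple'])
# unlike a normal dictionary, a missing key gives 0 instead of a KeyError
print(fruit_counter['mango'])
```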


4 . Here is a different list, which contains the words in a sentence. Can you use your code above to make a dictionary `word_counts` telling us how often each word occurs? (Tip: if you need to do very similar tasks, make a function!)

Write a function that takes a dictionary like `word_counts` and tells us the most commonly occurring item and its count. Note that there can be multiple items that occur the most.

In [8]:
# the variable sent0 contains the first sentence of The Catcher in the Rye
# split into single words
sent0 = ['If', 'you', 'really', 'want', 'to', 'hear', 'about', 'it,', 'the', 
         'first', 'thing', 'you’ll', 'probably', 'want', 'to', 'know', 'is', 
         'where', 'I', 'was', 'born,', 'and', 'what', 'my', 'lousy', 'childhood', 
         'was', 'like,', 'and', 'how', 'my', 'parents', 'were', 'occupied', 
         'and', 'all', 'before', 'they', 'had', 'me,', 'and', 'all', 'that', 
         'David', 'Copperfield', 'kind', 'of', 'crap,', 'but', 'I', 'don’t', 
         'feel', 'like', 'going', 'into', 'it,', 'if', 'you', 'want', 
         'to', 'know', 'the', 'truth.']

In [9]:
word_counts = count_items(sent0) # we recycle our function from the last exercise

In [14]:
def most_frequent(counts):
    '''
    For a dictionary with totals, return the most commonly occurring item(s) and the count.

    Input should be a dictionary with the total number of occurrences for each key in some collection.
    Returns a tuple of two items. First is a list of the most frequent item(s). If the input
    is an empty dict, the list is empty. Second is the number of occurrences for that item.
    '''

    if not counts: # an empty dictionary counts as False
        return [], 0

    max_count = max(counts.values())

    top_items = []
    for item, count in counts.items():
        if count == max_count:
            top_items.append(item)

    return top_items, max_count

words, total = most_frequent(word_counts)
print(words, total)

# here are some assert statements you could use to check your own function
# feel free to adapt them if your function gives a different output format
assert most_frequent(fruit_counts) == (['grape'], 9)
assert most_frequent(word_counts) == (['and'], 4)
assert most_frequent({}) == ([], 0)

['and'] 4
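
Note that the words in `sent0` are counted exactly as they are written, so `'If'` and `'if'` are different keys, and `'it,'` still carries its comma. If you want those to count as the same word, you could normalise each word before counting. The sketch below shows one way to do that with the string methods from the previous module. The function names and the short word list are made up for this illustration.

```python
import string

def normalize(word):
    '''Lowercase a word and strip punctuation from both ends.'''
    return word.lower().strip(string.punctuation)

def count_normalized(words):
    '''Count words after normalising them.'''
    counts = dict()
    for word in words:
        key = normalize(word)
        counts[key] = counts.get(key, 0) + 1
    return counts

# a short made-up example, not the full sentence
example_words = ['If', 'you', 'want', 'it,', 'if', 'you', 'know', 'it.']
print(count_normalized(example_words))
```

Whether this is the right choice depends on your question: for some analyses, capitalisation or punctuation is exactly what you want to keep.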
