<h1>Dictionaries</h1>
<h3>Main properties</h3>
<ul>
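    <li>Collections of arbitrary objects that preserve insertion order (guaranteed since Python 3.7); keys, not positions, provide the symbolic location of the items</li>
    <li>Map keys to values in one direction only: a key looks up its value, not the reverse</li>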
    <li>Accessed by key, not offset position: fetch an item out of the dictionary by key</li>
    <li>Mutable mapping: Can be changed in place, but they don't support sequence operations.</li>
    <li>Variable-length: Can shrink and grow (without new copies being made)</li>
    <li>Heterogeneous: Can contain any type of object</li>
    <li>Arbitrarily nestable: Each key has exactly one associated value, but that value can be a collection of multiple objects</li>
    <li>Tables of object references (hash tables): Dictionaries are implemented as hash tables and store object references</li>
    <li>One value per key, but there may be many keys per value</li>
</ul>
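Some of these properties can be checked directly. The following is a minimal sketch; the names `inventory`, `alias`, and `shared` are made up for illustration. It shows that a dictionary is changed in place and that it stores references to its values rather than copies.

```python
# Hypothetical example data; the names are illustrative only.
inventory = {"apples": 3}
alias = inventory              # a second name for the same dict object

inventory["pears"] = 5         # grows in place, no new dict is created
alias["apples"] += 1           # the change is visible through both names

shared = ["x"]
inventory["misc"] = shared     # the dict stores a reference to the list
shared.append("y")             # so later changes to the list show up in the dict

print(alias is inventory)      # True
print(inventory["apples"])     # 4
print(inventory["misc"])       # ['x', 'y']
```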

<h3>Dictionary assignments</h3>

In [193]:
# assignment
d1 = {'player': 'Michael', 'number': 23} # dictionary
d2 = {'players' : {'sport' : 'basketball', 'games played': 789 }} # nested dictionary
d3 = dict(player='Michael', number=23) # keyword construction
d4 = dict([('player', 'Michael'),('number', 23)]) # key/value pairs
d5 = dict.fromkeys(["blue", "yellow", "green"], 2) # same value for every key
print(d1, d2, d3, d4, d5, sep="\n")
del d1, d2, d3, d4, d5

{'player': 'Michael', 'number': 23}
{'players': {'sport': 'basketball', 'games played': 789}}
{'player': 'Michael', 'number': 23}
{'player': 'Michael', 'number': 23}
{'blue': 2, 'yellow': 2, 'green': 2}


In [194]:
# a key can be any hashable object (immutable built-ins, functions, classes, and instances by default)

class PassClass:
    pass

def hello_func():
    print("Hello!")

{
    "str": "string",
    100: "integer",
    1.2: "float",
    True: "bool",
    (1,2,3): "tuple",
    frozenset([1,2,3]): "frozenset",
    hello_func: "function",
    PassClass: "class object",
    PassClass(): "instance object"
}

{'str': 'string',
 100: 'integer',
 1.2: 'float',
 True: 'bool',
 (1, 2, 3): 'tuple',
 frozenset({1, 2, 3}): 'frozenset',
 <function __main__.hello_func()>: 'function',
 __main__.PassClass: 'class object',
 <__main__.PassClass at 0x7505f9f7df50>: 'instance object'}
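
What a key needs, strictly speaking, is to be hashable. The sketch below (variable names are illustrative) shows a list being rejected as a key, and keys that compare equal and hash equal collapsing into a single entry.

```python
# Lists are mutable and unhashable, so they are rejected as keys.
try:
    bad = {[1, 2, 3]: "list"}
except TypeError as err:
    message = str(err)
print(message)  # the exact wording depends on the Python version

# 1, 1.0 and True compare equal and hash equal, so they share one entry:
# the first key object is kept and the last value wins.
collapsed = {1: "int", 1.0: "float", True: "bool"}
print(collapsed)  # {1: 'bool'}
```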

<h3>Basic operations</h3>

In [195]:
breakfast = {'toast': 2, 'ham': 2, 'egg': 1}

In [196]:
# length
len(breakfast)

3

In [197]:
# key membership test
'ham' in breakfast

True

In [198]:
"hamm" not in breakfast

True

<h3>Iterations</h3>

In [199]:
breakfast = dict([("toast", 2),  ("ham", 4), ("cheese", 3)])

In [200]:
# for loop iterates over the keys
for key in breakfast:
    print(key, ":", breakfast[key])

toast : 2
ham : 4
cheese : 3


In [201]:
# for loop iterates over items method
for key, value in breakfast.items():
    print(key, ":", value)

toast : 2
ham : 4
cheese : 3


In [202]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
for key in d:
    print(key + '\t' + str(d[key]))

bacon	1
ham	2
sausage	3
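
Adding or removing keys while looping over a dictionary raises a `RuntimeError`. A common workaround, sketched here with made-up data in `stock`, is to loop over a snapshot of the keys. The second loop shows iteration in sorted key order.

```python
stock = {"bacon": 1, "ham": 0, "sausage": 3}

# list(stock) copies the keys first, so deleting inside the loop is safe.
for key in list(stock):
    if stock[key] == 0:
        del stock[key]

print(stock)  # {'bacon': 1, 'sausage': 3}

# sorted() returns a new list of keys, here in reverse alphabetical order.
for key in sorted(stock, reverse=True):
    print(key, ":", stock[key])
```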


<h3>Changing Dictionaries in place</h3>

In [203]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# change an entry
d['ham'] = ['grill', 'bake', 'fry']
d

{'bacon': 1, 'ham': ['grill', 'bake', 'fry'], 'sausage': 3}

In [204]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# delete entries
del d['sausage'], d['bacon']
d

{'ham': 2}

In [205]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# add new entry
d['steak'] = 4
d

{'bacon': 1, 'ham': 2, 'sausage': 3, 'steak': 4}

<h3>Dictionary comprehensions</h3>

In [206]:
keys = ['a','b', 'c']
values = (1,2,4)
{ key: value for (key, value) in zip(keys, values) }

{'a': 1, 'b': 2, 'c': 4}

In [207]:
keys = "abcdefghi"
values = range(1,100)
{ key: value**2 for (key, value) in zip(keys, values) if key in "bdfh" }

{'b': 4, 'd': 16, 'f': 36, 'h': 64}

In [208]:
# map a single stream
d = {x: x**2 for x in range(0,10)}
d

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

In [209]:
# map a single stream
d = {c.lower(): c + '!' for c in ['SPAM', 'HAM', 'EGGS']}
d

{'spam': 'SPAM!', 'ham': 'HAM!', 'eggs': 'EGGS!'}
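
A comprehension can also invert a mapping. This is a small sketch with hypothetical data in `numbers`; it only works cleanly when the values are hashable and unique, because duplicate values would overwrite each other as keys.

```python
numbers = {"player": 23, "coach": 11}

# Swap each key/value pair; the values become the new keys.
inverted = {value: key for key, value in numbers.items()}

print(inverted)  # {23: 'player', 11: 'coach'}
```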

<h3>Dictionary methods</h3>

In [210]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# return all keys
d.keys() # makes a view object
print(d.keys(), list(d.keys()))

dict_keys(['bacon', 'ham', 'sausage']) ['bacon', 'ham', 'sausage']


In [211]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# return all values
list(d.values())

[1, 2, 3]

In [212]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# return all key/value pairs
list(d.items())

[('bacon', 1), ('ham', 2), ('sausage', 3)]
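
The objects returned by `keys()`, `values()`, and `items()` are views, not copies. A short sketch (the variable `key_view` is illustrative) shows that a view follows later changes to the dictionary and that key views support set operations.

```python
d = {"bacon": 1, "ham": 2}
key_view = d.keys()          # a live view onto the dict's keys

d["egg"] = 3                 # change the dict after creating the view
print(list(key_view))        # ['bacon', 'ham', 'egg']

# Key views behave like sets for operations such as intersection.
print(key_view & {"ham", "toast"})  # {'ham'}
```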

In [213]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# avoid missing key errors by fetching its value with the get method
d.get('bacon')

1

In [214]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3}
# a missing key normally raises a KeyError, but get returns None
print(d.get('egg'))

None
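
`get` also accepts a second argument that is returned instead of `None`. The sketch below contrasts it with plain indexing and shows that `get` never inserts the missing key.

```python
d = {"bacon": 1, "ham": 2, "sausage": 3}

print(d.get("egg", 0))   # 0, the supplied default

# Plain indexing raises KeyError for the same key.
try:
    d["egg"]
except KeyError:
    missing = True
print(missing)           # True

print("egg" in d)        # False, get did not add the key
```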


In [215]:
eat = {'bacon': 1, 'ham': 2, 'sausage': 3}
drink = {'juice': 4, 'milk': 5, 'beer': 6}
# update: merges the keys and values of one dictionary into another (overwriting values with same keys)
eat.update(drink)
print(eat)

{'bacon': 1, 'ham': 2, 'sausage': 3, 'juice': 4, 'milk': 5, 'beer': 6}
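
`update` changes the left-hand dictionary in place. When a new dictionary is wanted instead, the `|` operator (Python 3.9 and later) or dictionary unpacking can be used. This sketch uses a deliberately overlapping key, `'ham'`, to show that the right-hand value wins.

```python
eat = {"bacon": 1, "ham": 2}
drink = {"juice": 4, "ham": 99}

merged = eat | drink           # new dict, requires Python 3.9+
unpacked = {**eat, **drink}    # same result on older versions

print(merged)  # {'bacon': 1, 'ham': 99, 'juice': 4}
print(eat)     # {'bacon': 1, 'ham': 2}, the original is untouched
```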


In [216]:
d = {'bacon': 1, 'ham': 2, 'sausage': 3, "egg": 1}
# pop: delete a key and return its value
d.pop('bacon')

1

In [217]:
# pop with a default value avoids a KeyError when the key is missing
d.pop('hamm', None)

In [218]:
# popitem: remove and return the last inserted (key, value) pair
d.popitem()

('egg', 1)

In [219]:
d.clear()
d

{}

In [224]:
# setdefault
breakfast = {'bacon': 1, 'ham': 2, 'sausage': 3, "egg": 1}
# inserts the key only if it does not exist yet
breakfast.setdefault("bacon", 100)
breakfast.setdefault("cheese", 2)
breakfast

{'bacon': 1, 'ham': 2, 'sausage': 3, 'egg': 1, 'cheese': 2}
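
Because `setdefault` returns the stored value, it is often used to build up groups. The following sketch groups a made-up word list by first letter.

```python
words = ["bacon", "beer", "ham", "honey", "egg"]

by_initial = {}
for word in words:
    # Create an empty list for a new initial, then append to whichever list is stored.
    by_initial.setdefault(word[0], []).append(word)

print(by_initial)
# {'b': ['bacon', 'beer'], 'h': ['ham', 'honey'], 'e': ['egg']}
```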

### Functions in dictionaries

In [220]:
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

functions = {add: (4, 6), multiply: (2, 9) }

for function, args in functions.items():
    # args need to be unpacked from tuple
    print(function(*args))

10
18
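
Functions are more commonly stored as dictionary values, which gives a simple dispatch table. This is a sketch only; the `operations` table and the `calculate` helper are hypothetical names introduced for illustration.

```python
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

# Map a symbol to the function that implements it.
operations = {"+": add, "*": multiply}

def calculate(symbol, a, b):
    """Look up the operation by symbol and apply it to a and b."""
    try:
        operation = operations[symbol]
    except KeyError:
        raise ValueError(f"unsupported operation: {symbol!r}") from None
    return operation(a, b)

print(calculate("+", 4, 6))  # 10
print(calculate("*", 2, 9))  # 18
```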


### Text analysis

In [2]:
text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

In [9]:
letter_stats = dict()

for letter in text:
    key = letter.lower().strip()
    if key:
        letter_stats[key] = letter_stats.get(key, 0) + 1
print(letter_stats)

{'l': 22, 'o': 25, 'r': 24, 'e': 59, 'm': 19, 'i': 38, 'p': 19, 's': 39, 'u': 17, 'y': 13, 'd': 16, 't': 43, 'x': 2, 'f': 6, 'h': 14, 'n': 38, 'g': 11, 'a': 29, '.': 4, 'b': 5, "'": 1, 'v': 5, 'c': 10, '1': 2, '5': 1, '0': 3, ',': 4, 'w': 6, 'k': 7, '9': 1, '6': 1}


In [15]:
import string

categories = dict()

for letter in text:
    if letter == " ":
        continue  # skip spaces so they are not filed under the previous key

    if letter in string.ascii_lowercase:
        key = "lower"
    elif letter in string.ascii_uppercase:
        key = "upper"
    else:
        key = "other"

    # create the set for a new category, then add the character
    if key not in categories:
        categories[key] = set()
    categories[key].add(letter)

print(categories)

{'upper': {'P', 'M', 'L', 'A', 'I'}, 'lower': {'a', 'w', 'g', 'b', 'o', 'v', 'h', 'm', 'k', 'r', 'l', 'u', 's', 'e', 'i', 'x', 'p', 't', 'y', 'f', 'c', 'n', 'd'}, 'other': {'6', '1', '0', '9', '5', "'", ',', '.'}}
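
Grouping like this can also be written with `collections.defaultdict`, which creates the empty set on first access. This is a sketch on a short `sample` string; note that `str.islower()` and `str.isupper()` also accept non-ASCII letters, unlike the `string.ascii_*` constants.

```python
from collections import defaultdict

sample = "Ab c1."

groups = defaultdict(set)   # missing keys start out as an empty set
for ch in sample:
    if ch.isspace():
        continue            # whitespace is not categorised
    if ch.islower():
        groups["lower"].add(ch)
    elif ch.isupper():
        groups["upper"].add(ch)
    else:
        groups["other"].add(ch)

print(dict(groups))
```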


### Exercise
Write a Python function that will create and return a dictionary from another dictionary, but sorted by value. You can assume the values are all comparable and have a natural sort order.

In [35]:
composers = {'Johann': 65, 'Ludwig': 56, 'Frederic': 39, 'Wolfgang': 35}

In [45]:
def sort_dict_by_value(any_dict: dict) -> dict:
    sort_dict = dict()
    for composer in sorted(any_dict, key=lambda key: any_dict[key]):
        sort_dict[composer] = any_dict[composer]
    return sort_dict

print(sort_dict_by_value(composers))

{'Wolfgang': 35, 'Frederic': 39, 'Ludwig': 56, 'Johann': 65}


In [44]:
def sort_dict_by_value(any_dict: dict) -> dict:
    sort_dict = { key: value
        for key, value in sorted(any_dict.items(), key=lambda element: element[1])
    }
    return sort_dict

print(sort_dict_by_value(composers))

{'Wolfgang': 35, 'Frederic': 39, 'Ludwig': 56, 'Johann': 65}


In [46]:
def sort_dict_by_value(any_dict: dict) -> dict:
    return dict(sorted(any_dict.items(), key=lambda element: element[1]))

print(sort_dict_by_value(composers))

{'Wolfgang': 35, 'Frederic': 39, 'Ludwig': 56, 'Johann': 65}
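
A possible variation, sketched here, replaces the lambda with `operator.itemgetter` and sorts from highest to lowest with `reverse=True`. The function name `sort_dict_by_value_desc` is made up for this example.

```python
from operator import itemgetter

composers = {'Johann': 65, 'Ludwig': 56, 'Frederic': 39, 'Wolfgang': 35}

def sort_dict_by_value_desc(any_dict: dict) -> dict:
    # itemgetter(1) picks the value out of each (key, value) pair.
    return dict(sorted(any_dict.items(), key=itemgetter(1), reverse=True))

print(sort_dict_by_value_desc({'a': 2, 'b': 3, 'c': 1}))
# {'b': 3, 'a': 2, 'c': 1}
```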


Given two dictionaries, `d1` and `d2`, write a function that creates a dictionary that contains only the keys common to both dictionaries, with each value being a tuple containing the values from `d1` and `d2`. (The order of the keys is not important.)

In [74]:
d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d2 = {'b': 20, 'c': 30, 'y': 40, 'z': 50}

Expected result:

In [None]:
{'b': (2, 20), 'c': (3, 30)}

In [76]:
def intersect_dicts(dict1: dict, dict2: dict) -> dict:
    d = dict()
    for key in dict1.keys() & dict2.keys():
        d.update({key: (dict1[key], dict2[key])})
    return d

intersect_dicts(d1,d2)

{'c': (3, 30), 'b': (2, 20)}

In [72]:
def intersect_dicts(dict1: dict, dict2: dict) -> dict:
    d = {
        key: (dict1[key], dict2[key])
        for key in dict1.keys() & dict2.keys()
    }
    return d

intersect_dicts(d1,d2)

{'c': (3, 30), 'b': (2, 20)}

In [77]:
def intersect_dicts(dict1: dict, dict2: dict) -> dict:
    intersect_keys = dict1.keys() & dict2.keys()

    d = {
        key: (dict1[key], dict2[key])
        for key in intersect_keys
        }
    return d

intersect_dicts(d1,d2)

{'c': (3, 30), 'b': (2, 20)}
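
The set intersection used above does not guarantee a key order. If a predictable order is wanted, one option is to walk the first dictionary and test membership in the second, as in this sketch (the name `intersect_dicts_ordered` is hypothetical).

```python
def intersect_dicts_ordered(dict1: dict, dict2: dict) -> dict:
    # Keys appear in the order in which they occur in dict1.
    return {key: (dict1[key], dict2[key]) for key in dict1 if key in dict2}

d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d2 = {'b': 20, 'c': 30, 'y': 40, 'z': 50}

print(intersect_dicts_ordered(d1, d2))  # {'b': (2, 20), 'c': (3, 30)}
```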

You have text data spread across multiple servers.
Each server is able to analyze this data and return a dictionary that contains words and their frequency.

Your job is to combine this data to create a single dictionary that contains all the words and their combined frequencies from all these data sources. Bonus points if you can make your dictionary sorted by frequency (highest to lowest).

For example, you may have three servers that each return these dictionaries:

In [128]:
d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

In [142]:
def merge_sum(*dicts):

    unsorted = {}
    for d in dicts:
        for key, value in d.items():
            unsorted[key] = unsorted.get(key, 0) + value

    return dict(sorted(unsorted.items(), key=lambda element: element[1], reverse=True))

merge_sum(d1,d2,d3)

{'python': 17,
 'javascript': 15,
 'java': 13,
 'c#': 12,
 'c++': 10,
 'go': 9,
 'erlang': 5,
 'haskell': 2,
 'pascal': 1}

In [143]:
merge_sum(d1,d2)

{'python': 16, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9}
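
As an alternative sketch, `collections.Counter` can do both the summing and the sorting. The name `merge_sum_counter` is made up to keep it apart from the function above.

```python
from collections import Counter

def merge_sum_counter(*dicts):
    total = Counter()
    for d in dicts:
        total.update(d)              # adds counts instead of replacing them
    # most_common() returns (word, count) pairs, highest count first.
    return dict(total.most_common())

d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}

print(merge_sum_counter(d1, d2))
# {'python': 16, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9}
```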

For this exercise suppose you have a web API load balanced across multiple nodes. This API receives various requests for resources and logs each request to some local storage. Each instance of the API is able to return a dictionary containing the resource that was accessed (the dictionary key) and the number of times it was requested (the associated value).

Your task here is to identify resources that have been requested on some, but not all the servers, so you can determine if you have an issue with your load balancer not distributing certain resource requests across all nodes.

For simplicity, we will assume that there are exactly 3 nodes in the cluster.

You should write a function that takes 3 dictionaries as arguments for node 1, node 2, and node 3, and returns a dictionary that contains only keys that are not found in **all** of the dictionaries. The value should be a tuple containing the number of times it was requested in each node (the node order should match the dictionary (node) order passed to your function). Use `0` if the resource was not requested from the corresponding node.

In [145]:
n1 = {'employees': 100, 'employee': 5000, 'users': 10, 'user': 100}
n2 = {'employees': 250, 'users': 23, 'user': 230}
n3 = {'employees': 150, 'users': 4, 'login': 1000}

Expected result:

In [185]:
result = {'employee': (5000, 0, 0),
          'user': (100, 230, 0),
          'login': (0, 0, 1000)}

In [205]:
def load_balancer_analysis(node1: dict, node2: dict, node3: dict):

    def different_keys(d1: dict, d2: dict) -> set:
        # keys found in exactly one of the two dictionaries (symmetric difference)
        union = d1.keys() | d2.keys()
        intersect = d1.keys() & d2.keys()
        return union - intersect

    not_in_all = different_keys(node1,node2) | different_keys(node1,node3) | different_keys(node2,node3)
    result =  {
        key: (node1.get(key,0), node2.get(key,0), node3.get(key,0))
        for key in not_in_all
    }

    return result

load_balancer_analysis(n1,n2,n3)

{'login': (0, 0, 1000), 'user': (100, 230, 0), 'employee': (5000, 0, 0)}

In [199]:
union = n1.keys() | n2.keys() | n3.keys()
union

{'employee', 'employees', 'login', 'user', 'users'}

In [201]:
intersect = n1.keys() & n2.keys() & n3.keys()
intersect

{'employees', 'users'}

In [202]:
union - intersect

{'employee', 'login', 'user'}

In [204]:
def load_balancer_analysis(node1: dict, node2: dict, node3: dict):
    union = node1.keys() | node2.keys() | node3.keys()
    intersect = node1.keys() & node2.keys() & node3.keys()
    not_in_all = union - intersect
    result =  {
        key: (node1.get(key,0), node2.get(key,0), node3.get(key,0))
        for key in not_in_all
    }
    return result

load_balancer_analysis(n1,n2,n3)

{'login': (0, 0, 1000), 'employee': (5000, 0, 0), 'user': (100, 230, 0)}
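
The exercise fixes the cluster at 3 nodes, but the union-minus-intersection idea extends to any number of nodes. The following is a sketch under that assumption; `load_balancer_analysis_n` is a hypothetical name.

```python
def load_balancer_analysis_n(*nodes: dict) -> dict:
    if not nodes:
        return {}
    all_keys = set().union(*nodes)                     # requested on at least one node
    common = set(nodes[0]).intersection(*nodes[1:])    # requested on every node
    return {
        key: tuple(node.get(key, 0) for node in nodes)
        for key in all_keys - common
    }

n1 = {'employees': 100, 'employee': 5000, 'users': 10, 'user': 100}
n2 = {'employees': 250, 'users': 23, 'user': 230}
n3 = {'employees': 150, 'users': 4, 'login': 1000}

print(load_balancer_analysis_n(n1, n2, n3))
```

As in the solutions above, the key order of the result is not guaranteed, because it comes from a set.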