### Exercises

#### Exercise #1

Let's revisit an exercise we did right after the section on dictionaries.

You have text data spread across multiple servers. Each server is able to analyze this data and return a dictionary that contains words and their frequency.

Your job is to combine this data to create a single dictionary that contains all the words and their combined frequencies from all these data sources. Bonus points if you can make your dictionary sorted by frequency (highest to lowest).

For example, you may have three servers that each return these dictionaries:

In [2]:
d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

Your resulting dictionary should look like this:

In [2]:
d = {'python': 17,
     'javascript': 15,
     'java': 13,
     'c#': 12,
     'c++': 10,
     'go': 9,
     'erlang': 5,
     'haskell': 2,
     'pascal': 1}

If only servers 1 and 2 return data (so d1 and d2), your results would look like:

In [3]:
d = {'python': 16,
     'javascript': 15,
     'java': 13,
     'c#': 12,
     'c++': 10, 
     'go': 9}

This was one solution to the problem:

In [4]:
def merge(*dicts):
    unsorted = {}
    for d in dicts:
        for k, v in d.items():
            unsorted[k] = unsorted.get(k, 0) + v
            
    # create a dictionary sorted by value
    return dict(sorted(unsorted.items(), key=lambda e: e[1], reverse=True))

Implement two different solutions to this problem:

**a**: Using `defaultdict` objects

**b**: Using `Counter` objects

---

In [11]:
from collections import defaultdict, Counter

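def merge_default(*dicts):
    # missing keys start at int() == 0, so no .get() fallback is needed
    result = defaultdict(int)
    for dict_ in dicts:
        for k, v in dict_.items():
            result[k] += v
    return result

def merge_counter(*dicts):
    # Counter supports +, so sum() can fold the dictionaries together,
    # starting from an empty Counter
    return sum((Counter(d) for d in dicts), Counter())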

print(merge_default(d1, d2, d3))
print(merge_counter(d1, d2, d3))

defaultdict(<class 'int'>, {'python': 17, 'java': 13, 'c#': 12, 'javascript': 15, 'c++': 10, 'go': 9, 'erlang': 5, 'haskell': 2, 'pascal': 1})
Counter({'python': 17, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9, 'erlang': 5, 'haskell': 2, 'pascal': 1})
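
Neither `merge_default` nor `merge_counter` returns a plain dictionary sorted by frequency: the `Counter` only prints in sorted order, while iterating over it still follows insertion order. The sketch below is one possible way to cover the bonus part of the exercise. The name `merge_sorted` is made up for illustration. It relies on `Counter.update`, which adds counts in place (and, unlike `+`, keeps entries whose total is zero or negative), and on `Counter.most_common`, which returns the items ordered from highest to lowest count.

```python
from collections import Counter

def merge_sorted(*dicts):
    # hypothetical helper: accumulate all counts, then sort by frequency
    total = Counter()
    for d in dicts:
        total.update(d)  # adds the counts of d to the running totals
    # most_common() yields (word, count) pairs, highest count first
    return dict(total.most_common())

d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

merged = merge_sorted(d1, d2, d3)
print(merged)
```

Since Python dictionaries preserve insertion order, the returned `dict` keeps the sorted order produced by `most_common`.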


#### Exercise #2

Suppose you have a collection (here a tuple) of all possible eye colors:

In [14]:
eye_colors = ("amber", "blue", "brown", "gray", "green", "hazel", "red", "violet")

Another collection (say, retrieved from a database or an external API) contains `Person` objects, each with an eye color attribute.

Your goal is to create a dictionary that maps each eye color in `eye_colors` to the number of people who have that eye color. The wrinkle here is that even if no one matches some eye color, say `amber`, your dictionary should still contain an entry `"amber": 0`.

Here is some sample data:

In [15]:
class Person:
    def __init__(self, eye_color):
        self.eye_color = eye_color

In [16]:
from random import seed, choices
seed(0)
persons = [Person(color) for color in choices(eye_colors[2:], k = 50)]

As you can see, we built up a list of `Person` objects, none of which have `amber` or `blue` eye colors.

Write a function that returns a dictionary with the correct counts for each eye color listed in `eye_colors`.

In [25]:
def count_eye_colors(persons, eye_colors):
    counter = {color: 0 for color in eye_colors}
    for person in persons:
        counter[person.eye_color] += 1
    return counter


In [26]:
count_eye_colors(persons, eye_colors)

{'amber': 0,
 'blue': 0,
 'brown': 3,
 'gray': 10,
 'green': 8,
 'hazel': 7,
 'red': 10,
 'violet': 12}
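
Since this section is about `collections`, the same result can probably also be reached with a `Counter`. The following is a minimal sketch, not the reference solution: the function name `count_eye_colors_counter` is an assumption, and the `Person` class is repeated only so that the block runs on its own. Seeding the `Counter` with zeros guarantees that every listed color appears in the result, even when nobody has it.

```python
from collections import Counter

class Person:
    # same minimal class as in the exercise, repeated to keep this block runnable
    def __init__(self, eye_color):
        self.eye_color = eye_color

def count_eye_colors_counter(persons, eye_colors):
    # start every known color at 0 so unmatched colors are still reported
    counts = Counter(dict.fromkeys(eye_colors, 0))
    # update() with an iterable increments the count of each element seen
    counts.update(person.eye_color for person in persons)
    return dict(counts)

sample_colors = ("amber", "brown", "green")
sample_persons = [Person("brown"), Person("green"), Person("brown")]
print(count_eye_colors_counter(sample_persons, sample_colors))
```

One difference in behavior: a person whose eye color is not in `eye_colors` is counted under a new key here, whereas the dictionary comprehension version above raises a `KeyError`.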

---

#### Exercise #3

You are given three JSON files: one holds a default set of settings, and the other two hold environment-specific settings.
The files are included in the downloads, and are named:
* `common.json`
* `dev.json`
* `prod.json`

Your goal is to write a function that has a single argument (the environment name) and returns the "combined" dictionary that merges the common settings with the settings for that environment, with the environment-specific settings overriding any common settings already defined.

For simplicity, assume that the argument values are going to be the same as the file names, without the `.json` extension. So for example, `dev` or `prod`.

The wrinkle: We don't want to duplicate data for the "merged" dictionary - use `ChainMap` to implement this instead.

In [50]:
import json
from collections import ChainMap
from pprint import pprint

with open('common.json', 'r') as default:
    GLOBAL = json.load(default)

def get_settings(env):
    file_name = env + '.json'
    with open(file_name, 'r') as file:
        settings = json.load(file)
    result = ChainMap(settings, GLOBAL)
    return result

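def chain_recursive(d1, d2):
    # d1 takes precedence over d2; nested dictionaries present in both
    # are chained recursively (note: this writes the nested ChainMaps into d1)
    chain = ChainMap(d1, d2)
    for k, v in d1.items():
        if isinstance(v, dict) and k in d2:
            chain[k] = chain_recursive(d1[k], d2[k])
    return chain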



In [51]:
answer = get_settings('prod')

pprint(answer)

print(answer['logs'])

ChainMap({'data': {'input_root': '$DATA_INPUT_PATH',
                   'output_root': '$DATA_OUTPUT_PATH'},
          'database': {'pwd': '$PG_PWD', 'user': '$PG_USER'}},
         {'data': {'input_root': '/default/path/inputs',
                   'numerics': {'precision': 6, 'type': 'Decimal'},
                   'output_root': '/default/path/outputs'},
          'database': {'db_name': 'deepdive', 'port': 5432, 'schema': 'public'},
          'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s '
                             '%(user)s %(message)s',
                   'level': 'info'}})
{'level': 'info', 'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(message)s'}
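
A top-level `ChainMap` only chains the outermost keys. When the same key holds a dictionary in both maps, the nested dictionary from the first map shadows the second one entirely, which is what motivates a recursive approach. The sketch below illustrates this with small inline dictionaries. Their values are borrowed from the output above, but the dictionaries themselves are stand-ins for the JSON files, not their actual content.

```python
from collections import ChainMap

# stand-ins for common.json and prod.json (abbreviated, for illustration only)
common = {
    'database': {'db_name': 'deepdive', 'port': 5432, 'schema': 'public'},
    'logs': {'level': 'info'},
}
prod = {
    'database': {'pwd': '$PG_PWD', 'user': '$PG_USER'},
}

settings = ChainMap(prod, common)

# 'logs' exists only in common, so the lookup falls through to it
print(settings['logs'])

# 'database' exists in both: the lookup stops at prod's nested dictionary,
# so the common keys (db_name, port, schema) are not visible through it
print(settings['database'])
print('port' in settings['database'])
```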


In [52]:
d1 = get_settings('dev')
d2 = get_settings('common')

In [53]:
# common (d2) comes first here, so its values take precedence over dev (d1)
dev = chain_recursive(d2, d1)

In [54]:
pprint(dev)

ChainMap(ChainMap({'data': ChainMap({'input_root': '/default/path/inputs',
                                     'numerics': ChainMap({'precision': 6,
                                                           'type': 'Decimal'},
                                                          {'type': 'float'}),
                                     'output_root': '/default/path/outputs'},
                                    {'input_root': '/dev/path/inputs',
                                     'numerics': {'type': 'float'},
                                     'operators': {'add': '__add__'},
                                     'output_root': '/dev/path/outputs'}),
                   'database': ChainMap({'db_name': 'deepdive',
                                         'port': 5432,
                                         'schema': 'public'},
                                        {'pwd': 'test', 'user': 'test'}),
                   'logs': ChainMap({'format': '%(asctime)s: %(levelname)s: 

In [57]:
dev['database']

ChainMap({'db_name': 'deepdive', 'schema': 'public', 'port': 5432}, {'user': 'test', 'pwd': 'test'})

In [58]:
def get_settings(env):
    file_name = env + '.json'
    with open(file_name, 'r') as file:
        settings = json.load(file)
    result = chain_recursive(settings, GLOBAL)
    return result

In [59]:
prod = get_settings('prod')

In [60]:
prod

ChainMap({'data': ChainMap({'input_root': '$DATA_INPUT_PATH', 'output_root': '$DATA_OUTPUT_PATH'}, {'input_root': '/default/path/inputs', 'output_root': '/default/path/outputs', 'numerics': {'type': 'Decimal', 'precision': 6}}), 'database': ChainMap({'user': '$PG_USER', 'pwd': '$PG_PWD'}, {'db_name': 'deepdive', 'schema': 'public', 'port': 5432})}, {'data': {'input_root': '/default/path/inputs', 'output_root': '/default/path/outputs', 'numerics': {'type': 'Decimal', 'precision': 6}}, 'database': {'db_name': 'deepdive', 'schema': 'public', 'port': 5432}, 'logs': {'level': 'info', 'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(message)s'}})
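
Note that `chain_recursive` stores the nested `ChainMap` objects back into its first argument. If that side effect is unwanted, one possible variant keeps both inputs untouched by placing the nested chains in a small extra dictionary at the front of the chain. This is a sketch under that assumption: the name `chain_nested` and the sample dictionaries are made up for illustration.

```python
from collections import ChainMap
from collections.abc import Mapping

def chain_nested(override, default):
    # hypothetical non-mutating variant of chain_recursive:
    # collect recursive chains only for keys that are mappings on both sides
    nested = {
        k: chain_nested(v, default[k])
        for k, v in override.items()
        if isinstance(v, Mapping) and isinstance(default.get(k), Mapping)
    }
    # lookups try the nested chains first, then override, then default
    return ChainMap(nested, override, default)

# illustrative data only, loosely modeled on the settings shown above
default = {
    'data': {'input_root': '/default/path/inputs',
             'output_root': '/default/path/outputs',
             'numerics': {'type': 'Decimal', 'precision': 6}},
    'logs': {'level': 'info'},
}
override = {
    'data': {'input_root': '/dev/path/inputs',
             'numerics': {'type': 'float'}},
}

merged = chain_nested(override, default)
print(merged['data']['input_root'])
print(merged['data']['numerics']['precision'])
```

The extra dictionary only holds references to the nested chains, so the settings data itself is still not copied.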