### Exercises

#### Exercise #1

Let's revisit an exercise we did right after the section on dictionaries.

You have text data spread across multiple servers. Each server can analyze its own data and return a dictionary that maps words to their frequencies.

Your job is to combine this data to create a single dictionary that contains all the words and their combined frequencies from all these data sources. Bonus points if you can make your dictionary sorted by frequency (highest to lowest).

For example, you may have three servers that each return these dictionaries:

In [1]:
d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

Your resulting dictionary should look like this:

In [2]:
d = {'python': 17,
     'javascript': 15,
     'java': 13,
     'c#': 12,
     'c++': 10,
     'go': 9,
     'erlang': 5,
     'haskell': 2,
     'pascal': 1}

If only servers 1 and 2 return data (that is, `d1` and `d2`), your result would look like this:

In [3]:
d = {'python': 16,
     'javascript': 15,
     'java': 13,
     'c#': 12,
     'c++': 10,
     'go': 9}

This was one solution to the problem:

In [4]:
def merge(*dicts):
    unsorted = {}
    for d in dicts:
        for k, v in d.items():
            unsorted[k] = unsorted.get(k, 0) + v
            
    # sort the (word, count) pairs by count, highest first, and rebuild a
    # dictionary from them (dictionaries preserve insertion order)
    return dict(sorted(unsorted.items(), key=lambda e: e[1], reverse=True))

Implement two different solutions to this problem:

**a**: Using `defaultdict` objects

**b**: Using `Counter` objects

In [5]:
from collections import defaultdict

def merge(*dicts):
    unsorted = defaultdict(int)
    for d in dicts:
        for k, v in d.items():
            unsorted[k] += v
    return dict(sorted(unsorted.items(), key=lambda e: e[1], reverse=True))

In [6]:
print(merge(d1, d2, d3))

{'python': 17, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9, 'erlang': 5, 'haskell': 2, 'pascal': 1}


In [7]:
print(merge(d1, d2))

{'python': 16, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9}


In [8]:
from collections import Counter

def merge(*dicts):
    unsorted = Counter()
    for d in dicts:
        unsorted.update(d)
    return dict(unsorted.most_common())

In [9]:
print(merge(d1, d2, d3))

{'python': 17, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9, 'erlang': 5, 'haskell': 2, 'pascal': 1}


In [10]:
print(merge(d1, d2))

{'python': 16, 'javascript': 15, 'java': 13, 'c#': 12, 'c++': 10, 'go': 9}
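
As a variation on solution **b**, `Counter` objects also support addition, so the per-server dictionaries can be summed directly. The following is a minimal sketch, not part of the original solution; the name `merge_by_sum` is made up for illustration. One caveat: `Counter` addition discards entries whose resulting count is zero or negative, which is harmless here because all frequencies are positive.

```python
from collections import Counter

def merge_by_sum(*dicts):
    # Start from an empty Counter so that calling with no arguments still works.
    total = sum((Counter(d) for d in dicts), Counter())
    # most_common() returns (word, count) pairs ordered from highest to lowest count.
    return dict(total.most_common())

d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}

print(merge_by_sum(d1, d2))
```

For `d1` and `d2` this should produce the same dictionary as the `update`-based version. The `update` approach mutates a single `Counter` in place, whereas summing creates a new `Counter` at each step, so it may be slower when there are many dictionaries.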


---

#### Exercise #2

Suppose you have a collection of all possible eye colors, stored here as a tuple:

In [11]:
eye_colors = ("amber", "blue", "brown", "gray", "green", "hazel", "red", "violet")

Some other collection (say, retrieved from a database or an external API) contains a list of `Person` objects, each with an `eye_color` attribute.

Your goal is to create a dictionary that maps each eye color in `eye_colors` to the number of people who have that eye color. The wrinkle here is that even if no one has a given eye color, say `amber`, your dictionary should still contain an entry for it: `"amber": 0`.

Here is some sample data:

In [12]:
class Person:
    def __init__(self, eye_color):
        self.eye_color = eye_color

In [13]:
from random import seed, choices
seed(0)
persons = [Person(color) for color in choices(eye_colors[2:], k=50)]

As you can see, we built a list of `Person` objects, none of which have `amber` or `blue` eyes, because the colors are drawn only from `eye_colors[2:]`.

Write a function that returns a dictionary with the correct counts for each eye color listed in `eye_colors`.

In [14]:
def count_eye_colors(persons, eye_colors):
    eye_counter = Counter(p.eye_color for p in persons)
    return {k: eye_counter[k] for k in eye_colors}

In [15]:
count_eye_colors(persons, eye_colors)

{'amber': 0,
 'blue': 0,
 'brown': 3,
 'gray': 10,
 'green': 8,
 'hazel': 7,
 'red': 10,
 'violet': 12}
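
This solution relies on the fact that a `Counter` returns `0` for a key it has never seen, instead of raising a `KeyError`, and that the lookup does not add the key. Here is a minimal sketch of that behavior, using a small hand-written sample rather than the random data above (the names `colors_seen` and `all_colors` are just for illustration):

```python
from collections import Counter

colors_seen = Counter(['brown', 'green', 'brown'])

# A missing key yields 0 rather than raising KeyError...
print(colors_seen['amber'])    # 0
# ...and the lookup does not insert the key into the Counter.
print('amber' in colors_seen)  # False

# Building the result from the full collection of colors fills in the zeros.
all_colors = ('amber', 'brown', 'green')
result = {c: colors_seen[c] for c in all_colors}
print(result)
# {'amber': 0, 'brown': 2, 'green': 1}
```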

---

#### Exercise #3

You are given three JSON files: one holds a default set of settings, and the other two hold environment-specific settings.
The files are included in the downloads, and are named:
* `common.json`
* `dev.json`
* `prod.json`

Your goal is to write a function that takes a single argument (the environment name) and returns a "combined" dictionary that merges the common settings with the settings for that environment, with the environment-specific settings overriding any common settings already defined.

For simplicity, assume that the argument values match the file names without the `.json` extension, for example `dev` or `prod`.

The wrinkle: We don't want to duplicate data for the "merged" dictionary - use `ChainMap` to implement this instead.
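
To see why `ChainMap` avoids duplication, here is a minimal sketch with two small, made-up flat dictionaries (not the contents of the JSON files). Lookups search the maps from left to right, so the first map overrides the second, and the `ChainMap` only holds references to the underlying dictionaries.

```python
from collections import ChainMap

common = {'level': 'info', 'port': 5432}
env = {'level': 'trace'}

# env comes first, so its values take precedence over common.
merged = ChainMap(env, common)
print(merged['level'])  # trace
print(merged['port'])   # 5432

# No data was copied: a change to an underlying dict is visible through the chain.
common['port'] = 6543
print(merged['port'])   # 6543
```

Note that a `ChainMap` only chains the top level. Nested dictionaries are not merged automatically, which is why the nested settings in this exercise need extra handling.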

In [16]:
import json

In [17]:
with open('common.json') as f:
    print(json.load(f))

{'data': {'input_root': '/default/path/inputs', 'output_root': '/default/path/outputs', 'numerics': {'type': 'Decimal', 'precision': 6}}, 'database': {'db_name': 'deepdive', 'schema': 'public', 'port': 5432}, 'logs': {'level': 'info', 'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(message)s'}}


In [18]:
with open('dev.json') as f:
    print(json.load(f))

{'data': {'input_root': '/dev/path/inputs', 'output_root': '/dev/path/outputs', 'numerics': {'type': 'float'}, 'operators': {'add': '__add__'}}, 'database': {'user': 'test', 'pwd': 'test'}, 'logs': {'level': 'trace', 'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(filename)s %(funcName)s %(message)s'}}


In [19]:
with open('prod.json') as f:
    print(json.load(f))

{'data': {'input_root': '$DATA_INPUT_PATH', 'output_root': '$DATA_OUTPUT_PATH'}, 'database': {'user': '$PG_USER', 'pwd': '$PG_PWD'}}


In [20]:
from collections import ChainMap

def settings(env):
    with open('common.json') as f:
        common_settings = json.load(f)
    with open(f'{env}.json') as f:
        env_settings = json.load(f)
    
    def settings_recurse(env_settings, common_settings):
        chain = ChainMap(env_settings, common_settings)
        for k, v in env_settings.items():
            if isinstance(v, dict) and k in common_settings:
                chain[k] = settings_recurse(v, common_settings[k])
        return dict(chain)
    
    return settings_recurse(env_settings, common_settings)

In [21]:
prod = settings('prod')
prod

{'data': {'input_root': '$DATA_INPUT_PATH',
  'output_root': '$DATA_OUTPUT_PATH',
  'numerics': {'type': 'Decimal', 'precision': 6}},
 'database': {'db_name': 'deepdive',
  'schema': 'public',
  'port': 5432,
  'user': '$PG_USER',
  'pwd': '$PG_PWD'},
 'logs': {'level': 'info',
  'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(message)s'}}

In [22]:
prod['database']['user']

'$PG_USER'

In [23]:
prod['data']['numerics']['type']

'Decimal'

In [24]:
prod['logs']['level']

'info'

In [25]:
dev = settings('dev')
dev

{'data': {'input_root': '/dev/path/inputs',
  'output_root': '/dev/path/outputs',
  'numerics': {'type': 'float', 'precision': 6},
  'operators': {'add': '__add__'}},
 'database': {'db_name': 'deepdive',
  'schema': 'public',
  'port': 5432,
  'user': 'test',
  'pwd': 'test'},
 'logs': {'level': 'trace',
  'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(filename)s %(funcName)s %(message)s'}}

In [26]:
dev['logs']['level']

'trace'

In [27]:
dev['data']['numerics']['type']

'float'

In [28]:
dev['data']['operators']

{'add': '__add__'}
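
The solution above converts each chain to a plain `dict` before returning it, which copies the top level of every chain. If you would rather keep the result as nested `ChainMap` views, one possible variation is sketched below. Treat it as a sketch under assumptions rather than the reference solution: the name `chain_nested` is made up, the dictionaries are small inline samples instead of the JSON files, and the returned object is a `ChainMap`, so it prints differently from a plain dictionary.

```python
from collections import ChainMap

def chain_nested(env_settings, common_settings):
    # Collect chained views only for keys whose values are dicts on both sides.
    overrides = {}
    for k, v in env_settings.items():
        common_v = common_settings.get(k)
        if isinstance(v, dict) and isinstance(common_v, dict):
            overrides[k] = chain_nested(v, common_v)
    # Lookup order: nested chains first, then env settings, then common settings.
    # Neither input dictionary is modified or copied.
    return ChainMap(overrides, env_settings, common_settings)

# Hypothetical sample data, loosely modeled on the settings files.
common = {
    'data': {'input_root': '/default', 'numerics': {'type': 'Decimal', 'precision': 6}},
    'logs': {'level': 'info'},
}
env = {'data': {'input_root': '/dev', 'numerics': {'type': 'float'}}}

merged = chain_nested(env, common)
print(merged['data']['input_root'])             # /dev
print(merged['data']['numerics']['type'])       # float
print(merged['data']['numerics']['precision'])  # 6
print(merged['logs']['level'])                  # info
```

The trade-off is that callers receive `ChainMap` objects at every merged level, which behave like dictionaries for lookups but may need converting before being serialized, for example with `json.dumps`.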