#### Exercise 1

Let's revisit an exercise we did right after the section on dictionaries.

You have text data spread across multiple servers. Each server can analyze its own data and return a dictionary that maps words to their frequencies.

Your job is to combine this data to create a single dictionary that contains all the words and their combined frequencies from all these data sources. Bonus points if you can make your dictionary sorted by frequency (highest to lowest).

For example, you may have three servers that each return these dictionaries:

In [39]:
d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

In [40]:
from collections import defaultdict, Counter

In [41]:
def merge_default(*args):
    d = defaultdict(int)
    for arg in args:
        for k, v in arg.items():
            d[k] += v
    return defaultdict(int, sorted(d.items(), key=lambda e: e[1], reverse=True))

In [42]:
merge_default(d1, d2)

defaultdict(int,
            {'python': 16,
             'javascript': 15,
             'java': 13,
             'c#': 12,
             'c++': 10,
             'go': 9})

In [43]:
merge_default(d1, d2, d3)

defaultdict(int,
            {'python': 17,
             'javascript': 15,
             'java': 13,
             'c#': 12,
             'c++': 10,
             'go': 9,
             'erlang': 5,
             'haskell': 2,
             'pascal': 1})

In [44]:
def merge_counter(*args):
    d = Counter()
    for arg in args:
        d.update(arg)
    return Counter(dict(d.most_common()))

In [45]:
merge_counter(d1, d2, d3)

Counter({'python': 17,
         'javascript': 15,
         'java': 13,
         'c#': 12,
         'c++': 10,
         'go': 9,
         'erlang': 5,
         'haskell': 2,
         'pascal': 1})

In [46]:
merge_counter(d1, d2)

Counter({'python': 16,
         'javascript': 15,
         'java': 13,
         'c#': 12,
         'c++': 10,
         'go': 9})
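
As a cross-check on the two solutions above, here is a minimal sketch of a third variant. `merge_sum` is a hypothetical name, not part of the original solution. It relies on `Counter` addition, which drops entries whose resulting count is zero or negative; that is fine here only because all the sample frequencies are positive.

```python
from collections import Counter

d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

def merge_sum(*dicts):
    # Hypothetical variant: add the Counters together, then rebuild a plain
    # dict in descending frequency order (dicts preserve insertion order).
    total = sum((Counter(d) for d in dicts), Counter())
    return dict(total.most_common())

merged = merge_sum(d1, d2, d3)
print(merged['python'])   # 17
print(list(merged)[:3])   # ['python', 'javascript', 'java']
```

Returning a plain `dict` rather than a `Counter` is a design choice: the ordering by frequency then shows up in ordinary iteration as well as in the printed representation.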

#### Exercise 2

Suppose you have a tuple of all possible eye colors:

In [9]:
eye_colors = ("amber", "blue", "brown", "gray", "green", "hazel", "red", "violet")

Some other collection (say, retrieved from a database or an external API) contains `Person` objects that have an `eye_color` attribute.

Your goal is to create a dictionary that maps each eye color in `eye_colors` to the number of people who have that eye color. The wrinkle here is that even if no one matches some eye color, say `amber`, your dictionary should still contain an entry `"amber": 0`.

Here is some sample data:

In [10]:
class Person:
    def __init__(self, eye_color):
        self.eye_color = eye_color

In [11]:
from random import seed, choices
seed(0)
persons = [Person(color) for color in choices(eye_colors[2:], k = 50)]

As you can see, we built up a list of `Person` objects, none of which has `amber` or `blue` eyes, because `choices` only samples from `eye_colors[2:]`.

Write a function that returns a dictionary with the correct counts for each eye color listed in `eye_colors`.

In [12]:
test = Counter(eye_colors)
test.subtract(test)
test

Counter({'amber': 0,
         'blue': 0,
         'brown': 0,
         'gray': 0,
         'green': 0,
         'hazel': 0,
         'red': 0,
         'violet': 0})

In [13]:
def eye_color_count(eye_colors, persons):
    # Create a Counter with the valid eye colors.
    d = Counter(eye_colors)
    # Set the count of each eye color to 0 by subtracting
    # the counter from itself.
    d.subtract(d)
    d.update(person.eye_color for person in persons)
    return d

In [14]:
eye_color_count(eye_colors, persons)

Counter({'amber': 0,
         'blue': 0,
         'brown': 3,
         'gray': 10,
         'green': 8,
         'hazel': 7,
         'red': 10,
         'violet': 12})
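
Another way to seed the zero counts is `dict.fromkeys`. The following is only a sketch, assuming the same `Person` class as above; `eye_color_count_fromkeys` and the small hand-built `people` list are hypothetical names used for illustration.

```python
from collections import Counter

class Person:
    def __init__(self, eye_color):
        self.eye_color = eye_color

eye_colors = ("amber", "blue", "brown", "gray", "green", "hazel", "red", "violet")

def eye_color_count_fromkeys(eye_colors, persons):
    # Start every valid eye color at 0 so that unmatched colors still appear.
    counts = Counter(dict.fromkeys(eye_colors, 0))
    # Counter.update adds to existing counts rather than replacing them.
    counts.update(person.eye_color for person in persons)
    return counts

# Hypothetical sample data, small enough to verify by hand.
people = [Person("brown"), Person("green"), Person("brown")]
result = eye_color_count_fromkeys(eye_colors, people)
print(result["brown"], result["green"], result["amber"])  # 2 1 0
print("amber" in result)  # True
```

Like the solution above, this sketch does not reject eye colors that are missing from `eye_colors`; such a color would simply be added as a new key.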

#### Exercise 3

You are given three JSON files: one with a default (common) set of settings, and two with environment-specific settings.
The files are included in the downloads, and are named:
* `common.json`
* `dev.json`
* `prod.json`

Your goal is to write a function that has a single argument (the environment name) and returns the "combined" dictionary that merges the common settings with the settings for that environment, with the environment-specific settings overriding any common settings already defined.

For simplicity, assume that the argument values are going to be the same as the file names, without the `.json` extension. So for example, `dev` or `prod`.

The wrinkle: We don't want to duplicate data for the "merged" dictionary - use `ChainMap` to implement this instead.
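
To make the "no duplication" requirement concrete, here is a minimal sketch of how a `ChainMap` looks up keys. The `common` and `env` dictionaries are hypothetical stand-ins for the parsed JSON files, not their real contents.

```python
from collections import ChainMap

# Hypothetical stand-ins for the parsed JSON files.
common = {"level": "info", "port": 5432}
env = {"level": "trace"}

settings = ChainMap(env, common)
print(settings["level"])  # 'trace' (found in env first)
print(settings["port"])   # 5432 (falls back to common)

# No data was copied: the ChainMap holds references to the original dicts,
# so later changes to them are visible through it.
common["port"] = 6543
print(settings["port"])            # 6543
print(settings.maps[1] is common)  # True
```

Note that the lookup is shallow: for a key whose value is a nested dictionary, the whole nested dictionary from the first mapping that has the key is returned. This is why nested settings need extra handling.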

In [15]:
import json
from collections import ChainMap

In [16]:
# Open a JSON file and return it as a dictionary.
def dict_from_json_file(file_path):
    with open(file_path) as f:
        return json.load(f)

In [17]:
dict_from_json_file('dev.json')

{'data': {'input_root': '/dev/path/inputs',
  'output_root': '/dev/path/outputs',
  'numerics': {'type': 'float'},
  'operators': {'add': '__add__'}},
 'database': {'user': 'test', 'pwd': 'test'},
 'logs': {'level': 'trace',
  'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s %(filename)s %(funcName)s %(message)s'}}

In [18]:
# Grab the appropriate dictionary for the given environment.
def env_dict(env_name):
    file_path = env_name + '.json'
    return dict_from_json_file(file_path)

In [19]:
env_dict('prod')

{'data': {'input_root': '$DATA_INPUT_PATH',
  'output_root': '$DATA_OUTPUT_PATH'},
 'database': {'user': '$PG_USER', 'pwd': '$PG_PWD'}}

In [31]:
def recursive_chain(a, b):
    # Merged result; values from a take precedence over values from b.
    c = {}
    # Iterate through each key in the combined set of keys from both dicts.
    for k in a.keys() | b.keys():
        # If either dictionary's value for this key is itself a non-empty dictionary...
        if (a.get(k) and isinstance(a[k], dict)) or (b.get(k) and isinstance(b[k], dict)):
            # ...merge the next level recursively, passing both dictionaries' values.
            # If one dictionary doesn't contain the given key, pass an empty dictionary.
            c[k] = recursive_chain(a.get(k, {}), b.get(k, {}))
        # Otherwise, set the value of the key in the combined dictionary equal to the
        # value grabbed from a ChainMap of the two dictionaries.
        else:
            c[k] = ChainMap(a, b).get(k)
    return c

In [47]:
def env_config(env_name):
    # Create common dictionary.
    common = env_dict('common')
    # Create dictionary for selected environment.
    env = env_dict(env_name)
    # Return a combined dictionary with env overwriting common.
    return recursive_chain(env, common)

In [49]:
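def chain_recursive(d1, d2):
    # Lookups check d1 first, then fall back to d2; nothing is copied.
    chain = ChainMap(d1, d2)
    for k, v in d1.items():
        # Nested dictionaries present in both mappings are chained recursively.
        # Writes to a ChainMap go to its first mapping, so this updates d1.
        if isinstance(v, dict) and k in d2:
            chain[k] = chain_recursive(d1[k], d2[k])
    return chain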
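
One consequence of `chain_recursive` is worth seeing in isolation: assigning through a `ChainMap` writes to its first mapping, so the dictionary passed as `d1` is modified in place. The sketch below repeats the function so that it is self-contained; the `env` and `common` dictionaries are hypothetical stand-ins.

```python
from collections import ChainMap

def chain_recursive(d1, d2):
    chain = ChainMap(d1, d2)
    for k, v in d1.items():
        if isinstance(v, dict) and k in d2:
            # This assignment lands in d1, the first mapping of the chain.
            chain[k] = chain_recursive(d1[k], d2[k])
    return chain

# Hypothetical stand-ins for the environment and common settings.
env = {"database": {"user": "test"}}
common = {"database": {"port": 5432}}

merged = chain_recursive(env, common)
print(type(env["database"]).__name__)     # ChainMap
print(type(common["database"]).__name__)  # dict
print(merged["database"]["port"])         # 5432
```

This is harmless when the environment dictionary is freshly loaded from a file and used nowhere else, but it matters if the same dictionary is reused afterwards.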

In [50]:
def load_settings(env_name):
    # Create common dictionary.
    common = env_dict('common')
    # Create dictionary for selected environment.
    env = env_dict(env_name)
    # Return a combined dictionary with env overwriting common.
    return chain_recursive(env, common)

In [57]:
from pprint import pprint

pprint(env_config('prod'))

{'data': {'input_root': '$DATA_INPUT_PATH',
          'numerics': {'precision': 6, 'type': 'Decimal'},
          'output_root': '$DATA_OUTPUT_PATH'},
 'database': {'db_name': 'deepdive',
              'port': 5432,
              'pwd': '$PG_PWD',
              'schema': 'public',
              'user': '$PG_USER'},
 'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s '
                    '%(message)s',
          'level': 'info'}}


In [58]:
pprint(env_config('dev'))

{'data': {'input_root': '/dev/path/inputs',
          'numerics': {'precision': 6, 'type': 'float'},
          'operators': {'add': '__add__'},
          'output_root': '/dev/path/outputs'},
 'database': {'db_name': 'deepdive',
              'port': 5432,
              'pwd': 'test',
              'schema': 'public',
              'user': 'test'},
 'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s %(user)s '
                    '%(filename)s %(funcName)s %(message)s',
          'level': 'trace'}}


In [55]:
pprint(load_settings('prod'))

ChainMap({'data': ChainMap({'input_root': '$DATA_INPUT_PATH',
                            'output_root': '$DATA_OUTPUT_PATH'},
                           {'input_root': '/default/path/inputs',
                            'numerics': {'precision': 6, 'type': 'Decimal'},
                            'output_root': '/default/path/outputs'}),
          'database': ChainMap({'pwd': '$PG_PWD', 'user': '$PG_USER'},
                               {'db_name': 'deepdive',
                                'port': 5432,
                                'schema': 'public'})},
         {'data': {'input_root': '/default/path/inputs',
                   'numerics': {'precision': 6, 'type': 'Decimal'},
                   'output_root': '/default/path/outputs'},
          'database': {'db_name': 'deepdive', 'port': 5432, 'schema': 'public'},
          'logs': {'format': '%(asctime)s: %(levelname)s: %(clientip)s '
                             '%(user)s %(message)s',
                   'level': 'info'}})


In [56]:
pprint(load_settings('dev'))

ChainMap({'data': ChainMap({'input_root': '/dev/path/inputs',
                            'numerics': ChainMap({'type': 'float'},
                                                 {'precision': 6,
                                                  'type': 'Decimal'}),
                            'operators': {'add': '__add__'},
                            'output_root': '/dev/path/outputs'},
                           {'input_root': '/default/path/inputs',
                            'numerics': {'precision': 6, 'type': 'Decimal'},
                            'output_root': '/default/path/outputs'}),
          'database': ChainMap({'pwd': 'test', 'user': 'test'},
                               {'db_name': 'deepdive',
                                'port': 5432,
                                'schema': 'public'}),
          'logs': ChainMap({'format': '%(asctime)s: %(levelname)s: '
                                      '%(clientip)s %(user)s %(filename)s '
                              

In [59]:
dev_settings = load_settings('dev')

In [61]:
dev_settings['data']['numerics']['type']

'float'
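
The lookup above returns the overriding `dev` value. The fallback to the common settings can be checked the same way. The following is a self-contained sketch that uses trimmed-down stand-ins for the two files, containing only the `numerics` values visible in the output above; it repeats `chain_recursive` so that it runs on its own.

```python
from collections import ChainMap

def chain_recursive(d1, d2):
    # Same function as above, repeated so the sketch is self-contained.
    chain = ChainMap(d1, d2)
    for k, v in d1.items():
        if isinstance(v, dict) and k in d2:
            chain[k] = chain_recursive(d1[k], d2[k])
    return chain

# Trimmed-down stand-ins for dev.json and common.json.
dev = {"data": {"numerics": {"type": "float"}}}
common = {"data": {"numerics": {"precision": 6, "type": "Decimal"}}}

settings = chain_recursive(dev, common)
numerics = settings["data"]["numerics"]
print(numerics["type"])       # 'float' (overridden by dev)
print(numerics["precision"])  # 6 (inherited from common)

# A single level can be flattened into a plain dict when needed.
print(dict(numerics) == {"type": "float", "precision": 6})  # True
```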