<div align="right">Env: Python [conda env:PY27_Test]</div>
<div align="right">Env: Python [conda env:PY36] </div>

# Working With Dictionaries - The Basics
These exercises come from multiple sources and show the basics of creating, modifying, and sorting dictionaries.
All code was created in Python 2.7 and cross-tested in Python 3.6.

**Quick Guide to the basics:**
- **Create** Dictionary:  dictionary = { key1:value, key2:value2 }
 - Example: `myDict1 = { 'one':'first thing', 'two':'secondthing' }`
 - Example: `myDict2 = { 1:43, 2:600, 3:-1000.4 }`
 - Example: `myDict3 = { 1:"text", 2:345, 3:'another value' }`
- **Add** to a dictionary:  `myDict1['newVal'] = 'something stupid'`
 - Resulting Dictionary: `myDict1 = { 'one':'first thing', 'two':'secondthing', 'newVal':'something stupid' }`
- **Remove** from dictionary: `del myDict1['newVal']`
 - now myDict1 is back the way it was: `{ 'one':'first thing', 'two':'secondthing' }`
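
The quick guide above can be sketched as one runnable snippet. The name `my_dict` and its contents are made up for illustration. `pop()` with a default is shown as an alternative to `del` for cases where the key might not exist:

```python
# Hypothetical dictionary used only for this illustration.
my_dict = {'one': 'first thing', 'two': 'second thing'}   # create

my_dict['new_val'] = 'something new'                      # add (or overwrite) a key
had_new_val = 'new_val' in my_dict                        # membership test on keys

del my_dict['new_val']                                    # remove; raises KeyError if the key is absent
removed = my_dict.pop('not_there', None)                  # remove with a default; no error if absent

print(had_new_val)   # True
print(removed)       # None
print(my_dict == {'one': 'first thing', 'two': 'second thing'})   # True
```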

**In This Document:**
- Immediately Below: modified "Learn Python the Hard Way" dictionary example (covers much of the basics)
- [.get()](#get) - safely retrieve dict value 
- [sorting](#sorting)
- [When to use different sorted container options](#when)
- using [comprehensions w/ dictionaries](#comprehensions)
- [over-riding "missing" key](#misskey) 
- [find nth element](#nth_elem)
- [nested dictionaries](#nested) 
- [more dictionary and nested dictionary resources](#nest2)
- browse below for more topics that may not be in this list ...

In [35]:
# Ex39 in Learn Python the Hard Way:
#  https://learnpythonthehardway.org/book/ex39.html
# edited, expanded, and made PY3.x compliant by Mitch before inclusion in this notebook

# create a mapping of state to abbreviation
states = {
    'Oregon': 'OR',
    'Florida': 'FL',
    'California': 'CA',
    'New York': 'NY',
    'Michigan': 'MI'
}

# create a basic set of states and some cities in them
cities = {
    'CA': 'San Francisco',
    'MI': 'Detroit',
    'FL': 'Jacksonville'
}

# add some more cities
cities['NY'] = 'New York'
cities['OR'] = 'Portland'

# print out some cities
print('-' * 10)
print("Two cities:")
print("NY State has: %s" %cities['NY'])
print("OR State has: %s" %cities['OR'])

# print some states
print('-' * 10)
print("Abbreviations for Two States:")
# PY 2.7 syntax from original code: print "Michigan's abbreviation is: ", states['Michigan']
print("Michigan's abbreviation is: %s" %states['Michigan'])
print("Florida's abbreviation is: %s" %states['Florida'])

# do it by using the state then cities dict
print('-' * 10)
print("State Abbreviation extracted from cities dictionary:")
print("Michigan has: %s" %cities[states['Michigan']])
print("Florida has: %s" %cities[states['Florida']])

# print every state abbreviation
print('-' * 10)
print("Every State Abbreviation:")
for state, abbrev in states.items():
    print("%s is abbreviated %s" % (state, abbrev))

# print every city in state
print('-' * 10)
print("Every city in Every State:")
for abbrev, city in cities.items():
    print("%s has the city %s" %(abbrev, city))

# now do both at the same time
print('-' * 10)
print("Do Both at Once:")
for state, abbrev in states.items():
    print("%s state is abbreviated %s and has city %s" % (
        state, abbrev, cities[abbrev]))

print('-' * 10)

----------
Two cities:
NY State has: New York
OR State has: Portland
----------
Abbreviations for Two States:
Michigan's abbreviation is: MI
Florida's abbreviation is: FL
----------
State Abbreviation extracted from cities dictionary:
Michigan has: Detroit
Florida has: Jacksonville
----------
Every State Abbreviation:
California is abbreviated CA
Michigan is abbreviated MI
New York is abbreviated NY
Florida is abbreviated FL
Oregon is abbreviated OR
----------
Every city in Every State:
FL has the city Jacksonville
CA has the city San Francisco
MI has the city Detroit
OR has the city Portland
NY has the city New York
----------
Do Both at Once:
California state is abbreviated CA and has city San Francisco
Michigan state is abbreviated MI and has city Detroit
New York state is abbreviated NY and has city New York
Florida state is abbreviated FL and has city Jacksonville
Oregon state is abbreviated OR and has city Portland
----------


<a id="get" name="get"></a>
### .get() examples

In [36]:
# ex 39 Python the Hard Way modified code continued ...

# safely get an abbreviation for a state that might not be there
state = states.get('Texas')
if not state:
    print("Sorry, no Texas.")

# get a city with a default value
city = cities.get('TX', 'Does Not Exist')
print("The city for the state 'TX' is: %s" % city)
# note: this looks up the states dict, so it prints the abbreviation 'FL' rather than a city
print("The city for the state 'FL' is: %s" % states.get('Florida'))

city2 = states.get('Hawaii')
print("city2: %s" %city2)
if city2 is None:
    city2 = 'Value == None'
elif city2 == '':
    city2 = 'Value is empty ""'
elif not city2:
    city2 = 'Value Missing (Passed not test)'
else:
    city2 = 'No Such Value'
    
print("The city for the state 'HI' is: %s" % city2) 
print("These commands used .get() to safely retrieve a value")

Sorry, no Texas.
The city for the state 'TX' is: Does Not Exist
The city for the state 'FL' is: FL
city2: None
The city for the state 'HI' is: Value == None
These commands used .get() to safely retrieve a value
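
One caveat with the `if not state:` test used above: it cannot tell a missing key from a stored value that happens to be falsy (`0`, `''`, an empty list). A minimal sketch, using a made-up `inventory` dictionary, that separates the two cases by comparing against `None`:

```python
# Hypothetical data: 'apples' is present but its value is falsy (0).
inventory = {'apples': 0, 'pears': 12}

results = {}
for fruit in ('apples', 'pears', 'kiwis'):
    count = inventory.get(fruit)      # returns None when the key is absent
    if count is None:
        results[fruit] = 'not stocked'
    elif not count:
        results[fruit] = 'stocked but empty'
    else:
        results[fruit] = 'in stock'

for fruit in ('apples', 'pears', 'kiwis'):
    print("%s: %s" % (fruit, results[fruit]))
```

If `None` is itself a legitimate stored value, testing `fruit in inventory` is the unambiguous check.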


In [37]:
# more tests on above code from Learn Python the Hard Way:
print(not city2)

# tests that produce an error - numerical indexing has no meaning in dictionaries:
# print states[1][1]

False


In [38]:
# what happens if all keys are not unique?
foods = {
    'fruit': 'banana',
    'fruit': 'apple',
    'meat': 'beef'
}

for foodType, indivFood in foods.items():
    print("%s includes %s" % (foodType, indivFood))

# answer:  duplicates are not kept.  The 2nd use of the same key over-writes the first
# remove elements from a dictionary
del foods['meat']
# add an element to dictionary
foods['vegetables'] = 'carrot'
foods['meats'] = 'chicken'
# change an element in the dictionary
foods['vegetables'] = 'corn'

foods

fruit includes apple
meat includes beef


{'fruit': 'apple', 'meats': 'chicken', 'vegetables': 'corn'}

In [39]:
# from MIT Big Data Class:
# Associative Arrays ==> Called "Dictionaries" or "Maps" in Python
# each value has a key that you can use to find it - { Key:Value }

super_heroes = {'Spider Man' : 'Peter Parker',
                'Super Man' : 'Clark Kent',
                'Wonder Woman': 'Diana Prince',
                'The Flash' : 'Barry Allen',
                'Professor X' : 'Charles Exavior',
                'Wolverine' : 'Logan'}
print("%s %s" %("len(super_heroes): )", len(super_heroes)))
print("%s %s" %("Secret Identity for The Flash:", super_heroes['The Flash']))
del super_heroes['Wonder Woman']
print("%s %s" %("len(super_heroes): )", len(super_heroes)))
print(super_heroes)
super_heroes['Wolverine'] = 'John Logan'
print("Secret Identify for Wolverine:", super_heroes.get("Wolverine"))
print("Keys ... then Values (for super_heroes):")
print(super_heroes.keys())
print(super_heroes.values())

len(super_heroes): ) 6
Secret Identity for The Flash: Barry Allen
len(super_heroes): ) 5
{'Spider Man': 'Peter Parker', 'The Flash': 'Barry Allen', 'Super Man': 'Clark Kent', 'Wolverine': 'Logan', 'Professor X': 'Charles Exavior'}
('Secret Identify for Wolverine:', 'John Logan')
Keys ... then Values (for super_heroes):
['Spider Man', 'The Flash', 'Super Man', 'Wolverine', 'Professor X']
['Peter Parker', 'Barry Allen', 'Clark Kent', 'John Logan', 'Charles Exavior']


<a id="nested" name="nested"></a>
### nested dictionaries

In [41]:
# list of dictionaries:

FoodList = [foods, {'meats':'beef', 'fruit':'banana', 'vegetables':'broccoli'}]
print(FoodList[0])
print(FoodList[1])

{'vegetables': 'corn', 'fruit': 'apple', 'meats': 'chicken'}
{'vegetables': 'broccoli', 'fruit': 'banana', 'meats': 'beef'}


In [42]:
# dictionary of dictionaries (sometimes called "nested dictionary"):
# note: this is an example only.  In the real world, since FoodList already includes foods, you probably would not include both
#       a uniform structure (same number of levels across all elements) is also advisable if possible

nestedDict = { 'heroes':super_heroes, 'foods': foods, 'complex_foods':FoodList } 
print(nestedDict['heroes'])
print('-'*72)
print(nestedDict['complex_foods'])

{'Spider Man': 'Peter Parker', 'The Flash': 'Barry Allen', 'Super Man': 'Clark Kent', 'Wolverine': 'John Logan', 'Professor X': 'Charles Exavior'}
------------------------------------------------------------------------
[{'vegetables': 'corn', 'fruit': 'apple', 'meats': 'chicken'}, {'vegetables': 'broccoli', 'fruit': 'banana', 'meats': 'beef'}]
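
To pull a single value out of a nested structure like this, chain one lookup per level. The sketch below uses a small stand-in dictionary (`nested`, with made-up contents) instead of the `nestedDict` built above, so it runs on its own. The chained `.get()` call at the end is one way to avoid a `KeyError` when a level may be missing:

```python
# Hypothetical stand-in for a dictionary that holds a dict and a list of dicts.
nested = {
    'heroes': {'Wolverine': 'John Logan'},
    'complex_foods': [{'fruit': 'apple'}, {'fruit': 'banana'}],
}

hero = nested['heroes']['Wolverine']              # dict inside a dict: two keys
second_fruit = nested['complex_foods'][1]['fruit']  # list inside a dict: key, index, key

# Safe lookup: fall back to an empty dict at the first level, then to a default value.
villain = nested.get('villains', {}).get('Magneto', 'unknown')

print(hero)          # John Logan
print(second_fruit)  # banana
print(villain)       # unknown
```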


<a id="nest2" name="nest2"></a>
### Working With Dictionaries and Nested Dictionaries - Helpful Code

This section has additional resources for working with dictionaries and nested dictionaries:

- [FileDataObj code](https://github.com/TheMitchWorksPro/DataTech_Playground/blob/master/Python_Misc/TMWP_PY_FileDataObjects.ipynb) - the FileDataObject stores contents from a file in a nested dictionary and explores sorting and summarising the nested dict
- [dictionary and nested dictionary functions in a PY module](https://github.com/TheMitchWorksPro/DataTech_Playground/blob/master/Python_Misc/TMWP_dictionary_manipulation.py) - merging dictionaries, adding to a nested dictionary, summarizing a nested dictionary, etc. (some of this code was created from the previous example)

<a id="sorting" name="sorting"></a>

### Sorting

In [43]:
# Help on Collections Objects including Counter, OrderedDict, deque, etc:
#  https://docs.python.org/2/library/collections.html

# in Python 2.7 (and Python 3 before 3.7) a regular dictionary does not guarantee any particular order
# an OrderedDict preserves the order in which elements are added
# note: building it from a dict literal, as below, starts from that literal's arbitrary order

from collections import OrderedDict

myOrdDict = OrderedDict({'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2})
print(myOrdDict)
myOrdDict['pork belly'] = 7
print(myOrdDict)
myOrdDict['sandwich'] = 5
print(myOrdDict)
myOrdDict['hero'] = 5
print(myOrdDict)

OrderedDict([('orange', 2), ('pear', 1), ('banana', 3), ('apple', 4)])
OrderedDict([('orange', 2), ('pear', 1), ('banana', 3), ('apple', 4), ('pork belly', 7)])
OrderedDict([('orange', 2), ('pear', 1), ('banana', 3), ('apple', 4), ('pork belly', 7), ('sandwich', 5)])
OrderedDict([('orange', 2), ('pear', 1), ('banana', 3), ('apple', 4), ('pork belly', 7), ('sandwich', 5), ('hero', 5)])
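
Notice that the output above starts with `orange`, not `banana`: the `OrderedDict` was built from a plain dict literal, which under Python 2.7 had already lost the written order. A minimal sketch of one way around this, passing a list of `(key, value)` tuples instead (the name `pairs` is made up for the example):

```python
from collections import OrderedDict

# A list of tuples has a definite order, so the OrderedDict keeps it on every Python version.
pairs = [('banana', 3), ('apple', 4), ('pear', 1), ('orange', 2)]
ordered = OrderedDict(pairs)
ordered['pork belly'] = 7          # new keys are appended at the end

print(list(ordered.keys()))
# ['banana', 'apple', 'pear', 'orange', 'pork belly']
```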


In [45]:
# sorting the ordered dictionary ...

# dictionary sorted by key
# replacing original OrderedDict w/ results
myOrdDict = OrderedDict(sorted(myOrdDict.items(), key=lambda t: t[0]))
print("myOrdDict (sorted by key):\n %s" %myOrdDict)

# dictionary sorted by value
myOrdDict2 = OrderedDict(sorted(myOrdDict.items(), key=lambda t: t[1]))
print("myOrdDict2 (sorted by value):\n %s" %myOrdDict2)

# dictionary sorted by length of the key string
myOrdDict3 = OrderedDict(sorted(myOrdDict.items(), key=lambda t: len(t[0])))
print("myOrdDict3 (sorted by length of key):\n %s" %myOrdDict3)

myOrdDict (sorted by key):
 OrderedDict([('apple', 4), ('banana', 3), ('hero', 5), ('orange', 2), ('pear', 1), ('pork belly', 7), ('sandwich', 5)])
myOrdDict2 (sorted by value):
 OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4), ('hero', 5), ('sandwich', 5), ('pork belly', 7)])
myOrdDict3 (sorted by length of key):
 OrderedDict([('hero', 5), ('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2), ('sandwich', 5), ('pork belly', 7)])


In [46]:
# collections.OrderedDict(sorted(dictionary.items(), reverse=True))
# pd.Series(OrderedDict(sorted(browser.items(), key=lambda v: v[1])))

# changing sort order to reverse key sort:
myOrdDict3 = OrderedDict(sorted(myOrdDict.items(), reverse=True))
print("myOrdDict3 (reverse key sort):\n %s" %myOrdDict3)

myOrdDict3 (reverse key sort):
 OrderedDict([('sandwich', 5), ('pork belly', 7), ('pear', 1), ('orange', 2), ('hero', 5), ('banana', 3), ('apple', 4)])


In [12]:
# testing of the above strategy ... it usually works, but cases were encountered where it failed for no known reason
# the lambda approach may be more reliable:

import pandas as pd

# value sort as pandas series:
myOrdDict4 = pd.Series(OrderedDict(sorted(myOrdDict.items(), key=lambda v: v[1])))
print("myOrdDict4 (value sort / alternate method):\n %s" %myOrdDict4)

myOrdDict4 (value sort / alternate method):
 pear          1
orange        2
banana        3
apple         4
hero          5
sandwich      5
pork belly    7
dtype: int64


In [13]:
# value sort in reverse order:
myOrdDict5 = OrderedDict(sorted(myOrdDict.items(), key=lambda t: (-t[1],t[0])))
print("myOrdDict5 (sorted by value in reverse order):\n %s" %myOrdDict5)

myOrdDict5 (sorted by value in reverse order):
 OrderedDict([('pork belly', 7), ('hero', 5), ('sandwich', 5), ('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
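
The `-t[1]` trick above only works when the values are numbers. A sketch of an alternative that also works for strings or other comparable values: sort twice and rely on Python's sort being stable, which still holds when `reverse=True` is used. The `stock` dictionary is made up for the example:

```python
from operator import itemgetter

# Hypothetical data with a tie on the value 5.
stock = {'banana': 3, 'apple': 4, 'pear': 1, 'hero': 5, 'sandwich': 5}

# Pass 1: secondary order (key, ascending).
by_key = sorted(stock.items(), key=itemgetter(0))
# Pass 2: primary order (value, descending); ties keep their pass-1 order.
by_value_desc = sorted(by_key, key=itemgetter(1), reverse=True)

# Same result as the negated-value lambda used in the cell above.
same_as_lambda = sorted(stock.items(), key=lambda t: (-t[1], t[0]))

print(by_value_desc)
# [('hero', 5), ('sandwich', 5), ('apple', 4), ('banana', 3), ('pear', 1)]
print(by_value_desc == same_as_lambda)   # True
```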


In [14]:
# Help on Collections Objects including Counter, OrderedDict, deque, etc:
#  https://docs.python.org/2/library/collections.html

# sample using a list:
# for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
#     cnt[word] += 1

from collections import Counter

cnt = Counter()
for num in myOrdDict.values():
    cnt[num] +=1
    
print(cnt)


Counter({5: 2, 1: 1, 2: 1, 3: 1, 4: 1, 7: 1})
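
A `Counter` can also be built straight from an iterable, without the explicit loop. The sketch below reuses the word list from the comment in the cell above and shows two features that often come in handy: `most_common()` and the fact that a missing key counts as zero instead of raising `KeyError`:

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']
word_counts = Counter(words)             # counts every element in one step

top_two = word_counts.most_common(2)     # highest counts first
missing = word_counts['purple']          # absent keys report a count of 0

print(top_two)   # [('blue', 3), ('red', 2)]
print(missing)   # 0
```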


In [15]:
# http://stackoverflow.com/questions/11089655/sorting-dictionary-python-3
# another approach proposed in 2013 on Stack Overflow (but this may have been newer than OrderedDict at the time)

''' Help topic recommends this approach:
pip install sortedcontainers

Then:

from sortedcontainers import SortedDict
myDic = SortedDict({10: 'b', 3:'a', 5:'c'})
sorted_list = list(myDic.keys())

'''
print("conda install sortedcontainers is available in Python 2.7 and 3.6 as of April 2017")

conda install sortedcontainers is available in Python 2.7 and 3.6 as of April 2017


In [16]:
# some dictionaries to work with ...

super_heroes  # created earlier

{'Professor X': 'Charles Exavior',
 'Spider Man': 'Peter Parker',
 'Super Man': 'Clark Kent',
 'The Flash': 'Barry Allen',
 'Wolverine': 'John Logan'}

In [17]:
super_heroes['The Incredible Hulk'] = 'Bruce Banner'

In [18]:
super_heroes  # the notebook display appears to alpha sort on keys; print(super_heroes) earlier showed the dict itself is not sorted

{'Professor X': 'Charles Exavior',
 'Spider Man': 'Peter Parker',
 'Super Man': 'Clark Kent',
 'The Flash': 'Barry Allen',
 'The Incredible Hulk': 'Bruce Banner',
 'Wolverine': 'John Logan'}

In [48]:
# quick case study exploring another means of reverse sorting (from Stack Overflow):
reversed_tst = OrderedDict(list(super_heroes.items())[::-1])
reversed_tst   # note how in this instance, we don't get what we expected:
               # reversing a dict whose order is arbitrary just gives another arbitrary order

OrderedDict([('Professor X', 'Charles Exavior'),
             ('Wolverine', 'John Logan'),
             ('Super Man', 'Clark Kent'),
             ('The Flash', 'Barry Allen'),
             ('Spider Man', 'Peter Parker')])

In [51]:
# however ... if we combine methodologies:
reversed_tst = OrderedDict(sorted(super_heroes.items(), key=lambda v: v[1])[::-1])
reversed_tst   # now the values are in reverse order ...

OrderedDict([('Spider Man', 'Peter Parker'),
             ('Wolverine', 'John Logan'),
             ('Super Man', 'Clark Kent'),
             ('Professor X', 'Charles Exavior'),
             ('The Flash', 'Barry Allen')])

In [52]:
# the same combined approach, this time sorting on the (key, value) pairs themselves:
reversed_tst = OrderedDict(sorted(super_heroes.items(), key=lambda k: k)[::-1])
reversed_tst   # now the keys are in reverse order ...

OrderedDict([('Wolverine', 'John Logan'),
             ('The Flash', 'Barry Allen'),
             ('Super Man', 'Clark Kent'),
             ('Spider Man', 'Peter Parker'),
             ('Professor X', 'Charles Exavior')])

In [19]:
fruitDict = {3: 'banana', 4: 'pear', 1: 'apple', 2: 'orange'}
fruitDict   # the notebook display appears to sort keys on output, making it hard to spot the effects below

{1: 'apple', 2: 'orange', 3: 'banana', 4: 'pear'}

In [20]:
# help on library:
# http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html

# test sample code from Stack Overflow post:
from sortedcontainers import SortedDict
myDic = SortedDict({10: 'b', 3:'a', 5:'c'})
sorted_list = list(myDic.keys())

print(myDic)
print(sorted_list)

SortedDict(None, 1000, {3: 'a', 5: 'c', 10: 'b'})
[3, 5, 10]


In [21]:
fruitDict = SortedDict(fruitDict)
sorted_list = list(fruitDict.keys())

print(fruitDict)
print(sorted_list)

SortedDict(None, 1000, {1: 'apple', 2: 'orange', 3: 'banana', 4: 'pear'})
[1, 2, 3, 4]


<a id="when" name="when"></a>As per the examples above ...


**So when to do what?**
- **OrderedDict:**  will store whatever you put into it in whatever order you first record the data (maintaining that order)
- **SortedDict:**   by default will sort the data by key (over-riding original order) and maintain it for you in sorted order
- **Dict:**  Don't care about storing it in order?  Just sort and output the results without storing them in a new container

\*\*Final note:  only `SortedDict` allows indexing by numerical order on the data (by-passing keys) under both Python 2.7 and 3.6 (as shown in the next section)
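
For the plain **Dict** case, a minimal sketch of "sort on output only": iterating over `sorted(d)` walks the keys in sorted order without building a new container. The `prices` dictionary is made up for the example:

```python
# Hypothetical data; the dict itself is never reordered.
prices = {'pear': 1.25, 'apple': 0.5, 'orange': 0.75}

# sorted(prices) returns a sorted list of the keys.
report = ["%s: %.2f" % (name, prices[name]) for name in sorted(prices)]

for line in report:
    print(line)
# apple: 0.50
# orange: 0.75
# pear: 1.25
```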

<a id="nth_elem" name="nth_elem"></a>
### Find the nth element in a dictionary

In [40]:
# MIT Big Data included a demo of this type of index/access to a dictionary in a Python 2.7 notebook
# the code is organized in a try-except block here so it won't halt the notebook if converted to Python 3.6

def print_1st_keyValue(someDict):
    try:
        print(someDict.values()[0])            # only works in Python 2.7
    except Exception as ee:
        print(str(type(ee)) + ": " + str(ee))  # error from PY 3.6: 
                                               # <class 'TypeError'>: 'dict_values' object does not support indexing
    finally:
        try:
            print(someDict.keys()[0])          # only works in Python 2.7
        except Exception as ee:
            print(str(type(ee)) + ": " + str(ee))  # error from PY 3.6: 
                                                   # <class 'TypeError'>: 'dict_keys' object does not support indexing
                        
print_1st_keyValue(super_heroes)

Peter Parker
Spider Man


In [44]:
print_1st_keyValue(myOrdDict)  # run same test on ordered dictionaries
                               # failed in Python 3.6, worked in Python 2.7
                               # reminder:  syntax is orderedDict.values()[0], orderedDict.keys()[0]

2
orange


In [22]:
print_1st_keyValue(fruitDict)  # run same test on sorted dictionary - 
                               # this works in Python 3.6 and 2.7
                               # reminder:  syntax is sortedDict.values()[0], sortedDict.keys()[0]

apple
1
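
For code that has to run on both Python 2.7 and 3.6 without `SortedDict`, one option is to step through the dictionary's iterator instead of indexing. The helper below, `nth_item`, is a hypothetical name written for this sketch and is not part of any library. It returns the pair at position `n` in iteration order, so the result is only meaningful for containers whose order is defined (such as `OrderedDict`):

```python
from collections import OrderedDict
from itertools import islice

_MISSING = object()   # sentinel so that None can still be a legitimate value

def nth_item(some_dict, n):
    """Return the (key, value) pair at position n (n >= 0) in iteration order."""
    item = next(islice(iter(some_dict.items()), n, None), _MISSING)
    if item is _MISSING:
        raise IndexError("dictionary has fewer than %d items" % (n + 1))
    return item

fruit = OrderedDict([(3, 'banana'), (4, 'pear'), (1, 'apple')])
print(nth_item(fruit, 0))   # (3, 'banana')
print(nth_item(fruit, 2))   # (1, 'apple')
```

A shorter form, `list(some_dict.values())[n]`, also works on both versions but copies every value into a list first.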


<a id="comprehensions" name="comprehensions"></a>
### Dictionary Comprehensions

In [23]:
# list comprehension over the dictionary's keys (a true dictionary comprehension follows below)
[ k for k in fruitDict if k > 2 ]

[3, 4]

In [24]:
[ fruitDict[k] for k in fruitDict if k > 1 ] 

['orange', 'banana', 'pear']

In [25]:
newDict = { k*2:'fruit - '+fruitDict[k] for k in fruitDict if k > 1 and len(fruitDict[k]) >=6} 
print(newDict)
type(newDict)

{4: 'fruit - orange', 6: 'fruit - banana'}


dict
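
Two more dictionary comprehension patterns, sketched on a shortened copy of the `states` dictionary from the first example: swapping keys and values, and filtering while computing a new value. Swapping only makes sense when the values are unique and hashable, as the abbreviations are here:

```python
# Shortened copy of the states dictionary, so the snippet runs on its own.
states = {'Oregon': 'OR', 'Florida': 'FL', 'New York': 'NY'}

# Invert the mapping: abbreviation -> state name.
abbrev_to_state = {abbrev: state for state, abbrev in states.items()}

# Filter and transform: keep names longer than 6 characters, map them to their length.
long_names = {state: len(state) for state in states if len(state) > 6}

print(abbrev_to_state['FL'])                                # Florida
print(long_names == {'Florida': 7, 'New York': 8})          # True
```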

<a id="misskey" name="misskey"></a>
### KeyDict object

In [26]:
class KeyDict(dict):
    def __missing__(self, key):
        #self[key] = key  # uncomment if desired behavior is to add keys when they are not found (w/ key as value)
        #this version returns the key that was not found
        return key

kdTst = KeyDict(super_heroes)
print(kdTst['The Incredible Hulk'])
print(kdTst['Ant Man'])   # value not found so it returns itself as per __missing__ over-ride

Bruce Banner
Ant Man
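
For comparison, the standard library's `collections.defaultdict` covers the most common reason for over-riding `__missing__`: supplying a default for absent keys. One difference from `KeyDict` as written above is that a `defaultdict` stores the default in the dictionary on first access. A short sketch with made-up data:

```python
from collections import defaultdict

# Group words by first letter; list() supplies an empty list for each new key.
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)

size_before = len(groups)
_ = groups['z']               # merely looking up a missing key inserts it
size_after = len(groups)

print(dict(groups) == {'a': ['apple', 'avocado'], 'b': ['banana'], 'z': []})   # True
print(size_before, size_after)
```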


In [27]:
help(SortedDict)

Help on class SortedDict in module sortedcontainers.sorteddict:

class SortedDict(__builtin__.dict)
 |  SortedDict provides the same methods as a dict.  Additionally, SortedDict
 |  efficiently maintains its keys in sorted order. Consequently, the keys
 |  method will return the keys in sorted order, the popitem method will remove
 |  the item with the highest key, etc.
 |  
 |  Method resolution order:
 |      SortedDict
 |      __builtin__.dict
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __copy__ = copy(self)
 |  
 |  __delitem__(self, key)
 |      Remove ``d[key]`` from *d*.  Raises a KeyError if *key* is not in the
 |      dictionary.
 |  
 |  __init__(self, *args, **kwargs)
 |      SortedDict provides the same methods as a dict.  Additionally, SortedDict
 |      efficiently maintains its keys in sorted order. Consequently, the keys
 |      method will return the keys in sorted order, the popitem method will
 |      remove the item with the highest key, etc.
