_HDS5210 Programming for Data Science_

# Week 8 - Data Structures

At the beginning of the semester, we learned about basic variables of types like `int`, `str`, `float`, and `bool`.  These "single valued" data types are referred to as **"scalar"** data types.  In this context, **scalar** refers to a data type that holds a single value, as opposed to a **complex** data type that holds a more complicated data structure, like a `list` or `class`.  (This sense of "complex" is unrelated to Python's built-in `complex` number type.)

In this lecture, we'll talk about another complex data type called a **dictionary** (`dict`).  See the Python tutorial: https://docs.python.org/3/tutorial/datastructures.html#dictionaries

A dictionary is a collection of key/value pairs stored in a single variable, where each key maps to exactly one value.  In this example, the variable `v` contains the keys `'a'` and `'b'` with corresponding values `'Three'` and `3`.
```
v = { 
  'a': 'Three', 
  'b': 3 
}
```

In [19]:
# We create an empty dictionary using curly braces:
d = {}

In [20]:
# Or we can initialize it with key / value pairs:
v = { 'a': 'Three', 'b': 3 }

In [21]:
v

{'a': 'Three', 'b': 3}

In [22]:
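# The values within a dictionary can be of any type, including lists and other
# dictionaries.  Keys must be hashable (strings, numbers, and tuples are common).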
elements = { 
    'H' : {
        'name': 'Hydrogen',
        'number': 1,
        'isotopes': [1,2,3]
    },
    'He' : {
        'name': 'Helium',
        'number': 2,
        'isotopes': [2,3,4]
    }
}

In [23]:
elements

{'H': {'isotopes': [1, 2, 3], 'name': 'Hydrogen', 'number': 1},
 'He': {'isotopes': [2, 3, 4], 'name': 'Helium', 'number': 2}}

In [24]:
# Note that each key can appear only once in a dictionary.
# Repeating a key is not an error, but the last value silently replaces the earlier one.

duplicate = {
    'name': 'Paul Boal',
    'name': 'Eric Westhus'
}

In [25]:
duplicate

{'name': 'Eric Westhus'}
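
As a minimal sketch of the behavior shown above: a repeated key keeps only its last value, so when several values belong together, one option is to store a list under a single key. The key name `'names'` and the variable `instructors` are illustrative assumptions, not part of the original notebook.

```python
# A repeated key in a dictionary literal keeps only the last value.
duplicate = {
    'name': 'Paul Boal',
    'name': 'Eric Westhus'
}
print(len(duplicate))     # only one key survives
print(duplicate['name'])  # the later value replaced the earlier one

# One way to keep both values: store a list under a single key.
# ('names' and `instructors` are hypothetical names for this sketch.)
instructors = {'names': ['Paul Boal', 'Eric Westhus']}
for person in instructors['names']:
    print(person)
```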

## Accessing Keys/Values of the Dictionary

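**Note that in Python 3.7 and later, dictionaries preserve the order in which keys were inserted.  In older versions the ordering of keys was ARBITRARY (not necessarily the order entered, nor alphabetical).  Some of the output in these notes appears to come from an older version, so the key order you see may differ.**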

In [39]:
help(dict.__getitem__)

Help on method_descriptor:

__getitem__(...)
    x.__getitem__(y) <==> x[y]



In [40]:
for e in elements:
    print("{:s} is short for {:s}".format(e, elements[e]['name']))

H is short for Hydrogen
He is short for Helium


In [42]:
elements['H']['isotopes'].append(4)

In [43]:
elements['H']

{'isotopes': [1, 2, 3, 4], 'name': 'Hydrogen', 'number': 1}

In [45]:
elements['H']['name'] = 'Hydrogen?'

In [46]:
elements['H']


{'isotopes': [1, 2, 3, 4], 'name': 'Hydrogen?', 'number': 1}
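
The cells above read and change values with square brackets, which is what `dict.__getitem__` implements. A small sketch of what happens when a key is missing, and two common ways to handle it, might look like the following. The dictionary `element` and the key `'symbol'` are made up for this illustration.

```python
# A small stand-alone dictionary for this sketch (not the notebook's `elements`).
element = {'name': 'Hydrogen', 'number': 1, 'isotopes': [1, 2, 3]}

# Square brackets raise KeyError when the key is missing.
try:
    element['symbol']
except KeyError as err:
    print("Missing key:", err)

# The `in` operator tests for a key without raising an error.
has_symbol = 'symbol' in element

# dict.get returns a default value instead of raising
# (the default is None unless a second argument is given).
symbol = element.get('symbol', 'unknown')
print(has_symbol, symbol)
```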

## Other ways of creating dictionaries

In [50]:
keys = ['one', 'two', 'three']
vals = [1,     2,     3      ]
d = dict(zip(keys, vals))

In [53]:
d

{'one': 1, 'three': 3, 'two': 2}

If key names are really simple, we can use a different syntax:

In [55]:
d = dict(one=1, two=2, three=3)

In [56]:
d

{'one': 1, 'three': 3, 'two': 2}
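
Another option, not shown in the cells above, is a dictionary comprehension, which is handy when each key or value needs to be computed rather than copied. This is only a sketch; the variable name `squares` is an assumption for illustration.

```python
keys = ['one', 'two', 'three']
vals = [1, 2, 3]

# Build a dictionary by computing each value from the paired lists.
squares = {k: v * v for k, v in zip(keys, vals)}
print(squares)

# zip() stops at the shorter input, so unmatched keys are silently dropped.
short = dict(zip(keys, [1, 2]))
print(short)
```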

## Other ways of looping over dictionaries

In [52]:
for k, v in elements.items():
    print("{:s} = {:s}".format(k,str(v)))

H = {'number': 1, 'name': 'Hydrogen?', 'isotopes': [1, 2, 3, 4]}
He = {'number': 2, 'name': 'Helium', 'isotopes': [2, 3, 4]}
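
Besides `.items()`, dictionaries also provide `.keys()` and `.values()`, and `sorted()` can be used when a predictable order matters. The sketch below uses a trimmed-down copy of `elements` so that it runs on its own; the variable names `symbols` and `names` are assumptions.

```python
# A reduced copy of the elements dictionary, so this sketch is self-contained.
elements = {
    'H':  {'name': 'Hydrogen', 'number': 1},
    'He': {'name': 'Helium',   'number': 2},
}

# .keys() yields just the keys (the same as iterating over the dict itself).
symbols = list(elements.keys())

# .values() yields just the values, here the inner dictionaries.
names = [info['name'] for info in elements.values()]

# sorted() gives a predictable order regardless of how the dict was built.
for symbol in sorted(elements):
    print("{:s} has atomic number {:d}".format(symbol, elements[symbol]['number']))
```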


## Ways to use dictionaries

In [57]:
dosages = [
    dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict( drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

In [58]:
dosages

[{'amount': 100, 'drug': 'Aspirin', 'mass_unit': 'mg', 'time_unit': 'hr'},
 {'amount': 50, 'drug': 'Digoxin', 'mass_unit': 'mg', 'time_unit': 'hr'}]

In [59]:
for d in dosages:
    print("{:s} {:d} {:s}/{:s}".format(d["drug"],d["amount"],d["mass_unit"],d["time_unit"]))

Aspirin 100 mg/hr
Digoxin 50 mg/hr
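
A list of dictionaries works well for looping, but finding one record means scanning the whole list. One possible approach, sketched here, is to build a second dictionary keyed by drug name. The names `by_drug` and `label` are assumptions for this example.

```python
dosages = [
    dict(drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict(drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

# Index the records by drug name so one record can be looked up directly.
by_drug = {d["drug"]: d for d in dosages}

digoxin = by_drug["Digoxin"]
label = "{:s} {:d} {:s}/{:s}".format(
    digoxin["drug"], digoxin["amount"], digoxin["mass_unit"], digoxin["time_unit"])
print(label)
```

This assumes each drug appears only once in the list; with repeats, the last record for a drug would replace the earlier ones.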


## Reading CSV as a dictionary

The `csv` module has a `DictReader` class that reads a CSV file one row at a time, producing a dictionary for each data row.  The keys are the column names from the header row, and every value is read as a string.
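
To try `DictReader` without the course data file, a rough sketch can read from an in-memory string instead. The two columns and their values below are copied from the census output in this section; the use of `io.StringIO` as a stand-in for a file, and the names `sample`, `rows`, `first`, and `population`, are assumptions for illustration.

```python
import csv
import io

# Stand-in for a file on disk: a header row plus one data row.
sample = io.StringIO("NAME,CENSUS2010POP\nUNITED STATES,308745538\n")

reader = csv.DictReader(sample)
rows = list(reader)   # collect every row into a list of dictionaries
first = rows[0]

# Values arrive as strings, so convert before doing arithmetic.
population = int(first['CENSUS2010POP'])
print(first['NAME'], population)
```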

In [63]:
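import csv
with open('/midterm/census.csv', newline='') as f:
    # Avoid naming this variable `dict`, which would shadow the built-in type.
    reader = csv.DictReader(f)
    for d in reader:
        break  # stop after the first data row
        
d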

{'BIRTHS2010': '987836',
 'BIRTHS2011': '3973485',
 'BIRTHS2012': '3936976',
 'BIRTHS2013': '3940576',
 'BIRTHS2014': '3958107',
 'BIRTHS2015': '3985924',
 'CENSUS2010POP': '308745538',
 'DEATHS2010': '598691',
 'DEATHS2011': '2512442',
 'DEATHS2012': '2501531',
 'DEATHS2013': '2608019',
 'DEATHS2014': '2611362',
 'DEATHS2015': '2625033',
 'DIVISION': '0',
 'DOMESTICMIG2010': '0',
 'DOMESTICMIG2011': '0',
 'DOMESTICMIG2012': '0',
 'DOMESTICMIG2013': '0',
 'DOMESTICMIG2014': '0',
 'DOMESTICMIG2015': '0',
 'ESTIMATESBASE2010': '308758105',
 'INTERNATIONALMIG2010': '199613',
 'INTERNATIONALMIG2011': '910951',
 'INTERNATIONALMIG2012': '948321',
 'INTERNATIONALMIG2013': '992215',
 'INTERNATIONALMIG2014': '1133261',
 'INTERNATIONALMIG2015': '1150528',
 'NAME': 'UNITED STATES',
 'NATURALINC2010': '389145',
 'NATURALINC2011': '1461043',
 'NATURALINC2012': '1435445',
 'NATURALINC2013': '1332557',
 'NATURALINC2014': '1346745',
 'NATURALINC2015': '1360891',
 'NETMIG2010': '199613',
 'NETMIG2011':