_HDS5210 Programming for Data Science_

# Week 8 - Data Structures

https://drive.google.com/open?id=1CEFBEQaoFiA4fuVMfVeD2iq_rlfyu0U0oVV3SyotkBY

At the beginning of the semester, we learned about some basic variables that were of types like `int`, `str`, `float`, and `bool`.  These "single valued" data types are referred to as **"scalar"** data types.  In this context **scalar** refers to a data type that holds just a single value, as opposed to a **complex** data type that holds a more complicated data structure, like a `list` or `class`.

In this lecture, we'll talk about **sets** and **tuples**, but primarily focus on another complex data type called a **dictionary**.  https://docs.python.org/3/tutorial/datastructures.html#dictionaries

A dictionary is a collection of key/value pairs stored in a single variable.  In this example, we have a variable `v` that contains the keys `'a'` and `'b'` with corresponding values `'Three'` and `3`.
```
v = { 
  'a': 'Three', 
  'b': 3 
}
```

In [1]:
# We create an empty dictionary using curly braces:
d = {}

In [2]:
# Or we can initialize it with key / value pairs:
v = { 'a': 'Three', 'b': 3 }

In [3]:
v

{'a': 'Three', 'b': 3}

In [4]:
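# The values within a dictionary can be of any type, including lists and
# other dictionaries. Keys must be hashable (for example str, int, or tuple):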
elements = { 
    'H' : {
        'name': 'Hydrogen',
        'number': 1,
        'isotopes': [1,2,3]
    },
    'He' : {
        'name': 'Helium',
        'number': 2,
        'isotopes': [2,3,4]
    }
}

In [5]:
elements

{'H': {'isotopes': [1, 2, 3], 'name': 'Hydrogen', 'number': 1},
 'He': {'isotopes': [2, 3, 4], 'name': 'Helium', 'number': 2}}

In [13]:
elements[0] 

KeyError: 0
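
The `KeyError` above happens because dictionaries are looked up by key, not by position. Here is a minimal sketch of some safer lookups. The name `elements_demo` is a hypothetical stand-in so the cell runs on its own.

```python
# Hypothetical stand-in for the `elements` dictionary above.
elements_demo = {
    'H': {'name': 'Hydrogen', 'number': 1},
    'He': {'name': 'Helium', 'number': 2},
}

# Look up by key, not by position.
hydrogen = elements_demo['H']

# `in` tests whether a key exists without raising an error.
has_lithium = 'Li' in elements_demo

# .get() returns a default (None unless specified) instead of raising KeyError.
lithium = elements_demo.get('Li')
lithium_name = elements_demo.get('Li', {}).get('name', 'unknown')

print(hydrogen['name'], has_lithium, lithium, lithium_name)
```

Use `[]` when a missing key should be treated as an error, and `.get()` when a missing key is expected and a default makes sense.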

In [6]:
# Note that each key can appear only once in a dictionary.
# The following is legal, but it may not do what you expect: the last value wins.

duplicate = {
    'name': 'Paul Boal',
    'name': 'Eric Westhus'
}

In [7]:
duplicate

{'name': 'Eric Westhus'}

In [8]:
duplicate = {}
duplicate['name'] = 'Paul Boal'

In [9]:
duplicate

{'name': 'Paul Boal'}

In [10]:
duplicate['name'] = 'Eric Westhus'

In [11]:
duplicate

{'name': 'Eric Westhus'}

## Accessing Keys/Values of the Dictionary

**Note that in Python 3.7 and later, dictionaries keep keys in insertion order.  In earlier versions the ordering of keys is ARBITRARY, not necessarily how they were entered or alphabetical.**
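
Here is a quick way to check key ordering on your own Python version. This is only a sketch, and `order_demo` is an illustrative name. Since Python 3.7 the first `print` shows keys in insertion order, and `sorted()` gives alphabetical order on any version.

```python
# Keys are inserted in a deliberately non-alphabetical order.
order_demo = {}
order_demo['zebra'] = 1
order_demo['apple'] = 2
order_demo['mango'] = 3

# On Python 3.7+, list(d) returns the keys in insertion order.
print(list(order_demo))

# sorted() returns the keys alphabetically regardless of Python version.
print(sorted(order_demo))
```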

In [14]:
# __getitem__ == []
help(dict.__getitem__)

Help on method_descriptor:

__getitem__(...)
    x.__getitem__(y) <==> x[y]



In [20]:
# for key in dictionary
for e in elements:
    print(e)
    print(elements[e])
    print("{:s} is short for {:s}".format(e, elements[e]['name']))

He
{'name': 'Helium', 'number': 2, 'isotopes': [2, 3, 4]}
He is short for Helium
H
{'name': 'Hydrogen', 'number': 1, 'isotopes': [1, 2, 3, 4]}
H is short for Hydrogen


In [18]:
elements['H']['isotopes'].append(4)

In [19]:
elements['H']

{'isotopes': [1, 2, 3, 4], 'name': 'Hydrogen', 'number': 1}

In [21]:
elements['H']['name'] = 'Hydrogen?'

In [22]:
elements['H']


{'isotopes': [1, 2, 3, 4], 'name': 'Hydrogen?', 'number': 1}

## Other ways of creating dictionaries

In [23]:
keys = ['one', 'two', 'three']
vals = [1,     2,     3      ]


In [26]:
d = dict(zip(keys, vals))

In [27]:
d

{'one': 1, 'three': 3, 'two': 2}

If key names are really simple, we can use a different syntax:

In [None]:
d = dict(one=1, two=2, three=3)

In [None]:
d

In [32]:
keys = ['one', 'two', 'three']
vals = [1,2,3]
dict(zip(keys, vals))

{'one': 1, 'three': 3, 'two': 2}

In [37]:
d = {}

In [38]:
for index in range(len(keys)):
    d[vals[index]] = keys[index]

In [39]:
d

{1: 'one', 2: 'two', 3: 'three'}

In [40]:
d[1] 

'one'

In [41]:
d[1] = 'Seven'
d

{1: 'Seven', 2: 'two', 3: 'three'}

In [42]:
elements

{'H': {'isotopes': [1, 2, 3, 4], 'name': 'Hydrogen?', 'number': 1},
 'He': {'isotopes': [2, 3, 4], 'name': 'Helium', 'number': 2}}

In [44]:
elements['H'] = 1

In [45]:
elements

{'H': 1, 'He': {'isotopes': [2, 3, 4], 'name': 'Helium', 'number': 2}}

In [51]:
d = dict(zip(keys, vals))

In [52]:
d[1] = 'Seven'

In [53]:
d

{1: 'Seven', 'two': 2, 'three': 3, 'one': 1}

In [54]:
d = { 'one': 1, 'uno': 1 }
d

{'one': 1, 'uno': 1}
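
One more option, not shown above, is a dictionary comprehension. This sketch redefines the same `keys` and `vals` lists so it runs on its own. The names `word_to_num` and `num_to_word` are illustrative.

```python
keys = ['one', 'two', 'three']
vals = [1, 2, 3]

# Build word -> number with a comprehension over the paired lists.
word_to_num = {k: v for k, v in zip(keys, vals)}

# Swap keys and values to build number -> word.
num_to_word = {v: k for k, v in word_to_num.items()}

print(word_to_num)
print(num_to_word)
```

The second comprehension does the same job as the `for index in range(len(keys))` loop earlier in this section, without indexing.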

## Other ways of looping over dictionaries

In [59]:
for abbr, info in elements.items():
    print("{:s} = {:s}".format(str(abbr),str(info)))

He = {'name': 'Helium', 'number': 2, 'isotopes': [2, 3, 4]}
H = 1
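
Besides `.items()`, dictionaries also provide `.keys()` and `.values()`, and `sorted()` can control the order of a loop. Here is a short sketch using a hypothetical `atomic_numbers` dictionary:

```python
# Hypothetical example data: element symbol -> atomic number.
atomic_numbers = {'He': 2, 'H': 1, 'Li': 3}

# .values() iterates over the values only.
total = sum(atomic_numbers.values())
print(total)

# sorted() over a dictionary yields its keys in alphabetical order.
for symbol in sorted(atomic_numbers):
    print("{:s} = {:d}".format(symbol, atomic_numbers[symbol]))

# Sort the (key, value) pairs by value instead of by key.
by_number = sorted(atomic_numbers.items(), key=lambda pair: pair[1])
print(by_number)
```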


## Ways to use dictionaries

In [None]:
dosages = [
    dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict( drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

In [None]:
dosages

In [None]:
for d in dosages:
    print("{:s} {:d} {:s}/{:s}".format(d["drug"],d["amount"],d["mass_unit"],d["time_unit"]))
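
A list of dictionaries like `dosages` can also be re-indexed or filtered. The sketch below repeats the `dosages` list so it runs on its own. The names `by_drug` and `high_dose` and the threshold of 75 are illustrative choices.

```python
dosages = [
    dict(drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict(drug="Digoxin", amount=50, mass_unit="mg", time_unit="hr"),
]

# Index the records by drug name for direct lookup.
by_drug = {d["drug"]: d for d in dosages}

# Filter the records with a list comprehension (75 is an arbitrary cutoff).
high_dose = [d["drug"] for d in dosages if d["amount"] > 75]

print(by_drug["Digoxin"]["amount"])
print(high_dose)
```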

## Reading CSV as a dictionary

The `csv` module has a `DictReader` class that reads a CSV file one row at a time, yielding one dictionary per data row.  Each dictionary's keys are the column names from the header row.  To get a list of dictionaries, wrap the reader in `list()`.

In [None]:
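import csv
# Avoid naming the reader `dict`: that would shadow the built-in dict type.
with open('/midterm/census.csv', newline='') as f:
    reader = csv.DictReader(f)
    # Stop after the first row so that `d` holds the first data row.
    for d in reader:
        break

d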

In [None]:
import csv
with open('/midterm/census.csv', newline='') as f:
    reader = csv.DictReader(f)
    row1 = next(reader)
    print(row1)

In [None]:
import csv
with open('/midterm/census.csv', newline='') as f:
    reader = csv.DictReader(f)
    row1 = next(reader)
    print(row1['POPESTIMATE2014'])

In [None]:
import csv
with open('/midterm/census.csv', newline='') as f:
    reader = csv.DictReader(f)
    row1 = next(reader)
    print(row1.keys())
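
If `/midterm/census.csv` is not available, the following sketch runs `csv.DictReader` on a small in-memory CSV instead. The rows and the `NAME` column are made up for illustration and are not the real census data.

```python
import csv
import io

# Made-up sample data; NOT the contents of census.csv.
sample = io.StringIO(
    "NAME,POPESTIMATE2014\n"
    "Exampleville,1000\n"
    "Sampleton,2500\n"
)

reader = csv.DictReader(sample)

# DictReader yields one dictionary per data row; list() collects them all.
rows = list(reader)

# The column names come from the header row.
print(reader.fieldnames)

# Every value is read as a string, so convert before doing arithmetic.
total = sum(int(row['POPESTIMATE2014']) for row in rows)
print(total)
```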

## Inverting Dictionaries

In [None]:
patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891":19
} 

In [None]:
age_patients = {}
for patient, age in patient_ages.items():
    if age in age_patients:
        age_patients[age].append(patient)
    else:
        age_patients[age] = [patient]

age_patients
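
The same inversion can be written with `collections.defaultdict`, which removes the need for the `if`/`else` check. This is a sketch using a shortened, hypothetical copy of the data (`sample_ages`), not a replacement for the loop above.

```python
from collections import defaultdict

# Shortened, hypothetical copy of the patient data.
sample_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
}

# defaultdict(list) creates an empty list the first time a key is used,
# so no explicit "if age in ..." check is needed.
ages_to_patients = defaultdict(list)
for patient, age in sample_ages.items():
    ages_to_patients[age].append(patient)

print(dict(ages_to_patients))
```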

# Sets

A set is an unordered collection of unique values - there won't be any duplicates.  Unlike a list, a set has no positions, so you can't index into it.

You can also think about it as just a dictionary with only keys - remember that dictionaries can only have one entry for each key.

In [None]:
sex = {'M', 'F', 'U', 'O'}
sex

In [None]:
sex = {'M', 'F', 'U', 'M', 'F', 'O'}
sex

In [None]:
sex.add('T')
sex

You can also do all kinds of other set operations, such as union, intersection, and difference.  See Chapter 11, p. 202.
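
Here is a brief sketch of the most common set operations. The patient IDs and the variable names are hypothetical.

```python
# Hypothetical patient ID sets.
inpatients = {'E143291', 'E872839', 'E878198'}
outpatients = {'E878198', 'E871111'}

both = inpatients & outpatients            # intersection: in both sets
either = inpatients | outpatients          # union: in at least one set
only_inpatient = inpatients - outpatients  # difference: in the first set only
exactly_one = inpatients ^ outpatients     # symmetric difference: in one set but not both

# Converting a list to a set removes duplicates.
unique_ages = set([19, 32, 19, 21])

print(both)
print(sorted(either))
print(sorted(only_inpatient))
print(sorted(exactly_one))
print(sorted(unique_ages))
```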

# Tuples

There's another special kind of ordered sequence called a tuple.  What's special about tuples is that they can't be altered once they're created, even though mutable objects inside them (such as lists) can be.  Weird.

One thing you can do with tuples is assign to multiple variables at once.

In [None]:
(a, b) = (1, 2)
a

In [None]:
b
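
To make the "can't be altered, but the objects inside can be" point concrete, here is a small sketch. The names `point`, `grid`, `x`, and `y` are illustrative.

```python
point = (1, [2, 3])

# Reassigning an element of a tuple raises TypeError.
try:
    point[0] = 99
    changed = True
except TypeError:
    changed = False

# A mutable object stored inside the tuple can still be modified in place.
point[1].append(4)

# Tuples of hashable items can be dictionary keys; lists cannot.
grid = {(0, 0): 'origin', (1, 2): 'somewhere'}

# Tuple assignment also gives a compact way to swap two variables.
x, y = 1, 2
x, y = y, x

print(point, changed, grid[(1, 2)], x, y)
```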