_HDS5210 Programming for Data Science_

# Week 8 - Data Structures

https://drive.google.com/drive/folders/12qaE8sNPWRMqcfeI9eSBxzUscz8KZDBf?usp=sharing

At the beginning of the semester, we learned about basic variable types such as `int`, `str`, `float`, and `bool`.  These single-valued data types are referred to as **scalar** data types.  In this context, **scalar** refers to a data type that holds just a single value, as opposed to a **complex** data type that holds a more complicated data structure, like a `list` or a `class`.

In this lecture, we'll talk about **sets** and **tuples**, but primarily focus on another complex data type called a **dictionary**.  https://docs.python.org/3/tutorial/datastructures.html#dictionaries

A dictionary is a collection of key/value pairs stored in a single variable.  In this example, we have a variable `v` that contains the keys `'a'` and `'b'` with corresponding values `'Three'` and `3`.
```
v = { 
  'a': 'Three', 
  'b': 3 
}
```

In [None]:
# We create an empty dictionary using curly braces:
d = {}

In [None]:
# Or we can initialize it with key / value pairs:
v = { 'a': 'Three', 'b': 3 }

In [None]:
v

In [None]:
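# The values within a dictionary can be of any type, including lists and other dictionaries.
# (Keys are more restricted: they must be hashable, such as strings or numbers.)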
elements = { 
    'H' : {
        'name': 'Hydrogen',
        'number': 1,
        'isotopes': [1,2,3]
    },
    'He' : {
        'name': 'Helium',
        'number': 2,
        'isotopes': [2,3,4]
    }
}

In [None]:
elements

In [None]:
# Dictionaries are looked up by key, not by position, so this raises a KeyError:
elements[0]
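
Dictionaries are looked up by key rather than by position, so `elements[0]` raises a `KeyError` unless `0` happens to be a key. A minimal sketch of some ways to check for a key before using it, on a cut-down copy of `elements` (the variable names `has_h`, `has_zero`, `missing`, and `lookup_failed` are just for illustration):

```python
# Cut-down copy of the `elements` dictionary from the cells above.
elements = {
    'H': {'name': 'Hydrogen', 'number': 1},
    'He': {'name': 'Helium', 'number': 2},
}

# Membership tests look at the keys, not the values.
has_h = 'H' in elements       # 'H' is a key
has_zero = 0 in elements      # 0 was never used as a key

# .get() returns a default value instead of raising KeyError.
missing = elements.get(0, 'no such key')

# Catching the error explicitly is another common option.
try:
    elements[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(has_h, has_zero, missing, lookup_failed)
```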

In [None]:
# Note that each key within a dictionary can only be used once.
# The following isn't illegal, but it may not do what you expect: the last value given for a repeated key wins.

duplicate = {
    'name': 'Paul Boal',
    'name': 'Eric Westhus'
}

In [None]:
duplicate

In [None]:
duplicate = {}
duplicate['name'] = 'Paul Boal'

In [None]:
duplicate

In [None]:
duplicate['name'] = 'Eric Westhus'

In [None]:
duplicate

## Accessing Keys/Values of the Dictionary

**Note that since Python 3.7, dictionaries keep their keys in insertion order. In older versions the ordering was ARBITRARY. Keys are never sorted alphabetically unless you sort them yourself.**
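
To see how key ordering behaves in practice, the sketch below builds a small dictionary and lists its keys. On Python 3.7 and later the keys come back in the order they were added; use `sorted()` when you need alphabetical order. The dictionary `codes` and its contents are made up for this example.

```python
# Keys are added in a deliberately non-alphabetical order.
codes = {}
codes['zebra'] = 1
codes['apple'] = 2
codes['mango'] = 3

insertion_order = list(codes)    # order in which the keys were added (Python 3.7+)
alphabetical = sorted(codes)     # explicit alphabetical order

print(insertion_order)
print(alphabetical)
```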

In [None]:
# Square-bracket lookup, d[key], is implemented by dict.__getitem__
help(dict.__getitem__)

In [None]:
# for key in dictionary
for e in elements:
    print(e)
    print(elements[e])
    print("{:s} is short for {:s}".format(e, elements[e]['name']))

In [None]:
elements['H']['isotopes'].append(4)

In [None]:
elements['H']

In [None]:
elements['H']['name'] = 'Hydrogen?'

In [None]:
elements['H']


## Other ways of creating dictionaries

In [None]:
keys = ['one', 'two', 'three']
vals = [1,     2,     3      ]


In [None]:
d = dict(zip(keys, vals))

In [None]:
d

If the key names are valid Python identifiers (no spaces, not starting with a digit), we can pass them as keyword arguments instead:

In [None]:
d = dict(one=1, two=2, three=3)

In [None]:
d

In [None]:
keys = ['one', 'two', 'three']
vals = [1,2,3]
dict(zip(keys, vals))

In [None]:
d = {}

In [None]:
for index in range(len(keys)):
    d[vals[index]] = keys[index]

In [None]:
d

In [None]:
d[1] 

In [None]:
d[1] = 'Seven'
d

In [None]:
elements

In [None]:
elements['H'] = 1

In [None]:
elements

In [None]:
d = dict(zip(keys, vals))

In [None]:
d[1] = 'Seven'

In [None]:
d

In [None]:
d = { 'one': 1, 'uno': 1 }
d

## Other ways of looping over dictionaries

In [None]:
for abbr, info in elements.items():
    print("{:s} = {:s}".format(str(abbr),str(info)))
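
Besides `.items()`, dictionaries also offer `.keys()` and `.values()` when only one side of each pair is needed. A short sketch on a cut-down copy of `elements` (the names `abbreviations`, `names`, and `pairs` are illustrative):

```python
# Cut-down copy of the `elements` dictionary used earlier.
elements = {
    'H': {'name': 'Hydrogen', 'number': 1},
    'He': {'name': 'Helium', 'number': 2},
}

abbreviations = list(elements.keys())                  # just the keys
names = [info['name'] for info in elements.values()]   # just the values
pairs = [(abbr, info['number']) for abbr, info in elements.items()]

for abbr, number in pairs:
    print("{:s} has atomic number {:d}".format(abbr, number))
```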

## Ways to use dictionaries

In [None]:
Aspirin100 = dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr")
Aspirin100

In [None]:
dosages = [
    dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict( drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

In [None]:
dosages

In [None]:
for d in dosages:
    print("{:s} {:d} {:s}/{:s}".format(d["drug"],d["amount"],d["mass_unit"],d["time_unit"]))
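
A list of dictionaries has to be searched row by row. If you often need to find one record by name, one option is to build a second dictionary keyed by that name. This is only a sketch; `dosage_by_drug` is a hypothetical name, and it assumes each drug appears only once in the list.

```python
dosages = [
    dict(drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict(drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr"),
]

# Hypothetical lookup table: drug name -> dosage dictionary.
dosage_by_drug = {}
for d in dosages:
    dosage_by_drug[d["drug"]] = d

# Now a single record can be fetched directly by its name.
digoxin = dosage_by_drug["Digoxin"]
label = "{:s} {:d} {:s}/{:s}".format(
    digoxin["drug"], digoxin["amount"], digoxin["mass_unit"], digoxin["time_unit"]
)
print(label)
```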

## Reading CSV as a dictionary

The csv module has a `DictReader` class that reads a CSV file one row at a time, producing one dictionary per input data row.  Each dictionary uses the column names from the header row as its keys.

In [None]:
import csv
with open('/midterm/census.csv') as f:
    reader = csv.DictReader(f)
    for item in reader:
        print(item['NAME'])
        


In [None]:
import csv
with open('/midterm/census.csv') as f:
    L = csv.DictReader(f)
    row1 = next(L)
#    print(row1)
    row2 = next(L)
    print(row2)

In [None]:
import csv

def get_population(filename, year):
    with open(filename) as f:
        # Don't name this variable `dict`: that would shadow the built-in dict type.
        reader = csv.DictReader(f)
        row1 = next(reader)
        #print(row1['NAME'], row1['POPESTIMATE'+str(year)])
        return row1['POPESTIMATE'+str(year)]
    
print(get_population('/midterm/census.csv', 2014))

In [None]:
import csv
with open('/midterm/census.csv') as f:
    L = csv.DictReader(f)
    row1 = next(L)
    print(list(row1.keys()))
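
The cells above depend on a file that may not exist on your machine. As a self-contained sketch, the same `DictReader` behavior can be tried on an in-memory string. The rows below are made-up sample data; only the column names `NAME` and `POPESTIMATE2014` come from the cells above.

```python
import csv
import io

# Made-up sample data standing in for the real census file.
sample_csv = io.StringIO(
    "NAME,POPESTIMATE2014\n"
    "Example State A,1000\n"
    "Example State B,2500\n"
)

reader = csv.DictReader(sample_csv)
rows = list(reader)     # read every remaining row into a list of dictionaries

first_name = rows[0]['NAME']
# DictReader returns every value as a string, so convert before doing arithmetic.
total = sum(int(row['POPESTIMATE2014']) for row in rows)

print(reader.fieldnames)
print(first_name, total)
```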

## Inverting Dictionaries

In [None]:
patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891": 19
} 

In [None]:
age_patients = {}
for pat_id, age in patient_ages.items():
    print(pat_id, age)
    
    if age in age_patients:
        print(age, "is already here")
        age_patients[age].append(pat_id)
    else:
        print(age, "is not here")
        age_patients[age] = [pat_id]

age_patients

In [None]:
age_patients = {}
for pat_id, age in patient_ages.items():
    print(pat_id, age)

    if age not in age_patients:
        age_patients[age] = []
    
    age_patients[age].append(pat_id)
    print("Age {:d} has {:d} patients".format(age, len(age_patients[age])))

age_patients

for age, pat_list in age_patients.items():
    print("-- Age {:d} has {:d} patients".format(age, len(pat_list)))
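
The "create the list if the key is missing" step can also be handed off to `collections.defaultdict` from the standard library. This is an alternative technique, not the one used in the cells above, shown here on a shortened copy of `patient_ages`:

```python
from collections import defaultdict

# Shortened copy of the patient_ages dictionary from above.
patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
}

# defaultdict(list) creates an empty list the first time a new age is used as a key.
age_patients = defaultdict(list)
for pat_id, age in patient_ages.items():
    age_patients[age].append(pat_id)

age_patients = dict(age_patients)   # convert back to a plain dictionary
print(age_patients)
```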
    

# Sets

Sets are unordered collections of unique values - there won't be any duplicates.  Unlike a list, a set has no positions, so you can't look up an item by index.

You can also think about it as just a dictionary with only keys - remember that dictionaries can only have one entry for each key.

In [None]:
sex = {'M', 'F', 'U', 'O'}
sex

In [None]:
sex = {'M', 'F', 'U', 'M', 'F', 'O'}
sex

In [None]:
sex.add('T')
sex

You can also do all kinds of other set operations: See Chapter 11, p202
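
A brief sketch of the most common set operations, using two made-up sets of codes (`recorded` and `allowed` are hypothetical names):

```python
recorded = {'M', 'F', 'U'}
allowed = {'M', 'F', 'O'}

either = recorded | allowed          # union: values in at least one set
both = recorded & allowed            # intersection: values in both sets
only_recorded = recorded - allowed   # difference: in recorded but not in allowed

# Building a set from a list is a quick way to drop duplicates.
unique_codes = set(['M', 'F', 'M', 'F', 'U'])

# Sets are unordered, so sort them when you want predictable output.
print(sorted(either), sorted(both), sorted(only_recorded), sorted(unique_codes))
```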

# Tuples

There's another special kind of ordered sequence called a tuple.  What's special about tuples is that they can't be altered once they're created (you can't add, remove, or replace items), even though mutable objects inside them, like lists, can still be changed.  Weird.

One thing you can do with tuples is assign to multiple variables at once.

In [None]:
(a, b) = (1, 2)
a

In [None]:
b
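
The two tuple behaviors described in this section can be checked with a small sketch. The names `point` and `reassigned` are made up for illustration.

```python
point = (1, [2, 3])

# Replacing an item of a tuple is not allowed and raises TypeError.
try:
    point[0] = 10
    reassigned = True
except TypeError:
    reassigned = False

# The list stored inside the tuple is still mutable.
point[1].append(4)

# Multiple assignment also gives a compact way to swap two variables.
a, b = 1, 2
a, b = b, a

print(reassigned, point, a, b)
```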