_HDS5210: Programming for Health Data Scientists_

# Week 8 - Dictionaries and Other Data Types



## Part 1 - Count up dictionary values

Take the dictionary below and build a new dictionary that maps each distinct value to the number of times it appears as a value in the original.

I've provided some skeleton code.  Replace the comments with code that does the right thing.


In [1]:
patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891":19
} 

In [2]:
results = {}
for pid, age in patient_ages.items():
    # Set results[age] = 0 if it doesn't exist yet
    results.setdefault(age,0)
    # Increment results[age] by 1
    results[age] += 1
    
print(results)

{19: 3, 32: 1, 3: 1, 21: 2}


The expected output here would be (the key order may differ, since dictionaries keep insertion order):
```
{3: 1, 19: 3, 21: 2, 32: 1}
```
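As a cross-check on the loop above, here is a minimal sketch that uses `collections.Counter` from the standard library instead of `setdefault`, then sorts by age so the printed order matches the expected output. The names `age_counts` and `sorted_counts` are made up for this example.

```python
from collections import Counter

patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891": 19,
}

# Counter tallies how many times each age appears among the values
age_counts = Counter(patient_ages.values())

# Sorting the (age, count) pairs by age only changes the display order
sorted_counts = dict(sorted(age_counts.items()))
print(sorted_counts)
```

Two dictionaries with the same keys and values compare equal regardless of order, so sorting matters only for how the result is printed.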

## Part 2 - Count by Gender

Now consider the more complex dictionary below, which has two levels: the information about each patient is itself a dictionary.  Each of these *sub dictionaries* contains two named values: *age* and *gender*.  Compute the average age for each gender.  Some skeleton code is provided.



In [3]:
patients = {
    "E143291": { "age": 19, "gender": "M" },
    "E872839": { "age": 32, "gender": "F" },
    "E878198": { "age": 19, "gender": "F" },
    "E871111": { "age": 21, "gender": "F" },
    "E143299": { "age": 3,  "gender": "M" },
    "E123332": { "age": 21, "gender": "M" },
    "E989891": { "age": 19, "gender": "F" }
}

gender_stats = {}
for pid, info in patients.items():
    age = info['age']
    gender = info['gender']
    gender_stats.setdefault(gender, { "count": 0, "total": 0 })
    gender_stats[gender]['count'] += 1
    gender_stats[gender]['total'] += age
    
print(gender_stats)
    

{'F': {'count': 4, 'total': 91}, 'M': {'count': 3, 'total': 43}}
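The cell above collects a count and a total per gender but stops short of the average the task asks for. One way to finish is sketched here, with `gender_stats` copied from the output above; the name `avg_age_by_gender` is made up for this example.

```python
# Copied from the output of the previous cell
gender_stats = {'F': {'count': 4, 'total': 91}, 'M': {'count': 3, 'total': 43}}

# Average age = total of ages / number of patients, per gender
avg_age_by_gender = {
    gender: stats['total'] / stats['count']
    for gender, stats in gender_stats.items()
}

for gender, avg in avg_age_by_gender.items():
    print(gender, round(avg, 2))
```

Keeping the count and total separate until the end avoids recomputing a running average on every pass through the loop.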


## Part 3 - Count how many unique values a dictionary contains

Try to solve this one in one line if you can...

Take the first dictionary from above and count the number of unique ages that appear in the dictionary values.


In [4]:
patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891":19
} 

# Get unique ages from the values above; set() accepts the values view directly
len(set(patient_ages.values()))


4

The expected output here would be:
```
4
```

## Part 3 - Convert a list to a dictionary

Take the list of lists below and convert it into a dictionary keyed on the value in column 0.  Each dictionary value is the list of all the column 1 values that appear with that key in the input list.  See the example below.

In [5]:
names = [['Boal', 'Paul'],
         ['Duck', 'Donald'],
         ['Duck', 'Daisy'],
         ['Boal', 'Ada'],
         ['Boal', 'Teddy'],
         ['Westhus', 'Eric']]

families = {}

for name in names:
    last = name[0]
    first = name[1]
    families.setdefault(last, [])
    # Append the first name onto the list for that last name
    families[last].append(first)
    
families

{'Boal': ['Paul', 'Ada', 'Teddy'],
 'Duck': ['Donald', 'Daisy'],
 'Westhus': ['Eric']}

Expected output is:
```
{
 'Boal':    ['Paul', 'Ada', 'Teddy'],
 'Duck':    ['Donald', 'Daisy'],
 'Westhus': ['Eric']
}
 ```
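The same grouping can also be written with `collections.defaultdict`, which creates the empty list the first time a key is seen. This is a sketch of an alternative to `setdefault`, not a replacement for the solution above.

```python
from collections import defaultdict

names = [['Boal', 'Paul'],
         ['Duck', 'Donald'],
         ['Duck', 'Daisy'],
         ['Boal', 'Ada'],
         ['Boal', 'Teddy'],
         ['Westhus', 'Eric']]

# defaultdict(list) supplies an empty list for any key not seen before
families = defaultdict(list)
for last, first in names:
    families[last].append(first)

# Convert back to a plain dict so it prints like the expected output
families = dict(families)
print(families)
```

Unpacking each pair as `last, first` in the `for` statement replaces the two index lookups used in the cell above.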

## Part 4 - Join using a dictionary

Not joining again!!!  Don't worry, it's easier to do with a dictionary.

We have a list of patients, each with a diagnosis and a length of stay.  We also have a dictionary that maps each diagnosis to its average length of stay.  Produce an output list that gives each patient's name and an indicator of whether the stay was 'too long', 'too short', or 'just right' compared with the average for that diagnosis.

In [6]:
avg_los = {
    "Hemolytic jaundice and perinatal jaundice" : 2,
    "Medical examination/evaluation" : 3.2,
    "Liveborn" : 3.2,
    "Trauma to perineum and vulva" : 2.1,
    "Normal pregnancy and/or delivery" : 2,
    "Umbilical cord complication" : 2.1,
    "Forceps delivery" : 2.2,
    "Administrative/social admission" : 4.2,
    "Prolonged pregnancy" : 2.4,
    "Other complications of pregnancy" : 2.5
}

patients = [
    ['Boal', 'Medical examination/evaluation', 1.1],
    ['Boal', 'Other complications of pregnancy', 3.3],
    ['Jones', 'Liveborn', 3.2],
    ['Ashbury', 'Forceps delivery', 2.0]
]

In [7]:
los = []

for pat in patients:
    last = pat[0]
    code = pat[1]
    days = pat[2]
    target = avg_los[code]

    if days < target:
        status = 'too short'
    elif days > target:
        status = 'too long'
    else:
        status = 'just right'
    
    los.append([last,status])
    
los

[['Boal', 'too short'],
 ['Boal', 'too long'],
 ['Jones', 'just right'],
 ['Ashbury', 'too short']]

The output we expect to get is:
```
[['Boal', 'too short'],
 ['Boal', 'too long'],
 ['Jones', 'just right'],
 ['Ashbury', 'too short']]
```
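The lookup `avg_los[code]` in the loop above raises a `KeyError` when a diagnosis is missing from the dictionary. Here is a sketch of one way to guard against that with `dict.get`. The helper `classify_stay`, the `'Doe'` row with an unlisted diagnosis, and the `'unknown'` status are all made up for illustration and are not part of the assignment data.

```python
avg_los = {
    "Medical examination/evaluation": 3.2,
    "Liveborn": 3.2,
}

patients = [
    ['Boal', 'Medical examination/evaluation', 1.1],
    ['Jones', 'Liveborn', 3.2],
    ['Doe', 'Unlisted diagnosis', 2.0],  # hypothetical row with no matching key
]

def classify_stay(days, target):
    """Compare a stay against the average; target is None when no average is known."""
    if target is None:
        return 'unknown'
    if days < target:
        return 'too short'
    if days > target:
        return 'too long'
    return 'just right'

# avg_los.get(code) returns None instead of raising KeyError for a missing diagnosis
los = [[last, classify_stay(days, avg_los.get(code))]
       for last, code, days in patients]
print(los)
```

Whether an unmatched diagnosis should be reported, skipped, or treated as an error depends on the data, so treat the `'unknown'` status as just one option.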

## Part 6 - CSV dictionary reader

Read in `/data/aco_year1.csv` using `csv.DictReader`.  Then count how many plans are available in each state.  Note that the column, States Where Beneficiaries Reside, can contain a comma-separated list of values.


In [8]:
import csv

acos = {}

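with open('/data/aco_year1.csv', newline='') as file:
    # Use a name other than csv so the csv module is not shadowed
    reader = csv.DictReader(file)
    for r in reader:
        # The column header used here ends with a trailing space
        states = r['States Where Beneficiaries Reside '].split(',')
        for s in states:
            state = s.strip()
            acos.setdefault(state,0)
            acos[state] += 1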

acos

{'Alabama': 3,
 'Arizona': 7,
 'Arkansas': 2,
 'California': 17,
 'Colorado': 1,
 'Connecticut': 11,
 'Delaware': 1,
 'District of Columbia': 3,
 'Florida': 31,
 'Georgia': 13,
 'Idaho': 1,
 'Illinois': 12,
 'Indiana': 10,
 'Iowa': 6,
 'Kansas': 2,
 'Kentucky': 7,
 'Louisiana': 2,
 'Maine': 3,
 'Maryland': 10,
 'Massachusetts': 14,
 'Michigan': 8,
 'Minnesota': 2,
 'Mississippi': 3,
 'Missouri': 5,
 'Montana': 1,
 'Nebraska': 2,
 'Nevada': 3,
 'New Hampshire': 6,
 'New Jersey': 11,
 'New Mexico': 2,
 'New York': 18,
 'North Carolina': 7,
 'North Dakota': 1,
 'Ohio': 8,
 'Oklahoma': 1,
 'Oregon': 2,
 'Pennsylvania': 7,
 'Puerto Rico': 2,
 'Rhode Island': 2,
 'South Carolina': 5,
 'Tennessee': 8,
 'Texas': 15,
 'Utah': 1,
 'Vermont': 3,
 'Virginia': 7,
 'Washington': 2,
 'West Virginia': 1,
 'Wisconsin': 7,
 'Wyoming': 1}
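
To try the same aggregation without access to `/data/aco_year1.csv`, here is a small sketch that reads from an in-memory string instead. The three sample rows and the `Plan` column are invented for illustration and do not come from the real file; only the state column name (with its trailing space) is taken from the cell above.

```python
import csv
import io

# Column name as used in the cell above, including the trailing space
STATE_COLUMN = 'States Where Beneficiaries Reside '

# Made-up sample data; quoted fields hold comma-separated state lists
sample_csv = (
    f'Plan,{STATE_COLUMN}\n'
    'Plan A,"Missouri, Illinois"\n'
    'Plan B,Missouri\n'
    'Plan C,"Illinois, Iowa, Missouri"\n'
)

plans_per_state = {}
reader = csv.DictReader(io.StringIO(sample_csv))
for row in reader:
    # One row can list several states, so split and count each one
    for raw_state in row[STATE_COLUMN].split(','):
        state = raw_state.strip()
        plans_per_state[state] = plans_per_state.get(state, 0) + 1

print(plans_per_state)
```

`csv.DictReader` handles the quoted commas inside a field, so the only manual splitting needed is on the state list itself.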