# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below.  For example, you can read CSV files using Pandas or the `csv` module.  Your score won't depend on which modules you choose unless explicitly noted below, but your programming style will still matter.

### 30.1 List of Allergies

In the /data directory on the Jupyter server, there is a file called `allergies.json` that contains a list of patient allergies.  It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`
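To see that shape without opening the full file, here is a minimal sketch that parses a tiny JSON string trimmed down to the same nesting: a top-level dictionary whose `entry` key holds a list. The sample content is made up for illustration. Only the `entry`, `resource`, `patient`, and `display` key names come from the real file.

```python
import json

# A tiny, made-up document with the same top-level shape as allergies.json
sample_text = '''
{
  "entry": [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}}
  ]
}
'''

# json.loads parses a string; json.load does the same for an open file object
bundle = json.loads(sample_text)

print(type(bundle).__name__)           # dict: the outer curly braces
print(type(bundle["entry"]).__name__)  # list: the square bracket after "entry"
print(len(bundle["entry"]))            # number of items in the list
```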

Write a function named `allergy_count(json_file)` that takes the name of the JSON file as its one parameter and returns the number of entries in that file as an integer.  Your function should open the file, read the JSON into a Python object, and return how many items are in the `entry` list.

In [1]:
import json
from pathlib import Path
HOME = str(Path.home())

ALLERGIES_FILE="/data/allergies.json"

In [2]:
### BEGIN SOLUTION
# define the function
def allergy_count(json_file):
    '''(file path) -> int
    This function opens the JSON file at json_file, finds the key "entry", and returns the
    number of items in its value (assumed to be a list) as an integer.
    
    >>> allergy_count(ALLERGIES_FILE)
    4
    '''
    # Read in the JSON file and assign it to an object, allergies
    with open(json_file) as f:
        allergies = json.load(f)
    
    # Retrieve the contents of "entry"; default to an empty list if the key is missing
    entry = allergies.get("entry", [])
    
    # len() already returns an int, so no conversion is needed
    count = len(entry)
    
    # Return the count
    return count
### END SOLUTION

In [3]:
# Test the docstring case
# import the doctest module
import doctest

# run the docstring examples to test
doctest.run_docstring_examples(allergy_count, globals(), verbose = True)

Finding tests in NoName
Trying:
    allergy_count(ALLERGIES_FILE)
Expecting:
    4
ok


In [4]:
allergy_count(ALLERGIES_FILE)

4

In [5]:
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4

### 30.2 Number of Patients

If you dig a little deeper into this list of allergies, you'll see that each entry has a patient associated with it.  Create a function called `patient_count(json_file)` that counts how many unique patients are in this JSON structure.  
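One way to count unique values is to put them in a `set`, which keeps only one copy of each value. A minimal sketch on a hypothetical list of entries that mirrors the `entry` -> `resource` -> `patient` -> `display` nesting (the names are placeholders, not data from the file):

```python
# Hypothetical entries mirroring the entry -> resource -> patient -> display nesting
entries = [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}},
    {"resource": {"patient": {"display": "Patient A"}}},
]

# A set comprehension keeps each display name only once
unique_names = {e["resource"]["patient"]["display"] for e in entries}

print(len(unique_names))  # 2
```

Checking `if name not in some_list` before appending also works, but a set does the duplicate check for you and looks values up faster as the data grows.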

In [6]:
### BEGIN SOLUTION
# Define the function
def patient_count(json_file):
    '''(file path) -> int
    This function takes a file path as input (assumed to point to a JSON file)
    and returns an integer count of the unique patients in the JSON file. The structure of the file 
    is assumed to match that of the allergies.json file provided.
    
    The key "entry" is a list where each item is a dictionary containing the key "resource".
    That dictionary contains the key "patient", itself a dictionary whose key "display"
    holds a string with the patient's name. A unique name is assumed to indicate a unique patient.
    
    >>> patient_count(ALLERGIES_FILE)
    2
    '''
    
    # Initialize a set to store the unique patient names (a set ignores duplicates)
    names = set()
    
    # Open the file path and assign the file contents to an object
    with open(json_file) as f:
        allergies = json.load(f)
        
    # For each item in the "entry" list (empty list if the key is missing)
    for entry in allergies.get("entry", []):
        # Follow entry -> resource -> patient -> display to get the patient's name
        display_name = entry["resource"]["patient"]["display"]
        # Adding a name that is already in the set has no effect
        names.add(display_name)
    
    # The number of unique patients is the size of the set
    return len(names)
### END SOLUTION

In [7]:
# Test the docstring case
# import the doctest module
import doctest

# run the docstring examples to test
doctest.run_docstring_examples(patient_count, globals(), verbose = True)

Finding tests in NoName
Trying:
    patient_count(ALLERGIES_FILE)
Expecting:
    2
ok


In [8]:
patient_count(ALLERGIES_FILE)

2

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them are for the same patient.  Write a function called `allergy_per_patient(json_file)` that counts up how many allergies each patient has.
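Counting how many times each value appears is a common pattern, and the standard library's `collections.Counter` does exactly that. A minimal sketch on a hypothetical list of patient names, one per allergy entry (placeholders, not data from the file):

```python
from collections import Counter

# Hypothetical patient names, one per allergy entry
names_per_entry = ["Patient A", "Patient B", "Patient A", "Patient A"]

# Counter tallies how many times each name appears
counts = Counter(names_per_entry)

print(dict(counts))  # {'Patient A': 3, 'Patient B': 1}
```

A plain dictionary with `d[name] = d.get(name, 0) + 1` produces the same result without the import.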


In [17]:
### BEGIN SOLUTION
# Define the function
def allergy_per_patient(json_file):
    ''' (file path) -> dict
    This function takes a file path as input, assumed to point to a JSON file with a
    structure similar to allergies.json, and returns a dictionary mapping each patient's
    name to the number of allergies recorded for that patient in the file.
    
    The key "entry" is a list where each item is a dictionary containing the key "resource".
    That dictionary contains the key "patient", itself a dictionary whose key "display"
    holds a string with the patient's name. A unique name is assumed to indicate a unique
    patient, and each entry is assumed to record one allergy.
    
    >>> allergy_per_patient(ALLERGIES_FILE)
    {'Jason Argonaut': 3, 'Paul Boal': 1}
    '''
    # Initialize a dictionary for the final result
    patient_allergies = {}
    
    # Open the file path and assign the file contents to an object
    with open(json_file) as f:
        allergies = json.load(f)
        
    # For each item in the "entry" list (empty list if the key is missing)
    for entry in allergies.get("entry", []):
        # Follow entry -> resource -> patient -> display to get the patient's name
        display_name = entry["resource"]["patient"]["display"]
        
        # .get(..., 0) starts a new patient at 0; then add 1 for this allergy
        patient_allergies[display_name] = patient_allergies.get(display_name, 0) + 1
    
    # Return the dictionary of patient names and allergy counts
    return patient_allergies
### END SOLUTION

In [18]:
allergy_per_patient(ALLERGIES_FILE)

{'Jason Argonaut': 3, 'Paul Boal': 1}

### 30.4 Patient Allergies and Reaction

You'll see in the file that each item in the `entry` list has several other attributes, including a patient name, a text representation of the substance, and a reaction manifestation.  Create a function named `allergy_list(json_file)` that creates an output list containing the patient name, allergy, and reaction for each `entry`.  The result you should get is:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]
```

You'll notice that `reaction` and its `manifestation` are both lists.  You only need to capture the first reaction and the first manifestation of that reaction.  That is, if there is a list of things, just output the first one.
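Taking "the first one" means indexing with `[0]`. A minimal sketch on a hypothetical `resource` dictionary shaped like the ones in the file. The helper `first_or_none` is a name made up here to guard against an empty or missing list, where `[0]` alone would raise an `IndexError` or `TypeError`.

```python
def first_or_none(items):
    """Return the first element of a list, or None if it is missing or empty (hypothetical helper)."""
    return items[0] if items else None

# Hypothetical resource with two reactions, each holding a list of manifestations
resource = {
    "reaction": [
        {"manifestation": [{"text": "Hives"}, {"text": "Rash"}]},
        {"manifestation": [{"text": "Itching"}]},
    ]
}

# First reaction, then the first manifestation of that reaction, then its text
first_reaction = first_or_none(resource.get("reaction"))
first_manifestation = first_or_none(first_reaction.get("manifestation")) if first_reaction else None
text = first_manifestation.get("text") if first_manifestation else None

print(text)               # Hives
print(first_or_none([]))  # None
```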

In [19]:
import json

### BEGIN SOLUTION
# Define the function
def allergy_list(json_file):
    ''' (file path) -> list
    This function takes a file path as input, assumed to point to a JSON file with a
    structure similar to allergies.json, and returns a list in which each item is a list
    of the form ["patient name", "allergen", "reaction"].
    
    The key "entry" is a list where each item is a dictionary containing the key "resource".
    That dictionary contains:
    - "patient", a dictionary whose key "display" holds the patient's name;
    - "substance", a dictionary whose key "text" holds the allergen name;
    - "reaction", a list of dictionaries, each with the key "manifestation", a list of
      dictionaries whose key "text" describes the reaction.
    Only the first reaction and its first manifestation are captured.
    '''
    # Initialize a list for the final result
    # (named results so it does not shadow the function name allergy_list)
    results = []
    
    # Open the file path and assign the file contents to an object
    with open(json_file) as f:
        allergies = json.load(f)
        
    # For each item in the "entry" list (empty list if the key is missing)
    for entry in allergies.get("entry", []):
        # Retrieve the "resource" dictionary from the entry
        resource = entry.get("resource")
        
        # Patient name: resource -> patient -> display
        display_name = resource.get("patient").get("display")
        
        # Allergen: resource -> substance -> text
        allergen = resource.get("substance").get("text")
        
        # First reaction, then the first manifestation of that reaction
        first_reaction = resource.get("reaction")[0]
        first_manifestation = first_reaction.get("manifestation")[0]
        
        # Reaction description: manifestation -> text
        description = first_manifestation.get("text")
        
        # Append [patient name, allergen, reaction description] to the results
        results.append([display_name, allergen, description])
    
    # Return the list of results
    return results
    
### END SOLUTION

In [20]:
output=[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]

assert allergy_list(ALLERGIES_FILE) == output


### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file, patient, substance)` that takes three parameters and returns the reaction the patient will have if they take the specified substance.  Solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.
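Because the function should return `None` when nothing matches, Python's built-in `next()` with a default value is one compact way to express "first match, else `None`". A sketch over hypothetical `[patient, allergen, reaction]` rows in the same format `allergy_list` returns. `find_reaction` is a made-up name, not the required `allergy_reaction` signature.

```python
# Hypothetical rows in the [patient, allergen, reaction] format
rows = [
    ["Patient A", "SUBSTANCE X", "Hives"],
    ["Patient B", "SUBSTANCE X", "Bruising"],
]

def find_reaction(rows, patient, substance):
    """Return the reaction of the first matching row, or None (hypothetical helper)."""
    return next(
        (reaction for name, allergen, reaction in rows
         if name == patient and allergen == substance),
        None,  # default returned when no row matches
    )

print(find_reaction(rows, "Patient B", "SUBSTANCE X"))  # Bruising
print(find_reaction(rows, "Patient A", "SUBSTANCE Y"))  # None
```

This stops at the first match, whereas a loop that keeps reassigning a result variable ends with the last match. The two agree whenever each patient/substance pair appears only once.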

In [25]:
import json

### BEGIN SOLUTION
# Define the function
def allergy_reaction(json_file, patient, substance):
    ''' (file path, str, str) -> str or None
    This function takes a file path, patient name, and substance inputs, with the assumption that file path maps to a JSON file
    with a similar structure to allergies.json. The output is a string, corresponding to a description of an 
    allergic reaction. This function calls on the function allergy_list (see help documentation for a description).
    
    Given the file with the allergy information, patient name, and substance, the function first determines
    whether the patient given has a documented allergy to that substance. If so, the function returns
    the patient's documented reaction to that substance. If not, the function returns None.
    '''
    # Initialize a return variable as a None Type object
    reaction = None  
 
    # Call the function allergy_list, providing the json_file as input
    listed_allergies = allergy_list(json_file)
    
    # Each item in the allergy_list has the following format: ["patient name", "allergen", "reaction"]
    # Iterate through all the items in the allergy_list
    for item in listed_allergies:
        # If the patient provided has a documented allergy to the substance provided
        if (item[0] == patient) and (item[1] == substance):
            # assign the reaction to a return variable
            reaction = item[2]
        # If the patient provided does not have a documented allergy to the substance provided
        # The return variable will remain None
    
    # Return the reaction indicator
    return reaction
        
### END SOLUTION

In [26]:
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') is None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'

---
---

# Stretch (Extra) Problems

Completing either of the stretch problems below can earn you up to 25 free points toward the midterm assignment.  That is, if you complete one of these extra problems successfully, you can skip one of the problems that will appear on the midterm exam coming up next week.

The midterm will be distributed this Saturday 3/13.

This assignment is due on Sunday 3/14.  If you attempt one of these extra problems, Slack me, and I'll give you feedback on how you did before the end of the day on Monday 3/15.  That way you can choose what to complete on the midterm.


---
---

### STRETCH for March 2021 - For those looking for an additional challenge

As I've mentioned in class, CMS is now enforcing a rule around price transparency.  Every facility that takes Medicare payments is required to publish a "machine readable" file with its pricing information for a number of common procedures across all of the payers it works with.  There are two examples of such files in the `/data/` directory: `whiteriver.json` and `saline.xml`.

If you want to compare contracted prices across these two hospitals, you'll need to read the information from both files into some kind of data structure, then merge the data from the two files.  See what you can do.

See if you can create an output file that has the following fields:
* HOSPITAL
* PROCEDURE_CODE
* PAYER
* AMOUNT

If you choose to work on this, you may get stuck at some point and you won't know if you're _doing it right_. Make some assumptions. Document your questions in this notebook.



```
Procedure Code |  Description  |  Gross Charges  |  Aetna  |  QualChoice
```
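The header above has one column per payer. To reach the HOSPITAL / PROCEDURE_CODE / PAYER / AMOUNT layout, each wide row can be "unpivoted" into one record per payer, after which records from both hospitals can be concatenated and written out. A minimal sketch using the `csv` module. The hospital label, procedure code, and amounts are made-up placeholders, and how the real `whiteriver.json` and `saline.xml` map onto this layout is an assumption you would still need to work out.

```python
import csv
import io

# Hypothetical wide row following the header above (values are placeholders)
wide_rows = [
    {"Procedure Code": "00000", "Description": "Example procedure",
     "Gross Charges": "100.00", "Aetna": "80.00", "QualChoice": "75.00"},
]
PAYER_COLUMNS = ["Aetna", "QualChoice"]  # assumed: one column per payer

def to_long(hospital, rows):
    """Unpivot wide rows into one record per (procedure, payer)."""
    for row in rows:
        for payer in PAYER_COLUMNS:
            yield {
                "HOSPITAL": hospital,
                "PROCEDURE_CODE": row["Procedure Code"],
                "PAYER": payer,
                "AMOUNT": float(row[payer]),
            }

# Records from several hospitals can be combined with list concatenation
merged = list(to_long("Hospital 1", wide_rows))

# Write to an in-memory buffer; pass an open file instead to create a real output file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["HOSPITAL", "PROCEDURE_CODE", "PAYER", "AMOUNT"])
writer.writeheader()
writer.writerows(merged)
print(buffer.getvalue())
```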

---
---

### STRETCH from March 2020 - For those looking for an additional challenge

The Coronavirus is creating quite the stir right now.  There are some sources suggesting that trends show it is going to be significantly more serious than SARS was back in the 2002 timeframe.  Here's one visualization trying to demonstrate that: https://www.reddit.com/r/China_Flu/comments/ev2b4v/i_updated_some_charts_comparing_this_outbreak/

Someone on Kaggle has generously already compiled a dataset based on information from Johns Hopkins about the Coronavirus outbreak.  https://www.kaggle.com/brendaso/2019-coronavirus-dataset-01212020-01262020  Create a Kaggle account, if you don't already have one.  Download this data set and then upload it to your Jupyter Home folder.  (The "up arrow" button is for uploading a file.)

Use Python's built-in `csv` module to read the data from this file and generate the following information: **what are the total confirmed cases in all of Mainland China as of the latest information in the data set?**  Some important things to note:
* Each entry for a given city has the **cumulative** number of cases.  So that column is not additive (it cannot be summed).  You'll have to find a way to filter your data for the last day for each city, then total those up.
* If you choose to parse the date column, you will want to lookup how to do that using Python's `datetime` module.  Especially the `strptime` function.  https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior  Hint: you can parse a date string in the format 2/17/2020 using the code below.  This link will tell you what things like `%m` and `%Y` mean.

```python
from datetime import datetime
d = datetime.strptime('2/17/2020', '%m/%d/%Y')
```
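Putting those notes together, one approach is to keep only the most recent row per location and then sum. A minimal sketch using `csv.DictReader`. The column names (`Province/State`, `Country`, `Date`, `Confirmed`), the date format, and the numbers are all hypothetical, so check them against the actual file before adapting this.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV text; the real file's column names and date format may differ
csv_text = """Province/State,Country,Date,Confirmed
Hubei,Mainland China,1/22/2020,400
Hubei,Mainland China,1/23/2020,500
Guangdong,Mainland China,1/22/2020,20
Guangdong,Mainland China,1/23/2020,30
Ontario,Canada,1/23/2020,1
"""

latest = {}  # location -> (date, confirmed) from the most recent row seen so far
for row in csv.DictReader(io.StringIO(csv_text)):
    # Only rows for Mainland China count toward the total
    if row["Country"] != "Mainland China":
        continue
    date = datetime.strptime(row["Date"], "%m/%d/%Y")
    location = row["Province/State"]
    # Replace the stored row only if this one is more recent
    if location not in latest or date > latest[location][0]:
        latest[location] = (date, int(row["Confirmed"]))

# Cumulative counts are summed across locations, never across dates
total = sum(confirmed for _, confirmed in latest.values())
print(total)  # 530
```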

If you want to take this another step, **create a list of tuples that contain (observation date, total confirmed) totaled over all locations represented in the data**
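A minimal sketch of that extra step, assuming the rows have already been reduced to hypothetical `(location, date string, cumulative confirmed)` tuples. It groups by date with `collections.defaultdict` and sorts the result into a chronological list of tuples.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (location, date string, cumulative confirmed) observations
observations = [
    ("Hubei", "1/22/2020", 400),
    ("Guangdong", "1/22/2020", 20),
    ("Hubei", "1/23/2020", 500),
    ("Guangdong", "1/23/2020", 30),
]

totals = defaultdict(int)  # observation date -> summed confirmed count
for _, date_text, confirmed in observations:
    date = datetime.strptime(date_text, "%m/%d/%Y").date()
    totals[date] += confirmed

# Sort by date to get a chronological list of (date, total) tuples
daily_totals = sorted(totals.items())
print(daily_totals)
```

Summing per date like this is only correct if every location reports on every date. If some locations skip days, you would need to carry each location's last known count forward before totaling.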

---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

In order to submit your work, you'll need to use the `git` command line program to **add** your homework file (this file) to your local repository, **commit** your changes to your local repository, and then **push** those changes up to github.com.  From there, I'll be able to **pull** the changes down and do my grading.  I'll provide some feedback, **commit** and **push** my comments back to you.  Next week, I'll show you how to **pull** down my comments.

To run through everything one last time and submit your work:
1. Use the `Kernel` -> `Restart Kernel and Run All Cells` menu option to run everything from top to bottom and stop here.
2. Follow the instructions in the prompt below to either save and submit your work, or continue working.

If anything fails along the way with this submission part of the process, let me know.  I'll help you troubleshoot.

---

In [None]:
a=input('''
Are you ready to submit your work?
1. Click the Save icon (or do Ctrl-S / Cmd-S)
2. Type "yes" or "no" below
3. Press Enter

''')

if a=='yes':
    !git add week06_assignment_2.ipynb
    !git commit -a -m "Submitting the week 6 programming exercises"
    !git push
else:
    print('''
    
OK. We can wait.
''')


---

If the message above says something like _Submitting the week 6 programming exercises_ or _Everything is up to date_, then your work was submitted correctly.