# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below.  For example, you can read CSV files using Pandas or the csv module.  Your score won't depend on which modules you choose unless explicitly noted below, but your programming style will still matter.

### 30.1 List of Allergies

In the /data directory on the Jupyter server, there is a file called `allergies.json` that contains a list of patient allergies.  It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`
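
To confirm that structure from Python rather than by eye, a short sketch like the one below may help. The `sample_text` string is a hypothetical, heavily trimmed stand-in for the real file, and the `resourceType` key is an assumption about what the file contains; only `entry` and the `resource`/`substance`/`text` path come from this notebook.

```python
import json

# Hypothetical, trimmed-down stand-in for allergies.json (not the real file contents).
# "resourceType" is an assumed key, included only to show a non-list item.
sample_text = '''
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"substance": {"text": "PENICILLIN G"}}},
    {"resource": {"substance": {"text": "STRAWBERRY"}}}
  ]
}
'''

data = json.loads(sample_text)

# A JSON object becomes a Python dict; a JSON array becomes a Python list.
top_level_type = type(data).__name__
entry_type = type(data["entry"]).__name__
entry_count = len(data["entry"])

print(top_level_type, entry_type, entry_count)
```

With the real file, `json.load` on an open file handle would take the place of `json.loads` on a string.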

Write a function named `allergy_count(json_file)` that takes the name of the JSON file as its only parameter and returns the number of entries in that file as an integer.  Your function should open the file, read the JSON into a Python object, and return how many items there are in the `entry` list.

In [188]:
import json
ALLERGIES_FILE="allergies.json"

In [189]:

def allergy_count(file_path=ALLERGIES_FILE):
    """
    Count the entries in an allergy JSON file.

    Args:
        file_path (str, optional): The path to the JSON file. Defaults to ALLERGIES_FILE.

    Returns:
        int: The number of items in the file's `entry` list.
    """
    # Load the allergy data from the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Each item in the `entry` list is one allergy record
    return len(data['entry'])

# Call the function with the default file path or provide your own
allergy_counts = allergy_count()  # Using the default ALLERGIES_FILE

print(allergy_counts)

4


In [190]:
allergy_count(ALLERGIES_FILE)

4

In [191]:
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4

### 30.2 Number of Patients

If you dig a little bit deeper into this list of allergies, you'll see that each entry has a patient associated with it.  Create a function called `patient_count(json_file)` that counts how many unique patients appear in this JSON structure.

In [192]:
def patient_count(file_path=ALLERGIES_FILE):
    # Load the allergy data from the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Collect patient references in a set so duplicates are dropped
    patients = {entry['resource']['patient']['reference'] for entry in data['entry']}

    # Count unique patients
    return len(patients)

unique_patient_count = patient_count()
print(f"Number of unique patients: {unique_patient_count}")

Number of unique patients: 2


In [193]:
patient_count(ALLERGIES_FILE)

2

In [194]:
assert type(patient_count(ALLERGIES_FILE)) == int
assert patient_count(ALLERGIES_FILE) == 2

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them are for the same patient.  Write a function called `allergy_per_patient(json_file)` that counts up how many allergies each patient has.


In [195]:
import json
from collections import defaultdict

def allergy_per_patient(file_path):
    """
    Calculates the number of allergies per patient from a JSON file.

    Args:
        file_path (str): The path to the JSON file containing allergy data.

    Returns:
        dict: A dictionary where keys are patient names and values
              are the number of allergies for each patient.
    """

    with open(file_path, 'r') as file:
        data = json.load(file)

    allergies_by_patient = defaultdict(int)
    for entry in data['entry']:
        patient_name = entry['resource']['patient']['display']
        allergies_by_patient[patient_name] += 1

    return dict(allergies_by_patient)

# Example usage
file_path = "allergies.json"
result = allergy_per_patient(file_path)
print(result)

{'Jason Argonaut': 3, 'Paul Boal': 1}


In [196]:
allergy_per_patient(ALLERGIES_FILE)

{'Jason Argonaut': 3, 'Paul Boal': 1}

In [197]:
assert type(allergy_per_patient(ALLERGIES_FILE)) == dict
assert allergy_per_patient(ALLERGIES_FILE) == {'Paul Boal': 1, 'Jason Argonaut': 3}
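
As an alternative to the `defaultdict` approach above, the same tally could be sketched with `collections.Counter`. The `entries` list here is a hypothetical, trimmed-down stand-in for `data['entry']`; the patient names match the expected results above, but the surrounding fields of the real file are omitted.

```python
from collections import Counter

# Hypothetical, trimmed-down entries (the real file has many more fields per entry).
entries = [
    {"resource": {"patient": {"display": "Jason Argonaut"}}},
    {"resource": {"patient": {"display": "Paul Boal"}}},
    {"resource": {"patient": {"display": "Jason Argonaut"}}},
    {"resource": {"patient": {"display": "Jason Argonaut"}}},
]

# Counter tallies each patient name; dict() converts it to a plain dictionary.
counts = dict(Counter(e["resource"]["patient"]["display"] for e in entries))

print(counts)
```

Converting to a plain `dict` keeps a `type(...) == dict` check like the one above passing, since `Counter` is a subclass of `dict` rather than `dict` itself.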

### 30.4 Patient Allergies and Reaction

You'll see in the file that each item in the `entry` list has several other attributes, including a patient name, a text representation of the substance, and a reaction manifestation.  Create a function named `allergy_list(json_file)` that builds an output list containing the patient name, allergy, and reaction for each `entry`.  The result you should get is:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]
```

You'll notice that the reactions, and the manifestations of each reaction, are lists.  You only need to capture the first reaction and the first manifestation of that reaction.  That is, if there is a list of things, just output the first one.
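
As a minimal sketch of that "first of each list" rule, the snippet below indexes into a single hypothetical entry. The patient, substance, and `Hives` values match the expected output above; the extra manifestation and the second reaction are invented purely to show what gets ignored.

```python
# Hypothetical entry shaped like the ones in allergies.json.
# "Rash" and "Swelling" are invented to illustrate the items that are skipped.
entry = {
    "resource": {
        "patient": {"display": "Jason Argonaut"},
        "substance": {"text": "PENICILLIN G"},
        "reaction": [
            {"manifestation": [{"text": "Hives"}, {"text": "Rash"}]},
            {"manifestation": [{"text": "Swelling"}]},
        ],
    }
}

resource = entry["resource"]
first_reaction = resource["reaction"][0]                   # first item in the reaction list
first_manifestation = first_reaction["manifestation"][0]   # first item in that reaction's list

row = [
    resource["patient"]["display"],
    resource["substance"]["text"],
    first_manifestation["text"],
]
print(row)
```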

In [198]:
import json

def allergy_list(json_file):
  """
  Return a list of lists, where each inner list represents
  a patient's allergy and reaction.

  Returns:
    list: A list of lists, each containing:
      patient_name (str): The name of the patient.
      all_med (str): The medication name or substance the patient is allergic to.
      all_react (str): The first manifestation of the patient's first reaction.

  Test:
    allergy_list(ALLERGIES_FILE)
    [['Jason Argonaut', 'PENICILLIN G', 'Hives'], ['Paul Boal', 'PENICILLIN G', 'Bruising'], ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'], ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]
  """
  with open(json_file) as file:
    data = json.load(file)

  # Use a name that does not shadow the function itself
  rows = []
  # all_med: allergy medication, all_react: allergy reaction.
  for entry in data['entry']:
    resource = entry['resource']
    patient_name = resource['patient']['display']
    all_med = resource['substance']['text']
    all_react = resource['reaction'][0]['manifestation'][0]['text']
    rows.append([patient_name, all_med, all_react])

  return rows


In [199]:
allergy_list(ALLERGIES_FILE)

[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]

In [200]:
assert allergy_list(ALLERGIES_FILE) == [['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]


### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file,patient,substance)` that takes three parameters and returns the reaction that will happen if the patient takes the specified substance.  You can solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.
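
One way to picture the lookup, assuming the rows have already been produced by `allergy_list`, is the sketch below. The `rows` list is copied from the expected output in 30.4, and `find_reaction` is a hypothetical helper name used only for illustration.

```python
# Rows copied from the expected output of allergy_list in 30.4.
rows = [
    ['Jason Argonaut', 'PENICILLIN G', 'Hives'],
    ['Paul Boal', 'PENICILLIN G', 'Bruising'],
    ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
    ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
]

def find_reaction(rows, patient, substance):
    """Return the reaction for an exact patient/substance match, else None."""
    for name, allergen, reaction in rows:
        if name == patient and allergen == substance:
            return reaction
    return None

hit = find_reaction(rows, 'Jason Argonaut', 'STRAWBERRY')
miss = find_reaction(rows, 'Jason Argonaut', 'PENICILLIN')  # not an exact match
print(hit, miss)
```

The match is exact, so `'PENICILLIN'` does not match `'PENICILLIN G'` and the lookup falls through to `None`.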

In [201]:
import json

def allergy_reaction(json_file, patient, substance):
  """
  Parse a JSON file containing allergy information and return the allergic
  reaction recorded for a given patient and substance.

  Returns:
    str: The reaction associated with the specified patient and substance, or None if not found.

  Test:
  >>> allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')
  'Hives'
  """
  with open(json_file) as file:
    data = json.load(file)

  # Iterate through entries and find the matching patient and substance
  for entry in data['entry']:
    resource = entry['resource']
    patient_name = resource['patient']['display']
    all_med = resource['substance']['text']

    # Check if the current entry matches the provided patient and substance
    if patient_name == patient and all_med == substance:
      # Return the first manifestation of the first reaction
      return resource['reaction'][0]['manifestation'][0]['text']

  # If no match is found after iterating through all entries, return None
  return None

In [202]:
allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')

'Hives'

In [203]:
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') is None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'

---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

To submit your work, save this notebook file back to GitHub.  To do that in Google Colab:
1. File -> Save a Copy in GitHub
2. Make sure your HDS5210 repository is selected
3. Make sure the file name includes the week number like this: `week06/week06_assignment_2.ipynb`
4. Add a meaningful commit message

**Be sure week names are lowercase and use a two-digit week number!!**

**Be sure you use the same file name provided by the instructor!!**

