# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below.  For example, you can read CSV files with Pandas or with the `csv` module.  Your score won't depend on which modules you use unless explicitly noted below, but your programming style will still matter.

### 30.1 List of Allergies

In this GitHub repository, there is a file called `allergies.json` that contains a list of patient allergies.  You will need to download this [file from here](https://raw.githubusercontent.com/paulboal/hds5210-2023/main/week06/allergies.json) and then upload it into Google Colab to run these examples. It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`
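
One way to confirm that structure before writing any function is to load the JSON and inspect it.  The sketch below uses a small, made-up document (`SAMPLE_JSON`, not the real `allergies.json`) with the same top-level shape, so every name and value in it is an assumption for illustration only.

```python
import json

# Hypothetical stand-in for allergies.json: same top-level shape, invented values.
SAMPLE_JSON = """
{
  "resourceType": "Bundle",
  "total": 2,
  "entry": [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}}
  ]
}
"""

sample = json.loads(SAMPLE_JSON)

# The top level is a dictionary; list its keys to see which items it holds.
top_level_keys = sorted(sample.keys())
print(top_level_keys)

# The value stored under "entry" is a list, so len() gives the number of entries.
entry_is_list = isinstance(sample["entry"], list)
entry_total = len(sample["entry"])
print(entry_is_list, entry_total)
```

With the real file you would call `json.load` on an open file handle rather than `json.loads` on a string.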

Write a function named `allergy_count(json_file)` that takes as one parameter the name of the JSON file and returns an integer number of entries in that file.  Your function should open the file, read the json into a Python object, and return how many items there are in the list of `entry`s.

In [1]:
import json

# Name of the JSON file; the functions below receive this name as `json_file`.
ALLERGIES_FILE = "allergies.json"

# Parsed copy of the same file, kept at module level because later cells read `data`.
with open(ALLERGIES_FILE) as f:
    data = json.load(f)

In [2]:
def allergy_count(json_file):
  """
  Count the entries in a JSON file that contains allergy data.

  Args:
   json_file (str): The name of the JSON file to analyze.

  Returns:
   int: The number of items in the 'entry' list, or 0 if the file has no 'entry' key.
  """
  # Open the file named by the parameter and parse it into a Python object.
  with open(json_file) as f:
    contents = json.load(f)
  return len(contents.get('entry', []))


In [3]:
allergy_count(ALLERGIES_FILE)

4

In [4]:
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4

In [5]:
type(allergy_count(ALLERGIES_FILE))

int

In [6]:
allergy_count(ALLERGIES_FILE)

4

### 30.2 Number of Patients

If you dig a little bit deeper into this list of allergies, you'll see that each result has a patient associated with it.  Create a function called `patient_count(json_file)` that will count how many unique patients we have in this JSON structure.
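
Counting unique values is a natural fit for a Python `set`.  The sketch below is not tied to the real file: `sample_entries` is an invented list shaped like the items under `entry`, and it assumes the patient's display name is enough to tell patients apart.

```python
# Hypothetical entries shaped like the items in the `entry` list; names are invented.
sample_entries = [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}},
    {"resource": {"patient": {"display": "Patient A"}}},
]

# A set keeps one copy of each value, so repeated patients collapse to one.
unique_patients = {e["resource"]["patient"]["display"] for e in sample_entries}
unique_total = len(unique_patients)
print(unique_total)
```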

In [7]:
def patient_count(json_file):
    """
    Count the distinct patients in the allergy data.

    This function iterates over the 'entry' list, collects each patient's display name, and returns how many distinct names it found.

    Args:
        json_file (str): The name of the JSON file. It is part of the required signature, but this implementation reads the module-level `data` parsed in cell In [1] instead of reopening the file.

    Returns:
        int: The number of distinct patients in the 'entry' list.
    """
    patients = set()
    for entry in data.get('entry', []):
        patient = entry.get('resource', {}).get('patient', {}).get('display', "")
        if patient:
            patients.add(patient)
    return len(patients)


In [8]:
patient_count(ALLERGIES_FILE)

2

In [9]:
assert type(patient_count(ALLERGIES_FILE)) == int
assert patient_count(ALLERGIES_FILE) == 2

In [10]:
type(patient_count(ALLERGIES_FILE))

int

In [11]:
patient_count(ALLERGIES_FILE)

2

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them are for the same patient.  Write a function called `allergy_per_patient(json_file)` that counts up how many allergies each patient has.
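
Tallying items per key is what `collections.Counter` from the standard library is built for.  The sketch below shows the idea on invented data (`sample_entries` is hypothetical, not taken from `allergies.json`); a plain dictionary with `.get(name, 0) + 1` works just as well.

```python
from collections import Counter

# Hypothetical entries shaped like the items in the `entry` list; names are invented.
sample_entries = [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}},
    {"resource": {"patient": {"display": "Patient A"}}},
]

# Counter tallies how many times each patient name appears.
counts = Counter(e["resource"]["patient"]["display"] for e in sample_entries)

# Convert to a plain dict so the result compares cleanly with a dict literal.
per_patient = dict(counts)
print(per_patient)
```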


In [12]:
def allergy_per_patient(json_file):
  """
  Count the number of allergies each patient has in the allergy data.

  This function iterates over the 'entry' list, reads each patient's display name, and tallies one allergy per entry for that patient.

  Args:
   json_file (str): The name of the JSON file. It is part of the required signature, but this implementation reads the module-level `data` parsed in cell In [1].

  Returns:
   dict: A dictionary with patient names as keys and the number of allergies each patient has as values.
  """
  allergies_per_patient = {}
  for entry in data.get('entry', []):
    patient = entry.get('resource', {}).get('patient', {}).get('display', '')
    if patient:
      # Start at 0 the first time a patient is seen, then add one per entry.
      allergies_per_patient[patient] = allergies_per_patient.get(patient, 0) + 1
  return allergies_per_patient

In [13]:
allergy_per_patient(ALLERGIES_FILE)

{'Jason Argonaut': 3, 'Paul Boal': 1}

In [14]:
assert type(allergy_per_patient(ALLERGIES_FILE)) == dict
assert allergy_per_patient(ALLERGIES_FILE) == {'Paul Boal': 1, 'Jason Argonaut': 3}

In [15]:
type(allergy_per_patient(ALLERGIES_FILE))

dict

In [16]:
allergy_per_patient(ALLERGIES_FILE)

{'Jason Argonaut': 3, 'Paul Boal': 1}

### 30.4 Patient Allergies and Reaction

You'll see in the file that each of the items in the `entry` list has several other attributes, including a patient name, a substance text representation, and a reaction manifestation.  Create a function named `allergy_list(json_file)` that will create an output list that has patient name, allergy, and reaction for each `entry`.  The actual result you should get will be:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]
```

You'll notice that the reaction and the manifestation of that reaction are lists.  You only need to capture the first reaction and the first manifestation of that reaction.  That is, if there is a list of things, just output the first one.
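
Taking "the first one" from nested lists needs a guard for the case where a list is missing or empty.  The helper below is a minimal sketch: `first_manifestation` and `sample_resource` are hypothetical names with invented values, and the key names `reaction`, `manifestation`, and `text` are assumed to match the structure described above.

```python
def first_manifestation(resource):
    """Return the text of the first manifestation of the first reaction, or None."""
    # `or []` covers both a missing key and an explicit null value.
    reactions = resource.get("reaction") or []
    if not reactions:
        return None
    manifestations = reactions[0].get("manifestation") or []
    if not manifestations:
        return None
    return manifestations[0].get("text")


# Hypothetical resource with two reactions; only the first of each list is used.
sample_resource = {
    "reaction": [
        {"manifestation": [{"text": "Rash"}, {"text": "Swelling"}]},
        {"manifestation": [{"text": "Nausea"}]},
    ]
}

print(first_manifestation(sample_resource))
print(first_manifestation({}))
```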

In [17]:
def allergy_list(json_file):
  """
  Extract the allergy data and return it as a list of lists.

  Args:
    json_file (str): The name of the JSON file. It is part of the required signature, but this implementation reads the module-level `data` parsed in cell In [1].

  Returns:
    list: One inner list per entry, in the form [patient_name, allergy, reaction]. Any value missing from an entry is None.
  """
  output_list = []
  if 'entry' in data:
      for entry in data['entry']:
          resource = entry.get('resource', {})
          patient_name = resource.get("patient", {}).get("display", None)
          allergy = resource.get("substance", {}).get("text", None)
          reaction = None
          reactions = resource.get('reaction', [])
          if reactions:
              # Only the first reaction and its first manifestation are reported.
              first_reaction = reactions[0].get("manifestation", [])
              if first_reaction:
                  reaction = first_reaction[0].get("text", None)
          output_list.append([patient_name, allergy, reaction])
  return output_list

In [18]:
assert allergy_list(ALLERGIES_FILE) == [['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]


In [19]:
allergy_list(ALLERGIES_FILE)

[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]

### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file, patient, substance)` that takes three parameters and returns the reaction that will happen if the patient takes the specified substance.  You can solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.
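
The lookup itself can be sketched independently of the file: given rows of `[patient, substance, reaction]`, scan for a row that matches both the patient and the substance, and fall back to `None`.  In the sketch below, `find_reaction` and `sample_rows` are hypothetical names with invented data, and returning the first matching row is an assumption about how duplicates should be handled.

```python
def find_reaction(rows, patient, substance):
    """Return the reaction for an exact patient and substance match, or None."""
    for row_patient, row_substance, row_reaction in rows:
        if row_patient == patient and row_substance == substance:
            return row_reaction
    # No row matched both values.
    return None


# Hypothetical rows in the [patient, substance, reaction] shape; values are invented.
sample_rows = [
    ["Patient A", "SUBSTANCE X", "Rash"],
    ["Patient B", "SUBSTANCE X", "Nausea"],
]

print(find_reaction(sample_rows, "Patient B", "SUBSTANCE X"))
print(find_reaction(sample_rows, "Patient A", "SUBSTANCE"))
```

The match is exact, so a partial substance name such as `"SUBSTANCE"` does not match `"SUBSTANCE X"`.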

In [20]:
def allergy_reaction(json_file, patient, substance):
    """
    Look up the reaction a patient has to a given substance.

    Args:
        json_file (str): The name of the JSON file; passed through to `allergy_list`.
        patient (str): The patient's name.
        substance (str): The substance name.

    Returns:
        str or None: The recorded reaction if the patient is allergic to the substance, otherwise None.
    """
    # Get the allergy list using the allergy_list function
    allergy_data = allergy_list(json_file)

    # Initialize the reaction to None
    reaction = None

    # Iterate through the allergy data to find the specified patient and substance
    for entry in allergy_data:
        entry_patient, entry_substance, entry_reaction = entry

        # Check if the current entry matches both the patient and substance
        if entry_patient == patient and entry_substance == substance:
            reaction = entry_reaction  # Update the reaction if a match is found

    # Return the final reaction (can be None if no match is found)
    return reaction


In [21]:
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') is None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'

In [22]:
allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')

'Hives'

---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

In order to submit your work, you'll need to save this notebook file back to GitHub.  To do that in Google Colab:
1. File -> Save a Copy in GitHub
2. Make sure your HDS5210 repository is selected
3. Make sure the file name includes the week number like this: `week06/week06_assignment_2.ipynb`
4. Add a commit message that means something

**Be sure week names are lowercase and use a two digit week number!!**

**Be sure you use the same file name provided by the instructor!!**

