
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risk.

The BODE index is defined on MDCalc: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE score and BODE survival rate for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a `ValueError`.
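doctest can check the `ValueError` requirement directly: an expected exception is written as the traceback header, an indented `...` standing in for the stack lines, and the final exception line. A minimal sketch, using a hypothetical `require_positive` helper that is not part of the assignment:

```python
import doctest


def require_positive(value, label):
    """(float, str) -> float

    Return value unchanged, or raise ValueError if it is not positive.

    >>> require_positive(1.66, "height_m")
    1.66
    >>> require_positive(0, "height_m")
    Traceback (most recent call last):
        ...
    ValueError: height_m must be positive, got 0
    """
    if value <= 0:
        raise ValueError(f"{label} must be positive, got {value}")
    return value


# Run the docstring examples; nothing is printed when they all pass
doctest.run_docstring_examples(require_positive, globals())
```

Only the first and last lines of the traceback are compared, so the test does not break when the code inside the function moves around.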

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [13]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [14]:
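def step1_bmi(weight_kg, height_m):
    """(float, float) -> float

    Calculate body mass index (BMI): weight in kilograms divided by the
    square of height in meters, rounded to 2 decimal places.
    Raise ValueError if either value is not positive.

    >>> step1_bmi(60, 1.60)
    23.44

    >>> step1_bmi(96, 1.66)
    34.84

    """
    # A zero or negative weight/height can only come from bad input data
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight_kg and height_m must be positive")
    # BMI = weight / height ** 2
    return round(weight_kg / height_m ** 2, 2)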

In [15]:
# step1_bmi(96, 1.66)

In [16]:
doctest.run_docstring_examples(step1_bmi, globals(),verbose=True)

Finding tests in NoName
Trying:
    step1_bmi(60, 1.60)
Expecting:
    23.44
ok
Trying:
    step1_bmi(96, 1.66)
Expecting:
    34.84
ok
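Because `step1_bmi` rounds to 2 decimals before the BMI is compared against the BODE cutoff of 21, a BMI just above 21 can round down to 21.0 and earn a point it would not otherwise get. A small sketch of the effect, using hypothetical helpers `bmi` (unrounded) and `bmi_points`, and an illustrative 60.7 kg, 1.70 m patient:

```python
def bmi(weight_kg, height_m):
    """Unrounded BMI in kg/m^2; raises ValueError on non-positive input."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight_kg and height_m must be positive")
    return weight_kg / height_m ** 2


def bmi_points(value):
    """BODE BMI component: 0 points above 21, otherwise 1."""
    return 0 if value > 21 else 1


raw = bmi(60.7, 1.70)    # about 21.0035
rounded = round(raw, 2)  # 21.0
print(raw > 21, bmi_points(raw))          # True 0
print(rounded > 21, bmi_points(rounded))  # False 1
```

One option is to keep the rounding for display only and pass the unrounded BMI to the scoring step.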


### Step 2: Calculate BODE Score

In [17]:
def step2_bode(bmi, fev, dyspnea, dist):
    """(float, float, str, float) -> int

    Calculate the BODE score from BMI, FEV1 (% of predicted), a dyspnea description, and the 6-minute walk distance in meters.

    Dyspnea descriptions are mapped to mMRC dyspnea grades, as listed on the MDCalc page:
    - 0: Dyspnea only with strenuous exercise
    - 1: Dyspnea when hurrying or walking up a slight hill
    - 2: Walks slower than people of same age because of dyspnea or stops for breath when walking at own pace
    - 3: Stops for breath after walking 100 yards (91 m) or after a few minutes
    - 4: Too dyspneic to leave house or breathless when dressing
    Shorter uppercase labels such as 'WHEN HURRYING' map to the same grades.

    >>> step2_bode(34, 34, "Dyspnea only with strenuous exercise", 340)
    4
    >>> step2_bode(22, 33, "Too dyspneic to leave house or breathless when dressing", 150)
    8

    """
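    # Reject measurements that cannot be valid
    if bmi <= 0 or fev < 0 or dist < 0:
        raise ValueError("bmi must be positive; fev and dist must be non-negative")

    # BMI score: 0 points above 21, otherwise 1
    bmi_score = 0 if bmi > 21 else 1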

    # FEV1% score
    if fev >= 65:
        fev_score = 0
    elif 50 <= fev < 65:
        fev_score = 1
    elif 36 <= fev < 50:
        fev_score = 2
    else:
        fev_score = 3

    # Map each dyspnea description to its mMRC grade (0-4)
    dmap = {
        "Dyspnea only with strenuous exercise": 0,
        "Dyspnea when hurrying or walking up a slight hill": 1,
        "Walks slower than people of same age because of dyspnea or stops for breath when walking at own pace": 2,
        "Stops for breath after walking 100 yards (91 m) or after a few minutes": 3,
        "Too dyspneic to leave house or breathless when dressing": 4,

        'ONLY STRENUOUS EXERCISE' : 0,
        'WHEN HURRYING': 1,
        'WALKING UPHILL': 1,
        'SLOWER THAN PEERS': 2,
        'STOPS WHEN WALKING AT PACE': 2,
        'STOPS AFTER A FEW MINUTES': 3,
        'STOPS AFTER 100 YARDS': 3,
        'BREATHLESS WHEN DRESSING': 4,
        'UNABLE TO LEAVE HOME': 4

    }

    # An unknown description is bad input: raise ValueError instead of KeyError
    if dyspnea not in dmap:
        raise ValueError("Unknown dyspnea description: " + repr(dyspnea))
    dyspnea_level = dmap[dyspnea]

    # Convert the mMRC grade to BODE points (grades 0 and 1 both score 0)
    if dyspnea_level in [0, 1]:
        dyspnea_score = 0
    elif dyspnea_level == 2:
        dyspnea_score = 1
    elif dyspnea_level == 3:
        dyspnea_score = 2
    elif dyspnea_level == 4:
        dyspnea_score = 3

    # Calculate walking distance score
    if dist >= 350:
        dist_score = 0
    elif 250 <= dist < 350:
        dist_score = 1
    elif 150 <= dist < 250:
        dist_score = 2
    else:
        dist_score = 3

    # Sum them all
    bode = bmi_score + fev_score + dyspnea_score + dist_score
    return bode

In [18]:
# step2_bode(22, 33, "Too dyspneic to leave house or breathless when dressing", 150)

In [19]:
doctest.run_docstring_examples(step2_bode, globals(),verbose=True)

Finding tests in NoName
Trying:
    step2_bode(34, 34, "Dyspnea only with strenuous exercise", 340)
Expecting:
    4
ok
Trying:
    step2_bode(22, 33, "Too dyspneic to leave house or breathless when dressing", 150)
Expecting:
    8
ok
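The if/elif ladders in `step2_bode` share one pattern: each cutoff a value reaches removes one point. A sketch of a table-driven alternative using the standard-library `bisect` module (not one of the three libraries the assignment names, so treat it as a comparison rather than a drop-in replacement). It takes the mMRC grade directly instead of the text description; `band_points` and `bode_points` are hypothetical names:

```python
from bisect import bisect_right

# Each list holds the lower bounds of the 2-, 1- and 0-point bands;
# a value below the first cutoff scores 3 points
FEV_CUTOFFS = [36, 50, 65]          # FEV1, % of predicted
DISTANCE_CUTOFFS = [150, 250, 350]  # 6-minute walk distance, meters


def band_points(value, cutoffs):
    """3 points below the first cutoff, one point fewer per cutoff reached."""
    return len(cutoffs) - bisect_right(cutoffs, value)


def bode_points(bmi, fev, mmrc_grade, dist):
    """Sum the four BODE components; mmrc_grade is the 0-4 mMRC dyspnea grade."""
    if mmrc_grade not in (0, 1, 2, 3, 4):
        raise ValueError("mmrc_grade must be an integer from 0 to 4")
    bmi_pts = 0 if bmi > 21 else 1
    dyspnea_pts = max(mmrc_grade - 1, 0)  # mMRC 0-1 -> 0, 2 -> 1, 3 -> 2, 4 -> 3
    return (bmi_pts
            + band_points(fev, FEV_CUTOFFS)
            + dyspnea_pts
            + band_points(dist, DISTANCE_CUTOFFS))


print(bode_points(34, 34, 0, 340))  # 4, matching the first doctest
print(bode_points(22, 33, 4, 150))  # 8, matching the second doctest
```

`bisect_right` counts how many cutoffs are less than or equal to the value, which reproduces the `>=` comparisons of the original ladders.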


### Step 3: Calculate BODE Risk

In [20]:
def step3_risk(score):
    """(int) -> str

    Return the approximate 4-year survival rate for a BODE index score
    (0-10) calculated in the previous step. Raise ValueError for a score
    outside that range.

    The rates by BODE index (from MDCalc) are:
    - 0–2: 80%
    - 3–4: 67%
    - 5–6: 57%
    - 7–10: 18%

    >>> step3_risk(4)
    '67%'

    >>> step3_risk(8)
    '18%'

    """

    # A BODE score can only range from 0 to 10
    if not 0 <= score <= 10:
        raise ValueError("BODE score must be between 0 and 10")

    # Survival rates by score band, as listed on MDCalc
    if score <= 2:
        return "80%"
    elif score <= 4:
        return "67%"
    elif score <= 6:
        return "57%"
    else:
        return "18%"


In [21]:
doctest.run_docstring_examples(step3_risk, globals(),verbose=True)

Finding tests in NoName
Trying:
    step3_risk(4)
Expecting:
    '67%'
ok
Trying:
    step3_risk(8)
Expecting:
    '18%'
ok
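`step3_risk` returns strings such as `'67%'`, so step 5 has to strip the `%` sign before averaging. A sketch of the alternative of returning a plain integer percentage and adding the `%` only when writing output; `survival_pct` and `SURVIVAL_BANDS` are hypothetical names, and the bands are the ones listed in the docstring above:

```python
# (highest score in the band, approximate 4-year survival %), per the MDCalc table
SURVIVAL_BANDS = [(2, 80), (4, 67), (6, 57), (10, 18)]


def survival_pct(score):
    """Return the 4-year survival percentage for a BODE score as an int."""
    if not 0 <= score <= 10:
        raise ValueError(f"BODE score must be 0-10, got {score}")
    # Bands are ordered, so the first upper bound the score fits under wins
    for upper, pct in SURVIVAL_BANDS:
        if score <= upper:
            return pct


print(survival_pct(4), f"{survival_pct(4)}%")
```

Keeping the number and its display format separate means the averaging code never has to parse strings.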


### Step 4: Load Patient and Hospital Data

In [22]:
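def patient_data(csv_path):
    """(str) -> list

    Load patient records from a CSV file (e.g. patient.csv) and return
    them as a list of dicts keyed by the header row.
    """
    # newline='' is what the csv module documentation recommends for files it reads
    with open(csv_path, 'r', newline='') as file:
        reader = csv.DictReader(file)
        patients = list(reader)

    return patients

def hospital_data(json_path):
    """(str) -> list

    Load hospital systems from a JSON file (e.g. hospitals.json). Each
    system is a dict with a 'system' name and a list of 'hospitals',
    each having a 'name' and a number of 'beds'.
    """
    with open(json_path, 'r') as file:
        hospitals = json.load(file)

    return hospitals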

In [23]:
# patient_data('/content/patient.csv')

In [24]:
hospital_data('/content/hospitals.json')

[{'system': 'BJC',
  'hospitals': [{'name': 'BJC', 'beds': 2000},
   {'name': 'BJC WEST COUNTY', 'beds': 1000},
   {'name': 'MISSOURI BAPTIST', 'beds': 800}]},
 {'system': 'SSM',
  'hospitals': [{'name': 'SAINT LOUIS UNIVERSITY', 'beds': 1000},
   {'name': "ST.MARY'S", 'beds': 500}]},
 {'system': "ST.LUKE'S", 'hospitals': [{'name': "ST.LUKE'S", 'beds': 800}]}]
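Step 5 looks hospitals up by name, and the nested system-to-hospitals layout printed above can be flattened once into a `{name: beds}` dictionary. A self-contained sketch that reads inline sample data through `io.StringIO` instead of the real files; the patient row is hypothetical and the hospital entries are copied from the output above:

```python
import csv
import io
import json

# Same structure as hospitals.json above (one system shown)
HOSPITALS_JSON = '''[{"system": "SSM", "hospitals": [
    {"name": "SAINT LOUIS UNIVERSITY", "beds": 1000},
    {"name": "ST.MARY'S", "beds": 500}]}]'''

# One hypothetical patient row using the input columns from the assignment
PATIENT_CSV = """NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital
Test Patient,000-00-0000,English,Clerk,1.66,96,34,WHEN HURRYING,340,ST.MARY'S
"""

# Flatten systems -> hospitals into a single {hospital name: beds} lookup
beds_by_hospital = {
    h["name"]: h["beds"]
    for system in json.loads(HOSPITALS_JSON)
    for h in system["hospitals"]
}

# DictReader returns every field as a string; numeric columns need float()
patients = list(csv.DictReader(io.StringIO(PATIENT_CSV)))
print(beds_by_hospital)
print(patients[0]["hospital"], beds_by_hospital[patients[0]["hospital"]])
```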

### Step 5: Main business logic

Call the BODE score and BODE risk functions for each patient.

For each hospital, count the COPD cases and calculate the average BODE score and average BODE survival rate.
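The code below tracks per-hospital totals in four parallel lists and finds each hospital with `list.index`, which rescans `hospital_names` for every patient. A sketch of the same aggregation keyed by hospital name in a single dictionary; `summarize` and the sample rows are hypothetical, and survival is passed as an integer percentage rather than a `'67%'` string:

```python
def summarize(rows):
    """rows: iterable of (hospital, bode_score, survival_pct) tuples.

    Return {hospital: (count, avg_score, avg_survival_pct)}, with the
    averages rounded to 2 decimals.
    """
    totals = {}  # hospital -> [count, score_sum, survival_sum]
    for hospital, score, pct in rows:
        t = totals.setdefault(hospital, [0, 0, 0])
        t[0] += 1
        t[1] += score
        t[2] += pct
    return {
        hospital: (count, round(score_sum / count, 2), round(pct_sum / count, 2))
        for hospital, (count, score_sum, pct_sum) in totals.items()
    }


# Hypothetical scored patients
rows = [("BJC", 4, 67), ("BJC", 8, 18), ("ST.MARY'S", 2, 80)]
print(summarize(rows))
```

`dict.setdefault` creates the zeroed totals the first time a hospital appears, which replaces the `if hosp not in hospital_names` bookkeeping.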

In [25]:
# file names for input and output data
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

# Output rows, each list starting with its CSV header row
patient_results = [['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL']]
hospital_output_list = [['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK']]

# Per-hospital running totals in parallel lists: position i belongs to hospital_names[i]
hospital_names = []
hospital_counts = []
hospital_total_scores = []
hospital_total_risks = []

# Load given data using file names
patients = patient_data(patient_csv)
hospitals = hospital_data(hospital_json)

# Read each patient's fields, converting the numeric columns to float
for p in patients:
    name = p['NAME']
    h = float(p['HEIGHT_M'])
    w = float(p['WEIGHT_KG'])
    pct = float(p['fev_pct'])
    dys = p['dyspnea_description']
    dist = float(p['distance_in_meters'])
    hosp = p['hospital']

    # Calculate BMI, BODE score, and risk using the previously defined functions
    bmi = step1_bmi(w, h)
    bode = step2_bode(bmi, pct, dys, dist)
    risk = step3_risk(bode)

    # Register a hospital the first time it appears, with zeroed totals
    if hosp not in hospital_names:
        hospital_names.append(hosp)  # Add the hospital name
        hospital_counts.append(0)  # Initialize
        hospital_total_scores.append(0)
        hospital_total_risks.append(0)

    index = hospital_names.index(hosp)

    # Update this hospital's case count, BODE total and survival total
    hospital_counts[index] += 1
    hospital_total_scores[index] += bode

    # Drop the trailing '%' (e.g. '67%' -> 67) so survival rates can be averaged
    hospital_total_risks[index] += int(risk[:-1])

    # Append data
    patient_results.append([name, bode, risk, hosp])

# Visit every hospital of every system listed in the JSON file
for hospital_info in hospitals:
    for hosp in hospital_info['hospitals']:
        name = hosp['name']
        beds = hosp['beds']

        if name in hospital_names:  # hospital had at least one COPD patient
            index = hospital_names.index(name)
            copd_count = hospital_counts[index]  # number of COPD cases
            total_sum_score = hospital_total_scores[index]  # sum of BODE scores
            total_sum_risk = hospital_total_risks[index]  # sum of survival percentages

            # COPD cases as a percentage of the hospital's bed count
            pct_cases_over_beds = round((copd_count / beds) * 100, 2)
            # Calculate averages
            avg_score = round(total_sum_score / copd_count, 2)
            avg_risk = round(total_sum_risk / copd_count, 2)
        else:  # no COPD patients recorded for this hospital
            copd_count = 0
            avg_score = 0
            avg_risk = 0
            pct_cases_over_beds = 0

        # Append hospital info
        hospital_output_list.append([name, copd_count, pct_cases_over_beds, avg_score, avg_risk])


# Write patient_output.csv
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(patient_results)
# Write hospital_output.csv
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(hospital_output_list)
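The hospital loop above divides by each hospital's bed count, which would raise `ZeroDivisionError` if hospitals.json ever listed 0 beds. A sketch of building one output row with that case turned into a `ValueError`, written to an in-memory buffer so the exact CSV text is visible; `hospital_row` and the figures are hypothetical:

```python
import csv
import io

HEADER = ["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS", "AVG_SCORE", "AVG_RISK"]


def hospital_row(name, beds, count, avg_score, avg_risk):
    """Build one hospital_output.csv row; COPD cases as a percentage of beds."""
    if beds <= 0:
        raise ValueError(f"{name} must have a positive bed count, got {beds}")
    return [name, count, round(count / beds * 100, 2), avg_score, avg_risk]


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
# Hypothetical figures: 5 COPD patients at the 500-bed ST.MARY'S
writer.writerow(hospital_row("ST.MARY'S", 500, 5, 4.2, 62.4))
print(buffer.getvalue())
```

`csv.writer` ends each row with `'\r\n'` by default, which is why the output files above are opened with `newline=''`: it stops Python from translating those line endings a second time.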