
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE improves on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital
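
As a minimal sketch of how rows with these columns can be read, the snippet below parses an in-memory CSV with `csv.DictReader`. The sample row and the helper name `read_patients` are invented for illustration and do not come from the real patient file.

```python
import csv
import io

# Hypothetical sample: the header comes from the assignment, the row is invented.
SAMPLE_CSV = (
    "NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,"
    "dyspnea_description,distance_in_meters,hospital\n"
    "Jane Doe,000-00-0000,English,Nurse,1.70,70,55,"
    "WHEN HURRYING,300,Example Hospital\n"
)

def read_patients(handle):
    """Return the rows of a patient CSV as a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))

rows = read_patients(io.StringIO(SAMPLE_CSV))

# DictReader yields every value as a string, so numeric columns need converting.
height_m = float(rows[0]["HEIGHT_M"])
weight_kg = float(rows[0]["WEIGHT_KG"])
```

Because every value arrives as a string, a non-numeric entry makes `float()` raise a `ValueError`, which matches how the assignment asks bad input to be reported.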

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME,COPD_COUNT,PCT_OF_COPD_CASES_OVER_BEDS,AVG_SCORE,AVG_RISK
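
One way to keep the output header in sync with this column list is to write rows with `csv.DictWriter`. The sketch below writes to an in-memory buffer; the helper name `write_hospital_rows` and the sample values are assumptions made up for illustration.

```python
import csv
import io

HOSPITAL_COLUMNS = [
    "HOSPITAL_NAME",
    "COPD_COUNT",
    "PCT_OF_COPD_CASES_OVER_BEDS",
    "AVG_SCORE",
    "AVG_RISK",
]

def write_hospital_rows(handle, rows):
    """Write a header line followed by one CSV line per hospital dict."""
    writer = csv.DictWriter(handle, fieldnames=HOSPITAL_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Invented example row, not real hospital data.
buffer = io.StringIO()
write_hospital_rows(buffer, [{
    "HOSPITAL_NAME": "Example Hospital",
    "COPD_COUNT": 2,
    "PCT_OF_COPD_CASES_OVER_BEDS": 0.5,
    "AVG_SCORE": 3.5,
    "AVG_RISK": 60.5,
}])
output_text = buffer.getvalue()
```

`DictWriter` raises a `ValueError` if a row contains a key that is not in `fieldnames`, so a misspelled column name is caught at write time.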

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.
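
A doctest can also check that bad input raises a ValueError: the expected output is the traceback header, an ellipsis line, and the final exception line. The sketch below shows the pattern on `positive_ratio`, a hypothetical helper that is not part of the assignment.

```python
import doctest

def positive_ratio(numerator, denominator):
    """Hypothetical helper used only to illustrate the testing pattern.

    >>> positive_ratio(6, 3)
    2.0
    >>> positive_ratio(1, 0)
    Traceback (most recent call last):
        ...
    ValueError: denominator must be greater than 0
    """
    if denominator <= 0:
        raise ValueError("denominator must be greater than 0")
    return numerator / denominator

# With verbose left at its default, nothing is printed when every example passes.
doctest.run_docstring_examples(
    positive_ratio, {"positive_ratio": positive_ratio}, name="positive_ratio"
)
```

doctest ignores the traceback body between the header and the exception line, so only the exception type and message have to match.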

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [1]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [2]:

def bmi(weight_kg, height_m):
  """Calculate body mass index (BMI) using the following formula:
  Weight (kg) / height (m) ** 2

  Raises a ValueError if weight or height is not greater than 0.

  >>> round(bmi(90.7, 1.82), 2)
  27.38
  >>> round(bmi(91, 1.80), 2)
  28.09
  >>> round(bmi(70, 1.70), 2)
  24.22
  >>> round(bmi(50, 1.66), 2)
  18.14
  """
  if weight_kg <= 0 or height_m <= 0:
    raise ValueError("Weight and Height must be greater than 0")
  # Return the result directly so no local variable shadows the function name
  return weight_kg / height_m ** 2

In [48]:
doctest.run_docstring_examples(bmi, globals(), verbose = True)

Finding tests in NoName
Trying:
    round(bmi(90.7, 1.82), 2)
Expecting:
    27.38
ok
Trying:
    round(bmi(91, 1.80), 2)
Expecting:
    28.09
ok
Trying:
    round(bmi(70, 1.70), 2)
Expecting:
    24.22
ok
Trying:
    round(bmi(50, 1.66), 2)
Expecting:
    18.14
ok


In [3]:
assert round(bmi(90.7, 1.82), 2) == 27.38
assert round(bmi(91, 1.80), 2) == 28.09
assert round(bmi(70, 1.70), 2) == 24.22
assert round(bmi(50, 1.66), 2) == 18.14

### Step 2: Calculate BODE Score

In [4]:
def bode_score(fev1, walk_distance_6min, mmcr_dyspnea, bmi):
  """Calculate the BODE score (0-10) based on the following parameters:
  FEV1 (% predicted)  >= 65 = 0 points
  ..................  50-64 = 1 point
  ..................  36-49 = 2 points
  ..................  <= 35 = 3 points
  6 min walk distance >= 350 = 0 points
  ................... 250-349 = 1 point
  ................... 150-249 = 2 points
  ................... <= 149 = 3 points
  mMRC Dyspnea Scale  0-1 = 0 points
  ................... 2 = 1 point
  ................... 3 = 2 points
  ................... 4 = 3 points
  BMI > 21 = 0 points
  ... <= 21 = 1 point

  Raises a ValueError if fev1, walk_distance_6min, or bmi is not greater
  than 0, or if mmcr_dyspnea is not a grade from 0 to 4.

  >>> bode_score(50, 250, 3, 20)
  5
  >>> bode_score(67, 350, 1, 23)
  0
  >>> bode_score(33, 120, 4, 19)
  10
  """
  # The mMRC grade is validated separately because 0 is a valid grade
  if fev1 <= 0 or walk_distance_6min <= 0 or bmi <= 0:
    raise ValueError("Parameters must be greater than 0")
  if mmcr_dyspnea not in (0, 1, 2, 3, 4):
    raise ValueError("mMRC dyspnea grade must be between 0 and 4")
  points = 0

  # FEV1 points. Each branch is reached only when the ones above did not match,
  # so boundary and fractional values (64, 49, 49.5) cannot fall through a gap.
  # Values below 36 score 3 points.
  if fev1 >= 65:
    points += 0
  elif fev1 >= 50:
    points += 1
  elif fev1 >= 36:
    points += 2
  else:
    points += 3

  # 6 min walking distance points. Values below 150 score 3 points.
  if walk_distance_6min >= 350:
    points += 0
  elif walk_distance_6min >= 250:
    points += 1
  elif walk_distance_6min >= 150:
    points += 2
  else:
    points += 3

  # mMRC points: grades 0 and 1 both score 0
  if mmcr_dyspnea in (0, 1):
    points += 0
  elif mmcr_dyspnea == 2:
    points += 1
  elif mmcr_dyspnea == 3:
    points += 2
  else:
    points += 3

  # BMI points
  if bmi > 21:
    points += 0
  else:
    points += 1

  return points


In [6]:
doctest.run_docstring_examples(bode_score, globals(), verbose = True )

Finding tests in NoName
Trying:
    bode_score(50, 250, 3, 20)
Expecting:
    5
ok
Trying:
    bode_score(67, 350, 1, 23)
Expecting:
    0
ok
Trying:
    bode_score(33, 120, 4, 19)
Expecting:
    10
ok


In [7]:
assert bode_score(50, 250, 3, 20) == 5
assert bode_score(67, 350, 1, 23) == 0
assert bode_score(33, 120, 4, 19) == 10

### Step 3: Calculate BODE Risk

In [8]:
def bode_risk(score):
  """Return the 4-year survival % that corresponds to a BODE score.
  0-2 = 80%
  3-4 = 64%
  5-6 = 57%
  7-10 = 18%

  Raises a ValueError if the score is not between 0 and 10.

  >>> bode_risk(5)
  57
  >>> bode_risk(8)
  18
  >>> bode_risk(0)
  80
  """
  # The parameter is named score so it does not shadow the bode_score function
  if 0 <= score <= 2:
    return 80
  elif 3 <= score <= 4:
    return 64
  elif 5 <= score <= 6:
    return 57
  elif 7 <= score <= 10:
    return 18
  else:
    raise ValueError("BODE score must be between 0 and 10")


In [9]:
assert bode_risk(5) == 57
assert bode_risk(8) == 18
assert bode_risk(0) == 80

In [10]:
doctest.run_docstring_examples(bode_risk, globals(), verbose = True)

Finding tests in NoName
Trying:
    bode_risk(5)
Expecting:
    57
ok
Trying:
    bode_risk(8)
Expecting:
    18
ok
Trying:
    bode_risk(0)
Expecting:
    80
ok


### Step 4: Load Hospital Data

In [11]:
from pathlib import Path

# json is already imported in the first cell
HOSPITAL_DATA = Path("hospitals.json")
with HOSPITAL_DATA.open() as h:
    hospitals = json.load(h)
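
The bed lookup in Step 5 treats the JSON as a list of hospital systems, each holding a `hospitals` list whose entries have `name` and `beds` keys. Assuming that shape, a rough sketch of flattening it into a name-to-beds dictionary could look like this. The sample JSON and the helper name `beds_by_hospital` are invented for illustration.

```python
import json

# Assumed shape, inferred from the lookup code in Step 5; all values are invented.
SAMPLE_JSON = """
[
  {"hospitals": [{"name": "Example North", "beds": 200},
                 {"name": "Example South", "beds": 80}]},
  {"hospitals": [{"name": "Example West"}]}
]
"""

def beds_by_hospital(systems):
    """Map each hospital name to its bed count, or None when 'beds' is missing."""
    lookup = {}
    for system in systems:
        for hosp in system["hospitals"]:
            beds = hosp.get("beds")
            lookup[hosp["name"]] = int(beds) if beds is not None else None
    return lookup

beds = beds_by_hospital(json.loads(SAMPLE_JSON))
```

Building the dictionary once means each hospital's bed count can be found with a single key lookup instead of a nested scan.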


### Step 5: Main business logic

Call the BODE score and BODE risk functions for each patient.

For each hospital, calculate the average BODE score and average BODE risk, and count the number of COPD cases.
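
The grouping step can be sketched on its own, without any file handling. The example below assumes patient rows shaped like the patient output (name, score, risk, hospital); the helper name `summarize_by_hospital` and the sample rows are made up for illustration.

```python
def summarize_by_hospital(patient_rows):
    """Group (name, score, risk, hospital) rows and average score and risk per hospital."""
    grouped = {}
    for _name, score, risk, hospital in patient_rows:
        entry = grouped.setdefault(hospital, {"scores": [], "risks": []})
        entry["scores"].append(score)
        entry["risks"].append(risk)

    summary = {}
    for hospital, entry in grouped.items():
        count = len(entry["scores"])  # one row per COPD patient
        summary[hospital] = {
            "copd_count": count,
            "avg_score": round(sum(entry["scores"]) / count, 2),
            "avg_risk": round(sum(entry["risks"]) / count, 2),
        }
    return summary

# Invented rows, not real patient data.
sample_rows = [
    ("Patient A", 2, 80, "Example North"),
    ("Patient B", 5, 57, "Example North"),
    ("Patient C", 8, 18, "Example South"),
]
summary = summarize_by_hospital(sample_rows)
```

A hospital only appears in `grouped` once it has at least one patient, so the division by `count` never sees zero.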

In [17]:
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

# Read the patient CSV file
with open(patient_csv, newline='') as csvfile:
  reader = csv.DictReader(csvfile)
  patient_data = list(reader)

# Map each dyspnea description to its mMRC grade (0-4).
# Defined once, outside the loop, instead of being rebuilt for every patient.
dyspnea_description_dict = {"ONLY STRENUOUS EXERCISE": 0,
                            "WHEN HURRYING": 1,
                            "WALKING UPHILL": 1,
                            "SLOWER THAN PEERS": 2,
                            "STOPS WHEN WALKING AT PACE": 2,
                            "STOPS AFTER A FEW MINUTES": 3,
                            "STOPS AFTER 100 YARDS": 3,
                            "BREATHLESS WHEN DRESSING": 4,
                            "UNABLE TO LEAVE HOME": 4,
                            }

patient_results = []

for patient in patient_data:
  fev1 = float(patient['fev_pct'])
  walk_distance_6min = float(patient['distance_in_meters'])
  weight = float(patient['WEIGHT_KG'])
  height = float(patient['HEIGHT_M'])
  bmi_value = bmi(weight, height)

  # An unrecognized description is bad input, so raise instead of silently scoring it as 0
  description = patient['dyspnea_description'].strip().upper()
  if description not in dyspnea_description_dict:
    raise ValueError(f"Unknown dyspnea description: {description!r}")
  mmcr_dyspnea = dyspnea_description_dict[description]

  bode_score_value = bode_score(fev1, walk_distance_6min, mmcr_dyspnea, bmi_value)
  bode_risk_value = bode_risk(bode_score_value)

  patient_results.append([patient['NAME'], bode_score_value, bode_risk_value, patient['hospital']])

#Write Patient_output.csv
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['NAME', 'BODE_SCORE','BODE_RISK', 'HOSPITAL'])
    writer.writerows(patient_results)

with open(hospital_json) as json_file:
  hospital_data = json.load(json_file)

# Group BODE scores and risks by hospital (the file only needs to be open while loading)
hospital_results = []
hospital_results_dict = {}
for patient in patient_results:
  hospital = patient[3]
  bode_score_value = patient[1]
  bode_risk_value = patient[2]
  if hospital not in hospital_results_dict:
    hospital_results_dict[hospital] = {
        "bode_scores": [],
        "bode_risks": [],
        "copd_count": 0
    }
  hospital_results_dict[hospital]["bode_scores"].append(bode_score_value)
  hospital_results_dict[hospital]["bode_risks"].append(bode_risk_value)
  hospital_results_dict[hospital]["copd_count"] += 1
# Average score, average risk, and COPD cases as a percentage of beds for each hospital
for hospital in hospital_results_dict:
  total_bode_score = sum(hospital_results_dict[hospital]["bode_scores"])
  total_bode_risk = sum(hospital_results_dict[hospital]["bode_risks"])
  copd_count = hospital_results_dict[hospital]["copd_count"]

  avg_bode_score = round(total_bode_score / copd_count, 2)
  avg_bode_risk = round(total_bode_risk / copd_count, 2)

  # Look up the bed count by hospital name across all hospital systems
  bed_count = 'Unknown'
  for system in hospital_data:
    for hosp in system['hospitals']:
      if hosp['name'] == hospital:
        bed_count = hosp.get('beds', 'Unknown')
        break
    # Stop scanning the remaining systems once the bed count is known
    if bed_count != 'Unknown':
      break
  if bed_count != 'Unknown':
    bed_count = int(bed_count)
    pc_of_copd_cases = round((copd_count / bed_count) * 100, 2)
  else:
    pc_of_copd_cases = 'Unknown'

  hospital_results.append([hospital, copd_count, pc_of_copd_cases, avg_bode_score, avg_bode_risk])
# Write hospital_output.csv with the column names the assignment specifies
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK'])
    writer.writerows(hospital_results)