### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to run an analysis using a new metric, BODE. BODE improves on an earlier metric and promises insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then evaluate the average BODE score and average BODE survival rate for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME,COPD_COUNT,PCT_OF_COPD_CASES_OVER_BEDS,AVG_SCORE,AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [217]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [218]:
def cal_bmi(weight_kg , height_m):
    """
    Calculate BMI based on weight in kg and height in meters.
    Raise ValueError for invalid inputs (e.g., zero or negative values).

    >>> cal_bmi(68.04, 1.70)
    23.543252595155714

    >>> cal_bmi(0, 1.70)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive numbers.

    Parameters:
    - weight_kg (float): Weight of the patient in kilograms.
    - height_m (float): Height of the patient in meters.

    Returns:
    - float: Calculated BMI.
    """
    if weight_kg<=0 or height_m<=0:
        raise ValueError("Weight and height must be positive numbers.")
    BMI=weight_kg/(height_m**2)
    return BMI

In [219]:
cal_bmi(68.04,1.70)

23.543252595155714

In [220]:
cal_bmi(0,1.70)

ValueError: Weight and height must be positive numbers.
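The docstring examples only count as tests if they are actually run. The following is a minimal sketch of one way to do that with the doctest library; `bmi_sketch` and `run_docstring_tests` are hypothetical names introduced for illustration, and `bmi_sketch` is a stand-in for `cal_bmi` with a rounded expected value.

```python
import doctest


def bmi_sketch(weight_kg, height_m):
    """
    Hypothetical stand-in for cal_bmi, rounded so the expected output is short.

    >>> round(bmi_sketch(68.04, 1.70), 2)
    23.54
    >>> bmi_sketch(0, 1.70)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive numbers.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive numbers.")
    return weight_kg / (height_m ** 2)


def run_docstring_tests(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The globals dict only needs the function under test.
    test = parser.get_doctest(func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_docstring_tests(bmi_sketch)
print(f"{attempted} examples run, {failed} failed")
```

Note that doctest compares the exception message as well as the type, so the text in the docstring has to match the text passed to `ValueError` exactly.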

### Step 2: Calculate BODE Score

In [221]:
def normalize_dyspnea_description(description):
    """
    Normalize variations of dyspnea descriptions to fit known categories.

    The phrases matched below are the values that appear in the
    dyspnea_description column of the dataset.

    >>> normalize_dyspnea_description("STOPS AFTER A FEW MINUTES")
    'Severe breathlessness'

    >>> normalize_dyspnea_description("WHEN HURRYING")
    'Moderate breathlessness'
    """
    description = description.upper().strip()
    if "STOPS AFTER A FEW MINUTES" in description:
        return "Severe breathlessness"
    elif "WHEN HURRYING" in description:
        return "Moderate breathlessness"
    elif "UNABLE TO LEAVE HOME" in description:
        return "Severe breathlessness"
    elif "SLOWER THAN PEERS" in description:
        return "Moderate breathlessness"
    elif "WALKING UPHILL" in description:
        return "Moderate breathlessness"
    elif "ONLY STRENUOUS EXERCISE" in description:
        return "Mild breathlessness"
    elif "BREATHLESS WHEN DRESSING" in description:
        return "Severe breathlessness"
    elif "STOPS WHEN WALKING AT PACE" in description:
        return "Severe breathlessness"
    elif "STOPS AFTER 100 YARDS" in description:
        return "Severe breathlessness"
    return description

def cal_bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters):
    """
    Calculate the BODE score based on BMI, FEV1 percentage, dyspnea description, and distance in meters.

    >>> cal_bode_score(22, 70, 'ONLY STRENUOUS EXERCISE', 400)
    1
    >>> cal_bode_score(18, 40, 'STOPS WHEN WALKING AT PACE', 200)
    8
    """
    bode_score = 0
    if bmi > 21:
        bode_score += 0
    else:
        bode_score += 1
    if fev_pct >= 65:
        bode_score += 0
    elif 50 <= fev_pct < 65:
        bode_score += 1
    elif 36 <= fev_pct < 50:
        bode_score += 2
    else:
        bode_score += 3

    # Normalize dyspnea description and map it to a score
    dyspnea_description = normalize_dyspnea_description(dyspnea_description)
    dyspnea_mapping = {
        "No breathlessness": 0,
        "Mild breathlessness": 1,
        "Moderate breathlessness": 2,
        "Severe breathlessness": 3,
    }

    dyspnea_score = dyspnea_mapping.get(dyspnea_description)
    if dyspnea_score is None:
        # Put the offending value in the exception itself instead of printing it separately
        raise ValueError(f"Invalid dyspnea description: {dyspnea_description}")
    bode_score += dyspnea_score
    # The BODE index scores a walk distance of 350 m or more as 0 points
    if distance_in_meters >= 350:
        bode_score += 0
    elif 250 <= distance_in_meters < 350:
        bode_score += 1
    elif 150 <= distance_in_meters < 250:
        bode_score += 2
    else:
        bode_score += 3

    return bode_score

In [222]:
cal_bode_score(22, 70, 'ONLY STRENUOUS EXERCISE', 400)

1

In [223]:
cal_bode_score(18, 40, 'STOPS WHEN WALKING AT PACE', 200)

8
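The `elif` chain in `normalize_dyspnea_description` could also be written as a lookup table, which keeps the phrase list in one place. The sketch below is an illustration only: `DYSPNEA_CATEGORIES` and `categorize_dyspnea` are hypothetical names, the phrases and categories are copied from the function above, and, unlike the original, it raises a ValueError for an unrecognized description instead of returning it unchanged.

```python
# Phrase fragments and categories copied from normalize_dyspnea_description.
DYSPNEA_CATEGORIES = {
    "STOPS AFTER A FEW MINUTES": "Severe breathlessness",
    "WHEN HURRYING": "Moderate breathlessness",
    "UNABLE TO LEAVE HOME": "Severe breathlessness",
    "SLOWER THAN PEERS": "Moderate breathlessness",
    "WALKING UPHILL": "Moderate breathlessness",
    "ONLY STRENUOUS EXERCISE": "Mild breathlessness",
    "BREATHLESS WHEN DRESSING": "Severe breathlessness",
    "STOPS WHEN WALKING AT PACE": "Severe breathlessness",
    "STOPS AFTER 100 YARDS": "Severe breathlessness",
}


def categorize_dyspnea(description):
    """Return the category for the first phrase fragment found in description."""
    text = description.upper().strip()
    # Dicts preserve insertion order, so fragments are checked in the order listed.
    for fragment, category in DYSPNEA_CATEGORIES.items():
        if fragment in text:
            return category
    # Assumption: an unknown description is treated as invalid input.
    raise ValueError(f"Unrecognized dyspnea description: {description!r}")


print(categorize_dyspnea("  when hurrying "))
```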

### Step 3: Calculate BODE Risk

In [224]:
def cal_bode_risk(bode_score):
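    """
    Calculate the BODE risk category based on the BODE score.

    >>> cal_bode_risk(2)
    'Low Risk'
    >>> cal_bode_risk(3)
    'Moderate Risk'
    >>> cal_bode_risk(6)
    'High Risk'

    Parameters:
    - bode_score (int): BODE score.

    Returns:
    - str: BODE risk category.
    """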
    if bode_score <= 2:
        return "Low Risk"
    elif 3 <= bode_score <= 5:
        return "Moderate Risk"
    else:
        return "High Risk"

In [225]:
cal_bode_risk(2)

'Low Risk'

In [226]:
cal_bode_risk(4)

'Moderate Risk'

In [227]:
assert cal_bode_risk(3) == 'Moderate Risk'
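The assignment asks for a ValueError when input could be wrong, and `cal_bode_risk` accepts any number. A possible variant with input validation is sketched below; `bode_risk_checked` is a hypothetical name, the category thresholds are the ones used above, and the 0 to 10 range is an assumption derived from the component maxima in `cal_bode_score` (1 + 3 + 3 + 3).

```python
def bode_risk_checked(bode_score):
    """Return the risk category, rejecting scores outside the assumed 0-10 range."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(bode_score, bool) or not isinstance(bode_score, int):
        raise ValueError("BODE score must be an integer.")
    if not 0 <= bode_score <= 10:
        raise ValueError("BODE score must be between 0 and 10.")
    if bode_score <= 2:
        return "Low Risk"
    if bode_score <= 5:
        return "Moderate Risk"
    return "High Risk"


# Check each category at its lower and upper boundary.
boundary_cases = {
    0: "Low Risk",
    2: "Low Risk",
    3: "Moderate Risk",
    5: "Moderate Risk",
    6: "High Risk",
    10: "High Risk",
}
for score, expected in boundary_cases.items():
    assert bode_risk_checked(score) == expected
print("All boundary cases passed")
```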

### Step 4: Load Hospital Data

In [237]:
import csv
import json

patient_csv_path = '/content/patient.csv'
hospital_json_path = '/content/hospitals.json'


def load_patient_data(file_path):
    """
    Load patient data from a CSV file.
    """
    required_columns = {'NAME', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital'}
    patient_data = []

    with open(file_path, mode='r') as file:
        reader = csv.DictReader(file)

        missing_cols = required_columns - set(reader.fieldnames)
        if missing_cols:
            raise ValueError(f"Missing columns: {missing_cols}")
        for row in reader:
            try:
                patient_data.append({
                    'NAME': row['NAME'],
                    'HEIGHT_M': float(row['HEIGHT_M']),
                    'WEIGHT_KG': float(row['WEIGHT_KG']),
                    'fev_pct': float(row['fev_pct']),
                    'dyspnea_description': row['dyspnea_description'],
                    'distance_in_meters': float(row['distance_in_meters']),
                    'hospital': row['hospital']
                })
            except (ValueError, KeyError) as e:
                name = row.get('NAME', 'Unknown')
                print(f"Skipping {name} due to error: {e}")

    return patient_data

try:
    patients = load_patient_data(patient_csv_path)
    print(f"Loaded {len(patients)} patients from '{patient_csv_path}'.")
except ValueError as e:
    print(f"Error loading patient data: {e}")


def load_hospital_info(json_path):
    """
    Load hospital information from a JSON file.
    """
    try:
        with open(json_path, 'r') as json_file:
            return json.load(json_file)
    except json.JSONDecodeError as e:
        # Chain the original exception so the decoding details are not lost
        raise ValueError(f"Error decoding JSON: {e}") from e


try:
    hospitals = load_hospital_info(hospital_json_path)
    print(f"Loaded {len(hospitals)} hospitals from '{hospital_json_path}'.")
except ValueError as e:
    print(f"Error loading hospital data: {e}")



Loaded 1000 patients from '/content/patient.csv'.
Loaded 3 hospitals from '/content/hospitals.json'.
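The row-skipping behavior of the loader can be exercised without touching the real files by reading from an in-memory buffer. The sketch below assumes the same column names as `load_patient_data`; `SAMPLE_CSV`, `NUMERIC_COLUMNS`, and `parse_patient_rows` are hypothetical names, and the two patient rows are made-up test data, not values from the dataset.

```python
import csv
import io

# Made-up rows for illustration: the second has a non-numeric height.
SAMPLE_CSV = """NAME,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital
Patient A,1.70,68.04,70,ONLY STRENUOUS EXERCISE,400,Hospital X
Patient B,not-a-number,80,55,WHEN HURRYING,300,Hospital Y
"""

NUMERIC_COLUMNS = ("HEIGHT_M", "WEIGHT_KG", "fev_pct", "distance_in_meters")


def parse_patient_rows(file_obj):
    """Return (loaded_rows, skipped_names) for any file-like CSV source."""
    loaded, skipped = [], []
    for row in csv.DictReader(file_obj):
        try:
            # Convert each numeric column; float() raises ValueError on bad text.
            for column in NUMERIC_COLUMNS:
                row[column] = float(row[column])
        except ValueError:
            skipped.append(row["NAME"])
            continue
        loaded.append(row)
    return loaded, skipped


loaded, skipped = parse_patient_rows(io.StringIO(SAMPLE_CSV))
print(f"Loaded {len(loaded)} rows, skipped {skipped}")
```

Because the function takes a file object rather than a path, the same code could be passed an `open(...)` handle for the real file.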


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.
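The per-hospital aggregation described above can be sketched independently of the file handling. In the example below, `summarize_by_hospital` is a hypothetical helper, the hospital names, bed counts, and patient tuples are made-up values, and risk is assumed to be numeric so that it can be averaged.

```python
def summarize_by_hospital(patient_rows, beds_by_hospital):
    """Return [hospital, count, pct_over_beds, avg_score, avg_risk] per hospital."""
    totals = {name: {"score": 0, "risk": 0.0, "count": 0} for name in beds_by_hospital}
    for hospital, score, risk in patient_rows:
        if hospital not in totals:
            continue  # patient's hospital is not in the hospital list
        totals[hospital]["score"] += score
        totals[hospital]["risk"] += risk
        totals[hospital]["count"] += 1

    summary = []
    for hospital, beds in beds_by_hospital.items():
        count = totals[hospital]["count"]
        # Guard against division by zero for hospitals with no cases or no beds.
        avg_score = totals[hospital]["score"] / count if count else 0
        avg_risk = totals[hospital]["risk"] / count if count else 0
        pct_over_beds = count / beds * 100 if beds > 0 else 0
        summary.append([hospital, count, pct_over_beds, avg_score, avg_risk])
    return summary


# Made-up input: (hospital, BODE score, numeric risk) per patient.
rows = [("Hospital X", 2, 0.2), ("Hospital X", 4, 0.4), ("Hospital Y", 6, 0.6)]
beds = {"Hospital X": 100, "Hospital Y": 50, "Hospital Z": 20}
for line in summarize_by_hospital(rows, beds):
    print(line)
```

The summary loop runs only after every patient has been accumulated, which is what keeps the averages from being computed on partial totals.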

In [240]:
import csv
import json

patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

with open(hospital_json, 'r') as jsonfile:
    hospital_data = json.load(jsonfile)

hospital_metrics = {}
for entry in hospital_data:
    for hospital in entry['Hospitals']:
        hospital_metrics[hospital['Hospital']] = {
            'total_bode_score': 0,
            'total_risk': 0,
            'copd_count': 0,
            'beds': hospital['Beds']
        }

# cal_bmi and cal_bode_score are reused from Steps 1 and 2. Redefining them
# here would silently replace the validated versions with stubs.

def cal_bode_risk(bode_score):
    # Placeholder numeric risk (score * 0.1) so that risk can be averaged per
    # hospital. This overrides the categorical cal_bode_risk from Step 3.
    return bode_score * 0.1

patient_results = []
with open(patient_csv, 'r', encoding='utf-8-sig') as csvfile:
    reader = csv.DictReader(csvfile)
    header_names = reader.fieldnames
    name_column = next((col for col in header_names if col.lower() == 'name'), None)
    if name_column is None:
        raise ValueError("Name column not found in the CSV file.")
    for row in reader:
        name = row[name_column]
        try:
            # Extract and validate data
            height_m = float(row['HEIGHT_M'])
            weight_kg = float(row['WEIGHT_KG'])
            fev_pct = float(row['fev_pct'])
            dyspnea_description = row['dyspnea_description']
            distance_in_meters = float(row['distance_in_meters'])
            hospital_name = row['hospital']

            # Score the patient inside the try block so that a ValueError
            # raised by a calculation also skips the row
            bmi = cal_bmi(weight_kg, height_m)
            BODE_SCORE = cal_bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters)
            BODE_RISK = cal_bode_risk(BODE_SCORE)
        except ValueError as e:
            print(f"Skipping {name} due to invalid data: {e}")
            continue

        patient_results.append([name, BODE_SCORE, BODE_RISK, hospital_name])

        # Accumulate per-hospital totals for hospitals listed in the JSON file
        if hospital_name in hospital_metrics:
            hospital_metrics[hospital_name]['total_bode_score'] += BODE_SCORE
            hospital_metrics[hospital_name]['total_risk'] += BODE_RISK
            hospital_metrics[hospital_name]['copd_count'] += 1

# Build one summary row per hospital once every patient has been processed
hospital_output_list = []
for hospital_name, metrics in hospital_metrics.items():
    copd_count = metrics['copd_count']
    if copd_count > 0:
        avg_bode_score = metrics['total_bode_score'] / copd_count
        avg_bode_risk = metrics['total_risk'] / copd_count
    else:
        avg_bode_score = 0
        avg_bode_risk = 0
    pct_of_copd_cases = (copd_count / metrics['beds']) * 100 if metrics['beds'] > 0 else 0
    hospital_output_list.append([hospital_name, copd_count, pct_of_copd_cases, avg_bode_score, avg_bode_risk])

# Write Patient_output.csv
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL'])
    writer.writerows(patient_results)

# Write Hospital_output.csv
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK'])
    writer.writerows(hospital_output_list)
