### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME,COPD_COUNT,PCT_OF_COPD_CASES_OVER_BEDS,AVG_SCORE,AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [1]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [2]:
def calculate_bmi(weight_kg, height_m):
    """
    Calculate the Body Mass Index (BMI).

    Parameters:
    weight_kg (float): The weight of the individual in kilograms.
    height_m (float): The height of the individual in meters.

    Returns:
    float: The calculated BMI.

    Raises:
    ValueError: If weight or height is not positive.

    >>> calculate_bmi(70, 1.75)
    22.857142857142858
    >>> calculate_bmi(0, 1.75)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive values.
    >>> calculate_bmi(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive values.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")

    bmi = weight_kg / (height_m ** 2)
    return bmi
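The docstring examples above only count as test cases if something executes them, and no cell in this notebook calls `doctest`. A minimal sketch of running one function's doctests and reading back the pass/fail counts follows. The helper name `run_doctests` is hypothetical, and the BMI function is repeated (with a rounded example) only so the block runs on its own.

```python
import doctest


def calculate_bmi(weight_kg, height_m):
    """
    >>> round(calculate_bmi(70, 1.75), 2)
    22.86
    >>> calculate_bmi(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive values.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")
    return weight_kg / (height_m ** 2)


def run_doctests(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples a namespace that contains the function under test
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = run_doctests(calculate_bmi)
print(f"{attempted} examples run, {failed} failed")
```

In a notebook, calling `doctest.testmod()` in a final cell is a common alternative that checks every function defined so far.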


### Step 2: Calculate BODE Score

In [6]:
def calculate_bode_score(bmi, fev_pct, dyspnea_grade, distance_m):
    """
    Calculate the BODE score based on BMI, FEV1 percentage, dyspnea grade, and distance walked.

    Parameters:
    bmi (float): The Body Mass Index of the patient.
    fev_pct (float): The FEV1 percentage predicted.
    dyspnea_grade (int): The dyspnea grade (0-4).
    distance_m (float): The distance walked in meters.

    Returns:
    tuple: A tuple containing the BODE score and risk category.

    Raises:
    ValueError: If dyspnea_grade is not between 0 and 4.

    Examples:
    >>> calculate_bode_score(22, 75, 2, 300)
    (2, 'Low risk')
    >>> calculate_bode_score(30, 40, 3, 100)
    (7, 'High risk')
    >>> calculate_bode_score(24, 85, 0, 400)
    (0, 'Low risk')
    >>> calculate_bode_score(22, 70, 5, 200)
    Traceback (most recent call last):
        ...
    ValueError: Dyspnea grade must be between 0 and 4.
    """
    # Validate the dyspnea grade before scoring anything
    if dyspnea_grade < 0 or dyspnea_grade > 4:
        raise ValueError("Dyspnea grade must be between 0 and 4.")

    # Point values follow the published BODE index table (total range 0-10)
    # BMI points: 21 or below scores 1, anything higher scores 0
    bmi_points = 1 if bmi <= 21 else 0

    # FEV1 (% of predicted) points
    if fev_pct >= 65:
        fev_points = 0
    elif fev_pct >= 50:
        fev_points = 1
    elif fev_pct > 35:
        fev_points = 2
    else:
        fev_points = 3

    # Dyspnea points: grades 0 and 1 score 0, then 2 -> 1, 3 -> 2, 4 -> 3
    dyspnea_points = max(dyspnea_grade - 1, 0)

    # Six-minute walk distance points
    if distance_m >= 350:
        distance_points = 0
    elif distance_m >= 250:
        distance_points = 1
    elif distance_m >= 150:
        distance_points = 2
    else:
        distance_points = 3

    # Total BODE score
    bode_score = bmi_points + fev_points + dyspnea_points + distance_points

    # Risk category based on BODE score
    if bode_score <= 2:
        risk_category = 'Low risk'
    elif 3 <= bode_score <= 5:
        risk_category = 'Moderate risk'
    else:
        risk_category = 'High risk'

    return bode_score, risk_category



### Step 3: Calculate BODE Risk

In [7]:
def calculate_bmi(weight_kg, height_m):
    """
    Calculate the Body Mass Index (BMI).

    Parameters:
    weight_kg (float): The weight of the patient in kilograms.
    height_m (float): The height of the patient in meters.

    Returns:
    float: The calculated BMI value.

    Raises:
    ValueError: If weight is non-positive or height is non-positive.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")

    return weight_kg / (height_m ** 2)

def calculate_bode_score(bmi, fev_pct, dyspnea_grade, distance_m):
    """
    Calculate the BODE score based on BMI, FEV1 percentage, dyspnea grade, and distance walked.

    Parameters:
    bmi (float): The Body Mass Index of the patient.
    fev_pct (float): The FEV1 percentage predicted.
    dyspnea_grade (int): The dyspnea grade (0-4).
    distance_m (float): The distance walked in meters.

    Returns:
    tuple: A tuple containing the BODE score and risk category.

    Raises:
    ValueError: If dyspnea_grade is not between 0 and 4.

    Examples:
    >>> calculate_bode_score(22, 75, 2, 300)
    (2, 'Low risk')
    >>> calculate_bode_score(30, 40, 3, 100)
    (7, 'High risk')
    >>> calculate_bode_score(24, 85, 0, 400)
    (0, 'Low risk')
    >>> calculate_bode_score(22, 70, 5, 200)
    Traceback (most recent call last):
        ...
    ValueError: Dyspnea grade must be between 0 and 4.
    """
    # Validate the dyspnea grade before scoring anything
    if dyspnea_grade < 0 or dyspnea_grade > 4:
        raise ValueError("Dyspnea grade must be between 0 and 4.")

    # Point values follow the published BODE index table (total range 0-10)
    # BMI points: 21 or below scores 1, anything higher scores 0
    bmi_points = 1 if bmi <= 21 else 0

    # FEV1 (% of predicted) points
    if fev_pct >= 65:
        fev_points = 0
    elif fev_pct >= 50:
        fev_points = 1
    elif fev_pct > 35:
        fev_points = 2
    else:
        fev_points = 3

    # Dyspnea points: grades 0 and 1 score 0, then 2 -> 1, 3 -> 2, 4 -> 3
    dyspnea_points = max(dyspnea_grade - 1, 0)

    # Six-minute walk distance points
    if distance_m >= 350:
        distance_points = 0
    elif distance_m >= 250:
        distance_points = 1
    elif distance_m >= 150:
        distance_points = 2
    else:
        distance_points = 3

    # Total BODE score
    bode_score = bmi_points + fev_points + dyspnea_points + distance_points

    # Risk category based on BODE score
    if bode_score <= 2:
        risk_category = 'Low risk'
    elif 3 <= bode_score <= 5:
        risk_category = 'Moderate risk'
    else:
        risk_category = 'High risk'

    return bode_score, risk_category
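The assignment asks for BODE survival rates as well as scores, while the function above returns a three-level risk label. A minimal sketch of a survival lookup follows, assuming the approximate 4-year survival bands usually quoted for the BODE index (80% for scores 0-2, 67% for 3-4, 57% for 5-6, 18% for 7-10). These figures should be checked against the MDCalc page linked at the top. The name `bode_survival_rate` is hypothetical and is not used elsewhere in this notebook.

```python
def bode_survival_rate(bode_score):
    """
    Return the approximate 4-year survival percentage for a BODE score.

    The bands below are an assumption taken from the commonly quoted
    BODE index table; verify them against the linked definition.

    >>> bode_survival_rate(0)
    80
    >>> bode_survival_rate(4)
    67
    >>> bode_survival_rate(6)
    57
    >>> bode_survival_rate(10)
    18
    >>> bode_survival_rate(11)
    Traceback (most recent call last):
        ...
    ValueError: BODE score must be between 0 and 10.
    """
    if not 0 <= bode_score <= 10:
        raise ValueError("BODE score must be between 0 and 10.")
    if bode_score <= 2:
        return 80
    if bode_score <= 4:
        return 67
    if bode_score <= 6:
        return 57
    return 18


print(bode_survival_rate(7))
```

A numeric rate like this can be averaged per hospital directly, which a text label cannot.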


### Step 4: Load Hospital Data

In [9]:
import csv

def load_patient_data(file_path):
    """
    Load patient data from a CSV file and calculate BODE scores and risks.

    Parameters:
    file_path (str): The path to the patient input CSV file.

    Returns:
    list: A list of dictionaries containing patient BODE information.

    Notes:
    Rows with invalid values (e.g., non-numeric fields or an unknown dyspnea
    description) raise ValueError internally; the error is printed and the
    row is skipped.

    Examples:
    >>> load_patient_data('patients.csv')  # doctest: +SKIP
    [{'NAME': 'John Doe', 'BODE_SCORE': 4, 'BODE_RISK': 'Moderate risk', 'HOSPITAL': 'City Hospital'}, ...]
    """
    patients = []

    with open(file_path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        for row in reader:
            try:
                # Parse the relevant fields
                name = row['NAME']
                weight_kg = float(row['WEIGHT_KG'])
                height_m = float(row['HEIGHT_M'])
                fev_pct = float(row['fev_pct'])
                dyspnea_description = row['dyspnea_description']
                distance_meters = float(row['distance_in_meters'])
                hospital = row['hospital']

                # Calculate BMI
                bmi = calculate_bmi(weight_kg, height_m)

                # Map the dyspnea description text to a grade from 0 to 4
                dyspnea_grade_mapping = {
                    'None': 0,
                    'Mild': 1,
                    'Moderate': 2,
                    'Severe': 3,
                    'Very Severe': 4
                }

                dyspnea_grade = dyspnea_grade_mapping.get(dyspnea_description, None)
                if dyspnea_grade is None:
                    raise ValueError(f"Invalid dyspnea description: {dyspnea_description}")

                # Calculate BODE score and risk
                bode_score, bode_risk = calculate_bode_score(bmi, fev_pct, dyspnea_grade, distance_meters)

                # Append results to patients list
                patients.append({
                    'NAME': name,
                    'BODE_SCORE': bode_score,
                    'BODE_RISK': bode_risk,
                    'HOSPITAL': hospital
                })

            except ValueError as e:
                print(f"Error processing row {row}: {e}")

    return patients
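The run output at the end of this notebook shows every row being rejected because the file's `dyspnea_description` values (for example `WHEN HURRYING`) are not keys in the mapping above. A minimal sketch of a lookup keyed on the file's own wording follows. It lists only the three descriptions visible in that output, and the grades assigned to them are an assumption based on how the phrases line up with the dyspnea scale. The remaining phrases would have to be read from the input file. The names `OBSERVED_DYSPNEA_GRADES` and `dyspnea_grade_from_description` are hypothetical.

```python
# Descriptions seen in the notebook's error output. The grades are an
# assumption made by matching each phrase to the dyspnea scale wording.
OBSERVED_DYSPNEA_GRADES = {
    "WHEN HURRYING": 1,
    "STOPS AFTER A FEW MINUTES": 3,
    "BREATHLESS WHEN DRESSING": 4,
}


def dyspnea_grade_from_description(description, mapping=OBSERVED_DYSPNEA_GRADES):
    """
    Convert a dyspnea description to a grade, ignoring case and extra spaces.

    >>> dyspnea_grade_from_description("when hurrying")
    1
    >>> dyspnea_grade_from_description("Mild")
    Traceback (most recent call last):
        ...
    ValueError: Invalid dyspnea description: Mild
    """
    # Normalize case and whitespace so minor formatting differences still match
    key = " ".join(description.upper().split())
    if key not in mapping:
        raise ValueError(f"Invalid dyspnea description: {description}")
    return mapping[key]


print(dyspnea_grade_from_description("  Breathless when dressing "))
```

Defining the mapping once at module level also avoids rebuilding the dictionary for every row.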



### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.
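The hospital output also calls for PCT_OF_COPD_CASES_OVER_BEDS, which needs a bed count per hospital, and the patient file has no such column. A minimal sketch follows, assuming the bed counts arrive as JSON records with `name` and `beds` fields. That structure, the bed numbers, and the function names are all hypothetical and shown only for illustration.

```python
import json

# Hypothetical hospital records; the real file layout and bed counts are unknown.
HOSPITALS_JSON = """
[
  {"name": "ST.LUKE'S", "beds": 400},
  {"name": "SAINT LOUIS UNIVERSITY", "beds": 250}
]
"""


def load_bed_counts(json_text):
    """Parse hospital records into a {hospital name: bed count} dictionary."""
    bed_counts = {}
    for record in json.loads(json_text):
        beds = int(record["beds"])
        if beds <= 0:
            raise ValueError(f"Bed count must be positive for {record['name']}.")
        bed_counts[record["name"]] = beds
    return bed_counts


def pct_of_cases_over_beds(copd_count, beds):
    """Return COPD cases as a percentage of beds, rounded to two decimals."""
    if copd_count < 0 or beds <= 0:
        raise ValueError("Case count must be non-negative and beds positive.")
    return round(100 * copd_count / beds, 2)


bed_counts = load_bed_counts(HOSPITALS_JSON)
print(pct_of_cases_over_beds(20, bed_counts["ST.LUKE'S"]))
```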

In [16]:
import csv

def calculate_bmi(weight_kg, height_m):
    """
    Calculate the Body Mass Index (BMI).

    Parameters:
    weight_kg (float): The weight of the patient in kilograms.
    height_m (float): The height of the patient in meters.

    Returns:
    float: The calculated BMI value.

    Raises:
    ValueError: If weight is non-positive or height is non-positive.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")

    return weight_kg / (height_m ** 2)

def calculate_bode_score(bmi, fev_pct, dyspnea_grade, distance_m):
    """
    Calculate the BODE score based on BMI, FEV1 percentage, dyspnea grade, and distance walked.

    Parameters:
    bmi (float): The Body Mass Index of the patient.
    fev_pct (float): The FEV1 percentage predicted.
    dyspnea_grade (int): The dyspnea grade (0-4).
    distance_m (float): The distance walked in meters.

    Returns:
    tuple: A tuple containing the BODE score and risk category.

    Raises:
    ValueError: If dyspnea_grade is not between 0 and 4.
    """
    # Validate the dyspnea grade before scoring anything
    if dyspnea_grade < 0 or dyspnea_grade > 4:
        raise ValueError("Dyspnea grade must be between 0 and 4.")

    # Point values follow the published BODE index table (total range 0-10)
    # BMI points: 21 or below scores 1, anything higher scores 0
    bmi_points = 1 if bmi <= 21 else 0

    # FEV1 (% of predicted) points
    if fev_pct >= 65:
        fev_points = 0
    elif fev_pct >= 50:
        fev_points = 1
    elif fev_pct > 35:
        fev_points = 2
    else:
        fev_points = 3

    # Dyspnea points: grades 0 and 1 score 0, then 2 -> 1, 3 -> 2, 4 -> 3
    dyspnea_points = max(dyspnea_grade - 1, 0)

    # Six-minute walk distance points
    if distance_m >= 350:
        distance_points = 0
    elif distance_m >= 250:
        distance_points = 1
    elif distance_m >= 150:
        distance_points = 2
    else:
        distance_points = 3

    # Total BODE score
    bode_score = bmi_points + fev_points + dyspnea_points + distance_points

    # Risk category based on BODE score
    if bode_score <= 2:
        risk_category = 'Low risk'
    elif 3 <= bode_score <= 5:
        risk_category = 'Moderate risk'
    else:
        risk_category = 'High risk'

    return bode_score, risk_category

def load_patient_data(file_path):
    """
    Load patient data from a CSV file and calculate BODE scores and risks.

    Parameters:
    file_path (str): The path to the patient input CSV file.

    Returns:
    list: A list of dictionaries containing patient BODE information.

    Notes:
    Rows with invalid values (e.g., non-numeric fields or an unknown dyspnea
    description) raise ValueError internally; the error is printed and the
    row is skipped.
    """
    patients = []

    with open(file_path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        for row in reader:
            try:
                # Parse the relevant fields
                name = row['NAME']
                weight_kg = float(row['WEIGHT_KG'])
                height_m = float(row['HEIGHT_M'])
                fev_pct = float(row['fev_pct'])
                dyspnea_description = row['dyspnea_description']
                distance_meters = float(row['distance_in_meters'])
                hospital = row['hospital']

                # Calculate BMI
                bmi = calculate_bmi(weight_kg, height_m)

                # Map dyspnea description to grade
                dyspnea_grade_mapping = {
                    'None': 0,
                    'Mild': 1,
                    'Moderate': 2,
                    'Severe': 3,
                    'Very Severe': 4
                }

                dyspnea_grade = dyspnea_grade_mapping.get(dyspnea_description, None)
                if dyspnea_grade is None:
                    raise ValueError(f"Invalid dyspnea description: {dyspnea_description}")

                # Calculate BODE score and risk
                bode_score, bode_risk = calculate_bode_score(bmi, fev_pct, dyspnea_grade, distance_meters)

                # Append results to patients list
                patients.append({
                    'NAME': name,
                    'BODE_SCORE': bode_score,
                    'BODE_RISK': bode_risk,
                    'HOSPITAL': hospital
                })

            except ValueError as e:
                print(f"Error processing row {row}: {e}")

    return patients

def main(file_path):
    """
    Main function to load patient data, calculate BODE scores and risks,
    and generate summary statistics for each hospital.

    Parameters:
    file_path (str): The path to the patient input CSV file.

    Returns:
    tuple: A list of patients with their BODE information and a summary of hospitals.
    """
    # Load patient data
    patients = load_patient_data(file_path)

    # Initialize hospital summary dictionary
    hospital_summary = {}

    for patient in patients:
        hospital = patient['HOSPITAL']
        bode_score = patient['BODE_SCORE']
        bode_risk = patient['BODE_RISK']

        # Initialize hospital data if not present
        if hospital not in hospital_summary:
            hospital_summary[hospital] = {
                'COPD_COUNT': 0,
                'TOTAL_BODE_SCORE': 0,
                'RISK_COUNT': {
                    'Low risk': 0,
                    'Moderate risk': 0,
                    'High risk': 0
                }
            }

        # Update hospital summary data
        hospital_summary[hospital]['COPD_COUNT'] += 1
        hospital_summary[hospital]['TOTAL_BODE_SCORE'] += bode_score
        hospital_summary[hospital]['RISK_COUNT'][bode_risk] += 1

    # Prepare output data
    patient_output = [
        {
            'NAME': patient['NAME'],
            'BODE_SCORE': patient['BODE_SCORE'],
            'BODE_RISK': patient['BODE_RISK'],
            'HOSPITAL': patient['HOSPITAL']
        }
        for patient in patients
    ]

    hospital_output = []
    for hospital, data in hospital_summary.items():
        avg_score = data['TOTAL_BODE_SCORE'] / data['COPD_COUNT']
        avg_risk = max(data['RISK_COUNT'], key=data['RISK_COUNT'].get)  # Get the most frequent risk category

        hospital_output.append({
            'HOSPITAL_NAME': hospital,
            'COPD_COUNT': data['COPD_COUNT'],
            'PCT_OF_COPD_CASES_OVER_BEDS': 0,  # Placeholder for future use
            'AVG_SCORE': avg_score,
            'AVG_RISK': avg_risk
        })

    return patient_output, hospital_output

def save_output(patient_output, hospital_output):
    """
    Save the patient and hospital output data to CSV files.

    Parameters:
    patient_output (list): A list of patient data dictionaries.
    hospital_output (list): A list of hospital data dictionaries.
    """
    # Save patient output to CSV
    with open('patient_output.csv', 'w', newline='') as csvfile:
        fieldnames = ['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for patient in patient_output:
            writer.writerow(patient)

    # Save hospital output to CSV
    with open('hospital_output.csv', 'w', newline='') as csvfile:
        fieldnames = ['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for hospital in hospital_output:
            writer.writerow(hospital)

if __name__ == "__main__":
    patient_file_path = '/content/patient.csv'  # Updated file path
    patient_output, hospital_output = main(patient_file_path)
    save_output(patient_output, hospital_output)


Error processing row {'NAME': 'Vanessa Roberts', 'SSN': '295-82-3703', 'LANGUAGE': 'Belarusian', 'JOB': 'Teacher English as a foreign language', 'HEIGHT_M': '1.72', 'WEIGHT_KG': '90.28', 'fev_pct': '57.73', 'dyspnea_description': 'STOPS AFTER A FEW MINUTES', 'distance_in_meters': '367.9', 'hospital': "ST.LUKE'S"}: Invalid dyspnea description: STOPS AFTER A FEW MINUTES
Error processing row {'NAME': 'Christopher Fox', 'SSN': '286-30-9664', 'LANGUAGE': 'Macedonian', 'JOB': 'Local government officer', 'HEIGHT_M': '1.64', 'WEIGHT_KG': '83.09', 'fev_pct': '61.6', 'dyspnea_description': 'WHEN HURRYING', 'distance_in_meters': '184.16', 'hospital': 'SAINT LOUIS UNIVERSITY'}: Invalid dyspnea description: WHEN HURRYING
Error processing row {'NAME': 'Benjamin Johnston', 'SSN': '139-07-4381', 'LANGUAGE': 'Kirghiz', 'JOB': 'Multimedia programmer', 'HEIGHT_M': '1.61', 'WEIGHT_KG': '94.91', 'fev_pct': '83.11', 'dyspnea_description': 'BREATHLESS WHEN DRESSING', 'distance_in_meters': '260.66', 'hospital