### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME,COPD_COUNT,PCT_OF_COPD_CASES_OVER_BEDS,AVG_SCORE,AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.
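
One way to cover the error path with doctest is to put the expected traceback in the docstring. The sketch below uses a hypothetical helper, `require_positive`, that is not part of the assignment; it only illustrates the pattern of documenting, testing, and raising in one place.

```python
import doctest


def require_positive(value, label):
    """
    Return value as a float, or raise ValueError if it is not a positive number.

    >>> require_positive("1.75", "height")
    1.75
    >>> require_positive(0, "height")
    Traceback (most recent call last):
        ...
    ValueError: height must be greater than zero, got 0
    >>> require_positive("tall", "height")
    Traceback (most recent call last):
        ...
    ValueError: height must be a number, got 'tall'
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Re-raise with a message that names the offending field
        raise ValueError(f"{label} must be a number, got {value!r}") from None
    if number <= 0:
        raise ValueError(f"{label} must be greater than zero, got {value!r}")
    return number


if __name__ == "__main__":
    doctest.testmod()
```

doctest matches only the exception type and message, so the `...` line stands in for the stack frames.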

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [101]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [102]:
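def calculate_bmi(weight, height):
    """
    Calculate BMI (Body Mass Index) given the weight (in kg) and height (in meters).

    Formula: BMI = weight / (height ** 2)

    Raises ValueError if weight or height is not a positive number.

    >>> calculate_bmi(70, 1.75)
    22.86
    >>> calculate_bmi(50, 1.60)
    19.53
    >>> calculate_bmi(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: weight and height must be positive numbers
    """
    # Reject zero or negative inputs before dividing
    if weight <= 0 or height <= 0:
        raise ValueError("weight and height must be positive numbers")
    bmi = weight / (height ** 2)
    return round(bmi, 2)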




In [103]:
print(calculate_bmi(70, 1.75))  # Output: 22.86
print(calculate_bmi(50, 1.60))  # Output: 19.53


22.86
19.53


### Step 2: Calculate BODE Score

In [104]:
def calculate_bode_score(bmi, fev1_percentage, mrc_score, walk_distance):
    """
    Calculate the BODE score for a patient.

    BODE score is based on:
    - BMI (Body Mass Index)
    - FEV1% predicted (Forced Expiratory Volume in 1 second as % of predicted)
    - MRC Dyspnoea Scale grade (1 to 5)
    - 6-minute walk distance (in meters)

    BODE score ranges from 0 to 10.

    Raises ValueError if any input is outside its valid range.

    >>> calculate_bode_score(22, 70, 2, 400)
    0
    >>> calculate_bode_score(18, 45, 4, 150)
    7
    >>> calculate_bode_score(18, 30, 5, 100)
    10
    >>> calculate_bode_score(22, 70, 6, 400)
    Traceback (most recent call last):
        ...
    ValueError: mrc_score must be an integer from 1 to 5
    """
    if mrc_score not in (1, 2, 3, 4, 5):
        raise ValueError("mrc_score must be an integer from 1 to 5")
    if bmi <= 0 or fev1_percentage < 0 or walk_distance < 0:
        raise ValueError("bmi must be positive; fev1_percentage and walk_distance cannot be negative")

    # BMI points (0 if BMI > 21, 1 if BMI <= 21)
    bmi_points = 1 if bmi <= 21 else 0

    # FEV1 percentage points
    if fev1_percentage >= 65:
        fev1_points = 0
    elif 50 <= fev1_percentage < 65:
        fev1_points = 1
    elif 36 <= fev1_percentage < 50:
        fev1_points = 2
    else:
        fev1_points = 3

    # Dyspnea points: MRC grades 1 and 2 score 0; grades 3, 4, and 5 score 1, 2, and 3.
    # This caps the dyspnea component at 3 so the total cannot exceed 10.
    mrc_points = max(0, mrc_score - 2)

    # 6-minute walk distance points
    if walk_distance >= 350:
        walk_points = 0
    elif 250 <= walk_distance < 350:
        walk_points = 1
    elif 150 <= walk_distance < 250:
        walk_points = 2
    else:
        walk_points = 3

    # BODE score is the sum of all points
    bode_score = bmi_points + fev1_points + mrc_points + walk_points
    return bode_score

# Running doctest
if __name__ == "__main__":
    import doctest
    doctest.testmod()


In [105]:
print(calculate_bode_score(22, 70, 2, 400))  # Output: 0
print(calculate_bode_score(18, 45, 4, 150))  # Output: 7


0
7


### Step 3: Calculate BODE Risk

In [106]:
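def calculate_bode_risk(bode_score):
    """
    Calculate BODE risk category based on the BODE score.

    Risk categories:
    - 0-2: Low risk
    - 3-4: Moderate risk
    - 5-6: High risk
    - 7-10: Very high risk

    Raises ValueError if the score is outside the 0-10 range.

    >>> calculate_bode_risk(1)
    'Low risk'
    >>> calculate_bode_risk(5)
    'High risk'
    >>> calculate_bode_risk(8)
    'Very high risk'
    >>> calculate_bode_risk(11)
    Traceback (most recent call last):
        ...
    ValueError: bode_score must be between 0 and 10
    """
    if not 0 <= bode_score <= 10:
        raise ValueError("bode_score must be between 0 and 10")

    if bode_score <= 2:
        return 'Low risk'
    elif 3 <= bode_score <= 4:
        return 'Moderate risk'
    elif 5 <= bode_score <= 6:
        return 'High risk'
    else:
        return 'Very high risk'
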
# Running doctest
if __name__ == "__main__":
    import doctest
    doctest.testmod()


In [107]:
print(calculate_bode_risk(1))  # Output: 'Low risk'
print(calculate_bode_risk(5))  # Output: 'High risk'
print(calculate_bode_risk(8))  # Output: 'Very high risk'


Low risk
High risk
Very high risk
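
The assignment also asks for survival rates, which are numeric and can therefore be averaged per hospital. A minimal sketch follows. It assumes the approximate 4-year survival figures commonly quoted for the four BODE bands (80%, 67%, 57%, 18%); those values do not appear in this notebook and should be checked against the MDCalc page linked at the top before use. The names `SURVIVAL_BANDS` and `bode_survival_pct` are illustrative.

```python
# ASSUMPTION: approximate 4-year survival (percent) per BODE band, stored as
# (highest score in band, survival). Verify these figures against the reference.
SURVIVAL_BANDS = [(2, 80), (4, 67), (6, 57), (10, 18)]


def bode_survival_pct(bode_score, bands=SURVIVAL_BANDS):
    """
    Return the estimated 4-year survival percentage for a BODE score.

    >>> bode_survival_pct(1)
    80
    >>> bode_survival_pct(7)
    18
    """
    if isinstance(bode_score, bool) or not isinstance(bode_score, int):
        raise ValueError("bode_score must be an integer")
    if not 0 <= bode_score <= 10:
        raise ValueError("bode_score must be between 0 and 10")
    # Bands are sorted by upper bound, so the first match is the right one
    for upper_bound, survival in bands:
        if bode_score <= upper_bound:
            return survival
    raise ValueError("bands do not cover the given score")
```

Passing the table as a parameter keeps the lookup logic separate from the figures, so corrected values can be swapped in without touching the function.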


### Step 4: Generate Sample Patient Data

In [108]:
import csv
import random

def generate_ssn():
    return f"{random.randint(100, 999)}-{random.randint(10, 99)}-{random.randint(1000, 9999)}"

def generate_name():
    first_names = ["John", "Jane", "Mike", "Emily", "David", "Sarah", "Robert", "Lisa", "William", "Mary"]
    last_names = ["Smith", "Johnson", "Williams", "Brown", "Jones", "Garcia", "Miller", "Davis", "Rodriguez", "Martinez"]
    return f"{random.choice(first_names)} {random.choice(last_names)}"

def generate_language():
    return random.choice(["English", "Spanish", "French", "German", "Chinese", "Arabic"])

def generate_job():
    jobs = ["Teacher", "Engineer", "Doctor", "Nurse", "Accountant", "Lawyer", "Salesperson", "Manager", "Chef", "Driver"]
    return random.choice(jobs)

def generate_height():
    return round(random.uniform(1.50, 2.00), 2)

def generate_weight():
    return round(random.uniform(45.0, 120.0), 1)

def generate_fev_pct():
    return round(random.uniform(20.0, 100.0), 1)

def generate_dyspnea_description():
    descriptions = [
        "Not troubled by breathlessness except on strenuous exercise",
        "Short of breath when hurrying on the level or walking up a slight hill",
        "Walks slower than people of the same age on the level because of breathlessness",
        "Stops for breath after walking about 100 yards or after a few minutes on the level",
        "Too breathless to leave the house"
    ]
    return random.choice(descriptions)

def generate_distance():
    return random.randint(50, 500)

def generate_hospital():
    hospitals = ["Hospital A", "Hospital B", "Hospital C", "Hospital D", "Hospital E"]
    return random.choice(hospitals)

def create_sample_dataset(filename, num_records):
    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ['NAME', 'SSN', 'LANGUAGE', 'JOB', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for _ in range(num_records):
            writer.writerow({
                'NAME': generate_name(),
                'SSN': generate_ssn(),
                'LANGUAGE': generate_language(),
                'JOB': generate_job(),
                'HEIGHT_M': generate_height(),
                'WEIGHT_KG': generate_weight(),
                'fev_pct': generate_fev_pct(),
                'dyspnea_description': generate_dyspnea_description(),
                'distance_in_meters': generate_distance(),
                'hospital': generate_hospital()
            })

if __name__ == "__main__":
    create_sample_dataset('patient_input.csv', 100)  # Creates 100 sample records
    print("Sample dataset 'patient_input.csv' has been created.")

Sample dataset 'patient_input.csv' has been created.
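
Every value read back from the CSV file arrives as a string, so the numeric columns need converting and checking before any calculation. The sketch below shows one way to do that with a hypothetical helper, `parse_patient_row`; it reads from an in-memory sample rather than `patient_input.csv` so that it runs on its own.

```python
import csv
import io

# Columns that must be numeric, with the type each should be converted to
NUMERIC_FIELDS = {
    "HEIGHT_M": float,
    "WEIGHT_KG": float,
    "fev_pct": float,
    "distance_in_meters": int,
}


def parse_patient_row(row):
    """Return a copy of row with numeric fields converted; raise ValueError on bad data."""
    parsed = dict(row)
    for field, convert in NUMERIC_FIELDS.items():
        try:
            parsed[field] = convert(row[field])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"Invalid or missing {field}: {row.get(field)!r}") from None
        if parsed[field] < 0:
            raise ValueError(f"{field} cannot be negative: {parsed[field]!r}")
    return parsed


# Illustrative one-row sample using a subset of the input columns
sample = io.StringIO(
    "NAME,HEIGHT_M,WEIGHT_KG,fev_pct,distance_in_meters\n"
    "Jane Smith,1.70,65.5,48.2,310\n"
)
rows = [parse_patient_row(row) for row in csv.DictReader(sample)]
print(rows[0]["HEIGHT_M"], rows[0]["distance_in_meters"])
```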


### Step 5: Main business logic

Call the BODE score and BODE risk functions for each patient.

For each hospital, count the number of cases and calculate the average BODE score and average BODE risk.
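
The per-hospital aggregation can be sketched with a plain dictionary of running totals. The function name `summarize_by_hospital` and the tuple layout are assumptions for illustration, and the numeric risk values in the sample are made up.

```python
def summarize_by_hospital(patients):
    """
    Aggregate (hospital, bode_score, risk_value) tuples.

    Returns {hospital: (case_count, avg_score, avg_risk_value)}.
    """
    totals = {}
    for hospital, score, risk_value in patients:
        # Running totals per hospital: [case count, score sum, risk sum]
        entry = totals.setdefault(hospital, [0, 0, 0.0])
        entry[0] += 1
        entry[1] += score
        entry[2] += risk_value
    return {
        hospital: (count, round(score_sum / count, 2), round(risk_sum / count, 2))
        for hospital, (count, score_sum, risk_sum) in totals.items()
    }


# Made-up sample values, for illustration only
sample = [
    ("Hospital A", 2, 75.0),
    ("Hospital A", 5, 55.0),
    ("Hospital B", 7, 20.0),
]
print(summarize_by_hospital(sample))
# {'Hospital A': (2, 3.5, 65.0), 'Hospital B': (1, 7.0, 20.0)}
```

Averaging requires a numeric risk value per patient; if only category labels are available, the most frequent label is the closest substitute.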

In [109]:
import csv
import random

# Step 1: Function for calculating BMI
def calculate_bmi(weight, height):
    return round(weight / (height ** 2), 2)

# Step 2: Function for calculating BODE score
def calculate_bode_score(bmi, fev1_percentage, mrc_description, walk_distance):
    bmi_points = 1 if bmi <= 21 else 0

    if fev1_percentage >= 65:
        fev1_points = 0
    elif 50 <= fev1_percentage < 65:
        fev1_points = 1
    elif 36 <= fev1_percentage < 50:
        fev1_points = 2
    else:
        fev1_points = 3

    # Map each MRC dyspnea description (grades 1-5) to its BODE points:
    # grades 1 and 2 score 0; grades 3, 4, and 5 score 1, 2, and 3
    mrc_points_by_description = {
        "Not troubled by breathlessness except on strenuous exercise": 0,
        "Short of breath when hurrying on the level or walking up a slight hill": 0,
        "Walks slower than people of the same age on the level because of breathlessness": 1,
        "Stops for breath after walking about 100 yards or after a few minutes on the level": 2,
        "Too breathless to leave the house": 3
    }
    if mrc_description not in mrc_points_by_description:
        raise ValueError(f"Unknown dyspnea description: {mrc_description!r}")
    mrc_points = mrc_points_by_description[mrc_description]

    if walk_distance >= 350:
        walk_points = 0
    elif 250 <= walk_distance < 350:
        walk_points = 1
    elif 150 <= walk_distance < 250:
        walk_points = 2
    else:
        walk_points = 3

    bode_score = bmi_points + fev1_points + mrc_points + walk_points
    return bode_score

# Step 3: Function for calculating BODE risk
def calculate_bode_risk(bode_score):
    if bode_score <= 2:
        return 'Low risk'
    elif 3 <= bode_score <= 4:
        return 'Moderate risk'
    elif 5 <= bode_score <= 6:
        return 'High risk'
    else:
        return 'Very high risk'

# Step 4: Define input and output file paths
patient_csv = "patient_input.csv"
patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

patient_results = []
hospital_stats = {}

# Step 5: Read patient_input.csv and process data
with open(patient_csv, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        name = row['NAME']
        height = float(row['HEIGHT_M'])
        weight = float(row['WEIGHT_KG'])
        fev_pct = float(row['fev_pct'])
        dyspnea_description = row['dyspnea_description']
        distance_in_meters = int(row['distance_in_meters'])
        hospital = row['hospital']

        # Calculate BMI, BODE score, and BODE risk
        bmi = calculate_bmi(weight, height)
        bode_score = calculate_bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters)
        bode_risk = calculate_bode_risk(bode_score)

        # Store the patient results for patient_output.csv
        patient_results.append([name, bode_score, bode_risk, hospital])

        # Update hospital statistics
        if hospital not in hospital_stats:
            hospital_stats[hospital] = {
                'copd_count': 0,
                'total_bode_score': 0,
                'total_risk_levels': {
                    'Low risk': 0,
                    'Moderate risk': 0,
                    'High risk': 0,
                    'Very high risk': 0
                },
                'bed_count': random.randint(100, 500)  # Placeholder: random bed count, since no real bed data is loaded
            }

        hospital_stats[hospital]['copd_count'] += 1
        hospital_stats[hospital]['total_bode_score'] += bode_score
        hospital_stats[hospital]['total_risk_levels'][bode_risk] += 1

# Write patient_output.csv
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["NAME", "BODE_SCORE", "BODE_RISK", "HOSPITAL"])  # Write header
    writer.writerows(patient_results)

# Write hospital_output.csv
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS", "AVG_SCORE", "AVG_RISK"])  # Write header

    for hospital, stats in hospital_stats.items():
        copd_count = stats['copd_count']
        total_beds = stats['bed_count']
        total_bode_score = stats['total_bode_score']

        # Calculate percentage of COPD cases over total hospital beds
        pct_of_copd_cases = round((copd_count / total_beds) * 100, 2)

        # Calculate average BODE score
        avg_score = round(total_bode_score / copd_count, 2)

        # Risk categories are labels and cannot be averaged, so report the most frequent one (the mode)
        avg_risk = max(stats['total_risk_levels'], key=stats['total_risk_levels'].get)

        # Write hospital statistics to the output file
        writer.writerow([hospital, copd_count, pct_of_copd_cases, avg_score, avg_risk])

print("Patient output and Hospital output files have been created.")


Patient output and Hospital output files have been created.
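
The assignment lists json among the expected libraries, which suggests that bed counts are meant to come from a hospital data file rather than from random numbers. That file's name and layout are not shown in this notebook, so the sketch below assumes a hypothetical structure: a JSON list of objects with `name` and `beds` keys. The helper names are also illustrative.

```python
import json

# HYPOTHETICAL hospital data; the real file's layout is not shown in this notebook
SAMPLE_HOSPITAL_JSON = """
[
    {"name": "Hospital A", "beds": 250},
    {"name": "Hospital B", "beds": 120}
]
"""


def load_bed_counts(json_text):
    """Return {hospital_name: bed_count}; raise ValueError on malformed data."""
    try:
        records = json.loads(json_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Hospital data is not valid JSON: {exc}") from None
    if not isinstance(records, list):
        raise ValueError("Hospital data must be a JSON list")
    bed_counts = {}
    for record in records:
        try:
            name, beds = record["name"], int(record["beds"])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"Invalid hospital record: {record!r}") from None
        if beds <= 0:
            raise ValueError(f"Bed count must be positive for {name!r}")
        bed_counts[name] = beds
    return bed_counts


def pct_of_cases_over_beds(copd_count, beds):
    """Return COPD cases as a percentage of beds, rounded to 2 decimals."""
    return round(copd_count / beds * 100, 2)


bed_counts = load_bed_counts(SAMPLE_HOSPITAL_JSON)
print(pct_of_cases_over_beds(20, bed_counts["Hospital A"]))
```

With fixed bed counts, PCT_OF_COPD_CASES_OVER_BEDS is reproducible from one run to the next, which is not the case when the counts are drawn at random.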
