### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [1]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [6]:
import csv
import doctest

def calculate_bmi(weight_kg, height_m):
    """
    Calculate Body Mass Index (BMI).

    :param weight_kg: Weight in kilograms.
    :param height_m: Height in meters.
    :return: BMI value.

    >>> calculate_bmi(70, 1.75)
    22.86
    >>> calculate_bmi(0, 1.75)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive numbers.
    >>> calculate_bmi(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive numbers.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive numbers.")
    return round(weight_kg / (height_m ** 2), 2)
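
The assignment asks for test cases on every function, but defining a docstring example does not execute it. A minimal sketch of how the examples could be run from a notebook cell, using a trimmed copy of `calculate_bmi` so the block is self-contained:

```python
import doctest

def calculate_bmi(weight_kg, height_m):
    """
    >>> calculate_bmi(70, 1.75)
    22.86
    >>> calculate_bmi(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive numbers.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive numbers.")
    return round(weight_kg / (height_m ** 2), 2)

# Run only this function's docstring examples.
# Nothing is printed when every example passes; failures are reported in detail.
doctest.run_docstring_examples(calculate_bmi, globals(), name="calculate_bmi")
```

Calling `doctest.testmod()` at the end of the notebook is an alternative that checks every function defined so far in one pass.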

### Step 2: Calculate BODE Score

In [7]:
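def calculate_bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters):
    """
    Calculate the BODE score from BMI, FEV1 %, dyspnea grade, and distance walked.

    :param bmi: Body Mass Index.
    :param fev_pct: FEV1 as a percentage of the predicted value.
    :param dyspnea_description: mMRC dyspnea grade (0 to 4).
    :param distance_in_meters: Distance walked in 6 minutes.
    :return: BODE score (0 to 10).
    :raises ValueError: If any input is outside its valid range.

    >>> calculate_bode_score(22.86, 50, 2, 400)
    2
    >>> calculate_bode_score(20.0, 30, 4, 100)
    10
    >>> calculate_bode_score(22.86, 50, 5, 400)
    Traceback (most recent call last):
        ...
    ValueError: Dyspnea grade must be an integer from 0 to 4.
    """
    if dyspnea_description not in (0, 1, 2, 3, 4):
        raise ValueError("Dyspnea grade must be an integer from 0 to 4.")
    if bmi <= 0 or fev_pct < 0 or distance_in_meters < 0:
        raise ValueError("BMI must be positive; FEV1 % and distance cannot be negative.")

    # BMI: 1 point at or below 21, otherwise 0
    bmi_score = 1 if bmi <= 21 else 0

    # FEV1 % of predicted: >=65 -> 0, 50-64 -> 1, 36-49 -> 2, <=35 -> 3
    if fev_pct >= 65:
        fev_score = 0
    elif fev_pct >= 50:
        fev_score = 1
    elif fev_pct > 35:
        fev_score = 2
    else:
        fev_score = 3

    # mMRC grades 0 and 1 score 0; grades 2, 3, 4 score 1, 2, 3
    dyspnea_score = max(dyspnea_description - 1, 0)

    # 6-minute walk: >=350 m -> 0, 250-349 -> 1, 150-249 -> 2, <=149 -> 3
    if distance_in_meters >= 350:
        distance_score = 0
    elif distance_in_meters >= 250:
        distance_score = 1
    elif distance_in_meters >= 150:
        distance_score = 2
    else:
        distance_score = 3

    return bmi_score + fev_score + dyspnea_score + distance_score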

def calculate_bode_risk(bode_score):
    """
    Determine BODE risk category based on BODE score.

    :param bode_score: The calculated BODE score.
    :return: Risk category.

    >>> calculate_bode_risk(1)
    'Low risk'
    >>> calculate_bode_risk(6)
    'High risk'
    """
    if bode_score <= 2:
        return "Low risk"
    elif bode_score <= 4:
        return "Moderate risk"
    else:
        return "High risk"
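
The assignment asks for BODE survival rates, while `calculate_bode_risk` returns a text category. A sketch of a numeric alternative follows. It assumes the approximate 4-year survival figures given per score band on the MDCalc page linked at the top (80%, 67%, 57%, 18%), which should be checked against that page. The function name `bode_survival_pct` is hypothetical.

```python
def bode_survival_pct(bode_score):
    """
    Map a BODE score (0 to 10) to an approximate 4-year survival percentage.

    >>> bode_survival_pct(1)
    80
    >>> bode_survival_pct(6)
    57
    >>> bode_survival_pct(11)
    Traceback (most recent call last):
        ...
    ValueError: BODE score must be an integer from 0 to 10.
    """
    if not isinstance(bode_score, int) or not 0 <= bode_score <= 10:
        raise ValueError("BODE score must be an integer from 0 to 10.")
    # Score bands and survival figures are assumptions taken from the MDCalc summary.
    if bode_score <= 2:
        return 80
    if bode_score <= 4:
        return 67
    if bode_score <= 6:
        return 57
    return 18
```

Returning a number rather than a label makes it possible to average the risk per hospital, which the AVG_RISK column requires.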

### Step 3: Calculate BODE Risk

In [8]:
def load_patient_data(input_file):
    """
    Load patient data from a CSV file.

    :param input_file: The name of the input CSV file.
    :return: A list of patient records.
    """
    patients = []
    with open(input_file, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            patients.append(row)
    return patients

### Step 4: Load Hospital Data

In [14]:
import csv
import json

# Patients data
def load_patient_data(file_path):
    """
    Load patient data from a CSV file.

    :param file_path: Path to the CSV file containing patient data.
    :return: List of patient records.
    :raises ValueError: If required columns are missing. Rows with values that
        cannot be converted are skipped with a printed message instead.
    """
    required_columns = {'NAME', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital'}
    patient_data = []

    with open(file_path, mode='r', newline='') as file:
        reader = csv.DictReader(file)

        # Check for missing columns
        missing_cols = required_columns - set(reader.fieldnames)
        if missing_cols:
            raise ValueError(f"Missing columns: {missing_cols}")

        for row in reader:
            try:
                patient_data.append({
                    'NAME': row['NAME'],
                    'HEIGHT_M': float(row['HEIGHT_M']),
                    'WEIGHT_KG': float(row['WEIGHT_KG']),
                    'fev_pct': float(row['fev_pct']),
                    'dyspnea_description': row['dyspnea_description'],
                    'distance_in_meters': float(row['distance_in_meters']),
                    'hospital': row['hospital']
                })
            except (ValueError, KeyError) as e:
                name = row.get('NAME', 'Unknown')
                print(f"Skipping {name} due to error: {e}")

    return patient_data

# Load patient data
try:
    patients = load_patient_data('/content/patient.csv')
    print(f"Loaded {len(patients)} patients from '/content/patient.csv'.")
except ValueError as e:
    print(f"Error loading patient data: {e}")

# Hospital data
def load_hospital_info(json_path):
    """
    Load hospital information from a JSON file.

    :param json_path: Path to the JSON file containing hospital data.
    :return: List of hospital records.
    :raises ValueError: If there is an error decoding the JSON file.
    """
    try:
        with open(json_path, 'r') as json_file:
            return json.load(json_file)
    except json.JSONDecodeError as e:
        # Chain the original exception so the decoding details stay in the traceback
        raise ValueError(f"Error decoding JSON: {e}") from e

# Load hospital data
try:
    hospitals = load_hospital_info('/content/hospitals.json')
    print(f"Loaded {len(hospitals)} hospitals from '/content/hospitals.json'.")
except ValueError as e:
    print(f"Error loading hospital data: {e}")


Loaded 1000 patients from '/content/patient.csv'.
Loaded 3 hospitals from '/content/hospitals.json'.
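
The loader above prints a message and skips rows it cannot convert, whereas the assignment asks for a ValueError when input data is wrong. A sketch of a stricter per-row parser is shown below. The helper name `parse_patient_row` is hypothetical, and it only checks that the numeric columns convert to float; it does not range-check them.

```python
def parse_patient_row(row):
    """Return a copy of a csv.DictReader row with numeric fields converted to float."""
    numeric_fields = ("HEIGHT_M", "WEIGHT_KG", "fev_pct", "distance_in_meters")
    parsed = dict(row)
    for field in numeric_fields:
        try:
            parsed[field] = float(row[field])
        except (KeyError, TypeError, ValueError) as exc:
            # Report which patient and which column failed, then stop.
            name = row.get("NAME", "Unknown")
            raise ValueError(f"Bad {field} for {name}: {row.get(field)!r}") from exc
    return parsed

# Example with a hand-written row (not taken from patient.csv)
example = {"NAME": "Test Patient", "HEIGHT_M": "1.75", "WEIGHT_KG": "70",
           "fev_pct": "50", "distance_in_meters": "400"}
print(parse_patient_row(example)["HEIGHT_M"])
```

Whether to skip or fail on a bad row is a design choice: failing surfaces data problems early, while skipping keeps a long batch running.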


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [16]:
import csv
import json

# Step 1: Define file paths
patient_csv = "/content/patient.csv"
hospital_json = "/content/hospitals.json"

# Step 2: Define a function to calculate the BODE Score and Risk
def calculate_bode_score(patient):
    """
    Compute placeholder score and risk values from a patient's height and weight.

    This redefinition shadows the Step 2 calculate_bode_score. It averages
    HEIGHT_M and WEIGHT_KG and does not apply the BODE index.

    :param patient: Dictionary containing patient data.
    :return: Tuple containing the placeholder score and risk.
    """
    bode_score = (patient['HEIGHT_M'] + patient['WEIGHT_KG']) / 2
    bode_risk = bode_score / 10
    return bode_score, bode_risk

# Step 3: Load the patient data from CSV
def load_patient_data(filename):
    """
    Load patient data from a CSV file.

    :param filename: Path to the patient CSV file.
    :return: List of patient records.
    """
    patient_data = []
    with open(filename, mode='r', newline='') as file:
        csv_reader = csv.DictReader(file)

        # Print the headers (column names)
        headers = csv_reader.fieldnames
        print("CSV Headers:", headers)

        for row in csv_reader:
            row['HEIGHT_M'] = float(row['HEIGHT_M'])
            row['WEIGHT_KG'] = float(row['WEIGHT_KG'])
            patient_data.append(row)

    return patient_data

# Step 4: Load the hospital data from JSON
def load_hospital_data(filename):
    """
    Load hospital data from a JSON file.

    :param filename: Path to the hospital JSON file.
    :return: List of hospital records.
    """
    with open(filename, mode='r') as file:
        hospital_data = json.load(file)
    return hospital_data

# Step 5: Process the patients and calculate BODE Scores and Risks
def process_patients_and_hospitals(patient_data, hospital_data):
    """
    Process patient and hospital data to calculate BODE scores and risks.

    :param patient_data: List of patient records.
    :param hospital_data: List of hospital records.
    :return: Tuple containing patient results and hospital output list.
    """
    patient_results = []
    hospital_aggregates = {}

    # Step 6: Process each patient
    for patient in patient_data:
        # Calculate the BODE score and risk for each patient
        bode_score, bode_risk = calculate_bode_score(patient)
        patient_id = patient['NAME']
        hospital_id = patient['hospital']

        # Store patient result
        patient_results.append([patient_id, hospital_id, bode_score, bode_risk])

        # Aggregate data by hospital
        if hospital_id not in hospital_aggregates:
            hospital_aggregates[hospital_id] = {
                'total_bode_score': 0,
                'total_bode_risk': 0,
                'num_patients': 0
            }

        hospital_aggregates[hospital_id]['total_bode_score'] += bode_score
        hospital_aggregates[hospital_id]['total_bode_risk'] += bode_risk
        hospital_aggregates[hospital_id]['num_patients'] += 1

    # Step 7: Calculate the averages for each hospital
    hospital_output_list = []
    for hospital_id, aggregates in hospital_aggregates.items():
        avg_bode_score = aggregates['total_bode_score'] / aggregates['num_patients']
        avg_bode_risk = aggregates['total_bode_risk'] / aggregates['num_patients']
        hospital_output_list.append([hospital_id, avg_bode_score, avg_bode_risk, aggregates['num_patients']])

    return patient_results, hospital_output_list

# Step 8: Write the results to CSV files
def write_csv(filename, data, headers=None):
    """
    Write data to a CSV file.

    :param filename: Name of the CSV file to write to.
    :param data: Data to write.
    :param headers: Optional list of headers for the CSV.
    """
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        if headers:
            writer.writerow(headers)
        writer.writerows(data)

# Step 9: Load data, process it, and save the results
patient_data = load_patient_data(patient_csv)
hospital_data = load_hospital_data(hospital_json)

# Process the data and get the results
patient_results, hospital_output_list = process_patients_and_hospitals(patient_data, hospital_data)

# Write the patient and hospital results to their respective CSV files
write_csv("patient_output.csv", patient_results, headers=["PATIENT_NAME", "HOSPITAL", "BODE_SCORE", "BODE_RISK"])
write_csv("hospital_output.csv", hospital_output_list, headers=["HOSPITAL", "AVG_BODE_SCORE", "AVG_BODE_RISK", "NUM_PATIENTS"])

# Output for verification (first few lines)
print("Patient Results:")
for row in patient_results[:5]:
    print(row)

print("\nHospital Results:")
for row in hospital_output_list[:5]:
    print(row)


CSV Headers: ['NAME', 'SSN', 'LANGUAGE', 'JOB', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital']
Patient Results:
['Vanessa Roberts', "ST.LUKE'S", 46.0, 4.6]
['Christopher Fox', 'SAINT LOUIS UNIVERSITY', 42.365, 4.2365]
['Benjamin Johnston', 'BJC', 48.26, 4.826]
['Christopher Hernandez', 'MISSOURI BAPTIST', 41.61, 4.161]
['Valerie Burch', 'BJC WEST COUNTY', 43.144999999999996, 4.3145]

Hospital Results:
["ST.LUKE'S", 49.28707317073169, 4.92870731707317, 164]
['SAINT LOUIS UNIVERSITY', 49.36060975609756, 4.936060975609758, 164]
['BJC', 49.58717391304347, 4.95871739130435, 184]
['MISSOURI BAPTIST', 49.856801242236, 4.985680124223601, 161]
['BJC WEST COUNTY', 49.26999999999999, 4.9270000000000005, 171]
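
The assignment specifies the hospital columns HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK, which differ from the headers written above. A sketch of how such rows might be built follows, under several assumptions: each hospital record is a dict with hypothetical `name` and `beds` keys (the real layout of hospitals.json is not shown here and may be nested), the percentage is taken to mean COPD cases divided by beds times 100, and each patient result is a `(name, hospital, score, risk)` sequence with numeric score and risk.

```python
def summarize_hospitals(patient_results, hospitals):
    """Build [HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK] rows."""
    # Assumed layout: a flat list of {"name": ..., "beds": ...} dicts.
    beds_by_name = {h["name"]: h["beds"] for h in hospitals}

    totals = {}
    for _patient, hospital, score, risk in patient_results:
        t = totals.setdefault(hospital, {"count": 0, "score": 0.0, "risk": 0.0})
        t["count"] += 1
        t["score"] += score
        t["risk"] += risk

    rows = []
    for hospital, t in totals.items():
        beds = beds_by_name.get(hospital)
        if beds is None or beds <= 0:
            raise ValueError(f"No usable bed count for hospital: {hospital}")
        rows.append([
            hospital,
            t["count"],
            round(100 * t["count"] / beds, 2),   # COPD cases as a percent of beds
            round(t["score"] / t["count"], 2),   # average BODE score
            round(t["risk"] / t["count"], 2),    # average risk value
        ])
    return rows

# Example with made-up records, not the contents of the assignment files
sample_patients = [("P1", "BJC", 4, 67), ("P2", "BJC", 6, 57), ("P3", "SLU", 2, 80)]
sample_hospitals = [{"name": "BJC", "beds": 100}, {"name": "SLU", "beds": 50}]
for row in summarize_hospitals(sample_patients, sample_hospitals):
    print(row)
```

The rows can be passed to `csv.writer(...).writerows(...)` together with the header list from the assignment.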
