# Introduction to this Notebook

This Jupyter Notebook encompasses a series of Python scripts written by Daniel Teixeira dos Santos, a Data Community Innovator at the Data Community of Practice ([link to my forum account](https://rcop.michaeljfox.org/u/danieltds/summary)). These scripts were written using data from PPMI, obtained through LONI. These files are linked to the MJFF Research Community's GitHub repository ([link here](https://github.com/MJFF-ResearchCommunity/Useful-PPMI-Clinical-Codes)).

The goal of these scripts is to provide researchers with relevant clinical variables extracted in a meaningful way from the data already available in PPMI. All the necessary input datasets can be obtained [here](https://ida.loni.usc.edu/pages/access/studyData.jsp?project=PPMI) after applying for registration for access to the PPMI data. All outputs from the analyses were removed to comply with privacy and data sharing principles. Some of these scripts were developed with the help of AI tools such as ChatGPT 4o.

This analysis requires two folders inside the main folder: "data" and "priv". The "data" folder is where you should store the datasets downloaded from LONI. The "priv" folder is where the results will be exported. Both folders are created automatically at the beginning of this script if they don't exist.

# Importing and Setting Paths

In [None]:
import os
import pandas as pd
import numpy as np
import warnings
import sys

#add path to utils folder with shared functions
sys.path.append("../utils")
from helpers import get_latest_file

# Automatically find the "Useful PPMI Clinical Codes" directory
CURRENT_DIR = os.getcwd()
while not CURRENT_DIR.endswith("Useful-PPMI-Clinical-Codes") and os.path.dirname(CURRENT_DIR) != CURRENT_DIR:
    CURRENT_DIR = os.path.dirname(CURRENT_DIR)

BASE_DIR = CURRENT_DIR

# Define paths for "data" and "priv" directories
DATA_DIR = os.path.join(BASE_DIR, "data")
PRIV_DIR = os.path.join(BASE_DIR, "priv")

# Ensure both directories exist, create them if not
for directory in [DATA_DIR, PRIV_DIR]:
    if not os.path.exists(directory):
        os.makedirs(directory)
        print(f"Created missing folder: {directory}")
    else:
        print(f"Found folder: {directory}")

# Ignore persistent warnings
warnings.simplefilter("ignore", UserWarning)

# Configure Pandas for better data visualization
pd.set_option('display.max_rows', 250)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
pd.options.display.float_format = "{:,.3f}".format

# List available files in both directories
print("Files in data directory:", os.listdir(DATA_DIR))
print("Files in priv directory:", os.listdir(PRIV_DIR))


# LEDD calculation

Levodopa Equivalent Daily Dose (LEDD) expresses the total daily dosage of the medications used to treat PD on a single levodopa-equivalent scale. It is important because it indicates how difficult it is to treat a specific patient. This value is already present in a PPMI data cut; however, this script also produces the LEDD of specific medication subgroups, such as levodopa, MAO-B inhibitors, dopamine agonists, etc.

The latest LEDD calculation formulas are based on: https://movementdisorders.onlinelibrary.wiley.com/doi/full/10.1002/mds.29410

**Necessary PPMI datasets:** LEDD Concomitant Medication Log and MDS-UPDRS Part III Treatment Determination and Part III: Motor Examination

Limitation: interpreting COMT adjustments should probably take into account how many times per day the patient takes the COMT inhibitor, but the current script doesn't do this.

**Last Update:** February 9, 2025

## Reading

MDS-UPDRS Part III. Its visit dates will be used as a proxy to align the LEDD data to EVENT_ID.

In [None]:
MDS3_FILE = get_latest_file(prefix="MDS-UPDRS_Part_III", directory=DATA_DIR)
MDS3 = pd.read_csv(MDS3_FILE)
print('Length of the dataset:', len(MDS3))
MDS3.head()

LEDD datasheet

In [None]:
LEDD_FILE = get_latest_file(prefix="LEDD_Concomitant_Medication_Log", directory=DATA_DIR)
LEDD = pd.read_csv(LEDD_FILE)
print('Length of the dataset:', len(LEDD))
LEDD.head()

**Explanation:** Unlike most PPMI datasets, this dataset is not organized by EVENT_ID, so we need to work out the total LEDD that applies at each EVENT_ID. The columns of interest are "LEDTRT" (name of the medication), "STARTDT" (start date), "STOPDT" (stop date) and "LEDD" (the already calculated LEDD). This script therefore interprets this dataset and creates columns that represent the LEDD of each medication type at each timepoint, to facilitate analyses.

However, some rows don't have their LEDD calculated, and some adjustments may be necessary. Also, medication-specific LEDD values are not calculated (e.g. dopamine agonists, MAO-B inhibitors, etc.). This script addresses these issues.
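
The idea of assigning a LEDD to a timepoint can be sketched on invented data: a medication counts toward a visit if it started on or before the visit date and either stopped on or after it or has no stop date. The function name `ledd_at_visit` and all values are hypothetical; only the column names come from the dataset described above.

```python
import pandas as pd

# Invented medication log for one hypothetical patient
meds = pd.DataFrame({
    "PATNO": [1, 1, 1],
    "LEDTRT": ["CARBIDOPA/LEVODOPA", "RASAGILINE", "PRAMIPEXOLE"],
    "STARTDT": [pd.Timestamp("2020-01-01"), pd.Timestamp("2020-06-01"), pd.Timestamp("2019-01-01")],
    "STOPDT": [pd.Timestamp("2021-12-01"), pd.NaT, pd.Timestamp("2020-03-01")],
    "LEDD": [300.0, 100.0, 150.0],
})

def ledd_at_visit(meds, visit_date):
    """Sum the LEDD of every medication active on visit_date."""
    started = meds["STARTDT"] <= visit_date
    # A missing stop date is treated as "still taking the medication"
    not_stopped = (meds["STOPDT"] >= visit_date) | meds["STOPDT"].isna()
    return meds.loc[started & not_stopped, "LEDD"].sum()

total = ledd_at_visit(meds, pd.Timestamp("2021-01-01"))
print(total)
```

In this toy log, the levodopa and rasagiline entries are active in January 2021, while the pramipexole entry has already ended.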

## Adjusting Levodopa to COMT

Entacapone and Opicapone extend the half-life of levodopa, so the levodopa LEDD usually has to be multiplied when someone takes one of them. Some rows have an incomplete LEDD value that just says "LD x 0.33" or similar, which indicates that the total levodopa dose must be multiplied by that factor and the result added to it (example: Carbidopa/Levodopa/Entacapone, where the row states the dose of each component). Other rows carry only this multiplication information with no accompanying levodopa dose (example: Entacapone).

For privacy reasons, I can't give direct numerical examples of the patients that have this type of entry, but to identify them, just look for PATNOs that have "Carbidopa/Levodopa/Entacapone" in the column "LEDTRT". In those patients, the LEDD column isn't calculated; it reads "LD x 0.33" instead. You can also notice that "LEDDSTRMG" gives the dosage of levodopa present in that compound, so the LEDD can still be calculated.
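
As a worked example with invented numbers (not taken from any patient), the arithmetic for a combined formulation can be sketched in plain Python. The function name `combined_ledd` is hypothetical, and reading the three numeric inputs as units per dose, doses per day and levodopa strength in mg is an assumption about the LEDDOSE, LEDDOSFRQ and LEDDSTRMG columns.

```python
import re

def combined_ledd(ledd_text, units_per_dose, doses_per_day, levodopa_mg):
    """Return the levodopa dose plus its 'LD x <factor>' uplift, or None if no factor is found."""
    match = re.search(r"LD x (\d+(?:\.\d+)?)", ledd_text)
    if match is None:
        return None  # the LEDD entry is not of the "LD x ..." kind
    multiplier = float(match.group(1))
    daily_levodopa = units_per_dose * doses_per_day * levodopa_mg
    # Total = plain levodopa dose + the extra share attributed to the COMT inhibitor
    return daily_levodopa + daily_levodopa * multiplier

# Invented example: 1 tablet, 3 times a day, 100 mg of levodopa per tablet
example = combined_ledd("LD x 0.33", 1, 3, 100)
print(round(example, 2))  # 399.0
```

Here the daily levodopa dose is 300 mg, and the 0.33 factor adds 99 mg on top of it.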


This code creates a LEDD for instances where entacapone formulations are combined with levodopa.

In [4]:
# Extract the number that appears after "LD x" in the "LEDD" column
LEDD['LD_multiplier'] = LEDD['LEDD'].str.extract(r'LD x (\d+\.\d+)').astype(float)

Computing the converted LEDD for the first type of case (combined formulations that include levodopa)

In [None]:
# Function to calculate LEDD
def calculate_ledd_ld(row):
    # Check if "levodopa" is present in LEDTRT (case insensitive)
    contains_levodopa = isinstance(row['LEDTRT'], str) and 'levodopa' in row['LEDTRT'].lower()

    # Proceed with calculation if LD_multiplier and LEDDSTRMG are not NaN and LEDTRT contains "levodopa"
    if pd.notna(row['LD_multiplier']) and pd.notna(row['LEDDSTRMG']) and contains_levodopa:
        result = (row['LEDDOSE'] * row['LEDDOSFRQ'] * row['LEDDSTRMG']) + \
                 (row['LEDDOSE'] * row['LEDDOSFRQ'] * row['LEDDSTRMG'] * row['LD_multiplier'])
        return result
    return np.nan  # Return NaN for rows where conditions are not met

# Running the function on all rows
LEDD['Calculated_LEDD_LD'] = LEDD.apply(calculate_ledd_ld, axis=1)

# Count number of modified rows
num_modified = LEDD['Calculated_LEDD_LD'].notna().sum()

# Update LEDD where Calculated_LEDD_LD is not NaN
LEDD.loc[pd.notna(LEDD['Calculated_LEDD_LD']), 'LEDD'] = LEDD['Calculated_LEDD_LD']

# Print number of modified rows
print(f"Number of rows modified: {num_modified}")

# Checking the end result and all columns involved in their calculations
LEDD[LEDD['LEDTRT'] == 'Carbidopa/Levodopa/Entacapone'][['LEDTRT', 'LEDDOSSTR', 'LEDDSTRMG', 'LEDDOSE', 'LEDDOSFRQ', 'LD_multiplier', 'LEDD','Calculated_LEDD_LD']].head(5)

In [None]:
# Apply the calculated values to the original LEDD column
# (this repeats the update from the previous cell and is safe to re-run)
# This only applies to rows where Calculated_LEDD_LD is not NaN
# Count the number of rows where LEDD will be updated
num_modified = LEDD['Calculated_LEDD_LD'].notna().sum()

# Update LEDD values where Calculated_LEDD_LD is not NaN
LEDD.loc[pd.notna(LEDD['Calculated_LEDD_LD']), 'LEDD'] = LEDD['Calculated_LEDD_LD']

# Print the number of modified rows
print(f"Number of rows modified: {num_modified}")

## LD x 0.2, 0.33 and 0.5

Some rows list the COMT inhibitor on its own; therefore, we need to apply the multiplier to the levodopa dosage that matches the period in which the patient was taking the COMT inhibitor.

The most common options are Entacapone (0.33) and Opicapone (0.5). Istradefylline is also handled here, with a conversion factor of 0.2.

In [None]:
# Identifying how many entries had one medication that needs LEDD correction
LEDD['LD_multiplier'].value_counts()

**Before analysing:** for the code to work, levodopa entries must have exactly the same start and stop dates as the corresponding COMT inhibitor entry. If not, these calculations won't work. Therefore, we have to subdivide the levodopa entries into new rows.

This script was verified by selecting some specific patients and confirming that it creates the new rows aligned with medication usage and deletes the original rows to avoid double counting.

To check the result yourself, select some specific PATNOs modified by this script.

**IMPORTANT:** patients currently taking a medication have a STOPDT of NaN (which invalidates the script), so **the first two lines of the script are a comment and a setting for the date you downloaded the dataset, which fills all NaN values with the specified date!**
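
The subdivision itself can be sketched on a single pair of intervals, using invented dates and a hypothetical helper `split_by_comt_period`. As in the cell that follows, the "during" segment simply takes the COMT dates, which assumes the levodopa entry covers the whole COMT period.

```python
from datetime import date

def split_by_comt_period(levo_start, levo_stop, comt_start, comt_stop):
    """Split one levodopa interval into before / during / after a COMT interval."""
    # No overlap at all: leave the levodopa interval untouched
    if not (levo_start < comt_stop and levo_stop > comt_start):
        return [("unchanged", levo_start, levo_stop)]
    segments = []
    if levo_start < comt_start:
        segments.append(("before", levo_start, comt_start))
    # This segment shares its dates with the COMT entry, so it can be matched exactly later
    segments.append(("during", comt_start, comt_stop))
    if levo_stop > comt_stop:
        segments.append(("after", comt_stop, levo_stop))
    return segments

# Invented example: levodopa from 01/2020 to 01/2022, COMT inhibitor from 06/2020 to 06/2021
segments = split_by_comt_period(date(2020, 1, 1), date(2022, 1, 1),
                                date(2020, 6, 1), date(2021, 6, 1))
for label, start, stop in segments:
    print(label, start, stop)
```

Only the "during" segment would later receive the multiplier; the "before" and "after" segments keep the plain levodopa LEDD.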

In [None]:
# Input the month you downloaded the dataset, as MM/YYYY text so the conversion below can parse it
LEDD['STOPDT'] = LEDD['STOPDT'].fillna('01/2025')  # MM/YYYY

# Mark original rows
LEDD['Row_Type'] = 'Original'

# Convert dates for processing
LEDD['STARTDT'] = pd.to_datetime(LEDD['STARTDT'], format='%m/%Y', errors='coerce')
LEDD['STOPDT'] = pd.to_datetime(LEDD['STOPDT'], format='%m/%Y', errors='coerce')

# Identify patients (PATNO) with a non-NaN LD_multiplier
patients_with_ldx = LEDD.loc[LEDD['LD_multiplier'].notna(), 'PATNO'].unique()

# Create lists to store new and junk rows
new_rows = []
junk_rows = []
modified_patnos = set()  # Track PATNOs where at least one change occurred

# Process each patient separately
for patno in patients_with_ldx:
    # Extract the periods where the patient has a valid LD_multiplier
    ldx_periods = LEDD[(LEDD['PATNO'] == patno) & (LEDD['LD_multiplier'].notna())][['STARTDT', 'STOPDT', 'LD_multiplier']]

    # Extract all Levodopa rows for the patient
    levodopa_rows = LEDD[(LEDD['PATNO'] == patno) & (LEDD['LEDTRT'].str.contains('Levodopa', case=False, na=False))]

    for _, ldx in ldx_periods.iterrows():
        ld_start, ld_stop, ld_multiplier = ldx['STARTDT'], ldx['STOPDT'], ldx['LD_multiplier']

        for _, levodopa in levodopa_rows.iterrows():
            levo_start, levo_stop = levodopa['STARTDT'], levodopa['STOPDT']

            # Identify non-exact but overlapping matches
            if (levo_start < ld_stop) and (levo_stop > ld_start) and not ((levo_start == ld_start) and (levo_stop == ld_stop)):

                # Move the original row to the junk dataset before replacing it
                junk_rows.append(levodopa.copy())

                # Mark PATNO as modified
                modified_patnos.add(patno)

                # First row: Exact match to LD multiplier time
                exact_match = levodopa.copy()
                exact_match['STARTDT'] = ld_start
                exact_match['STOPDT'] = ld_stop
                exact_match['Row_Type'] = 'Generated'
                new_rows.append(exact_match)

                # Second row: Period before LD multiplier medication
                if levo_start < ld_start:
                    before_match = levodopa.copy()
                    before_match['STOPDT'] = ld_start
                    before_match['Row_Type'] = 'Generated'
                    new_rows.append(before_match)

                # Third row: Period after LD multiplier medication
                if (levo_stop is pd.NaT) or (levo_stop > ld_stop):
                    after_match = levodopa.copy()
                    after_match['STARTDT'] = ld_stop
                    after_match['Row_Type'] = 'Generated'
                    new_rows.append(after_match)

# Convert the new and junk rows into DataFrames
new_rows_df = pd.DataFrame(new_rows)
junk_rows_df = pd.DataFrame(junk_rows)

# Remove the junk rows from the main dataset
LEDD = LEDD[~LEDD.index.isin(junk_rows_df.index)]

# Append the generated rows to the original dataset
LEDD = pd.concat([LEDD, new_rows_df], ignore_index=True)

# Sort the dataset by PATNO and STARTDT to maintain alignment
LEDD = LEDD.sort_values(by=['PATNO', 'STARTDT']).reset_index(drop=True)

# Print summary statistics
print(f"Number of unique PATNOs modified: {len(modified_patnos)}")
print(f"Total number of new rows added: {len(new_rows_df)}")

Now we prepare to run the script that recalculates the LEDD. First, we need to list the different names under which levodopa can appear.

In [None]:
# Creating the list
levodopa_names = ['Levodopa', 'Dhivy', 'Duodopa', 'Duopa', 'Inbrija',
                  'Parcopa','Prolopa','Rytary','Sinemet','Stalevo']

# Create regex pattern by joining the list elements with "|"
levodopa_pattern = '|'.join(levodopa_names)

# Seeing the result
print(levodopa_pattern)

Now the real deal: multiplying the levodopa LEDD by the matching factor wherever the start and stop dates match.
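
Before the loop itself, the matching idea can be sketched in vectorized form on invented data. This is an alternative illustration, not the method used in the cell that follows, and it assumes at most one multiplier row per patient and period. The column name `Adjusted_LEDD` is hypothetical.

```python
import pandas as pd

# Invented levodopa rows for one hypothetical patient
levodopa = pd.DataFrame({
    "PATNO": [1, 1],
    "STARTDT": pd.to_datetime(["2020-01-01", "2021-01-01"]),
    "STOPDT": pd.to_datetime(["2021-01-01", "2022-01-01"]),
    "LEDD": [300.0, 400.0],
})

# Invented isolated COMT inhibitor row covering the second period only
comt = pd.DataFrame({
    "PATNO": [1],
    "STARTDT": pd.to_datetime(["2021-01-01"]),
    "STOPDT": pd.to_datetime(["2022-01-01"]),
    "LD_multiplier": [0.5],
})

# Match on patient and identical start/stop dates; unmatched rows get NaN
merged = levodopa.merge(comt, on=["PATNO", "STARTDT", "STOPDT"], how="left")

# LEDD + LEDD * multiplier, with a multiplier of 0 where nothing matched
merged["Adjusted_LEDD"] = merged["LEDD"] * (1 + merged["LD_multiplier"].fillna(0))
print(merged[["STARTDT", "STOPDT", "LEDD", "Adjusted_LEDD"]])
```

Only the second row shares its dates with the COMT entry, so only that row is increased.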

In [10]:
# Initialize 'Multiplied' column
LEDD['Multiplied'] = 'No'

# Creating the "Original_LEDD" values to check if any inconsistencies may happen
LEDD['LEDD'] = LEDD['LEDD'].astype(str)

# Extract numeric values only when "LD" is NOT in the string
LEDD['Original_LEDD'] = np.where(
    LEDD['LEDD'].str.contains('LD', case=False, na=False),
    np.nan,
    LEDD['LEDD'].str.extract(r'(\d+\.\d+|\d+)', expand=False).astype(float)
)

# Iterate over unique PATNO values to process each patient separately
for patno in LEDD['PATNO'].unique():
    # Filter dataset for the specific patient
    patient_data = LEDD[LEDD['PATNO'] == patno]

    # Identify rows with 'LD x' in LEDD for the specific patient
    rows_with_ldx = patient_data[patient_data['LEDD'].str.contains('LD x', na=False)]

    # Iterating over identified rows
    for index, row in rows_with_ldx.iterrows():
        multiplier = row['LD_multiplier']
        start_date = row['STARTDT']
        stop_date = row['STOPDT']

        # Identify corresponding Levodopa rows within the same date range for the same patient
        corresponding_rows = LEDD[
            (LEDD['PATNO'] == patno) &  # Ensuring only within the same patient
            (LEDD['LEDTRT'].str.contains(levodopa_pattern, case=False, na=False)) &
            (LEDD['STARTDT'] == start_date) &
            ((LEDD['STOPDT'] == stop_date) | (LEDD['STOPDT'].isna() & pd.isna(stop_date))) &
            (LEDD['Multiplied'] == 'No')
        ]

        # Apply the multiplication logic correctly
        for corr_index, corr_row in corresponding_rows.iterrows():
            original_ledd = LEDD.loc[corr_index, 'Original_LEDD']
            if pd.notna(multiplier):  # Ensure multiplier is not NaN
                new_ledd = original_ledd + (original_ledd * multiplier)
                LEDD.loc[corr_index, 'LEDD'] = new_ledd
                LEDD.loc[corr_index, 'Multiplied'] = 'Yes'

## Specific medication LEDD and joining to the timepoint organization

This part of the script accomplishes two goals. First, it merges the LEDD data so that it is aligned with the EVENT_ID in the dataset (BL, V02, V04, etc.). For this, we use the MDS-UPDRS Part III questionnaire as a proxy for these timepoints, as it seems to be one of the most complete questionnaires. Second, it calculates medication-specific LEDD by matching medication names. It does so for levodopa, MAO-B inhibitors, amantadine, anticholinergics, etc.

The next cell just showcases how you can identify the different ways levodopa is written. I manually checked most of the medication types presented below to ensure that most spellings could be included in the medication-specific LEDD calculations.
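
Name matching of this kind can be sketched with Python's `re` module on invented entries. Escaping each name with `re.escape` is an addition that the notebook does not use; it keeps names that contain parentheses from being read as regex groups.

```python
import re

# A few names to look for; the last one contains parentheses
names = ["Levodopa", "Sinemet", "Gocovri (Amantadine CR)"]

# Escape each name, join with "|" and match case-insensitively
pattern = re.compile("|".join(re.escape(n) for n in names), flags=re.IGNORECASE)

# Invented free-text entries, as they might appear in a medication column
entries = ["CARBIDOPA/LEVODOPA", "sinemet cr", "Gocovri (Amantadine CR)", "Rasagiline"]
matched = [e for e in entries if pattern.search(e)]
print(matched)
```

Every entry except "Rasagiline" contains one of the names, regardless of letter case.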

In [None]:
# Selecting rows where the "LEDTRT" column contains 'Levodopa'
levodopa_rows = LEDD[LEDD['LEDTRT'].str.contains('Levodopa', case=False, na=False)]

# Printing or further processing the selected rows
levodopa_rows['LEDTRT'].value_counts().index.tolist()[0:5]

## Uniting LEDD to timepoints

In [12]:
# Defining names for specific categories
# Levodopa
levodopa_names = ['Levodopa', 'Dhivy', 'Duodopa', 'Duopa', 'Inbrija',
                  'Parcopa','Prolopa','Rytary','Sinemet','Stalevo']

# Create a regex pattern that matches any of the terms
levodopopa_pattern = '|'.join(levodopa_names)

# Dopamine agonists ('Ropirinol' is kept only to catch that misspelling of ropinirole)
dopamine_agonist_names = ['Pramipexol', 'Mirapex', 'Mirapexin', 'Sifrol',
                          'Ropinirol', 'Ropirinol', 'Requip', 'Rotigotin', 'Neupro',
                          'Apomorphin', 'Apokyn']

# Create a regex pattern that matches any of the terms
dopamine_agonist_pattern = '|'.join(dopamine_agonist_names)

# MAO-B
maob_names = ['Selegilin', 'Eldepryl', 'Zelapar', 'Rasagilin', 'Azilect',
              'Safinamid', 'Xadago']

# Create a regex pattern that matches any of the terms
maob_pattern = '|'.join(maob_names)

# COMT
comt_names = ['Entacapon', 'Comtan', 'Tolcapon', 'Tasmar',
              'Opicapon', 'Ongentys']

# Create a regex pattern that matches any of the terms
comt_pattern = '|'.join(comt_names)

# Muscarinic antagonist
anticholingergic_names = ['Trihexyphenidyl', 'Artane', 'Artanis', 'Biperiden', 'Akineton']

# Create a regex pattern that matches any of the terms
anticholingergic_pattern = '|'.join(anticholingergic_names)

# Amantadine (some typos present in the dataset, so giving all options)
amantadine_names = ['AMANDATINE', 'AMANDTADINE', 'AMANTADIN', 'AMANTADIN 100',
                    'AMANTADIN 150', 'AMANTADINA', 'AMANTADINE', 'AMANTADINE (100 MG)',
                    'AMANTADINE 100 MG', 'AMANTADINE 100MG', 'AMANTADINE HCL',
                    'AMANTADINE09', 'GOCOVERI', 'GOCOVRI', 'GOCOVRI 137 MG',
                    'GOCOVRI ER', 'Gocovri (Amantadine CR )', 'Gocovri (Amantadine CR)',
                     'OSMOLEX ER', 'Osmolex (Amantadine ER)']

# Create a regex pattern that matches any of the terms
amantadine_pattern = '|'.join(amantadine_names)

# Creating lists of names for drugs
# na=False treats missing LEDTRT values as non-matches instead of raising on the boolean mask
levodopa_values = LEDD[LEDD['LEDTRT'].str.contains(levodopopa_pattern, case=False, regex=True, na=False)]['LEDTRT'].drop_duplicates().tolist()
dopamine_agonist_values = LEDD[LEDD['LEDTRT'].str.contains(dopamine_agonist_pattern, case=False, regex=True, na=False)]['LEDTRT'].drop_duplicates().tolist()
maob_values = LEDD[LEDD['LEDTRT'].str.contains(maob_pattern, case=False, regex=True, na=False)]['LEDTRT'].drop_duplicates().tolist()
comt_values = LEDD[LEDD['LEDTRT'].str.contains(comt_pattern, case=False, regex=True, na=False)]['LEDTRT'].drop_duplicates().tolist()
anticholingergic_values = LEDD[LEDD['LEDTRT'].str.contains(anticholingergic_pattern, case=False, regex=True, na=False)]['LEDTRT'].drop_duplicates().tolist()
amantadine_values = LEDD[LEDD['LEDTRT'].str.contains(amantadine_pattern, case=False, regex=True, na=False)]['LEDTRT'].drop_duplicates().tolist()

In [None]:
# Convert the date columns to datetime format
MDS3['INFODT'] = pd.to_datetime(MDS3['INFODT'], format='%m/%Y')
LEDD['STARTDT'] = pd.to_datetime(LEDD['STARTDT'], format='%m/%Y')
LEDD['STOPDT'] = pd.to_datetime(LEDD['STOPDT'], format='%m/%Y', errors='coerce')  # Handle NaN

# Convert 'LEDD' column to numeric
LEDD['LEDD'] = pd.to_numeric(LEDD['LEDD'], errors='coerce')

# Function to calculate LEDD sums for specific medication categories
def calculate_ledd_sums(relevant_meds, values_list):
    return relevant_meds[relevant_meds['LEDTRT'].str.lower().isin([val.lower() for val in values_list])]['LEDD'].sum()

# Initialize lists to collect results
led_values = []
amantadine_values_list = []
levodopa_values_list = []
dopamine_agonist_values_list = []
maob_values_list = []
comt_values_list = []
anticholingergic_values_list = []

# Iterate through unique patients in MDS3
for patno in MDS3['PATNO'].unique():
    patient_mds3 = MDS3[MDS3['PATNO'] == patno]
    patient_ledd = LEDD[LEDD['PATNO'] == patno]

    for index, row in patient_mds3.iterrows():
        infodt = row['INFODT']
        relevant_meds = patient_ledd[(patient_ledd['STARTDT'] <= infodt) &
                                     ((patient_ledd['STOPDT'] >= infodt) | pd.isna(patient_ledd['STOPDT']))]

        # Sum the total LEDD values
        led_values.append(relevant_meds['LEDD'].sum())

        # Sum the LEDD values for the specified 'LEDTRT' names for each category
        amantadine_values_list.append(calculate_ledd_sums(relevant_meds, amantadine_values))
        levodopa_values_list.append(calculate_ledd_sums(relevant_meds, levodopa_values))
        dopamine_agonist_values_list.append(calculate_ledd_sums(relevant_meds, dopamine_agonist_values))
        maob_values_list.append(calculate_ledd_sums(relevant_meds, maob_values))
        comt_values_list.append(calculate_ledd_sums(relevant_meds, comt_values))
        anticholingergic_values_list.append(calculate_ledd_sums(relevant_meds, anticholingergic_values))

# Assign the collected results back to the DataFrame
MDS3['LEDD'] = led_values
MDS3['AMANTADINE_LEDD'] = amantadine_values_list
MDS3['LEVODOPA_LEDD'] = levodopa_values_list
MDS3['DOPAMINE_AGONIST_LEDD'] = dopamine_agonist_values_list
MDS3['MAOB_LEDD'] = maob_values_list
MDS3['COMT_LEDD'] = comt_values_list
MDS3['ANTICHOLINERGIC_LEDD'] = anticholingergic_values_list

MDS3[['PATNO','INFODT','LEDD', 'AMANTADINE_LEDD', 'DOPAMINE_AGONIST_LEDD', 'MAOB_LEDD', 'COMT_LEDD', 'ANTICHOLINERGIC_LEDD', 'LEVODOPA_LEDD']].head()

Now the main part of the script, which reports COMT inhibitor use as Yes/No instead of a summed LEDD. It can take 2-3 minutes to fully execute.

In [None]:
# Convert the date columns to datetime format
MDS3['INFODT'] = pd.to_datetime(MDS3['INFODT'], format='%m/%Y')
LEDD['STARTDT'] = pd.to_datetime(LEDD['STARTDT'], format='%m/%Y')
LEDD['STOPDT'] = pd.to_datetime(LEDD['STOPDT'], format='%m/%Y', errors='coerce')  # Handle NaN

# Convert 'LEDD' column to numeric
LEDD['LEDD'] = pd.to_numeric(LEDD['LEDD'], errors='coerce')

# Function to calculate LEDD sums for specific medication categories
def calculate_ledd_sums(relevant_meds, values_list):
    return relevant_meds[relevant_meds['LEDTRT'].str.lower().isin([val.lower() for val in values_list])]['LEDD'].sum()

# Function to check COMT inhibitor presence
def check_comt_inhibitor(relevant_meds, comt_names):
    return "Yes" if relevant_meds['LEDTRT'].str.lower().isin([val.lower() for val in comt_names]).any() else "No"

# Initialize lists to collect results
led_values = []
amantadine_values_list = []
levodopa_values_list = []
dopamine_agonist_values_list = []
maob_values_list = []
comt_presence_list = []  # Store Yes/No instead of sum
anticholingergic_values_list = []

# Iterate through unique patients in MDS3
for patno in MDS3['PATNO'].unique():
    patient_mds3 = MDS3[MDS3['PATNO'] == patno]
    patient_ledd = LEDD[LEDD['PATNO'] == patno]

    for index, row in patient_mds3.iterrows():
        infodt = row['INFODT']
        relevant_meds = patient_ledd[(patient_ledd['STARTDT'] <= infodt) &
                                     ((patient_ledd['STOPDT'] >= infodt) | pd.isna(patient_ledd['STOPDT']))]

        # Sum the total LEDD values
        led_values.append(relevant_meds['LEDD'].sum())

        # Sum the LEDD values for the specified 'LEDTRT' names for each category
        amantadine_values_list.append(calculate_ledd_sums(relevant_meds, amantadine_values))
        levodopa_values_list.append(calculate_ledd_sums(relevant_meds, levodopa_values))
        dopamine_agonist_values_list.append(calculate_ledd_sums(relevant_meds, dopamine_agonist_values))
        maob_values_list.append(calculate_ledd_sums(relevant_meds, maob_values))
        anticholingergic_values_list.append(calculate_ledd_sums(relevant_meds, anticholingergic_values))

        # Check for COMT inhibitor presence
        comt_presence_list.append(check_comt_inhibitor(relevant_meds, comt_values))

# Assign the collected results back to the DataFrame
MDS3['LEDD'] = led_values
MDS3['AMANTADINE_LEDD'] = amantadine_values_list
MDS3['LEVODOPA_LEDD'] = levodopa_values_list
MDS3['DOPAMINE_AGONIST_LEDD'] = dopamine_agonist_values_list
MDS3['MAOB_LEDD'] = maob_values_list
MDS3['COMT_INHIBITOR'] = comt_presence_list  # Changed from sum to Yes/No
MDS3['ANTICHOLINERGIC_LEDD'] = anticholingergic_values_list

# Display the results
MDS3[['PATNO','INFODT','LEDD', 'AMANTADINE_LEDD', 'DOPAMINE_AGONIST_LEDD', 'MAOB_LEDD', 'COMT_INHIBITOR', 'ANTICHOLINERGIC_LEDD', 'LEVODOPA_LEDD']].head()

## Exporting

In [None]:
# Subsetting the dataset to important variables
LEDD_dataset = MDS3[['PATNO','INFODT','LEDD', 'AMANTADINE_LEDD', 'DOPAMINE_AGONIST_LEDD', 'MAOB_LEDD', 'COMT_INHIBITOR', 'ANTICHOLINERGIC_LEDD', 'LEVODOPA_LEDD']]
LEDD_dataset.head()

In [16]:
# Exporting
LEDD_dataset.to_csv(os.path.join(PRIV_DIR, "LEDD_by_timepoints_and_med_types.csv"), index=False)