Physicians generally prescribe drugs for a fixed period or for administration at regular
intervals, but patients may stop treatment early for various reasons. Consider the following example.
Suppose you get a throat infection and the physician prescribes an antibiotic for 10 days,
but you stop taking it after 3 days because of adverse events.
Here the ideal treatment duration is 10 days, but the patient stopped after 3 days.
A patient stopping a treatment early is called a drop-off.
We want to study drop-off for “Target Drug”. The aim is to generate insights into which events
lead patients to stop taking “Target Drug”.
Assume the ideal treatment duration for “Target Drug” is 1 year. First, produce an analysis of
the drop-off rate, defined as the number of patients dropping off each month.
Then produce an analysis that generates insights into which events drive a patient to stop
taking “Target Drug”.

## Importing Libraries

In [1]:
import pandas as pd

## Dataset

In [2]:
# Read the dataset
df=pd.read_parquet('train.parquet')
df

Unnamed: 0,Patient-Uid,Date,Incident
0,a0db1e73-1c7c-11ec-ae39-16262ee38c7f,2019-03-09,PRIMARY_DIAGNOSIS
1,a0dc93f2-1c7c-11ec-9cd2-16262ee38c7f,2015-05-16,PRIMARY_DIAGNOSIS
3,a0dc94c6-1c7c-11ec-a3a0-16262ee38c7f,2018-01-30,SYMPTOM_TYPE_0
4,a0dc950b-1c7c-11ec-b6ec-16262ee38c7f,2015-04-22,DRUG_TYPE_0
8,a0dc9543-1c7c-11ec-bb63-16262ee38c7f,2016-06-18,DRUG_TYPE_1
...,...,...,...
29080886,a0ee9f75-1c7c-11ec-94c7-16262ee38c7f,2018-07-06,DRUG_TYPE_6
29080897,a0ee1284-1c7c-11ec-a3d5-16262ee38c7f,2017-12-29,DRUG_TYPE_6
29080900,a0ee9b26-1c7c-11ec-8a40-16262ee38c7f,2018-10-18,DRUG_TYPE_10
29080903,a0ee1a92-1c7c-11ec-8341-16262ee38c7f,2015-09-18,DRUG_TYPE_6


## Drop-off Rate

To analyze the drop-off rate for "Target Drug" and identify the events driving patients to stop taking the drug:

1. Calculate the drop-off rate:
   - Extract the relevant columns from the dataset, including "Patient-Uid" and "Date" for drop-off events.
   - Filter the data to include only patients who have taken the "Target Drug" at least once.
   - Group the data by month and count the number of unique patients who dropped off within each month.
   - Calculate the drop-off rate by dividing the number of drop-offs by the total number of patients who started the treatment in each month.

2. Analyze events driving drop-offs:
   - Identify the adverse events or other relevant columns in the dataset that might indicate the reasons for drop-offs.
   - Extract the necessary columns for analysis, including "Patient-Uid", "Date", and the relevant event columns.
   - Filter the data to include only patients who dropped off the treatment.
   - Analyze the occurrence and patterns of events leading to drop-offs, such as adverse events or specific conditions.
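
One way to make the drop-off calculation concrete is to find each patient's first and last "Target Drug" record and treat the patient as dropped off when the gap between the two is shorter than the ideal one-year duration. The sketch below is illustrative only and is not the notebook's own method: the toy data, the helper name `dropoff_by_treatment_month`, and the 30-day "treatment month" are all assumptions. It also ignores patients who started less than a year before the data ends and so cannot yet be classified.

```python
import pandas as pd

# Toy data in the same shape as the dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Patient-Uid": ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"],
    "Date": pd.to_datetime([
        "2019-01-05", "2019-02-04", "2019-03-10",   # p1: last record 64 days after the first
        "2019-01-20", "2020-01-25",                 # p2: still on the drug after one year
        "2019-03-01", "2019-03-15", "2019-06-01",   # p3: last record 92 days after the first
    ]),
    "Incident": ["TARGET DRUG"] * 8,
})

def dropoff_by_treatment_month(events, drug="TARGET DRUG", ideal_days=365):
    """Count patients whose last drug record falls in each 30-day treatment month."""
    rx = events[events["Incident"] == drug]
    # First and last drug record per patient.
    span = rx.groupby("Patient-Uid")["Date"].agg(first="min", last="max")
    span["days_on_drug"] = (span["last"] - span["first"]).dt.days
    # Assumption: a patient has dropped off if treatment lasted less than ideal_days.
    dropped = span[span["days_on_drug"] < ideal_days].copy()
    dropped["treatment_month"] = dropped["days_on_drug"] // 30 + 1
    counts = dropped.groupby("treatment_month").size()
    # Rate = drop-offs in that month / all patients who started the drug.
    return pd.DataFrame({"dropoffs": counts, "rate": counts / len(span)})

result = dropoff_by_treatment_month(toy)
print(result)
```

Indexing by month since treatment start, not by calendar month, keeps patients who started at different times comparable; whether that matches the intended definition of "each month" is a judgment call.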

In [4]:
train_data = df.copy()
# Filter the data to include only the patients who have taken the "Target Drug" at least once
target_drug_patients = train_data[train_data['Incident'] == 'TARGET DRUG']

# Calculate the drop-off rate: unique patients with a "Target Drug" record in each calendar month
dropoff_data = target_drug_patients[['Patient-Uid', 'Date']].copy()
dropoff_data['Month'] = dropoff_data['Date'].dt.to_period('M')
dropoff_rate = dropoff_data.groupby('Month')['Patient-Uid'].nunique()

# Print the drop-off rate
print("Drop-off Rate:")
print(dropoff_rate)

# Analyze events driving drop-offs: keep every event for patients who took the "Target Drug"
events_data = train_data[['Patient-Uid', 'Date', 'Incident']].copy()
dropoff_events = events_data[events_data['Patient-Uid'].isin(target_drug_patients['Patient-Uid'])]

Drop-off Rate:
Month
2017-02       1
2017-03       2
2017-04       1
2017-05       5
2017-06      11
2017-07       6
2017-08      10
2017-09       6
2017-10       6
2017-11       6
2017-12      14
2018-01      15
2018-02      19
2018-03     472
2018-04     732
2018-05    1042
2018-06    1217
2018-07    1244
2018-08    1522
2018-09    1397
2018-10    1620
2018-11    1661
2018-12    1623
2019-01    1907
2019-02    1596
2019-03    1781
2019-04    1869
2019-05    2207
2019-06    2089
2019-07    2253
2019-08    2457
2019-09    2152
2019-10    2627
2019-11    2383
2019-12    2502
2020-01    2558
2020-02    2517
2020-03    2372
2020-04    2652
2020-05    2728
2020-06    2674
2020-07    2946
2020-08    2383
2020-09     384
Freq: M, Name: Patient-Uid, dtype: int64
Drop-off Events Data:


In [11]:
# Print the drop-off events data
print("Drop-off Events Data:\n\n",dropoff_events)

Drop-off Events Data:

                                    Patient-Uid       Date           Incident
8         a0e9c384-1c7c-11ec-81a0-16262ee38c7f 2018-02-22     SYMPTOM_TYPE_6
22        a0e9c3b3-1c7c-11ec-ae8e-16262ee38c7f 2018-02-21     SYMPTOM_TYPE_6
23        a0e9c3e3-1c7c-11ec-a8b9-16262ee38c7f 2017-05-11    SYMPTOM_TYPE_10
29        a0e9c414-1c7c-11ec-889a-16262ee38c7f 2019-11-22  PRIMARY_DIAGNOSIS
32        a0e9c443-1c7c-11ec-9eb0-16262ee38c7f 2020-01-28  PRIMARY_DIAGNOSIS
...                                        ...        ...                ...
29080886  a0ee9f75-1c7c-11ec-94c7-16262ee38c7f 2018-07-06        DRUG_TYPE_6
29080897  a0ee1284-1c7c-11ec-a3d5-16262ee38c7f 2017-12-29        DRUG_TYPE_6
29080900  a0ee9b26-1c7c-11ec-8a40-16262ee38c7f 2018-10-18       DRUG_TYPE_10
29080903  a0ee1a92-1c7c-11ec-8341-16262ee38c7f 2015-09-18        DRUG_TYPE_6
29080911  a0ee146e-1c7c-11ec-baee-16262ee38c7f 2018-10-05        DRUG_TYPE_1

[1436789 rows x 3 columns]


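This code groups the "Target Drug" records by month and counts the unique patients in each month. Because it counts every patient with a record in that month, not only those whose final record falls there, the figures are best read as monthly patient activity; pinpointing the month in which each patient actually stopped requires that patient's last "Target Drug" date. The code also extracts all events for patients who took the drug at least once (1,436,789 rows), which is the starting point for analyzing the specific events that lead to drop-offs.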
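
As a possible next step for the event analysis, one could look at which incidents occur shortly before each patient's last "Target Drug" record. The sketch below is a hedged illustration on made-up data: the helper name `events_before_last_dose` and the 30-day look-back window are assumptions, and in a real analysis the input would first be restricted to patients identified as having dropped off.

```python
import pandas as pd

# Toy events in the same shape as the dataset (values are made up for illustration).
toy_events = pd.DataFrame({
    "Patient-Uid": ["p1", "p1", "p1", "p1", "p2", "p2", "p2"],
    "Date": pd.to_datetime([
        "2019-01-05", "2019-02-20", "2019-03-01", "2019-03-10",
        "2019-04-01", "2019-04-25", "2019-05-01",
    ]),
    "Incident": [
        "TARGET DRUG", "SYMPTOM_TYPE_6", "SYMPTOM_TYPE_6", "TARGET DRUG",
        "TARGET DRUG", "SYMPTOM_TYPE_6", "TARGET DRUG",
    ],
})

def events_before_last_dose(events, drug="TARGET DRUG", window_days=30):
    """Count, per incident type, the patients with that incident in the window before their last drug record."""
    # Last drug record per patient.
    last_dose = (
        events[events["Incident"] == drug]
        .groupby("Patient-Uid")["Date"].max()
        .rename("last_dose")
        .reset_index()
    )
    merged = events.merge(last_dose, on="Patient-Uid")
    gap = (merged["last_dose"] - merged["Date"]).dt.days
    # Keep non-drug incidents that fall 0..window_days before the last drug record.
    in_window = merged[(gap >= 0) & (gap <= window_days) & (merged["Incident"] != drug)]
    # Count each incident type at most once per patient.
    per_patient = in_window.drop_duplicates(["Patient-Uid", "Incident"])
    return per_patient["Incident"].value_counts()

incident_counts = events_before_last_dose(toy_events)
print(incident_counts)
```

Comparing these counts with the same counts for patients who completed the full year would help separate incidents associated with stopping from incidents that are simply common.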