## Part 1: Research Question

### A. Purpose of the Data Mining Report

The overarching aim of this report is to use Market Basket Analysis (MBA) to uncover underlying patterns in prescription drug combinations among patients. This analysis will reveal associations and frequent co-occurrences in the dataset of prescribed medications over the past two years. Understanding these patterns will enable the hospital to optimize its pharmaceutical services and potentially tailor patient care programs more effectively.

**1. Proposed Question**

**What are the most common combinations of medications prescribed together for patients within our hospital network, and how can these combinations inform our stocking strategies and patient care protocols?**

This question is designed to identify prevalent medication combinations, which can indicate common co-morbidities or treatment protocols. The insights derived could be instrumental in managing inventory efficiently, ensuring the availability of critical medications, and designing integrated care pathways that address the complex needs of patients with multiple health conditions.

**2. Goal of the Data Analysis**

<u>To identify frequent item sets of medications and derive association rules among these sets to aid in strategic decision-making related to pharmaceutical supplies and patient care optimization.</u>

This goal will be achieved by:

- Analyzing historical prescription data to find frequent patterns and associations in medication usage.
- Generating actionable insights that can contribute to better inventory management and targeted patient care strategies.
- Enhancing patient outcomes by leveraging discovered associations to inform and possibly reformulate treatment approaches for prevalent health conditions within the patient population.

These objectives are directly aligned with the available dataset, which consists of multi-drug prescriptions for a large number of patients. By focusing on this dataset, the hospital can enhance operational efficiency and improve patient care quality through data-driven decisions.

## Part 2: Market Basket Justification

### B. Reasons for Using Market Basket Analysis

Market Basket Analysis (MBA) is a data mining technique commonly used to discover associations and relationships between items in large transactional datasets. It is well suited to questions about which items co-occur within a transaction, whether the items are products in a retail basket or, as in this case, medications in a patient's prescription record.

**1. Explanation of Market Basket Analysis on the Selected Data Set**

Market Basket Analysis will be used to analyze the historical prescription data from the hospital. This involves the following steps:

- Transaction Definition: Each patient's record, comprising prescribed medications over the last two years, will be treated as a separate transaction.
- Item Identification: Each unique medication in the dataset represents an item.
- Association Rule Mining: Using algorithms such as Apriori or FP-Growth, the analysis will identify frequent itemsets (common combinations of medications) and derive association rules (medications that are likely to be co-prescribed).

<u>Expected Outcomes</u>:

- Frequent Medication Combinations: Identification of the most commonly co-prescribed sets of medications.
- Association Rules with Metrics: Rules scored by support, confidence, and lift, which indicate the strength and relevance of each discovered association.
- Strategic Insights: Data-driven insights for inventory management (e.g., ensuring high-demand medications are well-stocked) and patient care (e.g., preparing care protocols for common comorbidities).

**2. Example of Transactions in the Data Set**

Consider a hypothetical set of transactions from the dataset:

- Transaction 1: [Amlodipine, Metformin, Lipitor]
- Transaction 2: [Amlodipine, Albuterol, Lipitor]
- Transaction 3: [Metformin, Simvastatin]
  
Each list represents one patient's transaction: every medication prescribed to that patient by the hospital within the two-year period.
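
To make the metrics used later concrete, here is a minimal sketch that computes support, confidence, and lift by hand for the three hypothetical transactions above. The helper names `support`, `confidence`, and `lift` are illustrative and are not part of any library.

```python
# Hypothetical transactions copied from the example above
transactions = [
    {"Amlodipine", "Metformin", "Lipitor"},
    {"Amlodipine", "Albuterol", "Lipitor"},
    {"Metformin", "Simvastatin"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule: {Amlodipine} => {Lipitor}
sup = support({"Amlodipine", "Lipitor"}, transactions)        # 2 of 3 transactions
conf = confidence({"Amlodipine"}, {"Lipitor"}, transactions)  # (2/3) / (2/3)
lft = lift({"Amlodipine"}, {"Lipitor"}, transactions)         # 1.0 / (2/3)
print(sup, conf, lft)
```

In this toy example every patient on Amlodipine is also on Lipitor, so the confidence is 1.0 and the lift is 1.5. With only three transactions these numbers are purely illustrative.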

**3. Assumption of Market Basket Analysis**

Market Basket Analysis assumes that all transactions are independent of each other. This means that the occurrence of a set of items in one transaction does not influence their occurrence in another transaction. In the context of the current dataset, this assumption translates to the idea that the prescription pattern of one patient does not influence the prescription pattern of another. This is a standard assumption that simplifies the analysis but may overlook factors like regional health trends or demographic influences on prescription patterns, which can lead to correlated prescriptions across different patients.

## Part 3: Data Preparation and Analysis

### C. Preparation and Execution of Market Basket Analysis

To effectively perform Market Basket Analysis, the dataset needs to be transformed into a format that lists each transaction as a set of items (prescriptions in this case). Each patient's prescriptions will be treated as one transaction. Here's how we'll proceed:

**1. Transform the Dataset for Market Basket Analysis**

Let's clean and transform the data to make it suitable for the analysis:

- Data Cleaning: Drop the empty prescription slots (missing values) and the fully blank rows that appear in the raw file.
- Transformation: Convert the dataset from a wide format (one row per patient with up to 20 prescription columns) into a list of transactions, then one-hot encode it so that each medication becomes a Boolean column.
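
The two steps above can be sketched in plain Python on a tiny made-up example. The rows and drug names below are hypothetical, and `None` stands in for a missing value; the actual analysis uses pandas and mlxtend's `TransactionEncoder` for the same purpose.

```python
# Hypothetical wide-format rows: one row per patient, None marks an empty slot
wide_rows = [
    ["amlodipine", "metformin", None],
    [None, None, None],                      # fully blank row
    ["metformin", "abilify", "amlodipine"],
]

# Step 1 (cleaning): drop empty slots, then drop rows that end up empty
transactions = [[item for item in row if item is not None] for row in wide_rows]
transactions = [t for t in transactions if t]

# Step 2 (transformation): one Boolean column per distinct medication
items = sorted({item for t in transactions for item in t})
encoded = [[item in t for item in items] for t in transactions]

print(items)
for row in encoded:
    print(row)
```

Each row of `encoded` is one transaction, and each position records whether the corresponding medication in `items` was prescribed.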

In [2]:
!pip install mlxtend

In [7]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [8]:
# Load the data
data = pd.read_csv('medical_market_basket.csv')

In [9]:
data.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,,,,,,,,,,,,,,,,,,,,
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
2,,,,,,,,,,,,,,,,,,,,
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,


In [10]:
# Gather each row's non-missing prescriptions into one transaction (a list of drug names)
transactions = [[item for item in row if pd.notna(item)] for row in data.values.tolist()]

In [11]:
# Instantiate TransactionEncoder, which is used to transform the data into a format suitable for the Apriori algorithm
encoder = TransactionEncoder()
transformed_data = encoder.fit_transform(transactions)
transformed_df = pd.DataFrame(transformed_data, columns=encoder.columns_)
# Drop rows with no prescriptions (the blank rows in the raw file)
transformed_df = transformed_df[transformed_df.any(axis=1)]

In [19]:
transformed_df.head()

Unnamed: 0,Duloxetine,Premarin,Yaz,abilify,acetaminophen,actonel,albuterol HFA,albuterol aerosol,alendronate,allopurinol,...,trazodone HCI,triamcinolone Ace topical,triamterene,trimethoprim DS,valaciclovir,valsartan,venlafaxine XR,verapamil SR,viagra,zolpidem
1,False,False,False,True,False,False,True,True,False,True,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
5,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
9,False,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [13]:
# Save the cleaned, transformed data
transformed_df.to_csv('cleaned_transformed_data.csv', index=False)

**2. Execute Apriori Algorithm**

Now, let's use the Apriori algorithm to generate frequent itemsets and association rules:

In [14]:
# Generate frequent itemsets: sets of up to 3 medications that appear in at least 1% of transactions
frequent_itemsets = apriori(transformed_df, min_support=0.01, use_colnames=True, max_len=3)

In [15]:
# Generate association rules, keeping only those with confidence of at least 0.1
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.1)
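
The `apriori` call above comes from mlxtend. As a rough illustration of the level-wise search that the Apriori algorithm performs, the following is a minimal pure-Python sketch. It is not mlxtend's implementation; the function name `apriori_sketch` and the toy transactions are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support, max_len=3):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): supp(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current and k <= max_len:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        current = {c: supp(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy transactions
toy = [
    ["amlodipine", "metformin", "abilify"],
    ["amlodipine", "abilify"],
    ["metformin", "abilify"],
    ["amlodipine", "metformin"],
]
result = apriori_sketch(toy, min_support=0.5)
for itemset, value in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

The prune step is what keeps the search tractable: an itemset cannot be frequent unless all of its subsets are, so most candidate combinations are discarded without ever being counted.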

**3. Values for Support, Lift, and Confidence**

In [16]:
rules.head()

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Premarin) | (diazepam) | 0.046794 | 0.163845 | 0.011598 | 0.247863 | 1.512793 | 0.003932 | 1.111706 | 0.355611 |
| 1 | (allopurinol) | (abilify) | 0.033329 | 0.238368 | 0.011598 | 0.348 | 1.459926 | 0.003654 | 1.168147 | 0.325896 |
| 2 | (alprazolam) | (abilify) | 0.079323 | 0.238368 | 0.017064 | 0.215126 | 0.902495 | -0.001844 | 0.970387 | -0.105024 |
| 3 | (amlodipine) | (abilify) | 0.071457 | 0.238368 | 0.023597 | 0.330224 | 1.385352 | 0.006564 | 1.137144 | 0.299568 |
| 4 | (abilify) | (amphetamine salt combo) | 0.238368 | 0.068391 | 0.024397 | 0.102349 | 1.49653 | 0.008095 | 1.03783 | 0.435627 |


In [17]:
# Save the rules
rules.to_csv('association_rules.csv', index=False)

In [18]:
# Display the rules sorted by confidence and lift
rules_sorted = rules.sort_values(by=['confidence', 'lift'], ascending=False)
rules_sorted.head()

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 268 | (lisinopril, amphetamine salt combo xr) | (abilify) | 0.019997 | 0.238368 | 0.010132 | 0.506667 | 2.125563 | 0.005365 | 1.543848 | 0.540342 |
| 280 | (lisinopril, atorvastatin) | (abilify) | 0.021997 | 0.238368 | 0.011065 | 0.50303 | 2.110308 | 0.005822 | 1.532552 | 0.537969 |
| 311 | (lisinopril, diazepam) | (abilify) | 0.023064 | 0.238368 | 0.010932 | 0.473988 | 1.988472 | 0.005434 | 1.447937 | 0.508837 |
| 285 | (atorvastatin, metoprolol) | (abilify) | 0.023597 | 0.238368 | 0.011065 | 0.468927 | 1.967236 | 0.00544 | 1.434136 | 0.503555 |
| 36 | (metformin) | (abilify) | 0.050527 | 0.238368 | 0.023064 | 0.456464 | 1.914955 | 0.01102 | 1.401255 | 0.503221 |


**4. Explanation of Top Three Relevant Rules**

1. **Rule: (lisinopril, amphetamine salt combo xr) ⇒ (abilify)**

   - **Support: 0.010132** - This value indicates that the combination of lisinopril, amphetamine salt combo xr, and abilify occurs in approximately 1.01% of all transactions.
   - **Confidence: 0.506667** - This suggests that when lisinopril and amphetamine salt combo xr are prescribed, there is a 50.67% chance that abilify is also prescribed.
   - **Lift: 2.125563** - A lift greater than 1 means the items co-occur more often than they would if they were statistically independent. Here, patients prescribed lisinopril and amphetamine salt combo xr are about 2.13 times more likely to also be prescribed abilify than the overall abilify rate (about 23.8% of transactions) would predict.

2. **Rule: (lisinopril, atorvastatin) ⇒ (abilify)**

   - **Support: 0.011065** - The combination of lisinopril, atorvastatin, and abilify appears in about 1.11% of all transactions.
   - **Confidence: 0.503030** - This indicates a 50.30% probability that abilify is also prescribed whenever lisinopril and atorvastatin are prescribed.
   - **Lift: 2.110308** - The occurrence of this combination is 2.11 times more likely than their independent occurrences would suggest.

3. **Rule: (lisinopril, diazepam) ⇒ (abilify)**

   - **Support: 0.010932** - This combination appears in about 1.09% of transactions.
   - **Confidence: 0.473988** - There is a 47.40% chance that abilify will be prescribed when lisinopril and diazepam are prescribed.
   - **Lift: 1.988472** - Lisinopril, diazepam, and abilify are prescribed together nearly 2 times more often than expected under independence.

These rules highlight positive associations between abilify and combinations of lisinopril with amphetamine salt combo xr, atorvastatin, or diazepam. This could indicate comorbidity patterns in which mental health conditions accompany other health issues such as hypertension, high cholesterol, and anxiety. Each rule covers only about 1% of transactions, so these patterns describe a small share of patients.
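
The reported metrics can be cross-checked from the table's own columns. The sketch below recomputes confidence, lift, and leverage for the first rule from its support values. The inputs are the rounded figures shown in the sorted rules table, so small differences in the last decimal places are expected.

```python
# Values copied from the first row of the sorted rules table
antecedent_support = 0.019997   # {lisinopril, amphetamine salt combo xr}
consequent_support = 0.238368   # {abilify}
rule_support = 0.010132         # all three medications together

# confidence = support(antecedent and consequent) / support(antecedent)
confidence = rule_support / antecedent_support

# lift = confidence / support(consequent)
lift = confidence / consequent_support

# leverage = observed joint support minus the joint support expected under independence
leverage = rule_support - antecedent_support * consequent_support

print(round(confidence, 4), round(lift, 4), round(leverage, 6))
```

The recomputed values agree with the table's confidence (0.506667), lift (2.125563), and leverage (0.005365) to within rounding.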

## Part 4: Data Summary and Implications

### D. Summarize Data Analysis

**1. Significance of Support, Lift, and Confidence**

- Support measures the prevalence of an itemset in all transactions. A higher support value indicates a more frequently occurring itemset, which is critical for ensuring that any patterns found are sufficiently widespread to merit consideration for action.

- Confidence is the proportion of transactions containing the antecedent that also contain the consequent. It indicates how reliable a rule's inference is: the higher the confidence, the more often the rule holds. Confidence does not account for how common the consequent is on its own, which is why it should be read alongside lift.

- Lift compares the observed frequency of A and B appearing together with the frequency expected if A and B were independent. A lift greater than 1 indicates a positive association (the items co-occur more often than expected), a lift near 1 indicates no association, and a lift below 1 indicates that the items co-occur less often than expected. This is particularly important for distinguishing interesting rules from itemsets that are merely frequent.

**2. Practical Significance of Findings**

The analysis identifies key drug associations that might indicate underlying health patterns. For instance, the frequent co-prescription of psychiatric medication (Abilify) with medications for chronic conditions (Metformin for diabetes, Lisinopril for hypertension) can help:

- Identify patient segments with multiple chronic conditions who may benefit from integrated care pathways.
- Guide stock management by ensuring that medications frequently prescribed together are adequately stocked.
- Improve patient outcomes by targeting interventions that address the complexities of their co-morbid conditions.

**3. Recommendations**

- Develop or improve comprehensive care programs that address the needs of patients with the identified comorbid conditions. This could involve creating specialized care teams that understand the nuances of psychiatric and chronic disease management.
- Adjust inventory management practices so that medications found in strong association rules are always available, preventing supply shortages that could impact patient care.
- Conduct further research into the causal links behind these associations, potentially leading to better treatment protocols and health outcomes.