# Part I: Proposal

## A1.  Purpose

The purpose of this market basket analysis is to understand patterns of prescription medication use among patients at the hospital. The key question we want to answer is: **What combinations of medications are commonly prescribed together to patients at the hospital?** This question will help the hospital identify frequently co-occurring prescriptions, enabling better understanding of patient needs and potential areas for improving efficiency.

## A2. Goal

The goal is to identify sets of associated medications that are often prescribed together to patients. The analysis aims to enhance resource allocation, anticipate medication needs, and potentially improve the hospital's efficiency in managing prescriptions.

# Part II: Market Basket Analysis

## B1.  How Market Basket Analysis (MBA) Analyzes Data

- **Data Collection:** Collect transactional data where each transaction represents items purchased together.
- **Data Preparation:** Convert the dataset into a binary matrix indicating the presence or absence of each item in transactions.
- **Finding Frequent Itemsets:** The Apriori algorithm calculates the support of each item and itemset, retaining only those that meet a minimum support threshold. The Apriori principle reduces computational effort: any itemset that has an infrequent subset cannot itself be frequent, so it is eliminated without counting its support.
- **Rule Generation:** Generate association rules from these frequent itemsets, evaluating them based on metrics such as confidence and lift. Only rules that exceed predefined thresholds are considered.
- **Analysis and Interpretation:** Analyze the rules to extract actionable insights that can guide decisions such as promotional strategies or inventory management.

In the context of analyzing prescription data, MBA identifies common medication combinations, shedding light on prevalent treatment patterns. This can aid healthcare providers in optimizing treatment plans and improving patient care by understanding which medications are frequently prescribed together.

An anticipated outcome of this analysis is to identify association rules that prominently feature medications appearing frequently in patient transactions, serving either as antecedents or consequents. This will provide valuable insights into prevalent prescribing patterns, facilitate the understanding of common medication combinations, and enhance decision-making in patient care management.

## B2. Transaction Example

Transactional data refers to the collection of items purchased by a customer in a single transaction. In a healthcare setting, a patient's visit resulting in the prescription of medications like Metformin and Lisinopril would be considered a single transaction. MBA could then be used to identify whether Lisinopril tends to be prescribed in the same transactions as Metformin.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder

pd.set_option("display.max_columns", None)


In [2]:
df = pd.read_csv('medical_market_basket.csv')
df.iloc[27]

Presc01    carvedilol
Presc02      Premarin
Presc03     ezetimibe
Presc04           NaN
Presc05           NaN
Presc06           NaN
Presc07           NaN
Presc08           NaN
Presc09           NaN
Presc10           NaN
Presc11           NaN
Presc12           NaN
Presc13           NaN
Presc14           NaN
Presc15           NaN
Presc16           NaN
Presc17           NaN
Presc18           NaN
Presc19           NaN
Presc20           NaN
Name: 27, dtype: object

Carvedilol, Premarin, and ezetimibe were prescribed together in the transaction shown above.

## B3. Assumption

Market basket analysis assumes that each transaction (each row of the dataset) is independent of the others: the items in one transaction are not influenced by any other transaction. This assumption underpins the analysis because support, confidence, and lift are all computed by counting across transactions. If transactions are not independent, for example if one patient's repeat visits appear as separate rows, the resulting association rules may be unreliable.

# Part III: Data Preparation and Analysis

## C1.  Transforming the Data

The dataset contains columns labeled "Presc01" to "Presc20," each representing a different prescription associated with a patient. Each row represents a patient’s transaction, and the values are the names of the medications prescribed. To perform market basket analysis, we need to transform each row into a list of non-null prescriptions for each patient.

In [3]:
df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,,,,,,,,,,,,,,,,,,,,
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
2,,,,,,,,,,,,,,,,,,,,
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,


In [4]:
# Remove Null Rows
df = df[df['Presc01'].notna()]
df.reset_index(drop=True, inplace=True)

In [5]:
# Remove NaN values and convert each row to a list of prescriptions
transactions = df.apply(lambda row: row.dropna().tolist(), axis=1).tolist()

The dataset has been transformed into a list of transactions where each transaction is a list of prescriptions taken by a patient.

Displaying the first few transactions for illustration:

In [6]:
transactions[:5] 

[['amlodipine',
  'albuterol aerosol',
  'allopurinol',
  'pantoprazole',
  'lorazepam',
  'omeprazole',
  'mometasone',
  'fluconozole',
  'gabapentin',
  'pravastatin',
  'cialis',
  'losartan',
  'metoprolol succinate XL',
  'sulfamethoxazole',
  'abilify',
  'spironolactone',
  'albuterol HFA',
  'levofloxacin',
  'promethazine',
  'glipizide'],
 ['citalopram', 'benicar', 'amphetamine salt combo xr'],
 ['enalapril'],
 ['paroxetine', 'allopurinol'],
 ['abilify', 'atorvastatin', 'folic acid', 'naproxen', 'losartan']]

### Transaction Encoding

A transaction encoder is a tool used in data preprocessing for transactional data. Its primary function is to transform data into a format that MBA algorithms like Apriori can efficiently use. It converts transactional data into an encoded matrix, where rows represent individual transactions and columns represent items. Each cell contains a binary (0,1) or boolean value (True, False).


#### Steps in Transaction Encoding
1. **Identifying Unique Items:** First, the encoder scans through all transactions to identify all unique items across the dataset.
2. **Creating a Matrix:** It then creates a matrix where each row corresponds to a transaction and each column corresponds to one of the identified unique items.
3. **Filling the Matrix:** For each transaction, the encoder fills in the matrix:
    - True: Indicates the item is present in the transaction.
    - False: Indicates the item is not present in the transaction.

The `mlxtend` library directly supports transaction encoding and running MBA algorithms, simplifying the process of preparing transactional data for market basket analysis. The `fit` method identifies unique labels from the data, while the `transform` method converts transactions into an encoded boolean matrix.
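To make the three steps concrete, here is a minimal plain-Python sketch of the same idea. It is not how `mlxtend` implements `TransactionEncoder`; the baskets are made up for illustration, and the variable names are assumptions.

```python
# Toy transactions: the medication names appear in the dataset,
# but these baskets are invented for illustration.
transactions = [
    ["carvedilol", "lisinopril"],
    ["abilify"],
    ["abilify", "carvedilol", "metformin"],
]

# Step 1: identify the unique items (sorted for a stable column order).
columns = sorted({item for basket in transactions for item in basket})

# Steps 2 and 3: one row per transaction, one boolean per unique item.
encoded = [[item in basket for item in columns] for basket in transactions]

print(columns)
for row in encoded:
    print(row)
```

Each inner list corresponds to one row of the encoded matrix, in the column order given by `columns`.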

In [7]:
# Instantiate the transaction encoder
encoder = TransactionEncoder()
# Fit the transaction encoder and then transform that data
temp = encoder.fit(transactions).transform(transactions)
# Generate a new dataframe from this temporary array
df = pd.DataFrame(temp, columns=encoder.columns_)

The dataset has been encoded and is now in a format that the Apriori algorithm can efficiently use.

In [8]:
df.head()

Unnamed: 0,Duloxetine,Premarin,Yaz,abilify,acetaminophen,actonel,albuterol HFA,albuterol aerosol,alendronate,allopurinol,alprazolam,amitriptyline,amlodipine,amoxicillin,amphetamine,amphetamine salt combo,amphetamine salt combo xr,atenolol,atorvastatin,azithromycin,benazepril,benicar,boniva,bupropion sr,carisoprodol,carvedilol,cefdinir,celebrex,celecoxib,cephalexin,cialis,ciprofloxacin,citalopram,clavulanate K+,clonazepam,clonidine HCI,clopidogrel,clotrimazole,codeine,crestor,cyclobenzaprine,cymbalta,dextroamphetamine XR,diazepam,diclofenac sodium,doxycycline hyclate,enalapril,escitalopram,esomeprazole,ezetimibe,fenofibrate,fexofenadine,finasteride,flovent hfa 110mcg inhaler,fluconozole,fluoxetine HCI,fluticasone,fluticasone nasal spray,folic acid,furosemide,gabapentin,glimepiride,glipizide,glyburide,hydrochlorothiazide,hydrocodone,hydrocortisone 2.5% cream,ibuprophen,isosorbide mononitrate,lansoprazole,lantus,levofloxacin,levothyroxine sodium,lisinopril,lorazepam,losartan,lovastatin,meloxicam,metformin,metformin HCI,methylprednisone,metoprolol,metoprolol succinate XL,metoprolol tartrate,mometasone,naproxen,omeprazole,oxycodone,pantoprazole,paroxetine,pioglitazone,potassium Chloride,pravastatin,prednisone,pregabalin,promethazine,quetiapine,ranitidine,rosuvastatin,salmeterol inhaler,sertraline HCI,simvastatin,spironolactone,sulfamethoxazole,synthroid,tamsulosin,temezepam,topiramate,tramadol,trazodone HCI,triamcinolone Ace topical,triamterene,trimethoprim DS,valaciclovir,valsartan,venlafaxine XR,verapamil SR,viagra,zolpidem
0,False,False,False,True,False,False,True,True,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,True,False,False,True,True,False,False,False,False,False,False,True,False,True,False,True,False,True,False,False,False,True,False,False,True,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


## C2. Apriori Algorithm

The Apriori algorithm is widely used in data mining to find frequent itemsets in large transaction datasets. The algorithm works by identifying frequent individual items and extending them to larger itemsets based on a minimum support threshold. It follows a "bottom-up" approach, where frequent subsets are extended, tested, and pruned until no more frequent itemsets can be found.

### Steps of the Apriori Algorithm:
1. **Set a Minimum Support Threshold:** Determine the minimum support level required for an itemset to be considered frequent.
2. **Generate Candidate Itemsets:** Start with single items that meet the minimum support threshold, forming the frequent 1-itemsets.
3. **Join Step:** Take all frequent k-itemsets and generate (k+1)-itemsets by joining each pair of k-itemsets that share (k-1) items.
4. **Prune Step:** Eliminate candidate (k+1) itemsets where one of the k-item subsets is not frequent. Based on the Apriori property, all subsets of a frequent itemset must also be frequent.
5. **Test the Candidates:** Calculate each candidate's support by checking each transaction to see if it contains the candidate itemset.
6. **Determine Frequent Itemsets:** Identify the frequent itemsets among the candidates based on the support threshold.
7. **Repeat:** Continue the process until no more frequent itemsets can be found. This usually happens when the set of candidates is empty.
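
The steps above can be sketched in plain Python as follows. This is an illustrative, unoptimized version written for readability, not the `mlxtend` implementation; the function name `apriori_sketch` and the toy baskets are assumptions.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset (frozenset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    def keep_frequent(candidates):
        # Test step: count support and keep candidates meeting the threshold.
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Frequent 1-itemsets.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = keep_frequent(singles)
    frequent = dict(current)
    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have k + 1 items.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-item subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Invented baskets for illustration only.
toy = [
    ["abilify", "metformin"],
    ["abilify", "metformin", "carvedilol"],
    ["carvedilol", "lisinopril"],
    ["abilify", "carvedilol", "lisinopril"],
    ["abilify"],
]
frequent = apriori_sketch(toy, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

In this toy run, no three-item candidate survives the prune step, because each one contains a pair that falls below the 40% threshold.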

The following cell applies the algorithm with `mlxtend`, discarding itemsets that do not meet the frequency criterion. Here, the minimum support threshold is set at 2% of all transactions.

In [9]:
# Apriori algorithm
# min_support=0.02: Sets the threshold to consider itemsets as frequent if they appear in at least 2% of all transactions.
frequent_itemsets = apriori(df, min_support=0.02, use_colnames=True) 
# Sorting the itemsets based on their support values.
frequent_itemsets.sort_values(by=['support'],ascending=False)

Unnamed: 0,support,itemsets
1,0.238368,(abilify)
8,0.179709,(amphetamine salt combo xr)
11,0.174110,(carvedilol)
29,0.170911,(glyburide)
19,0.163845,(diazepam)
...,...,...
2,0.020397,(albuterol aerosol)
67,0.020264,"(abilify, levofloxacin)"
64,0.020131,"(abilify, fenofibrate)"
72,0.020131,"(naproxen, abilify)"


The highest support belongs to the single-item itemset (abilify), at approximately 0.238. This means that Abilify appears in about 23.8% of all transactions.
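
As a rough illustration of what a support value means, the sketch below computes it by hand on a tiny boolean matrix. The values are invented and are not rows from the hospital dataset; the helper name `support` is an assumption.

```python
# Toy encoded matrix: one list of booleans per item, one entry per transaction.
# The values are invented; they are not rows from the hospital dataset.
encoded = {
    "abilify":    [True, False, True, False, False],
    "carvedilol": [True, True, False, False, False],
}

def support(*columns):
    """Share of transactions in which every given column is True."""
    rows = list(zip(*columns))
    return sum(all(row) for row in rows) / len(rows)

print(support(encoded["abilify"]))                         # 2 of 5 transactions
print(support(encoded["abilify"], encoded["carvedilol"]))  # 1 of 5 transactions
```

On the notebook's encoded DataFrame, the single-item supports should correspond to the column means (`df.mean()`), since `True` counts as 1.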

## C3. Association Rules Table

Association rules are if-then statements that illustrate the likelihood of relationships between items. Typically, an association rule has two parts:
- **Antecedent (if):** an item or a combination of items found in the data.
- **Consequent (then):** an item or combination of items that are typically found in conjunction with the antecedent.

### Generating Association Rules
- **Selecting Metrics:** Set metrics such as support, lift, or confidence thresholds to identify important rules within the data. 

- **Frequent Itemset Generation:** Use algorithms like Apriori to generate itemsets with support above the specified threshold.

- **Rule Generation:** Generate rules from frequent itemsets with confidence above the user-specified threshold.

- **Rule Evaluation:** Evaluate rules using metrics like lift, leverage, and conviction to measure effectiveness and interest beyond basic support and confidence.
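
As a sketch of how these metrics relate to one another, the function below derives confidence, lift, leverage, and conviction from three support values using their standard definitions. The function name `rule_metrics` is hypothetical; the inputs are the rounded supports reported for the (lisinopril) → (carvedilol) rule in the rules table in this section, so the results match the table only up to rounding.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Standard rule metrics from antecedent, consequent, and joint support."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is unbounded when the rule never fails (confidence of 1).
    conviction = (float("inf") if confidence == 1
                  else (1 - support_c) / (1 - confidence))
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Rounded supports for (lisinopril) -> (carvedilol) from the rules table.
metrics = rule_metrics(support_a=0.098254, support_c=0.174110,
                       support_ac=0.039195)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```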


The following cell generates rules using lift as the metric, keeping only rules with a lift of at least 1.0, and then sorts them by confidence.

In [10]:
# Generating rules using lift as the metric
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
# Sorting rules by confidence
rules.sort_values('confidence', ascending = False, inplace = True)

In [11]:
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
31,(metformin),(abilify),0.050527,0.238368,0.023064,0.456464,1.914955,0.011020,1.401255,0.503221
24,(glipizide),(abilify),0.065858,0.238368,0.027596,0.419028,1.757904,0.011898,1.310962,0.461536
28,(lisinopril),(abilify),0.098254,0.238368,0.040928,0.416554,1.747522,0.017507,1.305401,0.474369
74,(lisinopril),(carvedilol),0.098254,0.174110,0.039195,0.398915,2.291162,0.022088,1.373997,0.624943
23,(fenofibrate),(abilify),0.051060,0.238368,0.020131,0.394256,1.653978,0.007960,1.257349,0.416672
...,...,...,...,...,...,...,...,...,...,...
30,(abilify),(metformin),0.238368,0.050527,0.023064,0.096756,1.914955,0.011020,1.051182,0.627330
14,(abilify),(clopidogrel),0.238368,0.059992,0.022797,0.095638,1.594172,0.008497,1.039415,0.489364
26,(abilify),(levofloxacin),0.238368,0.063325,0.020264,0.085011,1.342461,0.005169,1.023701,0.334938
22,(abilify),(fenofibrate),0.238368,0.051060,0.020131,0.084452,1.653978,0.007960,1.036472,0.519145


## C4. Top Three Rules

In [12]:
# Selecting the 3 best rules based on lift and confidence
rules[(rules['lift'] > 1.9) & (rules['confidence'] > 0.3)].sort_values(by=['lift'], ascending= False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
74,(lisinopril),(carvedilol),0.098254,0.17411,0.039195,0.398915,2.291162,0.022088,1.373997,0.624943
72,(glipizide),(carvedilol),0.065858,0.17411,0.02293,0.348178,1.999758,0.011464,1.267048,0.535186
31,(metformin),(abilify),0.050527,0.238368,0.023064,0.456464,1.914955,0.01102,1.401255,0.503221


# Part IV: Data Summary and Implications

## D1. Significance of Support, Lift, and Confidence

- **Support:** Measures how frequently the items in a rule appear together across all transactions.
    - 'Lisinopril' and 'Carvedilol' appear together in 3.9195% of all transactions.
    - 'Glipizide' and 'Carvedilol' appear together in 2.2930% of all transactions.
    - 'Metformin' and 'Abilify' appear together in 2.3064% of all transactions.
- **Confidence:** Indicates the likelihood that the consequent is prescribed when the antecedent is prescribed.
    - 39.8915% of transactions with 'Lisinopril' also include 'Carvedilol'.
    - 34.8178% of transactions with 'Glipizide' also include 'Carvedilol'.
    - 45.6464% of transactions with 'Metformin' also include 'Abilify'.
- **Lift:** Shows how much more likely the items are to be prescribed together than would be expected if they were independent.
    - Transactions with 'Lisinopril' are 2.29 times as likely to include 'Carvedilol' as transactions overall.
    - Transactions with 'Glipizide' are 2.00 times as likely to include 'Carvedilol' as transactions overall.
    - Transactions with 'Metformin' are 1.91 times as likely to include 'Abilify' as transactions overall.

These metrics help evaluate the strength and significance of the association between different medications in clinical settings, indicating potential prescription patterns.
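
For reference, the three metrics can be written out as below, with $A$ the antecedent and $C$ the consequent. The notation is a standard formulation rather than something taken from the notebook, and the last line plugs in the rounded glipizide and carvedilol values from the top-three table.

```latex
\begin{align*}
\mathrm{support}(A \Rightarrow C) &= \frac{\text{transactions containing both } A \text{ and } C}{\text{all transactions}} \\
\mathrm{confidence}(A \Rightarrow C) &= \frac{\mathrm{support}(A \Rightarrow C)}{\mathrm{support}(A)} \\
\mathrm{lift}(A \Rightarrow C) &= \frac{\mathrm{confidence}(A \Rightarrow C)}{\mathrm{support}(C)} \\[4pt]
\mathrm{lift}(\text{glipizide} \Rightarrow \text{carvedilol}) &= \frac{0.348178}{0.174110} \approx 2.00
\end{align*}
```

A lift of 1 would mean the consequent is no more common among transactions containing the antecedent than it is overall.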

## D2. Practical Significance

The medication pairings in the results seem plausible.

1. **Lisinopril and Carvedilol:**
- **Lisinopril** is an ACE inhibitor commonly used to treat high blood pressure and heart failure.
- **Carvedilol** is a beta-blocker used to manage heart conditions including heart failure and hypertension.
- Pairing these medications can be common in patients with heart failure or hypertension as they work synergistically to reduce blood pressure and improve heart function. The high lift value (2.291162) suggests that they appear together more often than would be expected by chance, which aligns with their combined use in treating cardiovascular diseases.
2. **Glipizide and Carvedilol:**
- **Glipizide** is a sulfonylurea class medication used to control blood sugar levels in patients with type 2 diabetes.
- **Carvedilol**, as noted, manages heart conditions.
- Patients with type 2 diabetes often have comorbid cardiovascular diseases. Thus, seeing these medications prescribed together makes clinical sense due to the overlap of diabetes and heart disease in many patients. This might explain their association in this dataset, though the lift value (1.999758) is lower than that of the first pairing, indicating a somewhat weaker but still clearly positive association.
3. **Metformin and Abilify:**
- **Metformin** is widely used for managing type 2 diabetes.
- **Abilify (Aripiprazole)** is an antipsychotic used in the treatment of schizophrenia and bipolar disorder.
- This pairing is less intuitive from a direct treatment perspective, as they treat very different conditions. However, the use of metformin with antipsychotics like Abilify can be seen in patients with metabolic syndrome induced by antipsychotic medications, or in those with both diabetes and a psychiatric condition. The lift value (1.914955) suggests a weaker association relative to the cardiovascular medications but still indicates a more frequent co-prescription than would be expected by chance.


In summary, the association rules can reflect common clinical practices, especially where comorbid conditions exist. The strength of these associations (as indicated by lift) provides insights into the co-management strategies for patients with overlapping chronic diseases. Understanding these patterns can help in optimizing therapeutic strategies and patient education about their medications.

## D3. Course of Action

### Medication Management and Review
- **Clinical Review:** Conduct a clinical review of Carvedilol use to ensure it is being prescribed optimally. Evaluate if there are specific conditions or patient demographics where its use is more prevalent and effective.
- **Medication Optimization:** Consider optimizing Carvedilol treatment protocols. This could involve reassessing the dosage and combinations with other medications to enhance efficacy and reduce potential side effects.
- **Ongoing Research:** Continuously analyze prescription data to identify trends and shifts in the use of Carvedilol, adapting hospital policies accordingly.
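
One way such ongoing monitoring could start is by pulling out every rule that mentions a drug of interest. The sketch below does this in plain Python on the three rules from the top-three table (only a few columns are copied); the helper name `rules_involving` is hypothetical.

```python
# Rows copied from the top-three rules table; only a few columns are kept.
top_rules = [
    {"antecedents": frozenset({"lisinopril"}),
     "consequents": frozenset({"carvedilol"}),
     "confidence": 0.398915, "lift": 2.291162},
    {"antecedents": frozenset({"glipizide"}),
     "consequents": frozenset({"carvedilol"}),
     "confidence": 0.348178, "lift": 1.999758},
    {"antecedents": frozenset({"metformin"}),
     "consequents": frozenset({"abilify"}),
     "confidence": 0.456464, "lift": 1.914955},
]

def rules_involving(rules, drug):
    """Rules that mention the drug on either side, strongest lift first."""
    hits = [r for r in rules if drug in r["antecedents"] | r["consequents"]]
    return sorted(hits, key=lambda r: r["lift"], reverse=True)

carvedilol_rules = rules_involving(top_rules, "carvedilol")
for r in carvedilol_rules:
    print(sorted(r["antecedents"]), "->", sorted(r["consequents"]), r["lift"])
```

On the notebook's pandas `rules` DataFrame, a comparable filter could be a boolean mask such as `rules['consequents'].apply(lambda s: 'carvedilol' in s)`, since the antecedent and consequent columns hold frozensets.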