# D212 Data Mining II Performance Assessment, Task \# 3

Submitted by William J Townsend, Student ID 003397146, for WGU's MSDA program

## Table of Contents
<ul>
<li><a href="#PartA1">A1: Research Question</a></li>
<li><a href="#PartA2">A2: Objectives and Goals of Analysis</a></li>
<li><a href="#PartB1">B1: Explanation of Market Basket Analysis</a></li>
<li><a href="#PartB2">B2: Transaction Example</a></li>
<li><a href="#PartB3">B3: Assumptions of Market Basket Analysis</a></li> 
<li><a href="#PartC1">C1: Data Preparation</a></li>
<li><a href="#PartC2">C2: Generation of Association Rules</a></li>
<li><a href="#PartC3">C3: Association Rules Table</a></li>
<li><a href="#PartC4">C4: Top Rules</a></li>
<li><a href="#PartD1">D1: Results of Analysis</a></li>
<li><a href="#PartD2">D2: Practical Significance</a></li>
<li><a href="#PartD3">D3: Recommended Action</a></li>
<li><a href="#PartE">E: Panopto Recording</a></li>
<li><a href="#PartF">F: Code References</a></li>
<li><a href="#PartG">G: Source References</a></li>    
</ul>

<a id='PartA1'></a>
## A1: Research Question

My research question for this project is: which medications, if any, are positively associated with prescriptions and purchases of Cialis? In other words, which medications do patients tend to be prescribed and purchase alongside Cialis?

<a id="PartA2"></a>
## A2: Objectives and Goals of Analysis

Knowing which medications are often co-prescribed can serve commercial incentives that do not belong in this analysis. However, the same information can also be used for outcomes that benefit patients rather than Big Pharma bottom lines: the rates at which prescriptions co-occur can point to health issues that themselves have high rates of co-occurrence.

<a id="PartB1"></a>
## B1: Explanation of Market Basket Analysis

Market basket analysis is the process of examining lists of items purchased in a dataset full of transactions to determine "association rules" regarding the purchasing of certain items within that basket. If we imagine a trip to the grocery store, we can understand that someone buying a loaf of bread is likely to buy something to put on that bread, such as peanut butter and/or jelly, or lunch meat and/or cheese. Market basket analysis works by combing through every transaction in a database and then calculating a number of statistics about how often bread is purchased, as well as how often bread is purchased alongside other items such as peanut butter or lunch meat.

In this way, a concluding rule is eventually formed which states "If bread, then lunch meat", indicating that the purchase of bread by a customer increases the likelihood of that customer also purchasing lunch meat in that transaction. In that example, bread is the antecedent (it is determined to come first), and lunch meat is the consequent. Many rules can be formed, and they can share antecedents (such as Bread -> Lunch Meat *and* Bread -> Peanut Butter) or consequents (such as Rice Cakes -> Peanut Butter), and rules may have multiple antecedents or consequents (such as Bread -> Peanut Butter, Jelly).

In this analysis, I'm anticipating getting rules which include Cialis, whether that be as the antecedent or the consequent. This could be either "If Cialis, then X" or "If X, then Cialis". 
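
To make the counting concrete, here is a minimal sketch using five made-up grocery baskets. The baskets and item names are illustrative assumptions and are not part of the medical dataset. It counts how often bread appears and how often bread and lunch meat appear together, which are the raw ingredients of an "If bread, then lunch meat" rule.

```python
# Toy grocery baskets, invented purely for illustration
baskets = [
    {"bread", "lunch meat", "cheese"},
    {"bread", "peanut butter", "jelly"},
    {"bread", "lunch meat"},
    {"rice cakes", "peanut butter"},
    {"milk"},
]

antecedent = "bread"
consequent = "lunch meat"

# Count baskets containing the antecedent, and those containing both items
n_antecedent = sum(antecedent in basket for basket in baskets)
n_both = sum(antecedent in basket and consequent in basket for basket in baskets)

# Support: share of all baskets that contain both items
support = n_both / len(baskets)
# Confidence: share of bread baskets that also contain lunch meat
confidence = n_both / n_antecedent

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```

In this toy data, two of the five baskets contain both items, and two of the three bread baskets also contain lunch meat.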

<a id="PartB2"></a>
## B2: Transaction Example

We can see an example of a transaction by importing the data and printing out a single transaction. 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import association_rules, apriori
from mlxtend.preprocessing import TransactionEncoder

# Load the raw transactions; the CSV has 20 prescription columns (Presc01-Presc20) and no index column, so 'index_col' is not needed
df = pd.read_csv('./medical_market_basket.csv')
# Show an example of a transaction in the dataset
df.iloc[3]

Presc01                   citalopram
Presc02                      benicar
Presc03    amphetamine salt combo xr
Presc04                          NaN
Presc05                          NaN
Presc06                          NaN
Presc07                          NaN
Presc08                          NaN
Presc09                          NaN
Presc10                          NaN
Presc11                          NaN
Presc12                          NaN
Presc13                          NaN
Presc14                          NaN
Presc15                          NaN
Presc16                          NaN
Presc17                          NaN
Presc18                          NaN
Presc19                          NaN
Presc20                          NaN
Name: 3, dtype: object

Here we can see that the dataset allows up to 20 different items/prescriptions (Presc01 - Presc20). This particular transaction contains 3 prescriptions: 'citalopram', 'benicar', and 'amphetamine salt combo xr'. Each row in the dataset represents a particular transaction where at least 1 and up to 20 prescriptions were purchased, and the market basket analysis will examine which prescriptions are purchased together in order to form association rules between them.

<a id="PartB3"></a>
## B3: Assumptions of Market Basket Analysis

Market basket analysis works on the idea that products purchased together in a transaction (a basket) have a meaning, that the purchase of one complements another. While we know from our own grocery store trips that sometimes things are a random throw-in (especially if we're hungry), the idea is that by looking at a large enough dataset, we can see consistent relationships start to emerge, such as the prior example about buying bread often involving another purchase to put something on that bread.

By using market basket analysis, these relationships can be sifted out of a much larger pile of data where the relationship might not necessarily be visible. Additionally, because the dataset includes all transactions, this can also be used for tracking placement of items within a retail location to see if different placements have different impacts, such as putting the guacamole next to the tortilla chips instead of next to the ketchup or even staging all of those candy bars near the cash register. 

<a id="PartC1"></a>
## C1: Data Preparation

With the data already loaded into a dataframe, it needs to be cleaned up into a format that will allow market basket analysis to be performed.

This is a good place to note that we're dealing with prescriptions by name, which will lead to associations between specific medications, rather than *types* of medications. To use a non-prescription example, this is like comparing 'ibuprofen', 'naproxen', 'acetaminophen', and 'aspirin' with various other medicines. There's a legitimate argument that it would be more effective to combine similar types of prescriptions instead, such as if we combined 'ibuprofen', 'naproxen', 'acetaminophen', and 'aspirin' into a single category of 'pain_reliever'. I tend to think that this would be a better way to perform this analysis, but I am not a pharmacist. As a result, I lack the domain knowledge necessary to be able to do this sort of grouping with any real effectiveness. 

Because of this, the data will be prepared and analyzed in its present form, comparing specific medications against each other, rather than groups or classes of medications. 
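
For reference, the grouping idea described above could look something like the following sketch. The mapping and the `pain_reliever` label are hypothetical placeholders taken from the non-prescription example; a real mapping would need pharmacist input and would have to match the exact spellings used in the dataset. It is not applied anywhere in this analysis.

```python
# Hypothetical mapping from specific drug names to a broader class.
# The class label is a placeholder; a real mapping would need domain expertise.
drug_class = {
    "ibuprofen": "pain_reliever",
    "naproxen": "pain_reliever",
    "acetaminophen": "pain_reliever",
    "aspirin": "pain_reliever",
}


def group_basket(basket):
    """Replace each drug with its class (if mapped), dropping duplicates."""
    grouped = []
    for drug in basket:
        label = drug_class.get(drug, drug)  # unmapped drugs keep their own name
        if label not in grouped:
            grouped.append(label)
    return grouped


example = ["naproxen", "losartan", "aspirin"]
print(group_basket(example))
```

Here the two pain relievers collapse into a single `pain_reliever` item, while `losartan` is left unchanged because it has no entry in the mapping.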

In [2]:
# Check data types and number of values, as well as overall size of dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15002 entries, 0 to 15001
Data columns (total 20 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Presc01  7501 non-null   object
 1   Presc02  5747 non-null   object
 2   Presc03  4389 non-null   object
 3   Presc04  3345 non-null   object
 4   Presc05  2529 non-null   object
 5   Presc06  1864 non-null   object
 6   Presc07  1369 non-null   object
 7   Presc08  981 non-null    object
 8   Presc09  654 non-null    object
 9   Presc10  395 non-null    object
 10  Presc11  256 non-null    object
 11  Presc12  154 non-null    object
 12  Presc13  87 non-null     object
 13  Presc14  47 non-null     object
 14  Presc15  25 non-null     object
 15  Presc16  8 non-null      object
 16  Presc17  4 non-null      object
 17  Presc18  4 non-null      object
 18  Presc19  3 non-null      object
 19  Presc20  1 non-null      object
dtypes: object(20)
memory usage: 2.3+ MB


In [3]:
# Visually inspect dataframe to facilitate exploration, spot problems
pd.set_option("display.max_columns", None)
df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,,,,,,,,,,,,,,,,,,,,
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
2,,,,,,,,,,,,,,,,,,,,
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,


In [4]:
# For some reason, the provided dataset has every line of data separated by a blank line. We don't want these rows
df = df[df['Presc01'].notna()]
# Reset index while we're at it, so we're not "missing" every other row
df.reset_index(drop=True, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Presc01  7501 non-null   object
 1   Presc02  5747 non-null   object
 2   Presc03  4389 non-null   object
 3   Presc04  3345 non-null   object
 4   Presc05  2529 non-null   object
 5   Presc06  1864 non-null   object
 6   Presc07  1369 non-null   object
 7   Presc08  981 non-null    object
 8   Presc09  654 non-null    object
 9   Presc10  395 non-null    object
 10  Presc11  256 non-null    object
 11  Presc12  154 non-null    object
 12  Presc13  87 non-null     object
 13  Presc14  47 non-null     object
 14  Presc15  25 non-null     object
 15  Presc16  8 non-null      object
 16  Presc17  4 non-null      object
 17  Presc18  4 non-null      object
 18  Presc19  3 non-null      object
 19  Presc20  1 non-null      object
dtypes: object(20)
memory usage: 1.1+ MB


In [5]:
# Verify that NaN rows are gone
df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
1,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
2,enalapril,,,,,,,,,,,,,,,,,,,
3,paroxetine,allopurinol,,,,,,,,,,,,,,,,,,
4,abilify,atorvastatin,folic acid,naproxen,losartan,,,,,,,,,,,,,,,


In [6]:
# Store data in a big list of lists: one inner list of prescription names per transaction
# Iterate through each row of the underlying array, keeping only the non-null cells
# (we don't want to carry NaNs forward into the resulting dataframe) as strings
temp_big_list = [
    [str(cell) for cell in row if not pd.isnull(cell)]
    for row in df.values
]
# Check that temp_big_list looks how we expect (a list of lists) by checking a few entries
print(f"Checking list of lists... \nindex 0: {temp_big_list[0]}\nindex 1: {temp_big_list[1]}\n...\nindex7500: {temp_big_list[7500]}")

Checking list of lists... 
index 0: ['amlodipine', 'albuterol aerosol', 'allopurinol', 'pantoprazole', 'lorazepam', 'omeprazole', 'mometasone', 'fluconozole', 'gabapentin', 'pravastatin', 'cialis', 'losartan', 'metoprolol succinate XL', 'sulfamethoxazole', 'abilify', 'spironolactone', 'albuterol HFA', 'levofloxacin', 'promethazine', 'glipizide']
index 1: ['citalopram', 'benicar', 'amphetamine salt combo xr']
...
index7500: ['amphetamine salt combo xr', 'levofloxacin', 'diclofenac sodium', 'cialis']


In [7]:
# Instantiate the transaction encoder
encoder = TransactionEncoder()
# Fit the transaction encoder to our list of lists, and then transform that data and store it in a temporary array
temp_array = encoder.fit(temp_big_list).transform(temp_big_list)
# Generate a new dataframe from this temporary array
clean_df = pd.DataFrame(temp_array, columns=encoder.columns_)
# Check that the new dataframe looks how we'd expect
clean_df

Unnamed: 0,Duloxetine,Premarin,Yaz,abilify,acetaminophen,actonel,albuterol HFA,albuterol aerosol,alendronate,allopurinol,alprazolam,amitriptyline,amlodipine,amoxicillin,amphetamine,amphetamine salt combo,amphetamine salt combo xr,atenolol,atorvastatin,azithromycin,benazepril,benicar,boniva,bupropion sr,carisoprodol,carvedilol,cefdinir,celebrex,celecoxib,cephalexin,cialis,ciprofloxacin,citalopram,clavulanate K+,clonazepam,clonidine HCI,clopidogrel,clotrimazole,codeine,crestor,cyclobenzaprine,cymbalta,dextroamphetamine XR,diazepam,diclofenac sodium,doxycycline hyclate,enalapril,escitalopram,esomeprazole,ezetimibe,fenofibrate,fexofenadine,finasteride,flovent hfa 110mcg inhaler,fluconozole,fluoxetine HCI,fluticasone,fluticasone nasal spray,folic acid,furosemide,gabapentin,glimepiride,glipizide,glyburide,hydrochlorothiazide,hydrocodone,hydrocortisone 2.5% cream,ibuprophen,isosorbide mononitrate,lansoprazole,lantus,levofloxacin,levothyroxine sodium,lisinopril,lorazepam,losartan,lovastatin,meloxicam,metformin,metformin HCI,methylprednisone,metoprolol,metoprolol succinate XL,metoprolol tartrate,mometasone,naproxen,omeprazole,oxycodone,pantoprazole,paroxetine,pioglitazone,potassium Chloride,pravastatin,prednisone,pregabalin,promethazine,quetiapine,ranitidine,rosuvastatin,salmeterol inhaler,sertraline HCI,simvastatin,spironolactone,sulfamethoxazole,synthroid,tamsulosin,temezepam,topiramate,tramadol,trazodone HCI,triamcinolone Ace topical,triamterene,trimethoprim DS,valaciclovir,valsartan,venlafaxine XR,verapamil SR,viagra,zolpidem
0,False,False,False,True,False,False,True,True,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,True,False,False,True,True,False,False,False,False,False,False,True,False,True,False,True,False,True,False,False,False,True,False,False,True,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
7497,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
7498,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
7499,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [8]:
clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Columns: 119 entries, Duloxetine to zolpidem
dtypes: bool(119)
memory usage: 871.8 KB


In [9]:
# Save dataframe to CSV, ignore index (if included, this will create an additional unnecessary column)
clean_df.to_csv('task3_full_clean.csv', index=False)

At this point, each prescription occurring in the overall dataset has become a column, giving us 119 columns. We still have the same 7501 rows of transactions, but rather than each transaction being a list of medications, they become a list of Trues (the customer did get this medication) and Falses (the customer did not get this medication). 
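
The transformation can be illustrated in plain Python on three of the transactions shown earlier. This is only a sketch of the idea (one column per distinct item, one boolean per cell) and makes no claim about how `TransactionEncoder` is implemented internally.

```python
# Three transactions taken from the cleaned dataframe shown earlier
baskets = [
    ["citalopram", "benicar", "amphetamine salt combo xr"],
    ["enalapril"],
    ["paroxetine", "allopurinol"],
]

# One column per distinct item, in sorted order
columns = sorted({item for basket in baskets for item in basket})

# One row per basket: True where the basket contains that column's item
encoded = [[col in basket for col in columns] for basket in baskets]

print(columns)
for row in encoded:
    print(row)
```

Each row has exactly as many `True` values as the original basket had prescriptions, which is a quick sanity check on the encoding.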

<a id="PartC2"></a>
## C2: Generation of Association Rules

Generation of the association rules is done by using two functions. First, `apriori` is used to pull all associations that meet a minimum support threshold that we provide, and then `association_rules` is used to fully flesh out those rules, while applying a further reduction through providing another minimum threshold, for which I'll focus on each rule's lift value. 

In [10]:
# Use the Apriori algorithm to generate frequent itemsets
frequent_itemsets = apriori(clean_df, min_support = 0.02, use_colnames = True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.046794,(Premarin)
1,0.238368,(abilify)
2,0.020397,(albuterol aerosol)
3,0.033329,(allopurinol)
4,0.079323,(alprazolam)
...,...,...
98,0.023064,"(diazepam, lisinopril)"
99,0.023464,"(losartan, diazepam)"
100,0.022930,"(diazepam, metoprolol)"
101,0.020131,"(glyburide, doxycycline hyclate)"


In [11]:
# Use association_rules with a minimum lift of 1.0 (above 1, the antecedent increases the likelihood of the consequent)
rules = association_rules(frequent_itemsets, metric = 'lift', min_threshold = 1.0)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(amlodipine),(abilify),0.071457,0.238368,0.023597,0.330224,1.385352,0.006564,1.137144
1,(abilify),(amlodipine),0.238368,0.071457,0.023597,0.098993,1.385352,0.006564,1.030562
2,(abilify),(amphetamine salt combo),0.238368,0.068391,0.024397,0.102349,1.496530,0.008095,1.037830
3,(amphetamine salt combo),(abilify),0.068391,0.238368,0.024397,0.356725,1.496530,0.008095,1.183991
4,(amphetamine salt combo xr),(abilify),0.179709,0.238368,0.050927,0.283383,1.188845,0.008090,1.062815
...,...,...,...,...,...,...,...,...,...
89,(metoprolol),(diazepam),0.095321,0.163845,0.022930,0.240559,1.468215,0.007312,1.101015
90,(glyburide),(doxycycline hyclate),0.170911,0.095054,0.020131,0.117785,1.239135,0.003885,1.025766
91,(doxycycline hyclate),(glyburide),0.095054,0.170911,0.020131,0.211781,1.239135,0.003885,1.051852
92,(losartan),(glyburide),0.132116,0.170911,0.028530,0.215943,1.263488,0.005950,1.057436


I used 0.02 as the minimum support for the frequent_itemsets table. The support of an itemset is the proportion of transactions that contain it. Because the dataset is not particularly large, with 7501 transactions spread across 119 different medications, I did not want to go with a very low support value, which could introduce issues with small sample sizes. A higher support value could end up being "too exclusive" and include only the most obvious rules in the analysis. In my opinion, a threshold of 0.02 strikes a reasonable balance between the two, as it requires that an itemset occur at least 151 times among the 7501 transactions.
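
The figure of 151 transactions follows directly from the threshold and the dataset size, as this short check shows (the variable names are only for illustration):

```python
import math

n_transactions = 7501
min_support = 0.02

# Smallest whole number of transactions whose share is at least min_support
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

An itemset appearing in only 150 transactions would have a support just under 0.02 and would be excluded.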

Setting the minimum lift to 1.0 for the association rules table keeps only the rules for which the antecedent does not reduce the likelihood of the consequent appearing in the transaction; for every rule with a lift above 1.0, the antecedent can be seen to have increased that likelihood.
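
For reference, the three metrics used throughout this analysis can be written as follows. The notation is my own shorthand rather than anything produced by the code: $A$ is the antecedent, $C$ is the consequent, and $P(\cdot)$ is the proportion of transactions containing the given items.

$$
\text{support}(A \rightarrow C) = P(A \cap C), \qquad
\text{confidence}(A \rightarrow C) = \frac{P(A \cap C)}{P(A)}, \qquad
\text{lift}(A \rightarrow C) = \frac{P(A \cap C)}{P(A)\,P(C)}
$$

If $A$ and $C$ were purchased independently of each other, then $P(A \cap C) = P(A)\,P(C)$ and the lift would be exactly 1, which is why 1.0 is the natural cutoff.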

<a id="PartC3"></a>
## C3: Association Rules Table

I am required to provide the association rules table, complete with scores for support, confidence, and lift. This was done in C2, and is reproduced here to satisfy the rubric. 

In [12]:
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(amlodipine),(abilify),0.071457,0.238368,0.023597,0.330224,1.385352,0.006564,1.137144
1,(abilify),(amlodipine),0.238368,0.071457,0.023597,0.098993,1.385352,0.006564,1.030562
2,(abilify),(amphetamine salt combo),0.238368,0.068391,0.024397,0.102349,1.496530,0.008095,1.037830
3,(amphetamine salt combo),(abilify),0.068391,0.238368,0.024397,0.356725,1.496530,0.008095,1.183991
4,(amphetamine salt combo xr),(abilify),0.179709,0.238368,0.050927,0.283383,1.188845,0.008090,1.062815
...,...,...,...,...,...,...,...,...,...
89,(metoprolol),(diazepam),0.095321,0.163845,0.022930,0.240559,1.468215,0.007312,1.101015
90,(glyburide),(doxycycline hyclate),0.170911,0.095054,0.020131,0.117785,1.239135,0.003885,1.025766
91,(doxycycline hyclate),(glyburide),0.095054,0.170911,0.020131,0.211781,1.239135,0.003885,1.051852
92,(losartan),(glyburide),0.132116,0.170911,0.028530,0.215943,1.263488,0.005950,1.057436


<a id="PartC4"></a>
## C4: Top Rules

The top three rules in the association rules table can be seen here, each having a lift of over 1.9 and a confidence of over 0.3.

In [13]:
top_3_rules = rules[(rules['lift'] > 1.9) & (rules['confidence'] > 0.3)].sort_values(by=['lift'], ascending= False)
top_3_rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
75,(lisinopril),(carvedilol),0.098254,0.17411,0.039195,0.398915,2.291162,0.022088,1.373997
72,(glipizide),(carvedilol),0.065858,0.17411,0.02293,0.348178,1.999758,0.011464,1.267048
30,(metformin),(abilify),0.050527,0.238368,0.023064,0.456464,1.914955,0.01102,1.401255


A higher lift indicates a stronger association: the consequent appears alongside the antecedent more often than would be expected if the two were unrelated. The confidence score is the proportion of transactions containing the antecedent that also contain the consequent, so imposing a moderate confidence threshold here requires that a meaningful share of antecedent transactions also include the consequent.
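
The effect of combining the two thresholds can be seen in a plain-Python sketch on four rows copied from the rules table (the three top rules plus rule 0 for contrast). The tuple layout is an assumption made for this illustration only.

```python
# Rows copied from the rules table: (antecedent, consequent, confidence, lift)
sample_rules = [
    ("amlodipine", "abilify", 0.330224, 1.385352),
    ("metformin", "abilify", 0.456464, 1.914955),
    ("glipizide", "carvedilol", 0.348178, 1.999758),
    ("lisinopril", "carvedilol", 0.398915, 2.291162),
]

# Keep rules that clear both thresholds, then rank by lift (highest first)
top_rules = sorted(
    (rule for rule in sample_rules if rule[3] > 1.9 and rule[2] > 0.3),
    key=lambda rule: rule[3],
    reverse=True,
)

for antecedent, consequent, confidence, lift in top_rules:
    print(f"{antecedent} -> {consequent}: confidence={confidence:.3f}, lift={lift:.3f}")
```

The amlodipine rule clears the confidence threshold but not the lift threshold, so it is dropped.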

<a id="PartD1"></a>
## D1: Results of Analysis

Having satisfied the rubric, I can now complete my analysis, to attempt to answer my research question. First, I'll check the value counts for both the antecedents and the consequents to see if Cialis shows up at all in the final set of rules, generated above. 

In [14]:
rules.antecedents.value_counts()

(abilify)                      18
(carvedilol)                   12
(amphetamine salt combo xr)     9
(diazepam)                      8
(atorvastatin)                  7
(glyburide)                     6
(metoprolol)                    5
(lisinopril)                    4
(doxycycline hyclate)           4
(losartan)                      4
(citalopram)                    4
(glipizide)                     2
(amphetamine salt combo)        2
(amlodipine)                    2
(dextroamphetamine XR)          1
(clopidogrel)                   1
(fenofibrate)                   1
(levofloxacin)                  1
(metformin)                     1
(cialis)                        1
(naproxen)                      1
Name: antecedents, dtype: int64

In [15]:
rules.consequents.value_counts()

(abilify)                      18
(carvedilol)                   12
(amphetamine salt combo xr)     9
(diazepam)                      8
(atorvastatin)                  7
(glyburide)                     6
(metoprolol)                    5
(doxycycline hyclate)           4
(lisinopril)                    4
(losartan)                      4
(citalopram)                    4
(amlodipine)                    2
(amphetamine salt combo)        2
(glipizide)                     2
(dextroamphetamine XR)          1
(levofloxacin)                  1
(clopidogrel)                   1
(metformin)                     1
(naproxen)                      1
(cialis)                        1
(fenofibrate)                   1
Name: consequents, dtype: int64

In [16]:
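# mlxtend stores antecedents and consequents as frozensets, so compare against a set to match single-item rules
ant_df = rules[rules['antecedents'] == {'cialis'}]
con_df = rules[rules['consequents'] == {'cialis'}]
# Combine the rules where Cialis is the antecedent with those where it is the consequent
cialis_df = pd.concat([ant_df, con_df])
cialis_df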

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
10,(cialis),(abilify),0.076523,0.238368,0.023997,0.313589,1.315565,0.005756,1.109585
11,(abilify),(cialis),0.238368,0.076523,0.023997,0.100671,1.315565,0.005756,1.026851


We return two rules, one that stipulates that "If Cialis, then Abilify" and the other stipulating "If Abilify, then Cialis". We can summarize what the support, lift, and confidence scores in the above table tell us, [with assistance from Susan Currie Sivek at Towards Data Science (2020)](https://towardsdatascience.com/market-basket-analysis-101-key-concepts-1ddc6876cd00). 
- Support is the proportion of all transactions in the dataset that contain every item in the rule. As we can see, nearly 2.4% of the 7501 transactions contain both Abilify and Cialis.
- Lift measures how much the observed rate of purchasing both Abilify and Cialis exceeds the rate we would expect if there were no relationship between the two. The lift being greater than 1 indicates that there is a positive relationship, in that the antecedent is increasing the likelihood of the consequent occurring in the transaction (1.00 would indicate no relationship). Like support, this score is the same for both directions that the rule could exist.
- Confidence is the proportion of transactions which include the whole rule (Abilify *and* Cialis) divided by the proportion of transactions containing the antecedent for the rule. In this case, the Cialis -> Abilify rule has a much higher confidence (about 3x higher) than the Abilify -> Cialis rule.
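
These relationships can be checked by hand from the three support values in the table above. The sketch below recomputes confidence in both directions and the shared lift; small differences in the last decimal place are expected because the inputs are already rounded.

```python
# Values copied from the two Cialis/Abilify rows in the table above
support_both = 0.023997      # transactions containing both medications
support_cialis = 0.076523    # transactions containing Cialis
support_abilify = 0.238368   # transactions containing Abilify

# Confidence depends on which item is the antecedent
conf_cialis_to_abilify = support_both / support_cialis
conf_abilify_to_cialis = support_both / support_abilify

# Lift is symmetric: the same value in both directions
lift = support_both / (support_cialis * support_abilify)

print(f"Cialis -> Abilify confidence: {conf_cialis_to_abilify:.3f}")
print(f"Abilify -> Cialis confidence: {conf_abilify_to_cialis:.3f}")
print(f"lift (either direction): {lift:.3f}")
```

The asymmetry in confidence comes entirely from the denominators: Abilify appears in roughly three times as many transactions as Cialis.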

The issue of which direction this rule should go (which medication is the antecedent) can largely be determined by the confidence level, which is much higher for Cialis as the antecedent, rather than the consequent. A little bit of research provides some intuitive backing for this direction of the rule, as well. 

Some quick searches for [Cialis](https://www.webmd.com/drugs/2/drug-77881/cialis-oral/details) and [Abilify](https://www.webmd.com/drugs/2/drug-64439/abilify-oral/details) on WebMD give information about the use cases for each. Cialis is used to treat impotence or erectile dysfunction, while Abilify is used to treat several mental and mood disorders, such as bipolar disorder, schizophrenia, Tourette's syndrome, irritability associated with autistic disorder, and depression. Given this information, the much larger use case for Abilify could reasonably be expected to include many more situations where the patient does not also experience erectile dysfunction and thus would not need Cialis. In the opposite situation, it can also be expected that someone needing to take Cialis might also be dealing with a mood disorder, which may be responsible for their needing Cialis in the first place.

In this way, we can reasonably conclude that the rule is correctly stated as "If Cialis, then Abilify". This means that someone purchasing Cialis is more likely than the average patient to also purchase Abilify. 

<a id="#PartD2"></a>
## D2: Practical Significance 

I think that this relationship is reasonably significant, practically speaking. While I did not expect the hospital to prescribe Cialis very much (I figured that this would generally be done by family doctors instead), over 7.5% of transactions in the dataset included Cialis. This also adds up intuitively, as erectile dysfunction is a persistent issue (as opposed to an occasional one) that is generally a symptom of other physical or mental health issues, rather than being a standalone diagnosis itself [(Mayo Clinic, 2022)](https://www.mayoclinic.org/diseases-conditions/erectile-dysfunction/symptoms-causes/syc-20355776). The variety of causes of erectile dysfunction does mean there are cases where Cialis is needed but not Abilify. However, while a prescribing doctor should have a good idea of a patient's overall health, including diagnosed conditions, a mood disorder can often be more easily concealed than a physical disorder. 

<a id="#PartD3"></a>
## D3: Recommended Action

The relationship here of "If Cialis, then Abilify" indicates that people who need a prescription for Cialis often also need a prescription for Abilify. When Cialis is prescribed for erectile dysfunction, especially without an obvious coexisting medical condition for which the erectile dysfunction is a likely symptom, this relationship is a strong indication that the prescribing doctor should give serious consideration to the possibility that the patient has a mood disorder. Emotionally intelligent, probing questions from the doctor can indicate whether an evaluation for such a disorder may be worth recommending to the patient. I recommend that this be a standard consideration for all of the hospital system's doctors when prescribing Cialis, especially where no other medical condition is apparent that would be a likely cause of the erectile dysfunction. 

<a id="#PartE"></a>
## E: Panopto Recording

My presentation of this performance assessment [can be viewed here, via Panopto.](https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=745cbd13-c058-4497-b343-af9000596417)

<a id="#PartF"></a>
## F: Code References

[DataCamp: Market Basket Analysis in Python by Isaiah Hull](https://campus.datacamp.com/courses/market-basket-analysis-in-python/aggregation-and-pruning?learningMode=course&ex=10) was used to generate the rules table. 

[mlxtend documentation: Filtering](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/) was useful for filtering the 'rules' dataframe by drug name. The antecedent and consequent columns do not contain strings but rather frozensets of strings, which caused me some problems before the documentation was able to help me correctly filter for Cialis. 
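
For reference, the frozenset filtering described above can be sketched roughly as follows. The `rules` table here is a small hypothetical stand-in, built by hand with invented numbers, for the dataframe that mlxtend's `association_rules()` returns; only the `antecedents` and `consequents` column names are taken from mlxtend, and the `rules_with` helper is my own illustration rather than a library function.

```python
import pandas as pd

# Hypothetical stand-in for a rules table; the lift values are invented
# for illustration only and are not results from the assessment.
rules = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"cialis"}),
            frozenset({"abilify"}),
            frozenset({"metformin"}),
        ],
        "consequents": [
            frozenset({"abilify"}),
            frozenset({"cialis"}),
            frozenset({"abilify"}),
        ],
        "lift": [1.3, 1.3, 1.1],
    }
)


def rules_with(drug, column, table):
    """Keep rows whose frozenset in `column` contains `drug` (hypothetical helper)."""
    mask = table[column].apply(lambda items: drug in items)
    return table[mask]


# Rules where Cialis is the antecedent.
cialis_as_antecedent = rules_with("cialis", "antecedents", rules)

# Rules where Cialis appears on either side.
in_antecedent = rules["antecedents"].apply(lambda items: "cialis" in items)
in_consequent = rules["consequents"].apply(lambda items: "cialis" in items)
cialis_anywhere = rules[in_antecedent | in_consequent]

print(cialis_as_antecedent)
print(cialis_anywhere)
```

Because each cell holds a frozenset, the membership test (`in`) is applied row by row rather than relying on pandas string methods.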

<a id="#PartG"></a>
## G: Source References

[Susan Currie Sivek @ Towards Data Science](https://towardsdatascience.com/market-basket-analysis-101-key-concepts-1ddc6876cd00) was immensely helpful for clearly summarizing the importance of support, confidence, and lift in a way that made much more sense than the course material did. 

[WebMD: Abilify (Oral)](https://www.webmd.com/drugs/2/drug-64439/abilify-oral/details) and [WebMD: Cialis (Oral)](https://www.webmd.com/drugs/2/drug-77881/cialis-oral/details) were used for gathering information to determine which direction the Cialis/Abilify rule was pointing. 

[Mayo Clinic: Erectile Dysfunction](https://www.mayoclinic.org/diseases-conditions/erectile-dysfunction/symptoms-causes/syc-20355776) was used for some additional information regarding the practical significance of my analysis.