Dung Nguyen

Data Mining II, Dimensionality Reduction Methods - Task 2

Market Basket Analysis

December 25, 2021

**Scenario 2**

One of the most critical factors in patient relationship management that directly affects a hospital’s long-term cost-effectiveness is understanding its patients and the conditions leading to hospital admissions. When a hospital can better understand its patients’ characteristics, it is better able to target treatment to patients, resulting in more effective cost of care for the hospital in the long term.
 
You are an analyst for a hospital that wants to better understand the characteristics of its patients. You have been asked to use PCA to analyze patient data to identify the principal variables of your patients, ultimately allowing better business and strategic decision-making for the hospital.



# Part 1: Research Question

**A1. Research Question:**

Can we determine which prescription drugs are most strongly associated with one another, and can these associations help reduce the number of patients at high risk of readmission? What are the top three prescription-drug associations that can help us better understand our patients and identify patterns unique to readmitted patients?

I will use market basket analysis in this assignment.

**A2. The goal of the data analysis:**
    
The hospital will be able to identify which prescription drugs tend to occur together and, with some measure of confidence, use these combinations to help predict which patients are at high risk of readmission. This will lend weight to proposals for improving hospital services and treatments. The analysis aims to give the hospital numerical values that help it better understand its patients and the factors that lead to readmissions.

# Part 2: Market Basket Justification

**B1. Explanation of the market basket analysis:**

"Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns. It involves analyzing large data sets, such as purchase history, to reveal product groupings, as well as products that are likely to be purchased together." (Tech Target Contributor, 2019)

Market basket analysis (MBA) is a technique for analyzing which items occur together in a transaction. In this assignment, a transaction (the "basket" or "cart") is the set of drugs prescribed to one patient at one time, so MBA lets us find which prescriptions tend to appear in combination. These combinations may in turn help identify patients at high risk of readmission.

The plan for analysis includes:

1. Prepare the dataset
* Install the necessary packages
* Read and check the dataset in Python using pandas' read_csv function
* Store the dataset in the variable "medical_df_basket" and work on a copy named "medical_df"
* Evaluate the data structure to better understand the input data
* Re-validate the column data types and check for null and missing values
* Drop records in which every value is missing and review the changes
* Replace the remaining empty values with 0
* Show the prepared data
2. Use the Apriori method to identify association rules
3. Check the rules with the highest confidence, support, and lift
4. Recommend actions based on the results of the analysis


**B2. One example of transactions**

In the given dataset, the first transaction contains twenty prescription drugs taken by one patient:

- amlodipine
- albuterol aerosol
- allopurinol
- pantoprazole
- lorazepam
- omeprazole
- mometasone
- fluconozole
- gabapentin
- pravastatin
- cialis
- losartan
- metoprolol succinate XL
- sulfamethoxazole
- abilify
- spironolactone
- albuterol HFA
- levofloxacin
- promethazine
- glipizide

**B3. Assumption of the market basket analysis** (one)

According to Hua, "The underlying assumption in market basket analysis is that joint occurrence of two or more products in most baskets imply that these products are complements in purchase, therefore, purchase of one will lead to purchase of others." (Hua, n.d)

In this assignment, I will look at three measures used in MBA:
1. Support is the proportion of transactions that contain all the items in the rule.
2. Confidence is the conditional probability that a transaction contains the rule's right-hand side, given that it contains the left-hand side.
3. Lift is the ratio of the rule's confidence to the support of the right-hand side; a lift above 1 means the two sides occur together more often than would be expected if they were independent.
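The three measures can be made concrete with a small worked example. The sketch below uses toy transactions and hypothetical helper functions (`support`, `confidence`, `lift`) that are not part of the original notebook; it only illustrates the standard definitions.

```python
# Toy transactions (hypothetical data, not taken from the medical dataset).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

s = support({"a", "b"}, transactions)        # 3 of 5 transactions contain a and b
c = confidence({"a"}, {"b"}, transactions)   # 3 of the 4 transactions with a also have b
l = lift({"a"}, {"b"}, transactions)         # confidence relative to how common b is
print(s, c, l)
```

In this toy data the lift of a -> b is below 1, which means that having "a" makes "b" slightly less likely than its baseline rate.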


**Tool and technique:**

Several programming languages, such as R and Python, could be used for this analysis. In this assignment, I use Python to assess data quality, clean the data, and run the analysis. Python is a multipurpose programming language with libraries that extend its capabilities to statistical analysis. Python code is easy to read, and the flow of a program is easy to follow, even for a beginner. I also work in Jupyter notebooks, which provide a convenient way to run code, display visualizations, and keep documentation alongside the code for my reference.

The libraries and packages used to prepare the data include pandas, NumPy, SciPy, Matplotlib, and Seaborn. They provide functionality such as reading large datasets, statistical functions such as z-scores, and visualizations such as box plots and histograms. I also use the apyori package, which implements the Apriori algorithm, to perform the market basket analysis in Python.


# Part 3: Data Preparation

**C1. Transforming the dataset**

I chose the medical dataset for this performance assessment, using the **medical_market_basket.csv** file.
After the empty rows are removed, the dataset contains 7,501 patient prescription records and 20 columns (variables).




**Explain the steps**

In [1]:
# 1. Install necessary packages
!pip install pandas
!pip install numpy
!pip install scipy
!pip install scikit-learn
!pip install matplotlib
!pip install seaborn
!pip install plotly
!pip install missingno
!pip install six
!pip install pydotplus
!pip install graphviz
!pip install apyori
# Please watch the video to see the file path.



In [2]:
# Ignore warning messages
import warnings
warnings.filterwarnings('ignore')

# Standard imports
import os
import sys
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import seaborn as sns
from plotly.subplots import make_subplots
import plotly.graph_objs as go
import missingno as msno
from sklearn import decomposition
from sklearn.preprocessing import scale
from sklearn.decomposition import PCA


# Render Matplotlib figures inline in the notebook
%matplotlib inline

from apyori import apriori

# Scipy
from scipy.cluster.vq import kmeans, vq

# Import Scikit-learn 
import sklearn
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split


# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report, confusion_matrix
# For visualizing a tree
from sklearn.tree import plot_tree




In [3]:
#Read and check the data set in Python using Pandas' read_csv command
path = 'medical_market_basket.csv'
medical_df_basket = pd.read_csv(path)

In [4]:
#Display Medical dataframe
medical_df = medical_df_basket.copy()
medical_df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,,,,,,,,,,,,,,,,,,,,
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
2,,,,,,,,,,,,,,,,,,,,
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,


In [5]:
#Evaluate the data structure to understand input data better

# Dataset size
medical_df.shape

(15002, 20)

In [6]:
# Check numerical and categorical data
cat_df = medical_df.select_dtypes(include=['object'])
num_df = medical_df.select_dtypes(exclude=['object'])

def printColumnTypes(non_numeric_df, numeric_df):
    '''Print the names of the non-numeric and numeric columns.'''
    print("Non-Numeric columns:")
    for col in non_numeric_df:
        print(f"{col}")
    print("")
    print("Numeric columns:")
    for col in numeric_df:
        print(f"{col}")

printColumnTypes(cat_df, num_df)

Non-Numeric columns:
Presc01
Presc02
Presc03
Presc04
Presc05
Presc06
Presc07
Presc08
Presc09
Presc10
Presc11
Presc12
Presc13
Presc14
Presc15
Presc16
Presc17
Presc18
Presc19
Presc20

Numeric columns:


In [7]:
# Check the information of data
medical_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15002 entries, 0 to 15001
Data columns (total 20 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Presc01  7501 non-null   object
 1   Presc02  5747 non-null   object
 2   Presc03  4389 non-null   object
 3   Presc04  3345 non-null   object
 4   Presc05  2529 non-null   object
 5   Presc06  1864 non-null   object
 6   Presc07  1369 non-null   object
 7   Presc08  981 non-null    object
 8   Presc09  654 non-null    object
 9   Presc10  395 non-null    object
 10  Presc11  256 non-null    object
 11  Presc12  154 non-null    object
 12  Presc13  87 non-null     object
 13  Presc14  47 non-null     object
 14  Presc15  25 non-null     object
 15  Presc16  8 non-null      object
 16  Presc17  4 non-null      object
 17  Presc18  4 non-null      object
 18  Presc19  3 non-null      object
 19  Presc20  1 non-null      object
dtypes: object(20)
memory usage: 2.3+ MB


In [8]:
# Describe medical dataset statistics
medical_df.describe()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
count,7501,5747,4389,3345,2529,1864,1369,981,654,395,256,154,87,47,25,8,4,4,3,1
unique,115,117,115,114,110,106,102,97,88,80,66,50,43,28,19,8,3,3,3,1
top,abilify,abilify,abilify,abilify,losartan,glyburide,losartan,losartan,losartan,losartan,cialis,losartan,losartan,losartan,celebrex,spironolactone,levofloxacin,temezepam,boniva,glipizide
freq,577,484,375,201,153,107,96,67,57,31,22,15,8,4,3,1,2,2,1,1


In [9]:
# Re-validate column data types

medical_df.columns.to_series().groupby(medical_df.dtypes).groups


{object: ['Presc01', 'Presc02', 'Presc03', 'Presc04', 'Presc05', 'Presc06', 'Presc07', 'Presc08', 'Presc09', 'Presc10', 'Presc11', 'Presc12', 'Presc13', 'Presc14', 'Presc15', 'Presc16', 'Presc17', 'Presc18', 'Presc19', 'Presc20']}

In [10]:
# check null values
df_nulls = medical_df.isnull().sum()
print(df_nulls)

Presc01     7501
Presc02     9255
Presc03    10613
Presc04    11657
Presc05    12473
Presc06    13138
Presc07    13633
Presc08    14021
Presc09    14348
Presc10    14607
Presc11    14746
Presc12    14848
Presc13    14915
Presc14    14955
Presc15    14977
Presc16    14994
Presc17    14998
Presc18    14998
Presc19    14999
Presc20    15001
dtype: int64


In [11]:
# check missing data and its size
def missing_cols(df):
    '''Print each column that has missing values, with its count.'''
    total = 0
    for col in df.columns:
        missing_vals = df[col].isnull().sum()
        total += missing_vals
        if missing_vals != 0:
            print(f"{col} => {missing_vals}")

    if total == 0:
        print("no missing values left")

missing_cols(medical_df)

Presc01 => 7501
Presc02 => 9255
Presc03 => 10613
Presc04 => 11657
Presc05 => 12473
Presc06 => 13138
Presc07 => 13633
Presc08 => 14021
Presc09 => 14348
Presc10 => 14607
Presc11 => 14746
Presc12 => 14848
Presc13 => 14915
Presc14 => 14955
Presc15 => 14977
Presc16 => 14994
Presc17 => 14998
Presc18 => 14998
Presc19 => 14999
Presc20 => 15001


In [12]:
# Drop only the rows in which every column is null
medical_df.dropna(how='all', inplace=True)
# Review changes
medical_df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
5,enalapril,,,,,,,,,,,,,,,,,,,
7,paroxetine,allopurinol,,,,,,,,,,,,,,,,,,
9,abilify,atorvastatin,folic acid,naproxen,losartan,,,,,,,,,,,,,,,


In [13]:
# Replace the remaining empty values with 0 (this placeholder later shows up as the item '0')
medical_df.fillna(0, inplace=True)

In [14]:
# Check dataset size after changes
medical_df.shape

(7501, 20)

In [15]:
# Confirm no null values
medical_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7501 entries, 1 to 15001
Data columns (total 20 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Presc01  7501 non-null   object
 1   Presc02  7501 non-null   object
 2   Presc03  7501 non-null   object
 3   Presc04  7501 non-null   object
 4   Presc05  7501 non-null   object
 5   Presc06  7501 non-null   object
 6   Presc07  7501 non-null   object
 7   Presc08  7501 non-null   object
 8   Presc09  7501 non-null   object
 9   Presc10  7501 non-null   object
 10  Presc11  7501 non-null   object
 11  Presc12  7501 non-null   object
 12  Presc13  7501 non-null   object
 13  Presc14  7501 non-null   object
 14  Presc15  7501 non-null   object
 15  Presc16  7501 non-null   object
 16  Presc17  7501 non-null   object
 17  Presc18  7501 non-null   object
 18  Presc19  7501 non-null   object
 19  Presc20  7501 non-null   object
dtypes: object(20)
memory usage: 1.2+ MB


In [16]:
# Show the value counts of each column, then save a copy of the prepared dataset
for col in medical_df.columns:
    print('-' * 40 + col + '-' * 40 , end=' - ')
    display(medical_df[col].value_counts())

medical_df.to_csv("medical_market_prepared.csv", index=False)

----------------------------------------Presc01---------------------------------------- - 

abilify           577
citalopram        576
paroxetine        458
diazepam          391
metoprolol        373
                 ... 
fluoxetine HCI      1
fexofenadine        1
valaciclovir        1
carisoprodol        1
crestor             1
Name: Presc01, Length: 115, dtype: int64

----------------------------------------Presc02---------------------------------------- - 

0                            1754
abilify                       484
carvedilol                    411
amphetamine salt combo xr     302
lisinopril                    291
                             ... 
alendronate                     2
clonazepam                      1
cyclobenzaprine                 1
fluoxetine HCI                  1
hydrocortisone 2.5% cream       1
Name: Presc02, Length: 118, dtype: int64

----------------------------------------Presc03---------------------------------------- - 

0                            3112
abilify                       375
carvedilol                    279
amphetamine salt combo xr     225
atorvastatin                  213
                             ... 
clonazepam                      1
fluoxetine HCI                  1
topiramate                      1
hydrocortisone 2.5% cream       1
metformin HCI                   1
Name: Presc03, Length: 116, dtype: int64

----------------------------------------Presc04---------------------------------------- - 

0                             4156
abilify                        201
amphetamine salt combo xr      181
glyburide                      174
carvedilol                     167
                              ... 
hydrocortisone 2.5% cream        1
meloxicam                        1
finasteride                      1
sulfamethoxazole                 1
flovent hfa 110mcg inhaler       1
Name: Presc04, Length: 115, dtype: int64

----------------------------------------Presc05---------------------------------------- - 

0                            4972
losartan                      153
amphetamine salt combo xr     134
glyburide                     130
diazepam                      115
                             ... 
sulfamethoxazole                2
tamsulosin                      2
valaciclovir                    2
fluoxetine HCI                  1
amphetamine salt combo          1
Name: Presc05, Length: 111, dtype: int64

----------------------------------------Presc06---------------------------------------- - 

0                            5637
glyburide                     107
amphetamine salt combo xr     102
losartan                      100
diazepam                       71
                             ... 
promethazine                    1
metoprolol                      1
bupropion sr                    1
mometasone                      1
viagra                          1
Name: Presc06, Length: 107, dtype: int64

----------------------------------------Presc07---------------------------------------- - 

0                            6132
losartan                       96
glyburide                      81
doxycycline hyclate            69
amphetamine salt combo xr      59
                             ... 
glimepiride                     2
trazodone HCI                   1
triamcinolone Ace topical       1
verapamil SR                    1
Yaz                             1
Name: Presc07, Length: 103, dtype: int64

----------------------------------------Presc08---------------------------------------- - 

0                      6520
losartan                 67
doxycycline hyclate      44
cialis                   43
glyburide                40
                       ... 
metformin                 1
finasteride               1
ranitidine                1
glimepiride               1
benicar                   1
Name: Presc08, Length: 98, dtype: int64

----------------------------------------Presc09---------------------------------------- - 

0                   6847
losartan              57
cialis                38
levofloxacin          35
glyburide             34
                    ... 
metformin              1
ranitidine             1
sulfamethoxazole       1
cephalexin             1
acetaminophen          1
Name: Presc09, Length: 89, dtype: int64

----------------------------------------Presc10---------------------------------------- - 

0                 7106
losartan            31
glyburide           19
cialis              17
pravastatin         17
                  ... 
tramadol             1
crestor              1
Yaz                  1
bupropion sr         1
clavulanate K+       1
Name: Presc10, Length: 81, dtype: int64

----------------------------------------Presc11---------------------------------------- - 

0                   7245
cialis                22
losartan              20
lantus                14
glyburide             12
                    ... 
sulfamethoxazole       1
azithromycin           1
viagra                 1
carisoprodol           1
ciprofloxacin          1
Name: Presc11, Length: 67, dtype: int64

----------------------------------------Presc12---------------------------------------- - 

0                            7347
losartan                       15
levofloxacin                   10
glyburide                      10
cialis                          9
doxycycline hyclate             7
lantus                          7
pravastatin                     7
zolpidem                        4
metoprolol succinate XL         4
celebrex                        3
clonidine HCI                   3
clotrimazole                    3
alprazolam                      3
diazepam                        3
boniva                          3
gabapentin                      3
dextroamphetamine XR            3
ezetimibe                       3
metoprolol tartrate             3
hydrocodone                     3
temezepam                       3
oxycodone                       3
actonel                         3
fluconozole                     3
celecoxib                       2
rosuvastatin                    2
Premarin                        2
amphetamine salt combo xr       2
crestor       

----------------------------------------Presc13---------------------------------------- - 

0                             7414
losartan                         8
cialis                           6
lantus                           6
glyburide                        4
pravastatin                      4
alprazolam                       4
celecoxib                        3
metoprolol tartrate              3
simvastatin                      3
actonel                          3
diclofenac sodium                2
clotrimazole                     2
gabapentin                       2
temezepam                        2
zolpidem                         2
doxycycline hyclate              2
lovastatin                       2
valsartan                        2
levofloxacin                     2
quetiapine                       2
ezetimibe                        1
clonazepam                       1
atenolol                         1
cymbalta                         1
fluconozole                      1
fexofenadine                     1
oxycodone                        1
celebrex            

----------------------------------------Presc14---------------------------------------- - 

0                            7454
losartan                        4
levofloxacin                    3
glyburide                       3
metoprolol tartrate             2
doxycycline hyclate             2
zolpidem                        2
lantus                          2
triamterene                     2
sulfamethoxazole                2
cialis                          2
alprazolam                      2
levothyroxine sodium            2
fluconozole                     2
pioglitazone                    2
fluticasone nasal spray         2
temezepam                       1
pregabalin                      1
amphetamine salt combo xr       1
amitriptyline                   1
metformin HCI                   1
venlafaxine XR                  1
clotrimazole                    1
ezetimibe                       1
hydrocodone                     1
pravastatin                     1
diclofenac sodium               1
simvastatin                     1
abilify                         1
Name: Presc14,

----------------------------------------Presc15---------------------------------------- - 

0                      7476
celebrex                  3
doxycycline hyclate       2
losartan                  2
lantus                    2
cialis                    2
pravastatin               1
cymbalta                  1
zolpidem                  1
metformin HCI             1
hydrocodone               1
ezetimibe                 1
diclofenac sodium         1
esomeprazole              1
clonidine HCI             1
spironolactone            1
metoprolol tartrate       1
clavulanate K+            1
pioglitazone              1
abilify                   1
Name: Presc15, dtype: int64

----------------------------------------Presc16---------------------------------------- - 

0                       7493
albuterol HFA              1
levofloxacin               1
celebrex                   1
quetiapine                 1
spironolactone             1
diazepam                   1
temezepam                  1
dextroamphetamine XR       1
Name: Presc16, dtype: int64

----------------------------------------Presc17---------------------------------------- - 

0                7497
levofloxacin        2
albuterol HFA       1
glyburide           1
Name: Presc17, dtype: int64

----------------------------------------Presc18---------------------------------------- - 

0               7497
temezepam          2
levofloxacin       1
promethazine       1
Name: Presc18, dtype: int64

----------------------------------------Presc19---------------------------------------- - 

0                7498
clonidine HCI       1
boniva              1
promethazine        1
Name: Presc19, dtype: int64

----------------------------------------Presc20---------------------------------------- - 

0            7500
glipizide       1
Name: Presc20, dtype: int64

In [17]:
# Convert the pandas DataFrame into a list of lists for use with the Apriori algorithm
n_rows, n_cols = medical_df.shape
values = medical_df.values  # extract the array once instead of on every access
medical_list = []
for i in range(n_rows):
    medical_list.append([str(values[i, j]) for j in range(n_cols)])

medical_cleaned = pd.DataFrame(medical_list)

In [18]:
# Review DataFrame
medical_cleaned.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
1,citalopram,benicar,amphetamine salt combo xr,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,enalapril,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,paroxetine,allopurinol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,abilify,atorvastatin,folic acid,naproxen,losartan,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
#The first transaction of data set
medical_list[:1]

[['amlodipine',
  'albuterol aerosol',
  'allopurinol',
  'pantoprazole',
  'lorazepam',
  'omeprazole',
  'mometasone',
  'fluconozole',
  'gabapentin',
  'pravastatin',
  'cialis',
  'losartan',
  'metoprolol succinate XL',
  'sulfamethoxazole',
  'abilify',
  'spironolactone',
  'albuterol HFA',
  'levofloxacin',
  'promethazine',
  'glipizide']]
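Because empty slots were filled with 0 and then converted to strings, every transaction with fewer than twenty drugs carries the filler item '0'. One possible way to keep that filler out of the baskets is sketched below. The helper name `strip_placeholders` and the short `sample` list are hypothetical, and the rules reported in this document were produced from the unfiltered list, so applying this step would change them.

```python
def strip_placeholders(transactions, placeholders=("0", "nan")):
    """Return copies of the transactions without filler values."""
    skip = set(placeholders)
    return [[item for item in row if item not in skip] for row in transactions]

# Shortened stand-in for medical_list, for illustration only.
sample = [
    ["citalopram", "benicar", "amphetamine salt combo xr", "0", "0"],
    ["enalapril", "0", "0", "0", "0"],
]

cleaned = strip_placeholders(sample)
print(cleaned)
```

The same call could be applied to `medical_list` before it is passed to the Apriori step, assuming the filler is always the string '0'.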

**C2. Code Execution with the Apriori Algorithm**

In [20]:
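# Import the Apriori implementation from the apyori library
from apyori import apriori

# Run the Apriori algorithm on the list of transactions.
# Note: min_length does not appear among apyori's documented options
# (min_support, min_confidence, min_lift, max_length), so it may have no effect.
association_rules = apriori(medical_list, min_support = 0.002, min_confidence = 0.7, min_lift = 1.2, min_length = 2)

# Materialize the generated rules as a list
association_rules = list(association_rules)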


In [21]:
#The number of rules
print(len(association_rules))

22
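To get a feel for the min_support threshold used above, it can be translated into a number of transactions. This is a small arithmetic sketch; the variable names are illustrative, and it assumes an itemset is kept when its support is at least the threshold.

```python
import math

n_transactions = 7501   # rows remaining after the all-null rows were dropped
min_support = 0.002     # threshold passed to apriori above

# Smallest whole number of transactions whose share reaches the threshold.
min_count = math.ceil(min_support * n_transactions)
print(min_count, min_count / n_transactions)
```

With these inputs, an itemset has to appear in at least 16 of the 7,501 transactions to be considered, so the rules rest on small absolute counts.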


In [22]:
#Have a glance at the first rule

print(association_rules[0])

RelationRecord(items=frozenset({'carvedilol', 'abilify', 'valsartan'}), support=0.0023996800426609784, ordered_statistics=[OrderedStatistic(items_base=frozenset({'carvedilol', 'valsartan'}), items_add=frozenset({'abilify'}), confidence=0.72, lift=3.0205369127516777)])


The support value for the first rule is 0.0023996800426609784, which is calculated by dividing the number of transactions containing 'carvedilol', 'valsartan', and 'abilify' by the total number of transactions.

The confidence is 0.72, showing that out of all the transactions containing 'carvedilol' and 'valsartan', 72% also include 'abilify'.

The lift of 3.0205369127516777 tells us that a patient who has both 'carvedilol' and 'valsartan' is about 3.02 times as likely to also have 'abilify' as the overall rate of 'abilify' across all transactions would suggest.
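The metrics of the first rule can be converted back into approximate transaction counts, which makes them easier to interpret. The sketch below is a back-calculation from the printed values, not output from the original notebook, and the variable names are illustrative.

```python
n_transactions = 7501
support = 0.0023996800426609784   # share of transactions with all three drugs
confidence = 0.72
lift = 3.0205369127516777

# support = count(all three drugs) / N
count_all = round(support * n_transactions)
# confidence = count(all three drugs) / count(carvedilol and valsartan)
count_base = round(count_all / confidence)
# lift = confidence / support(abilify), so support(abilify) = confidence / lift
count_abilify = round(confidence / lift * n_transactions)

print(count_all, count_base, count_abilify)
```

Under these assumptions the rule rests on 18 transactions that contain all three drugs, out of 25 that contain both 'carvedilol' and 'valsartan', while 'abilify' appears in roughly 1,788 transactions overall.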

**C3. Association Rules Table**

In [23]:
# Print the itemset of every association rule

for i in range(0, len(association_rules)):
    print(association_rules[i][0])

frozenset({'carvedilol', 'abilify', 'valsartan'})
frozenset({'tamsulosin', 'paroxetine', 'abilify'})
frozenset({'alprazolam', 'acetaminophen', 'hydrocodone'})
frozenset({'carvedilol', '0', 'abilify', 'valsartan'})
frozenset({'tamsulosin', 'paroxetine', '0', 'abilify'})
frozenset({'alprazolam', '0', 'acetaminophen', 'hydrocodone'})
frozenset({'atorvastatin', 'amphetamine salt combo xr', 'abilify', 'glipizide'})
frozenset({'fenofibrate', 'carvedilol', 'amphetamine salt combo xr', 'abilify'})
frozenset({'atorvastatin', 'metformin', 'abilify', 'metoprolol'})
frozenset({'fenofibrate', 'carvedilol', 'doxycycline hyclate', 'abilify'})
frozenset({'metformin', 'carvedilol', 'doxycycline hyclate', 'abilify'})
frozenset({'glipizide', 'abilify', 'metoprolol', 'diazepam'})
frozenset({'carvedilol', 'lisinopril', 'metoprolol', 'amlodipine'})
frozenset({'glipizide', 'carvedilol', 'metoprolol', 'amphetamine salt combo'})
frozenset({'atorvastatin', '0', 'amphetamine salt combo xr', 'glipizide', 'abilify

In [24]:
# Display the rule, support, confidence and lift for every association rule

for item in association_rules:
    # item[0] is the full itemset (a frozenset). Taking its first two members
    # does not separate antecedent from consequent, and frozenset order is
    # arbitrary; item[2][0][0] and item[2][0][1] hold the actual rule sides.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))
    # item[2][0] is the first ordered statistic:
    # (items_base, items_add, confidence, lift)
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("-------------------------------------------")

Rule: carvedilol -> abilify
Support: 0.0023996800426609784
Confidence: 0.72
Lift: 3.0205369127516777
-------------------------------------------
Rule: tamsulosin -> paroxetine
Support: 0.0021330489268097585
Confidence: 0.7272727272727272
Lift: 3.051047386617856
-------------------------------------------
Rule: alprazolam -> acetaminophen
Support: 0.0025329956005865884
Confidence: 0.95
Lift: 11.976386554621849
-------------------------------------------
Rule: carvedilol -> 0
Support: 0.0023996800426609784
Confidence: 0.72
Lift: 3.0222271964185783
-------------------------------------------
Rule: tamsulosin -> paroxetine
Support: 0.0021330489268097585
Confidence: 0.7272727272727272
Lift: 3.0527547438571494
-------------------------------------------
Rule: alprazolam -> 0
Support: 0.0025329956005865884
Confidence: 0.95
Lift: 11.976386554621849
-------------------------------------------
Rule: atorvastatin -> amphetamine salt combo xr
Support: 0.0026663111585121984
Confidence: 0.7142857142
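The loop above takes the first two members of each itemset, so its "Rule" line does not necessarily match the antecedent and consequent stored in the record. A minimal sketch of printing both sides from the ordered statistics follows. The two namedtuples are local stand-ins that mirror the record layout printed earlier, so the sketch runs without the library, and `describe` is a hypothetical helper.

```python
from collections import namedtuple

# Local stand-ins mirroring the printed record layout (assumed field names).
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def describe(record):
    """Yield one formatted line per ordered statistic in a rule record."""
    for stat in record.ordered_statistics:
        lhs = ", ".join(sorted(stat.items_base))
        rhs = ", ".join(sorted(stat.items_add))
        yield (f"Rule: {{{lhs}}} -> {{{rhs}}} | support={record.support:.4f} "
               f"confidence={stat.confidence:.2f} lift={stat.lift:.2f}")

# The first record from the output above, re-entered by hand.
first = RelationRecord(
    items=frozenset({"carvedilol", "abilify", "valsartan"}),
    support=0.0023996800426609784,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"carvedilol", "valsartan"}),
        items_add=frozenset({"abilify"}),
        confidence=0.72,
        lift=3.0205369127516777,
    )],
)

lines = list(describe(first))
print("\n".join(lines))
```

Sorting the item names keeps the printed rule stable from run to run, since frozenset iteration order is not guaranteed.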

After creating the association rules table with support, confidence, and lift, we can read the first rule in full from its ordered statistics, which list 'carvedilol' and 'valsartan' as the antecedent and 'abilify' as the consequent:

Rule: carvedilol, valsartan -> abilify

Support: 0.0023996800426609784

Confidence: 0.72

Lift: 3.0205369127516777

The support value for the rule is 0.0023996800426609784, which is calculated by dividing the number of transactions containing 'carvedilol', 'valsartan', and 'abilify' by the total number of transactions.

The confidence is 0.72, which shows that out of all the transactions containing both 'carvedilol' and 'valsartan', 72% also include 'abilify'.

The lift of 3.0205369127516777 tells us that a patient who has both 'carvedilol' and 'valsartan' is about 3.02 times as likely to also have 'abilify' as the overall rate of 'abilify' would suggest.

In [25]:
# Transform results into DataFrame structure
medical_results = pd.DataFrame(association_rules)
medical_results

Unnamed: 0,items,support,ordered_statistics
0,"(carvedilol, abilify, valsartan)",0.0024,"[((carvedilol, valsartan), (abilify), 0.72, 3...."
1,"(tamsulosin, paroxetine, abilify)",0.002133,"[((tamsulosin, paroxetine), (abilify), 0.72727..."
2,"(alprazolam, acetaminophen, hydrocodone)",0.002533,"[((acetaminophen, hydrocodone), (alprazolam), ..."
3,"(carvedilol, 0, abilify, valsartan)",0.0024,"[((carvedilol, valsartan), (0, abilify), 0.72,..."
4,"(tamsulosin, paroxetine, 0, abilify)",0.002133,"[((tamsulosin, paroxetine), (0, abilify), 0.72..."
5,"(alprazolam, 0, acetaminophen, hydrocodone)",0.002533,"[((acetaminophen, hydrocodone), (alprazolam, 0..."
6,"(atorvastatin, amphetamine salt combo xr, abil...",0.002666,"[((atorvastatin, amphetamine salt combo xr, gl..."
7,"(fenofibrate, carvedilol, amphetamine salt com...",0.002933,"[((fenofibrate, carvedilol, amphetamine salt c..."
8,"(atorvastatin, metformin, abilify, metoprolol)",0.003066,"[((atorvastatin, metformin, metoprolol), (abil..."
9,"(fenofibrate, carvedilol, doxycycline hyclate,...",0.002133,"[((fenofibrate, carvedilol, doxycycline hyclat..."
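The ordered_statistics column is hard to read in this form. One option is to flatten each rule into a row with separate antecedent and consequent fields and then rank the rows by lift. The sketch below uses three rules copied by hand from the outputs in this notebook; the tuple layout and the `rows` structure are illustrative assumptions.

```python
# (items_base, items_add, support, confidence, lift), copied by hand from
# the first three records shown in this notebook; other rules are omitted.
rules = [
    ({"carvedilol", "valsartan"}, {"abilify"},
     0.0023996800426609784, 0.72, 3.0205369127516777),
    ({"tamsulosin", "paroxetine"}, {"abilify"},
     0.0021330489268097585, 0.7272727272727272, 3.051047386617856),
    ({"acetaminophen", "hydrocodone"}, {"alprazolam"},
     0.0025329956005865884, 0.95, 11.976386554621849),
]

# One flat dictionary per rule, with item names sorted for stable display.
rows = [
    {
        "antecedent": ", ".join(sorted(base)),
        "consequent": ", ".join(sorted(add)),
        "support": support,
        "confidence": confidence,
        "lift": lift,
    }
    for base, add, support, confidence, lift in rules
]

# Rank the rules from highest to lowest lift.
rows.sort(key=lambda row: row["lift"], reverse=True)

for row in rows:
    print(f'{row["antecedent"]} -> {row["consequent"]}: lift {row["lift"]:.2f}')
```

Passing `rows` to `pd.DataFrame` would give a table with one column per metric, which is easier to sort and export than the nested records.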


In [26]:
# Prepared data set copy:
medical_results.to_csv("medical_market_results.csv", index=False)

**C4. Top Three Rules**

**1. If patients had 'carvedilol' and 'valsartan', then 'abilify'**

Support: 0.0023996800426609784

Confidence: 0.72

Lift: 3.0205369127516777

The support of 0.0023996800426609784 is the number of transactions containing 'carvedilol', 'valsartan', and 'abilify' divided by the total number of transactions.

The confidence of 0.72 means that, of all the transactions containing both 'carvedilol' and 'valsartan', 72% also include 'abilify'. In other words, 72% of patients who received 'carvedilol' and 'valsartan' also received 'abilify'.

The lift of 3.0205369127516777 means that a patient prescribed 'carvedilol' and 'valsartan' is about 3.02 times more likely to also be prescribed 'abilify' than the baseline rate of 'abilify' across all transactions.

**2. If patients had 'tamsulosin' and 'paroxetine', then 'abilify'**

Support: 0.0021330489268097585

Confidence: 0.7272727272727272

Lift: 3.051047386617856

The support of 0.0021330489268097585 is the number of transactions containing 'tamsulosin', 'paroxetine', and 'abilify' divided by the total number of transactions.

The confidence of about 0.73 means that, of all the transactions containing both 'tamsulosin' and 'paroxetine', 73% also include 'abilify'. In other words, 73% of patients who received 'tamsulosin' and 'paroxetine' also received 'abilify'.

The lift of 3.051047386617856 means that a patient prescribed 'tamsulosin' and 'paroxetine' is about 3.05 times more likely to also be prescribed 'abilify' than the baseline rate of 'abilify' across all transactions.

**3. If patients had 'acetaminophen' and 'hydrocodone', then 'alprazolam'**

Support: 0.0025329956005865884

Confidence: 0.95

Lift: 11.976386554621849

The support of 0.0025329956005865884 is the number of transactions containing 'acetaminophen', 'hydrocodone', and 'alprazolam' divided by the total number of transactions.

The confidence of 0.95 means that, of all the transactions containing both 'acetaminophen' and 'hydrocodone', 95% also include 'alprazolam'. In other words, 95% of patients who received 'acetaminophen' and 'hydrocodone' also received 'alprazolam'.

The lift of 11.976386554621849 means that a patient prescribed 'acetaminophen' and 'hydrocodone' is about 11.98 times more likely to also be prescribed 'alprazolam' than the baseline rate of 'alprazolam' across all transactions.

# Part 4: Summary and Implications

**D1. Significance of support, lift, and confidence**

Market basket analysis uses three measures:
1. Support is the proportion of transactions that contain every item in the rule.
2. Confidence is the conditional probability that a transaction contains the right-hand side (consequent) given that it contains the left-hand side (antecedent).
3. Lift is the ratio of the rule's confidence to the support of the consequent. It shows how many times more likely the consequent is when the antecedent is present than it is overall.
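The three measures can be computed directly from a list of transactions. The sketch below does so on a small toy data set; the item names and transactions are hypothetical and are not taken from the hospital data.

```python
# Toy transactions (hypothetical placeholders, not the hospital data).
transactions = [
    {"drug_a", "drug_b", "drug_c"},
    {"drug_a", "drug_b"},
    {"drug_a", "drug_c"},
    {"drug_b", "drug_c"},
    {"drug_a", "drug_b", "drug_c"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Rule: drug_a -> drug_b
rule_support = support({"drug_a", "drug_b"}, transactions)
rule_confidence = confidence({"drug_a"}, {"drug_b"}, transactions)
rule_lift = lift({"drug_a"}, {"drug_b"}, transactions)

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.4f}")
```

A lift above 1 indicates that the antecedent makes the consequent more likely, a lift of 1 indicates independence, and a lift below 1 (as in this toy rule) indicates that the consequent is slightly less likely when the antecedent is present.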

The confidence results are strong. All three top rules have a confidence above 70%: the lowest is rule 1 at 72%, and the highest is rule 3 at 95%, well above the 80% level that would be an optimal value for significance.

The support is less compelling: it is below 0.3% for all three rules, so each rule rests on a small share of the transactions. Finally, lift shows how many times more likely the consequent is once the antecedent is present. The highest lift, 11.976386554621849, belongs to rule 3: 'alprazolam' is about 11.98 times more likely to be prescribed to patients who receive both 'acetaminophen' and 'hydrocodone' than its baseline rate across all transactions.
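The comparison above can be reproduced with a few lines of code. The sketch below ranks the three rules by lift and checks them against a confidence floor of 70% and a support floor of 0.3%; both thresholds are illustrative assumptions drawn from the discussion, not parameters used in the notebook.

```python
# Metrics reported for the top three rules in section C4.
rules = [
    {"rule": 1, "support": 0.0023996800426609784, "confidence": 0.72,
     "lift": 3.0205369127516777},
    {"rule": 2, "support": 0.0021330489268097585, "confidence": 0.7272727272727272,
     "lift": 3.051047386617856},
    {"rule": 3, "support": 0.0025329956005865884, "confidence": 0.95,
     "lift": 11.976386554621849},
]

# Illustrative thresholds (assumptions, not values from the notebook).
MIN_CONFIDENCE = 0.70
MIN_SUPPORT = 0.003

# Rank rules from strongest to weakest association.
ranked_by_lift = [r["rule"] for r in sorted(rules, key=lambda r: r["lift"], reverse=True)]

# Check which rules clear each threshold.
passes_confidence = [r["rule"] for r in rules if r["confidence"] >= MIN_CONFIDENCE]
passes_support = [r["rule"] for r in rules if r["support"] >= MIN_SUPPORT]

print("Ranked by lift:", ranked_by_lift)
print("Confidence >= 70%:", passes_confidence)
print("Support >= 0.3%:", passes_support)
```

Every rule clears the confidence floor, but none clears the support floor, which is why the rules are best read as strong associations observed in a small number of transactions.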

**D2. The practical significance of findings**

The analysis above shows meaningful associations: rule 3 has a confidence of 0.95 and a lift of 11.976386554621849. These rules give us a good starting point for identifying which prescription combinations characterize patients at high risk of readmission. The top three prescription associations can help us better understand our patients and identify patterns unique to readmitted patients. For example, 95% of patients who receive 'acetaminophen' and 'hydrocodone' also receive 'alprazolam', and 'alprazolam' is about 11.98 times more likely among those patients than its baseline rate across all transactions.

The main limitation of this market basket analysis is that few itemsets are frequent, so each rule is based on a small number of transactions. The algorithm also requires high computation power because it needs to scan the entire database. In addition, when it comes to prescription drugs, the hospital should consider the risk of addiction. Further analysis is therefore recommended, and we need to collect more data before claiming any significance.

**D3. Actions**

Based on the results and their practical significance, I do not recommend that the hospital move forward with the original plan of offering free samples or marketing prescription drugs together. "The lifetime prevalence of using any addictive substance was 73.3%, and it has been decreasing during the past few years. Although the lifetime prevalence of PDM was 19.2%, it has been increasing. Males and whites were more likely to use drugs and engage in multiple addictions. Market Basket Analysis identified common drug use initiation sequences that involved 11 drugs." (Jayawardene, 2014)

The hospital should consider the risk of addiction when determining which variables, in combination with prescriptions, are most important for reducing the number of patients at high risk of readmission. The hospital should also offer prevention programs that address multiple addictions and substitute addictions. No action is necessary at this time. Further analysis is recommended: we need to collect more data before claiming any significance or recommending specific actions.


# Part 5: Demonstration

**F. Panopto recording**

A Panopto video recording of warning- and error-free code execution performing the data mining tasks is uploaded to



**G. Sources for third-party code**

[1] IntelliPaat. (2021, December 24). Introduction to Apriori Algorithm in Python. https://intellipaat.com/blog/data-science-apriori-algorithm/

[2] Kadlaskar Amruta. (2021, October 2). A Comprehensive Guide on Market Basket Analysis. https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-market-basket-analysis/

**H. Acknowledge source**

[3] Tech Target Contributor. (2019, May). Market Basket Analysis. https://searchcustomerexperience.techtarget.com/definition/market-basket-analysis

[4] Hua. (n.d.). Market Basket Analysis. https://sarahtianhua.wordpress.com/portfolio/market-basket-analysis/

[5] Wasantha Parakrama Jayawardene. (2014, March). Multiple and substitute addictions involving prescription drugs misuse among 12th graders: gateway theory revisited with Market Basket Analysis. https://pubmed.ncbi.nlm.nih.gov/24440894/

