<style type="text/css">
    h1,h2,h3,h4{
        margin-bottom: 0px;
    }
    p,ul{
        margin-bottom: 20px;
    }
    h2{
        margin-left: 10px;
        margin-top: 15px;
    }
    h3{
        margin-left: 20px;
        margin-top: 10px;
    }
    h4{
        margin-left: 30px;
        margin-top: 10px;
    }
    p{
        margin-left: 40px;
        margin-top: 7px;
    }
    li{
        margin-left: 40px;
        margin-top: 5px;
    }
</style>

<h1>Data Mining - Association Rules and Lift Analysis</h1>
<h2>Part I: Research Question</h2>
<h3>A. Purpose of Report</h3>
<h4>1. Proposal of Question</h4>
<p>&nbsp; &nbsp; Using a market basket analysis on customer transactions, I will be answering the
following questions: Which items are frequently bought together in the same transaction?
And how can we use this knowledge for real world application?</p>
<h4>2. Defined Goal</h4>
<p>&nbsp; &nbsp; The aim of this analysis is to discover items that are associated with each other
through purchases. With this knowledge, marketing departments can improve cross
merchandising strategies, better predict necessary stock of certain items, and provide
convenient promotional deals.</p>

<h2>Part II: Market Basket Justification</h2>
<h3>B. Reason for Analysis</h3>
<h4>1. Explanation of Market Basket</h4>
<p>&nbsp; &nbsp; For the market basket analysis, the Apriori algorithm will be used to analyze the data set. This
algorithm begins by finding the frequency of every individual item. Items that meet the
minimum frequency required (called the support threshold) are then
combined with other items and the frequency of each item combination is counted. This
process of combining items and counting occurrences continues until no more
combinations meet the support threshold. Combinations that do meet the threshold will
have support, confidence and lift values calculated. These values describe how
the purchased products are related and the strength of that relation. The outcomes of
performing the Apriori algorithm on this data set will reveal combinations of items that are
frequently bought together (support), the likelihood of purchasing the rest of a combination
having purchased some of its items (confidence), and how closely associated these items
are (lift) (GeeksforGeeks, 2025).</p>
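<p>&nbsp; &nbsp; The level-by-level search described above can be sketched in plain Python. This is a minimal
illustration rather than the code used in this report (the analysis itself relies on the apyori package);
the function name <code>frequent_itemsets</code> and the toy baskets are assumptions made for the example.</p>

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Support = share of baskets that contain every item of the candidate.
        scored = {c: sum(c <= b for b in baskets) / n for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: individual items that meet the support threshold.
    current = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-item subset.
        pruned = {c for c in joined
                  if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = keep_frequent(pruned)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets (not taken from the data set).
toy_baskets = [
    ["HP 63 Ink", "HP 65 ink"],
    ["HP 63 Ink", "HP 65 ink", "Apple Pencil"],
    ["HP 63 Ink", "Apple Pencil"],
    ["Cat8 Ethernet Cable"],
]
for itemset, support in frequent_itemsets(toy_baskets, min_support=0.5).items():
    print(sorted(itemset), support)
```

<p>&nbsp; &nbsp; The loop stops as soon as a level produces no combination that meets the threshold, which
mirrors the stopping condition in the explanation above.</p>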
<h4>2. Transaction Example</h4>
<p>&nbsp; &nbsp; Customers can purchase up to 20 items in one transaction. Looking at the prepared
csv file we can see multiple examples of transactions. Here are a few examples:
<ul>
<li>2nd transaction:<ul>
<li>Apple Lightning to Digital AV Adapter</li>
<li>TP-Link AC1750 Smart WiFi Router</li>
<li>Apple Pencil</li>
</ul></li>
<li>3rd transaction:<ul>
<li>UNEN Mfi Certified 5-pack Lightning Cable</li>
</ul></li>
<li>4th transaction:<ul>
<li>Cat8 Ethernet Cable</li>
<li>HP 65 ink</li>
</ul></li>
</ul></p>
<h4>3. Market Basket Assumption</h4>
<p>&nbsp; &nbsp; To find relationships between transactional items, it is assumed that the data used
in the analysis represents consistent customer behavior with “no significant
changes…during the dataset period” (Deniran, 2023). External factors such as natural
disasters or economic hardships can drastically change purchasing behavior, and data
collected during such periods will fail to demonstrate predictable purchasing patterns. A related
assumption is that the data used in the analysis must be a representative sample of customer
purchases. Using data sets that are too small or too specific as input results in biased or skewed
scores and calculations. To draw any meaningful conclusion about spending habits, the
data must reflect a typical purchasing period with typical customers.</p>

<h2>Part III: Data Preparation and Analysis</h2>
<h3>C. Preparation and Market Basket</h3>
<h4>1. Transforming the Data Set</h4>
<p>&nbsp; &nbsp; The data was transformed for the analysis by removing blank rows, then collapsing
the item columns into a single column. The cleaned dataset is the ‘transactions_prepared.csv’ file.

&nbsp; &nbsp; Removing blank rows:
</p>

In [1]:
import pandas as pd
from apyori import apriori
 

class text:
   BOLD = '\033[1m'
   UNDERLINE = '\033[4m'
   END = '\033[0m'

# specifying the n/a values will allow us to remove empty rows
df = pd.read_csv('teleco_market_basket.csv', keep_default_na = False, 
                 na_values = [' ', '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN',
                             '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'])

print(text.UNDERLINE + "Clean & Preparing Data" + text.END)
# Missing values [In-Text Citation: (NumFOCUS, Inc., n.d.)]
print(f"Purchase Shape Before: {df.shape}")
df.dropna(how='all', inplace=True, ignore_index=True)
print(f"Purchase Shape After: {df.shape}")
print(df.head(3))

Clean & Preparing Data
Purchase Shape Before: (15002, 20)
Purchase Shape After: (7501, 20)
                                      Item01  \
0               Logitech M510 Wireless mouse   
1      Apple Lightning to Digital AV Adapter   
2  UNEN Mfi Certified 5-pack Lightning Cable   

                             Item02        Item03                      Item04  \
0                         HP 63 Ink     HP 65 ink  nonda USB C to USB Adapter   
1  TP-Link AC1750 Smart WiFi Router  Apple Pencil                         NaN   
2                               NaN           NaN                         NaN   

                      Item05        Item06                        Item07  \
0  10ft iPHone Charger Cable  HP 902XL ink  Creative Pebble 2.0 Speakers   
1                        NaN           NaN                           NaN   
2                        NaN           NaN                           NaN   

                                Item08                         Item09  \
0  Cl

<p>&nbsp; &nbsp; Converting the data frame into one column (essentially a list of lists):</p>

In [3]:
# Turn purchases into a list of lists; each inner list is one transaction.
# Dropping NaN per row keeps only the purchased items, wherever the blanks fall.
transactions_list = [row.dropna().tolist() for _, row in df.iterrows()]

transactions_df = pd.DataFrame({'Transactions': transactions_list})
print("\nData in one column:")
print(transactions_df)

# save to file
transactions_df.to_csv('transactions_prepared.csv', index=False)
print("\nTransformations complete. File saved as 'transactions_prepared.csv'")


Data in one column:
                                           Transactions
0     [Logitech M510 Wireless mouse, HP 63 Ink, HP 6...
1     [Apple Lightning to Digital AV Adapter, TP-Lin...
2           [UNEN Mfi Certified 5-pack Lightning Cable]
3                      [Cat8 Ethernet Cable, HP 65 ink]
4     [Dust-Off Compressed Gas 2 pack, Screen Mom Sc...
...                                                 ...
7496  [SanDisk 32GB Ultra SDHC card, Vsco 70 pack st...
7497  [Apple Lightning to Digital AV Adapter, Nylon ...
7498                   [Falcon Dust Off Compressed Gas]
7499           [HP 63XL Ink, Apple USB-C Charger cable]
7500  [Apple Pencil, SanDisk Ultra 128GB card, RUNMU...

[7501 rows x 1 columns]

Transformations complete. File saved as 'transactions_prepared.csv'
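
<p>&nbsp; &nbsp; Because each transaction is a Python list, the saved file stores every row as the text form of
that list (for example <code>"['Cat8 Ethernet Cable', 'HP 65 ink']"</code>). A minimal sketch of reading such a file
back into real lists follows; the helper name <code>load_transactions</code> and the two-row stand-in text are
assumptions for illustration, not part of the original analysis.</p>

```python
import ast
import csv
import io

# Stand-in for 'transactions_prepared.csv': a header row, then one
# stringified list per row (quoted when it contains a comma).
csv_text = '''Transactions
"['Cat8 Ethernet Cable', 'HP 65 ink']"
['Apple Pencil']
'''


def load_transactions(handle):
    """Parse each 'Transactions' cell from its text form back into a list."""
    reader = csv.DictReader(handle)
    return [ast.literal_eval(row["Transactions"]) for row in reader]


transactions = load_transactions(io.StringIO(csv_text))
print(transactions)
```

<p>&nbsp; &nbsp; With the real file, the same helper could be called on
<code>open('transactions_prepared.csv', newline='')</code> instead of the in-memory text.</p>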


<h4>2. Code Execution</h4>

In [4]:
# region Apriori Algorithm [In-Text Citation: (Amruta, 2024)]
print(text.UNDERLINE + "\nApriori Algorithm" + text.END)
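# Note: apyori reads min_support, min_confidence, min_lift and max_length;
# min_length is not among its documented options, so it likely has no effect here.
association_rules = apriori(transactions_df['Transactions'], min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)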

# organize results into a dataframe
items = []
antecedents = []
consequents = []
support = []
confidence = []
lift = []
for result in association_results:
    # Only the first ordered statistic (rule) of each itemset is kept.
    first_rule = result.ordered_statistics[0]
    items.append(list(result.items))
    antecedents.append(list(first_rule.items_base))
    consequents.append(list(first_rule.items_add))
    support.append(result.support)
    confidence.append(first_rule.confidence)
    lift.append(first_rule.lift)

df_results = pd.DataFrame({
    "Items": items,
    "Antecedents": antecedents,
    "Consequents": consequents,
    "Support": support,
    "Confidence": confidence,
    "Lift": lift,
})

# strongest associations first
df_results.sort_values(by='Lift', ascending=False, inplace=True)

print(df_results.head(3))
df_results.to_csv('apriori.csv', index=False)


Apriori Algorithm
                                               Items  \
2  [Falcon Dust Off Compressed Gas, Apple Lightni...   
4                      [iPhone 11 case, HP 63XL Ink]   
5     [iPhone 11 case, Logitech M510 Wireless mouse]   

                      Antecedents                       Consequents   Support  \
2  [Apple Lightning to USB cable]  [Falcon Dust Off Compressed Gas]  0.004533   
4                [iPhone 11 case]                     [HP 63XL Ink]  0.005866   
5                [iPhone 11 case]    [Logitech M510 Wireless mouse]  0.005066   

   Confidence      Lift  
2    0.290598  4.843951  
4    0.372881  4.700812  
5    0.322034  4.506672  


<h4>3. Association Rules Table</h4>
<p>&nbsp; &nbsp; As some items have lengthy names, they are not as easily displayed in the picture
below so I have provided the csv file named ‘apriori.csv’ for better inspection.</p>
<img src="Images/apriori csv.png" alt="Visual of Association Rules Table" />

<h4>4. Top Three Rules</h4>
<p>&nbsp; &nbsp; The top three relevant rules generated by the Apriori algorithm are those with the
highest lift scores in the table. This score describes “how much more likely two items are to
be purchased together compared to being purchased independently” and thus shows the
strongest relationships in item purchases (GeeksforGeeks, 2025). Here are the top three
rules:
<ol>
<li>Apple Lightning to USB cable → Falcon Dust Off Compressed Gas<ul>
<li>Support: 0.00453272896947073</li>
<li>Confidence: 0.29059829059829</li>
<li>Lift: 4.84395061728395</li>
</ul></li>
<li>iPhone 11 case → HP 63XL Ink<ul>
<li>Support: 0.00586588454872683</li>
<li>Confidence: 0.372881355932203</li>
<li>Lift: 4.70081185016379</li>
</ul></li>
<li>iPhone 11 case → Logitech M510 Wireless mouse<ul>
<li>Support: 0.00506599120117317</li>
<li>Confidence: 0.322033898305084</li>
<li>Lift: 4.50667214773589</li>
</ul></li>
</ol>
</p>

<h2>Part IV: Data Summary and Implications</h2>
<h3>D. Summary</h3>
<h4>1. Significance of Support, Lift, and Confidence Summary</h4>
<p>&nbsp; &nbsp; The support value is the proportion of all transactions that contain the item
combination in question. Using the top rule from C4 as an example, ‘Apple
Lightning to USB cable’ and ‘Falcon Dust Off Compressed Gas’ are purchased together in 0.45% of all
transactions.

&nbsp; &nbsp; Confidence estimates the probability of purchasing the consequent given that the
antecedent was purchased. It is calculated by dividing the support value of the item
combination (antecedent and consequent together) by the support value of the
antecedent. When a customer purchases ‘Apple Lightning to USB cable’ there is a 29.06%
chance that ‘Falcon Dust Off Compressed Gas’ will also be bought in the same
transaction.

&nbsp; &nbsp; Lift measures how much more often the antecedent and the consequent are bought
together than would be expected if they were purchased independently. It is calculated by
dividing the confidence of the rule by the support value of the consequent. A lift greater than
one indicates a positive association, a lift of one indicates no association, and a lift less than
one indicates that the items are bought together less often than expected. Items ‘Apple Lightning to USB cable’ and
‘Falcon Dust Off Compressed Gas’ have a lift of around 4.844, meaning they are bought together
about 4.8 times as often as would be expected if their purchases were independent.</p>
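<p>&nbsp; &nbsp; The three metrics can be reproduced from raw transaction counts. In the sketch below, the
total of 7,501 transactions comes from the prepared data, while the counts 117, 450 and 34 are not
reported anywhere in this analysis: they are back-calculated from the reported support, confidence and
lift of the top rule and should be read as an assumption. The function name <code>rule_metrics</code> is
also illustrative.</p>

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    support = n_both / n_total                    # share of baskets with both items
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence vs. baseline rate
    return support, confidence, lift


# Assumed counts for 'Apple Lightning to USB cable' -> 'Falcon Dust Off Compressed Gas',
# inferred from the reported metrics rather than read from the data.
support, confidence, lift = rule_metrics(
    n_total=7501, n_antecedent=117, n_consequent=450, n_both=34
)
print(f"support={support:.6f} confidence={confidence:.6f} lift={lift:.6f}")
```

<p>&nbsp; &nbsp; Under these assumed counts the results agree with the values reported for the top rule in C4
to six decimal places.</p>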
<h4>2. Practical Significance of Findings</h4>
<p>&nbsp; &nbsp; Every rule in the table has a lift of at least 3, the minimum set in the analysis, which
indicates a strong positive association between the items. Lift by itself measures the strength of an
association and is not a formal test of statistical significance. There is a practical significance to the
association rules as well. The sum of all support values in the apriori results is 0.151, meaning that
these product relationships appear in up to 15.1% of transactions (an upper bound, since a single
transaction can satisfy more than one rule).
Performing a market basket analysis can enable more precision in stock availability and
improve marketing strategies, so these rule discoveries can meaningfully refine the business.</p>
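<p>&nbsp; &nbsp; Because one transaction can satisfy several rules, adding up rule supports can count the
same transaction more than once. A minimal sketch of counting each transaction only once is shown
below; the function name <code>rule_coverage</code> and the four toy transactions are assumptions for
illustration and do not come from the data set.</p>

```python
def rule_coverage(transactions, rule_itemsets):
    """Share of transactions that contain at least one rule's full itemset."""
    baskets = [set(t) for t in transactions]
    rules = [set(r) for r in rule_itemsets]
    matched = sum(any(rule <= basket for rule in rules) for basket in baskets)
    return matched / len(baskets)


# Hypothetical toy transactions.
toy_transactions = [
    ["iPhone 11 case", "HP 63XL Ink", "Logitech M510 Wireless mouse"],
    ["iPhone 11 case", "HP 63XL Ink"],
    ["Logitech M510 Wireless mouse"],
    ["iPhone 11 case", "Logitech M510 Wireless mouse"],
]
toy_rules = [
    ["iPhone 11 case", "HP 63XL Ink"],
    ["iPhone 11 case", "Logitech M510 Wireless mouse"],
]

# Each rule has support 0.5, so the supports sum to 1.0.
summed_support = sum(
    sum(set(rule) <= set(t) for t in toy_transactions) / len(toy_transactions)
    for rule in toy_rules
)
coverage = rule_coverage(toy_transactions, toy_rules)
print(summed_support, coverage)
```

<p>&nbsp; &nbsp; In this toy case the first transaction matches both rules, so the summed support overstates
the share of transactions that the rules actually cover.</p>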
<h4>3. Course of Action</h4>
<p>&nbsp; &nbsp; Referring back to the questions at the beginning of the analysis, the market basket
technique identified 24 common item combinations, and this
information is realistically useful. The business can provide special deals when these
products are bought together to promote the combination and potentially attract
customers who don’t typically buy these products. Given how often an item
such as ‘Falcon Dust Off Compressed Gas’ is bought when ‘Apple Lightning to USB cable’ is
purchased, the customer would have a better experience with the business when these
products are located near each other in a physical store or recommended together online.
These practices could make the customer feel more positive and loyal towards the
company, knowing that the business is interested in their needs.</p>
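<p>&nbsp; &nbsp; As one possible way to act on the rules online, the sketch below suggests consequents for
a customer's current basket, strongest lift first. The rules and lift values are the top three from C4;
the function name <code>recommend</code> and the tuple layout are assumptions made for this example.</p>

```python
# (antecedent, consequent, lift) for the top three rules reported in C4.
top_rules = [
    ({"Apple Lightning to USB cable"}, {"Falcon Dust Off Compressed Gas"}, 4.843951),
    ({"iPhone 11 case"}, {"HP 63XL Ink"}, 4.700812),
    ({"iPhone 11 case"}, {"Logitech M510 Wireless mouse"}, 4.506672),
]


def recommend(basket, rules):
    """Suggest items whose rule antecedent is already in the basket."""
    basket = set(basket)
    suggestions = []
    # Walk the rules from highest to lowest lift.
    for antecedent, consequent, _lift in sorted(rules, key=lambda r: r[2], reverse=True):
        if antecedent <= basket:
            for item in sorted(consequent - basket):
                if item not in suggestions:
                    suggestions.append(item)
    return suggestions


print(recommend(["iPhone 11 case"], top_rules))
```

<p>&nbsp; &nbsp; Items already in the basket are skipped, so a customer who has both an iPhone 11 case and
HP 63XL Ink would only be shown the mouse.</p>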
<h2>Part V: Attachments</h2>
<h3>F. Sources for Third Party Code</h3>
<p>Amruta. (2024, December 1). Market Basket Analysis: A Comprehensive Guide for
Businesses. Analytics Vidhya. <a href="https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-market-basket-analysis/">https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-market-basket-analysis/</a>

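NumFOCUS, Inc. (n.d.). pandas.DataFrame.dropna. Pandas.
<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html">https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html</a></p>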
<h3>G. Sources</h3>
<p>Deniran, O. H. (2023, November 27). Boosting Sales with Data: The Power of Market Basket
Analysis in Retail. Medium. <a href="https://medium.com/@chemistry8526/boosting-sales-with-data-the-power-of-market-basket-analysis-in-retail-c79cc10a14df">https://medium.com/@chemistry8526/boosting-sales-with-data-the-power-of-market-basket-analysis-in-retail-c79cc10a14df</a>

GeeksforGeeks. (2025, January 15). Apriori algorithm. GeeksforGeeks.
<a href="https://www.geeksforgeeks.org/apriori-algorithm/">https://www.geeksforgeeks.org/apriori-algorithm/</a></p>