
As data analysts at "FreshEats", our mission is to uncover meaningful patterns in customer transactions. These patterns should improve the shopping experience and help FreshEats compete with its rival, "Not-So-FreshEats".

### About the dataset
- "FreshEats_transactions.csv"
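- Each record in the dataset represents a single transaction made by a customer at "FreshEats Superstore".
- Each transaction lists the items bought in it, one item per column. The file has no header row, and shorter transactions leave the remaining columns empty.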

In [1]:
# Install apyori package
# Use % or ! based on the environment to install
%pip install apyori 

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5955 sha256=5472c63cb0b7e92d26b0e1e803f98f0b21c78879e9b040c56578906d079216dd
  Stored in directory: /root/.cache/pip/wheels/c4/1a/79/20f55c470a50bb3702a8cb7c94d8ada15573538c7f4baebe2d
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2
Note: you may need to restart the kernel to use updated packages.


apyori documentation: https://pypi.org/project/apyori/ (see the "API Usage" section)

In [2]:
import pandas as pd
from apyori import apriori

### Preprocessing

In [3]:
# Load the transactions; the CSV has no header row, so header=None keeps the first transaction as data
df = pd.read_csv("/kaggle/input/fresheats-superstore/FreshEats_transactions.csv", header=None)

In [4]:
df.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs |
| 2 | chutney |
| 3 | turkey | avocado |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       7501 non-null   object
 1   1       5747 non-null   object
 2   2       4389 non-null   object
 3   3       3345 non-null   object
 4   4       2529 non-null   object
 5   5       1864 non-null   object
 6   6       1369 non-null   object
 7   7       981 non-null    object
 8   8       654 non-null    object
 9   9       395 non-null    object
 10  10      256 non-null    object
 11  11      154 non-null    object
 12  12      87 non-null     object
 13  13      47 non-null     object
 14  14      25 non-null     object
 15  15      8 non-null      object
 16  16      4 non-null      object
 17  17      4 non-null      object
 18  18      3 non-null      object
 19  19      1 non-null      object
dtypes: object(20)
memory usage: 1.1+ MB


In [6]:
# Turn each row into a list of items, dropping the NaN cells that pad shorter transactions
transactions = []
for _, row in df.iterrows():
    transaction = [item for item in row if pd.notna(item)]
    transactions.append(transaction)

transactions[0]

['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']
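The notebook builds `transactions` with pandas. For comparison, the same row-to-list conversion can be sketched with the standard-library `csv` module. This sketch assumes the layout seen in `df.head()`: no header row, with blank cells padding shorter transactions. `SAMPLE_CSV` and `load_transactions` are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical sample in the same layout as FreshEats_transactions.csv:
# one transaction per row, no header, shorter rows padded with empty fields.
SAMPLE_CSV = """shrimp,almonds,avocado
burgers,meatballs,eggs
chutney,,
turkey,avocado,
"""


def load_transactions(text):
    """Return one list of item names per CSV row, dropping empty (or whitespace-only) cells."""
    reader = csv.reader(io.StringIO(text))
    return [[cell.strip() for cell in row if cell.strip()] for row in reader]


transactions = load_transactions(SAMPLE_CSV)
print(transactions)
# [['shrimp', 'almonds', 'avocado'], ['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```

For the real file, `open("FreshEats_transactions.csv", newline="").read()` could be passed in place of `SAMPLE_CSV`. The pandas version in the notebook gives the same lists, except that it does not strip whitespace.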

### Item Set creation

In [7]:

# Frequent item sets: keep every item set that appears in at least 4.5% of transactions
results = list(apriori(transactions, min_support=0.045))

print("Frequent Item Sets:")
for itemset in results:
    support = itemset.support
    items = itemset.items
    print(f"Items: {items}, Support: {support}")



Frequent Item Sets:
Items: frozenset({'burgers'}), Support: 0.0871883748833489
Items: frozenset({'cake'}), Support: 0.08105585921877083
Items: frozenset({'champagne'}), Support: 0.04679376083188908
Items: frozenset({'chicken'}), Support: 0.05999200106652446
Items: frozenset({'chocolate'}), Support: 0.1638448206905746
Items: frozenset({'cookies'}), Support: 0.08038928142914278
Items: frozenset({'cooking oil'}), Support: 0.0510598586855086
Items: frozenset({'eggs'}), Support: 0.17970937208372217
Items: frozenset({'escalope'}), Support: 0.0793227569657379
Items: frozenset({'french fries'}), Support: 0.1709105452606319
Items: frozenset({'frozen smoothie'}), Support: 0.06332489001466471
Items: frozenset({'frozen vegetables'}), Support: 0.09532062391681109
Items: frozenset({'grated cheese'}), Support: 0.0523930142647647
Items: frozenset({'green tea'}), Support: 0.13211571790427942
Items: frozenset({'ground beef'}), Support: 0.09825356619117451
Items: frozenset({'herb & pepper'}), Support: 0.

In [8]:

print("\nCount of Item Sets:", len(results))


Count of Item Sets: 32
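apyori handles frequent item set mining internally. As a rough illustration of what `min_support` controls, here is a minimal level-wise Apriori sketch in plain Python. It is not apyori's implementation, and the toy baskets are made up. Each level joins frequent item sets into larger candidates and prunes any candidate with an infrequent subset (the Apriori property). It then keeps only the candidates that meet the support threshold.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori sketch: return {frozenset(items): support} for frequent item sets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        result.update({s: support(s) for s in current})
        k += 1
    return result


# Hypothetical toy baskets (not from the FreshEats data).
toy = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

frequent = frequent_itemsets(toy, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: kv[1], reverse=True):
    print(sorted(itemset), sup)
print(len(frequent))  # 8
```

Raising `min_support` shrinks the result in the same way the notebook's threshold of 0.045 limits apyori to 32 item sets. For example, {bread, milk, diapers} occurs in only 2 of 5 toy baskets, so it is dropped at 0.6.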


In [9]:

top_items = sorted(results, key=lambda x: x.support, reverse=True)[:5]
print("\nTop 5 Most Popular Items/Item Sets:")
for itemset in top_items:
    support = itemset.support
    items = itemset.items
    print(f"Items: {items}\t\t Support: {support}")



Top 5 Most Popular Items/Item Sets:
Items: frozenset({'mineral water'})		 Support: 0.23836821757099053
Items: frozenset({'eggs'})		 Support: 0.17970937208372217
Items: frozenset({'spaghetti'})		 Support: 0.17411011865084655
Items: frozenset({'french fries'})		 Support: 0.1709105452606319
Items: frozenset({'chocolate'})		 Support: 0.1638448206905746


Support is the *proportion of transactions* that contain a *particular item set*. Because the list is sorted in descending order of support, the **item at the top appears in the largest share of transactions**. Mineral water, for example, appears in about 23.8% of all transactions (support 0.2384). Items with high support are therefore the **most frequently purchased**. If FreshEats is planning to restock, it should **prioritize these items**, since customers pick them up more often than any others.
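To see what the single-item support values measure, here is a minimal plain-Python sketch using `collections.Counter`. The toy baskets are hypothetical; in the notebook, the `transactions` list would take their place.

```python
from collections import Counter

# Hypothetical toy baskets; in the notebook this would be the `transactions` list.
toy = [
    ["mineral water", "eggs"],
    ["mineral water", "spaghetti"],
    ["eggs", "chocolate"],
    ["mineral water"],
]

# Count each item at most once per transaction, then divide by the number of transactions.
counts = Counter(item for basket in toy for item in set(basket))
support = {item: c / len(toy) for item, c in counts.items()}

# Print items from most to least popular.
for item, s in sorted(support.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {s:.2f}")
```

Here "mineral water" is in 3 of 4 baskets, so its support is 0.75. That means it appears in the largest share of transactions, not in every transaction.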

In [10]:
# apyori reads only min_support, min_confidence, min_lift and max_length (there is no
# min_length option); single items drop out anyway because a one-item set always has lift 1.0.
association_rules = list(apriori(transactions, min_support=0.01, min_confidence=0.045, min_lift=1.5))

# Association rules along with confidence and lift.
# Each record can hold several antecedent/consequent splits; only the first one is printed.
print("Association Rules:")
for rule in association_rules:
    stats = rule.ordered_statistics[0]
    antecedent = ', '.join(stats.items_base)
    consequent = ', '.join(stats.items_add)
    print(f"Antecedent: {antecedent}, Consequent: {consequent}, Confidence: {stats.confidence}, Lift: {stats.lift}")


Association Rules:
Antecedent: burgers, Consequent: cake, Confidence: 0.1314984709480122, Lift: 1.6223191292451309
Antecedent: burgers, Consequent: eggs, Confidence: 0.33027522935779813, Lift: 1.8378297443715457
Antecedent: burgers, Consequent: green tea, Confidence: 0.2003058103975535, Lift: 1.5161391360161947
Antecedent: burgers, Consequent: milk, Confidence: 0.20489296636085627, Lift: 1.581175041844427
Antecedent: burgers, Consequent: turkey, Confidence: 0.12232415902140673, Lift: 1.9564040870353345
Antecedent: cake, Consequent: pancakes, Confidence: 0.14638157894736845, Lift: 1.5399834834280655
Antecedent: cereals, Consequent: mineral water, Confidence: 0.3989637305699482, Lift: 1.6737287153272828
Antecedent: champagne, Consequent: chocolate, Confidence: 0.24786324786324784, Lift: 1.5127926950546964
Antecedent: chicken, Consequent: milk, Confidence: 0.24666666666666667, Lift: 1.9035459533607684
Antecedent: chicken, Consequent: mineral water, Confidence: 0.38000000000000006, Lift: 1
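apyori stores each rule's statistics in `ordered_statistics`, with one entry per antecedent/consequent split of the item set (fields `items_base`, `items_add`, `confidence`, `lift`). The cell above prints only `ordered_statistics[0]`, so a pair such as {burgers, cake} is shown in one direction only. The plain-Python sketch below enumerates every split of one item set. `rules_from_itemset` and the toy baskets are hypothetical.

```python
from itertools import combinations


def rules_from_itemset(itemset, transactions):
    """Yield (antecedent, consequent, confidence, lift) for every split of `itemset`
    into a non-empty antecedent and a non-empty consequent."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(s):
        # Fraction of baskets that contain every item in `s`.
        return sum(1 for b in baskets if s <= b) / n

    items = frozenset(itemset)
    support_all = support(items)
    if support_all == 0:
        return  # the item set never occurs, so no rule can be formed
    for size in range(1, len(items)):
        for base in combinations(sorted(items), size):
            antecedent = frozenset(base)
            consequent = items - antecedent
            confidence = support_all / support(antecedent)
            lift = confidence / support(consequent)
            yield sorted(antecedent), sorted(consequent), confidence, lift


# Hypothetical toy baskets (not from the FreshEats data).
toy = [
    ["burgers", "cake"],
    ["burgers", "cake", "eggs"],
    ["burgers", "eggs"],
    ["milk"],
    ["milk", "eggs"],
    ["eggs"],
]

rules = list(rules_from_itemset({"burgers", "cake"}, toy))
for antecedent, consequent, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: confidence={confidence:.2f}, lift={lift:.2f}")
# ['burgers'] -> ['cake']: confidence=0.67, lift=2.00
# ['cake'] -> ['burgers']: confidence=1.00, lift=2.00
```

The two directions of the same pair share one lift value but can have very different confidence values. Ranking rules by confidence therefore depends on which split is kept.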

Let's take an example rule to understand the relationship between the antecedent and the consequent.

Example Rule:

* Antecedent: burgers
* Consequent: cake
* Confidence: 0.1315
* Lift: 1.6223

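Interpretation: Among transactions that contain **burgers**, about **13.15%** also contain **cake**. In other words, there is an estimated **13.15% probability** that a customer who buys burgers **will also buy cake**. The **lift of 1.6223** means cake is about **1.62 times as likely** to appear in a basket with burgers as in a randomly chosen basket. The **association is therefore stronger** than it would be if the two items were purchased independently.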
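The rule's statistics can also be rebuilt by hand. The counts below are back-calculated from the printed values, assuming all 7,501 rows reported by `df.info()` are transactions. Burgers' support of 0.0872 corresponds to 654 transactions and cake's 0.0811 to 608. The confidence of 0.1315 = 86/654 then implies that 86 transactions contain both.

```latex
\begin{aligned}
\operatorname{support}(\{\text{burgers}\}) &= \tfrac{654}{7501} \approx 0.0872 \\
\operatorname{support}(\{\text{cake}\}) &= \tfrac{608}{7501} \approx 0.0811 \\
\operatorname{support}(\{\text{burgers}, \text{cake}\}) &= \tfrac{86}{7501} \approx 0.0115 \\
\operatorname{confidence}(\text{burgers} \Rightarrow \text{cake}) &= \frac{\operatorname{support}(\{\text{burgers}, \text{cake}\})}{\operatorname{support}(\{\text{burgers}\})} = \tfrac{86}{654} \approx 0.1315 \\
\operatorname{lift}(\text{burgers} \Rightarrow \text{cake}) &= \frac{\operatorname{confidence}(\text{burgers} \Rightarrow \text{cake})}{\operatorname{support}(\{\text{cake}\})} = \frac{86 \cdot 7501}{654 \cdot 608} \approx 1.6223
\end{aligned}
```

Put simply, cake appears in roughly 8.1% of all baskets but in roughly 13.1% of baskets that contain burgers, about 1.62 times its baseline rate.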

All the other rules can be analysed the same way.

In general terms:

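**Confidence**: the probability of the **consequent item being purchased given** that the **antecedent item is purchased**, estimated as support(antecedent ∪ consequent) / support(antecedent). It shows how reliably the antecedent predicts the consequent. Note that confidence(A → B) and confidence(B → A) are generally different.

**Lift**: the rule's confidence divided by the support of the consequent. It measures how many times **more likely the items are to be bought together** than they would be if they were purchased independently. A lift above 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 means the items tend not to be bought together.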
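These definitions translate directly into code. Below is a minimal plain-Python sketch; the function names and toy baskets are illustrative, not part of apyori.

```python
def support(itemset, baskets):
    """Fraction of baskets containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A | B) / support(A).
    Assumes the antecedent occurs in at least one basket."""
    return support(set(antecedent) | set(consequent), baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """confidence(A -> B) / support(B): >1 positive, 1 independent, <1 negative association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Hypothetical toy baskets (not from the FreshEats data).
baskets = [
    {"pasta", "sauce", "water"},
    {"pasta", "sauce"},
    {"water", "bread"},
    {"bread"},
]
print(confidence({"pasta"}, {"sauce"}, baskets))  # 1.0
print(lift({"pasta"}, {"sauce"}, baskets))        # 2.0 (always bought together)
print(lift({"pasta"}, {"water"}, baskets))        # 1.0 (no association)
print(lift({"sauce"}, {"bread"}, baskets))        # 0.0 (never bought together)
```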

* To identify the top 5 rules ranked by the probability that the consequent item is purchased when the antecedent item is bought:

**Sort the list** returned by the apriori function by **confidence value**, in descending order. The rules at the top have the **highest estimated probability that the consequent item is bought, given that the antecedent item is purchased.**

In [11]:
# The top 5 popular pairs/sets considering the probability of the consequent item being purchased
top_confidence_rules = sorted(association_rules, key=lambda x: x.ordered_statistics[0].confidence, reverse=True)[:5]
print("\nTop 5 Popular Pairs/Sets (Considering Confidence):\n")
for rule in top_confidence_rules:
    antecedent = ', '.join(rule.ordered_statistics[0].items_base)
    consequent = ', '.join(rule.ordered_statistics[0].items_add)
    confidence = rule.ordered_statistics[0].confidence
    lift = rule.ordered_statistics[0].lift
    print(f"Antecedent: {antecedent} ||  Consequent: {consequent} ||  Confidence: {confidence:.2f}\n")



Top 5 Popular Pairs/Sets (Considering Confidence):

Antecedent: ground beef ||  Consequent: mineral water ||  Confidence: 0.42

Antecedent: cereals ||  Consequent: mineral water ||  Confidence: 0.40

Antecedent: ground beef ||  Consequent: spaghetti ||  Confidence: 0.40

Antecedent: cooking oil ||  Consequent: mineral water ||  Confidence: 0.39

Antecedent: chicken ||  Consequent: mineral water ||  Confidence: 0.38



* To identify the top 5 pairs/sets of items that are bought together more often than their individual popularity would predict, taking both the antecedent and the consequent into account:

**Sort the list** returned by the apriori function by **lift value**, in descending order. Confidence alone favors consequents that are popular anyway: four of the five rules above end in mineral water, the item with the highest support (0.238). Lift divides out the consequent's baseline popularity. Rules with high lift therefore signal a **strong association: the antecedent and consequent items are bought together much more often than chance would predict.**
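Here is a small illustration of how ranking by confidence and ranking by lift can disagree. The baskets are hypothetical: "water" stands in for a very popular staple, and "beef", "herbs" and "cereal" are made-up items.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def rule_stats(antecedent, consequent, baskets):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    confidence = support(antecedent | consequent, baskets) / support(antecedent, baskets)
    return confidence, confidence / support(consequent, baskets)


# Hypothetical toy baskets: "water" is in almost every basket.
baskets = [
    {"water", "beef"},
    {"water", "beef", "herbs"},
    {"water", "cereal"},
    {"water"},
    {"beef", "herbs"},
    {"water", "cereal"},
]
candidate_rules = [({"beef"}, {"water"}), ({"beef"}, {"herbs"}), ({"cereal"}, {"water"})]
scored = [(a, c, *rule_stats(a, c, baskets)) for a, c in candidate_rules]

by_confidence = sorted(scored, key=lambda r: r[2], reverse=True)
by_lift = sorted(scored, key=lambda r: r[3], reverse=True)

print("By confidence:")
for a, c, conf, lift in by_confidence:
    print(f"  {sorted(a)} -> {sorted(c)}: confidence={conf:.2f}, lift={lift:.2f}")
print("By lift:")
for a, c, conf, lift in by_lift:
    print(f"  {sorted(a)} -> {sorted(c)}: confidence={conf:.2f}, lift={lift:.2f}")
```

The rules beef → water and beef → herbs have the same confidence (0.67). However, beef → water has a lift of 0.8, because water is in almost every basket anyway, while beef → herbs has a lift of 2.0. Only lift separates the genuine pairing from the popular staple.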

In [12]:
top_lift_rules = sorted(association_rules, key=lambda x: x.ordered_statistics[0].lift, reverse=True)[:5]
print("\nTop 5 Popular Pairs/Sets (Considering Lift):\n")
for rule in top_lift_rules:
    antecedent = ', '.join(rule.ordered_statistics[0].items_base)
    consequent = ', '.join(rule.ordered_statistics[0].items_add)
    confidence = rule.ordered_statistics[0].confidence
    lift = rule.ordered_statistics[0].lift
    print(f"Antecedent: {antecedent}  ||  Consequent: {consequent} ||  Lift: {lift:.2f} \n")


Top 5 Popular Pairs/Sets (Considering Lift):

Antecedent: ground beef  ||  Consequent: herb & pepper ||  Lift: 3.29 

Antecedent: ground beef  ||  Consequent: spaghetti, mineral water ||  Lift: 2.91 

Antecedent: olive oil  ||  Consequent: spaghetti, mineral water ||  Lift: 2.61 

Antecedent: frozen vegetables  ||  Consequent: tomatoes ||  Lift: 2.47 

Antecedent: frozen vegetables  ||  Consequent: shrimp ||  Lift: 2.45 

