## Market Basket Analysis
    + Apriori Algorithm
    + Association Rule mining

### Prerequisites
- Revise the following concepts
    - Apriori Algorithm
        - Support
        - Confidence
        - Lift
- Install the following software
    - pandas
    - apyori 
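
The three measures listed under the Apriori Algorithm above can be computed by hand on a tiny example. The sketch below is only a revision aid: the transactions and item names are invented for illustration and are not taken from the FreshEats dataset, and the helper functions are hypothetical, not part of pandas or apyori.

```python
# Toy transactions, invented purely for illustration (not the FreshEats data)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support (1.0 suggests independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(support({"bread", "milk"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions
print(lift({"bread"}, {"milk"}, transactions))        # below 1: slightly negative association
```

A lift above 1 suggests the two sides of the rule appear together more often than chance would predict; a lift below 1 suggests the opposite.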

### Marking scheme
1. Problem 1: Preprocessing - 2 marks
2. Problem 2: Item set detection and analysis - 3 marks
3. Problem 3: Association rule mining - 5 marks

### Context
Welcome to "FreshEats Superstore", a budding supermarket. As a data analyst working with "FreshEats", your mission is to uncover meaningful patterns within customer transactions to enhance the shopping experience and help the store compete with its competitor, "Not-So-FreshEats".

### About the dataset
- "FreshEats_transactions.csv"
- Each record of the dataset represents a transaction made by a customer at "FreshEats Superstore".
- Each record lists the items bought in that transaction.

In [1]:
# Install apyori package
# Use % or ! based on the environment to install
%pip install apyori 

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Note: you may need to restart the kernel to use updated packages.

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py): started
  Building wheel for apyori (setup.py): finished with status 'done'
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=47a56c9eba982ed332ff79fb9e02bb3a474ae2286c983ab55d6c7fd251feda95
  Stored in directory: C:\Users\sujan\AppData\Local\Temp\pip-ephem-wheel-cache-2oj2c1dm\wheels\77\3d\a6\d317a6fb32be58a602b1e8c6b5d6f31f79322da554cad2a5ea
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


apyori documentation: https://pypi.org/project/apyori/ (refer to the "API Usage" section)

In [2]:
import pandas as pd
from apyori import apriori

### Problem 1 - Preprocessing (2 marks)
**Load the transactions data** from the provided CSV file. **Transform the data** into a suitable format (Hint: a **list of lists**, where each inner list contains the items of one transaction). **Make sure to clean the data** (Hint: NA values).
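
To make the target shape concrete, here is a minimal sketch of the "list of lists" transformation. It uses only the standard library `csv` module so the cleaning logic is easy to see, and it assumes a file with one transaction per row and empty cells where a transaction has fewer items. The sample text and item names are invented stand-ins for the real file.

```python
import csv
import io

# Hypothetical sample standing in for the CSV file: one transaction per row,
# shorter rows padded with empty cells (item names are invented)
sample_csv = "bread,milk,butter\nmilk,,\n,,\neggs,bread,\n"

transactions = []
for row in csv.reader(io.StringIO(sample_csv)):
    # Keep only non-empty cells, trimming stray whitespace
    items = [cell.strip() for cell in row if cell.strip()]
    if items:  # drop rows that contain no items at all
        transactions.append(items)

print(transactions)
```

The same idea carries over to a pandas DataFrame: drop the NA cells of each row and skip rows that end up empty.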

In [3]:
# Loading dataframe
df = pd.read_csv("74.FreshEats_transactions.csv", header=None)

In [18]:
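# Build a list of lists: one inner list per transaction, skipping NA cells
transactions = []
for _, row in df.iterrows():
    items = [item for item in row if not pd.isna(item)]
    if items:  # ignore rows that are entirely NA
        transactions.append(items)
print(transactions[:5])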

[[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]


### Problem 2 - Item sets (3 marks: 1 + 2)
1. Print out the frequent item sets along with their support values, and display the count of item sets (**min_support=0.045**).
2. "FreshEats" wants to replenish its stock. Help find the top 5 most popular (highest buying frequency) items/item_sets to replenish. Explain and justify the process followed to reach the conclusion.
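
One way to justify a "top 5" ranking is to show where the support values come from. The sketch below counts item sets by brute force on invented toy transactions and ranks them by support. It is not the Apriori algorithm (there is no candidate pruning), and the function name, `max_size` parameter, and threshold are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations

# Invented toy transactions (not the FreshEats data)
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of every item set with up to `max_size` items."""
    counts = Counter()
    for transaction in transactions:
        unique_items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(unique_items, size):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    # Keep only item sets whose support reaches the threshold
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

supports = frequent_itemsets(transactions, min_support=0.5)

# Rank by support, highest first, and show the top 5
ranked = sorted(supports.items(), key=lambda pair: pair[1], reverse=True)
for itemset, value in ranked[:5]:
    print(sorted(itemset), value)
```

Single items always have support at least as high as any item set containing them, so a support-ranked top 5 tends to be dominated by individual items.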

In [19]:
!pip install mlxtend

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [20]:
# Note: this import shadows the `apriori` imported from apyori earlier
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# One-hot encode: one row per transaction, one column per item
transaction_dicts = [{item: 1 for item in transaction} for transaction in transactions]
# notna() yields a boolean frame: True where the item is in the transaction
df = pd.DataFrame(transaction_dicts).notna()

# Mine item sets whose support is at least 0.045
frequent_item_sets = apriori(df, min_support=0.045, use_colnames=True)

# Count the frequent item sets and rank them by support
num_frequent_item_sets = frequent_item_sets.shape[0]
sorted_frequent_item_sets = frequent_item_sets.sort_values(by='support', ascending=False)

top_5_popular_item_sets = sorted_frequent_item_sets.head(5)

print("Frequent Item Sets:")
print(sorted_frequent_item_sets)

print("Count of Frequent Item Sets:", num_frequent_item_sets)

print("Top 5 Most Popular Items/Item Sets:")
print(top_5_popular_item_sets)


Frequent Item Sets:
   support    itemsets
0      1.0       (1.0)
1      1.0       (0.0)
2      1.0  (0.0, 1.0)
Count of Frequent Item Sets: 3
Top 5 Most Popular Items/Item Sets:
   support    itemsets
0      1.0       (1.0)
1      1.0       (0.0)
2      1.0  (0.0, 1.0)




### Problem 3 - Association Rules (5 marks: 1 + 2 + 2)
+ Items on the left side of an association rule are called **antecedent items**, and items on the right side are called **consequent items**.
1. Print out the association rules along with their confidence and lift. (Analyse the output structure of apriori().)
    + **(min_support=0.01, min_confidence=0.045, min_lift=1.5, min_length=2)**
2. As the holiday season is approaching, "FreshEats" is considering offering discounts and offers on some of its products. Help them identify the top 5 popular **pairs/sets** of items/item_sets bought, considering the probability of the consequent item being purchased when the antecedent item is bought.
3. Also help them identify the top 5 popular **pairs/sets** of items/item_sets bought together, considering the popularity of the consequent and antecedent items.
+ (The consequent and antecedent items together form the **pairs/sets** specified in the question.)
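
Questions 2 and 3 rank the same rules by different measures, and the two rankings can disagree. The sketch below builds pairwise rules on invented toy transactions, filters them by confidence and lift, and then sorts once by confidence and once by joint support. The data, the thresholds, and the dictionary layout are all assumptions for illustration; they are not the assignment's values or the output format of any library.

```python
from itertools import permutations

# Invented toy transactions (not the FreshEats data)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"butter", "jam"},
]
n = len(transactions)

def support(items):
    """Fraction of transactions containing all of `items`."""
    return sum(set(items) <= t for t in transactions) / n

all_items = sorted(set().union(*transactions))
rules = []
for antecedent, consequent in permutations(all_items, 2):
    pair_support = support({antecedent, consequent})
    if pair_support == 0:
        continue  # the pair never occurs together
    conf = pair_support / support({antecedent})
    rules.append({
        "antecedent": antecedent,
        "consequent": consequent,
        "support": pair_support,
        "confidence": conf,
        "lift": conf / support({consequent}),
    })

# Hypothetical thresholds chosen for the toy data, not the assignment's values
kept = [r for r in rules if r["confidence"] >= 0.5 and r["lift"] >= 1.1]

# Question 2 style: how likely is the consequent given the antecedent?
by_confidence = sorted(kept, key=lambda r: r["confidence"], reverse=True)
# Question 3 style: how often is the whole pair bought together?
by_support = sorted(kept, key=lambda r: r["support"], reverse=True)

print(by_confidence[0])
print(by_support[0])
```

In this toy data the rule jam → butter has the highest confidence but the lowest joint support among the kept rules, which is why the two rankings should be reported separately.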


In [21]:
from mlxtend.frequent_patterns import association_rules

# Generate rules from the frequent item sets, keeping those with confidence >= 0.045
# Note: frequent_item_sets was mined with min_support=0.045, and no min_lift filter is applied here
association_rules_df = association_rules(frequent_item_sets, metric="confidence", min_threshold=0.045)

# Rank rules by confidence, i.e. P(consequent | antecedent)
sorted_association_rules = association_rules_df.sort_values(by='confidence', ascending=False)

print("Association Rules with Confidence and Lift:")
print(sorted_association_rules)

top_5_popular_pairs = sorted_association_rules.head(5)

print("Top 5 Popular Pairs/Sets of Items/Item Sets (Confidence-based):")
print(top_5_popular_pairs)

# Reuse the support columns that association_rules() already returns
sorted_association_rules['item_support'] = sorted_association_rules['consequent support']
sorted_association_rules['antecedent_support'] = sorted_association_rules['antecedent support']

# Popularity score: product of the antecedent and consequent supports
sorted_association_rules['popularity'] = sorted_association_rules['item_support'] * sorted_association_rules['antecedent_support']

sorted_association_rules = sorted_association_rules.sort_values(by='popularity', ascending=False)

top_5_popular_pairs_popularity = sorted_association_rules.head(5)

print("Top 5 Popular Pairs/Sets of Items/Item Sets (Popularity-based):")
print(top_5_popular_pairs_popularity)


Association Rules with Confidence and Lift:
  antecedents consequents  antecedent support  consequent support  support  \
0       (0.0)       (1.0)                 1.0                 1.0      1.0   
1       (1.0)       (0.0)                 1.0                 1.0      1.0   

   confidence  lift  leverage  conviction  zhangs_metric  
0         1.0   1.0       0.0         inf            0.0  
1         1.0   1.0       0.0         inf            0.0  
Top 5 Popular Pairs/Sets of Items/Item Sets (Confidence-based):
  antecedents consequents  antecedent support  consequent support  support  \
0       (0.0)       (1.0)                 1.0                 1.0      1.0   
1       (1.0)       (0.0)                 1.0                 1.0      1.0   

   confidence  lift  leverage  conviction  zhangs_metric  
0         1.0   1.0       0.0         inf            0.0  
1         1.0   1.0       0.0         inf            0.0  
Top 5 Popular Pairs/Sets of Items/Item Sets (Popularity-based):
  an