### What You're Aiming For

- In this checkpoint, we will work with the 'Customer purchases history' dataset from Kaggle.

- Dataset description: This dataset contains historical records of customer purchases from supermarket X. The objective is to find association rules that help the supermarket owners devise new marketing plans to improve their sales.

#### Instructions

- toy_dataset = [['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
  ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
  ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
  ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
  ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants']]

- Run the Apriori algorithm on the provided toy_dataset. Interpret the results.
- Explore the checkpoint dataset using Pandas and Plotly.
- Run the Apriori algorithm on the checkpoint dataset. Interpret the results and suggest a clear business plan to the supermarket owners based on your findings.

### A. Running the Apriori algorithm on the sample transaction dataset

In [None]:
# Import necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]

data = pd.DataFrame(toy_dataset)
data

### Step 1: Data Preprocessing
We need to transform the dataset into a one-hot encoded format (True/False for each item)

In [None]:
# Use TransactionEncoder to convert the list of transactions into a one-hot encoded boolean array
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_data = te.fit(toy_dataset).transform(toy_dataset)

# Display the encoded array (one row per transaction, one column per item)
te_data

In [None]:
# Convert to DataFrame
onehot_df = pd.DataFrame(te_data, columns=te.columns_)

onehot_df
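
To make the encoding concrete, here is a minimal hand-written sketch of what a one-hot transaction matrix contains. It is plain Python for illustration only, not mlxtend's implementation; the names `items` and `onehot_rows` are introduced here for the example, and the alphabetical column order is simply a choice made in this sketch.

```python
toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]

# Collect every distinct item once; these become the columns.
items = sorted({item for transaction in toy_dataset for item in transaction})

# One row per transaction, one boolean per item: True if the item is in the basket.
onehot_rows = [[item in transaction for item in items] for transaction in toy_dataset]

for transaction, row in zip(toy_dataset, onehot_rows):
    print(sum(row), "distinct items out of", len(transaction), "entries")
```

Repeated items within a transaction ('Hat' in the fourth basket, 'Skirt' in the fifth) collapse into a single True, so those rows report fewer distinct items than entries.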

### Step 2: Find Frequent Itemsets using Apriori Algorithm
Now, we apply the Apriori algorithm to find frequent itemsets.

In [None]:
# We set a `min_support` value to control how frequent an itemset must be to be considered.

frequent_itemsets = apriori(onehot_df, min_support=0.3, use_colnames=True)
# 'min_support=0.3' means that we are only interested in itemsets that appear in at least 30% of the transactions
# (with 5 transactions, that means at least 2 of them).

frequent_itemsets
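
As a sanity check on what `min_support` filters, support can be computed by hand. The sketch below is a plain-Python illustration, not how mlxtend computes it internally; the helper `support` and the chosen candidate itemsets are assumptions made for this example.

```python
toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted.issubset(transaction))
    return hits / len(transactions)

min_support = 0.3
for candidate in (['Pants'], ['Skirt', 'Pants'], ['Shirt']):
    s = support(candidate, toy_dataset)
    status = "kept" if s >= min_support else "dropped"
    print(candidate, s, status)
```

'Pants' appears in 4 of 5 baskets and 'Skirt' with 'Pants' in 3 of 5, so both pass the threshold, while 'Shirt' appears in only 1 of 5 and is dropped.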

### Step 3: Generate Association Rules
Once we have the frequent itemsets, we can generate **association rules**. We'll calculate the confidence and lift for each rule.

In [None]:
# Generate Association Rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# We use 'lift' as the metric: it measures how much more often two itemsets are purchased together than would be expected if they were independent.
# 'min_threshold=1' keeps only the rules with a lift of at least 1.

rules
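
Confidence and lift follow directly from support, so one rule can be checked by hand. This is a minimal sketch under the standard definitions (confidence = support(A and C) / support(A), lift = confidence / support(C)); the helpers `support` and `rule_metrics` are hypothetical names for this example, and the sketch assumes the antecedent occurs at least once.

```python
toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(wanted.issubset(t) for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    support_a = support(antecedent, transactions)
    support_c = support(consequent, transactions)
    support_both = support(set(antecedent) | set(consequent), transactions)
    confidence = support_both / support_a  # how often C appears when A does
    lift = confidence / support_c          # relative to how often C appears overall
    return confidence, lift

confidence, lift = rule_metrics(['Skirt'], ['Pants'], toy_dataset)
print(f"Skirt -> Pants: confidence={confidence:.2f}, lift={lift:.2f}")
```

Every basket with a skirt also contains pants (confidence 1.0), and pants appear in 80% of all baskets, which gives a lift of 1.25.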

### Step 4: Filter and Sort Rules

In [None]:
# We can filter the rules to focus on those with high confidence or lift values.
strong_rules = rules[(rules['lift'] > 1.2) & (rules['confidence'] > 0.5)]

# Sort the rules by lift, in descending order
strong_rules = strong_rules.sort_values(by='lift', ascending=False)

# Display the filtered rules
strong_rules

### Step 5: Visualizing Results

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

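# Visualize the top 10 rules based on lift
top_rules = strong_rules.nlargest(10, 'lift')
# Label each bar with the full rule so that rules sharing an antecedent stay distinct
rule_labels = [
    f"{', '.join(sorted(a))} -> {', '.join(sorted(c))}"
    for a, c in zip(top_rules['antecedents'], top_rules['consequents'])
]
plt.figure(figsize=(10, 6))
sns.barplot(x=top_rules['lift'].to_numpy(), y=rule_labels)
plt.title('Top 10 Association Rules by Lift')
plt.xlabel('Lift')
plt.ylabel('Rule')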
plt.show()

### B. Running the Apriori algorithm on the CSV file below
- Market_Basket_Optimisation.csv

In [None]:
# Import necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
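# Note: if the CSV has no header row, pass header=None so that the first transaction
# is not consumed as column names (check the file before deciding).
data = pd.read_csv("Market_Basket_Optimisation.csv")
data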

In [None]:
data.info()

In [None]:
data.isnull().sum()

In [None]:
data.describe(include= "all")

In [None]:
all_products = data.stack().reset_index(drop=True)


In [None]:
all_products = all_products.dropna()
all_products

In [None]:
# Step 1: Convert each row into a list of items, ignoring NaN values
transactions = data.apply(lambda row: row.dropna().tolist(), axis=1).tolist()
transactions
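
Before mining rules, it can help to check which items dominate the baskets. The sketch below counts item frequencies with the standard library; `sample_transactions` is a small made-up stand-in for the `transactions` list built above, not data taken from the CSV.

```python
from collections import Counter

# Hypothetical baskets used only to illustrate the counting step.
sample_transactions = [
    ['mineral water', 'eggs', 'chocolate'],
    ['mineral water', 'spaghetti'],
    ['eggs', 'mineral water'],
    ['chocolate', 'cookies'],
]

# Count each item at most once per basket so that count / n equals its support.
item_counts = Counter(
    item for transaction in sample_transactions for item in set(transaction)
)

n = len(sample_transactions)
for item, count in item_counts.most_common(3):
    print(f"{item}: {count} of {n} baskets ({count / n:.0%})")
```

The same loop can be run on the real `transactions` list; the resulting counts are a natural input for a bar chart of the most frequent items.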

In [None]:

# Step 2: Apply TransactionEncoder and transform the transactions into a one-hot encoded DataFrame
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 3: Apply the Apriori algorithm
# Use a minimum support of 5% (0.05)
frequent_itemsets = apriori(df, min_support=0.05, use_colnames=True)

# Step 4: Generate association rules (keep rules with a lift of at least 1)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Step 5: Display the results
# Only the last expression of a cell is rendered automatically, so display both tables explicitly
from IPython.display import display
display(frequent_itemsets.sort_values('support', ascending=False))
display(rules.sort_values('lift', ascending=False))

### Interpretation of Results

#### Frequent Itemsets

- **Top products:** Products such as burgers, cake, chicken, chocolate, cookies, and eggs are among the items with high support, which indicates that customers purchase them frequently.
- **Mineral water:** Mineral water appears to be a staple item: it has the highest support (23.83%) and is often purchased alongside various other products.

#### Association Rules

- **Chocolate and mineral water:** This pair has a lift of 1.35, meaning the two items appear together about 35% more often than would be expected if they were purchased independently. Customers who buy chocolate are therefore more likely than average to also buy mineral water.
- **Eggs and mineral water:** Eggs also show a positive association with mineral water, which suggests that customers tend to buy these items together.
- **Spaghetti and mineral water:** Spaghetti shows a similar trend, which suggests that meal-related items are often bought with mineral water.

#### Customer Preferences

- Products with a high purchase frequency (such as eggs, chocolate, and mineral water) appear in many baskets and can be leveraged in marketing strategies.

### Business Plan Suggestions

#### Product Placement Strategy

- **Cross-merchandising:** Position chocolate, eggs, and spaghetti near mineral water displays. Use attractive signage that suggests pairing these items, making shopping more convenient for customers who often buy them together.
- **End-cap displays:** Create themed displays at the ends of aisles featuring popular item combinations, such as chocolate and mineral water, to draw attention and boost impulse buying.

#### Promotions and Discounts

- **Bundle offers:** Develop promotional bundles that combine frequently purchased items (e.g., eggs, mineral water, and chocolate). Offering a discount for purchasing these items together can encourage larger basket sizes.
- **Loyalty program:** Implement a loyalty program that rewards customers for purchasing frequently bought items. For instance, buying chocolate and mineral water together could earn points toward discounts on future purchases.

#### Product Assortment

- **Expand the product range:** Consider expanding the variety of products in high-support categories such as snacks (cookies, chocolate) and beverages (mineral water) to cater to diverse customer preferences.
- **Healthier alternatives:** Introduce healthy or organic alternatives for popular items such as chocolate and snacks to cater to health-conscious consumers.

#### Marketing Campaigns

- **Targeted marketing:** Use the data insights to create targeted campaigns that emphasize the benefits of pairing products (e.g., healthy meals with mineral water).
- **Seasonal promotions:** Align promotions with seasonal trends, focusing on high-demand items (such as chocolate during holidays) to maximize sales during peak times.

#### In-Store Events

- **Sampling events:** Host tasting events or cooking demonstrations featuring recipes that use popular items such as eggs, spaghetti, and chocolate, promoting their use together.
- **Customer engagement:** Create social media content that highlights meal ideas or snack pairings and directs customers to the in-store displays for those products.

#### Feedback Mechanism

- **Customer surveys:** Run regular customer feedback surveys to refine product offerings and discover new preferences or shopping habits, so that inventory stays aligned with demand.