## Predictive Analysis Objectives (Association Rule Mining)

Based on the given dataset, the *potential* objectives for predictive analysis using association rule mining are:

1. **Identifying Cross-Shopping Behavior**
2. **Identifying Amenity Preferences Based on Store Choices**
3. **Identifying the Optimal Store Sequences**

In [105]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth
from mlxtend.frequent_patterns import association_rules



# Load the CSV file
file_path = '../data/dino_mall_cleaned.csv'
df = pd.read_csv(file_path)  # read_csv already returns a DataFrame
df.shape
# df.head()

(113, 34)

In [106]:
# get the categories of stores
unique_values = df.iloc[:, 7].unique()
unique_values

array(['Electronics and Gadgets', 'Entertainment', 'Food and Beverages',
       'Apparel and Fashion', 'Services', 'Specialty Stores',
       'Department Stores',
       'Beauty and Personal Care, Health and Wellness',
       'Home Furnishings and Decor'], dtype=object)

In [107]:
# get the amenities_beside_comfort_rooms column (index 29)
amenities_beside_comfort_rooms = df.iloc[:, 29]
amenities_beside_comfort_rooms

0      Apparel and Fashion, Department Stores, Entert...
1                                     Food and Beverages
2            Entertainment, Food and Beverages, Services
3                                     Food and Beverages
4      Apparel and Fashion, Entertainment, Food and B...
                             ...                        
108    Electronics and Gadgets, Entertainment, Food a...
109    Beauty, Personnel, Health and Wellness, Entert...
110               Beauty, Personnel, Health and Wellness
111                              Electronics and Gadgets
112    Apparel and Fashion, Department Stores, Home F...
Name: amenities_beside_comfort_rooms, Length: 113, dtype: object

In [108]:
unique_set = set(unique_values)
amenities_transactions = []

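for amenity in amenities_beside_comfort_rooms:
    # Non-string cells (e.g. NaN) fall through to the else branch as empty transactions.
    if isinstance(amenity, str):
        # Split on commas, then keep only exact matches against the store categories.
        # Note: a category whose name itself contains a comma is split apart and dropped here.
        items = [item.strip() for item in amenity.split(',')]
        filtered_items = [it for it in items if it in unique_set]
        amenities_transactions.append(filtered_items)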
    else:
        amenities_transactions.append([])
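
One caveat with the comma split in this cell: the category list includes 'Beauty and Personal Care, Health and Wellness', which itself contains a comma, so its pieces never match `unique_set` and the item is silently dropped. Below is a minimal sketch of one possible alternative that matches known labels from left to right instead of splitting blindly. The helper name `split_known_labels` and the sample strings are hypothetical, and it assumes every label is a non-empty string. A label spelled differently in this column (the output above shows 'Beauty, Personnel, Health and Wellness') would still need to be normalized before matching.

```python
def split_known_labels(cell, labels):
    """Split a comma-separated cell into known labels, some of which may contain commas.

    Assumes every entry in `labels` is a non-empty string.
    """
    # Try longer labels first so a multi-part label is matched as a whole.
    ordered = sorted(labels, key=len, reverse=True)
    remaining = cell.strip()
    found = []
    while remaining:
        for label in ordered:
            if remaining.startswith(label):
                found.append(label)
                remaining = remaining[len(label):].lstrip(', ')
                break
        else:
            # No known label starts here: skip ahead to the text after the next comma.
            remaining = remaining.partition(',')[2].strip()
    return found


# Hypothetical labels and cell value, for illustration only
labels = {'Services', 'Food and Beverages',
          'Beauty and Personal Care, Health and Wellness'}
cell = 'Beauty and Personal Care, Health and Wellness, Services'
print(split_known_labels(cell, labels))
```

If something like this were swapped into the loop, the resulting transactions (and therefore the supports and rules reported below) would likely change, so the outputs shown in this notebook would need to be regenerated.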

In [109]:

# Use TransactionEncoder to convert to a one-hot encoded format
te = TransactionEncoder()
te_ary = te.fit(amenities_transactions).transform(amenities_transactions)
amenities_df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply apriori algorithm to find frequent itemsets
# You can adjust min_support as needed
frequent_itemsets = apriori(amenities_df, min_support=0.1, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# Display results as formatted DataFrames
# print("Frequent Itemsets:")
# display(frequent_itemsets)

print("\nAssociation Rules:")
display(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].sort_values(by='lift', ascending=False).head(5))


Association Rules:


| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 13 | (Services) | (Food and Beverages, Department Stores) | 0.106195 | 0.75 | 3.259615 |
| 4 | (Services) | (Department Stores) | 0.115044 | 0.8125 | 2.295313 |
| 11 | (Food and Beverages, Services) | (Department Stores) | 0.106195 | 0.8 | 2.26 |
| 9 | (Food and Beverages, Department Stores) | (Entertainment) | 0.150442 | 0.653846 | 1.944332 |
| 3 | (Home Furnishings and Decor) | (Department Stores) | 0.106195 | 0.666667 | 1.883333 |
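
To make the three metrics concrete, the row for (Services) → (Department Stores) can be reproduced by hand. The counts below are inferred from the reported figures, not read from the dataset: with 113 transactions, the values are consistent with 16 transactions containing Services, 40 containing Department Stores, and 13 containing both. A small sketch under that assumption:

```python
# Counts inferred from the reported metrics (an assumption, not read from the data)
n_transactions = 113   # rows in the DataFrame
n_services = 16        # transactions containing Services
n_department = 40      # transactions containing Department Stores
n_both = 13            # transactions containing both

# support: share of all transactions that contain antecedent and consequent together
support = n_both / n_transactions

# confidence: share of antecedent transactions that also contain the consequent
confidence = n_both / n_services

# lift: confidence relative to the consequent's overall frequency
lift = confidence / (n_department / n_transactions)

print(f"support={support:.6f} confidence={confidence:.4f} lift={lift:.6f}")
```

A lift above 1 means the consequent appears alongside the antecedent more often than its overall frequency would suggest. Rows that ended up as empty transactions still count toward the 113 in the denominator, so every support value is measured against all rows, not only those with recognized items.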
