# 1. Business Problem

In [3]:
#1.1. What is the business objective?

#The primary objective is to help Kitabi Duniya, a heritage bookstore, 
#regain its lost customer base and increase footfall by utilizing data insights from association rules. 
#The goal is to identify buying patterns and itemsets that customers frequently purchase together to 
#optimize product placement, promotions, or recommend bundles that could increase sales.



In [4]:
#1.2. Are there any constraints?
#Possible constraints may include:

#1.Limited historical data on customer transactions.
#2.Budget constraints for implementing suggested strategies.
#3.Potential competition with online retailers.

# 2. Data Dictionary

In [5]:
#This dataset contains 11 binary categorical features, representing whether or not a customer bought 
#specific types of books.

#These features are classified as discrete categorical data since they can only take two distinct values (0 or 1).



# 3. Data Pre-processing

In [6]:
#3.1. Data Cleaning & Feature Engineering
#The dataset is clean with no missing values. 
#The data is already binary, which suits the requirements for association rule mining using the Apriori algorithm. 
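
The two claims above (every value is 0 or 1, and nothing is missing) can be checked before mining. The snippet below is a minimal sketch that assumes a small hypothetical DataFrame named `sample` in place of `book.csv`; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for book.csv: three category columns, four customers.
sample = pd.DataFrame({
    "ChildBks": [1, 0, 1, 0],
    "CookBks": [1, 1, 0, 0],
    "RefBks": [0, 0, 1, 0],
})

# True only if every cell in the frame is exactly 0 or 1.
is_binary = bool(sample.isin([0, 1]).all().all())

# Total number of missing cells across the whole frame.
missing_cells = int(sample.isna().sum().sum())

print(is_binary, missing_cells)
```

Running the same two checks on the real `df` after loading it would confirm that the conversion to boolean in the model-building step loses no information.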

Apriori helps identify frequent combinations of items.
Association rules provide insights into the relationships between those items.
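
To make the idea of "frequent combinations" concrete, here is a rough pure-Python sketch of the level-wise search that Apriori performs. It is not the mlxtend implementation; the baskets and the `min_support` value of 0.5 are hypothetical and chosen only so the result is easy to follow.

```python
from itertools import combinations

# Hypothetical toy baskets; category names borrowed from the dataset's columns.
transactions = [
    {"ChildBks", "CookBks"},
    {"ChildBks", "CookBks", "RefBks"},
    {"CookBks"},
    {"ChildBks", "RefBks"},
]
min_support = 0.5  # assumed threshold for this toy example


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


# Level 1: frequent single items.
items = sorted(set().union(*transactions))
frequent = {frozenset([i]): support({i}) for i in items if support({i}) >= min_support}

# Level k: join frequent (k-1)-itemsets, keep only candidates whose
# (k-1)-subsets are all frequent (the Apriori pruning step), then check support.
current = list(frequent)
k = 2
while current:
    candidates = {a | b for a in current for b in current if len(a | b) == k}
    current = [
        c for c in candidates
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        and support(c) >= min_support
    ]
    frequent.update({c: support(c) for c in current})
    k += 1

for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy data the pair (CookBks, RefBks) appears in only one of four baskets, so it is dropped, and the three-item set is pruned without ever being counted because one of its pairs is not frequent.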

# 4. Model Building

In [7]:
import pandas as pd  # Used for data manipulation and analysis
from mlxtend.frequent_patterns import apriori, association_rules 

In [8]:
# Step 1: Load the dataset
file_path = 'book.csv'  
df = pd.read_csv(file_path)

In [9]:
# Step 2: Convert integer columns (0 and 1) to boolean (False and True)
# Since the dataset is already in the form of 0's and 1's, we can convert it to boolean.
df = df.astype(bool)

In [10]:
# Step 3: Apply the Apriori algorithm to find frequent itemsets
# We can set min_support to a reasonable threshold, such as 0.1 (i.e., 10%)
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

In [11]:
# Step 4: Generate association rules from the frequent itemsets
# We use 'confidence' as the metric, with a minimum confidence threshold (e.g., 70%)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)



The generated rules show relationships between books; for example, customers who buy Reference Books (RefBks) are also likely to buy Children's Books (ChildBks).

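Lift: A lift value greater than 1 indicates a positive association: the antecedents (purchased books) and the consequents (associated books) appear together more often than would be expected if they were bought independently. The further the value is above 1, the stronger the association.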


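Confidence: Confidence is the estimated probability that a customer buys the consequent books given that they have already purchased the antecedent books. A higher confidence value implies a stronger association, although it should be read alongside lift, because a consequent that is popular on its own will show high confidence with almost any antecedent.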
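
As a worked check of these two definitions, the sketch below recomputes confidence and lift for the rule RefBks to ChildBks from the support values reported in this notebook's frequent-itemset output. The variable names are illustrative only.

```python
# Supports read from the frequent-itemset output in this notebook.
support_ref = 0.2145        # support(RefBks)
support_child = 0.4230      # support(ChildBks)
support_ref_child = 0.1515  # support(ChildBks, RefBks)

# confidence(RefBks -> ChildBks) = support(RefBks, ChildBks) / support(RefBks)
confidence = support_ref_child / support_ref

# lift(RefBks -> ChildBks) = confidence / support(ChildBks)
lift = confidence / support_child

print(round(confidence, 6), round(lift, 6))
```

The results agree with the confidence and lift columns that `association_rules` reports for this rule.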

In [12]:
# Step 5: Output the results
print("Frequent Itemsets:")
print(frequent_itemsets)



Frequent Itemsets:
    support                       itemsets
0    0.4230                     (ChildBks)
1    0.2475                     (YouthBks)
2    0.4310                      (CookBks)
3    0.2820                     (DoItYBks)
4    0.2145                       (RefBks)
5    0.2410                       (ArtBks)
6    0.2760                      (GeogBks)
7    0.1135                     (ItalCook)
8    0.1085                     (Florence)
9    0.1650           (ChildBks, YouthBks)
10   0.2560            (ChildBks, CookBks)
11   0.1840           (ChildBks, DoItYBks)
12   0.1515             (ChildBks, RefBks)
13   0.1625             (ChildBks, ArtBks)
14   0.1950            (ChildBks, GeogBks)
15   0.1620            (YouthBks, CookBks)
16   0.1155           (DoItYBks, YouthBks)
17   0.1010             (ArtBks, YouthBks)
18   0.1205            (YouthBks, GeogBks)
19   0.1875            (DoItYBks, CookBks)
20   0.1525              (RefBks, CookBks)
21   0.1670              (ArtBks, C

The **ChildBks** and **CookBks** categories have the highest support among the frequent itemsets (0.423 and 0.431, so each appears in over 42% of transactions), which makes them natural targets for promotional efforts. Among the pairs, **(ChildBks, CookBks)** has a support of 0.256 and **(ChildBks, YouthBks)** a support of 0.165, pointing to opportunities for cross-promotions and bundled sales strategies.

In [13]:
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])



Association Rules:
             antecedents consequents  support  confidence      lift
0               (RefBks)  (ChildBks)   0.1515    0.706294  1.669725
1              (GeogBks)  (ChildBks)   0.1950    0.706522  1.670264
2               (RefBks)   (CookBks)   0.1525    0.710956  1.649549
3             (ItalCook)   (CookBks)   0.1135    1.000000  2.320186
4   (ChildBks, YouthBks)   (CookBks)   0.1290    0.781818  1.813963
5    (YouthBks, CookBks)  (ChildBks)   0.1290    0.796296  1.882497
6   (ChildBks, DoItYBks)   (CookBks)   0.1460    0.793478  1.841017
7    (DoItYBks, CookBks)  (ChildBks)   0.1460    0.778667  1.840820
8     (ChildBks, RefBks)   (CookBks)   0.1225    0.808581  1.876058
9      (RefBks, CookBks)  (ChildBks)   0.1225    0.803279  1.899004
10    (ChildBks, ArtBks)   (CookBks)   0.1265    0.778462  1.806175
11     (ArtBks, CookBks)  (ChildBks)   0.1265    0.757485  1.790745
12   (ChildBks, GeogBks)   (CookBks)   0.1495    0.766667  1.778809
13    (GeogBks, CookBks)  (C

Customers frequently buy **Cookbooks** together with other categories such as **Youth, Art, and Children's Books**, indicating cross-selling opportunities. Lift values above 1 (for example, about 1.67 for **Reference Books** to **Children's Books**) show that these categories are bought together more often than independent purchases would predict.
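
One way to decide which rules deserve attention first is to rank them by lift. The sketch below assumes a small DataFrame, `rules_sample`, holding four rows copied from the rules output above, with antecedents written as plain strings rather than the frozensets mlxtend returns. Setting aside rules with a confidence of exactly 1.0 is a judgment call, not part of the original analysis: such a rule (here ItalCook to CookBks) may reflect how categories are defined rather than a buying habit.

```python
import pandas as pd

# A few rows copied from the rules output above (not the full table).
rules_sample = pd.DataFrame({
    "antecedents": ["RefBks", "ItalCook", "YouthBks, CookBks", "RefBks, CookBks"],
    "consequents": ["ChildBks", "CookBks", "ChildBks", "ChildBks"],
    "support": [0.1515, 0.1135, 0.1290, 0.1225],
    "confidence": [0.706294, 1.000000, 0.796296, 0.803279],
    "lift": [1.669725, 2.320186, 1.882497, 1.899004],
})

# Rank by lift, strongest association first.
ranked = rules_sample.sort_values("lift", ascending=False)

# Assumption: treat confidence == 1.0 as a possibly definitional rule and set it aside.
non_trivial = ranked[ranked["confidence"] < 1.0]

print(non_trivial[["antecedents", "consequents", "lift"]].to_string(index=False))
```

With the real `rules` DataFrame, the same `sort_values("lift", ascending=False)` call applies directly.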

In [14]:
#Business Impact:

#Kitabi Duniya can use these insights to better arrange books in the store or recommend items during the checkout process.
#Understanding customer purchase patterns can also help design personalized offers and drive more sales through targeted marketing campaigns.
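
A checkout recommendation could be sketched along the following lines. The function `recommend` and the `rules_sample` list are hypothetical; the rules themselves are copied from the output above as (antecedent set, consequent, confidence) tuples.

```python
# Rules copied from the output above as (antecedent set, consequent, confidence).
rules_sample = [
    ({"RefBks"}, "ChildBks", 0.706294),
    ({"RefBks"}, "CookBks", 0.710956),
    ({"ItalCook"}, "CookBks", 1.000000),
    ({"ChildBks", "YouthBks"}, "CookBks", 0.781818),
]


def recommend(basket, rules):
    """Suggest consequents of rules whose antecedent is fully in the basket,
    skipping categories already in the basket, highest confidence first."""
    best = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= basket and consequent not in basket:
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    return sorted(best, key=best.get, reverse=True)


print(recommend({"RefBks"}, rules_sample))
```

A customer holding only a Reference Book would be shown Cookbooks first and Children's Books second, since those are the two rules that fire and the Cookbooks rule has the slightly higher confidence.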