### **Imports and Initial Setup**
1. **`import pandas as pd`**  
   - Imports the Pandas library for data manipulation and analysis.

2. **`from mlxtend.frequent_patterns import apriori`**  
   - Imports the `apriori` algorithm from the `mlxtend` library for finding frequent itemsets.

3. **`from mlxtend.frequent_patterns import association_rules`**  
   - Imports the `association_rules` function from the `mlxtend` library to generate association rules from frequent itemsets.

---

### **Reading and Exploring Data**
4. **`df = pd.read_csv('../data/Groceries_dataset.csv')`**  
   - Loads the Groceries dataset from the specified file path into a Pandas DataFrame named `df`.

5. **`df`**  
   - Displays the DataFrame for inspection (Jupyter shows a truncated preview of the first and last rows).

6. **`df.dtypes`**  
   - Shows the data types of each column in the dataset.

7. **`df['Date'] = pd.to_datetime(df['Date'])`**  
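   - Converts the `Date` column from strings to datetime for easier time-based analysis. The raw values are day-first (`DD-MM-YYYY`), so passing `dayfirst=True` or an explicit `format` to `pd.to_datetime` removes any parsing ambiguity.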

8. **`df.dtypes`**  
   - Verifies that the `Date` column has been successfully converted to the datetime type.
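
The raw `Date` strings in this dataset are day-first (for example `21-07-2015`), so it can be safer to state the format explicitly than to rely on inference. A minimal sketch, assuming a small hand-made frame (`sample`, invented for illustration) in place of the real CSV:

```python
import pandas as pd

# Hypothetical three-row sample mirroring the dataset's DD-MM-YYYY strings.
sample = pd.DataFrame({
    "Member_number": [1808, 2552, 3037],
    "Date": ["21-07-2015", "05-01-2015", "01-02-2015"],
})

# An explicit format avoids any ambiguity between day-first and month-first parsing.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d-%m-%Y")

print(sample["Date"].dt.strftime("%Y-%m-%d").tolist())
# → ['2015-07-21', '2015-01-05', '2015-02-01']
```

With the explicit format, `05-01-2015` is read as 5 January rather than 1 May.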

---

### **Data Preparation**
9. **`df2 = df.set_index(['Date'])`**  
   - Creates a new DataFrame `df2` with the `Date` column as its index, organizing the data by date. `df` itself is unchanged, and the later steps continue from `df`.

10. **`df2`**  
    - Displays the updated DataFrame with `Date` as the index.

11. **`baskets = df.groupby(['Member_number', 'itemDescription']).count()`**  
    - Groups the dataset by `Member_number` and `itemDescription`, then counts the rows in each group. `Date` is the only remaining column, so it ends up holding the number of times each member bought each item.

12. **`baskets.head()`**  
    - Displays the first 5 rows of the grouped DataFrame `baskets`.

---

### **Data Transformation**
13. **`baskets = df.groupby(['Member_number', 'itemDescription']).count().unstack()`**  
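    - Pivots the grouped data so that rows are `Member_number` and columns are the `itemDescription` values. The column labels become two-level pairs of the form `(Date, item)`, and member-item combinations that never occur are left as `NaN`.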

14. **`baskets.head()`**  
    - Displays the first 5 rows of the pivoted DataFrame.

15. **`baskets = df.groupby(['Member_number', 'itemDescription']).count().unstack().fillna(0).reset_index()`**  
    - Fills missing values with 0 (indicating no purchase of a particular item), then resets the index to remove the multi-index structure.

16. **`baskets.head()`**  
    - Displays the first 5 rows of the updated DataFrame.
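
The chained steps above (count, `unstack`, fill with 0) can be tried on a toy frame. The sketch below is a variant under stated assumptions rather than the notebook's exact code: it selects the `Date` column before counting, which keeps the resulting column labels flat (plain item names instead of `(Date, item)` pairs), and it uses `unstack(fill_value=0)` in place of a separate `fillna(0)`. The `toy` data is invented for illustration.

```python
import pandas as pd

# Invented purchases: one row per (member, date, item) line, as in the dataset.
toy = pd.DataFrame({
    "Member_number": [1000, 1000, 1000, 1001, 1001],
    "Date": pd.to_datetime(
        ["2015-03-15", "2015-06-24", "2015-06-24", "2015-01-20", "2015-04-14"]
    ),
    "itemDescription": ["whole milk", "whole milk", "yogurt", "beef", "whole milk"],
})

# Count rows per (member, item), then pivot items into columns.
# Selecting "Date" first yields a Series, so unstack() gives single-level columns.
counts = (
    toy.groupby(["Member_number", "itemDescription"])["Date"]
    .count()
    .unstack(fill_value=0)
)

print(counts)
```

Because `Member_number` stays in the index here, no positional slicing is needed later to separate the identifier from the item columns.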

---

### **One-Hot Encoding**
17. **`def one_hot_encoder(k):`**  
    - Defines a function that one-hot encodes a single value by mapping an item count to a binary purchase flag (0 or 1).

18. **`if k <= 0: return 0`**  
    - Returns `0` if the value (item count) is less than or equal to 0.

19. **`if k >= 1: return 1`**  
    - Returns `1` if the value (item count) is greater than or equal to 1.

20. **`baskets_final = baskets.iloc[:, 1:-1].applymap(one_hot_encoder)`**  
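    - Applies `one_hot_encoder` to every cell of the selected columns, converting item counts into binary values (0 or 1). The slice `1:-1` skips the `Member_number` column but also drops the last item column (`zwieback`), as the output below shows; `iloc[:, 1:]` would keep every item. In pandas 2.1 and later, `applymap` is deprecated in favor of `DataFrame.map`.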

21. **`baskets_final.head()`**  
    - Displays the first 5 rows of the one-hot encoded DataFrame.
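
The element-wise encoder can also be written as one vectorized comparison. A minimal sketch, assuming an invented count table (`baskets_toy`) shaped like `baskets` after `reset_index()`; it also contrasts the slices `1:` and `1:-1`. The result is boolean rather than 0/1, which mlxtend's `apriori` also accepts.

```python
import pandas as pd

# Invented count table: an identifier column followed by one count column per item.
baskets_toy = pd.DataFrame({
    "Member_number": [1000, 1001, 1002],
    "beef": [0.0, 1.0, 0.0],
    "whole milk": [2.0, 2.0, 1.0],
    "yogurt": [1.0, 0.0, 0.0],
})

# Skip only the identifier column; every item column is kept.
item_counts = baskets_toy.iloc[:, 1:]

# Vectorized binarization: True where the item was bought at least once.
encoded = item_counts > 0

# For comparison, the slice 1:-1 also drops the last item column.
trimmed = baskets_toy.iloc[:, 1:-1]

print(list(encoded.columns))
print(list(trimmed.columns))
```

The second printed list is one item shorter than the first, which is the same effect that removes the final item column in the notebook's `baskets_final`.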

---

### **Apriori Algorithm**
22. **`frequent_itemsets = apriori(baskets_final, min_support=0.025, use_colnames=True).sort_values(by='support')`**  
    - Finds frequent itemsets with a minimum support of 2.5% and sorts them by the `support` value in ascending order.

23. **`frequent_itemsets.head(25)`**  
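    - Displays the first 25 rows. Because the itemsets are sorted by ascending `support`, these are the least frequent itemsets that still meet the threshold, not the most frequent ones.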
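
To make `min_support` concrete, here is a small pure-Python sketch of the level-wise Apriori idea: count single items, then build larger candidates only from itemsets that were already frequent. It is an illustration on invented baskets, not mlxtend's implementation; `apriori_sketch` and `toy_baskets` are hypothetical names.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Return {frozenset(items): support} for every itemset meeting min_support."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Invented baskets, one set of items per member.
toy_baskets = [
    {"whole milk", "rolls/buns", "yogurt"},
    {"whole milk", "rolls/buns"},
    {"whole milk", "sausage"},
    {"rolls/buns", "yogurt"},
]
result = apriori_sketch(toy_baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: kv[1]):
    print(sorted(itemset), s)
```

With a threshold of 0.5, `sausage` (one basket in four) is dropped at the first level, so no larger itemset containing it is ever counted.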

---

### **Association Rule Mining**
24. **`rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1).sort_values('lift', ascending=False)`**  
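    - Generates association rules using the `lift` metric with a minimum threshold of 1, then sorts the rules in descending order of `lift`. The notebook cell below also passes a `num_itemsets` argument, which appears in newer mlxtend releases.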

25. **`rules = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]`**  
    - Selects and retains only the relevant columns: `antecedents`, `consequents`, `support`, `confidence`, and `lift`.

26. **`rules.head(25)`**  
    - Displays the top 25 association rules based on `lift`.

---

### Summary of Workflow:
1. Import libraries.
2. Load and preprocess the dataset.
3. Transform the data for basket analysis (one-hot encoding).
4. Apply the Apriori algorithm to find frequent itemsets.
5. Generate and evaluate association rules based on the frequent itemsets.
 

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [41]:
df=pd.read_csv('../data/Groceries_dataset.csv')
df

|  | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |
| ... | ... | ... | ... |
| 38760 | 4471 | 08-10-2014 | sliced cheese |
| 38761 | 2022 | 23-02-2014 | candy |
| 38762 | 1097 | 16-04-2014 | cake bar |
| 38763 | 1510 | 03-12-2014 | fruit/vegetable juice |


In [42]:
df.dtypes

Member_number       int64
Date               object
itemDescription    object
dtype: object

In [43]:
df['Date']=pd.to_datetime(df['Date'])



In [44]:
df.dtypes

Member_number               int64
Date               datetime64[ns]
itemDescription            object
dtype: object

In [45]:
df2=df.set_index(['Date'])
df2

| Date | Member_number | itemDescription |
|---|---|---|
| 2015-07-21 | 1808 | tropical fruit |
| 2015-01-05 | 2552 | whole milk |
| 2015-09-19 | 2300 | pip fruit |
| 2015-12-12 | 1187 | other vegetables |
| 2015-02-01 | 3037 | whole milk |
| ... | ... | ... |
| 2014-10-08 | 4471 | sliced cheese |
| 2014-02-23 | 2022 | candy |
| 2014-04-16 | 1097 | cake bar |
| 2014-12-03 | 1510 | fruit/vegetable juice |


In [46]:
baskets=df.groupby(['Member_number','itemDescription']).count()
baskets.head()

| Member_number | itemDescription | Date |
|---|---|---|
| 1000 | canned beer | 1 |
| 1000 | hygiene articles | 1 |
| 1000 | misc. beverages | 1 |
| 1000 | pastry | 1 |
| 1000 | pickled vegetables | 1 |


In [47]:
# unstack()
# Converts the grouped data into a pivot-like table (reshaping).
# It takes the unique values in the second group-by column (itemDescription) and turns them into columns, while the first group (Member_number) remains as rows.

baskets=df.groupby(['Member_number','itemDescription']).count().unstack()
baskets.head()

All item columns sit under the top-level column label `Date`; rows are indexed by `Member_number`.

| Member_number | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | 1.0 | NaN |
| 1001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ... | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | 2.0 | NaN | NaN |
| 1002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN |
| 1003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1004 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | NaN | NaN |


In [48]:
baskets=df.groupby(['Member_number','itemDescription']).count().unstack().fillna(0).reset_index()
baskets.head()

All item columns sit under the top-level column label `Date`.

|  | Member_number | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 |
| 1 | 1001 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 2 | 1002 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | 1003 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1004 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 |


In [49]:
def one_hot_encoder(k):
    # Map an item count to a 0/1 purchase flag.
    if k <= 0:
        return 0
    if k >= 1:
        return 1

In [50]:
baskets_final=baskets.iloc[:,1:-1].applymap(one_hot_encoder)
baskets_final.head()



All item columns sit under the top-level column label `Date`.

|  | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | tropical fruit | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |


In [51]:
frequent_itemsets=apriori(baskets_final,min_support=0.025,use_colnames=True).sort_values(by='support')
frequent_itemsets.head(25)




|  | support | itemsets |
|---|---|---|
| 69 | 0.025141 | ((Date, spread cheese)) |
| 85 | 0.025141 | ((Date, pip fruit), (Date, beef)) |
| 248 | 0.025141 | ((Date, domestic eggs), (Date, shopping bags)) |
| 480 | 0.025141 | ((Date, whole milk), (Date, frankfurter), (Dat... |
| 467 | 0.025141 | ((Date, whole milk), (Date, citrus fruit), (Da... |
| 119 | 0.025141 | ((Date, bottled water), (Date, chocolate)) |
| 161 | 0.025141 | ((Date, butter), (Date, shopping bags)) |
| 239 | 0.025141 | ((Date, dessert), (Date, root vegetables)) |
| 524 | 0.025141 | ((Date, whole milk), (Date, pastry), (Date, ro... |
| 405 | 0.025141 | ((Date, sliced cheese), (Date, whole milk)) |


In [52]:
rules=association_rules(frequent_itemsets,metric='lift',min_threshold=1,num_itemsets=len(frequent_itemsets['itemsets'])).sort_values('lift',ascending=False)
rules=rules[['antecedents','consequents','support','confidence','lift']]
rules.head(25)

|  | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 170 | ((Date, rolls/buns), (Date, other vegetables)) | ((Date, whole milk), (Date, sausage)) | 0.026167 | 0.178322 | 1.666901 |
| 171 | ((Date, whole milk), (Date, sausage)) | ((Date, rolls/buns), (Date, other vegetables)) | 0.026167 | 0.244604 | 1.666901 |
| 169 | ((Date, rolls/buns), (Date, sausage)) | ((Date, whole milk), (Date, other vegetables)) | 0.026167 | 0.317757 | 1.660344 |
| 172 | ((Date, whole milk), (Date, other vegetables)) | ((Date, rolls/buns), (Date, sausage)) | 0.026167 | 0.136729 | 1.660344 |
| 882 | ((Date, whole milk), (Date, other vegetables)) | ((Date, rolls/buns), (Date, yogurt)) | 0.034377 | 0.179625 | 1.613311 |
| 879 | ((Date, rolls/buns), (Date, yogurt)) | ((Date, whole milk), (Date, other vegetables)) | 0.034377 | 0.308756 | 1.613311 |
| 878 | ((Date, rolls/buns), (Date, whole milk)) | ((Date, yogurt), (Date, other vegetables)) | 0.034377 | 0.192529 | 1.600164 |
| 883 | ((Date, yogurt), (Date, other vegetables)) | ((Date, rolls/buns), (Date, whole milk)) | 0.034377 | 0.285714 | 1.600164 |
| 168 | ((Date, rolls/buns), (Date, whole milk)) | ((Date, other vegetables), (Date, sausage)) | 0.026167 | 0.146552 | 1.578062 |
| 173 | ((Date, other vegetables), (Date, sausage)) | ((Date, rolls/buns), (Date, whole milk)) | 0.026167 | 0.281768 | 1.578062 |


### Conclusion

The output presents **association rules** derived from the dataset using the Apriori algorithm. Let’s break down the results column by column and explain their significance:

---

### **Columns Explanation**
1. **`antecedents`**:  
   - The "if" part of the rule, indicating the item(s) already present in a basket.  
   - Example: `((Date, rolls/buns), (Date, other vegetables))` means "if a customer buys `rolls/buns` and `other vegetables`." The `Date` in each pair is only the column label left over from the pivot, not a purchased item.

2. **`consequents`**:  
   - The "then" part of the rule, representing the item(s) likely to be bought given the antecedents.  
   - Example: `((Date, whole milk), (Date, sausage))` means "then the customer is likely to buy `whole milk` and `sausage`."

3. **`support`**:  
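   - The proportion of baskets containing both the antecedents and the consequents. Because the data was grouped by `Member_number`, each basket here is one member's entire purchase history rather than a single shopping trip.  
   - Example: `0.026167` means about 2.617% of members bought every item in both the antecedents and the consequents.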

4. **`confidence`**:  
   - The likelihood that the consequents are purchased when the antecedents are present.  
   - Example: `0.178322` means there is a 17.832% probability of buying `whole milk` and `sausage` if `rolls/buns` and `other vegetables` are purchased.

5. **`lift`**:  
   - The ratio of the observed confidence to the confidence expected if the antecedents and consequents were independent, which equals the overall support of the consequents.  
   - A lift value > 1 indicates a positive association: baskets containing the antecedents are more likely than average to contain the consequents. It does not by itself show that one purchase causes the other.  
   - Example: `1.666901` means that members who bought `rolls/buns` and `other vegetables` were about 1.67 times as likely to buy `whole milk` and `sausage` as members overall.
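
The three metrics can be computed directly from their definitions. The sketch below does so on invented baskets; `rule_metrics` and `toy_baskets` are hypothetical names, and the function assumes the antecedent and consequent each occur in at least one basket (otherwise it would divide by zero).

```python
def rule_metrics(baskets, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= frozenset(b) for b in baskets) / n

    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift


# Invented baskets, one set of items per member.
toy_baskets = [
    {"rolls/buns", "whole milk"},
    {"rolls/buns", "whole milk", "yogurt"},
    {"rolls/buns"},
    {"whole milk"},
    {"yogurt"},
]
support, confidence, lift = rule_metrics(toy_baskets, {"rolls/buns"}, {"whole milk"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
# → support=0.400 confidence=0.667 lift=1.111
```

Here two of five baskets hold both items (support 0.4), two of the three `rolls/buns` baskets also hold `whole milk` (confidence 0.667), and that is 1.111 times the overall `whole milk` share of 0.6.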

---

### **Insights Derived**
1. **Strong Associations**:  
   - Rules with `lift` well above 1 (e.g., > 1.5) are the strongest positive associations in this output, although a lift of about 1.67 is moderate in absolute terms. For instance:  
     - `((Date, rolls/buns), (Date, other vegetables)) → ((Date, whole milk), (Date, sausage))` with a lift of `1.666901`.

2. **Frequent Combinations**:  
   - Items like `rolls/buns`, `whole milk`, and `other vegetables` frequently appear together in strong rules. These could be essential combinations to highlight in promotions or store layouts.

3. **Key Products**:  
   - Products such as `whole milk`, `sausage`, and `yogurt` often appear in both antecedents and consequents, indicating they are central to many purchase patterns.  

4. **Strategic Insights**:
   - Bundling: Consider offering discounts or promotions on combinations like `rolls/buns` + `other vegetables` + `whole milk`.  
   - Store Arrangement: Place frequently co-purchased items close to each other to encourage purchases.  

5. **Confidence and Lift Trade-offs**:
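   - Confidence and lift can disagree. A rule may have comparatively high confidence (e.g., `rolls/buns` → `yogurt` with `confidence = 0.320276`) simply because its consequent is popular overall; in that case its lift may stay close to 1, and the association is weaker than the confidence alone suggests.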

---

### Example Interpretation
Rule: `((Date, rolls/buns), (Date, sausage)) → ((Date, whole milk), (Date, other vegetables))`  
- Support: About 2.617% of member baskets include all four items.  
- Confidence: Among members who bought `rolls/buns` and `sausage`, 31.776% also bought `whole milk` and `other vegetables`.  
- Lift: That share is about 1.66 times the share of all members who bought `whole milk` and `other vegetables`.
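
As a sanity check, the baseline share behind this lift can be recovered from the rules table in two independent ways. The sketch below copies rows 169 and 172 from that table; the variable names are invented for illustration.

```python
# Values copied from rows 169 and 172 of the rules table.
support_all_four = 0.026167   # support shared by both rules (same four items)
confidence_169 = 0.317757     # {rolls/buns, sausage} -> {whole milk, other vegetables}
lift_169 = 1.660344
confidence_172 = 0.136729     # {whole milk, other vegetables} -> {rolls/buns, sausage}

# lift = confidence / support(consequent), so the baseline share of members
# who bought whole milk and other vegetables is confidence / lift.
baseline_from_lift = confidence_169 / lift_169

# confidence = support(all items) / support(antecedent); rule 172 has that same
# pair as its antecedent, giving a second estimate of the same quantity.
baseline_from_mirror = support_all_four / confidence_172

print(round(baseline_from_lift, 4), round(baseline_from_mirror, 4))
```

Both estimates come out at roughly 19%, so the 31.8% confidence of this rule is being compared against a baseline of about 19% of members.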

---

### How to Use These Rules
- **Marketing**: Target customers who purchase certain items with promotions on likely co-purchased items.  
- **Stock Management**: Ensure high-demand combinations are always in stock.  
- **Upselling**: Train staff to recommend items with strong association when a customer purchases specific products.
 

In [53]:
# END