In [1]:
pip install mlxtend

Note: you may need to restart the kernel to use updated packages.


In [3]:
data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]

In [7]:
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_array = te.fit_transform(data)
te_array

array([[ True,  True, False, False,  True, False,  True],
       [ True,  True, False,  True, False,  True, False],
       [False,  True,  True,  True,  True, False,  True],
       [ True, False,  True, False, False,  True,  True],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True, False, False,  True]])

- This code one-hot encodes a list of transactions into a Boolean matrix (one row per transaction, one column per unique item), which is the input format expected by mlxtend's association rule mining algorithms (e.g., Apriori or FP-Growth).

**🔍 Step-by-Step Breakdown**
- 1️⃣ Importing TransactionEncoder

##### from mlxtend.preprocessing import TransactionEncoder
- TransactionEncoder is a data transformation utility from mlxtend, used for handling transactional datasets (like market basket data).

**2️⃣ Creating an Instance of TransactionEncoder**

##### te = TransactionEncoder()
- Initializes an object te to encode transactional data.

**3️⃣ Transforming the Data**

##### te_array = te.fit_transform(data)

- fit_transform(data) first learns the set of unique items (stored, sorted alphabetically, in te.columns_) and then converts each transaction into one row of a Boolean NumPy array.

- data should be a list of lists, where each inner list is a transaction containing items.

- 🔹 Input (Raw Transaction Data)

| Transaction | Items |
|-------------|-------|
| 0 | Milk, Bread, Butter |
| 1 | Bread, Butter |
| 2 | Milk, Bread |
| 3 | Milk, Butter |

**🔹 After Applying TransactionEncoder**

- te_array = te.fit_transform(data)
- This converts transactions into a Boolean array, where:

- True (1) means the item is present in the transaction.

- False (0) means the item is absent.

🔹 Output (te_array). TransactionEncoder sorts the item names alphabetically, so the columns (te.columns_) are Bread, Butter, Milk:

| Bread | Butter | Milk |
|-------|--------|------|
| True | True | True |
| True | True | False |
| True | False | True |
| False | True | True |
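To see what fit_transform does conceptually, here is a minimal pure-Python sketch of the same encoding (an illustration, not mlxtend's actual implementation): collect the unique items, sort them alphabetically as te.columns_ is, then mark each transaction's items with True. The function name one_hot_encode is hypothetical.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted unique items and one Boolean row per transaction."""
    # "fit": learn the vocabulary of unique items, sorted alphabetically
    columns = sorted({item for t in transactions for item in t})
    position = {item: i for i, item in enumerate(columns)}

    # "transform": one Boolean row per transaction, True where the item occurs
    rows = []
    for t in transactions:
        row = [False] * len(columns)
        for item in t:
            row[position[item]] = True
        rows.append(row)
    return columns, rows


data = [['Milk', 'Bread', 'Butter'],
        ['Bread', 'Butter'],
        ['Milk', 'Bread'],
        ['Milk', 'Butter']]

columns, rows = one_hot_encode(data)
print(columns)
for row in rows:
    print(row)
```

The first printed line is ['Bread', 'Butter', 'Milk'], and the rows match the table above.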

**📌 Why Use TransactionEncoder?**

- ✅ Converts unordered transactions into a structured numerical format.
- ✅ Enables easy application of association rule mining algorithms (like Apriori & FP-Growth).
- ✅ Supports memory-efficient storage for large datasets: te.fit_transform(data, sparse=True) returns a sparse Boolean matrix instead of a dense NumPy array.

In [9]:
import pandas as pd
df = pd.DataFrame(te_array,columns = te.columns_)
df

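|   | book | bread | eggs | jam | milk | pen | rice |
|---|------|-------|------|-----|------|-----|------|
| 0 | True | True | False | False | True | False | True |
| 1 | True | True | False | True | False | True | False |
| 2 | False | True | True | True | True | False | True |
| 3 | True | False | True | False | False | True | True |
| 4 | False | True | True | True | True | True | False |
| 5 | False | True | True | True | True | True | False |
| 6 | False | True | True | True | False | False | True |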


**📌 Convert te_array (Encoded Data) into a DataFrame**

##### df = pd.DataFrame(te_array, columns=te.columns_)
- te_array → The Boolean NumPy array generated by TransactionEncoder.fit_transform(data).

- te.columns_ → The unique item names learned during fit, sorted alphabetically; they become the column names.

- Creates a DataFrame, where:

- Rows represent transactions.

- Columns represent items (products).

- Values are True if the item was in the transaction and False if not (the next cell converts them to 1 and 0 with .astype(int)).

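**🛠 Example Execution**

🔹 Input: Transaction Data

data = [['Milk', 'Bread', 'Butter'],
        ['Bread', 'Butter'],
        ['Milk', 'Bread'],
        ['Milk', 'Butter']]

- After applying:

- import pandas as pd
- from mlxtend.preprocessing import TransactionEncoder
- te = TransactionEncoder()
- te_array = te.fit_transform(data)
- df = pd.DataFrame(te_array, columns=te.columns_)
- print(df)

**🔹 Output: Encoded DataFrame**

|   | Bread | Butter | Milk |
|---|-------|--------|------|
| 0 | True | True | True |
| 1 | True | True | False |
| 2 | True | False | True |
| 3 | False | True | True |

- The columns are sorted alphabetically, and the values stay Boolean until converted with .astype(int).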


In [10]:
import pandas as pd
df = pd.DataFrame(te_array,columns = te.columns_).astype(int)
df

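|   | book | bread | eggs | jam | milk | pen | rice |
|---|------|-------|------|-----|------|-----|------|
| 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 3 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 4 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| 5 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| 6 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |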


In [18]:
from mlxtend.frequent_patterns import apriori
itemset = apriori(df,min_support = 0.6,use_colnames = True)
itemset



|   | support | itemsets |
|---|---------|----------|
| 0 | 0.857143 | (bread) |
| 1 | 0.714286 | (eggs) |
| 2 | 0.714286 | (jam) |
| 3 | 0.714286 | (bread, jam) |


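- This code applies the Apriori algorithm to find frequent itemsets: item combinations whose support (the fraction of transactions that contain them) is at least min_support. With 7 transactions and min_support=0.6, an itemset must appear in at least 5 transactions (0.6 × 7 = 4.2), which is why only (bread), (eggs), (jam) and (bread, jam) are returned.

**🔍 Step-by-Step Breakdown**
- 1️⃣ Import Apriori Algorithm

- from mlxtend.frequent_patterns import apriori
- apriori is a function from mlxtend used for frequent itemset mining in association rule learning.

- 2️⃣ Run Apriori

- itemset = apriori(df, min_support=0.6, use_colnames=True)
- min_support=0.6 keeps only itemsets that appear in at least 60% of the transactions.
- use_colnames=True reports itemsets using the item names (e.g., bread) instead of column indices.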
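The level-wise idea behind Apriori can be sketched in plain Python. This is a simplified illustration, not mlxtend's implementation: the helper names support and apriori_sketch are hypothetical, candidates are generated naively by joining frequent (k-1)-itemsets, and they are then pruned with the Apriori property (every subset of a frequent itemset must itself be frequent).

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(transactions, min_support):
    """Level-wise frequent-itemset search (simplified Apriori illustration)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    # Level 1: frequent single items
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Keep only candidates that meet the support threshold
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]

for itemset, sup in apriori_sketch(data, min_support=0.6).items():
    print(round(sup, 6), sorted(itemset))
```

Run on the notebook's data with min_support=0.6, it finds the same four itemsets as apriori above: (bread) 0.857143, (eggs) 0.714286, (jam) 0.714286 and (bread, jam) 0.714286. The pairs (bread, eggs) and (eggs, jam) are generated as candidates but appear in only 4 of 7 transactions, so they are dropped.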

**🛠 Example Output**
- If the dataset contains:


| Milk | Bread | Butter |
|------|-------|--------|
|  1   |   1   |   1    |
|  0   |   1   |   1    |
|  1   |   1   |   0    |
|  1   |   0   |   1    |
- With min_support=0.5, the output itemset contains the following (the 3-itemset (Milk, Bread, Butter) appears in only 1 of 4 transactions, support 0.25, so it is dropped):


| |support|itemsets|
|-|-------|--------|
|0| 0.75 | (Milk) |
|1| 0.75 | (Bread) |
|2| 0.75 | (Butter) |
|3| 0.50 | (Milk, Bread) |
|4| 0.50 | (Milk, Butter) |
|5| 0.50 | (Bread, Butter) |
- support → Fraction of transactions containing the itemset.

- itemsets → Frequent item combinations (stored as frozensets).
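A brute-force sketch of the support definition on the same Milk/Bread/Butter data: every combination of items is counted and kept if its support is at least min_support (0.5 here, an assumed threshold for this illustration). Exhaustive enumeration is only practical for a handful of items; avoiding it is exactly what Apriori's pruning is for.

```python
from itertools import combinations

data = [['Milk', 'Bread', 'Butter'],
        ['Bread', 'Butter'],
        ['Milk', 'Bread'],
        ['Milk', 'Butter']]
transactions = [set(t) for t in data]
items = ['Milk', 'Bread', 'Butter']  # column order used in the example table
min_support = 0.5                    # assumed threshold for this illustration

frequent = []
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        # support = fraction of transactions that contain every item in combo
        sup = sum(set(combo) <= t for t in transactions) / len(transactions)
        if sup >= min_support:
            frequent.append((sup, combo))

for sup, combo in frequent:
    print(f"{sup:.2f}  {combo}")
```

It prints the three single items at 0.75 and the three pairs at 0.50; the full 3-itemset has support 0.25 and is not printed.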

**📌 Why Use Apriori?**
- ✅ Helps find frequent itemsets in market basket analysis.
- ✅ Used in recommendation systems (e.g., online retail, grocery stores).
- ✅ Forms the basis for association rule mining (e.g., Milk → Bread).

In [19]:
from mlxtend.frequent_patterns import association_rules
res = association_rules(itemset,metric = 'confidence',min_threshold = 0.6)
res

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (bread) | (jam) | 0.857143 | 0.714286 | 0.714286 | 0.833333 | 1.166667 | 1.0 | 0.102041 | 1.714286 | 1.0 | 0.833333 | 0.416667 | 0.916667 |
| 1 | (jam) | (bread) | 0.714286 | 0.857143 | 0.714286 | 1.0 | 1.166667 | 1.0 | 0.102041 | inf | 0.5 | 0.833333 | 1.0 | 0.916667 |


- This code extracts association rules from the frequent itemsets generated by the Apriori algorithm.

**🔍 Step-by-Step Breakdown**
- 1️⃣ Import association_rules Function

##### from mlxtend.frequent_patterns import association_rules
- This function derives association rules from frequent itemsets.

- Association rules help identify relationships like "If a customer buys Milk, they are likely to buy Bread."

- 2️⃣ Generate Association Rules

##### res = association_rules(itemset, metric='confidence', min_threshold=0.6)
- itemset → The frequent itemsets generated using apriori().

- metric='confidence' → The metric used to filter rules. Confidence is the fraction of transactions containing the antecedent that also contain the consequent.

- min_threshold=0.6 → Only rules with a confidence of at least 0.6 (60%) are kept.
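The rule-generation step can be sketched as follows, assuming the frequent itemsets and their supports are available as a dict. This is a simplified illustration, not mlxtend's implementation: the helper name generate_rules is hypothetical and it skips the other metrics association_rules computes. Every non-empty proper subset of a frequent itemset is tried as the antecedent, and the rule is kept if its confidence reaches the threshold.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules meeting min_confidence.

    frequent maps frozenset -> support and must contain every subset of each
    itemset, which Apriori output does because subsets of frequent itemsets
    are also frequent.
    """
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                confidence = sup / frequent[ante]
                if confidence >= min_confidence:
                    rules.append((ante, itemset - ante, sup, confidence))
    return rules


# Supports of the frequent itemsets found by apriori (7 transactions)
frequent = {
    frozenset({'bread'}): 6 / 7,
    frozenset({'eggs'}): 5 / 7,
    frozenset({'jam'}): 5 / 7,
    frozenset({'bread', 'jam'}): 5 / 7,
}

for ante, cons, sup, conf in generate_rules(frequent, min_confidence=0.6):
    print(sorted(ante), '->', sorted(cons), round(sup, 6), round(conf, 6))
```

With the four itemsets returned by apriori and a threshold of 0.6, it keeps bread → jam (confidence 5/6 ≈ 0.833) and jam → bread (confidence 1.0), the same two rules as res.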

**🛠 Example Output**

- For the frequent itemsets found above, res contains two rules (main columns shown, rounded):

|antecedents|consequents|support|confidence|lift|
|-----------|-----------|-------|----------|----|
|(bread)|(jam)|0.714|0.833|1.167|
|(jam)|(bread)|0.714|1.000|1.167|

- 🔹 What Does This Mean?
- (bread) → (jam)

- Confidence = 0.833 → 5 of the 6 transactions that contain bread also contain jam.

- Lift = 1.167 → A basket with bread is about 17% more likely to contain jam than an average basket (0.833 / 0.714).

- (jam) → (bread)

- Confidence = 1.0 → Every transaction that contains jam also contains bread.

- Lift = 1.167 → Lift is symmetric, so it is the same as for (bread) → (jam); a lift above 1 indicates a positive association.
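The metric columns in res can be reproduced by hand from three supports in the notebook's output: support(bread) = 6/7, support(jam) = 5/7 and support(bread, jam) = 5/7. The sketch below applies the standard formulas for confidence, lift, leverage and conviction; rule_metrics is a hypothetical helper name, not part of mlxtend.

```python
import math


def rule_metrics(supp_a, supp_c, supp_ac):
    """Standard metrics for a rule A -> C, given support(A), support(C) and support(A u C)."""
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    leverage = supp_ac - supp_a * supp_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    return {'confidence': confidence, 'lift': lift,
            'leverage': leverage, 'conviction': conviction}


# Supports from the apriori output (7 transactions)
supp_bread, supp_jam, supp_both = 6 / 7, 5 / 7, 5 / 7

bread_to_jam = rule_metrics(supp_bread, supp_jam, supp_both)
jam_to_bread = rule_metrics(supp_jam, supp_bread, supp_both)

for name, metrics in [('bread -> jam', bread_to_jam), ('jam -> bread', jam_to_bread)]:
    print(name, ', '.join(f'{k}={v:.6f}' for k, v in metrics.items()))
```

For bread → jam this gives confidence 0.833333, lift 1.166667, leverage 0.102041 and conviction 1.714286; for jam → bread the confidence is 1.0, so conviction is infinite, which matches the inf in res.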

**📌 Why Use Association Rules?**

- ✅ Helps identify customer buying patterns (useful in market basket analysis).
- ✅ Used in recommendation systems (e.g., "People who bought this also bought that").
- ✅ Helps optimize product placement in stores.

In [23]:
result= res[['antecedents','consequents','support','confidence','lift']]
result

|   | antecedents | consequents | support | confidence | lift |
|---|-------------|-------------|---------|------------|------|
| 0 | (bread) | (jam) | 0.714286 | 0.833333 | 1.166667 |
| 1 | (jam) | (bread) | 0.714286 | 1.0 | 1.166667 |


**🔍 Explanation of the Code:**

##### result = res[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

- Extracts specific columns from the res DataFrame:

- antecedents → Items on the left-hand side (LHS) of the rule (e.g., "Milk").

- consequents → Items on the right-hand side (RHS) of the rule (e.g., "Bread").

- support → Fraction of transactions that contain both the antecedent and the consequent.

- confidence → The probability that the consequent is bought, given that the antecedent is bought.

- lift → The rule's confidence divided by the consequent's support: how much the antecedent raises the chance of buying the consequent (lift > 1 means a positive association).
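For reference, these are the standard definitions behind the three metric columns for a rule A → C over N transactions, followed by the bread → jam numbers from this notebook (N = 7):

```latex
\begin{aligned}
\operatorname{support}(A \to C) &= \frac{\#\{\text{transactions containing } A \cup C\}}{N} \\
\operatorname{confidence}(A \to C) &= \frac{\operatorname{support}(A \cup C)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \to C) &= \frac{\operatorname{confidence}(A \to C)}{\operatorname{support}(C)}
  = \frac{\operatorname{support}(A \cup C)}{\operatorname{support}(A)\,\operatorname{support}(C)} \\[4pt]
\text{bread} \to \text{jam}:\quad
\operatorname{confidence} &= \frac{5/7}{6/7} = \frac{5}{6} \approx 0.833,
\qquad \operatorname{lift} = \frac{5/6}{5/7} = \frac{7}{6} \approx 1.167
\end{aligned}
```

The second form of the lift formula shows why lift is the same in both directions: swapping A and C leaves the expression unchanged.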

