FP-Growth (Frequent-Pattern Growth) is a frequent-itemset mining algorithm, typically used as the first step of association-rule mining. It discovers frequent itemsets: sets of items that often appear together in transactional data (as in market-basket analysis).
Compared with Apriori, FP-Growth avoids costly candidate generation by compressing transactions into a prefix tree (FP-tree) and mining patterns recursively.
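
To make the "prefix tree" idea concrete, here is a minimal sketch of the tree-building step only, on a small toy basket list. It assumes a nested-dict layout (`{item: [count, children]}`) and a hypothetical helper name `build_fp_tree`; it is not how `mlxtend` implements FP-Growth internally, and it omits the recursive mining step.

```python
from collections import Counter


def build_fp_tree(transactions, min_count):
    """Build a prefix tree as nested dicts: {item: [count, children]}."""
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_count}

    root = {}
    # Pass 2: insert each transaction, most frequent items first,
    # so that transactions sharing common items share a path.
    for t in transactions:
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-counts[item], item),  # ties broken alphabetically
        )
        node = root
        for item in ordered:
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1          # one more transaction passes through this node
            node = entry[1]        # descend into the children
    return root


toy_transactions = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
]

tree = build_fp_tree(toy_transactions, min_count=2)
print(tree["bread"][0])                # transactions whose path starts with bread
print(tree["bread"][1]["eggs"][0])     # ... and continues with eggs
```

Five transactions collapse into two branches under the root, which is the compression that lets FP-Growth skip Apriori-style candidate generation.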

### Create sample data

In [None]:
import pandas as pd

dataset = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
]

transactions_df = pd.DataFrame(dataset, columns=["item1", "item2", "item3"])
transactions_df

Unnamed: 0,item1,item2,item3
0,milk,bread,eggs
1,milk,bread,
2,milk,eggs,
3,bread,eggs,
4,milk,bread,eggs


`mlxtend.frequent_patterns` expects one-hot encoded data (each column = item, each row = transaction, `True`/1 = item present).
Use `TransactionEncoder` to convert the list of transactions:

In [6]:
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
onehot_df = pd.DataFrame(te_ary, columns=te.columns_)
onehot_df.head()

Unnamed: 0,bread,eggs,milk
0,True,True,True
1,True,False,True
2,False,True,True
3,True,True,False
4,True,True,True


### Mining frequent itemsets with FP-Growth

In [15]:
from mlxtend.frequent_patterns import fpgrowth

freq_itemsets = fpgrowth(
    onehot_df,
    min_support=0.4,  # appear in ≥40% of transactions
    use_colnames=True,  # show item names
)

freq_itemsets

Unnamed: 0,support,itemsets
0,0.8,(milk)
1,0.8,(eggs)
2,0.8,(bread)
3,0.6,"(eggs, milk)"
4,0.6,"(bread, eggs)"
5,0.6,"(bread, milk)"
6,0.4,"(bread, eggs, milk)"


Output columns

- support: proportion of transactions containing that itemset
- itemsets: the frequent item combinations

Interpretation: {milk, bread} appears in 60% of all transactions (3 of the 5 baskets).
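
As a sanity check, the support values can be recomputed by hand without any library. This is a minimal sketch; the helper name `support` is only illustrative.

```python
dataset = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= set(t) for t in transactions)
    return hits / len(transactions)


print(support({"milk", "bread"}, dataset))          # 0.6
print(support({"milk", "bread", "eggs"}, dataset))  # 0.4
```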

### Generating Association Rules

To derive if-then rules with metrics like confidence and lift:

In [29]:
from mlxtend.frequent_patterns import association_rules

rules = association_rules(
    freq_itemsets,
    metric="confidence",
    min_threshold=0.7,  # only rules with confidence ≥0.7
)

rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(eggs),(milk),0.8,0.8,0.6,0.75,0.9375,1.0,-0.04,0.8,-0.25,0.6,-0.25,0.75
1,(milk),(eggs),0.8,0.8,0.6,0.75,0.9375,1.0,-0.04,0.8,-0.25,0.6,-0.25,0.75
2,(bread),(eggs),0.8,0.8,0.6,0.75,0.9375,1.0,-0.04,0.8,-0.25,0.6,-0.25,0.75
3,(eggs),(bread),0.8,0.8,0.6,0.75,0.9375,1.0,-0.04,0.8,-0.25,0.6,-0.25,0.75
4,(bread),(milk),0.8,0.8,0.6,0.75,0.9375,1.0,-0.04,0.8,-0.25,0.6,-0.25,0.75
5,(milk),(bread),0.8,0.8,0.6,0.75,0.9375,1.0,-0.04,0.8,-0.25,0.6,-0.25,0.75


Important columns:
- antecedents – left side of the rule
- consequents – right side
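- support – P(A and B): fraction of transactions containing every item of both A and B (i.e. the support of the itemset A ∪ B)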
- confidence – P(B | A)
- lift – ratio of confidence to baseline probability of B (>1 indicates positive association)

| Column               | Meaning                                                                                                                                                                                                                      |
|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| antecedents          | The IF side of the rule. Example: {eggs} in row 0 means “IF a basket contains eggs …”                                                                                                |
| consequents          | The THEN side of the rule. Row 0 means “… THEN it is likely to contain milk.”                                                                                                        |
| antecedent support   | Fraction of all transactions that contain the antecedent. 0.8 ⇒ 80% of baskets include eggs (or milk, bread, etc.).                                                                  |
| consequent support   | Fraction of transactions that contain the consequent. 0.8 ⇒ 80% include milk.                                                                                                        |
| support              | Fraction of transactions that contain both antecedent and consequent. 0.6 ⇒ 60% of baskets contain eggs and milk together.                                                           |
| confidence           | support / antecedent_support ⇒ probability the consequent is present when the antecedent is present. Example row 0: 0.6 / 0.8 = 0.75 ⇒ if eggs are bought, 75% of the time milk is also bought. |
| lift                 | confidence / consequent_support. Measures how much more often the pair occurs together than if independent. 0.9375 (< 1) ⇒ eggs and milk co-occur slightly less than chance.           |
| representativity     | Related to mlxtend's missing-value handling: the share of transactions in which the rule's items have no missing values. 1.0 here because the data contains no missing values. |
| leverage             | support − (antecedent_support × consequent_support). Positive ⇒ items appear together more than random, negative ⇒ less. -0.04 means slightly less than random expectation.           |
| conviction           | (1 − consequent_support) / (1 − confidence). Values > 1 indicate stronger implication. 0.8 < 1 shows the rule is not very “convincing.”                                              |
| zhangs_metric        | Zhang’s measure of association (−1 to 1). Positive means positive correlation; −0.25 is weak negative.                                                                               |
| jaccard              | support / (antecedent_support + consequent_support − support). Similarity between sets; 0.6 means 60% overlap.                                                                       |
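| certainty            | (confidence − consequent_support) / (1 − consequent_support). How much knowing the antecedent changes the probability of the consequent relative to its baseline. (0.75 − 0.8) / 0.2 = −0.25 indicates a slight negative association. |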
| kulczynski           | Average of confidence(A→B) and confidence(B→A). 0.75 indicates moderate symmetrical association.                                                                                     |
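
The formulas in the table can be checked by hand for row 0 (eggs → milk). The sketch below uses the commonly cited definitions of each metric, which are assumed here to match what `mlxtend` computes; the three support values are read off the frequent-itemset output.

```python
# Supports for the rule eggs -> milk, taken from the fpgrowth output
sup_a = 0.8    # support(eggs), the antecedent
sup_c = 0.8    # support(milk), the consequent
sup_ac = 0.6   # support(eggs, milk)

confidence = sup_ac / sup_a
lift = confidence / sup_c
leverage = sup_ac - sup_a * sup_c
conviction = (1 - sup_c) / (1 - confidence)
zhangs_metric = leverage / max(sup_ac * (1 - sup_a), sup_a * (sup_c - sup_ac))
jaccard = sup_ac / (sup_a + sup_c - sup_ac)
certainty = (confidence - sup_c) / (1 - sup_c)
kulczynski = 0.5 * (sup_ac / sup_a + sup_ac / sup_c)

metrics = {
    "confidence": confidence,
    "lift": lift,
    "leverage": leverage,
    "conviction": conviction,
    "zhangs_metric": zhangs_metric,
    "jaccard": jaccard,
    "certainty": certainty,
    "kulczynski": kulczynski,
}
for name, value in metrics.items():
    print(f"{name:>14}: {round(value, 4)}")
```

After rounding, the values agree with row 0 of the rules table. Because every single item has support 0.8 and every pair has support 0.6, all six rules share the same numbers.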

#### Why lift?

Confidence alone can be misleading.  
If B is very common (high support), confidence will naturally be high even if A and B are unrelated.

**Example:**  
Suppose 90% of shoppers buy bread (support(B) = 0.9).  
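Even if eggs have nothing to do with bread, the rule eggs→bread will have a confidence close to 0.9, and so will any rule A→bread whose antecedent is unrelated to bread.

Lift corrects for this popularity by dividing confidence by support(B): a lift near 1 means A and B co-occur about as often as they would if they were independent.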
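
The bread example can be reproduced with synthetic data. The 100 baskets below are made up for illustration and constructed so that eggs and bread are independent: 90% contain bread, 50% contain eggs, and 45% contain both.

```python
# Hypothetical baskets: eggs and bread are independent by construction
transactions = (
    [{"eggs", "bread"}] * 45
    + [{"eggs"}] * 5
    + [{"bread"}] * 45
    + [set()] * 5
)


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


confidence = support({"eggs", "bread"}) / support({"eggs"})
lift = confidence / support({"bread"})

print(f"confidence(eggs -> bread) = {confidence:.2f}")
print(f"lift(eggs -> bread)       = {lift:.2f}")
```

The confidence of 0.90 looks impressive, yet the lift of 1.00 shows that buying eggs tells us nothing extra about buying bread.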

| Metric     | What it tells you                                 | How to use it                        |
|------------|--------------------------------------------------|--------------------------------------|
| Confidence | How often B appears when A is present            | Good for measuring predictive power. |
| Lift       | How much more/less likely A & B co-occur than random | Best for judging real association strength. |