# Association Rules

In data science, association rules describe correlations and co-occurrences among the items in a data set. They are well suited to explaining patterns in data from seemingly independent information repositories, such as relational and transactional databases. Finding such rules is sometimes referred to as "association rule mining" or "mining associations."

### Medicine
Doctors can use association rules to help diagnose patients. There are many variables to consider when making a diagnosis, as many diseases share symptoms. By using association rules and machine learning-fueled data analysis, doctors can determine the conditional probability of a given illness by comparing symptom relationships in the data from past cases. As new diagnoses get made, the machine learning model can adapt the rules to reflect the updated data.

### Retail
Retailers can collect data about purchasing patterns, recording purchase data as item barcodes are scanned by point-of-sale systems. Machine learning models can look for co-occurrence in this data to determine which products are most likely to be purchased together. The retailer can then adjust marketing and sales strategy to take advantage of this information.

<img src=https://bigdata.go.th/wp-content/uploads/2021/04/table_baskets-1-1024x518.png>

In [None]:
import numpy as np
data = np.loadtxt("market.csv",delimiter=";", dtype=str)
data

In [None]:
data.shape

In [None]:
data[0,:]

In [None]:
data[1,:]

In [None]:
data[-1,:]

In [None]:
products = data[0, :]
products

In [None]:
baskets = np.array(data[1:], dtype=int)
baskets

In [None]:
products.shape

In [None]:
baskets.shape

In [None]:
products[baskets[0,:] == 1]

In [None]:
products[baskets[-1,:] == 1]

### Itemset
A group of products, e.g., {water} or {eggs, butter}
### Association Rule
Itemset LHS => Itemset RHS

For example: Bread => Butter, Eggs
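
As a minimal sketch of these two definitions, an itemset can be modelled in Python as a `frozenset` and a rule as an (LHS, RHS) pair. The helper `contains` and the single toy basket are assumptions made for illustration; they are not part of the notebook's data.

```python
# Hypothetical toy example: itemsets as frozensets, a rule as an (LHS, RHS) pair.
lhs = frozenset({"Bread"})
rhs = frozenset({"Butter", "Eggs"})
rule = (lhs, rhs)  # reads as: Bread => Butter, Eggs


def contains(basket, itemset):
    """Return True if every product of the itemset is in the basket."""
    return itemset <= set(basket)


basket = ["Bread", "Butter", "Eggs", "Milk"]  # one made-up transaction
print(contains(basket, lhs))        # does the basket match the LHS?
print(contains(basket, lhs | rhs))  # does it match the whole rule?
```

A basket "matches" a rule only when it contains the union of LHS and RHS, which is why the second check uses `lhs | rhs`.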


### Support
The fraction of transactions (baskets) that include the itemset. <br>
<b>Support(itemset) = number of baskets containing the itemset / total number of baskets</b>
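
The definition can be tried out on a handful of baskets before touching real data. The sketch below assumes five made-up baskets stored as Python sets; `toy_baskets` and `support` are illustrative names, not part of the notebook.

```python
# Hypothetical toy data: five made-up baskets, not the market.csv data.
toy_baskets = [
    {"Bread", "Butter", "Eggs"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Butter", "Milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every product in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


print(support({"Bread"}, toy_baskets))            # 4 of 5 baskets
print(support({"Bread", "Butter"}, toy_baskets))  # 3 of 5 baskets
```

Adding a product to an itemset can never raise its support, because every basket that contains the larger itemset also contains the smaller one.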


### Confidence
Probability that the RHS items are in a basket, given that the LHS items are already in it. <br>

Confidence(LHS => RHS) = Support(LHS, RHS)/Support(LHS) <br>
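
A small worked example may help here. The sketch below assumes a made-up 0/1 purchase matrix in the same layout the notebook uses (one row per basket, one column per product); the array `toy` and its column order are assumptions for illustration.

```python
import numpy as np

# Hypothetical 0/1 purchase matrix: rows are baskets, columns are products.
names = np.array(["Bread", "Butter", "Milk"])
toy = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
])

# Boolean column per product: True where the basket contains it.
bread = toy[:, 0].astype(bool)
butter = toy[:, 1].astype(bool)

support_lhs = bread.mean()              # Support(Bread)
support_both = (bread & butter).mean()  # Support(Bread, Butter)

# Confidence(Bread => Butter) = Support(Bread, Butter) / Support(Bread)
confidence = support_both / support_lhs
print(f"Confidence(Bread => Butter) = {confidence:.2f}")
```

Four baskets contain Bread and three of those also contain Butter, so the confidence is three quarters. Confidence is not symmetric: in this toy data every basket with Butter also has Bread, so Butter => Bread has confidence 1.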


### Lift
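How many times more (or less) likely the RHS items are to be in a basket that already contains the LHS items, compared with how often RHS is bought overall. <br><br>
Lift(LHS=>RHS) = Confidence(LHS=>RHS) / Support(RHS) <br>

A lift above 1 means that LHS makes RHS more likely, a lift of 1 means the two are independent, and a lift below 1 means that LHS makes RHS less likely.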
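
To make the three cases concrete, the sketch below computes lift directly from support values and labels the result. The functions `lift` and `interpret` and the support values passed to them are hypothetical, chosen only so that each case appears once.

```python
def lift(support_both, support_lhs, support_rhs):
    """Lift(LHS => RHS) = Confidence(LHS => RHS) / Support(RHS)."""
    confidence = support_both / support_lhs
    return confidence / support_rhs


def interpret(value, tol=1e-9):
    """Describe a lift value; tol guards against floating-point noise."""
    if value > 1 + tol:
        return "RHS is more likely when LHS is in the basket"
    if value < 1 - tol:
        return "RHS is less likely when LHS is in the basket"
    return "LHS and RHS look independent"


# Made-up supports: Support(LHS) = 0.5 and Support(RHS) = 0.4 in every case;
# only the joint support Support(LHS, RHS) changes.
for support_both in (0.3, 0.2, 0.1):
    value = lift(support_both, 0.5, 0.4)
    print(f"joint support {support_both:.1f}: lift {value:.2f}, {interpret(value)}")
```

With independent itemsets the joint support equals Support(LHS) times Support(RHS), which is 0.2 here, so the middle case lands on a lift of 1.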


### Calculate Association Rules at Support 0.4

In [None]:
number_of_basket = len(baskets)
number_of_basket

In [None]:
number_of_buying = np.sum(baskets, axis=0)
number_of_buying_each_product = dict(zip(products,number_of_buying))
number_of_buying_each_product

In [None]:
products_support = {key:value/number_of_basket for key, value in number_of_buying_each_product.items()}
products_support

In [None]:
products_support_40 = {key:value for key, value in products_support.items() if value > 0.4}
products_support_40

### Calculate the confidence of Bread => Bacon
#### Confidence(Bread => Bacon) = Support(Bread, Bacon) / Support(Bread)

In [None]:
# Support(Bread)
products_support['Bread']

### Calculate Support(Bread, Bacon)

In [None]:
baskets[:,products=='Bread']

In [None]:
baskets[:,products=='Bacon'] 

In [None]:
# Number of baskets that contain both Bread and Bacon
freq = np.sum(baskets[:,products=='Bread'] * baskets[:,products=='Bacon'])
freq

In [None]:
support_freq = freq / number_of_basket
support_freq

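#### Bread => Bacon

In [None]:
# Confidence(Bread => Bacon) = Support(Bread, Bacon) / Support(Bread)
confidence = support_freq / products_support['Bread']
confidence

In [None]:
# Lift(Bread => Bacon) = Confidence(Bread => Bacon) / Support(Bacon)
lift = confidence / products_support['Bacon']
lift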

In [171]:
import numpy as np

# Load local market data
data = np.loadtxt('market.csv', delimiter=';', dtype=str)
products = data[0, :]
baskets = np.array(data[1:], dtype=int)

# Basic stats
number_of_basket = len(baskets)
number_of_buying = np.sum(baskets, axis=0)
products_support = {k: v / number_of_basket for k, v in zip(products, number_of_buying)}

# Bread -> Bacon rule
idx_bread = products == 'Bread'
idx_bacon = products == 'Bacon'
freq = np.sum(baskets[:, idx_bread] * baskets[:, idx_bacon])
support_freq = freq / number_of_basket
confidence = support_freq / products_support['Bread']
lift = confidence / products_support['Bacon']

lines = []
lines.append(f'Total baskets: {number_of_basket}')
lines.append('')
lines.append('Product support (high to low):')
for name, sup in sorted(products_support.items(), key=lambda x: x[1], reverse=True):
    lines.append(f' - {name:12s} {sup:.2%}')
lines.append('')
lines.append('Rule: Bread => Bacon')
lines.append(f'  Support   : {support_freq:.2%}')
lines.append(f'  Confidence: {confidence:.2%}')
lines.append(f'  Lift      : {lift:.3f}')

print('\n'.join(lines))


Total baskets: 464

Product support (high to low):
 - Banana       44.83%
 - Cheese       44.40%
 - Bacon        43.10%
 - Hazelnut     42.03%
 - Honey        41.59%
 - HeavyCream   41.59%
 - Carrot       41.38%
 - Bread        40.73%
 - Apple        40.52%
 - ShavingFoam  40.52%
 - Egg          40.30%
 - Salt         39.87%
 - Meat         38.79%
 - Flour        38.58%
 - Toothpaste   38.36%
 - Cucumber     38.15%
 - Olive        38.15%
 - Onion        37.93%
 - Butter       37.50%
 - Milk         37.07%
 - Shampoo      36.64%
 - Sugar        36.64%

Rule: Bread => Bacon
  Support   : 20.26%
  Confidence: 49.74%
  Lift      : 1.154


In [169]:
# Helper functions to inspect rules
import numpy as np


def show_rule(lhs, rhs):
    """Print support/confidence/lift for the products in lhs and rhs lists."""
    lhs_mask = np.isin(products, lhs)
    rhs_mask = np.isin(products, rhs)
    if lhs_mask.sum() == 0:
        print(f"LHS not found: {lhs}")
        return
    if rhs_mask.sum() == 0:
        print(f"RHS not found: {rhs}")
        return

    lhs_hits = np.all(baskets[:, lhs_mask] == 1, axis=1)
    rhs_hits = np.all(baskets[:, rhs_mask] == 1, axis=1)
    both_hits = lhs_hits & rhs_hits

    support = both_hits.mean()
    lhs_support = lhs_hits.mean()
    rhs_support = rhs_hits.mean()
    confidence = support / lhs_support if lhs_support else 0.0
    lift = confidence / rhs_support if rhs_support else 0.0

    lhs_str = ', '.join(lhs)
    rhs_str = ', '.join(rhs)
    print(f"Rule: {lhs_str} => {rhs_str}")
    print(f"  Support   : {support:.2%}")
    print(f"  Confidence: {confidence:.2%}")
    print(f"  Lift      : {lift:.3f}\n")


# Sample usage
show_rule(['Bread'], ['Bacon'])
show_rule(['Milk', 'Butter'], ['Bread'])


Rule: Bread => Bacon
  Support   : 20.26%
  Confidence: 49.74%
  Lift      : 1.154

Rule: Milk, Butter => Bread
  Support   : 8.84%
  Confidence: 60.29%
  Lift      : 1.480



In [170]:
# Interactive selection (ipywidgets)
try:
    import ipywidgets as widgets
    from IPython.display import display
except ImportError:
    print("ipywidgets not found. Install with: pip install ipywidgets")
else:
    lhs_select = widgets.SelectMultiple(options=list(products), description='LHS', rows=8)
    rhs_select = widgets.SelectMultiple(options=list(products), description='RHS', rows=8)
    run_button = widgets.Button(description='Calculate', button_style='success')
    out = widgets.Output()

    def _on_click(_):
        out.clear_output()
        lhs = list(lhs_select.value)
        rhs = list(rhs_select.value)
        with out:
            if not lhs or not rhs:
                print("Please select both LHS and RHS.")
                return
            show_rule(lhs, rhs)

    run_button.on_click(_on_click)
    display(widgets.VBox([
        widgets.HBox([lhs_select, rhs_select]),
        run_button,
        out
    ]))



In [None]:
# Multi-item rules (1-2 products on each side) and a quick summary for managers
import itertools as it

# Parameters
min_support = 0.10  # appears in at least 10% of baskets
min_conf = 0.40     # at least 40% confidence
max_lhs = 2         # LHS length 1 or 2
max_rhs = 2         # RHS length 1 or 2

# Precompute masks
a_masks = {p: baskets[:, i].astype(bool) for i, p in enumerate(products)}


rules = []
for lhs_len in range(1, max_lhs + 1):
    for rhs_len in range(1, max_rhs + 1):
        for lhs in it.combinations(products, lhs_len):
            lhs_mask = a_masks[lhs[0]].copy()
            for p in lhs[1:]:
                lhs_mask &= a_masks[p]
            lhs_sup = lhs_mask.mean()
            if lhs_sup < min_support:
                continue

            remaining = [p for p in products if p not in lhs]
            for rhs in it.combinations(remaining, rhs_len):
                rhs_mask = a_masks[rhs[0]].copy()
                for p in rhs[1:]:
                    rhs_mask &= a_masks[p]
                rhs_sup = rhs_mask.mean()
                if rhs_sup == 0:
                    continue

                both = lhs_mask & rhs_mask
                sup = both.mean()
                if sup < min_support:
                    continue

                conf = sup / lhs_sup if lhs_sup else 0
                if conf < min_conf:
                    continue

                lift = conf / rhs_sup if rhs_sup else 0
                rules.append({
                    'lhs': lhs,
                    'rhs': rhs,
                    'support': sup,
                    'confidence': conf,
                    'lift': lift,
                    'lhs_sup': lhs_sup,
                    'rhs_sup': rhs_sup,
                })

# Strongest 12 rules by lift
rules_sorted = sorted(rules, key=lambda r: r['lift'], reverse=True)

print(f"Manager summary: support>={min_support:.0%}, confidence>={min_conf:.0%}, LHS<= {max_lhs}, RHS<= {max_rhs}")
print(f"Rules found: {len(rules_sorted)}")
print("Top 12 strongest rules (by lift):")
for r in rules_sorted[:12]:
    lhs = ', '.join(r['lhs'])
    rhs = ', '.join(r['rhs'])
    print(f"{lhs} => {rhs} | support={r['support']:.2%}, conf={r['confidence']:.2%}, lift={r['lift']:.3f}")


In [168]:
# Store layout suggestions (ipywidgets)
try:
    import ipywidgets as widgets
    from IPython.display import display
except ImportError:
    print("ipywidgets not found. Install with: pip install ipywidgets")
else:
    # Pairwise lift computation (A => B)
    lifts = []
    for i, a in enumerate(products):
        lhs = baskets[:, i].astype(bool)
        lhs_sup = lhs.mean()
        for j, b in enumerate(products):
            if i == j:
                continue
            rhs = baskets[:, j].astype(bool)
            rhs_sup = rhs.mean()
            if rhs_sup == 0:
                continue
            sup = (lhs & rhs).mean()
            if sup == 0:
                continue
            conf = sup / lhs_sup if lhs_sup else 0
            lift = conf / rhs_sup if rhs_sup else 0
            lifts.append((lift, conf, sup, a, b))

    def render(min_sup, min_lift, top_n):
        out_lines = []
        filtered = [r for r in lifts if r[2] >= min_sup and r[0] >= min_lift]
        filtered.sort(reverse=True)  # sort by lift desc
        out_lines.append(f"Top {top_n} pairs (by lift):")
        for lift, conf, sup, a, b in filtered[:top_n]:
            out_lines.append(f"{a} -> {b} | support={sup:.2%}, conf={conf:.2%}, lift={lift:.3f}")
        return '\n'.join(out_lines)

    min_sup_slider = widgets.FloatSlider(value=0.10, min=0.0, max=0.5, step=0.01, description='Min support')
    min_lift_slider = widgets.FloatSlider(value=1.1, min=0.8, max=2.0, step=0.05, description='Min lift')
    top_n_slider = widgets.IntSlider(value=10, min=3, max=30, step=1, description='Top N')
    out = widgets.Output()

    def _on_change(change=None):
        with out:
            out.clear_output()
            txt = render(min_sup_slider.value, min_lift_slider.value, top_n_slider.value)
            print(txt)

    for w in [min_sup_slider, min_lift_slider, top_n_slider]:
        w.observe(_on_change, names='value')

    _on_change()
    display(widgets.VBox([widgets.HBox([min_sup_slider, min_lift_slider, top_n_slider]), out]))

