(a) The drawback of confidence is that it ignores S(B), the overall support of B. A high confidence value therefore does not necessarily indicate a strong association between A and B. For example, suppose every customer who buys bread also buys butter. The rule "bread → butter" then has a confidence of 100%. This need not reflect a strong association between bread and butter, because most customers may buy butter regardless of whether they buy bread.

Lift and conviction do not suffer from this drawback, because both include S(B) in their definitions. Lift, conf(A → B) / S(B), compares the observed probability of A and B co-occurring with the probability expected if they were independent. Conviction, (1 − S(B)) / (1 − conf(A → B)), compares the frequency with which A would occur without B under independence to the observed frequency of A occurring without B.
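As a small illustration of this point, the sketch below computes all three measures on ten made-up baskets. The data and the helper names `support`, `confidence`, `lift`, and `conviction` are assumptions for illustration and are not part of the assignment or of browsing.txt. Butter appears in most baskets, so "bread → butter" gets a fairly high confidence even though buying bread does not change the chance of buying butter.

```python
import math

# Made-up baskets for illustration only (not the browsing.txt data).
baskets = [
    {"bread", "butter"}, {"bread", "butter"},
    {"bread", "butter"}, {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"}, {"butter"}, {"butter", "eggs"}, {"butter", "milk"},
    {"eggs"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(a, b):
    """conf(A -> B) = S(A and B together) / S(A)."""
    return support(a | b) / support(a)

def lift(a, b):
    """lift(A -> B) = conf(A -> B) / S(B)."""
    return confidence(a, b) / support(b)

def conviction(a, b):
    """conv(A -> B) = (1 - S(B)) / (1 - conf(A -> B)); infinite when conf = 1."""
    conf = confidence(a, b)
    return math.inf if conf == 1 else (1 - support(b)) / (1 - conf)

A, B = {"bread"}, {"butter"}
print("confidence:", confidence(A, B))
print("lift:      ", lift(A, B))
print("conviction:", conviction(A, B))
```

On this toy data the confidence is 0.8, yet lift and conviction are both 1, the value they take when A and B are independent.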

(b) Confidence and conviction are not symmetrical measures, while lift is symmetrical.

To show that confidence is not symmetrical, consider three baskets {A, B}, {A}, and {C}, so that B never appears without A. Then conf(A → B) = 1/2, because only one of the two baskets containing A also contains B, while conf(B → A) = 1, because the only basket containing B also contains A. Hence conf(A → B) ≠ conf(B → A).

To show that conviction is not symmetrical, consider the same three baskets, where S(A) = 2/3 and S(B) = 1/3. Then conv(A → B) = (1 − S(B)) / (1 − conf(A → B)) = (2/3) / (1/2) = 4/3, while conv(B → A) = (1 − S(A)) / (1 − conf(B → A)) has a zero denominator and is therefore infinite. Hence conv(A → B) ≠ conv(B → A).

To show that lift is symmetrical, expand the definition of lift, writing S(A ∪ B) for the support of baskets that contain both A and B:
lift(A → B) = conf(A → B) / S(B) = S(A ∪ B) / (S(A) S(B)) = conf(B → A) / S(A) = lift(B → A)
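The symmetry claims in (b) can also be checked numerically. The sketch below is a minimal check on three made-up baskets, {A, B}, {A}, and {C}, and evaluates each measure in both directions. The function names `count`, `conf`, `lift`, and `conv` are illustrative assumptions, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
import math

# Three made-up baskets, chosen so that B never appears without A.
baskets = [{"A", "B"}, {"A"}, {"C"}]

def count(items):
    """Number of baskets containing every item in `items`."""
    return sum(items <= basket for basket in baskets)

def conf(a, b):
    """conf(a -> b) = count(a and b together) / count(a)."""
    return Fraction(count(a | b), count(a))

def lift(a, b):
    """lift(a -> b) = conf(a -> b) / S(b)."""
    return conf(a, b) / Fraction(count(b), len(baskets))

def conv(a, b):
    """conv(a -> b) = (1 - S(b)) / (1 - conf(a -> b)); infinite when conf = 1."""
    c = conf(a, b)
    if c == 1:
        return math.inf
    return (1 - Fraction(count(b), len(baskets))) / (1 - c)

A, B = {"A"}, {"B"}
for name, measure in [("conf", conf), ("lift", lift), ("conv", conv)]:
    forward, backward = measure(A, B), measure(B, A)
    print(f"{name}: A->B = {forward}, B->A = {backward}, "
          f"symmetric = {forward == backward}")
```

Only lift gives the same value in both directions on this data. Confidence and conviction differ, and one such counterexample is enough to show they are not symmetrical.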

(c) Confidence and conviction have the property that they reach their maximum achievable value for all perfect implications; lift does not.

For confidence, a perfect implication means that B occurs every time A occurs, so conf(A → B) = 1, which is the largest value a conditional probability can take.

For conviction, conf(A → B) = 1 makes the denominator 1 − conf(A → B) equal to zero, so conv(A → B) is infinite (assuming S(B) < 1). Conviction therefore reaches its maximum value of infinity for all perfect implications.

For lift, a perfect implication gives lift(A → B) = 1 / S(B), which depends on how frequent B is. Two perfect implications whose consequents have different supports get different lift values, so they cannot both attain the maximum. For example, with S(B) = 0.2 the lift is 5, while with S(B) = 0.6 it is only about 1.67.
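The behaviour of the three measures on perfect implications can be checked numerically. The sketch below uses ten made-up baskets in which x → y and p → q are both perfect implications but y is rare and q is common. The data and the helper names `support` and `measures` are assumptions for illustration.

```python
from fractions import Fraction
import math

# Ten made-up baskets: x -> y and p -> q are both perfect implications,
# but y is rare (2 of 10 baskets) while q is common (6 of 10 baskets).
baskets = [
    {"x", "y"}, {"x", "y"},
    {"p", "q"}, {"p", "q"},
    {"q"}, {"q"}, {"q"}, {"q"},
    {"z"}, {"z"},
]
N = len(baskets)

def support(items):
    """Exact fraction of baskets that contain every item in `items`."""
    return Fraction(sum(items <= basket for basket in baskets), N)

def measures(a, b):
    """Return (confidence, lift, conviction) for the rule a -> b."""
    conf = support(a | b) / support(a)
    lift = conf / support(b)
    conv = math.inf if conf == 1 else (1 - support(b)) / (1 - conf)
    return conf, lift, conv

for a, b in [({"x"}, {"y"}), ({"p"}, {"q"})]:
    conf, lift, conv = measures(a, b)
    print(f"{sorted(a)} -> {sorted(b)}: conf = {conf}, lift = {lift}, conv = {conv}")
```

Both rules have confidence 1 and infinite conviction, but their lifts are 5 and 5/3, so lift does not assign a single maximal value to every perfect implication.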

In [None]:
from itertools import combinations
from collections import defaultdict

# Load the dataset: each line is one basket; duplicate items within a basket are dropped
data = []
with open('/content/browsing.txt', 'r') as f:
    for line in f:
        items = set(line.strip().split())
        data.append(list(items))

# First pass to find frequent items
min_support = 100
item_counts = defaultdict(int)
for basket in data:
    for item in basket:
        item_counts[item] += 1
frequent_items = set(item for item, count in item_counts.items() if count >= min_support)

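# Second pass to find frequent itemsets of size 2
itemset_counts = defaultdict(int)
for basket in data:
    # Keep only frequent items and sort them so each pair has one canonical
    # key; set iteration order can otherwise yield both (a, b) and (b, a),
    # which would split the count of a single pair across two keys.
    candidates = sorted(item for item in basket if item in frequent_items)
    for itemset in combinations(candidates, 2):
        itemset_counts[itemset] += 1
frequent_itemsets = set(itemset for itemset, count in itemset_counts.items() if count >= min_support)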

# Generate association rules for pairs of items
rules = []
for itemset in frequent_itemsets:
    for item in itemset:
        antecedent = frozenset([item])
        consequent = frozenset(frozenset(itemset) - antecedent)
        confidence = itemset_counts[itemset] / item_counts[item]
        rules.append((antecedent, consequent, confidence))

# Sort rules by confidence and print top 5
top_rules = sorted(rules, key=lambda x: (-x[2], tuple(x[0])))
for antecedent, consequent, confidence in top_rules[:5]:
    print(f"{tuple(antecedent)} -> {tuple(consequent)} : {confidence}")


('GRO85051',) -> ('FRO40251',) : 0.9983525535420099
('DAI93865',) -> ('FRO40251',) : 0.9182692307692307
('DAI43868',) -> ('SNA82528',) : 0.8040540540540541
('FRO92469',) -> ('FRO40251',) : 0.8032979976442874
('ELE92920',) -> ('DAI62779',) : 0.7326649958228906


In [None]:
from itertools import combinations
from collections import defaultdict

# Load the dataset: each line is one basket; duplicate items within a basket are dropped
data = []
with open('/content/browsing.txt', 'r') as f:
    for line in f:
        items = set(line.strip().split())
        data.append(list(items))

# First pass to find frequent items
min_support = 100
item_counts = defaultdict(int)
for basket in data:
    for item in basket:
        item_counts[item] += 1
frequent_items = set(item for item, count in item_counts.items() if count >= min_support)

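# Second pass to find frequent itemsets of size 3
itemset_counts = defaultdict(int)
for basket in data:
    # Keep only frequent items and sort them so each triple has one canonical
    # key; set iteration order can otherwise yield the same three items in
    # different orders, which would split one triple's count across keys.
    candidates = sorted(item for item in basket if item in frequent_items)
    for itemset in combinations(candidates, 3):
        itemset_counts[itemset] += 1
frequent_itemsets = set(itemset for itemset, count in itemset_counts.items() if count >= min_support)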

# Generate association rules of the form X -> (Y, Z) from each frequent triple
rules = []
for itemset in frequent_itemsets:
    for item in itemset:
        antecedent = frozenset([item])
        consequent = frozenset(frozenset(itemset) - antecedent)
        confidence = itemset_counts[itemset] / item_counts[item]
        rules.append((antecedent, consequent, confidence))

# Sort rules by confidence and print top 5
top_rules = sorted(rules, key=lambda x: (-x[2], tuple(x[0])))
for antecedent, consequent, confidence in top_rules[:5]:
    print(f"{tuple(antecedent)} -> {tuple(consequent)} : {confidence}")


('GRO85051',) -> ('FRO40251', 'SNA80324') : 0.3871499176276771
('FRO92469',) -> ('FRO40251', 'SNA80324') : 0.3498233215547703
('SNA18336',) -> ('ELE92920', 'DAI62779') : 0.3342736248236953
('GRO85051',) -> ('FRO40251', 'DAI62779') : 0.3039538714991763
('GRO85051',) -> ('DAI75645', 'FRO40251') : 0.3014827018121911
