## 1  Evaluation Metrics (20 %)

### 1.1  


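#### Answer:<br>Confidence is defined as P(A ∩ B)/P(A) = P(B | A), which never compares against how often B occurs on its own. If P(B) is already large, confidence(A → B) will be large as well, even when A and B are independent. In other words, confidence alone cannot tell us whether A and B are genuinely related or whether the high value is just a side effect of B being frequent.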
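
As a small numeric illustration of the point above, the sketch below uses made-up probabilities (not taken from any dataset) in which A and B are independent and B is very frequent. The helper name `confidence` is only for this example.

```python
def confidence(p_ab: float, p_a: float) -> float:
    """confidence(A -> B) = P(A ∩ B) / P(A)."""
    return p_ab / p_a

# Hypothetical probabilities: B appears in 90% of baskets, A in 20%,
# and A and B are independent, so P(A ∩ B) = P(A) * P(B).
p_a, p_b = 0.2, 0.9
p_ab = p_a * p_b

conf_ab = confidence(p_ab, p_a)
# The confidence is (up to rounding) just P(B), although A tells us nothing about B.
print(round(conf_ab, 6), p_b)
```

The rule looks strong by confidence alone, yet it carries no information beyond the base rate of B.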

### 1.2  

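#### Answer:<br>Lift and conviction do not have this drawback because both take P(B) into account. Lift is P(A ∩ B)/(P(A)·P(B)) = confidence(A → B)/P(B), so it equals 1 when A and B are independent; a lift above 1 means that A really does raise the probability of B. Conviction is (1 - P(B))/(1 - confidence(A → B)): it compares how often B would be absent if A and B were independent with how often B is actually absent when A occurs. When P(B) is high, confidence is pulled up and the denominator 1 - confidence(A → B) shrinks, but the numerator 1 - P(B) shrinks as well, so the two effects offset each other. Conviction becomes large only when the occurrence of A substantially changes the probability of B.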
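
The following sketch evaluates both measures on two made-up cases: one where A and B are independent with a frequent B, and one where A clearly raises the probability of B. The function names and all probabilities are illustrative assumptions.

```python
import math

def lift(p_ab: float, p_a: float, p_b: float) -> float:
    """lift(A -> B) = P(A ∩ B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

def conviction(p_ab: float, p_a: float, p_b: float) -> float:
    """conviction(A -> B) = (1 - P(B)) / (1 - confidence(A -> B))."""
    conf = p_ab / p_a
    return math.inf if conf >= 1.0 else (1.0 - p_b) / (1.0 - conf)

# (P(A ∩ B), P(A), P(B)) for two hypothetical rules.
independent = (0.2 * 0.9, 0.2, 0.9)  # B frequent, A irrelevant; confidence is about 0.9
dependent = (0.18, 0.2, 0.3)         # same confidence of about 0.9, but B is rare overall

for name, probs in [("independent", independent), ("dependent", dependent)]:
    print(name, round(lift(*probs), 3), round(conviction(*probs), 3))
```

Both rules have roughly the same confidence, but only the second one gets a lift and conviction well above 1.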

### 1.3 

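#### Answer:<br>1. Confidence is not symmetric. By definition, confidence(A → B) = P(A ∩ B)/P(A) while confidence(B → A) = P(A ∩ B)/P(B), so the two differ except in special cases such as P(A) = P(B).<br>2. Lift is symmetric. lift(A → B) = P(A ∩ B)/(P(A)·P(B)), and lift(B → A) has the same numerator and, because multiplication is commutative, the same denominator.<br>3. Conviction is not symmetric. The numerator is 1 - P(B) for conviction(A → B) but 1 - P(A) for conviction(B → A), and the confidence in the denominator is itself not symmetric, so the two differ except in special cases such as P(A) = P(B).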
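
For reference, the three definitions can be written side by side for both directions, followed by one made-up numeric case with P(A) = 0.2, P(B) = 0.3 and P(A ∩ B) = 0.18. The abbreviations conf, lift and conv are notation chosen for this sketch.

```latex
\begin{align*}
\operatorname{conf}(A \to B) &= \frac{P(A \cap B)}{P(A)} = \frac{0.18}{0.2} = 0.9, &
\operatorname{conf}(B \to A) &= \frac{P(A \cap B)}{P(B)} = \frac{0.18}{0.3} = 0.6 \\
\operatorname{lift}(A \to B) &= \frac{P(A \cap B)}{P(A)\,P(B)} = \frac{0.18}{0.06} = 3, &
\operatorname{lift}(B \to A) &= \frac{P(B \cap A)}{P(B)\,P(A)} = 3 \\
\operatorname{conv}(A \to B) &= \frac{1 - P(B)}{1 - \operatorname{conf}(A \to B)} = \frac{0.7}{0.1} = 7, &
\operatorname{conv}(B \to A) &= \frac{1 - P(A)}{1 - \operatorname{conf}(B \to A)} = \frac{0.8}{0.4} = 2
\end{align*}
```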

### 1.4  

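#### Answer:<br>1. Confidence reaches its maximum: its range is [0, 1], and a rule that always holds has confidence 1.<br>2. Lift reaches its maximum for the given B: when confidence is 1, lift = 1/P(B), which is the largest value lift can take for that B because confidence cannot exceed 1. This value depends on P(B), so it is not the same for every rule. When P(B) is very small, lift can grow without bound, which corresponds to a strong rule in which B is rare overall but always accompanies A.<br>3. Conviction reaches its maximum: the denominator 1 - confidence(A → B) is 0, so conviction goes to infinity, meaning that B always occurs whenever A occurs and the rule is never violated.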
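
To make the three cases concrete, here is a minimal sketch that computes the measures from a handful of made-up baskets containing two rules that always hold. The function `rule_metrics` and the basket contents are assumptions for illustration only.

```python
import math

def rule_metrics(transactions, a, b):
    """Return (confidence, lift, conviction) of the rule a -> b."""
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    conf = p_ab / p_a
    lift = conf / p_b
    conv = math.inf if conf == 1.0 else (1 - p_b) / (1 - conf)
    return conf, lift, conv

# Hypothetical baskets: every basket with "a" also has "b",
# and every basket with "c" also has "d".
baskets = [{"a", "b"}, {"a", "b"}, {"b"}, {"b", "c", "d"},
           {"e"}, {"e"}, {"e"}, {"e"}]

metrics_ab = rule_metrics(baskets, "a", "b")  # P(b) = 0.5
metrics_cd = rule_metrics(baskets, "c", "d")  # P(d) = 0.125
print(metrics_ab)
print(metrics_cd)
```

Both rules have confidence 1 and infinite conviction, while their lifts are 1/P(b) and 1/P(d) respectively, so the lift of a rule that always holds varies with how frequent its consequent is.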

## 2  Application in Recommending Items (30 %)

In [1]:
# from apyori import apriori  # disabled: mlxtend's apriori (imported below) would shadow this name
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

### 2.1  

In [52]:
# apyori and mlxtend both export `apriori`; alias apyori's so it is not shadowed.
from apyori import apriori as apyori_apriori

# Each line of browsing.txt is one session: item IDs separated by whitespace.
with open('browsing.txt', 'r') as browsing:
    transaction = [line.strip().split() for line in browsing]

# Itemsets of at most two items that appear in at least 100 sessions.
associationResults = list(apyori_apriori(
    transaction,
    min_support=100/len(transaction),
    max_length=2))

# Keep every directed rule X -> Y, keyed by (X, Y) so the direction is preserved.
rules = {}
for rule in associationResults:
    if len(rule.items) != 2:
        continue  # skip single-item sets
    for ordered_statistic in rule.ordered_statistics:
        if len(ordered_statistic.items_base) != 1:
            continue  # skip the empty-antecedent entry
        (base,) = ordered_statistic.items_base
        (add,) = ordered_statistic.items_add
        rules[(base, add)] = ordered_statistic.confidence

# Highest confidence first; ties broken by the antecedent in lexicographic order.
for items, confidence in sorted(rules.items(), key=lambda x: (-x[1], x[0]))[:5]:
    print(f"Items: {items[0]} -> {items[1]}, Confidence: {confidence}")

Items: DAI93865 -> FRO40251, Confidence: 1.0
Items: GRO85051 -> FRO40251, Confidence: 0.9991762767710051
Items: GRO38636 -> FRO40251, Confidence: 0.9906542056074765
Items: ELE12951 -> FRO40251, Confidence: 0.9905660377358491
Items: DAI88079 -> FRO40251, Confidence: 0.9867256637168142


In [4]:
# Each line of browsing.txt is one session: item IDs separated by whitespace.
with open('browsing.txt', 'r') as browsing:
    transaction = [line.strip().split() for line in browsing]

# 1. One-hot encode the sessions into a boolean DataFrame (one column per item).
encoder = TransactionEncoder()
encoded_transaction = encoder.fit_transform(transaction)
df = pd.DataFrame(encoded_transaction, columns=encoder.columns_)

# 2. Frequent itemsets of at most two items with a support count of at least 100.
#    The 1-itemsets are kept because association_rules needs their supports.
frequent_itemsets = apriori(df, min_support=100/len(transaction), max_len=2, use_colnames=True)

# 3. Generate the association rules.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# 4. Sort by confidence (descending); break ties by the antecedent in lexicographic order.
#    frozensets only compare by subset relation, so sort on a joined string of the sorted items.
rules["antecedent_key"] = rules["antecedents"].apply(lambda s: ",".join(sorted(s)))
sorted_rules = rules.sort_values(by=["confidence", "antecedent_key"], ascending=[False, True])

# 5. Print the top five rules with their confidence.
for index, row in sorted_rules.head(5).iterrows():
    antecedents = list(row["antecedents"])
    consequents = list(row["consequents"])
    confidence = row["confidence"]
    print(f"Items: {antecedents} => {consequents}, Confidence: {confidence}")

Items: ['DAI93865'] => ['FRO40251'], Confidence: 1.0
Items: ['GRO85051'] => ['FRO40251'], Confidence: 0.9991762767710051
Items: ['GRO38636'] => ['FRO40251'], Confidence: 0.9906542056074765
Items: ['ELE12951'] => ['FRO40251'], Confidence: 0.9905660377358491
Items: ['DAI88079'] => ['FRO40251'], Confidence: 0.9867256637168142
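
A note on tie-breaking: mlxtend stores antecedents as frozensets, and Python compares frozensets by the subset relation rather than alphabetically. The sketch below, which uses made-up item names and confidences, shows why a sorted tuple is a safer tie-break key. The helper `sort_key` is hypothetical.

```python
# Two disjoint frozensets are not ordered either way by "<".
x, y = frozenset({"A"}), frozenset({"B"})
neither_less = not (x < y) and not (y < x)
print(neither_less)

# Hypothetical (antecedent, confidence) pairs.
rules = [
    (frozenset({"B", "C"}), 1.0),
    (frozenset({"A", "D"}), 1.0),
    (frozenset({"A", "C"}), 0.9),
]

def sort_key(rule):
    """Confidence descending, then antecedent items in lexicographic order."""
    antecedent, conf = rule
    return (-conf, tuple(sorted(antecedent)))

ordered = sorted(rules, key=sort_key)
for antecedent, conf in ordered:
    print(tuple(sorted(antecedent)), conf)
```

With this key, rules of equal confidence come out ordered by their antecedent items, which a sort on the raw frozensets does not guarantee.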


### 2.2  

In [54]:
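# apyori and mlxtend both export `apriori`; alias apyori's so it is not shadowed.
from apyori import apriori as apyori_apriori

# Each line of browsing.txt is one session: item IDs separated by whitespace.
with open('browsing.txt', 'r') as browsing:
    transaction = [line.strip().split() for line in browsing]

# Itemsets of at most three items that appear in at least 100 sessions.
associationResults = list(apyori_apriori(
    transaction,
    min_support=100/len(transaction),
    max_length=3))

# For each frequent triple, keep the highest confidence among its (pair -> item) rules.
# NOTE: the key is the sorted triple, so the rule direction is not stored.
rules = {}
for rule in associationResults:
    if len(rule.items) != 3:
        continue  # skip single items and pairs
    max_confidence = 0
    best_items = None
    for ordered_statistic in rule.ordered_statistics:
        if len(ordered_statistic.items_base) != 2:
            continue  # only rules with a two-item antecedent
        if ordered_statistic.confidence >= max_confidence:
            max_confidence = ordered_statistic.confidence
            best_items = tuple(sorted(rule.items))
    if best_items is not None:
        rules[best_items] = max_confidence

# Highest confidence first; ties broken by the sorted triple.
# The "->" below splits the sorted triple, which need not match the direction of the best rule.
for items, confidence in sorted(rules.items(), key=lambda x: (-x[1], x[0]))[:5]:
    print(f"Items: {items[0:2]} -> {items[2]}, Confidence: {confidence}")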

Items: ('DAI23334', 'DAI62779') -> ELE92920, Confidence: 1.0
Items: ('DAI31081', 'FRO40251') -> GRO85051, Confidence: 1.0
Items: ('DAI55911', 'FRO40251') -> GRO85051, Confidence: 1.0
Items: ('DAI62779', 'DAI88079') -> FRO40251, Confidence: 1.0
Items: ('DAI75645', 'FRO40251') -> GRO85051, Confidence: 1.0


In [5]:
# Each line of browsing.txt is one session: item IDs separated by whitespace.
with open('browsing.txt', 'r') as browsing:
    transaction = [line.strip().split() for line in browsing]

# 1. One-hot encode the sessions into a boolean DataFrame (one column per item).
encoder = TransactionEncoder()
encoded_transaction = encoder.fit_transform(transaction)
df = pd.DataFrame(encoded_transaction, columns=encoder.columns_)

# 2. Frequent itemsets of at most three items with a support count of at least 100.
frequent_itemsets = apriori(df, min_support=100/len(transaction), max_len=3, use_colnames=True)

# 3. Generate the association rules. Because frequent_itemsets also holds pairs,
#    this includes pair rules (X -> Y) and 1 -> 2 rules besides (X, Y) -> Z.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# 4. Sort by confidence (descending). The antecedents column holds frozensets, which
#    compare by subset relation, so ties are not in lexicographic order.
sorted_rules = rules.sort_values(by=["confidence", "antecedents"], ascending=[False, True])

# 5. Print the top five rules with their confidence.
for index, row in sorted_rules.head(5).iterrows():
    antecedents = list(row["antecedents"])
    consequents = list(row["consequents"])
    confidence = row["confidence"]
    print(f"Items: {antecedents} => {consequents}, Confidence: {confidence}")

Items: ['ELE26917', 'GRO85051'] => ['FRO40251'], Confidence: 1.0
Items: ['FRO53271', 'GRO85051'] => ['FRO40251'], Confidence: 1.0
Items: ['GRO21487', 'GRO85051'] => ['FRO40251'], Confidence: 1.0
Items: ['GRO38814', 'GRO85051'] => ['FRO40251'], Confidence: 1.0
Items: ['GRO73461', 'GRO85051'] => ['FRO40251'], Confidence: 1.0
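
For comparison, rules of the form (X, Y) → Z can also be computed without a library by counting pairs and triples directly, which keeps the rule direction and the tie-break order explicit. The sketch below is a brute-force count on a tiny made-up input, not the Apriori algorithm and not the method used in the cells above. The function `top_triple_rules` and the `sessions` data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def top_triple_rules(transactions, min_count, k=5):
    """Return up to k rules ((X, Y), Z, confidence) from frequent triples."""
    pair_counts = Counter()
    triple_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted, so every key is a sorted tuple
        pair_counts.update(combinations(items, 2))
        triple_counts.update(combinations(items, 3))

    rules = []
    for triple, count in triple_counts.items():
        if count < min_count:
            continue  # the triple itself must be frequent
        for consequent in triple:
            antecedent = tuple(i for i in triple if i != consequent)
            # confidence((X, Y) -> Z) = count(X, Y, Z) / count(X, Y)
            rules.append((antecedent, consequent, count / pair_counts[antecedent]))

    # Confidence descending; ties broken by antecedent, then consequent.
    rules.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rules[:k]

# Hypothetical sessions for illustration.
sessions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "c", "d"], ["c", "d"]]
result = top_triple_rules(sessions, min_count=2)
for antecedent, consequent, conf in result:
    print(antecedent, "->", consequent, round(conf, 3))
```

Enumerating all triples per session grows quickly with session length, so this is only meant for checking results on small inputs.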


## 3  Scalability Comparisons (40 %)

### 3.1  

### 3.2

### 3.3  