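# Fintech Professional Subject Competency Course: Big Data Analytics

> Association Rules: Mining Associations and Correlations <https://bit.ly/3Fn7jvR>

[DataInPoint (數聚點)](https://www.datainpoint.com/) | Yao-Jen Kuo (郭耀仁) <https://linktr.ee/yaojenkuo>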

## Installing the Module

```bash
# In a Jupyter notebook cell, prefix the command with "!" instead.
pip install mlxtend
```

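## Association Rules

## What Are Association Rules

- Association rules are a branch of unsupervised learning within machine learning.
- The goal of association rules is to identify relationships between variables and use them to support prediction or decision-making.

## What Is Unsupervised Learning

- Machine learning models can be roughly divided into two types: supervised and unsupervised.
- The biggest difference is that a supervised model has a target array $y$, whereas an unsupervised model does not.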

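## A Brief Comparison of Supervised and Unsupervised Learning

||Supervised learning|Unsupervised learning|
|:-|:-----|:---------|
|Goal|Approximate a function that minimizes the difference between the predicted and the actual target array.|Discover features of, and relationships between, variables.|
|Evaluation|Evaluated with a specific cost function.|Evaluated with a variety of interpretability-oriented metrics.|
|Target array type|Continuous or discrete.|None.|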

## Common Applications of Unsupervised Learning

- Clustering.
- Dimensionality reduction.
- Association rules.

## Common Applications of Association Rules

- Market basket analysis.
- Customer segmentation.
- Fraud detection.
- Social network analysis.
- Recommendation systems.

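## Market Basket Analysis

- The best-known application of association rules, popularized by the often-told "beer and diapers" story.
- Understand which products customers tend to buy together, then adjust shelf placement or promotions and design cross-selling or up-selling strategies.

## Customer Segmentation

- Knowing which products customers tend to buy together makes it possible to describe a handful of distinct purchasing patterns.
- Customers with similar purchasing behavior can then be grouped into segments.

## Fraud Detection

- Use association rules to identify concentrated clicks on phishing URLs or concentrated purchases of counterfeit goods.
- Detect bursts of spending at a particular store within a short period, then issue fraud alerts to protect customers.

## Social Network Analysis

- Use association rules to identify which topics, hashtags, and content are sequentially related or associated.
- Posting, liking, and sharing behavior can be used to describe distinct patterns of social activity.

## Recommendation Systems

- Building on market basket analysis and customer segmentation, develop cross-selling or up-selling recommendation strategies.
- Building on social network analysis, develop recommendation strategies for topics, discussions, and accounts to follow.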

## Preprocessing Sales Data

## What the Sales Data Looks Like

In [1]:
import pandas as pd

transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]
df = pd.DataFrame()
df["transaction"] = transactions
df = df.reset_index()
df = df.rename({"index": "id"}, axis='columns')
df["id"] = df["id"] + 1
df

||id|transaction|
|:-|:-|:-|
|0|1|VT|
|1|2|VTI,VXUS|
|2|3|VTI,VGK,VPL,VWO|
|3|4|VTI,VPL,VWO,VGK|
|4|5|VT|
|5|6|VTI,VXUS|
|6|7|VT|


## Preprocessing Sales Data

In [2]:
ser = df["transaction"].map(lambda x: x.split(","))
ser = ser.map(set)
ser

0                    {VT}
1             {VTI, VXUS}
2    {VGK, VWO, VTI, VPL}
3    {VWO, VGK, VTI, VPL}
4                    {VT}
5             {VTI, VXUS}
6                    {VT}
Name: transaction, dtype: object

## Summarizing the Preprocessed Sales Data

In [3]:
ser.value_counts()

transaction
{VT}                    3
{VTI, VXUS}             2
{VGK, VWO, VTI, VPL}    2
Name: count, dtype: int64

In [4]:
ser.value_counts() / ser.count()

transaction
{VT}                    0.428571
{VTI, VXUS}             0.285714
{VGK, VWO, VTI, VPL}    0.285714
Name: count, dtype: float64

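The same summary can be sketched without pandas. A plain `set` is not hashable, so the sketch below converts each basket to a `frozenset` before counting. It assumes the same seven toy transactions; the names `baskets` and `counts` are hypothetical and not part of the original notebook.

```python
from collections import Counter

# Same seven toy transactions as in the cells above
transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]

# frozenset ignores item order, so "VTI,VGK,VPL,VWO" and "VTI,VPL,VWO,VGK"
# count as the same basket
baskets = [frozenset(t.split(",")) for t in transactions]
counts = Counter(baskets)

# Print each distinct basket with its count and its share of all transactions
for basket, n in counts.most_common():
    print(sorted(basket), n, round(n / len(baskets), 6))
```

Because `frozenset` is hashable, the baskets can also serve as dictionary keys in later steps.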
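## Evaluation Metrics for Association Rules

## Common Metrics for Evaluating Association Rules

- Support.
- Confidence.
- Lift.

## Support

- The most basic and most important evaluation metric for association rules: the fraction of all transactions that contain both $X$ and $Y$.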

$$
\text{Support}(\{X\} \rightarrow \{Y\}) = \frac{\text{Transactions containing both X and Y}}{\text{Total number of transactions}}
$$

In [5]:
transactions_containing_both_vti_vxus = 0
for itemset in ser:
    compare_set = {"VTI", "VXUS"}
    if compare_set.issubset(itemset):
        transactions_containing_both_vti_vxus += 1
support_vti_vxus = transactions_containing_both_vti_vxus / ser.size
print(support_vti_vxus)

0.2857142857142857


In [6]:
def calculate_support(x: set, transaction_series: pd.core.series.Series) -> float:
    total_number_of_transactions = transaction_series.size
    transactions_containing_x = 0
    for itemset in transaction_series:
        if x.issubset(itemset):
            transactions_containing_x += 1
    support = transactions_containing_x / total_number_of_transactions
    return support

print(calculate_support({"VTI", "VXUS"}, ser))

0.2857142857142857


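## Confidence

- Measures the strength of association between two items: the probability that $Y$ appears given that $X$ appears.
- The higher the confidence of $\{X\} \rightarrow \{Y\}$, the more likely a transaction containing the antecedent $X$ also contains the consequent $Y$. Confidence is directional: swapping the antecedent and the consequent generally changes its value.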

$$
\text{Confidence}(\{X\} \rightarrow \{Y\}) = \frac{\text{Transactions containing both X and Y}}{\text{Transactions containing X}}
$$

In [7]:
transactions_containing_both_vti_vxus = 0
transactions_containing_vti = 0
for itemset in ser:
    compare_set = {"VTI", "VXUS"}
    antecedents_set = {"VTI"}
    if antecedents_set.issubset(itemset):
        transactions_containing_vti += 1
    if compare_set.issubset(itemset):
        transactions_containing_both_vti_vxus += 1
confidence_vti_vxus = transactions_containing_both_vti_vxus / transactions_containing_vti
print(confidence_vti_vxus)

0.5


In [8]:
def calculate_confidence(antecedents_set: set, x: set, transaction_series: pd.core.series.Series) -> float:
    transactions_containing_x = 0
    transactions_containing_antecedents_set = 0
    for itemset in transaction_series:
        if antecedents_set.issubset(itemset):
            transactions_containing_antecedents_set += 1
        if x.issubset(itemset):
            transactions_containing_x += 1
    confidence = transactions_containing_x / transactions_containing_antecedents_set
    return confidence

print(calculate_confidence({"VTI"}, {"VTI", "VXUS"}, ser))

0.5

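Confidence depends on the direction of the rule. The sketch below illustrates this on the same toy transactions; the helper name `confidence` and its signature are hypothetical and differ from `calculate_confidence` above, since it takes the consequent directly instead of the combined itemset.

```python
# Same seven toy transactions as in the cells above
transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]
baskets = [set(t.split(",")) for t in transactions]


def confidence(antecedent: set, consequent: set, baskets: list) -> float:
    """Share of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


conf_vti_vxus = confidence({"VTI"}, {"VXUS"}, baskets)  # 2 of the 4 VTI baskets hold VXUS
conf_vxus_vti = confidence({"VXUS"}, {"VTI"}, baskets)  # both VXUS baskets hold VTI
print(conf_vti_vxus, conf_vxus_vti)
```

Every basket with VXUS also holds VTI, but only half of the VTI baskets hold VXUS, so the two directions give different values.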

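## Lift

- Measures the strength of association between two items while accounting for how often each item appears on its own.
- Lift compares how often the items are bought together with how often that would happen if they were bought independently: a lift greater than 1 indicates a positive association, a lift less than 1 a negative association, and a lift of exactly 1 no association (independence).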

$$
\begin{aligned}
\text{Lift}(\{X\} \rightarrow \{Y\}) &= \frac{\text{Transactions containing both X and Y} \; / \; \text{Transactions containing X}}{\text{Fraction of transactions containing Y}} \\
&= \frac{\text{Confidence}(\{X\} \rightarrow \{Y\})}{\text{Fraction of transactions containing Y}}
\end{aligned}
$$

In [9]:
transactions_containing_vxus = 0
for itemset in ser:
    consequents_set = {"VXUS"}
    if consequents_set.issubset(itemset):
        transactions_containing_vxus += 1
fraction_of_transactions_containing_vxus = transactions_containing_vxus / ser.size
lift_vti_vxus = confidence_vti_vxus / fraction_of_transactions_containing_vxus
print(lift_vti_vxus)

1.75


In [10]:
def calculate_lift(antecedents_set: set, consequents_set: set, x: set, transaction_series: pd.core.series.Series) -> float:
    transactions_containing_consequents_set = 0
    for itemset in transaction_series:
        if consequents_set.issubset(itemset):
            transactions_containing_consequents_set += 1
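    # Use the series passed in rather than the global `ser`
    total_number_of_transactions = transaction_series.size
    fraction_of_transactions_containing_consequents_set = transactions_containing_consequents_set / total_number_of_transactions
    # Use the function arguments rather than hard-coded item sets
    confidence = calculate_confidence(antecedents_set, x, transaction_series)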
    lift = confidence / fraction_of_transactions_containing_consequents_set
    return lift

print(calculate_lift({"VTI"}, {"VXUS"}, {"VTI", "VXUS"}, ser))

1.75

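Dividing the numerator and denominator of the lift formula by the total number of transactions gives the equivalent form Support(X and Y) / (Support(X) × Support(Y)), which makes it clear that lift, unlike confidence, is symmetric. The sketch below checks this on the toy transactions; the helper names `support` and `lift` are hypothetical.

```python
import math

# Same seven toy transactions as in the cells above
transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]
baskets = [set(t.split(",")) for t in transactions]


def support(itemset: set, baskets: list) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def lift(x: set, y: set, baskets: list) -> float:
    """Joint support divided by the product of the individual supports."""
    return support(x | y, baskets) / (support(x, baskets) * support(y, baskets))


lift_xy = lift({"VTI"}, {"VXUS"}, baskets)
lift_yx = lift({"VXUS"}, {"VTI"}, baskets)
# Both directions agree, up to floating-point rounding, with the value printed above
print(lift_xy, lift_yx, math.isclose(lift_xy, lift_yx))
```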

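## Algorithms for Association Rules

## Why Association Rules Need Algorithms

- The number of possible association rules is very large.
    - Rules with a single antecedent and a single consequent: {VTI} -> {VXUS}
    - Rules with multiple antecedents: {VTI, VGK, VWO} -> {VPL}
    - Rules with multiple consequents: {VTI} -> {VGK, VWO, VPL}
- Most candidate rules are useless and must be discarded.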

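To get a feel for how quickly the number of candidate rules grows, the sketch below counts every rule with a non-empty antecedent and a non-empty, disjoint consequent over six items, and compares the count with the closed form $3^d - 2^{d+1} + 1$. The function name `count_rules` is hypothetical.

```python
from itertools import combinations

items = ["VGK", "VPL", "VT", "VTI", "VWO", "VXUS"]


def count_rules(items: list) -> int:
    """Count rules X -> Y where X and Y are non-empty, disjoint subsets of `items`."""
    total = 0
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            # Every non-empty proper subset of the itemset can be the antecedent;
            # the remaining items form the consequent.
            for k in range(1, size):
                total += sum(1 for _ in combinations(itemset, k))
    return total


d = len(items)
n_rules = count_rules(items)
closed_form = 3**d - 2 ** (d + 1) + 1
print(n_rules, closed_form)
```

Even six items produce several hundred candidate rules, which is why pruning by support matters.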
In [11]:
from itertools import combinations

etfs = []
for itemset in ser:
    for item in itemset:
        etfs.append(item)
etfs = set(etfs)
print(etfs)

{'VGK', 'VWO', 'VPL', 'VXUS', 'VT', 'VTI'}


In [12]:
print(list(combinations(etfs, 1)))
print(list(combinations(etfs, 2)))
print(list(combinations(etfs, 3)))
print(list(combinations(etfs, 4)))
print(list(combinations(etfs, 5)))
print(list(combinations(etfs, 6)))

[('VGK',), ('VWO',), ('VPL',), ('VXUS',), ('VT',), ('VTI',)]
[('VGK', 'VWO'), ('VGK', 'VPL'), ('VGK', 'VXUS'), ('VGK', 'VT'), ('VGK', 'VTI'), ('VWO', 'VPL'), ('VWO', 'VXUS'), ('VWO', 'VT'), ('VWO', 'VTI'), ('VPL', 'VXUS'), ('VPL', 'VT'), ('VPL', 'VTI'), ('VXUS', 'VT'), ('VXUS', 'VTI'), ('VT', 'VTI')]
[('VGK', 'VWO', 'VPL'), ('VGK', 'VWO', 'VXUS'), ('VGK', 'VWO', 'VT'), ('VGK', 'VWO', 'VTI'), ('VGK', 'VPL', 'VXUS'), ('VGK', 'VPL', 'VT'), ('VGK', 'VPL', 'VTI'), ('VGK', 'VXUS', 'VT'), ('VGK', 'VXUS', 'VTI'), ('VGK', 'VT', 'VTI'), ('VWO', 'VPL', 'VXUS'), ('VWO', 'VPL', 'VT'), ('VWO', 'VPL', 'VTI'), ('VWO', 'VXUS', 'VT'), ('VWO', 'VXUS', 'VTI'), ('VWO', 'VT', 'VTI'), ('VPL', 'VXUS', 'VT'), ('VPL', 'VXUS', 'VTI'), ('VPL', 'VT', 'VTI'), ('VXUS', 'VT', 'VTI')]
[('VGK', 'VWO', 'VPL', 'VXUS'), ('VGK', 'VWO', 'VPL', 'VT'), ('VGK', 'VWO', 'VPL', 'VTI'), ('VGK', 'VWO', 'VXUS', 'VT'), ('VGK', 'VWO', 'VXUS', 'VTI'), ('VGK', 'VWO', 'VT', 'VTI'), ('VGK', 'VPL', 'VXUS', 'VT'), ('VGK', 'VPL', 'VXUS', 'VT

## Common Algorithms for Mining Association Rules

- Apriori.
- FP-Growth (Frequent Pattern Growth).
- ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal).

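## The Apriori Algorithm

- The best-known association rule algorithm.
- It first identifies items that appear frequently in the sales data, then builds association rules from those frequent items.
- It works bottom-up, from single items to item combinations.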

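## The FP-Growth (Frequent Pattern Growth) Algorithm

- Uses a tree structure: it builds an FP-tree, mines frequent itemsets from the tree, and then generates association rules from them.
- Can be viewed as a faster alternative to Apriori that is better suited to large datasets, because it avoids generating candidate itemsets explicitly.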

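The tree-building step can be sketched in plain Python. This is only an illustration of FP-tree construction under simple assumptions (items ordered by descending count, ties broken alphabetically); it does not perform the mining step, and the names `FPNode`, `build_fp_tree`, and `show` are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item and the number of transactions passing through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(baskets, min_count):
    # Pass 1: count items and keep only the frequent ones
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {item: c for item, c in item_counts.items() if c >= min_count}
    root = FPNode()
    # Pass 2: insert each transaction with its most frequent items first,
    # so that transactions sharing a prefix share a path in the tree
    for basket in baskets:
        ordered = sorted((i for i in basket if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root


def show(node, depth=0):
    """Print the tree with one 'item:count' entry per line, indented by depth."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]
baskets = [set(t.split(",")) for t in transactions]
root = build_fp_tree(baskets, min_count=2)
show(root)
```

The seven transactions collapse into three paths, which is the compression FP-Growth relies on.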
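## The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) Algorithm

- A variation on Apriori that stores the data vertically (each item mapped to the IDs of the transactions containing it) and searches depth-first rather than breadth-first.
- It traverses the itemset lattice, computing supports by intersecting transaction-ID sets.
- Like FP-Growth, it is often faster than Apriori on large datasets.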

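The vertical layout and the intersection step can be sketched as follows. This is a minimal illustration on the toy transactions, not a tuned implementation; the names `tidsets` and `eclat` are hypothetical.

```python
transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]

# Vertical layout: item -> set of ids of the transactions that contain it
tidsets = {}
for tid, t in enumerate(transactions):
    for item in t.split(","):
        tidsets.setdefault(item, set()).add(tid)


def eclat(prefix, candidates, min_count, found):
    """Depth-first search that extends `prefix` with each sufficiently frequent candidate."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) >= min_count:
            itemset = prefix | {item}
            found[frozenset(itemset)] = len(tids)
            # Intersecting id sets gives the id sets of the longer itemsets
            extensions = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
            eclat(itemset, extensions, min_count, found)
    return found


found = eclat(set(), sorted(tidsets.items()), min_count=2, found={})
n_transactions = len(transactions)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count / n_transactions)
```

Support is read directly from the size of each intersected id set, so the transactions are never rescanned.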
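## Walking Through the Apriori Algorithm by Hand

1. Compute the support of each single item.
2. Set a minimum support threshold.
3. Remove single items below the minimum support threshold, then compute the support of two-item combinations.
4. Remove two-item combinations below the minimum support threshold, then compute the support of three-item combinations.
5. Continue in the same way until every n-item combination falls below the minimum support threshold.
6. Compute the confidence of rules formed from the surviving (n-1)-item combinations.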

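The steps above can be condensed into one loop. The sketch below is an illustration rather than a reference implementation: the function name `apriori_frequent_itemsets` is hypothetical, and it uses the strict Apriori pruning rule, keeping a candidate only if every one of its smaller subsets is frequent.

```python
from itertools import combinations


def apriori_frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    items = sorted({item for b in baskets for item in b})
    # Steps 1-3: supports of single items, filtered by the threshold
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune any candidate that has an infrequent (k-1)-item subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]
baskets = [set(t.split(",")) for t in transactions]
frequent = apriori_frequent_itemsets(baskets, min_support=0.2)
largest = max(frequent, key=len)
print(len(frequent), sorted(largest))
```

The loop ends on its own once no candidate survives, which corresponds to step 5.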
## Computing the Support of Single Items

In [13]:
supports_n1 = {etf: calculate_support({etf}, ser) for etf in etfs}
print(supports_n1)

{'VGK': 0.2857142857142857, 'VWO': 0.2857142857142857, 'VPL': 0.2857142857142857, 'VXUS': 0.2857142857142857, 'VT': 0.42857142857142855, 'VTI': 0.5714285714285714}


## Setting a Minimum Support Threshold

In [14]:
min_support_threshold = 0.2

## Removing Single Items Below the Threshold, Then Computing the Support of Two-Item Combinations

In [15]:
# Keep items whose support meets the minimum threshold (inclusive)
removed_supports_n1 = {k: v for k, v in supports_n1.items() if v >= min_support_threshold}
print(removed_supports_n1)
combinations_etfs_n2 = []
for combi in combinations(etfs, 2):
    set_combi = set(combi)
    # Append each candidate once if it contains at least one retained item
    if any({k}.issubset(set_combi) for k in removed_supports_n1.keys()):
        combinations_etfs_n2.append(combi)
print(combinations_etfs_n2)

{'VGK': 0.2857142857142857, 'VWO': 0.2857142857142857, 'VPL': 0.2857142857142857, 'VXUS': 0.2857142857142857, 'VT': 0.42857142857142855, 'VTI': 0.5714285714285714}
[('VGK', 'VWO'), ('VGK', 'VPL'), ('VGK', 'VXUS'), ('VGK', 'VT'), ('VGK', 'VTI'), ('VWO', 'VPL'), ('VWO', 'VXUS'), ('VWO', 'VT'), ('VWO', 'VTI'), ('VPL', 'VXUS'), ('VPL', 'VT'), ('VPL', 'VTI'), ('VXUS', 'VT'), ('VXUS', 'VTI'), ('VT', 'VTI')]


In [16]:
supports_n2 = {combi: calculate_support(set(combi), ser) for combi in combinations_etfs_n2}
print(supports_n2)

{('VGK', 'VWO'): 0.2857142857142857, ('VGK', 'VPL'): 0.2857142857142857, ('VGK', 'VXUS'): 0.0, ('VGK', 'VT'): 0.0, ('VGK', 'VTI'): 0.2857142857142857, ('VWO', 'VPL'): 0.2857142857142857, ('VWO', 'VXUS'): 0.0, ('VWO', 'VT'): 0.0, ('VWO', 'VTI'): 0.2857142857142857, ('VPL', 'VXUS'): 0.0, ('VPL', 'VT'): 0.0, ('VPL', 'VTI'): 0.2857142857142857, ('VXUS', 'VT'): 0.0, ('VXUS', 'VTI'): 0.2857142857142857, ('VT', 'VTI'): 0.0}


## Removing Two-Item Combinations Below the Threshold, Then Computing the Support of Three-Item Combinations

In [17]:
# Keep two-item combinations whose support meets the minimum threshold (inclusive)
removed_supports_n2 = {k: v for k, v in supports_n2.items() if v >= min_support_threshold}
print(removed_supports_n2)
combinations_etfs_n3 = []
for combi in combinations(etfs, 3):
    set_combi = set(combi)
    # Append each candidate once if it contains at least one retained pair
    if any(set(k).issubset(set_combi) for k in removed_supports_n2.keys()):
        combinations_etfs_n3.append(combi)
print(combinations_etfs_n3)

{('VGK', 'VWO'): 0.2857142857142857, ('VGK', 'VPL'): 0.2857142857142857, ('VGK', 'VTI'): 0.2857142857142857, ('VWO', 'VPL'): 0.2857142857142857, ('VWO', 'VTI'): 0.2857142857142857, ('VPL', 'VTI'): 0.2857142857142857, ('VXUS', 'VTI'): 0.2857142857142857}
[('VGK', 'VWO', 'VPL'), ('VGK', 'VWO', 'VXUS'), ('VGK', 'VWO', 'VT'), ('VGK', 'VWO', 'VTI'), ('VGK', 'VPL', 'VXUS'), ('VGK', 'VPL', 'VT'), ('VGK', 'VPL', 'VTI'), ('VGK', 'VXUS', 'VTI'), ('VGK', 'VT', 'VTI'), ('VWO', 'VPL', 'VXUS'), ('VWO', 'VPL', 'VT'), ('VWO', 'VPL', 'VTI'), ('VWO', 'VXUS', 'VTI'), ('VWO', 'VT', 'VTI'), ('VPL', 'VXUS', 'VTI'), ('VPL', 'VT', 'VTI'), ('VXUS', 'VT', 'VTI')]


In [18]:
supports_n3 = {combi: calculate_support(set(combi), ser) for combi in combinations_etfs_n3}
print(supports_n3)

{('VGK', 'VWO', 'VPL'): 0.2857142857142857, ('VGK', 'VWO', 'VXUS'): 0.0, ('VGK', 'VWO', 'VT'): 0.0, ('VGK', 'VWO', 'VTI'): 0.2857142857142857, ('VGK', 'VPL', 'VXUS'): 0.0, ('VGK', 'VPL', 'VT'): 0.0, ('VGK', 'VPL', 'VTI'): 0.2857142857142857, ('VGK', 'VXUS', 'VTI'): 0.0, ('VGK', 'VT', 'VTI'): 0.0, ('VWO', 'VPL', 'VXUS'): 0.0, ('VWO', 'VPL', 'VT'): 0.0, ('VWO', 'VPL', 'VTI'): 0.2857142857142857, ('VWO', 'VXUS', 'VTI'): 0.0, ('VWO', 'VT', 'VTI'): 0.0, ('VPL', 'VXUS', 'VTI'): 0.0, ('VPL', 'VT', 'VTI'): 0.0, ('VXUS', 'VT', 'VTI'): 0.0}


## Removing Three-Item Combinations Below the Threshold, Then Computing the Support of Four-Item Combinations

In [19]:
# Keep three-item combinations whose support meets the minimum threshold (inclusive)
removed_supports_n3 = {k: v for k, v in supports_n3.items() if v >= min_support_threshold}
print(removed_supports_n3)
combinations_etfs_n4 = []
for combi in combinations(etfs, 4):
    set_combi = set(combi)
    # Append each candidate once if it contains at least one retained triple
    if any(set(k).issubset(set_combi) for k in removed_supports_n3.keys()):
        combinations_etfs_n4.append(combi)
print(combinations_etfs_n4)

{('VGK', 'VWO', 'VPL'): 0.2857142857142857, ('VGK', 'VWO', 'VTI'): 0.2857142857142857, ('VGK', 'VPL', 'VTI'): 0.2857142857142857, ('VWO', 'VPL', 'VTI'): 0.2857142857142857}
[('VGK', 'VWO', 'VPL', 'VXUS'), ('VGK', 'VWO', 'VPL', 'VT'), ('VGK', 'VWO', 'VPL', 'VTI'), ('VGK', 'VWO', 'VXUS', 'VTI'), ('VGK', 'VWO', 'VT', 'VTI'), ('VGK', 'VPL', 'VXUS', 'VTI'), ('VGK', 'VPL', 'VT', 'VTI'), ('VWO', 'VPL', 'VXUS', 'VTI'), ('VWO', 'VPL', 'VT', 'VTI')]


In [20]:
supports_n4 = {combi: calculate_support(set(combi), ser) for combi in combinations_etfs_n4}
print(supports_n4)

{('VGK', 'VWO', 'VPL', 'VXUS'): 0.0, ('VGK', 'VWO', 'VPL', 'VT'): 0.0, ('VGK', 'VWO', 'VPL', 'VTI'): 0.2857142857142857, ('VGK', 'VWO', 'VXUS', 'VTI'): 0.0, ('VGK', 'VWO', 'VT', 'VTI'): 0.0, ('VGK', 'VPL', 'VXUS', 'VTI'): 0.0, ('VGK', 'VPL', 'VT', 'VTI'): 0.0, ('VWO', 'VPL', 'VXUS', 'VTI'): 0.0, ('VWO', 'VPL', 'VT', 'VTI'): 0.0}


## Removing Four-Item Combinations Below the Threshold, Then Computing the Support of Five-Item Combinations

In [21]:
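# Keep four-item combinations whose support meets the minimum threshold (inclusive)
removed_supports_n4 = {k: v for k, v in supports_n4.items() if v >= min_support_threshold}
combinations_etfs_n5 = []
for combi in combinations(etfs, 5):
    set_combi = set(combi)
    # Append each candidate once if it contains at least one retained four-item combination
    if any(set(k).issubset(set_combi) for k in removed_supports_n4.keys()):
        combinations_etfs_n5.append(combi)
print(combinations_etfs_n5)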

[('VGK', 'VWO', 'VPL', 'VXUS', 'VTI'), ('VGK', 'VWO', 'VPL', 'VT', 'VTI')]


## Stopping When Every n-Item Combination Falls Below the Minimum Support Threshold

In [22]:
supports_n5 = {combi: calculate_support(set(combi), ser) for combi in combinations_etfs_n5}
print(supports_n5)

{('VGK', 'VWO', 'VPL', 'VXUS', 'VTI'): 0.0, ('VGK', 'VWO', 'VPL', 'VT', 'VTI'): 0.0}


## Computing the Confidence of Rules from the (n-1)-Item Combinations

In [23]:
final_etf_set = set(list(removed_supports_n4.keys())[0])
print(final_etf_set)

{'VPL', 'VGK', 'VTI', 'VWO'}


In [24]:
# The second argument is the combined itemset (antecedent plus consequent)
print(calculate_confidence({"VTI"}, {"VTI", "VGK"}, ser))
print(calculate_confidence({"VTI"}, {"VTI", "VXUS"}, ser))

0.5
0.5


In [25]:
# Rule {VTI, VGK} -> {VWO, VPL}: pass the combined itemset as the second argument
print(calculate_confidence({"VTI", "VGK"}, {"VTI", "VGK", "VWO", "VPL"}, ser))

1.0


In [26]:
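# Rule {VTI, VGK, VWO} -> {VPL}: pass the combined itemset as the second argument
print(calculate_confidence({"VTI", "VGK", "VWO"}, {"VTI", "VGK", "VWO", "VPL"}, ser))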

1.0

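Rather than checking rules one at a time, every rule that can be formed from the final itemset can be enumerated by splitting it into an antecedent and a consequent. The sketch below assumes the toy transactions and the four-item set found above; the names `support` and `rules` are hypothetical.

```python
from itertools import combinations

transactions = ["VT", "VTI,VXUS", "VTI,VGK,VPL,VWO", "VTI,VPL,VWO,VGK", "VT", "VTI,VXUS", "VT"]
baskets = [set(t.split(",")) for t in transactions]


def support(itemset) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)


final_itemset = frozenset({"VTI", "VGK", "VWO", "VPL"})
rules = []
for k in range(1, len(final_itemset)):
    for antecedent in combinations(sorted(final_itemset), k):
        antecedent = frozenset(antecedent)
        consequent = final_itemset - antecedent
        # Confidence = support of the whole itemset / support of the antecedent
        conf = support(final_itemset) / support(antecedent)
        rules.append((sorted(antecedent), sorted(consequent), conf))

for antecedent, consequent, conf in rules:
    print(antecedent, "->", consequent, round(conf, 3))
```

Only the rule whose antecedent is {VTI} alone falls below full confidence, because VTI also appears in the two {VTI, VXUS} baskets.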

## Association Rule Modules

## Common Python Modules for Association Rules

- `apyori`: <https://github.com/ymoch/apyori>
- `mlxtend`: <https://rasbt.github.io/mlxtend>
- `pycaret`: <https://github.com/pycaret/pycaret>

## Loading the Transaction Data

In [27]:
df = pd.read_csv("https://raw.githubusercontent.com/datainpoint/classroom-tabf-2023/main/etf_retail.csv")
df.head()

||InvoiceNo|StockCode|Description|Quantity|InvoiceDate|UnitPrice|CustomerID|
|:-|:-|:-|:-|:-|:-|:-|:-|
|0|536370|22728|VGK|24|12/1/2010 8:45|3.75|12583|
|1|536370|22727|VPL|24|12/1/2010 8:45|3.75|12583|
|2|536370|22726|VWO|12|12/1/2010 8:45|3.75|12583|
|3|536370|22629|VT|24|12/1/2010 8:45|1.95|12583|
|4|537065|22727|VPL|4|12/5/2010 11:57|3.75|12567|


## Preprocessing the Transaction Data

In [28]:
def return_true(x):
    return True
pivoted_df = pd.pivot_table(df, values='Quantity', index=['InvoiceNo'],
                            columns=['Description'], fill_value=False, aggfunc=return_true)
pivoted_df

|InvoiceNo|VGK|VPL|VT|VTI|VWO|VXUS|
|:-|:-|:-|:-|:-|:-|:-|
|536370|True|True|True|False|True|False|
|537065|True|True|False|False|True|False|
|538008|False|False|True|False|False|False|
|540455|True|True|False|False|True|False|
|540642|True|True|True|False|True|False|
|...|...|...|...|...|...|...|
|580756|False|False|False|True|False|True|
|581001|True|False|False|False|True|False|
|581587|True|True|True|True|True|True|
|C537893|False|False|True|False|False|False|

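One alternative to `pd.pivot_table` with a constant-returning `aggfunc` is `pd.crosstab` followed by a comparison, which yields a boolean table directly. The sketch below uses a small hypothetical DataFrame (`sample`) in the same long format as the loaded data; the invoice numbers and quantities are made up for illustration.

```python
import pandas as pd

# Hypothetical rows in the same long format as the loaded data (one row per invoice line)
sample = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A3", "A3"],
    "Description": ["VTI", "VXUS", "VT", "VTI", "VTI", "VGK"],
    "Quantity": [10, 5, 3, 2, 4, 1],
})

# crosstab counts the invoice lines per (invoice, item) pair;
# comparing with 0 turns the counts into True/False flags
basket = pd.crosstab(sample["InvoiceNo"], sample["Description"]) > 0
print(basket)
```

Repeated lines for the same item on one invoice (as in `A3`) still produce a single `True`.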

## Using the Apriori Algorithm

In [29]:
from mlxtend.frequent_patterns import apriori

frequent_itemsets = apriori(pivoted_df, min_support=0.1, use_colnames=True)
frequent_itemsets

||support|itemsets|
|:-|:-|:-|
|0|0.350877|(VGK)|
|1|0.324561|(VPL)|
|2|0.535088|(VT)|
|3|0.236842|(VTI)|
|4|0.342105|(VWO)|
|5|0.254386|(VXUS)|
|6|0.254386|(VGK, VPL)|
|7|0.114035|(VGK, VT)|
|8|0.254386|(VGK, VWO)|
|9|0.114035|(VT, VPL)|


## Generating Association Rules

In [30]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.51)

||antecedents|consequents|antecedent support|consequent support|support|confidence|lift|leverage|conviction|zhangs_metric|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|0|(VGK)|(VPL)|0.350877|0.324561|0.254386|0.725|2.233784|0.140505|2.45614|0.850885|
|1|(VPL)|(VGK)|0.324561|0.350877|0.254386|0.783784|2.233784|0.140505|3.002193|0.817734|
|2|(VGK)|(VWO)|0.350877|0.342105|0.254386|0.725|2.119231|0.134349|2.392344|0.813607|
|3|(VWO)|(VGK)|0.342105|0.350877|0.254386|0.74359|2.119231|0.134349|2.531579|0.802759|
|4|(VWO)|(VPL)|0.342105|0.324561|0.27193|0.794872|2.449064|0.160896|3.292763|0.899355|
|5|(VPL)|(VWO)|0.324561|0.342105|0.27193|0.837838|2.449064|0.160896|4.057018|0.875995|
|6|(VTI)|(VXUS)|0.236842|0.254386|0.219298|0.925926|3.639847|0.159049|10.065789|0.950345|
|7|(VXUS)|(VTI)|0.254386|0.236842|0.219298|0.862069|3.639847|0.159049|5.532895|0.972706|
|8|(VGK, VT)|(VPL)|0.114035|0.324561|0.105263|0.923077|2.844075|0.068252|8.780702|0.731848|
|9|(VPL, VT)|(VGK)|0.114035|0.350877|0.105263|0.923077|2.630769|0.065251|8.438596|0.69967|

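Several columns in the rules output can be recomputed from the support columns alone. The sketch below does this for the first rule, (VGK) -> (VPL), using the rounded values shown in the output, so the results agree only up to rounding. The formulas are the standard definitions of lift, leverage, and conviction.

```python
import math

# Rounded values copied from the first rule, (VGK) -> (VPL), in the output above
antecedent_support = 0.350877
consequent_support = 0.324561
support = 0.254386
confidence = 0.725

# Lift: confidence relative to the consequent's baseline frequency
lift = confidence / consequent_support
# Leverage: observed joint support minus the joint support expected under independence
leverage = support - antecedent_support * consequent_support
# Conviction: how much more often the rule would fail if X and Y were independent
conviction = (1 - consequent_support) / (1 - confidence)

print(round(lift, 4), round(leverage, 4), round(conviction, 4))
print(math.isclose(support / antecedent_support, confidence, rel_tol=1e-4))
```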