# Getting the dataset (Online Retail Dataset)
Online Retail Dataset: a transactional dataset containing the records generated between 1 December 2010 and 9 December 2011 by a UK-based, registered online retailer.

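It contains the following fields.

* InvoiceNo: invoice number. A 6-digit integer uniquely assigned to each transaction.<br>If this code starts with the letter "c", it indicates a cancellation.
* StockCode: product code. A 5-digit integer uniquely assigned to each product.
* Description: product name.
* Quantity: the quantity of each product per transaction.
* InvoiceDate: invoice date and time. The date and time at which each transaction was generated.
* UnitPrice: unit price.
* CustomerID: customer number. A 5-digit integer uniquely assigned to each customer.
* Country: the name of the country where each customer resides.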
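
As a rough illustration of how these fields can be handled, the sketch below builds a tiny hand-made frame with the same column names, parses `InvoiceDate`, and flags cancelled invoices. The cancelled row (invoice `C536379`) and the helper name `flag_cancellations` are made up for the example, and matching the prefix case-insensitively is an assumption.

```python
import pandas as pd

# Tiny hand-made sample using the column names listed above.
# The third row (a cancellation) is hypothetical and only there for illustration.
sample = pd.DataFrame({
    "InvoiceNo": ["536370", "536370", "C536379"],
    "StockCode": ["22728", "22727", "22728"],
    "Description": [
        "ALARM CLOCK BAKELIKE PINK",
        "ALARM CLOCK BAKELIKE RED",
        "ALARM CLOCK BAKELIKE PINK",
    ],
    "Quantity": [24, 24, -1],
    "InvoiceDate": ["12/1/2010 8:45", "12/1/2010 8:45", "12/1/2010 9:41"],
    "UnitPrice": [3.75, 3.75, 3.75],
    "CustomerID": [12583.0, 12583.0, 12583.0],
    "Country": ["France", "France", "France"],
})


def flag_cancellations(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with InvoiceDate parsed and an IsCancelled column added."""
    out = df.copy()
    # Dates in the preview look like month/day/year hour:minute
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"], format="%m/%d/%Y %H:%M")
    # Assumption: treat both "c" and "C" prefixes as cancellations
    out["IsCancelled"] = out["InvoiceNo"].astype(str).str.upper().str.startswith("C")
    return out


flagged = flag_cancellations(sample)
print(flagged[["InvoiceNo", "InvoiceDate", "IsCancelled"]])
```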

In [1]:
# Fetch the dataset used in the PyCaret tutorials
# See <https://pycaret.org/get-data/> for details
from pycaret.datasets import get_data

dataset = get_data('france')
dataset.to_csv('./dataset.csv')

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536370,22728,ALARM CLOCK BAKELIKE PINK,24,12/1/2010 8:45,3.75,12583.0,France
1,536370,22727,ALARM CLOCK BAKELIKE RED,24,12/1/2010 8:45,3.75,12583.0,France
2,536370,22726,ALARM CLOCK BAKELIKE GREEN,12,12/1/2010 8:45,3.75,12583.0,France
3,536370,21724,PANDA AND BUNNIES STICKER SHEET,12,12/1/2010 8:45,0.85,12583.0,France
4,536370,21883,STARS GIFT TAPE,24,12/1/2010 8:45,0.65,12583.0,France


In [2]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8557 entries, 0 to 8556
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   InvoiceNo    8557 non-null   object 
 1   StockCode    8557 non-null   object 
 2   Description  8557 non-null   object 
 3   Quantity     8557 non-null   int64  
 4   InvoiceDate  8557 non-null   object 
 5   UnitPrice    8557 non-null   float64
 6   CustomerID   8491 non-null   float64
 7   Country      8557 non-null   object 
dtypes: float64(2), int64(1), object(5)
memory usage: 534.9+ KB


In [3]:
print(f'Data: {dataset.shape} {dataset.index}')

Data: (8557, 8) RangeIndex(start=0, stop=8557, step=1)


# Setting up the data in PyCaret

In [4]:
# Import the association rule mining module
from pycaret.arules import *

In [5]:
# Passing session_id fixes the random seed
# When setup completes, it displays information about the data and the preprocessing pipeline
# See <https://pycaret.org/setup/> for details
exp = setup(data=dataset, transaction_id='InvoiceNo', item_id='Description', session_id=42)

Description,Value
session_id,42.0
# Transactions,461.0
# Items,1565.0
Ignore Items,
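
To make the two summary counts concrete: `transaction_id` names the column that groups rows into baskets, and `item_id` names the column that holds the items. The sketch below reproduces that grouping with plain pandas on a hand-made frame. It only illustrates the idea and is not PyCaret's internal code; the second invoice (`536371`) is made up.

```python
import pandas as pd

# Hand-made rows: invoice 536370 follows the preview above, 536371 is hypothetical
rows = pd.DataFrame({
    "InvoiceNo": ["536370", "536370", "536370", "536371", "536371"],
    "Description": [
        "ALARM CLOCK BAKELIKE PINK",
        "ALARM CLOCK BAKELIKE RED",
        "STARS GIFT TAPE",
        "ALARM CLOCK BAKELIKE PINK",
        "STARS GIFT TAPE",
    ],
})

# Number of distinct baskets and distinct items
n_transactions = rows["InvoiceNo"].nunique()
n_items = rows["Description"].nunique()

# One row per invoice, one boolean column per item
basket = pd.crosstab(rows["InvoiceNo"], rows["Description"]) > 0

print(n_transactions, n_items)
print(basket)
```

Applied to the full dataset, the same two `nunique()` calls on `InvoiceNo` and `Description` give the kind of counts that setup reports as `# Transactions` and `# Items`.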


In [6]:
print(exp)

(     InvoiceNo StockCode                      Description  Quantity  \
0       536370     22728        ALARM CLOCK BAKELIKE PINK        24   
1       536370     22727        ALARM CLOCK BAKELIKE RED         24   
2       536370     22726       ALARM CLOCK BAKELIKE GREEN        12   
3       536370     21724  PANDA AND BUNNIES STICKER SHEET        12   
4       536370     21883                 STARS GIFT TAPE         24   
...        ...       ...                              ...       ...   
8552    581587     22613      PACK OF 20 SPACEBOY NAPKINS        12   
8553    581587     22899     CHILDREN'S APRON DOLLY GIRL          6   
8554    581587     23254    CHILDRENS CUTLERY DOLLY GIRL          4   
8555    581587     23255  CHILDRENS CUTLERY CIRCUS PARADE         4   
8556    581587     22138    BAKING SET 9 PIECE RETROSPOT          3   

          InvoiceDate  UnitPrice  CustomerID Country  
0      12/1/2010 8:45       3.75     12583.0  France  
1      12/1/2010 8:45       3.75    

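# Creating the model
For association rule mining, create_model() has no required parameters. It accepts the following four optional parameters.

<b>metric</b>: the metric used to evaluate whether a rule is of interest.<br>The default is 'confidence'.<br>The other accepted values are 'support', 'lift', 'leverage', and 'conviction'.

<b>threshold</b>: the minimum value of the chosen metric that a candidate rule must reach to be kept. The default is 0.5.

<b>min_support</b>: the minimum support of the returned itemsets (0.0 to 1.0).<br>Support is computed as transactions_where_item(s)_occur / total_transactions. The default is 0.05.

<b>round</b>: the number of decimal places to which values in the score grid are rounded.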
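
The metrics named above can be worked out by hand from transaction counts. The sketch below uses the standard definitions of support, confidence, lift, leverage, and conviction. The helper `rule_metrics` is hypothetical, and the counts in the usage line (50, 54, and 48 out of 461 transactions) are an assumption: they were back-calculated from the rounded values shown for the rule (SET/6 RED SPOTTY PAPER PLATES) → (SET/6 RED SPOTTY PAPER CUPS) in this notebook's output, not read from the data.

```python
import math


def rule_metrics(n_total, n_antecedent, n_consequent, n_both, ndigits=4):
    """Metrics for a rule A -> C, computed from raw transaction counts.

    n_total      : total number of transactions
    n_antecedent : transactions containing A
    n_consequent : transactions containing C
    n_both       : transactions containing both A and C
    """
    antecedent_support = n_antecedent / n_total
    consequent_support = n_consequent / n_total
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    # Conviction is infinite when the rule never fails (confidence == 1)
    if n_both == n_antecedent:
        conviction = math.inf
    else:
        conviction = (1 - consequent_support) / (1 - confidence)

    metrics = {
        "antecedent support": antecedent_support,
        "consequent support": consequent_support,
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }
    # Round finite values only, so that inf is passed through unchanged
    return {k: round(v, ndigits) if math.isfinite(v) else v for k, v in metrics.items()}


# Assumed counts, back-calculated from the rounded table values
plates_to_cups = rule_metrics(n_total=461, n_antecedent=50, n_consequent=54, n_both=48)
print(plates_to_cups)
```

With the default settings, a rule is kept when its itemset support is at least 0.05 and its confidence is at least 0.5, so a rule with these counts would pass both filters.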

In [7]:
# Create the model with the default parameters (no arguments are required)
arule_model = create_model()

In [8]:
print(arule_model)

                                           antecedents  \
0                         (JUMBO BAG WOODLAND ANIMALS)   
1    (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...   
2    (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...   
3    (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...   
4    (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...   
..                                                 ...   
136                (STRAWBERRY LUNCH BOX WITH CUTLERY)   
137                           (LUNCH BAG APPLE DESIGN)   
138                           (LUNCH BAG APPLE DESIGN)   
139          (PLASTERS IN TIN CIRCUS PARADE , POSTAGE)   
140                 (LUNCH BAG RED RETROSPOT, POSTAGE)   

                             consequents  antecedent support  \
0                              (POSTAGE)              0.0651   
1          (SET/6 RED SPOTTY PAPER CUPS)              0.0868   
2        (SET/6 RED SPOTTY PAPER PLATES)              0.0868   
3          (SET/6 RED SPOTTY PAPER CUPS)       

In [9]:
arule_model.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(JUMBO BAG WOODLAND ANIMALS),(POSTAGE),0.0651,0.6746,0.0651,1.0,1.4823,0.0212,inf
1,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",(SET/6 RED SPOTTY PAPER CUPS),0.0868,0.1171,0.0846,0.975,8.3236,0.0744,35.3145
2,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.0868,0.1085,0.0846,0.975,8.9895,0.0752,35.6616
3,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",(SET/6 RED SPOTTY PAPER CUPS),0.0716,0.1171,0.0694,0.9697,8.2783,0.061,29.1345
4,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.0716,0.1085,0.0694,0.9697,8.9406,0.0617,29.4208


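# Setup (with ignore_items)
In the previous example, the top-ranked rule involves POSTAGE (shipping charges), which is an obvious association, so the following example runs setup again with POSTAGE ignored.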

In [10]:
exp_arul101 = setup(data=dataset, transaction_id='InvoiceNo', item_id='Description', ignore_items=['POSTAGE'], session_id=42)

Description,Value
session_id,42
# Transactions,461
# Items,1565
Ignore Items,['POSTAGE']
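
Conceptually, ignoring an item amounts to dropping its rows before the baskets are built. The sketch below shows that idea in plain pandas on a hand-made frame. Whether PyCaret implements `ignore_items` this way internally is an assumption, and the helper `drop_items` is hypothetical.

```python
import pandas as pd


def drop_items(df: pd.DataFrame, item_col: str, items: list) -> pd.DataFrame:
    """Return a copy of df without the rows whose item is in `items`."""
    return df[~df[item_col].isin(items)].reset_index(drop=True)


# Hand-made rows for illustration only
rows = pd.DataFrame({
    "InvoiceNo": ["536370", "536370", "536370"],
    "Description": ["ALARM CLOCK BAKELIKE PINK", "STARS GIFT TAPE", "POSTAGE"],
})

filtered = drop_items(rows, item_col="Description", items=["POSTAGE"])
print(filtered)
```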


In [11]:
arule_model2 = create_model()

In [12]:
print(arule_model2)

                                          antecedents  \
0   (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...   
1   (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...   
2                     (SET/6 RED SPOTTY PAPER PLATES)   
3                       (CHILDRENS CUTLERY SPACEBOY )   
4                       (SET/6 RED SPOTTY PAPER CUPS)   
5                     (CHILDRENS CUTLERY DOLLY GIRL )   
6   (ALARM CLOCK BAKELIKE PINK, ALARM CLOCK BAKELI...   
7   (ALARM CLOCK BAKELIKE GREEN, ALARM CLOCK BAKEL...   
8                         (ALARM CLOCK BAKELIKE RED )   
9   (SET/6 RED SPOTTY PAPER PLATES, SET/6 RED SPOT...   
10  (ALARM CLOCK BAKELIKE GREEN, ALARM CLOCK BAKEL...   
11                    (SET/6 RED SPOTTY PAPER PLATES)   
12                       (ALARM CLOCK BAKELIKE GREEN)   
13                        (ALARM CLOCK BAKELIKE RED )   
14                    (SET/6 RED SPOTTY PAPER PLATES)   
15              (SET/20 RED RETROSPOT PAPER NAPKINS )   
16              (SET/20 RED RET

In [13]:
arule_model2.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.0868,0.1085,0.0846,0.975,8.9895,0.0752,35.6616
1,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",(SET/6 RED SPOTTY PAPER CUPS),0.0868,0.1171,0.0846,0.975,8.3236,0.0744,35.3145
2,(SET/6 RED SPOTTY PAPER PLATES),(SET/6 RED SPOTTY PAPER CUPS),0.1085,0.1171,0.1041,0.96,8.1956,0.0914,22.0716
3,(CHILDRENS CUTLERY SPACEBOY ),(CHILDRENS CUTLERY DOLLY GIRL ),0.0586,0.0629,0.0542,0.9259,14.719,0.0505,12.6508
4,(SET/6 RED SPOTTY PAPER CUPS),(SET/6 RED SPOTTY PAPER PLATES),0.1171,0.1085,0.1041,0.8889,8.1956,0.0914,8.0239
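
Because the result is displayed as a regular table, the rules can be re-ranked by a different metric after the fact. The sketch below rebuilds the five rows shown above as a small pandas frame (item names copied as displayed, including the truncated ones) and sorts them by lift. The lift cutoff of 10 is an arbitrary value chosen for the example.

```python
import pandas as pd

# The five rules shown above; truncated names are kept exactly as displayed
rules = pd.DataFrame({
    "antecedents": [
        "(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",
        "(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",
        "(SET/6 RED SPOTTY PAPER PLATES)",
        "(CHILDRENS CUTLERY SPACEBOY )",
        "(SET/6 RED SPOTTY PAPER CUPS)",
    ],
    "consequents": [
        "(SET/6 RED SPOTTY PAPER PLATES)",
        "(SET/6 RED SPOTTY PAPER CUPS)",
        "(SET/6 RED SPOTTY PAPER CUPS)",
        "(CHILDRENS CUTLERY DOLLY GIRL )",
        "(SET/6 RED SPOTTY PAPER PLATES)",
    ],
    "confidence": [0.975, 0.975, 0.96, 0.9259, 0.8889],
    "lift": [8.9895, 8.3236, 8.1956, 14.719, 8.1956],
})

# Rank by lift instead of confidence
by_lift = rules.sort_values("lift", ascending=False).reset_index(drop=True)

# Keep only rules above an arbitrary lift cutoff
strong = by_lift[by_lift["lift"] > 10]

print(by_lift[["antecedents", "consequents", "lift"]])
```

Sorting by lift instead of confidence moves the children's cutlery rule to the top, since its lift is the highest of the five even though its confidence is not.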


# Plotting the model

In [14]:
# Plot the rules with the default (2D) plot
plot_model(arule_model2)

In [15]:
# Plot the same rules in 3D
plot_model(arule_model2, plot='3d')