In [11]:
# Find associations within a data set (which items are bought together)
import numpy as np
from itertools import combinations, groupby
from collections import Counter

# Sample data
orders = np.array([[1,'apple'], [1,'egg'], [1,'milk'], [2,'egg'], [2,'milk']], dtype=object)
print('orders print\n', orders)
# Generator that yields item pairs, one at a time
def get_item_pairs(order_item):
    
    # For each order, generate a list of items in that order.
    # groupby only groups consecutive rows, so order_item must already be sorted by order id
    for order_id, order_object in groupby(order_item, lambda x: x[0]):
        # Collect the items of all rows that share the same order id (column 0)
        item_list = [item[1] for item in order_object]
        print('item_list print\n', item_list)
        
        # For each item list, generate item pairs, one at a time
        for item_pair in combinations(item_list, 2):
            # item_pair is one 2-item combination of item_list
            print('item_pair print', item_pair)
            # yield makes this function a generator: each pair is handed to the caller lazily
            yield item_pair
            
# Counter iterates through the item pairs returned by our generator and keeps a tally of their occurrence
Counter(get_item_pairs(orders))
# Result: how many times each item pair occurs across all orders

orders print
 [[1 'apple']
 [1 'egg']
 [1 'milk']
 [2 'egg']
 [2 'milk']]
item_list print
 ['apple', 'egg', 'milk']
item_pair print ('apple', 'egg')
item_pair print ('apple', 'milk')
item_pair print ('egg', 'milk')
item_list print
 ['egg', 'milk']
item_pair print ('egg', 'milk')


Counter({('apple', 'egg'): 1, ('apple', 'milk'): 1, ('egg', 'milk'): 2})

In [None]:
**Apriori algorithm: finding associations**
order 1: apple, egg, milk  
order 2: carrot, milk  
order 3: apple, egg, carrot
order 4: apple, egg
order 5: apple, carrot

Scan the 5 orders one by one and keep only the item sets that occur in at least 3 orders (minimum occurrence threshold = 3).
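Iteration 1: Count the number of times each item occurs
item set      occurrence count
{apple}              4
{egg}                3
{milk}               2
{carrot}             3

{milk} is eliminated because it does not meet the minimum occurrence threshold of 3.
{carrot} appears in orders 2, 3 and 5, so it meets the threshold and survives this iteration.

The surviving items are combined into 2-item sets and each pair is counted; {apple, egg} appears 3 times.
Iteration 2: Build item sets of size 2 using the remaining items from Iteration 1
             (i.e. apple, egg, carrot)
item set           occurrence count
{apple, egg}             3
{apple, carrot}          2
{egg, carrot}            1

{apple, carrot} and {egg, carrot} fall below the threshold. Only {apple, egg} remains, and the algorithm stops since no 3-item set can be built from the frequent pairs.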
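The level-wise search above can be sketched in plain Python. This is a minimal sketch, assuming each order is represented as a Python set and using the same minimum count of 3; the names `apriori` and `count_itemsets` are illustrative only and do not come from any library. Besides building size-k candidates from the surviving (k-1)-item sets, it applies the usual Apriori pruning step: a candidate is kept only if all of its (k-1)-item subsets are frequent. The printed counts are computed directly from the five orders listed above.

```python
from itertools import combinations

# The five example orders from the text, one set of items per order
orders = [
    {"apple", "egg", "milk"},
    {"carrot", "milk"},
    {"apple", "egg", "carrot"},
    {"apple", "egg"},
    {"apple", "carrot"},
]

MIN_COUNT = 3  # minimum occurrence threshold used in the walkthrough


def count_itemsets(candidates, transactions):
    """Count how many transactions contain every item of each candidate set."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def apriori(transactions, min_count):
    """Level-wise Apriori search; returns {frozenset: count} for frequent item sets."""
    # Iteration 1: every single item is a candidate
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        counts = count_itemsets(candidates, transactions)
        # Keep only the candidates that meet the threshold
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent.update(level)
        k += 1
        # Join: union pairs of frequent (k-1)-sets that give a k-item set
        prev = list(level)
        joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-item subset
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Print frequent item sets, smallest first
result = apriori(orders, MIN_COUNT)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
# ['apple'] 4
# ['carrot'] 3
# ['egg'] 3
# ['apple', 'egg'] 3
```

The loop stops when no candidates of the next size can be formed; here the single frequent pair {apple, egg} cannot be joined with anything, so no 3-item candidates are generated.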


Associations found with the Apriori algorithm are described by three measures:
1) support
- the fraction of all orders that contain a given item set
In the example above, support{apple, egg} = 3/5 = 60%.
2) confidence
- the probability that an order containing A also contains B (the rule A->B)
confidence{A->B} = support{A,B} / support{A}   
confidence{B->A} = support{A,B} / support{B}   
confidence{apple->egg} = support{apple,egg} / support{apple}
                                    = (3/5) / (4/5)
                                    = 0.75 or 75%
confidence{egg->apple} = support{apple,egg} / support{egg}
                                    = (3/5) / (3/5)
                                    = 1 or 100%  
3) lift
- how much more often A and B occur together than they would if they were independent
lift{A,B} = lift{B,A} = support{A,B} / (support{A} * support{B})   
lift{apple,egg} = lift{egg,apple} = support{apple,egg} / (support{apple} * support{egg})
                  = (3/5) / (4/5 * 3/5) 
                  = 1.25    
 * lift = 1 implies no relationship between A and B.
   (i.e. A and B occur together only as often as expected by chance)

 * lift > 1 implies a positive relationship between A and B.
   (i.e. A and B occur together more often than expected by chance)

 * lift < 1 implies a negative relationship between A and B.
   (i.e. A and B occur together less often than expected by chance)
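The three formulas can be checked against the worked numbers with a short script. This is a minimal sketch, assuming each order is a Python set; the function names `support`, `confidence` and `lift` are illustrative and not taken from any particular library. `fractions.Fraction` keeps the arithmetic exact, so the results match the hand-computed fractions.

```python
from fractions import Fraction

# The five example orders from the text, one set of items per order
orders = [
    {"apple", "egg", "milk"},
    {"carrot", "milk"},
    {"apple", "egg", "carrot"},
    {"apple", "egg"},
    {"apple", "carrot"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(a, b, transactions):
    """confidence{A->B} = support{A,B} / support{A} (fails if support{A} is 0)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """lift{A,B} = support{A,B} / (support{A} * support{B})."""
    return support(set(a) | set(b), transactions) / (
        support(a, transactions) * support(b, transactions)
    )


print(support({"apple", "egg"}, orders))       # 3/5
print(confidence({"apple"}, {"egg"}, orders))  # 3/4
print(confidence({"egg"}, {"apple"}, orders))  # 1
print(lift({"apple"}, {"egg"}, orders))        # 5/4
```

Wrapping any result in `float(...)` gives the decimal forms used in the text (0.6, 0.75, 1.0, 1.25). Lift is symmetric, so `lift({"egg"}, {"apple"}, orders)` returns the same value.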

Conclusion: since lift{apple, egg} = 1.25 > 1, we conclude that there is a positive relationship between apple and egg.