# 🛒 **Association Rule Analysis Assignment**

### <span style="color:black; background-color:#F5F5F5;"> **Q1. Given the association rule {milk} → {cookies}, explain what each of the following terms means.** </span>
#### <span style="color:black; background-color:#F5F5F5;">**① Support ② Confidence ③ Lift** </span>

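Answer:
- Support: the share of all transactions that contain both milk and cookies, i.e. P(milk ∩ cookies).
- Confidence: among the transactions that contain milk, the share that also contain cookies, i.e. P(cookies | milk).
- Lift: the probability of buying cookies given that milk was bought, divided by the overall probability of buying cookies, i.e. P(cookies | milk) / P(cookies). A value above 1 means milk buyers are more likely than the average customer to buy cookies; a value of 1 means the two items are independent.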
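The three definitions above can be checked by hand on a tiny example. The sketch below uses five made-up baskets (hypothetical data, not the Ing-Mart data) and a helper named `support` that is introduced here purely for illustration.

```python
# Toy baskets (hypothetical data, for illustration only)
baskets = [
    {"milk", "cookies"},
    {"milk", "cookies", "bread"},
    {"milk"},
    {"cookies"},
    {"bread"},
]

def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

sup_rule = support({"milk", "cookies"}, baskets)       # P(milk and cookies)
conf_rule = sup_rule / support({"milk"}, baskets)      # P(cookies | milk)
lift_rule = conf_rule / support({"cookies"}, baskets)  # P(cookies | milk) / P(cookies)

print(f"support={sup_rule:.3f}, confidence={conf_rule:.3f}, lift={lift_rule:.3f}")
```

Here 2 of 5 baskets contain both items, 3 contain milk and 3 contain cookies, so the lift comes out slightly above 1: milk buyers are a little more likely than average to buy cookies.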

### <span style="color:black; background-color:#F5F5F5;"> **Q2. Explain why the number of candidate itemsets Apriori has to process grows exponentially, and how FP-Growth addresses this.** </span>

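Answer: Apriori generates candidate itemsets level by level (1-itemsets, 2-itemsets, ...) and scans the dataset to count each of them. With n distinct items there are up to 2^n − 1 possible itemsets, so the number of candidates grows exponentially as the number of items increases.<br>FP-Growth instead compresses the transactions into a tree (the FP-tree) and extracts the patterns that meet the minimum support directly from that tree, without generating candidates. It only needs to scan the dataset twice, which is why it is generally much faster than Apriori.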
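To get a feel for the growth, the sketch below counts how many itemsets of each size exist for a given number of distinct items. The figure of 100 items is a hypothetical choice, and these are upper bounds before any pruning: Apriori discards candidates whose subsets are infrequent, so the real counts depend on the minimum support.

```python
from math import comb

def candidate_counts(n_items, max_size):
    """Number of possible k-itemsets for k = 1..max_size (no pruning applied)."""
    return {k: comb(n_items, k) for k in range(1, max_size + 1)}

n_items = 100  # hypothetical number of distinct items
counts = candidate_counts(n_items, max_size=4)
for k, c in counts.items():
    print(f"{k}-itemsets: {c:,}")

# All non-empty itemsets of any size
total = 2 ** n_items - 1
print(f"all sizes: {total:,}")
```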

# <span style="color:black; background-color:#F5F5F5;"> 💸 **Ing-Mart Customer Basket Pattern Analysis and Business Strategy Using Association Analysis** </span>

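<strong>Ing-Mart is back again and refuses to die..! 🤣🫥😫🙃<br>
The assignment for this advanced association analysis session is to analyze Ing-Mart customers' shopping baskets and build a strategy~ </strong>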



<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 Using the concepts and lab work from the last advanced session, fill in the blanks below, analyze the basket results, and propose a strategy that fits them! </strong>
</span>

Got it.

## **1️⃣ Loading and Preprocessing the Data**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth
import warnings
warnings.simplefilter(action='ignore', category=DeprecationWarning)

In [2]:
# !pip install mlxtend

- Load the data
  - Use the original data you worked with in the first Insicon (인사이콘)!
  - Set the file paths to match your environment~

In [2]:
transaction_data = pd.read_csv('transaction_data.csv')
product = pd.read_csv('product.csv')

In [3]:
transaction_data.head()

Unnamed: 0,Household_ID,Basket_ID,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match)
0,1803,30780785930,1065887,338,252,1,8.99,1419,37,-1.0,0.0,0.0
1,2299,33768622588,1073244,446,456,1,1.0,2030,66,-1.59,0.0,0.0
2,158,30202616809,7025114,343,225,1,1.5,1246,33,-0.5,-1.0,0.0
3,2347,42076926172,1064299,438,695,1,2.5,1430,100,-2.89,0.0,0.0
4,1430,31625201009,1040197,31742,312,1,2.29,1423,45,0.0,0.0,0.0


- Outlier handling
  - This is provided as an example; add any further preprocessing you find necessary!

In [4]:
# Compute the Z-score of the Quantity column and store its absolute value in a new 'z_score' column
transaction_data["z_score"] = np.abs(stats.zscore(transaction_data["Quantity"]))

# Extract outliers whose absolute Z-score exceeds 3 (more than 3 standard deviations from the mean)
outliers_zscore = transaction_data[transaction_data["z_score"] > 3]
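As a small illustration of the rule used in this cell, the sketch below applies the same |z| > 3 test to made-up quantities using only the standard library. The data is hypothetical; `pstdev` is used because `scipy.stats.zscore` defaults to the population standard deviation (`ddof=0`).

```python
from statistics import mean, pstdev

# Hypothetical purchase quantities: mostly 1, with one very large order
quantities = [1] * 19 + [50]

mu = mean(quantities)
sigma = pstdev(quantities)  # population standard deviation (ddof=0)
abs_z = [abs((q - mu) / sigma) for q in quantities]

# Same rule as the cell above: flag rows with |z| > 3
outliers = [q for q, z in zip(quantities, abs_z) if z > 3]
print(outliers)

# An absolute value is never negative, so a lower bound such as `> -3`
# on an absolute Z-score never removes anything
lower_bound_is_noop = all(z > -3 for z in abs_z)
print(lower_bound_is_noop)
```

Because the stored `z_score` column is already an absolute value, one upper bound is enough to separate outliers from the rest.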

- Build the customer-product matrix

In [7]:
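# Build the household-product pivot table (rows: households, columns: products, values: total spend)
user_item_matrix = transaction_data.pivot_table(
    index='Household_ID',     # one row per household
    columns='Product_ID',     # one column per product
    values='Sales_Value',     # amount spent
    aggfunc='sum',            # total spend per household and product
    fill_value=0              # 0 where there is no purchase history
)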
user_item_matrix

Household_ID \ Product_ID,25671,26081,26093,26190,26355,26426,26540,26601,26636,26691,...,18273019,18273051,18273115,18273133,18292005,18293142,18293439,18293696,18294080,18316298
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2497,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2498,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2499,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
# Check the matrix size (number of households × number of products)
user_item_matrix.shape

(2500, 92339)

- Filter out users and products with few purchases

In [10]:
# Filtering thresholds
min_product_purchases = 10   # keep only products bought by at least 10 households
min_user_purchases = 2       # keep only households that bought at least 2 distinct products

# Number of households that bought each product
product_purchase_count = (user_item_matrix > 0).sum()

# Number of distinct products each household bought
user_purchase_count = (user_item_matrix > 0).sum(axis=1)

# Select the products and households that meet the thresholds
filtered_products = product_purchase_count[product_purchase_count >= min_product_purchases].index
filtered_users = user_purchase_count[user_purchase_count >= min_user_purchases].index

# Extract the filtered matrix
filtered_matrix = user_item_matrix.loc[filtered_users, filtered_products]
print(f"\n2. Filtered Matrix Shape: {filtered_matrix.shape}")


2. Filtered Matrix Shape: (2500, 23326)


- Create transaction data with outliers and non-positive quantities removed

In [19]:
# Remove outliers by Z-score and keep only rows with a positive purchase quantity.
# z_score already holds the absolute value, so a single upper bound (|z| < 3) is enough.
transaction_data_cleaned = transaction_data[
    (transaction_data['z_score'] < 3) &
    (transaction_data['Quantity'] > 0)
]
print(transaction_data_cleaned.shape)

# Preview a sample of the data
transaction_data_cleaned.head()

(2559321, 13)


Unnamed: 0,Household_ID,Basket_ID,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score
0,1803,30780785930,1065887,338,252,1,8.99,1419,37,-1.0,0.0,0.0,0.086202
1,2299,33768622588,1073244,446,456,1,1.0,2030,66,-1.59,0.0,0.0,0.086202
2,158,30202616809,7025114,343,225,1,1.5,1246,33,-0.5,-1.0,0.0,0.086202
3,2347,42076926172,1064299,438,695,1,2.5,1430,100,-2.89,0.0,0.0,0.086202
4,1430,31625201009,1040197,31742,312,1,2.29,1423,45,0.0,0.0,0.0,0.086202


- Rebuild and filter the user-product matrix from the cleaned data

In [21]:
# Rebuild the matrix after outlier removal so the association rules are more reliable
# Recreate the user-product matrix from the cleaned data
user_item_matrix = transaction_data_cleaned.pivot_table(
    index='Household_ID',     # one row per household
    columns='Product_ID',     # one column per product
    values='Sales_Value',     # amount spent
    aggfunc='sum',            # total spend per household and product
    fill_value=0              # 0 where there is no purchase history
)

# Reuse the same filtering thresholds
min_product_purchases = 10
min_user_purchases = 2

# Count buyers per product and distinct products per household
product_purchase_count = (user_item_matrix > 0).sum()
user_purchase_count = (user_item_matrix > 0).sum(axis=1)

# Select the products and households that meet the thresholds
filtered_products = product_purchase_count[product_purchase_count >= min_product_purchases].index
filtered_users = user_purchase_count[user_purchase_count >= min_user_purchases].index

# Build the final filtered matrix
filtered_matrix = user_item_matrix.loc[filtered_users, filtered_products]
filtered_matrix

Household_ID \ Product_ID,27658,34873,43020,43871,59666,138619,197681,201704,215923,244960,...,18005913,18005929,18022252,18055205,18055329,18105264,18106286,18119016,18147612,18203921
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.19,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2497,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2498,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2499,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


- Join the product information to prepare for analysis at the product-type level

In [22]:
# Inner-join with the product table on Product_ID to add the Product_type information
merged_data = pd.merge(transaction_data_cleaned, product, on='Product_ID', how='inner')
merged_data

Unnamed: 0,Household_ID,Basket_ID,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Manufacturer,Brand,Category,Subcategory,Product_type,Curr_Size_of_Product
0,1803,30780785930,1065887,338,252,1,8.99,1419,37,-1.00,0.0,0.0,0.086202,69,Private,DRUG GM,ANALGESICS,ADULT ANALGESICS,
1,1388,30253393814,1065887,375,228,1,9.99,1748,33,0.00,0.0,0.0,0.086202,69,Private,DRUG GM,ANALGESICS,ADULT ANALGESICS,
2,1268,32269557669,1065887,31742,362,1,9.99,2014,52,0.00,0.0,0.0,0.086202,69,Private,DRUG GM,ANALGESICS,ADULT ANALGESICS,
3,868,33944722714,1065887,323,467,1,7.99,1829,67,-2.00,0.0,0.0,0.086202,69,Private,DRUG GM,ANALGESICS,ADULT ANALGESICS,
4,2374,40727257930,1065887,367,602,1,9.99,1350,87,0.00,0.0,0.0,0.086202,69,Private,DRUG GM,ANALGESICS,ADULT ANALGESICS,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2559316,2224,30579241929,928870,315,236,1,4.50,1813,34,0.00,0.0,0.0,0.086202,504,National,DRUG GM,MAGAZINE,AUTOMOBILE-MAGAZINE,
2559317,2184,28344932057,9803768,309,114,1,7.99,2119,17,0.00,0.0,0.0,0.086202,5014,National,COSMETICS,MAKEUP AND TREATMENT,FACE MAKE UP AND TREATMENT,
2559318,57,31344122693,6039807,298,293,1,6.04,1845,43,0.00,0.0,0.0,0.086202,5016,National,COSMETICS,MAKEUP AND TREATMENT,REVLON,
2559319,1835,27469178389,1037872,375,41,1,6.99,1830,7,0.00,0.0,0.0,0.086202,2239,National,DRUG GM,SUNTAN,SUNTAN,18 OZ


- Collect the list of product types purchased in each basket (Basket_ID)

In [23]:
# For each transaction (Basket_ID), build the list of distinct product types (Product_type) purchased
transactions = merged_data.groupby('Basket_ID')['Product_type'].unique().reset_index()
transactions

Unnamed: 0,Basket_ID,Product_type
0,26984851472,"[ORGANIC CARROTS, BANANAS, ONIONS SWEET (BULK&..."
1,26984851516,"[HAMBURGER BUNS, TRAY PACK/CHOC CHIP COOKIES, ..."
2,26984896261,"[EGGS - X-LARGE, GRANOLA BARS, LINKS - RAW, SN..."
3,26984905972,"[MAINSTREAM WHITE BREAD, RAMEN NOODLES/RAMEN C..."
4,26984945254,"[INSIDE FROST BULBS, CHEWING GUM, SEASONAL CAN..."
...,...,...
254725,42302712006,"[SFT DRNK 2 LITER BTL CARB INCL, TORTILLA/NACH..."
254726,42302712189,"[PAPER NAPKINS, REFRIG DIPS, BEERALEMALT LIQUO..."
254727,42302712298,"[BUTTER, MUSHROOMS WHITE SLICED PKG, BREAD:ITA..."
254728,42305362497,"[SFT DRNK 2 LITER BTL CARB INCL, SEASONAL MISC..."
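What `groupby('Basket_ID')['Product_type'].unique()` produces can be pictured with a plain-Python sketch. The rows below are hypothetical, and this is only an illustration of the idea, not how pandas implements it.

```python
# Hypothetical (Basket_ID, Product_type) rows; BANANAS appears twice in basket 101
rows = [
    (101, "BANANAS"),
    (101, "YOGURT"),
    (101, "BANANAS"),
    (102, "FLUID MILK WHITE ONLY"),
    (102, "KIDS CEREAL"),
]

baskets = {}
for basket_id, product_type in rows:
    items = baskets.setdefault(basket_id, [])
    if product_type not in items:  # keep each product type once per basket
        items.append(product_type)

# One list of distinct product types per basket, ordered by basket ID
toy_transactions = [baskets[basket_id] for basket_id in sorted(baskets)]
print(toy_transactions)
```

Duplicates inside a basket are dropped because association rules only look at whether an item is present in a transaction, not at how many times it was bought.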


- Convert to a list of transactions

In [24]:
transaction_list = transactions['Product_type'].tolist()
transaction_list = [list(item) for item in transaction_list]
print(transaction_list[:5])

[['ORGANIC CARROTS', 'BANANAS', 'ONIONS SWEET (BULK&BAG)', 'POTATOES RUSSET (BULK&BAG)', 'CELERY'], ['HAMBURGER BUNS', 'TRAY PACK/CHOC CHIP COOKIES', 'PEANUT BUTTER', 'SPONGES: BATH HOUSEHOLD', 'GRAHAM CRACKERS'], ['EGGS - X-LARGE', 'GRANOLA BARS', 'LINKS - RAW', 'SNACK CRACKERS', 'GRND/PATTY - ROUND'], ['MAINSTREAM WHITE BREAD', 'RAMEN NOODLES/RAMEN CUPS'], ['INSIDE FROST BULBS', 'CHEWING GUM', 'SEASONAL CANDY BOX NON-CHOCOLA']]


- Transaction statistics

In [25]:
transactions['num_products'] = transactions['Product_type'].apply(len)
average_products_per_order = transactions['num_products'].mean()
max_products_per_order = transactions['num_products'].max()
min_products_per_order = transactions['num_products'].min()

print(f"Average number of products per order: {average_products_per_order}")
print(f"Maximum number of products per order: {max_products_per_order}")
print(f"Minimum number of products per order: {min_products_per_order}")

Average number of products per order: 8.520123267773721
Maximum number of products per order: 128
Minimum number of products per order: 1


<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 From here on, it's a repeat of what we did in the last lab! </strong>
</span>

## **2️⃣ Association Analysis - Converting to a Binary Matrix with TransactionEncoder**

In [26]:
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)  # fit and transform as two separate steps!

df_encoded = pd.DataFrame(te_ary, columns=te.columns_).astype(int)
df_encoded.head()

Unnamed: 0,*ATH ACCES:TOWEL BARS/SOAP D,*ATTERIES:CAMERA/FLASH/WATCH,*BOYS/GIRLS MISC TOYS,*MISC. LOBBY ITEMS,*PURSES UMBRELLAS,*SCRAPBOOK,*SPORT NOVELTIES,*SPORTS APPAREL,*SPRING/SUMMER MISC,20 BIKES,...,WRITING INSTRUMENTS,XMAS PLUSH,YARDLEY,YEAST: DRY,YELLOW JACKET,YELLOW SUMMER SQUASH,YNG MEN SCREEN PRINT T-SHIRTS,YOGURT,YOGURT MULTI-PACKS,YOGURT NOT MULTI-PACKS
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
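The encoding step turns each basket into a row of 0/1 flags, one column per distinct item. The plain-Python sketch below shows the resulting structure on hypothetical baskets; it is an illustration of the output, not mlxtend's implementation. The columns are sorted alphabetically, which matches the column order visible in `df_encoded` above.

```python
# Hypothetical baskets, for illustration only
toy_transactions = [
    ["BANANAS", "YOGURT"],
    ["FLUID MILK WHITE ONLY", "KIDS CEREAL", "BANANAS"],
    ["YOGURT"],
]

# One column per distinct item, in sorted order
columns = sorted({item for basket in toy_transactions for item in basket})

# One row per basket: 1 if the basket contains the item, else 0
encoded = [[int(item in basket) for item in columns] for basket in toy_transactions]

print(columns)
for row in encoded:
    print(row)
```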


## **3️⃣ Association Analysis - Using the Apriori Algorithm**

- Derive frequent itemsets with the Apriori algorithm (support of at least 0.5%)

In [27]:
from mlxtend.frequent_patterns import apriori

filtered_onehot = df_encoded.loc[:, df_encoded.sum(axis=0) > 20]

# Extract frequent_itemsets with apriori (min_support=0.005, use_colnames=True, low_memory=True)
frequent_itemsets = apriori(filtered_onehot, min_support=0.005, use_colnames=True, low_memory=True)
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.011899,(ADULT ANALGESICS)
1,0.03002,(ADULT CEREAL)
2,0.005374,(AIR CARE - CONTINUOUS ACTION)
3,0.008024,(ALKALINE BATTERIES)
4,0.040266,(ALL FAMILY CEREAL)
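It can help to translate the `min_support` fraction into a number of baskets. The sketch below assumes the encoded matrix has 254,729 rows, which is inferred from the row index running from 0 to 254728 in the outputs; that count is an assumption, not something printed by the notebook.

```python
from math import ceil

n_baskets = 254_729   # assumed: row index runs 0..254728 without gaps
min_support = 0.005   # the threshold passed to apriori

# Smallest number of baskets an itemset must appear in to reach min_support
min_count = ceil(min_support * n_baskets)

print(f"min_support as a percentage: {min_support:.1%}")
print(f"minimum number of baskets: {min_count:,}")
```

Under that assumption an itemset must appear in well over a thousand baskets, so the earlier column filter (`df_encoded.sum(axis=0) > 20`) is far looser than the support threshold and mainly serves to shrink the matrix before mining.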


- Derive and filter association rules with `association_rules` (confidence of at least 40%)

In [28]:
num_itemsets = len(frequent_itemsets)

# Extract association rules using confidence as the metric.
# Conditions: metric="confidence", min_threshold=0.4, num_itemsets=num_itemsets
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4, num_itemsets = num_itemsets)

rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head()

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(ADULT CEREAL),(BANANAS),0.012119,0.403688,3.393439
1,(ADULT CEREAL),(FLUID MILK WHITE ONLY),0.019154,0.638028,2.65087
2,(ALL FAMILY CEREAL),(FLUID MILK WHITE ONLY),0.026137,0.649118,2.696946
3,(APPLE JUICE & CIDER (OVER 50%),(FLUID MILK WHITE ONLY),0.006874,0.588769,2.446211
4,(APPLES GALA (BULK&BAG)),(BANANAS),0.006996,0.507837,4.268928


In [29]:
print(rules.shape)

(445, 14)


- Normalize the list structure used for association analysis

In [30]:
# Make sure every transaction is a plain Python list (NumPy arrays are converted, scalars are wrapped)
transaction_list = [t.tolist() if isinstance(t, np.ndarray) else list(t) if isinstance(t, list) else [t] for t in transaction_list]

- Drop the metrics we do not need

In [31]:
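# Note: this variable shadows the imported apriori() function; re-import it before calling apriori() again
apriori = rules.drop(columns=[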
    "antecedent support", 
    "consequent support", 
    "representativity", 
    "conviction", 
    "zhangs_metric", 
    "jaccard", 
    "certainty", 
    "kulczynski"
])

- Filter for meaningful rules (lift of at least 1)

In [32]:
apriori = apriori[apriori['lift'] >= 1]

- Check the results

In [33]:
apriori

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
0,(ADULT CEREAL),(BANANAS),0.012119,0.403688,3.393439,0.008547
1,(ADULT CEREAL),(FLUID MILK WHITE ONLY),0.019154,0.638028,2.650870,0.011928
2,(ALL FAMILY CEREAL),(FLUID MILK WHITE ONLY),0.026137,0.649118,2.696946,0.016446
3,(APPLE JUICE & CIDER (OVER 50%),(FLUID MILK WHITE ONLY),0.006874,0.588769,2.446211,0.004064
4,(APPLES GALA (BULK&BAG)),(BANANAS),0.006996,0.507837,4.268928,0.005357
...,...,...,...,...,...,...
440,"(KIDS CEREAL, SHREDDED CHEESE)",(MAINSTREAM WHITE BREAD),0.005076,0.442202,4.200561,0.003868
441,"(KIDS CEREAL, SOFT DRINKS 12/18&15PK CAN CAR)",(MAINSTREAM WHITE BREAD),0.005135,0.454798,4.320211,0.003946
442,"(POTATO CHIPS, SHREDDED CHEESE)",(MAINSTREAM WHITE BREAD),0.005602,0.405974,3.856424,0.004149
443,"(SNACK CAKE - MULTI PACK, SOFT DRINKS 12/18&15...",(MAINSTREAM WHITE BREAD),0.005429,0.478878,4.548949,0.004236
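The confidence, lift, and leverage columns follow directly from three support values. The sketch below recomputes them for the rule (ADULT CEREAL) → (FLUID MILK WHITE ONLY) from rounded numbers printed in this notebook: the rule support and the ADULT CEREAL support come from the Apriori outputs above, and the FLUID MILK WHITE ONLY support (0.240686) is the `consequent support` shown in the FP-Growth rules output later in the notebook. Small differences from the table are expected because the inputs are rounded.

```python
# Values copied from the notebook outputs (rounded to six decimals)
support_both = 0.019154        # ADULT CEREAL and FLUID MILK WHITE ONLY together
support_antecedent = 0.030020  # ADULT CEREAL
support_consequent = 0.240686  # FLUID MILK WHITE ONLY

confidence = support_both / support_antecedent
lift = confidence / support_consequent
leverage = support_both - support_antecedent * support_consequent

print(f"confidence={confidence:.4f}")  # table value: 0.638028
print(f"lift={lift:.4f}")              # table value: 2.650870
print(f"leverage={leverage:.4f}")      # table value: 0.011928
```

Read this way, a lift of about 2.65 says that baskets with adult cereal contain white milk roughly 2.65 times as often as baskets in general.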


## **[ Reference ] Association Analysis - Using the FP-Growth Algorithm**

- Be careful when running this code!

In [23]:
filtered_onehot

Unnamed: 0,ABRASIVES,ACNE MEDICATIONS,ACTIVITY,ADDITIVES/FLUIDS,ADHESIVES/CAULK,ADULT ANALGESICS,ADULT CEREAL,ADULT INCONTINENCE BRIEFS,ADULT INCONTINENCE MISC PRODUC,ADULT INCONTINENCE PADS,...,WRAP,WREATHS,WREATHS/TINSEL/GARLAND,WRITING INSTRUMENTS,XMAS PLUSH,YEAST: DRY,YELLOW SUMMER SQUASH,YOGURT,YOGURT MULTI-PACKS,YOGURT NOT MULTI-PACKS
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
254725,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
254726,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
254727,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
254728,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🥹 The FP-Growth algorithm we covered in the lab ended up as reference material because of the next cell... Please be careful when you run it... It can sometimes take 30 minutes... </strong>
</span>

<span style="color:black; background-color:#fff5b1; padding:2px 4px; border-radius:4px">
<strong> 🤔 "Huh? But wasn't FP-Growth supposed to be faster than Apriori?" </strong>
</span>

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 Yes! You've clearly been studying!? That's right! In theory, FP-Growth is faster than Apriori! </strong>
</span>

<span style="color:black; background-color:#fff5b1; padding:2px 4px; border-radius:4px">
<strong> 🤔 Huh, then why...? </strong>
</span>

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 The difference comes from the library we used! We used the mlxtend library! </strong>
</span>

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 In mlxtend, Apriori relies largely on vectorized NumPy operations and runs quickly on this data, whereas FP-Growth builds and traverses its tree in pure Python, so it can actually end up slower. In that case, the fpgrowth_py library may run faster~ </strong>
</span>
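Before committing to a run that may take half an hour, it can be worth timing the call on a small slice first. The helper below is a minimal sketch using only the standard library; `time_call` is a name introduced here for illustration, and the workload is a stand-in rather than the real mining call.

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload so the sketch runs anywhere
result, seconds = time_call(sum, range(1_000_000))
print(f"result={result}, elapsed={seconds:.4f} s")
```

In this notebook the same helper could wrap the real calls, for example `time_call(fpgrowth, sample.astype(bool), min_support=0.005, use_colnames=True)`, where `sample` is a hypothetical subsample such as `filtered_onehot.sample(n=20_000, random_state=0)`. Supports measured on a subsample will differ somewhat from those on the full data, so this is only a way to estimate runtime.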

- Derive frequent itemsets with the FP-Growth algorithm (support of at least 0.5%)

In [24]:
from mlxtend.frequent_patterns import fpgrowth

# Build frequent_itemsets_fp with the FP-Growth algorithm.
# Conditions: min_support=0.005, use_colnames=True, boolean input data (astype(bool))
frequent_itemsets_fp = fpgrowth(filtered_onehot.astype(bool), min_support=0.005, use_colnames=True)
frequent_itemsets_fp.head()

Unnamed: 0,support,itemsets
0,0.118961,(BANANAS)
1,0.038044,(POTATOES RUSSET (BULK&BAG))
2,0.023715,(ONIONS SWEET (BULK&BAG))
3,0.018816,(CELERY)
4,0.040133,(HAMBURGER BUNS)


- Derive and filter association rules with `association_rules` (confidence of at least 40%)

In [25]:
# Extract association rules using confidence as the metric.
# Conditions: metric="confidence", min_threshold=0.4
rules_fp = association_rules(frequent_itemsets_fp, metric="confidence", min_threshold=0.4)

rules_fp.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(BANANAS),(FLUID MILK WHITE ONLY),0.118961,0.240686,0.061402,0.516154,2.144508,1.0,0.03277,1.569328,0.605754,0.205878,0.362785,0.385633
1,(POTATOES RUSSET (BULK&BAG)),(FLUID MILK WHITE ONLY),0.038044,0.240686,0.019236,0.505624,2.100759,1.0,0.010079,1.535903,0.544704,0.074129,0.348917,0.292773
2,"(BANANAS, POTATOES RUSSET (BULK&BAG))",(FLUID MILK WHITE ONLY),0.010788,0.240686,0.006776,0.628093,2.609593,1.0,0.004179,2.041677,0.623525,0.02769,0.510207,0.328123
3,"(POTATOES RUSSET (BULK&BAG), MAINSTREAM WHITE ...",(FLUID MILK WHITE ONLY),0.011463,0.240686,0.007577,0.660959,2.746144,1.0,0.004818,2.239592,0.643226,0.030979,0.55349,0.346219
4,"(SHREDDED CHEESE, POTATOES RUSSET (BULK&BAG))",(FLUID MILK WHITE ONLY),0.008519,0.240686,0.005606,0.658065,2.734118,1.0,0.003556,2.220635,0.639701,0.023013,0.549678,0.340678


In [26]:
from IPython.display import display

print("🔥 Frequent Itemsets:")
display(frequent_itemsets_fp)

print("\n🔥 Association Rules:")
display(rules_fp)

🔥 Frequent Itemsets:


Unnamed: 0,support,itemsets
0,0.118961,(BANANAS)
1,0.038044,(POTATOES RUSSET (BULK&BAG))
2,0.023715,(ONIONS SWEET (BULK&BAG))
3,0.018816,(CELERY)
4,0.040133,(HAMBURGER BUNS)
...,...,...
1475,0.005221,"(FLUID MILK WHITE ONLY, PAPER NAPKINS)"
1476,0.007267,"(FLUID MILK WHITE ONLY, STRING CHEESE)"
1477,0.006866,"(FLUID MILK WHITE ONLY, CANDY BARS (MULTI PACK))"
1478,0.006572,"(FLUID MILK WHITE ONLY, ISOTONIC DRINKS SINGLE..."



🔥 Association Rules:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(BANANAS),(FLUID MILK WHITE ONLY),0.118961,0.240686,0.061402,0.516154,2.144508,1.0,0.032770,1.569328,0.605754,0.205878,0.362785,0.385633
1,(POTATOES RUSSET (BULK&BAG)),(FLUID MILK WHITE ONLY),0.038044,0.240686,0.019236,0.505624,2.100759,1.0,0.010079,1.535903,0.544704,0.074129,0.348917,0.292773
2,"(BANANAS, POTATOES RUSSET (BULK&BAG))",(FLUID MILK WHITE ONLY),0.010788,0.240686,0.006776,0.628093,2.609593,1.0,0.004179,2.041677,0.623525,0.027690,0.510207,0.328123
3,"(POTATOES RUSSET (BULK&BAG), MAINSTREAM WHITE ...",(FLUID MILK WHITE ONLY),0.011463,0.240686,0.007577,0.660959,2.746144,1.0,0.004818,2.239592,0.643226,0.030979,0.553490,0.346219
4,"(SHREDDED CHEESE, POTATOES RUSSET (BULK&BAG))",(FLUID MILK WHITE ONLY),0.008519,0.240686,0.005606,0.658065,2.734118,1.0,0.003556,2.220635,0.639701,0.023013,0.549678,0.340678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
440,(FACIAL TISSUE & PAPER HANDKE),(FLUID MILK WHITE ONLY),0.018141,0.240686,0.008919,0.491668,2.042778,1.0,0.004553,1.493737,0.519902,0.035690,0.330538,0.264363
441,(PAPER NAPKINS),(FLUID MILK WHITE ONLY),0.010717,0.240686,0.005221,0.487179,2.024127,1.0,0.002642,1.480662,0.511441,0.021209,0.324626,0.254436
442,(STRING CHEESE),(FLUID MILK WHITE ONLY),0.014133,0.240686,0.007267,0.514167,2.136253,1.0,0.003865,1.562910,0.539515,0.029353,0.360168,0.272179
443,(ISOTONIC DRINKS SINGLE SERVE),(FLUID MILK WHITE ONLY),0.015762,0.240686,0.006572,0.416936,1.732282,1.0,0.002778,1.302283,0.429497,0.026300,0.232118,0.222120


In [27]:
fp_growth = rules_fp.drop(columns=[
    "antecedent support", 
    "consequent support", 
    "representativity", 
    "conviction", 
    "zhangs_metric", 
    "jaccard", 
    "certainty", 
    "kulczynski"
])

- Filter for meaningful rules (lift of at least 1)

In [30]:
fp_growth = fp_growth[fp_growth['lift'] >= 1]
fp_growth

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
0,(BANANAS),(FLUID MILK WHITE ONLY),0.061402,0.516154,2.144508,0.032770
1,(POTATOES RUSSET (BULK&BAG)),(FLUID MILK WHITE ONLY),0.019236,0.505624,2.100759,0.010079
2,"(BANANAS, POTATOES RUSSET (BULK&BAG))",(FLUID MILK WHITE ONLY),0.006776,0.628093,2.609593,0.004179
3,"(POTATOES RUSSET (BULK&BAG), MAINSTREAM WHITE ...",(FLUID MILK WHITE ONLY),0.007577,0.660959,2.746144,0.004818
4,"(SHREDDED CHEESE, POTATOES RUSSET (BULK&BAG))",(FLUID MILK WHITE ONLY),0.005606,0.658065,2.734118,0.003556
...,...,...,...,...,...,...
440,(FACIAL TISSUE & PAPER HANDKE),(FLUID MILK WHITE ONLY),0.008919,0.491668,2.042778,0.004553
441,(PAPER NAPKINS),(FLUID MILK WHITE ONLY),0.005221,0.487179,2.024127,0.002642
442,(STRING CHEESE),(FLUID MILK WHITE ONLY),0.007267,0.514167,2.136253,0.003865
443,(ISOTONIC DRINKS SINGLE SERVE),(FLUID MILK WHITE ONLY),0.006572,0.416936,1.732282,0.002778


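## **4️⃣ Association Analysis - Interpreting the Results**

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 Interpreting the results and building a strategy is the core of this assignment, so please make sure!!!! to leave evidence of careful, in-depth thought! </strong>
</span>

### **[Association analysis] Rules with support above 0.9%, confidence above 55%, and lift above 1**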

In [34]:
results = apriori[(apriori['confidence']>0.55)&(apriori['support']>0.009)&(apriori['lift']>1)]
results

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
1,(ADULT CEREAL),(FLUID MILK WHITE ONLY),0.019154,0.638028,2.65087,0.011928
2,(ALL FAMILY CEREAL),(FLUID MILK WHITE ONLY),0.026137,0.649118,2.696946,0.016446
48,(CHOCOLATE MILK),(FLUID MILK WHITE ONLY),0.021615,0.654308,2.71851,0.013664
51,(CORN),(FLUID MILK WHITE ONLY),0.013379,0.555682,2.308742,0.007584
57,(DAIRY CASE 100% PURE JUICE - O),(FLUID MILK WHITE ONLY),0.037451,0.588163,2.443691,0.022126
62,(EGGS - LARGE),(FLUID MILK WHITE ONLY),0.029667,0.571547,2.374658,0.017174
64,(EGGS - X-LARGE),(FLUID MILK WHITE ONLY),0.021752,0.583263,2.423334,0.012776
70,(FRUIT BOWL AND CUPS),(FLUID MILK WHITE ONLY),0.01171,0.572773,2.379748,0.00679
94,(KIDS CEREAL),(FLUID MILK WHITE ONLY),0.033,0.654316,2.718544,0.020861
100,(MACARONI & CHEESE DNRS),(FLUID MILK WHITE ONLY),0.0171,0.572706,2.379471,0.009914
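The effect of the three thresholds can be seen on a handful of rows. The sketch below copies five rules from the Apriori output above into plain tuples, applies the same strict comparisons as the cell, and ranks the survivors by lift. It is a standard-library illustration of the filter, not a replacement for the pandas expression.

```python
# (antecedent, consequent, support, confidence, lift), copied from the Apriori rules above
rules = [
    ("ADULT CEREAL", "BANANAS", 0.012119, 0.403688, 3.393439),
    ("ADULT CEREAL", "FLUID MILK WHITE ONLY", 0.019154, 0.638028, 2.650870),
    ("APPLE JUICE & CIDER (OVER 50%", "FLUID MILK WHITE ONLY", 0.006874, 0.588769, 2.446211),
    ("CORN", "FLUID MILK WHITE ONLY", 0.013379, 0.555682, 2.308742),
    ("KIDS CEREAL", "FLUID MILK WHITE ONLY", 0.033000, 0.654316, 2.718544),
]

# Same strict thresholds as the cell above
kept = [
    r for r in rules
    if r[3] > 0.55 and r[2] > 0.009 and r[4] > 1
]

# Rank the surviving rules by lift, strongest first
kept.sort(key=lambda r: r[4], reverse=True)
for antecedent, consequent, sup, conf, lift in kept:
    print(f"{antecedent} -> {consequent}: support={sup:.4f}, confidence={conf:.3f}, lift={lift:.2f}")
```

Note that the rule with the highest lift in this sample, (ADULT CEREAL) → (BANANAS), is dropped by the confidence threshold, and the apple juice rule is dropped by the support threshold. The choice of thresholds therefore shapes which products appear in the final table, which is one reason nearly every surviving rule points to milk.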


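- Interpret the results above based on the support, confidence, and lift values! (at least two interpretations)

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
<strong> 🤓 [ Example ] 'FLUID MILK WHITE ONLY' (milk) shows strong co-purchase patterns with a wide range of items, with a lift of 2 or higher for more than 30 products. <br> In particular, the cereal lines 'ALL FAMILY CEREAL', 'KIDS CEREAL', and 'ADULT CEREAL' each record a confidence of roughly 64~65% and a lift of about 2.65~2.72, a notably strong co-purchase pattern. <br> Overall, milk is frequently bought together with cereal, fruit, bakery, and breakfast items, which points to purchasing behavior closely tied to meal preparation. </strong>  
</span>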

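- Interpretation 1: The items that appear in combination with MAINSTREAM WHITE BREAD (MWB), namely 'EGGS', 'KIDS CEREAL', and 'SHREDDED CHEESE', all show a lift of about 2.6~3.1, indicating a clear tendency to be bought together with milk for preparing breakfast.

- Interpretation 2: Milk also shows consistently high association across item groups that include 'BANANAS'. Combinations with 'EGGS', 'CEREAL', 'POTATO CHIPS', 'YOGURT', 'SOFT DRINKS', and others record a lift of 2.5~2.9 and a confidence of 60% or higher.
This suggests that bananas are consumed together with milk across health foods, convenience foods, and snacks alike.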

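## **5️⃣ Association Analysis - Building a Business Strategy**

- Based on the interpretation above, formulate business strategies! (at least two) -> Please don't just hammer GPT and paste whatever it gives you . . .. . 

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
<strong> 🤓 [ Example ] Setting up a curated "table with milk" zone </strong>  
</span>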

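**🎯 Goal**

* Group products that are frequently bought with milk to **nudge purchases intuitively**
* **Higher basket value**, **a more convenient shopping experience**

**🛒 Product Mix**

* **Cereals**: KIDS / ALL FAMILY / ADULT CEREAL
* **Bakery**: sliced bread, biscuits, toaster pastries
* **Fruit**: bananas, fruit cups
* **Dairy / ready meals**: yogurt, eggs, macaroni, etc.

**📍 How to Run It**

* Set up a **"Perfect match with milk!"** themed zone near the milk refrigerator
* Offer **suggested meal plans** or **discount coupons** via POP displays and QR codes
* Seasonal themes (e.g., summer = chilled fruit, winter = oatmeal)

**📈 Expected Impact**

* Higher milk co-purchase rate
* Increased sales of associated products
* **Higher customer satisfaction** through "no-brainer combinations"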



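## Strategy 1: For our kid who can't drink plain white milk~

**🎯 Goal**

* The rules show households buying white milk and chocolate milk together (CHOCOLATE MILK → FLUID MILK WHITE ONLY, confidence 65%, lift 2.72), plausibly because children dislike white milk and prefer chocolate milk
* **Higher family-level sales**, **peace in the family**

**🛒 Product Mix**

* **Cereals**: KIDS CEREAL, ALL FAMILY CEREAL
* **Bakery**: MWB (MAINSTREAM WHITE BREAD)
* **Fruit**: BANANAS
* **Dairy**: CHOCOLATE MILK

**📍 How to Run It**

* Set up a dedicated zone in the milk aisle: an "Our Family Milk Set" themed zone
* Design the display at children's eye level (character stickers, kids' milk cups, etc.)
* Instant discount or bonus points when chocolate milk and white milk are bought together
* Apply discounts to bananas and MWB, which sell well alongside kids' cereal

**📈 Expected Impact**

* Higher milk co-purchase rate
* Stronger loyalty among family shoppers
* Stronger brand image: "a brand that cares about families"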

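## Strategy 2: Slow-aging meal plan recommendations

**🎯 Goal**

* With so much unhealthy food around these days, more people in their 20s and 30s are interested in slow aging
* Recommend slow-aging meal plans to lift sales not only of dairy but also of related products such as salads
* A healthier society

**🛒 Product Mix**

* Bananas, blueberries, etc.
* Eggs, unsweetened yogurt
* Bundle with other healthy products such as avocados, tomatoes, and salads

**📍 How to Run It**

* Set up a "Morning routine for my body" themed zone
* Provide sample meal plans and expert-recommended routines -> label the benefits and nutrition facts
* Seasonal themes: spring/fall = recovery, summer = light meals, winter = immune support

**📈 Expected Impact**

* Stronger loyalty among health-conscious customers
* Healthier lives for customers around the store, supporting a balanced and thriving local community
* Solves the problem for people who find it a hassle to eat breakfast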

# **🤓 Good fortune awaits those who come up with a brilliant strategy~🍀**