# 🛒 **Association Rule Analysis Assignment**

### <span style="color:black; background-color:#F5F5F5;"> **Q1. Given the association rule {milk} → {cookie}, explain what each of the following terms means.** </span>
#### <span style="color:black; background-color:#F5F5F5;">**① support ② confidence ③ lift** </span>

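Answer: <br>
1. **Support**: the fraction of all transactions that contain both milk and cookies, i.e. P(milk ∩ cookie). (The fraction of transactions that contain milk alone is the support of the antecedent {milk}, not the support of the rule.)
2. **Confidence**: among the transactions that contain milk, the fraction that also contain cookies, i.e. P(cookie | milk).
3. **Lift**: the confidence of the rule divided by the fraction of all transactions that contain cookies, i.e. P(cookie | milk) / P(cookie). A value above 1 means cookies are bought more often together with milk than they are overall.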
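
To make the three definitions concrete, here is a small sketch that computes them on five made-up baskets. The basket data and the helper name `rule_metrics` are illustrative assumptions and have nothing to do with the Ing-Mart data.

```python
# Toy baskets, invented purely for illustration.
baskets = [
    {"milk", "cookie"},
    {"milk", "cookie", "bread"},
    {"milk"},
    {"cookie"},
    {"bread"},
]

def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    n_antecedent = sum(antecedent <= b for b in baskets)        # baskets containing the antecedent
    n_consequent = sum(consequent <= b for b in baskets)        # baskets containing the consequent
    n_both = sum((antecedent | consequent) <= b for b in baskets)  # baskets containing both
    support = n_both / n
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n)
    return support, confidence, lift

support, confidence, lift = rule_metrics(baskets, {"milk"}, {"cookie"})
print(f"support={support:.3f}, confidence={confidence:.3f}, lift={lift:.3f}")
```

Here 2 of 5 baskets contain both items, 3 contain milk, and 3 contain cookies, so the lift is (2/3) / (3/5), which is slightly above 1.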

### <span style="color:black; background-color:#F5F5F5;"> **Q2. Explain why the number of candidate itemsets that Apriori must process grows exponentially, and how FP-Growth addresses this.** </span>

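Answer:
Apriori prunes the search space with a minimum support threshold: only itemsets whose support meets the threshold (frequent itemsets) are extended, so it does not have to evaluate every possible itemset. Even so, the number of candidates grows exponentially as the number of items grows. With n items there are up to 2^n − 1 possible itemsets, and at each level k Apriori generates candidate k-itemsets by joining the frequent (k−1)-itemsets and then scans the whole database again to count their support. When many items are frequent, candidate generation and the repeated scans dominate the cost.

FP-Growth (Frequent Pattern Growth) finds frequent itemsets without generating candidates. (1) It scans all transactions once to compute the support of each item, and (2) in a second scan it keeps only the items that meet the minimum support threshold, sorts them by frequency within each transaction, and inserts them into an FP-tree. Frequent itemsets are then mined from the tree itself, so the database is scanned only twice, which makes it more efficient than Apriori.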
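
The two points above can be sketched in a few lines of plain Python: part (a) prints how fast the itemset search space grows, and part (b) walks through the two database scans FP-Growth performs before mining. The toy transactions and `min_count` are assumptions chosen for illustration, and the sketch stops before the FP-tree itself is built.

```python
from collections import Counter

# (a) Size of the search space: n distinct items admit 2**n - 1 non-empty itemsets.
for n in (5, 10, 20):
    print(n, "items ->", 2**n - 1, "possible itemsets")

# (b) The two scans FP-Growth performs before mining (toy data).
transactions = [
    ["milk", "cookie", "bread", "jam"],
    ["milk", "cookie"],
    ["milk", "egg"],
    ["bread", "egg"],
]
min_count = 2  # minimum support expressed as an absolute count

# Scan 1: count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in set(t))
frequent = {item: c for item, c in item_counts.items() if c >= min_count}

# Scan 2: drop infrequent items and order the rest by descending count
# (ties broken alphabetically). These ordered lists are what would be
# inserted into the FP-tree.
ordered = [
    sorted((i for i in set(t) if i in frequent), key=lambda i: (-frequent[i], i))
    for t in transactions
]
print(ordered)
```

Because every transaction is reduced to the same global item order, transactions that share frequent items also share a prefix path in the tree, which is where the compression comes from.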

# <span style="color:black; background-color:#F5F5F5;"> 💸 **Ing-Mart Customer Basket Pattern Analysis and Business Strategy with Association Analysis** </span>

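<strong>Ing-Mart is back and simply refuses to die..! 🤣🫥😫🙃<br>
The assignment for this advanced association analysis session is to analyze Ing-Mart's customer baskets and propose a strategy~ </strong>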



<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 Based on the concepts and exercises from the last advanced session, fill in the blanks below, analyze the basket results, and propose a suitable strategy! </strong>
</span>

## **1️⃣ Loading and Preprocessing the Data**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth
import warnings
warnings.simplefilter(action='ignore', category=DeprecationWarning)

In [2]:
# !pip install mlxtend

- Load the data
  - Use the original data you used for the first Insaicon (인사이콘)!
  - Set the file paths to match your environment~

In [3]:
transaction_data = pd.read_csv('transaction_data.csv')
product = pd.read_csv('product.csv')

In [4]:
transaction_data.head()

Unnamed: 0,Household_ID,Basket_ID,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match)
0,1803,30780785930,1065887,338,252,1,8.99,1419,37,-1.0,0.0,0.0
1,2299,33768622588,1073244,446,456,1,1.0,2030,66,-1.59,0.0,0.0
2,158,30202616809,7025114,343,225,1,1.5,1246,33,-0.5,-1.0,0.0
3,2347,42076926172,1064299,438,695,1,2.5,1430,100,-2.89,0.0,0.0
4,1430,31625201009,1040197,31742,312,1,2.29,1423,45,0.0,0.0,0.0


### 📌 `product` EDA and Preprocessing

In [5]:
product.isnull().sum()

Product_ID              0
Manufacturer            0
Brand                   0
Category                0
Subcategory             0
Product_type            0
Curr_Size_of_Product    0
dtype: int64

In [6]:
product.nunique()

Product_ID              92353
Manufacturer             6476
Brand                       2
Category                   44
Subcategory               308
Product_type             2383
Curr_Size_of_Product     4345
dtype: int64

In [7]:
# Count the records whose value is the single-space string ' ' in each column
for col in ['Product_ID', 'Manufacturer', 'Brand', 'Category',
            'Subcategory', 'Product_type', 'Curr_Size_of_Product']:
    print(f"Number of ' ' records in {col}:", len(product[product[col] == ' ']))

Number of ' ' records in Product_ID: 0
Number of ' ' records in Manufacturer: 0
Number of ' ' records in Brand: 0
Number of ' ' records in Category: 15
Number of ' ' records in Subcategory: 15
Number of ' ' records in Product_type: 15
Number of ' ' records in Curr_Size_of_Product: 30607


💡 In `product.csv`, missing values in 'Category', 'Subcategory', 'Product_type', and 'Curr_Size_of_Product' are stored as the string ' ' rather than as NaN, so I will replace every ' ' with NaN.

In [8]:
product.replace(' ', np.nan, inplace=True)

In [9]:
product.nunique()

Product_ID              92353
Manufacturer             6476
Brand                       2
Category                   43
Subcategory               307
Product_type             2382
Curr_Size_of_Product     4344
dtype: int64

In [10]:
product[product['Category'].isnull()]

Unnamed: 0,Product_ID,Manufacturer,Brand,Category,Subcategory,Product_type,Curr_Size_of_Product
7143,5993055,1,National,,,,
10228,5978657,1,National,,,,
20626,5978656,1,National,,,,
23971,5126107,1,National,,,,
30653,5977100,1,National,,,,
37714,5978659,1,National,,,,
53934,6693056,1,National,,,,
54150,5126106,1,National,,,,
60239,5993054,1,National,,,,
79177,5126087,1,National,,,,


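Looking at the items whose Category is NaN, <br>
Category, Subcategory, Product_type, and even Curr_Size_of_Product are all missing.
What on earth are these items... <br>
Manufacturer 1 seems to have a problem. (I checked, and these rows have Quantity<0, so they get dropped anyway!)

There are too many product types, so I wanted to group similar ones. I tried TF-IDF-based text clustering (judged by the silhouette coefficient), but the results were ambiguous, so I will not use it.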

In [90]:
product['Product_type'].value_counts()

Product_type
CARDS EVERYDAY                1005
BEERALEMALT LIQUORS            833
SPICES & SEASONINGS            629
GIFT-WRAP EVERYDAY             547
POTATO CHIPS                   531
                              ... 
SPLINTS/DOSAGE/ACCESSORIES       1
GROWER/HANGING PLANTERS          1
SMOKE DETECTORS                  1
FTD SERVICE CHARGES              1
LADIES CASUAL/BOOT SOCK          1
Name: count, Length: 2382, dtype: int64

Let's preprocess the Product_type text a bit.

In [133]:
import re  # needed here; the notebook otherwise imports re only in a later cell

def clean_product_type(text):
    if pd.isnull(text):
        return text

    # 1. Drop everything from the first opening parenthesis onward
    text = re.sub(r'\(.*$', '', text)

    # 2. Conditional hyphen cut: keep only the part before the first '-'
    #    when it is at least as long as the part after it
    if '-' in text:
        parts = text.split('-', 1)
        before, after = parts[0].strip(), parts[1].strip()
        if len(before) >= len(after):
            text = before  # keep the leading part only

    # 3. Remove number-symbol-number patterns (e.g. 12/18 is removed entirely)
    text = re.sub(r'\d+[^\w\s]*\d+', '', text)

    # 4. Remove any remaining digits (10, 500, ...)
    text = re.sub(r'\d+', '', text)

    # 5. Remove uninformative unit and size tokens
    meaningless_units = r'\b(pk|ct|can|btl|oz|ml|l|lb|g|pcs|count|each|pack|case|car|pkts|pkg|large|x|small|everyday)\b'
    text = re.sub(meaningless_units, '', text, flags=re.IGNORECASE)

    # 6. Replace special characters with spaces (slashes are kept)
    text = re.sub(r'[^A-Za-z0-9\s/]', ' ', text)

    # 7. Collapse repeated whitespace
    text = re.sub(r'\s+', ' ', text).strip()

    return text


# Apply to every product
product['Cleaned_Product_type'] = product['Product_type'].apply(clean_product_type)

# Inspect the result
product[['Product_type', 'Cleaned_Product_type']].head(30)


Unnamed: 0,Product_type,Cleaned_Product_type
0,SOFT DRINKS 12/18&15PK CAN CAR,SOFT DRINKS
1,NON-CARB WATER FLVR - DRNK/MNR,NON CARB WATER FLVR DRNK/MNR
2,SPICES & SEASONINGS,SPICES SEASONINGS
3,SANDWICHES&HANDHELDS,SANDWICHES HANDHELDS
4,CORN,CORN
5,MINUET,MINUET
6,TURKEY BREAST BONELESS,TURKEY BREAST BONELESS
7,CANDY BOXED CHOCOLATES,CANDY BOXED CHOCOLATES
8,AUSTRALIAN/NZ WINES,AUSTRALIAN/NZ WINES
9,LIGHTING ACCESSORIES,LIGHTING ACCESSORIES


In [134]:
product['Cleaned_Product_type'].nunique()

2257

It would be useful to normalize the size units so they can be looked at together with Quantity.

In [138]:
print(product['Curr_Size_of_Product'].unique())

['12 PK' '24 OZ' '1.58 OZ' ... 'P   8.5 OZ' '2CT/2.8CM' '940593 4OZ']


### Product Unit Conversion Table (normalized to OZ, feat. GPT)
| Source unit | Meaning                    | Base unit | Factor     | Notes                            |
| ----------- | -------------------------- | --------- | ---------- | -------------------------------- |
| **OZ**      | Weight (ounce)             | OZ        | 1          | Base unit                        |
| **LB**      | Weight (pound)             | OZ        | 16         | 1 LB = 16 OZ                     |
| **G**       | Weight (gram)              | OZ        | 0.03527396 | 1 G = 0.03527396 OZ              |
| **L**       | Volume (liter)             | OZ        | 33.814     | 1 L = 33.814 OZ (US fluid ounce) |
| **PT**      | Volume (pint)              | OZ        | 16         | 1 PT = 16 OZ (US fluid ounce)    |
| **QT**      | Volume (quart)             | OZ        | 32         | 1 QT = 32 OZ (US fluid ounce)    |
| **GAL**     | Volume (gallon)            | OZ        | 128        | 1 GAL = 128 OZ (US fluid ounce)  |
| **PK**      | Count (pack, bundle)       | PK        | -          | Count, no conversion             |
| **CT**      | Count (individual items)   | CT        | -          | Count, no conversion             |
| **IN**      | Length (inch)              | IN        | 1          | Length, unrelated to weight      |
| **CM**      | Length (centimeter)        | IN        | 0.393701   | 1 CM = 0.393701 IN               |
| **FT**      | Length (feet)              | IN        | 12         | 1 FT = 12 IN                     |
| **SIZE**    | Size (Small, Medium, etc.) | SIZE      | -          | Size/option value                |


In [139]:
import pandas as pd
import re

# Standardize unit spellings (normalize_unit)
def normalize_unit(unit):
    if not isinstance(unit, str):
        return 'UNKNOWN'

    unit = unit.upper()

    synonyms = {
        'OZ': ['OUNCE', 'OZS', 'OUNCES', 'OZASST'],
        'LB': ['LBS', 'POUND'],
        'L': ['LTR', 'LITER', 'LITRE', 'LITERS', 'MLTR', 'LTRS'],
        'PK': ['PKT', 'PKTS', 'PKG', 'PKGS', 'PCK'],
        'CT': ['COUNT', 'PIECE', 'PIECES', 'EACH', 'EA'],
        'GAL': ['GALON', 'GALLON'],
        'PT': ['PINT', 'PTS'],
        'QT': ['QUART'],
        'G': ['GM', 'GRAM', 'GRAMS', 'GMT'],
        'IN': ['INCH', 'INCHES', 'INX'],
        'FT': ['FEET', 'FOOT', 'FTX'],
        'SIZE': ['SMALL', 'MEDIUM', 'LARGE', 'XLG', 'XL', 'LRG', 'MED', 'SML', 'ONESIZE'],
    }

    meaningless = ['HELLO', 'GOING', 'OVER', 'ASSTD', 'ABCD', 'SPD', 'SCAN', 'ABD']
    if unit in meaningless:
        return 'UNKNOWN'

    for norm, variants in synonyms.items():
        if unit in variants or unit == norm:
            return norm

    return unit

# Conversion factor table
conversion_rates = {
    ('CM', 'IN'): 0.393701,
    ('LB', 'OZ'): 16,
    ('G', 'OZ'): 0.03527396,
    ('L', 'OZ'): 33.814,
    ('QT', 'OZ'): 32,
    ('PT', 'OZ'): 16,
    ('GAL', 'OZ'): 128
}

# Conversion function: convert to OZ or IN when a factor exists, otherwise return the input unchanged
def convert_unit(value, unit, target_units=('OZ', 'IN')):
    for target_unit in target_units:
        key = (unit, target_unit)
        if key in conversion_rates:
            return value * conversion_rates[key], target_unit
    return value, unit

# Parser that also handles compound sizes (returns a list of values and a list of units)
def parse_all_units(size_str):
    if pd.isnull(size_str):
        return [], []

    parts = re.split(r'[/,]', size_str)
    values = []
    units = []

    for part in parts:
        match = re.search(r'(\d+\.?\d*)\s*([A-Za-z]+)', part.strip())
        if match:
            value = float(match.group(1))
            unit = normalize_unit(match.group(2))
            value_converted, unit_converted = convert_unit(value, unit)
            values.append(round(value_converted, 4))
            units.append(unit_converted)

    return values, units

# Apply to every product
product[['Size_Values', 'Size_Units']] = product['Curr_Size_of_Product'].apply(
    lambda x: pd.Series(parse_all_units(x))
)

# Show the result
product

Unnamed: 0,Product_ID,Manufacturer,Brand,Category,Subcategory,Product_type,Curr_Size_of_Product,Size_Values,Size_Units,Cleaned_Product_type
0,6534074,2224,National,GROCERY,SOFT DRINKS,SOFT DRINKS 12/18&15PK CAN CAR,12 PK,[12.0],[PK],SOFT DRINKS
1,1016702,1551,National,NUTRITION,WATER,NON-CARB WATER FLVR - DRNK/MNR,24 OZ,[24.0],[OZ],NON CARB WATER FLVR DRNK/MNR
2,8155397,1759,National,GROCERY,SPICES & EXTRACTS,SPICES & SEASONINGS,1.58 OZ,[1.58],[OZ],SPICES SEASONINGS
3,5791664,1071,National,GROCERY,FROZEN PIZZA,SANDWICHES&HANDHELDS,9 OZ,[9.0],[OZ],SANDWICHES HANDHELDS
4,891772,165,National,GROCERY,VEGETABLES - SHELF STABLE,CORN,15 OZ,[15.0],[OZ],CORN
...,...,...,...,...,...,...,...,...,...,...
92348,1335125,69,Private,GROCERY,BLEACH,LIQUID BLEACH,128 OZ,[128.0],[OZ],LIQUID BLEACH
92349,9527153,1364,National,COSMETICS,MAKEUP AND TREATMENT,COVERGIRL,,[],[],COVERGIRL
92350,1765274,317,National,GROCERY,SALD DRSNG/SNDWCH SPRD,SEMI-SOLID SALAD DRESSING MAY,18 OZ,[18.0],[OZ],SEMI SOLID SALAD DRESSING MAY
92351,924889,1761,National,GROCERY,DRY BN/VEG/POTATO/RICE,RICE SIDE DISH MIXES DRY,5.9 OZ,[5.9],[OZ],RICE SIDE DISH MIXES DRY


### 📌 `transaction` Data EDA

In [98]:
transaction_data.groupby(['Household_ID', 'Day', 'Store_ID']).value_counts()

Household_ID  Day  Store_ID  Basket_ID    Product_ID  Quantity  Sales_Value  Trans_time  Week_no  Disc(retail)  Disc(coupon)  Disc(coupon_match)  z_score 
1             51   436       27601281299  825123      1         3.99         1456        8         0.00          0.0          0.0                 0.086202    1
                                          831447      1         2.99         1456        8         0.00          0.0          0.0                 0.086202    1
                                          840361      1         1.09         1456        8        -0.30          0.0          0.0                 0.086202    1
                                          845307      1         3.71         1456        8        -0.62          0.0          0.0                 0.086202    1
                                          852014      1         2.79         1456        8        -1.20          0.0          0.0                 0.086202    1
                                             

In [99]:
# 1. Count unique Basket_IDs per (Household_ID, Day, Store_ID, Product_ID)
basket_counts = transaction_data.groupby(['Household_ID', 'Day', 'Store_ID', 'Product_ID'])['Basket_ID'].nunique()

# 2. Keep only the groups that have two or more Basket_IDs
basket_counts_filtered = basket_counts[basket_counts >= 2].reset_index()

# 3. Pull the matching rows from the original data
#    (Basket_ID_x is the original basket, Basket_ID_y is the unique-basket count)
filtered_data = transaction_data.merge(
    basket_counts_filtered, 
    on=['Household_ID', 'Day', 'Store_ID', 'Product_ID'], 
    how='inner'
)

# 4. Sort so that rows from the same group sit together (Household_ID > Day > Store_ID > Product_ID)
filtered_data_sorted = filtered_data.sort_values(
    by=['Household_ID', 'Day', 'Store_ID', 'Product_ID']
)

# 5. Show the result
filtered_data_sorted


Unnamed: 0,Household_ID,Basket_ID_x,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Basket_ID_y
1306,1,33409625841,5978648,436,436,0,0.00,1227,63,0.00,0.0,0.0,0.087069,2
2556,1,33409625858,5978648,436,436,0,0.00,1232,63,0.00,0.0,0.0,0.087069,2
4840,5,29019192573,1073042,374,154,1,0.89,1855,23,-0.60,0.0,0.0,0.086202,2
12369,5,29019192576,1073042,374,154,1,0.89,1855,23,-0.60,0.0,0.0,0.086202,2
2224,5,33656876934,874972,374,449,1,7.73,1759,65,-1.29,0.0,0.0,0.086202,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13465,2500,33768100995,962950,447,455,1,1.00,1446,66,-1.59,0.0,0.0,0.086202,2
3531,2500,40510346200,875118,447,585,1,3.49,1352,84,0.00,0.0,0.0,0.086202,2
11776,2500,40510346202,875118,447,585,1,3.49,1353,84,0.00,0.0,0.0,0.086202,2
7890,2500,41453101036,6979474,447,649,1,1.00,1315,93,-0.19,0.0,0.0,0.086202,2


Quite a few records show the same household buying the same product on the same day at nearly the same time. I strongly suspect that some of this data is erroneous.

In [100]:
filtered_data_sorted[filtered_data_sorted['Disc(retail)']>0]

Unnamed: 0,Household_ID,Basket_ID_x,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Basket_ID_y
2585,44,33769749631,12385844,412,458,0,0.0,2120,66,1.387779e-17,0.0,0.0,0.087069,2
3198,716,34837205710,874025,296,506,0,0.0,1354,73,2.220446e-16,0.0,0.0,0.087069,2
3918,2413,40727407101,980823,374,602,0,0.0,1735,87,1.110223e-16,0.0,0.0,0.087069,2
7937,2491,32478901200,13945244,389,370,0,4.440892e-16,1315,54,1.110223e-16,0.0,0.0,0.087069,2


After filtering to the cases where what appears to be a single transaction was recorded two or more times,
I could inspect the records whose Disc(retail) is anomalous (positive).

In [101]:
filtered_data_sorted[filtered_data_sorted['Household_ID']==44]

Unnamed: 0,Household_ID,Basket_ID_x,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Basket_ID_y
2585,44,33769749631,12385844,412,458,0,0.0,2120,66,1.387779e-17,0.0,0.0,0.087069,2
13835,44,33769749635,12385844,412,458,1,1.25,2120,66,-0.04,0.0,0.0,0.086202,2


In [102]:
filtered_data_sorted[filtered_data_sorted['Household_ID']==716]

Unnamed: 0,Household_ID,Basket_ID_x,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Basket_ID_y
3198,716,34837205710,874025,296,506,0,0.0,1354,73,2.220446e-16,0.0,0.0,0.087069,2
10859,716,34837205714,874025,296,506,12,3.0,1354,73,-8.88,0.0,0.0,0.076665,2


In [103]:
filtered_data_sorted[filtered_data_sorted['Household_ID']==2413]

Unnamed: 0,Household_ID,Basket_ID_x,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Basket_ID_y
3918,2413,40727407101,980823,374,602,0,0.0,1735,87,1.110223e-16,0.0,0.0,0.087069,2
5845,2413,40727407230,980823,374,602,1,3.34,1754,87,0.0,0.0,0.0,0.086202,2


In [104]:
filtered_data_sorted[filtered_data_sorted['Household_ID']==2491]

Unnamed: 0,Household_ID,Basket_ID_x,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Basket_ID_y
1211,2491,31770085381,1108168,389,321,1,2.09,1024,47,0.0,0.0,0.0,0.086202,2
5951,2491,31770085387,1108168,389,321,1,2.09,1025,47,0.0,0.0,0.0,0.086202,2
7937,2491,32478901200,13945244,389,370,0,4.440892e-16,1315,54,1.110223e-16,0.0,0.0,0.087069,2
8194,2491,32478901204,13945244,389,370,3,5.0,1316,54,-0.97,0.0,0.0,0.084468,2
969,2491,32630435758,866573,389,380,1,1.0,1141,55,-0.05,0.0,0.0,0.086202,2
2641,2491,32630435762,866573,389,380,1,1.0,1143,55,-0.05,0.0,0.0,0.086202,2
215,2491,32630435758,5978648,389,380,0,0.0,1141,55,0.0,0.0,0.0,0.087069,2
13856,2491,32630435762,5978648,389,380,0,0.0,1143,55,0.0,0.0,0.0,0.087069,2
7153,2491,33217993924,928036,389,424,0,0.0,1925,61,0.0,0.0,0.0,0.087069,2
12558,2491,33217993926,928036,389,424,1,1.0,1925,61,-0.69,0.0,0.0,0.086202,2


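Looking at the transactions of these four households, a positive Disc(retail) does look like an anomaly. <br>
Generalizing from this, I will drop the records where Disc(retail) is positive. (I referred to Team 2's work here; apparently the other Disc columns contain no positive values.)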
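
One observation on the values shown above: the positive Disc(retail) entries are on the order of 1e-16 to 1e-17, which is the size of floating-point rounding error rather than a real positive discount. As an alternative sketch (an assumption, not what this notebook does), such values could be treated as zero with a tolerance instead of being dropped. The helper name `is_effectively_zero` and the tolerance `1e-9` are illustrative choices.

```python
import math

def is_effectively_zero(x, tol=1e-9):
    """True when x differs from 0 by no more than tol (hypothetical helper)."""
    return math.isclose(x, 0.0, abs_tol=tol)

# Disc(retail) values copied from the rows displayed above.
observed = [1.387779e-17, 2.220446e-16, 1.110223e-16, -0.04, -8.88]
flags = [is_effectively_zero(v) for v in observed]
print(list(zip(observed, flags)))
```

In the rows displayed above, these near-zero records also have Quantity 0, so the later `Quantity > 0` filter would remove them either way.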

- Outlier handling
  - This is provided as an example; please add any further preprocessing you need!

In [105]:
# Compute the Z-score of the Quantity column and store its absolute value in a new column 'z_score'
transaction_data["z_score"] = np.abs(stats.zscore(transaction_data["Quantity"]))

# Extract outliers whose Z-score exceeds 3 (more than 3 standard deviations from the mean)
outliers_zscore = transaction_data[transaction_data["z_score"] > 3]
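
As a rough illustration of what the Z-score rule does, here is a standard-library sketch on made-up quantities. The toy data is an assumption. `pstdev` is used because `scipy.stats.zscore` uses the population standard deviation (ddof=0) by default.

```python
from statistics import fmean, pstdev

# Toy data: twenty ordinary purchases and one bulk purchase.
quantities = [1] * 20 + [100]

mu = fmean(quantities)
sigma = pstdev(quantities)  # population standard deviation (ddof=0)

# Absolute Z-score of every value, as in the 'z_score' column.
z_scores = [abs(q - mu) / sigma for q in quantities]

# Values more than 3 standard deviations from the mean are flagged.
outliers = [q for q, z in zip(quantities, z_scores) if z > 3]
print(outliers)
```

Because the mean and standard deviation are themselves pulled up by extreme values, a rule like this tends to flag only the most extreme quantities in a heavily skewed column.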

- Build the customer-product matrix

In [106]:
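# Build the customer-product pivot table (rows: customers, columns: products, values: total spend)
user_item_matrix = transaction_data.pivot_table(
    index='Household_ID',     # one row per household
    columns='Product_ID',     # one column per product
    values='Sales_Value',     # purchase amount
    aggfunc='sum',            # total spend per household and product
    fill_value=0              # 0 when there is no purchase history
)
user_item_matrix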

Product_ID,25671,26081,26093,26190,26355,26426,26540,26601,26636,26691,...,18273019,18273051,18273115,18273133,18292005,18293142,18293439,18293696,18294080,18316298
Household_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2497,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2498,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2499,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [107]:
# Check the matrix shape (number of customers × number of products)
user_item_matrix.shape

(2500, 92339)

- Filter out users and products with few purchases

In [108]:
# Define the filtering thresholds
min_product_purchases = 10   # keep only products bought by at least 10 households
min_user_purchases = 2       # keep only households that bought at least 2 distinct products

# Number of households that bought each product
product_purchase_count = (user_item_matrix > 0).sum()

# Number of distinct products bought by each household
user_purchase_count = (user_item_matrix > 0).sum(axis=1)

# Select the products and users that meet the thresholds
filtered_products = product_purchase_count[product_purchase_count >= min_product_purchases].index
filtered_users = user_purchase_count[user_purchase_count >= min_user_purchases].index

# Extract the filtered matrix
filtered_matrix = user_item_matrix.loc[filtered_users, filtered_products]
print(f"\n2. Filtered Matrix Shape: {filtered_matrix.shape}")


2. Filtered Matrix Shape: (2500, 23326)


- Build transaction data with outliers and non-positive quantities removed

In [109]:
# Remove outliers by Z-score (|z| >= 3); z_score already holds absolute values,
# so a single upper bound is sufficient
# + keep only rows with a positive purchase quantity
# + keep only rows whose retail discount is not positive
transaction_data_cleaned = transaction_data[
    (transaction_data['z_score'] < 3) & 
    (transaction_data['Quantity'] > 0) &
    (transaction_data['Disc(retail)'] <= 0)
]
print(transaction_data_cleaned.shape)

# Check a sample of the data
transaction_data_cleaned.head()

(2559315, 13)


Unnamed: 0,Household_ID,Basket_ID,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score
0,1803,30780785930,1065887,338,252,1,8.99,1419,37,-1.0,0.0,0.0,0.086202
1,2299,33768622588,1073244,446,456,1,1.0,2030,66,-1.59,0.0,0.0,0.086202
2,158,30202616809,7025114,343,225,1,1.5,1246,33,-0.5,-1.0,0.0,0.086202
3,2347,42076926172,1064299,438,695,1,2.5,1430,100,-2.89,0.0,0.0,0.086202
4,1430,31625201009,1040197,31742,312,1,2.29,1423,45,0.0,0.0,0.0,0.086202


- Rebuild and filter the user-product matrix from the cleaned data

In [110]:
# Rebuild after outlier removal to get more reliable association rules
# Recreate the user-product matrix from the cleaned data
user_item_matrix = transaction_data_cleaned.pivot_table(
    index='Household_ID',     # one row per household
    columns='Product_ID',     # one column per product
    values='Sales_Value',     # purchase amount
    aggfunc='sum',            # total spend per household and product
    fill_value=0              # 0 when there is no purchase history
)

# Reuse the filtering thresholds
min_product_purchases = 10  
min_user_purchases = 2     

# Count buyers per product and distinct products per user
product_purchase_count = (user_item_matrix > 0).sum()
user_purchase_count = (user_item_matrix > 0).sum(axis=1)

# Select the products and users that meet the thresholds
filtered_products = product_purchase_count[product_purchase_count >= min_product_purchases].index
filtered_users = user_purchase_count[user_purchase_count >= min_user_purchases].index

# Build the final filtered matrix
filtered_matrix = user_item_matrix.loc[filtered_users, filtered_products]
filtered_matrix

Product_ID,27658,34873,43020,43871,59666,138619,197681,201704,215923,244960,...,18005913,18005929,18022252,18055205,18055329,18105264,18106286,18119016,18147612,18203921
Household_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.19,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2497,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2498,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2499,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


- Join the product information to prepare for analysis at the product-type level

In [140]:
product_cleaned = product.drop(columns=['Product_type', 'Curr_Size_of_Product'])

In [141]:
# Inner join with the product data on Product_ID to add product attributes (including Cleaned_Product_type)
merged_data = pd.merge(transaction_data_cleaned, product_cleaned, on='Product_ID', how='inner')
merged_data

Unnamed: 0,Household_ID,Basket_ID,Product_ID,Store_ID,Day,Quantity,Sales_Value,Trans_time,Week_no,Disc(retail),Disc(coupon),Disc(coupon_match),z_score,Manufacturer,Brand,Category,Subcategory,Size_Values,Size_Units,Cleaned_Product_type
0,1803,30780785930,1065887,338,252,1,8.99,1419,37,-1.00,0.0,0.0,0.086202,69,Private,DRUG GM,ANALGESICS,[],[],ADULT ANALGESICS
1,2299,33768622588,1073244,446,456,1,1.00,2030,66,-1.59,0.0,0.0,0.086202,2110,National,GROCERY,ICE CREAM/MILK/SHERBTS,[],[],PREMIUM PINTS
2,158,30202616809,7025114,343,225,1,1.50,1246,33,-0.50,-1.0,0.0,0.086202,1838,National,GROCERY,BAKED BREAD/BUNS/ROLLS,[20.0],[OZ],MAINSTREAM WHEAT/MULTIGRAIN BR
3,2347,42076926172,1064299,438,695,1,2.50,1430,100,-2.89,0.0,0.0,0.086202,2193,National,GROCERY,ICE CREAM/MILK/SHERBTS,[48.0],[OZ],PREMIUM
4,1430,31625201009,1040197,31742,312,1,2.29,1423,45,0.00,0.0,0.0,0.086202,869,National,GROCERY,TEAS,[20.0],[CT],TEA BAGS HERBAL FLAVORED
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2559310,225,33655617762,6463742,31642,447,1,2.29,1942,65,-1.70,0.0,0.0,0.086202,69,Private,GROCERY,CHEESE,[16.0],[OZ],SHREDDED CHEESE
2559311,266,40097402794,968269,367,551,1,1.99,2255,79,0.00,0.0,0.0,0.086202,1091,National,GROCERY,HOUSEHOLD CLEANG NEEDS,[24.0],[OZ],TOILET BOWL MANUAL
2559312,2364,30443976140,2041688,673,230,2,1.50,1757,34,-0.62,0.0,0.0,0.085335,69,Private,GROCERY,MEAT - SHELF STABLE,[15.0],[OZ],CHILI CANNED
2559313,1048,31145396229,1082185,436,280,1,0.74,1445,41,0.00,0.0,0.0,0.086202,2,National,PRODUCE,TROPICAL FRUIT,[640.0],[OZ],BANANAS


- Collect the list of product types purchased in each basket (Basket_ID)

In [144]:
# For each transaction (Basket_ID), build the list of purchased subcategories (Subcategory)
transactions_subcategory = merged_data.groupby('Basket_ID')['Subcategory'].unique().reset_index()
transactions_subcategory

Unnamed: 0,Basket_ID,Subcategory
0,26984851472,"[POTATOES, VEGETABLES - ALL OTHERS, ONIONS, OR..."
1,26984851516,"[BAKED BREAD/BUNS/ROLLS, COOKIES/CONES, PNT BT..."
2,26984896261,"[CONVENIENT BRKFST/WHLSM SNACKS, EGGS, BREAKFA..."
3,26984905972,"[SOUP, BAKED BREAD/BUNS/ROLLS]"
4,26984945254,"[CANDY - PACKAGED, ELECTRICAL SUPPPLIES, CANDY..."
...,...,...
254725,42302712006,"[BAG SNACKS, SOFT DRINKS, CHEESE, HISPANIC]"
254726,42302712189,"[PAPER HOUSEWARES, MILK BY-PRODUCTS, FACIAL TI..."
254727,42302712298,"[BUTTER, HAIR CARE PRODUCTS, MUSHROOMS, MISC. ..."
254728,42305362497,"[CANDY - PACKAGED, SOFT DRINKS, FROZEN PIZZA, ..."


In [145]:
# For each transaction (Basket_ID), build the list of purchased product types (Cleaned_Product_type)
transactions = merged_data.groupby('Basket_ID')['Cleaned_Product_type'].unique().reset_index()
transactions

Unnamed: 0,Basket_ID,Cleaned_Product_type
0,26984851472,"[POTATOES RUSSET, CELERY, ONIONS SWEET, ORGANI..."
1,26984851516,"[HAMBURGER BUNS, TRAY /CHOC CHIP COOKIES, PEAN..."
2,26984896261,"[GRANOLA BARS, EGGS, LINKS, SNACK CRACKERS, GR..."
3,26984905972,"[RAMEN NOODLES/RAMEN CUPS, MAINSTREAM WHITE BR..."
4,26984945254,"[SEASONAL CANDY BOX NON, INSIDE FROST BULBS, C..."
...,...,...
254725,42302712006,"[TORTILLA/NACHO CHIPS, SFT DRNK LITER CARB INC..."
254726,42302712189,"[PLSTC CTLRYTBLCLTHSTTHPKSST, REFRIG DIPS, PAP..."
254727,42302712298,"[BUTTER, HAIR CONDITIONERS AND RINSES, SHAMPOO..."
254728,42305362497,"[SEASONAL MISCELLANEOUS, SFT DRNK LITER CARB I..."


- Convert to transaction lists

In [147]:
transaction_sub_list = transactions_subcategory['Subcategory'].tolist()
transaction_sub_list = [list(item) for item in transaction_sub_list]
print(transaction_sub_list[:5])

[['POTATOES', 'VEGETABLES - ALL OTHERS', 'ONIONS', 'ORGANICS FRUIT & VEGETABLES', 'TROPICAL FRUIT'], ['BAKED BREAD/BUNS/ROLLS', 'COOKIES/CONES', 'PNT BTR/JELLY/JAMS', 'BROOMS AND MOPS'], ['CONVENIENT BRKFST/WHLSM SNACKS', 'EGGS', 'BREAKFAST SAUSAGE/SANDWICHES', 'CRACKERS/MISC BKD FD', 'BEEF'], ['SOUP', 'BAKED BREAD/BUNS/ROLLS'], ['CANDY - PACKAGED', 'ELECTRICAL SUPPPLIES', 'CANDY - CHECKLANE']]


In [148]:
transaction_list = transactions['Cleaned_Product_type'].tolist()
transaction_list = [list(item) for item in transaction_list]
print(transaction_list[:5])

[['POTATOES RUSSET', 'CELERY', 'ONIONS SWEET', 'ORGANIC CARROTS', 'BANANAS'], ['HAMBURGER BUNS', 'TRAY /CHOC CHIP COOKIES', 'PEANUT BUTTER', 'SPONGES BATH HOUSEHOLD', 'GRAHAM CRACKERS'], ['GRANOLA BARS', 'EGGS', 'LINKS', 'SNACK CRACKERS', 'GRND/PATTY'], ['RAMEN NOODLES/RAMEN CUPS', 'MAINSTREAM WHITE BREAD'], ['SEASONAL CANDY BOX NON', 'INSIDE FROST BULBS', 'CHEWING GUM']]


- Transaction statistics

In [149]:
transactions_subcategory['num_subcategory'] = transactions_subcategory['Subcategory'].apply(len)
average_subcategory_per_order = transactions_subcategory['num_subcategory'].mean()
max_subcategory_per_order = transactions_subcategory['num_subcategory'].max()
min_subcategory_per_order = transactions_subcategory['num_subcategory'].min()

print(f"Average number of subcategories per order: {average_subcategory_per_order}")
print(f"Maximum number of subcategories per order: {max_subcategory_per_order}")
print(f"Minimum number of subcategories per order: {min_subcategory_per_order}")

Average number of subcategories per order: 7.391127860872296
Maximum number of subcategories per order: 85
Minimum number of subcategories per order: 1


In [150]:
transactions['num_products'] = transactions['Cleaned_Product_type'].apply(len)
average_products_per_order = transactions['num_products'].mean()
max_products_per_order = transactions['num_products'].max()
min_products_per_order = transactions['num_products'].min()

print(f"Average number of products per order: {average_products_per_order}")
print(f"Maximum number of products per order: {max_products_per_order}")
print(f"Minimum number of products per order: {min_products_per_order}")

Average number of products per order: 8.504436069563852
Maximum number of products per order: 127
Minimum number of products per order: 1


Hmm.. the two levels don't differ all that much after all... T_T

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 From here on, it's a repeat of what we did in the last hands-on session! </strong>
</span>

## **2️⃣ Association Analysis - Converting to a Binary Matrix with TransactionEncoder**

In [151]:
from mlxtend.preprocessing import TransactionEncoder

te_sub = TransactionEncoder()
te_ary_sub = te_sub.fit(transaction_sub_list).transform(transaction_sub_list)  # fit and transform as separate steps!

df_encoded_sub = pd.DataFrame(te_ary_sub, columns=te_sub.columns_).astype(int)
df_encoded_sub.head()

Unnamed: 0,(CORP USE ONLY),ADULT INCONTINENCE,AIR CARE,ANALGESICS,ANTACIDS,APPAREL,APPLES,AUDIO/VIDEO PRODUCTS,AUTOMOTIVE PRODUCTS,BABY FOODS,...,VEAL,VEGETABLES - ALL OTHERS,VEGETABLES - SHELF STABLE,VEGETABLES SALAD,VITAMINS,WAREHOUSE SNACKS,WATCHES/CALCULATORS/LOBBY,WATER,WATER - CARBONATED/FLVRD DRINK,YOGURT
0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [152]:
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)  # fit and transform as separate steps!

df_encoded = pd.DataFrame(te_ary, columns=te.columns_).astype(int)
df_encoded.head()

Unnamed: 0,Unnamed: 1,/ BEVERAGE,ABRASIVES,ACCENT FURNITURE,ACCENT RUGS,ACCESS COLD WEATHER,ACCESS LADIES BELTS,ACCESS LADIES GLOVES,ACCESS UMBRELLAS,ACCESSORIES,...,WRITING INSTRUMENTS,XMAS PLUSH,YARDLEY,YEAST DRY,YELLOW JACKET,YELLOW SUMMER SQUASH,YNG MEN SCREEN PRINT T,YOGURT,YOGURT MULTI,YOGURT NOT MULTI
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
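To make the encoding step concrete, here is a minimal plain-Python sketch of the same idea: one column per unique item (sorted alphabetically, as in the column headers above) and one row per basket. The toy baskets are made up for illustration and are not taken from the Ing-Mart data; this is not how `TransactionEncoder` is implemented internally.

```python
# Plain-Python sketch of one-hot encoding a list of transactions.
# The toy baskets below are hypothetical, for illustration only.
transactions = [
    ["MILK", "CEREAL"],
    ["MILK", "BANANAS", "EGGS"],
    ["BANANAS"],
]

# One column per unique item, sorted alphabetically.
columns = sorted({item for basket in transactions for item in basket})

# Each row holds 1 if the basket contains the column's item, else 0.
encoded = [[int(item in basket) for item in columns] for basket in transactions]

print(columns)
for row in encoded:
    print(row)
```

Each row of `encoded` plays the same role as a row of `df_encoded`: a basket described only by which items it contains, not by quantities.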


## **3️⃣ Association Analysis - Using the Apriori Algorithm**

- Derive frequent itemsets with the Apriori algorithm (support of at least 0.5%)

In [153]:
from mlxtend.frequent_patterns import apriori

filtered_onehot_sub = df_encoded_sub.loc[:, df_encoded_sub.sum(axis=0) > 20]

# Extract frequent_itemsets with apriori (min_support=0.005, use_colnames=True, low_memory=True)
frequent_itemsets_sub = apriori(filtered_onehot_sub, min_support=0.005, use_colnames=True, low_memory=True)
frequent_itemsets_sub.head()

Unnamed: 0,support,itemsets
0,0.014949,(AIR CARE)
1,0.016649,(ANALGESICS)
2,0.006874,(ANTACIDS)
3,0.051886,(APPLES)
4,0.016606,(BABY FOODS)


In [154]:
from mlxtend.frequent_patterns import apriori

filtered_onehot = df_encoded.loc[:, df_encoded.sum(axis=0) > 20]

# Extract frequent_itemsets with apriori (min_support=0.005, use_colnames=True, low_memory=True)
frequent_itemsets = apriori(filtered_onehot, min_support=0.005, use_colnames=True, low_memory=True)
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.011899,(ADULT ANALGESICS)
1,0.03002,(ADULT CEREAL)
2,0.007282,(AIR CARE)
3,0.005374,(AIR CARE CONTINUOUS ACTION)
4,0.008024,(ALKALINE BATTERIES)


- Derive and filter association rules with `association_rules` (confidence of at least 40%)

In [155]:
num_itemsets_sub = len(frequent_itemsets_sub)

# Extract association rules (rules) using confidence as the metric.
# Conditions: min_threshold=0.4, metric="***", num_itemsets=num_itemsets_sub
rules_sub = association_rules(frequent_itemsets_sub, metric="confidence", min_threshold=0.4, num_itemsets = num_itemsets_sub)

rules_sub[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head()

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(AIR CARE),(BAKED BREAD/BUNS/ROLLS),0.006383,0.426996,1.804899
1,(AIR CARE),(FLUID MILK PRODUCTS),0.006603,0.441702,1.625841
2,(AIR CARE),(SOFT DRINKS),0.00634,0.424107,1.509977
3,(APPLES),(BAKED BREAD/BUNS/ROLLS),0.027527,0.530529,2.242531
4,(APPLES),(CHEESE),0.022373,0.431187,2.344124


In [156]:
num_itemsets = len(frequent_itemsets)

# Extract association rules (rules) using confidence as the metric.
# Conditions: min_threshold=0.4, metric="***", num_itemsets=num_itemsets
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4, num_itemsets = num_itemsets)

rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head()

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(ADULT CEREAL),(BANANAS),0.012119,0.403688,3.393439
1,(ADULT CEREAL),(FLUID MILK WHITE ONLY),0.019154,0.638028,2.65087
2,(ALL FAMILY CEREAL),(FLUID MILK WHITE ONLY),0.026137,0.649118,2.696946
3,(APPLE JUICE CIDER),(FLUID MILK WHITE ONLY),0.007639,0.574719,2.387837
4,(APPLES GALA),(BANANAS),0.006996,0.507837,4.268928


In [157]:
print(rules_sub.shape)

(49767, 14)


In [158]:
print(rules.shape)

(502, 14)
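The three core columns in the rule tables can be reproduced by hand. The sketch below computes support, confidence, and lift for a single rule on a made-up set of five baskets (hypothetical data, not the Ing-Mart transactions), using the standard definitions rather than any mlxtend internals.

```python
# Hand computation of support, confidence, and lift for one rule.
# The baskets are hypothetical and only serve to illustrate the definitions.
transactions = [
    {"MILK", "CEREAL"},
    {"MILK", "CEREAL", "BANANAS"},
    {"MILK", "EGGS"},
    {"BANANAS"},
    {"CEREAL"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


antecedent, consequent = {"CEREAL"}, {"MILK"}

# support(A and C): baskets containing both sides of the rule.
rule_support = support(antecedent | consequent, transactions)
# confidence(A -> C) = support(A and C) / support(A)
confidence = rule_support / support(antecedent, transactions)
# lift(A -> C) = confidence(A -> C) / support(C)
lift = confidence / support(consequent, transactions)

print(f"support={rule_support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the consequent shows up more often in baskets containing the antecedent than it does overall, which is why the later filter keeps rules with lift of at least 1.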


- Tidy the list structure used for association analysis

In [159]:
transaction_sub_list = [list(t) if isinstance(t, (list, np.ndarray)) else [t] for t in transaction_sub_list]
transaction_sub_list = [t.tolist() if isinstance(t, np.ndarray) else list(t) if isinstance(t, list) else [t] for t in transaction_sub_list]

In [160]:
transaction_list = [list(t) if isinstance(t, (list, np.ndarray)) else [t] for t in transaction_list]
transaction_list = [t.tolist() if isinstance(t, np.ndarray) else list(t) if isinstance(t, list) else [t] for t in transaction_list]

- Drop unneeded metrics

In [161]:
apriori_sub = rules_sub.drop(columns=[
    "antecedent support", 
    "consequent support", 
    "representativity", 
    "conviction", 
    "zhangs_metric", 
    "jaccard", 
    "certainty", 
    "kulczynski"
])

In [162]:
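# Note: this rebinds the name `apriori`, shadowing the mlxtend function imported above;
# re-import it before calling apriori() again.
apriori = rules.drop(columns=[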
    "antecedent support", 
    "consequent support", 
    "representativity", 
    "conviction", 
    "zhangs_metric", 
    "jaccard", 
    "certainty", 
    "kulczynski"
])

- Keep only meaningful rules (lift of at least 1)

In [163]:
apriori_sub = apriori_sub[apriori_sub['lift'] >= 1]

In [164]:
apriori = apriori[apriori['lift'] >= 1]

- Check the results

In [165]:
apriori_sub

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
0,(AIR CARE),(BAKED BREAD/BUNS/ROLLS),0.006383,0.426996,1.804899,0.002847
1,(AIR CARE),(FLUID MILK PRODUCTS),0.006603,0.441702,1.625841,0.002542
2,(AIR CARE),(SOFT DRINKS),0.006340,0.424107,1.509977,0.002141
3,(APPLES),(BAKED BREAD/BUNS/ROLLS),0.027527,0.530529,2.242531,0.015252
4,(APPLES),(CHEESE),0.022373,0.431187,2.344124,0.012829
...,...,...,...,...,...,...
49762,"(FLUID MILK PRODUCTS, SOUP, TROPICAL FRUIT, CH...","(VEGETABLES - SHELF STABLE, BAKED BREAD/BUNS/R...",0.005033,0.414753,8.896085,0.004467
49763,"(SOUP, BAKED BREAD/BUNS/ROLLS, TROPICAL FRUIT,...","(VEGETABLES - SHELF STABLE, FLUID MILK PRODUCTS)",0.005033,0.419503,8.916137,0.004468
49764,"(VEGETABLES - SHELF STABLE, FLUID MILK PRODUCT...","(SOUP, BAKED BREAD/BUNS/ROLLS)",0.005033,0.419091,8.615536,0.004449
49765,"(VEGETABLES - SHELF STABLE, BAKED BREAD/BUNS/R...","(FLUID MILK PRODUCTS, SOUP)",0.005033,0.427333,8.744748,0.004457


In [166]:
apriori

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
0,(ADULT CEREAL),(BANANAS),0.012119,0.403688,3.393439,0.008547
1,(ADULT CEREAL),(FLUID MILK WHITE ONLY),0.019154,0.638028,2.650870,0.011928
2,(ALL FAMILY CEREAL),(FLUID MILK WHITE ONLY),0.026137,0.649118,2.696946,0.016446
3,(APPLE JUICE CIDER),(FLUID MILK WHITE ONLY),0.007639,0.574719,2.387837,0.004440
4,(APPLES GALA),(BANANAS),0.006996,0.507837,4.268928,0.005357
...,...,...,...,...,...,...
497,"(EGGS, BANANAS, SHREDDED CHEESE)",(FLUID MILK WHITE ONLY),0.005853,0.777778,3.231501,0.004042
498,"(EGGS, FLUID MILK WHITE ONLY, SHREDDED CHEESE)",(BANANAS),0.005853,0.416248,3.499021,0.004180
499,"(BANANAS, FLUID MILK WHITE ONLY, SHREDDED CHEESE)",(EGGS),0.005853,0.417413,4.627167,0.004588
500,"(SOFT DRINKS, MAINSTREAM WHITE BREAD, BANANAS)",(FLUID MILK WHITE ONLY),0.005013,0.721877,2.999244,0.003342


## **[ Reference ] Association Analysis - Using the FP-Growth Algorithm**

- Be careful when running this code!

In [126]:
filtered_onehot

Unnamed: 0,/ BEVERAGE,ABRASIVES,ACNE MEDICATIONS,ACTIVITY,ADDITIVES/FLUIDS,ADHESIVES/CAULK,ADULT ANALGESICS,ADULT CEREAL,ADULT INCONTINENCE BRIEFS,ADULT INCONTINENCE MISC PRODUC,...,WRAP,WREATHS,WREATHS/TINSEL/GARLAND,WRITING INSTRUMENTS,XMAS PLUSH,YEAST DRY,YELLOW SUMMER SQUASH,YOGURT,YOGURT MULTI,YOGURT NOT MULTI
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
254725,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
254726,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
254727,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
254728,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
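<strong> 🥹 The FP-Growth algorithm we covered in the hands-on session is only reference material here because of the next cell... Please be careful when running it... It sometimes takes as long as 30 minutes... </strong>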
</span>

<span style="color:black; background-color:#fff5b1; padding:2px 4px; border-radius:4px">
<strong> 🤔 "Huh? But wasn't FP-Growth supposed to be faster than Apriori?" </strong>
</span>

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 Yes! You've clearly been studying hard!? That's right! In theory, the FP-Growth algorithm is faster than the Apriori algorithm! </strong>
</span>

<span style="color:black; background-color:#fff5b1; padding:2px 4px; border-radius:4px">
<strong> 🤔 Huh, then why...? </strong>
</span>

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 It comes down to the library! We used the mlxtend library! </strong>
</span>

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 In mlxtend, Apriori leans heavily on vectorized NumPy operations and runs very fast, while FP-Growth builds and traverses its tree in plain Python, so it can actually end up slower. In that case, the fpgrowth_py library may run faster~ </strong>
</span>
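Separately from the library question, part of the runtime comes from how quickly the space of possible itemsets grows with the number of item columns. The sketch below only counts how many 2-item and 3-item combinations exist for a given number of items; these are upper bounds on the search space, not the number of candidates that either `apriori` or `fpgrowth` actually evaluates once the minimum support prunes infrequent items. The item counts are arbitrary illustrative values.

```python
from math import comb

# Number of possible 2-item and 3-item combinations for a few item counts.
# The item counts are arbitrary and chosen only to show the growth rate.
counts = {}
for n_items in (10, 100, 1000):
    counts[n_items] = (comb(n_items, 2), comb(n_items, 3))
    pairs, triples = counts[n_items]
    print(f"{n_items:>5} items: {pairs:>10,} pairs, {triples:>14,} triples")
```

This is one reason the cells above drop rarely purchased columns before mining: fewer columns means far fewer combinations to consider.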

- Derive frequent itemsets with the FP-Growth algorithm (support of at least 0.5%)

In [127]:
from mlxtend.frequent_patterns import fpgrowth

# Build frequent_itemsets_fp with the FP-Growth algorithm.
# Conditions: min_support=0.005, use_colnames=True, input data cast to boolean (astype(bool))
frequent_itemsets_fp = fpgrowth(filtered_onehot.astype(bool), min_support=0.005, use_colnames=True)
frequent_itemsets_fp.head()

Unnamed: 0,support,itemsets
0,0.118961,(BANANAS)
1,0.038044,(POTATOES RUSSET)
2,0.023715,(ONIONS SWEET)
3,0.018816,(CELERY)
4,0.040133,(HAMBURGER BUNS)


- Derive and filter association rules with `association_rules` (confidence of at least 40%)

In [128]:
# Extract association rules (rules) using confidence as the metric.
# Conditions: min_threshold=0.4, metric="***"
rules_fp = association_rules(frequent_itemsets_fp, metric="confidence", min_threshold=0.4)

rules_fp.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(BANANAS),(FLUID MILK WHITE ONLY),0.118961,0.240686,0.061402,0.516154,2.144508,1.0,0.03277,1.569328,0.605754,0.205878,0.362785,0.385633
1,(POTATOES RUSSET),(FLUID MILK WHITE ONLY),0.038044,0.240686,0.019236,0.505624,2.100759,1.0,0.010079,1.535903,0.544704,0.074129,0.348917,0.292773
2,"(BANANAS, POTATOES RUSSET)",(FLUID MILK WHITE ONLY),0.010788,0.240686,0.006776,0.628093,2.609593,1.0,0.004179,2.041677,0.623525,0.02769,0.510207,0.328123
3,"(MAINSTREAM WHITE BREAD, POTATOES RUSSET)",(FLUID MILK WHITE ONLY),0.011463,0.240686,0.007577,0.660959,2.746144,1.0,0.004818,2.239592,0.643226,0.030979,0.55349,0.346219
4,"(POTATOES RUSSET, SHREDDED CHEESE)",(FLUID MILK WHITE ONLY),0.008519,0.240686,0.005606,0.658065,2.734118,1.0,0.003556,2.220635,0.639701,0.023013,0.549678,0.340678
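The derived columns in this table follow from the three support values. As a sanity check, the sketch below recomputes confidence, lift, leverage, and conviction for row 0, (BANANAS) → (FLUID MILK WHITE ONLY), from the supports printed above, using the standard textbook formulas; small differences in the last digits are expected because the inputs are rounded.

```python
# Values copied from row 0 of the rules_fp output above:
# (BANANAS) -> (FLUID MILK WHITE ONLY)
antecedent_support = 0.118961
consequent_support = 0.240686
support = 0.061402

# confidence = support(A and C) / support(A)
confidence = support / antecedent_support
# lift = confidence / support(C)
lift = confidence / consequent_support
# leverage = support(A and C) - support(A) * support(C)
leverage = support - antecedent_support * consequent_support
# conviction = (1 - support(C)) / (1 - confidence)
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.4f}")
print(f"conviction = {conviction:.4f}")
```

The recomputed values agree with the `confidence`, `lift`, `leverage`, and `conviction` columns of that row to about four decimal places.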


In [129]:
from IPython.display import display

print("🔥 Frequent Itemsets:")
display(frequent_itemsets_fp)

print("\n🔥 Association Rules:")
display(rules_fp)

🔥 Frequent Itemsets:


Unnamed: 0,support,itemsets
0,0.118961,(BANANAS)
1,0.038044,(POTATOES RUSSET)
2,0.023715,(ONIONS SWEET)
3,0.018816,(CELERY)
4,0.040133,(HAMBURGER BUNS)
...,...,...
1552,0.005221,"(PAPER NAPKINS, FLUID MILK WHITE ONLY)"
1553,0.007267,"(STRING CHEESE, FLUID MILK WHITE ONLY)"
1554,0.006572,"(ISOTONIC DRINKS SINGLE SERVE, FLUID MILK WHIT..."
1555,0.005064,"(FLUID MILK WHITE ONLY, STUFFING MIXES)"



🔥 Association Rules:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(BANANAS),(FLUID MILK WHITE ONLY),0.118961,0.240686,0.061402,0.516154,2.144508,1.0,0.032770,1.569328,0.605754,0.205878,0.362785,0.385633
1,(POTATOES RUSSET),(FLUID MILK WHITE ONLY),0.038044,0.240686,0.019236,0.505624,2.100759,1.0,0.010079,1.535903,0.544704,0.074129,0.348917,0.292773
2,"(BANANAS, POTATOES RUSSET)",(FLUID MILK WHITE ONLY),0.010788,0.240686,0.006776,0.628093,2.609593,1.0,0.004179,2.041677,0.623525,0.027690,0.510207,0.328123
3,"(MAINSTREAM WHITE BREAD, POTATOES RUSSET)",(FLUID MILK WHITE ONLY),0.011463,0.240686,0.007577,0.660959,2.746144,1.0,0.004818,2.239592,0.643226,0.030979,0.553490,0.346219
4,"(POTATOES RUSSET, SHREDDED CHEESE)",(FLUID MILK WHITE ONLY),0.008519,0.240686,0.005606,0.658065,2.734118,1.0,0.003556,2.220635,0.639701,0.023013,0.549678,0.340678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
465,(PAPER NAPKINS),(FLUID MILK WHITE ONLY),0.010717,0.240686,0.005221,0.487179,2.024127,1.0,0.002642,1.480662,0.511441,0.021209,0.324626,0.254436
466,(STRING CHEESE),(FLUID MILK WHITE ONLY),0.014133,0.240686,0.007267,0.514167,2.136253,1.0,0.003865,1.562910,0.539515,0.029353,0.360168,0.272179
467,(ISOTONIC DRINKS SINGLE SERVE),(FLUID MILK WHITE ONLY),0.015762,0.240686,0.006572,0.416936,1.732282,1.0,0.002778,1.302283,0.429497,0.026300,0.232118,0.222120
468,(STUFFING MIXES),(FLUID MILK WHITE ONLY),0.009390,0.240686,0.005064,0.539298,2.240667,1.0,0.002804,1.648166,0.558953,0.020669,0.393265,0.280169


In [130]:
fp_growth = rules_fp.drop(columns=[
    "antecedent support", 
    "consequent support", 
    "representativity", 
    "conviction", 
    "zhangs_metric", 
    "jaccard", 
    "certainty", 
    "kulczynski"
])

- Keep only meaningful rules (lift of at least 1)

In [131]:
fp_growth = fp_growth[fp_growth['lift'] >= 1]
fp_growth

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
0,(BANANAS),(FLUID MILK WHITE ONLY),0.061402,0.516154,2.144508,0.032770
1,(POTATOES RUSSET),(FLUID MILK WHITE ONLY),0.019236,0.505624,2.100759,0.010079
2,"(BANANAS, POTATOES RUSSET)",(FLUID MILK WHITE ONLY),0.006776,0.628093,2.609593,0.004179
3,"(MAINSTREAM WHITE BREAD, POTATOES RUSSET)",(FLUID MILK WHITE ONLY),0.007577,0.660959,2.746144,0.004818
4,"(POTATOES RUSSET, SHREDDED CHEESE)",(FLUID MILK WHITE ONLY),0.005606,0.658065,2.734118,0.003556
...,...,...,...,...,...,...
465,(PAPER NAPKINS),(FLUID MILK WHITE ONLY),0.005221,0.487179,2.024127,0.002642
466,(STRING CHEESE),(FLUID MILK WHITE ONLY),0.007267,0.514167,2.136253,0.003865
467,(ISOTONIC DRINKS SINGLE SERVE),(FLUID MILK WHITE ONLY),0.006572,0.416936,1.732282,0.002778
468,(STUFFING MIXES),(FLUID MILK WHITE ONLY),0.005064,0.539298,2.240667,0.002804


## **4️⃣ Association Analysis - Interpreting the Results**

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">
<strong> 🤓 Interpreting the results and building a strategy is the heart of this assignment, so please, please show that you thought it through carefully! </strong>
</span>

### **[Association analysis] Support of at least 0.9%, confidence of at least 55%, lift of at least 1**
Analysis at the subcategory level. <br>
Only the top 30 rules by lift are shown.

In [175]:
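results_sub = apriori_sub[(apriori_sub['confidence']>0.55)&(apriori_sub['support']>0.009)&(apriori_sub['lift']>1)]
# Show only the top 30 rules by lift.
# Note: this sorts apriori_sub, not results_sub, so the thresholds above are not applied to the table below.
top_rules = apriori_sub.sort_values('lift', ascending=False).head(30)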
display(top_rules)


Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
44660,"(DRY NOODLES/PASTA, COLD CEREAL, BAKED BREAD/B...","(PASTA SAUCE, FLUID MILK PRODUCTS)",0.005649,0.484512,18.77391,0.005348
46998,"(DRY NOODLES/PASTA, COLD CEREAL, CHEESE)","(PASTA SAUCE, FLUID MILK PRODUCTS)",0.005323,0.481192,18.645288,0.005038
41744,"(DRY NOODLES/PASTA, COLD CEREAL, CHEESE)","(PASTA SAUCE, BAKED BREAD/BUNS/ROLLS)",0.005355,0.484031,18.546521,0.005066
49044,"(DRY NOODLES/PASTA, BEEF, FLUID MILK PRODUCTS,...","(PASTA SAUCE, BAKED BREAD/BUNS/ROLLS)",0.005366,0.47252,18.105443,0.00507
26730,"(DINNER MXS:DRY, DRY NOODLES/PASTA)","(BEEF, PASTA SAUCE)",0.005225,0.415418,17.905158,0.004933
41746,"(COLD CEREAL, PASTA SAUCE, CHEESE)","(DRY NOODLES/PASTA, BAKED BREAD/BUNS/ROLLS)",0.005355,0.523408,17.834082,0.005054
44659,"(DRY NOODLES/PASTA, FLUID MILK PRODUCTS, COLD ...","(PASTA SAUCE, BAKED BREAD/BUNS/ROLLS)",0.005649,0.464943,17.815139,0.005332
22238,"(DINNER MXS:DRY, DRY NOODLES/PASTA)","(PASTA SAUCE, BAKED BREAD/BUNS/ROLLS)",0.005845,0.464732,17.807021,0.005517
49047,"(BEEF, PASTA SAUCE, FLUID MILK PRODUCTS, CHEESE)","(DRY NOODLES/PASTA, BAKED BREAD/BUNS/ROLLS)",0.005366,0.521557,17.771018,0.005064
49046,"(DRY NOODLES/PASTA, BEEF, BAKED BREAD/BUNS/ROL...","(PASTA SAUCE, FLUID MILK PRODUCTS)",0.005366,0.457803,17.738999,0.005064
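Because `top_rules` is sorted from `apriori_sub` rather than from `results_sub`, the rows above are not guaranteed to meet the thresholds in the heading. The sketch below checks the first three displayed rows against the same conditions used for `results_sub`; the numbers are copied from the table, and rows are identified by their index only because the itemset labels are truncated in the output.

```python
# First three rows of the top_rules table above, identified by index.
top_rows = [
    {"index": 44660, "support": 0.005649, "confidence": 0.484512, "lift": 18.77391},
    {"index": 46998, "support": 0.005323, "confidence": 0.481192, "lift": 18.645288},
    {"index": 41744, "support": 0.005355, "confidence": 0.484031, "lift": 18.546521},
]


def passes_thresholds(row):
    """Same conditions as the results_sub filter: confidence, support, lift."""
    return row["confidence"] > 0.55 and row["support"] > 0.009 and row["lift"] > 1


kept = [row["index"] for row in top_rows if passes_thresholds(row)]
print("rows meeting all thresholds:", kept)
```

None of these three rows passes, since their support sits near 0.5% and their confidence below 50%. Sorting `results_sub` instead of `apriori_sub` would give a top-lift table that respects the thresholds, at the cost of showing different rules from the ones interpreted below.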


### **[Association analysis] Support of at least 0.9%, confidence of at least 55%, lift of at least 1**

In [172]:
results = apriori[(apriori['confidence']>0.55)&(apriori['support']>0.009)&(apriori['lift']>1)]
results

Unnamed: 0,antecedents,consequents,support,confidence,lift,leverage
1,(ADULT CEREAL),(FLUID MILK WHITE ONLY),0.019154,0.638028,2.65087,0.011928
2,(ALL FAMILY CEREAL),(FLUID MILK WHITE ONLY),0.026137,0.649118,2.696946,0.016446
49,(CHOCOLATE MILK),(FLUID MILK WHITE ONLY),0.021615,0.654308,2.71851,0.013664
52,(CORN),(FLUID MILK WHITE ONLY),0.013379,0.555682,2.308742,0.007584
59,(DAIRY PURE JUICE),(FLUID MILK WHITE ONLY),0.037451,0.588163,2.443691,0.022126
63,(EGGS),(FLUID MILK WHITE ONLY),0.051568,0.571652,2.375094,0.029856
70,(FRUIT BOWL AND CUPS),(FLUID MILK WHITE ONLY),0.01171,0.572773,2.379748,0.00679
95,(KIDS CEREAL),(FLUID MILK WHITE ONLY),0.033,0.654316,2.718544,0.020861
101,(MACARONI CHEESE DNRS),(FLUID MILK WHITE ONLY),0.0171,0.572706,2.379471,0.009914
129,(PASTA CANNED),(FLUID MILK WHITE ONLY),0.013096,0.569089,2.364444,0.007557


- Interpret the results above based on the support, confidence, and lift values! (at least two interpretations)

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
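<strong> 🤓 [ Example ] 'FLUID MILK WHITE ONLY' (milk) shows strong joint-purchase patterns with a wide range of items, with a lift of 2 or more against over 30 products. <br> In particular, the cereal types 'ALL FAMILY CEREAL', 'KIDS CEREAL', and 'ADULT CEREAL' each record a confidence of roughly 64 to 65% and a lift of 2.6 to 2.7, which makes their combined consumption stand out. <br> Overall, milk is often bought together with cereal, fruit, bakery items, and breakfast foods, which points to purchasing behavior closely tied to how shoppers plan their meals. </strong>  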
</span>

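- Interpretation 1: [Association analysis at the subcategory level] Among the top 30 rules by lift, baskets containing items such as 'dry noodles/pasta, cold cereal, baked bread, cheese' are very strongly associated with '(pasta sauce, fluid milk products)', '(pasta sauce, baked bread)', '(beef, pasta sauce)', and similar combinations (lift is very high, at 16 or above). This suggests that shoppers often put the ingredients they need to cook pasta into the same basket. Items we would commonly expect to go well with pasta, such as frozen pizza, soup, and cheese, also show closely linked purchasing patterns.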

- Interpretation 2: [Association analysis at the subcategory level] Cat litter and cat food are strongly associated. In other words, cat-related items are closely linked to one another.

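- Interpretation 3: [Association analysis at the product type level] fluid milk white only (milk) is frequently bought together with a wide range of products, including cereal, chocolate milk, corn, eggs, fruit cups, and dairy pure juice (confidence of roughly 56 to 65% and lift of about 2.3 to 2.7 in the rules shown).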
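The reading of the product type table can be checked directly against the ten rows displayed above. The sketch below copies those rows (all with consequent FLUID MILK WHITE ONLY) and ranks the antecedents by confidence and by support; it uses only the values shown in the output, not the full `results` frame.

```python
# (antecedent, support, confidence, lift) copied from the results table above.
# Every row has the consequent (FLUID MILK WHITE ONLY).
rows = [
    ("ADULT CEREAL", 0.019154, 0.638028, 2.65087),
    ("ALL FAMILY CEREAL", 0.026137, 0.649118, 2.696946),
    ("CHOCOLATE MILK", 0.021615, 0.654308, 2.71851),
    ("CORN", 0.013379, 0.555682, 2.308742),
    ("DAIRY PURE JUICE", 0.037451, 0.588163, 2.443691),
    ("EGGS", 0.051568, 0.571652, 2.375094),
    ("FRUIT BOWL AND CUPS", 0.01171, 0.572773, 2.379748),
    ("KIDS CEREAL", 0.033, 0.654316, 2.718544),
    ("MACARONI CHEESE DNRS", 0.0171, 0.572706, 2.379471),
    ("PASTA CANNED", 0.013096, 0.569089, 2.364444),
]

# Rank antecedents by confidence (how reliably milk follows) ...
by_confidence = sorted(rows, key=lambda r: r[2], reverse=True)
# ... and find the one with the largest support (how many baskets are involved).
most_common = max(rows, key=lambda r: r[1])
min_lift = min(r[3] for r in rows)

print("highest confidence:", by_confidence[0][0])
print("highest support:   ", most_common[0])
print(f"lowest lift in the table: {min_lift:.3f}")
```

Confidence and support point at different items here: cereal and chocolate milk lead on confidence, while eggs cover the most baskets, which matters when deciding which pairings a promotion would actually reach.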

## **5️⃣ Association Analysis - Building a Business Strategy**

- Build business strategies based on the interpretation above! (at least two) -> Please don't just lean on GPT to write it for you...

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
<strong> 🤓 [ Example ] A curated "table with milk" zone </strong>  
</span>

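**🎯 Goal**

* Group products that are often bought with milk to **nudge purchases intuitively**
* **Raise average basket value** and **offer a convenient shopping experience**

**🛒 Items**

* **Cereals**: KIDS / ALL FAMILY / ADULT CEREAL
* **Bakery**: sliced bread, biscuits, toaster pastries
* **Fruit**: bananas, fruit cups
* **Dairy/convenience foods**: yogurt, eggs, macaroni, etc.

**📍 How to run it**

* Set up a **"Perfect match with milk!"** theme zone near the milk refrigerator
* Offer **recommended meal plans** or **discount coupons** via POP displays/QR codes
* Rotate seasonal themes (e.g., summer = chilled fruit, winter = oatmeal)

**📈 Expected impact**

* Higher rate of combined purchases with milk
* Higher sales of associated products
* **Better customer satisfaction** through 'no-brainer combinations'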



<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
<strong> 🤓 Strategy 1: Bundled sales of recommended pasta combinations </strong>  
</span> <br>

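**🎯 Goal**

* Create recommended pairings of pasta sauce and pasta noodles to **offer a convenient shopping experience**

**🛒 Items**

* **Recommended pasta combination sets**

| Finished pasta dish | Sauce-related items | Pasta noodles |
|---|---|---|
| Cream pasta | White milk, heavy cream, butter, cheese (Parmesan, Gorgonzola) | Thick noodles (tagliatelle, linguine, fettuccine) |
| Oil pasta | Olive oil, garlic, aglio e olio sauce, peperoncino | Thin noodles (spaghettini, capellini) |
| Tomato pasta | Tomato sauce, canned diced tomatoes, basil, onion, garlic | Standard noodles (spaghetti, bucatini) |
| Rosé pasta | Tomato sauce, heavy cream, milk, gochujang (for a Korean-style rosé) | Thick noodles (fettuccine, rigatoni) |
| Bolognese | Tomato sauce, ground beef, red wine, onion, celery | Wide noodles (pappardelle, linguine) |
| Genovese pasta | Basil pesto sauce, olive oil, pine nuts, Parmesan cheese | Twisted noodles (trofie, fusilli) |
| Seafood pasta | Tomato sauce (spicy), seafood mix (squid, mussels, shrimp), garlic | Thin noodles (spaghetti, linguine) |


**📍 How to run it**

* Sell and promote ingredient bundles for each finished pasta dish
* Build a theme around each finished pasta dish and shelve the related items together
  

**📈 Expected impact**

* Higher purchase rate for pasta-related items
* Higher sales of associated products
* **Better customer satisfaction** through 'no-brainer combinations'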



<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
<strong> 🤓 Strategy 2: Western-meal theme zones (breakfast picks, lunch/dinner picks) </strong>  
</span> <br>

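**🎯 Goal**

* Bundle ingredients that suit breakfast, lunch, and dinner menus into theme zones to **raise average basket value**
  
**🛒 Items**
* **Breakfast picks**: milk + cereal + eggs + bananas + yogurt + soup and other items that suit a relatively light meal
* **Lunch/dinner picks**: pasta-related items + bread-related items + beef + frozen pizza and other items that suit lunch or dinner


**📍 How to run it**

* Sell and promote ingredient bundles for each theme
* Shelve the related items together by theme
  

**📈 Expected impact**

* Higher purchase rate
* Higher sales of associated products
* **Better customer satisfaction** through 'no-brainer combinations'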

<span style="color:black; background-color:#E6E6FA; padding:2px 4px; border-radius:4px">  
<strong> 🤓 Strategy 3: Weekly rotating bundle promotions for pet items </strong>  
</span> <br>

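**🎯 Goal**
* Let shoppers who come in for pet items try a variety of products (beyond their usual picks)
  
**📍 How to run it**
* Week 1 of May: promotion on mackerel-flavor Churu treats + cat litter
* Week 2 of May: promotion on tuna-flavor Churu treats + cat litter
* Continue with promotions on other combinations along the same lines
  
**📈 Expected impact**
* Shoppers who have tried a variety of products can find the pet product that satisfies them most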

# **🤓 Good luck comes to those who propose a brilliant strategy~🍀**