## 조달청 나라장터 종합쇼핑몰 추천시스템 개발_모델 구현

### 01. 데이터 로드

In [13]:
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings(action='ignore')

# 종합쇼핑몰 품목 등록 내역 데이터
df_feature = pd.read_csv('shopping_feature.csv', encoding = 'UTF-8')
# 종합쇼핑몰 납품요구 상세내역 데이터
df_history = pd.read_csv('shopping_history.csv', encoding = 'utf-8')

print('종합쇼핑몰_현재품목 :',df_feature.shape)
print('종합소핑몰_구매목록 :',df_history.shape)

종합쇼핑몰_현재품목 : (100, 5)
종합소핑몰_구매목록 : (470, 7)


### 02. 데이터 전처리

In [14]:
# 품목 데이터
# 필요한 column 추출
df_feature = df_feature[['물품식별번호','품목','제품인증목록']]

In [16]:
# 구매내역 데이터
# 최종납품등록여부이 N인 경우의 행 삭제
index = df_history[df_history['최종납품등록여부']=='N'].index
df_history.drop(index, inplace = True)

# 필요한 column 추출
df_history = df_history[['수요기관명','물품식별번호','품목']]

# 구매횟수를 count하기 위해 rank column 생성
df_history['rank'] = 1

# 그룹화를 통해 최종구매횟수 도출
df_history = df_history.groupby(['수요기관명','물품식별번호','품목'])['rank'].sum().reset_index()

# cut을 통해 10개의 평점으로 구간화
df_history['rank'] = pd.cut(df_history['rank'],10, labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
df_history.head(5)

Unnamed: 0,수요기관명,물품식별번호,품목,rank
0,강원도강릉교육청 강릉초등학교,21887066,"유아용교구장, 아름드리교구, ARD-064, 1140×300×915mm",1
1,강원도강릉교육청 강릉초등학교,22802381,"유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용",1
2,강원도강릉교육청 강릉초등학교,23182936,"유아용교구장, 다나, KG-8, 1150×300×750mm, 어린이용",1
3,강원도강릉교육청 강릉초등학교,23182957,"유아용교구장, 다나, KG-43, 1150×300×950mm, 어린이용",1
4,강원도강릉교육청 강릉초등학교,23419139,"유아용교구장, 다나, KG-80, 1150×300×600mm, 어린이용",1


In [17]:
# 두 데이터 병합
df = pd.merge(df_history,df_feature,on = '품목', how = 'left')
df.rename(columns = {'물품식별번호_x':'물품식별번호'}, inplace = True)
df.drop(['물품식별번호_y'],axis=1,inplace = True)

In [18]:
# 종합쇼핑몰 인증내역 리스트
# 의무구매물품
duty = ["GR","환경표지제품","NEP","저공해자동차인증","저탄소인증제품"]
# 우선구매물품
first = ["우수조달물품","NET","GS","중증장애인생산품","성능인증제품","보훈단체생산품","여성기업제품","장애인기업","사회적기업","우수발명품",
        "우수조달 공동상표","녹색기술인증","녹색기술제품확인","장애인표준사업장","창업기업제품","GS인증(1등급)",
         "GS인증(2등급)","ICT융합품질인증"]

# 종합쇼핑몰 인증내역 처리 함수
def get_score(l):
    if pd.isna(l)==True:
        score = 0
    else:
        l = l.replace('(',',').replace(')',',').replace(' ',',')
        cer = l.split(',')
        cer = list(filter(None, cer))
        score = 0
        for i in range(len(cer)):
            if cer[i] in duty:
                score = 10
                break
            if cer[i] in first:
                score += 1
    return score

# 인증내역에 대한 추가점수 column 생성
df['인증점수'] = df['제품인증목록'].apply(lambda l : get_score(l))

# 합계를 위한 형변환
df = df.astype({'인증점수':'category'})
df = df.astype({'인증점수':'int'})
df = df.astype({'rank':'int'})

# 평점과 가산점 합계
df['rank'] = df['rank']+df['인증점수']

# # 합계가 10점이상이면 10점으로 설정
df.loc[df['rank'] > 10] = '10'

# # 불필요한 column 삭제
df.drop(['인증점수','제품인증목록'],axis=1,inplace = True)

# # 최종 데이터셋
df.head(5)

Unnamed: 0,수요기관명,물품식별번호,품목,rank
0,강원도강릉교육청 강릉초등학교,21887066,"유아용교구장, 아름드리교구, ARD-064, 1140×300×915mm",1
1,강원도강릉교육청 강릉초등학교,22802381,"유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용",1
2,강원도강릉교육청 강릉초등학교,23182936,"유아용교구장, 다나, KG-8, 1150×300×750mm, 어린이용",1
3,강원도강릉교육청 강릉초등학교,23182957,"유아용교구장, 다나, KG-43, 1150×300×950mm, 어린이용",1
4,강원도강릉교육청 강릉초등학교,23419139,"유아용교구장, 다나, KG-80, 1150×300×600mm, 어린이용",1


In [21]:
df['rank'].value_counts()

1     428
10      8
Name: rank, dtype: int64

In [22]:
# 피벗테이블로 변경
final_df = pd.pivot_table(df,index='수요기관명',columns=['물품식별번호','품목'],values='rank').fillna(0)
final_df.head(5)

물품식별번호,20963648,21116969,21193673,21194289,21194291,21199638,21200841,21200847,21201008,21201038,...,24359439,24359440,24359441,24359442,24359443,24359444,24359446,24373218,24386085,24386088
품목,"유아용교구장, 파랑새교구, prsi-311, 1170×300×750mm, 쌓기영역장, 어린이용","유아용교구장, 홍명퍼니처, HM9701, 900×550×800mm, 어린이용","유아용교구장, 아이땅, EPF610, 1100×300×820mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(소), 1280×500×930mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(대), 1700×500×930mm, 어린이용","유아용교구장, 삼성교구, ssgg-004, 1100×400×920mm, 자작사물함12인용, 어린이용","유아용교구장, 삼성교구, ssgg-014, 1100×300×710mm, 자작삼단막힘, 어린이용","유아용교구장, 삼성교구, ssgg-019, 1100×300×780mm, 자작완구책장, 어린이용","유아용교구장, 삼성교구, ssgg-117, 1100×300×710mm, 원목삼단막힘, 어린이용","유아용교구장, 삼성교구, ssgg-147, 1100×300×610mm, 원목영아이단막힘, 어린이용",...,"유아용교구장, 파랑새교구, PBAA22-2, 1170×300×600mm","유아용교구장, 파랑새교구, PBAB22-2, 1170×300×600mm","유아용교구장, 파랑새교구, PBAB22-5, 1170×300×600mm","유아용교구장, 파랑새교구, PBAA23-3, 1170×300×750mm","유아용교구장, 파랑새교구, PBAB23-3, 1170×300×750mm","유아용교구장, 파랑새교구, PBAB23-5, 1170×300×750mm","유아용교구장, 파랑새교구, PBAB23-4, 1170×300×750mm","유아용교구장, 파랑새교구, PBAA13-3, 900×300×750mm","유아용교구장, 위노스, WM1004BC, 1050×300×808mm","유아용교구장, 위노스, WM1007BC, 1050×300×808mm"
수요기관명,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
강원도강릉교육청 강릉초등학교,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
강원도교육청 강원도교육연수원,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
강원도교육청 강원도동해교육지원청,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
강원도교육청 강원도속초양양교육청 상평초등학교,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
강원도교육청 강원도원주교육지원청,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### 03. 모델 구현

### 03-1. 아이템 기반 협업 필터링

In [23]:
ratings_matrix = df.pivot_table( "rank","수요기관명", "품목")
ratings_matrix.fillna(0,inplace=True)
ratings_matrix_T = ratings_matrix.T
ratings_matrix_T.head(5)

수요기관명,강원도강릉교육청 강릉초등학교,강원도교육청 강원도교육연수원,강원도교육청 강원도동해교육지원청,강원도교육청 강원도속초양양교육청 상평초등학교,강원도교육청 강원도원주교육지원청,강원도교육청 강원도원주교육청 반곡초등학교,강원도교육청 봉대가온학교,강원도화천교육청 화천초등학교,경기도 수원시,경기도 안양시,...,충청남도금산교육청 금산동초등학교,충청남도아산교육청 배방초등학교,충청남도아산교육청 탕정초등학교,충청북도 청주시,충청북도교육청 충청북도진천교육지원청 진천상신초등학교,충청북도교육청 충청북도청주교육지원청,충청북도교육청 충청북도청주교육지원청 성화초등학교,충청북도교육청 충청북도청주교육지원청 초롱꽃유치원,한국수자원공사 낙동강권역본부 경남서부권관리단,한국토지주택공사 대전충남지역본부
품목,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"유아용교구장, (부품)다나, BO-626, 400×335×115mm, 바구니, 어린이용",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BO-627, 340×270×95mm, 바구니, 어린이용",0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BS-1, 82×247×58mm, 바구니, 어린이용",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BS-2, 82×247×58mm, 바구니, 어린이용",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BS-3, 164×247×58mm, 바구니, 어린이용",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [24]:
from sklearn.metrics.pairwise import cosine_similarity

# 아이템 유사도 행렬
item_sim = cosine_similarity(ratings_matrix_T, ratings_matrix_T)

# 데이터 프레임 형태로 저장
item_sim_df = pd.DataFrame(item_sim, index=ratings_matrix.columns, columns=ratings_matrix.columns)

print(item_sim_df.shape)
item_sim_df.head(5)

(274, 274)


품목,"유아용교구장, (부품)다나, BO-626, 400×335×115mm, 바구니, 어린이용","유아용교구장, (부품)다나, BO-627, 340×270×95mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-1, 82×247×58mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-2, 82×247×58mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-3, 164×247×58mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-4, 267×367×110mm, 바구니, 어린이용","유아용교구장, 다나, BO-520, 1200×300×750mm, 어린이용","유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용","유아용교구장, 다나, BO-609, 1100×300×750mm, 어린이용","유아용교구장, 다나, KG-10, 1150×300×600mm, 어린이용",...,"유아용교구장, 현대교구산업앤키즈드림, KD-EUN-A14, 1230×295×600mm, 어린이용","유아용교구장, 현대교구산업앤키즈드림, KD-MEDL-05, 700×395×950mm, 어린이용","유아용교구장, 홍명퍼니처, HM9701, 900×550×800mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(대), 1700×500×930mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(소), 1280×500×930mm, 어린이용","자석판학습교구, 동원산업, BM1411, 스토리텔링창의맞춤수학5학년","자석판학습교구, 동원산업, BM1412, 스토리텔링창의맞춤수학 6학년","자석판학습교구, 동원산업, BM1707MA, 마그네틱학급세트","자석판학습교구, 이선생자석교구, ET-001, 1학년영어-교사용","자석판학습교구, 이선생자석교구, ET-002, 2학년영어-교사용"
품목,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"유아용교구장, (부품)다나, BO-626, 400×335×115mm, 바구니, 어린이용",1.0,0.353553,0.57735,0.353553,0.0,0.288675,0.049752,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BO-627, 340×270×95mm, 바구니, 어린이용",0.353553,1.0,0.408248,0.5,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BS-1, 82×247×58mm, 바구니, 어린이용",0.57735,0.408248,1.0,0.816497,0.408248,0.666667,0.057448,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BS-2, 82×247×58mm, 바구니, 어린이용",0.353553,0.5,0.816497,1.0,0.5,0.408248,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"유아용교구장, (부품)다나, BS-3, 164×247×58mm, 바구니, 어린이용",0.0,0.0,0.408248,0.5,1.0,0.816497,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
item_sim_df.value_counts()

유아용교구장, (부품)다나, BO-626, 400×335×115mm, 바구니, 어린이용  유아용교구장, (부품)다나, BO-627, 340×270×95mm, 바구니, 어린이용  유아용교구장, (부품)다나, BS-1, 82×247×58mm, 바구니, 어린이용  유아용교구장, (부품)다나, BS-2, 82×247×58mm, 바구니, 어린이용  유아용교구장, (부품)다나, BS-3, 164×247×58mm, 바구니, 어린이용  유아용교구장, (부품)다나, BS-4, 267×367×110mm, 바구니, 어린이용  유아용교구장, 다나, BO-520, 1200×300×750mm, 어린이용  유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용  유아용교구장, 다나, BO-609, 1100×300×750mm, 어린이용  유아용교구장, 다나, KG-10, 1150×300×600mm, 어린이용  유아용교구장, 다나, KG-11, 1150×300×750mm, 어린이용  유아용교구장, 다나, KG-12, 1150×300×750mm, 어린이용  유아용교구장, 다나, KG-19, 1500×300×950mm, 어린이용  유아용교구장, 다나, KG-205, 1150×300×965mm  유아용교구장, 다나, KG-211, 1150×300×965mm  유아용교구장, 다나, KG-212, 1150×300×965mm  유아용교구장, 다나, KG-22, 1150×300×750mm, 어린이용  유아용교구장, 다나, KG-30, 1250×300×600mm, 어린이용  유아용교구장, 다나, KG-31, 1200×300×750mm, 어린이용  유아용교구장, 다나, KG-32, 1250×300×750mm, 어린이용  유아용교구장, 다나, KG-38, 1150×300×680mm, 어린이용  유아용교구장, 다나, KG-40, 1150×400×680mm, 어린이용  유아용교구장, 다나, KG-43, 1150×300×950mm, 어린이용  유아용교구장, 다나, KG-45, 1050×400×9

In [11]:
# 특정 제품에 대해 유사도가 높은 10개의 제품 출력
item_sim_df['물탱크, (부품)금강, KST-PN1.5T, STS패널, 1000×1000×1.5mm'].sort_values(ascending = False)[:10]

품목
물탱크, (부품)금강, KST-PN1.5T, STS패널, 1000×1000×1.5mm    1.0
물탱크, (부품)금강, KST-PN2.5T, STS패널, 1000×1000×2.5mm    1.0
물탱크, (부품)금강, KST-PN2.0T, STS패널, 1000×1000×2mm      1.0
소방물탱크차, 우리특장, WFJ-P210-D85C21, 물탱크6000L            0.0
물탱크, 세진에스엠씨, SEN-016, 16톤, SMC/사각형                 0.0
물탱크, 서흥, CSHP-0120, 120톤, STS/사각패널형                0.0
물탱크, 성일, SI-18T-1, 18톤, SMC/사각형/단판                 0.0
물탱크, 성일, SI-48T-10, 48톤, SMC/사각형/단판                0.0
물탱크, 성일산업, SIT-0005H, 5톤, STS304/사각형/보온            0.0
물탱크, 성일산업, SIT-0025H, 25톤, STS304/사각형/보온           0.0
Name: 물탱크, (부품)금강, KST-PN1.5T, STS패널, 1000×1000×1.5mm, dtype: float64

In [27]:
# 사용자별로 최적화 된 평점을 예측하는 함수 정의
def predict_rating(ratings_arr, item_sim_arr):
    ratings_pred = ratings_arr.dot(item_sim_arr) / np.array([np.abs(item_sim_arr).sum(axis=1)])
    
    return ratings_pred
  
ratings_pred = predict_rating(ratings_matrix.values , item_sim_df.values)

# 제품 간 유사도에 대한 예측 평점 행렬을 데이터프레임으로 변환
ratings_pred_matrix = pd.DataFrame(data=ratings_pred, index= ratings_matrix.index,
                                   columns = ratings_matrix.columns)

ratings_pred_matrix.head(3)

품목,"유아용교구장, (부품)다나, BO-626, 400×335×115mm, 바구니, 어린이용","유아용교구장, (부품)다나, BO-627, 340×270×95mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-1, 82×247×58mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-2, 82×247×58mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-3, 164×247×58mm, 바구니, 어린이용","유아용교구장, (부품)다나, BS-4, 267×367×110mm, 바구니, 어린이용","유아용교구장, 다나, BO-520, 1200×300×750mm, 어린이용","유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용","유아용교구장, 다나, BO-609, 1100×300×750mm, 어린이용","유아용교구장, 다나, KG-10, 1150×300×600mm, 어린이용",...,"유아용교구장, 현대교구산업앤키즈드림, KD-EUN-A14, 1230×295×600mm, 어린이용","유아용교구장, 현대교구산업앤키즈드림, KD-MEDL-05, 700×395×950mm, 어린이용","유아용교구장, 홍명퍼니처, HM9701, 900×550×800mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(대), 1700×500×930mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(소), 1280×500×930mm, 어린이용","자석판학습교구, 동원산업, BM1411, 스토리텔링창의맞춤수학5학년","자석판학습교구, 동원산업, BM1412, 스토리텔링창의맞춤수학 6학년","자석판학습교구, 동원산업, BM1707MA, 마그네틱학급세트","자석판학습교구, 이선생자석교구, ET-001, 1학년영어-교사용","자석판학습교구, 이선생자석교구, ET-002, 2학년영어-교사용"
수요기관명,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
강원도강릉교육청 강릉초등학교,0.047179,0.0,0.034353,0.0,0.0,0.034782,0.008887,0.736367,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
강원도교육청 강원도교육연수원,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
강원도교육청 강원도동해교육지원청,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.192993,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [28]:
from sklearn.metrics import mean_squared_error

# 성능 평가는 MSE를 사용
def get_mse(pred, actual):
    # 평점이 있는 데이터만 추출 (1차원 배열로 변환)
    pred = pred[actual.nonzero()].flatten()
    actual = actual[actual.nonzero()].flatten()
    
    mse = mean_squared_error(pred, actual)
    rmse = np.sqrt(mse)
    
    return mse, rmse

mse, rmse = get_mse(ratings_pred, ratings_matrix.values)
print('아이템 기반 모든 최근접 이웃 mse:', mse, ", rmse:", rmse)

아이템 기반 모든 최근접 이웃 mse: 0.5268085991163517 , rmse: 0.7258158162484142


In [29]:
def predict_rating_topsim(ratings_arr, item_sim_arr, n=20):
    pred = np.zeros(ratings_arr.shape)

    for col in range(ratings_arr.shape[1]):
        # 아이템의 유사도 상위 N개 아이템
        top_n_items = [np.argsort(item_sim_arr[:, col])[:-1-n:-1]]
        
        # 개인화된 예측 평점을 계산: 반복당 특정 아이템의 예측 평점(사용자 전체)
        for row in range(ratings_arr.shape[0]):
            # 예측 평점
            pred[row, col] = item_sim_arr[col,:][top_n_items].dot(ratings_arr[row, :][top_n_items].T)
            pred[row, col] /= np.sum( np.abs(item_sim_arr[col,:][top_n_items]))
            
    return pred

ratings_pred = predict_rating_topsim(ratings_matrix.values, item_sim_df.values,n=20)
mse, rmse = get_mse(ratings_pred, ratings_matrix.values)
print('아이템 기반 Top20 최근접 이웃 mse:', mse, ", rmse:", rmse)

아이템 기반 Top20 최근접 이웃 mse: 0.5210914639532725 , rmse: 0.7218666524734831


In [31]:
# 사용자의 구매이력이 있는 특정 제품과 비슷한 유사도를 가지는 top_n 추출, top_n = 10
user_rating_id=ratings_matrix.loc['강원도강릉교육청 강릉초등학교'].sort_values(ascending=False)[:10] 
# DataFrame으로 변환
recomm_prods_item = pd.DataFrame(data=user_rating_id.values,index=user_rating_id.index,columns=['pred_score'])
recomm_prods_item

Unnamed: 0_level_0,pred_score
품목,Unnamed: 1_level_1
"유아용교구장, 다나, KG-80, 1150×300×600mm, 어린이용",1.0
"유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용",1.0
"유아용교구장, 다나, KG-43, 1150×300×950mm, 어린이용",1.0
"유아용교구장, 다나, KG-8, 1150×300×750mm, 어린이용",1.0
"유아용교구장, 아름드리교구, ARD-064, 1140×300×915mm",1.0
"유아용교구장, 우리OA가구산업, WRKS25, 1150×295×760mm, 어린이용",0.0
"유아용교구장, 월드퍼니처, WD-KGC0832, 800×400×950mm, 어린이용",0.0
"유아용교구장, 위노스, WM1004BC, 1050×300×808mm",0.0
"유아용교구장, 위노스, WM1007BC, 1050×300×808mm",0.0
"유아용교구장, 자연교구, JY-0030-01, 1100×300×600mm, 어린이용",0.0


### 03-2. 잠재요인 기반 협업 필터링

In [33]:
from sklearn.metrics import mean_squared_error

def get_eval(R, P, Q, non_zeros):
    error = 0
    
    # 예측 R 행렬 생성
    full_pred_matrix = np.dot(P, Q.T)
    
    # 실제 R 행렬에서 널이 아닌 값의 위치 인덱스 추출하여 실제 R 행렬과 예측 행렬의 RMSE 추출
    x_non_zero_ind = [non_zero[0] for non_zero in non_zeros]
    y_non_zero_ind = [non_zero[1] for non_zero in non_zeros]
    R_non_zeros = R[x_non_zero_ind, y_non_zero_ind]
    full_pred_matrix_non_zeros = full_pred_matrix[x_non_zero_ind, y_non_zero_ind]
      
    mse = mean_squared_error(R_non_zeros, full_pred_matrix_non_zeros)
    rmse = np.sqrt(mse)

    return mse, rmse

In [34]:
def matrix_factorization(R, K, steps=200, learning_rate=0.01, r_lambda=0.01):
    num_users, num_items = R.shape

    np.random.seed(1)
    P = np.random.normal(scale=1.0/K, size=(num_users, K))
    Q = np.random.normal(scale=1.0/K, size=(num_items, K))
    
    prev_rmse = 1000
    break_count = 0

    # R>0인 행 위치, 열 위치, 값을 non_zeros 리스트에 저장한다.
    non_zeros = [ (i, j, R[i, j]) for i in range(num_users) for j in range(num_items) if R[i, j] > 0 ]

    # SGD 기법으로 P, Q 매트릭스를 업데이트 함
    for step in range(steps):
        for i, j, r in non_zeros:
            # 잔차 구함
            eij = r - np.dot(P[i, :], Q[j, :].T)

            # Regulation을 반영한 SGD 업데이터 적용
            P[i, :] = P[i, :] + learning_rate*(eij * Q[j, :] - r_lambda*P[i, :])
            Q[j, :] = Q[j, :] + learning_rate*(eij * P[i, :] - r_lambda*Q[j, :])

        # rmse, precision,recall,f1 = get_eval(R, P, Q, non_zeros)
        mse, rmse = get_eval(R, P, Q, non_zeros)
        if step % 10 == 0:
            print("iter step: ", step,"mse: ", mse, "rmse: ", rmse)
            
    return P, Q

In [35]:
# 모델 구현
P, Q = matrix_factorization(final_df.values, K=50, steps=200, learning_rate=0.01, r_lambda=0.01)
pred_matrix = np.dot(P,Q.T) # 도출된 P,Q.T행렬의 내적을 통한 예측 행렬 도출

iter step:  0 mse:  2.813182435085566 rmse:  1.677254433616309
iter step:  10 mse:  2.69283841371142 rmse:  1.6409870242361515
iter step:  20 mse:  1.413087297592016 rmse:  1.1887334846768707
iter step:  30 mse:  1.0153324943445734 rmse:  1.007637084641377
iter step:  40 mse:  0.8763063448439461 rmse:  0.9361123569550538
iter step:  50 mse:  0.7956313909906616 rmse:  0.891981721220038
iter step:  60 mse:  0.6929201683349437 rmse:  0.8324182652578833
iter step:  70 mse:  0.5795331109628244 rmse:  0.7612707212042404
iter step:  80 mse:  0.47259098380988646 rmse:  0.6874525320412214
iter step:  90 mse:  0.38106889610946265 rmse:  0.6173077806973298
iter step:  100 mse:  0.3061692888962775 rmse:  0.5533256626041101
iter step:  110 mse:  0.24542276918954864 rmse:  0.4954016241288967
iter step:  120 mse:  0.19629014111879622 rmse:  0.44304643223797235
iter step:  130 mse:  0.1568377780840937 rmse:  0.39602749662629955
iter step:  140 mse:  0.12550727927427077 rmse:  0.3542700654504566
iter s

In [36]:
# 평점 예측 행렬 데이터프레임 형태로 변환
ratings_pred_matrix = pd.DataFrame(data=pred_matrix,index=final_df.index,columns=final_df.columns)
ratings_pred_matrix.head(3)

물품식별번호,20963648,21116969,21193673,21194289,21194291,21199638,21200841,21200847,21201008,21201038,...,24359439,24359440,24359441,24359442,24359443,24359444,24359446,24373218,24386085,24386088
품목,"유아용교구장, 파랑새교구, prsi-311, 1170×300×750mm, 쌓기영역장, 어린이용","유아용교구장, 홍명퍼니처, HM9701, 900×550×800mm, 어린이용","유아용교구장, 아이땅, EPF610, 1100×300×820mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(소), 1280×500×930mm, 어린이용","유아용교구장, 홍명퍼니처, HMS-A형바구니장(대), 1700×500×930mm, 어린이용","유아용교구장, 삼성교구, ssgg-004, 1100×400×920mm, 자작사물함12인용, 어린이용","유아용교구장, 삼성교구, ssgg-014, 1100×300×710mm, 자작삼단막힘, 어린이용","유아용교구장, 삼성교구, ssgg-019, 1100×300×780mm, 자작완구책장, 어린이용","유아용교구장, 삼성교구, ssgg-117, 1100×300×710mm, 원목삼단막힘, 어린이용","유아용교구장, 삼성교구, ssgg-147, 1100×300×610mm, 원목영아이단막힘, 어린이용",...,"유아용교구장, 파랑새교구, PBAA22-2, 1170×300×600mm","유아용교구장, 파랑새교구, PBAB22-2, 1170×300×600mm","유아용교구장, 파랑새교구, PBAB22-5, 1170×300×600mm","유아용교구장, 파랑새교구, PBAA23-3, 1170×300×750mm","유아용교구장, 파랑새교구, PBAB23-3, 1170×300×750mm","유아용교구장, 파랑새교구, PBAB23-5, 1170×300×750mm","유아용교구장, 파랑새교구, PBAB23-4, 1170×300×750mm","유아용교구장, 파랑새교구, PBAA13-3, 900×300×750mm","유아용교구장, 위노스, WM1004BC, 1050×300×808mm","유아용교구장, 위노스, WM1007BC, 1050×300×808mm"
수요기관명,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
강원도강릉교육청 강릉초등학교,0.178641,0.045756,0.397995,0.189792,0.051402,-0.031859,0.322404,0.015761,0.117601,0.084457,...,0.309961,0.360775,0.251932,0.326788,0.393135,0.331447,0.243955,0.129168,-0.009088,-0.102037
강원도교육청 강원도교육연수원,0.018742,0.107907,0.08991,0.015305,-0.023494,0.051501,0.081061,0.042295,-0.057284,-0.062608,...,0.017465,0.067458,0.057173,0.013091,0.070894,0.095996,0.087636,0.081576,0.029075,0.047783
강원도교육청 강원도동해교육지원청,-0.096119,0.048101,0.160095,0.254652,0.205937,0.240139,0.01751,-0.137995,0.142673,0.136727,...,-0.169091,-0.238277,-0.143447,-0.194957,-0.27855,-0.232209,-0.252355,-0.091141,0.146947,0.047272


In [39]:
# 사용자의 예측 평점 행렬이 높은 top_n개 추출, top_n = 10
recomm_prod = ratings_pred_matrix.loc['강원도강릉교육청 강릉초등학교'].sort_values(ascending=False)[:10]

# DataFrame으로 생성
recomm_prods_latent = pd.DataFrame(data=recomm_prod.values,index=recomm_prod.index,columns=['pred_score'])
recomm_prods_latent

Unnamed: 0_level_0,Unnamed: 1_level_0,pred_score
물품식별번호,품목,Unnamed: 2_level_1
23182936,"유아용교구장, 다나, KG-8, 1150×300×750mm, 어린이용",1.03297
22802381,"유아용교구장, 다나, BO-521, 1250×300×750mm, 어린이용",1.00791
23182934,"유아용교구장, 다나, KG-6, 1150×300×600mm, 어린이용",0.988234
21887066,"유아용교구장, 아름드리교구, ARD-064, 1140×300×915mm",0.974126
23182933,"유아용교구장, 다나, KG-5, 1150×300×600mm, 어린이용",0.971101
23182957,"유아용교구장, 다나, KG-43, 1150×300×950mm, 어린이용",0.957371
22802380,"유아용교구장, 다나, BO-520, 1200×300×750mm, 어린이용",0.922306
23419139,"유아용교구장, 다나, KG-80, 1150×300×600mm, 어린이용",0.918798
23182955,"유아용교구장, 다나, KG-19, 1500×300×950mm, 어린이용",0.51675
24018915,"유아용교구장, 세종산업, sj087jk3-l, 1174×300×630mm",0.473756
