What's In Your Closet?

LightGCN Edition

cluster_id : Items that share both the same mid-level category and the same color are grouped into a single cluster. cluster_id is the unique number of that cluster.

item_id : The unique number of each crawled item.
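
The grouping rule for cluster_id can be sketched as follows. This is only an illustration of the idea, not the project's preprocessing code: the field names `mid_category` and `color` and the sample items are assumptions made up for the example.

```python
def assign_cluster_ids(items):
    """Map each (mid_category, color) pair to an integer cluster_id.

    `items` is a list of dicts with the hypothetical keys
    'item_id', 'mid_category' and 'color'.
    """
    cluster_of_pair = {}  # (mid_category, color) -> cluster_id
    cluster_of_item = {}  # item_id -> cluster_id
    for item in items:
        pair = (item["mid_category"], item["color"])
        # The first time a pair is seen, give it the next unused id.
        if pair not in cluster_of_pair:
            cluster_of_pair[pair] = len(cluster_of_pair)
        cluster_of_item[item["item_id"]] = cluster_of_pair[pair]
    return cluster_of_item


# Made-up items: two blue jeans share a cluster, black jeans get their own.
items = [
    {"item_id": 101, "mid_category": "jeans", "color": "blue"},
    {"item_id": 102, "mid_category": "jeans", "color": "blue"},
    {"item_id": 103, "mid_category": "jeans", "color": "black"},
]
cluster_of_item = assign_cluster_ids(items)
```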

Based on these definitions, we trained a LightGCN model on the interaction data between cluster_id and item_id so that it can estimate the probability that a cluster and an item interact.

The cluster_id × item_id interaction matrix is about 99.4% sparse, so if every entry is used, a model that predicts 0 everywhere already reaches 99.4% accuracy.
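
The arithmetic behind this pitfall is short enough to spell out. The counts below are illustrative round numbers chosen to give 99.4% sparsity, not the project's real matrix sizes.

```python
def all_zero_accuracy(num_positive, num_pairs):
    """Accuracy of a model that predicts 'no interaction' for every pair."""
    return (num_pairs - num_positive) / num_pairs


# Illustrative only: 6 interacting pairs out of 1,000 means 99.4% sparsity,
# and the all-zero predictor is "right" on exactly the 994 empty entries.
baseline = all_zero_accuracy(num_positive=6, num_pairs=1000)
```

In other words, on the full matrix the accuracy of the trivial predictor equals the sparsity, so accuracy alone says nothing about whether the model learned anything.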

For that reason, the training data was resampled so that the two interaction outcomes appear in similar proportions.

  • cluster ID, item ID, interaction = 1 : 4,799 rows
  • cluster ID, item ID, interaction = 0 : 6,723 rows (down from roughly 680,000)
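
One common way to get such a split is to keep every positive row and draw a random subset of the negatives. The README does not say how the 6,723 negatives were chosen, so uniform random sampling here is an assumption, and the function name and toy rows are hypothetical.

```python
import random


def downsample_negatives(rows, num_negatives, seed=0):
    """Keep every positive row and a random subset of the negative rows.

    `rows` is a list of (cluster_id, item_id, label) tuples, label in {0, 1}.
    """
    positives = [r for r in rows if r[2] == 1]
    negatives = [r for r in rows if r[2] == 0]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = rng.sample(negatives, min(num_negatives, len(negatives)))
    sampled = positives + kept
    rng.shuffle(sampled)
    return sampled


# Toy data: 2 positives and 8 negatives, reduced to 2 positives and 3 negatives.
rows = [(0, i, 1) for i in range(2)] + [(1, i, 0) for i in range(8)]
balanced = downsample_negatives(rows, num_negatives=3)
```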

One more caveat: the absence of an interaction does not mean that the cluster and the item really do not go together. They may match well and simply never have been styled together in an outfit, or such an outfit may exist but fall outside the crawled data, leaving the interaction marked as 0. We expected LightGCN to help with this problem.
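
To illustrate why graph propagation might help with unobserved pairs, here is a minimal pure-Python sketch of the LightGCN propagation step (symmetric degree normalization, then an average over layers). It is not the code in train.py: it covers propagation only, with no training, and the sigmoid-of-inner-product scoring head and the one-hot starting embeddings are assumptions chosen to keep the example readable.

```python
import math


def lightgcn_embeddings(edges, cluster_emb, item_emb, num_layers):
    """Propagate embeddings over the cluster-item graph, LightGCN style.

    edges: list of (cluster_index, item_index) pairs with interaction = 1.
    cluster_emb, item_emb: lists of equal-length float vectors (layer 0).
    Returns the layer-averaged embeddings for clusters and for items.
    """
    dim = len(cluster_emb[0])
    c_deg = [0] * len(cluster_emb)
    i_deg = [0] * len(item_emb)
    for c, i in edges:
        c_deg[c] += 1
        i_deg[i] += 1

    c_layers, i_layers = [cluster_emb], [item_emb]
    for _ in range(num_layers):
        c_prev, i_prev = c_layers[-1], i_layers[-1]
        c_next = [[0.0] * dim for _ in cluster_emb]
        i_next = [[0.0] * dim for _ in item_emb]
        for c, i in edges:
            # Symmetric normalization: 1 / sqrt(deg(cluster) * deg(item)).
            norm = 1.0 / math.sqrt(c_deg[c] * i_deg[i])
            for d in range(dim):
                c_next[c][d] += norm * i_prev[i][d]
                i_next[i][d] += norm * c_prev[c][d]
        c_layers.append(c_next)
        i_layers.append(i_next)

    def layer_mean(layers):
        # Final embedding = uniform average of layers 0..num_layers.
        return [[sum(layer[n][d] for layer in layers) / len(layers)
                 for d in range(dim)] for n in range(len(layers[0]))]

    return layer_mean(c_layers), layer_mean(i_layers)


def interaction_prob(cluster_vec, item_vec):
    """Sigmoid of the inner product (an assumed scoring head)."""
    score = sum(a * b for a, b in zip(cluster_vec, item_vec))
    return 1.0 / (1.0 + math.exp(-score))


# Toy graph: cluster 0 never interacted with item 1, but both are linked
# through item 0 and cluster 1.
edges = [(0, 0), (1, 0), (1, 1)]
cluster_emb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
item_emb = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
c_final, i_final = lightgcn_embeddings(edges, cluster_emb, item_emb, num_layers=2)
unseen_pair_prob = interaction_prob(c_final[0], i_final[1])
```

With zero layers the unobserved pair scores exactly 0.5 because its starting vectors are orthogonal. After two layers the score rises above 0.5, since information from shared neighbors has flowed into both embeddings.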


Preparation for running the model experiments

Use the dataset uploaded to the Jira Google Drive (data.zip, owner: killerwhale)
Preprocess that dataset (preprocess branch / run preprocess_item.py)

Then run make_cluster_item_interaction.ipynb in the current folder to extract the interaction information between every cluster and item as a list and save it

Train the model : python3 train.py
Run inference : python3 inference.py

Inference is configured to predict the interaction probability for every cluster-item pair.
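
Scoring every pair amounts to a loop over the Cartesian product of clusters and items. The sketch below writes the result with the column names item_id, cluster_id, prob. The function names and the stand-in scorer are hypothetical; inference.py may be organized quite differently.

```python
import csv
import io
from itertools import product


def write_all_pair_probs(cluster_ids, item_ids, predict_prob, out_file):
    """Write one row per (item_id, cluster_id) pair with its predicted prob."""
    writer = csv.writer(out_file)
    writer.writerow(["item_id", "cluster_id", "prob"])
    for cluster_id, item_id in product(cluster_ids, item_ids):
        prob = predict_prob(cluster_id, item_id)
        writer.writerow([item_id, cluster_id, f"{prob:.4f}"])


# Stand-in scorer for illustration; a real one would call the trained model.
def dummy_predict_prob(cluster_id, item_id):
    return 0.9 if (cluster_id + item_id) % 2 == 0 else 0.1


buffer = io.StringIO()  # an open file would be used in practice
write_all_pair_probs([0, 1], [10, 11], dummy_predict_prob, buffer)
csv_text = buffer.getvalue()
```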


Model inference results

At inference time, cluster-item pairs that had actually interacted mostly received probabilities around 99%, as expected, and some pairs with no recorded interaction also received high probabilities, which suggests they could be matched.


The short version: what you actually need

You only need the cluster_item_prob.csv file uploaded to the Jira Google Drive.

The CSV has three columns, item_id, cluster_id, and prob, where prob is the predicted probability that the item_id interacts with the cluster_id.

If the Streamlit serving stage uses only this CSV, inference is very fast because every probability is precomputed.

A reasonable approach is to look up the cluster_id of the item the user submits and recommend the items with the highest interaction probability for that cluster_id.
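
A lookup along these lines could be sketched as follows, assuming the CSV has the three columns described above and that both ids are integers. The helper names `load_probs` and `recommend` and the sample rows are hypothetical.

```python
import csv
import io


def load_probs(csv_file):
    """Index the rows of a cluster_item_prob.csv-style file by cluster_id."""
    by_cluster = {}
    for row in csv.DictReader(csv_file):
        by_cluster.setdefault(int(row["cluster_id"]), []).append(
            (int(row["item_id"]), float(row["prob"]))
        )
    # Sort once at load time so each request is just a cheap slice.
    for pairs in by_cluster.values():
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return by_cluster


def recommend(by_cluster, cluster_id, k=3):
    """Return the k item_ids with the highest prob for this cluster."""
    return [item_id for item_id, _ in by_cluster.get(cluster_id, [])[:k]]


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "item_id,cluster_id,prob\n"
    "10,0,0.91\n"
    "11,0,0.15\n"
    "12,0,0.78\n"
    "10,1,0.40\n"
)
table = load_probs(sample)
top_two = recommend(table, cluster_id=0, k=2)
```

Loading and sorting once at startup keeps the per-request work to a dictionary lookup, which fits the precomputed design described above.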

Not all of the results have been checked yet, so mixing these recommendations 50:50 with the Rule-Based ones will probably give more satisfying results.
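
One simple reading of a 50:50 mix is to alternate between the two ranked lists and skip duplicates. This is a sketch of that interpretation, not a decided design; the function name and the item ids are made up.

```python
from itertools import zip_longest


def blend_half_and_half(model_items, rule_items, k):
    """Alternate between two ranked lists, skipping duplicates, up to k items."""
    blended, seen = [], set()
    for pair in zip_longest(model_items, rule_items):
        for item_id in pair:
            # zip_longest pads the shorter list with None.
            if item_id is not None and item_id not in seen:
                seen.add(item_id)
                blended.append(item_id)
    return blended[:k]


# Model-based ranking first, Rule-Based ranking second (ids are made up).
mixed = blend_half_and_half([10, 12, 11], [20, 21, 10], k=4)
```

When the two lists overlap heavily, skipping duplicates means the split is no longer exactly even, which may or may not matter for the final UI.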


Model training configuration (hyperparameters)

n_epoch = 10000
learning_rate = 0.005
embedding_dim = 250  # int
num_layers = 6  # int

Result : loss=0.537, acc=0.657, AUC=0.728
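
For reference, accuracy and AUC on labeled pairs can be computed as in the sketch below. The 0.5 decision threshold and the pairwise AUC definition are assumptions; the README does not say how train.py computes its metrics, and the labels and probabilities here are made up.

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of pairs whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def auc(labels, probs):
    """Chance that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(P * N), which is
    fine for a sketch but slow for large evaluation sets.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: two positives and two negatives.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.2]
toy_acc = accuracy(labels, probs)
toy_auc = auc(labels, probs)
```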