Kakao Melon Playlist Recommender System

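This is a playlist continuation project we built while competing in the Kakao Arena competition held in the summer of 2020.
Given a playlist in which half or all of the songs and tags are hidden, the model predicts the missing songs and tags.
We ranked in the top 4% on the public leaderboard.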
https://arena.kakao.com/c/8

Split data

First, put the original train.json in the res/ folder.

Run split_data.py

>python split_data.py run res/train.json

Check your directory

$> tree -d
.
├── arena_data (new directory!)
│   ├── answers
│   │   └── val.json	# the answer to 'questions/val.json'
│   ├── orig
│   │   ├── train.json	# 80% of the original data
│   │   └── val.json	# 20% of the original data
│   └── questions
│       └── val.json	# masked version of 'orig/val.json'
└── res
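
The split that produces this layout can be sketched roughly as follows. This is a minimal illustration, not the actual split_data.py: the field names `id`, `songs`, and `tags`, the helper names, and the "hide the second half" masking rule are all assumptions made for the example. Only the half-masked case is shown; the real data also contains playlists where everything is hidden.

```python
import random


def split_playlists(playlists, val_ratio=0.2, seed=0):
    """Shuffle and split playlists into (train, val). Hypothetical helper."""
    shuffled = playlists[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]


def mask_half(playlist, keys=("songs", "tags")):
    """Return (question, answer): the question keeps the first half of each
    list, the answer holds the hidden second half. Assumed masking rule."""
    question, answer = dict(playlist), dict(playlist)
    for key in keys:
        items = playlist[key]
        keep = len(items) // 2
        question[key] = items[:keep]
        answer[key] = items[keep:]
    return question, answer


# Toy data standing in for res/train.json
playlists = [
    {"id": i, "songs": list(range(i, i + 6)), "tags": ["a", "b"]}
    for i in range(10)
]
train, val = split_playlists(playlists)
questions, answers = zip(*(mask_half(p) for p in val))
```

Here `train` and `val` correspond to orig/train.json and orig/val.json, while `questions` and `answers` correspond to questions/val.json and answers/val.json.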

Train with orig/train.json and test with questions/val.json.
After prediction, I recommend saving the result as results.json in the arena_data/results/ directory.
Then run the evaluation code with answers/val.json and results/results.json.
This is the official Arena GitHub style, and I follow these steps :)
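
For reference, the nDCG metric used in the evaluation can be sketched as below. This is a simplified version assuming binary relevance and an ideal DCG computed from the number of ground-truth items; the function name is hypothetical, and evaluate.py in this repository may differ in details such as cutoffs or how song and tag scores are combined.

```python
import math


def ndcg(ground_truth, recommended):
    """nDCG with binary relevance: a hit at 0-based rank r adds 1/log2(r + 2)."""
    gt = set(ground_truth)
    if not gt:
        return 0.0
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended)
        if item in gt
    )
    # Ideal case: every ground-truth item sits at the top of the list.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(len(gt)))
    return dcg / idcg


perfect = ndcg(["a", "b"], ["a", "b"])
partial = ndcg(["a", "b"], ["x", "a"])
```

A hit near the top of the list contributes more than the same hit further down, which is why the order of the recommended songs matters.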

Requirements

pandas
numpy
scikit-learn
PyTorch
gensim

Usage

!python main.py

Process

1. CF, Sparse matrix

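Rating score (based on the order in which songs were added to a playlist): 110K x 630K matrix -> 500 items (a generous number)
The weight function is modeled on the inverse of the nDCG evaluation metric.
It is adjusted so that the weight does not fall below 1 up to the maximum of 200 (100 for val and test).
x: the position of a song within the playlist.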
https://www.desmos.com/calculator/lrbcbfdqjr
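
One way to picture a position-based rating is the sketch below. The formula is an assumption chosen for illustration (a log discount scaled so that the weight is exactly 1 at position 200); the curve actually used is the one in the Desmos link above. The function names are hypothetical, and scipy is assumed to be installed.

```python
import numpy as np
from scipy.sparse import csr_matrix

MAX_LEN = 200  # maximum position mentioned above (100 for val/test)


def position_weight(x, max_len=MAX_LEN):
    """Assumed weight: log2(max_len + 1) / log2(x + 1) for 1-based position x.
    Decreases with x and equals 1 at x == max_len."""
    x = np.asarray(x, dtype=float)
    return np.log2(max_len + 1) / np.log2(x + 1)


def build_rating_matrix(playlists, n_songs):
    """Build a (playlist x song) sparse matrix of position weights.
    Note: duplicate (row, col) entries are summed by csr_matrix."""
    rows, cols, vals = [], [], []
    for row, songs in enumerate(playlists):
        for pos, song in enumerate(songs, start=1):
            rows.append(row)
            cols.append(song)
            vals.append(float(position_weight(pos)))
    return csr_matrix((vals, (rows, cols)), shape=(len(playlists), n_songs))


# Toy example: 2 playlists over 5 song ids
toy_playlists = [[3, 1], [0, 4, 2]]
matrix = build_rating_matrix(toy_playlists, n_songs=5)
```

In the toy matrix, song 3 gets a larger rating than song 1 in the first playlist because it appears earlier.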

2. Filling in songs and tags

First column: songs, second column: tags (O: given, X: missing)
case 1 : O O (mf or autoencoder)
case 2 : O X (X : predict tags from songs, O : mf)
case 3 : X O (X : predict songs from tags, O : mf)
case 4 : X X (X : title2song, title2tag)
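
The four cases above amount to a simple dispatch on which fields of a playlist are given. The sketch below only illustrates that routing; the function name and the strategy labels are hypothetical and do not correspond to functions in this repository.

```python
def choose_strategy(songs, tags):
    """Pick how to fill songs and tags depending on which inputs are given.
    Returns labels only; the models themselves are out of scope here."""
    has_songs, has_tags = bool(songs), bool(tags)
    if has_songs and has_tags:   # case 1: O O
        return {"songs": "mf_or_autoencoder", "tags": "mf_or_autoencoder"}
    if has_songs:                # case 2: O X
        return {"songs": "mf", "tags": "tags_from_songs"}
    if has_tags:                 # case 3: X O
        return {"songs": "songs_from_tags", "tags": "mf"}
    # case 4: X X, only the playlist title is available
    return {"songs": "title2song", "tags": "title2tag"}


plan = choose_strategy(songs=[101, 202], tags=[])
```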

3. Rerank

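Re-rank with boosting using metadata (date, genre, artist, playlist title, tags, songs); try computing a score for each song with a factorization machine.

Weighted product according to the data distribution
Title (word2vec) -> compare with genre (word2vec) and add weight to songs of the matching genre
Set an artist density (unique score) threshold from the train set (top n%) and add weight when it is exceeded
Ply title autoencoder (title2rec.py)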
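
As an illustration of the artist density idea, here is a minimal sketch. The density definition (share of songs by the most frequent artist), the percentile-based threshold, and the boost factor are all assumptions made for this example, and every function name is hypothetical; the project's actual unique score may be defined differently.

```python
from collections import Counter

import numpy as np


def artist_density(artists):
    """Assumed definition: fraction of songs by the most common artist."""
    if not artists:
        return 0.0
    return Counter(artists).most_common(1)[0][1] / len(artists)


def density_threshold(train_artist_lists, top_pct=10.0):
    """Density value that separates the top `top_pct` percent of train playlists."""
    densities = [artist_density(a) for a in train_artist_lists]
    return float(np.percentile(densities, 100.0 - top_pct))


def rerank(candidates, playlist_artists, song_to_artist, threshold, boost=1.5):
    """Boost candidates by artists already in the playlist when the playlist
    is artist-dense, then sort by score in descending order."""
    if artist_density(playlist_artists) >= threshold:
        seen = set(playlist_artists)
        candidates = [
            (song, score * boost if song_to_artist.get(song) in seen else score)
            for song, score in candidates
        ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data: artist lists of three train playlists
train_lists = [["a", "a", "a", "b"], ["a", "b", "c", "d"], ["x", "x", "y", "z"]]
threshold = density_threshold(train_lists, top_pct=34.0)
candidates = [("s1", 1.0), ("s2", 0.8)]
song_to_artist = {"s1": "q", "s2": "a"}
ranked = rerank(candidates, ["a", "a", "b"], song_to_artist, threshold)
```

In this toy run the playlist is dominated by artist "a", so the candidate by the same artist moves ahead of the one with the higher base score.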

Contributor

Hyun Lee(https://github.com/HyunLee103)
Hyelin Nam(https://github.com/HyelinNAM)
Kyojung Koo(https://github.com/koo616)
Sanghyung Jung(https://github.com/SangHyung-Jung)
