Experimenting with undersampled data (LG) #19

Open
syleeie2310 opened this issue Apr 13, 2024 · 4 comments
@syleeie2310 (Contributor)
Experimenting with undersampled data (LG)

We need to decide which dataset works best, judged by AUC.
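A minimal sketch (not from the issue) of how this comparison could be run in PySpark: fit a logistic regression on each candidate undersampled training set and score AUC on a shared test set. The candidates / test_df arguments and the features / label column names are assumptions for illustration.

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

def auc_for_candidates(candidates, test_df):
    # `candidates` maps a sampling-ratio label (e.g. "1:3") to an undersampled
    # training DataFrame with assumed `features` / `label` columns.
    evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
    results = {}
    for ratio, train_df in candidates.items():
        model = LogisticRegression(featuresCol="features", labelCol="label").fit(train_df)
        # evaluate() reads the rawPrediction column that transform() adds
        results[ratio] = evaluator.evaluate(model.transform(test_df))
    return results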

@syleeie2310 (Contributor, Author)
from recommenders.evaluation.spark_evaluation import SparkRankingEvaluation, SparkRatingEvaluation

evaluations = SparkRankingEvaluation(
    dfs_test,                        # test data
    dfs_pred_final,                  # prediction data
    col_user=COL_USER,               # asin1
    col_item=COL_ITEM,               # asin2
    col_rating=COL_RATING,           # co-review counts
    col_prediction=COL_PREDICTION,   # predicted probability
    k=10                             # number of top-k items
)

print(
    "Precision@k = {}".format(evaluations.precision_at_k()),
    "Recall@k = {}".format(evaluations.recall_at_k()),
    "NDCG@k = {}".format(evaluations.ndcg_at_k()),
    "Mean average precision = {}".format(evaluations.map_at_k()),
    sep="\n"
)

@syleeie2310 (Contributor, Author)
With relevancy_method = top_k, the top 10 items are treated as recommendations.

If you set by_threshold = 3 instead, rows are dropped from the test data automatically.
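A minimal sketch of the threshold-based variant, assuming this refers to SparkRankingEvaluation's relevancy_method and threshold keyword arguments (same dfs_test / dfs_pred_final inputs as above):

evaluations_thr = SparkRankingEvaluation(
    dfs_test,
    dfs_pred_final,
    col_user=COL_USER,
    col_item=COL_ITEM,
    col_rating=COL_RATING,
    col_prediction=COL_PREDICTION,
    relevancy_method="by_threshold",  # relevance decided by a threshold instead of rank
    threshold=3,                      # rows under the threshold are dropped automatically
    k=10
)
print("Precision@k (by_threshold) = {}".format(evaluations_thr.precision_at_k()))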

@syleeie2310 (Contributor, Author)
  • When running recommendation-modeling experiments, keep only one of each (a, b) / (b, a) pair.
  • When pulling rows from the 0 label, first check how many share the same anchor item, then sample at a fixed ratio (e.g. 1:1, 1:2, 1:3, 1:5, 1:10); a sketch of both steps follows this list.
    - After doing 1 and 2, check the logistic-regression train-data AUC, precision, and recall, then save the dataset and share it with Jiyoon.
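A minimal PySpark sketch of steps 1 and 2 above, assuming pairs_df is the labeled pair DataFrame with asin1 / asin2 / label columns (names follow the evaluator config above); the single global sampling ratio is a simplification of the per-anchor-item count check suggested in the list:

from pyspark.sql import functions as F

# 1) Keep only one direction of each (a, b) / (b, a) pair by ordering the two ids.
pairs = (
    pairs_df.withColumn("item_a", F.least("asin1", "asin2"))
            .withColumn("item_b", F.greatest("asin1", "asin2"))
            .dropDuplicates(["item_a", "item_b"])
)

# 2) Undersample the label-0 rows down to a fixed positive:negative ratio, e.g. 1:3.
pos = pairs.filter(F.col("label") == 1)
neg = pairs.filter(F.col("label") == 0)
ratio = 3
frac = min(1.0, ratio * pos.count() / neg.count())
neg_sampled = neg.sample(withReplacement=False, fraction=frac, seed=42)
train_df = pos.unionByName(neg_sampled)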
