Experimenting with undersampled data (LG) #19

Open
syleeie2310 opened this issue Apr 13, 2024 · 4 comments
@syleeie2310 (Contributor)
Experimenting with undersampled data (LG)

We need to decide which dataset works best, judged by AUC.
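A minimal sketch (not from the issue) of how this comparison could be run in PySpark: fit a logistic regression on each candidate undersampled training set and score AUC on a shared test set. The candidates / test_df arguments and the features / label column names are assumptions for illustration.

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

def auc_for_candidates(candidates, test_df):
    # `candidates` maps a sampling-ratio label (e.g. "1:3") to an undersampled
    # training DataFrame with assumed `features` / `label` columns.
    evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
    results = {}
    for ratio, train_df in candidates.items():
        model = LogisticRegression(featuresCol="features", labelCol="label").fit(train_df)
        # evaluate() reads the rawPrediction column that transform() adds
        results[ratio] = evaluator.evaluate(model.transform(test_df))
    return results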

@syleeie2310 (Contributor, Author)
from recommenders.evaluation.spark_evaluation import SparkRankingEvaluation, SparkRatingEvaluation

evaluations = SparkRankingEvaluation(
    dfs_test,                        # test data
    dfs_pred_final,                  # prediction data
    col_user=COL_USER,               # asin1
    col_item=COL_ITEM,               # asin2
    col_rating=COL_RATING,           # co-review counts
    col_prediction=COL_PREDICTION,   # predicted probability
    k=10                             # number of top-k items
)

print(
    "Precision@k = {}".format(evaluations.precision_at_k()),
    "Recall@k = {}".format(evaluations.recall_at_k()),
    "NDCG@k = {}".format(evaluations.ndcg_at_k()),
    "Mean average precision = {}".format(evaluations.map_at_k()),
    sep="\n"
)

@syleeie2310 (Contributor, Author)
With relevancy_method = top_k, the top 10 items are treated as recommendations.

If you set by_threshold = 3 instead, rows are dropped from the test data automatically.
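A minimal sketch of the threshold-based variant, assuming this refers to SparkRankingEvaluation's relevancy_method and threshold keyword arguments (same dfs_test / dfs_pred_final inputs as above):

evaluations_thr = SparkRankingEvaluation(
    dfs_test,
    dfs_pred_final,
    col_user=COL_USER,
    col_item=COL_ITEM,
    col_rating=COL_RATING,
    col_prediction=COL_PREDICTION,
    relevancy_method="by_threshold",  # relevance decided by a threshold instead of rank
    threshold=3,                      # rows under the threshold are dropped automatically
    k=10
)
print("Precision@k (by_threshold) = {}".format(evaluations_thr.precision_at_k()))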

@syleeie2310 (Contributor, Author)
  • When running recommendation-modeling experiments, keep only one of each (a, b) / (b, a) pair.
  • When pulling rows from the 0 label, first check how many share the same anchor item, then sample at a fixed ratio (e.g. 1:1, 1:2, 1:3, 1:5, 1:10); a sketch of both steps follows this list.
    - After doing 1 and 2, check the logistic-regression train-data AUC, precision, and recall, then save the dataset and share it with Jiyoon.
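A minimal PySpark sketch of steps 1 and 2 above, assuming pairs_df is the labeled pair DataFrame with asin1 / asin2 / label columns (names follow the evaluator config above); the single global sampling ratio is a simplification of the per-anchor-item count check suggested in the list:

from pyspark.sql import functions as F

# 1) Keep only one direction of each (a, b) / (b, a) pair by ordering the two ids.
pairs = (
    pairs_df.withColumn("item_a", F.least("asin1", "asin2"))
            .withColumn("item_b", F.greatest("asin1", "asin2"))
            .dropDuplicates(["item_a", "item_b"])
)

# 2) Undersample the label-0 rows down to a fixed positive:negative ratio, e.g. 1:3.
pos = pairs.filter(F.col("label") == 1)
neg = pairs.filter(F.col("label") == 0)
ratio = 3
frac = min(1.0, ratio * pos.count() / neg.count())
neg_sampled = neg.sample(withReplacement=False, fraction=frac, seed=42)
train_df = pos.unionByName(neg_sampled)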
