Criteo Dataset #362

Open
AkihikoWatanabe opened this issue Jun 1, 2021 · 3 comments

Comments

@AkihikoWatanabe
Owner

https://www.kaggle.com/c/criteo-display-ad-challenge/data

@AkihikoWatanabe
Owner Author

AkihikoWatanabe commented Jun 1, 2021

Criteo Dataset (https://www.kaggle.com/c/criteo-display-ad-challenge/data)

A dataset for CTR prediction, used by models such as DeepFM.

Data Description

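  • train.csv: A portion of Criteo's traffic records over a 7-day period. Each row corresponds to one impression and is labeled click or non-click. Rows are ordered chronologically. Click and non-click examples are subsampled at different rates to reduce the size of the dataset.
  • test.csv: Built the same way as the training data, but made up of events from the day following the training period.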
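
Because clicks and non-clicks are subsampled at different rates, a model trained on this data estimates click probabilities for the subsampled distribution rather than for the original traffic. The following is a minimal sketch of the usual odds correction under that assumption. The function name and the `pos_rate` / `neg_rate` parameters are hypothetical, and the actual sampling rates are not stated in this description.

```python
def correct_probability(p_sampled, pos_rate, neg_rate):
    """Map a probability learned on subsampled data back to the original scale.

    pos_rate and neg_rate are the (hypothetical) fractions of clicks and
    non-clicks that were kept. Keeping clicks with rate pos_rate and
    non-clicks with rate neg_rate multiplies the click odds by
    pos_rate / neg_rate, so the inverse factor is applied here.
    """
    if not 0.0 < p_sampled < 1.0:
        raise ValueError("p_sampled must be strictly between 0 and 1")
    if pos_rate <= 0.0 or neg_rate <= 0.0:
        raise ValueError("sampling rates must be positive")
    odds_sampled = p_sampled / (1.0 - p_sampled)
    odds_original = odds_sampled * neg_rate / pos_rate
    return odds_original / (1.0 + odds_original)


# Illustration with made-up numbers: all clicks kept, 10% of non-clicks kept.
p = correct_probability(10 / 109, pos_rate=1.0, neg_rate=0.1)
```

With equal rates the correction is the identity, so it only matters when the two classes are thinned differently.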

Data Fields

  • Label - Target variable that indicates if an ad was clicked (1) or not (0).
  • I1-I13 - A total of 13 columns of integer features (mostly count features).
  • C1-C26 - A total of 26 columns of categorical features. The values of these features have been hashed onto 32 bits for anonymization purposes.

In short, there are 13 integer features and 26 categorical features.
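
The field layout above can be sketched as a small parser. This is an illustrative sketch only: it assumes one record per line with the columns in the order Label, I1-I13, C1-C26, tab-separated, with no header row and with empty strings for missing values. None of those file-format details are confirmed by this description, and the sample row is synthetic.

```python
import csv
import io

LABEL = "Label"
INT_COLS = [f"I{i}" for i in range(1, 14)]   # I1 .. I13
CAT_COLS = [f"C{i}" for i in range(1, 27)]   # C1 .. C26
COLUMNS = [LABEL] + INT_COLS + CAT_COLS      # 40 columns in total


def parse_row(fields):
    """Split one raw record into (label, integer features, categorical features)."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    label = int(row[LABEL])
    # Empty strings are treated as missing values (an assumption).
    ints = [int(row[c]) if row[c] != "" else None for c in INT_COLS]
    cats = [row[c] if row[c] != "" else None for c in CAT_COLS]
    return label, ints, cats


def read_rows(stream, delimiter="\t"):
    """Yield parsed records from a text stream (delimiter is an assumption)."""
    for fields in csv.reader(stream, delimiter=delimiter):
        yield parse_row(fields)


# Synthetic row for illustration only; not taken from the dataset.
sample = "\t".join(["1"] + ["3", ""] + ["7"] * 11 + ["0a1b2c3d"] * 26) + "\n"
label, ints, cats = next(read_rows(io.StringIO(sample)))
```

Keeping the integer and categorical columns in separate lists matches how CTR models such as DeepFM typically treat them: the former as numeric inputs, the latter as IDs to embed.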

@AkihikoWatanabe
Owner Author

Avazu Data (https://www.kaggle.com/c/avazu-ctr-prediction/data)

File descriptions

  • train - Training set. 10 days of click-through data, ordered chronologically. Non-clicks and clicks are subsampled according to different strategies.
  • test - Test set. 1 day of ads for testing your model predictions.
  • sampleSubmission.csv - Sample submission file in the correct format, corresponds to the All-0.5 Benchmark.
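
As a rough illustration of what the All-0.5 Benchmark means, the sketch below scores a constant prediction of 0.5. It assumes the predictions are evaluated with logarithmic loss, which is not stated in the file descriptions above. The helper names and the labels are hypothetical.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def constant_baseline(n, value=0.5):
    """Predict the same probability for every one of n rows."""
    return [value] * n


# Made-up labels: with a constant 0.5 every row contributes -log(0.5),
# so the score equals ln 2 whatever the labels are.
labels = [0, 1, 1, 0, 0]
baseline_loss = log_loss(labels, constant_baseline(len(labels)))
```

Any model worth submitting should score below this label-independent baseline.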

Data fields

  • id: ad identifier
  • click: 0/1 for non-click/click
  • hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC.
  • C1 -- anonymized categorical variable
  • banner_pos
  • site_id
  • site_domain
  • site_category
  • app_id
  • app_domain
  • app_category
  • device_id
  • device_ip
  • device_model
  • device_type
  • device_conn_type
  • C14-C21 -- anonymized categorical variables
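
The `hour` field packs the date and the hour into one YYMMDDHH value, so it usually has to be decoded before it is useful as a feature. A minimal sketch using only the standard library follows. The function name `parse_hour` is hypothetical, and deriving hour-of-day and weekday is just one possible use.

```python
from datetime import datetime, timezone


def parse_hour(value):
    """Decode a YYMMDDHH value such as 14091123 into a UTC datetime."""
    text = str(value).zfill(8)  # tolerate values that were read as integers
    return datetime.strptime(text, "%y%m%d%H").replace(tzinfo=timezone.utc)


ts = parse_hour("14091123")  # the example from the field description
hour_of_day = ts.hour        # hour within the day
weekday = ts.weekday()       # Monday is 0
```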

@AkihikoWatanabe
Owner Author

Basically, both datasets appear to consist of click/non-click labels together with the accompanying information recorded at the time of each click.
