
alxmamaev/IDAO-contest


IDAO-contest

Higher School of Economics, Yandex and Sberbank along with Harbour.Space University are proud to announce an olympiad created by and for data analysts.

Track-1

Human behaviour isn’t governed by the rules of logic. It tends to defy even the shrewdest predictions, so successfully forecasting the future desires of just a small fraction of users would be a major achievement. Given the browsing history of a large number of users, your task is to select a small sample group (5% of users) and recommend five product categories for each person. At least one of these picks must be something that doesn’t interest the user right now, but will interest them during the next week.

Evaluation

The task is to choose exactly 53,979 users (user_id, 5% of all users in the dataset) and for each select five third-level product categories (id3) that they have not viewed in the last three weeks and which will be of interest to them in the next seven days. The resulting score is based on the number of users for which at least one product category is correctly nominated. Accurate predictions of two or more categories for one user will not improve your score.
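The scoring rule above can be sketched as follows. This is a minimal illustration, not the organizers' scorer: the function name `count_hits` and both dictionary arguments are hypothetical, and any normalization applied to the raw count is not specified in the task, so only the count itself is computed.

```python
def count_hits(predictions, future_views):
    """Count users with at least one correctly nominated category.

    predictions:  dict mapping user_id -> iterable of five id3 values.
    future_views: dict mapping user_id -> set of id3 values the user viewed
                  in the following seven days (assumed to already exclude
                  categories viewed in the last three weeks).
    """
    hits = 0
    for user_id, categories in predictions.items():
        # One match is enough; further matches for the same user add nothing.
        if set(categories) & future_views.get(user_id, set()):
            hits += 1
    return hits


# Hypothetical toy data: user 1 has two correct picks, user 2 has none.
predictions = {1: [10, 11, 12, 13, 14], 2: [20, 21, 22, 23, 24]}
future_views = {1: {10, 11}, 2: {99}}
result = count_hits(predictions, future_views)
```

Here user 1 contributes a single point even though two categories match, which reflects the rule that extra correct categories for one user do not improve the score.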

Input format

You will be working with Yandex.Market search logs. Each row in the data corresponds to a "view" event: a particular user viewed an item that belongs to a particular category. The data is stored in a .csv file with the following fields:

  • user_id: individual shopper identifier
  • date: the day when the user’s interest in a particular product was recorded, from 1 to 54
  • id1: first (highest) level category identifier, e.g. “Home appliances”
  • id2: second (middle) level category identifier, e.g. “Kitchen appliances”
  • id3: third (lowest) level category identifier, e.g. “Refrigerators”

The data can be downloaded using this link.
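As a rough sketch of how these fields might be used, the snippet below collects, for each user, the third-level categories viewed recently, which are the ones that may not be recommended. It rests on several assumptions: the file has a header row with the field names listed above, "the last three weeks" means days 34 to 54 of the log, and the helper name `recent_categories` is made up for illustration.

```python
import csv
import io
from collections import defaultdict

LAST_DAY = 54        # the date field runs from 1 to 54
RECENT_WINDOW = 21   # assumption: "last three weeks" = last 21 days of the log


def recent_categories(rows, last_day=LAST_DAY, window=RECENT_WINDOW):
    """Map each user_id to the set of id3 values viewed in the recent window."""
    first_recent_day = last_day - window + 1
    recent = defaultdict(set)
    for row in rows:
        if int(row["date"]) >= first_recent_day:
            recent[row["user_id"]].add(row["id3"])
    return recent


# Hypothetical three-row sample standing in for the real log file.
sample = io.StringIO(
    "user_id,date,id1,id2,id3\n"
    "1,10,1,11,111\n"
    "1,40,1,11,112\n"
    "2,54,2,21,211\n"
)
recent = recent_categories(csv.DictReader(sample))
```

In the sample, category 111 was viewed on day 10, outside the assumed window, so it stays eligible as a recommendation for user 1, while 112 does not.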

Output format

Please upload your predictions into the system in the .csv format. The file should consist of 53,979 + 1 rows and contain columns user_id, id3_1, id3_2, id3_3, id3_4, id3_5. A sample submission can be found here.
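A minimal sketch of writing a file in this layout is shown below. It assumes the "+ 1" row is a header line carrying the column names; the function name `write_submission` and the toy prediction are illustrative only.

```python
import csv
import io

HEADER = ["user_id", "id3_1", "id3_2", "id3_3", "id3_4", "id3_5"]


def write_submission(predictions, out):
    """Write a header row, then one row per user with exactly five id3 values.

    predictions: dict mapping user_id -> sequence of five id3 values.
    out:         any writable text file object.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for user_id, categories in predictions.items():
        categories = list(categories)
        if len(categories) != 5:
            raise ValueError("user %s needs exactly five categories" % user_id)
        writer.writerow([user_id] + categories)


# Hypothetical single-user example written to an in-memory buffer.
buffer = io.StringIO()
write_submission({7: [1, 2, 3, 4, 5]}, buffer)
```

For a real submission the same function could be called with a file opened via `open("submission.csv", "w", newline="")`.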


Track-2

In this track you need to create a program that solves the first task for 5,000 users. Upload your program as a .zip file to the contest platform.

Output format

You need to submit a .zip archive that includes a Makefile with the targets "build" and "run", which will be executed one after another in a container. The log produced during the "build" phase will be visible on the submission page, so it is possible to debug the installation. During the "run" phase your code should process the data stored in ./train.csv.zip and is expected to produce a ./submission.csv file with predictions. The submission file should consist of 5000 + 1 rows and contain the columns user_id, id3_1, id3_2, id3_3, id3_4, id3_5.
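The input side of the "run" phase might look like the sketch below, written for Python 3. It assumes the archive holds a single CSV file with a header row; the archive's internal layout is not documented here, and the helper name `read_rows` is hypothetical.

```python
import csv
import io
import zipfile


def read_rows(zip_path):
    """Yield each view event as a dict, read from the first file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]  # assumption: one CSV file per archive
        with archive.open(member) as raw:
            # ZipFile.open returns a binary stream; wrap it for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                yield row
```

A script invoked by the "run" step could iterate over `read_rows("./train.csv.zip")` and write its predictions to ./submission.csv. Streaming rows this way avoids unpacking the archive to disk first.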

Using Python

The container has Python 2.7 and 3.6 installed with the following major libraries:

  • numpy
  • scipy
  • pandas
  • scikit-learn
  • matplotlib
  • joblib
  • tqdm
  • xgboost
  • lightgbm
  • catboost

About

International data science olympiad
