Creating integration tests for quick-start for ranking #1015

Merged
merged 10 commits into from Jul 4, 2023

@gabrielspmoreira gabrielspmoreira commented Jun 16, 2023

Closes #667
This PR adds integration tests for the quick-start for ranking scripts. The tests cover preprocessing the TenRec dataset with different options and training ranking models on the preprocessed data.

Preprocessing tests

  • Tests basic preprocessing with target encoding, checking column tagging, dtypes, number of rows and max values
  • Tests the available data split strategies: random, random_by_user, temporal
  • Tests the available filtering strategies: query string and min/max frequency for users and items
  • Tests frequency capping
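
To make the three split strategies concrete, here is a minimal pure-Python sketch of what each one means. It is not the quick-start implementation: the function names, the `user_id` and `ts` field names, and the `eval_frac` and `split_ts` parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_random(rows, eval_frac=0.2, seed=42):
    """Shuffle all rows and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_frac)
    return shuffled[n_eval:], shuffled[:n_eval]


def split_random_by_user(rows, eval_frac=0.2, seed=42):
    """Hold out a fraction of each user's rows separately."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)  # "user_id" is a hypothetical field
    train, evaluation = [], []
    for user_rows in by_user.values():
        rng.shuffle(user_rows)
        n_eval = int(len(user_rows) * eval_frac)
        evaluation.extend(user_rows[:n_eval])
        train.extend(user_rows[n_eval:])
    return train, evaluation


def split_temporal(rows, split_ts):
    """Rows before split_ts go to train; rows at or after it go to evaluation."""
    train = [r for r in rows if r["ts"] < split_ts]  # "ts" is a hypothetical field
    evaluation = [r for r in rows if r["ts"] >= split_ts]
    return train, evaluation
```

A test for a split strategy can then assert on the sizes of the two outputs and, for the per-user and temporal variants, on which users or timestamps end up on each side.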

Model building, training and evaluation tests

  • Trains single-task learning models with the model-specific options: MLP, DLRM, DCN-v2, Wide&Deep, DeepFM
  • Trains multi-task learning models with the model-specific options: DLRM, MMOE, PLE

Data setup

These integration tests require a 10M-row sample of the TenRec dataset, which is available in this internal Google Drive (tenrec_ci.zip).
The data needs to be downloaded to the CI machine and uncompressed to /raid/data/tenrec_ci/, which is the standard path where our other CI datasets are stored (e.g. /raid/data/lastfm/preprocessed).
P.S. If needed, the path to the TenRec sample data can be set with the CI_TENREC_DATA_PATH env variable.
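
As a minimal sketch of how a test could resolve the data location, assuming the env variable simply overrides the standard path above; `resolve_tenrec_path` and `DEFAULT_TENREC_PATH` are hypothetical names for illustration, not taken from the test code in this PR:

```python
import os

# Standard CI location described above; CI_TENREC_DATA_PATH overrides it.
DEFAULT_TENREC_PATH = "/raid/data/tenrec_ci/"


def resolve_tenrec_path(environ=None):
    """Return the TenRec sample data path (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Fall back to the standard path when the variable is unset or empty.
    return env.get("CI_TENREC_DATA_PATH") or DEFAULT_TENREC_PATH
```

Calling `resolve_tenrec_path()` with no arguments reads the real process environment; passing a dict makes the lookup easy to exercise in a unit test.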

@gabrielspmoreira gabrielspmoreira self-assigned this Jun 16, 2023
@gabrielspmoreira gabrielspmoreira added chore Infrastructure update unit-test labels Jun 16, 2023
@gabrielspmoreira gabrielspmoreira added this to the Merlin 23.06 milestone Jun 16, 2023
@gabrielspmoreira gabrielspmoreira marked this pull request as draft June 16, 2023 20:34
@github-actions

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-1015

@gabrielspmoreira gabrielspmoreira marked this pull request as ready for review June 20, 2023 01:25
@gabrielspmoreira gabrielspmoreira force-pushed the quick_start_tests branch 2 times, most recently from 984a0c4 to d0de505 Compare June 22, 2023 15:48
@jperez999 jperez999 merged commit eb656e6 into main Jul 4, 2023
4 of 5 checks passed
Development

Successfully merging this pull request may close these issues.

Create integration tests for ranking models on CI to track accuracy or performance regressions over time
3 participants