-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add xgboost example #522
add xgboost example #522
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Click to view CI ResultsGitHub pull request #522 of commit e7c7255f4e5a9d2b500b50e2c1f48e49b49f4e2b, no merge conflicts. Running as SYSTEM Setting status of e7c7255f4e5a9d2b500b50e2c1f48e49b49f4e2b to PENDING with url https://10.20.13.93:8080/job/merlin_models/498/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse e7c7255f4e5a9d2b500b50e2c1f48e49b49f4e2b^{commit} # timeout=10 Checking out Revision e7c7255f4e5a9d2b500b50e2c1f48e49b49f4e2b (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f e7c7255f4e5a9d2b500b50e2c1f48e49b49f4e2b # timeout=10 Commit message: "add xgboost example" > git rev-list --no-walk 6645ed406cb8f191064838dc877e4a28d4e3e9c4 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins13758519675208954468.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 454 items / 3 skipped |
Documentation preview |
Click to view CI ResultsGitHub pull request #522 of commit 27b8bfe47a5fbe68a917b75840961b197c2bc1aa, no merge conflicts. Running as SYSTEM Setting status of 27b8bfe47a5fbe68a917b75840961b197c2bc1aa to PENDING with url https://10.20.13.93:8080/job/merlin_models/535/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse 27b8bfe47a5fbe68a917b75840961b197c2bc1aa^{commit} # timeout=10 Checking out Revision 27b8bfe47a5fbe68a917b75840961b197c2bc1aa (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 27b8bfe47a5fbe68a917b75840961b197c2bc1aa # timeout=10 Commit message: "add edits" > git rev-list --no-walk b8aca8cca10eca7b49dc4567497894ff05df03af # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins10086752768738425287.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 462 items / 2 skipped |
Click to view CI ResultsGitHub pull request #522 of commit e760ffbbd9e1f29bb8e2ca7c2ca5c60e541c5eaa, no merge conflicts. Running as SYSTEM Setting status of e760ffbbd9e1f29bb8e2ca7c2ca5c60e541c5eaa to PENDING with url https://10.20.13.93:8080/job/merlin_models/536/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse e760ffbbd9e1f29bb8e2ca7c2ca5c60e541c5eaa^{commit} # timeout=10 Checking out Revision e760ffbbd9e1f29bb8e2ca7c2ca5c60e541c5eaa (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f e760ffbbd9e1f29bb8e2ca7c2ca5c60e541c5eaa # timeout=10 Commit message: "edit 07" > git rev-list --no-walk 27b8bfe47a5fbe68a917b75840961b197c2bc1aa # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins4277570765245801549.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 473 items / 2 skipped |
Click to view CI ResultsGitHub pull request #522 of commit 9ad09cc03a962a4a417f291aa096ce3c9fd2b8dc, no merge conflicts. Running as SYSTEM Setting status of 9ad09cc03a962a4a417f291aa096ce3c9fd2b8dc to PENDING with url https://10.20.13.93:8080/job/merlin_models/537/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse 9ad09cc03a962a4a417f291aa096ce3c9fd2b8dc^{commit} # timeout=10 Checking out Revision 9ad09cc03a962a4a417f291aa096ce3c9fd2b8dc (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 9ad09cc03a962a4a417f291aa096ce3c9fd2b8dc # timeout=10 Commit message: "Merge branch 'main' into xgboost_example" > git rev-list --no-walk e760ffbbd9e1f29bb8e2ca7c2ca5c60e541c5eaa # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins7260797542546902872.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 473 items / 2 skipped |
Click to view CI ResultsGitHub pull request #522 of commit f05adb351f934ca858ff21a83bedfb8c132db110, no merge conflicts. Running as SYSTEM Setting status of f05adb351f934ca858ff21a83bedfb8c132db110 to PENDING with url https://10.20.13.93:8080/job/merlin_models/538/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse f05adb351f934ca858ff21a83bedfb8c132db110^{commit} # timeout=10 Checking out Revision f05adb351f934ca858ff21a83bedfb8c132db110 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f f05adb351f934ca858ff21a83bedfb8c132db110 # timeout=10 Commit message: "add nb test" > git rev-list --no-walk 9ad09cc03a962a4a417f291aa096ce3c9fd2b8dc # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins6586512712802776481.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 473 items / 2 skipped |
@@ -0,0 +1,346 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is NVTabular dataset the preferred terminology to refer to: merlin.io.dataset.Dataset
or Merlin Dataset?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great point! To be honest, I think I have been imprecise here. Will change the wording to the one you suggested.
@@ -0,0 +1,346 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we're passing this directly to the model. The schema passed in still has both target columns. However, the logic in the code currently picks the target column based on the objective. e.g. objective="binary:logistic" -> will look for targets in schema that are tagged with Tags.BINARY_CLASSIFICATION. And in this case rating has tags (Tags.REGRESSION, Tags.TARGET), and rating_binary has tags (Tags.BINARY_CLASSIFICATION, Tags.TARGET). So if the objective were switched from "binary:logistic" to "reg:logistic" then the target will be switched from rating_binary to rating.
OBJECTIVES = {
"binary:logistic": Tags.BINARY_CLASSIFICATION,
"reg:logistic": Tags.REGRESSION,
"reg:squarederror": Tags.REGRESSION,
"rank:pairwise": Tags.TARGET,
"rank:ndcg": Tags.TARGET,
"rank:map": Tags.TARGET,
}
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, Targets can be explicitly specified with target_columns=[...]
in the constructor of the XGBoost class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this feels like too much magic. We could simplify the default (when no target columns are specfified) to rely only on the TARGET tag (e.g. all columns in schema with TARGET tag become the target for the model) and remove the extra complexity involving the objective and other tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is some amazing sleuthing there Oliver! 🙂 outstanding
I had an earlier version of the example where I was passing in target_columns
and then I must have omitted it here and it still worked.
I really like how this automatic selection of the target column provides a glimpse into how the Merlin Models API works, and what value it can add, thus I am adding a bit of prose to explain how it actually works, based on what you shared above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great Radek 🙂
Click to view CI ResultsGitHub pull request #522 of commit c366601c98f08397088ddbcbf99de080b41ccb1c, no merge conflicts. Running as SYSTEM Setting status of c366601c98f08397088ddbcbf99de080b41ccb1c to PENDING with url https://10.20.13.93:8080/job/merlin_models/547/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse c366601c98f08397088ddbcbf99de080b41ccb1c^{commit} # timeout=10 Checking out Revision c366601c98f08397088ddbcbf99de080b41ccb1c (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f c366601c98f08397088ddbcbf99de080b41ccb1c # timeout=10 Commit message: "remove file" > git rev-list --no-walk 74f24d3e3bcd45ae4161973dcd546745bb216638 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins14799215251113267759.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 473 items / 2 skipped |
Click to view CI ResultsGitHub pull request #522 of commit 90c539bb36f8ebb7838e95e356ffc12a74d4b9f9, no merge conflicts. Running as SYSTEM Setting status of 90c539bb36f8ebb7838e95e356ffc12a74d4b9f9 to PENDING with url https://10.20.13.93:8080/job/merlin_models/548/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse 90c539bb36f8ebb7838e95e356ffc12a74d4b9f9^{commit} # timeout=10 Checking out Revision 90c539bb36f8ebb7838e95e356ffc12a74d4b9f9 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 90c539bb36f8ebb7838e95e356ffc12a74d4b9f9 # timeout=10 Commit message: "implement comments from Oliver" > git rev-list --no-walk c366601c98f08397088ddbcbf99de080b41ccb1c # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins6233721795598812828.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 473 items / 2 skipped |
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
high-level API: "high-level" is used as an adjective and should be hyphenated.
Instead of "currently", do you mind adding a time frame or a software version so that the sentence ages better? Maybe "For the Merlin Models v0.6.0 release, some xgboost
and implicit
models are supported."
nit: s/allows/enables/ -- I've been conditioned to avoid "allows" due to possible confusion related to granting permission.
"try out" can be "evaluate" if you want a $10 word. Up to you.
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My $0.02 is to remove the parens--they tend to diminish the importance of the words and what you have is no less important than the rest of the sentence.
I'd also break the second sentence into two:
"The dataset consists of userId
and movieId
pairings. For each record, a user rates a movie and the record includes additional information such as genre of the movie, age of the user, and so on."
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: My sugg is to use present tense nearly all the time: "The get_movielens
function downloads the movielens-100k
data for us and returns it materialized as a Merlin Dataset
."
Present tense is typically easier for the reader to understand and lends confidence and authority to your voice.
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Merlin Models API
Maybe "A key feature of the Merlin Models API is tagging."
s/can be/is/ and avoid parens again:
"You can tag your data once, during preprocessing, and this information is picked up during later steps such as additional preprocessing steps, training your model, serving the model, and so on."
I think I'd even use present tense in the last sentence--though clearly up to you:
"During preprocessing that is performed by the get_movielens
function, two columns in the dataset are assigned the Tags.TARGET
tag:"
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's anything grammatically wrong with "...to train with by passing..." but the two prepositions side-by-side caught my eye. I think you can remove "with" and not lose meaning. Up to you.
sugg: s/when constructing/when you construct/
Please add a colon after "the following:"
Maybe instead of "...do something else" it could be "...do something better" or "more powerful".
nit: avoid "xgboost" in plain text. Show it as inline code: xgboost
when it seems valuable to suggest coding or XGBoost without any styling if you don't need to suggest coding. It similar to using "Merlin Models" in plain text.
sugg: s/we will be setting/we will set/. Future tense seems OK here. Also, "Given this piece of information, the Merlin Models code can infer that we want to train..."
sugg: s/would not be/is not/: "...it further, it is not useful for training."
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sugg: style as "...an xgboost
model" or "...an XGBoost model"
sugg: s/that will predict/that predicts/
I find starting a sentence with inline code or a number awkward. My sugg is "For the rating_binary
column, a value of 1
indicates..." I'd style the 0
as inline code too because it is like directed user input. (Conversely, it's not that all numbers are inline code. It's OK to say the dataset has 100K records.)
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to you: I wouldn't bother including the apostrophe in the inline code style.
Instead of "recently introduced," can you say something like "This class is introduced in the XGBoost 1.1 release and this data format provides..."
Instead of a link with "here" as the text, can you use "You can read more about it in this [article](...) from the RAPIDS AI channarticle
more present tense sugg: "...with early stopping, the training ceases as soon as the model starts overfitting..."
Maybe: "The verbose_eval
parameter specifies..." Up to you.
Reply via ReviewNB
@@ -0,0 +1,374 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"GPU RAM" is likely technically correct, but most of our docs refer to "GPU memory."
Reply via ReviewNB
I learned lots from both you and Oliver. Thank you both. If I can clarify anything in my comments, please let me know. |
* add example notebook * add integration test
ffb2dfa
to
01527fb
Compare
Click to view CI ResultsGitHub pull request #522 of commit ffb2dfa23c1d51388004e9f3379210ee08766897, no merge conflicts. Running as SYSTEM Setting status of ffb2dfa23c1d51388004e9f3379210ee08766897 to PENDING with url https://10.20.13.93:8080/job/merlin_models/554/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse ffb2dfa23c1d51388004e9f3379210ee08766897^{commit} # timeout=10 Checking out Revision ffb2dfa23c1d51388004e9f3379210ee08766897 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f ffb2dfa23c1d51388004e9f3379210ee08766897 # timeout=10 Commit message: "update" > git rev-list --no-walk 5cadf90a28caf6bbe8b9f78564716fa238b403d9 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins8749052371973519031.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 473 items / 2 skipped |
Thank you very much for your review @mikemckiernan, really appreciate it! 🙂 Learning a lot from you as well! I really appreciate all the explanations you included behind the changes you suggested, learned a lot from them. Thank you very much!!! I implemented all of your suggestions. And thank you both @oliverholworthy and @mikemckiernan for such a timely help on this one! 🙂 @bschifferer asked me to see if I can merge this before the end of the week and seems like I might be able to do so thx to your help! |
Click to view CI ResultsGitHub pull request #522 of commit 01527fb9f4a21c95cbc4b33465fd83f61a5f1ea6, no merge conflicts. Running as SYSTEM Setting status of 01527fb9f4a21c95cbc4b33465fd83f61a5f1ea6 to PENDING with url https://10.20.13.93:8080/job/merlin_models/555/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/522/*:refs/remotes/origin/pr/522/* # timeout=10 > git rev-parse 01527fb9f4a21c95cbc4b33465fd83f61a5f1ea6^{commit} # timeout=10 Checking out Revision 01527fb9f4a21c95cbc4b33465fd83f61a5f1ea6 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 01527fb9f4a21c95cbc4b33465fd83f61a5f1ea6 # timeout=10 Commit message: "Add xgboost example" > git rev-list --no-walk ffb2dfa23c1d51388004e9f3379210ee08766897 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins5001041357408556881.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /var/jenkins_home/.local/lib/python3.8/site-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from testbook) (0.5.13) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.0) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 475 items / 2 skipped |
* add example notebook * add integration test
I have begun to work on an example for an
xgboost
model having been asked to do so by @bschifferer (please see the notebook in the PR). Being able to pass aDataset
directly toxgboost
and the context manager for starting a dask cluster feel great!But the problem is that with the current integration
xgboost
would be very hard to use in practice.xgboost
makes it very easy to overfit and so you really need to use early stopping. But early stopping evaluated on the train set is not helpful, you really need to be able to run this on your validation set. On top of that, it is really helpful to be able to pass your own custom metric both to monitor the training and to evaluate when to stop.In the notebook, I implement a minimum workflow with
xgboost
that I think would be good to have supported by Merlin Models.As far as I can tell, what we might be missing:
✅ passing a validation set for monitoring / early stopping in training
✅ passing a custom metric
✅ ability to specify what we run the custom metric on (that is an extra I guess, since not being to specify to run it only one the validation set only has the adverse effect of slowing things down, crowding the UI, though being able to limit this would be nice)
Now, if we would like to be on the cutting edge, and that might be a bit of a stretch (though this is an amazing functionality, especially when working with a lot of data), it would be amazing to support
xgb.DeviceQuantileDMatrix
as input source.More in an article from our rapids friends here. Can provide an example of how this could be used with the movielens dataset if that can be helpful.
Fixes #132