Support multiple label columns #80

Merged
merged 2 commits into from
Dec 6, 2022

edknv
Contributor

@edknv edknv commented Dec 6, 2022

The dataloader currently returns only the first label column when there are multiple label columns. This can be a problem for Merlin Models and Transformers4Rec, which expect the labels as a dictionary when there are multiple labels. This PR adds support for multiple label columns.
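As a rough illustration of the behavior described above, the sketch below returns a single column when there is one label and a dictionary keyed by column name when there are several. The helper `select_labels` and the column names are hypothetical, and plain lists stand in for framework tensors; this is not the dataloader's actual code, and keeping the single-label case as a bare column is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Union

Column = List[int]  # stand-in for a framework tensor


def select_labels(
    batch: Dict[str, Column], label_names: Sequence[str]
) -> Optional[Union[Column, Dict[str, Column]]]:
    """Pick the label output for one batch (hypothetical helper)."""
    if not label_names:
        return None
    if len(label_names) == 1:
        # Assumed: a single label is returned as a bare column.
        return batch[label_names[0]]
    # Multiple labels: a dictionary keyed by label column name.
    return {name: batch[name] for name in label_names}


batch = {"user_id": [1, 2], "click": [0, 1], "purchase": [0, 0]}
single = select_labels(batch, ["click"])
multi = select_labels(batch, ["click", "purchase"])
print(single)
print(multi)
```

Keying the dictionary by column name lets a downstream model look up each target by name rather than by position.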

Note that Keras expects the label names to match the names of the output layers, so users will have to specify the output layer names if there are multiple label columns. This should not be an issue for Merlin Models.
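One way to catch a naming mismatch early is to compare the label keys with the model's output layer names before training. The sketch below is a plain-Python check under that assumption; `unmatched_label_names` and the layer names are hypothetical and not part of Keras or the dataloader.

```python
from typing import Iterable, List, Mapping


def unmatched_label_names(
    labels: Mapping[str, object], output_names: Iterable[str]
) -> List[str]:
    """Return label keys that have no output layer with the same name."""
    known = set(output_names)
    return sorted(name for name in labels if name not in known)


labels = {"click": [0, 1], "purchase": [0, 0]}

# Output layers named after the label columns: every key matches.
matched = unmatched_label_names(labels, ["click", "purchase"])

# Auto-generated style layer names (hypothetical values): nothing matches.
mismatched = unmatched_label_names(labels, ["dense", "dense_1"])

print(matched)
print(mismatched)
```

An empty result means each label has a same-named output; any names returned point at output layers that still need an explicit name.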

Testing

Additional unit test.
Tested with NVIDIA-Merlin/models#845 (manually with corresponding changes)

@edknv added the "enhancement" label on Dec 6, 2022
@edknv added this to the Merlin 22.12 milestone on Dec 6, 2022
@edknv self-assigned this on Dec 6, 2022
@nvidia-merlin-bot

CI Results
GitHub pull request #80 of commit a7b86154546994580b54758ac69473e7b4f642c3, no merge conflicts.
Running as SYSTEM
Setting status of a7b86154546994580b54758ac69473e7b4f642c3 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_dataloader/139/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_dataloader/workspace
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/dataloader # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/dataloader
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/dataloader +refs/pull/80/*:refs/remotes/origin/pr/80/* # timeout=10
 > git rev-parse a7b86154546994580b54758ac69473e7b4f642c3^{commit} # timeout=10
Checking out Revision a7b86154546994580b54758ac69473e7b4f642c3 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a7b86154546994580b54758ac69473e7b4f642c3 # timeout=10
Commit message: "Support multiple label columns"
 > git rev-list --no-walk 02f233a2b8c14a5df19de9d749d06e97221dd10d # timeout=10
[workspace] $ /bin/bash /tmp/jenkins8979695637588379939.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_dataloader/dataloader/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu
test-gpu installdeps: -rrequirements/dev.txt, pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/.tmp/package/1/merlin-dataloader-0.0.2+23.ga7b8615.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.22,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.22,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cpplint==1.6.1,cryptography==38.0.4,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0
,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-dataloader==0.0.2+23.ga7b8615,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,moto==4.0.11,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nbsphinx==0.8.10,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,npy-append-array==0.9.13,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,responses==0.22.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-markdown-tables==0.0.15,sphinx-multiversion==0.2.4,sphinx-rtd-theme==1.1.1,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@8d23fdd32ee0b2f2d2ee091ac8eb8c1e88271dd4,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,types-toml==0.10.8.1,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,xmltodict==0.13.0,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3027935774'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-muk7oqgz
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-muk7oqgz
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 4f73ff5bd4121c1acaabdc01a123af4f986ffc78
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (0.55.1)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (2022.3.0)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (1.2.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (7.0.0)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (2022.5.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (1.10.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (21.3)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (4.64.1)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (1.3.5)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (3.19.5)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (2022.3.0)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (0.4.3)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (0.12.0)
Requirement already satisfied: jinja2 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (3.0.3)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.0.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.0.4)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (8.1.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (6.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.4.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.9.0+14.g4f73ff5) (0.38.1)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.9.0+14.g4f73ff5) (1.20.3)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.9.0+14.g4f73ff5) (65.5.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.9.0+14.g4f73ff5) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (2022.2.1)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in ./.tox/test-gpu/lib/python3.8/site-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.9.0+14.g4f73ff5-py3-none-any.whl size=119010 sha256=79238cfc6ef5f4de93fafdf872671ece2e4d49e32265ac7929c678555ae7a65c
  Stored in directory: /tmp/pip-ephem-wheel-cache-veghcpdp/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.9.0
    Uninstalling merlin-core-0.9.0:
      Successfully uninstalled merlin-core-0.9.0
Successfully installed merlin-core-0.9.0+14.g4f73ff5
test-gpu run-test: commands[1] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/nvtabular.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/nvtabular.git
  Cloning https://github.com/NVIDIA-Merlin/nvtabular.git to /tmp/pip-req-build-2thfr0k2
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/nvtabular.git /tmp/pip-req-build-2thfr0k2
  Resolved https://github.com/NVIDIA-Merlin/nvtabular.git to commit 51af616069689b3ba57e8842a6f4a20377795df7
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+13.g51af6160) (1.8.1)
Requirement already satisfied: merlin-core>=0.2.0 in ./.tox/test-gpu/lib/python3.8/site-packages (from nvtabular==1.6.0+13.g51af6160) (0.9.0+14.g4f73ff5)
Requirement already satisfied: merlin-dataloader>=0.0.2 in ./.tox/test-gpu/lib/python3.8/site-packages (from nvtabular==1.6.0+13.g51af6160) (0.0.2+23.ga7b8615)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (0.55.1)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2022.3.0)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.2.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (7.0.0)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2022.5.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.10.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (21.3)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (4.64.1)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.3.5)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (3.19.5)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2022.3.0)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (from scipy->nvtabular==1.6.0+13.g51af6160) (1.20.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (0.4.3)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (0.12.0)
Requirement already satisfied: jinja2 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (3.0.3)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2.0.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.0.4)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (8.1.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (6.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2.4.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (65.5.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2022.2.1)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in ./.tox/test-gpu/lib/python3.8/site-packages (from jinja2->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (2.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+13.g51af6160) (6.0.1)
Building wheels for collected packages: nvtabular
  Building wheel for nvtabular (pyproject.toml): started
  Building wheel for nvtabular (pyproject.toml): finished with status 'done'
  Created wheel for nvtabular: filename=nvtabular-1.6.0+13.g51af6160-cp38-cp38-linux_x86_64.whl size=257601 sha256=3f6c8e417006782738d5c75eba60c6290fe1e515221d5d7f5d0b108046ca5c5a
  Stored in directory: /tmp/pip-ephem-wheel-cache-vl4uxxq_/wheels/8f/d9/f9/30f2cdc5bf8787fae6fdfe55afd6e1b493e619ec32c32ec40b
Successfully built nvtabular
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 1.1.1
    Not uninstalling nvtabular at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu
    Can't uninstall 'nvtabular'. No files were found to uninstall.
Successfully installed nvtabular-1.6.0+13.g51af6160
test-gpu run-test: commands[2] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_dataloader/dataloader, configfile: pyproject.toml
plugins: anyio-3.5.0, cov-4.0.0, xdist-3.1.0
collected 97 items / 1 skipped

tests/unit/dataloader/test_dataloader_backend.py .... [ 4%]
tests/unit/dataloader/test_tf_dataloader.py ............................ [ 32%]
...........s.... [ 49%]
tests/unit/dataloader/test_tf_embeddings.py ............ [ 61%]
tests/unit/dataloader/test_torch_dataloader.py ......................... [ 87%]
[ 87%]
tests/unit/dataloader/test_torch_embeddings.py ............ [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

tests/unit/dataloader/test_jax_dataloader.py:24
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_jax_dataloader.py:24: PytestUnknownMarkWarning: Unknown pytest.mark.jax - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.jax

tests/unit/dataloader/test_tf_dataloader.py:33
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_tf_dataloader.py:33: PytestUnknownMarkWarning: Unknown pytest.mark.tensorflow - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.tensorflow

tests/unit/dataloader/test_tf_embeddings.py:25
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_tf_embeddings.py:25: PytestUnknownMarkWarning: Unknown pytest.mark.tensorflow - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.tensorflow

tests/unit/dataloader/test_torch_dataloader.py:30
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_torch_dataloader.py:30: PytestUnknownMarkWarning: Unknown pytest.mark.torch - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.torch

tests/unit/dataloader/test_torch_embeddings.py:34
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_torch_embeddings.py:34: PytestUnknownMarkWarning: Unknown pytest.mark.torch - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.torch

tests/unit/dataloader/test_dataloader_backend.py::test_dataloader_seeding[128]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/dataloader/test_dataloader_backend.py::test_dataloader_empty_error[128]
/var/jenkins_home/workspace/merlin_dataloader/dataloader/merlin/dataloader/loader_base.py:85: UserWarning: no schema associated with the input dataset. Calling dataset.infer_schema to automatically generate
warnings.warn(

tests/unit/dataloader/test_tf_dataloader.py: 7 warnings
tests/unit/dataloader/test_torch_dataloader.py: 11 warnings
/var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:253: UserWarning: Initializing an NVTabular Dataset in CPU mode.This is an experimental feature with extremely limited support!
warnings.warn(

tests/unit/dataloader/test_torch_embeddings.py::test_embedding_torch_dl_with_lookup[None]
/var/jenkins_home/workspace/merlin_dataloader/dataloader/merlin/dataloader/ops/embeddings/torch_embedding_op.py:52: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:201.)
return torch.Tensor(values).to(torch.int32)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                                     Stmts   Miss  Cover
----------------------------------------------------------------------------
merlin/dataloader/__init__.py                                2      0   100%
merlin/dataloader/_version.py                              354    205    42%
merlin/dataloader/jax.py                                    51     51     0%
merlin/dataloader/loader_base.py                           448     39    91%
merlin/dataloader/ops/__init__.py                            0      0   100%
merlin/dataloader/ops/embeddings/__init__.py                 0      0   100%
merlin/dataloader/ops/embeddings/embedding_op.py            62      6    90%
merlin/dataloader/ops/embeddings/tf_embedding_op.py         19      0   100%
merlin/dataloader/ops/embeddings/torch_embedding_op.py      20      0   100%
merlin/dataloader/tensorflow.py                            107     20    81%
merlin/dataloader/tf_utils.py                               57     27    53%
merlin/dataloader/torch.py                                  66      8    88%
merlin/loader/__init__.py                                    4      4     0%
merlin/loader/jax.py                                         1      1     0%
merlin/loader/tensorflow.py                                  1      1     0%
merlin/loader/torch.py                                       1      1     0%
----------------------------------------------------------------------------
TOTAL                                                     1193    363    70%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/dataloader/test_jax_dataloader.py:26: could not import 'jax': No module named 'jax'
SKIPPED [1] tests/unit/dataloader/test_tf_dataloader.py:526: not working correctly in ci environment
============ 96 passed, 2 skipped, 61 warnings in 141.44s (0:02:21) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/dataloader/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[workspace] $ /bin/bash /tmp/jenkins18183185153756192353.sh

@github-actions

github-actions bot commented Dec 6, 2022

Documentation preview

https://nvidia-merlin.github.io/dataloader/review/pr-80

@@ -626,9 +626,14 @@ def _handle_tensors(self, tensors, tensor_names):
# TODO: use dict for labels as well?
Member

I think this comment can be updated along with this change. We can probably remove the TODO and replace it with a description of what is going on here.

@@ -626,9 +626,14 @@ def _handle_tensors(self, tensors, tensor_names):
# TODO: use dict for labels as well?
# would require output layers to match naming
# labels should not exist separately they should be a regular column
labels = None
if len(self.label_names) > 0:
Member

So it looks like this previously did not work for multiple columns. (Were we silently ignoring the extra targets and keeping them in the input data?)

Do we have any tests or functionality in Transformers4Rec that make use of multiple targets? (cc @sararb) And if so, I'm wondering why this didn't cause any test to fail in this PR NVIDIA-Merlin/Transformers4Rec#547

I'd like to check that, if we have multiple targets in Transformers4Rec and in PyTorch models in general, having the output of the 'y' here as a dict is going to work and is the preferred option. (That might inform whether or not we should make this output type configurable at this stage.)

Contributor

We have this test in Transformers4Rec that makes use of multiple targets, but it is not using the Merlin data loader. We are just manually creating the targets.

The data loader is only tested with the HF Trainer class, and this class currently supports only one next-item prediction task.

However, I am currently working on the T4Rec refactoring task that will extend the Trainer to support multiple tasks. So having the output of the dataloader 'y' here as a dict will be important for this use case.

Contributor

I think there are 3 different formats of the output y expected by the Merlin Models and Transformers4Rec libraries:

  • If the schema does not include any target variable, the expected output is None
  • If the schema contains only one target variable, the expected output is a Tensor
  • If the schema contains multiple target variables, the expected output is a dict[feature_name, Tensor]
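To make the three cases above concrete, here is a minimal sketch of the selection logic. It is not the code from this PR: `format_labels` is a hypothetical helper, plain Python lists stand in for framework tensors, and the column names are made up for illustration.

```python
from typing import Dict, List, Sequence, Union

# Stand-in for a framework tensor (assumption: keeps the sketch dependency-free).
Tensor = List[float]
Labels = Union[None, Tensor, Dict[str, Tensor]]


def format_labels(label_names: Sequence[str], columns: Dict[str, Tensor]) -> Labels:
    """Hypothetical helper: choose the shape of `y` from the number of label columns."""
    if len(label_names) == 0:
        # No target in the schema: there is nothing to return.
        return None
    if len(label_names) == 1:
        # Single target: return the bare tensor.
        return columns[label_names[0]]
    # Multiple targets: return a dict keyed by label name.
    return {name: columns[name] for name in label_names}


# Illustrative batch; "click", "purchase" and "age" are made-up column names.
batch = {"click": [1.0, 0.0], "purchase": [0.0, 0.0], "age": [31.0, 45.0]}

no_labels = format_labels([], batch)
one_label = format_labels(["click"], batch)
many_labels = format_labels(["click", "purchase"], batch)
```

Keeping the single-target case as a bare tensor means existing single-task callers see no change, and only multi-target schemas get the dict.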

Member

Thanks for the explanation, @sararb. So it looks like this PR will satisfy both Merlin Models and Transformers4Rec with the three possible output types you described in the last comment.
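On the consuming side, code that has to accept all three output types could normalize `y` before use. The following is only a sketch under assumptions: `labels_as_dict` and the `"target"` default key are hypothetical and not part of the dataloader, Merlin Models, or Transformers4Rec, and lists again stand in for tensors.

```python
from typing import Dict, List, Union

# Stand-in for a framework tensor (assumption, as in any dependency-free sketch).
Tensor = List[float]


def labels_as_dict(
    y: Union[None, Tensor, Dict[str, Tensor]],
    default_name: str = "target",  # hypothetical key for the single-label case
) -> Dict[str, Tensor]:
    """Hypothetical helper: map None / tensor / dict labels onto one dict shape."""
    if y is None:
        return {}
    if isinstance(y, dict):
        # Copy so the caller's dict is not mutated downstream.
        return dict(y)
    return {default_name: y}


empty = labels_as_dict(None)
single = labels_as_dict([1.0, 0.0])
multi = labels_as_dict({"click": [1.0], "purchase": [0.0]})
```

With this, per-task logic can iterate over `labels_as_dict(y).items()` without branching on the label type.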

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #80 of commit 2972449b913ba50c8849c36da881cd24e79d5dd8, no merge conflicts.
Running as SYSTEM
Setting status of 2972449b913ba50c8849c36da881cd24e79d5dd8 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_dataloader/142/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_dataloader/workspace
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/dataloader # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/dataloader
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/dataloader +refs/pull/80/*:refs/remotes/origin/pr/80/* # timeout=10
 > git rev-parse 2972449b913ba50c8849c36da881cd24e79d5dd8^{commit} # timeout=10
Checking out Revision 2972449b913ba50c8849c36da881cd24e79d5dd8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2972449b913ba50c8849c36da881cd24e79d5dd8 # timeout=10
Commit message: "Replace comment"
 > git rev-list --no-walk 89b08e5fa62de2663059eed37a840522723c8a00 # timeout=10
[workspace] $ /bin/bash /tmp/jenkins9566664716934008070.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_dataloader/dataloader/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu
test-gpu installdeps: -rrequirements/dev.txt, pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/.tmp/package/1/merlin-dataloader-0.0.2+24.g2972449.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.23,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.23,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cpplint==1.6.1,cryptography==38.0.4,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0
,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-dataloader==0.0.2+24.g2972449,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,moto==4.0.11,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nbsphinx==0.8.10,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,npy-append-array==0.9.13,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,responses==0.22.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-markdown-tables==0.0.15,sphinx-multiversion==0.2.4,sphinx-rtd-theme==1.1.1,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@8d23fdd32ee0b2f2d2ee091ac8eb8c1e88271dd4,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,types-toml==0.10.8.1,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,xmltodict==0.13.0,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='891348832'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-70u4kqu8
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-70u4kqu8
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 4f73ff5bd4121c1acaabdc01a123af4f986ffc78
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (1.3.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (7.0.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (21.3)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (3.19.5)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (1.2.5)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (2022.3.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (0.55.1)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (2022.5.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (1.10.0)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.9.0+14.g4f73ff5) (2022.3.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.9.0+14.g4f73ff5) (4.64.1)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (0.4.3)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (0.12.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.2.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (5.4.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (8.1.3)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.7.0)
Requirement already satisfied: jinja2 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (3.0.3)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.0.4)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (6.2)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.0.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.4.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.9.0+14.g4f73ff5) (0.38.1)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.9.0+14.g4f73ff5) (1.20.3)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.9.0+14.g4f73ff5) (65.5.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.9.0+14.g4f73ff5) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (2022.2.1)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core==0.9.0+14.g4f73ff5) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (6.0.2)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in ./.tox/test-gpu/lib/python3.8/site-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.9.0+14.g4f73ff5) (2.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.9.0+14.g4f73ff5) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.9.0+14.g4f73ff5-py3-none-any.whl size=119010 sha256=bdd1fd92b8bce3e414bda266ebdfab0bb67fb84d42c6f5ace9f57dfd7dfed90f
  Stored in directory: /tmp/pip-ephem-wheel-cache-p64jr4iq/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.9.0
    Uninstalling merlin-core-0.9.0:
      Successfully uninstalled merlin-core-0.9.0
Successfully installed merlin-core-0.9.0+14.g4f73ff5
test-gpu run-test: commands[1] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/nvtabular.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/nvtabular.git
  Cloning https://github.com/NVIDIA-Merlin/nvtabular.git to /tmp/pip-req-build-ayjmjf5e
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/nvtabular.git /tmp/pip-req-build-ayjmjf5e
  Resolved https://github.com/NVIDIA-Merlin/nvtabular.git to commit 0f3a9b8d704173845c5d53cdf840ffa9ccf79080
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: merlin-dataloader>=0.0.2 in ./.tox/test-gpu/lib/python3.8/site-packages (from nvtabular==1.6.0+14.g0f3a9b8d) (0.0.2+24.g2972449)
Requirement already satisfied: merlin-core>=0.2.0 in ./.tox/test-gpu/lib/python3.8/site-packages (from nvtabular==1.6.0+14.g0f3a9b8d) (0.9.0+14.g4f73ff5)
Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+14.g0f3a9b8d) (1.8.1)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.3.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (7.0.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (21.3)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (3.19.5)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.2.5)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2022.3.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (0.55.1)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2022.5.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.10.0)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2022.3.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (4.64.1)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (from scipy->nvtabular==1.6.0+14.g0f3a9b8d) (1.20.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (0.4.3)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (0.12.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2.2.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (5.4.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (8.1.3)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.7.0)
Requirement already satisfied: jinja2 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (3.0.3)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.0.4)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in ./.tox/test-gpu/lib/python3.8/site-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (6.2)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2.0.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2.4.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (65.5.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2022.2.1)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (6.0.2)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in ./.tox/test-gpu/lib/python3.8/site-packages (from jinja2->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (2.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+14.g0f3a9b8d) (6.0.1)
Building wheels for collected packages: nvtabular
  Building wheel for nvtabular (pyproject.toml): started
  Building wheel for nvtabular (pyproject.toml): finished with status 'done'
  Created wheel for nvtabular: filename=nvtabular-1.6.0+14.g0f3a9b8d-cp38-cp38-linux_x86_64.whl size=257673 sha256=849556b53b04ae785c8c336efa3eac10df9a8a7ac2d43fc06190f72be67c95d5
  Stored in directory: /tmp/pip-ephem-wheel-cache-50v5os6g/wheels/8f/d9/f9/30f2cdc5bf8787fae6fdfe55afd6e1b493e619ec32c32ec40b
Successfully built nvtabular
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 1.1.1
    Not uninstalling nvtabular at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu
    Can't uninstall 'nvtabular'. No files were found to uninstall.
Successfully installed nvtabular-1.6.0+14.g0f3a9b8d
test-gpu run-test: commands[2] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_dataloader/dataloader, configfile: pyproject.toml
plugins: anyio-3.5.0, cov-4.0.0, xdist-3.1.0
collected 97 items / 1 skipped

tests/unit/dataloader/test_dataloader_backend.py .... [ 4%]
tests/unit/dataloader/test_tf_dataloader.py ............................ [ 32%]
...........s.... [ 49%]
tests/unit/dataloader/test_tf_embeddings.py ............ [ 61%]
tests/unit/dataloader/test_torch_dataloader.py ......................... [ 87%]
[ 87%]
tests/unit/dataloader/test_torch_embeddings.py ............ [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

tests/unit/dataloader/test_jax_dataloader.py:24
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_jax_dataloader.py:24: PytestUnknownMarkWarning: Unknown pytest.mark.jax - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.jax

tests/unit/dataloader/test_tf_dataloader.py:33
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_tf_dataloader.py:33: PytestUnknownMarkWarning: Unknown pytest.mark.tensorflow - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.tensorflow

tests/unit/dataloader/test_tf_embeddings.py:25
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_tf_embeddings.py:25: PytestUnknownMarkWarning: Unknown pytest.mark.tensorflow - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.tensorflow

tests/unit/dataloader/test_torch_dataloader.py:30
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_torch_dataloader.py:30: PytestUnknownMarkWarning: Unknown pytest.mark.torch - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.torch

tests/unit/dataloader/test_torch_embeddings.py:34
/var/jenkins_home/workspace/merlin_dataloader/dataloader/tests/unit/dataloader/test_torch_embeddings.py:34: PytestUnknownMarkWarning: Unknown pytest.mark.torch - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytestmark = pytest.mark.torch

tests/unit/dataloader/test_dataloader_backend.py::test_dataloader_seeding[128]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/dataloader/test_dataloader_backend.py::test_dataloader_empty_error[128]
/var/jenkins_home/workspace/merlin_dataloader/dataloader/merlin/dataloader/loader_base.py:85: UserWarning: no schema associated with the input dataset. Calling dataset.infer_schema to automatically generate
warnings.warn(

tests/unit/dataloader/test_tf_dataloader.py: 7 warnings
tests/unit/dataloader/test_torch_dataloader.py: 11 warnings
/var/jenkins_home/workspace/merlin_dataloader/dataloader/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:253: UserWarning: Initializing an NVTabular Dataset in CPU mode.This is an experimental feature with extremely limited support!
warnings.warn(

tests/unit/dataloader/test_torch_embeddings.py::test_embedding_torch_dl_with_lookup[None]
/var/jenkins_home/workspace/merlin_dataloader/dataloader/merlin/dataloader/ops/embeddings/torch_embedding_op.py:52: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:201.)
return torch.Tensor(values).to(torch.int32)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                                     Stmts   Miss  Cover
----------------------------------------------------------------------------
merlin/dataloader/__init__.py                                2      0   100%
merlin/dataloader/_version.py                              354    205    42%
merlin/dataloader/jax.py                                    51     51     0%
merlin/dataloader/loader_base.py                           448     39    91%
merlin/dataloader/ops/__init__.py                            0      0   100%
merlin/dataloader/ops/embeddings/__init__.py                 0      0   100%
merlin/dataloader/ops/embeddings/embedding_op.py            62      6    90%
merlin/dataloader/ops/embeddings/tf_embedding_op.py         19      0   100%
merlin/dataloader/ops/embeddings/torch_embedding_op.py      20      0   100%
merlin/dataloader/tensorflow.py                            107     20    81%
merlin/dataloader/tf_utils.py                               57     27    53%
merlin/dataloader/torch.py                                  66      8    88%
merlin/loader/__init__.py                                    4      4     0%
merlin/loader/jax.py                                         1      1     0%
merlin/loader/tensorflow.py                                  1      1     0%
merlin/loader/torch.py                                       1      1     0%
----------------------------------------------------------------------------
TOTAL                                                     1193    363    70%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/dataloader/test_jax_dataloader.py:26: could not import 'jax': No module named 'jax'
SKIPPED [1] tests/unit/dataloader/test_tf_dataloader.py:526: not working correctly in ci environment
============ 96 passed, 2 skipped, 61 warnings in 132.60s (0:02:12) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/dataloader/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[workspace] $ /bin/bash /tmp/jenkins5657960847449392754.sh

@edknv merged commit 93a7ac7 into main Dec 6, 2022
@edknv deleted the multiple_labels branch December 6, 2022 15:35
Labels
enhancement New feature or request

4 participants