
Allowing to pass in a vocab in Categorify #935

Merged (16 commits) on Jul 20, 2021

Conversation

marcromeyn
Contributor

This also fixes #763.
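
For context, the new parameter is exercised by the test added in this PR (quoted in the CI logs further down). A trimmed usage sketch follows; the column names and vocabulary values mirror that test, while the Workflow wiring around it is assumed for illustration rather than taken from the diff:

# Trimmed sketch of the new 'vocabs' parameter; values mirror the
# test_categorify_lists test quoted later in this thread, while the
# Workflow wiring is assumed for illustration.
import pandas as pd
import nvtabular as nvt
from nvtabular import ops

# Pre-computed vocabulary: the category values to encode, per column.
vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

cat_features = ["Authors", "Engaging User"] >> ops.Categorify(vocabs=vocabs)
workflow = nvt.Workflow(cat_features)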

marcromeyn requested a review from benfred on July 12, 2021 at 12:38
@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit e3ddab778dd54622a171dddce447dc4d0b7352bc, no merge conflicts.
Running as SYSTEM
Setting status of e3ddab778dd54622a171dddce447dc4d0b7352bc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2766/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e3ddab778dd54622a171dddce447dc4d0b7352bc^{commit} # timeout=10
Checking out Revision e3ddab778dd54622a171dddce447dc4d0b7352bc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e3ddab778dd54622a171dddce447dc4d0b7352bc # timeout=10
Commit message: "Allow to pass in vocabs in Categorify to fix make_feature_column_workflow"
 > git rev-list --no-walk 26310aeecd05c9a1eb772aadcdf90c754f406b70 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins393578207587878685.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-12 12:38:13.691478: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 12:38:14.911102: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-12 12:38:14.911164: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-12 12:38:14.912223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 12:38:14.913213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 12:38:14.913241: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 12:38:14.913291: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-12 12:38:14.913326: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-07-12 12:38:14.913363: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-12 12:38:14.913396: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-12 12:38:14.913499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-12 12:38:14.913533: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-12 12:38:14.913569: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-12 12:38:14.917442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88, got 80
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py ...........................
Build timed out (after 40 minutes). Marking the build as failed.
Terminated
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5837560369399229333.sh

@benfred
Member

benfred commented Jul 12, 2021

rerun tests

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit e3ddab778dd54622a171dddce447dc4d0b7352bc, no merge conflicts.
Running as SYSTEM
Setting status of e3ddab778dd54622a171dddce447dc4d0b7352bc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2770/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e3ddab778dd54622a171dddce447dc4d0b7352bc^{commit} # timeout=10
Checking out Revision e3ddab778dd54622a171dddce447dc4d0b7352bc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e3ddab778dd54622a171dddce447dc4d0b7352bc # timeout=10
Commit message: "Allow to pass in vocabs in Categorify to fix make_feature_column_workflow"
 > git rev-list --no-walk d75180f8f20473ce56b86922c9d96c406b510d67 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1604705900853845016.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-12 18:15:44.928307: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 18:15:46.432615: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-12 18:15:46.432789: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-12 18:15:46.434079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 18:15:46.435277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 18:15:46.435369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 18:15:46.435477: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-12 18:15:46.435541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-07-12 18:15:46.435603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-12 18:15:46.435662: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-12 18:15:46.435922: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-12 18:15:46.435985: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-12 18:15:46.436051: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-12 18:15:46.440595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88, got 80
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ...................... [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6560: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.8.5-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 232 38 112 19 82% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 351, 355, 396, 420, 422, 429
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 136 120 11 52% 118-168, 213-274, 305, 307, 331-343, 351-356, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 687, 690, 695->698
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 33 6 12 3 80% 21-22, 45, 47-48, 52
nvtabular/io/writer.py 171 13 64 5 91% 24-25, 51, 79, 125, 128, 205, 214, 217, 260, 281-283
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 560 58 315 43 87% 243, 258, 262, 270, 278, 280, 305, 324-325, 340, 351->363, 357-359, 369-373, 452-453, 547->549, 670, 706, 735->738, 739-741, 748-749, 762-764, 765->733, 781, 789, 791, 798->exit, 821, 824->827, 835, 860, 865, 881-884, 895, 899, 901, 913-916, 994, 996, 1025->1048, 1091, 1109->1114, 1113, 1123->1120, 1128->1120, 1136, 1144-1154
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 60->59, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 1 2 1 94% 25
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6238 1056 2462 276 81%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.87%
=========== 1118 passed, 9 skipped, 11 warnings in 949.01s (0:15:49) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2918128182569040717.sh

.gitignore (review thread, outdated)
nvtabular/ops/categorify.py (review thread, outdated)
nvtabular/ops/categorify.py (review thread, outdated)
nvtabular/ops/categorify.py (review thread)
@benfred (Member) left a comment


this looks good!

There are a couple of minor things I'd like to see before merging:

  1. We should move the pandas/cudf code into nvtabular/dispatch.py.
  2. We should add an entry to the Categorify docstring for the 'vocabs' parameter.
  3. I don't think this will work with encode_type='joint' (e.g. when using this op to compute feature crosses). I'm not too concerned about this, but maybe we can throw an exception in the constructor if vocabs and encode_type='joint' are both set? (A sketch follows this list.)
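
A minimal sketch of what items 2 and 3 could look like, taking item 3 literally; the docstring wording, the trimmed-down signature, and the exact guard condition are assumptions for illustration, not the merged implementation:

# Hypothetical sketch of the suggested docstring entry and constructor guard;
# the real Categorify op has many more parameters, and the merged behaviour
# may differ from this illustration.
class Categorify:
    """Encode categorical columns as integer ids.

    Parameters
    ----------
    vocabs : DataFrame or dict, optional
        Pre-computed vocabulary used to encode the columns instead of
        computing category statistics from the data (assumed wording).
    encode_type : {"joint", "combo"}
        How multi-column groups are encoded.
    """

    def __init__(self, encode_type="joint", vocabs=None):
        # A DataFrame raises on truth-value checks, so compare against None
        # explicitly rather than writing `if vocabs:`.
        if vocabs is not None and encode_type == "joint":
            raise ValueError(
                "Passing 'vocabs' is not supported with encode_type='joint'"
            )
        self.encode_type = encode_type
        self.vocabs = vocabs

Failing fast in the constructor would surface the unsupported combination at workflow-definition time rather than during fit.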

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit 66d934f114befde0c104791058aae361259b8bae, no merge conflicts.
Running as SYSTEM
Setting status of 66d934f114befde0c104791058aae361259b8bae to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2810/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 66d934f114befde0c104791058aae361259b8bae^{commit} # timeout=10
Checking out Revision 66d934f114befde0c104791058aae361259b8bae (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 66d934f114befde0c104791058aae361259b8bae # timeout=10
Commit message: "Update .gitignore"
 > git rev-list --no-walk 842fead98542f6b9ea85b37f2e6c205087a88db3 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8912513169153605297.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 08:30:53.379488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 08:30:54.600577: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 08:30:54.601673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 08:30:54.602687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 08:30:54.602718: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 08:30:54.602765: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 08:30:54.602799: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 08:30:54.602833: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 08:30:54.602865: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 08:30:54.602927: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 08:30:54.602963: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 08:30:54.603004: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 08:30:54.607425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 232 38 112 19 82% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 351, 355, 396, 420, 422, 429
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 33 6 12 3 80% 21-22, 45, 47-48, 52
nvtabular/io/writer.py 171 13 64 5 91% 24-25, 51, 79, 125, 128, 205, 214, 217, 260, 281-283
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 560 68 315 45 85% 243, 258, 262, 270, 278, 280, 305, 324-325, 340, 351->363, 357-359, 369-373, 452-453, 471-474, 547->549, 670, 706, 735->738, 739-741, 748-749, 762-764, 765->733, 781, 789, 791, 798->exit, 821, 824->827, 835, 860, 865, 881-884, 895, 899, 901, 913-916, 994, 996, 1025->1048, 1031->1048, 1049-1054, 1091, 1109->1114, 1113, 1123->1120, 1128->1120, 1136, 1144-1154
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6238 1091 2462 282 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.24%
========== 1114 passed, 13 skipped, 11 warnings in 772.03s (0:12:52) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5396969967190544189.sh

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit 34c96879396fa1f3f7e3cee350b023fbb8e20c8f, no merge conflicts.
Running as SYSTEM
Setting status of 34c96879396fa1f3f7e3cee350b023fbb8e20c8f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2813/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 34c96879396fa1f3f7e3cee350b023fbb8e20c8f^{commit} # timeout=10
Checking out Revision 34c96879396fa1f3f7e3cee350b023fbb8e20c8f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 34c96879396fa1f3f7e3cee350b023fbb8e20c8f # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk f7c3db141b3a74d74ae59888f268a5259a93d3c5 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins9043349621935770599.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 17:12:59.410855: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 17:13:00.671095: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 17:13:00.672192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 17:13:00.673193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 17:13:00.673223: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 17:13:00.673271: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 17:13:00.673305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 17:13:00.673339: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 17:13:00.673371: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 17:13:00.673416: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 17:13:00.673447: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 17:13:00.673484: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 17:13:00.677932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
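
Every test_categorify_lists failure in this run repeats the trace above for a different parameter combination; the root cause is identical each time: line 231 of nvtabular/ops/categorify.py evaluates the vocabs DataFrame directly in a boolean expression, and pandas (like cuDF) refuses implicit truth testing on a DataFrame. A minimal standalone sketch of the problem and of the usual `is not None` guard, using plain pandas and nothing from NVTabular:

import pandas as pd

vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

try:
    if vocabs:  # implicit bool() on a DataFrame raises ValueError
        pass
except ValueError as err:
    print(err)  # "The truth value of a DataFrame is ambiguous. ..."

# Comparing against None avoids the implicit bool() call entirely:
if vocabs is not None:
    print(f"vocabs provided with {len(vocabs)} entries")

The follow-up commit in this PR switches the guard to the `is not None` form, which the second CI run further down exercises.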
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))
nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
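
In the feature-column failure above, the traceback shows make_feature_column_workflow collecting each column's vocabulary into a single pd.DataFrame (the `pd.DataFrame(categorifies)` call) before handing it to Categorify, where it hits the same truthiness check. A rough sketch of that assembly step, assuming only the public feature-column attributes key and vocabulary_list rather than the library's exact code:

import pandas as pd
import tensorflow as tf

cols = [
    tf.feature_column.categorical_column_with_vocabulary_list("vocab_1", ["a", "b", "c", "d"]),
    tf.feature_column.categorical_column_with_vocabulary_list("vocab_2", ["1", "2", "3", "4"]),
]

# One dict entry per feature, mirroring the `categorifies` name in the traceback.
categorifies = {col.key: list(col.vocabulary_list) for col in cols}
vocabs = pd.DataFrame(categorifies)
print(vocabs)  # one column per feature, one row per vocabulary entry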
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
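
The SettingWithCopyWarning entries above describe a generic pandas pattern rather than anything specific to this PR: calling an in-place mutator on a sliced DataFrame can emit the warning, while reassigning the result does not. A small illustrative sketch with hypothetical data:

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]})
ext = df[df["a"] == 1]  # a slice pandas may treat as a view of df

# ext.drop_duplicates(ignore_index=True, inplace=True)  # can trigger the warning
ext = ext.drop_duplicates(ignore_index=True)  # warning-free equivalent
print(ext)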

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 84 317 44 83% 230, 232, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1121 2478 281 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.74%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 775.54s (0:12:55) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins844051744038619054.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit e6536dc85ed06f4d71bee522e0fb7cd564c45fb1, no merge conflicts.
Running as SYSTEM
Setting status of e6536dc85ed06f4d71bee522e0fb7cd564c45fb1 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2814/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e6536dc85ed06f4d71bee522e0fb7cd564c45fb1^{commit} # timeout=10
Checking out Revision e6536dc85ed06f4d71bee522e0fb7cd564c45fb1 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e6536dc85ed06f4d71bee522e0fb7cd564c45fb1 # timeout=10
Commit message: "Update nvtabular/ops/categorify.py"
 > git rev-list --no-walk 34c96879396fa1f3f7e3cee350b023fbb8e20c8f # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3304963982174998869.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 22:56:35.337507: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 22:56:36.544179: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 22:56:36.545259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 22:56:36.546245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 22:56:36.546274: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 22:56:36.546321: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 22:56:36.546354: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 22:56:36.546388: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 22:56:36.546421: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 22:56:36.546480: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 22:56:36.546515: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 22:56:36.546552: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 22:56:36.550738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9270e4790>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_0'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
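
The failures in this second run come from the opposite direction: the guard now compares against None correctly, but because Categorify defaults to encode_type="joint", any constructor call that passes vocabs (including the ordinary single-column usage in this test) trips the new ValueError. A minimal sketch of the failing call under that default, taken from the traceback above:

import pandas as pd
from nvtabular import ops

vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

# With the guard in this commit, the default encode_type="joint" rejects any
# vocabs argument, even for plain per-column encoding:
ops.Categorify(vocabs=vocabs)
# ValueError: Passing in vocabs is not supported with a joint encoding.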
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9271a8b80>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_1'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
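
Every failing parametrization in this run shares the same root cause: the test passes a
non-None vocabs DataFrame, and Categorify.__init__ rejects it before any data is touched
because encode_type defaults to "joint". A minimal sketch of just that guard, reconstructed
from the traceback above (the rest of the constructor is elided), reproduces the error
outside of any workflow:

    # Sketch of the two constructor checks quoted in the traceback; the real
    # Categorify.__init__ does much more than this.
    import pandas as pd

    def categorify_init_guard(encode_type="joint", vocabs=None):
        # Only two kinds of multi-column encoding are supported.
        if encode_type not in ("joint", "combo"):
            raise ValueError(f"encode_type={encode_type} not supported.")
        # With the default "joint" encoding, any non-None vocabs is rejected,
        # which is exactly what every vocabs1-* case in this log trips over.
        if encode_type == "joint" and vocabs is not None:
            raise ValueError("Passing in vocabs is not supported with a joint encoding.")

    vocab = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})
    categorify_init_guard(vocabs=vocab)  # raises the same ValueError as the CI runs
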
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab406cd0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_2'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
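
For context on the three encoding modes the quoted comments describe, a rough usage sketch
follows. This is not code from this PR: the nested-list syntax for multi-column groups is
assumed from NVTabular's documented Categorify examples and may differ in detail:

    from nvtabular import ops

    # (1) Conventional: each column gets its own vocabulary and encoded column.
    conventional = ["Authors", "Engaging User"] >> ops.Categorify()

    # (2) Joint: the columns in a group share one vocabulary (encode_type="joint"
    #     is the default), but each still maps to its own encoded column.
    joint = [["Authors", "Engaging User"]] >> ops.Categorify(encode_type="joint")

    # (3) Combo: each group becomes a single encoded column whose values are the
    #     unique combinations of the grouped columns.
    combo = [["Authors", "Engaging User"]] >> ops.Categorify(encode_type="combo")
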
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab4c5220>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_3'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9270e0430>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_4'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab413e20>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_5'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ae6dea90>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_6'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab5c0be0>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_7'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ad9bd310>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_8'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9af825e50>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_9'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab588b50>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_10'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb947128df0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_11'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9add4cd90>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_12'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ae6f4d00>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_13'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab40c4f0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_14'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9270e0cd0>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_15'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ad9beeb0>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_16'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ae7118e0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_17'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))


self = <nvtabular.ops.categorify.Categorify object at 0x7fb8f6097640>
freq_threshold = 0, out_path = None, tree_width = None, na_sentinel = None
cat_cache = 'host', dtype = None, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
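
The feature-column failure above hits the same guard: make_feature_column_workflow collects the vocabulary lists from the TF feature columns and hands them to Categorify as a DataFrame. A rough sketch of that hand-off, assuming pandas is available and the per-column vocabularies have equal length (a requirement of pd.DataFrame); the categorifies name simply mirrors the variable in the traceback:

import pandas as pd
from nvtabular.ops import Categorify

# Vocabularies taken from the feature columns in the failing test above.
categorifies = {"vocab_1": ["a", "b", "c", "d"], "vocab_2": ["1", "2", "3", "4"]}

# With this PR, the known vocabularies are passed to Categorify directly
# instead of being recomputed from the data during fit().
cat_features = list(categorifies.keys()) >> Categorify(vocabs=pd.DataFrame(categorifies))
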
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
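
The SettingWithCopyWarning entries above are raised because a slice of a DataFrame is mutated in place. A common remedy, sketched here on a toy frame (not necessarily how the NVTabular code should change), is to take an explicit copy before the in-place call:

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2]})
_ext = df[df["a"] > 0].copy()  # explicit copy avoids mutating a view of df
_ext.drop_duplicates(ignore_index=True, inplace=True)
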

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 83 317 43 83% 230, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1120 2478 280 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.76%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 773.36s (0:12:53) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6167220704054926419.sh

@benfred
Member

benfred commented Jul 14, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 4902361e79f0abf2e19dede2ec01adb30f4201e9, no merge conflicts.
Running as SYSTEM
Setting status of 4902361e79f0abf2e19dede2ec01adb30f4201e9 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2816/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 4902361e79f0abf2e19dede2ec01adb30f4201e9^{commit} # timeout=10
Checking out Revision 4902361e79f0abf2e19dede2ec01adb30f4201e9 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4902361e79f0abf2e19dede2ec01adb30f4201e9 # timeout=10
Commit message: "Update nvtabular/ops/categorify.py"
 > git rev-list --no-walk 4902361e79f0abf2e19dede2ec01adb30f4201e9 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins722899612749464217.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 23:46:01.940212: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 23:46:03.147989: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 23:46:03.149073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 23:46:03.150085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 23:46:03.150118: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 23:46:03.150167: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 23:46:03.150203: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 23:46:03.150241: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 23:46:03.150275: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 23:46:03.150322: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 23:46:03.150356: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 23:46:03.150396: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 23:46:03.154679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
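
These assertion failures are all the same off-by-one: when a vocabulary is supplied, the test still expects index 0 to stay reserved for nulls and out-of-vocabulary values (the usual Categorify convention), so the supplied categories should start at 1, while the code under test numbers them from 0. A small sketch of the mapping the test asserts, purely illustrative:

# Supplied vocab: ["User_A", "User_B", "User_C", "User_E"]; 0 stays reserved.
expected = {"User_A": 1, "User_B": 2, "User_C": 3, "User_E": 4}
rows = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
assert [[expected[u] for u in row] for row in rows] == [[1], [1, 4], [2, 3], [3]]
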
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
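Taken together, the failures above point at an off-by-one in the list-column path when an explicit vocabulary is supplied: the op emits codes starting at 0 ([[0], [0, 3], [1, 2], [2]]) while the test expects the first vocabulary entry to map to 1 ([[1], [1, 4], [2, 3], [3]]), with 0 presumably reserved for nulls/out-of-vocabulary values. A minimal sketch of the mapping the test expects, assuming only the vocabulary order visible in the parametrization (illustrative, not the op's implementation):

    # Encoding the test expects for the "Authors" list column when a vocabulary
    # is passed: index 0 is presumably left for null/OOV, so "User_A" maps to 1.
    vocab = ["User_A", "User_B", "User_C", "User_E"]
    encoding = {token: idx + 1 for idx, token in enumerate(vocab)}

    authors = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
    encoded = [[encoding[token] for token in row] for row in authors]
    assert encoded == [[1], [1, 4], [2, 3], [3]]  # the value the assertions above compare against

A separate nit in the test itself: the CPU branch's dtype check, assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64"), parses as a single conditional expression, so when dtype is None it only asserts that np.dtype("int64") is truthy; parenthesising the right-hand side as (np.dtype(dtype) if dtype else np.dtype("int64")) would make it check the default dtype as intended.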
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 41 120 20 82% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 335-338, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 69 317 45 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->346, 349-356, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1095 2478 285 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.17%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
===== 18 failed, 1096 passed, 13 skipped, 11 warnings in 772.39s (0:12:52) =====
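For reference, the 18 failures are exactly the explicit-vocabulary half of the parametrization grid: 3 freq_threshold values × 2 cpu values × 3 dtype values = 18 vocabs1 combinations, while every vocabs0 (None) combination passed.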
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6498648267293293010.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 4d32576175eab491f12be8bd7592166fc9dcbaf8, no merge conflicts.
Running as SYSTEM
Setting status of 4d32576175eab491f12be8bd7592166fc9dcbaf8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2829/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 4d32576175eab491f12be8bd7592166fc9dcbaf8^{commit} # timeout=10
Checking out Revision 4d32576175eab491f12be8bd7592166fc9dcbaf8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4d32576175eab491f12be8bd7592166fc9dcbaf8 # timeout=10
Commit message: "Addressing PR comments"
 > git rev-list --no-walk c8dd7eb2a0de01818574913bb0ed04af58e7f0aa # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins86938168095787545.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-16 07:39:50.510965: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-16 07:39:51.828519: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-16 07:39:51.829612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-16 07:39:51.830621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-16 07:39:51.830652: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-16 07:39:51.830702: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-16 07:39:51.830736: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-16 07:39:51.830770: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-16 07:39:51.830802: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-16 07:39:51.830848: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-16 07:39:51.830879: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-16 07:39:51.830916: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-16 07:39:51.834775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))
nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
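Every failure in this run stops at the same place, nvtabular/ops/categorify.py:231, where the check "if encode_type == "joint" and vocabs:" asks pandas (or cuDF) for the truth value of a whole DataFrame, which both libraries refuse with the ambiguity error shown. A minimal sketch of a guard that sidesteps the ambiguous bool() call, assuming vocabs is either None or a DataFrame (the helper name is illustrative, not the code in this PR):

    import pandas as pd

    def vocabs_supplied(vocabs):
        # Treat "a vocabulary was supplied" as "vocabs is a non-empty DataFrame",
        # rather than relying on bool(DataFrame), which raises the ValueError above.
        return vocabs is not None and not vocabs.empty

    # The failing check could then read:
    #     if encode_type == "joint" and vocabs_supplied(vocabs):
    #         ...
    assert vocabs_supplied(pd.DataFrame({"Authors": ["User_A", "User_B"]}))
    assert not vocabs_supplied(None)

The same kind of guard also covers the test_feature_column_utils failure just above, since make_feature_column_workflow passes pd.DataFrame(categorifies) straight into Categorify(vocabs=...).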
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 84 317 44 83% 230, 232, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1121 2478 281 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.74%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 772.97s (0:12:52) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8811147866039403348.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa, no merge conflicts.
Running as SYSTEM
Setting status of 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2830/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa^{commit} # timeout=10
Checking out Revision 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa # timeout=10
Commit message: "Quick fix to try to make the tests pass"
 > git rev-list --no-walk 4d32576175eab491f12be8bd7592166fc9dcbaf8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins291720411122994120.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp42_0mrg8
       cwd: /var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Complete output (18 lines):
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py", line 280, in 
      main()
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 154, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 135, in _get_build_requires
      self.run_setup()
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 258, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 150, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 56, in 
      cmdclass = versioneer.get_cmdclass()
  AttributeError: module 'versioneer' has no attribute 'get_cmdclass'
  ----------------------------------------
WARNING: Discarding file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular. Command errored out with exit status 1: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp42_0mrg8 Check the logs for full command output.
ERROR: Command errored out with exit status 1: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp42_0mrg8 Check the logs for full command output.
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1995729761163832366.sh

@marcromeyn
Copy link
Contributor Author

@benfred The build is failing now with AttributeError: module 'versioneer' has no attribute 'get_cmdclass' which is unrelated to the changes in this PR. Any idea how to resolve this?

@benfred
Copy link
Member

benfred commented Jul 16, 2021

@benfred The build is failing now with AttributeError: module 'versioneer' has no attribute 'get_cmdclass' which is unrelated to the changes in this PR. Any idea how to resolve this?

In a different PR I had to change the CPU CI to work around this (https://github.com/NVIDIA/NVTabular/pull/926/files#diff-f48959cf62357a95aff9d53b6b9cdbbf3aa81317d24eb321c296a7d9fd6866b7R43). I've pushed the same change here to see if it helps.

@benfred
Copy link
Member

benfred commented Jul 17, 2021

rerun tests

1 similar comment
@benfred
Copy link
Member

benfred commented Jul 18, 2021

rerun tests

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit c5a82bef60fe1d123f389440f5c06ba383c03b9a, no merge conflicts.
Running as SYSTEM
Setting status of c5a82bef60fe1d123f389440f5c06ba383c03b9a to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2845/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse c5a82bef60fe1d123f389440f5c06ba383c03b9a^{commit} # timeout=10
Checking out Revision c5a82bef60fe1d123f389440f5c06ba383c03b9a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c5a82bef60fe1d123f389440f5c06ba383c03b9a # timeout=10
Commit message: "Update cpu-ci.yml"
 > git rev-list --no-walk 3779be9b16585f589c38fb944304c2ee61a3aac4 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7111287436070886359.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular-0.5.3+57.gc5a82be
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-18 17:59:59.918255: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-18 18:00:01.106670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-18 18:00:01.107891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-18 18:00:01.109058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-18 18:00:01.109092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-18 18:00:01.109145: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-18 18:00:01.109183: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-18 18:00:01.109221: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-18 18:00:01.109258: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-18 18:00:01.109310: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-18 18:00:01.109345: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-18 18:00:01.109388: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-18 18:00:01.113816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836afa6730>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
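
With the guard rewritten as `vocabs is not None`, the DataFrame truthiness error is gone, but every parametrization that supplies a vocabulary still fails: `encode_type` defaults to "joint", so the new check rejects the vocabulary even though "Authors" and "Engaging User" are encoded as ordinary single columns. A minimal reproduction distilled from the failing test above (no multi-column group is requested anywhere; whether this call should be accepted is exactly what these failures flag, not an assumption about the final fix):

    import pandas as pd
    from nvtabular import ops

    vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

    # In this CI run the line below raises:
    #   ValueError: Passing in vocabs is not supported with a joint encoding.
    # even though each column here is categorified on its own and encode_type
    # was simply left at its default of "joint".
    cat_features = ["Authors", "Engaging User"] >> ops.Categorify(vocabs=vocabs)
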
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836af8fa90>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83827331f0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83d4e8beb0>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4b3ad00>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836af8fb80>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4af96a0>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83d4f2c580>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c6b520>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c35e80>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c51e20>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4819940>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4bc3190>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836aea1f10>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c53100>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4944850>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f47e2310>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f40288b0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
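Every parametrized failure above reduces to the same constructor guard: as soon as a vocabs DataFrame is passed while encode_type keeps its default of "joint", categorify.py:232 raises before any data is processed. A minimal repro sketch, using only the constructor arguments visible in these tracebacks (the surrounding Workflow setup is omitted):

import pandas as pd
from nvtabular import ops

vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

# encode_type defaults to "joint", so supplying a vocab table trips the
# check at nvtabular/ops/categorify.py:232 before any data is seen.
try:
    ops.Categorify(vocabs=vocabs)
except ValueError as err:
    print(err)  # Passing in vocabs is not supported with a joint encoding.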
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))


self = <nvtabular.ops.categorify.Categorify object at 0x7f833a7cbd00>
freq_threshold = 0, out_path = None, tree_width = None, na_sentinel = None
cat_cache = 'host', dtype = None, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
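The test_feature_column_utils failure is the same guard reached through make_feature_column_workflow, which gathers each feature column's vocabulary list into a single DataFrame and hands it to Categorify. A sketch of that path (variable names are illustrative; the DataFrame shape matches the vocabs local printed above):

import pandas as pd
from nvtabular import ops

# One vocabulary list per categorical feature column, as in the test above.
categorifies = {"vocab_1": ["a", "b", "c", "d"], "vocab_2": ["1", "2", "3", "4"]}
vocabs = pd.DataFrame(categorifies)

# Categorify is constructed with the default encode_type="joint", so the
# same ValueError is raised before the workflow is built.
try:
    list(categorifies.keys()) >> ops.Categorify(vocabs=vocabs)
except ValueError as err:
    print(err)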
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 83 317 43 83% 230, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1120 2478 280 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.76%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 767.19s (0:12:47) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1272602320064861379.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7, no merge conflicts.
Running as SYSTEM
Setting status of 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2861/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7^{commit} # timeout=10
Checking out Revision 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7 # timeout=10
Commit message: "Update nvtabular/ops/categorify.py"
 > git rev-list --no-walk e1595564fabec0c6f8756584c85cb18a20644485 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1311705690513581580.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular-0.5.3+58.g1ef8e43
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-19 20:23:30.423923: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:23:31.801837: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-19 20:23:31.803007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:23:31.804050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:23:31.804110: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:23:31.804179: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-19 20:23:31.804223: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-19 20:23:31.804266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-19 20:23:31.804305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-19 20:23:31.804360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-19 20:23:31.804401: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-19 20:23:31.804450: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-19 20:23:31.808386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))

Notebook error:
JSONDecodeError in examples/getting-started-movielens/03-Training-with-PyTorch.ipynb:
Expecting value: line 1 column 1 (char 0)
Terminated
make: *** [Makefile:20: html] Error 143
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6417902073353816443.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5, no merge conflicts.
Running as SYSTEM
Setting status of 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2865/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5^{commit} # timeout=10
Checking out Revision 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk e1595564fabec0c6f8756584c85cb18a20644485 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4922912435786826593.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (1.0.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+60.g55e5ca9) (2021.4.1)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (5.4.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (0.53.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (1.1.5)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+60.g55e5ca9) (0.20)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+60.g55e5ca9) (0.0.1)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (2021.4.1)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (2021.6.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.6.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (0.11.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.2.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (6.1)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.7.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (5.8.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (2.0.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (8.0.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.0.2)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (57.4.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+60.g55e5ca9) (0.36.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+60.g55e5ca9) (1.20.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+60.g55e5ca9) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+60.g55e5ca9) (2021.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+60.g55e5ca9) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+60.g55e5ca9) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-19 20:53:30.774899: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:53:32.714569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-19 20:53:32.715798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:53:32.716946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:53:32.716984: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:53:32.717048: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-19 20:53:32.717085: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-19 20:53:32.717122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-19 20:53:32.717156: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-19 20:53:32.717207: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-19 20:53:32.717242: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-19 20:53:32.717285: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-19 20:53:32.721900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
..Terminated
Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8784313865889049440.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit c3f615c853d66cd641df7578bdbb5fa5aa6d1667, no merge conflicts.
Running as SYSTEM
Setting status of c3f615c853d66cd641df7578bdbb5fa5aa6d1667 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2874/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse c3f615c853d66cd641df7578bdbb5fa5aa6d1667^{commit} # timeout=10
Checking out Revision c3f615c853d66cd641df7578bdbb5fa5aa6d1667 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c3f615c853d66cd641df7578bdbb5fa5aa6d1667 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk a6739dec03b2670528472355adbf1d16b917eb28 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5506574157971313495.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+63.gc3f615c) (0.0.1)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (2021.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+63.gc3f615c) (2021.4.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+63.gc3f615c) (0.20)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (5.4.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (1.1.5)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (1.0.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (0.53.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.6.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (0.11.1)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (2021.6.1)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.0.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (2.4.0)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (57.4.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (5.8.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (8.0.1)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (2.0.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (6.1)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+63.gc3f615c) (1.20.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+63.gc3f615c) (0.36.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+63.gc3f615c) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+63.gc3f615c) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+63.gc3f615c) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+63.gc3f615c) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-19 22:39:36.341045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 22:39:37.593701: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-19 22:39:37.594814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 22:39:37.595833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 22:39:37.595867: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 22:39:37.595932: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-19 22:39:37.595973: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-19 22:39:37.596015: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-19 22:39:37.596055: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-19 22:39:37.596110: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-19 22:39:37.596150: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-19 22:39:37.596198: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-19 22:39:37.600251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1128 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 58%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py .. [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .....................Build timed out (after 60 minutes). Marking the build as failed.
Terminated
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5120461984064153790.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit feff50e7ba0870f9fd51c12f7e4957dffdda0b1f, no merge conflicts.
Running as SYSTEM
Setting status of feff50e7ba0870f9fd51c12f7e4957dffdda0b1f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2884/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse feff50e7ba0870f9fd51c12f7e4957dffdda0b1f^{commit} # timeout=10
Checking out Revision feff50e7ba0870f9fd51c12f7e4957dffdda0b1f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f feff50e7ba0870f9fd51c12f7e4957dffdda0b1f # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk 04f42e04b4040cc6fe412f6d7cac7d8da5d1db22 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6034723035166010996.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (5.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+65.gfeff50e) (2021.4.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (1.1.5)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (0.53.1)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (1.0.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+65.gfeff50e) (0.20)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+65.gfeff50e) (0.0.1)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (2021.6.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.6.0)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (0.11.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.7.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (5.8.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (2.0.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (6.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.0.2)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (8.0.1)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (57.4.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+65.gfeff50e) (1.20.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+65.gfeff50e) (0.36.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+65.gfeff50e) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+65.gfeff50e) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+65.gfeff50e) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+65.gfeff50e) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 03:13:37.149431: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 03:13:38.351703: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 03:13:38.352822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 03:13:38.353868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 03:13:38.353898: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 03:13:38.353957: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 03:13:38.353994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 03:13:38.354030: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 03:13:38.354066: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 03:13:38.354112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 03:13:38.354147: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 03:13:38.354185: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 03:13:38.358233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1129 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 58%]
........................................................................ [ 65%]
................... [ 66%]
tests/unit/test_s3.py .. [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py sssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
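
Editor's note: the 18 failures in this run all hit the same parametrization (vocabs1) and the same off-by-one: with a user-supplied vocabulary, Categorify encodes the first vocab entry as 0, while the test expects encoding to start at 1, i.e. index 0 apparently reserved for null / out-of-vocabulary values, as when the vocabulary is learned from the data. A minimal Python sketch of the mapping the test asserts, assuming that 1-based convention (the vocab/encoding names below are illustrative helpers, not part of the NVTabular API):

    import pandas as pd

    # The vocabulary passed to Categorify in the failing parametrization.
    vocab = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

    # Assumed convention: index 0 is reserved for null / out-of-vocabulary,
    # so the first vocab entry ("User_A") should encode to 1, not 0.
    encoding = {token: idx + 1 for idx, token in enumerate(vocab["Authors"])}

    rows = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
    encoded = [[encoding[token] for token in row] for row in rows]
    assert encoded == [[1], [1, 4], [2, 3], [3]]  # the values the test expects

The failing cases can be reproduced locally with, for example: pytest tests/unit/test_ops.py -k "test_categorify_lists and vocabs1" -v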
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 245 42 120 20 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 157 120 14 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 283 35 124 23 84% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 695-696, 700->702, 702->711, 712, 719-720, 726, 732, 827-828, 944-949, 955, 1005
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 69 317 45 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->346, 349-356, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 7 38 6 90% 20-21, 113, 115, 117, 159, 176->178, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6266 1098 2482 284 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.19%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
===== 18 failed, 1097 passed, 14 skipped, 11 warnings in 829.60s (0:13:49) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7042895182003705964.sh

@benfred
Copy link
Member

benfred commented Jul 20, 2021

@marcromeyn The unit tests are failing here with what looks like an off-by-one error: E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]. The tests more or less assume that index '0' is reserved for out-of-vocabulary/unknown items - is this being handled?
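
For reference, a minimal sketch of the behaviour the failing test expects (based on the test shown above; it assumes the vocabs= argument added in this PR and that Categorify reserves index 0 for out-of-vocabulary/null values, which is exactly the point under discussion):

import pandas as pd
import nvtabular as nvt
from nvtabular import ops

# User-supplied vocabulary for the "Authors" column.
vocab = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

# With index 0 reserved for out-of-vocabulary/null values, the supplied
# vocabulary entries should map to codes starting at 1:
#   User_A -> 1, User_B -> 2, User_C -> 3, User_E -> 4
df = pd.DataFrame(
    {"Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]}
)

cats = ["Authors"] >> ops.Categorify(vocabs=vocab)
workflow = nvt.Workflow(cats)
out = workflow.fit_transform(nvt.Dataset(df, cpu=True)).to_ddf().compute()

# Expected (what the test asserts): [[1], [1, 4], [2, 3], [3]]
# What the failing runs produced:   [[0], [0, 3], [1, 2], [2]]
print([list(row) for row in out["Authors"].tolist()])

The later commit in this PR, "Fixing prepend in dispatch._add_to_series", suggests the fix prepends a null/OOV entry to the supplied vocabulary so that the real categories start at 1.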

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit f6183acc644a7d041a9bb2ebccd730fb4f81ef9c, has merge conflicts.
Running as SYSTEM
!!! PR mergeability status has changed !!!  
PR now has NO merge conflicts
Setting status of f6183acc644a7d041a9bb2ebccd730fb4f81ef9c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2891/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse f6183acc644a7d041a9bb2ebccd730fb4f81ef9c^{commit} # timeout=10
Checking out Revision f6183acc644a7d041a9bb2ebccd730fb4f81ef9c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f6183acc644a7d041a9bb2ebccd730fb4f81ef9c # timeout=10
Commit message: "Fixing prepend in dispatch._add_to_series"
 > git rev-list --no-walk 9a86adc67b82f2095ac56823719fbc8e54f66281 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins2068987169614134547.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (1.1.5)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (5.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+66.gf6183ac) (2021.4.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+66.gf6183ac) (0.20)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (1.0.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (0.53.1)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+66.gf6183ac) (0.0.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (0.11.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.6.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (2021.6.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.2.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (6.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (8.0.1)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (57.4.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (2.0.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.0.2)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (5.8.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+66.gf6183ac) (0.36.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+66.gf6183ac) (1.20.2)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+66.gf6183ac) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+66.gf6183ac) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+66.gf6183ac) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+66.gf6183ac) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 14:53:05.981958: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 14:53:07.194806: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 14:53:07.195877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 14:53:07.196875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 14:53:07.196907: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 14:53:07.196956: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 14:53:07.196991: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 14:53:07.197025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 14:53:07.197057: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 14:53:07.197103: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 14:53:07.197136: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 14:53:07.197174: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 14:53:07.201385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1129 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 58%]
........................................................................ [ 65%]
................... [ 66%]
tests/unit/test_s3.py .. [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py sssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 245 42 120 20 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 157 120 14 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 283 35 124 23 84% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 695-696, 700->702, 702->711, 712, 719-720, 726, 732, 827-828, 944-949, 955, 1005
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 564 69 317 45 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 455-458, 531->533, 654, 690, 719->722, 723-725, 732-733, 746-748, 749->717, 765, 773, 775, 782->exit, 805, 808->811, 819, 844, 849, 865-868, 879, 883, 885, 897-900, 978, 980, 1009->1032, 1015->1032, 1033-1038, 1075, 1093->1098, 1097, 1107->1104, 1112->1104, 1120, 1128-1138
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 7 38 6 90% 20-21, 113, 115, 117, 159, 176->178, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6267 1098 2482 284 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.19%
========== 1115 passed, 14 skipped, 11 warnings in 916.56s (0:15:16) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5399356434971752398.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit 456607ae4dbbeb1d348c9a005ea253141c1a6f33, no merge conflicts.
Running as SYSTEM
Setting status of 456607ae4dbbeb1d348c9a005ea253141c1a6f33 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2892/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 456607ae4dbbeb1d348c9a005ea253141c1a6f33^{commit} # timeout=10
Checking out Revision 456607ae4dbbeb1d348c9a005ea253141c1a6f33 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 456607ae4dbbeb1d348c9a005ea253141c1a6f33 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk f6183acc644a7d041a9bb2ebccd730fb4f81ef9c # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5500282410223543448.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+68.g456607a) (0.20)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (5.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+68.g456607a) (2021.4.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (0.53.1)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+68.g456607a) (0.0.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (1.1.5)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (1.0.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.6.0)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.2.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (2021.6.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (0.11.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.0.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (2.4.0)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (57.4.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (6.1)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (5.8.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (8.0.1)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.7.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (2.0.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+68.g456607a) (0.36.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+68.g456607a) (1.20.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+68.g456607a) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+68.g456607a) (2021.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+68.g456607a) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+68.g456607a) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._arange' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._encode_list_column' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._flatten_list_column' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._from_host' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._hash_series' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._is_list_dtype' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._parquet_writer_dispatch' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._read_parquet_dispatch' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._series_has_nulls' imported but unused
Build step 'Execute shell' marked build as failure
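The nine F401 errors above are what actually failed this run: nvtabular/ops/categorify.py imports those dispatch helpers but never references them. The follow-up commit "Fixing flake8" resolves it; as a sketch of the two usual remedies (not necessarily the exact change in that commit), either delete the unused names from the import list, or mark a name explicitly if it is kept on purpose as a re-export:

    # Sketch only: the straightforward fix is to remove the unused names
    # (_arange, _from_host, _is_list_dtype, ...) from the import statement.
    # If a name is kept deliberately, e.g. re-exported for other modules,
    # flake8 can be told to ignore that one line instead:
    from nvtabular.dispatch import _arange  # noqa: F401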
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins628072434382999635.sh

@marcromeyn
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 419b07f18b99926b9227a14a483b21758dbb4cae, no merge conflicts.
Running as SYSTEM
Setting status of 419b07f18b99926b9227a14a483b21758dbb4cae to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2895/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 419b07f18b99926b9227a14a483b21758dbb4cae^{commit} # timeout=10
Checking out Revision 419b07f18b99926b9227a14a483b21758dbb4cae (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 419b07f18b99926b9227a14a483b21758dbb4cae # timeout=10
Commit message: "Fixing flake8"
 > git rev-list --no-walk 49044acc44eb8f7e64e2ba31509bbd363c6cf6e6 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1907607570036027871.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (2021.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+69.g419b07f) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (1.0.1)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (5.4.1)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+69.g419b07f) (0.0.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+69.g419b07f) (0.20)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (1.1.5)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (0.53.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.6.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (2021.6.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (0.11.1)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (6.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (8.0.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.0.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.7.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (2.0.0)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (57.4.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (5.8.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+69.g419b07f) (1.20.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+69.g419b07f) (0.36.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+69.g419b07f) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+69.g419b07f) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+69.g419b07f) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+69.g419b07f) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:459:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:66:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
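The four I1101 notes are informational rather than failures, which is why the run still scores 10.00/10: pylint cannot statically inspect the compiled nvtabular_cpp and hugectr extensions. The message itself names the fix if run-time introspection is wanted; a minimal sketch of that configuration, assuming a conventional .pylintrc (the repo's actual pylint setup is not shown in this log):

    [MASTER]
    # let pylint import these C extensions so their members can be checked at run time
    extension-pkg-allow-list=nvtabular_cpp,hugectr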

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 15:37:33.799125: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 15:37:35.001494: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 15:37:35.002584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 15:37:35.003578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 15:37:35.003607: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 15:37:35.003654: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 15:37:35.003687: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 15:37:35.003722: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 15:37:35.003754: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 15:37:35.003800: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 15:37:35.003832: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 15:37:35.003869: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 15:37:35.007886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1100 items / 2 skipped / 1098 selected

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 13%]
tests/unit/test_dataloader_backend.py . [ 13%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 24%]
ssss.................................................. [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 40%]
........................................................................ [ 46%]
........................................................................ [ 53%]
........................................................................ [ 59%]
........................................................................ [ 66%]
................... [ 68%]
tests/unit/test_s3.py .. [ 68%]
tests/unit/test_tf_dataloader.py ....................................... [ 71%]
.................................s [ 75%]
tests/unit/test_tf_feature_columns.py . [ 75%]
tests/unit/test_tf_layers.py ........................................... [ 79%]
................................... [ 82%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .................................... [ 87%]
.............................................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
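The SettingWithCopyWarning entries point at nvtabular/ops/join_external.py:171, where drop_duplicates(..., inplace=True) runs on what pandas treats as a slice of another DataFrame; they are warnings only and do not affect the pass/fail result. A sketch of the general pandas pattern behind the warning and its usual remedy (illustrative data, not a proposed change to the repo):

    import pandas as pd

    df = pd.DataFrame({"key": [1, 1, 2], "val": [10, 10, 20]})
    _ext = df[df["val"] > 5]  # a filtered slice; mutating it in place can raise SettingWithCopyWarning

    # instead of: _ext.drop_duplicates(ignore_index=True, inplace=True)
    _ext = _ext.drop_duplicates(ignore_index=True)  # reassigning avoids in-place mutation of the slice
    # (or take an explicit copy first: _ext = _ext.copy())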

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 42 82 4 72% 54, 87, 128, 152-165, 207-214, 218-221, 225, 240-258, 301
nvtabular/dispatch.py 245 42 120 21 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 125->127, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 13 85 6 90% 60, 68->49, 122, 179, 231-239, 242, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 1 28 1 97% 108
nvtabular/framework_utils/torch/utils.py 75 13 30 3 79% 22, 25-33, 64, 118-120, 132->115
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 269 120 0 3% 30-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 87 58 0 0% 27-150
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-267
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 283 35 124 23 84% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 695-696, 700->702, 702->711, 712, 719-720, 726, 732, 827-828, 944-949, 955, 1005
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 9 16 4 64% 42-49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 13 138 11 95% 98, 102->94, 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 23 50 8 84% 57, 65-68, 78, 82, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 81 15 16 2 76% 25-27, 30-36, 111, 149-150, 190, 193
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 565 71 317 47 84% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 454-459, 532->534, 655, 691, 720->723, 724-726, 733-734, 747-749, 750->718, 766, 774, 776, 783->exit, 806, 809->812, 820, 845-847, 850, 852->854, 866-869, 880, 884, 886, 898-901, 979, 981, 1010->1033, 1016->1033, 1034-1039, 1076, 1094->1099, 1098, 1108->1105, 1113->1105, 1121, 1129-1139
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 63 6 22 1 89% 62-66, 101, 127
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 8 38 8 87% 20-21, 113, 115, 117, 159, 163->167, 176->178, 203, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 13 18 3 58% 59, 63, 77, 88-103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 4 2 1 84% 25, 99, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 44 44 9 47% 30-31, 35-36, 45, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 5975 1369 2482 272 74%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 74.48%
========== 1091 passed, 11 skipped, 11 warnings in 640.17s (0:10:40) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6983651131543558821.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af, no merge conflicts.
Running as SYSTEM
Setting status of ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2902/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af^{commit} # timeout=10
Checking out Revision ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk deed53a14aead05524deeb50fd45ba75523b07ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3664444352533196477.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python /var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpayr2gb3d
       cwd: /var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Complete output (16 lines):
  Traceback (most recent call last):
    File "/var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 280, in 
      main()
    File "/var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 154, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 135, in _get_build_requires
      self.run_setup()
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 150, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 66, in 
      define_macros=[("VERSION_INFO", versioneer.get_version())],
  AttributeError: module 'versioneer' has no attribute 'get_version'
  ----------------------------------------
WARNING: Discarding file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular. Command errored out with exit status 1: /usr/bin/python /var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpayr2gb3d Check the logs for full command output.
ERROR: Command errored out with exit status 1: /usr/bin/python /var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpayr2gb3d Check the logs for full command output.
Build step 'Execute shell' marked build as failure
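This failure happens inside pip's isolated PEP 517 build: setup.py line 66 calls versioneer.get_version(), but the versioneer module imported in that environment exposes no such attribute, so the wheel-metadata step aborts before any tests run. A quick diagnostic sketch (not part of the CI script) to see which versioneer module the interpreter actually resolves:

    # hypothetical check, run from the repository root
    import versioneer

    print(versioneer.__file__)                 # repo-local versioneer.py vs. a site-packages install
    print(hasattr(versioneer, "get_version"))  # False reproduces the AttributeError above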
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins817958748879255520.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit e4cf7089ffbabdd33fc49a2c19c46123c3b524d0, no merge conflicts.
Running as SYSTEM
Setting status of e4cf7089ffbabdd33fc49a2c19c46123c3b524d0 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2916/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e4cf7089ffbabdd33fc49a2c19c46123c3b524d0^{commit} # timeout=10
Checking out Revision e4cf7089ffbabdd33fc49a2c19c46123c3b524d0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e4cf7089ffbabdd33fc49a2c19c46123c3b524d0 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk 40ccda0b43ac0bdfb3fc79837f66d7276fddcc77 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8599164028311122005.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.0)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.5.3+73.ge4cf708 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.5.3+73.ge4cf708
Searching for pyarrow==1.0.1
Best match: pyarrow 1.0.1
Adding pyarrow 1.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tdqm==0.0.1
Best match: tdqm 0.0.1
Adding tdqm 0.0.1 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for numba==0.53.1
Best match: numba 0.53.1
Adding numba 0.53.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pandas==1.1.5
Best match: pandas 1.1.5
Adding pandas 1.1.5 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Adding distributed 2021.4.1 to easy-install.pth file
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Adding dask 2021.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Adding PyYAML 5.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Adding tqdm 4.61.2 to easy-install.pth file
Installing tqdm script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==57.4.0
Best match: setuptools 57.4.0
Adding setuptools 57.4.0 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.36.0
Best match: llvmlite 0.36.0
Adding llvmlite 0.36.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Adding psutil 5.8.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Adding toolz 0.11.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for click==8.0.1
Best match: click 8.0.1
Adding click 8.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Adding msgpack 1.0.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Adding cloudpickle 1.6.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Adding zict 2.0.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Adding tblib 1.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tornado==6.1
Best match: tornado 6.1
Adding tornado 6.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Adding sortedcontainers 2.4.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for partd==1.2.0
Best match: partd 1.2.0
Adding partd 1.2.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for fsspec==2021.7.0
Best match: fsspec 2021.7.0
Adding fsspec 2021.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Adding HeapDict 1.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for locket==0.2.1
Best match: locket 0.2.1
Adding locket 0.2.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Finished processing dependencies for nvtabular==0.5.3+73.ge4cf708
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:459:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:66:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 20:40:52.645434: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 20:40:54.960475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 20:40:54.961883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 20:40:54.963186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 20:40:54.963282: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 20:40:54.963385: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 20:40:54.963454: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 20:40:54.963525: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 20:40:54.963596: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 20:40:54.963693: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 20:40:54.963763: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 20:40:54.963847: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 20:40:54.968808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1112 items / 2 skipped / 1110 selected

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 13%]
tests/unit/test_dataloader_backend.py . [ 13%]
tests/unit/test_io.py .................................................. [ 17%]
........................................................................ [ 24%]
........ssssssss.................................................. [ 30%]
tests/unit/test_ops.py ................................................. [ 34%]
........................................................................ [ 40%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 47%]
........................................................................ [ 53%]
........................................................................ [ 60%]
........................................................................ [ 66%]
................... [ 68%]
tests/unit/test_s3.py .. [ 68%]
tests/unit/test_tf_dataloader.py ....................................... [ 72%]
.................................s [ 75%]
tests/unit/test_tf_feature_columns.py . [ 75%]
tests/unit/test_tf_layers.py ........................................... [ 79%]
................................... [ 82%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .................................... [ 87%]
.............................................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
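A short aside on the values in this assertion (an illustrative sketch, not part of the test suite): assuming index 0 is reserved for nulls/out-of-vocabulary entries, as the 1-based codes in the output suggest, encoding the "Authors" column in the order of the supplied vocabulary (User_A, User_B, User_C, User_E) reproduces exactly the observed result, while the expected values correspond to a different category ordering:

    # Hypothetical re-derivation of the encoding, assuming categories are numbered
    # in vocabulary order with index 0 reserved for nulls/out-of-vocabulary values.
    vocab = ["User_A", "User_B", "User_C", "User_E"]
    mapping = {cat: i + 1 for i, cat in enumerate(vocab)}  # {'User_A': 1, 'User_B': 2, 'User_C': 3, 'User_E': 4}

    authors = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
    encoded = [[mapping[a] for a in row] for row in authors]
    print(encoded)  # [[1], [1, 4], [2, 3], [3]] -- matches the actual output in the failure above

So the produced encoding follows the vocabulary order, whereas the test still expects [[1], [1, 4], [3, 2], [2]].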
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 42 82 4 72% 54, 87, 128, 152-165, 207-214, 218-221, 225, 240-258, 301
nvtabular/dispatch.py 245 42 120 21 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 125->127, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 13 85 6 90% 60, 68->49, 122, 179, 231-239, 242, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 1 28 1 97% 108
nvtabular/framework_utils/torch/utils.py 75 13 30 3 79% 22, 25-33, 64, 118-120, 132->115
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 269 120 0 3% 30-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 87 58 0 0% 27-150
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-267
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 180 7 70 11 93% 110, 113, 149, 225, 385->383, 413->416, 424, 428->430, 430->426, 435, 437
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 289 35 126 21 86% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 710-711, 738, 745-746, 752, 758, 853-854, 970-975, 981, 1031
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 13 138 11 95% 98, 102->94, 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 23 50 8 84% 57, 65-68, 78, 82, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 81 15 16 2 76% 25-27, 30-36, 111, 149-150, 190, 193
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 573 71 323 48 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 454-459, 532->534, 655, 691, 720->723, 724-726, 733-734, 747-749, 750->718, 766, 774, 776, 783->exit, 806, 809->812, 820, 845-847, 850, 852->855, 866->870, 877-880, 891, 895, 897, 909-912, 990, 992, 1021->1044, 1027->1044, 1045-1050, 1087, 1105->1110, 1109, 1119->1116, 1124->1116, 1132, 1140-1150
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 63 6 22 1 89% 62-66, 101, 127
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 8 38 8 87% 20-21, 113, 115, 117, 159, 163->167, 176->178, 203, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 13 18 3 58% 59, 63, 77, 88-103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 4 2 1 84% 25, 99, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 44 44 9 47% 30-31, 35-36, 45, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 5990 1366 2492 272 75%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 74.66%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
==== 18 failed, 1085 passed, 11 skipped, 11 warnings in 1013.92s (0:16:53) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7770121283389326806.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 11f5b0379ff6074590a18147d6732f1644d1ca21, no merge conflicts.
Running as SYSTEM
Setting status of 11f5b0379ff6074590a18147d6732f1644d1ca21 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2918/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 11f5b0379ff6074590a18147d6732f1644d1ca21^{commit} # timeout=10
Checking out Revision 11f5b0379ff6074590a18147d6732f1644d1ca21 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 11f5b0379ff6074590a18147d6732f1644d1ca21 # timeout=10
Commit message: "Fix to match frequency sorting categorify changes"
 > git rev-list --no-walk c4d3124cdebfa8d674b600f95ecff529443eecb0 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4765283923681026855.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.0)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.5.3+74.g11f5b03 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.5.3+74.g11f5b03
Searching for pyarrow==1.0.1
Best match: pyarrow 1.0.1
Adding pyarrow 1.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tdqm==0.0.1
Best match: tdqm 0.0.1
Adding tdqm 0.0.1 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for numba==0.53.1
Best match: numba 0.53.1
Adding numba 0.53.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pandas==1.1.5
Best match: pandas 1.1.5
Adding pandas 1.1.5 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Adding distributed 2021.4.1 to easy-install.pth file
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Adding dask 2021.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Adding PyYAML 5.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Adding tqdm 4.61.2 to easy-install.pth file
Installing tqdm script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for llvmlite==0.36.0
Best match: llvmlite 0.36.0
Adding llvmlite 0.36.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==57.4.0
Best match: setuptools 57.4.0
Adding setuptools 57.4.0 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tornado==6.1
Best match: tornado 6.1
Adding tornado 6.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Adding toolz 0.11.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Adding msgpack 1.0.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Adding sortedcontainers 2.4.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Adding psutil 5.8.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Adding cloudpickle 1.6.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Adding zict 2.0.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Adding tblib 1.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for click==8.0.1
Best match: click 8.0.1
Adding click 8.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for fsspec==2021.7.0
Best match: fsspec 2021.7.0
Adding fsspec 2021.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for partd==1.2.0
Best match: partd 1.2.0
Adding partd 1.2.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Adding HeapDict 1.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for locket==0.2.1
Best match: locket 0.2.1
Adding locket 0.2.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Finished processing dependencies for nvtabular==0.5.3+74.g11f5b03
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:459:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:66:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 22:16:03.309271: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 22:16:04.593825: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 22:16:04.594912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 22:16:04.595907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 22:16:04.595940: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 22:16:04.595990: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 22:16:04.596025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 22:16:04.596060: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 22:16:04.596094: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 22:16:04.596142: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 22:16:04.596175: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 22:16:04.596214: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 22:16:04.600206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1112 items / 2 skipped / 1110 selected

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 13%]
tests/unit/test_dataloader_backend.py . [ 13%]
tests/unit/test_io.py .................................................. [ 17%]
........................................................................ [ 24%]
........ssssssss.................................................. [ 30%]
tests/unit/test_ops.py ................................................. [ 34%]
........................................................................ [ 40%]
........................................................................ [ 47%]
........................................................................ [ 53%]
........................................................................ [ 60%]
........................................................................ [ 66%]
................... [ 68%]
tests/unit/test_s3.py .. [ 68%]
tests/unit/test_tf_dataloader.py ....................................... [ 72%]
.................................s [ 75%]
tests/unit/test_tf_feature_columns.py . [ 75%]
tests/unit/test_tf_layers.py ........................................... [ 79%]
................................... [ 82%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .................................... [ 87%]
.............................................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 42 82 4 72% 54, 87, 128, 152-165, 207-214, 218-221, 225, 240-258, 301
nvtabular/dispatch.py 245 42 120 21 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 125->127, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 13 85 6 90% 60, 68->49, 122, 179, 231-239, 242, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 1 28 1 97% 108
nvtabular/framework_utils/torch/utils.py 75 13 30 3 79% 22, 25-33, 64, 118-120, 132->115
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 269 120 0 3% 30-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 87 58 0 0% 27-150
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-267
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 180 7 70 11 93% 110, 113, 149, 225, 385->383, 413->416, 424, 428->430, 430->426, 435, 437
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 289 35 126 21 86% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 710-711, 738, 745-746, 752, 758, 853-854, 970-975, 981, 1031
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 13 138 11 95% 98, 102->94, 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 23 50 8 84% 57, 65-68, 78, 82, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 81 15 16 2 76% 25-27, 30-36, 111, 149-150, 190, 193
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 573 71 323 48 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 454-459, 532->534, 655, 691, 720->723, 724-726, 733-734, 747-749, 750->718, 766, 774, 776, 783->exit, 806, 809->812, 820, 845-847, 850, 852->855, 866->870, 877-880, 891, 895, 897, 909-912, 990, 992, 1021->1044, 1027->1044, 1045-1050, 1087, 1105->1110, 1109, 1119->1116, 1124->1116, 1132, 1140-1150
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 63 6 22 1 89% 62-66, 101, 127
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 8 38 8 87% 20-21, 113, 115, 117, 159, 163->167, 176->178, 203, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 13 18 3 58% 59, 63, 77, 88-103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 4 2 1 84% 25, 99, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 44 44 9 47% 30-31, 35-36, 45, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 5990 1366 2492 272 75%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 74.66%
========== 1103 passed, 11 skipped, 11 warnings in 628.57s (0:10:28) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4750676581666921359.sh

@benfred benfred merged commit df8b4db into main Jul 20, 2021
@benfred benfred deleted the feature-cols-categorify branch July 20, 2021 22:33
@@ -227,7 +226,7 @@ def _get_parents(column):
features += features_replaced_buckets

if len(categorifies) > 0:
- features += categorifies.keys() >> Categorify()
+ features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))

This line fails if the features have vocabularies of different sizes. In particular, the sample notebook "using-feature-columns.ipynb" fails on this line with the error below. Am I missing something, or is this a bug?

I used the "merlin-tensorflow-training" Docker image, which had nvtabular 0.6.1 installed.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_783/4041583971.py in <module>
----> 1 online_workflow, feature_columns = make_feature_column_workflow(feature_columns, "AdoptionSpeed")

/nvtabular/nvtabular/framework_utils/tensorflow/feature_column_utils.py in make_feature_column_workflow(feature_columns, label_name, category_dir)
    227 
    228     if len(categorifies) > 0:
--> 229         features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))
    230 
    231     if len(hashes) > 0:

/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    466 
    467         elif isinstance(data, dict):
--> 468             mgr = init_dict(data, index, columns, dtype=dtype)
    469         elif isinstance(data, ma.MaskedArray):
    470             import numpy.ma.mrecords as mrecords

/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    281             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    282         ]
--> 283     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    284 
    285 

/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     76         # figure out the index, if necessary
     77         if index is None:
---> 78             index = extract_index(arrays)
     79         else:
     80             index = ensure_index(index)

/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py in extract_index(data)
    395             lengths = list(set(raw_lengths))
    396             if len(lengths) > 1:
--> 397                 raise ValueError("arrays must all be same length")
    398 
    399             if have_dicts:

ValueError: arrays must all be same length
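
The failure is reproducible with plain pandas: building a DataFrame from a dict whose values have different lengths raises exactly this error. Below is a minimal sketch (the column names and vocabularies are made up for illustration); one pandas-level way around it is to wrap each vocabulary in a Series so the columns align by index, although how Categorify should treat the resulting NaN padding is a separate question.

import pandas as pd

# Hypothetical vocabularies of different sizes, as collected from feature columns.
categorifies = {"Type": ["Cat", "Dog"], "Breed1": ["Labrador", "Siamese", "Tabby"]}

try:
    pd.DataFrame(categorifies)
except ValueError as err:
    print(err)  # arrays must all be same length

# Wrapping each vocabulary in a Series makes pandas align columns by index,
# padding the shorter ones with NaN instead of raising.
vocabs = pd.DataFrame({name: pd.Series(values) for name, values in categorifies.items()})
print(vocabs)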

Contributor

Thoughts on how to handle this @marcromeyn?

Member

Thanks for reporting! That looks like a bug to me - I opened #1062 to track it.
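
For reference, the vocabs argument added in this PR does work when every column's vocabulary has the same length. A minimal sketch of direct usage, assuming a dataset that contains the (made-up) columns below:

import pandas as pd
import nvtabular as nvt
from nvtabular.ops import Categorify

# Hypothetical equal-length vocabularies, keyed by the column they encode.
vocabs = pd.DataFrame({"Type": ["Cat", "Dog"], "Gender": ["Male", "Female"]})

# Categorify uses the supplied vocabulary instead of computing one during fit.
cat_features = ["Type", "Gender"] >> Categorify(vocabs=vocabs)
workflow = nvt.Workflow(cat_features)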

mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
 Allow to pass in vocabs in Categorify to fix make_feature_column_workflow
Development

Successfully merging this pull request may close these issues.

[BUG] make_feature_column_workflow doesn't work with categorical columns
5 participants