
Allowing to pass in a vocab in Categorify #935

Merged (16 commits) on Jul 20, 2021

Conversation

marcromeyn
Contributor

This also fixes #763.
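
For context, the new parameter is exercised by the test added in this PR (quoted in the CI logs further down). A trimmed usage sketch follows; the column names and vocabulary values mirror that test, while the Workflow wiring around it is assumed for illustration rather than taken from the diff:

# Trimmed sketch of the new 'vocabs' parameter; values mirror the
# test_categorify_lists test quoted later in this thread, while the
# Workflow wiring is assumed for illustration.
import pandas as pd
import nvtabular as nvt
from nvtabular import ops

# Pre-computed vocabulary: the category values to encode, per column.
vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

cat_features = ["Authors", "Engaging User"] >> ops.Categorify(vocabs=vocabs)
workflow = nvt.Workflow(cat_features)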

marcromeyn requested a review from benfred on July 12, 2021 at 12:38
@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit e3ddab778dd54622a171dddce447dc4d0b7352bc, no merge conflicts.
Running as SYSTEM
Setting status of e3ddab778dd54622a171dddce447dc4d0b7352bc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2766/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e3ddab778dd54622a171dddce447dc4d0b7352bc^{commit} # timeout=10
Checking out Revision e3ddab778dd54622a171dddce447dc4d0b7352bc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e3ddab778dd54622a171dddce447dc4d0b7352bc # timeout=10
Commit message: "Allow to pass in vocabs in Categorify to fix make_feature_column_workflow"
 > git rev-list --no-walk 26310aeecd05c9a1eb772aadcdf90c754f406b70 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins393578207587878685.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-12 12:38:13.691478: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 12:38:14.911102: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-12 12:38:14.911164: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-12 12:38:14.912223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 12:38:14.913213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 12:38:14.913241: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 12:38:14.913291: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-12 12:38:14.913326: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-07-12 12:38:14.913363: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-12 12:38:14.913396: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-12 12:38:14.913499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-12 12:38:14.913533: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-12 12:38:14.913569: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-12 12:38:14.917442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88, got 80
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py ...........................
Build timed out (after 40 minutes). Marking the build as failed.
Terminated
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5837560369399229333.sh

@benfred
Member

benfred commented Jul 12, 2021

rerun tests

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit e3ddab778dd54622a171dddce447dc4d0b7352bc, no merge conflicts.
Running as SYSTEM
Setting status of e3ddab778dd54622a171dddce447dc4d0b7352bc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2770/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e3ddab778dd54622a171dddce447dc4d0b7352bc^{commit} # timeout=10
Checking out Revision e3ddab778dd54622a171dddce447dc4d0b7352bc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e3ddab778dd54622a171dddce447dc4d0b7352bc # timeout=10
Commit message: "Allow to pass in vocabs in Categorify to fix make_feature_column_workflow"
 > git rev-list --no-walk d75180f8f20473ce56b86922c9d96c406b510d67 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1604705900853845016.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-12 18:15:44.928307: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 18:15:46.432615: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-12 18:15:46.432789: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-12 18:15:46.434079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 18:15:46.435277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-12 18:15:46.435369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-12 18:15:46.435477: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-12 18:15:46.435541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-07-12 18:15:46.435603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-12 18:15:46.435662: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-12 18:15:46.435922: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-12 18:15:46.435985: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-12 18:15:46.436051: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-12 18:15:46.440595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88, got 80
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ...................... [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6560: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.8.5-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 232 38 112 19 82% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 351, 355, 396, 420, 422, 429
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 136 120 11 52% 118-168, 213-274, 305, 307, 331-343, 351-356, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 687, 690, 695->698
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 33 6 12 3 80% 21-22, 45, 47-48, 52
nvtabular/io/writer.py 171 13 64 5 91% 24-25, 51, 79, 125, 128, 205, 214, 217, 260, 281-283
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 560 58 315 43 87% 243, 258, 262, 270, 278, 280, 305, 324-325, 340, 351->363, 357-359, 369-373, 452-453, 547->549, 670, 706, 735->738, 739-741, 748-749, 762-764, 765->733, 781, 789, 791, 798->exit, 821, 824->827, 835, 860, 865, 881-884, 895, 899, 901, 913-916, 994, 996, 1025->1048, 1091, 1109->1114, 1113, 1123->1120, 1128->1120, 1136, 1144-1154
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 7 14 2 87% 60->59, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 1 2 1 94% 25
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6238 1056 2462 276 81%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.87%
=========== 1118 passed, 9 skipped, 11 warnings in 949.01s (0:15:49) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2918128182569040717.sh

.gitignore (review thread, outdated)
nvtabular/ops/categorify.py (review thread, outdated)
nvtabular/ops/categorify.py (review thread, outdated)
nvtabular/ops/categorify.py (review thread)
@benfred (Member) left a comment


this looks good!

There are a couple of minor things I'd like to see before merging:

  1. We should move the pandas/cudf code into nvtabular/dispatch.py.
  2. We should add an entry to the Categorify docstring for the 'vocabs' parameter.
  3. I don't think this will work with encode_type='joint' (e.g. when using this op to compute feature crosses). I'm not too concerned about this, but maybe we can throw an exception in the constructor if vocabs and encode_type='joint' are both set? (A sketch follows this list.)
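
A minimal sketch of what items 2 and 3 could look like, taking item 3 literally; the docstring wording, the trimmed-down signature, and the exact guard condition are assumptions for illustration, not the merged implementation:

# Hypothetical sketch of the suggested docstring entry and constructor guard;
# the real Categorify op has many more parameters, and the merged behaviour
# may differ from this illustration.
class Categorify:
    """Encode categorical columns as integer ids.

    Parameters
    ----------
    vocabs : DataFrame or dict, optional
        Pre-computed vocabulary used to encode the columns instead of
        computing category statistics from the data (assumed wording).
    encode_type : {"joint", "combo"}
        How multi-column groups are encoded.
    """

    def __init__(self, encode_type="joint", vocabs=None):
        # A DataFrame raises on truth-value checks, so compare against None
        # explicitly rather than writing `if vocabs:`.
        if vocabs is not None and encode_type == "joint":
            raise ValueError(
                "Passing 'vocabs' is not supported with encode_type='joint'"
            )
        self.encode_type = encode_type
        self.vocabs = vocabs

Failing fast in the constructor would surface the unsupported combination at workflow-definition time rather than during fit.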

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit 66d934f114befde0c104791058aae361259b8bae, no merge conflicts.
Running as SYSTEM
Setting status of 66d934f114befde0c104791058aae361259b8bae to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2810/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 66d934f114befde0c104791058aae361259b8bae^{commit} # timeout=10
Checking out Revision 66d934f114befde0c104791058aae361259b8bae (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 66d934f114befde0c104791058aae361259b8bae # timeout=10
Commit message: "Update .gitignore"
 > git rev-list --no-walk 842fead98542f6b9ea85b37f2e6c205087a88db3 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8912513169153605297.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 08:30:53.379488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 08:30:54.600577: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 08:30:54.601673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 08:30:54.602687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 08:30:54.602718: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 08:30:54.602765: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 08:30:54.602799: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 08:30:54.602833: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 08:30:54.602865: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 08:30:54.602927: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 08:30:54.602963: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 08:30:54.603004: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 08:30:54.607425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 232 38 112 19 82% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 351, 355, 396, 420, 422, 429
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 33 6 12 3 80% 21-22, 45, 47-48, 52
nvtabular/io/writer.py 171 13 64 5 91% 24-25, 51, 79, 125, 128, 205, 214, 217, 260, 281-283
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 560 68 315 45 85% 243, 258, 262, 270, 278, 280, 305, 324-325, 340, 351->363, 357-359, 369-373, 452-453, 471-474, 547->549, 670, 706, 735->738, 739-741, 748-749, 762-764, 765->733, 781, 789, 791, 798->exit, 821, 824->827, 835, 860, 865, 881-884, 895, 899, 901, 913-916, 994, 996, 1025->1048, 1031->1048, 1049-1054, 1091, 1109->1114, 1113, 1123->1120, 1128->1120, 1136, 1144-1154
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6238 1091 2462 282 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.24%
========== 1114 passed, 13 skipped, 11 warnings in 772.03s (0:12:52) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5396969967190544189.sh

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #935 of commit 34c96879396fa1f3f7e3cee350b023fbb8e20c8f, no merge conflicts.
Running as SYSTEM
Setting status of 34c96879396fa1f3f7e3cee350b023fbb8e20c8f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2813/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 34c96879396fa1f3f7e3cee350b023fbb8e20c8f^{commit} # timeout=10
Checking out Revision 34c96879396fa1f3f7e3cee350b023fbb8e20c8f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 34c96879396fa1f3f7e3cee350b023fbb8e20c8f # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk f7c3db141b3a74d74ae59888f268a5259a93d3c5 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins9043349621935770599.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 17:12:59.410855: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 17:13:00.671095: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 17:13:00.672192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 17:13:00.673193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 17:13:00.673223: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 17:13:00.673271: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 17:13:00.673305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 17:13:00.673339: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 17:13:00.673371: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 17:13:00.673416: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 17:13:00.673447: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 17:13:00.673484: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 17:13:00.677932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
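
Every test_categorify_lists failure in this run repeats the trace above for a different parameter combination; the root cause is identical each time: line 231 of nvtabular/ops/categorify.py evaluates the vocabs DataFrame directly in a boolean expression, and pandas (like cuDF) refuses implicit truth testing on a DataFrame. A minimal standalone sketch of the problem and of the usual `is not None` guard, using plain pandas and nothing from NVTabular:

import pandas as pd

vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

try:
    if vocabs:  # implicit bool() on a DataFrame raises ValueError
        pass
except ValueError as err:
    print(err)  # "The truth value of a DataFrame is ambiguous. ..."

# Comparing against None avoids the implicit bool() call entirely:
if vocabs is not None:
    print(f"vocabs provided with {len(vocabs)} entries")

The follow-up commit in this PR switches the guard to the `is not None` form, which the second CI run further down exercises.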
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))
nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
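
In the feature-column failure above, the traceback shows make_feature_column_workflow collecting each column's vocabulary into a single pd.DataFrame (the `pd.DataFrame(categorifies)` call) before handing it to Categorify, where it hits the same truthiness check. A rough sketch of that assembly step, assuming only the public feature-column attributes key and vocabulary_list rather than the library's exact code:

import pandas as pd
import tensorflow as tf

cols = [
    tf.feature_column.categorical_column_with_vocabulary_list("vocab_1", ["a", "b", "c", "d"]),
    tf.feature_column.categorical_column_with_vocabulary_list("vocab_2", ["1", "2", "3", "4"]),
]

# One dict entry per feature, mirroring the `categorifies` name in the traceback.
categorifies = {col.key: list(col.vocabulary_list) for col in cols}
vocabs = pd.DataFrame(categorifies)
print(vocabs)  # one column per feature, one row per vocabulary entry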
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
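
The SettingWithCopyWarning entries above describe a generic pandas pattern rather than anything specific to this PR: calling an in-place mutator on a sliced DataFrame can emit the warning, while reassigning the result does not. A small illustrative sketch with hypothetical data:

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]})
ext = df[df["a"] == 1]  # a slice pandas may treat as a view of df

# ext.drop_duplicates(ignore_index=True, inplace=True)  # can trigger the warning
ext = ext.drop_duplicates(ignore_index=True)  # warning-free equivalent
print(ext)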

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 84 317 44 83% 230, 232, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1121 2478 281 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.74%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 775.54s (0:12:55) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins844051744038619054.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit e6536dc85ed06f4d71bee522e0fb7cd564c45fb1, no merge conflicts.
Running as SYSTEM
Setting status of e6536dc85ed06f4d71bee522e0fb7cd564c45fb1 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2814/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e6536dc85ed06f4d71bee522e0fb7cd564c45fb1^{commit} # timeout=10
Checking out Revision e6536dc85ed06f4d71bee522e0fb7cd564c45fb1 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e6536dc85ed06f4d71bee522e0fb7cd564c45fb1 # timeout=10
Commit message: "Update nvtabular/ops/categorify.py"
 > git rev-list --no-walk 34c96879396fa1f3f7e3cee350b023fbb8e20c8f # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3304963982174998869.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 22:56:35.337507: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 22:56:36.544179: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 22:56:36.545259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 22:56:36.546245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 22:56:36.546274: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 22:56:36.546321: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 22:56:36.546354: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 22:56:36.546388: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 22:56:36.546421: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 22:56:36.546480: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 22:56:36.546515: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 22:56:36.546552: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 22:56:36.550738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9270e4790>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_0'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
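
The failures in this second run come from the opposite direction: the guard now compares against None correctly, but because Categorify defaults to encode_type="joint", any constructor call that passes vocabs (including the ordinary single-column usage in this test) trips the new ValueError. A minimal sketch of the failing call under that default, taken from the traceback above:

import pandas as pd
from nvtabular import ops

vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

# With the guard in this commit, the default encode_type="joint" rejects any
# vocabs argument, even for plain per-column encoding:
ops.Categorify(vocabs=vocabs)
# ValueError: Passing in vocabs is not supported with a joint encoding.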
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9271a8b80>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_1'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
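
Every failing parametrization in this run shares the same root cause: the test passes a
non-None vocabs DataFrame, and Categorify.__init__ rejects it before any data is touched
because encode_type defaults to "joint". A minimal sketch of just that guard, reconstructed
from the traceback above (the rest of the constructor is elided), reproduces the error
outside of any workflow:

    # Sketch of the two constructor checks quoted in the traceback; the real
    # Categorify.__init__ does much more than this.
    import pandas as pd

    def categorify_init_guard(encode_type="joint", vocabs=None):
        # Only two kinds of multi-column encoding are supported.
        if encode_type not in ("joint", "combo"):
            raise ValueError(f"encode_type={encode_type} not supported.")
        # With the default "joint" encoding, any non-None vocabs is rejected,
        # which is exactly what every vocabs1-* case in this log trips over.
        if encode_type == "joint" and vocabs is not None:
            raise ValueError("Passing in vocabs is not supported with a joint encoding.")

    vocab = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})
    categorify_init_guard(vocabs=vocab)  # raises the same ValueError as the CI runs
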
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab406cd0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_2'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
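
For context on the three encoding modes the quoted comments describe, a rough usage sketch
follows. This is not code from this PR: the nested-list syntax for multi-column groups is
assumed from NVTabular's documented Categorify examples and may differ in detail:

    from nvtabular import ops

    # (1) Conventional: each column gets its own vocabulary and encoded column.
    conventional = ["Authors", "Engaging User"] >> ops.Categorify()

    # (2) Joint: the columns in a group share one vocabulary (encode_type="joint"
    #     is the default), but each still maps to its own encoded column.
    joint = [["Authors", "Engaging User"]] >> ops.Categorify(encode_type="joint")

    # (3) Combo: each group becomes a single encoded column whose values are the
    #     unique combinations of the grouped columns.
    combo = [["Authors", "Engaging User"]] >> ops.Categorify(encode_type="combo")
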
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab4c5220>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_3'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9270e0430>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_4'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab413e20>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_5'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ae6dea90>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_6'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab5c0be0>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_7'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ad9bd310>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_8'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9af825e50>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_9'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab588b50>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_10'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb947128df0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_11'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9add4cd90>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_12'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ae6f4d00>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_13'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ab40c4f0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_14'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9270e0cd0>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_15'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ad9beeb0>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_16'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7fb9ae7118e0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-1/test_categorify_lists_vocabs1_17'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))


self = <nvtabular.ops.categorify.Categorify object at 0x7fb8f6097640>
freq_threshold = 0, out_path = None, tree_width = None, na_sentinel = None
cat_cache = 'host', dtype = None, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
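
The feature-column failure above hits the same guard: make_feature_column_workflow collects the vocabulary lists from the TF feature columns and hands them to Categorify as a DataFrame. A rough sketch of that hand-off, assuming pandas is available and the per-column vocabularies have equal length (a requirement of pd.DataFrame); the categorifies name simply mirrors the variable in the traceback:

import pandas as pd
from nvtabular.ops import Categorify

# Vocabularies taken from the feature columns in the failing test above.
categorifies = {"vocab_1": ["a", "b", "c", "d"], "vocab_2": ["1", "2", "3", "4"]}

# With this PR, the known vocabularies are passed to Categorify directly
# instead of being recomputed from the data during fit().
cat_features = list(categorifies.keys()) >> Categorify(vocabs=pd.DataFrame(categorifies))
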
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
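
The SettingWithCopyWarning entries above are raised because a slice of a DataFrame is mutated in place. A common remedy, sketched here on a toy frame (not necessarily how the NVTabular code should change), is to take an explicit copy before the in-place call:

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2]})
_ext = df[df["a"] > 0].copy()  # explicit copy avoids mutating a view of df
_ext.drop_duplicates(ignore_index=True, inplace=True)
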

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 83 317 43 83% 230, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1120 2478 280 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.76%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 773.36s (0:12:53) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6167220704054926419.sh

@benfred
Member

benfred commented Jul 14, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 4902361e79f0abf2e19dede2ec01adb30f4201e9, no merge conflicts.
Running as SYSTEM
Setting status of 4902361e79f0abf2e19dede2ec01adb30f4201e9 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2816/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 4902361e79f0abf2e19dede2ec01adb30f4201e9^{commit} # timeout=10
Checking out Revision 4902361e79f0abf2e19dede2ec01adb30f4201e9 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4902361e79f0abf2e19dede2ec01adb30f4201e9 # timeout=10
Commit message: "Update nvtabular/ops/categorify.py"
 > git rev-list --no-walk 4902361e79f0abf2e19dede2ec01adb30f4201e9 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins722899612749464217.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-14 23:46:01.940212: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 23:46:03.147989: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-14 23:46:03.149073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 23:46:03.150085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-14 23:46:03.150118: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 23:46:03.150167: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-14 23:46:03.150203: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-14 23:46:03.150241: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-14 23:46:03.150275: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-14 23:46:03.150322: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-14 23:46:03.150356: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-14 23:46:03.150396: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-14 23:46:03.154679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
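
These assertion failures are all the same off-by-one: when a vocabulary is supplied, the test still expects index 0 to stay reserved for nulls and out-of-vocabulary values (the usual Categorify convention), so the supplied categories should start at 1, while the code under test numbers them from 0. A small sketch of the mapping the test asserts, purely illustrative:

# Supplied vocab: ["User_A", "User_B", "User_C", "User_E"]; 0 stays reserved.
expected = {"User_A": 1, "User_B": 2, "User_C": 3, "User_E": 4}
rows = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
assert [[expected[u] for u in row] for row in rows] == [[1], [1, 4], [2, 3], [3]]
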
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
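Taken together, the failures above point at an off-by-one in the list-column path when an explicit vocabulary is supplied: the op emits codes starting at 0 ([[0], [0, 3], [1, 2], [2]]) while the test expects the first vocabulary entry to map to 1 ([[1], [1, 4], [2, 3], [3]]), with 0 presumably reserved for nulls/out-of-vocabulary values. A minimal sketch of the mapping the test expects, assuming only the vocabulary order visible in the parametrization (illustrative, not the op's implementation):

    # Encoding the test expects for the "Authors" list column when a vocabulary
    # is passed: index 0 is presumably left for null/OOV, so "User_A" maps to 1.
    vocab = ["User_A", "User_B", "User_C", "User_E"]
    encoding = {token: idx + 1 for idx, token in enumerate(vocab)}

    authors = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
    encoded = [[encoding[token] for token in row] for row in authors]
    assert encoded == [[1], [1, 4], [2, 3], [3]]  # the value the assertions above compare against

A separate nit in the test itself: the CPU branch's dtype check, assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64"), parses as a single conditional expression, so when dtype is None it only asserts that np.dtype("int64") is truthy; parenthesising the right-hand side as (np.dtype(dtype) if dtype else np.dtype("int64")) would make it check the default dtype as intended.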
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 41 120 20 82% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 335-338, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 69 317 45 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->346, 349-356, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1095 2478 285 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.17%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
===== 18 failed, 1096 passed, 13 skipped, 11 warnings in 772.39s (0:12:52) =====
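For reference, the 18 failures are exactly the explicit-vocabulary half of the parametrization grid: 3 freq_threshold values × 2 cpu values × 3 dtype values = 18 vocabs1 combinations, while every vocabs0 (None) combination passed.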
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6498648267293293010.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 4d32576175eab491f12be8bd7592166fc9dcbaf8, no merge conflicts.
Running as SYSTEM
Setting status of 4d32576175eab491f12be8bd7592166fc9dcbaf8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2829/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 4d32576175eab491f12be8bd7592166fc9dcbaf8^{commit} # timeout=10
Checking out Revision 4d32576175eab491f12be8bd7592166fc9dcbaf8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4d32576175eab491f12be8bd7592166fc9dcbaf8 # timeout=10
Commit message: "Addressing PR comments"
 > git rev-list --no-walk c8dd7eb2a0de01818574913bb0ed04af58e7f0aa # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins86938168095787545.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-16 07:39:50.510965: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-16 07:39:51.828519: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-16 07:39:51.829612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-16 07:39:51.830621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-16 07:39:51.830652: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-16 07:39:51.830702: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-16 07:39:51.830736: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-16 07:39:51.830770: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-16 07:39:51.830802: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-16 07:39:51.830848: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-16 07:39:51.830879: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-16 07:39:51.830916: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-16 07:39:51.834775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = Authors
0 User_A
1 User_B
2 User_C
3 User_E

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))
nvtabular/ops/categorify.py:231: in __init__
if encode_type == "joint" and vocabs:


self = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4

def __nonzero__(self):
  raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )

E ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1329: ValueError
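Every failure in this run stops at the same place, nvtabular/ops/categorify.py:231, where the check "if encode_type == "joint" and vocabs:" asks pandas (or cuDF) for the truth value of a whole DataFrame, which both libraries refuse with the ambiguity error shown. A minimal sketch of a guard that sidesteps the ambiguous bool() call, assuming vocabs is either None or a DataFrame (the helper name is illustrative, not the code in this PR):

    import pandas as pd

    def vocabs_supplied(vocabs):
        # Treat "a vocabulary was supplied" as "vocabs is a non-empty DataFrame",
        # rather than relying on bool(DataFrame), which raises the ValueError above.
        return vocabs is not None and not vocabs.empty

    # The failing check could then read:
    #     if encode_type == "joint" and vocabs_supplied(vocabs):
    #         ...
    assert vocabs_supplied(pd.DataFrame({"Authors": ["User_A", "User_B"]}))
    assert not vocabs_supplied(None)

The same kind of guard also covers the test_feature_column_utils failure just above, since make_feature_column_workflow passes pd.DataFrame(categorifies) straight into Categorify(vocabs=...).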
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 84 317 44 83% 230, 232, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1121 2478 281 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.74%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 772.97s (0:12:52) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8811147866039403348.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa, no merge conflicts.
Running as SYSTEM
Setting status of 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2830/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa^{commit} # timeout=10
Checking out Revision 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3ffc73f7038fc7fdc32657d0fb34e1167516e8aa # timeout=10
Commit message: "Quick fix to try to make the tests pass"
 > git rev-list --no-walk 4d32576175eab491f12be8bd7592166fc9dcbaf8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins291720411122994120.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp42_0mrg8
       cwd: /var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Complete output (18 lines):
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py", line 280, in 
      main()
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 154, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 135, in _get_build_requires
      self.run_setup()
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 258, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 150, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 56, in 
      cmdclass = versioneer.get_cmdclass()
  AttributeError: module 'versioneer' has no attribute 'get_cmdclass'
  ----------------------------------------
WARNING: Discarding file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular. Command errored out with exit status 1: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp42_0mrg8 Check the logs for full command output.
ERROR: Command errored out with exit status 1: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp42_0mrg8 Check the logs for full command output.
WARNING: You are using pip version 21.0.1; however, version 21.1.3 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1995729761163832366.sh

@marcromeyn
Copy link
Contributor Author

@benfred The build is failing now with AttributeError: module 'versioneer' has no attribute 'get_cmdclass' which is unrelated to the changes in this PR. Any idea how to resolve this?

@benfred
Copy link
Member

benfred commented Jul 16, 2021

@benfred The build is failing now with AttributeError: module 'versioneer' has no attribute 'get_cmdclass' which is unrelated to the changes in this PR. Any idea how to resolve this?

In a different PR I had to change the CPU CI to work around this (https://github.com/NVIDIA/NVTabular/pull/926/files#diff-f48959cf62357a95aff9d53b6b9cdbbf3aa81317d24eb321c296a7d9fd6866b7R43). I've pushed the same change here to see if it helps.

@benfred
Copy link
Member

benfred commented Jul 17, 2021

rerun tests

1 similar comment
@benfred
Copy link
Member

benfred commented Jul 18, 2021

rerun tests

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit c5a82bef60fe1d123f389440f5c06ba383c03b9a, no merge conflicts.
Running as SYSTEM
Setting status of c5a82bef60fe1d123f389440f5c06ba383c03b9a to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2845/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse c5a82bef60fe1d123f389440f5c06ba383c03b9a^{commit} # timeout=10
Checking out Revision c5a82bef60fe1d123f389440f5c06ba383c03b9a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c5a82bef60fe1d123f389440f5c06ba383c03b9a # timeout=10
Commit message: "Update cpu-ci.yml"
 > git rev-list --no-walk 3779be9b16585f589c38fb944304c2ee61a3aac4 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7111287436070886359.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular-0.5.3+57.gc5a82be
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-18 17:59:59.918255: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-18 18:00:01.106670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-18 18:00:01.107891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-18 18:00:01.109058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-18 18:00:01.109092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-18 18:00:01.109145: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-18 18:00:01.109183: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-18 18:00:01.109221: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-18 18:00:01.109258: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-18 18:00:01.109310: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-18 18:00:01.109345: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-18 18:00:01.109388: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-18 18:00:01.113816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py F [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py ssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836afa6730>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_0'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
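
With the guard rewritten as `vocabs is not None`, the DataFrame truthiness error is gone, but every parametrization that supplies a vocabulary still fails: `encode_type` defaults to "joint", so the new check rejects the vocabulary even though "Authors" and "Engaging User" are encoded as ordinary single columns. A minimal reproduction distilled from the failing test above (no multi-column group is requested anywhere; whether this call should be accepted is exactly what these failures flag, not an assumption about the final fix):

    import pandas as pd
    from nvtabular import ops

    vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

    # In this CI run the line below raises:
    #   ValueError: Passing in vocabs is not supported with a joint encoding.
    # even though each column here is categorified on its own and encode_type
    # was simply left at its default of "joint".
    cat_features = ["Authors", "Engaging User"] >> ops.Categorify(vocabs=vocabs)
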
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836af8fa90>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_1'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83827331f0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_2'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83d4e8beb0>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_3'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4b3ad00>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_4'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836af8fb80>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_5'
tree_width = None, na_sentinel = None, cat_cache = 'host', dtype = None
on_host = True, encode_type = 'joint', name_sep = '_', search_sorted = False
num_buckets = None, vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4af96a0>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_6'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83d4f2c580>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_7'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c6b520>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_8'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c35e80>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_9'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c51e20>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_10'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4819940>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_11'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int32'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4bc3190>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_12'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f836aea1f10>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_13'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4c53100>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_14'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f4944850>
freq_threshold = 0
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_15'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f47e2310>
freq_threshold = 1
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_16'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]
  cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

tests/unit/test_ops.py:459:


self = <nvtabular.ops.categorify.Categorify object at 0x7f83f40288b0>
freq_threshold = 2
out_path = '/tmp/pytest-of-jenkins/pytest-3/test_categorify_lists_vocabs1_17'
tree_width = None, na_sentinel = None, cat_cache = 'host'
dtype = <class 'numpy.int64'>, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E, max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
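Every parametrized failure above reduces to the same constructor guard: as soon as a vocabs DataFrame is passed while encode_type keeps its default of "joint", categorify.py:232 raises before any data is processed. A minimal repro sketch, using only the constructor arguments visible in these tracebacks (the surrounding Workflow setup is omitted):

import pandas as pd
from nvtabular import ops

vocabs = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

# encode_type defaults to "joint", so supplying a vocab table trips the
# check at nvtabular/ops/categorify.py:232 before any data is seen.
try:
    ops.Categorify(vocabs=vocabs)
except ValueError as err:
    print(err)  # Passing in vocabs is not supported with a joint encoding.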
__________________________ test_feature_column_utils ___________________________

def test_feature_column_utils():
    cols = [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_1", ["a", "b", "c", "d"]
            ),
            16,
        ),
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                "vocab_2", ["1", "2", "3", "4"]
            ),
            32,
        ),
    ]
  workflow, _ = nvtf.make_feature_column_workflow(cols, "target")

tests/unit/test_tf_feature_columns.py:23:


nvtabular/framework_utils/tensorflow/feature_column_utils.py:229: in make_feature_column_workflow
features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))


self = <nvtabular.ops.categorify.Categorify object at 0x7f833a7cbd00>
freq_threshold = 0, out_path = None, tree_width = None, na_sentinel = None
cat_cache = 'host', dtype = None, on_host = True, encode_type = 'joint'
name_sep = '_', search_sorted = False, num_buckets = None
vocabs = vocab_1 vocab_2
0 a 1
1 b 2
2 c 3
3 d 4
max_size = 0

def __init__(
    self,
    freq_threshold=0,
    out_path=None,
    tree_width=None,
    na_sentinel=None,
    cat_cache="host",
    dtype=None,
    on_host=True,
    encode_type="joint",
    name_sep="_",
    search_sorted=False,
    num_buckets=None,
    vocabs=None,
    max_size=0,
):

    # We need to handle three types of encoding here:
    #
    #   (1) Conventional encoding. There are no multi-column groups. So,
    #       each categorical column is separately transformed into a new
    #       "encoded" column (1-to-1).  The unique values are calculated
    #       separately for each column.
    #
    #   (2) Multi-column "Joint" encoding (there are multi-column groups
    #       in `columns` and `encode_type="joint"`).  Still a
    #       1-to-1 transformation of categorical columns.  However,
    #       we concatenate column groups to determine uniques (rather
    #       than getting uniques of each categorical column separately).
    #
    #   (3) Multi-column "Group" encoding (there are multi-column groups
    #       in `columns` and `encode_type="combo"`). No longer
    #       a 1-to-1 transformation of categorical columns. Each column
    #       group will be transformed to a single "encoded" column.  This
    #       means the unique "values" correspond to unique combinations.
    #       Since the same column may be included in multiple groups,
    #       replacement is not allowed for this transform.

    # Set column_groups if the user has passed in a list of columns.
    # The purpose is to capture multi-column groups. If the user doesn't
    # specify `columns`, there are no multi-column groups to worry about.
    self.column_groups = None
    self.name_sep = name_sep

    # For case (2), we need to keep track of the multi-column group name
    # that will be used for the joint encoding of each column in that group.
    # For case (3), we also use this "storage name" to signify the name of
    # the file with the required "combination" groupby statistics.
    self.storage_name = {}

    # Only support two kinds of multi-column encoding
    if encode_type not in ("joint", "combo"):
        raise ValueError(f"encode_type={encode_type} not supported.")
    if encode_type == "joint" and vocabs is not None:
      raise ValueError("Passing in vocabs is not supported with a joint encoding.")

E ValueError: Passing in vocabs is not supported with a joint encoding.

nvtabular/ops/categorify.py:232: ValueError
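The test_feature_column_utils failure is the same guard reached through make_feature_column_workflow, which gathers each feature column's vocabulary list into a single DataFrame and hands it to Categorify. A sketch of that path (variable names are illustrative; the DataFrame shape matches the vocabs local printed above):

import pandas as pd
from nvtabular import ops

# One vocabulary list per categorical feature column, as in the test above.
categorifies = {"vocab_1": ["a", "b", "c", "d"], "vocab_2": ["1", "2", "3", "4"]}
vocabs = pd.DataFrame(categorifies)

# Categorify is constructed with the default encode_type="joint", so the
# same ValueError is raised before the workflow is built.
try:
    list(categorifies.keys()) >> ops.Categorify(vocabs=vocabs)
except ValueError as err:
    print(err)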
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:164: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 243 47 120 20 79% 33-35, 40-42, 48-58, 62-63, 86, 94, 105, 111, 116->118, 129, 152-155, 194, 210, 217, 248->253, 251, 254, 257->261, 294, 305-308, 324-326, 333-342, 368, 372, 413, 437, 439, 446
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 83 88 12 34% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 231-284
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 158 120 15 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 687, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 277 33 122 23 84% 238, 240, 253, 262, 280-294, 397->466, 402-405, 410->420, 415-416, 427->425, 441->445, 456, 516->520, 563, 688-689, 693->695, 695->704, 705, 712-713, 719, 725, 820-821, 937-942, 948, 998
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 23 156 13 94% 33-34, 88-89, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 83 317 43 83% 230, 247, 251, 259, 267, 269, 281, 296, 315-316, 331, 334-358, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 83 4 36 5 92% 108, 110, 152, 169->171, 205
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6252 1120 2478 280 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 79.76%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - V...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
FAILED tests/unit/test_tf_feature_columns.py::test_feature_column_utils - Val...
===== 19 failed, 1095 passed, 13 skipped, 11 warnings in 767.19s (0:12:47) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1272602320064861379.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7, no merge conflicts.
Running as SYSTEM
Setting status of 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2861/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7^{commit} # timeout=10
Checking out Revision 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1ef8e4361074cc0a5a3faaaf8ac5efb0326b38f7 # timeout=10
Commit message: "Update nvtabular/ops/categorify.py"
 > git rev-list --no-walk e1595564fabec0c6f8756584c85cb18a20644485 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1311705690513581580.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular-0.5.3+58.g1ef8e43
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-19 20:23:30.423923: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:23:31.801837: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-19 20:23:31.803007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:23:31.804050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:23:31.804110: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:23:31.804179: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-19 20:23:31.804223: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-19 20:23:31.804266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-19 20:23:31.804305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-19 20:23:31.804360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-19 20:23:31.804401: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-19 20:23:31.804450: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-19 20:23:31.808386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))

Notebook error:
JSONDecodeError in examples/getting-started-movielens/03-Training-with-PyTorch.ipynb:
Expecting value: line 1 column 1 (char 0)
Terminated
make: *** [Makefile:20: html] Error 143
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6417902073353816443.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5, no merge conflicts.
Running as SYSTEM
Setting status of 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2865/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5^{commit} # timeout=10
Checking out Revision 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 55e5ca947fe08fc27c246d5f6f7dcc24b43762f5 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk e1595564fabec0c6f8756584c85cb18a20644485 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4922912435786826593.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (1.0.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+60.g55e5ca9) (2021.4.1)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (5.4.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (0.53.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (1.1.5)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+60.g55e5ca9) (0.20)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+60.g55e5ca9) (0.0.1)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+60.g55e5ca9) (2021.4.1)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (2021.6.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.6.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (0.11.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.2.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (6.1)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.7.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (5.8.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (2.0.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (8.0.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.0.2)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (57.4.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+60.g55e5ca9) (0.36.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+60.g55e5ca9) (1.20.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+60.g55e5ca9) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+60.g55e5ca9) (2021.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+60.g55e5ca9) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+60.g55e5ca9) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+60.g55e5ca9) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
108 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-19 20:53:30.774899: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:53:32.714569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-19 20:53:32.715798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:53:32.716946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 20:53:32.716984: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 20:53:32.717048: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-19 20:53:32.717085: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-19 20:53:32.717122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-19 20:53:32.717156: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-19 20:53:32.717207: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-19 20:53:32.717242: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-19 20:53:32.717285: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-19 20:53:32.721900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1127 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 59%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py . [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
..Terminated
Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8784313865889049440.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit c3f615c853d66cd641df7578bdbb5fa5aa6d1667, no merge conflicts.
Running as SYSTEM
Setting status of c3f615c853d66cd641df7578bdbb5fa5aa6d1667 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2874/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse c3f615c853d66cd641df7578bdbb5fa5aa6d1667^{commit} # timeout=10
Checking out Revision c3f615c853d66cd641df7578bdbb5fa5aa6d1667 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c3f615c853d66cd641df7578bdbb5fa5aa6d1667 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk a6739dec03b2670528472355adbf1d16b917eb28 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5506574157971313495.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+63.gc3f615c) (0.0.1)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (2021.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+63.gc3f615c) (2021.4.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+63.gc3f615c) (0.20)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (5.4.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (1.1.5)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (1.0.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+63.gc3f615c) (0.53.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.6.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (0.11.1)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (2021.6.1)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.0.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (2.4.0)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (57.4.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (5.8.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (8.0.1)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (2.0.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (6.1)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+63.gc3f615c) (1.20.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+63.gc3f615c) (0.36.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+63.gc3f615c) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+63.gc3f615c) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+63.gc3f615c) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+63.gc3f615c) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+63.gc3f615c) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-19 22:39:36.341045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 22:39:37.593701: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-19 22:39:37.594814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 22:39:37.595833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-19 22:39:37.595867: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-19 22:39:37.595932: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-19 22:39:37.595973: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-19 22:39:37.596015: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-19 22:39:37.596055: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-19 22:39:37.596110: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-19 22:39:37.596150: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-19 22:39:37.596198: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-19 22:39:37.600251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1128 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 58%]
........................................................................ [ 65%]
................... [ 67%]
tests/unit/test_s3.py .. [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .....................Build timed out (after 60 minutes). Marking the build as failed.
Terminated
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5120461984064153790.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit feff50e7ba0870f9fd51c12f7e4957dffdda0b1f, no merge conflicts.
Running as SYSTEM
Setting status of feff50e7ba0870f9fd51c12f7e4957dffdda0b1f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2884/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse feff50e7ba0870f9fd51c12f7e4957dffdda0b1f^{commit} # timeout=10
Checking out Revision feff50e7ba0870f9fd51c12f7e4957dffdda0b1f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f feff50e7ba0870f9fd51c12f7e4957dffdda0b1f # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk 04f42e04b4040cc6fe412f6d7cac7d8da5d1db22 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6034723035166010996.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (5.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+65.gfeff50e) (2021.4.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (1.1.5)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (0.53.1)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+65.gfeff50e) (1.0.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+65.gfeff50e) (0.20)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+65.gfeff50e) (0.0.1)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (2021.6.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.6.0)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (0.11.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.7.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (5.8.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (2.0.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (6.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.0.2)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (8.0.1)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (57.4.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+65.gfeff50e) (1.20.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+65.gfeff50e) (0.36.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+65.gfeff50e) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+65.gfeff50e) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+65.gfeff50e) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+65.gfeff50e) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+65.gfeff50e) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 03:13:37.149431: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 03:13:38.351703: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 03:13:38.352822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 03:13:38.353868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 03:13:38.353898: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 03:13:38.353957: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 03:13:38.353994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 03:13:38.354030: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 03:13:38.354066: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 03:13:38.354112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 03:13:38.354147: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 03:13:38.354185: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 03:13:38.358233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1129 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 46%]
........................................................................ [ 52%]
........................................................................ [ 58%]
........................................................................ [ 65%]
................... [ 66%]
tests/unit/test_s3.py .. [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py sssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
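
Editor's note: the 18 failures in this run all hit the same parametrization (vocabs1) and the same off-by-one: with a user-supplied vocabulary, Categorify encodes the first vocab entry as 0, while the test expects encoding to start at 1, i.e. index 0 apparently reserved for null / out-of-vocabulary values, as when the vocabulary is learned from the data. A minimal Python sketch of the mapping the test asserts, assuming that 1-based convention (the vocab/encoding names below are illustrative helpers, not part of the NVTabular API):

    import pandas as pd

    # The vocabulary passed to Categorify in the failing parametrization.
    vocab = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

    # Assumed convention: index 0 is reserved for null / out-of-vocabulary,
    # so the first vocab entry ("User_A") should encode to 1, not 0.
    encoding = {token: idx + 1 for idx, token in enumerate(vocab["Authors"])}

    rows = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
    encoded = [[encoding[token] for token in row] for row in rows]
    assert encoded == [[1], [1, 4], [2, 3], [3]]  # the values the test expects

The failing cases can be reproduced locally with, for example: pytest tests/unit/test_ops.py -k "test_categorify_lists and vocabs1" -v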
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-20/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [2, 3], [3]]

E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]
E At index 0 diff: [0] != [1]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 245 42 120 20 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 157 120 14 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 283 35 124 23 84% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 695-696, 700->702, 702->711, 712, 719-720, 726, 732, 827-828, 944-949, 955, 1005
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 563 69 317 45 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->346, 349-356, 435-436, 454-457, 530->532, 653, 689, 718->721, 722-724, 731-732, 745-747, 748->716, 764, 772, 774, 781->exit, 804, 807->810, 818, 843, 848, 864-867, 878, 882, 884, 896-899, 977, 979, 1008->1031, 1014->1031, 1032-1037, 1074, 1092->1097, 1096, 1106->1103, 1111->1103, 1119, 1127-1137
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 7 38 6 90% 20-21, 113, 115, 117, 159, 176->178, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6266 1098 2482 284 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.19%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
===== 18 failed, 1097 passed, 14 skipped, 11 warnings in 829.60s (0:13:49) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7042895182003705964.sh

@benfred
Copy link
Member

benfred commented Jul 20, 2021

@marcromeyn The unit tests are failing here with what looks like an off-by-one error: E assert [[0], [0, 3], [1, 2], [2]] == [[1], [1, 4], [2, 3], [3]]. The tests more or less assume that index '0' is reserved for out-of-vocabulary/unknown items - is this being handled?
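
For reference, a minimal sketch of the behaviour the failing test expects (based on the test shown above; it assumes the vocabs= argument added in this PR and that Categorify reserves index 0 for out-of-vocabulary/null values, which is exactly the point under discussion):

import pandas as pd
import nvtabular as nvt
from nvtabular import ops

# User-supplied vocabulary for the "Authors" column.
vocab = pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})

# With index 0 reserved for out-of-vocabulary/null values, the supplied
# vocabulary entries should map to codes starting at 1:
#   User_A -> 1, User_B -> 2, User_C -> 3, User_E -> 4
df = pd.DataFrame(
    {"Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]}
)

cats = ["Authors"] >> ops.Categorify(vocabs=vocab)
workflow = nvt.Workflow(cats)
out = workflow.fit_transform(nvt.Dataset(df, cpu=True)).to_ddf().compute()

# Expected (what the test asserts): [[1], [1, 4], [2, 3], [3]]
# What the failing runs produced:   [[0], [0, 3], [1, 2], [2]]
print([list(row) for row in out["Authors"].tolist()])

The later commit in this PR, "Fixing prepend in dispatch._add_to_series", suggests the fix prepends a null/OOV entry to the supplied vocabulary so that the real categories start at 1.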

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit f6183acc644a7d041a9bb2ebccd730fb4f81ef9c, has merge conflicts.
Running as SYSTEM
!!! PR mergeability status has changed !!!  
PR now has NO merge conflicts
Setting status of f6183acc644a7d041a9bb2ebccd730fb4f81ef9c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2891/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse f6183acc644a7d041a9bb2ebccd730fb4f81ef9c^{commit} # timeout=10
Checking out Revision f6183acc644a7d041a9bb2ebccd730fb4f81ef9c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f6183acc644a7d041a9bb2ebccd730fb4f81ef9c # timeout=10
Commit message: "Fixing prepend in dispatch._add_to_series"
 > git rev-list --no-walk 9a86adc67b82f2095ac56823719fbc8e54f66281 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins2068987169614134547.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (1.1.5)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (5.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+66.gf6183ac) (2021.4.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+66.gf6183ac) (0.20)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (1.0.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+66.gf6183ac) (0.53.1)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+66.gf6183ac) (0.0.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (0.11.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.6.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (2021.6.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.2.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (6.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (8.0.1)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (57.4.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (2.0.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.0.2)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (5.8.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+66.gf6183ac) (0.36.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+66.gf6183ac) (1.20.2)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+66.gf6183ac) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+66.gf6183ac) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+66.gf6183ac) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+66.gf6183ac) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+66.gf6183ac) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 14:53:05.981958: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 14:53:07.194806: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 14:53:07.195877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 14:53:07.196875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 14:53:07.196907: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 14:53:07.196956: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 14:53:07.196991: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 14:53:07.197025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 14:53:07.197057: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 14:53:07.197103: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 14:53:07.197136: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 14:53:07.197174: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 14:53:07.201385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1129 items

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 12%]
tests/unit/test_dataloader_backend.py . [ 12%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 23%]
ssss.................................................. [ 28%]
tests/unit/test_notebooks.py ...... [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 39%]
........................................................................ [ 46%]
........................................................................ [ 52%]
........................................................................ [ 58%]
........................................................................ [ 65%]
................... [ 66%]
tests/unit/test_s3.py .. [ 67%]
tests/unit/test_tf_dataloader.py ....................................... [ 70%]
.................................s [ 73%]
tests/unit/test_tf_feature_columns.py . [ 73%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 80%]
tests/unit/test_tools.py ...................... [ 82%]
tests/unit/test_torch_dataloader.py .................................... [ 85%]
.............................................. [ 89%]
tests/unit/test_triton_inference.py sssss.................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 18 82 5 87% 54, 87, 128, 152-165, 214, 301
nvtabular/dispatch.py 245 42 120 20 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 0 28 0 100%
nvtabular/framework_utils/torch/utils.py 75 4 30 2 94% 64, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 157 120 14 43% 118-168, 213-274, 305, 307, 331-343, 347-363, 367-370, 374, 396-412, 416-420, 506-528, 532-599, 608->611, 611->607, 640-650, 654-655, 659, 669, 675, 677, 679, 681, 683, 685, 690, 694-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-266
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 283 35 124 23 84% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 695-696, 700->702, 702->711, 712, 719-720, 726, 732, 827-828, 944-949, 955, 1005
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 12 138 9 95% 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 22 50 7 85% 57, 65-68, 78, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 564 69 317 45 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 455-458, 531->533, 654, 690, 719->722, 723-725, 732-733, 746-748, 749->717, 765, 773, 775, 782->exit, 805, 808->811, 819, 844, 849, 865-868, 879, 883, 885, 897-900, 978, 980, 1009->1032, 1015->1032, 1033-1038, 1075, 1093->1098, 1097, 1107->1104, 1112->1104, 1120, 1128-1138
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 57 2 20 1 96% 92, 118
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 7 38 6 90% 20-21, 113, 115, 117, 159, 176->178, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 3 2 1 87% 25, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 43 44 8 49% 30-31, 35-36, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 6267 1098 2482 284 80%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 80.19%
========== 1115 passed, 14 skipped, 11 warnings in 916.56s (0:15:16) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5399356434971752398.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #935 of commit 456607ae4dbbeb1d348c9a005ea253141c1a6f33, no merge conflicts.
Running as SYSTEM
Setting status of 456607ae4dbbeb1d348c9a005ea253141c1a6f33 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2892/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 456607ae4dbbeb1d348c9a005ea253141c1a6f33^{commit} # timeout=10
Checking out Revision 456607ae4dbbeb1d348c9a005ea253141c1a6f33 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 456607ae4dbbeb1d348c9a005ea253141c1a6f33 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk f6183acc644a7d041a9bb2ebccd730fb4f81ef9c # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5500282410223543448.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+68.g456607a) (0.20)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (5.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+68.g456607a) (2021.4.1)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (0.53.1)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+68.g456607a) (0.0.1)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (1.1.5)
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+68.g456607a) (1.0.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.6.0)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.2.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (2021.6.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (0.11.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.0.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (2.4.0)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (57.4.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (6.1)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (5.8.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (8.0.1)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.7.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (2.0.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+68.g456607a) (0.36.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+68.g456607a) (1.20.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+68.g456607a) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+68.g456607a) (2021.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+68.g456607a) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+68.g456607a) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+68.g456607a) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+68.g456607a) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._arange' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._encode_list_column' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._flatten_list_column' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._from_host' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._hash_series' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._is_list_dtype' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._parquet_writer_dispatch' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._read_parquet_dispatch' imported but unused
./nvtabular/ops/categorify.py:37:1: F401 'nvtabular.dispatch._series_has_nulls' imported but unused
Build step 'Execute shell' marked build as failure
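The nine F401 errors above are what actually failed this run: nvtabular/ops/categorify.py imports those dispatch helpers but never references them. The follow-up commit "Fixing flake8" resolves it; as a sketch of the two usual remedies (not necessarily the exact change in that commit), either delete the unused names from the import list, or mark a name explicitly if it is kept on purpose as a re-export:

    # Sketch only: the straightforward fix is to remove the unused names
    # (_arange, _from_host, _is_list_dtype, ...) from the import statement.
    # If a name is kept deliberately, e.g. re-exported for other modules,
    # flake8 can be told to ignore that one line instead:
    from nvtabular.dispatch import _arange  # noqa: F401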
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins628072434382999635.sh

@marcromeyn
Contributor Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 419b07f18b99926b9227a14a483b21758dbb4cae, no merge conflicts.
Running as SYSTEM
Setting status of 419b07f18b99926b9227a14a483b21758dbb4cae to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2895/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 419b07f18b99926b9227a14a483b21758dbb4cae^{commit} # timeout=10
Checking out Revision 419b07f18b99926b9227a14a483b21758dbb4cae (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 419b07f18b99926b9227a14a483b21758dbb4cae # timeout=10
Commit message: "Fixing flake8"
 > git rev-list --no-walk 49044acc44eb8f7e64e2ba31509bbd363c6cf6e6 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1907607570036027871.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: dask==2021.4.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (2021.4.1)
Requirement already satisfied: distributed==2021.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+69.g419b07f) (2021.4.1)
Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (1.0.1)
Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (5.4.1)
Requirement already satisfied: tdqm in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+69.g419b07f) (0.0.1)
Requirement already satisfied: versioneer in /var/jenkins_home/.local/lib/python3.8/site-packages (from nvtabular==0.5.3+69.g419b07f) (0.20)
Requirement already satisfied: pandas<1.3.0dev0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (1.1.5)
Requirement already satisfied: numba>=0.53.1 in /usr/local/lib/python3.8/dist-packages (from nvtabular==0.5.3+69.g419b07f) (0.53.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.6.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (2021.6.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (0.11.1)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (6.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (8.0.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.0.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.7.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (2.0.0)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (57.4.0)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (5.8.0)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+69.g419b07f) (1.20.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53.1->nvtabular==0.5.3+69.g419b07f) (0.36.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+69.g419b07f) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+69.g419b07f) (2.8.1)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask==2021.4.1->nvtabular==0.5.3+69.g419b07f) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas<1.3.0dev0,>=1.0->nvtabular==0.5.3+69.g419b07f) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed==2021.4.1->nvtabular==0.5.3+69.g419b07f) (1.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from tdqm->nvtabular==0.5.3+69.g419b07f) (4.61.2)
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:459:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:66:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
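The four I1101 notes are informational rather than failures, which is why the run still scores 10.00/10: pylint cannot statically inspect the compiled nvtabular_cpp and hugectr extensions. The message itself names the fix if run-time introspection is wanted; a minimal sketch of that configuration, assuming a conventional .pylintrc (the repo's actual pylint setup is not shown in this log):

    [MASTER]
    # let pylint import these C extensions so their members can be checked at run time
    extension-pkg-allow-list=nvtabular_cpp,hugectr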

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 15:37:33.799125: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 15:37:35.001494: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 15:37:35.002584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 15:37:35.003578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 15:37:35.003607: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 15:37:35.003654: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 15:37:35.003687: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 15:37:35.003722: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 15:37:35.003754: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 15:37:35.003800: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 15:37:35.003832: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 15:37:35.003869: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 15:37:35.007886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1100 items / 2 skipped / 1098 selected

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 13%]
tests/unit/test_dataloader_backend.py . [ 13%]
tests/unit/test_io.py .................................................. [ 17%]
....................................................................ssss [ 24%]
ssss.................................................. [ 29%]
tests/unit/test_ops.py ................................................. [ 33%]
........................................................................ [ 40%]
........................................................................ [ 46%]
........................................................................ [ 53%]
........................................................................ [ 59%]
........................................................................ [ 66%]
................... [ 68%]
tests/unit/test_s3.py .. [ 68%]
tests/unit/test_tf_dataloader.py ....................................... [ 71%]
.................................s [ 75%]
tests/unit/test_tf_feature_columns.py . [ 75%]
tests/unit/test_tf_layers.py ........................................... [ 79%]
................................... [ 82%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .................................... [ 87%]
.............................................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
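The SettingWithCopyWarning entries point at nvtabular/ops/join_external.py:171, where drop_duplicates(..., inplace=True) runs on what pandas treats as a slice of another DataFrame; they are warnings only and do not affect the pass/fail result. A sketch of the general pandas pattern behind the warning and its usual remedy (illustrative data, not a proposed change to the repo):

    import pandas as pd

    df = pd.DataFrame({"key": [1, 1, 2], "val": [10, 10, 20]})
    _ext = df[df["val"] > 5]  # a filtered slice; mutating it in place can raise SettingWithCopyWarning

    # instead of: _ext.drop_duplicates(ignore_index=True, inplace=True)
    _ext = _ext.drop_duplicates(ignore_index=True)  # reassigning avoids in-place mutation of the slice
    # (or take an explicit copy first: _ext = _ext.copy())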

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 42 82 4 72% 54, 87, 128, 152-165, 207-214, 218-221, 225, 240-258, 301
nvtabular/dispatch.py 245 42 120 21 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 125->127, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 13 85 6 90% 60, 68->49, 122, 179, 231-239, 242, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 1 28 1 97% 108
nvtabular/framework_utils/torch/utils.py 75 13 30 3 79% 22, 25-33, 64, 118-120, 132->115
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 269 120 0 3% 30-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 87 58 0 0% 27-150
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-267
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 179 7 68 11 93% 110, 113, 149, 224, 384->382, 412->415, 423, 427->429, 429->425, 434, 436
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 283 35 124 23 84% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 695-696, 700->702, 702->711, 712, 719-720, 726, 732, 827-828, 944-949, 955, 1005
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 9 16 4 64% 42-49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 13 138 11 95% 98, 102->94, 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 23 50 8 84% 57, 65-68, 78, 82, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 81 15 16 2 76% 25-27, 30-36, 111, 149-150, 190, 193
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 565 71 317 47 84% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 454-459, 532->534, 655, 691, 720->723, 724-726, 733-734, 747-749, 750->718, 766, 774, 776, 783->exit, 806, 809->812, 820, 845-847, 850, 852->854, 866-869, 880, 884, 886, 898-901, 979, 981, 1010->1033, 1016->1033, 1034-1039, 1076, 1094->1099, 1098, 1108->1105, 1113->1105, 1121, 1129-1139
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 63 6 22 1 89% 62-66, 101, 127
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 8 38 8 87% 20-21, 113, 115, 117, 159, 163->167, 176->178, 203, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 13 18 3 58% 59, 63, 77, 88-103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 4 2 1 84% 25, 99, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 44 44 9 47% 30-31, 35-36, 45, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 5975 1369 2482 272 74%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 74.48%
========== 1091 passed, 11 skipped, 11 warnings in 640.17s (0:10:40) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6983651131543558821.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af, no merge conflicts.
Running as SYSTEM
Setting status of ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2902/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af^{commit} # timeout=10
Checking out Revision ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f ad2f61efc5e7c2c0ecb9b89a32664c43c768d0af # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk deed53a14aead05524deeb50fd45ba75523b07ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3664444352533196477.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python /var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpayr2gb3d
       cwd: /var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Complete output (16 lines):
  Traceback (most recent call last):
    File "/var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 280, in 
      main()
    File "/var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 154, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 135, in _get_build_requires
      self.run_setup()
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 150, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 66, in 
      define_macros=[("VERSION_INFO", versioneer.get_version())],
  AttributeError: module 'versioneer' has no attribute 'get_version'
  ----------------------------------------
WARNING: Discarding file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular. Command errored out with exit status 1: /usr/bin/python /var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpayr2gb3d Check the logs for full command output.
ERROR: Command errored out with exit status 1: /usr/bin/python /var/jenkins_home/.local/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpayr2gb3d Check the logs for full command output.
Build step 'Execute shell' marked build as failure
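This failure happens inside pip's isolated PEP 517 build: setup.py line 66 calls versioneer.get_version(), but the versioneer module imported in that environment exposes no such attribute, so the wheel-metadata step aborts before any tests run. A quick diagnostic sketch (not part of the CI script) to see which versioneer module the interpreter actually resolves:

    # hypothetical check, run from the repository root
    import versioneer

    print(versioneer.__file__)                 # repo-local versioneer.py vs. a site-packages install
    print(hasattr(versioneer, "get_version"))  # False reproduces the AttributeError above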
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins817958748879255520.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit e4cf7089ffbabdd33fc49a2c19c46123c3b524d0, no merge conflicts.
Running as SYSTEM
Setting status of e4cf7089ffbabdd33fc49a2c19c46123c3b524d0 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2916/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse e4cf7089ffbabdd33fc49a2c19c46123c3b524d0^{commit} # timeout=10
Checking out Revision e4cf7089ffbabdd33fc49a2c19c46123c3b524d0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e4cf7089ffbabdd33fc49a2c19c46123c3b524d0 # timeout=10
Commit message: "Merge branch 'main' into feature-cols-categorify"
 > git rev-list --no-walk 40ccda0b43ac0bdfb3fc79837f66d7276fddcc77 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8599164028311122005.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.0)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+73.ge4cf708 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.5.3+73.ge4cf708 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.5.3+73.ge4cf708
Searching for pyarrow==1.0.1
Best match: pyarrow 1.0.1
Adding pyarrow 1.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tdqm==0.0.1
Best match: tdqm 0.0.1
Adding tdqm 0.0.1 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for numba==0.53.1
Best match: numba 0.53.1
Adding numba 0.53.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pandas==1.1.5
Best match: pandas 1.1.5
Adding pandas 1.1.5 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Adding distributed 2021.4.1 to easy-install.pth file
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Adding dask 2021.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Adding PyYAML 5.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Adding tqdm 4.61.2 to easy-install.pth file
Installing tqdm script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==57.4.0
Best match: setuptools 57.4.0
Adding setuptools 57.4.0 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.36.0
Best match: llvmlite 0.36.0
Adding llvmlite 0.36.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Adding psutil 5.8.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Adding toolz 0.11.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for click==8.0.1
Best match: click 8.0.1
Adding click 8.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Adding msgpack 1.0.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Adding cloudpickle 1.6.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Adding zict 2.0.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Adding tblib 1.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tornado==6.1
Best match: tornado 6.1
Adding tornado 6.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Adding sortedcontainers 2.4.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for partd==1.2.0
Best match: partd 1.2.0
Adding partd 1.2.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for fsspec==2021.7.0
Best match: fsspec 2021.7.0
Adding fsspec 2021.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Adding HeapDict 1.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for locket==0.2.1
Best match: locket 0.2.1
Adding locket 0.2.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Finished processing dependencies for nvtabular==0.5.3+73.ge4cf708
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:459:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:66:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 20:40:52.645434: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 20:40:54.960475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 20:40:54.961883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 20:40:54.963186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 20:40:54.963282: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 20:40:54.963385: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 20:40:54.963454: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 20:40:54.963525: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 20:40:54.963596: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 20:40:54.963693: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 20:40:54.963763: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 20:40:54.963847: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 20:40:54.968808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1112 items / 2 skipped / 1110 selected

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 13%]
tests/unit/test_dataloader_backend.py . [ 13%]
tests/unit/test_io.py .................................................. [ 17%]
........................................................................ [ 24%]
........ssssssss.................................................. [ 30%]
tests/unit/test_ops.py ................................................. [ 34%]
........................................................................ [ 40%]
.................................................FFFFFFFFFFFFFFFFFF..... [ 47%]
........................................................................ [ 53%]
........................................................................ [ 60%]
........................................................................ [ 66%]
................... [ 68%]
tests/unit/test_s3.py .. [ 68%]
tests/unit/test_tf_dataloader.py ....................................... [ 72%]
.................................s [ 75%]
tests/unit/test_tf_feature_columns.py . [ 75%]
tests/unit/test_tf_layers.py ........................................... [ 79%]
................................... [ 82%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .................................... [ 87%]
.............................................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=================================== FAILURES ===================================
_________________ test_categorify_lists[vocabs1-None-False-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_0')
freq_threshold = 0, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
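A short aside on the values in this assertion (an illustrative sketch, not part of the test suite): assuming index 0 is reserved for nulls/out-of-vocabulary entries, as the 1-based codes in the output suggest, encoding the "Authors" column in the order of the supplied vocabulary (User_A, User_B, User_C, User_E) reproduces exactly the observed result, while the expected values correspond to a different category ordering:

    # Hypothetical re-derivation of the encoding, assuming categories are numbered
    # in vocabulary order with index 0 reserved for nulls/out-of-vocabulary values.
    vocab = ["User_A", "User_B", "User_C", "User_E"]
    mapping = {cat: i + 1 for i, cat in enumerate(vocab)}  # {'User_A': 1, 'User_B': 2, 'User_C': 3, 'User_E': 4}

    authors = [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]]
    encoded = [[mapping[a] for a in row] for row in authors]
    print(encoded)  # [[1], [1, 4], [2, 3], [3]] -- matches the actual output in the failure above

So the produced encoding follows the vocabulary order, whereas the test still expects [[1], [1, 4], [3, 2], [2]].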
_________________ test_categorify_lists[vocabs1-None-False-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_1')
freq_threshold = 1, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-None-False-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_2')
freq_threshold = 2, cpu = False, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_3')
freq_threshold = 0, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_4')
freq_threshold = 1, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
__________________ test_categorify_lists[vocabs1-None-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_5')
freq_threshold = 2, cpu = True, dtype = None
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_6')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_7')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_8')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_9')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_10')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int32-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_11')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int32'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-0] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_12')
freq_threshold = 0, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_13')
freq_threshold = 1, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-False-2] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_14')
freq_threshold = 2, cpu = False, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-0] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_15')
freq_threshold = 0, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-1] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_16')
freq_threshold = 1, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
_________________ test_categorify_lists[vocabs1-int64-True-2] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_categorify_lists_vocabs1_17')
freq_threshold = 2, cpu = True, dtype = <class 'numpy.int64'>
vocabs = Authors
0 User_A
1 User_B
2 User_C
3 User_E

@pytest.mark.parametrize("freq_threshold", [0, 1, 2])
@pytest.mark.parametrize("cpu", [False, True])
@pytest.mark.parametrize("dtype", [None, np.int32, np.int64])
@pytest.mark.parametrize("vocabs", [None, pd.DataFrame({"Authors": [f"User_{x}" for x in "ABCE"]})])
def test_categorify_lists(tmpdir, freq_threshold, cpu, dtype, vocabs):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
        }
    )
    cat_names = ["Authors", "Engaging User"]
    label_name = ["Post"]

    cat_features = cat_names >> ops.Categorify(
        out_path=str(tmpdir), freq_threshold=freq_threshold, dtype=dtype, vocabs=vocabs
    )

    workflow = nvt.Workflow(cat_features + label_name)
    df_out = workflow.fit_transform(nvt.Dataset(df, cpu=cpu)).to_ddf().compute()

    # Columns are encoded independently
    if cpu:
        assert df_out["Authors"][0].dtype == np.dtype(dtype) if dtype else np.dtype("int64")
        compare = [list(row) for row in df_out["Authors"].tolist()]
    else:
        assert df_out["Authors"].dtype == cudf.core.dtypes.ListDtype(dtype if dtype else "int64")
        compare = df_out["Authors"].to_arrow().to_pylist()

    if freq_threshold < 2 or vocabs is not None:
      assert compare == [[1], [1, 4], [3, 2], [2]]

E assert [[1], [1, 4], [2, 3], [3]] == [[1], [1, 4], [3, 2], [2]]
E At index 2 diff: [2, 3] != [3, 2]
E Use -v to get the full diff

tests/unit/test_ops.py:475: AssertionError
=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 42 82 4 72% 54, 87, 128, 152-165, 207-214, 218-221, 225, 240-258, 301
nvtabular/dispatch.py 245 42 120 21 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 125->127, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 13 85 6 90% 60, 68->49, 122, 179, 231-239, 242, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 1 28 1 97% 108
nvtabular/framework_utils/torch/utils.py 75 13 30 3 79% 22, 25-33, 64, 118-120, 132->115
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 269 120 0 3% 30-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 87 58 0 0% 27-150
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-267
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 180 7 70 11 93% 110, 113, 149, 225, 385->383, 413->416, 424, 428->430, 430->426, 435, 437
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 289 35 126 21 86% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 710-711, 738, 745-746, 752, 758, 853-854, 970-975, 981, 1031
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 13 138 11 95% 98, 102->94, 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 23 50 8 84% 57, 65-68, 78, 82, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 81 15 16 2 76% 25-27, 30-36, 111, 149-150, 190, 193
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 573 71 323 48 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 454-459, 532->534, 655, 691, 720->723, 724-726, 733-734, 747-749, 750->718, 766, 774, 776, 783->exit, 806, 809->812, 820, 845-847, 850, 852->855, 866->870, 877-880, 891, 895, 897, 909-912, 990, 992, 1021->1044, 1027->1044, 1045-1050, 1087, 1105->1110, 1109, 1119->1116, 1124->1116, 1132, 1140-1150
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 63 6 22 1 89% 62-66, 101, 127
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 8 38 8 87% 20-21, 113, 115, 117, 159, 163->167, 176->178, 203, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 13 18 3 58% 59, 63, 77, 88-103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 4 2 1 84% 25, 99, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 44 44 9 47% 30-31, 35-36, 45, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 5990 1366 2492 272 75%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 74.66%
=========================== short test summary info ============================
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-False-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-0] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-1] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-None-True-2] - a...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int32-True-2] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-0]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-1]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-False-2]
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-0] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-1] - ...
FAILED tests/unit/test_ops.py::test_categorify_lists[vocabs1-int64-True-2] - ...
==== 18 failed, 1085 passed, 11 skipped, 11 warnings in 1013.92s (0:16:53) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7770121283389326806.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #935 of commit 11f5b0379ff6074590a18147d6732f1644d1ca21, no merge conflicts.
Running as SYSTEM
Setting status of 11f5b0379ff6074590a18147d6732f1644d1ca21 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2918/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/935/*:refs/remotes/origin/pr/935/* # timeout=10
 > git rev-parse 11f5b0379ff6074590a18147d6732f1644d1ca21^{commit} # timeout=10
Checking out Revision 11f5b0379ff6074590a18147d6732f1644d1ca21 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 11f5b0379ff6074590a18147d6732f1644d1ca21 # timeout=10
Commit message: "Fix to match frequency sorting categorify changes"
 > git rev-list --no-walk c4d3124cdebfa8d674b600f95ecff529443eecb0 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4765283923681026855.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.1.3)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (0.36.2)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.0)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.5.3+74.g11f5b03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.5.3+74.g11f5b03 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.5.3+74.g11f5b03
Searching for pyarrow==1.0.1
Best match: pyarrow 1.0.1
Adding pyarrow 1.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tdqm==0.0.1
Best match: tdqm 0.0.1
Adding tdqm 0.0.1 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for numba==0.53.1
Best match: numba 0.53.1
Adding numba 0.53.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pandas==1.1.5
Best match: pandas 1.1.5
Adding pandas 1.1.5 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Adding distributed 2021.4.1 to easy-install.pth file
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Adding dask 2021.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Adding PyYAML 5.4.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Adding tqdm 4.61.2 to easy-install.pth file
Installing tqdm script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for llvmlite==0.36.0
Best match: llvmlite 0.36.0
Adding llvmlite 0.36.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==57.4.0
Best match: setuptools 57.4.0
Adding setuptools 57.4.0 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tornado==6.1
Best match: tornado 6.1
Adding tornado 6.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Adding toolz 0.11.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Adding msgpack 1.0.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Adding sortedcontainers 2.4.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Adding psutil 5.8.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Adding cloudpickle 1.6.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Adding zict 2.0.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Adding tblib 1.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for click==8.0.1
Best match: click 8.0.1
Adding click 8.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for fsspec==2021.7.0
Best match: fsspec 2021.7.0
Adding fsspec 2021.7.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for partd==1.2.0
Best match: partd 1.2.0
Adding partd 1.2.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Adding HeapDict 1.0.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for locket==0.2.1
Best match: locket 0.2.1
Adding locket 0.2.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Finished processing dependencies for nvtabular==0.5.3+74.g11f5b03
Running black --check
All done! ✨ 🍰 ✨
109 files would be left unchanged.
Running flake8
Running isort
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
/usr/local/lib/python3.8/dist-packages/isort/main.py:141: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples/scaling-criteo/imgs
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:459:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:66:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module bench.datasets.tools.train_hugectr
bench/datasets/tools/train_hugectr.py:28:13: I1101: Module 'hugectr' has no 'solver_parser_helper' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
bench/datasets/tools/train_hugectr.py:41:16: I1101: Module 'hugectr' has no 'optimizer' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
2021-07-20 22:16:03.309271: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 22:16:04.593825: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-20 22:16:04.594912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 22:16:04.595907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-DGXS-16GB computeCapability: 6.0
coreClock: 1.4805GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-07-20 22:16:04.595940: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-20 22:16:04.595990: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-20 22:16:04.596025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-20 22:16:04.596060: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-20 22:16:04.596094: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-20 22:16:04.596142: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-20 22:16:04.596175: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-20 22:16:04.596214: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-20 22:16:04.600206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1112 items / 2 skipped / 1110 selected

tests/unit/test_column_group.py .. [ 0%]
tests/unit/test_column_similarity.py ........................ [ 2%]
tests/unit/test_cpu_workflow.py ...... [ 2%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 13%]
tests/unit/test_dataloader_backend.py . [ 13%]
tests/unit/test_io.py .................................................. [ 17%]
........................................................................ [ 24%]
........ssssssss.................................................. [ 30%]
tests/unit/test_ops.py ................................................. [ 34%]
........................................................................ [ 40%]
........................................................................ [ 47%]
........................................................................ [ 53%]
........................................................................ [ 60%]
........................................................................ [ 66%]
................... [ 68%]
tests/unit/test_s3.py .. [ 68%]
tests/unit/test_tf_dataloader.py ....................................... [ 72%]
.................................s [ 75%]
tests/unit/test_tf_feature_columns.py . [ 75%]
tests/unit/test_tf_layers.py ........................................... [ 79%]
................................... [ 82%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .................................... [ 87%]
.............................................. [ 91%]
tests/unit/test_workflow.py ............................................ [ 95%]
................................................ [100%]

=============================== warnings summary ===============================
tests/unit/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)

tests/unit/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/test_ops.py::test_groupby_op[id-False]
tests/unit/test_ops.py::test_groupby_op[id-True]
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 157 42 82 4 72% 54, 87, 128, 152-165, 207-214, 218-221, 225, 240-258, 301
nvtabular/dispatch.py 245 42 120 21 81% 33-35, 40-42, 48-58, 62-63, 83, 90, 98, 109, 115, 120->122, 125->127, 133, 156-159, 198, 214, 221, 252->257, 255, 258, 261->265, 298, 309-312, 339-342, 372, 376, 417, 441, 443, 450
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 132 78 88 15 38% 29, 98, 102, 113-129, 139, 142-157, 161, 165-166, 172-197, 206-216, 219-226, 228->231, 232, 237-277, 280
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 13 85 6 90% 60, 68->49, 122, 179, 231-239, 242, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 30 1 12 1 95% 47
nvtabular/framework_utils/torch/models.py 45 1 28 1 97% 108
nvtabular/framework_utils/torch/utils.py 75 13 30 3 79% 22, 25-33, 64, 118-120, 132->115
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 279 269 120 0 3% 30-700
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 87 58 0 0% 27-150
nvtabular/inference/triton/model.py 140 140 66 0 0% 27-267
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 180 7 70 11 93% 110, 113, 149, 225, 385->383, 413->416, 424, 428->430, 430->426, 435, 437
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 289 35 126 21 86% 43-44, 245, 247, 260, 269, 287-301, 404->473, 409-412, 417->427, 422-423, 434->432, 448->452, 463, 523->527, 570, 710-711, 738, 745-746, 752, 758, 853-854, 970-975, 981, 1031
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 492 21 156 12 95% 33-34, 92-100, 124->126, 213-215, 338-343, 381-386, 502->509, 570->575, 576-577, 697, 701, 705, 743, 760, 764, 771->773, 891->896, 901->911, 938
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 173 13 66 5 92% 24-25, 51, 79, 125, 128, 207, 216, 219, 262, 283-285
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 327 13 138 11 95% 98, 102->94, 142-143, 233->235, 245-249, 295-296, 335->339, 410, 414-415, 445, 550, 558
nvtabular/loader/tensorflow.py 155 23 50 8 84% 57, 65-68, 78, 82, 88, 296, 332, 347-349, 378-380, 390-398, 401-404
nvtabular/loader/tf_utils.py 55 27 20 5 44% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70, 85-90, 100-113
nvtabular/loader/torch.py 81 15 16 2 76% 25-27, 30-36, 111, 149-150, 190, 193
nvtabular/ops/__init__.py 21 0 0 0 100%
nvtabular/ops/bucketize.py 32 10 18 3 62% 52-54, 58, 61-64, 83-86
nvtabular/ops/categorify.py 573 71 323 48 85% 230, 232, 247, 251, 259, 267, 269, 296, 315-316, 331, 342->347, 350-357, 436-437, 454-459, 532->534, 655, 691, 720->723, 724-726, 733-734, 747-749, 750->718, 766, 774, 776, 783->exit, 806, 809->812, 820, 845-847, 850, 852->855, 866->870, 877-880, 891, 895, 897, 909-912, 990, 992, 1021->1044, 1027->1044, 1045-1050, 1087, 1105->1110, 1109, 1119->1116, 1124->1116, 1132, 1140-1150
nvtabular/ops/clip.py 18 2 6 3 79% 43, 51->53, 54
nvtabular/ops/column_similarity.py 103 24 36 5 72% 19-20, 76->exit, 106, 178-179, 188-190, 198-214, 231->234, 235, 245
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 25 0 8 1 97% 66->68
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 63 6 22 1 89% 62-66, 101, 127
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 92 4 56 6 92% 71, 80, 82, 92->94, 104->109, 180
nvtabular/ops/hash_bucket.py 29 2 18 2 87% 69, 99
nvtabular/ops/hashed_cross.py 28 3 13 4 83% 50, 63, 77->exit, 78
nvtabular/ops/join_external.py 89 8 38 8 87% 20-21, 113, 115, 117, 159, 163->167, 176->178, 203, 212
nvtabular/ops/join_groupby.py 84 5 30 2 94% 106, 109->118, 194-195, 198-199
nvtabular/ops/lambdaop.py 39 13 18 3 58% 59, 63, 77, 88-103
nvtabular/ops/list_slice.py 63 24 26 1 56% 21-22, 52-53, 100-114, 122-133
nvtabular/ops/logop.py 8 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 70 8 14 2 86% 60->59, 67, 75-76, 109-110, 132-133, 137
nvtabular/ops/operator.py 29 4 2 1 84% 25, 99, 104, 109
nvtabular/ops/rename.py 23 3 14 3 84% 45, 66-68
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 146 11 64 5 90% 147, 167->171, 174->183, 228-229, 232-233, 242-248, 339->342
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 323
nvtabular/tools/dataset_inspector.py 49 7 18 1 79% 31-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 94 44 44 9 47% 30-31, 35-36, 45, 49, 60-61, 63-65, 68, 71, 77, 83, 89-125, 144, 148->152
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow.py 156 11 73 4 93% 28-29, 45, 131, 145-147, 251, 280-281, 369

TOTAL 5990 1366 2492 272 75%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 74.66%
========== 1103 passed, 11 skipped, 11 warnings in 628.57s (0:10:28) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4750676581666921359.sh

@benfred benfred merged commit df8b4db into main Jul 20, 2021
@benfred benfred deleted the feature-cols-categorify branch July 20, 2021 22:33
@@ -227,7 +226,7 @@ def _get_parents(column):
features += features_replaced_buckets

if len(categorifies) > 0:
- features += categorifies.keys() >> Categorify()
+ features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))

This line fails if the features have vocabularies of different sizes. In particular, the sample notebook "using-feature-columns.ipynb" fails on this line with the error below. Am I missing something, or is this a bug?

I used the "merlin-tensorflow-training" Docker image, which had nvtabular 0.6.1 installed.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_783/4041583971.py in <module>
----> 1 online_workflow, feature_columns = make_feature_column_workflow(feature_columns, "AdoptionSpeed")

/nvtabular/nvtabular/framework_utils/tensorflow/feature_column_utils.py in make_feature_column_workflow(feature_columns, label_name, category_dir)
    227 
    228     if len(categorifies) > 0:
--> 229         features += categorifies.keys() >> Categorify(vocabs=pd.DataFrame(categorifies))
    230 
    231     if len(hashes) > 0:

/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    466 
    467         elif isinstance(data, dict):
--> 468             mgr = init_dict(data, index, columns, dtype=dtype)
    469         elif isinstance(data, ma.MaskedArray):
    470             import numpy.ma.mrecords as mrecords

/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    281             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    282         ]
--> 283     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    284 
    285 

/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     76         # figure out the index, if necessary
     77         if index is None:
---> 78             index = extract_index(arrays)
     79         else:
     80             index = ensure_index(index)

/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py in extract_index(data)
    395             lengths = list(set(raw_lengths))
    396             if len(lengths) > 1:
--> 397                 raise ValueError("arrays must all be same length")
    398 
    399             if have_dicts:

ValueError: arrays must all be same length
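
The failure is reproducible with plain pandas: building a DataFrame from a dict whose values have different lengths raises exactly this error. Below is a minimal sketch (the column names and vocabularies are made up for illustration); one pandas-level way around it is to wrap each vocabulary in a Series so the columns align by index, although how Categorify should treat the resulting NaN padding is a separate question.

import pandas as pd

# Hypothetical vocabularies of different sizes, as collected from feature columns.
categorifies = {"Type": ["Cat", "Dog"], "Breed1": ["Labrador", "Siamese", "Tabby"]}

try:
    pd.DataFrame(categorifies)
except ValueError as err:
    print(err)  # arrays must all be same length

# Wrapping each vocabulary in a Series makes pandas align columns by index,
# padding the shorter ones with NaN instead of raising.
vocabs = pd.DataFrame({name: pd.Series(values) for name, values in categorifies.items()})
print(vocabs)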

Contributor

Thoughts on how to handle this @marcromeyn?

Member

Thanks for reporting! That looks like a bug to me - I opened #1062 to track it.
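
For reference, the vocabs argument added in this PR does work when every column's vocabulary has the same length. A minimal sketch of direct usage, assuming a dataset that contains the (made-up) columns below:

import pandas as pd
import nvtabular as nvt
from nvtabular.ops import Categorify

# Hypothetical equal-length vocabularies, keyed by the column they encode.
vocabs = pd.DataFrame({"Type": ["Cat", "Dog"], "Gender": ["Male", "Female"]})

# Categorify uses the supplied vocabulary instead of computing one during fit.
cat_features = ["Type", "Gender"] >> Categorify(vocabs=vocabs)
workflow = nvt.Workflow(cat_features)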

mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
 Allow to pass in vocabs in Categorify to fix make_feature_column_workflow
Development

Successfully merging this pull request may close these issues.

[BUG] make_feature_column_workflow doesn't work with categorical columns
5 participants