Fix for cudf 22.04 #82

Merged
jperez999 merged 2 commits into main from cudf_2204_fix
May 4, 2022

@benfred benfred commented May 4, 2022

cuDF 22.04 changed the df.sample method to accept
an 'ignore_index' parameter instead of 'keep_index'.
Fix by choosing the parameter based on the installed cudf version.
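
A minimal sketch of the version-based switch described above. This is not the code merged in this PR: `sample_index_kwargs` is a hypothetical helper, the `(22, 4)` cutoff is taken from the description, and treating `ignore_index` as the logical inverse of `keep_index` is an assumption.

```python
def sample_index_kwargs(cudf_version: str, keep_index: bool) -> dict:
    """Pick the index-related keyword argument for df.sample.

    Hypothetical helper for illustration; not the code merged in this PR.
    """
    # Compare only (major, minor), e.g. "22.04.00" -> (22, 4).
    major, minor = (int(part) for part in cudf_version.split(".")[:2])
    if (major, minor) >= (22, 4):
        # Assumption: ignore_index is the logical inverse of keep_index.
        return {"ignore_index": not keep_index}
    # Older releases still take keep_index directly.
    return {"keep_index": keep_index}


# Usage sketch, assuming cudf is installed and `df` is a cudf DataFrame:
#   df.sample(n=len(df), **sample_index_kwargs(cudf.__version__, keep_index=False))
```

Building the keyword arguments in one place keeps the version check out of each call site.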

github-actions bot commented May 4, 2022

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-82

@nvidia-merlin-bot
Click to view CI Results
GitHub pull request #82 of commit cde760dbd70d6b4a36133d65272f5f55ad7f7449, no merge conflicts.
Running as SYSTEM
Setting status of cde760dbd70d6b4a36133d65272f5f55ad7f7449 to PENDING with url https://10.20.13.93:8080/job/merlin_core/46/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/82/*:refs/remotes/origin/pr/82/* # timeout=10
 > git rev-parse cde760dbd70d6b4a36133d65272f5f55ad7f7449^{commit} # timeout=10
Checking out Revision cde760dbd70d6b4a36133d65272f5f55ad7f7449 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f cde760dbd70d6b4a36133d65272f5f55ad7f7449 # timeout=10
Commit message: "Fix for cudf 22.04"
 > git rev-list --no-walk f7e89cc177414b232546a67b665592f33c347fcf # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins7837108209131990970.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 337 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 26%]
................................................................ [ 45%]
tests/unit/schema/test_column_schemas.py ............................... [ 54%]
........................................................................ [ 75%]
........................................................................ [ 97%]
[ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 72 warnings
/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:552: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40513 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46789 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40801 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39713 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35763 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41965 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 337 passed, 1 skipped, 83 warnings in 53.32s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins5678432651910084224.sh

benfred added a commit to NVIDIA-Merlin/models that referenced this pull request May 4, 2022
The dataloader was including the shuffle code that is now hosted in merlin-core. This
change updates to use the merlin-core version, rather than redefine here.

The shuffle_df code had an issue with cudf 22.04 that is fixed in merlin-core by
NVIDIA-Merlin/core#82
@nvidia-merlin-bot
Click to view CI Results
GitHub pull request #82 of commit cb5157148aa2ec6872698ede29cd960cc9f44aae, no merge conflicts.
Running as SYSTEM
Setting status of cb5157148aa2ec6872698ede29cd960cc9f44aae to PENDING with url https://10.20.13.93:8080/job/merlin_core/47/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/82/*:refs/remotes/origin/pr/82/* # timeout=10
 > git rev-parse cb5157148aa2ec6872698ede29cd960cc9f44aae^{commit} # timeout=10
Checking out Revision cb5157148aa2ec6872698ede29cd960cc9f44aae (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f cb5157148aa2ec6872698ede29cd960cc9f44aae # timeout=10
Commit message: "Merge branch 'main' into cudf_2204_fix"
 > git rev-list --no-walk cde760dbd70d6b4a36133d65272f5f55ad7f7449 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins816322271622302664.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.1.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 342 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 11%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 72 warnings
/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44429 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43325 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45641 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44087 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37939 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44161 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 342 passed, 1 skipped, 83 warnings in 50.83s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins2862619308252523736.sh

jperez999 added a commit to NVIDIA-Merlin/models that referenced this pull request May 4, 2022
The dataloader was including the shuffle code that is now hosted in merlin-core. This
change updates to use the merlin-core version, rather than redefine here.

The shuffle_df code had an issue with cudf 22.04 that is fixed in merlin-core by
NVIDIA-Merlin/core#82

Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
@jperez999 jperez999 merged commit 552238d into main May 4, 2022
@benfred benfred deleted the cudf_2204_fix branch May 4, 2022 23:00