Avoid nullable pandas dtypes in CPUParquetEngine #60
karlhigley merged 2 commits into NVIDIA-Merlin:main from
CI Results
GitHub pull request #60 of commit 5f326867755930c84d37b8dcf8ccfd463d9c759c, no merge conflicts.
Running as SYSTEM
Setting status of 5f326867755930c84d37b8dcf8ccfd463d9c759c to PENDING with url https://10.20.13.93:8080/job/merlin_core/11/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
> git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/60/*:refs/remotes/origin/pr/60/* # timeout=10
> git rev-parse 5f326867755930c84d37b8dcf8ccfd463d9c759c^{commit} # timeout=10
Checking out Revision 5f326867755930c84d37b8dcf8ccfd463d9c759c (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 5f326867755930c84d37b8dcf8ccfd463d9c759c # timeout=10
Commit message: "convert all pandas dtypes to non-nullable in CPUParquetEngine"
> git rev-list --no-walk abc37714f84ddca49a34b883569e514fb25f8bc2 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins810222065464690935.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.3.1)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 337 items / 1 skipped
Note that I am getting this pre-commit failure (probably related to a lack of test coverage for the change), but will need to come back later to address it:
CI Results
GitHub pull request #60 of commit 1eb0dc91dc1e0930acf4fc74d0139951cc21f4bd, no merge conflicts.
Running as SYSTEM
Setting status of 1eb0dc91dc1e0930acf4fc74d0139951cc21f4bd to PENDING with url https://10.20.13.93:8080/job/merlin_core/12/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
> git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/60/*:refs/remotes/origin/pr/60/* # timeout=10
> git rev-parse 1eb0dc91dc1e0930acf4fc74d0139951cc21f4bd^{commit} # timeout=10
Checking out Revision 1eb0dc91dc1e0930acf4fc74d0139951cc21f4bd (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 1eb0dc91dc1e0930acf4fc74d0139951cc21f4bd # timeout=10
Commit message: "Merge branch 'main' into parquet-cast-nullable"
> git rev-list --no-walk 5f326867755930c84d37b8dcf8ccfd463d9c759c # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins3679070402294277217.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.3.1)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 337 items / 1 skipped
This is (hopefully) a temporary fix for the lack of support for nullable dtypes in NVTabular. This change ensures that reading parquet data (with cpu=True) will not result in nullable pandas dtypes.

cc @albert17
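
For readers unfamiliar with the distinction, here is a rough sketch of what "converting nullable pandas dtypes to non-nullable" can look like. It is not the code from this PR (the diff is not shown here); the helper name `to_non_nullable` and the choice to fall back to float64 when a column contains missing values are assumptions made for illustration only.

```python
import pandas as pd


def to_non_nullable(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast nullable (masked) pandas dtypes to NumPy dtypes.

    Columns that already use NumPy dtypes are left untouched.
    """
    df = df.copy()  # do not mutate the caller's frame
    for name in df.columns:
        dtype = df[name].dtype
        # Nullable dtypes such as Int64 or boolean expose the NumPy dtype
        # they wrap through the `numpy_dtype` attribute.
        numpy_dtype = getattr(dtype, "numpy_dtype", None)
        if not pd.api.types.is_extension_array_dtype(dtype) or numpy_dtype is None:
            continue
        if df[name].isna().any():
            # Assumption: a NumPy integer/bool column cannot hold missing
            # values, so fall back to float64, where pd.NA becomes NaN.
            df[name] = df[name].astype("float64")
        else:
            df[name] = df[name].astype(numpy_dtype)
    return df


example = pd.DataFrame(
    {
        "a": pd.array([1, 2, 3], dtype="Int64"),     # nullable, no missing values
        "b": pd.array([1, None, 3], dtype="Int64"),  # nullable, one missing value
        "c": [0.5, 1.5, 2.5],                        # already a NumPy float64 column
    }
)
converted = to_non_nullable(example)
print(converted.dtypes)
```

The float64 fallback mirrors what pandas does by default for integer columns with missing values, at the cost of losing exact integer representation for very large values.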