Describe the bug
The single-GPU tutorial notebook fails to launch a GPU-based Dask cluster.
Steps/Code to reproduce bug
- Launch notebook
- Run all steps in 0.Env Setup section
- Navigate to 4.Exact Deduplication section
- Launch the GPU Dask cluster by running the following code in the cell:
client = get_client(cluster_type = 'gpu', set_torch_to_use_rmm=False)
print(f"Number of dask worker:{get_num_workers(client)}")
client.run(pre_imports)
The cell returns the following error:
NotImplementedError:
NeMo Curator does not support query planning yet.
Please disable query planning before importing
`dask.dataframe` or `dask_cudf`. This can be done via:
`export DASK_DATAFRAME__QUERY_PLANNING=False`, or
importing `dask.dataframe/dask_cudf` after importing
`nemo_curator`.
Expected behavior
The cell should run successfully, and the output should resemble the following:
Number of dask worker:1 {'tcp://127.0.0.1:36179': None}
Environment overview
- Environment location: Bare-metal
- Method of NeMo-Curator install: Docker
docker run \
--rm \
-it \
--gpus '"device=1"' \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8888:8888 \
-p 8787:8787 \
nvcr.io/nvidia/nemo:dev
Additional context
Setting the following environment variable in the notebook's Env Setup step resolves the issue:
os.environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"
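Since the error message asks for query planning to be disabled before `dask.dataframe` or `dask_cudf` is imported, the workaround presumably has to run early in the notebook. Below is a minimal sketch of a guard cell for the Env Setup section. The helper name `disable_query_planning` is hypothetical and not part of NeMo Curator or Dask; it only sets the variable and reports whether either module was already loaded in the current kernel.

```python
import os
import sys


def disable_query_planning() -> bool:
    """Hypothetical helper (not a NeMo Curator or Dask API).

    Sets DASK_DATAFRAME__QUERY_PLANNING to "False" and returns True when
    neither dask.dataframe nor dask_cudf had been imported yet, i.e. when
    the setting was applied before those imports.
    """
    os.environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"

    # If either module is already loaded, the variable was set too late
    # for this kernel session (assumption based on the error message).
    already_imported = (
        "dask.dataframe" in sys.modules or "dask_cudf" in sys.modules
    )
    if already_imported:
        print(
            "dask.dataframe or dask_cudf was imported before the variable "
            "was set; restarting the kernel may be necessary."
        )
    return not already_imported


applied_in_time = disable_query_planning()
```

If `applied_in_time` is `False`, restarting the kernel and running this cell first is likely the simplest way to make the setting take effect.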