Processes never end with multiprocessing
Description
When I run a pipeline with the multiprocessing backend on SLURM, some processes never end, even after the computations are done. I observe this with 1 GPU and 2 CPUs (the lingering processes are visible with the nvidia-smi command) as well as with 0 GPUs and 2 CPUs (visible with the htop command). I then have to scancel those SLURM jobs manually once the computations are done and saved (see the hedged workaround sketch after the sbatch script below).
This behavior is not observed with the "simple" backend; a minimal sketch is given just below.
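For comparison, here is a minimal sketch of the same run with the simple backend (reusing the nlp, sample_path and save_path from the reproduction script below); this single-process variant finishes and exits normally:
# Same pipeline, but run in a single process with the "simple" backend:
# no worker processes are spawned and the job exits once writing is done.
data = edsnlp.data.read_parquet(sample_path, converter="omop")
data = data.set_processing(backend="simple", show_progress=True)
data = data.map_pipeline(nlp)
edsnlp.data.write_parquet(data, save_path, converter="ents", overwrite=True)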
These are the warnings I get:
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
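For reference, a small helper that can list the Python worker processes still alive on the node once the main script has returned (purely illustrative, not part of EDS-NLP, and it assumes psutil is installed); these are the same processes that show up in htop and nvidia-smi:
# Illustrative only: list lingering Python processes owned by the current user.
import getpass
import psutil

me = getpass.getuser()
for proc in psutil.process_iter(["pid", "name", "username", "cmdline"]):
    if proc.info["username"] == me and "python" in (proc.info["name"] or ""):
        print(proc.info["pid"], " ".join(proc.info["cmdline"] or []))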
How to reproduce the bug
import edsnlp
import edsnlp.pipes as eds
from loguru import logger  # logger is used below; any logging setup works here

# Build the pipeline and add the rule-based components
nlp = edsnlp.load("charlson_qualifier")
nlp.add_pipe(eds.sections())
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.aids(), before="charlson_qualifier")
nlp.add_pipe(eds.negation())
nlp.add_pipe(eds.family())
nlp.add_pipe(eds.hypothesis())
nlp.add_pipe(eds.history())

sample_path = ""
save_path = ""

# Read the documents, run the pipeline with the multiprocessing backend,
# and write the entities back to parquet
data = edsnlp.data.read_parquet(sample_path, converter="omop")
data = data.set_processing(
    num_cpu_workers=2,
    num_gpu_workers=1,
    show_progress=True,
    process_start_method="spawn",
    backend="multiprocessing",
)
data = data.map_pipeline(nlp)
edsnlp.data.write_parquet(
    data,
    save_path,
    converter="ents",
    overwrite=True,
    write_in_worker=True,
)
logger.info("Saved!")
The sbatch script used to submit the job:
#!/bin/bash
#SBATCH --job-name="test"
#SBATCH -t 3:30:00
#SBATCH --gres=gpu:v100:1
#SBATCH -N1-1
#SBATCH -c2
#SBATCH --mem=40000
#SBATCH -p gpuV100
#SBATCH --container-image /scratch/images/sparkhadoop.sqsh
#SBATCH --container-mounts=/export/home/$USER:/export/home/$USER
#SBATCH --container-mount-home
#SBATCH --container-writable
#SBATCH --container-workdir=/
#SBATCH --output=../../logs/slurm_jobs/slurm-%j-stdout.log
#SBATCH --error=../../logs/slurm_jobs/slurm-%j-stderr.log
source $HOME/.user_conda/miniconda/etc/profile.d/conda.sh
/etc/start.sh
nvidia-smi
[PYTHON PATH] [.py SCRIPT PATH]
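As a stopgap, and purely as an assumption on my part rather than anything recommended by EDS-NLP, the job could be made to release its allocation by force-exiting the interpreter once the output is written, skipping the teardown step where the workers seem to hang:
import os

edsnlp.data.write_parquet(
    data,
    save_path,
    converter="ents",
    overwrite=True,
    write_in_worker=True,
)
logger.info("Saved!")
# Hypothetical workaround: skip interpreter teardown so the lingering
# workers cannot keep the SLURM allocation alive.
os._exit(0)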
Your Environment
- Operating System:
- Python Version Used: 3.17.12
- spaCy Version Used: 3.7.5
- EDS-NLP Version Used: 0.17.2