Your current environment
==============================
System Info
==============================
OS : Ubuntu 20.04.6 LTS (x86_64)
GCC version : (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version : Could not collect
CMake version : version 3.29.3
Libc version : glibc-2.31
==============================
PyTorch Info
==============================
PyTorch version : 2.6.0+cu124
Is debug build : False
CUDA used to build PyTorch : 12.4
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform : Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.31
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.1.105
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Nvidia driver version : 550.54.14
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz
Stepping: 6
Frequency boost: enabled
CPU MHz: 2601.000
CPU max MHz: 2601.0000
CPU min MHz: 800.0000
BogoMIPS: 5200.00
Virtualization: VT-x
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU4 NV8 NV8 NV8 NV8 X NV8 NV8 NV8 SYS SYS SYS SYS PXB PXB SYS SYS 32-63,96-127 1 N/A
GPU5 NV8 NV8 NV8 NV8 NV8 X NV8 NV8 SYS SYS SYS SYS PXB PXB SYS SYS 32-63,96-127 1 N/A
GPU6 NV8 NV8 NV8 NV8 NV8 NV8 X NV8 SYS SYS SYS SYS SYS SYS PXB PXB 32-63,96-127 1 N/A
GPU7 NV8 NV8 NV8 NV8 NV8 NV8 NV8 X SYS SYS SYS SYS SYS SYS PXB PXB 32-63,96-127 1 N/A
NIC0 PXB PXB SYS SYS SYS SYS SYS SYS X PIX SYS SYS SYS SYS SYS SYS
NIC1 PXB PXB SYS SYS SYS SYS SYS SYS PIX X SYS SYS SYS SYS SYS SYS
NIC2 SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS X PIX SYS SYS SYS SYS
NIC3 SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS PIX X SYS SYS SYS SYS
NIC4 SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS X PIX SYS SYS
NIC5 SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS PIX X SYS SYS
NIC6 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS X PIX
NIC7 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
==============================
Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.1 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526
NCCL_VERSION=2.17.1-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NVIDIA_PRODUCT_NAME=CUDA
NVIDIA_CUDA_END_OF_LIFE=1
CUDA_VERSION=12.1.0
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
I am trying to evaluate Qwen3-30B-A3B with vLLM, but the results are not reproducible even though I set os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0" and seed everything else:

import os
import random

import numpy as np
import torch

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # deterministic cuBLAS workspace
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable cuDNN autotuning
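One caveat (my assumption, not something I have verified for every variable): environment variables such as VLLM_ENABLE_V1_MULTIPROCESSING and CUBLAS_WORKSPACE_CONFIG are typically read when vllm/torch initialize, so a minimal sketch of the safe ordering would set them before those imports:

import os

# Assumption: these must be in the environment before `import torch` /
# `import vllm`, otherwise they may silently have no effect.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
# PYTHONHASHSEED only takes effect if set before the interpreter starts,
# e.g. in the Dockerfile or the launching shell, not via os.environ here.

import torch  # noqa: E402
import vllm   # noqa: E402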
The evaluation itself runs with:

from vllm import LLM, SamplingParams

self.llm = LLM(model=model_path, tensor_parallel_size=8, seed=42, enable_prefix_caching=False)
self.sampling_params = SamplingParams(
    temperature=0.0, max_tokens=max_gen_tokens, stop=self.stop,
    skip_special_tokens=False, spaces_between_special_tokens=False, seed=42)
# pick up the output for metric calculation
output = self.llm.generate(prompt_token_ids=context_encoding,
                           sampling_params=self.sampling_params, use_tqdm=False)
Every time I restart the Docker container, I get different results. I was wondering what else I can do to make the results identical across runs.
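To narrow this down, one check I can think of (a sketch reusing the call above; it assumes vLLM's RequestOutput objects, whose outputs[0].token_ids holds the generated ids) is to generate the same batch twice within one process:

# Run the identical request twice in the same process and compare token ids.
out_a = self.llm.generate(prompt_token_ids=context_encoding,
                          sampling_params=self.sampling_params, use_tqdm=False)
out_b = self.llm.generate(prompt_token_ids=context_encoding,
                          sampling_params=self.sampling_params, use_tqdm=False)
for a, b in zip(out_a, out_b):
    assert list(a.outputs[0].token_ids) == list(b.outputs[0].token_ids), \
        "nondeterministic within a single process"

If this passes but results still differ after a container restart, the divergence would come from process-level state rather than from sampling.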
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.