More detailed environment configuration information #1

Closed
TPF2017 opened this issue Jan 31, 2024 · 5 comments

Comments

TPF2017 commented Jan 31, 2024

Hello, I downloaded the code and installed the corresponding packages, but it still will not run directly. Could you provide more detailed environment configuration information, such as the Python version and CUDA version?
Traceback (most recent call last):
  File "/home/tianpengfei1/Time-LLM/run_main.py", line 132, in <module>
    model = TimeLLM.Model(args).float()
  File "/home/tianpengfei1/Time-LLM/models/TimeLLM.py", line 53, in __init__
    self.llama = LlamaModel.from_pretrained(
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2256, in from_pretrained
    quantization_config, kwargs = BitsAndBytesConfig.from_dict(
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/transformers/utils/quantization_config.py", line 189, in from_dict
    config = cls(**config_dict)
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/transformers/utils/quantization_config.py", line 118, in __init__
    self.post_init()
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/transformers/utils/quantization_config.py", line 144, in post_init
    if self.load_in_4bit and not version.parse(importlib.metadata.version("bitsandbytes")) >= version.parse(
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/importlib/metadata.py", line 569, in version
    return distribution(distribution_name).version
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/importlib/metadata.py", line 542, in distribution
    return Distribution.from_name(distribution_name)
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/importlib/metadata.py", line 196, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: bitsandbytes
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 9021) of binary: /home/tianpengfei1/anaconda3/envs/llmtime/bin/python
Traceback (most recent call last):
  File "/home/tianpengfei1/anaconda3/envs/llmtime/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/accelerate/commands/launch.py", line 932, in launch_command
    multi_gpu_launcher(args)
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/accelerate/commands/launch.py", line 627, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/tianpengfei1/anaconda3/envs/llmtime/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
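
Note: the traceback shows transformers resolving the bitsandbytes version while building a BitsAndBytesConfig, i.e. the model is being loaded with quantization enabled but bitsandbytes is not installed in that environment. A minimal pre-flight check along these lines (just a sketch using the same importlib.metadata lookup the traceback goes through; not part of the repository) confirms whether the package is present before launching:

# Sketch: check for bitsandbytes the same way transformers does (via importlib.metadata).
# If it is missing, either install it or disable quantized loading in models/TimeLLM.py.
from importlib import metadata

try:
    print("bitsandbytes version:", metadata.version("bitsandbytes"))
except metadata.PackageNotFoundError:
    print("bitsandbytes is not installed in this environment")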

m6129 commented Feb 3, 2024

Joining in. I also need more detailed documentation. I ran into problems installing the libraries from requirements.txt as well, though different ones.
I found that Python no higher than 3.9.x and a GPU are required, but there were still problems afterward.

akbism commented Feb 4, 2024

For me, installation of the Time-LLM package works when I select Python 3.8.
However, I get the following error when I try to execute bash ./scripts/TimeLLM_ETTh1.sh:

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

KimMeen (Owner) commented Feb 5, 2024

@TPF2017 and @m6129, could you try the following configuration and see whether it works in your local environments? (A quick version-check sketch follows the dependency lists below.)

  • Python=3.8.5
  • PyTorch=2.0.1
  • CUDA and pytorch-cuda=11.7
  • accelerate=0.21.0
  • transformers=4.29.2
  • deepspeed=0.10.0

Other dependencies:

  • numpy=1.24.3
  • pandas=1.5.3
  • scikit_learn=1.2.2
  • reformer_pytorch=1.4.4
  • tqdm=4.65.0
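
As a quick sanity check, the installed versions can be compared against the lists above. The sketch below is not part of the repository and assumes the standard pip distribution names (e.g. scikit-learn, reformer-pytorch):

# Sketch: compare locally installed package versions against the recommended list.
from importlib import metadata

expected = {
    "torch": "2.0.1",
    "accelerate": "0.21.0",
    "transformers": "4.29.2",
    "deepspeed": "0.10.0",
    "numpy": "1.24.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "reformer-pytorch": "1.4.4",
    "tqdm": "4.65.0",
}

for name, want in expected.items():
    try:
        have = metadata.version(name)
    except metadata.PackageNotFoundError:
        have = "not installed"
    marker = "" if have == want else "  <-- mismatch"
    print(f"{name}: expected {want}, found {have}{marker}")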

KimMeen (Owner) commented Feb 5, 2024

For me, installation of the Time-LLM package works when I select Python 3.8. However, I get the following error when I try to execute bash ./scripts/TimeLLM_ETTh1.sh:

RuntimeError: CUDA error: invalid device ordinal. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@akbism Could you first check whether num_process=8 is compatible with your local environment? This value should typically correspond to the number of GPUs utilized. Let me and @kwuking know if this issue persists.
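
An "invalid device ordinal" error typically means the launcher asked for a device index beyond what is visible on the machine. A minimal check (assuming the usual one-process-per-GPU setup with accelerate's multi-GPU launcher) is to compare num_process against what PyTorch can actually see:

# Sketch: report how many GPUs PyTorch can see; num_process in the launch
# script should not exceed this number (assuming one process per GPU).
import torch

print("CUDA available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())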

KimMeen (Owner) commented Feb 8, 2024

This issue has been closed as there were no further questions.
