Dataset loading error during inference #3731

Closed
1 task done
BinBrent opened this issue May 13, 2024 · 2 comments
Labels
solved This problem has been already solved.

Comments


BinBrent commented May 13, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

The run reports that a data file cannot be read, even though data samples have already been printed on the command line.

Running tokenizer on dataset (num_proc=8): 100%|██████████| 50/50 [00:00<00:00, 119.29 examples/s]
Running tokenizer on dataset (num_proc=8): 100%|██████████| 50/50 [00:00<00:00, 118.64 examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/***/.conda/envs/LLaMA/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/***/.conda/envs/LLaMA/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/***/.conda/envs/LLaMA/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3595, in _map_single
    os.chmod(cache_file_name, 0o666 & ~umask)
FileNotFoundError: [Errno 2] No such file or directory: '/home/***/.cache/huggingface/datasets/json/default-83535c0c955e8ee5/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7/cache-6939a289e8425e94_00000_of_00008.arrow'
"""
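
For reference, the last frame is a plain `os.chmod` on the shard's cache path, so it raises `FileNotFoundError` whenever that file is absent at that moment. The following is only a minimal sketch of that behavior: the helper name `chmod_like_last_frame` and the temporary path are hypothetical and stand in for the real cache file, and the way the umask is read here is an assumption, not a quote from the library.

```python
import os
import tempfile


def chmod_like_last_frame(cache_file_name: str) -> None:
    # Read the current umask without changing it (set a value, then restore).
    umask = os.umask(0o666)
    os.umask(umask)
    # Same expression as the last frame of the traceback.
    os.chmod(cache_file_name, 0o666 & ~umask)


raised = False
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical shard name modeled on the one in the report; never created.
    missing = os.path.join(tmp, "cache-example_00000_of_00008.arrow")
    try:
        chmod_like_last_frame(missing)
    except FileNotFoundError as err:
        raised = True
        print(type(err).__name__, err.filename)
```

The sketch shows the symptom only. It does not explain why the shard file was missing in the reported run.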

Expected behavior

input_ids:
[21586, 63, 684, 9226, 100, 14482, 100, 4440, 45, 28595, 731, 303, 1547, 303, 14991, 341, 10984, 7692, 451, 331, 1168, 451, 36822, 3966, 51, 465, 10485, 63, 303, 29113, 327, 687, 731, 418, 1168, 451, 29113, 451, 36822, 3966, 51, 465, 3777, 63, 303, 1916, 63, 906, 10984, 7692, 451, 341, 36822, 3966, 51, 303, 1547, 222, 35222, 63, 244]
inputs:
Human: def calculate_average_price(prices):
    """
    Calculate the average price of a list of fashion items.

    Args:
        prices (list): A list of prices of fashion items.

    Returns:
        float: The average price of the fashion items.
    """

Assistant:

System Info

transformers 4.40.2
torch 2.2.0+cu11.8
deepspeed 0.14.2
datasets 2.19.1
accelerate 0.30.1

Others

No response

BinBrent changed the title from "Dataset loading error during LLama-3-70B-instruct inference" to "Dataset loading error during inference" May 13, 2024
BinBrent (Author) commented

I used llama3_full_predict.sh from full_multi_gpu, with this config:

# model
#model_name_or_path: /home/***/models/Meta-Llama-3-70B-Instruct
model_name_or_path: /***/models/starcoder2-15b-instruct-v0.1
# method
stage: sft
do_predict: true
finetuning_type: full

# dataset
dataset: inference
template: default
cutoff_len: 4096
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 8

# output
output_dir: saves/starcoder-instruct/predict
overwrite_output_dir: true

#fp16: true
# eval
per_device_eval_batch_size: 1
#auto_find_batch_size: true
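
The `_00000_of_00008` suffix in the missing file name lines up with `preprocessing_num_workers: 8`, which suggests one cache shard per worker. Below is a rough sketch for listing which shards are absent from a cache directory. The suffix pattern is inferred from the file name in the traceback, and the helper names `shard_names` and `missing_shards` as well as the temporary directory are hypothetical.

```python
import os
import tempfile


def shard_names(fingerprint: str, num_proc: int) -> list[str]:
    # Pattern inferred from the traceback: cache-<fingerprint>_<rank>_of_<num_proc>.arrow
    return [
        f"cache-{fingerprint}_{rank:05d}_of_{num_proc:05d}.arrow"
        for rank in range(num_proc)
    ]


def missing_shards(cache_dir: str, fingerprint: str, num_proc: int) -> list[str]:
    # Return the expected shard names that do not exist in cache_dir.
    return [
        name
        for name in shard_names(fingerprint, num_proc)
        if not os.path.exists(os.path.join(cache_dir, name))
    ]


with tempfile.TemporaryDirectory() as tmp:
    names = shard_names("6939a289e8425e94", 8)
    # Simulate the report: every shard except rank 0 is present.
    for name in names[1:]:
        open(os.path.join(tmp, name), "wb").close()
    absent = missing_shards(tmp, "6939a289e8425e94", 8)
    print(absent)
```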

hiyouga (Owner) commented May 24, 2024

Delete the cache folder.
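
A minimal sketch of that cleanup, assuming the stale files live under the dataset cache directory shown in the traceback (`~/.cache/huggingface/datasets/json`). The helper name `remove_cache_dir` is illustrative, and the demo runs against a temporary directory rather than the real cache.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cache_dir(cache_dir: Path) -> bool:
    """Delete cache_dir and everything under it; return True if it existed."""
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real cache location; the file name is hypothetical.
    fake_cache = Path(tmp) / "datasets" / "json"
    fake_cache.mkdir(parents=True)
    (fake_cache / "cache-example_00000_of_00008.arrow").write_bytes(b"")

    removed = remove_cache_dir(fake_cache)
    still_there = fake_cache.exists()
    removed_again = remove_cache_dir(fake_cache)
```

For the real cache, the call would presumably be `remove_cache_dir(Path.home() / ".cache" / "huggingface" / "datasets" / "json")`, after which the next run has to tokenize the dataset again from scratch.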

hiyouga added the "solved" label May 24, 2024
hiyouga closed this as completed May 24, 2024