Description
weclone-cli make-dataset
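[WeClone] W | 06:15:05 | Warning: your settings.jsonc file version (0.2.23) does not match the configuration version recommended by the project (0.2.22).
[WeClone] W | 06:15:05 | This may cause unexpected behavior or errors. Please copy from settings.template.json or update your settings.jsonc file.
[WeClone] W | 06:15:05 | Configuration file changelog:
[0.2.2] - 2025-05-01 - Added LLM data-cleaning configuration; blocked_words moved into the unified settings.jsonc configuration file.
[0.2.21] - 2025-05-01 - Added online LLM data-cleaning configuration, compatible with OpenAI-style APIs.
[0.2.22] - 2025-06-05 - Added support for fine-tuning on image-modality chat records.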
[WeClone] I | 06:15:05 | Loading configuration from: ./settings.jsonc
[WeClone] I | 06:15:06 | Loading configuration from: ./settings.jsonc
[WeClone] I | 06:15:06 | Found 2 CSV files in total; starting processing, please be patient...
Traceback (most recent call last):
File "pandas/_libs/parsers.pyx", line 1120, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1272, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1285, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1535, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 2: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/autodl-tmp/WeClone/.venv/bin/weclone-cli", line 10, in <module>
sys.exit(cli())
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/click/core.py", line 1830, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
File "/root/autodl-tmp/WeClone/weclone/cli.py", line 33, in wrapper
return func(*args, **kwargs)
File "/root/autodl-tmp/WeClone/weclone/cli.py", line 51, in new_runtime_wrapper
return original_cmd_func(*args, **kwargs)
File "/root/autodl-tmp/WeClone/weclone/cli.py", line 76, in qa_generator
processor.main()
File "/root/autodl-tmp/WeClone/weclone/data/qa_generator.py", line 177, in main
chat_messages = self.load_csv(csv_file)
File "/root/autodl-tmp/WeClone/weclone/data/qa_generator.py", line 569, in load_csv
df = pd.read_csv(
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 626, in _read
return parser.read(nrows)
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1923, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1066, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1127, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1272, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1285, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1535, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 2: invalid continuation byte
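The traceback shows `pd.read_csv` in `load_csv` failing while decoding the file as UTF-8. A byte such as 0xd5 where a continuation byte is expected is consistent with a CSV exported in GBK/GB18030, although the log alone does not confirm that. As a minimal sketch under that assumption, the hypothetical helper below (`guess_encoding` is not part of WeClone or pandas) tries a short list of candidate encodings against the raw bytes and returns the first one that decodes cleanly:

```python
def guess_encoding(raw: bytes, candidates=("utf-8-sig", "gb18030")) -> str:
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper for illustration only. The candidate list is an
    assumption: "utf-8-sig" accepts UTF-8 with or without a BOM, and
    "gb18030" is a superset of GBK/GB2312.
    """
    for encoding in candidates:
        try:
            raw.decode(encoding)  # strict decoding; raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError(f"none of {candidates!r} could decode the data")


# Example: the same text encoded two ways is detected differently.
sample = "张三,你好"
print(guess_encoding(sample.encode("utf-8")))  # utf-8-sig
print(guess_encoding(sample.encode("gbk")))    # gb18030
```

With the file's bytes read via `pathlib.Path(csv_file).read_bytes()`, the result could be passed as `pd.read_csv(csv_file, encoding=...)`. A quicker workaround is to re-save the two CSV files as UTF-8 before running `weclone-cli make-dataset`. Note that a successful decode only shows the bytes are valid in that encoding, not that it is the right one, so the loaded text is worth a visual check.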