
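Data processing: all operators should be unaffected by the meta folder. Converting doc to md, then the md branch to json, then transforming the json data to replace phone numbers fails with an error at the phone-number replacement step #98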

@zanguixuan3

Description


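Convert doc to md, convert the md branch to json, then replace phone numbers in the json. The result record written by the md-to-json conversion interferes with the conversions run by subsequent operators.
The customer wants a smooth, seamless workflow and does not want to delete the meta folder manually.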


The error is as follows:
`2025-12-05 01:46:23 | info | Using work dir: /data/dataflow/test66_306ed6f8-e908-4fca-8ef9-269a74140c78/output
2025-12-05 01:46:23 | info | Using user_id=50, user_name=longrui, user_token=xxxxxx
2025-12-05 01:46:23 | info | Setting up data ingester...
2025-12-05 01:46:23 | info | Preparing exporter...
2025-12-05 01:46:23 | info | Preparing tracer...
2025-12-05 01:46:23 | info | Trace for all ops.
2025-12-05 01:46:24 | info | Data ingested from /data/dataflow/test66_306ed6f8-e908-4fca-8ef9-269a74140c78/input
2025-12-05 01:46:24 | info | Setting up data formatter...
2025-12-05 01:46:24 | info | Loading dataset from data formatter...
2025-12-05 01:46:25 | error | 306ed6f8-e908-4fca-8ef9-269a74140c78 Error occurred during pipeline execution: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 9 new columns (source_branch, statistics, result, files, target_repo, tool_name, job_name, source_repo, parameters) and 1 missing columns (text).

This happened while the json dataset builder was generating data using

/data/dataflow/test66_306ed6f8-e908-4fca-8ef9-269a74140c78/input/meta/meta.json

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)
2025-12-05 01:46:25 | error | 306ed6f8-e908-4fca-8ef9-269a74140c78 Error occurred while executing the task: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 9 new columns (source_branch, statistics, result, files, target_repo, tool_name, job_name, source_repo, parameters) and 1 missing columns (text).

This happened while the json dataset builder was generating data using

/data/dataflow/test66_306ed6f8-e908-4fca-8ef9-269a74140c78/input/meta/meta.json

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)`
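
The log suggests the json dataset builder scans the whole input directory and picks up `input/meta/meta.json`, whose columns differ from the data files. One possible direction, sketched below, is to build the list of data files explicitly and skip the meta folder before loading. This is only an illustration: `collect_data_files`, `EXCLUDED_DIRS`, and `DATA_SUFFIXES` are hypothetical names, and how the pipeline actually discovers its input files is an assumption.

```python
from pathlib import Path

# Hypothetical names for illustration; not part of the project's code.
EXCLUDED_DIRS = {"meta"}
DATA_SUFFIXES = {".json", ".jsonl"}


def collect_data_files(input_dir, excluded_dirs=EXCLUDED_DIRS, suffixes=DATA_SUFFIXES):
    """Return data files under input_dir, skipping any excluded directory."""
    root = Path(input_dir)
    files = []
    for path in sorted(root.rglob("*")):
        # Keep only regular files with a data suffix.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Directory components of the path, relative to the input root.
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in excluded_dirs for part in parent_parts):
            continue
        files.append(str(path))
    return files
```

If the formatter relies on Hugging Face `datasets`, the resulting list could be passed as `data_files` to `load_dataset("json", data_files=...)`, so that `meta/meta.json` is never read as a data file.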

Labels: P0, bug
