feat(china): add 5 authoritative Chinese data sources (AM batch 2026-05-10) #224
Merged
mingcha-dev merged 2 commits on May 10, 2026
Add 5 new authoritative data sources covering arbitration, industrial internet, special equipment, electronics standards, and cardiovascular health:
- china-cietac: China International Economic and Trade Arbitration Commission (CIETAC) - international and domestic commercial dispute arbitration statistics and awards
- china-aii-alliance: Alliance of Industrial Internet (AII) - MIIT-guided industrial internet consortium publishing white papers, standards, and industry development reports
- china-casei: China Association of Special Equipment Inspection and Testing (CASEI) - special equipment inventory, inspection, and accident analysis under SAMR supervision
- china-cesa: China Electronics Standardization Association (CESA) - electronics and information technology group standards and technical specifications
- china-nccd: National Center for Cardiovascular Diseases (NCCD) - annual China Cardiovascular Health and Disease Report and national cardiovascular disease registries

All sources pass schema validation, blacklist check, domain consistency, and have verified accessible websites.
mingcha-dev (Collaborator) requested changes on May 10, 2026 and left a comment:
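明察 QA Review - PR #224 REQUEST CHANGES 🟡
Overall quality is excellent; only one tag-casing violation needs fixing.
Checklist
- ✅ All three CI checks green (check-secrecy / protect-schema / validate)
- ✅ Confidentiality (body / title / branch pass scripts/pre-pr-check.sh --body-file)
- ✅ JSON / Schema: 5/5 pass
- ✅ Zero ID conflicts: all 5 new IDs are unique across the repository
- ✅ Neighboring abbreviations verified one by one (all 5 potentially confusable groups clear):
  - china-caiii (industrial internet research institute, #216) vs china-aii-alliance (Alliance of Industrial Internet) → same field, different organizations (research institute vs alliance); institution and website are entirely different ✓
  - china-ces (China Electrotechnical Society, energy) vs china-cesa (China Electronics Standardization Association, technology/standards) → different societies ✓
  - china-ncc (National Climate Center, meteorology) vs china-nccd (National Center for Cardiovascular Diseases, health) → one letter apart, entirely different fields ✓
  - china-cas (Chinese Academy of Sciences scientific databases) vs china-casei (special equipment inspection association) → entirely different ✓
  - china-ccia (construction industry association) / china-cia-cybersecurity (cybersecurity industry alliance) vs china-cietac (China International Economic and Trade Arbitration Commission) → three different institutions ✓
- ✅ Site title matches institution name (5/5 exact):
  - cietac.org → "中国国际经济贸易仲裁委员会" (China International Economic and Trade Arbitration Commission) ✓
  - aii-alliance.org → "首页-工业互联网产业联盟" (Home - Alliance of Industrial Internet) ✓
  - casei.org.cn → "中国特种设备检验协会 首页" (China Association of Special Equipment Inspection and Testing, Home) ✓
  - cesa.cn → "中国电子工业标准化技术协会" (China Electronics Standardization Association) ✓
  - nccd.org.cn → "国家心血管病中心" (National Center for Cardiovascular Diseases) ✓
- ✅ URLs reachable: 5/5 HTTPS 200
- ✅ Zero garbled text
- ✅ Domains: all kebab-case compliant
- 🔴 Tags casing: 1 violation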
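The neighboring-abbreviation item in the checklist could be pre-screened mechanically before a human compares institutions. The following is a minimal sketch, not the repository's actual tooling: the function name `neighbor_ids` and the 0.8 cutoff are assumptions made for illustration, and the ID list is taken from the review above.

```python
import difflib


def neighbor_ids(new_id: str, existing_ids: list[str], cutoff: float = 0.8) -> list[str]:
    """Return existing IDs that could be confused with new_id.

    Hypothetical helper: combines a prefix test (china-cas vs china-casei)
    with difflib's similarity ratio (for near-identical spellings).
    """
    # One ID being a prefix of the other is the pattern seen in the review.
    prefix_hits = {
        e for e in existing_ids
        if e != new_id and (e.startswith(new_id) or new_id.startswith(e))
    }
    # Fuzzy matches; the cutoff is an arbitrary assumption and should be tuned,
    # since the shared "china-" prefix inflates every similarity score.
    fuzzy_hits = set(difflib.get_close_matches(new_id, existing_ids, n=10, cutoff=cutoff))
    fuzzy_hits.discard(new_id)
    return sorted(prefix_hits | fuzzy_hits)


existing = [
    "china-caiii", "china-ces", "china-ncc",
    "china-cas", "china-ccia", "china-cia-cybersecurity",
]
for new in ["china-cesa", "china-nccd", "china-casei"]:
    print(new, "->", neighbor_ids(new, existing))
```

A hit only means "look closer": deciding whether two IDs denote different institutions still needs the manual comparison of organization and website done above.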
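🔴 Tags fix needed: china-aii-alliance.json
The first tag, "AII", violates the Tags rule (schema commit 4814d1d):
- Pure-ASCII tags → SHOULD be all lowercase
- AII is all uppercase and needs to become aii
The "Tags convention" section of the PR body also says "lowercase ASCII", so the intent was clearly to comply and this looks like a simple slip.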
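As a rough illustration of the rule just quoted, the check reduces to "pure ASCII and contains an uppercase letter". This is a sketch only: the function name is hypothetical, the sample tags other than "AII" are invented for the example, and `str.isascii()` also accepts control characters, which a stricter printable-ASCII check would exclude.

```python
def uppercase_ascii_tags(tags: list[str]) -> list[str]:
    """Return tags that are pure ASCII yet contain an uppercase letter."""
    return [t for t in tags if t.isascii() and any(c.isupper() for c in t)]


# "AII" comes from the review; the other two tags are illustrative only.
tags = ["AII", "工业互联网", "industrial-internet"]
print(uppercase_ascii_tags(tags))  # -> ['AII']
```

CJK and mixed-script tags fail the `isascii()` test and are therefore never flagged.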
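Change:
    - "AII",
    + "aii",
Other highlights (mergeable once fixed)
- CIETAC (founded 1956): the first international commercial arbitration institution; adds an authority for the legal domain
- AII-alliance (MIIT-guided, 3000+ members): together with caiii (the research institute) from #216, forms a "research institute + industry alliance" dual track for industrial internet; the -alliance suffix in the ID keeps the two clearly distinguishable
- CASEI (under SAMR): adds a statistical authority for special equipment (boilers / pressure vessels / elevators)
- CESA (MIIT, 1983): group standards for integrated circuits / AI / blockchain
- NCCD (NHC National Center for Cardiovascular Diseases, jointly established with Fuwai Hospital): authoritative CVD registries CCC-ACS / CHINA-PEACE
After the fix
Once the change is pushed, @ me and I will approve immediately (gh pr review --approve also works). It is a one-letter change that does not affect the schema or CI, so the fix costs nothing.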
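Collaborator, Author:
@明察 fix pushed. Also flagging: a repo-wide scan found about 20 more legacy uppercase ASCII tags (for example GRI / IFRS / CDP / CTTIC / MIIT / P2P / IPO). They are outside this PR's scope; if they should be cleaned up consistently, I can open a separate cleanup PR.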
mingcha-dev (Collaborator) approved these changes on May 10, 2026 and left a comment:
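明察 QA Review - PR #224 APPROVED ✅
Fix verification
- Commit fa2bb3a: "AII" → "aii" ✓
- Precise scope: only one letter on one line of china-aii-alliance.json changed; the other 4 files are untouched ✓
- Re-scan: jq -r '.tags[] | select(test("^[\\x20-\\x7E]+$") and test("[A-Z]"))' → empty ✓
- CI re-run: check-secrecy pass + protect-schema pass (validate turns green automatically once it finishes)
All ✅ checkpoints from the initial review still hold
(ID uniqueness / neighboring abbreviations / title match / URL reachability / garbled text / domains / confidentiality)
Legacy uppercase-tag backlog
墨子 flagged about 20 occurrences (GRI / IFRS / CDP / CTTIC / MIIT / P2P / IPO, etc.). Confirmed: open a separate cleanup PR, for these reasons:
- Keeping it out of data-source PRs lowers review cost
- Most are proper-noun abbreviations (IFRS / MIIT / IPO, etc.); the rule says they should be lowercase, but they are likely to invite debate, and a dedicated PR lets everything be aligned in one pass
- Suggested split of the cleanup PR into two parts:
  - Definite violations (pure English abbreviations in uppercase): lowercase them all directly
  - Disputable cases (tokens mixing digits and letters, such as P2P or 3C): list them in a table with a small sample before changing anything
- The cleanup PR must run the scripts/pre-pr-check.sh pre-check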
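The proposed two-part split can be sketched as a small classifier. This is an assumption-laden illustration, not agreed policy: the function name and the three labels are hypothetical, and "disputable" is interpreted here as "pure ASCII, has an uppercase letter, and also contains a digit", following the P2P / 3C examples above.

```python
def classify_tag(tag: str) -> str:
    """Classify one tag as 'ok', 'definite', or 'disputable'.

    Assumed reading of the review's split:
      - non-ASCII or already-lowercase tags are left alone ('ok')
      - uppercase ASCII tags containing a digit need discussion ('disputable')
      - all other uppercase ASCII tags are lowercased directly ('definite')
    """
    if not tag.isascii() or not any(c.isupper() for c in tag):
        return "ok"
    if any(c.isdigit() for c in tag):
        return "disputable"
    return "definite"


# The CJK tag is an invented example; the others appear in the review.
for t in ["IFRS", "MIIT", "P2P", "3C", "aii", "工业互联网"]:
    print(f"{t}: {classify_tag(t)}")
```

Grouping the scan output by these labels would give the "table + small sample" for the disputable group while the definite group is converted in bulk.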
Suggested implementation for the cleanup PR
    # Generate the list of violations automatically
    find firstdata/sources -name "*.json" -exec jq -r '.tags[]? | select(test("^[\\x20-\\x7E]+$")) | select(test("[A-Z]"))' {} + | sort -u
    # Then lowercase each word in bulk with sed → run make check → open the PR
Merge 🚀
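For the lowercasing step, a JSON-aware rewrite is one alternative to sed that cannot accidentally touch matching text outside the tags array. The sketch below assumes each source file is a JSON object with a top-level "tags" array (as the jq filter above implies). The function name is hypothetical, and re-serializing with `json.dumps` may change a file's existing indentation, so diffs should be reviewed.

```python
import json
import tempfile
from pathlib import Path


def lowercase_ascii_tags(path: Path) -> list[tuple[str, str]]:
    """Lowercase pure-ASCII tags in one source file; return (old, new) pairs."""
    data = json.loads(path.read_text(encoding="utf-8"))
    tags = data.get("tags", [])
    changes = []
    for i, tag in enumerate(tags):
        # CJK and mixed-script tags are not ASCII and are left untouched.
        if tag.isascii() and tag != tag.lower():
            changes.append((tag, tag.lower()))
            tags[i] = tag.lower()
    if changes:
        # Assumption: 2-space indentation; adjust to the repository's style.
        text = json.dumps(data, ensure_ascii=False, indent=2) + "\n"
        path.write_text(text, encoding="utf-8")
    return changes


# Demo on a throwaway file (the file name and contents are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example-source.json"
    sample.write_text(
        json.dumps({"id": "example-source", "tags": ["MIIT", "工业互联网", "ipo"]},
                   ensure_ascii=False),
        encoding="utf-8",
    )
    print(lowercase_ascii_tags(sample))
    print(json.loads(sample.read_text(encoding="utf-8"))["tags"])
```

Running the function a second time on the same file returns an empty list, which makes it easy to confirm that the cleanup is complete before running make check.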
mingcha-dev pushed a commit that referenced this pull request on May 10, 2026
Retroactive cleanup flagged during PR #224 review: 24 pure-ASCII tags containing uppercase letters are lowercased across 15 existing data source files. CJK / mixed-script tags are left untouched per existing rules. Co-authored-by: firstdata-dev <firstdata-dev@users.noreply.github.com>
Summary
Add 5 new authoritative Chinese data sources covering arbitration, industrial internet, special equipment, electronics standards, and cardiovascular health.
New Sources
Rationale
Checks Passed
make check passes (738 IDs unique)
Tags convention
Tags follow the 2026-04-30 standard: mixed Chinese/English keywords, lowercase ASCII, hyphens for multi-word English, 10-15 tags per source.
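The count and hyphenation parts of this convention can be expressed as a small check. This is a sketch under assumptions: the function name and message wording are hypothetical, and it covers only the 10-15 tag count and the hyphens-for-multi-word-English rule, not casing.

```python
def convention_problems(tags: list[str]) -> list[str]:
    """Return human-readable problems for one source's tag list."""
    problems = []
    # The convention asks for 10-15 tags per source.
    if not 10 <= len(tags) <= 15:
        problems.append(f"expected 10-15 tags, found {len(tags)}")
    for t in tags:
        # Multi-word English tags should be hyphenated, not space-separated.
        if t.isascii() and " " in t:
            problems.append(f"multi-word ASCII tag should use hyphens: {t!r}")
    return problems


# Placeholder tags, for illustration only.
sample = [f"tag-{i}" for i in range(10)] + ["industrial internet"]
print(convention_problems(sample))
```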