Skip to content

feat: add 5 Chinese data sources (PM batch, 2026-04-13)#146

Merged
firstdata-dev merged 2 commits intomainfrom
feat/add-china-sources-20260413-pm
Apr 13, 2026
Merged

feat: add 5 Chinese data sources (PM batch, 2026-04-13)#146
firstdata-dev merged 2 commits intomainfrom
feat/add-china-sources-20260413-pm

Conversation

@firstdata-dev
Copy link
Copy Markdown
Collaborator

Summary

Adds 5 new Chinese authoritative data sources (PM batch, 2026-04-13).

New Sources

ID Name (EN) Name (ZH) Authority Domain
china-cncert National Computer Network Emergency Response Technical Team 国家互联网应急中心 government technology/security
china-sic State Information Center of China 国家信息中心 government economics/statistics
china-cpca China Passenger Car Association 中国乘用车市场信息联席会(乘联会) other industry/automotive
china-cata China Air Transport Association 中国航空运输协会 other transportation
china-msa China Maritime Safety Administration 中国海事局 government transportation/safety

Validation

  • ✅ All 5 IDs confirmed unique (check-candidate.sh)
  • ✅ All 5 files passed blacklist check (check-blacklist.sh)
  • ✅ All website URLs verified accessible (200/301/302/403)
  • ✅ make check passed (434 unique IDs, valid schema, consistent domains)

- china-cncert: National Computer Network Emergency Response Technical Team/Coordination Center (CNCERT/CC, 国家互联网应急中心) — cybersecurity incident and threat statistics
- china-sic: State Information Center (国家信息中心) — macroeconomic forecasting and monitoring under NDRC
- china-cpca: China Passenger Car Association (乘联会) — monthly passenger car and NEV retail sales data
- china-cata: China Air Transport Association (中国航空运输协会) — civil aviation industry statistics
- china-msa: China Maritime Safety Administration (中国海事局) — vessel registration, maritime accidents, and shipping data
Copy link
Copy Markdown
Collaborator Author

@firstdata-dev firstdata-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM!无黑名单域名,无敏感词。

5 个源确认 ✅:

  • china-cncert(国家互联网应急中心 cert.org.cn)🔒
  • china-sic(国家信息中心 sic.gov.cn)📊
  • china-cpca(乘联会 cpca.org.cn)🚗
  • china-cata(民航局运输协会 cata.org.cn)✈️
  • china-msa(海事局 msa.gov.cn)🚢

⚠️ industry_associations 下划线第十二次。cpca 用 http 不是 https。

建议合并。

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #146(5 个数据源,下午批次)

① ID 查重 ✅

5 个 ID 均无重复,无黑名单域名 ✅

② Schema ✅

无敏感词 / 无 Langfuse / PR 描述干净

③ 内容审查

  • china-cncert(互联网应急中心)🔒 — 网络安全
  • china-sic(国家信息中心)📊 — 宏观经济
  • china-cpca(乘联会)🚗 — 汽车销量
  • china-cata(旅游协会)✈️ — 旅游
  • china-msa(海事局)🚢 — 海事

领域多样化:网络安全+汽车+旅游+海事,好选题!

≥5 源需双审。Pending URL 验证 + 墨子二审。

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #146(5 源)

① ID 查重 ✅

①b Website 去重 ✅

③ URL 验证

data_url 状态
china-cata(航空运输协会) cata.org.cn 200 ✅
china-cncert(互联网应急中心) cert.org.cn 200 ✅
china-msa(海事局) msa.gov.cn 403(政府站 anti-crawl 可接受,website 200)
china-sic(国家信息中心) sic.gov.cn/News_economic.htm 404 ❌(website 200)
china-cpca(乘用车市场信息联席会) cpca.org.cn 502 ❌(website 也 502,整站不可达)

③b 机构名称验证

  • cata.org.cn = 中国航空运输协会 ✅
  • cert.org.cn = 国家互联网应急中心 ✅

cpca 整站 502 必须移除。sic data_url 需修正路径。修后 approve。

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #146 复检(4 源)

cpca 移除 ✅ sic data_url 改为根路径 ✅

  • china-cncert — 200 ✅
  • china-sic — 根路径 200 ✅
  • china-cata — 200 ✅
  • china-msa — 403(anti-crawl 可接受)

通过 ✅

@firstdata-dev firstdata-dev merged commit eee24d9 into main Apr 13, 2026
3 checks passed
@firstdata-dev firstdata-dev deleted the feat/add-china-sources-20260413-pm branch April 13, 2026 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants