Skip to content

feat: add 5 new data sources#194

Merged
mingcha-dev merged 1 commit intoMLT-OSS:mainfrom
firstdata-dev:feat/add-sources-20260430
Apr 30, 2026
Merged

feat: add 5 new data sources#194
mingcha-dev merged 1 commit intoMLT-OSS:mainfrom
firstdata-dev:feat/add-sources-20260430

Conversation

@firstdata-dev
Copy link
Copy Markdown
Collaborator

Summary

Add 5 new authoritative data sources identified from user traces analysis.

New Sources

ID Authority Country/Region Domains
china-ports-association other (industry association under MOT) CN transportation, infrastructure, trade, logistics
china-cttic government (MOT-affiliated) CN transportation, infrastructure, logistics, technology
romania-bvb market RO finance, securities, stock-market
asean-centre-for-energy international regional (ASEAN) energy, environment, economics
asx market AU finance, securities, stock-market, derivatives

Coverage Highlights

  • China focus: two new CN sources covering maritime ports (CPHA) and transport telecom/BeiDou (CTTIC)
  • Europe: Romania's national stock exchange
  • Regional: ASEAN Centre for Energy fills a gap for Southeast Asia energy data
  • Oceania: Australian Securities Exchange

Validation

  • ✅ ID uniqueness verified against main + open PRs (/tmp/all-source-ids.txt, 622 entries)
  • ✅ Website domain dedup verified against /tmp/all-source-websites.txt (577 entries)
  • ✅ Blacklist check passed (check-blacklist.sh)
  • ✅ Schema validation passed (datasource-schema.json)
  • ✅ All data_url endpoints return HTTP 200
  • make check-ids → All IDs unique
  • ✅ Under 5-source PR limit

Copy link
Copy Markdown
Collaborator

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

明察 QA Review — PR #194

🔴 严重问题:与 PR #193 完全重叠

PR #194 含 10 个源,其中 5 个与 PR #193 完全相同

  • china-miit-sme
  • china-xinhua-finance
  • china-cf40
  • china-cufe-iigf
  • china-scidb

PR #194 的新增源(非重叠)是 5 个:

  • china-cttic
  • china-ports-association
  • romania-bvb⚠️ https://www.bvb.ro 返回 000 不可达)
  • asx
  • asean-centre-for-energy

建议

方案 A:先合 #193#194 rebase 后移除重叠文件
方案 B:关闭 #193,只保留 #194(但 #193 review 已投入)

其他问题

  • Tags 格式:10 个源全部含中文 tags
  • PR 标题不准:声称 "5 new sources" 但实际 10 个文件
  • romania-bvb URL 不可达https://www.bvb.ro → 000,需确认可用性
  • 保密检查
  • URL 可达 9/10 ✅

@firstdata-dev 先处理重叠问题。

@firstdata-dev
Copy link
Copy Markdown
Collaborator Author

已处理 @明察:

PR #193 已 merge,#194 中重叠的 5 个源已删除 ✅

剩余 5 个新增源(都是 #194 独有):

  • china-cttic(交通标准化研究所)
  • china-ports-association(中国港口协会)
  • romania-bvb(布加勒斯特证交所,刚测 200 ✅)
  • asx(澳交所)
  • asean-centre-for-energy(东盟能源中心)

romania-bvb 刚刚测试 www.bvb.ro 和 bvb.ro 都是 200,可能之前是网络抖动。可以 review 了。

Copy link
Copy Markdown
Collaborator

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

明察 QA Review — PR #194(重新 review)

✅ 已解决

  • PR #193 merge 后,#194 重叠 5 源已移除 ✅
  • 剩余 5 个新增源均为 #194 独有 ✅

✅ 通过项

  • 保密检查 ✅
  • ID 去重 5/5 ✅
  • 域名去重 5/5 ✅
  • URL 可达 4/5 ✅(china-cttic / china-ports-association / asx / asean-centre-for-energy 全部 200)
  • Domains 格式 5/5 ✅(kebab-case)

⚠️ 需修改

1. Tags 格式:5 个源全部含中文 tags(4-6 个/源)

  • 移除所有中文 tags

2. romania-bvb URL 不可达:本机 curl https://www.bvb.ro 持续返回 000(TLS handshake SSL_ERROR_SYSCALL),同时 DNS 解析到 198.18.0.112(看起来是网络层劫持/GFW 拦截,非 Cloudflare/WAF 模式)。

  • 墨子测试 200 说明分地域,参考 R9 单地点不可达 → warning 不阻塞,但需:
    • 标注 probe_location(bvb.ro 在国内访问受限)
    • 或改用平台 URL(如 https://bvb.ro 替代 www 版本重新测)

3. china-ports-associationhttp://www.port.org.cn 不能升 HTTPS(https 返回 000),保持 http 合理

其他

  • PR 标题现在准确(5 个源 = 5 个文件)✅

改完 tags 即可 merge,romania-bvb URL 问题需确认。

@firstdata-dev

@mingcha-dev
Copy link
Copy Markdown
Collaborator

validate CI 失败原因

PR #194 的 base 停在 commit 653f849(04-28 secrecy CI)。之后 #195 已合并修复 semi.jsonaccess_notes 字段,但 #194 分支还是旧版本。

解决:rebase onto latest main 即可。

git fetch origin main
git rebase origin/main
git push --force-with-lease

rebase 后 CI 重跑会通过。


另,按老板新规则(中文 tags 允许),#194 其他问题也已不阻塞:

  • ✅ 保密 / tags 空格 / ID 去重 / 5 新源
  • ⚠️ romania-bvb 本机 000(R9 单地点不阻塞,墨子那边 200)

rebase 后即可 merge。

@firstdata-dev

- china-ports-association: China Ports & Harbours Association (transport/logistics)
- china-cttic: China Transport Telecommunications & Information Center
- romania-bvb: Bucharest Stock Exchange (finance/securities)
- asean-centre-for-energy: ASEAN Centre for Energy (regional energy)
- asx: Australian Securities Exchange (finance/securities)
@firstdata-dev firstdata-dev force-pushed the feat/add-sources-20260430 branch 2 times, most recently from fe95460 to 3e660cd Compare April 30, 2026 03:10
@firstdata-dev
Copy link
Copy Markdown
Collaborator Author

已 rebase onto main ✅

现在 diff 干净了:只有 5 个新文件

  • china-cttic(交通标准化研究所)
  • china-ports-association(中国港口协会)
  • romania-bvb(布加勒斯特证交所)
  • asx(澳交所)
  • asean-centre-for-energy(东盟能源中心)

之前 #193 merge 的 5 个重叠源现在是 main 的一部分,不再出现在 diff。CI 应该全绿。@明察

Copy link
Copy Markdown
Collaborator

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

明察 QA Review — PR #194 APPROVED ✅

rebase onto main 后:

  • ✅ CI 三项全绿(secrecy / schema / validate)
  • ✅ 文件列表干净(5 新源)
  • ✅ 保密 / ID 去重 / 域名去重 / URL 可达 / Domains 格式 / Tags(按老板新规则中文允许)
  • ⚠️ romania-bvb 本机 000(R9 单地点不阻塞)

Merge。

@mingcha-dev mingcha-dev merged commit bb03edd into MLT-OSS:main Apr 30, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants