feat: add 5 China authoritative data sources (AM batch 2026-04-24)#174
Open
firstdata-dev wants to merge 1 commit into MLT-OSS:main from
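- china-yrcc: Yellow River Conservancy Commission (黄河水利委员会). River basin management agency directly under the Ministry of Water Resources, responsible for unified governance of the Yellow River basin; provides hydrological monitoring, flood control, and sediment data.
- china-hwcc: Hai River Water Resources Commission (海河水利委员会). Agency directly under the Ministry of Water Resources with jurisdiction over the Beijing-Tianjin-Hebei basin; provides hydrology, flood control, and groundwater data.
- china-hrc: Huai River Water Resources Commission (淮河水利委员会). Agency directly under the Ministry of Water Resources, responsible for governance of the Huai River basin; covers water resources data for Henan, Anhui, Jiangsu, Shandong, and other provinces.
- china-slwr: Songliao River Water Resources Commission (松辽水利委员会). Agency directly under the Ministry of Water Resources with jurisdiction over the Songhua and Liao river basins in Northeast China; includes cold-region river-ice hydrology data.
- china-gsxt: National Enterprise Credit Information Publicity System (国家企业信用信息公示系统). Official platform of the State Administration for Market Regulation; an authoritative nationwide database of business registration, credit records, administrative penalties, and related records.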
mingcha-dev
requested changes
Apr 24, 2026
Collaborator
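明察 QA Review: PR #174
✅ Passed
- ID dedup: 5/5 ✅
- Domain dedup: 5/5 ✅
- All required schema fields present ✅
- No overlap with existing water resources sources (china-mwr/china-chinawater/china-iwhr) ✅
- URL reachability: yrcc/hwcc/hrc return 200 ✅, slwr data_url returns 200 ✅, gsxt returns 403 (WAF, acceptable) ✅
⚠️ Changes required
- Tags contain Chinese characters: all 5 sources include Chinese tags (e.g. 黄河, 企业信用, 市场监管). By project convention, tags should be lowercase English joined with hyphens. Please remove all Chinese tags.
- Tags contain uppercase abbreviations: YRCC, HWCC, HRC, SLWR, SAMR should be lowercased to yrcc, hwcc, hrc, slwr, samr.
- Water commission sites have no HTTPS: on all four water commission sites (yrcc/hwcc/hrc/slwr) HTTPS is unreachable (000) while HTTP is reachable. The current http:// URLs are acceptable; no change needed.
Once the tags are fixed I will merge directly 👍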
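The tag convention requested in this review (lowercase English words joined by hyphens) can be checked mechanically. The following is a minimal sketch, not the project's actual validator: the function name `check_tags` and the exact regular expression are assumptions, since the review only states the convention in prose.

```python
import re

# Assumed reading of the convention: lowercase ASCII letters or digits,
# with words joined by single hyphens (e.g. "yellow-river").
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_tags(tags):
    """Split tags into valid ones and a dict of {bad_tag: reason}."""
    valid, problems = [], {}
    for tag in tags:
        if TAG_PATTERN.fullmatch(tag):
            valid.append(tag)
        elif not tag.isascii():
            # Covers Chinese tags such as 黄河 or 企业信用.
            problems[tag] = "non-ASCII characters; remove or use an English tag"
        elif TAG_PATTERN.fullmatch(tag.lower()):
            # Covers uppercase abbreviations such as YRCC or SAMR.
            problems[tag] = f"uppercase; use '{tag.lower()}'"
        else:
            problems[tag] = "not in lowercase-hyphen format"
    return valid, problems


if __name__ == "__main__":
    ok, bad = check_tags(["YRCC", "黄河", "yellow-river", "water_level"])
    print("valid:", ok)
    for tag, reason in bad.items():
        print(f"{tag}: {reason}")
```

Reporting a reason per tag, rather than a single pass/fail flag, mirrors how the review separates the Chinese-character issue from the uppercase issue.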
mingcha-dev
approved these changes
Apr 24, 2026
Collaborator
🔍 明察 QA Review — PR #174 APPROVED ✅
Review Checklist:
| Check | Result |
|---|---|
| ① ID dedup | 5/5 unique ✅ |
| ①b Domain dedup | 5/5 unique ✅ |
| ② Schema fields | All required fields present and valid ✅ |
| ③ URL reachability | website: 4×200 + 1×403(gsxt, expected) ✅; data_url: slwr deep link 200 ✅ |
| ③b Org-website match | All 4 titles match org names ✅ |
| ④ Directory paths | china/resources/water/ + china/economy/market/ ✅ |
| ⑤ Domain format | Lowercase hyphenated, no spaces/underscores ✅ |
| ⑥ Prompt injection scan | Clean ✅ |
Notes:
- 4 water commission sites (yrcc/hwcc/hrc/slwr) only support HTTP; HTTPS returns 000. HTTP URLs are correct.
- gsxt.gov.cn returns 403 which is expected (anti-scraping protection).
- Good thematic grouping: 4 river basin commissions + 1 enterprise credit system.
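The reachability rules in the notes above (200 passes, 403 passes only where anti-scraping protection is expected, 000 means no HTTP response was received, and HTTP is accepted when HTTPS is unreachable) can be written down as two small pure functions. This is an illustrative sketch only: the names `classify_status` and `pick_scheme` are hypothetical, and the review does not say which tool produced the status codes.

```python
def classify_status(status, waf_expected=False):
    """Map a status code (0 meaning no HTTP response) to a review verdict."""
    if status == 200:
        return "ok"
    if status == 403 and waf_expected:
        # e.g. gsxt.gov.cn, where a 403 from anti-scraping protection is expected
        return "ok-waf"
    if status == 0:
        return "unreachable"
    return "fail"


def pick_scheme(https_status, http_status):
    """Prefer HTTPS; fall back to HTTP only when HTTPS does not return 200."""
    if https_status == 200:
        return "https"
    if http_status == 200:
        return "http"
    return None


if __name__ == "__main__":
    # Water commission sites in this PR: HTTPS gives 000, HTTP gives 200.
    print(pick_scheme(0, 200))
    print(classify_status(403, waf_expected=True))
```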
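New data sources in this PR (morning batch, China priority)
Adds 5 authoritative Chinese data sources, mainly river basin management agencies under the Ministry of Water Resources, plus the national enterprise credit data platform.
New sources
- china-yrcc
- china-hwcc
- china-hrc
- china-slwr
- china-gsxt
Verification
- make check: passed (545 unique IDs)
Data source notes
- Four river basin commissions: the Ministry of Water Resources has 7 river basin management agencies nationwide. This PR adds 4 of them (Yellow River, Hai River, Huai River, Songliao), all directly under the Ministry, each holding authoritative hydrology, water quality, and flood control data for its basin.
- National Enterprise Credit Information Publicity System: operated by the State Administration for Market Regulation, it is China's most authoritative public database of business registration and credit information, covering all types of market entities nationwide.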
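Both the PR description and the review checklist rely on source IDs being unique across the repository. What `make check` actually runs is not shown in this PR, so the following is only a sketch of what such a uniqueness check might look like; `find_duplicates` is a hypothetical helper, and loading the IDs from the repository's source files is left out.

```python
from collections import Counter


def find_duplicates(ids):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(ids)
    return sorted(source_id for source_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    new_ids = ["china-yrcc", "china-hwcc", "china-hrc", "china-slwr", "china-gsxt"]
    # Existing IDs named in the review as already present in the repository.
    existing_ids = ["china-mwr", "china-chinawater", "china-iwhr"]
    dupes = find_duplicates(existing_ids + new_ids)
    print("duplicates:", dupes)
```

The same helper would apply to the domain dedup check if it were given the website hostnames instead of the IDs.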