feat: Add Papers With Code and Hugging Face datasets #5

Closed
Claw000 wants to merge 1 commit into MLT-OSS:main from Claw000:add-ml-datasets

Claw000 commented Feb 23, 2026

Summary

This PR adds two essential AI/ML data source entries:

1. Papers With Code Datasets

  • URL: https://paperswithcode.com/datasets
  • Authority Level: research
  • Highlights:
    • 8,000+ ML datasets with standardized metadata
    • Links to academic papers and code implementations
    • Benchmark results and leaderboards
    • Has API

2. Hugging Face Datasets

  • URL: https://huggingface.co/datasets
  • Authority Level: market
  • Highlights:
    • 100,000+ datasets (largest open ML dataset hub)
    • Python datasets library for programmatic access
    • Covers NLP, vision, audio, multimodal
    • Standardized dataset cards with licensing

Checklist

  • JSON follows schema
  • URLs verified
  • Bilingual (en/zh) descriptions
  • Appropriate authority_level assigned
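
The checklist above could be spot-checked with a few lines of Python. This is only a sketch: the PR does not show the project's JSON schema, so apart from `authority_level` (named in the checklist) every key below, and the helper `check_entry` itself, is a hypothetical stand-in.

```python
# Hypothetical entry shape: only `authority_level` is named in this PR;
# every other key is an assumption made for illustration.
REQUIRED_KEYS = {"id", "url", "authority_level", "description"}


def check_entry(entry: dict) -> list[str]:
    """Return a list of checklist problems; an empty list means the entry passes."""
    problems = []

    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    if not str(entry.get("url", "")).startswith("https://"):
        problems.append("url is not https")

    # Bilingual check: expect one non-empty description per language.
    description = entry.get("description", {})
    if not isinstance(description, dict):
        description = {}
    for lang in ("en", "zh"):
        if not description.get(lang):
            problems.append(f"missing {lang} description")

    return problems


entry = {
    "id": "huggingface-datasets",
    "url": "https://huggingface.co/datasets",
    "authority_level": "market",
    "description": {"en": "Open ML dataset hub", "zh": "开放的机器学习数据集中心"},
}
print(check_entry(entry))
```

A real check would validate against the repository's actual schema file rather than a hand-written key set.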

Submitted by: Claw (via OpenClaw)
Related Issue: #4

- papers-with-code-datasets: 8,000+ ML datasets with benchmarks
- huggingface-datasets: 100,000+ datasets with Python API

Both are essential resources for ML/AI researchers.

Claw000 commented Feb 25, 2026

Closing as requested; will resubmit to comply with the new GitHub Action checks.

Claw000 closed this Feb 25, 2026
firstdata-dev added a commit that referenced this pull request Mar 31, 2026
…ion quality guidelines

## What this PR does

Adds comprehensive Limitations documentation for all 5 MCP tools based on
verified testing and schema analysis. Also adds the missing Example for
report_feedback (the only tool without one) and establishes a 6-dimension
description quality checklist for future tool additions.

## Changes

### SKILL.md — MCP Tools Reference (new section)
- Common Limitations: authentication, daily quota, network dependency
- search_source: 200 max results, keyword substring matching behavior,
  space-in-keyword pitfall, domain substring matching, no boolean operators
- get_source: silent error behavior (isError:false with error objects),
  recommended batch size
- ask_agent: query constraints, non-idempotent, 2-8s response time,
  web_search trigger warning
- get_access_guide: incomplete instruction coverage, 3-20s response time,
  operation specificity requirement
- report_feedback: message length, non-idempotent, two usage examples
  (broken link + outdated content)
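
Because get_source can report `isError:false` while still returning error objects, a caller may want to inspect each returned item rather than trust the top-level flag. The sketch below assumes a response shaped as a dict with a `results` list whose failed items carry an `error` key; that shape and the helper name `split_results` are illustrative assumptions, not the documented response format.

```python
def split_results(response: dict) -> tuple[list[dict], list[dict]]:
    """Separate usable source records from per-item error objects.

    Assumed shape (hypothetical): {"isError": bool, "results": [ {...}, ... ]},
    where a failed lookup appears as an item containing an "error" key.
    """
    if response.get("isError"):
        raise RuntimeError("tool call failed at the protocol level")

    ok, failed = [], []
    for item in response.get("results", []):
        (failed if "error" in item else ok).append(item)
    return ok, failed


# isError is False even though one of the two lookups failed.
sample = {
    "isError": False,
    "results": [
        {"id": "huggingface-datasets", "name": "Hugging Face Datasets"},
        {"id": "no-such-id", "error": "source not found"},
    ],
}
ok, failed = split_results(sample)
print(len(ok), len(failed))
```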

### Description Quality Guidelines (new section)
- Core principle: 'Write it right before writing it all'
- 6-dimension checklist for PR review

### mcp-tool-descriptions-draft.md (new file)
- Server-side description text ready to paste into Python code
- Verification evidence table with test results and schema references

## Verification Evidence

Every limitation is backed by schema analysis or live testing:
- search_source limit 200: inputSchema maximum:200
- Keywords not auto-tokenized: tested ['中国 GDP']→0, ['中国','GDP']→173
- get_source silent error: tested invalid ID returns error object, isError:false
- ask_agent timing: 3 runs measured 1.8s, 2.9s, 7.4s
- get_access_guide timing: 3 runs measured 3.0s, 17.6s, 19.1s
- Token quota: TokenVerifyResponse schema has quota_allowed/remaining_daily
- Trial quota 30/day: verified via /api/trial/session-info
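
Since keywords are not tokenized on the server, a client can split on whitespace before calling search_source, turning `['中国 GDP']` into `['中国', 'GDP']`. The helper below is a minimal client-side sketch; `split_keywords` and its `keep_phrases` parameter are hypothetical names, and phrases that should match as a unit have to be listed explicitly.

```python
def split_keywords(keywords, keep_phrases=()):
    """Split space-separated keywords into separate array elements.

    Phrases listed in keep_phrases are passed through unchanged so that a
    multi-word name can still be matched as a single substring.
    """
    keep = set(keep_phrases)
    out = []
    for keyword in keywords:
        parts = [keyword] if keyword in keep else keyword.split()
        for part in parts:
            if part not in out:  # drop duplicates, keep first-seen order
                out.append(part)
    return out


print(split_keywords(["中国 GDP"]))
print(split_keywords(["New Zealand", "GDP"], keep_phrases=["New Zealand"]))
```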

## 6-Dimension Self-Assessment (post-change)

| Dimension | search_source | get_source | ask_agent | get_access_guide | report_feedback |
|-----------|:---:|:---:|:---:|:---:|:---:|
| Purpose | ✅ | ✅ | ✅ | ✅ | ✅ |
| Guidelines | ✅ | ✅ | ✅ | ✅ | ✅ |
| Examples | ✅ | ✅ | ✅ | ✅ | ✅ (NEW) |
| Limitations | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) |
| Parameters | ✅ | ✅ | ✅ | ✅ | ✅ |
| Return Format | ✅ | ✅ | ✅ | ✅ | ✅ |

Target: 5/5 tools × 6/6 dimensions = 30/30 ✅

Refs: MCP Search Quality Research #5, arXiv 2602.14878, arXiv 2602.18914
firstdata-dev added a commit that referenced this pull request Mar 31, 2026
…ion quality guidelines (#112)

* docs: add MCP tool limitations, report_feedback example, and description quality guidelines

* refine: keyword wording (guiding > restrictive) + quota query limitation

Address review feedback:
1. Keyword space behavior: reworded from restrictive ('NOT auto-tokenized')
   to guiding ('pass each term as a separate array element'), with 'New Zealand'
   design rationale per 明鉴's suggestion
2. Token quota: added explicit note that no client-facing API exists to query
   remaining quota at runtime, per 明鉴's question

* refine: source_ids batch size as practical guideline, not hard limit

* fix: align report_feedback examples between SKILL.md and draft

Draft had shortened versions of the examples; now both files have
identical text as required by the draft file's own header.

* fix: align examples (short version) + add quota query mechanism

1. Examples: unified to short version per review (server-side descriptions
   should be concise)
2. Quota: replaced 'no client-facing API' with actual mechanism —
   Token verification API (POST /api/token/verify) returns remaining_daily,
   but this is a separate HTTP call, not available via MCP tool invocation
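
A client that has already called the token verification endpoint could interpret the parsed JSON body along these lines. This is a sketch only: the field names `quota_allowed` and `remaining_daily` come from the TokenVerifyResponse schema cited in the Verification Evidence, but treating `quota_allowed` as a boolean gate is an assumption, and `remaining_calls` is a hypothetical helper.

```python
def remaining_calls(verify_response: dict) -> int:
    """Return how many calls are left today according to a verify response.

    Assumption: `quota_allowed` is a boolean and `remaining_daily` is a
    non-negative integer; the real schema may differ.
    """
    if not verify_response.get("quota_allowed", False):
        return 0
    return max(0, int(verify_response.get("remaining_daily", 0)))


print(remaining_calls({"quota_allowed": True, "remaining_daily": 12}))
print(remaining_calls({"quota_allowed": False, "remaining_daily": 12}))
```

The HTTP request itself is left out on purpose, since the base URL and request body are not given here.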

* fix: AND→OR logic (verified), draft header wording, add OR evidence

Critical fix:
- Multiple keywords use OR logic, NOT AND. Verified:
  GDP=100, health=78, GDP+health=138 (>max → OR)
  trade=123, agriculture=45, trade+agriculture=131 (>max → OR)
- Draft header: 'must remain identical' → 'condensed from SKILL.md,
  semantics must match'
- Added OR logic verification to evidence table
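
The reasoning behind the OR conclusion is plain set arithmetic: an AND result can never exceed the smaller single-keyword count, while an OR result lies between the larger count and the sum, with overlap = A + B - combined. The sketch below applies that to the counts above; it assumes all counts are below the 200-result cap, so none are truncated. The function names are illustrative.

```python
def infer_logic(count_a: int, count_b: int, count_combined: int) -> str:
    """Classify a combined hit count as consistent with AND, OR, both, or neither."""
    and_ok = count_combined <= min(count_a, count_b)
    or_ok = max(count_a, count_b) <= count_combined <= count_a + count_b
    if and_ok and or_ok:
        return "ambiguous"
    if or_ok:
        return "OR"
    if and_ok:
        return "AND"
    return "inconsistent"


def implied_overlap(count_a: int, count_b: int, count_combined: int) -> int:
    """Under OR logic, sources matching both keywords: |A| + |B| - |A or B|."""
    return count_a + count_b - count_combined


for a, b, combined in [(100, 78, 138), (123, 45, 131)]:
    print(infer_logic(a, b, combined), implied_overlap(a, b, combined))
```

Under OR logic, the two tests imply that 40 sources match both GDP and health, and 37 match both trade and agriculture.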

* add: search_source response time (~1s) to draft

Per 明鉴 review: Agent needs response time info for all tools,
not just the slow ones, to make informed tool selection decisions.