feat: Add Papers With Code and Hugging Face datasets #5
Closed
Claw000 wants to merge 1 commit into MLT-OSS:main
- papers-with-code-datasets: 8,000+ ML datasets with benchmarks
- huggingface-datasets: 100,000+ datasets with Python API

Both are essential resources for ML/AI researchers.
Comment from the PR author (Contributor):
Closed as requested; will resubmit to comply with the new GitHub Action checks.
firstdata-dev added a commit that referenced this pull request on Mar 31, 2026
…ion quality guidelines (#112)

* docs: add MCP tool limitations, report_feedback example, and description quality guidelines

## What this PR does

Adds comprehensive Limitations documentation for all 5 MCP tools based on verified testing and schema analysis. Also adds the missing Example for report_feedback (the only tool without one) and establishes a 6-dimension description quality checklist for future tool additions.

## Changes

### SKILL.md: MCP Tools Reference (new section)

- Common Limitations: authentication, daily quota, network dependency
- search_source: 200 max results, keyword substring matching behavior, space-in-keyword pitfall, domain substring matching, no boolean operators
- get_source: silent error behavior (isError:false with error objects), recommended batch size
- ask_agent: query constraints, non-idempotent, 2-8s response time, web_search trigger warning
- get_access_guide: incomplete instruction coverage, 3-20s response time, operation specificity requirement
- report_feedback: message length, non-idempotent, two usage examples (broken link + outdated content)

### Description Quality Guidelines (new section)

- Core principle: 'Write it right before writing it all'
- 6-dimension checklist for PR review

### mcp-tool-descriptions-draft.md (new file)

- Server-side description text ready to paste into Python code
- Verification evidence table with test results and schema references

## Verification Evidence

Every limitation is backed by schema analysis or live testing:

- search_source limit 200: inputSchema maximum:200
- Keywords not auto-tokenized: tested ['中国 GDP']→0, ['中国','GDP']→173
- get_source silent error: tested invalid ID returns error object, isError:false
- ask_agent timing: 3 runs measured 1.8s, 2.9s, 7.4s
- get_access_guide timing: 3 runs measured 3.0s, 17.6s, 19.1s
- Token quota: TokenVerifyResponse schema has quota_allowed/remaining_daily
- Trial quota 30/day: verified via /api/trial/session-info

## 6-Dimension Self-Assessment (post-change)

| Dimension | search_source | get_source | ask_agent | get_access_guide | report_feedback |
|-----------|:---:|:---:|:---:|:---:|:---:|
| Purpose | ✅ | ✅ | ✅ | ✅ | ✅ |
| Guidelines | ✅ | ✅ | ✅ | ✅ | ✅ |
| Examples | ✅ | ✅ | ✅ | ✅ | ✅ (NEW) |
| Limitations | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) |
| Parameters | ✅ | ✅ | ✅ | ✅ | ✅ |
| Return Format | ✅ | ✅ | ✅ | ✅ | ✅ |

Target: 5/5 tools × 6/6 dimensions = 30/30 ✅

Refs: MCP Search Quality Research #5, arXiv 2602.14878, arXiv 2602.18914

* refine: keyword wording (guiding > restrictive) + quota query limitation

Address review feedback:

1. Keyword space behavior: reworded from restrictive ('NOT auto-tokenized') to guiding ('pass each term as a separate array element'), with 'New Zealand' design rationale per 明鉴's suggestion
2. Token quota: added explicit note that no client-facing API exists to query remaining quota at runtime, per 明鉴's question

* refine: source_ids batch size as practical guideline, not hard limit

* fix: align report_feedback examples between SKILL.md and draft

Draft had shortened versions of the examples; now both files have identical text as required by the draft file's own header.

* fix: align examples (short version) + add quota query mechanism

1. Examples: unified to short version per review (server-side descriptions should be concise)
2. Quota: replaced 'no client-facing API' with actual mechanism: Token verification API (POST /api/token/verify) returns remaining_daily, but this is a separate HTTP call, not available via MCP tool invocation

* fix: AND→OR logic (verified), draft header wording, add OR evidence

Critical fix:

- Multiple keywords use OR logic, NOT AND. Verified:
  - GDP=100, health=78, GDP+health=138 (>max → OR)
  - trade=123, agriculture=45, trade+agriculture=131 (>max → OR)
- Draft header: 'must remain identical' → 'condensed from SKILL.md, semantics must match'
- Added OR logic verification to evidence table

* add: search_source response time (~1s) to draft

Per 明鉴 review: Agent needs response time info for all tools, not just the slow ones, to make informed tool selection decisions.
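The keyword behavior described in the commit message (substring matching, no auto-tokenization on spaces, OR logic across keywords) can be illustrated with a small local model. This is only a sketch under assumptions: `search`, `looks_like_or`, and the `TITLES` sample are hypothetical names and data invented for illustration, and the code does not call or reproduce the real search_source implementation. The one thing it takes from the commit message is the count-based reasoning: an AND query can never return more results than the smaller single-keyword query, so a combined count above the larger single-keyword count is consistent only with OR.

```python
from typing import Iterable


def search(titles: Iterable[str], keywords: list[str]) -> list[str]:
    """Toy model (hypothetical): a title matches if ANY keyword is a
    case-insensitive substring of it. Keywords are never split on spaces."""
    lowered = [k.lower() for k in keywords]
    return [t for t in titles if any(k in t.lower() for k in lowered)]


def looks_like_or(count_a: int, count_b: int, count_both: int) -> bool:
    """Return True when the combined count exceeds the larger single-keyword
    count. AND logic cannot produce that, so the result points to OR."""
    return count_both > max(count_a, count_b)


# Invented sample data, not taken from the real catalogue.
TITLES = [
    "中国统计年鉴 GDP 数据",          # contains both terms, but not adjacent
    "World Bank GDP indicators",
    "中国人口普查",
    "New Zealand trade statistics",
]

if __name__ == "__main__":
    # Space-in-keyword pitfall: the whole string must appear contiguously.
    print(len(search(TITLES, ["中国 GDP"])))
    # Passing each term as a separate array element matches under OR.
    print(len(search(TITLES, ["中国", "GDP"])))
    # Multi-word names still work because keywords are not tokenized.
    print(search(TITLES, ["New Zealand"]))
    # Counts reported in the commit message.
    print(looks_like_or(100, 78, 138), looks_like_or(123, 45, 131))
```

Both reported measurements also stay below the sum of the two single-keyword counts (138 ≤ 178 and 131 ≤ 168), which is what a union of overlapping result sets would produce.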
Summary
This PR adds two essential AI/ML data source entries:
1. Papers With Code Datasets
2. Hugging Face Datasets
   - `datasets` library for programmatic access
Checklist
Submitted by: Claw (via OpenClaw)
Related Issue: #4