
add article fact checking skills for Claude and OpenClaw #371

Merged
e06084 merged 10 commits into MigoXLab:dev from seancoding-day:feature/add-article-skills
Mar 24, 2026

Conversation

@seancoding-day
Collaborator

No description provided.

seancoding-day and others added 10 commits March 24, 2026 10:50
SDK wrapper for ArticleFactChecker with format detection,
plaintext JSONL wrapping, and structured JSON output.
- Remove unused List import, use NoReturn for error_exit
- Initialize temp_path before try blocks in tests to prevent NameError
Defines skill frontmatter, prerequisites, usage flow,
and result presentation guidelines.
Covers model selection, claim types, tuning parameters,
environment variables, and troubleshooting.
- Default model: gpt-4o-mini → gpt-5.4-mini
- Model table: add gpt-5.4, gpt-5.4-nano, o3, o4-mini, deepseek-chat
- Remove CSV from format detection (unsupported for article wrapping)
- Add model selection guide and alternative providers section
Port dingo-verify to Agent Skills / OpenClaw format:
- Use {baseDir} instead of ${CLAUDE_SKILL_DIR}
- Remove Claude Code-only fields (argument-hint, allowed-tools)
- Add metadata.openclaw with requires.env, bins, primaryEnv, emoji
- Add license, compatibility fields per Agent Skills spec
- Reuse fact_check.py and advanced-config.md unchanged (portable)

Distributable via ClawHub as skills/dingo-verify/
Security (fact_check.py):
- Add validate_article_path(): block /proc/, /sys/, /dev/, symlinks, unsupported extensions
- Secure temp file: NamedTemporaryFile (O_CREAT|O_EXCL, mode 0o600, full-entropy name)
- Add 10 MB file size limit before full read to prevent OOM
- Bound --max-claims (1-200) and --max-concurrent (1-20) to prevent API cost attack
- Fix exception handler: do not leak str(e) which may contain SDK/config internals
- Separate ValueError handler for user-facing validation errors
- Tavily warning: plain text to stderr instead of JSON (reserve JSON for errors)

Code quality (fact_check.py):
- Module docstring: remove "Claude Code Skill" branding
- extract_detail_report: remove redundant len() check
- LangChain hint: use pip install "dingo-python[agent]" (portable, no repo path)

ClawSkill spec (skills/dingo-verify/SKILL.md):
- Python version: 3.9+ → 3.10+ (matches dingo python_requires)
- compatibility: repo-relative path → pip install "dingo-python[agent]"
- Add install array: [{kind: uv, package: dingo-python[agent]}] for macOS UI
- Description: clarify TAVILY_API_KEY is optional but recommended

Tests (21 passing):
- Add TestWrapPlaintextEmpty.test_file_too_large_raises_value_error
- Add TestValidateArticlePath (5 tests: valid md, valid jsonl, csv rejected,
  /proc/ rejected, symlink rejected)
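The validation and bounding described in the security bullets above could look roughly like the sketch below. The function name and the `frozenset[str]` annotation come from the commit messages, but the exact allowed extensions, blocked prefixes, and rule ordering here are assumptions, not the merged implementation:

```python
import pathlib

# Assumed constants -- the real skill may allow a different extension set.
ALLOWED_EXTENSIONS: frozenset[str] = frozenset({".md", ".txt", ".json", ".jsonl"})
BLOCKED_PREFIXES = ("/proc/", "/sys/", "/dev/")
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB cap before full read, to prevent OOM


def validate_article_path(path: str) -> pathlib.Path:
    """Reject unsupported extensions, sensitive system paths, symlinks, and oversized files."""
    p = pathlib.Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {p.suffix!r}")
    resolved = p.resolve()
    if any(str(resolved).startswith(prefix) for prefix in BLOCKED_PREFIXES):
        raise ValueError("Refusing to read from a blocked system path")
    if p.is_symlink():
        raise ValueError("Symlinks are not allowed")
    if p.exists() and p.stat().st_size > MAX_FILE_SIZE:
        raise ValueError("File exceeds 10 MB size limit")
    return p
```

Raising `ValueError` for each failure matches the commit's note about a separate `ValueError` handler for user-facing validation errors.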
fact_check.py:
- frozenset → frozenset[str] (precise element type)
- validate_article_path: extract p = pathlib.Path(path) (single construction)
- main(): remove 5 defensive 'if result else' expressions (execute() never returns None)

test_fact_check_script.py:
- Hoist all method-level 'from fact_check import' to module top (15 → 1 import block)
- Fix weak assertion: 'endswith(.md) or result' → 'os.path.isabs(result)'
- Remove orphaned blank lines from method bodies
Root cause (from screenshot): AI gave up on ArticleFactChecker because
SKILL.md had no input prep instructions (plaintext needs JSONL wrapping),
no full config params, and no runnable example.

Changes:
- clawhub/scripts/fact_check.py: bundled wrapper script with path
  validation, secure temp files, bounded args, structured JSON output
- clawhub/references/advanced-config.md: model selection, claim types,
  env vars, output artifacts, troubleshooting
- clawhub/SKILL.md: add "Fact-Checking Articles" section with
  - script quick-start (one-liner via {baseDir}/scripts/fact_check.py)
  - manual SDK snippet with JSONL wrapping pattern
  - if __name__ == "__main__" guard note (multiprocessing requirement)
  - output structure and result interpretation guide
- clawhub/_meta.json: add python3 to bins, dingo-python[agent] to
  packages, TAVILY/OPENAI env vars with descriptions

Architecture: clawhub/dingo-data-quality = comprehensive entry skill;
skills/dingo-verify = lightweight specialist for Claude Code / OpenClaw.
…verification

Enhance arxiv_search with optional fetch_affiliations config that scrapes the
ltx_authors section from arXiv HTML paper pages, providing authoritative
author+institution text (e.g. "1 Shanghai AI Laboratory  2 Abaka AI") that
the arXiv Atom API and feedparser do not expose.

Changes:
- arxiv_search.py: add ArxivConfig.fetch_affiliations field (default False),
  _fetch_html_affiliations() method, and per-result HTML enrichment loop in
  execute(); fix 429 error message (was generic "Search failed: HTTPError")
- agent_article_fact_checker.py: update TOOLS_DESCRIPTION, WORKFLOW_STEPS and
  PER_CLAIM_VERIFICATION_PROMPT so agents treat affiliations_text as the
  authoritative source for institutional/attribution claims
- Enable fetch_affiliations=True in all three entry points:
  .claude/skills/dingo-verify/scripts/fact_check.py,
  skills/dingo-verify/scripts/fact_check.py,
  examples/agent/agent_article_fact_checking_example.py

Result: institutional claims previously UNVERIFIABLE (e.g. "OmniDocBench
released by Tsinghua/Alibaba/Shanghai AI Lab") now correctly judged FALSE
from paper affiliation data without requiring Tavily web search.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands Dingo's capabilities by integrating a robust article fact-checking skill. It introduces an autonomous agent (ArticleFactChecker) that can extract and verify factual claims from various document formats using web and academic search. The changes streamline the user experience through a dedicated script and enhance the underlying verification logic, particularly for institutional claims, by improving data sourcing from arXiv. This empowers users to quickly assess the accuracy of information, fostering greater trust and reliability in data analysis workflows.

Highlights

  • New Fact-Checking Skill: Introduced a new 'dingo-verify' skill for both Claude and OpenClaw, enabling users to fact-check articles and verify factual claims using Dingo's ArticleFactChecker agent.
  • Dedicated Script for Ease of Use: A new Python script ('fact_check.py') was added for both Claude and OpenClaw skills, streamlining the execution of the ArticleFactChecker by handling input validation, format detection, configuration, and structured report generation.
  • Enhanced ArXiv Search for Attribution: The 'arxiv_search' tool within Dingo was improved with a 'fetch_affiliations' option, allowing the agent to scrape authoritative author and institution data directly from arXiv HTML pages, significantly boosting the accuracy of institutional and attribution claim verification.
  • Agent Logic Update: The ArticleFactChecker agent's prompt templates were updated to prioritize and effectively utilize the newly available 'affiliations_text' from 'arxiv_search' for more reliable claim verification.
  • Comprehensive Documentation: Updated the main Dingo skill documentation ('clawhub/SKILL.md') and added advanced configuration guides ('references/advanced-config.md' for both Claude and OpenClaw skills) to support the new fact-checking capabilities.
  • Metadata and Dependencies: The skill metadata ('clawhub/_meta.json') was updated to reflect necessary 'python3' binary and 'dingo-python[agent]' package requirements, along with the optional 'TAVILY_API_KEY' for web search.
  • Robust Testing: New unit tests were added for the 'fact_check.py' script to ensure the reliability of its core functionalities.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new fact-checking skill for Claude and OpenClaw, which is a great addition. The implementation includes a new Python script with good security practices, comprehensive documentation, and tests. However, there is a significant issue with file duplication across the .claude/skills/dingo-verify/, skills/dingo-verify/, and clawhub/ directories. This will create a maintenance burden, as changes will need to be synchronized across all copies. It's highly recommended to refactor this to avoid duplication, perhaps by using symlinks or a build process that copies files from a single source of truth. I've also included a few suggestions for improving error handling, performance, and the robustness of a code example in the documentation.

Comment on lines +423 to +426

```python
    except Exception:
        # Do not echo exception message to avoid leaking SDK internals or config values
        error_exit("Execution failed. Check Dingo SDK logs in the output directory.")
    return 1  # unreachable
```

medium

This broad except Exception: can hide bugs and make debugging difficult. While the intention to avoid leaking information is good, it would be better to catch more specific exceptions that you expect from the Dingo SDK, and then have this broad except as a final fallback. This would provide better error handling without sacrificing security.

For example:

```python
    except DingoExecutionError as e:  # Assuming a specific exception from the SDK
        error_exit(f"Dingo execution failed: {e}", "Check your Dingo configuration and input file.")
        return 1
    except Exception:
        # Do not echo exception message to avoid leaking SDK internals or config values
        error_exit("An unexpected error occurred. Check Dingo SDK logs in the output directory.")
        return 1  # unreachable
```

Comment on lines +450 to +495

```python
# IMPORTANT: wrap article into JSONL — plaintext is read line-by-line otherwise
article_text = open("article.md", encoding="utf-8").read()
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False, encoding="utf-8")
tmp.write(json.dumps({"content": article_text}, ensure_ascii=False) + "\n")
tmp.close()

config = {
    "input_path": tmp.name,
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {"max_workers": 1},
    "evaluator": [{
        "fields": {"content": "content"},
        "evals": [{
            "name": "ArticleFactChecker",
            "config": {
                "key": os.environ["OPENAI_API_KEY"],
                "model": os.getenv("OPENAI_MODEL", "gpt-5.4-mini"),
                "api_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
                "parameters": {
                    "temperature": 0,
                    "agent_config": {
                        "max_concurrent_claims": 5,
                        "max_iterations": 50,
                        "tools": {
                            "claims_extractor": {
                                "api_key": os.environ["OPENAI_API_KEY"],
                                "model": os.getenv("OPENAI_MODEL", "gpt-5.4-mini"),
                                "base_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
                                "max_claims": 50
                            },
                            "arxiv_search": {"max_results": 5},
                            **({"tavily_search": {"api_key": os.environ["TAVILY_API_KEY"]}}
                               if os.getenv("TAVILY_API_KEY") else {})
                        }
                    }
                }
            }
        }]
    }]
}

if __name__ == "__main__":
    result = Executor.exec_map["local"](InputArgs(**config)).execute()
    print(f"Score: {result.score:.1f}% | Output: {result.output_path}")
    os.unlink(tmp.name)
```

medium

The code example for manual SDK usage has a potential resource leak. The temporary file created with tempfile.NamedTemporaryFile(delete=False) is not guaranteed to be cleaned up if an exception occurs before os.unlink(tmp.name) is called. Also, the file is created at the module level, so if this script is imported, it will create a temp file that is never deleted.

It's better to create and clean up the temporary file within a try...finally block inside the if __name__ == "__main__" guard to ensure it's always deleted.

```python
if __name__ == "__main__":
    tmp_path = None
    try:
        # IMPORTANT: wrap article into JSONL — plaintext is read line-by-line otherwise
        article_text = open("article.md", encoding="utf-8").read()
        with tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp_file:
            tmp_file.write(json.dumps({"content": article_text}, ensure_ascii=False) + "\n")
            tmp_path = tmp_file.name

        config = {
            "input_path": tmp_path,
            "dataset": {"source": "local", "format": "jsonl"},
            "executor": {"max_workers": 1},
            "evaluator": [{
                "fields": {"content": "content"},
                "evals": [{
                    "name": "ArticleFactChecker",
                    "config": {
                        "key": os.environ["OPENAI_API_KEY"],
                        "model": os.getenv("OPENAI_MODEL", "gpt-5.4-mini"),
                        "api_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
                        "parameters": {
                            "temperature": 0,
                            "agent_config": {
                                "max_concurrent_claims": 5,
                                "max_iterations": 50,
                                "tools": {
                                    "claims_extractor": {
                                        "api_key": os.environ["OPENAI_API_KEY"],
                                        "model": os.getenv("OPENAI_MODEL", "gpt-5.4-mini"),
                                        "base_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
                                        "max_claims": 50
                                    },
                                    "arxiv_search": {"max_results": 5},
                                    **({"tavily_search": {"api_key": os.environ["TAVILY_API_KEY"]}}
                                       if os.getenv("TAVILY_API_KEY") else {})
                                }
                            }
                        }
                    }
                }]
            }]
        }

        result = Executor.exec_map["local"](InputArgs(**config)).execute()
        print(f"Score: {result.score:.1f}%  |  Output: {result.output_path}")
    finally:
        if tmp_path and os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

Comment on lines +245 to +252

```python
        if cls.config.fetch_affiliations and results:
            for i, entry_id in enumerate(entry_ids):
                html_url = entry_id.replace('/abs/', '/html/')
                affiliations_text = cls._fetch_html_affiliations(
                    html_url, timeout=cls.config.timeout
                )
                if affiliations_text is not None:
                    results[i]['affiliations_text'] = affiliations_text
```

medium

Fetching affiliations involves making a network request for each paper. This is done sequentially in a loop, which can be slow if there are many results. To improve performance, consider fetching these pages in parallel using a ThreadPoolExecutor.
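A minimal sketch of that suggestion is below. The names `enrich_results_parallel` and `fetch_affiliations_fn` are stand-ins for illustration (the real code calls `cls._fetch_html_affiliations` inside `execute()`), and the worker count is an arbitrary assumption:

```python
from concurrent.futures import ThreadPoolExecutor


def enrich_results_parallel(results, entry_ids, fetch_affiliations_fn,
                            timeout=10, max_workers=5):
    """Fetch all affiliation pages concurrently instead of one at a time."""
    html_urls = [eid.replace("/abs/", "/html/") for eid in entry_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so texts[i] matches results[i]
        texts = list(pool.map(lambda u: fetch_affiliations_fn(u, timeout=timeout),
                              html_urls))
    for result, text in zip(results, texts):
        if text is not None:
            result["affiliations_text"] = text
    return results
```

Threads are a reasonable fit here because the work is network-bound, not CPU-bound.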

Comment on lines +525 to +527

```python
        except Exception as exc:
            log.debug("Failed to fetch HTML affiliations from %s: %s", html_url, exc)
            return None
```

medium

Catching a broad Exception can hide underlying issues and make debugging harder. It's better to catch more specific exceptions from the requests library, such as requests.exceptions.RequestException, to handle network-related errors more gracefully and avoid catching unrelated exceptions.

Suggested change

```diff
-        except Exception as exc:
-            log.debug("Failed to fetch HTML affiliations from %s: %s", html_url, exc)
-            return None
+        except _requests.exceptions.RequestException as exc:
+            log.debug("Failed to fetch HTML affiliations from %s: %s", html_url, exc)
+            return None
```

@e06084 e06084 merged commit 08a5bfc into MigoXLab:dev Mar 24, 2026
2 checks passed