Truth Shield is a Python fact-checking pipeline that extracts verifiable claims from text, searches public sources, scores evidence relevance, and returns a verdict with confidence and explainability data.
It includes:
- A Flask backend API
- A static web dashboard in `front-end/`
- A CLI entry point for JSON-based runs
- Claim extraction from input text using Groq LLM
- Multi-source search (DuckDuckGo)
- Page fetching with
httpxand Playwright fallback for dynamic pages - Content extraction and metadata parsing
- Semantic evidence matching with sentence embeddings
- Final verdict generation with confidence, tags, and top sources
- Python 3.11+
- Flask, Flask-CORS
- Groq API
- `ddgs` (DuckDuckGo search)
- `httpx`, Playwright
- `trafilatura`, BeautifulSoup4
- `sentence-transformers`, scikit-learn, numpy
- Pydantic
```
.
|-- app.py       # Flask app and API endpoints
|-- main.py      # CLI entry point
|-- pipeline.py  # Main async pipeline orchestration
|-- models.py    # Pydantic input/output models
|-- config.py    # Central configuration
|-- core/        # Verdict engine and credibility logic
|-- search/      # Search aggregation and providers
|-- fetcher/     # HTTP and Playwright fetchers
|-- extractor/   # Text and metadata extraction
|-- scoring/     # Embeddings and evidence matching
|-- utils/       # URL/language/paywall utilities
|-- front-end/   # Static UI assets
`-- test_*.py    # Integration and backend tests
```
- Python 3.11 or newer
- A valid Groq API key (`GROQ_API_KEY`)
- Network access (for the Groq API and web search)
Use the startup script to automate setup and launch:
```
python start.py
```
What the script does:
- Asks for `GROQ_API_KEY` at startup (if missing from `.env`)
- Saves or updates `GROQ_API_KEY` in `.env`
- Creates `.venv` if it does not exist
- Upgrades `pip`
- Checks dependencies from `requirements.txt` and installs only missing/incompatible ones
- Checks Playwright Chromium and installs it only if missing
- Starts `app.py`
First run can take a few minutes because package and browser installation is performed automatically.
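The `.env` handling described above can be sketched as follows. This is an illustrative stand-in, not the actual `start.py` internals — the function names and the simple `KEY=VALUE` parsing are assumptions:

```python
from pathlib import Path

def read_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file."""
    env: dict[str, str] = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def ensure_groq_key(path: Path, prompt=input) -> str:
    """Return GROQ_API_KEY from .env, asking the user and saving it if missing."""
    env = read_env(path)
    key = env.get("GROQ_API_KEY", "")
    if not key:
        key = prompt("Enter GROQ_API_KEY: ").strip()
        env["GROQ_API_KEY"] = key
        path.write_text("\n".join(f"{k}={v}" for k, v in env.items()) + "\n")
    return key
```

Injecting `prompt` as a parameter keeps the helper testable without interactive input.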
Windows (PowerShell):
```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```
Windows (cmd):
```
python -m venv .venv
.venv\Scripts\activate.bat
```
macOS/Linux:
```
python -m venv .venv
source .venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```
Install the Playwright browser:
```
python -m playwright install
```
Create a `.env` file in the project root:
```
GROQ_API_KEY=gsk_your_key_here
```
Start the server:
```
python app.py
```
Default server URL: `http://127.0.0.1:5001`
Open `http://127.0.0.1:5001` in a browser. The backend serves static assets from `front-end/`.
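Once the server is running, its JSON endpoints can also be exercised from a script. A minimal sketch using only the standard library (the project itself uses `httpx`); the `/analyze` path is a placeholder — check `app.py` for the real route names:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5001"

def call_endpoint(path: str, payload: dict) -> dict:
    """POST a JSON payload to the running backend and return the parsed response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # "/analyze" is a placeholder route name, not confirmed by this README.
    print(call_endpoint("/analyze", {"mode": "testo", "data": "Text to fact-check"}))
```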
Run the pipeline with a JSON input file:
```
python main.py --input input_example.json --output results.json
```
Print output to stdout:
```
python main.py --input input_example.json
```
Quick script:
```
python run_test.py
```
This reads `input_example.json` and writes `raw_output.json`.
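An input file for the CLI can be generated programmatically. The top-level field names follow the `PipelineInput` layout documented below; the contents of `metadata` and `original_source` here are illustrative placeholders, so compare against `input_example.json` before relying on them:

```python
import json

# Top-level keys mirror the documented PipelineInput root model; the nested
# values are illustrative placeholders, not the authoritative schema.
pipeline_input = {
    "metadata": {"language": "en"},
    "original_source": {"type": "text", "value": "Sample article text"},
    "analysis": {
        "claims_to_verify": [
            "The Eiffel Tower is located in Paris."
        ]
    },
}

with open("my_input.json", "w", encoding="utf-8") as fh:
    json.dump(pipeline_input, fh, ensure_ascii=False, indent=2)
```

The resulting file can then be passed as `python main.py --input my_input.json`.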
Serves front-end/index.html.
Direct verdict from a claim and source results.
Example payload:
```json
{
  "claim": "Sample claim",
  "results": [
    {
      "url": "https://example.com",
      "text": "Article text",
      "metadata": { "title": "Example" }
    }
  ]
}
```
Preprocessing endpoint used by the UI.
Example payload:
```json
{
  "mode": "testo",
  "data": "Text to analyze"
}
```
or
```json
{
  "mode": "url",
  "data": "https://example.com/news"
}
```
Full flow:
- Claim extraction with Groq
- Search and fetch pipeline
- Evidence scoring
- Final verdict mapping for frontend
Example payload:
```json
{
  "mode": "testo",
  "data": "Text to fact-check"
}
```
Main schema definitions are in `models.py`.
Input root model: PipelineInput
- `metadata`
- `original_source`
- `analysis.claims_to_verify[]`
Output root model: PipelineOutput
- `timestamp`
- `total_claims`
- `total_sources_found`
- `results[]` with per-claim sources and scoring data
Sample files:
- `input_example.json`
- `results.json`
- `output_per_ui.json`
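The documented top-level output fields make it easy to post-process a saved run. A small sketch that summarizes a `PipelineOutput` JSON file (the helper name is ours; only the field names come from the model description above):

```python
import json

def summarize(output_path: str) -> str:
    """Return a one-line summary of a PipelineOutput JSON file."""
    with open(output_path, encoding="utf-8") as fh:
        out = json.load(fh)
    # timestamp, total_claims, and total_sources_found are the documented
    # top-level fields of the output root model.
    return (f"{out['total_claims']} claims, "
            f"{out['total_sources_found']} sources at {out['timestamp']}")
```

For example, `summarize("results.json")` after a CLI run prints the claim and source counts without loading the per-claim details.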
Run all unittest tests:
```
python -m unittest discover -v
```
Run specific tests:
```
python -m unittest test_backend_end_to_end.py -v
python -m unittest test_backend_live_external.py -v
python -m unittest core.test_fase4 -v
```
Notes:
- `test_backend_live_external.py` is optional and skipped unless enabled.
- To enable the live test (cmd):
```
set RUN_LIVE_E2E=1
python -m unittest test_backend_live_external.py -v
```
On PowerShell:
```
$env:RUN_LIVE_E2E="1"
python -m unittest test_backend_live_external.py -v
```
Global defaults are in `config.py`, including:
- Search limits and retry policy
- HTTP timeout and concurrency
- Playwright timeout and scroll behavior
- Chunking and embedding thresholds
- URL tracking parameter cleanup
- Paywall keyword heuristics
Environment variable: `GROQ_API_KEY` (required)
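The kinds of knobs listed above might look like the following sketch of a `config.py`. Every constant name and value here is illustrative — see the real `config.py` for the authoritative set:

```python
# Illustrative configuration sketch; names and values are assumptions,
# not the project's actual config.py.

SEARCH_MAX_RESULTS = 8          # per-provider search result cap
SEARCH_MAX_RETRIES = 2          # retry policy for failed searches

HTTP_TIMEOUT_S = 15.0           # httpx request timeout
FETCH_CONCURRENCY = 5           # max simultaneous page fetches

PLAYWRIGHT_TIMEOUT_MS = 30_000  # budget for dynamic-page rendering
PLAYWRIGHT_SCROLL_STEPS = 3     # scroll passes to trigger lazy loading

CHUNK_SIZE_CHARS = 1_000        # text chunk size for embeddings
MIN_SIMILARITY = 0.45           # evidence relevance threshold

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}
PAYWALL_KEYWORDS = {"subscribe", "sign in to continue", "registrati"}
```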
High-level flow:
- User submits text/URL
- Claims and search queries are generated
- Search results are aggregated and deduplicated
- Pages are fetched (`httpx`, with a Playwright fallback)
- Text and metadata are extracted
- Relevant chunks are scored by semantic similarity
- Core engine builds final verdict and explainability output
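The flow above can be sketched as a toy async pipeline. Each stage is a stand-in (the real project calls Groq, ddgs, and `httpx`/Playwright), and the bag-of-characters `embed` is a deliberately crude substitute for sentence-transformers — the point is the orchestration shape, not the models:

```python
import asyncio
import math

async def extract_claims(text: str) -> list[str]:
    # Stand-in for Groq-based claim extraction.
    return [s.strip() for s in text.split(".") if s.strip()]

async def search_and_fetch(claim: str) -> list[str]:
    # Stand-in for search aggregation + httpx/Playwright fetching.
    await asyncio.sleep(0)  # yield control, as real I/O would
    return [f"Source text discussing: {claim}"]

def embed(text: str) -> list[float]:
    # Toy letter-count embedding; the real pipeline uses sentence-transformers.
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

async def run_pipeline(text: str) -> list[dict]:
    claims = await extract_claims(text)
    # Fetch evidence for all claims concurrently, as the real pipeline does.
    pages = await asyncio.gather(*(search_and_fetch(c) for c in claims))
    results = []
    for claim, sources in zip(claims, pages):
        scores = [cosine(embed(claim), embed(s)) for s in sources]
        results.append({"claim": claim, "best_score": max(scores, default=0.0)})
    return results
```

Usage: `asyncio.run(run_pipeline("Water boils at 100 C. The sky is blue."))` returns one scored entry per extracted claim.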
Main orchestrator: `pipeline.py`

Verdict logic:
- `core/engine.py`
- `core/motore_verdetto.py`
- `core/classificatore_evidenze.py`
- Requires external services (Groq API and web search availability)
- Dynamic sites can increase latency due to Playwright fallback
- Language and paywall detection are heuristic-based
- Live tests depend on network and valid credentials
- Create a branch
- Keep changes focused
- Run relevant tests
- Open a pull request with a clear description
No license file is currently defined in this repository.
Add a LICENSE file before public distribution.