Truth Shield

Truth Shield is a Python fact-checking pipeline that extracts verifiable claims from text, searches public sources, scores evidence relevance, and returns a verdict with confidence and explainability data.

It includes:

A Flask backend API
A static web dashboard in front-end/
A CLI entry point for JSON-based runs

Features

Claim extraction from input text using Groq LLM
Web search through DuckDuckGo (ddgs), with results aggregated and deduplicated
Page fetching with httpx and Playwright fallback for dynamic pages
Content extraction and metadata parsing
Semantic evidence matching with sentence embeddings
Final verdict generation with confidence, tags, and top sources
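
The fetch-with-fallback feature can be illustrated with a minimal sketch. It assumes that the cheap HTTP fetch is tried first and that a browser-based fetch is used only when the first attempt fails or returns very little content. The function name `fetch_with_fallback`, the `min_chars` heuristic, and the two stand-in fetchers are hypothetical; the project's real fetchers live in fetcher/ and may decide differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A fetcher takes a URL and returns page HTML, or None on failure.
Fetcher = Callable[[str], Awaitable[Optional[str]]]


async def fetch_with_fallback(
    url: str,
    primary: Fetcher,
    fallback: Fetcher,
    min_chars: int = 200,  # assumed heuristic for "page looks empty"
) -> Optional[str]:
    """Try the primary fetcher; use the fallback if it fails or returns too little."""
    try:
        html = await primary(url)
    except Exception:
        html = None
    if html and len(html) >= min_chars:
        return html
    try:
        return await fallback(url)
    except Exception:
        # Best effort: return whatever the primary produced (possibly None).
        return html


# Stand-in fetchers (hypothetical) so the sketch runs without network access.
async def fake_http_fetch(url: str) -> Optional[str]:
    return "<html></html>"  # nearly empty shell, as a script-rendered page might return


async def fake_browser_fetch(url: str) -> Optional[str]:
    return "<html>" + "rendered content " * 20 + "</html>"


page = asyncio.run(
    fetch_with_fallback("https://example.com", fake_http_fetch, fake_browser_fetch)
)
```

Passing the fetchers in as arguments keeps the fallback decision separate from the HTTP and browser code, which also makes it easy to test without a network.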

Tech Stack

Python 3.11+
Flask, Flask-CORS
Groq API
ddgs (DuckDuckGo search)
httpx, Playwright
trafilatura, BeautifulSoup4
sentence-transformers, scikit-learn, numpy
Pydantic

Project Structure

.
|-- app.py                     # Flask app and API endpoints
|-- main.py                    # CLI entry point
|-- pipeline.py                # Main async pipeline orchestration
|-- models.py                  # Pydantic input/output models
|-- config.py                  # Central configuration
|-- core/                      # Verdict engine and credibility logic
|-- search/                    # Search aggregation and providers
|-- fetcher/                   # HTTP and Playwright fetchers
|-- extractor/                 # Text and metadata extraction
|-- scoring/                   # Embeddings and evidence matching
|-- utils/                     # URL/language/paywall utilities
|-- front-end/                 # Static UI assets
`-- test_*.py                  # Integration and backend tests

Prerequisites

Python 3.11 or newer
A valid Groq API key (GROQ_API_KEY)
Network access (Groq + web search)

Quick Start

One-Command Startup (Windows, macOS, Linux)

Use the startup script to automate setup and launch:

python start.py

What the script does:

Prompts for GROQ_API_KEY at startup if it is missing from .env
Saves or updates GROQ_API_KEY in .env
Creates .venv if it does not exist
Upgrades pip
Checks dependencies from requirements.txt and installs only missing or incompatible ones
Checks for Playwright Chromium and installs it only if missing
Starts app.py
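
The "save or update" step for .env could look roughly like the sketch below. It is not taken from start.py: the helper name `upsert_env_var` and the simple KEY=value line format are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set key=value in a .env-style file, replacing an existing entry if present."""
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    prefix = key + "="
    updated = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = prefix + value  # update in place, keep other lines untouched
            updated = True
    if not updated:
        lines.append(prefix + value)
    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# Demonstration in a temporary directory so no real .env file is modified.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    upsert_env_var(env_file, "GROQ_API_KEY", "gsk_old")  # creates the file
    upsert_env_var(env_file, "GROQ_API_KEY", "gsk_new")  # updates the entry
    contents = env_file.read_text(encoding="utf-8")
```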

The first run can take a few minutes because packages and the Playwright browser are installed automatically.

To set up the project manually instead, follow the steps below.

1) Create and activate a virtual environment

Windows (PowerShell):

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Windows (cmd):

python -m venv .venv
.venv\Scripts\activate.bat

macOS/Linux:

python -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install -r requirements.txt

3) Install Playwright browsers

python -m playwright install

4) Configure environment variables

Create a .env file in the project root:

GROQ_API_KEY=gsk_your_key_here

5) Run the web server

python app.py

Default server URL:

http://127.0.0.1:5001

Usage

Web Dashboard

Open:

http://127.0.0.1:5001

The backend serves static assets from front-end/.

CLI

Run the pipeline with a JSON input file:

python main.py --input input_example.json --output results.json

Print output to stdout:

python main.py --input input_example.json

Quick script:

python run_test.py

This reads input_example.json and writes raw_output.json.

API Endpoints

`GET /`

Serves front-end/index.html.

`POST /api/verify`

Returns a verdict directly from a claim and a list of source results supplied in the request.

Example payload:

{
  "claim": "Sample claim",
  "results": [
    {
      "url": "https://example.com",
      "text": "Article text",
      "metadata": { "title": "Example" }
    }
  ]
}
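
As an illustration, the payload above could be sent from Python using only the standard library. This is a sketch, not a client shipped with the project: the helper name `build_verify_request` is hypothetical, the base URL is the default from the Quick Start, and the shape of the response is not documented here, so the example only builds the request and leaves sending it as a commented step.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5001"  # default server URL from the Quick Start


def build_verify_request(claim, results, base_url=BASE_URL):
    """Build a POST request for /api/verify carrying a JSON body."""
    payload = {"claim": claim, "results": results}
    return urllib.request.Request(
        url=f"{base_url}/api/verify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_verify_request(
    "Sample claim",
    [{"url": "https://example.com", "text": "Article text", "metadata": {"title": "Example"}}],
)

# With the server running, send the request and print the raw response body:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.read().decode("utf-8"))
```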

`POST /elabora`

Preprocessing endpoint used by the UI. The mode field is "testo" (Italian for "text") for raw text or "url" for a page address.

Example payload:

{
  "mode": "testo",
  "data": "Text to analyze"
}

or

{
  "mode": "url",
  "data": "https://example.com/news"
}
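
A caller that accepts free-form input needs to pick one of the two modes. The sketch below shows one possible way to do that; the helper name `build_elabora_payload` and the rule "an http or https address with a host is a URL, anything else is text" are assumptions, not behavior taken from the UI.

```python
from urllib.parse import urlsplit


def build_elabora_payload(user_input: str) -> dict:
    """Return a payload with mode "url" for http(s) addresses, else "testo"."""
    text = user_input.strip()
    try:
        parts = urlsplit(text)
        is_url = (
            parts.scheme in ("http", "https")
            and bool(parts.netloc)
            and " " not in text  # sentences that merely start with a link stay text
        )
    except ValueError:
        # urlsplit rejects some malformed inputs; treat those as plain text.
        is_url = False
    return {"mode": "url" if is_url else "testo", "data": text}


url_payload = build_elabora_payload("https://example.com/news")
text_payload = build_elabora_payload("Text to analyze")
```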

`POST /elabora_completo`

Full flow:

Claim extraction with Groq
Search and fetch pipeline
Evidence scoring
Final verdict mapping for frontend

Example payload:

{
  "mode": "testo",
  "data": "Text to fact-check"
}

Input and Output Models

Main schema definitions are in models.py.

Input root model: PipelineInput

metadata
original_source
analysis.claims_to_verify[]

Output root model: PipelineOutput

timestamp
total_claims
total_sources_found
results[] with per-claim sources and scoring data
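
To make the input layout concrete, the following sketch reads the claims list from a document shaped like PipelineInput. It uses plain dictionaries rather than the Pydantic models in models.py, the helper name `extract_claims` is hypothetical, and the sample values (empty objects and a string claim) are placeholders, since the exact field types are defined in models.py and not shown here.

```python
import json


def extract_claims(pipeline_input: dict) -> list:
    """Return analysis.claims_to_verify, or raise ValueError if it is absent."""
    try:
        claims = pipeline_input["analysis"]["claims_to_verify"]
    except (KeyError, TypeError) as exc:
        raise ValueError("input is missing analysis.claims_to_verify") from exc
    if not isinstance(claims, list):
        raise ValueError("analysis.claims_to_verify must be a list")
    return claims


# Placeholder document: only the key names come from the README.
sample = json.loads(
    '{"metadata": {}, "original_source": {}, '
    '"analysis": {"claims_to_verify": ["Sample claim"]}}'
)
claims = extract_claims(sample)
```

For real runs, input_example.json is the reference for what each field contains.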

Sample files:

input_example.json
results.json
output_per_ui.json

Testing

Run all unittest tests:

python -m unittest discover -v

Run specific tests:

python -m unittest test_backend_end_to_end.py -v
python -m unittest test_backend_live_external.py -v
python -m unittest core.test_fase4 -v

Notes:

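test_backend_live_external.py is optional and is skipped unless enabled through the RUN_LIVE_E2E environment variable.
To enable the live test:

Windows (cmd):

set RUN_LIVE_E2E=1
python -m unittest test_backend_live_external.py -v

Windows (PowerShell):

$env:RUN_LIVE_E2E="1"
python -m unittest test_backend_live_external.py -v

macOS/Linux:

RUN_LIVE_E2E=1 python -m unittest test_backend_live_external.py -v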
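
For readers unfamiliar with opt-in tests, this is a minimal sketch of how a unittest class can be skipped unless an environment variable is set. It is not the content of test_backend_live_external.py: the class name is hypothetical and the exact comparison against "1" is an assumption.

```python
import io
import os
import unittest

# Assumed convention: live tests run only when RUN_LIVE_E2E is exactly "1".
LIVE_ENABLED = os.environ.get("RUN_LIVE_E2E") == "1"


@unittest.skipUnless(LIVE_ENABLED, "set RUN_LIVE_E2E=1 to run live tests")
class LiveExternalSketch(unittest.TestCase):
    def test_live_flag_is_set(self):
        # A real live test would call the running backend here.
        self.assertTrue(LIVE_ENABLED)


def run_sketch() -> unittest.TestResult:
    """Run the sketch test case quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LiveExternalSketch)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With the variable unset, the runner reports the test as skipped rather than failed, so the default test run stays green without network access.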

Configuration

Global defaults are in config.py, including:

Search limits and retry policy
HTTP timeout and concurrency
Playwright timeout and scroll behavior
Chunking and embedding thresholds
URL tracking parameter cleanup
Paywall keyword heuristics
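
As an example of one of these settings, URL tracking parameter cleanup can be sketched with the standard library. The parameter names below (the utm_ prefix, fbclid, gclid) are common tracking parameters chosen for illustration; the actual list used by Truth Shield is defined in config.py and may differ, and the function name `strip_tracking_params` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative values only; the project's real list lives in config.py.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Remove known tracking parameters from a URL's query string."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking_params("https://example.com/a?id=7&utm_source=x&fbclid=abc")
```

Normalizing URLs this way helps deduplication, because the same article shared with different tracking tags collapses to one address.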

Environment variable:

GROQ_API_KEY (required)

Architecture Summary

High-level flow:

User submits text/URL
Claims and search queries are generated
Search results are aggregated and deduplicated
Pages are fetched with httpx, falling back to Playwright for dynamic pages
Text and metadata are extracted
Relevant chunks are scored by semantic similarity
Core engine builds final verdict and explainability output
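
The semantic scoring step in the flow above can be pictured as ranking text chunks by cosine similarity to the claim. The sketch below uses tiny hand-written vectors in place of real sentence embeddings, and the function names and the 0.5 threshold are hypothetical; the project computes embeddings with sentence-transformers and takes its thresholds from config.py.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(claim_vec, chunk_vecs: dict, threshold: float = 0.5) -> list:
    """Return (score, chunk_id) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(claim_vec, vec), chunk_id) for chunk_id, vec in chunk_vecs.items()]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    return sorted(relevant, key=lambda pair: pair[0], reverse=True)


# Toy vectors standing in for embeddings of a claim and three text chunks.
claim_vector = [1.0, 0.0, 1.0]
chunks = {
    "A": [1.0, 0.0, 1.0],  # same direction as the claim
    "B": [0.0, 1.0, 0.0],  # orthogonal, filtered out by the threshold
    "C": [1.0, 1.0, 1.0],  # partially aligned
}
ranking = rank_chunks(claim_vector, chunks)
```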

Main orchestrator:

pipeline.py

Verdict logic:

core/engine.py
core/motore_verdetto.py
core/classificatore_evidenze.py

Known Limitations

Requires external services (Groq API and web search availability)
Dynamic sites can increase latency due to Playwright fallback
Language and paywall detection are heuristic-based
Live tests depend on network and valid credentials

Contributing

Create a branch
Keep changes focused
Run relevant tests
Open a pull request with a clear description

License

No license file is currently defined in this repository. Add a LICENSE file before public distribution.
