Factube

Factube is an automated fact-checking platform that uses generative AI to analyze YouTube video transcripts and identify claims that require verification. The system combines web search integration with multi-model AI redundancy so that claims are checked against current information.

Overview

Factube addresses the challenge of identifying potentially false or misleading claims in video content by automating the fact-checking process. Rather than relying solely on LLM training data, which becomes outdated, the platform performs real-time web searches to verify claims against current information before rendering verdicts.

The architecture separates concerns into a React-based frontend for user interaction and a Node.js/Hono backend for transcript extraction, web search integration, and AI-powered fact-checking.

Architecture

Technology Stack

Frontend:

React 19 with TypeScript
Vite for build tooling and development server
TailwindCSS 4 for styling
React Router for navigation
Motion for animations
Hugeicons for icon components
YouTubei.js for YouTube metadata extraction

Backend:

Node.js with TypeScript
Hono framework for HTTP routing and middleware
Generative AI models: Google Gemini, Groq (Mixtral), and OpenRouter (Llama)
Supabase PostgreSQL for conclusion caching
YouTube Transcript Plus for transcript extraction
DuckDuckGo API for web search (no key required)

Generative AI Architecture

Factube employs a multi-model approach to ensure reliability and accuracy:

Primary Models

Google Gemini (Primary) - Flash model with thinking capabilities and native web search support
Groq Mixtral (Fallback) - Low-latency open-weight model
OpenRouter Llama 70B (Final fallback) - Robust open-source model
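
The primary/fallback ordering above can be sketched as a loop over providers. This is a minimal sketch under stated assumptions, not the project's actual factCheck.ts: the ModelProvider shape and the generateWithFallback name are hypothetical, and each generate function stands in for the real Gemini, Groq, or OpenRouter client call.

```typescript
// Hypothetical provider shape: any async function that turns a prompt into raw model text.
type ModelProvider = {
  name: string;
  generate: (prompt: string) => Promise<string>;
};

type FallbackResult = { provider: string; text: string };

// Tries each provider in order and returns the first successful response.
// Errors (quota, rate limit, network) are collected and only surfaced if every provider fails.
async function generateWithFallback(
  providers: ModelProvider[],
  prompt: string,
): Promise<FallbackResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.generate(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Passing the providers as an ordered array keeps the chain easy to reorder or extend without touching the loop.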

AI-Powered Workflow

The fact-checking pipeline operates as follows:

Transcript Extraction: The system retrieves the YouTube video transcript using YouTube Transcript Plus
Claim Extraction: Identifies hard factual claims (statistics, dates, attributions, scientific claims)
Web Search Context: Performs targeted searches for major claims using DuckDuckGo
Fact-Checking with AI: Sends transcript + search results to generative AI with explicit instructions to prioritize current search data over training data
Verdict Assignment: AI assigns verdicts (true/false/misleading/unverifiable) with confidence scores
Persistence: Results are cached in Supabase to avoid re-processing
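
The steps above can be sketched as a cache-first orchestration function. This is an illustrative sketch only: PipelineDeps, runFactCheck, and every dependency name are hypothetical stand-ins for the services under server/src/services, and claim extraction is folded into the searchContext step for brevity.

```typescript
// Hypothetical dependency bundle; each function stands in for a real service.
interface PipelineDeps<T> {
  getCached: (videoId: string) => Promise<T | null>;
  getTranscript: (videoId: string) => Promise<string>;
  searchContext: (transcript: string) => Promise<string[]>;
  factCheck: (transcript: string, context: string[]) => Promise<T>;
  saveResult: (videoId: string, result: T) => Promise<void>;
}

// Returns a cached conclusion when one exists; otherwise runs the full
// pipeline (transcript, search context, AI fact-check) and persists the result.
async function runFactCheck<T>(
  videoId: string,
  deps: PipelineDeps<T>,
): Promise<{ result: T; cached: boolean }> {
  const hit = await deps.getCached(videoId);
  if (hit !== null) return { result: hit, cached: true };

  const transcript = await deps.getTranscript(videoId);
  const context = await deps.searchContext(transcript);
  const result = await deps.factCheck(transcript, context);
  await deps.saveResult(videoId, result);
  return { result, cached: false };
}
```

Injecting the dependencies makes the flow testable without network access or a database.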

Prompt Engineering

The system uses carefully crafted system and user prompts that:

Emphasize prioritizing web search results over model training data
Define clear verdict categories
Request structured JSON output
Include confidence scoring
Extract and classify ignored content (humor, opinions, anecdotes)
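
A prompt pair with these properties might be assembled as follows. The wording is illustrative and is not the text in prompt.ts; buildSystemPrompt and buildUserPrompt are hypothetical names, while the verdict and ignored-reason values are taken from the API response format documented in this README.

```typescript
const VERDICTS = ["true", "false", "misleading", "unverifiable"] as const;
const IGNORED_REASONS = ["humor", "opinion", "anecdote", "filler"] as const;

// Builds the fixed instructions: search-first policy, verdict set,
// confidence scoring, ignored-content classification, JSON-only output.
function buildSystemPrompt(): string {
  return [
    "You are a fact-checker for video transcripts.",
    "When web search results are provided, prioritize them over your training data.",
    `Assign each claim exactly one verdict: ${VERDICTS.join(", ")}.`,
    "Give each verdict a confidence between 0.0 and 1.0.",
    `List non-claims under "ignored" with a reason: ${IGNORED_REASONS.join(", ")}.`,
    "Respond with a single JSON object and no surrounding text.",
  ].join("\n");
}

// Builds the per-request message: numbered search results first, then the transcript.
function buildUserPrompt(transcript: string, searchResults: string[]): string {
  const context =
    searchResults.length > 0
      ? searchResults.map((result, i) => `[${i + 1}] ${result}`).join("\n")
      : "No search results were available.";
  return `SEARCH RESULTS:\n${context}\n\nTRANSCRIPT:\n${transcript}`;
}
```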

Project Structure

factube/
├── client/                          # React frontend
│   ├── src/
│   │   ├── components/
│   │   │   ├── Navbar.tsx
│   │   │   ├── VideoDetails.tsx    # Video metadata display
│   │   │   ├── TranscriptContent.tsx # Fact-checking results display
│   │   │   ├── StatusScreen.tsx    # Loading/error states
│   │   │   └── Footer.tsx
│   │   ├── pages/
│   │   │   ├── HomePage.tsx
│   │   │   ├── About.tsx
│   │   │   └── NotFound.tsx
│   │   ├── lib/
│   │   │   ├── fetchVideoDetails.ts
│   │   │   └── youtubeParser.ts
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── package.json
│   ├── vite.config.ts
│   └── tsconfig.json
│
├── server/                         # Node.js/Hono backend
│   ├── src/
│   │   ├── lib/
│   │   │   ├── geminiClient.ts     # Gemini AI integration
│   │   │   ├── groqClient.ts       # Groq AI integration
│   │   │   ├── openRouterClient.ts # OpenRouter AI integration
│   │   │   ├── supabaseClient.ts   # Database connection
│   │   │   ├── webSearch.ts        # Web search utility
│   │   │   └── prompt.ts           # AI system/user prompts
│   │   ├── services/
│   │   │   ├── factCheck.ts        # Orchestrates AI models with fallbacks
│   │   │   ├── getTranscript.ts    # YouTube transcript extraction
│   │   │   ├── getVideoDetails.ts  # Video metadata retrieval
│   │   │   ├── ytParser.ts         # URL parsing utilities
│   │   │   └── databaseAction.ts   # Supabase operations
│   │   └── index.ts                # Server entry point
│   ├── package.json
│   └── tsconfig.json
│
└── README.md
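
The tree lists youtubeParser.ts and ytParser.ts as URL parsing utilities. A minimal sketch of what such a parser might do is shown here; extractVideoId is a hypothetical name, the set of supported URL shapes is an assumption, and the 11-character ID pattern is a widely observed convention rather than a documented guarantee.

```typescript
// Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Accepts a bare ID or a common YouTube URL and returns the ID, or null if none is found.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // already a bare ID

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www|m)\./, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1); // https://youtu.be/<id>
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v"); // /watch?v=<id>
    } else {
      const match = /^\/(?:embed|shorts)\/([^/]+)/.exec(url.pathname);
      candidate = match?.[1] ?? null; // /embed/<id> or /shorts/<id>
    }
  }
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```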

Setup and Installation

Prerequisites

Node.js 18+ and pnpm
YouTube video URLs for testing
API keys or credentials for:
- Google Gemini API
- Groq API
- OpenRouter API
- Supabase PostgreSQL instance

Environment Configuration

Create a .env file in the server/ directory:

# AI Models
GEMINI_API_KEY=your_gemini_api_key
GROQ_API_KEY=your_groq_api_key
OPENROUTER_API_KEY=your_openrouter_api_key

# Database
SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key

# Frontend communication
CLIENT_URL=http://localhost:5173
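
A startup check can catch a missing variable before the first request fails. The sketch below is an assumption about how such a check could look, not code from the repository; missingEnvVars is a hypothetical helper, and CLIENT_URL is treated as optional here.

```typescript
// Variables from the .env example that the server cannot run without.
const REQUIRED_VARS = [
  "GEMINI_API_KEY",
  "GROQ_API_KEY",
  "OPENROUTER_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_ANON_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Typical use at server startup (assumes a Node.js runtime):
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```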

Installation

# Install client dependencies
cd client
pnpm install

# Install server dependencies
cd ../server
pnpm install

Development

In separate terminals:

# Terminal 1: Start backend (port 3000)
cd server
pnpm run dev

# Terminal 2: Start frontend (port 5173)
cd client
pnpm run dev

Navigate to http://localhost:5173 in your browser.

Build for Production

# Build client
cd client
pnpm run build

# Build server
cd ../server
pnpm run build

Start production server with pnpm start.

API Endpoints

GET /api/video_details

Retrieves YouTube video metadata.

Query Parameters:

q (required): YouTube video ID

Response:

{
  "basic_info": {
    "title": "string",
    "author": "string",
    "thumbnail": [{"url": "string"}],
    "like_count": "number"
  },
  "secondary_info": {
    "owner": {
      "subscriber_count": {"text": "string"}
    }
  }
}
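
A typed client call for this endpoint might look like the following. This is a sketch under assumptions: the VideoDetails interface mirrors only the fields shown in the response above, the helper names are hypothetical (the project's own fetchVideoDetails.ts may differ), and the base URL is whatever address the backend is served from, for example http://localhost:3000 in development.

```typescript
// Mirrors the documented response fields; the real payload may contain more.
interface VideoDetails {
  basic_info: {
    title: string;
    author: string;
    thumbnail: { url: string }[];
    like_count: number;
  };
  secondary_info: {
    owner: { subscriber_count: { text: string } };
  };
}

// Builds the request URL with the video ID in the required q parameter.
function videoDetailsUrl(baseUrl: string, videoId: string): string {
  const url = new URL("/api/video_details", baseUrl);
  url.searchParams.set("q", videoId);
  return url.toString();
}

// Fetches metadata and fails loudly on non-2xx responses.
async function fetchVideoDetails(baseUrl: string, videoId: string): Promise<VideoDetails> {
  const response = await fetch(videoDetailsUrl(baseUrl, videoId));
  if (!response.ok) {
    throw new Error(`video_details failed with status ${response.status}`);
  }
  return (await response.json()) as VideoDetails;
}
```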

POST /url

Triggers fact-checking for a YouTube video. Returns cached results if available, otherwise extracts transcript and performs AI fact-checking.

Query Parameters:

q (required): YouTube video ID

Response:

{
  "conclusion": {
    "overall_verdict": "true|false|misleading|unverifiable",
    "summary": "string",
    "claims": [
      {
        "id": "number",
        "timestamp_seconds": "number",
        "claim": "string",
        "verdict": "true|false|misleading|unverifiable",
        "explanation": "string",
        "source": "string|null",
        "confidence": "number (0.0-1.0)"
      }
    ],
    "ignored": [
      {
        "timestamp_seconds": "number",
        "reason": "humor|opinion|anecdote|filler",
        "text": "string"
      }
    ]
  }
}
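
Because this payload originates from a language model, validating it before use is a reasonable precaution. The sketch below types a claim according to the schema above and filters out malformed entries; isClaim and parseClaims are hypothetical helpers, not functions from the repository.

```typescript
type Verdict = "true" | "false" | "misleading" | "unverifiable";

interface Claim {
  id: number;
  timestamp_seconds: number;
  claim: string;
  verdict: Verdict;
  explanation: string;
  source: string | null;
  confidence: number;
}

const VERDICT_VALUES: readonly string[] = ["true", "false", "misleading", "unverifiable"];

// Runtime check that an unknown value matches the documented claim shape.
function isClaim(value: unknown): value is Claim {
  if (typeof value !== "object" || value === null) return false;
  const c = value as Record<string, unknown>;
  return (
    typeof c.id === "number" &&
    typeof c.timestamp_seconds === "number" &&
    typeof c.claim === "string" &&
    typeof c.verdict === "string" &&
    VERDICT_VALUES.includes(c.verdict) &&
    typeof c.explanation === "string" &&
    (typeof c.source === "string" || c.source === null) &&
    typeof c.confidence === "number" &&
    c.confidence >= 0 &&
    c.confidence <= 1
  );
}

// Parses a raw JSON response and keeps only well-formed claims.
// Invalid JSON or a missing claims array yields an empty list.
function parseClaims(rawJson: string): Claim[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return [];
  }
  const conclusion = (parsed as { conclusion?: { claims?: unknown } } | null)?.conclusion;
  const claims = conclusion?.claims;
  return Array.isArray(claims) ? claims.filter(isClaim) : [];
}
```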

Fact-Checking Logic

Claim Classification

The system extracts and classifies content as:

Claims: Verifiable factual statements requiring fact-checking
Ignored: Opinions, humor, anecdotes, filler content (not fact-checked)

Verdict Definitions

True: Factually accurate according to current web search results
False: Factually incorrect according to current web search results
Misleading: Technically true but presented deceptively or lacks critical context
Unverifiable: Cannot be confirmed despite web search attempts

Confidence Scoring

Confidence (0.0 to 1.0) reflects the certainty of the verdict based on search result quality and relevance. Low confidence indicates ambiguous findings or limited search results.
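
One way a consumer could act on the score is to separate confident verdicts from those that merit manual review. The 0.5 threshold below is an arbitrary illustration, not a value defined by Factube, and splitByConfidence is a hypothetical helper.

```typescript
// Splits claims into those at or above the threshold and those below it.
// Non-finite scores (NaN, Infinity) are routed to review rather than trusted.
function splitByConfidence<T extends { confidence: number }>(
  claims: T[],
  threshold = 0.5,
): { confident: T[]; needsReview: T[] } {
  const confident: T[] = [];
  const needsReview: T[] = [];
  for (const claim of claims) {
    if (Number.isFinite(claim.confidence) && claim.confidence >= threshold) {
      confident.push(claim);
    } else {
      needsReview.push(claim);
    }
  }
  return { confident, needsReview };
}
```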

Error Handling and Resilience

The system implements graceful degradation:

Model Fallback Chain: If Gemini fails (for example, quota exceeded or rate limited), the system automatically tries Groq, then OpenRouter
Search Failure Handling: Web search failures are logged but do not block fact-checking; if search is unavailable, the AI proceeds using its training data alone
Database Caching: Results are cached to reduce redundant API calls and provide faster repeated lookups
Rate Limiting: API implements IP-based rate limiting (10 requests per minute) to prevent abuse
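
The README does not say which rate-limiting algorithm is used. As one possibility, a fixed-window counter keyed by IP can enforce the 10-requests-per-minute limit; the class below is a hypothetical in-memory sketch, and the clock is passed in so the behavior can be tested deterministically.

```typescript
// In-memory fixed-window limiter: at most `limit` requests per IP per window.
class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and records it; false if the IP is over the limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A fixed window is simple but permits short bursts across a window boundary; a sliding window or token bucket smooths that out at the cost of more bookkeeping. An in-memory map also resets on restart and is not shared between server instances.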

Performance Considerations

Transcript extraction typically takes 2-5 seconds depending on video length
AI fact-checking with web search context averages 5-15 seconds
Database lookups are sub-100ms for cached results
Web search operations are parallelized for major claims to minimize latency
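
Parallel searches that tolerate individual failures can be sketched as below. The SearchFn type and searchClaims name are hypothetical, and the search function is injected rather than tied to a particular DuckDuckGo client, so this shows only the concurrency pattern.

```typescript
// Hypothetical search signature: a query in, a list of result snippets out.
type SearchFn = (query: string) => Promise<string[]>;

// Starts one search per claim concurrently. A failed search is logged and
// recorded as an empty result so it never blocks the other claims.
async function searchClaims(
  claims: string[],
  search: SearchFn,
): Promise<Map<string, string[]>> {
  const entries = await Promise.all(
    claims.map(async (claim): Promise<[string, string[]]> => {
      try {
        return [claim, await search(claim)];
      } catch (err) {
        console.warn(`search failed for "${claim}":`, err);
        return [claim, []];
      }
    }),
  );
  return new Map(entries);
}
```

Because each search is wrapped in its own try/catch, total latency is roughly that of the slowest search rather than the sum of all of them.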

Development Guidelines

Code Style

TypeScript for type safety across the stack
ESLint configuration enforces consistent formatting
Functional components in React with hooks
Error handling with try-catch blocks and explicit error messaging

Git Workflow

Feature branches off main
Descriptive commit messages
Pull requests require review before merge

Testing Recommendations

Unit tests for prompt engineering logic
Integration tests for API endpoints
End-to-end tests for fact-checking workflows
Load testing for concurrent requests

Known Limitations

YouTube Transcripts: Only works with videos that have transcripts available; some videos lack manual or auto-generated transcripts
Web Search Context: DuckDuckGo API has rate limits; high-volume deployments should consider premium search APIs
AI Model Accuracy: Fact-checking quality depends on model capabilities; edge cases may require manual review
Language Support: Currently optimized for English-language content
Real-time Updates: Cached results in Supabase are not refreshed automatically; picking up new information for a video requires manual invalidation or cache expiry logic

Future Enhancements

Multi-language support with automatic translation
Advanced caching with TTL-based invalidation
Custom search providers (Google Search, Bing) with fallback chains
Claim ranking by relevance and impact
Citation extraction from search results
Fact-checker dashboard for result analytics
API authentication and usage tracking

Contributing

Contributions are welcome. Ensure all changes:

Pass TypeScript compilation
Follow ESLint rules
Include error handling
Are tested manually with various YouTube URLs
Update documentation if API contracts change

License

MIT License

Support

For issues, feature requests, or questions, please open an issue on the project repository.
