
🛡️ Safeguard AI

AI-Powered Video Intelligence for Law Enforcement & Training

Ask natural-language questions about incident videos and get instant, timestamped, explainable answers.

Python FastAPI React TypeScript MongoDB


🌟 Overview

Safeguard AI is an intelligent video analysis system designed specifically for law enforcement, investigators, and training professionals. It transforms hours of body camera footage, traffic stops, and training sessions into searchable, queryable data that can be accessed through simple natural language questions.

🎯 The Problem We Solve

  • Manual Review is Time-Consuming: Officers and investigators spend countless hours reviewing video footage frame by frame
  • Critical Details Are Missed: Important moments can be overlooked in lengthy recordings
  • Training Is Inefficient: Finding specific examples in training videos is tedious and imprecise
  • Documentation Is Incomplete: Incident reports often lack precise timestamps and visual context

✨ Our Solution

Safeguard AI uses advanced AI and machine learning to:

  • ๐Ÿ” Semantic Search: Ask questions like "When did the suspect reach for their waistband?" and get exact timestamps
  • ๐ŸŽฅ Frame Analysis: Every frame is analyzed using vision AI to understand what's happening visually
  • ๐Ÿ“ Transcript Intelligence: Audio transcripts are processed and linked to visual data for comprehensive understanding
  • ๐Ÿง  Contextual Reasoning: Our AI agent understands context and provides explainable answers with evidence
  • โšก Real-Time Response: Get answers in seconds, not hours

๐Ÿ—๏ธ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                  Frontend (React + TypeScript)                   │
│                 Natural Language Query Interface                 │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  │ REST API
                                  │
┌─────────────────────────────────▼────────────────────────────────┐
│                        Backend (FastAPI)                         │
│                    Request Handling & Routing                    │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                 ┌────────────────┴─────────────┐
                 │                              │
┌────────────────▼───────────┐   ┌──────────────▼────────────────┐
│   LLM Agent System         │   │   Video Processing Pipeline   │
│                            │   │                               │
│  • Query Router            │   │  • Frame Extraction           │
│  • Reasoner Agent          │   │  • Vision AI Analysis         │
│  • LangChain Integration   │   │  • Audio Transcription        │
│  • Gemini & OpenAI         │   │  • Embedding Generation       │
└────────────────┬───────────┘   └──────────────┬────────────────┘
                 │                              │
                 └────────────────┬─────────────┘
                                  │
┌─────────────────────────────────▼────────────────────────────────┐
│                          MongoDB Atlas                           │
│                                                                  │
│  Collections:                                                    │
│  • video_intelligence_metadata    (Video info & paths)           │
│  • frame_intelligence_metadata    (Frame analysis + embeddings)  │
│  • video_intelligence_transcripts (Audio transcripts)            │
│                                                                  │
│  Vector Search Indexes:                                          │
│  • Frame embeddings (visual similarity)                          │
│  • Transcript embeddings (semantic search)                       │
│  • Hybrid search (scalar + vector)                               │
└──────────────────────────────────────────────────────────────────┘

🚀 Key Features

1. Intelligent Video Processing

  • Automatically extracts frames at configurable intervals (default: 2 seconds)
  • Generates detailed descriptions using vision AI (OpenAI GPT-4 Vision)
  • Creates vector embeddings for semantic similarity search
  • Processes audio and generates searchable transcripts
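The interval-based extraction above can be sketched with OpenCV. The repository's actual `llm/video_to_image.py` isn't shown here, so the function names and the millisecond-seek approach below are illustrative, not the project's exact implementation:

```python
def sample_timestamps(duration_s: float, interval_s: float = 2.0) -> list:
    """Timestamps (seconds) at which to grab frames, one every interval_s."""
    out, t = [], 0.0
    while t < duration_s:
        out.append(round(t, 3))
        t += interval_s
    return out


def extract_frames(video_path: str, interval_s: float = 2.0) -> list:
    """Grab one frame every interval_s seconds; returns (timestamp, frame) pairs."""
    import cv2  # imported lazily so the sampling helper stays dependency-free

    cap = cv2.VideoCapture(video_path)
    duration = cap.get(cv2.CAP_PROP_FRAME_COUNT) / cap.get(cv2.CAP_PROP_FPS)
    frames = []
    for ts in sample_timestamps(duration, interval_s):
        cap.set(cv2.CAP_PROP_POS_MSEC, ts * 1000)  # seek by milliseconds
        ok, frame = cap.read()
        if ok:
            frames.append((ts, frame))
    cap.release()
    return frames
```

Seeking with `CAP_PROP_POS_MSEC` avoids decoding every frame, which matters on hour-long bodycam footage.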

2. Advanced Search Capabilities

  • Vector Search: Find visually similar moments across videos
  • Full-Text Search: Search through transcripts and frame descriptions
  • Hybrid Search: Combine vector and text search with adjustable weights
  • Multi-Modal: Search across both visual and audio content simultaneously

3. AI-Powered Query Understanding

  • Natural language query interpretation
  • Context-aware reasoning using LangChain agents
  • Query routing to optimize search strategy
  • Explainable results with confidence scores
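The router's internals aren't documented here. As a toy illustration of what "query routing to optimize search strategy" means, a keyword heuristic could look like the sketch below; the actual Query Router presumably asks an LLM to classify the question instead:

```python
def route_query(query: str) -> str:
    """Pick a search target for a question (hypothetical heuristic;
    a production router would likely use an LLM classifier)."""
    q = query.lower()
    # words hinting the answer is in what the camera *saw* vs. what was *said*
    visual = any(w in q for w in ("reach", "holding", "wearing", "gesture", "visible"))
    audio = any(w in q for w in ("say", "said", "warning", "stated", "told"))
    if visual and not audio:
        return "frames"        # search frame-description embeddings
    if audio and not visual:
        return "transcripts"   # search transcript embeddings
    return "hybrid"            # search both and merge
```

For example, "When did the officer issue a warning?" routes to transcripts, while "When did the suspect reach for their waistband?" routes to frames.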

4. User-Friendly Interface

  • Clean, intuitive chat interface
  • Real-time video playback at relevant timestamps
  • Visual feedback and loading states
  • Responsive design for desktop and mobile

📦 Tech Stack

Backend

  • FastAPI - Modern, high-performance web framework
  • Python 3.10+ - Core language
  • LangChain - AI agent orchestration
  • LangGraph - Agent workflow management
  • OpenAI GPT-4 - Vision analysis and reasoning
  • Google Gemini - Alternative LLM for reasoning
  • Voyage AI - Vector embeddings
  • MongoDB - Vector database and data storage
  • OpenCV - Video frame extraction
  • yt-dlp - Video downloading utility

Frontend

  • React 19 - UI framework
  • TypeScript - Type-safe development
  • Vite - Build tool and dev server
  • CSS3 - Styling

AI/ML Stack

  • LangChain - Agent framework
  • Vector Embeddings - Semantic search
  • Vision AI - Frame understanding
  • Speech-to-Text - Audio transcription
  • Hybrid Search - Multi-modal retrieval

🛠️ Installation & Setup

Prerequisites

  • Python 3.10+
  • Node.js 18+ and npm
  • MongoDB Atlas account (or local MongoDB instance)
  • API Keys:
    • OpenAI API key
    • Voyage AI API key
    • Google Gemini API key (optional)

1. Clone the Repository

git clone https://github.com/yourusername/safeguard-ai.git
cd safeguard-ai

2. Backend Setup

# Create and activate virtual environment
python -m venv env
source env/bin/activate  # On macOS/Linux
# env\Scripts\activate  # On Windows

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# VOYAGE_API_KEY=your_voyage_key
# GEMINI_API_KEY=your_gemini_key
# MONGODB_URI=your_mongodb_connection_string

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Configure API endpoint (if needed)
# Edit src/App.tsx to point to your backend URL

4. MongoDB Setup

  1. Create a MongoDB Atlas cluster (or use local MongoDB)
  2. Create a database named video_intelligence
  3. Create collections:
    • video_intelligence_metadata
    • frame_intelligence_metadata
    • video_intelligence_transcripts
  4. Create vector search indexes (see Configuration section)

⚙️ Configuration

MongoDB Vector Search Indexes

You need to create vector search indexes for semantic search to work:

Frame Embeddings Index

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1024,
        "similarity": "cosine"
      }
    }
  }
}

Transcript Embeddings Index

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1024,
        "similarity": "cosine"
      }
    }
  }
}
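Instead of pasting the JSON into the Atlas UI, the same index can be created programmatically. This is a sketch assuming pymongo >= 4.5 against an Atlas cluster; the index name `frame_embedding_index` is a placeholder, not a name the repository defines:

```python
# Index definition matching the JSON above (1024-d cosine, per the Voyage embeddings).
FRAME_INDEX = {
    "mappings": {
        "dynamic": True,
        "fields": {
            "embedding": {
                "type": "knnVector",
                "dimensions": 1024,
                "similarity": "cosine",
            }
        }
    }
}


def create_frame_index(collection, name="frame_embedding_index"):
    """Create the Atlas Search index via pymongo (pymongo >= 4.5, Atlas only)."""
    from pymongo.operations import SearchIndexModel  # lazy: requires pymongo

    return collection.create_search_index(
        SearchIndexModel(definition=FRAME_INDEX, name=name)
    )
```

The transcript index is identical apart from the name; note that `knnVector` is the legacy Atlas Search mapping type, queried through the `$search` stage.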

Environment Variables

Create a .env file in the root directory:

# API Keys
OPENAI_API_KEY=sk-...
VOYAGE_API_KEY=...
GEMINI_API_KEY=...

# MongoDB
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/
MONGODB_DB_NAME=video_intelligence

# Video Storage
VIDEO_FOLDER=/path/to/your/videos
FRAMES_FOLDER=/path/to/frame/output

# API Configuration
BACKEND_PORT=8000
FRONTEND_PORT=5173

🎮 Usage

Starting the Application

Terminal 1: Backend Server

# From project root
source env/bin/activate
cd backend
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Terminal 2: Frontend Dev Server

# From project root
cd frontend
npm run dev

The application will be available at http://localhost:5173 (frontend) and http://localhost:8000 (backend API).

Processing Videos

  1. Add videos to your videos/ folder

  2. Process a video:

from llm.video_to_image import extract_frames
from llm.gen_frame_desc import generate_frame_descriptions
from llm.process_frames import process_and_store_frames
from transcripts.audio import process_video_transcript

# Extract frames
extract_frames("videos/incident_video.mp4", interval_seconds=2)

# Generate AI descriptions
generate_frame_descriptions()

# Store in MongoDB with embeddings
process_and_store_frames()

# Process audio transcript
process_video_transcript("videos/incident_video.mp4")

  3. Query the video:
    • Open the frontend at http://localhost:5173
    • Type your question: "When did the officer issue a warning?"
    • Get instant results with video timestamps

📚 Project Structure

safeguard-ai/
├── backend/                # FastAPI backend server
│   ├── main.py             # API endpoints and CORS setup
│   └── requirements.txt    # Backend dependencies
│
├── frontend/               # React + TypeScript UI
│   ├── src/
│   │   ├── App.tsx         # Main application component
│   │   ├── App.css         # Styles
│   │   └── main.tsx        # Entry point
│   ├── public/             # Static assets
│   └── package.json        # Frontend dependencies
│
├── llm/                    # AI/ML core logic
│   ├── inference.py        # Semantic search functions
│   ├── mongo_client_1.py   # MongoDB connection & collections
│   ├── video_to_image.py   # Frame extraction
│   ├── gen_frame_desc.py   # Vision AI descriptions
│   ├── process_frames.py   # Embedding generation & storage
│   ├── get_voyage_embed.py # Voyage AI embeddings
│   ├── retreival_2.py      # Hybrid search logic
│   ├── agent.py            # LangChain agent setup
│   └── query_model/        # Query processing
│       ├── router.py       # Query routing logic
│       └── reasoner.py     # AI reasoning agent
│
├── transcripts/            # Audio processing
│   ├── video2audio.py      # Audio extraction
│   └── audio.py            # Transcription & storage
│
├── videos/                 # Video storage
├── frames/                 # Extracted frames
└── requirements.txt        # Python dependencies

๐Ÿ” How It Works

1. Video Ingestion

When a video is uploaded:

  1. Video is stored with metadata (name, path, duration)
  2. Frames are extracted every N seconds
  3. Audio is extracted and transcribed
  4. All data is linked via video_id
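As a sketch of how the pieces stay linked, a metadata document might look like the following. The field names other than `video_id` are illustrative guesses, not the repository's actual schema:

```python
import uuid
from datetime import datetime, timezone


def ingest_metadata(video_name: str, path: str, duration_s: float) -> dict:
    """Build a video metadata document (hypothetical field names)."""
    return {
        "video_id": str(uuid.uuid4()),  # shared key linking frames + transcripts
        "name": video_name,
        "path": path,
        "duration_seconds": duration_s,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


doc = ingest_metadata("incident_video.mp4", "videos/incident_video.mp4", 312.0)
# every frame document and transcript chunk would carry doc["video_id"]
```

Because frames and transcript chunks all carry the same `video_id`, a single query can join visual and audio evidence for one incident.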

2. Frame Analysis

For each frame:

  1. Image is encoded to base64
  2. Sent to OpenAI GPT-4 Vision for description
  3. Description is converted to vector embedding (Voyage AI)
  4. Stored in MongoDB with timestamp and video reference
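Steps 1-2 amount to building a chat message that carries the frame inline as a base64 data URI. A minimal sketch, in which the prompt text and the model named in the comment are assumptions rather than the repository's actual values:

```python
import base64


def vision_message(image_bytes: bytes,
                   prompt: str = "Describe what is happening in this frame.") -> dict:
    """Build an OpenAI chat message with the frame embedded as a base64 data URI."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }

# e.g. client.chat.completions.create(model=..., messages=[vision_message(jpg)])
# where `client` is an openai.OpenAI() instance and the model is vision-capable
```

The returned description string is then embedded with Voyage AI and written to `frame_intelligence_metadata` alongside the frame's timestamp.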

3. Transcript Processing

For audio:

  1. Audio is extracted from video
  2. Transcribed using speech-to-text
  3. Split into chunks with timestamps
  4. Each chunk gets vector embedding
  5. Stored in MongoDB with video reference
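Steps 3-4 can be sketched as a character-budgeted chunker over timestamped transcript segments. The chunk shape is illustrative, not the repository's actual schema:

```python
def chunk_transcript(segments, max_chars=400):
    """Group (start_s, end_s, text) segments into embedding-sized chunks,
    preserving the time range each chunk covers."""
    chunks, cur, cur_len = [], [], 0

    def flush():
        chunks.append({"start": cur[0][0], "end": cur[-1][1],
                       "text": " ".join(s[2] for s in cur)})

    for start, end, text in segments:
        if cur and cur_len + len(text) > max_chars:
            flush()                      # budget exceeded: close current chunk
            cur, cur_len = [], 0
    # fall through and start a new chunk with this segment
        cur.append((start, end, text))
        cur_len += len(text)
    if cur:
        flush()
    return chunks
```

Each resulting chunk gets its own vector embedding, so a question about something that was said resolves to a tight time window rather than the whole recording.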

4. Query Processing

When a user asks a question:

  1. Query Router analyzes the question type
  2. Embedding is generated for the query
  3. Hybrid Search executes:
    • Vector search finds semantically similar frames/transcripts
    • Full-text search finds keyword matches
    • Results are merged and ranked
  4. Reasoner Agent synthesizes results:
    • Analyzes top matches
    • Generates natural language answer
    • Provides timestamps and confidence scores
  5. Response returns answer + video clips
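The merge-and-rank part of step 3 can be illustrated with a simple weighted-score merge. This is purely illustrative; the repository's `retreival_2.py` may use a different scheme (e.g. reciprocal rank fusion):

```python
def merge_results(vector_hits, text_hits, alpha=0.7):
    """Combine vector and full-text hits by weighted score.

    Each hit is (doc_id, normalized_score); alpha is the vector-search weight
    (an adjustable knob, matching the 'adjustable weights' described above).
    """
    scores = {}
    for doc_id, s in vector_hits:
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha * s
    for doc_id, s in text_hits:
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) * s
    # highest combined score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A document that matches both modalities (e.g. a frame whose description and nearby transcript both mention a warning) accumulates score from both lists and rises to the top.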

5. Result Presentation

  • Chat interface displays AI-generated answer
  • Video player shows relevant clips at exact timestamps
  • Users can review evidence and context

🎯 Use Cases

Law Enforcement

  • 📹 Body Camera Review: Quickly find specific moments in hours of footage
  • 🚔 Traffic Stop Analysis: Search for compliance issues or noteworthy interactions
  • 🔍 Evidence Discovery: Locate specific actions, objects, or statements
  • 📊 Pattern Detection: Identify recurring situations across multiple incidents

Training

  • 👮 Scenario Review: Find specific training scenarios for debriefing
  • 📚 Best Practices: Search for examples of proper procedures
  • ⚠️ Learning Moments: Identify situations that need additional training
  • 🎓 Curriculum Development: Build training libraries with searchable content

Investigations

  • 🕵️ Case Building: Gather timestamped evidence efficiently
  • 📝 Report Generation: Create detailed reports with video references
  • 🔗 Cross-Reference: Link related incidents across multiple videos
  • ⏱️ Timeline Construction: Build accurate timelines of events

๐Ÿ” Privacy & Security

  • Data Encryption: Data stored in MongoDB Atlas is encrypted at rest
  • API Security: Endpoints should be secured with authentication (JWT is planned; see Roadmap)
  • Video Storage: Videos are stored locally and can be configured for secure cloud storage
  • Audit Trail: All queries and access can be logged for compliance
  • GDPR Compliance: Personal data handling should follow GDPR best practices
  • Role-Based Access: Recommended for production; different permission levels per role

โš ๏ธ Important: This is a prototype. For production deployment, implement proper authentication, encryption, and access controls.


🚧 Roadmap

  • Authentication & Authorization - JWT-based user management
  • Multi-Tenant Support - Department/organization isolation
  • Advanced Analytics - Dashboard with insights and trends
  • Mobile App - iOS/Android apps for field use
  • Real-Time Processing - Live video analysis from body cameras
  • Export Features - Generate reports, clips, and documentation
  • Integration APIs - Connect with RMS, CAD systems
  • Advanced AI Models - Object detection, face recognition, action recognition
  • Collaborative Features - Annotations, comments, sharing
  • Performance Optimization - Caching, CDN, edge processing

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ™ Acknowledgments

  • OpenAI - GPT-4 Vision for frame analysis
  • Voyage AI - High-quality embeddings
  • MongoDB - Vector search capabilities
  • LangChain - Agent framework
  • FastAPI - Excellent web framework
  • React - Modern UI development

📧 Contact

For questions, support, or collaboration:


Built with ❤️ for safer communities

โญ Star this repo if you find it useful!
