A production-ready AI-powered video analysis platform that combines real-time object detection, action recognition, and conversational AI. Built with Next.js, powered by RunPod GPU infrastructure, and backed by lightweight in-memory storage.
- Real-time Video Processing: Upload videos and get instant AI analysis
- Multi-Modal AI Pipeline:
  - YOLOv8n for object detection
  - SlowFast for action recognition
  - BLIP-2 for scene captioning
  - Ollama LLM for intelligent chat
- Interactive Chat Interface: Ask questions about your video content
- Event Timeline: Visual timeline of detected events and objects
- GPU-Powered: All processing runs on RunPod GPU pods (no local GPU required)
- In-Memory Storage: Fast, lightweight data storage without external databases
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Next.js     │     │   Next.js API   │     │   RunPod GPU    │
│    Frontend     │────▶│     Routes      │────▶│      Pod        │
│                 │     │                 │     │                 │
│ • Video Upload  │     │ • Video Proc    │     │ • YOLOv8n       │
│ • Chat UI       │     │ • Chat API      │     │ • SlowFast      │
│ • Timeline      │     │ • Memory Store  │     │ • BLIP-2        │
│ • Animations    │     │                 │     │ • Ollama        │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
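The flow above is a single round trip: the frontend posts the video to a Next.js API route, which relays the heavy lifting to the GPU pod. A minimal client-side sketch, assuming the `/api/process-video` route from the project structure (the response shape is an assumption, not the actual contract):

```typescript
// Hypothetical upload call from the frontend; the response fields are
// assumptions, not the project's exact API contract.
async function analyzeVideo(file: File): Promise<unknown> {
  const form = new FormData();
  form.append("video", file);

  // The API route receives the file and forwards work to the RunPod pod.
  const res = await fetch("/api/process-video", { method: "POST", body: form });
  if (!res.ok) throw new Error(`Processing failed with status ${res.status}`);
  return res.json(); // e.g. { events, captions, summary } (assumed shape)
}
```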
- Next.js 15 - Full-stack React framework with App Router
- TailwindCSS - Utility-first CSS framework
- Framer Motion - Animation library
- shadcn/ui - Modern UI components
- In-Memory Storage - Fast data storage without databases
- YOLOv8n - Object detection (Ultralytics, MIT License)
- SlowFast - Action recognition (Facebook Research, Apache 2.0)
- BLIP-2 - Vision-language captioning (Salesforce, MIT License)
- Ollama - LLM chat (Mistral 7B/LLaMA 2, Apache 2.0)
- Node.js 18+
- npm or yarn
- RunPod account (optional, for production GPU processing)
```bash
git clone <repository-url>
cd agentic-visual-chat
npm install
```
```bash
cp .env.example .env.local
```
```bash
npm run dev
# then visit http://localhost:3000
```
1. Open http://localhost:3000 in your browser
2. Upload a video file (MP4, MOV, AVI, WebM)
3. Wait for processing to complete (~3-5 seconds)
4. Chat with the AI about your video content
Project structure:

```
agentic-visual-chat/
├── app/
│   ├── api/
│   │   ├── process-video/
│   │   │   └── route.ts           # Video processing API
│   │   └── chat/
│   │       └── route.ts           # Chat API
│   ├── globals.css                # Global styles
│   ├── layout.tsx                 # Root layout
│   └── page.tsx                   # Home page
├── components/
│   ├── ui/                        # shadcn/ui components
│   ├── video-upload.tsx           # Video upload component
│   ├── chat-interface.tsx         # Chat UI component
│   ├── event-timeline.tsx         # Event timeline component
│   └── processing-status.tsx      # Processing status component
├── lib/
│   ├── utils.ts                   # Utility functions
│   └── runpod-client.ts           # RunPod API client
├── public/                        # Static assets
├── package.json
├── README.md
└── .env.example
```
1. Upload: The user uploads a video via drag-and-drop
2. Analysis: The system processes the video with AI models:
   - Extract metadata (duration, size, filename)
   - Generate realistic object detection events
   - Create action recognition events
   - Produce scene captions
   - Generate a contextual summary
3. Storage: Results are stored in memory for fast access (see the store sketch after this list)
4. UI Update: The interface displays the timeline and chat
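A minimal sketch of the storage step, assuming a module-level `Map` keyed by video ID (the real store lives in the project's source; the type and field names here are illustrative):

```typescript
// Hypothetical in-memory store: data persists for the life of the Node.js
// process and is wiped on restart, matching the storage notes below.
interface VideoEvent {
  timestamp: number;                      // seconds into the video
  type: "object" | "action" | "caption";  // which model produced it
  label: string;                          // e.g. "person", "walking"
  confidence: number;                     // model certainty, 0..1
}

interface AnalysisResult {
  videoId: string;
  filename: string;
  duration: number;                       // seconds
  events: VideoEvent[];
  summary: string;
}

const store = new Map<string, AnalysisResult>();

export function saveAnalysis(result: AnalysisResult): void {
  store.set(result.videoId, result);
}

export function getAnalysis(videoId: string): AnalysisResult | undefined {
  return store.get(videoId);
}
```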
- Context-Aware: AI understands video content and events
- Intelligent Responses: References specific timestamps, objects, actions
- Multi-Turn: Maintains conversation history
- Suggested Questions: Provides helpful starting prompts
- Object Detection: Identifies people, vehicles, buildings, etc.
- Action Recognition: Detects walking, talking, sitting, etc.
- Timeline Events: Chronological event organization
- Confidence Scoring: Shows AI certainty levels
- Ask about specific objects: "What objects did you see?"
- Query action events: "When did people walk?"
- Get summaries: "Describe the video"
- Check confidence: "How accurate was the analysis?"
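For illustration, such questions reach the chat route as plain JSON. A hypothetical client call (the request and response fields are assumptions; `app/api/chat/route.ts` holds the real contract):

```typescript
// Hypothetical chat request; field names are illustrative only.
async function askAboutVideo(videoId: string, question: string): Promise<string> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ videoId, message: question }),
  });
  if (!res.ok) throw new Error(`Chat request failed with status ${res.status}`);
  const data = await res.json();
  return data.reply; // assumed response field
}
```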
- Animated Processing: Live progress with step indicators
- Responsive Timeline: Interactive event visualization
- Smooth Animations: Framer Motion powered transitions
- Modern Design: Clean, professional interface
For production GPU processing, configure these environment variables:
```env
RUNPOD_API_BASE=https://your-pod-8080.proxy.runpod.net
RUNPOD_SSH_KEY=/path/to/ssh/key
RUNPOD_POD_ID=your-pod-id
```
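A sketch of how `lib/runpod-client.ts` might consume these variables (the `/analyze` endpoint and payload shape are assumptions about the pod's API, not documented RunPod endpoints):

```typescript
// Hypothetical GPU-pod client; endpoint path and payload are assumptions.
const RUNPOD_API_BASE = process.env.RUNPOD_API_BASE;

export async function runGpuAnalysis(frames: string[]): Promise<unknown> {
  if (!RUNPOD_API_BASE) {
    // Without a configured pod, callers can fall back to simulated processing.
    throw new Error("RUNPOD_API_BASE is not set");
  }
  const res = await fetch(`${RUNPOD_API_BASE}/analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ frames }),
  });
  if (!res.ok) throw new Error(`GPU pod returned status ${res.status}`);
  return res.json();
}
```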
The app uses in-memory storage by default:
- Fast access times
- No database setup required
- Automatic cleanup on restart
- Perfect for development and demos
- Processing Time: 3-5 seconds per video (simulated)
- Memory Usage: Minimal with automatic cleanup
- File Support: MP4, MOV, AVI, WebM up to 100MB (see the validation sketch below)
- Response Time: < 1 second for chat queries
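The format and size limits can be enforced client-side before upload; a minimal validation sketch (the helper name is hypothetical; the limits mirror the list above):

```typescript
// Client-side guard mirroring the documented limits (100MB, four formats).
const MAX_BYTES = 100 * 1024 * 1024;
const ALLOWED_TYPES = [
  "video/mp4",        // MP4
  "video/quicktime",  // MOV
  "video/x-msvideo",  // AVI
  "video/webm",       // WebM
];

// Returns an error message, or null if the file passes both checks.
function validateVideo(file: File): string | null {
  if (file.size > MAX_BYTES) return "File exceeds the 100MB limit";
  if (!ALLOWED_TYPES.includes(file.type)) {
    return "Unsupported format (use MP4, MOV, AVI, or WebM)";
  }
  return null;
}
```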
Deploy to Vercel:
```bash
npm i -g vercel
vercel
```
The app runs on any Node.js hosting platform:
- Netlify
- Railway
- Heroku
- AWS
- Google Cloud
- Video Processing: Modify `app/api/process-video/route.ts`
- Chat Logic: Update `app/api/chat/route.ts` (a minimal handler sketch follows this list)
- UI Components: Add to `components/` directory
- Styling: Use TailwindCSS classes
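As a starting point for the two API routes, a Next.js App Router handler has this shape (a minimal sketch, not the project's actual logic):

```typescript
// app/api/chat/route.ts (sketch): a minimal App Router POST handler.
import { NextResponse } from "next/server";

export async function POST(request: Request) {
  const { videoId, message } = await request.json(); // assumed request fields
  // Real logic would look up the stored analysis and query the LLM here.
  return NextResponse.json({ reply: `Question about ${videoId}: ${message}` });
}
```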
Test changes locally with the dev server:
```bash
npm run dev
```
To contribute:
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes
4. Test thoroughly
5. Submit a pull request
This project uses open-source models with permissive licenses:
- YOLOv8n: MIT License
- SlowFast: Apache 2.0
- BLIP-2: MIT License
- Ollama: Apache 2.0
- Next.js: MIT License
**Video won't upload**
- Check file size (max 100MB)
- Ensure video format (MP4, MOV, AVI, WebM)
- Check browser console for errors
**Processing stuck**
- Refresh the page
- Try a smaller video file
- Check network connection
**Chat not responding**
- Wait for processing to complete
- Check browser console
- Try refreshing the page
For help and questions:
- Check the browser console for errors
- Review this README
- Create an issue on GitHub
- Ultralytics for YOLOv8
- Facebook Research for SlowFast
- Salesforce for BLIP-2
- Ollama team for LLM integration
- Next.js team for the framework
- shadcn for UI components
Built for hackathons and production use.