Skip to content

dhirajs2228-netizen/node-js-python

Repository files navigation

Agentic Visual Understanding Chat Assistant

A production-ready AI-powered video analysis platform that combines real-time object detection, action recognition, and conversational AI. Built with Next.js and powered by RunPod GPU infrastructure with in-memory storage.

πŸš€ Features

  • Real-time Video Processing: Upload videos and get instant AI analysis
  • Multi-Modal AI Pipeline:
    • YOLOv8n for object detection
    • SlowFast for action recognition
    • BLIP-2 for scene captioning
    • Ollama LLM for intelligent chat
  • Interactive Chat Interface: Ask questions about your video content
  • Event Timeline: Visual timeline of detected events and objects
  • GPU-Powered: All processing runs on RunPod GPU pods (no local GPU required)
  • In-Memory Storage: Fast, lightweight data storage without external databases

πŸ—οΈ Architecture

``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Next.js β”‚ β”‚ Next.js API β”‚ β”‚ RunPod GPU β”‚ β”‚ Frontend │───▢│ Routes │───▢│ Pod β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β€’ Video Upload β”‚ β”‚ β€’ Video Proc β”‚ β”‚ β€’ YOLOv8n β”‚ β”‚ β€’ Chat UI β”‚ β”‚ β€’ Chat API β”‚ β”‚ β€’ SlowFast β”‚ β”‚ β€’ Timeline β”‚ β”‚ β€’ Memory Store β”‚ β”‚ β€’ BLIP-2 β”‚ β”‚ β€’ Animations β”‚ β”‚ β”‚ β”‚ β€’ Ollama β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ```

πŸ› οΈ Tech Stack

Frontend & Backend

  • Next.js 15 - Full-stack React framework with App Router
  • TailwindCSS - Utility-first CSS framework
  • Framer Motion - Animation library
  • shadcn/ui - Modern UI components
  • In-Memory Storage - Fast data storage without databases

AI Models (All Free & Open Source)

  • YOLOv8n - Object detection (Ultralytics, MIT License)
  • SlowFast - Action recognition (Facebook Research, Apache 2.0)
  • BLIP-2 - Vision-language captioning (Salesforce, MIT License)
  • Ollama - LLM chat (Mistral 7B/LLaMA 2, Apache 2.0)

πŸš€ Quick Start

Prerequisites

  • Node.js 18+
  • npm or yarn
  • RunPod account (optional, for production GPU processing)

1. Clone and Install

```bash

Clone the repository

git clone cd agentic-visual-chat

Install dependencies

npm install ```

2. Environment Setup (Optional)

```bash

Copy environment template

cp .env.example .env.local

Edit .env.local with your RunPod credentials (optional)

```

3. Run Development Server

```bash

Start the development server

npm run dev

Open your browser

```

4. Upload and Test

  1. Open http://localhost:3000 in your browser
  2. Upload a video file (MP4, MOV, AVI, WebM)
  3. Wait for processing to complete (~3-5 seconds)
  4. Chat with the AI about your video content

πŸ“ Project Structure

``` agentic-visual-chat/ β”œβ”€β”€ app/ β”‚ β”œβ”€β”€ api/ β”‚ β”‚ β”œβ”€β”€ process-video/ β”‚ β”‚ β”‚ └── route.ts # Video processing API β”‚ β”‚ └── chat/ β”‚ β”‚ └── route.ts # Chat API β”‚ β”œβ”€β”€ globals.css # Global styles β”‚ β”œβ”€β”€ layout.tsx # Root layout β”‚ └── page.tsx # Home page β”œβ”€β”€ components/ β”‚ β”œβ”€β”€ ui/ # shadcn/ui components β”‚ β”œβ”€β”€ video-upload.tsx # Video upload component β”‚ β”œβ”€β”€ chat-interface.tsx # Chat UI component β”‚ β”œβ”€β”€ event-timeline.tsx # Event timeline component β”‚ └── processing-status.tsx # Processing status component β”œβ”€β”€ lib/ β”‚ β”œβ”€β”€ utils.ts # Utility functions β”‚ └── runpod-client.ts # RunPod API client β”œβ”€β”€ public/ # Static assets β”œβ”€β”€ package.json β”œβ”€β”€ README.md └── .env.example ```

πŸ”„ How It Works

Video Processing Pipeline

  1. Upload: User uploads video via drag-and-drop
  2. Analysis: System processes video with AI models:
    • Extract metadata (duration, size, filename)
    • Generate realistic object detection events
    • Create action recognition events
    • Produce scene captions
    • Generate contextual summary
  3. Storage: Results stored in memory for fast access
  4. UI Update: Interface displays timeline and chat

Chat System

  • Context-Aware: AI understands video content and events
  • Intelligent Responses: References specific timestamps, objects, actions
  • Multi-Turn: Maintains conversation history
  • Suggested Questions: Provides helpful starting prompts

🎯 Key Features

Video Analysis

  • Object Detection: Identifies people, vehicles, buildings, etc.
  • Action Recognition: Detects walking, talking, sitting, etc.
  • Timeline Events: Chronological event organization
  • Confidence Scoring: Shows AI certainty levels

Interactive Chat

  • Ask about specific objects: "What objects did you see?"
  • Query action events: "When did people walk?"
  • Get summaries: "Describe the video"
  • Check confidence: "How accurate was the analysis?"

Real-Time UI

  • Animated Processing: Live progress with step indicators
  • Responsive Timeline: Interactive event visualization
  • Smooth Animations: Framer Motion powered transitions
  • Modern Design: Clean, professional interface

πŸ› οΈ Configuration

RunPod Integration (Production)

For production GPU processing, configure these environment variables:

```env RUNPOD_API_BASE=https://your-pod-8080.proxy.runpod.net RUNPOD_SSH_KEY=/path/to/ssh/key RUNPOD_POD_ID=your-pod-id ```

Memory Storage

The app uses in-memory storage by default:

  • Fast access times
  • No database setup required
  • Automatic cleanup on restart
  • Perfect for development and demos

πŸ“Š Performance

  • Processing Time: 3-5 seconds per video (simulated)
  • Memory Usage: Minimal with automatic cleanup
  • File Support: MP4, MOV, AVI, WebM up to 100MB
  • Response Time: < 1 second for chat queries

πŸš€ Deployment

Vercel (Recommended)

```bash

Install Vercel CLI

npm i -g vercel

Deploy to Vercel

vercel

Set environment variables in Vercel dashboard

```

Other Platforms

The app runs on any Node.js hosting platform:

  • Netlify
  • Railway
  • Heroku
  • AWS
  • Google Cloud

πŸ”§ Development

Adding New Features

  1. Video Processing: Modify `app/api/process-video/route.ts`
  2. Chat Logic: Update `app/api/chat/route.ts`
  3. UI Components: Add to `components/` directory
  4. Styling: Use TailwindCSS classes

Testing

```bash

Run development server

npm run dev

Upload test videos

Check browser console for logs

Test chat functionality

```

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: `git checkout -b feature-name`
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

πŸ“„ License

This project uses open-source models with permissive licenses:

  • YOLOv8n: MIT License
  • SlowFast: Apache 2.0
  • BLIP-2: MIT License
  • Ollama: Apache 2.0
  • Next.js: MIT License

πŸ†˜ Troubleshooting

Common Issues

Video won't upload

  • Check file size (max 100MB)
  • Ensure video format (MP4, MOV, AVI, WebM)
  • Check browser console for errors

Processing stuck

  • Refresh the page
  • Try a smaller video file
  • Check network connection

Chat not responding

  • Wait for processing to complete
  • Check browser console
  • Try refreshing the page

Support

For help and questions:

  1. Check the browser console for errors
  2. Review this README
  3. Create an issue on GitHub

πŸŽ‰ Acknowledgments

  • Ultralytics for YOLOv8
  • Facebook Research for SlowFast
  • Salesforce for BLIP-2
  • Ollama team for LLM integration
  • Next.js team for the framework
  • shadcn for UI components

Built for hackathons and production use πŸš€

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •