PolySensor is a full-stack web application that uses AI to analyze and extract insights from documents, images, audio, and video. Built with React, Flask, LangChain, and Google's Gemini model, it turns unstructured data into structured analysis through a web interface.
- Multi-Modal Analysis: Seamlessly processes documents, images, audio, and video files
- Advanced RAG Architecture: Utilizes Retrieval-Augmented Generation for superior contextual understanding
- Universal Text Extraction: Capable of extracting and analyzing textual information from almost any format
- Google Gemini Integration: Uses Google's Gemini 2.5 Pro model for analysis
- LangChain Orchestration: Manages prompt templates and model calls through LangChain
- Smart Context Awareness: Understands content relationships and nuances
From research papers to multimedia presentations, PolySensor delivers deep analytical insights across:
- Academic & Technical Documents
- Business Reports & Presentations
- Multimedia Content & Recordings
- Visual Data & Infographics
graph TB
subgraph "Frontend Layer"
U[User] --> FE[React Frontend<br/>File Upload Interface]
FE --> API[API Request<br/>File + Metadata]
end
subgraph "Backend Processing Layer"
API --> B[File Type Router<br/>Flask API]
subgraph "Input Processing"
B --> C[Document Processor]
B --> D[Image Processor]
B --> E[Audio Processor]
B --> F[Video Processor]
C --> C1[Unstructured<br/>Partition]
D --> D1[PyTesseract<br/>OCR]
E --> E1[SpeechRecognition<br/>Library]
F --> F1[Frame Extraction<br/>+ Audio Separation]
end
subgraph "AI Analysis"
C1 --> G[Content Aggregator]
D1 --> G
E1 --> G
F1 --> G
G --> H[Prompt Engine<br/>LangChain Templates]
H --> I[LLM Gateway<br/>Google Gemini API]
end
end
subgraph "Response Layer"
I --> J[Analysis Results<br/>JSON Response]
J --> FE2[Frontend Display<br/>Markdown Rendering]
FE2 --> U2[User Views<br/>Analysis & Export]
end
end
%% Styling with better contrast
classDef frontend fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000000
classDef backend fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px,color:#000000
classDef input fill:#fff3e0,stroke:#ef6c00,stroke-width:2px,color:#000000
classDef ai fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000000
classDef response fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000
class U,FE,API,FE2,U2 frontend
class B,C,D,E,F,C1,D1,E1,F1 backend
class G,H input
class I ai
class J response
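The flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the extractor stubs, `PROCESSORS`, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical names, and the real processors would call Unstructured, PyTesseract, SpeechRecognition, and the frame/audio splitter instead of returning placeholder strings.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the real extractors; each returns extracted text.
def extract_document(path: str) -> str:
    return f"[document text from {path}]"

def extract_image(path: str) -> str:
    return f"[OCR text from {path}]"

def extract_audio(path: str) -> str:
    return f"[transcript of {path}]"

def extract_video(path: str) -> str:
    return f"[frames and transcript of {path}]"

# One processor per branch of the "Input Processing" subgraph.
PROCESSORS: Dict[str, Callable[[str], str]] = {
    "document": extract_document,
    "image": extract_image,
    "audio": extract_audio,
    "video": extract_video,
}

# Assumed template shape; the real templates live in prompts.py.
PROMPT_TEMPLATE = "Analyze this {media_type} content:\n{content_data}"

def build_prompt(media_type: str, path: str) -> str:
    """Run the matching processor and fill the prompt sent to the LLM gateway."""
    processor = PROCESSORS.get(media_type)
    if processor is None:
        raise ValueError(f"Unsupported media type: {media_type}")
    content = processor(path)
    return PROMPT_TEMPLATE.format(media_type=media_type, content_data=content)
```

Keeping the processors in a dictionary means a new media type only needs one new entry, which mirrors the single router node in the diagram.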
- Intuitive Drag-and-Drop Interface: Seamless file upload with real-time validation and preview
- Responsive Design: Modern React-based UI that works across devices and screen sizes
- Real-Time Processing Feedback: Loading indicators and error handling for smooth user experience
- Rich Markdown Rendering: Beautiful display of AI analysis results using markdown-to-jsx
- One-Click PDF Export: Export analysis results to PDF using html2canvas and jsPDF for easy sharing
- Flask API Server: Single RESTful endpoint (/analyze0) for file uploads with CORS enabled for frontend communication
- File Type Detection: Extension-based routing for documents, images, audio, and video files
- Document Processing: Uses Unstructured library to extract and convert content from 40+ document formats into JSON for AI analysis
- Media Validation: Length limits (1 minute for audio, 30 seconds for video) with premium upgrade messaging for longer content
- AI Integration: Google Gemini 2.5 Pro model with LangChain orchestration using specialized prompts for each media type
- Temporary Storage: Secure file handling with automatic cleanup of uploaded files and temp directories
- Error Handling: JSON responses for file validation errors, processing failures, and unsupported formats
- Content Analysis: Structured analysis output in table format for all media types with pattern detection and insights
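The media length limits listed above could be enforced with a check like the following. It is a sketch only: the function name `validate_duration` and the shape of the error payload are assumptions, and the duration is expected to be measured elsewhere (for example by the audio or video library that opens the file).

```python
from typing import Optional

# Limits from the feature list above, in seconds.
MAX_DURATION_SECONDS = {"audio": 60, "video": 30}

def validate_duration(media_type: str, duration_seconds: float) -> Optional[dict]:
    """Return a JSON-serializable error payload, or None if the file is acceptable."""
    limit = MAX_DURATION_SECONDS.get(media_type)
    if limit is None:
        return None  # documents and images have no length limit
    if duration_seconds > limit:
        return {
            "error": (
                f"{media_type.capitalize()} is {duration_seconds:.0f}s long; "
                f"the limit is {limit}s."
            ),
            "limit_seconds": limit,
        }
    return None
```

Returning a dictionary rather than raising keeps the check easy to turn into the JSON error responses the backend sends for validation failures.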
Figure: Raw presentation file used as input.
Figure: PolySensor analysis results, which can be exported to PDF.
Extension(s) | Description |
---|---|
.jpg, .jpeg | JPEG images |
.png | Portable Network Graphics |
.gif | Graphics Interchange Format |
.webp | WebP image |
.heic | High Efficiency Image Format |
.tif, .tiff | Tagged Image File Format |
.bmp | Bitmap Image File |
Extension(s) | Description |
---|---|
.mp3 | MPEG Audio Layer III |
.wav | Waveform Audio File Format |
.flac | Free Lossless Audio Codec |
.aiff | Audio Interchange File Format |
Extension(s) | Description |
---|---|
.mp4 | MPEG-4 Part 14 Video |
.mkv | Matroska Video File |
Extension(s) | Description |
---|---|
.pdf | Portable Document Format |
.docx | Microsoft Word Open XML Document |
.doc | Microsoft Word Document (older format) |
.txt | Plain Text File |
.odt | OpenDocument Text File |
.rtf | Rich Text Format |
.md | Markdown Documentation |
.epub | Electronic Publication |
.hwp | Hangul Word Processor File |
.abw, .zabw | AbiWord Document |
.org | Org Mode File |
.rst | reStructuredText File |
Extension(s) | Description |
---|---|
.xlsx | Microsoft Excel Open XML Spreadsheet |
.xls | Microsoft Excel Spreadsheet (older format) |
.csv | Comma-Separated Values File |
.tsv | Tab-Separated Values File |
.fods | OpenDocument Flat XML Spreadsheet |
.dif | Data Interchange Format File |
.dbf | dBase Database File |
.et | E-Text Spreadsheet |
Extension(s) | Description |
---|---|
.pptx | Microsoft PowerPoint Open XML Presentation |
.ppt | Microsoft PowerPoint Presentation (older format) |
.pptm | Microsoft PowerPoint Macro-Enabled Presentation |
.pot | Microsoft PowerPoint Template |
Extension(s) | Description |
---|---|
.msg | Microsoft Outlook Message |
.eml | Electronic Mail File |
.p7s | PKCS #7 Signature File Format |
Extension(s) | Description |
---|---|
.xml | Extensible Markup Language File |
.html, .htm | Hypertext Markup Language File |
.md | Markdown Documentation |
.cwk | AppleWorks Document |
.mcw | Microchip MPLAB Workspace |
.prn | Print to File |
.eth | Ethnograph Data File |
.pbd | PowerBuilder Document |
.sdp | Session Description Protocol File |
.mw | MathWorks MATLAB Workspace File |
.sxg | Signed Exchange File |
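Extension-based detection over the tables above might look like the sketch below. The names `EXTENSIONS_BY_CATEGORY` and `classify` are hypothetical, the four categories follow the processor types in the architecture, and the document set is abbreviated to a handful of the listed extensions.

```python
from pathlib import Path

# Extensions copied from the tables above; the "document" set is abbreviated.
EXTENSIONS_BY_CATEGORY = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".heic", ".tif", ".tiff", ".bmp"},
    "audio": {".mp3", ".wav", ".flac", ".aiff"},
    "video": {".mp4", ".mkv"},
    "document": {".pdf", ".docx", ".doc", ".txt", ".md", ".xlsx", ".csv", ".pptx", ".eml", ".html"},
}

def classify(filename: str) -> str:
    """Return the media category for a filename, or 'unsupported'."""
    ext = Path(filename).suffix.lower()  # only the last suffix is considered
    for category, extensions in EXTENSIONS_BY_CATEGORY.items():
        if ext in extensions:
            return category
    return "unsupported"

print(classify("slides.PPTX"))  # case-insensitive match
```

Because only the final suffix is inspected, a name such as `archive.tar.gz` is treated as `.gz` and rejected.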
- Python 3.11 or later (3.11 is recommended for Unstructured)
- Node.js 16+ and npm
- Google Gemini API key
- Tesseract OCR (for image/text extraction)
- Clone the repository
git clone https://github.com/adityasinghcoding/PolySensor.git
cd PolySensor
- Backend Setup
# Install Python dependencies
pip install -r requirements.txt
# Install Tesseract OCR
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# Mac: brew install tesseract
# Linux: sudo apt-get install tesseract-ocr
# Set up environment variables
cp .env.example .env
# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
- Frontend Setup
# Navigate to frontend directory
cd frontend
# Install Node.js dependencies
npm install
# Return to root directory
cd ..
- Start the Backend Server
python main.py
The Flask server will start on http://localhost:5000
- Start the Frontend (in a new terminal)
cd frontend
npm run dev
The React app will be available at http://localhost:5173
- Access the Application
Open your browser and navigate to http://localhost:5173 to use the web interface. Upload files through the drag-and-drop interface and receive AI-powered analysis results.
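The backend can also be called without the browser. The sketch below uploads a file with only the standard library; it assumes the endpoint path given in the feature list (`/analyze0`), a multipart form field named `file`, and a JSON response body. The field name in particular is a guess and should be checked against `main.py`.

```python
import json
import mimetypes
import os
import uuid
from urllib import request

API_URL = "http://localhost:5000/analyze0"  # path as written in the feature list

def encode_multipart(field: str, filename: str, data: bytes):
    """Build a single-file multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def analyze(path: str, field: str = "file") -> dict:
    """POST the file to the backend and return the decoded JSON response."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(field, os.path.basename(path), f.read())
    req = request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `analyze("report.pdf")` would return the parsed analysis; the file name here is only an example.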
For easier development and deployment, PolySensor supports Docker. This ensures consistent environments and handles system dependencies automatically.
- Docker and Docker Compose installed on your system
- Google Gemini API key
- Clone the repository
git clone https://github.com/adityasinghcoding/PolySensor.git
cd PolySensor
- Set up environment variables
cp .env.example .env
# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
- Run the application
docker-compose up --build
- Access the Application
- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
- Start services: docker-compose up
- Build and start: docker-compose up --build
- Stop services: docker-compose down
- View logs: docker-compose logs -f [service_name]
- Rebuild specific service: docker-compose up --build [service_name]
- Backend: The render.yaml is configured for Docker deployment on Render
- Frontend: Deploy to Vercel as usual (static hosting)
PolySensor/
├── main.py                          # Flask backend entry point
├── data_handling.py                 # Content extraction and validation utilities
├── prompts.py                       # AI prompt templates for different content types
├── requirements.txt                 # Python dependencies
├── .env                             # Environment variables (API keys)
├── .gitignore                       # Git ignore rules
├── README.md                        # Project README
├── DOCS/                            # Documentation
├── frontend/                        # React frontend application
│   ├── public/
│   │   ├── index.html               # Main HTML template
│   │   └── favicon.ico
│   ├── src/
│   │   ├── components/              # Reusable React components
│   │   │   ├── AnalysisResults/     # Displays analysis output with markdown support
│   │   │   ├── FileUploader/        # Drag-and-drop file upload interface
│   │   │   └── Loading/             # Loading spinner component
│   │   ├── pages/
│   │   │   └── AnalyzePage.jsx      # Main analysis page
│   │   ├── utils/
│   │   │   └── apiService.js        # API communication utilities
│   │   ├── App.jsx                  # Main React application component
│   │   ├── App.css                  # Global styles
│   │   ├── main.jsx                 # React application entry point
│   │   └── index.css                # Base styles
│   ├── package.json                 # Node.js dependencies and scripts
│   └── vite.config.js               # Vite build configuration
├── assets/                          # Project assets
└── utilities/                       # Additional tools
PolySensor/
├── main.py              # Flask backend server
├── data_handling.py     # Media processing functions
├── prompts.py           # AI prompt templates
├── requirements.txt     # Python dependencies
└── .env                 # Environment variables (create this)
Backend:
- main.py: Flask API server with CORS support; handles file uploads and AI processing
- data_handling.py: Content extraction functions for documents, images, audio, and video
- prompts.py: Specialized prompts for different media types, optimized for Gemini AI
Frontend:
- App.jsx: Main application router and state management
- AnalyzePage.jsx: Core analysis interface with file upload and results display
- FileUploader.jsx: Drag-and-drop file upload component with validation
- AnalysisResults.jsx: Markdown rendering component with PDF export functionality
- apiService.js: Axios-based API client for backend communication
Get your Google Gemini API key from Google AI Studio and add it to your .env file:
GOOGLE_API_KEY=your_actual_api_key_here
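A startup check for the key can catch a missing or placeholder value before the first request fails. The sketch below uses only the standard library; `load_env_file` and `get_api_key` are hypothetical helpers, and the project itself may load `.env` through a dedicated library instead.

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def get_api_key(env_path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(env_path).get("GOOGLE_API_KEY", "")
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is missing or still set to the placeholder value.")
    return key
```

Rejecting the placeholder string explicitly gives a clearer message than the authentication error the API would otherwise return.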
You can modify the analysis prompts in prompts.py to tailor the output to your specific needs:
# Example custom prompt
# {content_data} is a placeholder that is filled with the extraction function's output
CUSTOM_ANALYSIS = '''
Analyze this content and focus on technical details:
Content: {content_data}
Please provide:
1. Technical specifications
2. Implementation details
3. Potential improvements
'''
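To see what a custom prompt looks like once the placeholder is filled, it can be rendered with plain `str.format`. This is a stand-in for illustration: the project fills its templates through LangChain, and `render_prompt` is a hypothetical helper.

```python
CUSTOM_ANALYSIS = '''
Analyze this content and focus on technical details:
Content: {content_data}
Please provide:
1. Technical specifications
2. Implementation details
3. Potential improvements
'''

def render_prompt(template: str, content: str) -> str:
    """Substitute extracted content into the {content_data} placeholder."""
    return template.format(content_data=content).strip()

prompt = render_prompt(CUSTOM_ANALYSIS, "Flask server exposing one upload endpoint.")
print(prompt)
```

Any literal braces in a template need to be doubled (`{{` and `}}`), otherwise they are read as placeholders.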
# Input: research_paper.pdf
# Output: Summary of key findings, methodology, and conclusions
# Input: diagram.png
# Output: Extracted text + analysis of visual content and structure
# Input: presentation.mp4
# Output: Combined analysis of slide content and spoken presentation
- Add file extension detection in main.py:
if file_path.lower().endswith(('.new_extension',)):
    new_data = new_extraction_function(file_path)
- Create extraction function in data_handling.py:
def new_extraction_function(file_path):
    # Implement extraction logic
    return extracted_content
- Add prompt template in prompts.py:
NEW_TYPE = '''
Your analysis prompt for new file type...
'''
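The three steps above can be seen working together in one self-contained file. Everything here is illustrative: the `.log` extension, the `REGISTRY` table, and the `register` and `prepare` helpers are hypothetical, and in the project the pieces would be split across the three modules named in the steps.

```python
from typing import Callable, Dict, Iterable, Tuple

# extension -> (extraction function, prompt template)
REGISTRY: Dict[str, Tuple[Callable[[str], str], str]] = {}

def register(extensions: Iterable[str], extractor: Callable[[str], str], prompt: str) -> None:
    """Associate each extension with its extractor and prompt template."""
    for ext in extensions:
        REGISTRY[ext.lower()] = (extractor, prompt)

# Step 2: extraction function (here it simply reads the file as text).
def new_extraction_function(file_path: str) -> str:
    with open(file_path, encoding="utf-8") as f:
        return f.read()

# Step 3: prompt template for the new type.
NEW_TYPE = "Summarize this log file:\n{content_data}"

# Step 1: extension detection, expressed as a registry entry.
register((".log",), new_extraction_function, NEW_TYPE)

def prepare(file_path: str) -> str:
    """Find the handler for the file's extension and return the filled prompt."""
    lowered = file_path.lower()
    for ext, (extractor, prompt) in REGISTRY.items():
        if lowered.endswith(ext):
            return prompt.format(content_data=extractor(file_path))
    raise ValueError(f"Unsupported file type: {file_path}")
```

A registry is one possible design; a chain of `endswith` checks, as in the first step, behaves the same way for a small number of types.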
# Add tests to the repository and run with:
python -m pytest tests/
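A first test file might look like the sketch below. The file name `tests/test_extraction.py` is hypothetical, and the extractor is defined inline as a stand-in so the example runs on its own; in the repository it would be imported from the backend module instead.

```python
# tests/test_extraction.py (hypothetical file name)
from pathlib import Path

# Stand-in for a real extractor; in the repository this would be imported
# from the backend rather than defined in the test module.
def new_extraction_function(file_path: str) -> str:
    return Path(file_path).read_text(encoding="utf-8").strip()

def test_extracts_text(tmp_path: Path) -> None:
    # tmp_path is pytest's built-in temporary-directory fixture.
    sample = tmp_path / "sample.txt"
    sample.write_text("  hello world \n", encoding="utf-8")
    assert new_extraction_function(str(sample)) == "hello world"

def test_missing_file_raises(tmp_path: Path) -> None:
    missing = tmp_path / "missing.txt"
    try:
        new_extraction_function(str(missing))
    except FileNotFoundError:
        return
    raise AssertionError("expected FileNotFoundError")
```

pytest collects any function whose name starts with `test_` in files matching `test_*.py`, so no extra configuration is needed for the command above.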
The AI provides structured analysis including:
- Key Points: Main takeaways from the content
- Summary: Concise overview of the material
- Actionable Insights: Practical recommendations
- Ambiguity Detection: Identification of unclear sections
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
"File does not exist" error
- Check file path spelling
- Use absolute paths if needed
OCR not working
- Verify Tesseract installation
- Check image quality and resolution
API key errors
- Ensure the .env file is in the project root
- Verify the API key has sufficient permissions
Audio/video processing slow
- Large files may take time to process
- Consider shorter intervals for video analysis
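For the OCR case, a quick way to verify the Tesseract installation is to check that the executable is on the PATH and ask it for its version. The helper name `tesseract_version` is hypothetical; `tesseract --version` is the standard CLI flag.

```python
import shutil
import subprocess
from typing import Optional

def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    # Some builds print the version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None

version = tesseract_version()
print(version or "Tesseract not found on PATH")
```

If this prints the fallback message even though Tesseract is installed, the install directory is most likely missing from the PATH, which is common on Windows.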
- Check existing Issues for similar problems
- Create a new issue with detailed error messages and file examples
Aditya Singh
- Artificial Intelligence Engineer
- Project Lead & Full-Stack Developer
- AI/ML Integration & Architecture Design
- Multi-modal Content Processing Specialist
This project stands on the shoulders of these amazing open-source technologies:
Technology | Purpose | Credit |
---|---|---|
React | Frontend Framework | Meta |
Vite | Build Tool & Dev Server | Vite |
Flask | Backend Web Framework | Pallets |
Google Gemini | AI Language Model | Google AI |
LangChain | LLM Orchestration | LangChain AI |
Unstructured | Document Processing | Unstructured IO |
SpeechRecognition | Audio Transcription | Uberi |
PyTesseract | Image OCR | Tesseract OCR |
MoviePy | Video Processing | Zulko |
Pydub | Audio Conversion | Jiaaro |
Pillow | Image Processing | Python Pillow |
Axios | HTTP Client | Axios |
html2canvas | HTML to Canvas | Niklas von Hertzen |
jsPDF | PDF Generation | Parallax |
markdown-to-jsx | Markdown Rendering | ProbablyUp |
- Open Source Community for invaluable tools and libraries
- Google Gemini Team for powerful AI capabilities
- All Future Contributors who will test and improve PolySensor
This tool is designed for content analysis and should be used in compliance with copyright laws and content usage rights. Always ensure you have permission to analyze and process the files you use with this system.