Automagically transform your documents into beautiful PowerPoint presentations using AI
SlideForge is a multi-agent AI system that automatically generates professional PowerPoint presentations from various document formats (PDF, Word, TXT). It analyzes your documents, extracts key information, creates well-structured slides, and applies appropriate styling - all without manual intervention.
- Multi-format Support: Process PDF, DOCX, and TXT files
- Large Document Support: Efficiently handles documents of 100+ pages with intelligent chunking
- AI-Powered Content Extraction: Intelligently extract and synthesize key information
- Smart Slide Generation: Create well-structured slides with proper hierarchy
- Automatic Styling: Apply context-appropriate visual designs
- Processing Pipeline: Track job status from upload to completion
- User Authentication: Secure access with JWT authentication
- RESTful API: Clean API for integration with any client
SlideForge uses a modular, multi-agent architecture:
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Extraction   │     │  Generation   │     │ Optimization  │
│     Agent     │────►│     Agent     │────►│     Agent     │
└───────┬───────┘     └───────┬───────┘     └───────┬───────┘
        │                     │                     │
        └──────────┬──────────┴──────────┬──────────┘
                   ▼                     ▼
           ┌───────────────┐     ┌───────────────┐
           │   Database    │     │ File Storage  │
           └───────────────┘     └───────────────┘
                   ▲
                   │
           ┌───────────────┐
           │    FastAPI    │
           │    Backend    │
           └───────────────┘
                   ▲
                   │
           ┌───────────────┐
           │    Client     │
           │  Application  │
           └───────────────┘
- Extraction & Synthesis Agent
  - Processes uploaded documents using PyPDF for PDF and python-docx for DOCX files
  - Handles large documents (100+ pages) using intelligent chunking and strategic extraction
  - Extracts text, structure, and metadata
  - Analyzes content using OpenAI o3-mini and Anthropic Claude 3.7 Sonnet via LangChain
  - Generates summaries, extracts keywords, and structures content
  - Creates a presentation-ready data structure
- Slide Generation Agent
  - Creates slide structure
  - Organizes content hierarchically
  - Generates PPTX files
  - Creates appropriate sections and summaries
- Graphic Optimization Agent
  - Analyzes content context
  - Selects appropriate visual styles
  - Enhances typography and layout
  - Applies consistent design principles
- Backend: FastAPI
- Database: SQLAlchemy with SQLite/PostgreSQL
- Authentication: JWT
- AI/ML: LangChain with OpenAI o3-mini and Anthropic Claude 3.7 Sonnet
- Document Processing: PyPDF, python-docx
- Presentation Generation: python-pptx
- Task Processing: Async processing
- Storage: Local filesystem (expandable to S3)
- Python 3.10+
- OpenAI API key
- Anthropic API key
- PostgreSQL (optional, for production)
- Clone the repository
git clone https://github.com/yourusername/slideforge.git
cd slideforge
- Install dependencies
pip install -r requirements.txt
- Set up environment variables
Create a .env file in the project root:
DEBUG=true
SECRET_KEY=your_secret_key
# LLM API Keys
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Uncomment for PostgreSQL
# DATABASE_URI=postgresql://postgres:postgres@localhost/slideforge
- Initialize the database and create a superuser
python setup.py
Start the development server:
python run.py
The API will be available at http://localhost:8000.
API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
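
For readers wondering how the `.env` values might be consumed at startup, here is a minimal sketch. The `Settings` class, the `load_settings` function, and the SQLite default URI are assumptions for illustration; the project's actual configuration code may differ. The sketch reads from the process environment, so the `.env` file would still need to be loaded by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    debug: bool
    secret_key: str
    openai_api_key: str
    anthropic_api_key: str
    database_uri: str


REQUIRED_KEYS = ("SECRET_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    Fails fast when a required key is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        secret_key=env["SECRET_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        # Assumed default: a local SQLite file when DATABASE_URI is not set.
        database_uri=env.get("DATABASE_URI", "sqlite:///./slideforge.db"),
    )
```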
# Register a new user
curl -X POST "http://localhost:8000/api/auth/register" \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com", "password": "SecurePassword123", "full_name": "John Doe"}'

# Log in and get a JWT token
curl -X POST "http://localhost:8000/api/auth/login" \
-H "Content-Type: application/json" \
-d '{"username": "user@example.com", "password": "SecurePassword123"}'

# Upload a document
curl -X POST "http://localhost:8000/api/documents" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@path/to/your/document.pdf"

# Create a presentation job
curl -X POST "http://localhost:8000/api/jobs" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"document_id": 1, "settings": {"style": "corporate"}}'

# Check job status
curl -X GET "http://localhost:8000/api/jobs/1" \
-H "Authorization: Bearer YOUR_TOKEN"

# Download the presentation
curl -X GET "http://localhost:8000/api/presentations/1/download" \
-H "Authorization: Bearer YOUR_TOKEN" \
--output presentation.pptx

- POST /api/auth/register - Register a new user
- POST /api/auth/login - Log in and get JWT token
- GET /api/auth/me - Get current user info

- POST /api/documents - Upload a document
- GET /api/documents - List documents
- GET /api/documents/{id} - Get document details
- DELETE /api/documents/{id} - Delete a document

- POST /api/jobs - Create a presentation job
- GET /api/jobs - List jobs
- GET /api/jobs/{id} - Get job status
- DELETE /api/jobs/{id} - Cancel a job

- GET /api/presentations - List presentations
- GET /api/presentations/{id} - Get presentation details
- GET /api/presentations/{id}/download - Download presentation
- GET /api/presentations/{id}/thumbnail - Get presentation thumbnail
- DELETE /api/presentations/{id} - Delete a presentation
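
Because jobs run asynchronously, a client typically polls `GET /api/jobs/{id}` until the job finishes. The sketch below shows one way to do that with the standard library. The response shape is an assumption: the `status` field name and the terminal values in `TERMINAL_STATUSES` are not documented here, and `make_job_fetcher` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal values of the job "status" field; adjust to the real API.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def make_job_fetcher(base_url: str, job_id: int, token: str) -> Callable[[], dict]:
    """Return a function that performs GET /api/jobs/{id} with a Bearer token."""
    def fetch() -> dict:
        request = urllib.request.Request(
            f"{base_url}/api/jobs/{job_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_job until the job reaches a terminal status or timeout expires."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"Job did not finish within {timeout} seconds")
        sleep(interval)
```

Against a local server this might be called as `wait_for_job(make_job_fetcher("http://localhost:8000", 1, "YOUR_TOKEN"))`. Passing the fetch function in as a parameter keeps the polling loop testable without a running backend.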
SlideForge uses state-of-the-art LLMs from OpenAI and Anthropic to process documents:
- Text Summarization: Uses Anthropic Claude 3.7 Sonnet with step-by-step thinking for comprehensive document summarization. The system prompt instructs Claude to think through its reasoning process in detail before providing a summary.
- Keyword Extraction: Uses OpenAI o3-mini for efficient and accurate keyword identification, balancing quality and cost-effectiveness.
- Content Structuring: Uses Anthropic Claude 3.7 Sonnet with thinking to analyze document structure and organize content into a coherent presentation format, with clear sections and priority points.
The LLM integration is managed through LangChain, providing:
- Structured output parsing with Pydantic models
- Context management for accurate processing
- Model fallbacks for reliability
- Special system prompts that enhance Claude's reasoning capabilities
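
The combination of structured output parsing and model fallbacks can be illustrated without any framework. The sketch below is deliberately plain Python rather than LangChain or Pydantic code: `KeywordResult`, `parse_keywords`, and `call_with_fallbacks` are hypothetical names, each "model" is just a callable that maps a prompt to raw text, and the JSON shape with a `keywords` list is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class KeywordResult:
    """Hypothetical structured output for keyword extraction."""
    keywords: list[str]


def parse_keywords(raw: str) -> KeywordResult:
    """Validate that raw is JSON of the form {"keywords": ["...", ...]}."""
    data = json.loads(raw)
    keywords = data["keywords"]
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        raise ValueError("'keywords' must be a list of strings")
    return KeywordResult(keywords=keywords)


def call_with_fallbacks(
    prompt: str,
    models: Sequence[Callable[[str], str]],
) -> KeywordResult:
    """Try each model in order and return the first output that parses.

    A model counts as failed if it raises or returns output that
    does not match the expected structure.
    """
    errors: list[Exception] = []
    for model in models:
        try:
            return parse_keywords(model(prompt))
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(exc)
    raise RuntimeError(f"All {len(models)} models failed: {errors}")
```

Treating a parse failure the same as a provider error means a malformed response from the primary model also triggers the fallback, which is one reasonable policy among several.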
SlideForge implements intelligent strategies to handle large documents:
- PDF Processing: For large PDFs (30+ pages), the system extracts the table of contents, introduction, conclusion, and strategically distributed content samples to create a comprehensive representation of the document.
- DOCX Processing: For large Word documents (500+ paragraphs), the system analyzes the document structure, extracts headings, and samples content from key sections to maintain context while keeping processing manageable.
- TXT Processing: For large text files (1MB+), the system extracts the beginning, end, and strategically distributed chunks from throughout the file.
This approach enables the system to:
- Process very large documents while keeping each LLM request within token limits
- Capture the most important information from each document
- Maintain context and coherence despite not processing every word
- Optimize LLM usage by focusing on the most relevant content
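
The sampling idea described for text files (beginning, end, and evenly distributed chunks) can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `sample_text`, its parameters, and the character-based chunk size are hypothetical, and a real system would more likely budget by tokens.

```python
def sample_text(text: str, chunk_size: int = 2000, middle_chunks: int = 3) -> list[str]:
    """Return the head, evenly spaced middle chunks, and the tail of text.

    Texts short enough to fit in the sampling budget are returned whole.
    """
    if chunk_size <= 0 or middle_chunks < 1:
        raise ValueError("chunk_size must be positive and middle_chunks at least 1")

    # Small input: sampling would not save anything, so keep everything.
    if len(text) <= chunk_size * (middle_chunks + 2):
        return [text]

    head = text[:chunk_size]
    tail = text[-chunk_size:]

    # Region between head and tail, split into equal segments;
    # one chunk is taken from the centre of each segment.
    span_start = chunk_size
    span_end = len(text) - chunk_size
    segment = (span_end - span_start) / middle_chunks

    middles = []
    for i in range(middle_chunks):
        start = span_start + int(i * segment + (segment - chunk_size) / 2)
        middles.append(text[start:start + chunk_size])

    return [head, *middles, tail]
```

Because each segment is longer than one chunk whenever sampling is triggered, the middle chunks never overlap each other or the head and tail.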
- Web-based user interface
- Mobile app integration
- Custom template system
- Integration with cloud storage services
- More chart and diagram types
- Enhanced AI content extraction
- Real-time collaboration
- Multi-language support
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for AI orchestration
- FastAPI for the web framework
- python-pptx for presentation generation
- SQLAlchemy for database ORM
- OpenAI and Anthropic for LLM APIs