Origami is a high-fidelity automated video generation platform that transforms PDF slides into cinematic narrated content. By leveraging advanced AI for script synthesis, professional Text-to-Speech (TTS), and programmatic video rendering, Origami streamlines the creation of engaging presentations, tutorials, and educational media directly from your static documents.
- Features
- Getting Started
- Usage
- Configuration
- Tech Stack
- Project Structure
- Roadmap & TODO
- Acknowledgements
- License
- PDF to Presentation: Upload PDF slides and automatically extract them into a sequence of video scenes.
- AI-Powered Scripting: Integrated with Google Gemini AI and WebLLM (Local Browser Inference) to transform fragmented slide notes into coherent, professional scripts.
- High-Quality TTS: Supports local and cloud-based Text-to-Speech using Kokoro-js.
- Local Inference: Run TTS entirely locally via Dockerized Kokoro FastAPI.
- Hybrid Voices: Create custom voice blends by mixing two models with adjustable weights.
- Rich Media Support: Insert MP4 videos and GIFs seamlessly between slides.
- Programmatic Video Rendering: Frame-perfect assembly powered by high-performance canvas rendering.
- Smart Audio Engineering:
  - Auto-Ducking: Background music volume automatically lowers during voiceovers.
  - Normalization: The final render is automatically normalized to the YouTube loudness standard (-14 LUFS).
- Interactive Slide Editor: Drag-and-drop reordering, real-time preview, and batch script updates.
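The auto-ducking behavior can be sketched as a simple gain rule. This is a hypothetical illustration (the `musicGainAt` helper and `Interval` type are invented here, not Origami's actual code); real implementations also ramp the gain smoothly rather than stepping it:

```typescript
// Hypothetical sketch of auto-ducking: while any voiceover interval is
// active, background music drops from its normal gain to a ducked gain.

interface Interval {
  start: number; // seconds
  end: number;   // seconds
}

function musicGainAt(
  t: number,
  voiceovers: Interval[],
  normalGain = 1.0,
  duckedGain = 0.25,
): number {
  // Is any voiceover playing at time t?
  const speaking = voiceovers.some((iv) => t >= iv.start && t < iv.end);
  return speaking ? duckedGain : normalGain;
}
```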
- Node.js (v20+)
- npm or yarn
- Docker & Docker Compose (Required for local server-side TTS and deployment)
- Clone the repository:

  ```bash
  git clone https://github.com/IslandApps/Origami-AI.git
  cd Origami-AI
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Start the development server (runs both Vite and the rendering server):

  ```bash
  npm run dev
  ```

The application will be available at http://localhost:3000.
To deploy this application using Docker, you must first clone the repository, as the image is built locally from the source.
- Clone the repository:

  ```bash
  git clone https://github.com/IslandApps/Origami-AI.git
  cd Origami-AI
  ```
A docker-compose.yml file is provided in the root directory. To start the application, run:

```bash
docker-compose up -d
```

Example docker-compose.yml:

```yaml
services:
  origami-ai:
    build: .
    container_name: origami-ai
    ports:
      - "3000:3000"
    restart: unless-stopped
    environment:
      - PORT=3000
      - NODE_ENV=production
```

Alternatively, build and run the image manually:

- Build the image:

  ```bash
  docker build -t origami-ai .
  ```

- Run the container:

  ```bash
  docker run -d -p 3000:3000 --name origami-ai origami-ai
  ```
This project is designed to be easily self-hosted using Dokploy. Simply point Dokploy to this repository, and it will automatically detect the Dockerfile and docker-compose.yml to orchestrate the deployment.
The application will be available at http://localhost:3000.
Drag and drop your presentation PDF into the main upload area. The application will process text from each page to create initial slides.
Scroll down to the Configure Slides panel to manage your project globally:
- Global Settings: Set a global voice (or create a custom Hybrid Voice), adjust post-slide delays, or run batch operations like "Find & Replace".
- Media Assets: Click Insert Video to add MP4 clips or GIFs between slides.
- Audio Mixing: Upload custom background music or select from the library (e.g., "Modern EDM"). Use the sliders to mix volume levels.
In the Slide Editor grid:
- AI Scripting: Click the AI Fix Script button (Sparkles icon) to have Gemini rewrite raw slide text into a natural spoken script.
- Manual Editing: Edit scripts directly. Highlight specific text sections to generate/regenerate audio for just that part.
- Generate Output: Click the Generate TTS button (Speech icon) to create voiceovers.
- Preview: Click the Play button to hear the result or click the slide thumbnail to expand the visual preview.
Click the Download Video button. The application will:
- Pre-process the slide configuration in your browser.
- Render frames in parallel using your browser's resources.
- Process the final video and audio mix using client-side FFmpeg WASM.
- Normalize the audio to -14 LUFS and download the resulting MP4.
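The -14 LUFS target in the last step corresponds to FFmpeg's `loudnorm` filter. As a hedged sketch, a helper that builds the relevant argument list (the function name and the true-peak/LRA defaults are assumptions, not values taken from Origami's code):

```typescript
// Illustrative sketch: build FFmpeg "loudnorm" arguments targeting -14 LUFS
// (the YouTube reference loudness). I = integrated loudness, TP = true peak,
// LRA = loudness range.

function loudnormArgs(targetLufs = -14, truePeakDb = -1.0): string[] {
  return ["-af", `loudnorm=I=${targetLufs}:TP=${truePeakDb}:LRA=11`];
}
```

With FFmpeg WASM, such arguments would be spliced into the command array passed to the worker, e.g. `["-i", "mix.wav", ...loudnormArgs(), "out.mp4"]`.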
Open the Settings Modal (Gear Icon) to customize the application:
Configure the AI model used for script refinement ("AI Fix Script").
- Google Gemini: Built-in and recommended. Requires a Google AI Studio API Key.
- Custom/OpenAI-Compatible: Point to any OpenAI-compatible endpoint (e.g., LocalAI, Ollama, vLLM).
  - Base URL: Enter your provider's URL (e.g., `http://localhost:11434/v1`).
  - Model Name: Specify the model ID (e.g., `llama-3`).
  - API Key: Enter if required by your provider.
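For reference, requests to an OpenAI-compatible endpoint follow a well-known shape. A minimal sketch, where the base URL, model name, and prompt are placeholders for whatever your provider uses (the `chatRequest` helper is invented for illustration):

```typescript
// Illustrative only: the shape of a chat-completions request to an
// OpenAI-compatible endpoint (LocalAI, Ollama, vLLM, ...).

function chatRequest(baseUrl: string, model: string, apiKey?: string) {
  return {
    // Trim any trailing slash before appending the standard path.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // The Authorization header is only sent when a key is provided.
        ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
      },
      body: JSON.stringify({
        model,
        messages: [
          { role: "user", content: "Rewrite these slide notes as a spoken script." },
        ],
      }),
    },
  };
}
```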
- Engine: Choose between the internal Web Worker (client-side) or a local Dockerized Kokoro instance (faster/server-side).
- Audio Defaults: Set default voice models and quantization levels (q4/q8).
You can build your own library of background music tracks that will be available in the dropdown menus:
- Navigate to the `src/assets/music/` directory.
- Paste your `.mp3` files here.
- The application will automatically detect these files and list them in the UI (e.g., `my_cool_track.mp3` becomes "My Cool Track").
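The filename-to-label convention can be sketched as a small helper (hypothetical, not the app's actual code): strip the extension, split on underscores or hyphens, and title-case each word.

```typescript
// Sketch of the assumed naming rule: "my_cool_track.mp3" -> "My Cool Track".

function trackLabel(filename: string): string {
  return filename
    .replace(/\.mp3$/i, "")        // drop the extension
    .split(/[_\-\s]+/)             // split on underscores, hyphens, spaces
    .map((w) => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}
```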
- Frontend: React 19, Vite, Tailwind CSS (v4)
- Video Engine: FFmpeg WASM (Client-side)
- AI: Google Gemini API & WebLLM (Local Browser Inference)
- TTS: Kokoro (FastAPI / Web Worker)
- Backend: Express.js (serving as a rendering orchestration layer)
- Utilities: Lucide React (icons), dnd-kit (drag & drop), pdfjs-dist (PDF processing)
- `src/components/`: React UI components (Slide Editor, Modals, Uploaders).
- `src/services/`: Core logic for AI, TTS, PDF processing, and local storage.
- `server.ts`: Express server handling static file serving and SPA routing.
- YouTube Metadata Generator: Automatically generate optimized titles and descriptions using Gemini.
- Thumbnail Generator: Create custom YouTube thumbnails based on slide content.
- Voiceover Recording: Support for recording custom voiceovers directly within the app using a microphone.
- Header Layout Optimization: Refactor and organize the application header for better aesthetics and usability.
This project is made possible by the following incredible open-source libraries and projects:
- FFmpeg.wasm: Enabling frame-perfect video assembly directly in the browser.
- WebLLM: Bringing high-performance local LLM inference to the web.
- Kokoro-js: Providing high-quality, local Text-to-Speech capabilities.
- Hugging Face Transformers: Powering state-of-the-art machine learning in the browser.
- PDF.js: The standard for parsing and rendering PDF documents.
- Lucide React: Beautifully crafted open-source icons.
- dnd-kit: A modern, lightweight toolkit for drag-and-drop interfaces.
- Dokploy: The open-source platform used for seamless self-hosting and deployment.
- Google Antigravity: The AI-powered IDE used to build and refine this project.