Skip to content

NicholasCone/agnes-ai-video-suite

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

preview

EchoSync: Multimodal AI Storyboard & Video Prototyper

EchoSync is a self-hosted, open-source creative engine for generating rich, multi-scene video prototypes from a single script or idea. Inspired by the need for accessible video generation, EchoSync goes beyond text-to-video by weaving together AI-generated narration, dynamic subtitles, adaptive background scoring, and a digital anchor persona — all orchestrated through a responsive Web UI. It is designed for storytellers, marketers, educators, and product teams who want to iterate on video concepts rapidly, without subscription fees or cloud dependencies.


Overview 📽️

EchoSync transforms a plain text description into a cohesive video narrative. Unlike conventional tools that produce disjointed clips, EchoSync treats your input as a story to be built scene-by-scene. It uses a pipeline that interprets your script, generates matching visuals, synchronizes a synthetic voiceover, and layers in timing-accurate subtitles. The result is a polished prototype—ready for review, sharing, or further refinement—all running on your own infrastructure.

The digital anchor is a subtle, optional overlay that adds a human-like presence, guiding the viewer through the narrative. This is not a deepfake or a replacement for human talent; it is a tool for rapidly visualizing concepts, testing narrative flow, and iterating on pacing before committing to full production.


Key Features 🚀

Scene-Aware Generation

EchoSync analyzes your script to identify scene breaks, tone shifts, and key action points. It then generates visual sequences that match the narrative arc, ensuring each scene transitions smoothly into the next.

Multimodal Narration Engine

A built-in text-to-speech system creates natural, expressive narration with customizable voice profiles. The narration timing is locked to scene duration and subtitle cues, eliminating manual sync work.

Dynamic Subtitle & Captioning System

Subtitles are not static text. EchoSync supports real-time captioning with animated text effects, language switching, and position overrides. This makes the video accessible to a wider audience without additional editing.

Digital Anchor Persona

An optional AI-driven anchor avatar can be placed in a corner of the video. The avatar’s gestures and lip movements are generated from the narration audio, providing a visual guide for viewers. This feature is ideal for explainer videos or news-style formats.

Offline-First, Self-Hosted Architecture

Everything runs locally. No data leaves your machine. EchoSync is designed for privacy-conscious creators, teams with compliance requirements, or anyone who wants complete control over their media assets.

Mobile-Responsive Web UI

The interface adapts to any screen size, allowing you to manage projects, review outputs, and tweak parameters from a tablet or phone. The UI is built for low-latency previews, even on less powerful devices.

Multilingual Support

The pipeline supports over 30 languages for both narration and subtitle generation. The digital anchor can also switch languages in real-time based on the script metadata.


Download

🛡️ Privacy & Security

Because EchoSync runs on your own server, your scripts, generated videos, and training data never reach a third party. We encourage all users to review the security configuration in the documentation to lock down their instance properly. The MIT license ensures you are free to modify the system for enterprise compliance, but the core team does not collect telemetry or usage data.


Getting Started 🧭

Assuming you have a machine with Docker and a modern GPU (optional but recommended).

EchoSync is distributed as a single Docker image with one volume mount for video output. The Web UI will guide you through your first project.

  1. Pull the image and run the container with port mapping.
  2. Access the UI at localhost:8501.
  3. Paste your script into the input box or upload a .txt file.
  4. Select narration voice and subtitle language.
  5. Hit ‘Generate’ and watch the storyboard populate in real time.

That is it. No API keys, no cloud registration, no hidden fees. The first generation may take a few moments as the model loads, but subsequent runs will be faster.


Use Cases 🎯

  • Marketing Teams: Rapidly prototype video ads without hiring an editor for every iteration.
  • Educators: Create explainer videos with accurate subtitles in multiple languages.
  • Indie Game Developers: Storyboard cutscenes and dialogue sequences for testing narrative flow.
  • Content Creators: Generate video drafts for review before committing to a full recording session.
  • Product Managers: Visualize product walkthroughs for documentation or investor updates.

Why Self-Host? 🏠

We believe creative tools should not require a monthly subscription to be useful. EchoSync removes the barrier of per-video costs, rate limits, and data sovereignty concerns. You invest once in hardware and maintain your own creative library. The generated content belongs entirely to you.


Multilingual & Accessibility 🌍

EchoSync ships with support for English, Spanish, Mandarin, Arabic, Hindi, French, German, Japanese, Korean, Portuguese, Russian, and more. The subtitle engine supports right-to-left scripts and complex diacritics. We welcome community contributions for additional language models and accent variants.


Community & Support 💬

  • Documentation: Complete API reference, UI walkthroughs, and troubleshooting guides are included in the repository.
  • Issue Tracker: Found a bug? Have a feature request? Open an issue. We aim to respond within 48 hours.
  • Discussion Board: Share your generated prototypes, ask for workflow advice, or discuss scene composition techniques.
  • 24/7 Customer Support: For enterprise deployments, we offer paid support SLA with dedicated engineers. Contact us via the repository’s security email for inquiries.

License & Legal ⚖️

This project is released under the MIT License. You are free to use, modify, and distribute the software for any purpose, commercial or private. We do, however, encourage you to review the model licenses for any bundled neural network weights, as some may have their own non-commercial restrictions.

Full MIT License Text

Disclaimer

EchoSync is an AI-assisted tool, not a replacement for human creativity or judgment. The generated content should be reviewed for accuracy, appropriateness, and adherence to local laws before publication. The project maintainers are not liable for misuse of the software, including but not limited to the generation of misleading, harmful, or copyrighted material. Users are solely responsible for the content they create.


Future Roadmap (2026)

  • Scene interpolation AI: Smoothly generate transition frames between scenes.
  • Custom avatar import: Upload your own character design for the digital anchor.
  • Real-time collaborative editing: Multiple users on the same project timeline.
  • Audio track layering: Add background music and sound effects via prompt.

Download

Releases

No releases published

Packages

 
 
 

Contributors

Languages