Generate narrated infographics locally using Google Gemini for images, scripts, and text-to-speech. Enter any topic and get a full-screen infographic with an AI-generated narration script and audio.
- AI-generated infographics - Gemini generates the image, script, and narration from a single topic or question
- Text-to-speech narration - Gemini voices bring the infographic to life
- Multiple aspect ratios - 16:9 (landscape) and 9:16 (portrait)
- 4 voice options - Achird, Aoede, Charon, Laomedeia
- MP4 export - Download your infographic as a shareable video
- Local storage - Generated files are saved on your machine; no cloud storage account is needed
- Node.js 20.9+ (required by Next.js 16)
- npm (bundled with Node.js)
- Windows, macOS, or Linux. FFmpeg is bundled through npm dependencies; no separate system FFmpeg install is required.
- Clone the repository
    git clone https://github.com/Cp557/audiblegraphics.git
    cd audiblegraphics
- Install dependencies
    npm install
- Add your API key
    Open .env and fill in your key. The Google account or project behind this key must have billing enabled for Gemini API usage.
    GEMINI_API_KEY=your-gemini-api-key-here
- Generate voice samples
    npm run generate:voice-samples
- Start the app
    npm run dev
- Gemini - aistudio.google.com -> Get API key. Make sure billing is enabled for the API key's Google Cloud project before generating infographics or voice samples.
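A missing or placeholder key is a likely cause of setup failures, so a startup check can make the error obvious. The following is a minimal TypeScript sketch, not code from this repository: `requireGeminiKey` and the `Env` type are hypothetical names, and the only details taken from this README are the `GEMINI_API_KEY` variable and its placeholder value.

```typescript
// Hypothetical helper, not part of this repository.
// An env-like object is passed in so the check is easy to test.
type Env = Record<string, string | undefined>;

// Placeholder value shown in the setup steps of this README.
const KEY_PLACEHOLDER = "your-gemini-api-key-here";

function requireGeminiKey(env: Env): string {
  const key = (env.GEMINI_API_KEY ?? "").trim();
  if (key === "" || key === KEY_PLACEHOLDER) {
    throw new Error(
      "GEMINI_API_KEY is missing or still set to the placeholder; edit .env and restart."
    );
  }
  return key;
}
```

In a Next.js server module, which loads .env into `process.env`, this could be called as `requireGeminiKey(process.env)`. Note that the check cannot detect whether billing is enabled; that only surfaces when a Gemini request is made.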
- Enter a topic or question in the input box
- Choose a voice and aspect ratio
- Click Generate - Gemini creates the script, image, and narration audio
- Your infographic is saved locally and listed in the sidebar
- Optionally download as MP4 video
Each infographic is stored as its own folder under public/uploads/, named after the topic:
public/uploads/
  history-of-rome/
    image.jpg
    audio.mp3
    meta.json    <- title, speaker notes, aspect ratio, creation date
    video.mp4    <- only present if you exported it
  black-holes-explained/
    image.jpg
    audio.mp3
    meta.json
Everything is gitignored - your generated content stays local.
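The layout above can be sketched in TypeScript. This README does not state the exact rule for turning a topic into a folder name, nor the field names inside meta.json, so everything here is an assumption: `topicToFolderName`, `infographicPaths`, and the `InfographicMeta` field names are hypothetical and chosen only to match the example folders and the fields listed above.

```typescript
// Assumed shape of meta.json, based only on the fields listed in this README.
// The property names are illustrative guesses, not the app's real schema.
interface InfographicMeta {
  title: string;
  speakerNotes: string;
  aspectRatio: "16:9" | "9:16";
  createdAt: string; // e.g. an ISO 8601 timestamp
}

// Hypothetical slug rule: lowercase, collapse every run of characters
// other than a-z and 0-9 into a single "-", then trim stray dashes.
function topicToFolderName(topic: string): string {
  return topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds the file paths for one infographic under public/uploads/.
function infographicPaths(topic: string) {
  const dir = `public/uploads/${topicToFolderName(topic)}`;
  return {
    dir,
    image: `${dir}/image.jpg`,
    audio: `${dir}/audio.mp3`,
    meta: `${dir}/meta.json`,
    video: `${dir}/video.mp4`, // only exists after an MP4 export
  };
}

const exampleMeta: InfographicMeta = {
  title: "History of Rome",
  speakerNotes: "Narration script goes here.",
  aspectRatio: "16:9",
  createdAt: new Date(0).toISOString(),
};
```

Under this assumed rule, "History of Rome" maps to history-of-rome and "Black holes, explained!" maps to black-holes-explained, matching the folder names shown above.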
- Next.js 16 (App Router)
- TypeScript
- Tailwind CSS v4
- shadcn/ui
- Google Gemini (@google/genai) - image, script, and text-to-speech generation
- ffmpeg-static + Sharp - MP4 video export (bundled, no system install needed)
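To illustrate how a still image and a narration track can be combined into an MP4, here is a minimal TypeScript sketch that only builds an FFmpeg argument list. It is not the app's actual export code: `buildMp4Args` is a hypothetical helper, the codec choices (libx264, AAC) are assumptions, and Sharp's role in the export is not shown because this README does not describe it.

```typescript
// Hypothetical helper, not part of this repository.
// Returns FFmpeg arguments that turn one image plus one audio file into a video.
function buildMp4Args(imagePath: string, audioPath: string, outPath: string): string[] {
  return [
    "-y",                   // overwrite the output file if it exists
    "-loop", "1",           // repeat the single image as a video stream
    "-i", imagePath,
    "-i", audioPath,
    "-c:v", "libx264",      // assumed video codec
    "-tune", "stillimage",  // x264 tuning for static content
    "-pix_fmt", "yuv420p",  // widely compatible pixel format
    "-c:a", "aac",          // assumed audio codec
    "-shortest",            // stop when the audio ends
    outPath,
  ];
}

const exampleArgs = buildMp4Args(
  "public/uploads/history-of-rome/image.jpg",
  "public/uploads/history-of-rome/audio.mp3",
  "public/uploads/history-of-rome/video.mp4"
);
```

The resulting array could be passed to Node's `child_process.execFile` together with the binary path that the ffmpeg-static package exports. Keeping the argument builder separate from process spawning makes it testable without running FFmpeg.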
