Paste. Listen. Learn. — AI-powered learning platform that transforms any text into an interactive podcast, visual lesson, quiz, and AI tutor.
Paste your study material and PodLearn generates four learning outputs:
- Summary — Key concepts extracted and organized at a glance
- Podcast — A two-voice (host + expert) audio conversation with synced transcript, click-to-seek, and a visual lesson panel that highlights concepts as they're discussed
- Quiz — Auto-generated multiple-choice questions with explanations and scoring
- Chat — An AI tutor that answers follow-up questions based on your material
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) + React 19 |
| Styling | Tailwind CSS 4 |
| AI Content | Google Gemini 2.5 Flash (streaming structured output via Vercel AI SDK) |
| Text-to-Speech | ElevenLabs API (dual voices, batched requests) |
| Image Generation | Runware API |
| Validation | Zod 4 |
| Deployment | Vercel |
- Node.js 18+
- API keys for: Google Gemini, ElevenLabs, Runware
git clone https://github.com/prameshbajra/podlearn.git
cd podlearn
npm install

Create a .env.local file:
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_key
ELEVENLABS_API_KEY=your_elevenlabs_key
RUNWARE_API_KEY=your_runware_key

npm run dev

Open http://localhost:3000.
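
A missing key usually surfaces only when the first API route is called. As a minimal sketch (the helper `missingKeys` is hypothetical and not part of the PodLearn source), the three variables listed above could be checked once at startup:

```typescript
// Hypothetical helper, not taken from the PodLearn codebase.
// The key names are the ones listed in the .env.local example.
const REQUIRED_KEYS = [
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "ELEVENLABS_API_KEY",
  "RUNWARE_API_KEY",
] as const;

type EnvSource = Record<string, string | undefined>;

// Returns the names of required keys that are absent or blank in `env`.
function missingKeys(env: EnvSource): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}
```

Calling `missingKeys(process.env)` in server-side code and throwing when the result is non-empty would report every missing key at once instead of one failure per request.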
src/
├── app/
│ ├── api/
│ │ ├── generate-content/ # Gemini streaming (summary + podcast script + quiz)
│ │ ├── generate-audio/ # ElevenLabs TTS with batched dual-voice
│ │ ├── generate-image/ # Runware AI image generation
│ │ ├── generate-annotations/ # Concept-to-image region mapping
│ │ └── chat/ # AI tutor chat endpoint
│ ├── layout.tsx
│ └── page.tsx # Main app orchestration
├── components/
│ ├── AudioPlayer.tsx # Web Audio API playback with seek
│ ├── SyncedDiagram.tsx # Visual lesson synced to podcast
│ ├── PodcastPanel.tsx # Transcript + audio + diagram
│ ├── SummaryPanel.tsx # Key concepts display
│ ├── QuizPanel.tsx # Interactive quiz with scoring
│ ├── ChatPanel.tsx # AI tutor chat interface
│ ├── InputForm.tsx # Text input with difficulty/language
│ ├── SplashScreen.tsx # Animated launch screen
│ ├── Header.tsx # App header
│ └── TabNavigation.tsx # Tab switching
└── lib/
├── schemas.ts # Zod schemas for all AI outputs
└── mock-data.ts # Development mock data
- Deterministic concept sync — Gemini generates `conceptMapping` indices at content creation time, so the visual lesson panel stays perfectly in sync with the podcast audio
- Dual-voice podcast — Host and expert voices via ElevenLabs with crossfade transitions
- Click-to-seek transcript — Click any line in the transcript to jump to that point in the audio
- Streaming generation — Content streams in real-time as Gemini generates it
- Difficulty levels — Beginner, Intermediate, and Advanced content generation
- Multi-language support — Generate content in different languages
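
Concept sync and click-to-seek both come down to knowing when each transcript line starts in the stitched audio. The sketch below shows one way this could work. It assumes each script line carries a numeric `conceptMapping` index and that consecutive clips overlap by a fixed crossfade length; the `ScriptLine` shape and all function names are illustrative, not taken from the PodLearn source.

```typescript
// Hypothetical shape: the README names conceptMapping but does not document its structure.
interface ScriptLine {
  speaker: "host" | "expert";
  text: string;
  conceptMapping: number; // assumed: index into the list of key concepts
}

// Start time (seconds) of each clip when consecutive clips overlap by `crossfade` seconds.
function segmentStarts(durations: number[], crossfade: number): number[] {
  const starts: number[] = [];
  let cursor = 0;
  for (const duration of durations) {
    starts.push(cursor);
    // The next clip begins before this one ends, by the crossfade length.
    cursor += Math.max(duration - crossfade, 0);
  }
  return starts;
}

// Index of the line playing at `time`: the last segment whose start is <= time.
// Binary search, since `starts` is sorted in ascending order.
function activeLineIndex(starts: number[], time: number): number {
  let lo = 0;
  let hi = starts.length - 1;
  let found = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (starts[mid] <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Concept to highlight at `time`, or undefined for an empty script.
function activeConcept(script: ScriptLine[], starts: number[], time: number): number | undefined {
  return script[activeLineIndex(starts, time)]?.conceptMapping;
}
```

With clip durations of 4, 6, and 5 seconds and a 0.5 second crossfade, `segmentStarts` returns `[0, 3.5, 9]`. Clicking transcript line `i` would then seek to `starts[i]`, and the same array drives the highlight during playback, so no extra model call is needed at playback time.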
- User pastes study material (50–1000 words) and picks a difficulty level
- Gemini 2.5 Flash streams a structured JSON response containing the summary, podcast script (with `conceptMapping` indices), and quiz, all in one pass
- ElevenLabs converts each script turn into audio (host and expert voices, batched in groups of 4)
- Runware generates a concept diagram based on the material
- The frontend stitches audio segments with crossfade, syncs the visual lesson panel to playback time, and renders everything across four tabs
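
The batching step in the pipeline above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `synthesize` is a placeholder for whatever function calls the TTS provider, and running the requests inside a batch concurrently is a guess, since the README states only the group size of 4.

```typescript
// Split `items` into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Run `synthesize` over every turn, one batch at a time, preserving script order.
// `synthesize` is a hypothetical callback supplied by the caller.
async function synthesizeInBatches<T, R>(
  turns: T[],
  synthesize: (turn: T, index: number) => Promise<R>,
  batchSize = 4,
): Promise<R[]> {
  const results: R[] = [];
  let offset = 0;
  for (const batch of chunk(turns, batchSize)) {
    const base = offset;
    // Assumed behavior: requests within a batch run concurrently, batches run in sequence.
    const audio = await Promise.all(batch.map((turn, i) => synthesize(turn, base + i)));
    results.push(...audio);
    offset += batch.length;
  }
  return results;
}
```

Capping concurrency this way keeps the number of in-flight TTS requests bounded, and because `Promise.all` preserves input order, the resulting clips line up with the script turns they came from.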