
Sauravbhusal1/PodLearn

 
 


PodLearn

Paste. Listen. Learn. PodLearn is an AI-powered learning platform that turns any pasted text into a summary, a two-voice podcast with a synced visual lesson, a quiz, and an AI tutor chat.

What It Does

Paste your study material and PodLearn generates four learning outputs:

  1. Summary — Key concepts extracted and organized at a glance
  2. Podcast — A two-voice (host + expert) audio conversation with synced transcript, click-to-seek, and a visual lesson panel that highlights concepts as they're discussed
  3. Quiz — Auto-generated multiple-choice questions with explanations and scoring
  4. Chat — An AI tutor that answers follow-up questions based on your material

Tech Stack

Layer             Technology
Framework         Next.js 16 (App Router) + React 19
Styling           Tailwind CSS 4
AI Content        Google Gemini 2.5 Flash (streaming structured output via Vercel AI SDK)
Text-to-Speech    ElevenLabs API (dual voices, batched requests)
Image Generation  Runware API
Validation        Zod 4
Deployment        Vercel

Getting Started

Prerequisites

  • Node.js 20.9+ (Next.js 16 no longer supports Node.js 18)
  • API keys for: Google Gemini, ElevenLabs, Runware

Setup

git clone https://github.com/prameshbajra/podlearn.git
cd podlearn
npm install

Create a .env.local file:

GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_key
ELEVENLABS_API_KEY=your_elevenlabs_key
RUNWARE_API_KEY=your_runware_key

Run

npm run dev

Open http://localhost:3000.

Project Structure

src/
├── app/
│   ├── api/
│   │   ├── generate-content/   # Gemini streaming (summary + podcast script + quiz)
│   │   ├── generate-audio/     # ElevenLabs TTS with batched dual-voice
│   │   ├── generate-image/     # Runware AI image generation
│   │   ├── generate-annotations/ # Concept-to-image region mapping
│   │   └── chat/               # AI tutor chat endpoint
│   ├── layout.tsx
│   └── page.tsx                # Main app orchestration
├── components/
│   ├── AudioPlayer.tsx         # Web Audio API playback with seek
│   ├── SyncedDiagram.tsx       # Visual lesson synced to podcast
│   ├── PodcastPanel.tsx        # Transcript + audio + diagram
│   ├── SummaryPanel.tsx        # Key concepts display
│   ├── QuizPanel.tsx           # Interactive quiz with scoring
│   ├── ChatPanel.tsx           # AI tutor chat interface
│   ├── InputForm.tsx           # Text input with difficulty/language
│   ├── SplashScreen.tsx        # Animated launch screen
│   ├── Header.tsx              # App header
│   └── TabNavigation.tsx       # Tab switching
└── lib/
    ├── schemas.ts              # Zod schemas for all AI outputs
    └── mock-data.ts            # Development mock data

Key Features

  • Deterministic concept sync — Gemini generates conceptMapping indices at content creation time, so the visual lesson panel stays perfectly in sync with the podcast audio
  • Dual-voice podcast — Host and expert voices via ElevenLabs with crossfade transitions
  • Click-to-seek transcript — Click any line in the transcript to jump to that point in the audio
  • Streaming generation — Content streams in real-time as Gemini generates it
  • Difficulty levels — Beginner, Intermediate, and Advanced content generation
  • Multi-language support — Generate content in different languages
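
The concept sync and click-to-seek features can be illustrated with a small lookup over per-turn timing data. The following is a minimal sketch, not the project's actual code: the `Turn` shape, its `startTime` and `conceptIndex` fields, and all function names are hypothetical, and it assumes each script turn carries one concept index and that turn start times are known once the audio has been stitched.

```typescript
// Hypothetical shape: one entry per podcast script turn.
interface Turn {
  startTime: number;    // seconds from the start of the stitched audio (assumed)
  conceptIndex: number; // index into the summary's concept list (assumed meaning of conceptMapping)
}

// Binary search for the turn playing at `time`.
// Returns -1 when `time` is before the first turn.
// Assumes `turns` is sorted by ascending startTime.
function activeTurnIndex(turns: Turn[], time: number): number {
  let lo = 0;
  let hi = turns.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (turns[mid].startTime <= time) {
      found = mid;   // candidate; a later turn may also have started
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Concept to highlight in the visual lesson panel at a given playback time.
function activeConcept(turns: Turn[], time: number): number | null {
  const i = activeTurnIndex(turns, time);
  return i === -1 ? null : turns[i].conceptIndex;
}

// Click-to-seek: the playback position to jump to for a clicked transcript line.
function seekTimeForTurn(turns: Turn[], turnIndex: number): number {
  if (turnIndex < 0 || turnIndex >= turns.length) {
    throw new RangeError(`turn index ${turnIndex} is out of range`);
  }
  return turns[turnIndex].startTime;
}

// Illustrative data only.
const demoTurns: Turn[] = [
  { startTime: 0, conceptIndex: 0 },
  { startTime: 4.2, conceptIndex: 0 },
  { startTime: 9.8, conceptIndex: 1 },
  { startTime: 15.1, conceptIndex: 2 },
];
```

Because the highlighted concept depends only on precomputed indices and the current playback time, the same time always yields the same concept, which is the sense in which the sync is deterministic.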

How It Works

  1. User pastes study material (50–1000 words) and picks a difficulty level
  2. Gemini 2.5 Flash streams a structured JSON response containing the summary, podcast script (with conceptMapping indices), and quiz — all in one pass
  3. ElevenLabs converts each script turn into audio (host and expert voices, batched in groups of 4)
  4. Runware generates a concept diagram based on the material
  5. The frontend stitches audio segments with crossfade, syncs the visual lesson panel to playback time, and renders everything across four tabs
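
Steps 3 and 5 can be sketched as two small helpers: one that processes script turns in batches of 4, and one that computes where each audio segment starts once consecutive segments overlap by a crossfade. This is an illustrative sketch under stated assumptions, not the repository's implementation: `chunk`, `mapInBatches`, and `segmentStartTimes` are hypothetical names, the `worker` callback stands in for whatever text-to-speech request the app makes, and the crossfade length is an arbitrary example value.

```typescript
// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `worker` on every item, one batch at a time.
// Items inside a batch run concurrently; batches run sequentially,
// so at most `size` requests are in flight at once. Output order matches input order.
async function mapInBatches<T, R>(
  items: T[],
  size: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  let offset = 0;
  for (const batch of chunk(items, size)) {
    const base = offset;
    const done = await Promise.all(batch.map((item, i) => worker(item, base + i)));
    results.push(...done);
    offset += batch.length;
  }
  return results;
}

// Start time of each segment in the stitched track when each segment
// begins `crossfade` seconds before the previous one ends.
function segmentStartTimes(durations: number[], crossfade: number): number[] {
  const starts: number[] = [];
  let cursor = 0;
  for (const duration of durations) {
    starts.push(cursor);
    // Clamp so a segment shorter than the crossfade never moves the cursor backwards.
    cursor += Math.max(duration - crossfade, 0);
  }
  return starts;
}
```

For example, `chunk` applied to 10 turns with a size of 4 yields groups of 4, 4, and 2, and `segmentStartTimes([3, 2, 4], 0.5)` returns `[0, 2.5, 4]`. Start times computed this way are what a transcript would need for click-to-seek and for syncing the visual lesson panel to playback time.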

About

An AI study tool that transforms dense text into dual-voice podcasts with synchronized visual diagrams and interactive quizzes.
