Turn documents into audiobooks. Upload a PDF, DOCX, TXT, or Markdown file and get back a high-quality MP3 — powered by your choice of TTS provider.
- Multiple file formats — PDF (with OCR fallback for scanned docs), DOCX, TXT, Markdown
- 4 TTS providers — Edge TTS (free), Google TTS (free), OpenAI TTS, ElevenLabs
- Script formatting — cleans up extraction noise and optionally rewrites text for natural listening via OpenAI
- Async processing — upload and poll; no blocking the UI during long conversions
- Conversion history — play, download, or delete past conversions
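
The upload-and-poll flow behind the async processing feature can be sketched from a client's point of view. This is a minimal sketch, not the project's client code: `fetch_job` is a hypothetical callable standing in for whatever HTTP request retrieves a job, and the status values `"done"` and `"failed"` and the dictionary shape are assumptions made for illustration.

```python
import time
from typing import Callable


def poll_until_finished(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job repeatedly until the job reports a terminal status.

    fetch_job is a hypothetical callable returning the job as a dict with a
    "status" key; "done" and "failed" are assumed to be the terminal values.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") in ("done", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("conversion did not finish in time")
        # Wait before asking again so the server is not hammered.
        sleep(interval_s)
```

Injecting `sleep` keeps the loop easy to test without real delays; a real client would pass a function that performs the HTTP request as `fetch_job`.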
| Requirement | Notes |
|---|---|
| Python 3.11+ | |
| Node.js 18+ | |
| PostgreSQL | Any recent version |
| ffmpeg + ffprobe | Must be on PATH — used for audio stitching and duration |
| Poppler | Optional — only needed for scanned/image-based PDFs |
| Tesseract OCR | Optional — only needed for scanned/image-based PDFs |
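
Before starting the backend, it can help to confirm that the external tools in the table are actually on PATH. The following is a small sketch using only the standard library. The executable names for the optional tools (`pdftoppm` for Poppler, `tesseract` for Tesseract OCR) are assumptions about which binaries the project invokes.

```python
import shutil

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")
# Assumed executable names: Poppler ships pdftoppm, Tesseract OCR ships tesseract.
OPTIONAL_TOOLS = ("pdftoppm", "tesseract")


def missing_tools(names):
    """Return the names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED_TOOLS):
        print(f"required tool not found on PATH: {name}")
    for name in missing_tools(OPTIONAL_TOOLS):
        print(f"optional tool not found (only needed for scanned PDFs): {name}")
```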
git clone https://github.com/Otitodev/amara.git
cd amara/backend
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt
createdb amara
cp .env.example .env
# Edit .env: at minimum set DATABASE_URL
uvicorn main:app --reload --port 8000

The server creates the database tables automatically on first start.
cd ../frontend
npm install
npm run dev

Open http://localhost:5173.
| Provider | Free | API Key | Quality | Notes |
|---|---|---|---|---|
| edge | Yes | No | Good | Microsoft Edge TTS (default) |
| gtts | Yes | No | Basic | Google Translate TTS |
| openai | No | OPENAI_API_KEY | Excellent | OpenAI tts-1 model |
| elevenlabs | 10k chars/mo | ELEVENLABS_API_KEY | Best | Multilingual, natural voices |
Set the default in .env with TTS_PROVIDER=edge. Users can override per-upload in the UI.
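
The default-plus-override rule above can be sketched as a small resolver. This is an illustrative sketch only: `resolve_provider` and `PROVIDER_API_KEYS` are hypothetical names rather than the project's code, and the key requirements are taken from the provider table.

```python
# Which environment variable, if any, each provider needs (from the table above).
PROVIDER_API_KEYS = {
    "edge": None,
    "gtts": None,
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def resolve_provider(env, override=None):
    """Pick the per-upload override if given, else TTS_PROVIDER, else "edge"."""
    provider = override or env.get("TTS_PROVIDER") or "edge"
    if provider not in PROVIDER_API_KEYS:
        raise ValueError(f"unknown TTS provider: {provider!r}")
    key_name = PROVIDER_API_KEYS[provider]
    if key_name and not env.get(key_name):
        raise ValueError(f"{provider} requires {key_name} to be set")
    return provider
```

Failing early on a missing key means a paid provider is rejected at upload time rather than partway through a long conversion.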
All settings live in .env (copy from .env.example):
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgresql://postgres:postgres@localhost:5432/amara | PostgreSQL connection string |
| TTS_PROVIDER | edge | Default TTS provider (edge, gtts, openai, elevenlabs) |
| TTS_VOICE | en-US-AriaNeural | Default voice for the configured provider |
| OPENAI_API_KEY | (empty) | Required for openai TTS and OpenAI script formatting |
| ELEVENLABS_API_KEY | (empty) | Required for elevenlabs TTS |
| SCRIPT_MODE | audiobook | Default script mode (audiobook or faithful) |
| SCRIPT_FORMATTER_PROVIDER | openai | openai uses LLM rewriting; anything else uses local regex cleanup |
| SCRIPT_FORMATTER_MODEL | gpt-4o-mini | OpenAI model used for script formatting |
| MAX_FILE_SIZE_MB | 20 | Upload size limit |
| AUDIO_DIR | backend/audio_files | Where generated MP3s are stored |
amara/
├── backend/
│   ├── main.py            # FastAPI app
│   ├── config.py          # Settings (pydantic-settings)
│   ├── models.py          # SQLAlchemy Conversion model
│   ├── schemas.py         # Pydantic response schemas
│   ├── routers/
│   │   ├── uploads.py     # POST /api/upload
│   │   └── jobs.py        # GET/DELETE /api/jobs, GET /api/audio
│   ├── services/
│   │   ├── extractor.py   # PDF / DOCX / text extraction
│   │   ├── pipeline.py    # Orchestrates extract → format → TTS → store
│   │   ├── storage.py     # Local disk audio storage
│   │   ├── formatter/     # Script cleanup / OpenAI rewriting
│   │   └── tts/           # TTS provider implementations
│   ├── migrations/        # SQL migrations (auto-applied on startup)
│   └── tests/
├── frontend/
│   └── src/
│       ├── api.js
│       ├── App.jsx
│       └── components/
│           ├── UploadForm.jsx
│           ├── JobStatus.jsx
│           ├── ConversionHistory.jsx
│           └── AudioPlayer.jsx
├── .env.example
└── README.md
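
The extract → format → TTS → store sequence noted next to `pipeline.py` can be sketched as a function that chains four stages. This is a hedged illustration, not the contents of `pipeline.py`: `run_pipeline` and its four callable parameters are hypothetical, and each stage is injected so the sketch stays self-contained.

```python
from typing import Callable


def run_pipeline(
    path: str,
    extract: Callable[[str], str],
    format_script: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    store: Callable[[bytes], str],
) -> str:
    """Run extract, format, TTS, and store in order; return the stored audio location."""
    raw_text = extract(path)
    if not raw_text.strip():
        # Stop before calling a TTS provider if nothing was extracted.
        raise ValueError(f"no text could be extracted from {path}")
    script = format_script(raw_text)
    audio = synthesize(script)
    return store(audio)
```

Passing the stages as callables makes it possible to swap a TTS provider or formatter without touching the orchestration, and to test the flow with simple fakes.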
# Backend tests
cd backend
python -m unittest discover tests
# Lint / format
ruff check .
ruff format .
# Frontend lint
cd frontend
npm run lint

Pull requests are welcome. For large changes, open an issue first to discuss the approach. Please run the backend tests and linter before submitting.