Amara

Turn documents into audiobooks. Upload a PDF, DOCX, TXT, or Markdown file and get back a high-quality MP3 — powered by your choice of TTS provider.


Features

  • Multiple file formats — PDF (with OCR fallback for scanned docs), DOCX, TXT, Markdown
  • 4 TTS providers — Edge TTS (free), Google TTS (free), OpenAI TTS, ElevenLabs
  • Script formatting — cleans up extraction noise and optionally rewrites text for natural listening via OpenAI
  • Async processing — upload and poll; no blocking the UI during long conversions
  • Conversion history — play, download, or delete past conversions
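
From a client's point of view, the upload-and-poll flow in the async processing bullet could be sketched as follows. This is a minimal illustration, not the project's client code: `poll_until_done`, the `fetch_status` callable, and the terminal status strings `"done"` and `"failed"` are all assumptions, and the real API's response shape may differ.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it reports a terminal state or time runs out.

    fetch_status is a hypothetical callable that returns the job's current
    status string, for example by issuing a GET request to the jobs endpoint.
    The terminal states "done" and "failed" are assumed names.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

Passing the status lookup in as a callable keeps the loop independent of any particular HTTP library, so it can be exercised with a fake in tests.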

Prerequisites

| Requirement | Notes |
|---|---|
| Python | 3.11+ |
| Node.js | 18+ |
| PostgreSQL | Any recent version |
| ffmpeg + ffprobe | Must be on PATH; used for audio stitching and reading audio duration |
| Poppler | Optional; only needed for scanned/image-based PDFs |
| Tesseract OCR | Optional; only needed for scanned/image-based PDFs |
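
Before starting the backend, it can help to confirm that the command-line tools listed above are actually on PATH. The script below is a small sketch using only the standard library. The optional binary names are assumptions about how those packages are usually installed: `pdftoppm` is one of the tools shipped with Poppler, and `tesseract` is the Tesseract CLI.

```python
import shutil

REQUIRED = ["ffmpeg", "ffprobe"]
# Assumed binary names: Poppler ships "pdftoppm"; Tesseract's CLI is "tesseract".
OPTIONAL = ["pdftoppm", "tesseract"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(names)
        summary = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {summary}")
```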

Quick Start

1. Clone and set up the backend

git clone https://github.com/Otitodev/amara.git
cd amara/backend

python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

pip install -r requirements.txt

2. Create the database

createdb amara

3. Configure environment

cp .env.example .env
# Edit .env — at minimum set DATABASE_URL

4. Start the backend

uvicorn main:app --reload --port 8000

The server creates the database tables automatically on first start.

5. Start the frontend

cd ../frontend
npm install
npm run dev

Open http://localhost:5173.


TTS Providers

| Provider | Free | API Key | Quality | Notes |
|---|---|---|---|---|
| edge | Yes | No | Good | Microsoft Edge TTS (default) |
| gtts | Yes | No | Basic | Google Translate TTS |
| openai | No | OPENAI_API_KEY | Excellent | OpenAI tts-1 model |
| elevenlabs | 10k chars/mo | ELEVENLABS_API_KEY | Best | Multilingual, natural voices |

Set the default in .env with TTS_PROVIDER=edge. Users can override per-upload in the UI.
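
The interaction between the default provider, a per-upload override, and the API key requirements from the table above could be expressed roughly like this. It is a sketch under assumptions rather than the backend's actual logic: `resolve_provider` and `PROVIDER_KEYS` are hypothetical names, and the real code may validate differently.

```python
import os
from typing import Mapping, Optional

# Provider name -> environment variable holding its API key (None = no key needed).
# Mirrors the provider table; the names in this module are hypothetical.
PROVIDER_KEYS = {
    "edge": None,
    "gtts": None,
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def resolve_provider(
    override: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the per-upload override if given, else TTS_PROVIDER, else "edge"."""
    name = (override or env.get("TTS_PROVIDER") or "edge").lower()
    if name not in PROVIDER_KEYS:
        raise ValueError(f"unknown TTS provider: {name!r}")
    key_var = PROVIDER_KEYS[name]
    if key_var and not env.get(key_var):
        raise ValueError(f"{name} requires {key_var} to be set")
    return name
```

Failing early when a key is missing means a misconfigured provider is reported at upload time instead of partway through a long conversion.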


Configuration

All settings live in .env (copy from .env.example):

| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgresql://postgres:postgres@localhost:5432/amara | PostgreSQL connection string |
| TTS_PROVIDER | edge | Default TTS provider (edge, gtts, openai, elevenlabs) |
| TTS_VOICE | en-US-AriaNeural | Default voice for the configured provider |
| OPENAI_API_KEY | (empty) | Required for openai TTS and OpenAI script formatting |
| ELEVENLABS_API_KEY | (empty) | Required for elevenlabs TTS |
| SCRIPT_MODE | audiobook | Default script mode (audiobook or faithful) |
| SCRIPT_FORMATTER_PROVIDER | openai | openai uses LLM rewriting; anything else uses local regex cleanup |
| SCRIPT_FORMATTER_MODEL | gpt-4o-mini | OpenAI model used for script formatting |
| MAX_FILE_SIZE_MB | 20 | Upload size limit |
| AUDIO_DIR | backend/audio_files | Where generated MP3s are stored |
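
To illustrate how these variables and their defaults fit together, here is a standard-library sketch that reads a few of them from the environment. The project itself uses pydantic-settings in config.py, so this is not its implementation; the `Settings` class, `load_settings`, and the bytes conversion are hypothetical and only cover a subset of the variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    tts_provider: str
    tts_voice: str
    script_mode: str
    max_file_size_mb: int

    @property
    def max_file_size_bytes(self) -> int:
        # Assumes 1 MB = 1024 * 1024 bytes; the project may count differently.
        return self.max_file_size_mb * 1024 * 1024


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a few documented variables, falling back to the documented defaults."""
    return Settings(
        database_url=env.get(
            "DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/amara"
        ),
        tts_provider=env.get("TTS_PROVIDER", "edge"),
        tts_voice=env.get("TTS_VOICE", "en-US-AriaNeural"),
        script_mode=env.get("SCRIPT_MODE", "audiobook"),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "20")),
    )
```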

Project Structure

amara/
├── backend/
│   ├── main.py                  # FastAPI app
│   ├── config.py                # Settings (pydantic-settings)
│   ├── models.py                # SQLAlchemy Conversion model
│   ├── schemas.py               # Pydantic response schemas
│   ├── routers/
│   │   ├── uploads.py           # POST /api/upload
│   │   └── jobs.py              # GET/DELETE /api/jobs, GET /api/audio
│   ├── services/
│   │   ├── extractor.py         # PDF / DOCX / text extraction
│   │   ├── pipeline.py          # Orchestrates extract → format → TTS → store
│   │   ├── storage.py           # Local disk audio storage
│   │   ├── formatter/           # Script cleanup / OpenAI rewriting
│   │   └── tts/                 # TTS provider implementations
│   ├── migrations/              # SQL migrations (auto-applied on startup)
│   └── tests/
├── frontend/
│   └── src/
│       ├── api.js
│       ├── App.jsx
│       └── components/
│           ├── UploadForm.jsx
│           ├── JobStatus.jsx
│           ├── ConversionHistory.jsx
│           └── AudioPlayer.jsx
├── .env.example
└── README.md
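
The extract → format → TTS → store flow noted next to pipeline.py can be pictured as four stages called in order. The sketch below is an assumption about the overall shape only: `run_pipeline` and its four callable parameters are hypothetical names, and the real pipeline also has to track job status and handle errors.

```python
from typing import Callable


def run_pipeline(
    path: str,
    extract: Callable[[str], str],
    format_script: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    store: Callable[[bytes], str],
) -> str:
    """Run the four stages in order and return whatever the store stage returns."""
    text = extract(path)
    if not text.strip():
        raise ValueError(f"no text could be extracted from {path!r}")
    script = format_script(text)
    audio = synthesize(script)
    return store(audio)


if __name__ == "__main__":
    # Stub stages standing in for the real extractor, formatter, TTS, and storage.
    location = run_pipeline(
        "book.txt",
        extract=lambda p: "  Chapter   one.  ",
        format_script=lambda t: " ".join(t.split()),
        synthesize=lambda s: s.encode("utf-8"),
        store=lambda audio: f"stored {len(audio)} bytes",
    )
    print(location)
```

Keeping each stage behind a plain callable makes it straightforward to swap TTS providers or formatters without touching the orchestration.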

Development

# Backend tests
cd backend
python -m unittest discover tests

# Lint / format
ruff check .
ruff format .

# Frontend lint (run from backend/, where the commands above leave you)
cd ../frontend
npm run lint

Contributing

Pull requests are welcome. For large changes, open an issue first to discuss the approach. Please run the backend tests and linter before submitting.


License

MIT
