Hindi Language Learning App

A sentence-by-sentence reader of Hindi text from Wikisource, displaying each sentence in four simultaneous layers: Devanagari script, Roman transliteration, word-for-word gloss, and English translation. Includes audio playback, per-user bookmarks, and reading statistics.

See PROJECT.md for full design rationale and architecture decisions.

Prerequisites

Docker (for PostgreSQL)
Python 3.10+
Node.js 18+

External Services

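Three external services are required. Azure Translator and Google Cloud Text-to-Speech are used only by the pipeline (a one-time setup to populate the database); AWS SES is used by the running backend to send sign-in emails.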

Azure Translator — translates sentences and produces word-level alignment. Sign up at portal.azure.com, create a Translator resource, and configure:

AZURE_TRANSLATOR_KEY=<key>
AZURE_TRANSLATOR_REGION=eastus
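To show roughly what the Translator call involves, here is a minimal standard-library sketch. It assumes the public Translator v3 translate endpoint with includeAlignment=true. The helper names are hypothetical and do not come from process_sentences.py. Azure reports alignment as space-separated srcStart:srcEnd-tgtStart:tgtEnd character ranges with inclusive end indices, so the slices below add 1.

```python
import json
import urllib.parse
import urllib.request

# Public global endpoint for Azure Translator v3.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(text: str, key: str, region: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a hi->en request with word alignment."""
    query = urllib.parse.urlencode(
        {"api-version": "3.0", "from": "hi", "to": "en", "includeAlignment": "true"}
    )
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=json.dumps([{"Text": text}]).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Ocp-Apim-Subscription-Region": region,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_alignment(proj: str) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Parse 'srcStart:srcEnd-tgtStart:tgtEnd ...' into pairs of inclusive ranges."""
    pairs = []
    for chunk in proj.split():
        src, tgt = chunk.split("-")
        s0, s1 = (int(i) for i in src.split(":"))
        t0, t1 = (int(i) for i in tgt.split(":"))
        pairs.append(((s0, s1), (t0, t1)))
    return pairs


def aligned_words(source: str, target: str, proj: str) -> list[tuple[str, str]]:
    """Hypothetical helper: slice source/target text using the inclusive ranges."""
    return [
        (source[s0 : s1 + 1], target[t0 : t1 + 1])
        for (s0, s1), (t0, t1) in parse_alignment(proj)
    ]
```

To send the request, pass it to urllib.request.urlopen and read json.load(resp)[0]["translations"][0]; the alignment string sits under ["alignment"]["proj"]. Treat it as optional, since a real pipeline should not assume every sentence comes back aligned.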

Google Cloud Text-to-Speech — generates Hindi pronunciation audio. Sign up at console.cloud.google.com, enable the Cloud Text-to-Speech API, and configure:

GOOGLE_CLOUD_API_KEY=<key>
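A similar standard-library sketch for the Text-to-Speech call follows. It assumes the v1 text:synthesize REST method, authenticated by passing GOOGLE_CLOUD_API_KEY as the key query parameter, and requests MP3 output for the hi-IN language code. build_tts_request and decode_audio are hypothetical names, not generate_audio.py's own functions. The API returns the audio base64-encoded in the audioContent field.

```python
import base64
import json
import urllib.parse
import urllib.request

# Google Cloud Text-to-Speech REST endpoint (v1).
TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a Hindi MP3 synthesis request."""
    body = {
        "input": {"text": text},
        # Assumption: only the language is fixed here, so the service picks a hi-IN voice.
        "voice": {"languageCode": "hi-IN"},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(
        f"{TTS_URL}?{urllib.parse.urlencode({'key': api_key})}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_audio(response_body: bytes) -> bytes:
    """Hypothetical helper: return the MP3 bytes from the base64 'audioContent' field."""
    return base64.b64decode(json.loads(response_body)["audioContent"])
```

The decoded bytes can then be written to an .mp3 file under data/audio/, which is the directory the S3 sync command later in this README uploads.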

AWS SES — sends magic link authentication emails. Create an AWS account, configure SES, and set:

AWS_ACCESS_KEY_ID=<key>
AWS_SECRET_ACCESS_KEY=<secret>
AWS_SES_REGION=us-east-1
FROM_EMAIL=noreply@yourdomain.com
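For orientation, here is a sketch of a magic-link email expressed as keyword arguments for the SESv2 send_email call in boto3. The /auth/verify?token=... link format, the subject line, and both helper names are hypothetical. The real endpoints live in backend/app/routes/auth.py.

```python
import urllib.parse


def make_magic_link(frontend_url: str, token: str) -> str:
    """Hypothetical link format; the real route is defined in routes/auth.py."""
    query = urllib.parse.urlencode({"token": token})
    return f"{frontend_url.rstrip('/')}/auth/verify?{query}"


def build_ses_email(to_address: str, link: str, from_email: str) -> dict:
    """Hypothetical helper: keyword arguments for boto3's sesv2 send_email()."""
    return {
        "FromEmailAddress": from_email,
        "Destination": {"ToAddresses": [to_address]},
        "Content": {
            "Simple": {
                "Subject": {"Data": "Your sign-in link"},
                "Body": {"Text": {"Data": f"Sign in: {link}\n"}},
            }
        },
    }


# Usage sketch (needs boto3 plus the AWS_* and FROM_EMAIL variables above):
#   import boto3, os, secrets
#   ses = boto3.client("sesv2", region_name=os.environ["AWS_SES_REGION"])
#   link = make_magic_link(os.environ["FRONTEND_URL"], secrets.token_urlsafe(32))
#   ses.send_email(**build_ses_email("user@example.com", link, os.environ["FROM_EMAIL"]))
```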

To send from your own domain (recommended over a Gmail address):

# Verify your domain — creates DKIM CNAME records to add at your DNS provider
AWS_PROFILE=<profile> aws sesv2 create-email-identity --email-identity yourdomain.com --dkim-signing-attributes SigningAttributesOrigin=AWS_SES --region us-east-1

# Check DKIM verification status (wait ~24h after adding DNS records)
AWS_PROFILE=<profile> aws sesv2 get-email-identity --email-identity yourdomain.com --region us-east-1 --query 'DkimAttributes.Status' --output text

# Request production access (removes sandbox sending restrictions)
AWS_PROFILE=<profile> aws sesv2 put-account-details \
  --production-access-enabled \
  --mail-type TRANSACTIONAL \
  --website-url https://yourdomain.com \
  --use-case-description "Magic link authentication for registered users" \
  --contact-language EN \
  --region us-east-1

Getting Started

1. Start PostgreSQL

docker compose up -d
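docker compose up -d returns once the container starts, which can be before PostgreSQL accepts connections. A small wait loop like the one below can run before alembic upgrade head. It is a sketch that assumes the database is published on localhost:5432. An open TCP port is a reasonable proxy for readiness, but it does not guarantee the server has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused or timed out; retry until the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 5432) blocks for up to 30 seconds before you run migrations.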

2. Backend

From the backend directory:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env — set SECRET_KEY and AWS SES credentials at minimum
alembic upgrade head
python -m uvicorn app.main:app --reload

API runs at http://localhost:8000
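Because the backend needs at least SECRET_KEY and the AWS SES settings, a quick pre-flight check on .env can catch a missing value before the backend fails at startup. This sketch uses a deliberately minimal KEY=VALUE parser with no multi-line values or variable expansion. It treats the SES variables listed earlier as required. parse_env and missing_keys are hypothetical helpers.

```python
# Assumption: the minimum settings named above; extend as needed.
REQUIRED = ("SECRET_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SES_REGION", "FROM_EMAIL")


def parse_env(text: str) -> dict[str, str]:
    """Hypothetical minimal parser: skips blanks and # comments, strips optional quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str], required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return required keys that are absent or empty, in REQUIRED order."""
    return [k for k in required if not values.get(k)]
```

From the backend directory, missing_keys(parse_env(Path(".env").read_text(encoding="utf-8"))) (with from pathlib import Path) should return an empty list.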

3. Frontend

From the frontend directory:

npm install
npm run dev

App runs at http://localhost:5173

4. Pipeline (loading text content)

From the pipeline directory with the backend .venv active:

# Step 1 — fetch raw text from Wikisource
python fetch_text.py "सप्तसरोज/नमक का दारोगा"

# Step 2 — segment, translate, insert sentences
python process_sentences.py <slug>

# Step 3 — enrich each word with dictionary-level gloss
python enrich_glosses.py

# Step 4 — generate and store audio
python generate_audio.py
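Step 1 reads from Hindi Wikisource. One way to get a page's raw wikitext is the parse module of the MediaWiki action API. The sketch below builds that URL for a title like the one above. It illustrates that assumption and is not fetch_text.py itself, which also supports the Internet Archive. The User-Agent string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Hindi Wikisource MediaWiki action API endpoint.
API = "https://hi.wikisource.org/w/api.php"


def wikitext_url(title: str) -> str:
    """URL for one page's raw wikitext; the title may contain Devanagari and '/'."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # version 2 returns parse.wikitext as a plain string
    }
    return f"{API}?{urllib.parse.urlencode(params)}"


def fetch_wikitext(title: str) -> str:
    """Hypothetical helper: download and return the page's wikitext."""
    req = urllib.request.Request(wikitext_url(title), headers={"User-Agent": "hi-reader-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["parse"]["wikitext"]
```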

Steps 2–4 produce sentence-level English translations, word-level alignments, dictionary definitions, and audio files.
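Step 2 starts by splitting the text into sentences. This README does not document the exact rules process_sentences.py uses. As an assumption, a simple segmenter splits after the danda (।), double danda (॥), and the ASCII marks ?, ! and ., as sketched below. It does not split when a closing quotation mark sits between the punctuation and the following space.

```python
import re

# Split on whitespace preceded by a sentence-final mark: । ॥ ? ! .
_SENTENCE_END = re.compile(r"(?<=[।॥?!.])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
```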

After generating audio, upload the files to S3 for production serving:

# Upload audio files to S3 (public-read)
aws s3 sync data/audio/ s3://<your-bucket>/audio/ --acl public-read --profile <profile>

Set AUDIO_S3_URL=https://<your-bucket>.s3.amazonaws.com in both your local .env and the Heroku config. When this variable is set, the backend redirects /audio/<path> requests to S3; when it is unset, the backend serves the local file instead.

Deployment

Backend (Heroku)

# Create Heroku app (first time only)
heroku create hi-api

# Add PostgreSQL database
heroku addons:create heroku-postgresql:essential-0 -a hi-api

# Configure environment variables (load backend/.env into your shell first so the $VARIABLES below are defined)
set -a; source backend/.env; set +a
heroku config:set SECRET_KEY=$SECRET_KEY -a hi-api
heroku config:set AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -a hi-api
heroku config:set AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -a hi-api
heroku config:set FROM_EMAIL=$FROM_EMAIL -a hi-api
heroku config:set AZURE_TRANSLATOR_KEY=$AZURE_TRANSLATOR_KEY -a hi-api
heroku config:set GOOGLE_CLOUD_API_KEY=$GOOGLE_CLOUD_API_KEY -a hi-api
heroku config:set FRONTEND_URL=https://hi.jbm.eco -a hi-api
heroku config:set AWS_SES_REGION=us-east-1 -a hi-api
heroku config:set AUDIO_S3_URL=https://<your-bucket>.s3.amazonaws.com -a hi-api

# Deploy
git push heroku main

# Run database migrations
heroku run "cd backend && alembic upgrade head" -a hi-api

# Enable automatic SSL certificate management
heroku certs:auto:enable -a hi-api

# Set custom domain
heroku domains:add hi-api.jbm.eco -a hi-api

At your DNS provider, add a CNAME record pointing hi-api.jbm.eco to the DNS target shown by heroku domains -a hi-api.
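Heroku sets DATABASE_URL with a postgres:// scheme, which SQLAlchemy 1.4 and later no longer accept, and an async engine also needs its driver named in the URL. Assuming database.py uses the asyncpg driver (this README only says "async engine"), a helper like the one below rewrites the URL. async_database_url is hypothetical. Heroku Postgres generally expects SSL, which for asyncpg is commonly passed as connect_args={"ssl": "require"} to create_async_engine.

```python
def async_database_url(url: str) -> str:
    """Hypothetical: rewrite a Heroku-style URL for SQLAlchemy's asyncpg driver."""
    for prefix in ("postgres://", "postgresql://"):
        if url.startswith(prefix):
            return "postgresql+asyncpg://" + url[len(prefix):]
    # Already names a driver (e.g. postgresql+asyncpg://); leave it unchanged.
    return url
```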

To migrate data from a local Docker-based PostgreSQL to Heroku:

docker exec hi-db-1 pg_dump --no-owner --no-acl -U postgres hindi_app | psql "$(heroku config:get DATABASE_URL -a hi-api)"

Frontend (GitHub Pages)

The frontend automatically deploys via GitHub Actions when you push to main. The workflow:

Builds the React app with VITE_API_URL=https://hi-api.jbm.eco
Deploys the frontend/dist directory to GitHub Pages

To set up the custom domain for GitHub Pages:

Go to repository Settings → Pages → Source → GitHub Actions (not "Deploy from a branch")
Set Custom domain to hi.jbm.eco
Update your DNS provider to point hi.jbm.eco to GitHub Pages (CNAME to brantmerrell.github.io)

Project Structure

hi/
├── backend/
│   ├── app/
│   │   ├── models.py        — SQLAlchemy models (all tables)
│   │   ├── schemas.py       — Pydantic request/response schemas
│   │   ├── database.py      — async engine and session dependency
│   │   ├── main.py          — FastAPI app, CORS, router registration
│   │   └── routes/
│   │       ├── auth.py      — magic link request + verify endpoints
│   │       ├── stories.py   — list stories, list sentences
│   │       ├── sentences.py — get single sentence with word alignment
│   │       └── stats.py     — reading statistics for current user
│   ├── alembic/             — database migrations
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── SentenceView.tsx  — four-layer sentence display
│       │   ├── WordGloss.tsx     — word-by-word toggle view
│       │   ├── GlossCell.tsx     — inline-editable word gloss with override support
│       │   ├── Navigation.tsx    — previous / next sentence
│       │   └── AudioPlayer.tsx   — Hindi pronunciation playback
│       └── pages/
│           ├── Reader.tsx        — main reading page
│           ├── Stats.tsx         — reading statistics with gloss override editing
│           └── Auth.tsx          — magic link email entry
├── pipeline/
│   ├── fetch_text.py        — fetch Premchand stories from Wikisource / Internet Archive
│   ├── process_sentences.py — segment, translate (sentence-level), insert sentences + word alignment
│   ├── enrich_glosses.py    — per-word dictionary translation; populates lemmas + word_senses
│   └── generate_audio.py    — Google Cloud TTS; saves MP3s and updates sentences.audio_path
├── docker-compose.yml       — PostgreSQL 16
├── PROJECT.md               — design rationale and architecture
└── characters.md            — Devanagari character reference for the developer
