Hindi Language Learning App

A sentence-by-sentence reader of Hindi text from Wikisource, displaying each sentence in four simultaneous layers: Devanagari script, Roman transliteration, word-for-word gloss, and English translation. Includes audio playback, per-user bookmarks, and reading statistics.
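To make the four layers concrete, here is a minimal sketch of how one sentence could be modelled. The class and field names (`Word`, `Sentence`, `layers`) are hypothetical and do not reflect the app's actual schema, and the sample glosses are written by hand for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    devanagari: str  # original script
    roman: str       # Roman transliteration
    gloss: str       # word-for-word English gloss


@dataclass
class Sentence:
    words: list[Word] = field(default_factory=list)
    translation: str = ""  # fluent English translation of the whole sentence

    def layers(self) -> dict[str, str]:
        """Return the four display layers as plain strings."""
        return {
            "devanagari": " ".join(w.devanagari for w in self.words),
            "roman": " ".join(w.roman for w in self.words),
            "gloss": " ".join(w.gloss for w in self.words),
            "translation": self.translation,
        }


# Hand-written sample data (a story title), for illustration only
sentence = Sentence(
    words=[
        Word("नमक", "namak", "salt"),
        Word("का", "kā", "of"),
        Word("दारोगा", "dārogā", "inspector"),
    ],
    translation="The Salt Inspector",
)

for name, text in sentence.layers().items():
    print(f"{name:12} {text}")
```

Keeping the first three layers per word and only the translation per sentence is one way to keep the gloss aligned with the script; the real tables may be organised differently.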

See PROJECT.md for full design rationale and architecture decisions.


Prerequisites

  • Docker (for PostgreSQL)
  • Python 3.10+
  • Node.js 18+

External Services

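Three external services are used. Azure Translator and Google Cloud Text-to-Speech are called by the pipeline (one-time setup to populate the database); AWS SES is used by the backend to send authentication emails.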

Azure Translator — translates sentences and produces word-level alignment. Sign up at portal.azure.com, create a Translator resource, and configure:

AZURE_TRANSLATOR_KEY=<key>
AZURE_TRANSLATOR_REGION=eastus
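As an illustration of what word-level alignment looks like, the sketch below parses an alignment string of the form `start:end-start:end` (source range, then target range, pairs separated by spaces, indices inclusive). That format is an assumption based on the Translator documentation and should be checked against a real response; the function names and the sample string are hypothetical, and the indices are treated as plain Python string offsets.

```python
def parse_alignment(proj: str) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Parse 'a:b-c:d e:f-g:h' into [((a, b), (c, d)), ...] with inclusive indices."""
    pairs = []
    for item in proj.split():
        src, tgt = item.split("-")
        s_start, s_end = (int(n) for n in src.split(":"))
        t_start, t_end = (int(n) for n in tgt.split(":"))
        pairs.append(((s_start, s_end), (t_start, t_end)))
    return pairs


def aligned_words(source: str, target: str, proj: str) -> list[tuple[str, str]]:
    """Slice source and target text into (source_word, target_word) pairs."""
    return [
        (source[s0 : s1 + 1], target[t0 : t1 + 1])  # +1 because end indices are inclusive
        for (s0, s1), (t0, t1) in parse_alignment(proj)
    ]


# Hand-written example, not real service output
source = "नमक का दारोगा"
target = "Inspector of salt"
proj = "0:2-13:16 4:5-10:11 7:12-0:8"

pairs = aligned_words(source, target, proj)
for hindi, english in pairs:
    print(hindi, "->", english)
```

Pairs like these are the raw material for a word-for-word gloss: each Hindi word is matched to the span of the English translation it corresponds to.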

Google Cloud Text-to-Speech — generates Hindi pronunciation audio. Sign up at console.cloud.google.com, enable the Cloud Text-to-Speech API, and configure:

GOOGLE_CLOUD_API_KEY=<key>

AWS SES — sends magic link authentication emails. Create an AWS account, configure SES, and set:

AWS_ACCESS_KEY_ID=<key>
AWS_SECRET_ACCESS_KEY=<secret>
AWS_SES_REGION=us-east-1
FROM_EMAIL=noreply@yourdomain.com
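Before running the pipeline or the backend, it can help to confirm that the variables listed in this section are set. A minimal sketch; the helper `missing_vars` is hypothetical and not part of the repository.

```python
import os

# Variable names taken from the service descriptions in this section
REQUIRED = [
    "AZURE_TRANSLATOR_KEY",
    "AZURE_TRANSLATOR_REGION",
    "GOOGLE_CLOUD_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SES_REGION",
    "FROM_EMAIL",
]


def missing_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All external service variables are set.")
```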

To send from a custom domain (recommended over a Gmail address), verify it with SES:

# Verify your domain — creates DKIM CNAME records to add at your DNS provider
AWS_PROFILE=<profile> aws sesv2 create-email-identity --email-identity yourdomain.com --dkim-signing-attributes SigningAttributesOrigin=AWS_SES --region us-east-1

# Check DKIM verification status (wait ~24h after adding DNS records)
AWS_PROFILE=<profile> aws sesv2 get-email-identity --email-identity yourdomain.com --region us-east-1 --query 'DkimAttributes.Status' --output text

# Request production access (removes sandbox sending restrictions)
AWS_PROFILE=<profile> aws sesv2 put-account-details \
  --mail-type TRANSACTIONAL \
  --website-url https://yourdomain.com \
  --use-case-description "Magic link authentication for registered users" \
  --contact-language EN \
  --region us-east-1

Getting Started

1. Start PostgreSQL

docker compose up -d

2. Backend

From the backend directory:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env — set SECRET_KEY and AWS SES credentials at minimum
alembic upgrade head
python -m uvicorn app.main:app --reload

API runs at http://localhost:8000

3. Frontend

From the frontend directory:

npm install
npm run dev

App runs at http://localhost:5173

4. Pipeline (loading text content)

From the pipeline directory with the backend .venv active:

# Step 1 — fetch raw text from Wikisource
python fetch_text.py "सप्तसरोज/नमक का दारोगा"

# Step 2 — segment, translate, insert sentences
python process_sentences.py <slug>

# Step 3 — enrich each word with dictionary-level gloss
python enrich_glosses.py

# Step 4 — generate and store audio
python generate_audio.py

Steps 2–4 produce sentence-level English translations, word-level alignments, dictionary definitions, and audio files.
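The segmentation part of step 2 can be pictured with a small sketch. It splits on the danda (।), double danda (॥), question mark, and exclamation mark; `process_sentences.py` may use different rules, and this version makes no attempt to handle quotations or abbreviations. The function name `segment` is hypothetical.

```python
import re

# Split on whitespace that follows a sentence-final mark
_SENTENCE_END = re.compile(r"(?<=[।॥?!])\s+")


def segment(text: str) -> list[str]:
    """Split Hindi text into sentences, keeping the final punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


sample = "वह आया। तुम कौन हो? चलो!"
for number, sentence in enumerate(segment(sample), start=1):
    print(number, sentence)
```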

After generating audio, upload to S3 for production serving:

# Upload audio files to S3 (public-read)
aws s3 sync data/audio/ s3://<your-bucket>/audio/ --acl public-read --profile <profile>

Set AUDIO_S3_URL=https://<your-bucket>.s3.amazonaws.com in both local .env and Heroku config. The backend will redirect /audio/<path> to S3 when this variable is set, or serve the local file as a fallback.


Deployment

Backend (Heroku)

# Create Heroku app (first time only)
heroku create hi-api

# Add PostgreSQL database
heroku addons:create heroku-postgresql:essential-0 -a hi-api

# Configure environment variables (the $VARS expand from your current shell, so export the values from your .env first)
heroku config:set AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" -a hi-api
heroku config:set AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" -a hi-api
heroku config:set FROM_EMAIL="$FROM_EMAIL" -a hi-api
heroku config:set AZURE_TRANSLATOR_KEY="$AZURE_TRANSLATOR_KEY" -a hi-api
heroku config:set GOOGLE_CLOUD_API_KEY="$GOOGLE_CLOUD_API_KEY" -a hi-api
heroku config:set FRONTEND_URL=https://hi.jbm.eco -a hi-api
heroku config:set AWS_SES_REGION=us-east-1 -a hi-api
heroku config:set AUDIO_S3_URL=https://<your-bucket>.s3.amazonaws.com -a hi-api

# Deploy
git push heroku main

# Run database migrations
heroku run "cd backend && alembic upgrade head" -a hi-api

# Enable automatic SSL certificate management
heroku certs:auto:enable -a hi-api

# Set custom domain
heroku domains:add hi-api.jbm.eco -a hi-api

Update your DNS provider to point hi-api.jbm.eco to the DNS target shown by heroku domains -a hi-api.

To migrate data from a local Docker-based PostgreSQL to Heroku:

docker exec hi-db-1 pg_dump -U postgres hindi_app | psql "$(heroku config:get DATABASE_URL -a hi-api)"

Frontend (GitHub Pages)

The frontend automatically deploys via GitHub Actions when you push to main. The workflow:

  1. Builds the React app with VITE_API_URL=https://hi-api.jbm.eco
  2. Deploys the frontend/dist directory to GitHub Pages

To set up the custom domain for GitHub Pages:

  1. In the repository, go to Settings → Pages and set Source to GitHub Actions (not "Deploy from a branch")
  2. Set Custom domain to hi.jbm.eco
  3. Update your DNS provider to point hi.jbm.eco to GitHub Pages (CNAME to brantmerrell.github.io)

Project Structure

hi/
├── backend/
│   ├── app/
│   │   ├── models.py        — SQLAlchemy models (all tables)
│   │   ├── schemas.py       — Pydantic request/response schemas
│   │   ├── database.py      — async engine and session dependency
│   │   ├── main.py          — FastAPI app, CORS, router registration
│   │   └── routes/
│   │       ├── auth.py      — magic link request + verify endpoints
│   │       ├── stories.py   — list stories, list sentences
│   │       ├── sentences.py — get single sentence with word alignment
│   │       └── stats.py     — reading statistics for current user
│   ├── alembic/             — database migrations
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── SentenceView.tsx  — four-layer sentence display
│       │   ├── WordGloss.tsx     — word-by-word toggle view
│       │   ├── GlossCell.tsx     — inline-editable word gloss with override support
│       │   ├── Navigation.tsx    — previous / next sentence
│       │   └── AudioPlayer.tsx   — Hindi pronunciation playback
│       └── pages/
│           ├── Reader.tsx        — main reading page
│           ├── Stats.tsx         — reading statistics with gloss override editing
│           └── Auth.tsx          — magic link email entry
├── pipeline/
│   ├── fetch_text.py        — fetch Premchand stories from Wikisource / Internet Archive
│   ├── process_sentences.py — segment, translate (sentence-level), insert sentences + word alignment
│   ├── enrich_glosses.py    — per-word dictionary translation; populates lemmas + word_senses
│   └── generate_audio.py    — Google Cloud TTS; saves MP3s and updates sentences.audio_path
├── docker-compose.yml       — PostgreSQL 16
├── PROJECT.md               — design rationale and architecture
└── characters.md            — Devanagari character reference for the developer
