A full-stack web application that converts audio files and voice recordings to text using AI-powered transcription (Whisper Large V3 model via Groq API).
- Frontend: audexai.vercel.app
- Backend: audex-backend.onrender.com
Audex is a full-stack web application that converts audio files and voice recordings into text using AI-powered transcription. It sends audio to the Whisper Large V3 model through Groq's inference API and returns fast, accurate transcriptions in 28 languages.
    ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
    │   User Upload   │     │   Spring Boot   │     │    Groq API     │
    │   Audio File    │────▶│     Backend     │────▶│  (Whisper V3)   │
    │   or Record     │     │                 │     │                 │
    └─────────────────┘     └────────┬────────┘     └────────┬────────┘
                                     │                       │
                                     │◀──────────────────────┘
                                     │  Transcription JSON
                                     ▼
                            ┌─────────────────┐
                            │ React Frontend  │
                            │ Display Result  │
                            └─────────────────┘
- Audio Input: The user uploads an audio file (MP3, WAV, M4A, FLAC, OGG, WebM) or records live audio with the browser microphone
- Language Selection: The user selects the transcription language from 28 supported languages
- API Request: The frontend sends the audio as `multipart/form-data` to the Spring Boot backend
- Groq Processing: The backend forwards the audio to Groq's Whisper API for transcription
- Response: The transcribed text is returned and displayed with word and character stats
- Export: The user can copy the text or download it as `.txt`, `.doc`, or `.json`
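
The API Request and Response steps above can be sketched as follows. This is a minimal illustration rather than the project's own code: it uses the built-in `fetch` and `FormData` instead of Axios to stay dependency-free, and the helper names `buildTranscriptionForm` and `transcribe`, the `baseUrl` parameter, and the default filename are assumptions. The `file` and `language` fields, the `/api/transcribe` path, and the `text` response field are taken from the API description in this README.

```javascript
// Build the multipart/form-data body described in the workflow.
// Assumption: the backend reads the fields "file" and "language".
function buildTranscriptionForm(audioBlob, language = "en", filename = "recording.webm") {
  const form = new FormData();
  form.append("file", audioBlob, filename);
  form.append("language", language);
  return form;
}

// POST the audio to the backend and return the transcribed text.
// baseUrl is hypothetical, e.g. "http://localhost:8080" during local development.
async function transcribe(baseUrl, audioBlob, language = "en") {
  const response = await fetch(`${baseUrl}/api/transcribe`, {
    method: "POST",
    // No Content-Type header is set here: fetch adds the multipart boundary itself.
    body: buildTranscriptionForm(audioBlob, language),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.text;
}
```

Keeping the form construction in its own function makes the request shape easy to check without a running backend.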
| Feature | Description |
|---|---|
| Live Recording | Record audio directly in the browser using the MediaRecorder API |
| File Upload | Drag and drop or click to upload audio files |
| 28 Languages | English, Spanish, French, Hindi, Chinese, Arabic, and more |
| Dark/Light Theme | Toggle between themes with smooth transitions |
| Statistics | Word count, character count, language info |
| Multiple Export | Copy to clipboard or download as TXT, DOC, or JSON |
| Waveform Visualizer | Real-time audio visualization during recording |
| Fast Processing | Groq's inference is ~10x faster than GPU-based solutions |
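
The Statistics feature in the table above can be illustrated with a small helper. This is a sketch under assumptions: the README does not say how Audex counts words or characters, so the function name `transcriptStats`, the returned field names, and the rule that a word is any run of non-whitespace characters are all hypothetical.

```javascript
// Compute simple statistics for a transcription.
// Assumption: a "word" is any run of non-whitespace characters.
function transcriptStats(text) {
  const trimmed = text.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return {
    words,
    characters: text.length, // UTF-16 code units, whitespace included
    charactersNoSpaces: text.replace(/\s/g, "").length,
  };
}

// Example: five words, 29 characters, 25 of them non-whitespace.
const stats = transcriptStats("Audex converts speech to text");
console.log(stats); // { words: 5, characters: 29, charactersNoSpaces: 25 }
```

Splitting on whitespace works for space-delimited languages; languages such as Chinese, Japanese, or Thai would need a segmenter (for example `Intl.Segmenter`) to give meaningful word counts.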

Frontend:

| Technology | Purpose |
|---|---|
| React 19 | UI library with hooks |
| Vite 6 | Fast build tool & dev server |
| Axios | HTTP client for API calls |
| FileSaver.js | Client-side file downloads |
| CSS Variables | Dark/Light theme system |
| Web Audio API | Live recording & waveform visualization |
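
The live-recording path named in the table above can be sketched with the browser's MediaRecorder API. This is a minimal illustration, not Audex's actual component: the names `CANDIDATE_TYPES`, `pickRecordingType`, and `startRecording`, and the order of preferred MIME types, are assumptions. `startRecording` only runs in a browser with microphone access; waveform drawing is left out.

```javascript
// Candidate container/codec strings in order of preference (an assumption).
const CANDIDATE_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// Return the first type the predicate accepts, or "" to let the browser decide.
// In a browser, pass (type) => MediaRecorder.isTypeSupported(type).
function pickRecordingType(isSupported, candidates = CANDIDATE_TYPES) {
  return candidates.find((type) => isSupported(type)) ?? "";
}

// Browser-only: start recording and expose a promise for the finished Blob.
async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickRecordingType((type) => MediaRecorder.isTypeSupported(type));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  const finished = new Promise((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((track) => track.stop()); // release the microphone
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
  });
  recorder.start();
  return { stop: () => recorder.stop(), finished };
}
```

Passing the support check in as a function keeps the type selection testable outside a browser.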

Backend:

| Technology | Purpose |
|---|---|
| Spring Boot 4.0 | Java REST API framework |
| Java 21 | Long-term support (LTS) Java release |
| Spring WebFlux | Reactive HTTP client |
| RestTemplate | Multipart file forwarding |
| Groq API | AI transcription (Whisper Large V3) |

Deployment:

| Service | Component |
|---|---|
| Vercel | Frontend hosting (React) |
| Render | Backend hosting (Docker/Java) |
| GitHub | Source control & CI/CD trigger |

Unlike the standard OpenAI Whisper API, Audex uses Groq's LPU (Language Processing Unit) inference, which provides:
- ~10x faster inference than GPU-based solutions
- Near real-time transcription
- Free tier with generous limits

Architecture:
- Modern Spring Boot 4.0 with Java 21
- Clean separation of concerns
- Production-ready Docker deployment

Audio recording:
- Uses the MediaRecorder API for recording
- Uses the Web Audio API for real-time waveform visualization
- No external dependencies for audio capture

Export formats:
- Plain text (`.txt`)
- Microsoft Word compatible (`.doc`)
- Structured JSON with metadata (`.json`)

Privacy:
- Audio files are deleted from the server immediately after processing
- No permanent storage on the server
- Secure HTTPS communication
    Audex/
    ├── Audex-Frontend/                       # React + Vite
    │   ├── src/
    │   │   ├── App.jsx                       # Main component
    │   │   ├── AudioUploder.jsx              # Core transcription UI (700+ lines)
    │   │   ├── App.css                       # Theming & styles (1500+ lines)
    │   │   └── index.css                     # Global styles
    │   ├── package.json
    │   └── vite.config.js
    │
    ├── Audex-Backend/                        # Spring Boot
    │   ├── src/main/java/com/audio/transcribe/
    │   │   ├── AudioTranscribeApplication.java
    │   │   ├── TranscriptionController.java  # POST /api/transcribe
    │   │   ├── WebConfig.java                # CORS configuration
    │   │   └── WebClientConfig.java
    │   ├── Dockerfile
    │   └── pom.xml
    │
    └── README.md

- Node.js 18+
- Java 21+
- Maven 3.9+

    cd Audex-Backend
    ./mvnw spring-boot:run

The backend will start on http://localhost:8080

    cd Audex-Frontend
    npm install
    npm run dev

The frontend will start on http://localhost:5173
Request:
Content-Type: multipart/form-data
- file: (audio file)
- language: "en" | "es" | "hi" | ... (optional, default: "en")
Response:

    {
      "text": "Transcribed audio content..."
    }

| Language | Code | Language | Code |
|---|---|---|---|
| English | en | Japanese | ja |
| Spanish | es | Korean | ko |
| French | fr | Chinese | zh |
| German | de | Arabic | ar |
| Italian | it | Hindi | hi |
| Portuguese | pt | Turkish | tr |
| Dutch | nl | Vietnamese | vi |
| Polish | pl | Thai | th |
| Russian | ru | Indonesian | id |
| Swedish | sv | Ukrainian | uk |
| Danish | da | Czech | cs |
| Finnish | fi | Greek | el |
| Norwegian | no | Hebrew | he |
| Romanian | ro | Hungarian | hu |
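
The language codes in the table above can be checked on the client before a request is sent. This is a minimal sketch: the names `SUPPORTED_LANGUAGES` and `resolveLanguage` are hypothetical, and falling back to `"en"` for an unknown code is an assumption that mirrors the documented default for a missing `language` field.

```javascript
// Language codes copied from the supported-languages table (28 entries).
const SUPPORTED_LANGUAGES = new Set([
  "en", "es", "fr", "de", "it", "pt", "nl", "pl", "ru", "sv", "da", "fi", "no", "ro",
  "ja", "ko", "zh", "ar", "hi", "tr", "vi", "th", "id", "uk", "cs", "el", "he", "hu",
]);

// Normalize a user-supplied code and fall back to "en" when it is not in the table.
// Assumption: the fallback for unknown codes is a client-side choice, not backend behavior.
function resolveLanguage(code) {
  const normalized = String(code ?? "").trim().toLowerCase();
  return SUPPORTED_LANGUAGES.has(normalized) ? normalized : "en";
}
```

For example, `resolveLanguage(" HI ")` yields `"hi"`, while an unlisted code such as `"xx"` yields `"en"`.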
- Students - Transcribe lectures and study materials
- Journalists - Convert interviews to text
- Content Creators - Generate subtitles and captions
- Accessibility - Make audio content accessible
- Researchers - Transcribe qualitative interviews
- Podcasters - Create show notes and transcripts

    ┌────────────────────────────────────────────────────────────┐
    │                          FRONTEND                          │
    │  ┌──────────────────────────────────────────────────────┐  │
    │  │  React Components                                    │  │
    │  │    • Theme Toggle (Dark/Light)                       │  │
    │  │    • File Upload with Drag & Drop                    │  │
    │  │    • Live Recording with Waveform                    │  │
    │  │    • Language Selector (28 options)                  │  │
    │  │    • Result Display with Stats                       │  │
    │  │    • Export Options (Copy/Download)                  │  │
    │  └──────────────────────────────────────────────────────┘  │
    │                             │                              │
    │                         Axios POST                         │
    │                             ▼                              │
    └────────────────────────────────────────────────────────────┘
                                  │
                            HTTPS Request
                                  │
    ┌────────────────────────────────────────────────────────────┐
    │                          BACKEND                           │
    │  ┌──────────────────────────────────────────────────────┐  │
    │  │  Spring Boot REST Controller                         │  │
    │  │    • Receive multipart file                          │  │
    │  │    • Create temp file                                │  │
    │  │    • Forward to Groq API                             │  │
    │  │    • Return transcription                            │  │
    │  │    • Delete temp file                                │  │
    │  └──────────────────────────────────────────────────────┘  │
    │                             │                              │
    │                        RestTemplate                        │
    │                             ▼                              │
    └────────────────────────────────────────────────────────────┘
                                  │
                            HTTPS Request
                                  │
    ┌────────────────────────────────────────────────────────────┐
    │                          GROQ API                          │
    │  ┌──────────────────────────────────────────────────────┐  │
    │  │  Whisper Large V3 Model                              │  │
    │  │    • 10x faster than GPU inference                   │  │
    │  │    • High accuracy transcription                     │  │
    │  │    • Multi-language support                          │  │
    │  └──────────────────────────────────────────────────────┘  │
    └────────────────────────────────────────────────────────────┘

Contributions, issues, and feature requests are welcome!
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is open source and available under the MIT License.
Ayush Gupta
- GitHub: @ayuxsh009
Star this repository if you found it helpful!