ayuxsh009/Audex

Audex - AI-Powered Audio Transcription

A full-stack web application that converts audio files and voice recordings to text using AI-powered transcription (Whisper Large V3 model via Groq API).

🌐 Live Demo


📖 Project Overview

Audex transcribes uploaded audio files and live voice recordings into text. A React frontend sends the audio to a Spring Boot backend, which forwards it to the Whisper Large V3 model hosted on Groq's inference API and returns the transcription. 28 languages are supported.


🔧 How It Works

Architecture Flow

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Upload   │     │  Spring Boot    │     │    Groq API     │
│   Audio File    │────▶│    Backend      │────▶│  (Whisper V3)   │
│   or Record     │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │                       │
                                │◀──────────────────────┘
                                │    Transcription JSON
                                ▼
                        ┌─────────────────┐
                        │  React Frontend │
                        │  Display Result │
                        └─────────────────┘

Step-by-Step Process

  1. Audio Input: The user uploads an audio file (MP3, WAV, M4A, FLAC, OGG, WebM) or records live audio with the browser microphone
  2. Language Selection: The user selects the transcription language from the 28 supported languages
  3. API Request: The frontend sends the audio as multipart/form-data to the Spring Boot backend
  4. Groq Processing: The backend forwards the audio to Groq's Whisper API for transcription
  5. Response: The transcribed text is returned and displayed with word and character counts
  6. Export: The user can copy the text or download it as .txt, .doc, or .json
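
The word and character stats in step 5 can be sketched as below. This is a minimal illustration written in Java to match the backend; Audex may well compute these counts in the React frontend instead, and the class and method names (`TranscriptStats`, `wordCount`, `charCount`) are hypothetical.

```java
/** Hypothetical helper: word and character counts for a transcript. */
final class TranscriptStats {
    private TranscriptStats() {}

    /** Counts whitespace-separated words; null or blank input has zero words. */
    static int wordCount(String text) {
        if (text == null || text.isBlank()) {
            return 0;
        }
        return text.strip().split("\\s+").length;
    }

    /** Counts Unicode code points, so a character outside the BMP counts once. */
    static int charCount(String text) {
        return text == null ? 0 : text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String transcript = "Hello from Audex";
        System.out.println(wordCount(transcript) + " words, "
            + charCount(transcript) + " characters");
    }
}
```

Counting code points rather than UTF-16 units is a design choice for this sketch; a frontend using JavaScript's `length` would report UTF-16 units instead.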

✨ Key Features

| Feature | Description |
| --- | --- |
| 🎤 Live Recording | Record audio directly in the browser using the MediaRecorder API |
| 📁 File Upload | Drag & drop or click to upload audio files |
| 🌍 28 Languages | English, Spanish, French, Hindi, Chinese, Arabic, etc. |
| 🌙 Dark/Light Theme | Toggle between themes with smooth transitions |
| 📊 Statistics | Word count, character count, language info |
| 📋 Multiple Export | Copy to clipboard, download as TXT/DOC/JSON |
| 🎨 Waveform Visualizer | Real-time audio visualization during recording |
| ⚡ Fast Processing | Groq-hosted Whisper inference, described by the project as roughly 10x faster than GPU-based alternatives |

🛠️ Tech Stack

Frontend

| Technology | Purpose |
| --- | --- |
| React 19 | UI library with hooks |
| Vite 6 | Fast build tool & dev server |
| Axios | HTTP client for API calls |
| FileSaver.js | Client-side file downloads |
| CSS Variables | Dark/Light theme system |
| Web Audio API | Live recording & waveform visualization |

Backend

| Technology | Purpose |
| --- | --- |
| Spring Boot 4.0 | Java REST API framework |
| Java 21 | Long-term support (LTS) release |
| Spring WebFlux | Reactive HTTP client |
| RestTemplate | Multipart file forwarding |
| Groq API | AI transcription (Whisper Large V3) |

Deployment

| Service | Component |
| --- | --- |
| Vercel | Frontend hosting (React) |
| Render | Backend hosting (Docker/Java) |
| GitHub | Source control & CI/CD trigger |

🚀 Novelty & Unique Aspects

1. Groq-Powered Speed

Instead of calling the OpenAI Whisper API, Audex runs Whisper on Groq's LPU (Language Processing Unit), which the project credits with:

  • Roughly 10x faster inference than GPU-based solutions
  • Near real-time transcription
  • Free tier with generous limits

2. Full-Stack Java + React Architecture

  • Modern Spring Boot 4.0 with Java 21
  • Clean separation of concerns
  • Production-ready Docker deployment

3. Browser-Native Recording

  • Uses MediaRecorder API for recording
  • Web Audio API for real-time waveform visualization
  • No external dependencies for audio capture

4. Multi-Format Export

  • Plain text (.txt)
  • Microsoft Word compatible (.doc)
  • Structured JSON with metadata (.json)
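
As an illustration of the structured JSON export, the sketch below assembles a small JSON document from a transcript. The README does not list the exported fields, so the field names (`text`, `language`, `wordCount`, `characterCount`) and the class `JsonExportSketch` are assumptions, and the real export is presumably produced in the frontend rather than in Java.

```java
/** Hypothetical sketch of the JSON export; the field names are assumptions. */
final class JsonExportSketch {
    private JsonExportSketch() {}

    /** Escapes the characters that must be escaped inside a JSON string. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                out.append("\\\"");
            } else if (c == '\\') {
                out.append("\\\\");
            } else if (c == '\n') {
                out.append("\\n");
            } else if (c == '\r') {
                out.append("\\r");
            } else if (c == '\t') {
                out.append("\\t");
            } else if (c < 0x20) {
                // Remaining control characters use the \ u XXXX form.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the export document: the transcript plus simple metadata. */
    static String toJson(String text, String language) {
        int words = text.isBlank() ? 0 : text.strip().split("\\s+").length;
        return "{"
            + "\"text\":\"" + escape(text) + "\","
            + "\"language\":\"" + escape(language) + "\","
            + "\"wordCount\":" + words + ","
            + "\"characterCount\":" + text.length()
            + "}";
    }
}
```

In a real project a JSON library would normally handle the escaping; it is written out by hand here only to keep the sketch free of dependencies.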

5. Privacy-First Design

  • Uploaded audio is written to a temporary file, forwarded to Groq, and deleted as soon as the request completes
  • No permanent storage on server
  • Secure HTTPS communication
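
The "process, then delete" behaviour can be sketched with a temporary file that is removed in a `finally` block, so cleanup also happens when transcription fails. This is an assumption about how such a guarantee could be implemented, not the actual controller code; `TempAudioSketch`, `transcribeAndDelete`, and the `transcriber` callback are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical sketch: keep uploaded audio on disk only for one request. */
final class TempAudioSketch {
    private TempAudioSketch() {}

    /**
     * Writes the audio to a temp file, passes the path to the transcriber
     * (a stand-in for the call to Groq), and always deletes the file afterwards.
     */
    static String transcribeAndDelete(byte[] audio, Function<Path, String> transcriber) {
        Path temp = null;
        try {
            temp = Files.createTempFile("audex-", ".audio");
            Files.write(temp, audio);
            return transcriber.apply(temp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (temp != null) {
                try {
                    Files.deleteIfExists(temp);
                } catch (IOException ignored) {
                    // Best-effort cleanup; a real service would log this failure.
                }
            }
        }
    }
}
```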

📁 Project Structure

Audex/
├── Audex-Frontend/              # React + Vite
│   ├── src/
│   │   ├── App.jsx              # Main component
│   │   ├── AudioUploder.jsx     # Core transcription UI (700+ lines)
│   │   ├── App.css              # Theming & styles (1500+ lines)
│   │   └── index.css            # Global styles
│   ├── package.json
│   └── vite.config.js
│
├── Audex-Backend/               # Spring Boot
│   ├── src/main/java/com/audio/transcribe/
│   │   ├── AudioTranscribeApplication.java
│   │   ├── TranscriptionController.java   # POST /api/transcribe
│   │   ├── WebConfig.java                 # CORS configuration
│   │   └── WebClientConfig.java
│   ├── Dockerfile
│   └── pom.xml
│
└── README.md

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • Java 21+
  • Maven 3.9+

Backend Setup

cd Audex-Backend
./mvnw spring-boot:run

The backend will start on http://localhost:8080

Frontend Setup

cd Audex-Frontend
npm install
npm run dev

The frontend will start on http://localhost:5173


🔌 API Endpoint

POST /api/transcribe

Request:

Content-Type: multipart/form-data
- file: (audio file)
- language: "en" | "es" | "hi" | ... (optional, default: "en")

Response:

{
  "text": "Transcribed audio content..."
}
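
A client call to this endpoint could look like the sketch below, which builds the multipart/form-data body by hand and posts it with the JDK's `java.net.http.HttpClient`. The part names `file` and `language` come from the request description above; the class name, the boundary format, and the `application/octet-stream` part type are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client sketch for POST /api/transcribe. */
final class TranscribeClientSketch {
    private TranscribeClientSketch() {}

    /** Builds a multipart body with a "file" part followed by a "language" part. */
    static byte[] multipartBody(String boundary, String fileName, byte[] audio, String language) {
        String head = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: application/octet-stream\r\n\r\n";
        String tail = "\r\n--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"language\"\r\n\r\n"
            + language + "\r\n"
            + "--" + boundary + "--\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        byte[] tailBytes = tail.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(audio, 0, audio.length);
        out.write(tailBytes, 0, tailBytes.length);
        return out.toByteArray();
    }

    /** Posts the audio and returns the raw JSON response body, e.g. {"text": "..."}. */
    static String transcribe(String baseUrl, String fileName, byte[] audio, String language)
            throws Exception {
        String boundary = "audex-" + System.nanoTime();
        byte[] body = multipartBody(boundary, fileName, audio, language);
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/transcribe"))
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With the backend running locally, a call such as `TranscribeClientSketch.transcribe("http://localhost:8080", "clip.wav", bytes, "en")` would be expected to return the JSON shown in the response example, assuming the server accepts this part layout.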

🌍 Supported Languages

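| Language | Code | Language | Code |
| --- | --- | --- | --- |
| English | en | Japanese | ja |
| Spanish | es | Korean | ko |
| French | fr | Chinese | zh |
| German | de | Arabic | ar |
| Italian | it | Hindi | hi |
| Portuguese | pt | Turkish | tr |
| Dutch | nl | Vietnamese | vi |
| Polish | pl | Thai | th |
| Russian | ru | Indonesian | id |
| Swedish | sv | Ukrainian | uk |
| Danish | da | Czech | cs |
| Finnish | fi | Greek | el |
| Norwegian | no | Hebrew | he |
| Romanian | ro | Hungarian | hu |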
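
The language codes above, together with the documented default of "en", suggest a simple validation step. The sketch below is one way to express it; whether the real backend rejects unknown codes or passes them through to Groq is not stated in this README, so the rejection behaviour and the names `LanguageCodes` and `resolve` are assumptions.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: resolve the optional "language" request parameter. */
final class LanguageCodes {
    private LanguageCodes() {}

    /** The 28 codes listed in the Supported Languages table. */
    static final Set<String> SUPPORTED = Set.of(
        "en", "es", "fr", "de", "it", "pt", "nl", "pl", "ru", "sv",
        "da", "fi", "no", "ro", "ja", "ko", "zh", "ar", "hi", "tr",
        "vi", "th", "id", "uk", "cs", "el", "he", "hu");

    /** Returns "en" when no language is given; rejects codes outside the table. */
    static String resolve(String requested) {
        if (requested == null || requested.isBlank()) {
            return "en";
        }
        String code = requested.strip().toLowerCase(Locale.ROOT);
        if (!SUPPORTED.contains(code)) {
            throw new IllegalArgumentException("Unsupported language code: " + requested);
        }
        return code;
    }
}
```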

🎯 Use Cases

  1. Students - Transcribe lectures and study materials
  2. Journalists - Convert interviews to text
  3. Content Creators - Generate subtitles and captions
  4. Accessibility - Make audio content accessible
  5. Researchers - Transcribe qualitative interviews
  6. Podcasters - Create show notes and transcripts

🏗️ System Architecture

┌────────────────────────────────────────────────────────────┐
│                          FRONTEND                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  React Components                                    │  │
│  │  • Theme Toggle (Dark/Light)                         │  │
│  │  • File Upload with Drag & Drop                      │  │
│  │  • Live Recording with Waveform                      │  │
│  │  • Language Selector (28 options)                    │  │
│  │  • Result Display with Stats                         │  │
│  │  • Export Options (Copy/Download)                    │  │
│  └──────────────────────────────────────────────────────┘  │
│                           │                                │
│                     Axios POST                             │
│                           ▼                                │
└────────────────────────────────────────────────────────────┘
                            │
                      HTTPS Request
                            │
┌────────────────────────────────────────────────────────────┐
│                          BACKEND                           │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Spring Boot REST Controller                         │  │
│  │  • Receive multipart file                            │  │
│  │  • Create temp file                                  │  │
│  │  • Forward to Groq API                               │  │
│  │  • Return transcription                              │  │
│  │  • Delete temp file                                  │  │
│  └──────────────────────────────────────────────────────┘  │
│                           │                                │
│                     RestTemplate                           │
│                           ▼                                │
└────────────────────────────────────────────────────────────┘
                            │
                      HTTPS Request
                            │
┌────────────────────────────────────────────────────────────┐
│                          GROQ API                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Whisper Large V3 Model                              │  │
│  │  • 10x faster than GPU inference                     │  │
│  │  • High accuracy transcription                       │  │
│  │  • Multi-language support                            │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘

🤝 Contributing

Contributions, issues, and feature requests are welcome!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is open source and available under the MIT License.


👨‍💻 Author

Ayush Gupta


⭐ Star this repository if you found it helpful!