A web application that combines speech-to-text functionality with intelligent grocery list management. Built with React and TypeScript, and powered by OpenAI's GPT models for natural language processing.
(A GIF of the application in action would be a great addition here!)
- ✅ Real-time speech transcription with auto-restart capability
- ✅ Intelligent grocery list management using natural language
- ✅ Recipe ingredient extraction from text and URLs
- ✅ Persistent storage - your grocery list survives server restarts
- ✅ Automatic backup system - protects against data loss
- ✅ Manual text editing and clearing
- ✅ Copy transcribed text to clipboard (with fallback and status)
- ✅ Modern, responsive user interface
- AI-powered grocery list processing - understands natural language like "add milk", "remove bread", "take out the last item"
- Recipe parsing - paste recipe text and automatically extract ingredients
- URL recipe parsing - paste recipe URLs and extract ingredients automatically
- Smart duplicate prevention - avoids adding duplicate items to your list
- Persistent data storage - file-based storage that can be easily migrated to a database
- Automatic backups - creates timestamped backups before each save operation
- Backup restoration - restore from any previous backup via API
- Auto-restart recording - configurable option to keep recording continuously
- Uses the browser's built-in Web Speech API (`SpeechRecognition`) for real-time speech transcription
- React with TypeScript for type-safe, maintainable code
- Custom React Hook (`useSpeechRecognition`) manages speech recognition state and logic
- All audio processing happens directly in the browser for privacy and speed
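The auto-restart behaviour that the hook manages could be sketched roughly as follows. This is a minimal illustration, not the project's actual hook: `RecognizerLike` and `createAutoRestart` are hypothetical names, and the recognizer is assumed to expose the `start()`, `stop()`, and `onend` members of the browser's `SpeechRecognition` object.

```typescript
// Minimal subset of the browser SpeechRecognition interface used in this sketch.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

interface AutoRestartController {
  start(): void;
  stop(): void;
  isListening(): boolean;
}

// Hypothetical helper: keeps a recognizer running until stop() is called.
function createAutoRestart(
  recognizer: RecognizerLike,
  autoRestart: boolean,
): AutoRestartController {
  let listening = false;

  recognizer.continuous = true;
  recognizer.interimResults = true;

  // Browsers end a session after silence or a timeout. Restart only if the
  // user has not asked to stop and auto-restart is enabled.
  recognizer.onend = () => {
    if (listening && autoRestart) {
      recognizer.start();
    } else {
      listening = false;
    }
  };

  return {
    start() {
      if (listening) return; // starting an already active session throws in browsers
      listening = true;
      recognizer.start();
    },
    stop() {
      listening = false; // cleared first so the onend handler does not restart
      recognizer.stop();
    },
    isListening: () => listening,
  };
}
```

Inside a React hook, a controller like this would typically be created once (for example in a ref) and stopped in the effect cleanup.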
- Express.js API server with CORS support
- OpenAI GPT-4o-mini integration for natural language understanding
- File-based persistent storage (easily replaceable with database)
- Web scraping capabilities for recipe URL parsing using Cheerio
- Grocery List Processing: Converts natural language into structured grocery list updates
- Recipe Parsing: Extracts and normalizes ingredients from recipe text
- URL Recipe Parsing: Fetches and parses recipes from web URLs
- Smart Deduplication: Prevents duplicate items using fuzzy matching
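As a rough sketch of how structured updates might be applied to the list with duplicate prevention: the `GroceryUpdate` shape below is an assumption (the schema the model actually returns is not documented here), and matching on a normalized name is a simple stand-in for the fuzzy matching the app uses.

```typescript
// Hypothetical update shape; the real schema returned by the model may differ.
type GroceryUpdate =
  | { action: "add"; item: string }
  | { action: "remove"; item: string }
  | { action: "remove_last" }
  | { action: "clear" };

// Normalizes a name for comparison: trims, lowercases, collapses whitespace,
// and strips one trailing "s" as a naive plural rule ("eggs" -> "egg").
function normalizeItem(name: string): string {
  const base = name.trim().toLowerCase().replace(/\s+/g, " ");
  if (base.endsWith("s") && !base.endsWith("ss")) return base.slice(0, -1);
  return base;
}

// Applies updates in order and returns a new list; the input is not mutated.
function applyUpdates(
  list: readonly string[],
  updates: readonly GroceryUpdate[],
): string[] {
  let next = [...list];
  for (const update of updates) {
    switch (update.action) {
      case "add": {
        const key = normalizeItem(update.item);
        const exists = next.some((entry) => normalizeItem(entry) === key);
        if (!exists && key !== "") next.push(update.item.trim());
        break;
      }
      case "remove": {
        const key = normalizeItem(update.item);
        next = next.filter((entry) => normalizeItem(entry) !== key);
        break;
      }
      case "remove_last":
        next = next.slice(0, -1);
        break;
      case "clear":
        next = [];
        break;
    }
  }
  return next;
}
```

Under these assumptions, an utterance such as "add milk and take out the last item" would first be turned into a list of updates by the model, then applied with `applyUpdates`.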
- Framework: React 19 with TypeScript
- Build Tool: Vite
- Testing: Vitest & React Testing Library
- Styling: CSS Modules
- Runtime: Node.js with Express.js
- AI: OpenAI GPT-4o-mini via LangChain
- Storage: File-based JSON storage (fs-extra)
- Web Scraping: Cheerio for HTML parsing
- Testing: Supertest for API testing
The Web Speech API is currently a non-standard technology. This application is fully functional in Chromium-based browsers:
- ✅ Google Chrome
- ✅ Microsoft Edge
- ❌ Mozilla Firefox (Not Supported)
- ❌ Apple Safari (Not Supported)
A future update will include a cloud-based fallback to support all browsers.
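A feature check along these lines can decide whether to show the recording UI or an "unsupported browser" notice. It is only a sketch: `SpeechScope`, `findSpeechRecognition`, and `isSpeechRecognitionSupported` are hypothetical names, and the scope is passed in explicitly so the check can also run outside a browser.

```typescript
// The parts of the global scope this check looks at; in a browser this is `window`.
interface SpeechScope {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Returns the recognition constructor if one is exposed, otherwise null.
// Chromium-based browsers expose it under the `webkit` prefix.
function findSpeechRecognition(scope: SpeechScope): unknown {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

function isSpeechRecognitionSupported(scope: SpeechScope): boolean {
  return findSpeechRecognition(scope) !== null;
}
```

In the app, the scope argument would be the browser's `window` object.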
- Node.js 18+
- OpenAI API key
- Clone the repository: `git clone <repository-url>`, then `cd speech-to-text-react`
- Install dependencies: `npm install`
- Set up environment variables: create a `.env` file in the project root containing `OPENAI_API_KEY=your_openai_api_key_here` and `PORT=8787`, each on its own line
- Start the backend server: `npm run server`
- Start the frontend development server (in a new terminal): `npm run dev`
- Open your browser and navigate to `http://localhost:5173` (or the port shown in your terminal)
- `npm run dev` - Starts the frontend development server
- `npm run server` - Starts the backend API server
- `npm run build` - Builds the app for production
- `npm run test` - Runs all tests in watch mode
- `npm run test:run` - Runs all tests once
- `npm run test:ui` - Opens the Vitest UI for interactive testing
- `npm run test:coverage` - Runs tests with coverage report
- `npm run lint` - Runs ESLint
- `npm run format` - Formats code with Prettier
The application includes comprehensive test coverage:
- Unit Tests: Individual component and hook testing
- Integration Tests: End-to-end workflow testing
- API Tests: Backend endpoint testing with persistent storage
- Component Tests: UI interaction and state management testing
Run tests with `npm run test:run`.
speech-to-text-react/
├── src/
│   ├── components/               # React components
│   │   ├── GroceryPane.tsx       # Grocery list management
│   │   └── TranscriptionPad.tsx  # Speech-to-text interface
│   ├── hooks/                    # Custom React hooks
│   │   └── useSpeechRecognition.ts
│   ├── __tests__/                # Frontend tests
│   └── App.tsx                   # Main application component
├── server/
│   ├── index.ts                  # Express API server
│   ├── storage.ts                # Persistent storage abstraction
│   ├── backup.ts                 # Backup management system
│   └── __tests__/                # Backend tests
├── data/                         # Persistent storage directory
│   ├── grocery-list.json         # Main grocery list data
│   └── backups/                  # Automatic backup files
└── public/                       # Static assets
- `OPENAI_API_KEY` - Required for AI-powered features
- `PORT` - Backend server port (default: 8787)
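One way to read and validate these two variables at startup is sketched below. `ServerConfig` and `loadConfig` are hypothetical names, and the validation rules (non-empty key, integer port between 1 and 65535) are assumptions rather than the server's documented behaviour.

```typescript
interface ServerConfig {
  openAiApiKey: string;
  port: number;
}

const DEFAULT_PORT = 8787;

// Hypothetical helper: builds a validated config from an environment map.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const openAiApiKey = env["OPENAI_API_KEY"]?.trim();
  if (!openAiApiKey) {
    throw new Error("OPENAI_API_KEY is required for AI-powered features");
  }

  // Fall back to the documented default when PORT is unset or empty.
  const rawPort = env["PORT"]?.trim();
  const port = rawPort ? Number(rawPort) : DEFAULT_PORT;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${rawPort}`);
  }

  return { openAiApiKey, port };
}
```

On the server this would be called with `process.env`; failing fast on a missing key avoids discovering the problem on the first AI request.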
The app uses file-based storage by default, stored in `data/grocery-list.json`. This can be easily replaced with a database by implementing the `GroceryListStorage` interface in `server/storage.ts`.
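To illustrate what swapping the storage backend might involve, here is a sketch with an in-memory implementation. The method names (`load`, `save`, `clear`) and the `string[]` item type are assumptions; check the actual `GroceryListStorage` definition in `server/storage.ts` before relying on this shape.

```typescript
// Hypothetical shape; the real GroceryListStorage in server/storage.ts may differ.
interface GroceryListStorage {
  load(): Promise<string[]>;
  save(items: string[]): Promise<void>;
  clear(): Promise<void>;
}

// In-memory stand-in, usable in tests or as a template for a database adapter.
class InMemoryGroceryListStorage implements GroceryListStorage {
  private items: string[] = [];

  async load(): Promise<string[]> {
    return [...this.items]; // copy so callers cannot mutate internal state
  }

  async save(items: string[]): Promise<void> {
    this.items = [...items];
  }

  async clear(): Promise<void> {
    this.items = [];
  }
}
```

A database-backed adapter would implement the same three methods with queries, leaving the API routes unchanged.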
npm run build
Ensure your production environment has:
- Node.js 18+
- OpenAI API key configured
- Port 8787 available (or configure different port)
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests (`npm run test:run`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- `POST /api/grocery` - Process natural language grocery instructions
- `GET /api/grocery` - Retrieve current grocery list
- `DELETE /api/grocery` - Clear the grocery list
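A client for these endpoints might build its requests as in the sketch below. The request body shape is not documented here, so the `text` field is an assumption, and `JsonRequest`, `buildGroceryInstruction`, and `buildClearGroceryList` are hypothetical helpers.

```typescript
interface JsonRequest {
  url: string;
  method: "GET" | "POST" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

// Removes trailing slashes so paths can be appended safely.
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Assumption: the server accepts a JSON body with a `text` field.
function buildGroceryInstruction(baseUrl: string, text: string): JsonRequest {
  return {
    url: `${trimBase(baseUrl)}/api/grocery`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}

function buildClearGroceryList(baseUrl: string): JsonRequest {
  return {
    url: `${trimBase(baseUrl)}/api/grocery`,
    method: "DELETE",
    headers: {},
  };
}
```

The resulting object maps directly onto a `fetch(url, { method, headers, body })` call against the backend, which listens on port 8787 by default.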
- `POST /api/recipe` - Extract ingredients from recipe text
- `POST /api/recipe-url` - Extract ingredients from recipe URL
- `GET /api/backups` - List all available backups
- `POST /api/backups/create` - Create a manual backup
- `POST /api/backups/restore/:backupName` - Restore from a specific backup
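The sketch below shows one way timestamped backup names could be generated, ordered, and checked before being used in a restore path. The `grocery-list-<timestamp>.json` naming scheme and all three helper names are assumptions for illustration; the actual format produced by `server/backup.ts` may differ.

```typescript
// Hypothetical naming scheme: an ISO timestamp with ':' and '.' replaced by '-'
// so the name is safe on all common filesystems.
function backupFileName(date: Date): string {
  const stamp = date.toISOString().replace(/[:.]/g, "-");
  return `grocery-list-${stamp}.json`;
}

// ISO timestamps sort lexicographically in chronological order,
// so the last name after sorting is the most recent backup.
function latestBackup(names: readonly string[]): string | undefined {
  return [...names].sort().pop();
}

// Rejects names containing path separators or other unexpected characters,
// guarding a route like /api/backups/restore/:backupName against path traversal.
function isSafeBackupName(name: string): boolean {
  return /^[A-Za-z0-9._-]+\.json$/.test(name);
}
```

A restore handler would typically reject any `backupName` that fails the safety check before touching the filesystem.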
- Audio Processing: All speech recognition happens in your browser - audio never leaves your device
- Data Storage: Grocery lists are stored locally on your server
- API Keys: OpenAI API key is only used for text processing, not audio
Issue | Severity | Status |
---|---|---|
No browser fallback for non-Chromium browsers | Medium | Planned |
No accessibility (a11y) review | Medium | Planned |
No mobile device testing | Low | Planned |
No error boundary for React | Low | Planned |
No cloud-based fallback for speech recognition | Medium | Planned |
No privacy policy or data handling notice | Low | Planned |
Legend: High = blocks core use; Medium = impacts key features or UX; Low = minor or edge-case issue.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ using React, TypeScript, and OpenAI