TanStack · joksas · May 26, 2026 · May 27, 2026
diff --git a/.changeset/feat-groq-ai-transcription.md b/.changeset/feat-groq-ai-transcription.md
@@ -0,0 +1,7 @@
+---
+'@tanstack/ai-groq': minor
+---
+
+Adds a Groq transcription (speech-to-text) adapter. Groq's API is largely compatible with the OpenAI SDK,
+but its transcription endpoint also accepts an audio URL in place of an uploaded file, so
+the adapter calls the endpoint directly instead of going through the SDK.
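The URL pass-through described in the changeset could be sketched as below. This is not the adapter's actual code: `toGroqAudioField` and `GroqAudioInput` are hypothetical names, and the sketch assumes Groq's multipart transcription endpoint takes either a `url` field or a `file` part, as its public API reference describes. `https://` strings are forwarded untouched; every other input is turned into a `Blob` for upload.

```typescript
// Hypothetical input type mirroring the documented `audio` options
// (File extends Blob, so it is covered by Blob).
type GroqAudioInput = Blob | ArrayBuffer | string

type GroqAudioField =
  | { name: 'url'; value: string }
  | { name: 'file'; value: Blob }

// Hypothetical helper: decide whether audio is sent as a `url` field
// (Groq fetches it itself) or uploaded as a `file` part.
function toGroqAudioField(audio: GroqAudioInput): GroqAudioField {
  if (typeof audio === 'string') {
    // Remote audio: forward the URL instead of re-uploading the bytes.
    if (audio.startsWith('https://')) {
      return { name: 'url', value: audio }
    }
    // Data URL ("data:<mime>;base64,<payload>"); anything else is treated
    // as bare base64 (atob throws on input that is not valid base64).
    const match = /^data:([^;,]*);base64,(.*)$/.exec(audio)
    const mime = match ? match[1] : 'application/octet-stream'
    const binary = atob(match ? match[2] : audio)
    const bytes = new Uint8Array(binary.length)
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
    return { name: 'file', value: new Blob([bytes], { type: mime }) }
  }
  if (audio instanceof ArrayBuffer) {
    // Raw bytes: wrap them so they can be appended to FormData.
    return { name: 'file', value: new Blob([audio]) }
  }
  // Blob or File: upload unchanged.
  return { name: 'file', value: audio }
}
```

Keeping the decision in one function returning a discriminated union lets the caller append either `form.append('url', …)` or `form.append('file', …)` without re-checking the input type.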
diff --git a/docs/adapters/groq.md b/docs/adapters/groq.md
@@ -2,7 +2,7 @@
 title: Groq
 id: groq-adapter
 order: 6
-description: "Use Groq's fast inference API with TanStack AI for low-latency LLM responses — Llama and other open-weight models via @tanstack/ai-groq."
+description: "Use Groq's fast inference API with TanStack AI for low-latency LLM responses and Whisper transcription — Llama and other open-weight models via @tanstack/ai-groq."
 keywords:
   - tanstack ai
   - groq
@@ -11,9 +11,11 @@ keywords:
   - low latency
   - adapter
   - llm
+  - whisper
+  - transcription
 ---
 
-The Groq adapter provides access to Groq's fast inference API, featuring the world's fastest LLM inference.
+The Groq adapter provides access to Groq's low-latency inference API for LLM chat and Whisper-based audio transcription.
 
 ## Installation
 
@@ -108,6 +110,32 @@ const stream = chat({
 });
 ```
 
+## Transcription
+
+The `groqTranscription()` adapter provides Groq's Whisper-based speech-to-text through the `generateTranscription()` activity. The `audio` input accepts a `File`, `Blob`, `ArrayBuffer`, base64 string, data URL, or an `https://` URL (forwarded directly to Groq without re-uploading).
+
+```typescript
+import { generateTranscription } from "@tanstack/ai";
+import { groqTranscription } from "@tanstack/ai-groq";
+
+const result = await generateTranscription({
+  adapter: groqTranscription("whisper-large-v3-turbo"),
+  audio: "https://example.com/recording.mp3",
+  language: "en",
+});
+
+console.log(result.text);
+
+// verbose_json (the default) populates language, duration, and timestamped segments
+for (const segment of result.segments ?? []) {
+  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`);
+}
+```
+
+Supported models: `whisper-large-v3-turbo`, `whisper-large-v3`. Supported `responseFormat` values: `json`, `text`, `verbose_json` (default). `srt` and `vtt` are not supported by Groq.
+
+See [Transcription](../media/transcription) for the full API.
+
 ## Model Options
 
 Groq supports various provider-specific options:
@@ -197,11 +225,14 @@ Creates a Groq chat adapter with an explicit API key.
 
 **Returns:** A Groq chat adapter instance.
 
+### `groqTranscription(model, config?)` / `createGroqTranscription(model, apiKey, config?)`
+
+Creates a Groq transcription (speech-to-text) adapter. The short form reads `GROQ_API_KEY` from the environment; the `create*` form takes an explicit API key. Supported models: `whisper-large-v3-turbo`, `whisper-large-v3`.
+
 ## Limitations
 
 - **Text-to-Speech**: Groq does not currently expose a TTS adapter. Use OpenAI, Gemini, ElevenLabs, or fal for speech generation.
 - **Image Generation**: Groq does not support image generation. Use OpenAI, Gemini, or fal for image generation.
-- **Transcription**: Groq does not currently expose a transcription adapter through TanStack AI.
 
 ## Next Steps
 

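For orientation, the raw request that `groqTranscription()` abstracts over might look roughly like the sketch below: a multipart POST to Groq's OpenAI-compatible transcription endpoint, with the audio passed by URL and the key taken from `GROQ_API_KEY`. The endpoint path and form-field names (`model`, `url`, `language`, `response_format`) follow Groq's public API reference as we understand it; `buildTranscriptionRequest` is a hypothetical helper and not part of `@tanstack/ai-groq`. It only builds the `Request`, so it runs without network access.

```typescript
// Groq's OpenAI-compatible transcription endpoint (assumed from Groq's API reference).
const GROQ_TRANSCRIPTIONS_URL = 'https://api.groq.com/openai/v1/audio/transcriptions'

interface TranscriptionRequestOptions {
  model: 'whisper-large-v3-turbo' | 'whisper-large-v3'
  audioUrl: string
  language?: string
  responseFormat?: 'json' | 'text' | 'verbose_json'
}

// Hypothetical helper: builds (but does not send) the transcription request.
function buildTranscriptionRequest(apiKey: string, opts: TranscriptionRequestOptions): Request {
  const form = new FormData()
  form.append('model', opts.model)
  // The URL is sent as a plain field; Groq downloads the audio itself.
  form.append('url', opts.audioUrl)
  form.append('response_format', opts.responseFormat ?? 'verbose_json')
  if (opts.language) form.append('language', opts.language)

  return new Request(GROQ_TRANSCRIPTIONS_URL, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    // The Request constructor derives the multipart Content-Type (with boundary) from FormData.
    body: form,
  })
}

// Usage: like groqTranscription(), read the key from the environment.
const request = buildTranscriptionRequest(process.env.GROQ_API_KEY ?? 'test-key', {
  model: 'whisper-large-v3-turbo',
  audioUrl: 'https://example.com/recording.mp3',
  language: 'en',
})
// Sending it would be: const json = await (await fetch(request)).json()
```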
diff --git a/docs/media/transcription.md b/docs/media/transcription.md
@@ -2,7 +2,7 @@
 title: Transcription
 id: transcription
 order: 4
-description: "Transcribe audio to text with OpenAI Whisper and GPT-4o-transcribe via TanStack AI's generateTranscription() API."
+description: "Transcribe audio to text with OpenAI Whisper, GPT-4o-transcribe, Groq Whisper, and fal.ai STT models via TanStack AI's generateTranscription() API."
 keywords:
   - tanstack ai
   - transcription
@@ -11,18 +11,21 @@ keywords:
   - whisper
   - generateTranscription
   - openai
+  - groq
+  - fal
 ---
 
 # Audio Transcription
 
-TanStack AI provides support for audio transcription (speech-to-text) through dedicated transcription adapters. This guide covers how to convert spoken audio into text using OpenAI's Whisper and GPT-4o transcription models.
+TanStack AI provides support for audio transcription (speech-to-text) through dedicated transcription adapters. This guide covers how to convert spoken audio into text using OpenAI's Whisper and GPT-4o transcription models, Groq's hosted Whisper models, and fal.ai STT models.
 
 ## Overview
 
 Audio transcription is handled by transcription adapters that follow the same tree-shakeable architecture as other adapters in TanStack AI.
 
 Currently supported:
 - **OpenAI**: Whisper-1, GPT-4o-transcribe, GPT-4o-mini-transcribe
+- **Groq**: whisper-large-v3-turbo, whisper-large-v3
 - **fal.ai**: Whisper, Wizper, speech-to-text turbo, ElevenLabs speech-to-text
 
 ## Basic Usage
@@ -76,6 +79,31 @@ const result = await generateTranscription({
 })
 ```
 
+### Groq Transcription
+
+Groq hosts Whisper large-v3 and large-v3-turbo on its fast inference stack. The `audio` input accepts a `File`, `Blob`, `ArrayBuffer`, base64 string, data URL, or an `https://` URL (which is forwarded to Groq without re-uploading).
+
+```typescript
+import { generateTranscription } from '@tanstack/ai'
+import { groqTranscription } from '@tanstack/ai-groq'
+
+const result = await generateTranscription({
+  adapter: groqTranscription('whisper-large-v3-turbo'),
+  audio: 'https://example.com/recording.mp3',
+  language: 'en',
+})
+
+console.log(result.text)
+console.log(result.language)
+
+// verbose_json is the default — segments carry segment-level start/end timestamps
+for (const segment of result.segments ?? []) {
+  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
+}
+```
+
+> **Note:** Groq supports `responseFormat` values `json`, `text`, and `verbose_json` (default). `srt` and `vtt` are not supported — passing them throws. Provider-specific `modelOptions` are `temperature` and `timestamp_granularities` (`['word']`, `['segment']`, or both).
+
 ### fal.ai Transcription
 
 fal.ai offers Whisper, Wizper, and other STT models. The `audio` input accepts a URL, `File`, `Blob`, or `ArrayBuffer` (auto-wrapped in a `Blob`).
@@ -171,16 +199,18 @@ interface TranscriptionResult {
   text: string         // Full transcribed text
   language?: string    // Detected/specified language
   duration?: number    // Audio duration in seconds
-  segments?: Array<{   // Timestamped segments
+  segments?: Array<{   // Segment-level timestamps
+    id: number         // Segment identifier
     start: number      // Start time in seconds
     end: number        // End time in seconds
     text: string       // Segment text
-    words?: Array<{    // Word-level timestamps
-      word: string
-      start: number
-      end: number
-      confidence?: number
-    }>
+    confidence?: number // Confidence score (0-1)
+    speaker?: string   // Speaker identifier, if diarization is enabled
+  }>
+  words?: Array<{      // Word-level timestamps
+    word: string
+    start: number
+    end: number
   }>
 }
 ```
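Because Groq does not support the `srt` or `vtt` response formats, subtitles have to be built client-side from the default `verbose_json` segments. A minimal sketch, assuming the `TranscriptionResult` shape above (redeclared locally with only the fields used, so the block stands alone) and that `start`/`end` are in seconds; `toSrt` and `srtTime` are hypothetical helpers, not part of `@tanstack/ai`.

```typescript
// Local copy of the documented result shape (only the fields used here).
interface TranscriptionSegment {
  id: number
  start: number // seconds
  end: number // seconds
  text: string
}

interface TranscriptionResultLike {
  text: string
  segments?: Array<TranscriptionSegment>
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000)
  const ms = totalMs % 1000
  const totalSec = Math.floor(totalMs / 1000)
  const s = totalSec % 60
  const m = Math.floor(totalSec / 60) % 60
  const h = Math.floor(totalSec / 3600)
  const pad = (n: number, width = 2) => String(n).padStart(width, '0')
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`
}

// Hypothetical helper: one numbered SRT cue per segment, separated by blank lines.
function toSrt(result: TranscriptionResultLike): string {
  return (result.segments ?? [])
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`,
    )
    .join('\n')
}
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids malformed cues such as `00:00:59,1000`.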