7 changes: 7 additions & 0 deletions .changeset/feat-groq-ai-transcription.md
@@ -0,0 +1,7 @@
---
'@tanstack/ai-groq': minor
---

Adds Groq as a transcription provider. Groq's API is mostly OpenAI SDK-compatible,
but its transcription endpoint additionally accepts HTTP URLs as input, so this
is implemented as a custom integration rather than going through the SDK.
37 changes: 34 additions & 3 deletions docs/adapters/groq.md
@@ -2,7 +2,7 @@
title: Groq
id: groq-adapter
order: 6
- description: "Use Groq's fast inference API with TanStack AI for low-latency LLM responses — Llama and other open-weight models via @tanstack/ai-groq."
+ description: "Use Groq's fast inference API with TanStack AI for low-latency LLM responses and Whisper transcription — Llama and other open-weight models via @tanstack/ai-groq."
keywords:
- tanstack ai
- groq
@@ -11,9 +11,11 @@ keywords:
- low latency
- adapter
- llm
- whisper
- transcription
---

- The Groq adapter provides access to Groq's fast inference API, featuring the world's fastest LLM inference.
+ The Groq adapter provides access to Groq's fast inference API, featuring the world's fastest LLM inference and Whisper-based audio transcription.

## Installation

@@ -108,6 +110,32 @@ const stream = chat({
});
```

## Transcription

Groq exposes Whisper-based speech-to-text via `groqTranscription()` and the `generateTranscription()` activity. The `audio` input accepts a `File`, `Blob`, `ArrayBuffer`, base64 string, data URL, or an `https://` URL (forwarded directly to Groq without re-uploading).

```typescript
import { generateTranscription } from "@tanstack/ai";
import { groqTranscription } from "@tanstack/ai-groq";

const result = await generateTranscription({
  adapter: groqTranscription("whisper-large-v3-turbo"),
  audio: "https://example.com/recording.mp3",
  language: "en",
});

console.log(result.text);

// verbose_json (the default) populates language, duration, and timestamped segments
for (const segment of result.segments ?? []) {
  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`);
}
```

Supported models: `whisper-large-v3-turbo`, `whisper-large-v3`. Supported `responseFormat` values: `json`, `text`, `verbose_json` (default). `srt` and `vtt` are not supported by Groq.

See [Transcription](../media/transcription) for the full API.
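
Because Groq does not return `srt` directly, subtitle output has to be derived on the client from the `verbose_json` segments. The following is a minimal sketch of one way to do that, not part of `@tanstack/ai` or `@tanstack/ai-groq`: `SrtSegment`, `toSrtTimestamp`, and `segmentsToSrt` are hypothetical names, and the sketch assumes each segment carries `start` and `end` in seconds plus `text`, as in the example above.

```typescript
// Minimal local shape (assumption): only the fields needed to build a cue.
interface SrtSegment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: number the cues from 1 and separate them with a blank line.
function segmentsToSrt(segments: SrtSegment[]): string {
  return segments
    .map((segment, index) =>
      [
        String(index + 1),
        `${toSrtTimestamp(segment.start)} --> ${toSrtTimestamp(segment.end)}`,
        segment.text.trim(),
      ].join("\n"),
    )
    .join("\n\n");
}
```

With a result from `generateTranscription()`, this would be called as `segmentsToSrt(result.segments ?? [])`.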

## Model Options

Groq supports various provider-specific options:
@@ -197,11 +225,14 @@ Creates a Groq chat adapter with an explicit API key.

**Returns:** A Groq chat adapter instance.

### `groqTranscription(model, config?)` / `createGroqTranscription(model, apiKey, config?)`

Creates a Groq transcription (speech-to-text) adapter. The short form reads `GROQ_API_KEY` from the environment; the `create*` form takes an explicit API key. Supported models: `whisper-large-v3-turbo`, `whisper-large-v3`.

## Limitations

  - **Text-to-Speech**: Groq does not currently expose a TTS adapter. Use OpenAI, Gemini, ElevenLabs, or fal for speech generation.
  - **Image Generation**: Groq does not support image generation. Use OpenAI, Gemini, or fal for image generation.
- - **Transcription**: Groq does not currently expose a transcription adapter through TanStack AI.

## Next Steps

48 changes: 39 additions & 9 deletions docs/media/transcription.md
@@ -2,7 +2,7 @@
title: Transcription
id: transcription
order: 4
- description: "Transcribe audio to text with OpenAI Whisper and GPT-4o-transcribe via TanStack AI's generateTranscription() API."
+ description: "Transcribe audio to text with OpenAI Whisper, GPT-4o-transcribe, Groq Whisper, and fal.ai STT models via TanStack AI's generateTranscription() API."
keywords:
- tanstack ai
- transcription
Expand All @@ -11,18 +11,21 @@ keywords:
- whisper
- generateTranscription
- openai
- groq
- fal
---

# Audio Transcription

- TanStack AI provides support for audio transcription (speech-to-text) through dedicated transcription adapters. This guide covers how to convert spoken audio into text using OpenAI's Whisper and GPT-4o transcription models.
+ TanStack AI provides support for audio transcription (speech-to-text) through dedicated transcription adapters. This guide covers how to convert spoken audio into text using OpenAI's Whisper and GPT-4o transcription models, Groq's hosted Whisper models, and fal.ai STT models.

## Overview

Audio transcription is handled by transcription adapters that follow the same tree-shakeable architecture as other adapters in TanStack AI.

Currently supported:
- **OpenAI**: Whisper-1, GPT-4o-transcribe, GPT-4o-mini-transcribe
- **Groq**: whisper-large-v3-turbo, whisper-large-v3
- **fal.ai**: Whisper, Wizper, speech-to-text turbo, ElevenLabs speech-to-text

## Basic Usage
@@ -76,6 +79,31 @@ const result = await generateTranscription({
})
```

### Groq Transcription

Groq hosts Whisper large-v3 and large-v3-turbo on its fast inference stack. The `audio` input accepts a `File`, `Blob`, `ArrayBuffer`, base64 string, data URL, or an `https://` URL (which is forwarded to Groq without re-uploading).

```typescript
import { generateTranscription } from '@tanstack/ai'
import { groqTranscription } from '@tanstack/ai-groq'

const result = await generateTranscription({
  adapter: groqTranscription('whisper-large-v3-turbo'),
  audio: 'https://example.com/recording.mp3',
  language: 'en',
})

console.log(result.text)
console.log(result.language)

// verbose_json is the default — segments carry segment-level start/end timestamps
for (const segment of result.segments ?? []) {
  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
}
```

> **Note:** Groq supports `responseFormat` values `json`, `text`, and `verbose_json` (default). `srt` and `vtt` are not supported — passing them throws. Provider-specific `modelOptions` are `temperature` and `timestamp_granularities` (`['word']`, `['segment']`, or both).
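
Since unsupported formats throw, it can help to check a user-supplied format before calling the adapter. The following is a minimal sketch of such a pre-flight check. `GROQ_RESPONSE_FORMATS` and `resolveGroqResponseFormat` are hypothetical names, not exports of `@tanstack/ai-groq`, and the adapter's own validation and error message may differ.

```typescript
// Formats the note above lists as supported by Groq.
const GROQ_RESPONSE_FORMATS = ['json', 'text', 'verbose_json'] as const
type GroqResponseFormat = (typeof GROQ_RESPONSE_FORMATS)[number]

// Hypothetical helper: fall back to the documented default when no format is
// given, and reject anything outside the supported list (for example srt, vtt).
function resolveGroqResponseFormat(format?: string): GroqResponseFormat {
  if (format === undefined) return 'verbose_json'
  if ((GROQ_RESPONSE_FORMATS as readonly string[]).includes(format)) {
    return format as GroqResponseFormat
  }
  throw new Error(
    `Unsupported Groq responseFormat "${format}". Use one of: ${GROQ_RESPONSE_FORMATS.join(', ')}`,
  )
}
```

The resolved value can then be passed as `responseFormat`, so an invalid choice fails before any audio is sent.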

### fal.ai Transcription

fal.ai offers Whisper, Wizper, and other STT models. The `audio` input accepts a URL, `File`, `Blob`, or `ArrayBuffer` (auto-wrapped in a `Blob`).
@@ -171,16 +199,18 @@ interface TranscriptionResult {
    text: string // Full transcribed text
    language?: string // Detected/specified language
    duration?: number // Audio duration in seconds
-   segments?: Array<{ // Timestamped segments
+   segments?: Array<{ // Segment-level timestamps
+     id: number // Segment identifier
      start: number // Start time in seconds
      end: number // End time in seconds
      text: string // Segment text
-     words?: Array<{ // Word-level timestamps
-       word: string
-       start: number
-       end: number
-       confidence?: number
-     }>
+     confidence?: number // Confidence score (0-1)
+     speaker?: string // Speaker identifier, if diarization is enabled
    }>
+   words?: Array<{ // Word-level timestamps
+     word: string
+     start: number
+     end: number
+   }>
  }
```
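
With `words` now at the top level of the result instead of nested inside each segment, callers that want per-segment words have to regroup them. The following is a minimal sketch under stated assumptions: `TimedWord`, `TimedSegment`, and `groupWordsBySegment` are hypothetical names, and the sketch assumes a word belongs to the segment whose `[start, end)` window contains the word's start time.

```typescript
// Local shapes mirroring the relevant fields of the result type (assumption).
interface TimedWord {
  word: string
  start: number
  end: number
}

interface TimedSegment {
  id: number
  start: number
  end: number
  text: string
}

// Hypothetical helper: map each segment id to the words that start inside it.
function groupWordsBySegment(
  segments: TimedSegment[],
  words: TimedWord[],
): Map<number, TimedWord[]> {
  const grouped = new Map<number, TimedWord[]>()
  for (const segment of segments) {
    grouped.set(
      segment.id,
      words.filter((w) => w.start >= segment.start && w.start < segment.end),
    )
  }
  return grouped
}
```

Words that start outside every segment window are left out of the map, so a provider whose word and segment timestamps disagree may need a more lenient matching rule.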