Summary
Factory recipe to add audio understanding and/or speech output to any text-only LLM. Audio encoder for input, optional vocoder head for output.
Approach
Audio Input (hearing)
- Whisper-style encoder or distilled variant
- Train projection layer: audio embeddings → LLM token space
- Same pattern as vision — frozen base, train the bridge
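The bridge above can be sketched as a small trainable MLP between a frozen encoder and a frozen LLM. A minimal sketch in PyTorch; the dimensions (768 for a Whisper-style encoder, 4096 for the LLM) are illustrative assumptions, not fixed by the recipe:

```python
import torch
import torch.nn as nn

class AudioProjector(nn.Module):
    """Maps frozen audio-encoder embeddings into the LLM's token embedding space.

    Only this module is trained; encoder and LLM stay frozen, mirroring the
    vision-adapter pattern.
    """
    def __init__(self, audio_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP bridge; a single Linear also works for a first pass.
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, audio_embeds: torch.Tensor) -> torch.Tensor:
        # audio_embeds: (batch, frames, audio_dim) from the audio encoder
        return self.proj(audio_embeds)

bridge = AudioProjector()
fake_frames = torch.randn(1, 50, 768)   # stand-in for encoder output
tokens = bridge(fake_frames)            # (1, 50, 4096): prepend to text embeddings
```

The projected frames are concatenated with the text token embeddings before the LLM forward pass, exactly as vision adapters do with image patches.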
Audio Output (speech)
- Train a small vocoder head on the LLM's hidden states
- Or use adapter approach: LLM generates speech tokens → decoder produces audio
- Could leverage existing TTS as bridge while training native capability
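One shape the vocoder-head option could take: a linear head that predicts mel-spectrogram frames from the LLM's hidden states, which a neural vocoder then renders to audio. A sketch under assumed dimensions (4096 hidden size, 80 mel bins, 4 frames per token):

```python
import torch
import torch.nn as nn

class SpeechHead(nn.Module):
    """Predicts mel-spectrogram frames from LLM hidden states.

    A separate neural vocoder (e.g. HiFi-GAN-style) turns the mel frames
    into a waveform; only this head is trained on paired text-audio data.
    """
    def __init__(self, llm_dim: int = 4096, n_mels: int = 80,
                 frames_per_token: int = 4):
        super().__init__()
        self.frames_per_token = frames_per_token
        self.n_mels = n_mels
        # Each hidden state emits several mel frames, since audio runs at a
        # much higher frame rate than text tokens.
        self.head = nn.Linear(llm_dim, n_mels * frames_per_token)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        b, t, _ = hidden.shape
        mel = self.head(hidden)                        # (b, t, n_mels * fpt)
        return mel.view(b, t * self.frames_per_token, self.n_mels)

head = SpeechHead()
hidden = torch.randn(2, 10, 4096)   # stand-in for LLM hidden states
mels = head(hidden)                 # (2, 40, 80) mel frames for the vocoder
```

The speech-token alternative swaps this regression head for a discrete codec vocabulary (the LLM emits codec IDs, a decoder reconstructs audio), which fits the existing next-token training loop more directly.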
Factory Integration
- New forge profile type: audio-encoder and/or speech-head
- Recipes composable: vision + audio + personality on same base model
- Validation: transcription accuracy, speech naturalness (MOS scores)
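A forge profile for an audio recipe might look like the following. Every field name here is hypothetical, illustrating the shape of the config rather than the factory's actual schema:

```python
# Hypothetical forge profile for an audio recipe. Field names and values
# are illustrative assumptions, not the factory's real schema.
audio_profile = {
    "base_model": "any-text-llm",
    "adapters": [
        # Hearing: frozen encoder + trainable projection bridge.
        {"type": "audio-encoder", "encoder": "whisper-small",
         "train": ["projection"]},
        # Speaking: trainable head over LLM hidden states + external vocoder.
        {"type": "speech-head", "vocoder": "hifigan", "train": ["head"]},
    ],
    "validation": {
        "transcription": "wer",   # word error rate on held-out audio
        "speech": "mos",          # mean opinion score for naturalness
    },
}

# Composability: a vision or personality recipe would add entries to the
# same "adapters" list against the same base model.
adapter_types = [a["type"] for a in audio_profile["adapters"]]
```

Keeping each capability as an adapter entry is what makes the recipes composable: vision, audio, and personality each contribute an adapter against one shared base.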
Why
A local model that can hear and speak natively — not through TTS/STT bridges — is qualitatively different. Lower latency, better understanding of tone/emphasis, natural conversation. The factory already handles the training infra; this is just a new recipe.
Constraints
- Audio data is large — need efficient data loading pipeline
- Real-time inference needs AudioWorklet integration (already architected)
- Speech head training needs paired text-audio data
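For the data-size constraint, one common mitigation is streaming: decode one clip at a time from a manifest instead of materializing the corpus in memory. A minimal sketch with a hypothetical `manifest` of (path, transcript) pairs; the placeholder decode stands in for a real call like `torchaudio.load(path)`:

```python
import torch
from torch.utils.data import IterableDataset

class StreamingAudioDataset(IterableDataset):
    """Streams (waveform, transcript) pairs lazily.

    `manifest` is a hypothetical list of (path, transcript) entries;
    the corpus never needs to fit in RAM.
    """
    def __init__(self, manifest):
        self.manifest = manifest

    def __iter__(self):
        for path, text in self.manifest:
            # Placeholder decode for the sketch; in practice use
            # torchaudio.load(path) here, one clip at a time.
            waveform = torch.zeros(16000)  # 1 s of silence at 16 kHz
            yield waveform, text

ds = StreamingAudioDataset([("clip0.wav", "hello world")])
wave, text = next(iter(ds))
```

An `IterableDataset` also composes with `DataLoader` workers for parallel decode, which matters once clips are decompressed on the fly.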
Related