Add AWS provider support for Speech-to-Text and Text-to-Speech using Amazon Transcribe and Amazon Polly

### Feature Description

I would like Flowise to support AWS as a first-class provider for both Speech-to-Text and Text-to-Speech:

- **STT:** Amazon Transcribe
- **TTS:** Amazon Polly

Flowise already supports audio upload and speech workflows, but AWS is not currently available as a provider for these capabilities. Adding AWS support would make Flowise easier to adopt in organizations already using AWS for infrastructure, IAM, compliance, logging, and data residency.


### Feature Category

Integration

### Problem Statement

Many teams run Flowise in AWS environments and prefer to keep voice processing inside the same cloud provider for:

- centralized IAM and credential management
- regional/data residency requirements
- easier compliance review
- consolidated billing and observability
- use of existing AWS services such as S3, KMS, CloudWatch, IAM roles, and private networking

A common setup would be:

1. User records audio in the Flowise chat widget.
2. Flowise sends the audio to an STT provider.
3. Amazon Transcribe converts speech to text.
4. The text is processed by the chatflow.
5. The assistant response can optionally be converted to audio.
6. Amazon Polly generates the final speech output.
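The flow above could be sketched roughly as follows. The interfaces `SpeechToTextProvider` and `TextToSpeechProvider`, the `Chatflow` type, and `runVoiceTurn` are hypothetical names used only for illustration; Flowise's actual provider abstractions may look different.

```typescript
// Hypothetical provider interfaces; not Flowise's real API.
interface SpeechToTextProvider {
  transcribe(audio: Uint8Array, mimeType: string): Promise<string>;
}

interface TextToSpeechProvider {
  synthesize(text: string): Promise<Uint8Array>;
}

// Stand-in for whatever executes the chatflow on a text input.
type Chatflow = (input: string) => Promise<string>;

interface VoiceTurnResult {
  transcript: string;
  reply: string;
  audio?: Uint8Array; // present only when TTS is enabled
}

async function runVoiceTurn(
  audio: Uint8Array,
  mimeType: string,
  stt: SpeechToTextProvider,
  chatflow: Chatflow,
  tts?: TextToSpeechProvider
): Promise<VoiceTurnResult> {
  // Steps 2-3: send the recorded audio to the STT provider (Amazon Transcribe).
  const transcript = await stt.transcribe(audio, mimeType);
  // Step 4: run the chatflow on the transcribed text.
  const reply = await chatflow(transcript);
  // Steps 5-6: optionally synthesize the reply (Amazon Polly).
  if (!tts) {
    return { transcript, reply };
  }
  const replyAudio = await tts.synthesize(reply);
  return { transcript, reply, audio: replyAudio };
}
```

Keeping STT and TTS behind separate interfaces would let a deployment use AWS for one direction only, for example Transcribe for input with no spoken output.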

### Proposed Solution

Add a new `AWS` provider option for both STT and TTS configuration.

#### Speech-to-Text: Amazon Transcribe

Suggested configuration fields:

- AWS Region
- AWS Credentials or existing Flowise credential reference
- Language code, for example `en-US` or `es-ES` (Transcribe can also take several candidate languages when language identification is enabled)
- Optional automatic language identification
- Optional custom vocabulary
- Optional vocabulary filter
- Optional content redaction / PII redaction where supported
- Optional S3 bucket configuration if batch transcription requires temporary object storage

An initial implementation could support uploaded audio files only; streaming transcription could follow as a separate enhancement.
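As a rough illustration, the fields above might map onto a batch `StartTranscriptionJob` request like this. `TranscribeConfig` and `buildTranscriptionJobParams` are hypothetical names, the parameter type is only a hand-written subset of the request shape, and the sketch assumes the uploaded audio has already been copied to S3 (batch jobs take their media as an S3 URI). The `mask` filter method is an arbitrary choice for the example.

```typescript
// Hypothetical Flowise-side configuration mirroring the suggested fields.
interface TranscribeConfig {
  region: string; // would go to the client constructor, not the request
  s3Bucket: string; // bucket holding the uploaded audio
  languageCode?: string; // e.g. "en-US"
  identifyLanguage?: boolean;
  languageOptions?: string[]; // candidate languages for identification
  vocabularyName?: string;
  vocabularyFilterName?: string;
  redactPii?: boolean;
}

// Hand-written subset of the StartTranscriptionJob request shape.
interface TranscriptionJobParams {
  TranscriptionJobName: string;
  Media: { MediaFileUri: string };
  LanguageCode?: string;
  IdentifyLanguage?: boolean;
  LanguageOptions?: string[];
  Settings?: {
    VocabularyName?: string;
    VocabularyFilterName?: string;
    VocabularyFilterMethod?: "remove" | "mask" | "tag";
  };
  ContentRedaction?: { RedactionType: "PII"; RedactionOutput: "redacted" };
}

function buildTranscriptionJobParams(
  config: TranscribeConfig,
  jobName: string,
  objectKey: string
): TranscriptionJobParams {
  const hasFixedLanguage = Boolean(config.languageCode);
  const autoDetect = config.identifyLanguage === true;
  // A fixed language and automatic identification are mutually exclusive.
  if (hasFixedLanguage === autoDetect) {
    throw new Error("Set exactly one of languageCode or identifyLanguage");
  }
  const hasVocabulary = Boolean(config.vocabularyName || config.vocabularyFilterName);
  const params: TranscriptionJobParams = {
    TranscriptionJobName: jobName,
    Media: { MediaFileUri: `s3://${config.s3Bucket}/${objectKey}` },
  };
  if (autoDetect) {
    // Vocabularies combined with language identification use a different
    // request field (LanguageIdSettings) and are left out of this sketch.
    if (hasVocabulary) {
      throw new Error("Vocabulary options with identifyLanguage are not covered here");
    }
    params.IdentifyLanguage = true;
    if (config.languageOptions && config.languageOptions.length > 0) {
      params.LanguageOptions = config.languageOptions;
    }
  } else {
    params.LanguageCode = config.languageCode;
    if (hasVocabulary) {
      const settings: NonNullable<TranscriptionJobParams["Settings"]> = {};
      if (config.vocabularyName) settings.VocabularyName = config.vocabularyName;
      if (config.vocabularyFilterName) {
        settings.VocabularyFilterName = config.vocabularyFilterName;
        settings.VocabularyFilterMethod = "mask"; // arbitrary default for the sketch
      }
      params.Settings = settings;
    }
  }
  if (config.redactPii) {
    params.ContentRedaction = { RedactionType: "PII", RedactionOutput: "redacted" };
  }
  return params;
}
```

The resulting object is shaped so it could be passed to the AWS SDK's `StartTranscriptionJobCommand`; polling for job completion and cleaning up the temporary S3 object are not shown.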

#### Text-to-Speech: Amazon Polly

Suggested configuration fields:

- AWS Region
- AWS Credentials or existing Flowise credential reference
- Voice ID, for example `Joanna`, `Matthew`, `Lucia`
- Engine, where supported: `standard`, `neural`, `long-form`, `generative`
- Output format: `mp3`, `ogg_vorbis`, `pcm`
- Sample rate
- Text type: plain text or SSML
- Optional lexicons

The generated audio should integrate with the existing Flowise TTS response flow so the chat embed can keep using the current TTS playback behavior.
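A minimal sketch of how these fields might translate into a Polly `SynthesizeSpeech` request is shown below. `PollyConfig`, `buildSynthesizeSpeechParams`, and `CONTENT_TYPES` are hypothetical names, and the parameter type is a hand-written subset of the request shape rather than the SDK's own type. Note that the API expects the sample rate as a string, and that the engine and sample rate actually available depend on the chosen voice, which this sketch does not validate.

```typescript
type PollyEngine = "standard" | "neural" | "long-form" | "generative";
type PollyOutputFormat = "mp3" | "ogg_vorbis" | "pcm";
type PollyTextType = "text" | "ssml";

// Hypothetical Flowise-side configuration mirroring the suggested fields.
interface PollyConfig {
  region: string; // would go to the client constructor, not the request
  voiceId: string; // e.g. "Joanna"
  outputFormat: PollyOutputFormat;
  engine?: PollyEngine;
  sampleRate?: number;
  textType?: PollyTextType;
  lexiconNames?: string[];
}

// Hand-written subset of the SynthesizeSpeech request shape.
interface SynthesizeSpeechParams {
  Text: string;
  VoiceId: string;
  OutputFormat: PollyOutputFormat;
  TextType: PollyTextType;
  Engine?: PollyEngine;
  SampleRate?: string;
  LexiconNames?: string[];
}

// Content types for handing the audio to the existing playback path.
const CONTENT_TYPES: Record<PollyOutputFormat, string> = {
  mp3: "audio/mpeg",
  ogg_vorbis: "audio/ogg",
  pcm: "audio/pcm",
};

function buildSynthesizeSpeechParams(config: PollyConfig, text: string): SynthesizeSpeechParams {
  const textType: PollyTextType = config.textType ?? "text";
  // Light sanity check only: SSML input is expected to be wrapped in <speak>.
  if (textType === "ssml" && !text.trim().startsWith("<speak")) {
    throw new Error("SSML input must be wrapped in a <speak> element");
  }
  const params: SynthesizeSpeechParams = {
    Text: text,
    VoiceId: config.voiceId,
    OutputFormat: config.outputFormat,
    TextType: textType,
  };
  if (config.engine) params.Engine = config.engine;
  if (config.sampleRate !== undefined) params.SampleRate = String(config.sampleRate);
  if (config.lexiconNames && config.lexiconNames.length > 0) {
    params.LexiconNames = config.lexiconNames;
  }
  return params;
}
```

The object is shaped so it could be passed to the AWS SDK's `SynthesizeSpeechCommand`, with `CONTENT_TYPES[config.outputFormat]` used when returning the audio to the chat embed.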


### Mockups or References

_No response_

### Additional Context

#### Acceptance criteria

- AWS appears as a selectable provider for STT.
- AWS appears as a selectable provider for TTS.
- Amazon Transcribe can process an audio message uploaded through the chat.
- Amazon Polly can synthesize a chat response into playable audio.
- Credentials are handled only server-side.
- Configuration is documented.
- Errors from AWS are surfaced clearly in Flowise logs and API responses.
- The implementation works with the public chat embed without exposing AWS credentials.
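For the error-surfacing criterion, one possible approach is to normalize AWS errors into a small response object that is safe to log and return. The sketch below relies on the `name`, `message`, and `$metadata` fields that AWS SDK for JavaScript v3 service exceptions carry; `ProviderError` and `toProviderError` are hypothetical names, and the fallback status code of 502 is an arbitrary choice for the example.

```typescript
// Fields read from an AWS SDK v3 service exception, all treated as optional.
interface AwsLikeError {
  name?: string;
  message?: string;
  $metadata?: { httpStatusCode?: number; requestId?: string };
}

// Hypothetical shape for what Flowise would log and return to API callers.
interface ProviderError {
  provider: "aws";
  service: "transcribe" | "polly";
  code: string;
  message: string;
  statusCode: number;
  requestId?: string;
}

function toProviderError(service: "transcribe" | "polly", err: unknown): ProviderError {
  const e: AwsLikeError = typeof err === "object" && err !== null ? (err as AwsLikeError) : {};
  const result: ProviderError = {
    provider: "aws",
    service,
    code: e.name ?? "UnknownError",
    message: e.message ?? String(err),
    // Fall back to 502 when AWS did not report an HTTP status.
    statusCode: e.$metadata?.httpStatusCode ?? 502,
  };
  // The request ID helps correlate a Flowise log line with AWS-side logs.
  if (e.$metadata?.requestId) {
    result.requestId = e.$metadata.requestId;
  }
  return result;
}
```

Only the listed fields are copied, so credentials or request payloads attached to the original error object would not end up in the response.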

### References

- Amazon Transcribe documentation: https://aws.amazon.com/transcribe/
- Amazon Polly documentation: https://aws.amazon.com/polly/
- Flowise audio upload documentation: https://docs.flowiseai.com/using-flowise/uploads
