Add AWS provider support for Speech-to-Text and Text-to-Speech using Amazon Transcribe and Amazon Polly #6436

@rubenquintanabravo

Feature Description

I would like Flowise to support AWS as a first-class provider for both Speech-to-Text and Text-to-Speech:

  • STT: Amazon Transcribe
  • TTS: Amazon Polly

Flowise already supports audio upload and speech workflows, but AWS is not currently available as a provider for these capabilities. Adding AWS support would make Flowise easier to adopt in organizations already using AWS for infrastructure, IAM, compliance, logging, and data residency.

Feature Category

Integration

Problem Statement

Many teams run Flowise in AWS environments and prefer to keep voice processing inside the same cloud provider for:

  • centralized IAM and credential management
  • regional/data residency requirements
  • easier compliance review
  • consolidated billing and observability
  • use of existing AWS services such as S3, KMS, CloudWatch, IAM roles, and private networking

A common setup would be:

  1. User records audio in the Flowise chat widget.
  2. Flowise sends the audio to an STT provider.
  3. Amazon Transcribe converts speech to text.
  4. The text is processed by the chatflow.
  5. The assistant response can optionally be converted to audio.
  6. Amazon Polly generates the final speech output.
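The six steps above could be sketched roughly as follows. The interface and function names here are hypothetical and are not taken from the Flowise codebase; the stub providers only stand in for Amazon Transcribe and Amazon Polly so the flow can be read end to end.

```typescript
// Hypothetical provider interfaces (assumed names, not Flowise's actual abstractions).
interface SpeechToTextProvider {
    transcribe(audio: Uint8Array, mimeType: string): Promise<string>
}

interface TextToSpeechProvider {
    synthesize(text: string): Promise<Uint8Array>
}

interface VoiceTurnResult {
    transcript: string
    replyText: string
    replyAudio: Uint8Array | null
}

// Runs one voice turn: audio in, text through the chatflow, optional audio out.
async function handleVoiceTurn(
    audio: Uint8Array,
    mimeType: string,
    stt: SpeechToTextProvider,
    runChatflow: (question: string) => Promise<string>,
    tts?: TextToSpeechProvider
): Promise<VoiceTurnResult> {
    // Steps 2-3: send the recorded audio to the STT provider.
    const transcript = await stt.transcribe(audio, mimeType)
    // Step 4: the chatflow processes the transcribed text.
    const replyText = await runChatflow(transcript)
    // Steps 5-6: TTS is optional, so audio is only produced when a provider is set.
    const replyAudio = tts ? await tts.synthesize(replyText) : null
    return { transcript, replyText, replyAudio }
}

// Stub providers used only for illustration; a real provider would call AWS.
const fakeTranscribe: SpeechToTextProvider = {
    transcribe: async () => 'what is the weather'
}
const fakePolly: TextToSpeechProvider = {
    // Returns one placeholder byte per character instead of real audio.
    synthesize: async (text) => new Uint8Array(text.length)
}
```

Keeping STT and TTS behind separate interfaces would let either one be configured independently, which matches the request to expose AWS as a provider for each capability on its own.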

Proposed Solution

Add a new AWS provider option for both STT and TTS configuration.

Speech-to-Text: Amazon Transcribe

Suggested configuration fields:

  • AWS Region
  • AWS Credentials or existing Flowise credential reference
  • Language code, for example en-US or es-ES (Transcribe can accept several candidate languages when language identification is enabled)
  • Optional automatic language identification
  • Optional custom vocabulary
  • Optional vocabulary filter
  • Optional content redaction / PII redaction where supported
  • Optional S3 bucket configuration if batch transcription requires temporary object storage

An initial implementation could support uploaded audio files only. Streaming transcription could be added later as a separate enhancement.
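As a rough illustration of how the fields above might map onto a batch transcription request, here is a minimal sketch. The `TranscribeSttConfig` type and `buildTranscriptionJobInput` helper are hypothetical; the PascalCase keys follow the Amazon Transcribe `StartTranscriptionJob` request, and the sketch assumes the uploaded audio has already been copied to S3. Region and credentials are left out because they would be passed to the SDK client rather than to the request.

```typescript
// Hypothetical config shape mirroring the suggested fields (not an existing Flowise type).
interface TranscribeSttConfig {
    s3Bucket: string
    languageCode?: string
    identifyLanguage?: boolean
    languageOptions?: string[]
    vocabularyName?: string
    vocabularyFilterName?: string
    redactPii?: boolean
}

// Local subset of the StartTranscriptionJob request, declared so the sketch needs no SDK import.
interface TranscriptionJobInput {
    TranscriptionJobName: string
    Media: { MediaFileUri: string }
    MediaFormat: string
    LanguageCode?: string
    IdentifyLanguage?: boolean
    LanguageOptions?: string[]
    Settings?: { VocabularyName?: string; VocabularyFilterName?: string; VocabularyFilterMethod?: 'mask' }
    ContentRedaction?: { RedactionType: 'PII'; RedactionOutput: 'redacted' }
}

function buildTranscriptionJobInput(
    config: TranscribeSttConfig,
    jobName: string,
    objectKey: string,
    mediaFormat: string
): TranscriptionJobInput {
    const hasFixedLanguage = typeof config.languageCode === 'string' && config.languageCode.length > 0
    // A fixed language code and automatic identification are treated as mutually exclusive.
    if (hasFixedLanguage === Boolean(config.identifyLanguage)) {
        throw new Error('Set either languageCode or identifyLanguage, not both or neither')
    }
    const input: TranscriptionJobInput = {
        TranscriptionJobName: jobName,
        Media: { MediaFileUri: `s3://${config.s3Bucket}/${objectKey}` },
        MediaFormat: mediaFormat
    }
    if (config.identifyLanguage) {
        input.IdentifyLanguage = true
        if (config.languageOptions && config.languageOptions.length > 0) {
            input.LanguageOptions = config.languageOptions
        }
        // This sketch does not wire vocabulary or redaction in identification mode.
        return input
    }
    input.LanguageCode = config.languageCode
    if (config.vocabularyName || config.vocabularyFilterName) {
        input.Settings = {}
        if (config.vocabularyName) input.Settings.VocabularyName = config.vocabularyName
        if (config.vocabularyFilterName) {
            input.Settings.VocabularyFilterName = config.vocabularyFilterName
            // Assumption: masking is used as the filter method; it could be made configurable.
            input.Settings.VocabularyFilterMethod = 'mask'
        }
    }
    if (config.redactPii) {
        input.ContentRedaction = { RedactionType: 'PII', RedactionOutput: 'redacted' }
    }
    return input
}
```

The returned object is intended to be handed to the SDK's start-job command on the server; polling for completion and cleaning up the temporary S3 object are not shown.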

Text-to-Speech: Amazon Polly

Suggested configuration fields:

  • AWS Region
  • AWS Credentials or existing Flowise credential reference
  • Voice ID, for example Joanna, Matthew, Lucia
  • Engine, where supported: standard, neural, long-form, generative
  • Output format: mp3, ogg_vorbis, pcm
  • Sample rate
  • Text type: plain text or SSML
  • Optional lexicons

The generated audio should integrate with the existing Flowise TTS response flow so the chat embed can keep using the current TTS playback behavior.
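The Polly fields listed above could translate into a synthesis request along these lines. This is only a sketch: `PollyTtsConfig`, `buildSynthesizeSpeechInput`, and `contentTypeFor` are hypothetical names, while the PascalCase keys follow the Amazon Polly `SynthesizeSpeech` request, where the sample rate is passed as a string.

```typescript
type PollyEngine = 'standard' | 'neural' | 'long-form' | 'generative'
type PollyOutputFormat = 'mp3' | 'ogg_vorbis' | 'pcm'

// Hypothetical config shape mirroring the suggested fields (not an existing Flowise type).
interface PollyTtsConfig {
    voiceId: string
    outputFormat: PollyOutputFormat
    textType: 'text' | 'ssml'
    engine?: PollyEngine
    sampleRate?: number
    lexiconNames?: string[]
}

// Local subset of the SynthesizeSpeech request, declared so the sketch needs no SDK import.
interface SynthesizeSpeechInput {
    Text: string
    VoiceId: string
    OutputFormat: PollyOutputFormat
    TextType: 'text' | 'ssml'
    Engine?: PollyEngine
    SampleRate?: string
    LexiconNames?: string[]
}

function buildSynthesizeSpeechInput(config: PollyTtsConfig, text: string): SynthesizeSpeechInput {
    if (text.trim().length === 0) {
        throw new Error('Text must not be empty')
    }
    // Rough check only: SSML input is expected to be wrapped in a <speak> element.
    if (config.textType === 'ssml' && !/^\s*<speak[\s>]/.test(text)) {
        throw new Error('SSML input must be wrapped in a <speak> element')
    }
    const input: SynthesizeSpeechInput = {
        Text: text,
        VoiceId: config.voiceId,
        OutputFormat: config.outputFormat,
        TextType: config.textType
    }
    if (config.engine) input.Engine = config.engine
    // The request carries the sample rate as a string, so the number is converted here.
    if (config.sampleRate !== undefined) input.SampleRate = String(config.sampleRate)
    if (config.lexiconNames && config.lexiconNames.length > 0) input.LexiconNames = config.lexiconNames
    return input
}

// Content type the chat embed could use when playing back the returned audio.
function contentTypeFor(format: PollyOutputFormat): string {
    const types: Record<PollyOutputFormat, string> = {
        mp3: 'audio/mpeg',
        ogg_vorbis: 'audio/ogg',
        pcm: 'audio/pcm'
    }
    return types[format]
}
```

Note that `pcm` output is raw audio without a container, so an embed that relies on the browser's built-in playback would probably default to `mp3` or `ogg_vorbis`.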

Mockups or References

No response

Additional Context

Acceptance criteria

  • AWS appears as a selectable provider for STT.
  • AWS appears as a selectable provider for TTS.
  • Amazon Transcribe can process an audio message uploaded through the chat.
  • Amazon Polly can synthesize a chat response into playable audio.
  • Credentials are handled only server-side.
  • Configuration is documented.
  • Errors from AWS are surfaced clearly in Flowise logs and API responses.
  • The implementation works with the public chat embed without exposing AWS credentials.
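For the error-surfacing criterion, one possible shape is a small mapper that turns an AWS SDK error into a response body. The sketch below assumes errors carry `name`, `message`, and `$metadata.httpStatusCode`, as errors thrown by the AWS SDK for JavaScript v3 do; `toApiError` and `ApiError` are hypothetical names, and passing the upstream status code through unchanged is just one design option.

```typescript
// Hypothetical response shape for errors returned to API callers.
interface ApiError {
    statusCode: number
    code: string
    message: string
}

function toApiError(err: unknown, service: string): ApiError {
    // Treat anything that is not an object as an error without structured fields.
    const e = (typeof err === 'object' && err !== null ? err : {}) as {
        name?: unknown
        message?: unknown
        $metadata?: { httpStatusCode?: unknown }
    }
    const code = typeof e.name === 'string' ? e.name : 'UnknownError'
    const detail = typeof e.message === 'string' ? e.message : String(err)
    const upstreamStatus = e.$metadata ? e.$metadata.httpStatusCode : undefined
    return {
        // Fall back to 502 when the upstream status is unavailable.
        statusCode: typeof upstreamStatus === 'number' ? upstreamStatus : 502,
        code,
        message: `${service} error (${code}): ${detail}`
    }
}
```

Only the error name and message are forwarded here; request parameters and client configuration stay on the server.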

Labels: enhancement (New feature or request)
