[requesting help] Audio speech-to-text using AWS Transcription. #1084

Open · foragerr opened this issue Jun 17, 2024 · 0 comments
foragerr commented Jun 17, 2024

Describe the bug
I'm trying to use AWS Transcribe instead of Whisper, as shown in this example, but I keep getting back empty text. I suspect I'm getting the encoding or sample rate wrong, and I wasn't able to find much info on the properties of the captured audio in the Chainlit docs.

To Reproduce
Riffing off this example, I'm doing:

import chainlit as cl
from chainlit.element import ElementBased
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent


class TranscriptionEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # Print every transcript alternative as results stream back from Transcribe
        print("Transcript event fired")
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.AudioChunk):
    if not cl.user_session.get("stream"):
        # First chunk: open the streaming transcription session and attach the handler
        transcribe_client = TranscribeStreamingClient(region="us-west-2")
        stream = await transcribe_client.start_stream_transcription(
            language_code="en-US",
            media_sample_rate_hz=44100,
            media_encoding="pcm",
        )
        cl.user_session.set("stream", stream)
        handler = TranscriptionEventHandler(stream.output_stream)
        cl.user_session.set("handler", handler)
        await handler.handle_events()

    print("CHUNK FIRED")
    stream = cl.user_session.get("stream")
    await stream.input_stream.send_audio_event(audio_chunk=chunk.data)


@cl.on_audio_end
async def on_audio_end(elements: list[ElementBased]):
    print("END FIRED")
    stream = cl.user_session.get("stream")
    await stream.input_stream.end_stream()
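
To sanity-check the encoding/sample rate assumption, I've also been printing a few properties at the top of on_audio_chunk. This is just a debugging sketch; the isStart, mimeType and elapsedTime fields are the ones the cookbook example uses, I haven't found them spelled out in the docs:

def inspect_chunk(chunk: cl.AudioChunk) -> None:
    # Debugging helper (not part of the handlers above): dump what Chainlit delivers.
    # Assumes cl.AudioChunk exposes isStart, mimeType, elapsedTime and data.
    if chunk.isStart:
        # The mime type should show whether chunks arrive as raw PCM or a container format
        print(f"first chunk mimeType: {chunk.mimeType}")
    # Bytes per chunk vs. elapsed time gives a rough bytes/second figure to compare
    # against the 44100 Hz PCM assumption passed to Transcribe
    print(f"elapsed: {chunk.elapsedTime} ms, size: {len(chunk.data)} bytes")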

Expected behavior
AWS Transcribe returns a text transcription of the captured audio.

Additional context
I'm essentially looking for an extra pair of eyes, and maybe some pointers on which direction to take, how to troubleshoot, or general debugging tips.
I reckon if I can get this working, it'd be a useful addition to the cookbooks?
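
One thing I've started to suspect in my own code: as far as I can tell, handler.handle_events() doesn't return until the output stream is closed, so awaiting it inside on_audio_chunk may block that first invocation before any audio is ever sent. An untested sketch of the direction I'm considering, running the handler as a background task instead (stream and TranscriptionEventHandler are the same as in the snippet above):

import asyncio

# Inside the "transcription init" branch of on_audio_chunk,
# instead of awaiting handle_events() directly:
handler = TranscriptionEventHandler(stream.output_stream)
cl.user_session.set("handler", handler)
# Consume transcript events concurrently while chunks keep arriving
handler_task = asyncio.create_task(handler.handle_events())
cl.user_session.set("handler_task", handler_task)

Then in on_audio_end, after end_stream(), I'd await cl.user_session.get("handler_task") so any remaining transcript events get processed before the session moves on.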
