Describe the bug
I'm trying to use AWS Transcribe instead of Whisper, as shown in this example, but I keep getting back empty text. I suspect I'm getting the encoding or sample rate wrong, and I wasn't able to find much info on the properties of the captured audio in the Chainlit docs.
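One way I could check the encoding suspicion is to look at the first bytes of the first chunk. Below is a minimal sketch of such a check; `guess_audio_container` is a hypothetical helper of my own, not part of Chainlit or amazon-transcribe, and it only recognises a few well-known magic numbers. If it reports a container such as WebM, then the data is not raw PCM and `media_encoding="pcm"` would presumably be the wrong setting.

```python
def guess_audio_container(data: bytes, mime_type: str = "") -> str:
    """Best-effort guess of the audio container from its leading magic bytes.

    Hypothetical debugging helper. Only the first chunk of a recording is
    expected to carry a container header, so call it on that chunk.
    """
    if data[:4] == b"\x1a\x45\xdf\xa3":  # EBML header used by WebM/Matroska
        return "webm/matroska"
    if data[:4] == b"OggS":  # Ogg page capture pattern
        return "ogg"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":  # RIFF/WAVE header
        return "wav"
    if data[:4] == b"fLaC":  # FLAC stream marker
        return "flac"
    if mime_type:
        return f"unknown (declared {mime_type})"
    return "unknown, possibly raw PCM"


# Example with synthetic bytes standing in for a first chunk
print(guess_audio_container(b"\x1a\x45\xdf\xa3" + b"\x00" * 16))
```

In the handler this could be called as `guess_audio_container(chunk.data)` on the first chunk, next to the existing `print("CHUNK FIRED")`.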
To Reproduce
Riffing off this example, I'm doing:
import chainlit as cl
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent
from chainlit.element import ElementBased


class TranscriptionEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        print("Transcript event fired")
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.AudioChunk):
    if not cl.user_session.get("stream"):
        # transcription init
        transcribe_client = TranscribeStreamingClient(region="us-west-2")
        stream = await transcribe_client.start_stream_transcription(
            language_code="en-US",
            media_sample_rate_hz=44100,
            media_encoding="pcm",
        )
        cl.user_session.set("stream", stream)
        handler = TranscriptionEventHandler(stream.output_stream)
        cl.user_session.set("handler", handler)
        await handler.handle_events()
    print("CHUNK FIRED")
    stream = cl.user_session.get("stream")
    await stream.input_stream.send_audio_event(audio_chunk=chunk.data)


@cl.on_audio_end
async def on_audio_end(elements: list[ElementBased]):
    print("END FIRED")
    stream = cl.user_session.get("stream")
    await stream.input_stream.end_stream()
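One thing I'm unsure about in the snippet above is `await handler.handle_events()` inside the first-chunk branch. As far as I can tell, that call only returns once the output stream is closed, so the first invocation of the callback would sit there and never reach `send_audio_event`. The sketch below uses plain asyncio only (no AWS or Chainlit calls; `consume` and `run_demo` are hypothetical stand-ins) to show the alternative of scheduling the consumer with `asyncio.create_task` so the producer can keep sending.

```python
import asyncio


async def consume(queue: asyncio.Queue, out: list) -> None:
    """Stand-in for handler.handle_events(): loops until a None sentinel arrives."""
    while True:
        item = await queue.get()
        if item is None:
            break
        out.append(item)


async def run_demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    received: list = []
    # Schedule the consumer in the background. Awaiting consume() directly
    # here would never return, because nothing has been put on the queue yet.
    task = asyncio.create_task(consume(queue, received))
    for chunk in (b"a", b"b", b"c"):
        await queue.put(chunk)  # stand-in for send_audio_event(...)
    await queue.put(None)  # stand-in for end_stream()
    await task  # wait for the consumer to drain and exit
    return received


result = asyncio.run(run_demo())
print(result)
```

If that reading is right, the equivalent change in my handler would be to wrap `handler.handle_events()` in `asyncio.create_task(...)` and keep a reference to the task in the user session, but I haven't confirmed that this is the cause of the empty text.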
Expected behavior
AWS Transcribe returns a text transcription of the captured audio.
Additional context
Essentially, I'm looking for an extra pair of eyes: pointers on which direction to take when troubleshooting, or any debugging tips.
I reckon if I can get this working, it'd be a useful addition to the cookbooks?