[requesting help] Audio speech-to-text using AWS Transcription. #1084

Open · foragerr opened this issue Jun 17, 2024 · 0 comments
foragerr commented Jun 17, 2024

Describe the bug
I'm trying to use AWS Transcribe instead of Whisper, as shown in this example, but I keep getting back empty text. I suspect I'm getting the encoding or sample rate wrong, and I wasn't able to find much info on the properties of the captured audio in the Chainlit docs.

To Reproduce
Riffing off this example, I'm doing:

import chainlit as cl
from chainlit.element import ElementBased
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent


class TranscriptionEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # Print every transcript alternative as results stream back from Transcribe
        print("Transcript event fired")
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.AudioChunk):
    if not cl.user_session.get("stream"):
        # First chunk: open the streaming transcription session and attach the handler
        transcribe_client = TranscribeStreamingClient(region="us-west-2")
        stream = await transcribe_client.start_stream_transcription(
            language_code="en-US",
            media_sample_rate_hz=44100,
            media_encoding="pcm",
        )
        cl.user_session.set("stream", stream)
        handler = TranscriptionEventHandler(stream.output_stream)
        cl.user_session.set("handler", handler)
        await handler.handle_events()

    print("CHUNK FIRED")
    stream = cl.user_session.get("stream")
    await stream.input_stream.send_audio_event(audio_chunk=chunk.data)


@cl.on_audio_end
async def on_audio_end(elements: list[ElementBased]):
    print("END FIRED")
    stream = cl.user_session.get("stream")
    await stream.input_stream.end_stream()
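
To sanity-check the encoding/sample rate assumption, I've also been printing a few properties at the top of on_audio_chunk. This is just a debugging sketch; the isStart, mimeType and elapsedTime fields are the ones the cookbook example uses, I haven't found them spelled out in the docs:

def inspect_chunk(chunk: cl.AudioChunk) -> None:
    # Debugging helper (not part of the handlers above): dump what Chainlit delivers.
    # Assumes cl.AudioChunk exposes isStart, mimeType, elapsedTime and data.
    if chunk.isStart:
        # The mime type should show whether chunks arrive as raw PCM or a container format
        print(f"first chunk mimeType: {chunk.mimeType}")
    # Bytes per chunk vs. elapsed time gives a rough bytes/second figure to compare
    # against the 44100 Hz PCM assumption passed to Transcribe
    print(f"elapsed: {chunk.elapsedTime} ms, size: {len(chunk.data)} bytes")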

Expected behavior
AWS Transcribe returns a text transcription of the captured audio.

Additional context
I'm essentially looking for an extra pair of eyes, and maybe some pointers on which direction to take, how to troubleshoot, or general debugging tips.
I reckon if I can get this working, it'd be a useful addition to the cookbooks?
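
One thing I've started to suspect in my own code: as far as I can tell, handler.handle_events() doesn't return until the output stream is closed, so awaiting it inside on_audio_chunk may block that first invocation before any audio is ever sent. An untested sketch of the direction I'm considering, running the handler as a background task instead (stream and TranscriptionEventHandler are the same as in the snippet above):

import asyncio

# Inside the "transcription init" branch of on_audio_chunk,
# instead of awaiting handle_events() directly:
handler = TranscriptionEventHandler(stream.output_stream)
cl.user_session.set("handler", handler)
# Consume transcript events concurrently while chunks keep arriving
handler_task = asyncio.create_task(handler.handle_events())
cl.user_session.set("handler_task", handler_task)

Then in on_audio_end, after end_stream(), I'd await cl.user_session.get("handler_task") so any remaining transcript events get processed before the session moves on.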
