how to translate realtime pcm data to text? #24

jackleibest · 2020-08-14T10:03:21Z

Hi, I want to translate PCM data to text in real time. The PCM data is decoded by ffmpeg from a live stream, but I can't get a result. Can you help? Here is the code: it feeds the PCM data continuously and prints the translated text every 5 seconds:

    var stream *astideepspeech.Stream

    func detectVoice(sample []byte) {
        if stream == nil {
            m, _ := astideepspeech.New(model)
            if err := m.SetBeamWidth(beamWidth); err != nil {
                fmt.Println(fmt.Sprintf("Failed setting beam width: %v", err))
                return
            }
            if err := m.EnableExternalScorer(scorer); err != nil {
                fmt.Println(fmt.Sprintf("Failed enabling external scorer: %v", err))
                return
            }
            if err := m.SetScorerAlphaBeta(alpha, beta); err != nil {
                fmt.Println(fmt.Sprintf("Failed setting scorer hyperparameters: %v", err))
                return
            }
            var err error
            stream, err = m.NewStream()
            if err != nil {
                fmt.Println(fmt.Sprintf("Failed create stream: %v", err))
                return
            }
        }
        var d []int16
        for _, v := range sample {
            d = append(d, int16(v))
        }
        stream.FeedAudioContent(d)
    }

    func init() {
        Println("get stt result in 5 seconds..........")
        go func() {
            var ch chan int
            ticker := time.NewTicker(time.Second * 5)
            go func() {
                for range ticker.C {
                    if stream != nil {
                        result, err := stream.IntermediateDecode()
                        if err != nil {
                            fmt.Println(fmt.Sprintf("Failed converting speech to text: %v", err))
                            return
                        }
                        fmt.Println("result: ", result)
                    }
                }
                ch <- 1
            }()
            <-ch
        }()
    }

asticode · 2020-08-14T12:58:08Z

Your detectVoice function is never called. Is that normal?

jackleibest · 2020-08-14T22:07:28Z

Actually, it is called by another function whenever PCM data is decoded by ffmpeg.

asticode · 2020-08-17T10:09:06Z

Can you paste a fully working example that I can run myself?

jackleibest · 2020-08-19T10:24:42Z

My code runs on a streaming server with the ffmpeg SDK, so it can't run standalone. Can you show us an example of live stream translation?

asticode · 2020-08-19T16:32:16Z

I've never used the live stream portion of DeepSpeech so I'd need some time to work out an example.

Before that, can you confirm that the audio data you're feeding DeepSpeech is 16-bit / 16 kHz PCM samples?
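For reference, 16-bit PCM means every sample spans two bytes, so a []byte buffer has to be regrouped into []int16 before it is fed to a stream. Casting each byte individually, as the `int16(v)` loop in the snippet above does, yields twice as many samples, each in the 0..255 range, which may be related to the empty results. Below is a minimal sketch of the regrouping, assuming the decoder emits signed little-endian samples; `bytesToInt16LE` is a hypothetical helper name, not part of astideepspeech.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToInt16LE regroups raw signed 16-bit little-endian PCM bytes
// into one int16 per sample (2 bytes each).
// Assumption: the input is little-endian; adjust for other byte orders.
func bytesToInt16LE(raw []byte) ([]int16, error) {
	if len(raw)%2 != 0 {
		return nil, fmt.Errorf("odd byte count %d: 16-bit PCM needs 2 bytes per sample", len(raw))
	}
	samples := make([]int16, len(raw)/2)
	for i := range samples {
		// Combine the low and high byte, then reinterpret as signed.
		samples[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return samples, nil
}
```

With such a helper, the result of `bytesToInt16LE(sample)` would be passed to `stream.FeedAudioContent` instead of the per-byte cast. If the decoder can deliver a chunk that ends in the middle of a sample, the leftover byte would have to be carried over to the next call rather than rejected.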

jackleibest · 2020-08-20T00:47:05Z

Yes, sure, I just resample the audio to 16-bit / 16 kHz PCM data. Here is the C code for AAC to PCM:

    static int aacdec_decode(aacdec_t *m, uint8_t *data, int len)
    {
        AVPacket pkt;
        av_init_packet(&pkt);
        pkt.data = data;
        pkt.size = len;
        //av_log(m->ctx, AV_LOG_DEBUG, "decode %p\n", m);
        int got;
        int res = avcodec_decode_audio4(m->ctx, m->f, &got, &pkt);
        if(got){
            if(m->swr == NULL){
                m->src_rate = m->ctx->sample_rate;
                m->dst_rate = 16000;
                m->swr = swr_alloc();
                av_opt_set_int(m->swr, "in_channel_layout", m->ctx->channel_layout, 0);
                av_opt_set_int(m->swr, "in_sample_rate", m->src_rate, 0);
                av_opt_set_sample_fmt(m->swr, "in_sample_fmt", m->ctx->sample_fmt, 0);
                av_opt_set_int(m->swr, "out_channel_layout", AV_CH_LAYOUT_MONO, 0);
                av_opt_set_int(m->swr, "out_sample_rate", m->dst_rate, 0);
                av_opt_set_sample_fmt(m->swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
                swr_init(m->swr);
            }
            int dst_linesize;
            int dst_nb_channels = av_get_channel_layout_nb_channels(AV_CH_LAYOUT_MONO);
            int src_nb_samples = m->f->nb_samples;
            m->frameBuff = (uint8_t *)av_malloc(MAX_AUDIO_FRAME_SIZE*2);
            int dst_nb_samples = av_rescale_rnd(swr_get_delay(m->swr, m->src_rate) + src_nb_samples, m->dst_rate, m->src_rate, AV_ROUND_UP);
            av_samples_alloc_array_and_samples(m->frameBuff, &dst_linesize, dst_nb_channels, dst_nb_samples, AV_SAMPLE_FMT_S16, 0);
            int len = swr_convert(m->swr, &m->frameBuff, dst_nb_samples, (const uint8_t **)m->f->data, src_nb_samples);
            int total = av_samples_get_buffer_size(&dst_linesize, dst_nb_channels, len, AV_SAMPLE_FMT_S16, 1);
        }
        av_packet_unref(&pkt);
        return res;
    }

asticode · 2020-08-21T10:03:35Z

What do you get at fmt.Println("result: ", result)?

Also, what happens if you save all samples to a .wav file and use this lib's cmd to translate it? In other words, does the non-streaming translation work with your samples?

jackleibest · 2020-08-22T00:58:50Z

I just get the response: "result: ". I have not tried the non-streaming translation with my samples, because I failed to save the PCM to a wav file, but I can play the raw PCM with Audacity or ffplay when I save it to a PCM file.

asticode · 2020-08-26T16:11:12Z

Thing is, we need to be sure your samples give correct results with the non-streaming part of this lib.

To save your samples to a wav file you can get inspiration from this bit of code.

Once it's done, let me know what results you get from the non-streaming translation.
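As a rough sketch of what saving samples to a wav file involves, the snippet below builds a canonical 44-byte PCM header followed by the sample data, using only the Go standard library. It assumes mono 16-bit samples, and `wavBytes` is a hypothetical helper name, not part of this lib.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// wavBytes encodes mono 16-bit PCM samples as a WAV file image:
// a 44-byte RIFF/WAVE header followed by little-endian sample data.
// Assumption: one channel, 16 bits per sample.
func wavBytes(samples []int16, sampleRate uint32) []byte {
	const (
		numChannels   = 1
		bitsPerSample = 16
	)
	dataSize := uint32(len(samples) * 2)
	blockAlign := uint16(numChannels * bitsPerSample / 8)
	le := binary.LittleEndian

	// Writes to a bytes.Buffer with fixed-size values do not fail,
	// so the errors returned by binary.Write are ignored here.
	var buf bytes.Buffer
	buf.WriteString("RIFF")
	binary.Write(&buf, le, uint32(36)+dataSize) // remaining file size
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	binary.Write(&buf, le, uint32(16)) // size of the fmt chunk
	binary.Write(&buf, le, uint16(1))  // audio format 1 = PCM
	binary.Write(&buf, le, uint16(numChannels))
	binary.Write(&buf, le, sampleRate)
	binary.Write(&buf, le, sampleRate*uint32(blockAlign)) // byte rate
	binary.Write(&buf, le, blockAlign)
	binary.Write(&buf, le, uint16(bitsPerSample))
	buf.WriteString("data")
	binary.Write(&buf, le, dataSize)
	binary.Write(&buf, le, samples)
	return buf.Bytes()
}
```

The result could then be written out with something like `os.WriteFile("out.wav", wavBytes(samples, 16000), 0o644)`, where `samples` holds everything collected from the decoder and `out.wav` is a placeholder file name.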

jackleibest · 2020-08-28T03:54:27Z

OK, thanks, I will try it someday; I'm too busy with work at present.

asticode self-assigned this Aug 19, 2020

asticode added the question label Aug 19, 2020

asticode closed this as completed Jan 14, 2021
