how to translate realtime pcm data to text? #24

Closed
jackleibest opened this issue Aug 14, 2020 · 10 comments

jackleibest commented Aug 14, 2020

Hi, I want to transcribe PCM data to text in real time. The PCM data is decoded by FFmpeg from a live stream, but I can't get any result. Can you help me fix it? Here is the code: it feeds the PCM data continuously and decodes it to text every 5 seconds:

```go
// model, scorer, beamWidth, alpha and beta are defined elsewhere in the program.
import (
	"fmt"
	"time"

	"github.com/asticode/go-astideepspeech"
)

var stream *astideepspeech.Stream

// detectVoice is called with each chunk of PCM decoded by FFmpeg.
func detectVoice(sample []byte) {
	if stream == nil {
		// Lazily load the model and open a streaming session on first use.
		m, err := astideepspeech.New(model)
		if err != nil {
			fmt.Printf("Failed loading model: %v\n", err)
			return
		}
		if err := m.SetBeamWidth(beamWidth); err != nil {
			fmt.Printf("Failed setting beam width: %v\n", err)
			return
		}
		if err := m.EnableExternalScorer(scorer); err != nil {
			fmt.Printf("Failed enabling external scorer: %v\n", err)
			return
		}
		if err := m.SetScorerAlphaBeta(alpha, beta); err != nil {
			fmt.Printf("Failed setting scorer hyperparameters: %v\n", err)
			return
		}
		if stream, err = m.NewStream(); err != nil {
			fmt.Printf("Failed creating stream: %v\n", err)
			return
		}
	}
	var d []int16
	for _, v := range sample {
		d = append(d, int16(v))
	}
	stream.FeedAudioContent(d)
}

func init() {
	fmt.Println("get stt result in 5 seconds..........")
	// Every 5 seconds, print the transcript decoded so far.
	go func() {
		ticker := time.NewTicker(5 * time.Second)
		for range ticker.C {
			if stream == nil {
				continue
			}
			result, err := stream.IntermediateDecode()
			if err != nil {
				fmt.Printf("Failed converting speech to text: %v\n", err)
				return
			}
			fmt.Println("result: ", result)
		}
	}()
}
```
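A probable reason for the empty result: the loop in `detectVoice` turns every *byte* into its own int16 sample (`int16(v)`), whereas 16-bit PCM stores each sample in two bytes. Below is a minimal sketch of the conversion. It assumes the buffer holds S16 mono samples in little-endian order. FFmpeg's `AV_SAMPLE_FMT_S16` is native-endian, so this holds on x86 and most ARM hosts. `pcmBytesToInt16` is a hypothetical helper name, not part of the lib:

```go
package main

import "encoding/binary"

// pcmBytesToInt16 converts raw signed 16-bit little-endian PCM into the
// []int16 samples that FeedAudioContent takes. Each sample spans two bytes,
// so the output has half as many elements as the input. A trailing odd byte
// (a sample split across chunks) is returned as a copy so the caller can
// prepend it to the next chunk.
func pcmBytesToInt16(b []byte) (samples []int16, leftover []byte) {
	samples = make([]int16, len(b)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(b[2*i:]))
	}
	if len(b)%2 == 1 {
		leftover = []byte{b[len(b)-1]}
	}
	return samples, leftover
}
```

With this helper, the loop in `detectVoice` would become `d, _ := pcmBytesToInt16(sample)`. Keeping the leftover byte only matters if the decoder can hand over odd-length chunks.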

@asticode
Owner

Your detectVoice function is never called. Is that normal?

@jackleibest
Author

Actually, it is called by another function whenever FFmpeg decodes a chunk of PCM data.
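In that setup, `detectVoice` runs from the decoder callback while the ticker goroutine started in `init` calls `IntermediateDecode`. The same stream is therefore used from two goroutines with no synchronization. Below is a minimal sketch that serializes access behind a mutex. It assumes the DeepSpeech stream is not safe for concurrent use, which we have not confirmed. `sttStream`, `lockedStream` and `fakeStream` are hypothetical names. The interface only mirrors the two stream methods called in the snippet, so the wrapper can be exercised without the C library. Check the method signatures against the astideepspeech version in use:

```go
package main

import (
	"strconv"
	"sync"
)

// sttStream mirrors the two stream methods used in the issue's snippet
// (hypothetical interface; signatures assumed from how the snippet calls them).
type sttStream interface {
	FeedAudioContent(samples []int16)
	IntermediateDecode() (string, error)
}

// lockedStream serializes every call, so feeding (decoder callback) and
// decoding (ticker goroutine) never overlap.
type lockedStream struct {
	mu sync.Mutex
	s  sttStream
}

func (l *lockedStream) Feed(samples []int16) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.s.FeedAudioContent(samples)
}

func (l *lockedStream) Decode() (string, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.s.IntermediateDecode()
}

// fakeStream stands in for the real stream; it only counts fed samples.
type fakeStream struct{ n int }

func (f *fakeStream) FeedAudioContent(samples []int16) { f.n += len(samples) }

func (f *fakeStream) IntermediateDecode() (string, error) {
	return strconv.Itoa(f.n), nil
}
```

The decoder callback would call `Feed` and the ticker would call `Decode`. Because both go through the same mutex, a decode can never observe a half-fed buffer.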

@asticode
Owner

Can you paste a fully working example that I can run myself?

@jackleibest
Author

My code runs inside a streaming server built on the FFmpeg SDK, so it can't run standalone. Could you show us an example of live-stream transcription?

asticode self-assigned this Aug 19, 2020
@asticode
Owner

I've never used the streaming portion of DeepSpeech, so I'd need some time to work out an example.

Before that, can you confirm that the audio data you're feeding DeepSpeech is 16-bit / 16 kHz PCM samples?
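For reference, 16-bit mono PCM at 16 kHz works out to 32,000 bytes per second. The 5-second decode interval therefore corresponds to 160,000 bytes (80,000 samples). Below is a small sketch for sanity-checking how much audio has actually been fed. `pcmDuration` is a hypothetical helper, and the constants assume mono input:

```go
package main

import "time"

const (
	sampleRate     = 16000 // samples per second
	bytesPerSample = 2     // signed 16-bit PCM
	channels       = 1     // assumption: mono input
)

// pcmDuration reports how much audio n bytes of 16 kHz, 16-bit PCM
// represent. whole is false when n is not a whole number of sample frames,
// which hints that the buffer is not in the expected format or that a
// chunk was split mid-sample.
func pcmDuration(n int) (d time.Duration, whole bool) {
	frameSize := bytesPerSample * channels
	frames := n / frameSize
	d = time.Duration(frames) * time.Second / sampleRate
	return d, n%frameSize == 0
}
```

Logging `pcmDuration(len(sample))` per chunk, or a running total, shows quickly whether roughly 5 seconds of audio really arrive between two decodes.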

@jackleibest
Author

Yes, I resample the audio to 16-bit / 16 kHz PCM. Here is the C code that decodes AAC to PCM:
```c
/* Decode one AAC packet and resample it to 16 kHz mono S16 PCM.
 * aacdec_t is the poster's own struct (not shown); m->frameBuff is a
 * uint8_t * that must start out NULL. */
static int aacdec_decode(aacdec_t *m, uint8_t *data, int len) {
    AVPacket pkt;
    av_init_packet(&pkt);
    pkt.data = data;
    pkt.size = len;
    //av_log(m->ctx, AV_LOG_DEBUG, "decode %p\n", m);
    int got = 0;
    int res = avcodec_decode_audio4(m->ctx, m->f, &got, &pkt);
    if (got) {
        if (m->swr == NULL) {
            /* Lazily set up the resampler: source format -> 16 kHz, mono, S16. */
            m->src_rate = m->ctx->sample_rate;
            m->dst_rate = 16000;
            m->swr = swr_alloc();
            av_opt_set_int(m->swr, "in_channel_layout", m->ctx->channel_layout, 0);
            av_opt_set_int(m->swr, "in_sample_rate", m->src_rate, 0);
            av_opt_set_sample_fmt(m->swr, "in_sample_fmt", m->ctx->sample_fmt, 0);
            av_opt_set_int(m->swr, "out_channel_layout", AV_CH_LAYOUT_MONO, 0);
            av_opt_set_int(m->swr, "out_sample_rate", m->dst_rate, 0);
            av_opt_set_sample_fmt(m->swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
            swr_init(m->swr);
        }
        int dst_linesize;
        int dst_nb_channels = av_get_channel_layout_nb_channels(AV_CH_LAYOUT_MONO);
        int src_nb_samples = m->f->nb_samples;
        /* Upper bound on output samples, including those buffered in the resampler. */
        int dst_nb_samples = av_rescale_rnd(swr_get_delay(m->swr, m->src_rate) + src_nb_samples,
                                            m->dst_rate, m->src_rate, AV_ROUND_UP);
        /* Free the previous frame's buffer (assumed already consumed), then
         * allocate a packed S16 buffer large enough for this frame. */
        av_freep(&m->frameBuff);
        if (av_samples_alloc(&m->frameBuff, &dst_linesize, dst_nb_channels,
                             dst_nb_samples, AV_SAMPLE_FMT_S16, 0) < 0) {
            av_packet_unref(&pkt);
            return AVERROR(ENOMEM);
        }
        int out_samples = swr_convert(m->swr, &m->frameBuff, dst_nb_samples,
                                      (const uint8_t **)m->f->data, src_nb_samples);
        /* Bytes of converted PCM now in m->frameBuff (passing it on is not shown). */
        int total = av_samples_get_buffer_size(&dst_linesize, dst_nb_channels, out_samples,
                                               AV_SAMPLE_FMT_S16, 1);
    }
    av_packet_unref(&pkt);
    return res;
}
```

@asticode
Owner

What do you get from fmt.Println("result: ", result)?

Also, what happens if you save all samples to a .wav file and use this lib's cmd to translate it? In other words, does the non-streaming translation work with your samples?

@jackleibest
Author

jackleibest commented Aug 22, 2020

I just get "result: " with nothing after it. I haven't tried the non-streaming translation with my samples because I couldn't save the PCM to a WAV file. However, when I save the samples to a raw PCM file, I can play them with Audacity or ffplay.

@asticode
Owner

Thing is, we need to be sure your samples give correct results with the non-streaming part of this lib.

To save your samples to a wav file you can get inspiration from this bit of code.
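For reference, one way to do it by hand: a canonical 44-byte RIFF/WAVE header in front of the raw samples is enough for most tools to read the file. Below is a minimal sketch that assumes the samples are already 16-bit little-endian PCM. `wavHeader` and `writeWAV` are hypothetical helpers, not part of this lib:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
)

// wavHeader builds a canonical 44-byte header for dataLen bytes of 16-bit
// PCM. All multi-byte fields are little-endian, as the WAV format requires.
func wavHeader(dataLen, sampleRate, channels int) []byte {
	const bitsPerSample = 16
	blockAlign := channels * bitsPerSample / 8
	var buf bytes.Buffer
	fields := []interface{}{
		[]byte("RIFF"),
		uint32(36 + dataLen), // size of everything after this field
		[]byte("WAVEfmt "),
		uint32(16),                      // fmt chunk size for plain PCM
		uint16(1),                       // audio format 1 = uncompressed PCM
		uint16(channels),
		uint32(sampleRate),
		uint32(sampleRate * blockAlign), // byte rate
		uint16(blockAlign),
		uint16(bitsPerSample),
		[]byte("data"),
		uint32(dataLen), // data chunk size in bytes
	}
	for _, f := range fields {
		// Writing fixed-size values to a bytes.Buffer cannot fail.
		binary.Write(&buf, binary.LittleEndian, f)
	}
	return buf.Bytes()
}

// writeWAV writes the header followed by the raw samples, e.g. to an *os.File.
func writeWAV(w io.Writer, pcm []byte, sampleRate, channels int) error {
	if _, err := w.Write(wavHeader(len(pcm), sampleRate, channels)); err != nil {
		return err
	}
	_, err := w.Write(pcm)
	return err
}
```

Accumulating the decoded bytes and calling `writeWAV(f, pcm, 16000, 1)` on a file created with `os.Create` should give a file that Audacity opens as 16 kHz mono. That file can then be passed to the lib's cmd to compare against the streaming result.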

Once it's done, let me know what results you get from the non-streaming translation.

@jackleibest
Author

OK, thanks. I'll try it someday; I'm too busy with work at the moment.
