
中文的deploy问题 (Chinese deploy issue) #73

Closed · yyhlvdl opened this issue Dec 11, 2017 · 43 comments

yyhlvdl commented Dec 11, 2017

Finally, I got the server and the client running in Docker. Then I spoke a sentence of Chinese and hit this error:
Exception happened during processing of request from ('127.0.0.1', 59312)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python2.7/SocketServer.py", line 652, in init
self.handle()
File "deploy/demo_server.py", line 108, in handle
(finish_time - start_time, transcript))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-48: ordinal not in range(128)
Personally, I think the recognition result could be written to a file instead of printed; there is no real need to print it. Of course, it would be even better if the authors could fix the printing problem.
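For reference, here is a minimal Python 2 sketch of the file-based workaround suggested above; the helper name, file path, and sample transcript are illustrative and not taken from demo_server.py.

# -*- coding: utf-8 -*-
# Hypothetical sketch: append each recognized transcript to a UTF-8 text file
# instead of printing it to an ASCII-configured terminal.
import io

def save_transcript(transcript, path='transcripts.txt'):
    # io.open writes unicode text with an explicit encoding on Python 2 and 3.
    with io.open(path, 'a', encoding='utf-8') as f:
        f.write(transcript + u'\n')

save_transcript(u'今天的天气真好')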

pkuyym (Contributor) commented Dec 11, 2017

@john81529 Do you mean that not a single case can be printed? If so, I suggest you quickly verify whether your Python environment can print Chinese (UTF-8 encoding) normally.
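One quick way to verify this, as an illustrative Python 2 sketch (not part of the repo):

# -*- coding: utf-8 -*-
# Check whether this Python environment / terminal can handle Chinese output.
import locale
import sys

print(sys.stdout.encoding)             # ideally UTF-8, not None or ANSI_X3.4-1968
print(locale.getpreferredencoding())   # the locale's default encoding
print(u'中文测试')                       # raises UnicodeEncodeError on an ASCII-only terminal under Python 2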

yyhlvdl (Author) commented Dec 11, 2017

That problem is solved now. But when I try saying different things, every utterance gives the same single recognition result: 虎.

pkuyym (Contributor) commented Dec 11, 2017

@john81529 Which model are you using?

yyhlvdl (Author) commented Dec 11, 2017

The aishell model you released, the LM model, and vocab.txt.

pkuyym (Contributor) commented Dec 11, 2017

That's odd; we have tried the demo with this model. Could you paste the server-side log so we can take a look?

yyhlvdl (Author) commented Dec 11, 2017

yyh@yyh-System-Product-Name:~/DeepSpeech$ python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.15
beam_size: 500
beta: 0.35
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 2048
share_rnn_weights: False
specgram_type: linear
speech_save_dir: record
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1211 15:07:48.239626  3479 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-11 15:07:50,822 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 15:07:50,823 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 15:07:50,823 layers.py:7411] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 15:07:50,824 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 15:07:50,824 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 15:07:50,824 layers.py:7411] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-11 15:07:55,229 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-11 15:08:11,793 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-11 15:08:11,793 model.py:242] end initializing scorer. Start decoding ...
Response Time: 18.827325
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')
Response Time: 2.891270
('Warm-up Test Case %d: %s', 2, u'asset/data/aishell/wav/test/S0908/BAC009S0908W0175.wav')
Response Time: 2.643388
-----------------------------------------------------------
ASR Server Started.
Received utterance[length=159744] from 127.0.0.1, saved to record/20171211071436_127.0.0.1.wav.
Response Time: 1.431363, Transcript: 虎
Received utterance[length=106496] from 127.0.0.1, saved to record/20171211071448_127.0.0.1.wav.
Response Time: 0.939872, Transcript: 虎
Received utterance[length=4096] from 127.0.0.1, saved to record/20171211071525_127.0.0.1.wav.
Response Time: 0.020550, Transcript: 虎
Received utterance[length=4096] from 127.0.0.1, saved to record/20171211071526_127.0.0.1.wav.
Response Time: 0.025439, Transcript: 虎
Received utterance[length=8192] from 127.0.0.1, saved to record/20171211071527_127.0.0.1.wav.
Response Time: 0.088828, Transcript: 虎
Received utterance[length=118784] from 127.0.0.1, saved to record/20171211072039_127.0.0.1.wav.
Response Time: 1.058048, Transcript: 虎

To reduce the decoding time I changed cutoff_prob to 0.99. Could that be the cause?

pkuyym (Contributor) commented Dec 11, 2017

Why aren't the warm-up transcripts shown? Could you print those as well? Judging from the lengths, the audio you recorded looks abnormal.
Also, when pasting logs, please use block quotes, otherwise the formatting gets messy.

yyhlvdl (Author) commented Dec 11, 2017

yyh@yyh-System-Product-Name:~/DeepSpeech$ python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.5
beam_size: 500
beta: 0.3
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 2048
share_rnn_weights: False
specgram_type: linear
speech_save_dir: record
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1211 16:06:02.273478  3771 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-11 16:06:02,643 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 16:06:02,643 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 16:06:02,644 layers.py:7411] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 16:06:02,644 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 16:06:02,645 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 16:06:02,645 layers.py:7411] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-11 16:06:06,261 model.py:243] begin to initialize the external scorer for decoding
[INFO 2017-12-11 16:06:17,210 model.py:254] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-11 16:06:17,210 model.py:255] end initializing scorer. Start decoding ...
Response Time: 13.063597, Transcript: 虎
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')
Response Time: 2.911145, Transcript: 虎
('Warm-up Test Case %d: %s', 2, u'asset/data/aishell/wav/test/S0908/BAC009S0908W0175.wav')
Response Time: 2.662492, Transcript: 虎
-----------------------------------------------------------
ASR Server Started.

That's what it shows.

pkuyym (Contributor) commented Dec 11, 2017

These results are completely abnormal. Two things need to be checked: 1. whether this is a Chinese printing issue; 2. whether the model itself is the problem.
I suggest running the infer program and checking whether its output is normal.

yyhlvdl (Author) commented Dec 11, 2017

I used the aishell test files for the warm-up. As for "when pasting logs, please use block quotes, otherwise the formatting gets messy": I'm pasting directly into GitHub in the browser.

yyhlvdl (Author) commented Dec 11, 2017

I wrote the server-side result to a txt file and it still shows: 虎, so it should not be a Chinese printing problem. Also, I updated my code to your latest changes and ran infer.py, and got this error:
yyh@yyh-System-Product-Name:~/DeepSpeech$ python infer.py
----------- Configuration Arguments -----------
alpha: 2.5
beam_size: 500
beta: 0.3
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
error_rate_type: wer
infer_manifest: asset/preprocess/test
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_proc_bsearch: 8
num_rnn_layers: 3
num_samples: 10
rnn_layer_size: 2048
share_rnn_weights: True
specgram_type: linear
trainer_count: 1
use_gpu: True
use_gru: False
vocab_path: asset/preprocess/vocab.txt

I1211 16:24:19.635944 6278 Util.cpp:166] commandline: --use_gpu=True --rnn_use_batch=True --trainer_count=1
Traceback (most recent call last):
File "infer.py", line 126, in
main()
File "infer.py", line 122, in main
infer()
File "infer.py", line 73, in infer
num_conv_layers=args.num_conv_layers)
TypeError: __init__() got an unexpected keyword argument 'num_conv_layers'

yyhlvdl (Author) commented Dec 11, 2017

After fixing that, the current infer output is:

yyh@yyh-System-Product-Name:~/DeepSpeech$ python infer.py
-----------  Configuration Arguments -----------
alpha: 2.5
beam_size: 500
beta: 0.3
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
error_rate_type: wer
infer_manifest: asset/preprocess/test
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_proc_bsearch: 8
num_rnn_layers: 3
num_samples: 10
rnn_layer_size: 2048
share_rnn_weights: False
specgram_type: linear
trainer_count: 1
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
------------------------------------------------
I1211 16:27:12.588616  6714 Util.cpp:166] commandline:  --use_gpu=True --rnn_use_batch=True --trainer_count=1 
[INFO 2017-12-11 16:27:13,119 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 16:27:13,120 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 16:27:13,121 layers.py:7411] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 16:27:13,122 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 16:27:13,122 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 16:27:13,123 layers.py:7411] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 16:27:17,467 model.py:243] begin to initialize the external scorer for decoding
[INFO 2017-12-11 16:27:17,556 model.py:254] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-11 16:27:17,556 model.py:255] end initializing scorer. Start decoding ...

Target Transcription: 推行 统一 的 标准 操作 规程 和 技术 规范
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 大力 发展 农业 职业 培养
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 提高 防汛 抗旱 减灾 能力
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 使 其 市值 分秒 间 蒸发 近 四百亿 美元
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 完善 机耕 道 农田 防护 林 等 设施
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 众人 一 起 为 寿 寿星 女 庆生
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 失 孤 等 影片 的 上映
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 对于 谋求 转型 发展 怀揣 创新 型 国家 梦想 的 中国 来说
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 荞麦 窝窝 头 一零 月 二零 日
Output Transcription: 虎
Current error rate [wer] = 1.000000

Target Transcription: 瞬间 将 苹果 估价 拉 低 了 至少 百分 之 六
Output Transcription: 虎
Current error rate [wer] = 1.000000
[INFO 2017-12-11 16:27:25,266 infer.py:114] finish inference

pkuyym (Contributor) commented Dec 11, 2017

That's a bit odd. Could you tell me your repo's commit id?

yyhlvdl (Author) commented Dec 11, 2017

Do you mean the ids of the audio files used for infer?

pkuyym (Contributor) commented Dec 11, 2017

No, I mean the id of the most recent commit in the repo you're using, something like:

commit 4bf526e78d8531551fce1f4d8bfb119e297812d7

yyhlvdl (Author) commented Dec 11, 2017

I honestly don't remember; I downloaded the code about half a month ago. However, today I ran infer both before and after the two changes from the pull request, and the problem is the same either way. Which commit id are you on? Maybe I should just check out the commit that passed your tests and reproduce from there.

pkuyym (Contributor) commented Dec 11, 2017

@yyhlvdl That's easy to check: run
git log
and you'll get the most recent commit id. Also, I don't recommend using code straight from a pull request, because it hasn't been reviewed yet and may still change.

yyhlvdl (Author) commented Dec 11, 2017

yyh@yyh-System-Product-Name:~/DeepSpeech$ git log
commit 4bf526e
Merge: f9ebff7 23e4483
Author: Yibing Liu liuyibing01@baidu.com
Date: Fri Dec 8 20:45:26 2017 +0800

Merge pull request #66 from lispc/develop

fix a comment in audio_featurizer.py

commit 23e4483
Author: lispc mycinbrin@gmail.com
Date: Fri Dec 8 20:20:39 2017 +0800

fix a comment in audio_featurizer.py

commit f9ebff7
Merge: 907898a 20e2258
Author: Yang yaming mxscmxsc@gmail.com
Date: Wed Dec 6 03:58:59 2017 -0600

Merge pull request #58 from pkuyym/fix-56

Simplify parallel part for data processing and fix abnormal exit.

:

yyhlvdl (Author) commented Dec 11, 2017

add_arg('use_gru', bool, True, "Use GRUs instead of simple RNNs.")
add_arg('use_gpu', bool, True, "Use GPU or not.")
add_arg('share_rnn_weights',bool, False, "Share input-hidden weights across "
"bi-directional RNNs. Not for GRU.")
I changed these 3 lines as well, to match the downloaded aishell model, so they differ from the original code.

pkuyym (Contributor) commented Dec 11, 2017

@yyhlvdl I verified the model with the latest repo and it works fine. Here is my log:

-----------  Configuration Arguments -----------
alpha: 2.6
beam_size: 300
beta: 5.0
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
error_rate_type: cer
infer_manifest: data/aishell/manifest.test
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: models/aishell/mean_std.npz
model_path: models/aishell/params.tar.gz
num_conv_layers: 2
num_proc_bsearch: 8
num_rnn_layers: 3
num_samples: 10
rnn_layer_size: 1024
share_rnn_weights: 0
specgram_type: linear
trainer_count: 1
use_gpu: 1
use_gru: 1
vocab_path: models/aishell/vocab.txt
------------------------------------------------
I1211 19:13:57.352583  8539 Util.cpp:166] commandline:  --use_gpu=1 --rnn_use_batch=True --trainer_count=1
[INFO 2017-12-11 19:13:59,120 layers.py:2558] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 19:13:59,122 layers.py:3085] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 19:13:59,124 layers.py:7091] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 19:13:59,125 layers.py:2558] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 19:13:59,127 layers.py:3085] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 19:13:59,127 layers.py:7091] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 19:14:07,219 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-11 19:14:07,480 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-11 19:14:07,481 model.py:242] end initializing scorer. Start decoding ...

Target Transcription: 机场严查匿打火机过安检放在鞋子里算藏匿
Output Transcription: 机场严查拟打火机过安检放在鞋子里算藏匿
Current error rate [cer] = 0.052632

Target Transcription: 使得他们的速度变慢了
Output Transcription: 使得他们的速度变慢了
Current error rate [cer] = 0.000000

Target Transcription: 新京报记者从首都国际机场公安分局相关人员处获悉
Output Transcription: 新京报记者从首都国际机场公安分局相关人员处获悉
Current error rate [cer] = 0.000000

Target Transcription: 个人寄快递必须登记有效的身份证件
Output Transcription: 个人既快递必须登记有效的身份证件
Current error rate [cer] = 0.062500

Target Transcription: 目前挂牌的只有几宗土地
Output Transcription: 目前挂牌的只有几宗土地
Current error rate [cer] = 0.000000

Target Transcription: 顺利获得了冬奥会的主办权
Output Transcription: 顺利获得了冬奥会的主办权
Current error rate [cer] = 0.000000

Target Transcription: 也就是我们常说的超级月亮
Output Transcription: 也就是我们常说的超级月亮
Current error rate [cer] = 0.000000

Target Transcription: 但是由于直销改为经销
Output Transcription: 但是由于直销改为经销
Current error rate [cer] = 0.000000

Target Transcription: 而是因为她也中了天价的招
Output Transcription: 而是因为它也重了天价的招
Current error rate [cer] = 0.166667

Target Transcription: 稳增长措施需更全面地考虑化解楼市风险问题
Output Transcription: 稳增长措施需更全面地考虑化解楼市风险问题
Current error rate [cer] = 0.000000

yyhlvdl (Author) commented Dec 11, 2017

You mean downloading the code directly from the browser, rather than getting it via git?

pkuyym (Contributor) commented Dec 11, 2017

Yes, the code you get either way is identical.

yyhlvdl (Author) commented Dec 11, 2017

Target Transcription: 推行 统一 的 标准 操作 规程 和 技术 规范
Output Transcription: 推行统一的标准操作工场和技术规范
Current error rate [wer] = 1.000000

Target Transcription: 大力 发展 农业 职业 培养
Output Transcription: 大力发展农业职业培养
Current error rate [wer] = 1.000000

Target Transcription: 提高 防汛 抗旱 减灾 能力
Output Transcription: 提高防汛抗旱简单能力
Current error rate [wer] = 1.000000

Target Transcription: 使 其 市值 分秒 间 蒸发 近 四百亿 美元
Output Transcription: 使其市值分秒天蒸发近四百亿美元
Current error rate [wer] = 1.000000

Target Transcription: 完善 机耕 道 农田 防护 林 等 设施
Output Transcription: 完善一根到农田防护林等设施
Current error rate [wer] = 1.000000

Target Transcription: 众人 一 起 为 寿 寿星 女 庆生
Output Transcription: 众人一起为受寿星庆生
Current error rate [wer] = 1.000000

Target Transcription: 失 孤 等 影片 的 上映
Output Transcription: 事故等影片的上映
Current error rate [wer] = 1.000000

Target Transcription: 对于 谋求 转型 发展 怀揣 创新 型 国家 梦想 的 中国 来说
Output Transcription: 对于谋求转型发展怀揣创新型国家梦想的中国来说
Current error rate [wer] = 1.000000

Target Transcription: 荞麦 窝窝 头 一零 月 二零 日
Output Transcription: 美国一零月二零
Current error rate [wer] = 1.000000

Target Transcription: 瞬间 将 苹果 估价 拉 低 了 至少 百分 之 六
Output Transcription: 瞬间将苹果股价拉低了至少百分之六
Current error rate [wer] = 1.000000
Judging from these results, changing rnn_layer_size from 2048 to 1024 is what fixed it.
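For later readers, here is the combination that ended up working with the released aishell model in this thread, collected from the logs above (a checklist sketch, not an authoritative spec):

# Configuration that matched the released aishell model in this thread.
AISHELL_MODEL_CONFIG = {
    'num_conv_layers': 2,
    'num_rnn_layers': 3,
    'rnn_layer_size': 1024,      # 2048 silently produced the garbage output "虎"
    'use_gru': True,
    'share_rnn_weights': False,  # not applicable when GRUs are used
    'specgram_type': 'linear',
}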

yyhlvdl (Author) commented Dec 11, 2017

However, I'm curious about these lines:
share_rnn_weights: 0
use_gpu: 1
use_gru: 1
Each of these three arguments should be either True or False; why are they shown as 0 and 1?

pkuyym (Contributor) commented Dec 11, 2017

That's just a printing issue (how the values are displayed).
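One plausible explanation, as a minimal sketch: if the boolean flags are parsed with distutils.util.strtobool (an assumption about the argument-parsing helper, not confirmed in this thread), they are stored as the integers 0/1 and are printed that way, while still behaving as booleans.

# Illustrative only: why a "bool" flag can be displayed as 0 or 1.
from distutils.util import strtobool

use_gru = strtobool('True')   # returns the int 1, not the bool True
print(use_gru)                # prints 1
if use_gru:                   # still usable as a boolean condition
    print('GRU enabled')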

yyhlvdl (Author) commented Dec 11, 2017

After running the client, I found that recognizing my own voice takes a very long time; I had to wait quite a while. In your experiments, how long does each recognition take?

pkuyym (Contributor) commented Dec 11, 2017

Hardware differs, so a direct comparison isn't very meaningful. On our side, with a K40m GPU and cutoff_prob=0.99, the response time is under 1 s per sample.
Please make sure your cutoff_prob is not 1.0; setting it to 0.99 is fine.

yyhlvdl (Author) commented Dec 11, 2017

yyh@yyh-System-Product-Name:~/DeepSpeech-develop$ python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.6
beam_size: 500
beta: 5.0
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 1024
share_rnn_weights: False
specgram_type: linear
speech_save_dir: result
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1211 20:03:43.777797  3471 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-11 20:03:45,935 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 20:03:45,936 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 20:03:45,936 layers.py:7411] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 20:03:45,937 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 20:03:45,937 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 20:03:45,938 layers.py:7411] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-11 20:03:50,260 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-11 20:04:12,140 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-11 20:04:12,140 model.py:242] end initializing scorer. Start decoding ...
Response Time: 22.201647, Transcript: 政府性违约可能性不大
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')
Response Time: 0.328615, Transcript: 尽管有央行降息等各方利好刺激
('Warm-up Test Case %d: %s', 2, u'asset/data/aishell/wav/test/S0908/BAC009S0908W0175.wav')
Response Time: 0.246450, Transcript: 进而对稳定中国经济有正面作用
-----------------------------------------------------------
ASR Server Started.
Received utterance[length=139264] from 127.0.0.1, saved to result/20171211120425_127.0.0.1.wav.
Response Time: 0.066784, Transcript: 
Received utterance[length=155648] from 127.0.0.1, saved to result/20171211120926_127.0.0.1.wav.
Response Time: 0.080960, Transcript: 

Then it just hangs.

pkuyym (Contributor) commented Dec 11, 2017

@yyhlvdl There's too little information to tell what's wrong; I suggest you debug it a bit first.

yyhlvdl (Author) commented Dec 11, 2017

The server side shows:

yyh@yyh-System-Product-Name:~/DeepSpeech-develop$ python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.6
beam_size: 500
beta: 5.0
cutoff_prob: 0.99
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 1024
share_rnn_weights: False
specgram_type: linear
speech_save_dir: result
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1211 20:17:06.338968  3478 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-11 20:17:08,464 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 20:17:08,464 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 20:17:08,465 layers.py:7411] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-11 20:17:08,465 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 20:17:08,466 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-11 20:17:08,466 layers.py:7411] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-11 20:17:12,363 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-11 20:17:33,157 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-11 20:17:33,157 model.py:242] end initializing scorer. Start decoding ...
Response Time: 21.175204, Transcript: 政府性违约可能性不大
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')
Response Time: 0.210205, Transcript: 尽管有央行降息等各方利好刺激
('Warm-up Test Case %d: %s', 2, u'asset/data/aishell/wav/test/S0908/BAC009S0908W0175.wav')
Response Time: 0.173023, Transcript: 进而对稳定中国经济有正面作用
-----------------------------------------------------------
ASR Server Started.
Received utterance[length=163840] from 127.0.0.1, saved to result/20171211121743_127.0.0.1.wav.
Response Time: 0.077384, Transcript: 

The client side shows:
yyh@yyh-System-Product-Name:~/DeepSpeech-develop$ python -u deploy/demo_client.py
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
Start Recording ... Speech[length=163840] Sent.
Recognition Results:

However, the saved audio file plays back normally.

pkuyym (Contributor) commented Dec 11, 2017

I can't tell from this either; I suggest debugging it yourself. If the saved audio plays back normally, you can build a manifest for it by hand and run infer on it to see whether the result is reasonable.
From the log, the warm-up looks normal, which suggests the model itself is fine.
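A minimal sketch of building such a one-line manifest by hand, assuming the JSON-lines format with audio_filepath / duration / text keys that the repo's data-preparation scripts produce; the output filename manifest.debug and the reference text are illustrative:

# -*- coding: utf-8 -*-
# Wrap one saved recording in a one-line manifest so it can be passed to
# infer.py via --infer_manifest (illustrative sketch).
import contextlib
import io
import json
import wave

wav_path = 'result/20171211121743_127.0.0.1.wav'   # a file saved by demo_server.py

with contextlib.closing(wave.open(wav_path, 'rb')) as w:
    duration = w.getnframes() / float(w.getframerate())

entry = {
    'audio_filepath': wav_path,
    'duration': duration,
    'text': u'今天的天气真好',   # the sentence that was actually spoken
}

with io.open('manifest.debug', 'w', encoding='utf-8') as f:
    f.write(json.dumps(entry, ensure_ascii=False) + u'\n')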

yyhlvdl (Author) commented Dec 11, 2017

Sorry, I forgot to turn on the microphone, so what got recorded was the system audio.
Then I said: 今天的天气真好 ("the weather is really nice today"),
and the recognition result was: 近年来的市场行.
Still, that's workable. Many thanks.

pkuyym (Contributor) commented Dec 11, 2017

Because the amount of training data is small, robustness to noise may not be great. When testing, try to find a quiet environment, speak a bit louder, and enunciate clearly.

yyhlvdl (Author) commented Dec 11, 2017

Thanks a lot.

pkuyym closed this as completed Dec 11, 2017
yyhlvdl (Author) commented Dec 12, 2017

At the moment I'm using the 2.8 GB language model; would using the 80 GB language model give better results?


Pelhans commented Feb 8, 2018

@yyhlvdl Hello, after running demo_server I also get the error
'UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-48: ordinal not in range(128)'.
Then, after changing transcript to transcript.encode("utf-8"), the printed result is always
'煎熬煎熬煎熬煎熬煎熬'.
How did you solve this decoding problem?

yyhlvdl (Author) commented Feb 8, 2018

Sorry, I didn't actually solve it; I just commented that part out in the code.

DmytroSytro commented:

I solved the problem with encoding: transcript.encode('utf-8'). I also had to change the terminal encoding to UTF-8 by hand.
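A minimal Python 2 sketch of that encode-before-formatting fix; the variable names follow the traceback earlier in this thread, but this is not the repo's actual demo_server.py code. For the bytes to display correctly, the terminal itself also needs a UTF-8 locale, as noted in the comment above.

# -*- coding: utf-8 -*-
# Illustrative only: format the UTF-8-encoded byte string instead of the
# unicode object, so an ASCII default encoding no longer raises an error.
import time

start_time = time.time()
transcript = u'今天的天气真好'   # stand-in for the decoder output
finish_time = time.time()

print("Response Time: %f, Transcript: %s" %
      (finish_time - start_time, transcript.encode('utf-8')))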


Pelhans commented Mar 2, 2018

@yyhlvdl OK, thanks for the help!


Pelhans commented Mar 2, 2018

@DimaMcar Does "change encoding in terminal" mean setting LANG=zh_CN.UTF-8?


Pelhans commented Mar 2, 2018

@yyhlvdl @DimaMcar I solved this problem by using the mean_std.npz and vocab.txt from the model package instead of the files generated by run_data.sh. Thanks for your help! @pkuyym maybe there is a small encoding-related bug here.

zhaoqxu-eth commented:

@Pelhans What do you mean by "model package"? I'm facing the same problem. Thanks!


Pelhans commented Aug 16, 2018

@xuzhaoqing I mean that mean_std.npz and vocab.txt should not be the files you generate yourself with run_data.sh; use the two files that come inside the released model archive instead.
