Problem deploying the aishell model #61

Closed
yyhlvdl opened this issue Dec 5, 2017 · 26 comments

Comments

@yyhlvdl

yyhlvdl commented Dec 5, 2017

I used your released aishell model as-is and ran python deploy/demo_server.py, which failed with the following error:

root@095d9ada1b1d:/DeepSpeech# python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.15
beam_size: 500
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 2048
share_rnn_weights: False
specgram_type: linear
speech_save_dir: demo_cache
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1205 10:14:34.175657    15 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-05 10:14:35,626 layers.py:2606] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 10:14:35,626 layers.py:3133] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 10:14:35,627 layers.py:7224] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 10:14:35,627 layers.py:2606] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-05 10:14:35,628 layers.py:3133] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-05 10:14:35,628 layers.py:7224] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-05 10:14:42,337 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-05 10:14:50,941 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-05 10:14:50,941 model.py:242] end initializing scorer. Start decoding ...
Traceback (most recent call last):
  File "deploy/demo_server.py", line 224, in <module>
    main()
  File "deploy/demo_server.py", line 220, in main
    start_server()
  File "deploy/demo_server.py", line 204, in start_server
    num_test_cases=3)
  File "deploy/demo_server.py", line 143, in warm_up_test
    (finish_time - start_time, transcript))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-94: ordinal not in range(128)

So I commented out transcript and ran it again, and it got past that point. However:

[INFO 2017-12-05 10:46:41,054 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-05 10:46:42,193 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-05 10:46:42,193 model.py:242] end initializing scorer. Start decoding ...
Response Time: 1174.020508
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')

A single file takes 1174 s, which is a very long time. Is there any way to speed this up?
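Editorial note on the UnicodeEncodeError above: it is raised when the Chinese transcript is formatted into the log line and printed to a console whose encoding is ASCII, which is common inside a Docker container with no locale configured. The reported position 40 matches the length of the "Response Time: ..., Transcript: " prefix. Rather than commenting out the transcript, one option is to encode the line explicitly before writing it. The following is a minimal sketch under that assumption; `format_warmup_line` and `encode_for_console` are hypothetical helper names, not functions from deploy/demo_server.py.

```python
# -*- coding: utf-8 -*-
import sys


def format_warmup_line(elapsed_seconds, transcript):
    """Build the log line as text, without touching the console."""
    return u"Response Time: %f, Transcript: %s" % (elapsed_seconds, transcript)


def encode_for_console(text, encoding="utf-8"):
    """Encode explicitly so an ASCII-only stdout cannot raise UnicodeEncodeError."""
    return text.encode(encoding, "replace")


if __name__ == "__main__":
    line = format_warmup_line(1.25, u"今天天气很好")
    # Python 3 exposes a byte stream as sys.stdout.buffer; Python 2's stdout accepts bytes directly.
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(encode_for_console(line) + b"\n")
```

Setting the environment variable PYTHONIOENCODING=utf-8 before starting the server is another way to get the same effect without changing the code.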

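@yyhlvdl
Author

yyhlvdl commented Dec 5, 2017

Also, after I updated to the new code (the change that simplified data-parallel processing), the time became absurdly long. Does this really require very high-end hardware? With my single GPU it feels like it will never finish.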

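@kuke
Contributor

kuke commented Dec 5, 2017

@john81529 That time is not normal. Recognizing one utterance usually takes less than 1 s. You could first run the run_infer_golden.sh script and check the average processing time per utterance.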

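@yyhlvdl
Author

yyhlvdl commented Dec 5, 2017

OK. It does feel like the aishell model is not as good as the librispeech model, so I am really looking forward to the cn1.2k model. I hope to be able to recognize my own Chinese speech, with a setup that feels as complete as the librispeech one.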

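@pkuyym
Contributor

pkuyym commented Dec 5, 2017

@john81529 Here is the thing: could you try setting cutoff_prob to 0.99? That should speed things up a lot. The CER of cn1.2k is decent, but its recognition of real human speech is not very stable. We suspect this has a lot to do with the training data.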
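Editorial note on why lowering cutoff_prob can help: in a CTC beam search, each time step offers one candidate per vocabulary entry, and a Chinese character vocabulary has thousands of entries. Cumulative-probability pruning keeps only the most likely characters until their total probability reaches cutoff_prob, so the beam is extended with far fewer candidates. The sketch below illustrates that idea only; it is not the project's decoder code, and `prune_candidates` is a hypothetical name.

```python
def prune_candidates(probs, cutoff_prob=1.0, cutoff_top_n=40):
    """Return (index, prob) pairs kept for one time step.

    Candidates are taken in order of decreasing probability until their
    cumulative probability reaches cutoff_prob or cutoff_top_n are kept.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for index, prob in ranked:
        if len(kept) >= cutoff_top_n:
            break
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= cutoff_prob:
            break
    return kept


if __name__ == "__main__":
    step_probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
    print(len(prune_candidates(step_probs, cutoff_prob=1.0)))
    print(len(prune_candidates(step_probs, cutoff_prob=0.8)))
```

In this sketch, cutoff_prob=1.0 never stops early on a peaked distribution, so only the cutoff_top_n cap limits the work, while a value slightly below 1.0 usually keeps just a handful of characters per step.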

@yyhlvdl
Author

yyhlvdl commented Dec 5, 2017

Right. There really is not much good Chinese speech data, or at least very little of it is open source.

@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

root@b203ab126266:/DeepSpeech# python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.15
beam_size: 500
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 2048
share_rnn_weights: False
specgram_type: linear
speech_save_dir: demo_cache
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1205 11:37:54.343217    15 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-05 11:37:55,704 layers.py:2606] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 11:37:55,705 layers.py:3133] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 11:37:55,705 layers.py:7224] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 11:37:55,706 layers.py:2606] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-05 11:37:55,706 layers.py:3133] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-05 11:37:55,706 layers.py:7224] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-05 11:37:59,694 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-05 11:37:59,770 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-05 11:37:59,770 model.py:242] end initializing scorer. Start decoding ...
Response Time: 1542.750508
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')
Response Time: 1605.342387
('Warm-up Test Case %d: %s', 2, u'asset/data/aishell/wav/test/S0908/BAC009S0908W0175.wav')
Response Time: 1449.373175
-----------------------------------------------------------
ASR Server Started.

Actually it is fine; it does not take that long.

@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

However, the client fails with this error:
python -u deploy/demo_client.py
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
Start Recording ... Traceback (most recent call last):
File "deploy/demo_client.py", line 58, in callback
sock.connect((args.host_ip, args.host_port))
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused
I ran this file from a terminal, because the Docker image does not have the pynput module. Also, my server and client are on the same machine.

@pkuyym
Contributor

pkuyym commented Dec 6, 2017

This part looks like a hardware or driver problem?

ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map

As for this part, it would be best to confirm that the client can actually connect to the server?

File "deploy/demo_client.py", line 58, in callback
sock.connect((args.host_ip, args.host_port))
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused
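Editorial note: one way to confirm whether the client can reach the server, independent of audio recording, is a plain TCP connection test against the configured host_ip and host_port (localhost and 8086 in the configuration above). This is a minimal sketch; `can_connect` is a hypothetical helper and not part of deploy/demo_client.py.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Covers "Connection refused" as well as timeouts.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Values taken from the server configuration shown earlier in the thread.
    print(can_connect("localhost", 8086))
```

If this returns False while the server log shows "ASR Server Started.", the server is probably listening on an address that the client's network namespace cannot reach.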

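@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

The former is not a concern; the ALSA errors should not affect usage. The main problem is the latter: the client does not seem to connect to the server. I have now switched to 127.0.0.1 and am waiting for the program to finish.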

@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

The connection still fails.

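@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

Judging from the code, I think the cause is that the host cannot reach the Docker container. What was your setup when you tested this? My server runs inside Docker, and the client runs on the host.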

@pkuyym
Contributor

pkuyym commented Dec 6, 2017

Then this is probably a Docker issue. You could search for how a host communicates with a Docker container.

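@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

Right, that is probably the problem. Still, I sincerely hope PaddlePaddle can offer a more comfortable installation method, even though running under Docker is indeed convenient.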

@pkuyym
Contributor

pkuyym commented Dec 6, 2017

@john81529 We have all set up the DS2 environment locally and did not run into any major problems.

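@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

So you do not run deepspeech in a Docker environment, but use a PaddlePaddle built from source or installed from the deb? I have to say, though, that I have never succeeded with the source or the deb downloaded from the official site, and the cur link you gave me last time would not download either, so running under Docker was my only option.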

@pkuyym
Contributor

pkuyym commented Dec 6, 2017

Yes, that is right. PaddlePaddle is either built from source or installed with pip install, and deepspeech is built from source. Platform compatibility is pretty good.

@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

Isn't the pip install version rather old? And there is no deb package for the latest Paddle. I will try building from source.

@pkuyym
Contributor

pkuyym commented Dec 6, 2017

@john81529 The pip install version is not old. I remember giving you a link earlier; downloading from there should work.

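@yyhlvdl
Author

yyhlvdl commented Dec 6, 2017

To be honest, none of the versions downloaded successfully, whether from the official site or from the link you gave me. I will build from source first.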

@pkuyym
Contributor

pkuyym commented Dec 7, 2017

@john81529 Something may have gone wrong on your side. In our tests, both building from source and pip install work fine.

@kuke
Contributor

kuke commented Dec 7, 2017

@john81529 Did you get the source with git clone https://github.com/PaddlePaddle/Paddle.git ? Any other method cannot guarantee that the source is the latest.

@pkuyym
Contributor

pkuyym commented Dec 7, 2017

@john81529 Of course not. Your error is quite clear: downloading the mkldnn package failed.

CMakeFiles/extern_mklml.dir/build.make:88: recipe for target 'third_party/mklml/src/extern_mklml-stamp/extern_mklml-download' failed

See PaddlePaddle/Paddle#5508 for reference.

Also, please open a new issue for each separate problem.

@pkuyym
Contributor

pkuyym commented Dec 7, 2017

For Paddle build problems, I suggest opening an issue in the Paddle repository:
https://github.com/PaddlePaddle/Paddle/issues

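Also, for a log this long, it is best to wrap it in a quote block so the formatting is stripped; otherwise it displays very badly.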

@yyhlvdl yyhlvdl closed this as completed Dec 7, 2017
@Pelhans

Pelhans commented Feb 5, 2018

@yyhlvdl How did you solve the communication problem between Docker and the host? I get the same error as you:
socket.error: [Errno 111] Connection refused

@yyhlvdl
Author

yyhlvdl commented Feb 5, 2018

@Pelhans Add net=host to the command (presumably the docker run command, where the option is written --net=host).

@Pelhans

Pelhans commented Feb 5, 2018

@yyhlvdl Thanks~


4 participants