
Calling LTP from multiple processes fails #42

Closed
hitakaken opened this issue May 17, 2016 · 7 comments

@hitakaken

I want to call LTP from Python using a multiprocessing pool:

from multiprocessing import Pool

if __name__ == '__main__':
    p = Pool(int(arguments[u'--count']))
    for page in range(start, end):
        p.apply_async(segment_task, args=(arguments[u'LTP_DATA_MODEL'], page, ))
    print u'Waiting for all subprocesses done...'
    p.close()
    p.join()

The child process loads the models with the following code:

def segment_task(model_path, page):
    segmentor = Segmentor()
    segmentor.load(os.path.join(model_path, "cws.model"))
    print u'load segmentor'
    postagger = Postagger()
    postagger.load(os.path.join(model_path, "pos.model"))
    print u'load postagger'
    parser = Parser()
    parser.load(os.path.join(model_path, "parser.model"))
    print u'load parser'
    recognizer = NamedEntityRecognizer()
    recognizer.load(os.path.join(model_path, "ner.model"))
    print u'load recognizer'
    labeller = SementicRoleLabeller()
    labeller.load(os.path.join(model_path, "srl/"))
    print u'load labeller'

The console only prints "load segmentor" and then the process exits.

How should I call LTP from multiple processes?

@Oneplus
Member

Oneplus commented May 17, 2016

Each module should be a singleton; there is no need to load it repeatedly.
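
A minimal sketch of the structure being suggested, using the same pyltp calls as above (the model path is a placeholder; a full runnable example follows below):

import os
from pyltp import Segmentor

model_path = '/path/to/ltp_data'      # placeholder

segmentor = Segmentor()               # created and loaded once, at module level
segmentor.load(os.path.join(model_path, "cws.model"))

def segment_task(page):
    return segmentor.segment(page)    # workers only use the already-loaded model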

@hitakaken
Author

If instead I load the LTP modules globally, the subprocesses exit immediately after the pool starts them:

Waiting for all subprocesses done...
Finish!

@Oneplus
Member

Oneplus commented May 17, 2016

import os
from multiprocessing import Pool
from pyltp import Segmentor, Postagger, Parser, NamedEntityRecognizer, SementicRoleLabeller

model_path = '/data/ltp/ltp-models/ltp_data/'

segmentor = Segmentor()
segmentor.load(os.path.join(model_path, "cws.model"))
print u'loaded segmentor'

postagger = Postagger()
postagger.load(os.path.join(model_path, "pos.model"))
print u'loaded postagger'

parser = Parser()
parser.load(os.path.join(model_path, "parser.model"))
print u'loaded parser'

recognizer = NamedEntityRecognizer()
recognizer.load(os.path.join(model_path, "ner.model"))
print u'loaded recognizer'

labeller = SementicRoleLabeller()
labeller.load(os.path.join(model_path, "srl/"))
print u'loaded labeller'

def task(page):
    print 'input: ', page
    result = segmentor.segment(page)
    print 'seg for %s: %s' % (page, '|'.join([str(_) for _ in result]))

    result1 = postagger.postag(result)
    print 'pos for %s: %s' % (page, '|'.join([str(_) for _ in result1]))


if __name__ == '__main__':
    pages = ['测试句子一。',
            '测试句子二。',
            '测试句子三。',
            '测试句子四。'
            ]
    p = Pool(3)

    for page in pages:
        p.apply_async(task, args=(page, ))
    print u'Waiting for all subprocesses done...'
    p.close()
    p.join()

I wrote a quick test; here is my output:

[yjliu@gpu03 coling2016]$ python test.py     
loaded segmentor
loaded postagger
loaded parser
loaded recognizer
loaded labeller
Waiting for all subprocesses done...
input:  测试句子一。
input:  测试句子二。
input:  测试句子三。
seg for 测试句子三。: 测试|句子|三|。
seg for 测试句子二。: 测试|句子|二|。
seg for 测试句子一。: 测试|句子一|。
pos for 测试句子一。: v|n|wp
pos for 测试句子三。: v|n|m|wp
pos for 测试句子二。: v|n|m|wp
input:  测试句子四。
seg for 测试句子四。: 测试|句子|四|。
pos for 测试句子四。: v|n|m|wp

For reference.

@hitakaken
Author

It turns out I was passing the components as arguments, which is what caused the error. Calling it this way, Python multiprocessing works fine. But does LTP itself still do the analysis in a single process?
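
For anyone hitting the same silent exit: apply_async discards exceptions raised while submitting or running a task unless you keep the AsyncResult and call get() on it, so passing an unpicklable object (a loaded pyltp component, most likely) can fail without any traceback. A minimal sketch of that debugging pattern, with a placeholder model path:

# -*- coding: utf-8 -*-
import os
from multiprocessing import Pool
from pyltp import Segmentor

model_path = '/path/to/ltp_data'                 # placeholder
segmentor = Segmentor()                          # loaded once, inherited by the workers
segmentor.load(os.path.join(model_path, "cws.model"))

def task(page):
    # Only the picklable string `page` crosses the process boundary.
    return '|'.join(str(w) for w in segmentor.segment(page))

if __name__ == '__main__':
    pages = ['测试句子一。', '测试句子二。']
    p = Pool(2)
    results = [p.apply_async(task, args=(page, )) for page in pages]
    p.close()
    p.join()
    for r in results:
        print r.get()    # re-raises any worker exception instead of failing silently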

@Oneplus
Member

Oneplus commented May 17, 2016

LTP's own multithreading mechanism has multiple threads sharing read-only access to the same model pointer. It is hard to say how LTP behaves under multiprocessing.
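
For what it's worth, printing the worker PID inside the task is a quick way to see what multiprocessing itself is doing: with a Pool on Linux each worker is forked from the parent, so the models loaded at import time exist once per process (copy-on-write), independent of LTP's internal threading. A minimal sketch, not specific to pyltp:

import os
from multiprocessing import Pool

def task(page):
    # Each Pool worker is a separate OS process with its own view of
    # whatever the parent had loaded at fork time.
    print 'pid %d got %s' % (os.getpid(), page)

if __name__ == '__main__':
    p = Pool(3)
    for page in ['a', 'b', 'c', 'd']:
        p.apply_async(task, args=(page, ))
    p.close()
    p.join()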

@hitakaken
Author

hitakaken commented May 17, 2016

Good to know, thanks. It does seem to still be running in single-process mode; in my task manager only one CPU core shows high utilization.

@xjtushilei

With LTP's threads sharing read access to one model, is multithreaded speed affected?
