Pipelines

Kernel Pipeline

The CkipPipeline connects the drivers for sentence segmentation, word segmentation, part-of-speech tagging, named-entity recognition, and sentence parsing.

The CkipDocument is the workspace of the CkipPipeline and holds its input and output data. Note that the CkipPipeline stores its results in the CkipDocument in place.

The CkipPipeline computes all necessary dependencies. For example, if one calls get_ner with only raw-text input, the pipeline will automatically call get_text, get_ws, and get_pos first.
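The dependency resolution described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the ckipnlp implementation: ToyDocument, ToyPipeline, DEPENDS_ON, and the placeholder computations are all hypothetical names invented for illustration. It shows a workspace object being filled in place, with missing upstream stages computed on demand.

```python
# Toy sketch of lazy dependency resolution; NOT the ckipnlp implementation.
# ToyDocument, ToyPipeline and DEPENDS_ON are hypothetical names.


class ToyDocument:
    """Workspace with one slot per stage; None means 'not computed yet'."""

    def __init__(self, raw=None):
        self.raw = raw
        self.text = None
        self.ws = None
        self.pos = None
        self.ner = None


class ToyPipeline:
    # Each stage lists the stages whose results it needs (assumed layout).
    DEPENDS_ON = {
        'text': [],
        'ws': ['text'],
        'pos': ['ws'],
        'ner': ['ws', 'pos'],
    }

    def __init__(self):
        self.calls = []  # records the order in which stages actually run

    def get(self, doc, stage):
        """Fill doc.<stage> in place, computing missing dependencies first."""
        if getattr(doc, stage) is not None:
            return getattr(doc, stage)  # already computed, nothing to do
        for dep in self.DEPENDS_ON[stage]:
            self.get(doc, dep)
        self.calls.append(stage)
        setattr(doc, stage, self._run(doc, stage))
        return getattr(doc, stage)

    def _run(self, doc, stage):
        # Placeholder computations standing in for real drivers.
        if stage == 'text':
            return doc.raw.split(',')
        if stage == 'ws':
            return [list(sentence) for sentence in doc.text]
        if stage == 'pos':
            return [['X'] * len(words) for words in doc.ws]
        if stage == 'ner':
            return [[] for _ in doc.ws]
        raise KeyError(stage)


doc = ToyDocument(raw='ab,cd')
pipeline = ToyPipeline()
pipeline.get(doc, 'ner')  # only 'ner' is requested
print(pipeline.calls)     # upstream stages were run first
```

Requesting a stage that is already present in the workspace does nothing, which is why the document can be passed through several calls without repeating work.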

from ckipnlp.pipeline import CkipPipeline, CkipDocument

pipeline = CkipPipeline()
doc = CkipDocument(raw='中文字耶,啊哈哈哈')

# Word Segmentation
pipeline.get_ws(doc)
print(doc.ws)
for line in doc.ws:
    print(line.to_text())

# Part-of-Speech Tagging
pipeline.get_pos(doc)
print(doc.pos)
for line in doc.pos:
    print(line.to_text())

# Named-Entity Recognition
pipeline.get_ner(doc)
print(doc.ner)

# Constituency Parsing
pipeline.get_conparse(doc)
print(doc.conparse)

################################################################

from ckipnlp.container.util.wspos import WsPosParagraph

# Word Segmentation & Part-of-Speech Tagging
for line in WsPosParagraph.to_text(doc.ws, doc.pos):
    print(line)

To customize a driver (e.g. to disable CUDA in the word segmenter), you may pass options to the pipeline:

pipeline = CkipPipeline(opts={'word_segmenter': {'disable_cuda': True}})

Please refer to each driver's documentation for the extra options.
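To make the shape of the opts argument concrete, here is a minimal sketch of how a nested options dictionary could be forwarded to individual drivers. It assumes nothing about how ckipnlp does this internally: OptsPipeline, ToyWordSegmenter, ToyPosTagger, and the 'pos_tagger' key are hypothetical, and only the 'word_segmenter' key and the disable_cuda option come from the example above.

```python
# Hypothetical sketch of routing per-driver options; not the ckipnlp internals.


class ToyWordSegmenter:
    def __init__(self, *, disable_cuda=False):
        self.disable_cuda = disable_cuda


class ToyPosTagger:
    def __init__(self, *, disable_cuda=False):
        self.disable_cuda = disable_cuda


class OptsPipeline:
    # Maps each option key to the driver class it configures (assumed layout).
    DRIVERS = {
        'word_segmenter': ToyWordSegmenter,
        'pos_tagger': ToyPosTagger,
    }

    def __init__(self, opts=None):
        opts = opts or {}
        unknown = set(opts) - set(self.DRIVERS)
        if unknown:
            # Fail early on misspelled driver names instead of ignoring them.
            raise ValueError(f'unknown driver(s): {sorted(unknown)}')
        # Each driver receives only its own sub-dictionary as keyword arguments.
        self.drivers = {
            name: cls(**opts.get(name, {}))
            for name, cls in self.DRIVERS.items()
        }


pipeline = OptsPipeline(opts={'word_segmenter': {'disable_cuda': True}})
print(pipeline.drivers['word_segmenter'].disable_cuda)
print(pipeline.drivers['pos_tagger'].disable_cuda)
```

In this sketch, drivers that are not mentioned in opts keep their defaults, so only the overrides need to be spelled out.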

Co-Reference Pipeline

The CkipCorefPipeline is an extension of the CkipPipeline that adds coreference resolution. The pipeline first performs named-entity recognition as the CkipPipeline does, then applies alignment algorithms to fix the word-segmentation and part-of-speech tagging outputs, and finally performs coreference resolution based on the sentence parsing result.

The CkipCorefDocument is the workspace of the CkipCorefPipeline and holds its input and output data. Note that the CkipCorefPipeline stores its results in the CkipCorefDocument.

from ckipnlp.pipeline import CkipCorefPipeline, CkipDocument

pipeline = CkipCorefPipeline()
doc = CkipDocument(raw='畢卡索他想,完蛋了')

# Co-Reference
corefdoc = pipeline(doc)
print(corefdoc.coref)
for line in corefdoc.coref:
    print(line.to_text())