Update documentation.
emfomy committed Apr 23, 2020
1 parent 752301b commit 05c354d
Showing 34 changed files with 2,325 additions and 78 deletions.
1 change: 0 additions & 1 deletion .pylintrc
@@ -5,7 +5,6 @@ disable =
   bad-continuation,
   duplicate-code,
   logging-fstring-interpolation,
-  missing-docstring,
   too-few-public-methods,
   too-many-ancestors,
   too-many-branches,
114 changes: 112 additions & 2 deletions README.rst
@@ -1,6 +1,18 @@
Introduction
============

Official CKIP CoreNLP Toolkits

Features
--------

- Sentence Segmentation
- Word Segmentation
- Part-of-Speech Tagging
- Sentence Parsing
- Named-Entity Recognition
- Co-Reference Detection

Git
---

@@ -74,6 +86,9 @@ External Links

- `Online Demo <https://ckip.iis.sinica.edu.tw/service/corenlp>`_

Installation
============

Requirements
------------

@@ -83,14 +98,109 @@ Requirements
* `CkipTagger <https://pypi.org/project/ckiptagger>`_ 0.1.1+ [Optional, Recommended]
* `CkipClassic <https://ckip-classic.readthedocs.io>`_ 1.0+ [Optional]

Tool Requirements
-----------------

================================  ==========  ============  =============
Tool                              Built-in    CkipTagger    CkipClassic
================================  ==========  ============  =============
Sentence Segmentation             ✔
Word Segmentation†                            ✔             ✔
Part-of-Speech Tagging†                       ✔             ✔
Sentence Parsing                                            ✔
Named-Entity Recognition                      ✔
Co-Reference Detection‡           ✔           ✔             ✔
================================  ==========  ============  =============

- † These tools require only one of the two backends (CkipTagger or CkipClassic).
- ‡ The co-reference implementation itself does not require any backend, but it needs the results of word segmentation, part-of-speech tagging, sentence parsing, and named-entity recognition.
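
The requirements above can be read as a small availability rule. The following is a minimal sketch in plain Python, assuming that sentence parsing is provided by CkipClassic and named-entity recognition by CkipTagger. The names ``TOOL_BACKENDS``, ``COREF_INPUTS``, and ``available_tools`` are hypothetical and are not part of the ckipnlp API.

```python
# Hypothetical encoding of the tool requirements; not part of ckipnlp.
# Each tool maps to the set of backends that can provide it.
# An empty set means the tool is built in and needs no backend.
TOOL_BACKENDS = {
    "sentence_segmentation": set(),
    "word_segmentation": {"tagger", "classic"},
    "pos_tagging": {"tagger", "classic"},
    "sentence_parsing": {"classic"},           # assumption: CkipClassic only
    "named_entity_recognition": {"tagger"},    # assumption: CkipTagger only
}

# Per the double-dagger footnote, co-reference needs the results of these tools.
COREF_INPUTS = (
    "word_segmentation",
    "pos_tagging",
    "sentence_parsing",
    "named_entity_recognition",
)


def available_tools(installed):
    """Return the set of tools usable with the given installed backends."""
    installed = set(installed)
    tools = {
        name
        for name, backends in TOOL_BACKENDS.items()
        if not backends or backends & installed
    }
    # Co-reference has no backend of its own but depends on all four inputs.
    if all(name in tools for name in COREF_INPUTS):
        tools.add("coreference")
    return tools
```

Under these assumptions, co-reference becomes available only when both backends are installed, because no single backend covers both sentence parsing and named-entity recognition.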

Installation via Pip
--------------------

- No backend (not recommended): ``pip install ckipnlp``.
- With CkipTagger backend (recommended): ``pip install ckipnlp[tagger]``.
- With CkipClassic backend: ``pip install ckipnlp[classic]``.
- With both backends: ``pip install ckipnlp[tagger,classic]``.

Please refer to https://ckip-classic.readthedocs.io for the CkipClassic installation guide.
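
After installing, it can be useful to check which optional backends are importable. The sketch below uses only the standard library. The import names ``ckiptagger`` and ``ckip_classic`` are assumptions about how the two packages are imported, and ``installed_backends`` is a hypothetical helper, not a ckipnlp function.

```python
import importlib.util

# Assumed import names for the optional backends (not confirmed by this README).
BACKEND_MODULES = {
    "tagger": "ckiptagger",
    "classic": "ckip_classic",
}


def installed_backends(modules=BACKEND_MODULES):
    """Map each backend label to True if its top-level module can be found."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in modules.items()
    }
```

Calling ``installed_backends()`` returns a dict such as ``{'tagger': ..., 'classic': ...}`` whose values depend on the local environment.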

Usage
=====

See http://ckipnlp.readthedocs.io/en/latest/_api/ckipnlp.html for API details.

Pipeline
--------

.. image:: ../_static/image/pipeline.svg

.. code-block:: python

   import ckipnlp
   print(ckipnlp.__name__, ckipnlp.__version__)

   ################################################################

   from ckipnlp.pipeline import CkipPipeline, CkipDocument

   pipeline = CkipPipeline()
   doc = CkipDocument(
       raw='中文字喔,啊哈哈哈',
   )

   # Word Segmentation
   pipeline.get_ws(doc)
   print(doc.ws)
   for line in doc.ws:
       print(line.to_text())

   # Part-of-Speech Tagging
   pipeline.get_pos(doc)
   print(doc.pos)
   for line in doc.pos:
       print(line.to_text())

   # Sentence Parsing
   pipeline.get_parsed(doc)
   print(doc.parsed)

   # Named-Entity Recognition
   pipeline.get_ner(doc)
   print(doc.ner)

   ################################################################

   from ckipnlp.container.wspos import WsPosParagraph

   # Word Segmentation & Part-of-Speech Tagging
   for line in WsPosParagraph.to_text(doc.ws, doc.pos):
       print(line)

Co-Reference Pipeline
---------------------

.. image:: ../_static/image/coref_pipeline.svg

.. code-block:: python

   import ckipnlp
   print(ckipnlp.__name__, ckipnlp.__version__)

   ################################################################

   from ckipnlp.pipeline import CkipCorefPipeline, CkipDocument

   pipeline = CkipCorefPipeline()
   doc = CkipDocument(
       raw='畢卡索他想,完蛋了',
   )

-FAQ
-===
   # Co-Reference
   corefdoc = pipeline(doc)
   print(corefdoc.coref)
   for line in corefdoc.coref:
       print(line.to_text())

License
=======
4 changes: 4 additions & 0 deletions ckipnlp/__init__.py
@@ -1,6 +1,10 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

"""
The Official CKIP CoreNLP Toolkits.
"""

__author_name__ = 'Mu Yang'
__author_email__ = 'emfomy@gmail.com'
__copyright__ = '2018-2020 CKIP Lab'
20 changes: 10 additions & 10 deletions ckipnlp/container/coref.py
@@ -64,7 +64,7 @@ class CorefToken(_BaseTuple, _CorefToken):
     .. code-block:: python
         {
-            'word': '畢卡索', # token word
+            'word': '畢卡索', # token word
             'coref': (0, 'source'), # coref ID and type
             'idx': 2, # node index
         }
@@ -106,9 +106,9 @@ class CorefSentence(_BaseSentence):
     .. code-block:: python
         [
-            { word: '畢卡索', coref: (0, 'source'), idx: 2, }, # coref-token 1
-            { word: '他', coref: (0, 'target'), idx: 3, }, # coref-token 2
-            { word: '想', coref: None, idx: 4, }, # coref-token 3
+            { 'word': '畢卡索', 'coref': (0, 'source'), 'idx': 2, }, # coref-token 1
+            { 'word': '他', 'coref': (0, 'target'), 'idx': 3, }, # coref-token 2
+            { 'word': '想', 'coref': None, 'idx': 4, }, # coref-token 3
         ]
     List format
@@ -154,14 +154,14 @@ class CorefParagraph(_BaseList):
         [
             [ # Sentence 1
-                { word: '畢卡索', coref: (0, 'source'), idx: 2, },
-                { word: '他', coref: (0, 'target'), idx: 3, },
-                { word: '想', coref: None, idx: 4, },
+                { 'word': '畢卡索', 'coref': (0, 'source'), 'idx': 2, },
+                { 'word': '他', 'coref': (0, 'target'), 'idx': 3, },
+                { 'word': '想', 'coref': None, 'idx': 4, },
             ],
             [ # Sentence 2
-                { word: None, coref: (0, 'zero'), None, },
-                { word: '完蛋', coref: None, idx: 1, },
-                { word: '了', coref: None, idx: 2, },
+                { 'word': None, 'coref': (0, 'zero'), 'idx': None, },
+                { 'word': '完蛋', 'coref': None, 'idx': 1, },
+                { 'word': '了', 'coref': None, 'idx': 2, },
             ],
         ]
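
The paragraph structure in this docstring can be walked to collect every mention that shares a coref ID. The following is a minimal sketch over plain dicts and lists; it does not use ckipnlp's ``CorefParagraph`` class, ``group_mentions`` is a hypothetical helper, and the zero-pronoun token is assumed to carry ``'idx': None``.

```python
from collections import defaultdict

# Plain-Python copy of the documented example (assumed shape, not a ckipnlp object).
paragraph = [
    [  # Sentence 1
        {'word': '畢卡索', 'coref': (0, 'source'), 'idx': 2},
        {'word': '他', 'coref': (0, 'target'), 'idx': 3},
        {'word': '想', 'coref': None, 'idx': 4},
    ],
    [  # Sentence 2
        {'word': None, 'coref': (0, 'zero'), 'idx': None},
        {'word': '完蛋', 'coref': None, 'idx': 1},
        {'word': '了', 'coref': None, 'idx': 2},
    ],
]


def group_mentions(paragraph):
    """Map each coref ID to its (sentence index, word, coref type) mentions."""
    chains = defaultdict(list)
    for sent_idx, sentence in enumerate(paragraph):
        for token in sentence:
            if token['coref'] is None:
                continue  # token takes part in no co-reference chain
            coref_id, coref_type = token['coref']
            chains[coref_id].append((sent_idx, token['word'], coref_type))
    return dict(chains)
```

Here ``group_mentions(paragraph)`` yields a single chain with ID 0 that links the source 畢卡索, the target 他, and the zero pronoun in the second sentence.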
6 changes: 3 additions & 3 deletions ckipnlp/container/ner.py
@@ -28,7 +28,7 @@ class _NerToken(_NamedTuple):
     idx: _Tuple[int, int]

 class NerToken(_BaseTuple, _NerToken):
-    """A NER token.
+    """A named-entity recognition token.
     Attributes
     ----------
@@ -101,7 +101,7 @@ def to_tagger(self):
 ################################################################################################################################

 class NerSentence(_BaseSentence):
-    """A list of NER sentence.
+    """A named-entity recognition sentence.
     .. admonition:: Data Structure Examples
@@ -158,7 +158,7 @@ def to_tagger(self):
 ################################################################################################################################

 class NerParagraph(_BaseList):
-    """A list of NER sentence.
+    """A list of named-entity recognition sentences.
     .. admonition:: Data Structure Examples
10 changes: 10 additions & 0 deletions ckipnlp/container/tree/__init__.py
@@ -0,0 +1,10 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

"""
This module implements specialized tree containers for CKIPNLP.
"""

__author__ = 'Mu Yang <http://muyang.pro>'
__copyright__ = '2018-2020 CKIP Lab'
__license__ = 'CC BY-NC-SA 4.0'
4 changes: 2 additions & 2 deletions ckipnlp/container/tree/parsed.py
@@ -2,7 +2,7 @@
 # -*- coding:utf-8 -*-

 """
-This module provides containers for parsed trees.
+This module provides tree containers for sentence parsing.
 """

 __author__ = 'Mu Yang <http://muyang.pro>'
@@ -220,7 +220,7 @@ def __repr__(self):
         )

     @property
-    def head_first(self):
+    def head_first(self): # pylint: disable=missing-docstring
         return self.head.identifier <= self.tail.identifier

     def to_dict(self):
8 changes: 4 additions & 4 deletions ckipnlp/container/wspos.py
@@ -144,7 +144,7 @@ def from_text(cls, data):
         Parameters
         ----------
             data : str
-                text such as ``'中文字(Na)\u3000喔(T)'``.
+                text such as ``'中文字(Na)\\u3000喔(T)'``.
         Returns
         -------
@@ -169,7 +169,7 @@ def to_text(word, pos):
         Returns
         -------
             str
-                text such as ``'中文字(Na)\u3000喔(T)'``.
+                text such as ``'中文字(Na)\\u3000喔(T)'``.
         """
         return _sentence_to_text((word, pos,))

@@ -191,7 +191,7 @@ def from_text(cls, data):
         Parameters
         ----------
             data : Sequence[str]
-                list of sentences such as ``'中文字(Na)\u3000喔(T)'``.
+                list of sentences such as ``'中文字(Na)\\u3000喔(T)'``.
         Returns
         -------
@@ -216,6 +216,6 @@ def to_text(word, pos):
         Returns
         -------
             List[str]
-                list of sentences such as ``'中文字(Na)\u3000喔(T)'``.
+                list of sentences such as ``'中文字(Na)\\u3000喔(T)'``.
         """
         return list(_paragraph_to_text((word, pos,)))
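
The text format named in these docstrings joins ``word(POS)`` tokens with the ideographic space U+3000. In a non-raw Python docstring, ``\u3000`` is interpreted as that character itself, so doubling the backslash keeps the escape sequence visible in the rendered documentation. The following is a minimal sketch of the format only; ``sentence_from_text`` and ``sentence_to_text`` are hypothetical stand-ins and may differ from ckipnlp's own ``from_text`` and ``to_text``.

```python
import re

SEP = '\u3000'  # ideographic space placed between tokens

# A token is a word followed by its POS tag in the final pair of parentheses.
_TOKEN = re.compile(r'^(.*)\(([^()]*)\)$')


def sentence_from_text(text):
    """Split 'word(POS)' tokens joined by U+3000 into (words, tags)."""
    words, tags = [], []
    for token in text.split(SEP):
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f'token {token!r} is not of the form word(POS)')
        words.append(match.group(1))
        tags.append(match.group(2))
    return words, tags


def sentence_to_text(words, tags):
    """Join parallel word and tag lists back into the text format."""
    return SEP.join(f'{word}({tag})' for word, tag in zip(words, tags))
```

The two helpers are inverses for well-formed input, so converting a sentence to text and back returns the original word and tag lists.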
6 changes: 0 additions & 6 deletions ckipnlp/data/__init__.py

This file was deleted.

2 changes: 2 additions & 0 deletions ckipnlp/data/coref/__init__.py
@@ -1,6 +1,8 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

# pylint: disable=missing-docstring

__author__ = 'Mu Yang <http://muyang.pro>'
__copyright__ = '2018-2020 CKIP Lab'
__license__ = 'CC BY-NC-SA 4.0'
2 changes: 1 addition & 1 deletion ckipnlp/data/coref/_human_words.py
@@ -1,4 +1,4 @@
-# pylint: disable=too-many-lines
+# pylint: disable=missing-docstring, too-many-lines

HUMAN_WORDS = {
'一代紅顏',
2 changes: 2 additions & 0 deletions ckipnlp/data/coref/_pronoun_words.py
@@ -1,3 +1,5 @@
# pylint: disable=missing-docstring

PRONOUN_1ST_SINGLE_WORDS = { # speaker|說話者 \ 我們|we
'余',
'吾',
2 changes: 2 additions & 0 deletions ckipnlp/data/coref/_self_words.py
@@ -1,3 +1,5 @@
# pylint: disable=missing-docstring

SELF_WORDS = {
'一己',
'小我',
2 changes: 2 additions & 0 deletions ckipnlp/data/parsed.py
@@ -1,6 +1,8 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

# pylint: disable=missing-docstring

__author__ = 'Mu Yang <http://muyang.pro>'
__copyright__ = '2018-2020 CKIP Lab'
__license__ = 'CC BY-NC-SA 4.0'
2 changes: 1 addition & 1 deletion ckipnlp/driver/__init__.py
@@ -2,7 +2,7 @@
 # -*- coding:utf-8 -*-

 """
-This module implements specialized drivers for CKIPNLP.
+This module implements drivers for CKIPNLP.
 """

 __author__ = 'Mu Yang <http://muyang.pro>'