Commit a563ecb

Update documentations.

emfomy committed Apr 27, 2020
1 parent 062ce79 commit a563ecb

Showing 12 changed files with 227 additions and 272 deletions.
251 changes: 18 additions & 233 deletions README.rst
@@ -1,10 +1,8 @@
Introduction
============

Official CKIP CoreNLP Toolkits
------------------------------

Features
--------
^^^^^^^^

- Sentence Segmentation
- Word Segmentation
@@ -14,7 +12,7 @@ Features
- Co-Reference Detection

Git
---
^^^

https://github.com/ckiplab/ckipnlp

@@ -41,7 +39,7 @@ https://github.com/ckiplab/ckipnlp
.. |GitHub Watchers| image:: https://img.shields.io/github/watchers/ckiplab/ckipnlp.svg?style=social&label=Watch&maxAge=3600

PyPI
----
^^^^

https://pypi.org/project/ckipnlp

@@ -65,7 +63,7 @@ https://pypi.org/project/ckipnlp
.. |PyPI Status| image:: https://img.shields.io/pypi/status/ckipnlp.svg?maxAge=3600

Documentation
-------------
^^^^^^^^^^^^^

https://ckipnlp.readthedocs.io/

@@ -75,34 +73,34 @@ https://ckipnlp.readthedocs.io/
:target: http://ckipnlp.readthedocs.io

Contributors
------------
^^^^^^^^^^^^

* `Mu Yang <http://muyang.pro>`_ at `CKIP <https://ckip.iis.sinica.edu.tw>`_ (Author & Maintainer)
* `Wei-Yun Ma <https://www.iis.sinica.edu.tw/pages/ma/>`_ at `CKIP <https://ckip.iis.sinica.edu.tw>`_ (Maintainer)
* `DouglasWu <dgrey1116@gmail.com>`_

External Links
--------------
^^^^^^^^^^^^^^

- `Online Demo <https://ckip.iis.sinica.edu.tw/service/corenlp>`_

Installation
============
------------

Requirements
------------
^^^^^^^^^^^^

* `Python <http://www.python.org>`_ 3.6+
* `TreeLib <https://treelib.readthedocs.io>`_ 1.5+

* `CkipTagger <https://pypi.org/project/ckiptagger>`_ 0.1.1+ [Optional, Recommended]
* `CkipClassic <https://ckip-classic.readthedocs.io>`_ 1.0+ [Optional]

Tool Requirements
-----------------
Driver Requirements
^^^^^^^^^^^^^^^^^^^

================================ ======== ========== ===========
Tool                             Built-in CkipTagger CkipClassic
Driver                           Built-in CkipTagger CkipClassic
================================ ======== ========== ===========
Sentence Segmentation            ✔
Word Segmentation†                        ✔          ✔
@@ -112,241 +110,28 @@
Named-Entity Recognition                  ✔
Co-Reference Detection‡          ✔        ✔          ✔
================================ ======== ========== ===========

- † These tools require only one of the backends.
- † These drivers require only one of the backends.
- ‡ The co-reference implementation does not require a backend itself, but it requires the results of word segmentation, part-of-speech tagging, sentence parsing, and named-entity recognition.

Installation via Pip
--------------------
^^^^^^^^^^^^^^^^^^^^

- No backend (not recommended): ``pip install ckipnlp``.
- With the CkipTagger backend (recommended): ``pip install ckipnlp[tagger]``.
- With the CkipClassic backend: please refer to https://ckip-classic.readthedocs.io/en/latest/src/readme.html#installation for the CkipClassic installation guide.
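
A quick post-install sanity check is to probe which of these packages are importable. This is a minimal sketch, not part of the ckipnlp API; it only assumes the import names ``ckipnlp`` and ``ckiptagger`` from the projects above:

.. code-block:: python

   import importlib.util

   # Probe each package without fully importing it.
   for name in ('ckipnlp', 'ckiptagger'):
       found = importlib.util.find_spec(name) is not None
       print('{}: {}'.format(name, 'installed' if found else 'missing'))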

Usage
=====

See https://ckipnlp.readthedocs.io/en/latest/_api/ckipnlp.html for API details.

Pipelines
---------

Core Pipeline
^^^^^^^^^^^^^

.. image:: _static/image/pipeline.svg

.. code-block:: python

   from ckipnlp.pipeline import CkipPipeline, CkipDocument

   pipeline = CkipPipeline()
   doc = CkipDocument(raw='中文字喔,啊哈哈哈')

   # Word Segmentation
   pipeline.get_ws(doc)
   print(doc.ws)
   for line in doc.ws:
       print(line.to_text())

   # Part-of-Speech Tagging
   pipeline.get_pos(doc)
   print(doc.pos)
   for line in doc.pos:
       print(line.to_text())

   # Named-Entity Recognition
   pipeline.get_ner(doc)
   print(doc.ner)

   # Sentence Parsing
   pipeline.get_parsed(doc)
   print(doc.parsed)

   ################################################################

   from ckipnlp.container.util.wspos import WsPosParagraph

   # Word Segmentation & Part-of-Speech Tagging
   for line in WsPosParagraph.to_text(doc.ws, doc.pos):
       print(line)

Co-Reference Pipeline
^^^^^^^^^^^^^^^^^^^^^

.. image:: _static/image/coref_pipeline.svg

.. code-block:: python

   from ckipnlp.pipeline import CkipCorefPipeline, CkipDocument

   pipeline = CkipCorefPipeline()
   doc = CkipDocument(raw='畢卡索他想,完蛋了')

   # Co-Reference
   corefdoc = pipeline(doc)
   print(corefdoc.coref)
   for line in corefdoc.coref:
       print(line.to_text())

Containers
----------

The container objects provide the following methods:

- |from_text|, |to_text| for plain-text format conversions;
- |from_dict|, |to_dict| for dictionary-like format conversions;
- |from_list|, |to_list| for list-like format conversions;
- |from_json|, |to_json| for JSON format conversions (based on the dictionary-like format conversions).

The following are the interfaces, where ``CONTAINER_CLASS`` refers to the container class.

.. code-block:: python

   obj = CONTAINER_CLASS.from_text(plain_text)
   plain_text = obj.to_text()

   obj = CONTAINER_CLASS.from_dict({ key: value })
   dict_obj = obj.to_dict()

   obj = CONTAINER_CLASS.from_list([ value1, value2 ])
   list_obj = obj.to_list()

   obj = CONTAINER_CLASS.from_json(json_str)
   json_str = obj.to_json()

Note that not all containers provide all of the above methods. The table below lists the implemented methods; please refer to each container's documentation for the detailed formats.

======================== ======================== ============ ========================
Container                Item                     from/to text from/to dict, list, json
======================== ======================== ============ ========================
|TextParagraph|          |str|                    ✔            ✔
|SegSentence|            |str|                    ✔            ✔
|SegParagraph|           |SegSentence|            ✔            ✔
|NerToken|               ✘                                     ✔
|NerSentence|            |NerToken|                            ✔
|NerParagraph|           |NerSentence|                         ✔
|ParsedParagraph|        |str|                    ✔            ✔
|CorefToken|             ✘                        only to      ✔
|CorefSentence|          |CorefToken|             only to      ✔
|CorefParagraph|         |CorefSentence|          only to      ✔
======================== ======================== ============ ========================
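
For instance, a |SegSentence| can be built from a plain list of words and round-tripped through JSON. This is a minimal sketch of the generic interface above, assuming |SegSentence| is importable from ``ckipnlp.container`` just as |ParsedTree| is in the example further below:

.. code-block:: python

   from ckipnlp.container import SegSentence

   # Build a segmented sentence from a list of words.
   sent = SegSentence.from_list(['中文字', '喔'])

   # Round-trip through the JSON representation.
   json_str = sent.to_json()
   sent2 = SegSentence.from_json(json_str)
   assert sent.to_list() == sent2.to_list()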

WS with POS
^^^^^^^^^^^

There are also conversion routines for word-segmentation and POS containers jointly. For example, |WsPosToken| provides routines for a word (|str|) together with its POS tag (|str|):

.. code-block:: python

   ws_obj, pos_obj = WsPosToken.from_text('中文字(Na)')
   plain_text = WsPosToken.to_text(ws_obj, pos_obj)

   ws_obj, pos_obj = WsPosToken.from_dict({ 'word': '中文字', 'pos': 'Na', })
   dict_obj = WsPosToken.to_dict(ws_obj, pos_obj)

   ws_obj, pos_obj = WsPosToken.from_list([ '中文字', 'Na' ])
   list_obj = WsPosToken.to_list(ws_obj, pos_obj)

   ws_obj, pos_obj = WsPosToken.from_json(json_str)
   json_str = WsPosToken.to_json(ws_obj, pos_obj)

Similarly, |WsPosSentence| and |WsPosParagraph| provide routines for word-segmented and POS-tagged sentences and paragraphs (|SegSentence| and |SegParagraph|), respectively; a joint paragraph-level conversion is sketched below.
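
The following sketch merges parallel word and POS paragraphs into ``word(POS)`` plain text, mirroring the ``WsPosParagraph.to_text(doc.ws, doc.pos)`` call from the Core Pipeline example; building the |SegParagraph| inputs via |from_list| is an illustrative assumption:

.. code-block:: python

   from ckipnlp.container import SegParagraph
   from ckipnlp.container.util.wspos import WsPosParagraph

   # Parallel word-segmented and POS paragraphs (a single sentence here).
   ws = SegParagraph.from_list([['中文字', '喔']])
   pos = SegParagraph.from_list([['Na', 'T']])

   # Merge into 'word(POS)' plain text, one line per sentence.
   for line in WsPosParagraph.to_text(ws, pos):
       print(line)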

Parsed Tree
^^^^^^^^^^^

In addition to |ParsedParagraph|, we have implemented tree utilities based on `TreeLib <https://treelib.readthedocs.io>`_.

|ParsedTree| is the tree structure of a parsed sentence. One may use |from_text| and |to_text| for plain-text conversion; |from_dict| and |to_dict| for dictionary-like object conversion; and |from_json| and |to_json| for JSON string conversion.

The |ParsedTree| is a `TreeLib <https://treelib.readthedocs.io>`_ tree with |ParsedNode| objects as its nodes. The data of these nodes is stored in a |ParsedNodeData| (accessed by ``node.data``), which is a tuple of ``role`` (semantic role), ``pos`` (part-of-speech tag), and ``word``.

|ParsedTree| provides useful methods: |get_heads| finds the head words of the sentence; |get_relations| extracts all relations in the sentence; |get_subjects| returns the subjects of the sentence.

.. code-block:: python

   from ckipnlp.container import ParsedTree

   # 我的早餐、午餐和晚餐都在那場比賽中被吃掉了
   tree_text = 'S(goal:NP(possessor:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nab(DUMMY1:Nab(DUMMY1:Nab:早餐|Head:Caa:、|DUMMY2:Naa:午餐)|Head:Caa:和|DUMMY2:Nab:晚餐))|quantity:Dab:都|condition:PP(Head:P21:在|DUMMY:GP(DUMMY:NP(Head:Nac:比賽)|Head:Ng:中))|agent:PP(Head:P02:被)|Head:VC31:吃掉|aspect:Di:了)'
   tree = ParsedTree.from_text(tree_text, normalize=False)

   print('Show Tree')
   tree.show()

   print('Get Heads of {}'.format(tree[5]))
   print('-- Semantic --')
   for head in tree.get_heads(5, semantic=True): print(repr(head))
   print('-- Syntactic --')
   for head in tree.get_heads(5, semantic=False): print(repr(head))
   print()

   print('Get Relations of {}'.format(tree[0]))
   print('-- Semantic --')
   for rel in tree.get_relations(0, semantic=True): print(repr(rel))
   print('-- Syntactic --')
   for rel in tree.get_relations(0, semantic=False): print(repr(rel))
   print()

   # 我和食物真的都很不開心
   tree_text = 'S(theme:NP(DUMMY1:NP(Head:Nhaa:我)|Head:Caa:和|DUMMY2:NP(Head:Naa:食物))|evaluation:Dbb:真的|quantity:Dab:都|degree:Dfa:很|negation:Dc:不|Head:VH21:開心)'
   tree = ParsedTree.from_text(tree_text, normalize=False)

   print('Show Tree')
   tree.show()

   print('Get Subjects of {}'.format(tree[0]))
   print('-- Semantic --')
   for subject in tree.get_subjects(0, semantic=True): print(repr(subject))
   print('-- Syntactic --')
   for subject in tree.get_subjects(0, semantic=False): print(repr(subject))
   print()
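
The node data can also be inspected directly. This is a hedged sketch: it assumes |ParsedNodeData| exposes its ``role``, ``pos``, and ``word`` fields as attributes (consistent with the tuple description above) and relies on TreeLib's ``all_nodes()``:

.. code-block:: python

   # Dump each node's semantic role, POS tag, and word,
   # then serialize the whole tree to JSON.
   for node in tree.all_nodes():
       print(node.data.role, node.data.pos, node.data.word)

   json_str = tree.to_json()
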
- See https://ckipnlp.readthedocs.io/en/latest/main/usage.html for Usage.
- See https://ckipnlp.readthedocs.io/en/latest/_api/ckipnlp.html for API details.

License
=======
-------

|CC BY-NC-SA 4.0|

Copyright (c) 2018-2020 `CKIP Lab <https://ckip.iis.sinica.edu.tw>`_ under the `CC BY-NC-SA 4.0 License <http://creativecommons.org/licenses/by-nc-sa/4.0/>`_.

.. |CC BY-NC-SA 4.0| image:: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png
:target: http://creativecommons.org/licenses/by-nc-sa/4.0/



.. |from_text| replace:: ``from_text()``
.. |to_text| replace:: ``to_text()``
.. |from_dict| replace:: ``from_dict()``
.. |to_dict| replace:: ``to_dict()``
.. |from_list| replace:: ``from_list()``
.. |to_list| replace:: ``to_list()``
.. |from_json| replace:: ``from_json()``
.. |to_json| replace:: ``to_json()``

.. |get_heads| replace:: ``get_heads()``
.. |get_relations| replace:: ``get_relations()``
.. |get_subjects| replace:: ``get_subjects()``

.. |str| replace:: ``str``

.. |TextParagraph| replace:: ``TextParagraph``
.. |SegSentence| replace:: ``SegSentence``
.. |SegParagraph| replace:: ``SegParagraph``
.. |NerToken| replace:: ``NerToken``
.. |NerSentence| replace:: ``NerSentence``
.. |NerParagraph| replace:: ``NerParagraph``
.. |ParsedParagraph| replace:: ``ParsedParagraph``
.. |CorefToken| replace:: ``CorefToken``
.. |CorefSentence| replace:: ``CorefSentence``
.. |CorefParagraph| replace:: ``CorefParagraph``

.. |WsPosToken| replace:: ``WsPosToken``
.. |WsPosSentence| replace:: ``WsPosSentence``
.. |WsPosParagraph| replace:: ``WsPosParagraph``

.. |ParsedNodeData| replace:: ``ParsedNodeData``
.. |ParsedNode| replace:: ``ParsedNode``
.. |ParsedRelation| replace:: ``ParsedRelation``
.. |ParsedTree| replace:: ``ParsedTree``
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -64,7 +64,7 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['./_static', '../_static']
html_static_path = ['./_static']
html_extra_path = ['../LICENSE']
html_css_files = ['./custom.css']

File renamed without changes
File renamed without changes
5 changes: 3 additions & 2 deletions docs/index.rst
@@ -4,7 +4,9 @@ CKIP CoreNLP
.. toctree::
   :caption: Overview

   readme
   main/readme
   main/usage
   main/tag

.. toctree::
   :caption: Contents

@@ -14,6 +16,5 @@
.. toctree::
   :caption: Appendix

   src/tag
   genindex
   py-modindex
4 changes: 4 additions & 0 deletions docs/main/readme.rst
@@ -0,0 +1,4 @@
Introduction
============

.. include:: ../../README.rst
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
