spacy-och

Old Chinese (och) language support for the spaCy NLP library.

installation

requires spacy v3.

$ pip install spacy-och
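
if you want to confirm the spaCy requirement from Python, a minimal sketch along these lines may help. the helper names `major_version` and `spacy_is_v3` are made up for this example and are not part of spacy-och; the check only reads the installed package metadata.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '3.7.2'."""
    return int(version_string.split(".")[0])


def spacy_is_v3():
    """Return True/False for the installed spaCy, or None if spaCy is absent.

    Hypothetical helper for illustration; not provided by spacy-och.
    """
    try:
        return major_version(metadata.version("spacy")) == 3
    except metadata.PackageNotFoundError:
        return None
```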

usage

this package currently doesn't include trained models and is intended for basic NLP usage only, via spacy.blank(). it tokenizes texts by character and supports the Token.like_num and Token.is_stop attributes.

>>> import spacy
>>> nlp = spacy.blank("och")
>>> from spacy_och.examples import sentences
>>> doc = nlp(sentences[0])
>>> doc.text
子曰:「上下无常非為邪也進退无恆非離群也君子進德脩業欲及時也故无咎。」
>>> [t for t in doc if t.is_stop] # all stop words
[, :, , , 。, , , 。, 、, , , , , 。]
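
to make the idea of per-character tokenization and number-like tokens concrete, here is a rough pure-Python sketch. it is not how spacy-och is implemented: the stop list `ILLUSTRATIVE_STOPS` is a made-up placeholder, dropping whitespace is a simplification, and using `unicodedata.numeric` as a stand-in for `Token.like_num` is an assumption for illustration only.

```python
import unicodedata

# Hypothetical stop list for illustration only; the package ships its own data.
ILLUSTRATIVE_STOPS = {"也", "。", "「", "」"}


def tokenize_by_character(text):
    """Split text into one token per character, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def like_num(token):
    """Return True if Unicode assigns the single character a numeric value."""
    try:
        unicodedata.numeric(token)
        return True
    except (TypeError, ValueError):
        # TypeError: not a single character; ValueError: no numeric value.
        return False


tokens = tokenize_by_character("三人行必有我師焉。")
numbers = [t for t in tokens if like_num(t)]             # number-like characters
stops = [t for t in tokens if t in ILLUSTRATIVE_STOPS]   # placeholder stop words
```

with the real package, the equivalent checks would go through `t.like_num` and `t.is_stop` on the tokens of a `Doc`, as in the session above.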

more functionality is coming soon!

developing

after cloning the repository:

$ pip install -e ".[dev]"
$ pre-commit install

building

build a source archive and a wheel for a release:

$ rm -rf dist/*
$ python -m build

publish the release on test PyPI (useful for making sure everything worked):

$ python -m twine upload --repository testpypi dist/*

if everything looks ok, upload to the real PyPI:

$ python -m twine upload dist/*

license

code is licensed under the MIT license. some lookup data is derived from files licensed under the Unicode Data Files and Software License.