SpacySentencizer takes a DocumentArray, and for each Document:
- Checks if it has a `.text` attribute
- If so, sentencizes it
- Stores each sentence as a chunk of the Document
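The steps above can be sketched in plain Python. This is only an illustration of the data flow, not the executor's actual code: `Doc` is a hypothetical stand-in for a Jina Document, and `toy_split` is a placeholder for the sentence splitting that spaCy performs in the real executor.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Jina Document (not the real class)."""
    text: str = ""
    chunks: List["Doc"] = field(default_factory=list)


def sentencize(docs: List[Doc], split: Callable[[str], List[str]]) -> None:
    """Attach one chunk per sentence to every Doc that has text."""
    for doc in docs:
        if not doc.text:  # no text to split, leave this Doc untouched
            continue
        for sentence in split(doc.text):
            doc.chunks.append(Doc(text=sentence))


def toy_split(text: str) -> List[str]:
    """Placeholder splitter (one sentence per line); assumed, not spaCy."""
    return [line.strip() for line in text.splitlines() if line.strip()]


docs = [Doc(text="First sentence.\nSecond sentence."), Doc()]
sentencize(docs, toy_split)
print([chunk.text for chunk in docs[0].chunks])
print(len(docs[1].chunks))
```

Passing the splitter in as a function keeps the sketch independent of any particular sentence-splitting library.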
- In English, a `.` (full stop/period) comes at the end of most sentences.
- However, `.` is also used in:
  - URLs: `docs.jina.ai`
  - Decimals: `3.14`
  - Initials: `J.R.R. Tolkien`, `H. Sapiens`
  - Abbreviations: `Turn to p. 13`
- This means that Vanilla Sentencizer tries to split things that aren't sentences.
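To make the problem concrete, here is a minimal sketch of a splitter that treats every `.` as a sentence boundary. `naive_split` is a hypothetical helper written for this illustration; it is not the Vanilla Sentencizer's actual implementation, only the simplest version of the same idea.

```python
from typing import List


def naive_split(text: str) -> List[str]:
    """Split on every '.', regardless of context (illustration only)."""
    return [part.strip() for part in text.split(".") if part.strip()]


text = "Visit docs.jina.ai. Pi is about 3.14."
fragments = naive_split(text)
print(fragments)
# ['Visit docs', 'jina', 'ai', 'Pi is about 3', '14']
```

The input holds two sentences, but the URL and the decimal each contribute extra split points, so five fragments come back.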
SpacySentencizer should also work for other languages, though I haven't tested that yet.
Use via Docker image:

    from jina import Flow

    f = Flow().add(uses='jinahub+docker://SpacySentencizer')

Use via source code:

    from jina import Flow

    f = Flow().add(uses='jinahub://SpacySentencizer')

- To override `__init__` args & kwargs, use `.add(..., uses_with={'key': 'value'})`
- To override class metas, use `.add(..., uses_metas={'key': 'value'})`